linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] mm: Two small fixes for recent syzbot reports
@ 2020-04-08  1:40 Peter Xu
  2020-04-08  1:40 ` [PATCH 1/2] mm/mempolicy: Allow lookup_node() to handle fatal signal Peter Xu
                   ` (2 more replies)
  0 siblings, 3 replies; 56+ messages in thread
From: Peter Xu @ 2020-04-08  1:40 UTC (permalink / raw)
  To: linux-kernel, Linus Torvalds, linux-mm; +Cc: Andrew Morton, peterx

The two patches should fix below syzbot reports:

  BUG: unable to handle kernel paging request in kernel_get_mempolicy
  https://lore.kernel.org/lkml/0000000000002b25f105a2a3434d@google.com/

  WARNING: bad unlock balance in __get_user_pages_remote
  https://lore.kernel.org/lkml/00000000000005c65d05a2b90e70@google.com/

Note that the 1st patch also applied two extra small changes comparing
to when posted on the list in that: (1) it squashed an "interupt"
spelling error that Andrew has pointed out when picked up, and (2) it
also initializes the "page" pointer to NULL.  But I'm fairly confident
it shouldn't affect the correctness of the patch.

The 2nd patch is exactly the patch posted previously.

Thanks,

Peter Xu (2):
  mm/mempolicy: Allow lookup_node() to handle fatal signal
  mm/gup: Mark lock taken only after a successful retake

 mm/gup.c       | 2 +-
 mm/mempolicy.c | 7 +++++--
 2 files changed, 6 insertions(+), 3 deletions(-)

-- 
2.24.1



^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH 1/2] mm/mempolicy: Allow lookup_node() to handle fatal signal
  2020-04-08  1:40 [PATCH 0/2] mm: Two small fixes for recent syzbot reports Peter Xu
@ 2020-04-08  1:40 ` Peter Xu
  2020-04-08 10:21   ` Michal Hocko
  2020-04-09  7:02   ` Michal Hocko
  2020-04-08  1:40 ` [PATCH 2/2] mm/gup: Mark lock taken only after a successful retake Peter Xu
  2020-04-09  0:47 ` [PATCH 0/2] mm: Two small fixes for recent syzbot reports Andrew Morton
  2 siblings, 2 replies; 56+ messages in thread
From: Peter Xu @ 2020-04-08  1:40 UTC (permalink / raw)
  To: linux-kernel, Linus Torvalds, linux-mm
  Cc: Andrew Morton, peterx, syzbot+693dc11fcb53120b5559

lookup_node() uses gup to pin the page and get node information.  It
checks against ret>=0 assuming the page will be filled in.  However
it's also possible that gup will return zero, for example, when the
thread is quickly killed with a fatal signal.  Teach lookup_node() to
gracefully return an error -EFAULT if it happens.

Meanwhile, initialize "page" to NULL to avoid potential risk of
exploiting the pointer.

Reported-by: syzbot+693dc11fcb53120b5559@syzkaller.appspotmail.com
Fixes: 4426e945df58 ("mm/gup: allow VM_FAULT_RETRY for multiple times")
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/mempolicy.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 5fb427aed612..c7ca6a808fb1 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -897,12 +897,15 @@ static void get_policy_nodemask(struct mempolicy *p, nodemask_t *nodes)
 
 static int lookup_node(struct mm_struct *mm, unsigned long addr)
 {
-	struct page *p;
+	struct page *p = NULL;
 	int err;
 
 	int locked = 1;
 	err = get_user_pages_locked(addr & PAGE_MASK, 1, 0, &p, &locked);
-	if (err >= 0) {
+	if (err == 0) {
+		/* E.g. GUP interrupted by fatal signal */
+		err = -EFAULT;
+	} else if (err > 0) {
 		err = page_to_nid(p);
 		put_page(p);
 	}
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 2/2] mm/gup: Mark lock taken only after a successful retake
  2020-04-08  1:40 [PATCH 0/2] mm: Two small fixes for recent syzbot reports Peter Xu
  2020-04-08  1:40 ` [PATCH 1/2] mm/mempolicy: Allow lookup_node() to handle fatal signal Peter Xu
@ 2020-04-08  1:40 ` Peter Xu
  2020-04-09  0:47 ` [PATCH 0/2] mm: Two small fixes for recent syzbot reports Andrew Morton
  2 siblings, 0 replies; 56+ messages in thread
From: Peter Xu @ 2020-04-08  1:40 UTC (permalink / raw)
  To: linux-kernel, Linus Torvalds, linux-mm
  Cc: Andrew Morton, peterx, syzbot+a8c70b7f3579fc0587dc

It's definitely incorrect to mark the lock as taken even if
down_read_killable() failed.  It's overlooked when we switched from
down_read() to down_read_killable() because down_read() won't fail
while down_read_killable() could.

Reported-by: syzbot+a8c70b7f3579fc0587dc@syzkaller.appspotmail.com
Fixes: 71335f37c5e8 ("mm/gup: allow to react to fatal signals")
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/gup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/gup.c b/mm/gup.c
index da3e03185144..1f9a9b3a5869 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1328,7 +1328,6 @@ static __always_inline long __get_user_pages_locked(struct task_struct *tsk,
 		if (fatal_signal_pending(current))
 			break;
 
-		*locked = 1;
 		ret = down_read_killable(&mm->mmap_sem);
 		if (ret) {
 			BUG_ON(ret > 0);
@@ -1337,6 +1336,7 @@ static __always_inline long __get_user_pages_locked(struct task_struct *tsk,
 			break;
 		}
 
+		*locked = 1;
 		ret = __get_user_pages(tsk, mm, start, 1, flags | FOLL_TRIED,
 				       pages, NULL, locked);
 		if (!*locked) {
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [PATCH 1/2] mm/mempolicy: Allow lookup_node() to handle fatal signal
  2020-04-08  1:40 ` [PATCH 1/2] mm/mempolicy: Allow lookup_node() to handle fatal signal Peter Xu
@ 2020-04-08 10:21   ` Michal Hocko
  2020-04-08 14:20     ` Peter Xu
  2020-04-09  7:02   ` Michal Hocko
  1 sibling, 1 reply; 56+ messages in thread
From: Michal Hocko @ 2020-04-08 10:21 UTC (permalink / raw)
  To: Peter Xu
  Cc: linux-kernel, Linus Torvalds, linux-mm, Andrew Morton,
	syzbot+693dc11fcb53120b5559

On Tue 07-04-20 21:40:09, Peter Xu wrote:
> lookup_node() uses gup to pin the page and get node information.  It
> checks against ret>=0 assuming the page will be filled in.  However
> it's also possible that gup will return zero, for example, when the
> thread is quickly killed with a fatal signal.  Teach lookup_node() to
> gracefully return an error -EFAULT if it happens.
> 
> Meanwhile, initialize "page" to NULL to avoid potential risk of
> exploiting the pointer.
> 
> Reported-by: syzbot+693dc11fcb53120b5559@syzkaller.appspotmail.com
> Fixes: 4426e945df58 ("mm/gup: allow VM_FAULT_RETRY for multiple times")

I am not familiar with thic commit but shouldn't gup return ERESTARTSYS
on a fatal signal?
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 1/2] mm/mempolicy: Allow lookup_node() to handle fatal signal
  2020-04-08 10:21   ` Michal Hocko
@ 2020-04-08 14:20     ` Peter Xu
  2020-04-08 14:30       ` Michal Hocko
  0 siblings, 1 reply; 56+ messages in thread
From: Peter Xu @ 2020-04-08 14:20 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-kernel, Linus Torvalds, linux-mm, Andrew Morton,
	syzbot+693dc11fcb53120b5559

On Wed, Apr 08, 2020 at 12:21:28PM +0200, Michal Hocko wrote:
> On Tue 07-04-20 21:40:09, Peter Xu wrote:
> > lookup_node() uses gup to pin the page and get node information.  It
> > checks against ret>=0 assuming the page will be filled in.  However
> > it's also possible that gup will return zero, for example, when the
> > thread is quickly killed with a fatal signal.  Teach lookup_node() to
> > gracefully return an error -EFAULT if it happens.
> > 
> > Meanwhile, initialize "page" to NULL to avoid potential risk of
> > exploiting the pointer.
> > 
> > Reported-by: syzbot+693dc11fcb53120b5559@syzkaller.appspotmail.com
> > Fixes: 4426e945df58 ("mm/gup: allow VM_FAULT_RETRY for multiple times")
> 
> I am not familiar with thic commit but shouldn't gup return ERESTARTSYS
> on a fatal signal?

Hi, Michal,

I do see quite a few usages on -ERESTARTSYS, but also some others,
majorly -EINTR, or even -EFAULT.  I think it could be a more general
question rather than a specific question to this patch only.

I saw some other discussions about this return value issue, I'll CC
you in the other thread when I raise this as a general question.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 1/2] mm/mempolicy: Allow lookup_node() to handle fatal signal
  2020-04-08 14:20     ` Peter Xu
@ 2020-04-08 14:30       ` Michal Hocko
  2020-04-08 15:24         ` Peter Xu
  0 siblings, 1 reply; 56+ messages in thread
From: Michal Hocko @ 2020-04-08 14:30 UTC (permalink / raw)
  To: Peter Xu
  Cc: linux-kernel, Linus Torvalds, linux-mm, Andrew Morton,
	syzbot+693dc11fcb53120b5559

On Wed 08-04-20 10:20:39, Peter Xu wrote:
> On Wed, Apr 08, 2020 at 12:21:28PM +0200, Michal Hocko wrote:
> > On Tue 07-04-20 21:40:09, Peter Xu wrote:
> > > lookup_node() uses gup to pin the page and get node information.  It
> > > checks against ret>=0 assuming the page will be filled in.  However
> > > it's also possible that gup will return zero, for example, when the
> > > thread is quickly killed with a fatal signal.  Teach lookup_node() to
> > > gracefully return an error -EFAULT if it happens.
> > > 
> > > Meanwhile, initialize "page" to NULL to avoid potential risk of
> > > exploiting the pointer.
> > > 
> > > Reported-by: syzbot+693dc11fcb53120b5559@syzkaller.appspotmail.com
> > > Fixes: 4426e945df58 ("mm/gup: allow VM_FAULT_RETRY for multiple times")
> > 
> > I am not familiar with thic commit but shouldn't gup return ERESTARTSYS
> > on a fatal signal?
> 
> Hi, Michal,
> 
> I do see quite a few usages on -ERESTARTSYS, but also some others,
> majorly -EINTR, or even -EFAULT.  I think it could be a more general
> question rather than a specific question to this patch only.

I am sorry but I was probably not clear enough. I was mostly worried
that gup doesn't return ERESTARTSYS or EINTR when it backed off because
of fatal signal pending. Your patch is checking for 0 an indicating that
this is that condition.
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 1/2] mm/mempolicy: Allow lookup_node() to handle fatal signal
  2020-04-08 14:30       ` Michal Hocko
@ 2020-04-08 15:24         ` Peter Xu
  2020-04-08 15:26           ` Michal Hocko
  0 siblings, 1 reply; 56+ messages in thread
From: Peter Xu @ 2020-04-08 15:24 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-kernel, Linus Torvalds, linux-mm, Andrew Morton,
	syzbot+693dc11fcb53120b5559

On Wed, Apr 08, 2020 at 04:30:24PM +0200, Michal Hocko wrote:
> On Wed 08-04-20 10:20:39, Peter Xu wrote:
> > On Wed, Apr 08, 2020 at 12:21:28PM +0200, Michal Hocko wrote:
> > > On Tue 07-04-20 21:40:09, Peter Xu wrote:
> > > > lookup_node() uses gup to pin the page and get node information.  It
> > > > checks against ret>=0 assuming the page will be filled in.  However
> > > > it's also possible that gup will return zero, for example, when the
> > > > thread is quickly killed with a fatal signal.  Teach lookup_node() to
> > > > gracefully return an error -EFAULT if it happens.
> > > > 
> > > > Meanwhile, initialize "page" to NULL to avoid potential risk of
> > > > exploiting the pointer.
> > > > 
> > > > Reported-by: syzbot+693dc11fcb53120b5559@syzkaller.appspotmail.com
> > > > Fixes: 4426e945df58 ("mm/gup: allow VM_FAULT_RETRY for multiple times")
> > > 
> > > I am not familiar with thic commit but shouldn't gup return ERESTARTSYS
> > > on a fatal signal?
> > 
> > Hi, Michal,
> > 
> > I do see quite a few usages on -ERESTARTSYS, but also some others,
> > majorly -EINTR, or even -EFAULT.  I think it could be a more general
> > question rather than a specific question to this patch only.
> 
> I am sorry but I was probably not clear enough. I was mostly worried
> that gup doesn't return ERESTARTSYS or EINTR when it backed off because
> of fatal signal pending. Your patch is checking for 0 an indicating that
> this is that condition.

Yeah I just noticed the fact, sorry!

Hillf just posted a fix there for recovering the behavior:

https://lore.kernel.org/lkml/20200408151213.GE66033@xz-x1/

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 1/2] mm/mempolicy: Allow lookup_node() to handle fatal signal
  2020-04-08 15:24         ` Peter Xu
@ 2020-04-08 15:26           ` Michal Hocko
  0 siblings, 0 replies; 56+ messages in thread
From: Michal Hocko @ 2020-04-08 15:26 UTC (permalink / raw)
  To: Peter Xu
  Cc: linux-kernel, Linus Torvalds, linux-mm, Andrew Morton,
	syzbot+693dc11fcb53120b5559

On Wed 08-04-20 11:24:35, Peter Xu wrote:
> On Wed, Apr 08, 2020 at 04:30:24PM +0200, Michal Hocko wrote:
> > On Wed 08-04-20 10:20:39, Peter Xu wrote:
> > > On Wed, Apr 08, 2020 at 12:21:28PM +0200, Michal Hocko wrote:
> > > > On Tue 07-04-20 21:40:09, Peter Xu wrote:
> > > > > lookup_node() uses gup to pin the page and get node information.  It
> > > > > checks against ret>=0 assuming the page will be filled in.  However
> > > > > it's also possible that gup will return zero, for example, when the
> > > > > thread is quickly killed with a fatal signal.  Teach lookup_node() to
> > > > > gracefully return an error -EFAULT if it happens.
> > > > > 
> > > > > Meanwhile, initialize "page" to NULL to avoid potential risk of
> > > > > exploiting the pointer.
> > > > > 
> > > > > Reported-by: syzbot+693dc11fcb53120b5559@syzkaller.appspotmail.com
> > > > > Fixes: 4426e945df58 ("mm/gup: allow VM_FAULT_RETRY for multiple times")
> > > > 
> > > > I am not familiar with thic commit but shouldn't gup return ERESTARTSYS
> > > > on a fatal signal?
> > > 
> > > Hi, Michal,
> > > 
> > > I do see quite a few usages on -ERESTARTSYS, but also some others,
> > > majorly -EINTR, or even -EFAULT.  I think it could be a more general
> > > question rather than a specific question to this patch only.
> > 
> > I am sorry but I was probably not clear enough. I was mostly worried
> > that gup doesn't return ERESTARTSYS or EINTR when it backed off because
> > of fatal signal pending. Your patch is checking for 0 an indicating that
> > this is that condition.
> 
> Yeah I just noticed the fact, sorry!
> 
> Hillf just posted a fix there for recovering the behavior:
> 
> https://lore.kernel.org/lkml/20200408151213.GE66033@xz-x1/

yeah, that is the proper fix.
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
  2020-04-08  1:40 [PATCH 0/2] mm: Two small fixes for recent syzbot reports Peter Xu
  2020-04-08  1:40 ` [PATCH 1/2] mm/mempolicy: Allow lookup_node() to handle fatal signal Peter Xu
  2020-04-08  1:40 ` [PATCH 2/2] mm/gup: Mark lock taken only after a successful retake Peter Xu
@ 2020-04-09  0:47 ` Andrew Morton
  2020-04-09 11:49   ` Matthew Wilcox
  2020-04-09 12:55   ` Dmitry Vyukov
  2 siblings, 2 replies; 56+ messages in thread
From: Andrew Morton @ 2020-04-09  0:47 UTC (permalink / raw)
  To: Peter Xu; +Cc: linux-kernel, Linus Torvalds, linux-mm

On Tue,  7 Apr 2020 21:40:08 -0400 Peter Xu <peterx@redhat.com> wrote:

> The two patches should fix below syzbot reports:
> 
>   BUG: unable to handle kernel paging request in kernel_get_mempolicy
>   https://lore.kernel.org/lkml/0000000000002b25f105a2a3434d@google.com/
> 
>   WARNING: bad unlock balance in __get_user_pages_remote
>   https://lore.kernel.org/lkml/00000000000005c65d05a2b90e70@google.com/

(Is there an email address for the syzbot operators?)

sysbot does test linux-next, yet these patches sat in linux-next for a
month without a peep, but all hell broke loose when they hit Linus's
tree.  How could this have happened?

Possibly I've been carrying a later patch which fixed all this up, but
I'm not seeing anything like that.  Nothing at all against mm/gup.c.



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 1/2] mm/mempolicy: Allow lookup_node() to handle fatal signal
  2020-04-08  1:40 ` [PATCH 1/2] mm/mempolicy: Allow lookup_node() to handle fatal signal Peter Xu
  2020-04-08 10:21   ` Michal Hocko
@ 2020-04-09  7:02   ` Michal Hocko
  2020-04-09 12:52     ` Peter Xu
  2020-04-09 16:42     ` Linus Torvalds
  1 sibling, 2 replies; 56+ messages in thread
From: Michal Hocko @ 2020-04-09  7:02 UTC (permalink / raw)
  To: Peter Xu
  Cc: linux-kernel, Linus Torvalds, linux-mm, Andrew Morton,
	syzbot+693dc11fcb53120b5559

This patch has been merged and it is actually wrong after ae46d2aa6a7f
has been merged. We can either revert or I suggest just handling >0,
like the patch below:

From 03fbe30ec61e65b0927d0d41bccc7dff5f7eafd8 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Thu, 9 Apr 2020 08:26:57 +0200
Subject: [PATCH] mm, mempolicy: fix up gup usage in lookup_node

ba841078cd05 ("mm/mempolicy: Allow lookup_node() to handle fatal signal") has
added a special casing for 0 return value because that was a possible
gup return value when interrupted by fatal signal. This has been fixed
by ae46d2aa6a7f ("mm/gup: Let __get_user_pages_locked() return -EINTR
for fatal signal") in the mean time so ba841078cd05 can be reverted.
This patch however doesn't go all the way to revert it because 0 return
value is impossible. We always get an error or 1 for a single page
request.

Fixes: ba841078cd05 ("mm/mempolicy: Allow lookup_node() to handle fatal signal")
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/mempolicy.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 48ba9729062e..1965e2681877 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -927,10 +927,7 @@ static int lookup_node(struct mm_struct *mm, unsigned long addr)
 
 	int locked = 1;
 	err = get_user_pages_locked(addr & PAGE_MASK, 1, 0, &p, &locked);
-	if (err == 0) {
-		/* E.g. GUP interrupted by fatal signal */
-		err = -EFAULT;
-	} else if (err > 0) {
+	if (err > 0) {
 		err = page_to_nid(p);
 		put_page(p);
 	}
-- 
2.25.1

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
  2020-04-09  0:47 ` [PATCH 0/2] mm: Two small fixes for recent syzbot reports Andrew Morton
@ 2020-04-09 11:49   ` Matthew Wilcox
  2020-04-09 13:00     ` Dmitry Vyukov
  2020-04-09 12:55   ` Dmitry Vyukov
  1 sibling, 1 reply; 56+ messages in thread
From: Matthew Wilcox @ 2020-04-09 11:49 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Peter Xu, linux-kernel, Linus Torvalds, linux-mm, syzkaller-bugs

On Wed, Apr 08, 2020 at 05:47:32PM -0700, Andrew Morton wrote:
> On Tue,  7 Apr 2020 21:40:08 -0400 Peter Xu <peterx@redhat.com> wrote:
> 
> > The two patches should fix below syzbot reports:
> > 
> >   BUG: unable to handle kernel paging request in kernel_get_mempolicy
> >   https://lore.kernel.org/lkml/0000000000002b25f105a2a3434d@google.com/
> > 
> >   WARNING: bad unlock balance in __get_user_pages_remote
> >   https://lore.kernel.org/lkml/00000000000005c65d05a2b90e70@google.com/
> 
> (Is there an email address for the syzbot operators?)

I'd suggest syzkaller-bugs@googlegroups.com (added to the Cc).

But there's a deeper problem in that we don't have anywhere to stash
that kind of information in the kernel tree right now.  Perhaps a special
entry in the MAINTAINERS file for bot operators?  Or one entry per bot?

> sysbot does test linux-next, yet these patches sat in linux-next for a
> month without a peep, but all hell broke loose when they hit Linus's
> tree.  How could this have happened?
> 
> Possibly I've been carrying a later patch which fixed all this up, but
> I'm not seeing anything like that.  Nothing at all against mm/gup.c.
> 
> 


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 1/2] mm/mempolicy: Allow lookup_node() to handle fatal signal
  2020-04-09  7:02   ` Michal Hocko
@ 2020-04-09 12:52     ` Peter Xu
  2020-04-09 13:00       ` Peter Xu
  2020-04-09 13:53       ` Michal Hocko
  2020-04-09 16:42     ` Linus Torvalds
  1 sibling, 2 replies; 56+ messages in thread
From: Peter Xu @ 2020-04-09 12:52 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-kernel, Linus Torvalds, linux-mm, Andrew Morton,
	syzbot+693dc11fcb53120b5559

On Thu, Apr 09, 2020 at 09:02:53AM +0200, Michal Hocko wrote:
> This patch has been merged and it is actually wrong after ae46d2aa6a7f
> has been merged. We can either revert or I suggest just handling >0,
> like the patch below:
> 
> From 03fbe30ec61e65b0927d0d41bccc7dff5f7eafd8 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Thu, 9 Apr 2020 08:26:57 +0200
> Subject: [PATCH] mm, mempolicy: fix up gup usage in lookup_node
> 
> ba841078cd05 ("mm/mempolicy: Allow lookup_node() to handle fatal signal") has
> added a special casing for 0 return value because that was a possible
> gup return value when interrupted by fatal signal. This has been fixed
> by ae46d2aa6a7f ("mm/gup: Let __get_user_pages_locked() return -EINTR
> for fatal signal") in the mean time so ba841078cd05 can be reverted.
> This patch however doesn't go all the way to revert it because 0 return
> value is impossible. We always get an error or 1 for a single page
> request.
> 
> Fixes: ba841078cd05 ("mm/mempolicy: Allow lookup_node() to handle fatal signal")
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  mm/mempolicy.c | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 48ba9729062e..1965e2681877 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -927,10 +927,7 @@ static int lookup_node(struct mm_struct *mm, unsigned long addr)
>  
>  	int locked = 1;
>  	err = get_user_pages_locked(addr & PAGE_MASK, 1, 0, &p, &locked);
> -	if (err == 0) {
> -		/* E.g. GUP interrupted by fatal signal */
> -		err = -EFAULT;
> -	} else if (err > 0) {
> +	if (err > 0) {
>  		err = page_to_nid(p);
>  		put_page(p);
>  	}

Hi, Michal,

I'm totally not against this, but note that get_user_pages_locked()
could still return zero. Although I'm not 100% sure now on whether
npages==0 will be the only case, it won't hurt to keep this ret==0
check until we consolidate the whole gup code to never return zero.

Assuming there's another case (even possible for a future gup bug)
that could return a zero, your patch will let err be anything (which
you didn't initialize err with your patch), then the function will
return a random value.  So even if you really want this change, I
would suggest you initialize err to some error code.

I just don't see much gain we get from removing that check.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
  2020-04-09  0:47 ` [PATCH 0/2] mm: Two small fixes for recent syzbot reports Andrew Morton
  2020-04-09 11:49   ` Matthew Wilcox
@ 2020-04-09 12:55   ` Dmitry Vyukov
  2020-04-09 16:32     ` Linus Torvalds
  1 sibling, 1 reply; 56+ messages in thread
From: Dmitry Vyukov @ 2020-04-09 12:55 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Peter Xu, LKML, Linus Torvalds, Linux-MM

On Thu, Apr 9, 2020 at 2:49 AM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Tue,  7 Apr 2020 21:40:08 -0400 Peter Xu <peterx@redhat.com> wrote:
>
> > The two patches should fix below syzbot reports:
> >
> >   BUG: unable to handle kernel paging request in kernel_get_mempolicy
> >   https://lore.kernel.org/lkml/0000000000002b25f105a2a3434d@google.com/
> >
> >   WARNING: bad unlock balance in __get_user_pages_remote
> >   https://lore.kernel.org/lkml/00000000000005c65d05a2b90e70@google.com/
>
> (Is there an email address for the syzbot operators?)
>
> sysbot does test linux-next, yet these patches sat in linux-next for a
> month without a peep, but all hell broke loose when they hit Linus's
> tree.  How could this have happened?

The same thing:
https://groups.google.com/d/msg/syzkaller-bugs/phowYdNXHck/qU1P0TsjBAAJ

linux-next is boot-broken for more than a month and bugs are piling
onto bugs, I've seen at least 3 different ones.
syzbot can't get any working linux-next build for testing for a very
long time now.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 1/2] mm/mempolicy: Allow lookup_node() to handle fatal signal
  2020-04-09 12:52     ` Peter Xu
@ 2020-04-09 13:00       ` Peter Xu
  2020-04-09 13:53       ` Michal Hocko
  1 sibling, 0 replies; 56+ messages in thread
From: Peter Xu @ 2020-04-09 13:00 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-kernel, Linus Torvalds, linux-mm, Andrew Morton,
	syzbot+693dc11fcb53120b5559

On Thu, Apr 09, 2020 at 08:52:58AM -0400, Peter Xu wrote:
> On Thu, Apr 09, 2020 at 09:02:53AM +0200, Michal Hocko wrote:
> > This patch has been merged and it is actually wrong after ae46d2aa6a7f
> > has been merged. We can either revert or I suggest just handling >0,
> > like the patch below:
> > 
> > From 03fbe30ec61e65b0927d0d41bccc7dff5f7eafd8 Mon Sep 17 00:00:00 2001
> > From: Michal Hocko <mhocko@suse.com>
> > Date: Thu, 9 Apr 2020 08:26:57 +0200
> > Subject: [PATCH] mm, mempolicy: fix up gup usage in lookup_node
> > 
> > ba841078cd05 ("mm/mempolicy: Allow lookup_node() to handle fatal signal") has
> > added a special casing for 0 return value because that was a possible
> > gup return value when interrupted by fatal signal. This has been fixed
> > by ae46d2aa6a7f ("mm/gup: Let __get_user_pages_locked() return -EINTR
> > for fatal signal") in the mean time so ba841078cd05 can be reverted.
> > This patch however doesn't go all the way to revert it because 0 return
> > value is impossible. We always get an error or 1 for a single page
> > request.
> > 
> > Fixes: ba841078cd05 ("mm/mempolicy: Allow lookup_node() to handle fatal signal")
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > ---
> >  mm/mempolicy.c | 5 +----
> >  1 file changed, 1 insertion(+), 4 deletions(-)
> > 
> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > index 48ba9729062e..1965e2681877 100644
> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
> > @@ -927,10 +927,7 @@ static int lookup_node(struct mm_struct *mm, unsigned long addr)
> >  
> >  	int locked = 1;
> >  	err = get_user_pages_locked(addr & PAGE_MASK, 1, 0, &p, &locked);
> > -	if (err == 0) {
> > -		/* E.g. GUP interrupted by fatal signal */
> > -		err = -EFAULT;
> > -	} else if (err > 0) {
> > +	if (err > 0) {
> >  		err = page_to_nid(p);
> >  		put_page(p);
> >  	}
> 
> Hi, Michal,
> 
> I'm totally not against this, but note that get_user_pages_locked()
> could still return zero. Although I'm not 100% sure now on whether
> npages==0 will be the only case, it won't hurt to keep this ret==0
> check until we consolidate the whole gup code to never return zero.
> 
> Assuming there's another case (even possible for a future gup bug)
> that could return a zero, your patch will let err be anything (which
> you didn't initialize err with your patch), then the function will
> return a random value.  So even if you really want this change, I
> would suggest you initialize err to some error code.

I'm sorry, not a random value, but err=0 will be returned as the mem
policy by lookup_node().

> 
> I just don't see much gain we get from removing that check.
> 
> Thanks,
> 
> -- 
> Peter Xu

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
  2020-04-09 11:49   ` Matthew Wilcox
@ 2020-04-09 13:00     ` Dmitry Vyukov
  2020-04-09 18:16       ` Andrew Morton
  0 siblings, 1 reply; 56+ messages in thread
From: Dmitry Vyukov @ 2020-04-09 13:00 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrew Morton, Peter Xu, LKML, Linus Torvalds, Linux-MM,
	syzkaller-bugs, syzkaller

On Thu, Apr 9, 2020 at 1:49 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Wed, Apr 08, 2020 at 05:47:32PM -0700, Andrew Morton wrote:
> > On Tue,  7 Apr 2020 21:40:08 -0400 Peter Xu <peterx@redhat.com> wrote:
> >
> > > The two patches should fix below syzbot reports:
> > >
> > >   BUG: unable to handle kernel paging request in kernel_get_mempolicy
> > >   https://lore.kernel.org/lkml/0000000000002b25f105a2a3434d@google.com/
> > >
> > >   WARNING: bad unlock balance in __get_user_pages_remote
> > >   https://lore.kernel.org/lkml/00000000000005c65d05a2b90e70@google.com/
> >
> > (Is there an email address for the syzbot operators?)
>
> I'd suggest syzkaller-bugs@googlegroups.com (added to the Cc).

syzkaller@googlegroups.com is a better one.
syzkaller-bugs@googlegroups.com plays more of an LKML role.

> But there's a deeper problem in that we don't have anywhere to stash
> that kind of information in the kernel tree right now.  Perhaps a special
> entry in the MAINTAINERS file for bot operators?  Or one entry per bot?

I don't mind adding syzkaller. Some time ago I wanted to contact
KernelCI, CKI, LKFT, 0-day owners, finding relevant lists wasn't
impossible, but for some it was hard.

For syzkaller it would be:

https://github.com/google/syzkaller/issues for bugs/feature requests.
syzkaller@googlegroups.com for discussions.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 1/2] mm/mempolicy: Allow lookup_node() to handle fatal signal
  2020-04-09 12:52     ` Peter Xu
  2020-04-09 13:00       ` Peter Xu
@ 2020-04-09 13:53       ` Michal Hocko
  1 sibling, 0 replies; 56+ messages in thread
From: Michal Hocko @ 2020-04-09 13:53 UTC (permalink / raw)
  To: Peter Xu
  Cc: linux-kernel, Linus Torvalds, linux-mm, Andrew Morton,
	syzbot+693dc11fcb53120b5559

On Thu 09-04-20 08:52:58, Peter Xu wrote:
> On Thu, Apr 09, 2020 at 09:02:53AM +0200, Michal Hocko wrote:
> > This patch has been merged and it is actually wrong after ae46d2aa6a7f
> > has been merged. We can either revert or I suggest just handling >0,
> > like the patch below:
> > 
> > From 03fbe30ec61e65b0927d0d41bccc7dff5f7eafd8 Mon Sep 17 00:00:00 2001
> > From: Michal Hocko <mhocko@suse.com>
> > Date: Thu, 9 Apr 2020 08:26:57 +0200
> > Subject: [PATCH] mm, mempolicy: fix up gup usage in lookup_node
> > 
> > ba841078cd05 ("mm/mempolicy: Allow lookup_node() to handle fatal signal") has
> > added a special casing for 0 return value because that was a possible
> > gup return value when interrupted by fatal signal. This has been fixed
> > by ae46d2aa6a7f ("mm/gup: Let __get_user_pages_locked() return -EINTR
> > for fatal signal") in the mean time so ba841078cd05 can be reverted.
> > This patch however doesn't go all the way to revert it because 0 return
> > value is impossible. We always get an error or 1 for a single page
> > request.
> > 
> > Fixes: ba841078cd05 ("mm/mempolicy: Allow lookup_node() to handle fatal signal")
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > ---
> >  mm/mempolicy.c | 5 +----
> >  1 file changed, 1 insertion(+), 4 deletions(-)
> > 
> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > index 48ba9729062e..1965e2681877 100644
> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
> > @@ -927,10 +927,7 @@ static int lookup_node(struct mm_struct *mm, unsigned long addr)
> >  
> >  	int locked = 1;
> >  	err = get_user_pages_locked(addr & PAGE_MASK, 1, 0, &p, &locked);
> > -	if (err == 0) {
> > -		/* E.g. GUP interrupted by fatal signal */
> > -		err = -EFAULT;
> > -	} else if (err > 0) {
> > +	if (err > 0) {
> >  		err = page_to_nid(p);
> >  		put_page(p);
> >  	}
> 
> Hi, Michal,
> 
> I'm totally not against this, but note that get_user_pages_locked()
> could still return zero. Although I'm not 100% sure now on whether
> npages==0 will be the only case, it won't hurt to keep this ret==0
> check until we consolidate the whole gup code to never return zero.

As we have discussed in other email thread, returning 0 should be really
possible only for an nr_pages == 0. And even in that case we should
rather return EINVAL. I wanted to do that change as well but gup is a
heavily used interface and I do not have time to check all existing
callers.
 
> Assuming there's another case (even possible for a future gup bug)
> that could return a zero, your patch will let err be anything (which
> you didn't initialize err with your patch), then the function will
> return a random value.  So even if you really want this change, I
> would suggest you initialize err to some error code.

I wouldn't really overcomplicate it. If you are worried about future
bugs then we can warn into the log when !err && nr_pages somewher inside
gup code. But let's keep callers as simple as possible. We surely do not
want to check for !err in all users now.

> I just don't see much gain we get from removing that check.

The code clarity is the primary reason.
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
  2020-04-09 12:55   ` Dmitry Vyukov
@ 2020-04-09 16:32     ` Linus Torvalds
  2020-04-09 16:58       ` Qian Cai
  2020-04-09 23:29       ` Stephen Rothwell
  0 siblings, 2 replies; 56+ messages in thread
From: Linus Torvalds @ 2020-04-09 16:32 UTC (permalink / raw)
  To: Dmitry Vyukov, Stephen Rothwell; +Cc: Andrew Morton, Peter Xu, LKML, Linux-MM

On Thu, Apr 9, 2020 at 5:55 AM Dmitry Vyukov <dvyukov@google.com> wrote:
>
> linux-next is boot-broken for more than a month and bugs are piling
> onto bugs, I've seen at least 3 different ones.
> syzbot can't get any working linux-next build for testing for a very
> long time now.

Ouch.

Ok, that's not good. It means that linux-next has basically only done
build-testing this whole cycle.

Stephen, Dmitry - is there some way linux-next could possibly kick out
trees more aggressively if syzbot can't even boot?

This merge window has seemed otherwise fairly smooth to me (famous
last words), and it's really sad how the nice page fault cleanups
ended up being such an ongoing pain when the problems _could_ have
been caught earlier. We started to get syzbot reports very quickly
after they got merged into my tree, so this is clearly something that
gets exercised well - but it would have been oh-so-much better if it
had gotten noticed in linux-next.

Kicking trees out of linux-next and making noise if they cause syzbot
failures might also make some maintainers react more..

                Linus


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 1/2] mm/mempolicy: Allow lookup_node() to handle fatal signal
  2020-04-09  7:02   ` Michal Hocko
  2020-04-09 12:52     ` Peter Xu
@ 2020-04-09 16:42     ` Linus Torvalds
  2020-04-14 11:04       ` Michal Hocko
  1 sibling, 1 reply; 56+ messages in thread
From: Linus Torvalds @ 2020-04-09 16:42 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Peter Xu, Linux Kernel Mailing List, Linux-MM, Andrew Morton,
	syzbot+693dc11fcb53120b5559

On Thu, Apr 9, 2020 at 12:03 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> This patch however doesn't go all the way to revert it because 0 return
> value is impossible.

I'm not convinced it's impossible. And if it is, then the current code
is harmless.

Now, I do agree that we probably should go through and clarify the
whole range of different get_user_pages() cases of returning zero (or
not doing so), but right now it's so confusing that I'd prefer to keep
that (possibly unnecessary) belt-and-suspenders check for zero in
there.

If/when somebody actually does a real audit and the result is "these
functions cannot return zero" and it's documented, then we can remove
those checks.

Ok?

              Linus


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
  2020-04-09 16:32     ` Linus Torvalds
@ 2020-04-09 16:58       ` Qian Cai
  2020-04-09 17:05         ` Linus Torvalds
  2020-04-09 23:29       ` Stephen Rothwell
  1 sibling, 1 reply; 56+ messages in thread
From: Qian Cai @ 2020-04-09 16:58 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dmitry Vyukov, Stephen Rothwell, Andrew Morton, Peter Xu, LKML, Linux-MM



> On Apr 9, 2020, at 12:32 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
> Kicking trees out of linux-next and making noise if they cause syzbot
> failures might also make some maintainers react more..

On the other hand, this makes me worry who is testing on linux-next every day. The worst nightmare I am having right now is some maintainers pick up commits that only have been in -next for a few days and then push to the mainline but then it is becoming my burden to fix those commits in case they introduced regressions because it is much harder to revert patches once in mainline.

Kicking out of trees in linux-next on the other hand could make the situation worst unless we have a counter solution that make sure commits must be in -next for a certain time (a month?) before merged in mainline.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
  2020-04-09 16:58       ` Qian Cai
@ 2020-04-09 17:05         ` Linus Torvalds
  2020-04-09 17:58           ` Qian Cai
  0 siblings, 1 reply; 56+ messages in thread
From: Linus Torvalds @ 2020-04-09 17:05 UTC (permalink / raw)
  To: Qian Cai
  Cc: Dmitry Vyukov, Stephen Rothwell, Andrew Morton, Peter Xu, LKML, Linux-MM

On Thu, Apr 9, 2020 at 9:58 AM Qian Cai <cai@lca.pw> wrote:
>
> On the other hand, this makes me worry who is testing on linux-next every day.

Well, probably not very many people outside of robots.

Which is fine, but is also why I'd like robot failures to then be a big deal.

           Linus


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
  2020-04-09 17:05         ` Linus Torvalds
@ 2020-04-09 17:58           ` Qian Cai
  2020-04-09 18:06             ` Linus Torvalds
  0 siblings, 1 reply; 56+ messages in thread
From: Qian Cai @ 2020-04-09 17:58 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dmitry Vyukov, Stephen Rothwell, Andrew Morton, Peter Xu, LKML, Linux-MM



> On Apr 9, 2020, at 1:05 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
> Well, probably not very many people outside of robots.
> 
> Which is fine, but is also why I'd like robot failures to then be a big deal.

Agree to make a big deal part. My point is that when kicking trees of linux-next, it also could reduce the exposure of many patches (which could be bad) to linux-next and miss valuable early testing either from robots or human. Thus, the same mistakes could happen again because maintainers could simply push those little or none linux-next exposure patches to mainline with no restrictions. There is a balance to strike.




^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
  2020-04-09 17:58           ` Qian Cai
@ 2020-04-09 18:06             ` Linus Torvalds
  2020-04-09 21:14               ` Qian Cai
  0 siblings, 1 reply; 56+ messages in thread
From: Linus Torvalds @ 2020-04-09 18:06 UTC (permalink / raw)
  To: Qian Cai
  Cc: Dmitry Vyukov, Stephen Rothwell, Andrew Morton, Peter Xu, LKML, Linux-MM

On Thu, Apr 9, 2020 at 10:58 AM Qian Cai <cai@lca.pw> wrote:
>
> Agree to make a big deal part. My point is that when kicking trees of linux-next, it also could reduce the exposure of many patches (which could be bad) to linux-next and miss valuable early testing either from robots or human.

Sure. But I'd want to be notified when something gets kicked out, so
that I then know not to pull it.

So it would reduce the exposure of patches, but it would also make
sure those patches then don't make it upstream.

Untested patches is fine - as long as nobody else has to suffer through them.

               Linus


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
  2020-04-09 13:00     ` Dmitry Vyukov
@ 2020-04-09 18:16       ` Andrew Morton
  2020-04-09 18:53         ` Linus Torvalds
  0 siblings, 1 reply; 56+ messages in thread
From: Andrew Morton @ 2020-04-09 18:16 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Matthew Wilcox, Peter Xu, LKML, Linus Torvalds, Linux-MM,
	syzkaller-bugs, syzkaller

On Thu, 9 Apr 2020 15:00:20 +0200 Dmitry Vyukov <dvyukov@google.com> wrote:

> On Thu, Apr 9, 2020 at 1:49 PM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Wed, Apr 08, 2020 at 05:47:32PM -0700, Andrew Morton wrote:
> > > On Tue,  7 Apr 2020 21:40:08 -0400 Peter Xu <peterx@redhat.com> wrote:
> > >
> > > > The two patches should fix below syzbot reports:
> > > >
> > > >   BUG: unable to handle kernel paging request in kernel_get_mempolicy
> > > >   https://lore.kernel.org/lkml/0000000000002b25f105a2a3434d@google.com/
> > > >
> > > >   WARNING: bad unlock balance in __get_user_pages_remote
> > > >   https://lore.kernel.org/lkml/00000000000005c65d05a2b90e70@google.com/
> > >
> > > (Is there an email address for the syzbot operators?)
> >
> > I'd suggest syzkaller-bugs@googlegroups.com (added to the Cc).
> 
> syzkaller@googlegroups.com is a better one.
> syzkaller-bugs@googlegroups.com plays more of an LKML role.
> 
> > But there's a deeper problem in that we don't have anywhere to stash
> > that kind of information in the kernel tree right now.  Perhaps a special
> > entry in the MAINTAINERS file for bot operators?  Or one entry per bot?
> 
> I don't mind adding syzkaller. Some time ago I wanted to contact
> KernelCI, CKI, LKFT, 0-day owners, finding relevant lists wasn't
> impossible, but for some it was hard.
> 
> For syzkaller it would be:
> 
> https://github.com/google/syzkaller/issues for bugs/feature requests.
> syzkaller@googlegroups.com for discussions.

OK, thanks.  A MAINTAINERS entry would be great.

Could I please direct attention back to my original question regarding
the problems we've recently discovered in 4426e945df58 ("mm/gup: allow
VM_FAULT_RETRY for multiple times") and 71335f37c5e8 ("mm/gup: allow to
react to fatal signals")?

> sysbot does test linux-next, yet these patches sat in linux-next for a
> month without a peep, but all hell broke loose when they hit Linus's
> tree.  How could this have happened?
> 
> Possibly I've been carrying a later patch which fixed all this up, but
> I'm not seeing anything like that.  Nothing at all against mm/gup.c.



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
  2020-04-09 18:16       ` Andrew Morton
@ 2020-04-09 18:53         ` Linus Torvalds
  2020-04-09 19:12           ` Andrew Morton
  0 siblings, 1 reply; 56+ messages in thread
From: Linus Torvalds @ 2020-04-09 18:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Dmitry Vyukov, Matthew Wilcox, Peter Xu, LKML, Linux-MM,
	syzkaller-bugs, syzkaller

On Thu, Apr 9, 2020 at 11:16 AM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> Could I please direct attention back to my original question regarding
> the problems we've recently discovered in 4426e945df58 ("mm/gup: allow
> VM_FAULT_RETRY for multiple times") and 71335f37c5e8 ("mm/gup: allow to
> react to fatal signals")?

What earlier question? The "how could this happen" one?

Dmitry already answered that one - are you perhaps missing the emails?

linux-next has apparently not worked at all for over a month. So it
got no testing at all, and thus also all the gup patches got no
testing in linux-next.

Only when they hit my tree, did they start getting testing. Not
because my tree is the only thing getting tested, but because my tree
is the only tree that _works_.

           Linus


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
  2020-04-09 18:53         ` Linus Torvalds
@ 2020-04-09 19:12           ` Andrew Morton
  2020-04-09 19:46             ` Linus Torvalds
  0 siblings, 1 reply; 56+ messages in thread
From: Andrew Morton @ 2020-04-09 19:12 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dmitry Vyukov, Matthew Wilcox, Peter Xu, LKML, Linux-MM,
	syzkaller-bugs, syzkaller

On Thu, 9 Apr 2020 11:53:33 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Thu, Apr 9, 2020 at 11:16 AM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > Could I please direct attention back to my original question regarding
> > the problems we've recently discovered in 4426e945df58 ("mm/gup: allow
> > VM_FAULT_RETRY for multiple times") and 71335f37c5e8 ("mm/gup: allow to
> > react to fatal signals")?
> 
> What earlier question? The "how could this happen" one?
> 
> Dmitry already answered that one - are you perhaps missing the emails?

Yup, email threading got broken.

> linux-next has apparently not worked at all for over a month. So it
> got no testing at all, and thus also all the gup patches got no
> testing in linux-next.
> 
> Only when they hit my tree, did they start getting testing. Not
> because my tree is the only thing getting tested, but because my tree
> is the only tree that _works_.
> 

And now the challenge is to protect your tree from the bad patches.

https://groups.google.com/forum/#!msg/syzkaller-bugs/phowYdNXHck/qU1P0TsjBAAJ
points at

net/openvswitch/conntrack.c
net/bluetooth/l2cap_sock.c
sound/core/oss/pcm_plugin.c

and other things, but it's 2+ weeks old.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
  2020-04-09 19:12           ` Andrew Morton
@ 2020-04-09 19:46             ` Linus Torvalds
  2020-04-09 19:56               ` Matthew Wilcox
  0 siblings, 1 reply; 56+ messages in thread
From: Linus Torvalds @ 2020-04-09 19:46 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Dmitry Vyukov, Matthew Wilcox, Peter Xu, LKML, Linux-MM,
	syzkaller-bugs, syzkaller

On Thu, Apr 9, 2020 at 12:12 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> And now the challenge is to protect your tree from the bad patches.

Well, right now, yes.

But in the longer term, I think we want to protect linux-next from the
bad patches so that they don't poison the testing that the bots can
do.

So that's why I suggested that linux-next and syzbot have some
protocol to have things that cause syzbot pain to be removed from
linux-next more aggressively.

              Linus


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
  2020-04-09 19:46             ` Linus Torvalds
@ 2020-04-09 19:56               ` Matthew Wilcox
  2020-04-09 19:58                 ` Linus Torvalds
  0 siblings, 1 reply; 56+ messages in thread
From: Matthew Wilcox @ 2020-04-09 19:56 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, Dmitry Vyukov, Peter Xu, LKML, Linux-MM,
	syzkaller-bugs, Stephen Rothwell

On Thu, Apr 09, 2020 at 12:46:08PM -0700, Linus Torvalds wrote:
> On Thu, Apr 9, 2020 at 12:12 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > And now the challenge is to protect your tree from the bad patches.
> 
> Well, right now, yes.
> 
> But in the longer term, I think we want to protect linux-next from the
> bad patches so that they don't poison the testing that the bots can
> do.
> 
> So that's why I suggested that linux-next and syzbot have some
> protocol to have things that cause syzbot pain to be removed from
> linux-next more aggressively.

We should probably give Stephen a cc here ...


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
  2020-04-09 19:56               ` Matthew Wilcox
@ 2020-04-09 19:58                 ` Linus Torvalds
  2020-04-09 20:27                   ` Eric Biggers
  0 siblings, 1 reply; 56+ messages in thread
From: Linus Torvalds @ 2020-04-09 19:58 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrew Morton, Dmitry Vyukov, Peter Xu, LKML, Linux-MM,
	syzkaller-bugs, Stephen Rothwell

On Thu, Apr 9, 2020 at 12:56 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> We should probably give Stephen a cc here ...

Heh. I already did, but then that got broken because Andrew had lost
that part of the thread and the discussion re-started.

So Stephen was already cc'd for my original request to have linux-next
kick things out aggressively.

            Linus


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
  2020-04-09 19:58                 ` Linus Torvalds
@ 2020-04-09 20:27                   ` Eric Biggers
  2020-04-09 20:34                     ` Linus Torvalds
  0 siblings, 1 reply; 56+ messages in thread
From: Eric Biggers @ 2020-04-09 20:27 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Matthew Wilcox, Andrew Morton, Dmitry Vyukov, Peter Xu, LKML,
	Linux-MM, syzkaller-bugs, Stephen Rothwell

On Thu, Apr 09, 2020 at 12:58:48PM -0700, Linus Torvalds wrote:
> On Thu, Apr 9, 2020 at 12:56 PM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > We should probably give Stephen a cc here ...
> 
> Heh. I already did, but then that got broken because Andrew had lost
> that part of the thread and the discussion re-started.
> 
> So Stephen was already cc'd for my original request to have linux-next
> kick things out aggressively.
> 

Well, if (for example) we look at
"linux-next test error: WARNING: suspicious RCU usage in ovs_ct_exit"
(https://lkml.kernel.org/lkml/000000000000e642a905a0cbee6e@google.com/),
it was sent to the maintainers of net/openvswitch/ where the warning occurred.
It was then ignored.

Would it help if bugs blocking testing on linux-next were Cc'ed to
linux-next@vger.kernel.org, so that Stephen could investigate?

FWIW, the issue of "syzbot report sent and ignored for months/years" is actually
a much broader one which applies to all bugs, not just build / test breakages.
There are tons of open bugs on https://syzkaller.appspot.com/upstream which are
definitely still valid (sort by "Last" occurred).  Long-term, to fix this we
really need syzbot to start sending reminders.  But first there's work needed to
make the noise level low enough so that people don't just tune them out.

- Eric


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
  2020-04-09 20:27                   ` Eric Biggers
@ 2020-04-09 20:34                     ` Linus Torvalds
  2020-04-09 23:34                       ` Stephen Rothwell
  2020-04-10  1:11                       ` Theodore Y. Ts'o
  0 siblings, 2 replies; 56+ messages in thread
From: Linus Torvalds @ 2020-04-09 20:34 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Matthew Wilcox, Andrew Morton, Dmitry Vyukov, Peter Xu, LKML,
	Linux-MM, syzkaller-bugs, Stephen Rothwell

On Thu, Apr 9, 2020 at 1:27 PM Eric Biggers <ebiggers@kernel.org> wrote:
>
> Would it help if bugs blocking testing on linux-next were Cc'ed to
> linux-next@vger.kernel.org, so that Stephen could investigate?

Maybe. I'll let Stephen say.

But I think the big issue is the "blocking testing" part.

If it's "just" regular bugs, then:

> FWIW, the issue of "syzbot report sent and ignored for months/years" is actually
> a much broader one which applies to all bugs, not just build / test breakages.

I don't  know what to do about that, but it may be that people just
don't judge the bugs interesting or assume that they are old.

That's what made bugzilla so useless - being flooded with stale bugs
that might not be worth worrying about, and no way to really tell.

So old bugs generally should be aged out, and then if they still
happen, prioritized. With "this keeps us from even finding new bugs"
being a fairly high priority..

One de-motivational issue with syzbot reported bugs may be that they
sometimes get sent to the wrong set of people - but still wide enough
that everybody feels it's somebody elses issue. A kind of bystander
effect for bugs.

                Linus


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
  2020-04-09 18:06             ` Linus Torvalds
@ 2020-04-09 21:14               ` Qian Cai
  2020-04-10 13:12                 ` Tetsuo Handa
  0 siblings, 1 reply; 56+ messages in thread
From: Qian Cai @ 2020-04-09 21:14 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dmitry Vyukov, Stephen Rothwell, Andrew Morton, Peter Xu, LKML, Linux-MM



> On Apr 9, 2020, at 2:06 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
> On Thu, Apr 9, 2020 at 10:58 AM Qian Cai <cai@lca.pw> wrote:
>> 
>> Agree to make a big deal part. My point is that when kicking trees of linux-next, it also could reduce the exposure of many patches (which could be bad) to linux-next and miss valuable early testing either from robots or human.
> 
> Sure. But I'd want to be notified when something gets kicked out, so
> that I then know not to pull it.
> 
> So it would reduce the exposure of patches, but it would also make
> sure those patches then don't make it upstream.
> 
> Untested patches is fine - as long as nobody else has to suffer through them.

Excellent. It now very much depends on how Stephen will notify you when
a tree, a patchset or even a developer should be blacklisted for some time
to make this a success.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
  2020-04-09 16:32     ` Linus Torvalds
  2020-04-09 16:58       ` Qian Cai
@ 2020-04-09 23:29       ` Stephen Rothwell
  2020-04-13 22:06         ` Qian Cai
  2020-04-14  4:07         ` Hillf Danton
  1 sibling, 2 replies; 56+ messages in thread
From: Stephen Rothwell @ 2020-04-09 23:29 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Dmitry Vyukov, Andrew Morton, Peter Xu, LKML, Linux-MM

[-- Attachment #1: Type: text/plain, Size: 1535 bytes --]

Hi Linus,

On Thu, 9 Apr 2020 09:32:32 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> On Thu, Apr 9, 2020 at 5:55 AM Dmitry Vyukov <dvyukov@google.com> wrote:
> >
> > linux-next is boot-broken for more than a month and bugs are piling
> > onto bugs, I've seen at least 3 different ones.
> > syzbot can't get any working linux-next build for testing for a very
> > long time now.  
> 
> Ouch.
> 
> Ok, that's not good. It means that linux-next has basically only done
> build-testing this whole cycle.

Well, there are other CI's beyond syzbot .. Does syzbot only build/test
a single kernel arch/config?

> Stephen, Dmitry - is there some way linux-next could possibly kick out
> trees more aggressively if syzbot can't even boot?

Of course that could be done if I knew that there were problems.  From
memory and my mail archives, I was only cc'd on 3 problems by sysbot
since last November and they were all responded to by the appropriate
maintainers/developers.

Currently, when I am cc'd on reports, if they are also sent to who
seem like the appropriate people, I just file the report assuming it
will be dealt with.

> Kicking trees out of linux-next and making noise if they cause syzbot
> failures might also make some maintainers react more..

That may be true, but in some cases I have carried fixups/reverts/older
versions of trees for quite some time before things get fixed.  But at
least if that happens, I do tend to remind people.

-- 
Cheers,
Stephen Rothwell

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
  2020-04-09 20:34                     ` Linus Torvalds
@ 2020-04-09 23:34                       ` Stephen Rothwell
  2020-04-10  1:11                       ` Theodore Y. Ts'o
  1 sibling, 0 replies; 56+ messages in thread
From: Stephen Rothwell @ 2020-04-09 23:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric Biggers, Matthew Wilcox, Andrew Morton, Dmitry Vyukov,
	Peter Xu, LKML, Linux-MM, syzkaller-bugs

[-- Attachment #1: Type: text/plain, Size: 714 bytes --]

Hi all,

On Thu, 9 Apr 2020 13:34:18 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> On Thu, Apr 9, 2020 at 1:27 PM Eric Biggers <ebiggers@kernel.org> wrote:
> >
> > Would it help if bugs blocking testing on linux-next were Cc'ed to
> > linux-next@vger.kernel.org, so that Stephen could investigate?  
> 
> Maybe. I'll let Stephen say.

It would certainly help so I could at least chase people and maybe
revert commits.  Dropping trees can be problematic once Andrew has
built his quilt series on top of them, so I try not to drop whole trees.
I can also use a previous version of a tree (which is usually what I
do if I discover a build problem).

-- 
Cheers,
Stephen Rothwell

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
  2020-04-09 20:34                     ` Linus Torvalds
  2020-04-09 23:34                       ` Stephen Rothwell
@ 2020-04-10  1:11                       ` Theodore Y. Ts'o
  1 sibling, 0 replies; 56+ messages in thread
From: Theodore Y. Ts'o @ 2020-04-10  1:11 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric Biggers, Matthew Wilcox, Andrew Morton, Dmitry Vyukov,
	Peter Xu, LKML, Linux-MM, syzkaller-bugs, Stephen Rothwell

On Thu, Apr 09, 2020 at 01:34:18PM -0700, Linus Torvalds wrote:
> > FWIW, the issue of "syzbot report sent and ignored for months/years" is actually
> > a much broader one which applies to all bugs, not just build / test breakages.
> 
> I don't  know what to do about that, but it may be that people just
> don't judge the bugs interesting or assume that they are old.

Syzkaller bugs which requuire (a) root privileges to trigger, or (b)
require a deliberately corrupted file system are things which I don't
consider super interesting.  (For the latter, I'll usually wait for
some other file system fuzzer to find it, such as Hydra, because
Syzkaller makes it painful extract out the file system image, where as
other file system fuzzers are *much* more file system developer
friendly.)

This shouldn't be a surprise to Dmitry, because I've given these
feedbacks to him before.

It would be nice if there was some way we could triage Syzkaller bugs
into different buckets (requires root, lower to P2; requires a
corrupted file system image, lower to P2).  Unfortunately, that would
require Syzkaller to have some kind of login system and way to track
state, and Dmitry doesn't want to replicate the functionality of a bug
tracker.

> That's what made bugzilla so useless - being flooded with stale bugs
> that might not be worth worrying about, and no way to really tell.

At least with Bugzilla, it becomes possible to attach priorities and
flags to them, instead of trying to assume that developers should
treat all Syzkaller bugs as the same priority.  Because when you do
insist that all bugs be treated as high priority, many people will
just treat them *all* as a P2 bug, especially when there are so many.

      	   	       	    	       	     	- Ted


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
  2020-04-09 21:14               ` Qian Cai
@ 2020-04-10 13:12                 ` Tetsuo Handa
  2020-04-10 14:26                   ` Qian Cai
  0 siblings, 1 reply; 56+ messages in thread
From: Tetsuo Handa @ 2020-04-10 13:12 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Qian Cai, Linus Torvalds, Stephen Rothwell, Andrew Morton,
	Peter Xu, LKML, Linux-MM

On 2020/04/10 6:14, Qian Cai wrote:
> 
> 
>> On Apr 9, 2020, at 2:06 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>>
>> On Thu, Apr 9, 2020 at 10:58 AM Qian Cai <cai@lca.pw> wrote:
>>>
>>> Agree to make a big deal part. My point is that when kicking trees of linux-next, it also could reduce the exposure of many patches (which could be bad) to linux-next and miss valuable early testing either from robots or human.
>>
>> Sure. But I'd want to be notified when something gets kicked out, so
>> that I then know not to pull it.
>>
>> So it would reduce the exposure of patches, but it would also make
>> sure those patches then don't make it upstream.
>>
>> Untested patches is fine - as long as nobody else has to suffer through them.
> 
> Excellent. It now very much depends on how Stephen will notify you when
> a tree, a patchset or even a developer should be blacklisted for some time
> to make this a success.
> 

Since patch flow forms tree structure, I don't know whether maintainers can
afford remembering which tree, patchset or developer should be blacklisted
when problems come from leaf git trees.



By the way...

Removing problematic trees might confuse "#syz test:" request, for
developers might ask syzbot to test proposed patches on a kernel which
does not contain problematic trees. In lucky case, test request fails
as patch failure or build failure. But in unlucky case, syzbot fails to
detect that proposed patch was tested on a kernel without problematic
trees. A bit related to https://github.com/google/syzkaller/issues/1609 .


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
  2020-04-10 13:12                 ` Tetsuo Handa
@ 2020-04-10 14:26                   ` Qian Cai
  2020-04-10 17:26                     ` Andrew Morton
  0 siblings, 1 reply; 56+ messages in thread
From: Qian Cai @ 2020-04-10 14:26 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Dmitry Vyukov, Linus Torvalds, Stephen Rothwell, Andrew Morton,
	Peter Xu, LKML, Linux-MM



> On Apr 10, 2020, at 9:12 AM, Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
> 
> On 2020/04/10 6:14, Qian Cai wrote:
>> 
>> 
>>> On Apr 9, 2020, at 2:06 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>>> 
>>> On Thu, Apr 9, 2020 at 10:58 AM Qian Cai <cai@lca.pw> wrote:
>>>> 
>>>> Agree to make a big deal part. My point is that when kicking trees of linux-next, it also could reduce the exposure of many patches (which could be bad) to linux-next and miss valuable early testing either from robots or human.
>>> 
>>> Sure. But I'd want to be notified when something gets kicked out, so
>>> that I then know not to pull it.
>>> 
>>> So it would reduce the exposure of patches, but it would also make
>>> sure those patches then don't make it upstream.
>>> 
>>> Untested patches is fine - as long as nobody else has to suffer through them.
>> 
>> Excellent. It now very much depends on how Stephen will notify you when
>> a tree, a patchset or even a developer should be blacklisted for some time
>> to make this a success.
>> 
> 
> Since patch flow forms tree structure, I don't know whether maintainers can
> afford remembering which tree, patchset or developer should be blacklisted
> when problems come from leaf git trees.
> 
> 
> 
> By the way...
> 
> Removing problematic trees might confuse "#syz test:" request, for
> developers might ask syzbot to test proposed patches on a kernel which
> does not contain problematic trees. In lucky case, test request fails
> as patch failure or build failure. But in unlucky case, syzbot fails to
> detect that proposed patch was tested on a kernel without problematic
> trees. A bit related to https://github.com/google/syzkaller/issues/1609 .
> 

I looked at those blocking bug list sent by Dmitry. I wonder “boys, why they
did’t send those out earlier to linux-next or somewhere more visible?” because
I had dealt with most of those before, and I knew the solutions to unblock them!

Even though my testing setup is somewhat different from syzbot. I don’t do
fuzzers, and my config is only focus on mm, iommu and a few core kernel pieces
with more debugging options on, but it does bare-metal and multi-arch, there are
still lots of opportunities to help each other with dealing with blocking issues.

A few things I am doing differently with syzbot on linux-next where would help to
be run continuous without blocking most of the time are,

I don’t set panic_on_warn. I’ll deal with warnings afterwards.

Occasionally, there are hard failures that I have to deal with right now. I’ll get to
the end of it, and figured out the exact commit caused it.

In syzbot mode, the bisection (by robot) is the hard part, because if you don’t
figure out the exact commit, most of times people (CC by the bug reports) would
have no clue and will be ignored. (even if the bad commit was figured out, it is
not 100% guaranteed developers would know what’s going on but it helps
dramatically, and at least we can revert it without blocking if everything else fails).

Thus, it would be really help if syzbot (or human operators) could help bisect, even
if it could only figure out one of merge commit in linux-next is bad (where with high
accuracy) and may get those ignored less.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
  2020-04-10 14:26                   ` Qian Cai
@ 2020-04-10 17:26                     ` Andrew Morton
  2020-04-10 19:46                       ` Qian Cai
  0 siblings, 1 reply; 56+ messages in thread
From: Andrew Morton @ 2020-04-10 17:26 UTC (permalink / raw)
  To: Qian Cai
  Cc: Tetsuo Handa, Dmitry Vyukov, Linus Torvalds, Stephen Rothwell,
	Peter Xu, LKML, Linux-MM

On Fri, 10 Apr 2020 10:26:23 -0400 Qian Cai <cai@lca.pw> wrote:

> I don't set panic_on_warn. I'll deal with warnings afterwards.

I'm not understanding why sysbot sets panic_on_warn.  This decision
will needlessly turn many kernel errors into wont-boot situations and
will block further testing?


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
  2020-04-10 17:26                     ` Andrew Morton
@ 2020-04-10 19:46                       ` Qian Cai
  0 siblings, 0 replies; 56+ messages in thread
From: Qian Cai @ 2020-04-10 19:46 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Tetsuo Handa, Dmitry Vyukov, Linus Torvalds, Stephen Rothwell,
	Peter Xu, LKML, Linux-MM



> On Apr 10, 2020, at 1:26 PM, Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> On Fri, 10 Apr 2020 10:26:23 -0400 Qian Cai <cai@lca.pw> wrote:
> 
>> I don't set panic_on_warn. I'll deal with warnings afterwards.
> 
> I'm not understanding why sysbot sets panic_on_warn.  This decision
> will needlessly turn many kernel errors into wont-boot situations and
> will block further testing?

I can feel that it is very reasonable to set panic_on_warn for the fully
automatic systems because once those warnings happen, the rest of
things can no longer to be trusted, so the goal to kill the first enemy
on sight, and then deal the next one.

It could be a good idea for some trees more stable like the mainline,
but for linux-next, I could only dream of set it one day.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
  2020-04-09 23:29       ` Stephen Rothwell
@ 2020-04-13 22:06         ` Qian Cai
  2020-04-13 23:05           ` Jens Axboe
  2020-04-14 11:12           ` Dmitry Vyukov
  2020-04-14  4:07         ` Hillf Danton
  1 sibling, 2 replies; 56+ messages in thread
From: Qian Cai @ 2020-04-13 22:06 UTC (permalink / raw)
  To: Linus Torvalds, Stephen Rothwell, Andrew Morton
  Cc: Dmitry Vyukov, Peter Xu, LKML, Linux-MM, Jens Axboe,
	Christoph Lameter, Johannes Weiner



> On Apr 9, 2020, at 7:29 PM, Stephen Rothwell <sfr@canb.auug.org.au> wrote:
> 
> Hi Linus,
> 
> On Thu, 9 Apr 2020 09:32:32 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote:
>> 
>> On Thu, Apr 9, 2020 at 5:55 AM Dmitry Vyukov <dvyukov@google.com> wrote:
>>> 
>>> linux-next is boot-broken for more than a month and bugs are piling
>>> onto bugs, I've seen at least 3 different ones.
>>> syzbot can't get any working linux-next build for testing for a very
>>> long time now.  
>> 
>> Ouch.
>> 
>> Ok, that's not good. It means that linux-next has basically only done
>> build-testing this whole cycle.
> 
> Well, there are other CI's beyond syzbot .. Does syzbot only build/test
> a single kernel arch/config?
> 
>> Stephen, Dmitry - is there some way linux-next could possibly kick out
>> trees more aggressively if syzbot can't even boot?
> 
> Of course that could be done if I knew that there were problems.  From
> memory and my mail archives, I was only cc'd on 3 problems by sysbot
> since last November and they were all responded to by the appropriate
> maintainers/developers.
> 
> Currently, when I am cc'd on reports, if they are also sent to who
> seem like the appropriate people, I just file the report assuming it
> will be dealt with.
> 
>> Kicking trees out of linux-next and making noise if they cause syzbot
>> failures might also make some maintainers react more..
> 
> That may be true, but in some cases I have carried fixups/reverts/older
> versions of trees for quite some time before things get fixed.  But at
> least if that happens, I do tend to remind people.

BTW, I’ll be adding fuzzers to my daily linux-next routines where it triggers this
io_uring/scheduler bug almost immediately, so hopefully would buy syzbot some
time to resume on linux-next.

[67493.516737][T211750] BUG: unable to handle page fault for address: ffffffffffffffe8
[67493.557315][T211750] #PF: supervisor read access in kernel mode
[67493.586726][T211750] #PF: error_code(0x0000) - not-present page
[67493.614434][T211750] PGD f96e17067 P4D f96e17067 PUD f96e19067 PMD 0 
[67493.644846][T211750] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[67493.674127][T211750] CPU: 55 PID: 211750 Comm: trinity-c127 Tainted: G    B        L    5.7.0-rc1-next-20200413 #4
[67493.722516][T211750] Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 04/12/2017
[67493.764925][T211750] RIP: 0010:__wake_up_common+0x98/0x290
__wake_up_common at kernel/sched/wait.c:87
[67493.790675][T211750] Code: 40 4d 8d 78 e8 49 8d 7f 18 49 39 fd 0f 84 80 00 00 00 e8 6b bd 2b 00 49 8b 5f 18 45 31 e4 48 83 eb 18 4c 89 ff e8 08 bc 2b 00 <45> 8b 37 41 f6 c6 04 75 71 49 8d 7f 10 e8 46 bd 2b 00 49 8b 47 10
[67493.881650][T211750] RSP: 0018:ffffc9000adbfaf0 EFLAGS: 00010046
[67493.909854][T211750] RAX: 0000000000000000 RBX: ffffffffffffffe8 RCX: ffffffffaa9636b8
[67493.947131][T211750] RDX: 0000000000000003 RSI: dffffc0000000000 RDI: ffffffffffffffe8
[67493.983829][T211750] RBP: ffffc9000adbfb40 R08: fffffbfff582c5fd R09: fffffbfff582c5fd
[67494.020861][T211750] R10: ffffffffac162fe3 R11: fffffbfff582c5fc R12: 0000000000000000
[67494.059249][T211750] R13: ffff888ef82b0960 R14: ffffc9000adbfb80 R15: ffffffffffffffe8
[67494.099699][T211750] FS:  00007fdcba4c4740(0000) GS:ffff889033780000(0000) knlGS:0000000000000000
[67494.141858][T211750] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[67494.172660][T211750] CR2: ffffffffffffffe8 CR3: 0000000f776a0004 CR4: 00000000001606e0
[67494.209760][T211750] Call Trace:
[67494.224720][T211750]  __wake_up_common_lock+0xea/0x150
(inlined by) __wake_up_common_lock at kernel/sched/wait.c:124
[67494.248753][T211750]  ? __wake_up_common+0x290/0x290
[67494.272014][T211750]  ? lockdep_hardirqs_on+0x16/0x2c0
[67494.296139][T211750]  __wake_up+0x13/0x20
[67494.314946][T211750]  io_cqring_ev_posted+0x75/0xe0
(inlined by) io_cqring_ev_posted at fs/io_uring.c:1160
[67494.337726][T211750]  io_ring_ctx_wait_and_kill+0x1c0/0x2f0
io_ring_ctx_wait_and_kill at fs/io_uring.c:7305
[67494.363840][T211750]  io_uring_create+0xa8d/0x13b0
[67494.386526][T211750]  ? io_req_defer_prep+0x990/0x990
[67494.410119][T211750]  ? __kasan_check_write+0x14/0x20
[67494.433646][T211750]  io_uring_setup+0xb8/0x130
[67494.454870][T211750]  ? io_uring_create+0x13b0/0x13b0
[67494.478342][T211750]  ? check_flags.part.28+0x220/0x220
[67494.502947][T211750]  ? lockdep_hardirqs_on+0x16/0x2c0
[67494.526965][T211750]  __x64_sys_io_uring_setup+0x31/0x40
[67494.551820][T211750]  do_syscall_64+0xcc/0xaf0
[67494.574829][T211750]  ? syscall_return_slowpath+0x580/0x580
[67494.604591][T211750]  ? lockdep_hardirqs_off+0x1f/0x140
[67494.628901][T211750]  ? entry_SYSCALL_64_after_hwframe+0x3e/0xb3
[67494.657616][T211750]  ? trace_hardirqs_off_caller+0x3a/0x150
[67494.683999][T211750]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[67494.709982][T211750]  entry_SYSCALL_64_after_hwframe+0x49/0xb3
[67494.737167][T211750] RIP: 0033:0x7fdcb9dd76ed
[67494.757698][T211750] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 6b 57 2c 00 f7 d8 64 89 01 48
[67494.849485][T211750] RSP: 002b:00007ffe7fd4e4f8 EFLAGS: 00000246 ORIG_RAX: 00000000000001a9
[67494.887906][T211750] RAX: ffffffffffffffda RBX: 00000000000001a9 RCX: 00007fdcb9dd76ed
[67494.924754][T211750] RDX: fffffffffffffffc RSI: 0000000000000000 RDI: 0000000000005d54
[67494.961516][T211750] RBP: 00000000000001a9 R08: 0000000e31d3caa7 R09: 0082400004004000
[67494.998485][T211750] R10: ffffffffffffffff R11: 0000000000000246 R12: 0000000000000002
[67495.035510][T211750] R13: 00007fdcb842e058 R14: 00007fdcba4c46c0 R15: 00007fdcb842e000
[67495.072679][T211750] Modules linked in: bridge stp llc nfnetlink cn brd vfat fat ext4 crc16 mbcache jbd2 loop kvm_intel kvm irqbypass intel_cstate intel_uncore dax_pmem intel_rapl_perf dax_pmem_core ip_tables x_tables xfs sd_mod tg3 firmware_class libphy hpsa scsi_transport_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: binfmt_misc]
[67495.221120][T211750] CR2: ffffffffffffffe8
[67495.240237][T211750] ---[ end trace f9502383d57e0e22 ]---
[67495.265301][T211750] RIP: 0010:__wake_up_common+0x98/0x290
[67495.290903][T211750] Code: 40 4d 8d 78 e8 49 8d 7f 18 49 39 fd 0f 84 80 00 00 00 e8 6b bd 2b 00 49 8b 5f 18 45 31 e4 48 83 eb 18 4c 89 ff e8 08 bc 2b 00 <45> 8b 37 41 f6 c6 04 75 71 49 8d 7f 10 e8 46 bd 2b 00 49 8b 47 10
[67495.382302][T211750] RSP: 0018:ffffc9000adbfaf0 EFLAGS: 00010046
[67495.410551][T211750] RAX: 0000000000000000 RBX: ffffffffffffffe8 RCX: ffffffffaa9636b8
[67495.447570][T211750] RDX: 0000000000000003 RSI: dffffc0000000000 RDI: ffffffffffffffe8
[67495.484252][T211750] RBP: ffffc9000adbfb40 R08: fffffbfff582c5fd R09: fffffbfff582c5fd
[67495.521068][T211750] R10: ffffffffac162fe3 R11: fffffbfff582c5fc R12: 0000000000000000
[67495.557461][T211750] R13: ffff888ef82b0960 R14: ffffc9000adbfb80 R15: ffffffffffffffe8
[67495.594607][T211750] FS:  00007fdcba4c4740(0000) GS:ffff889033780000(0000) knlGS:0000000000000000
[67495.639332][T211750] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[67495.669033][T211750] CR2: ffffffffffffffe8 CR3: 0000000f776a0004 CR4: 00000000001606e0
[67495.704569][T211750] Kernel panic - not syncing: Fatal exception
[67495.731758][T211750] Kernel Offset: 0x29800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[67495.784988][T211750] ---[ end Kernel panic - not syncing: Fatal exception ]—

Also, I’ll need to deal with this slub sysfs and memcg lockdep splat first, so lockdep would still be functioning during the fuzzing.

[ 8137.254287][T53013] WARNING: possible circular locking dependency detected
[ 8137.261231][T53013] 5.7.0-rc1-next-20200413+ #2 Not tainted
[ 8137.266981][T53013] ------------------------------------------------------
[ 8137.274016][T53013] trinity-c10/53013 is trying to acquire lock:
[ 8137.280127][T53013] ffffffff89ad2968 (slab_mutex){+.+.}-{3:3}, at: slab_attr_store+0x79/0xf0
[ 8137.288660][T53013] 
[ 8137.288660][T53013] but task is already holding lock:
[ 8137.295944][T53013] ffff9ea421940dd8 (kn->count#88){++++}-{0:0}, at: kernfs_fop_write+0x10e/0x280
[ 8137.305004][T53013] 
[ 8137.305004][T53013] which lock already depends on the new lock.
[ 8137.305004][T53013] 
[ 8137.315347][T53013] 
[ 8137.315347][T53013] the existing dependency chain (in reverse order) is:
[ 8137.324470][T53013] 
[ 8137.324470][T53013] -> #1 (kn->count#88){++++}-{0:0}:
[ 8137.331774][T53013]        __kernfs_remove+0x3bb/0x420
[ 8137.336977][T53013]        kernfs_remove+0x2c/0x40
[ 8137.341829][T53013]        sysfs_remove_dir+0x7e/0x90
[ 8137.346988][T53013]        kobject_del+0x60/0xb0
[ 8137.351663][T53013]        sysfs_slab_unlink+0x1c/0x20
[ 8137.356853][T53013]        shutdown_cache+0x155/0x1c0
[ 8137.361958][T53013]        kmemcg_cache_shutdown_fn+0xe/0x20
[ 8137.367756][T53013]        kmemcg_workfn+0x35/0x50
[ 8137.372705][T53013]        process_one_work+0x560/0xba0
[ 8137.377985][T53013]        worker_thread+0x80/0x5f0
[ 8137.382920][T53013]        kthread+0x1db/0x200
[ 8137.387467][T53013]        ret_from_fork+0x27/0x50
[ 8137.392296][T53013] 
[ 8137.392296][T53013] -> #0 (slab_mutex){+.+.}-{3:3}:
[ 8137.399587][T53013]        __lock_acquire+0x1673/0x23a0
[ 8137.404865][T53013]        lock_acquire+0xcd/0x410
[ 8137.409710][T53013]        __mutex_lock+0xc9/0xbf0
[ 8137.414561][T53013]        mutex_lock_nested+0x31/0x40
[ 8137.419823][T53013]        slab_attr_store+0x79/0xf0
[ 8137.424991][T53013]        sysfs_kf_write+0x9a/0xb0
[ 8137.429919][T53013]        kernfs_fop_write+0x15e/0x280
[ 8137.435272][T53013]        do_iter_write+0x261/0x2c0
[ 8137.440291][T53013]        vfs_writev+0xe6/0x170
[ 8137.444995][T53013]        do_pwritev+0xab/0xd0
[ 8137.449731][T53013]        __x64_sys_pwritev+0x61/0x80
[ 8137.455079][T53013]        do_syscall_64+0x91/0xb10
[ 8137.460010][T53013]        entry_SYSCALL_64_after_hwframe+0x49/0xb3
[ 8137.466330][T53013] 
[ 8137.466330][T53013] other info that might help us debug this:
[ 8137.466330][T53013] 
[ 8137.476695][T53013]  Possible unsafe locking scenario:
[ 8137.476695][T53013] 
[ 8137.484067][T53013]        CPU0                    CPU1
[ 8137.489337][T53013]        ----                    ----
[ 8137.494609][T53013]   lock(kn->count#88);
[ 8137.498718][T53013]                                lock(slab_mutex);
[ 8137.505130][T53013]                                lock(kn->count#88);
[ 8137.511718][T53013]   lock(slab_mutex);
[ 8137.515594][T53013] 
[ 8137.515594][T53013]  *** DEADLOCK ***
[ 8137.515594][T53013] 
[ 8137.523780][T53013] 3 locks held by trinity-c10/53013:
[ 8137.528967][T53013]  #0: ffff9ea42c841430 (sb_writers#3){.+.+}-{0:0}, at: vfs_writev+0x13f/0x170
[ 8137.537847][T53013]  #1: ffff9ea3c2c9c288 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write+0xfe/0x280
[ 8137.547149][T53013]  #2: ffff9ea421940dd8 (kn->count#88){++++}-{0:0}, at: kernfs_fop_write+0x10e/0x280
[ 8137.556554][T53013] 
[ 8137.556554][T53013] stack backtrace:
[ 8137.562360][T53013] CPU: 22 PID: 53013 Comm: trinity-c10 Not tainted 5.7.0-rc1-next-20200413+ #2
[ 8137.571216][T53013] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 03/09/2018
[ 8137.580670][T53013] Call Trace:
[ 8137.583865][T53013]  dump_stack+0xa4/0x100
[ 8137.588105][T53013]  print_circular_bug.cold.49+0x13c/0x141
[ 8137.593746][T53013]  check_noncircular+0x183/0x1a0
[ 8137.598818][T53013]  ? do_raw_spin_unlock+0x10b/0x1c0
[ 8137.603971][T53013]  __lock_acquire+0x1673/0x23a0
[ 8137.608732][T53013]  ? print_unreferenced+0x224/0x230
[ 8137.613846][T53013]  lock_acquire+0xcd/0x410
[ 8137.618169][T53013]  ? slab_attr_store+0x79/0xf0
[ 8137.619804][T52801] VFS: "mand" mount option not supported
[ 8137.623085][T53013]  __mutex_lock+0xc9/0xbf0
[ 8137.632939][T53013]  ? slab_attr_store+0x79/0xf0
[ 8137.637679][T53013]  ? _find_next_bit.constprop.1+0xd0/0x100
[ 8137.643477][T53013]  ? slab_attr_store+0x79/0xf0
[ 8137.648252][T53013]  ? find_next_bit+0x36/0x40
[ 8137.652748][T53013]  ? cpumask_next+0x46/0x60
[ 8137.657155][T53013]  mutex_lock_nested+0x31/0x40
[ 8137.661897][T53013]  ? slab_attr_show+0x30/0x30
[ 8137.666476][T53013]  ? mutex_lock_nested+0x31/0x40
[ 8137.671373][T53013]  slab_attr_store+0x79/0xf0
[ 8137.676035][T53013]  ? slab_attr_show+0x30/0x30
[ 8137.680672][T53013]  sysfs_kf_write+0x9a/0xb0
[ 8137.685085][T53013]  ? sysfs_file_ops+0xb0/0xb0
[ 8137.689673][T53013]  kernfs_fop_write+0x15e/0x280
[ 8137.694433][T53013]  do_iter_write+0x261/0x2c0
[ 8137.698997][T53013]  vfs_writev+0xe6/0x170
[ 8137.703142][T53013]  ? lock_acquire+0xcd/0x410
[ 8137.707696][T53013]  ? find_held_lock+0x35/0xa0
[ 8137.712279][T53013]  ? __fget_light+0xa3/0x170
[ 8137.716897][T53013]  do_pwritev+0xab/0xd0
[ 8137.720960][T53013]  __x64_sys_pwritev+0x61/0x80
[ 8137.725808][T53013]  do_syscall_64+0x91/0xb10
[ 8137.730274][T53013]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[ 8137.735733][T53013]  entry_SYSCALL_64_after_hwframe+0x49/0xb3
[ 8137.741588][T53013] RIP: 0033:0x7f54d0a386ed

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
  2020-04-13 22:06         ` Qian Cai
@ 2020-04-13 23:05           ` Jens Axboe
  2020-04-14 11:12           ` Dmitry Vyukov
  1 sibling, 0 replies; 56+ messages in thread
From: Jens Axboe @ 2020-04-13 23:05 UTC (permalink / raw)
  To: Qian Cai, Linus Torvalds, Stephen Rothwell, Andrew Morton
  Cc: Dmitry Vyukov, Peter Xu, LKML, Linux-MM, Christoph Lameter,
	Johannes Weiner

On 4/13/20 4:06 PM, Qian Cai wrote:
> 
> 
>> On Apr 9, 2020, at 7:29 PM, Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>>
>> Hi Linus,
>>
>> On Thu, 9 Apr 2020 09:32:32 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote:
>>>
>>> On Thu, Apr 9, 2020 at 5:55 AM Dmitry Vyukov <dvyukov@google.com> wrote:
>>>>
>>>> linux-next is boot-broken for more than a month and bugs are piling
>>>> onto bugs, I've seen at least 3 different ones.
>>>> syzbot can't get any working linux-next build for testing for a very
>>>> long time now.  
>>>
>>> Ouch.
>>>
>>> Ok, that's not good. It means that linux-next has basically only done
>>> build-testing this whole cycle.
>>
>> Well, there are other CI's beyond syzbot .. Does syzbot only build/test
>> a single kernel arch/config?
>>
>>> Stephen, Dmitry - is there some way linux-next could possibly kick out
>>> trees more aggressively if syzbot can't even boot?
>>
>> Of course that could be done if I knew that there were problems.  From
>> memory and my mail archives, I was only cc'd on 3 problems by sysbot
>> since last November and they were all responded to by the appropriate
>> maintainers/developers.
>>
>> Currently, when I am cc'd on reports, if they are also sent to who
>> seem like the appropriate people, I just file the report assuming it
>> will be dealt with.
>>
>>> Kicking trees out of linux-next and making noise if they cause syzbot
>>> failures might also make some maintainers react more..
>>
>> That may be true, but in some cases I have carried fixups/reverts/older
>> versions of trees for quite some time before things get fixed.  But at
>> least if that happens, I do tend to remind people.
> 
> BTW, I’ll be adding fuzzers to my daily linux-next routines where it
> triggers this io_uring/scheduler bug almost immediately, so hopefully
> would buy syzbot some time to resume on linux-next.
> 
> [67493.516737][T211750] BUG: unable to handle page fault for address: ffffffffffffffe8
> [67493.557315][T211750] #PF: supervisor read access in kernel mode
> [67493.586726][T211750] #PF: error_code(0x0000) - not-present page
> [67493.614434][T211750] PGD f96e17067 P4D f96e17067 PUD f96e19067 PMD 0 
> [67493.644846][T211750] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
> [67493.674127][T211750] CPU: 55 PID: 211750 Comm: trinity-c127 Tainted: G    B        L    5.7.0-rc1-next-20200413 #4
> [67493.722516][T211750] Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 04/12/2017
> [67493.764925][T211750] RIP: 0010:__wake_up_common+0x98/0x290
> __wake_up_common at kernel/sched/wait.c:87
> [67493.790675][T211750] Code: 40 4d 8d 78 e8 49 8d 7f 18 49 39 fd 0f 84 80 00 00 00 e8 6b bd 2b 00 49 8b 5f 18 45 31 e4 48 83 eb 18 4c 89 ff e8 08 bc 2b 00 <45> 8b 37 41 f6 c6 04 75 71 49 8d 7f 10 e8 46 bd 2b 00 49 8b 47 10
> [67493.881650][T211750] RSP: 0018:ffffc9000adbfaf0 EFLAGS: 00010046
> [67493.909854][T211750] RAX: 0000000000000000 RBX: ffffffffffffffe8 RCX: ffffffffaa9636b8
> [67493.947131][T211750] RDX: 0000000000000003 RSI: dffffc0000000000 RDI: ffffffffffffffe8
> [67493.983829][T211750] RBP: ffffc9000adbfb40 R08: fffffbfff582c5fd R09: fffffbfff582c5fd
> [67494.020861][T211750] R10: ffffffffac162fe3 R11: fffffbfff582c5fc R12: 0000000000000000
> [67494.059249][T211750] R13: ffff888ef82b0960 R14: ffffc9000adbfb80 R15: ffffffffffffffe8
> [67494.099699][T211750] FS:  00007fdcba4c4740(0000) GS:ffff889033780000(0000) knlGS:0000000000000000
> [67494.141858][T211750] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [67494.172660][T211750] CR2: ffffffffffffffe8 CR3: 0000000f776a0004 CR4: 00000000001606e0
> [67494.209760][T211750] Call Trace:
> [67494.224720][T211750]  __wake_up_common_lock+0xea/0x150
> (inlined by) __wake_up_common_lock at kernel/sched/wait.c:124
> [67494.248753][T211750]  ? __wake_up_common+0x290/0x290
> [67494.272014][T211750]  ? lockdep_hardirqs_on+0x16/0x2c0
> [67494.296139][T211750]  __wake_up+0x13/0x20
> [67494.314946][T211750]  io_cqring_ev_posted+0x75/0xe0
> (inlined by) io_cqring_ev_posted at fs/io_uring.c:1160
> [67494.337726][T211750]  io_ring_ctx_wait_and_kill+0x1c0/0x2f0
> io_ring_ctx_wait_and_kill at fs/io_uring.c:7305
> [67494.363840][T211750]  io_uring_create+0xa8d/0x13b0
> [67494.386526][T211750]  ? io_req_defer_prep+0x990/0x990
> [67494.410119][T211750]  ? __kasan_check_write+0x14/0x20
> [67494.433646][T211750]  io_uring_setup+0xb8/0x130
> [67494.454870][T211750]  ? io_uring_create+0x13b0/0x13b0
> [67494.478342][T211750]  ? check_flags.part.28+0x220/0x220
> [67494.502947][T211750]  ? lockdep_hardirqs_on+0x16/0x2c0
> [67494.526965][T211750]  __x64_sys_io_uring_setup+0x31/0x40
> [67494.551820][T211750]  do_syscall_64+0xcc/0xaf0
> [67494.574829][T211750]  ? syscall_return_slowpath+0x580/0x580
> [67494.604591][T211750]  ? lockdep_hardirqs_off+0x1f/0x140
> [67494.628901][T211750]  ? entry_SYSCALL_64_after_hwframe+0x3e/0xb3
> [67494.657616][T211750]  ? trace_hardirqs_off_caller+0x3a/0x150
> [67494.683999][T211750]  ? trace_hardirqs_off_thunk+0x1a/0x1c
> [67494.709982][T211750]  entry_SYSCALL_64_after_hwframe+0x49/0xb3
> [67494.737167][T211750] RIP: 0033:0x7fdcb9dd76ed
> [67494.757698][T211750] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 6b 57 2c 00 f7 d8 64 89 01 48
> [67494.849485][T211750] RSP: 002b:00007ffe7fd4e4f8 EFLAGS: 00000246 ORIG_RAX: 00000000000001a9
> [67494.887906][T211750] RAX: ffffffffffffffda RBX: 00000000000001a9 RCX: 00007fdcb9dd76ed
> [67494.924754][T211750] RDX: fffffffffffffffc RSI: 0000000000000000 RDI: 0000000000005d54
> [67494.961516][T211750] RBP: 00000000000001a9 R08: 0000000e31d3caa7 R09: 0082400004004000
> [67494.998485][T211750] R10: ffffffffffffffff R11: 0000000000000246 R12: 0000000000000002
> [67495.035510][T211750] R13: 00007fdcb842e058 R14: 00007fdcba4c46c0 R15: 00007fdcb842e000
> [67495.072679][T211750] Modules linked in: bridge stp llc nfnetlink cn brd vfat fat ext4 crc16 mbcache jbd2 loop kvm_intel kvm irqbypass intel_cstate intel_uncore dax_pmem intel_rapl_perf dax_pmem_core ip_tables x_tables xfs sd_mod tg3 firmware_class libphy hpsa scsi_transport_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: binfmt_misc]
> [67495.221120][T211750] CR2: ffffffffffffffe8
> [67495.240237][T211750] ---[ end trace f9502383d57e0e22 ]---
> [67495.265301][T211750] RIP: 0010:__wake_up_common+0x98/0x290
> [67495.290903][T211750] Code: 40 4d 8d 78 e8 49 8d 7f 18 49 39 fd 0f 84 80 00 00 00 e8 6b bd 2b 00 49 8b 5f 18 45 31 e4 48 83 eb 18 4c 89 ff e8 08 bc 2b 00 <45> 8b 37 41 f6 c6 04 75 71 49 8d 7f 10 e8 46 bd 2b 00 49 8b 47 10
> [67495.382302][T211750] RSP: 0018:ffffc9000adbfaf0 EFLAGS: 00010046
> [67495.410551][T211750] RAX: 0000000000000000 RBX: ffffffffffffffe8 RCX: ffffffffaa9636b8
> [67495.447570][T211750] RDX: 0000000000000003 RSI: dffffc0000000000 RDI: ffffffffffffffe8
> [67495.484252][T211750] RBP: ffffc9000adbfb40 R08: fffffbfff582c5fd R09: fffffbfff582c5fd
> [67495.521068][T211750] R10: ffffffffac162fe3 R11: fffffbfff582c5fc R12: 0000000000000000
> [67495.557461][T211750] R13: ffff888ef82b0960 R14: ffffc9000adbfb80 R15: ffffffffffffffe8
> [67495.594607][T211750] FS:  00007fdcba4c4740(0000) GS:ffff889033780000(0000) knlGS:0000000000000000
> [67495.639332][T211750] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [67495.669033][T211750] CR2: ffffffffffffffe8 CR3: 0000000f776a0004 CR4: 00000000001606e0
> [67495.704569][T211750] Kernel panic - not syncing: Fatal exception
> [67495.731758][T211750] Kernel Offset: 0x29800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> [67495.784988][T211750] ---[ end Kernel panic - not syncing: Fatal exception ]—

This looks like it can happen if you fail setting up the ring (you
probably have some error injection enabled), and then
io_poll_remove_all() does an unconditional io_cqring_ev_posted() even if
we didn't post any events.

I think the best fix here is to ensure io_cqring_ev_posted() is only
called if we actually post events. If we did, we know for a fact that
rings have been setup.

I'll queue this up for 5.7.

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 5190bfb6a665..c0aa72e738b4 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -4342,7 +4342,7 @@ static void io_poll_remove_all(struct io_ring_ctx *ctx)
 {
 	struct hlist_node *tmp;
 	struct io_kiocb *req;
-	int i;
+	int posted = 0, i;
 
 	spin_lock_irq(&ctx->completion_lock);
 	for (i = 0; i < (1U << ctx->cancel_hash_bits); i++) {
@@ -4350,11 +4350,12 @@ static void io_poll_remove_all(struct io_ring_ctx *ctx)
 
 		list = &ctx->cancel_hash[i];
 		hlist_for_each_entry_safe(req, tmp, list, hash_node)
-			io_poll_remove_one(req);
+			posted += io_poll_remove_one(req);
 	}
 	spin_unlock_irq(&ctx->completion_lock);
 
-	io_cqring_ev_posted(ctx);
+	if (posted)
+		io_cqring_ev_posted(ctx);
 }
 
 static int io_poll_cancel(struct io_ring_ctx *ctx, __u64 sqe_addr)

-- 
Jens Axboe



^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
  2020-04-09 23:29       ` Stephen Rothwell
  2020-04-13 22:06         ` Qian Cai
@ 2020-04-14  4:07         ` Hillf Danton
  2020-04-14  4:31           ` Jens Axboe
  1 sibling, 1 reply; 56+ messages in thread
From: Hillf Danton @ 2020-04-14  4:07 UTC (permalink / raw)
  To: Qian Cai; +Cc: LKML, Linux-MM, Jens Axboe


On Mon, 13 Apr 2020 18:06:21 -0400 Qian Cai wrote:
> 
> BTW, I=E2=80=99ll be adding fuzzers to my daily linux-next routines =
> where it triggers this
> io_uring/scheduler bug almost immediately, so hopefully would buy syzbot =
> some
> time to resume on linux-next.
> 
> [67493.516737][T211750] BUG: unable to handle page fault for address: =
> ffffffffffffffe8
> [67493.557315][T211750] #PF: supervisor read access in kernel mode
> [67493.586726][T211750] #PF: error_code(0x0000) - not-present page
> [67493.614434][T211750] PGD f96e17067 P4D f96e17067 PUD f96e19067 PMD 0=20=
> 
> [67493.644846][T211750] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
> [67493.674127][T211750] CPU: 55 PID: 211750 Comm: trinity-c127 Tainted: =
> G    B        L    5.7.0-rc1-next-20200413 #4
> [67493.722516][T211750] Hardware name: HP ProLiant DL380 Gen9/ProLiant =
> DL380 Gen9, BIOS P89 04/12/2017
> [67493.764925][T211750] RIP: 0010:__wake_up_common+0x98/0x290
> __wake_up_common at kernel/sched/wait.c:87
> [67493.790675][T211750] Code: 40 4d 8d 78 e8 49 8d 7f 18 49 39 fd 0f 84 =
> 80 00 00 00 e8 6b bd 2b 00 49 8b 5f 18 45 31 e4 48 83 eb 18 4c 89 ff e8 =
> 08 bc 2b 00 <45> 8b 37 41 f6 c6 04 75 71 49 8d 7f 10 e8 46 bd 2b 00 49 =
> 8b 47 10
> [67493.881650][T211750] RSP: 0018:ffffc9000adbfaf0 EFLAGS: 00010046
> [67493.909854][T211750] RAX: 0000000000000000 RBX: ffffffffffffffe8 RCX: =
> ffffffffaa9636b8
> [67493.947131][T211750] RDX: 0000000000000003 RSI: dffffc0000000000 RDI: =
> ffffffffffffffe8
> [67493.983829][T211750] RBP: ffffc9000adbfb40 R08: fffffbfff582c5fd R09: =
> fffffbfff582c5fd
> [67494.020861][T211750] R10: ffffffffac162fe3 R11: fffffbfff582c5fc R12: =
> 0000000000000000
> [67494.059249][T211750] R13: ffff888ef82b0960 R14: ffffc9000adbfb80 R15: =
> ffffffffffffffe8
> [67494.099699][T211750] FS:  00007fdcba4c4740(0000) =
> GS:ffff889033780000(0000) knlGS:0000000000000000
> [67494.141858][T211750] CS:  0010 DS: 0000 ES: 0000 CR0: =
> 0000000080050033
> [67494.172660][T211750] CR2: ffffffffffffffe8 CR3: 0000000f776a0004 CR4: =
> 00000000001606e0
> [67494.209760][T211750] Call Trace:
> [67494.224720][T211750]  __wake_up_common_lock+0xea/0x150
> (inlined by) __wake_up_common_lock at kernel/sched/wait.c:124
> [67494.248753][T211750]  ? __wake_up_common+0x290/0x290
> [67494.272014][T211750]  ? lockdep_hardirqs_on+0x16/0x2c0
> [67494.296139][T211750]  __wake_up+0x13/0x20
> [67494.314946][T211750]  io_cqring_ev_posted+0x75/0xe0
> (inlined by) io_cqring_ev_posted at fs/io_uring.c:1160
> [67494.337726][T211750]  io_ring_ctx_wait_and_kill+0x1c0/0x2f0
> io_ring_ctx_wait_and_kill at fs/io_uring.c:7305
> [67494.363840][T211750]  io_uring_create+0xa8d/0x13b0
> [67494.386526][T211750]  ? io_req_defer_prep+0x990/0x990
> [67494.410119][T211750]  ? __kasan_check_write+0x14/0x20
> [67494.433646][T211750]  io_uring_setup+0xb8/0x130
> [67494.454870][T211750]  ? io_uring_create+0x13b0/0x13b0
> [67494.478342][T211750]  ? check_flags.part.28+0x220/0x220
> [67494.502947][T211750]  ? lockdep_hardirqs_on+0x16/0x2c0
> [67494.526965][T211750]  __x64_sys_io_uring_setup+0x31/0x40
> [67494.551820][T211750]  do_syscall_64+0xcc/0xaf0
> [67494.574829][T211750]  ? syscall_return_slowpath+0x580/0x580
> [67494.604591][T211750]  ? lockdep_hardirqs_off+0x1f/0x140
> [67494.628901][T211750]  ? entry_SYSCALL_64_after_hwframe+0x3e/0xb3
> [67494.657616][T211750]  ? trace_hardirqs_off_caller+0x3a/0x150
> [67494.683999][T211750]  ? trace_hardirqs_off_thunk+0x1a/0x1c
> [67494.709982][T211750]  entry_SYSCALL_64_after_hwframe+0x49/0xb3


See if it makes your fuzzers happy.

--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -934,6 +934,7 @@ static struct io_ring_ctx *io_ring_ctx_a
 	idr_init(&ctx->personality_idr);
 	mutex_init(&ctx->uring_lock);
 	init_waitqueue_head(&ctx->wait);
+	init_waitqueue_head(&ctx->sqo_wait);
 	spin_lock_init(&ctx->completion_lock);
 	INIT_LIST_HEAD(&ctx->poll_list);
 	INIT_LIST_HEAD(&ctx->defer_list);
@@ -6824,7 +6825,6 @@ static int io_sq_offload_start(struct io
 {
 	int ret;
 
-	init_waitqueue_head(&ctx->sqo_wait);
 	mmgrab(current->mm);
 	ctx->sqo_mm = current->mm;
 



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
  2020-04-14  4:07         ` Hillf Danton
@ 2020-04-14  4:31           ` Jens Axboe
  0 siblings, 0 replies; 56+ messages in thread
From: Jens Axboe @ 2020-04-14  4:31 UTC (permalink / raw)
  To: Hillf Danton; +Cc: Qian Cai, LKML, Linux-MM


> On Apr 13, 2020, at 10:07 PM, Hillf Danton <hdanton@sina.com> wrote:
> 
> 
>> On Mon, 13 Apr 2020 18:06:21 -0400 Qian Cai wrote:
>> 
>> BTW, I=E2=80=99ll be adding fuzzers to my daily linux-next routines =
>> where it triggers this
>> io_uring/scheduler bug almost immediately, so hopefully would buy syzbot =
>> some
>> time to resume on linux-next.
>> 
>> [67493.516737][T211750] BUG: unable to handle page fault for address: =
>> ffffffffffffffe8
>> [67493.557315][T211750] #PF: supervisor read access in kernel mode
>> [67493.586726][T211750] #PF: error_code(0x0000) - not-present page
>> [67493.614434][T211750] PGD f96e17067 P4D f96e17067 PUD f96e19067 PMD 0=20=
>> 
>> [67493.644846][T211750] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
>> [67493.674127][T211750] CPU: 55 PID: 211750 Comm: trinity-c127 Tainted: =
>> G    B        L    5.7.0-rc1-next-20200413 #4
>> [67493.722516][T211750] Hardware name: HP ProLiant DL380 Gen9/ProLiant =
>> DL380 Gen9, BIOS P89 04/12/2017
>> [67493.764925][T211750] RIP: 0010:__wake_up_common+0x98/0x290
>> __wake_up_common at kernel/sched/wait.c:87
>> [67493.790675][T211750] Code: 40 4d 8d 78 e8 49 8d 7f 18 49 39 fd 0f 84 =
>> 80 00 00 00 e8 6b bd 2b 00 49 8b 5f 18 45 31 e4 48 83 eb 18 4c 89 ff e8 =
>> 08 bc 2b 00 <45> 8b 37 41 f6 c6 04 75 71 49 8d 7f 10 e8 46 bd 2b 00 49 =
>> 8b 47 10
>> [67493.881650][T211750] RSP: 0018:ffffc9000adbfaf0 EFLAGS: 00010046
>> [67493.909854][T211750] RAX: 0000000000000000 RBX: ffffffffffffffe8 RCX: =
>> ffffffffaa9636b8
>> [67493.947131][T211750] RDX: 0000000000000003 RSI: dffffc0000000000 RDI: =
>> ffffffffffffffe8
>> [67493.983829][T211750] RBP: ffffc9000adbfb40 R08: fffffbfff582c5fd R09: =
>> fffffbfff582c5fd
>> [67494.020861][T211750] R10: ffffffffac162fe3 R11: fffffbfff582c5fc R12: =
>> 0000000000000000
>> [67494.059249][T211750] R13: ffff888ef82b0960 R14: ffffc9000adbfb80 R15: =
>> ffffffffffffffe8
>> [67494.099699][T211750] FS:  00007fdcba4c4740(0000) =
>> GS:ffff889033780000(0000) knlGS:0000000000000000
>> [67494.141858][T211750] CS:  0010 DS: 0000 ES: 0000 CR0: =
>> 0000000080050033
>> [67494.172660][T211750] CR2: ffffffffffffffe8 CR3: 0000000f776a0004 CR4: =
>> 00000000001606e0
>> [67494.209760][T211750] Call Trace:
>> [67494.224720][T211750]  __wake_up_common_lock+0xea/0x150
>> (inlined by) __wake_up_common_lock at kernel/sched/wait.c:124
>> [67494.248753][T211750]  ? __wake_up_common+0x290/0x290
>> [67494.272014][T211750]  ? lockdep_hardirqs_on+0x16/0x2c0
>> [67494.296139][T211750]  __wake_up+0x13/0x20
>> [67494.314946][T211750]  io_cqring_ev_posted+0x75/0xe0
>> (inlined by) io_cqring_ev_posted at fs/io_uring.c:1160
>> [67494.337726][T211750]  io_ring_ctx_wait_and_kill+0x1c0/0x2f0
>> io_ring_ctx_wait_and_kill at fs/io_uring.c:7305
>> [67494.363840][T211750]  io_uring_create+0xa8d/0x13b0
>> [67494.386526][T211750]  ? io_req_defer_prep+0x990/0x990
>> [67494.410119][T211750]  ? __kasan_check_write+0x14/0x20
>> [67494.433646][T211750]  io_uring_setup+0xb8/0x130
>> [67494.454870][T211750]  ? io_uring_create+0x13b0/0x13b0
>> [67494.478342][T211750]  ? check_flags.part.28+0x220/0x220
>> [67494.502947][T211750]  ? lockdep_hardirqs_on+0x16/0x2c0
>> [67494.526965][T211750]  __x64_sys_io_uring_setup+0x31/0x40
>> [67494.551820][T211750]  do_syscall_64+0xcc/0xaf0
>> [67494.574829][T211750]  ? syscall_return_slowpath+0x580/0x580
>> [67494.604591][T211750]  ? lockdep_hardirqs_off+0x1f/0x140
>> [67494.628901][T211750]  ? entry_SYSCALL_64_after_hwframe+0x3e/0xb3
>> [67494.657616][T211750]  ? trace_hardirqs_off_caller+0x3a/0x150
>> [67494.683999][T211750]  ? trace_hardirqs_off_thunk+0x1a/0x1c
>> [67494.709982][T211750]  entry_SYSCALL_64_after_hwframe+0x49/0xb3
> 
> 
> See if it makes your fuzzers happy.

It will, and we should probably do that separately too. I already posted another fix that avoids the posted call that triggers it.

— 
Jens Axboe



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 1/2] mm/mempolicy: Allow lookup_node() to handle fatal signal
  2020-04-09 16:42     ` Linus Torvalds
@ 2020-04-14 11:04       ` Michal Hocko
  2020-04-14 13:49         ` Peter Xu
  2020-04-20 12:47         ` Michal Hocko
  0 siblings, 2 replies; 56+ messages in thread
From: Michal Hocko @ 2020-04-14 11:04 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Xu, Linux Kernel Mailing List, Linux-MM, Andrew Morton,
	syzbot+693dc11fcb53120b5559

On Thu 09-04-20 09:42:20, Linus Torvalds wrote:
> On Thu, Apr 9, 2020 at 12:03 AM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > This patch however doesn't go all the way to revert it because 0 return
> > value is impossible.
> 
> I'm not convinced it's impossible.

__get_user_pages is documented as
 * -- If nr_pages is 0, returns 0.
 * -- If nr_pages is >0, but no pages were pinned, returns -errno.
 * -- If nr_pages is >0, and some pages were pinned, returns the number of
 *    pages pinned. Again, this may be less than nr_pages.

but let me double check the actual code... There seem to be only one
exception the above rule AFAICS. faultin_page returning EBUSY will
be overriden to either 0 for the first page or return the number of
already pinned pages. So nr_pages > 0 && ret = 0 is indeed possible
from __get_user_pages :/ That will be the case only for VM_FAULT_RETRY,
thoug.

Now __get_user_pages_locked behaves differently. It keeps retrying the
fault until it succeeds unless FOLL_NOWAIT is specified. Then it would
return 0. Why we need to return 0 is not really clear to me but it
seem to be a long term behavior. I believe we need to document it.

> And if it is, then the current code is harmless.

Yes from the above it seems that the check is indeed harmless becasue
this path doesn't use FOLL_NOWAIT and so it will never see 0 return.
I find a reference to EINTR confusing so I would still love to change
that.

> Now, I do agree that we probably should go through and clarify the
> whole range of different get_user_pages() cases of returning zero (or
> not doing so), but right now it's so confusing that I'd prefer to keep
> that (possibly unnecessary) belt-and-suspenders check for zero in
> there.
>
> If/when somebody actually does a real audit and the result is "these
> functions cannot return zero" and it's documented, then we can remove
> those checks.

Would you mind this patch instead?

commit bc6c0fa7c7fb5eb54963dca65ae4a62ba04c9efa
Author: Michal Hocko <mhocko@suse.com>
Date:   Thu Apr 9 08:26:57 2020 +0200

    mm, mempolicy: fix up gup usage in lookup_node
    
    ba841078cd05 ("mm/mempolicy: Allow lookup_node() to handle fatal signal") has
    added a special casing for 0 return value because that was a possible
    gup return value when interrupted by fatal signal. This has been fixed
    by ae46d2aa6a7f ("mm/gup: Let __get_user_pages_locked() return -EINTR
    for fatal signal") in the mean time so ba841078cd05 can be reverted.
    
    This patch however doesn't go all the way to revert it because the check
    for 0 is wrong and confusing here. Firstly it is inherently unsafe to
    access the page when get_user_pages_locked returns 0 (aka no page
    returned).
    Fortunatelly this will not happen because get_user_pages_locked will not
    return 0 when nr_pages > 0 unless FOLL_NOWAIT is specified which is not
    the case here. Document this potential error code in gup code while we
    are at it.
    
    Signed-off-by: Michal Hocko <mhocko@suse.com>

diff --git a/mm/gup.c b/mm/gup.c
index 50681f0286de..a8575b880baf 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -980,6 +980,7 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
  * -- If nr_pages is >0, but no pages were pinned, returns -errno.
  * -- If nr_pages is >0, and some pages were pinned, returns the number of
  *    pages pinned. Again, this may be less than nr_pages.
+ * -- 0 return value is possible when the fault would need to be retried.
  *
  * The caller is responsible for releasing returned @pages, via put_page().
  *
@@ -1247,6 +1248,10 @@ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
 }
 EXPORT_SYMBOL_GPL(fixup_user_fault);
 
+/*
+ * Please note that this function, unlike __get_user_pages will not
+ * return 0 for nr_pages > 0 without FOLL_NOWAIT
+ */
 static __always_inline long __get_user_pages_locked(struct task_struct *tsk,
 						struct mm_struct *mm,
 						unsigned long start,
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 48ba9729062e..1965e2681877 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -927,10 +927,7 @@ static int lookup_node(struct mm_struct *mm, unsigned long addr)
 
 	int locked = 1;
 	err = get_user_pages_locked(addr & PAGE_MASK, 1, 0, &p, &locked);
-	if (err == 0) {
-		/* E.g. GUP interrupted by fatal signal */
-		err = -EFAULT;
-	} else if (err > 0) {
+	if (err > 0) {
 		err = page_to_nid(p);
 		put_page(p);
 	}
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
  2020-04-13 22:06         ` Qian Cai
  2020-04-13 23:05           ` Jens Axboe
@ 2020-04-14 11:12           ` Dmitry Vyukov
  2020-04-14 11:59             ` Qian Cai
                               ` (2 more replies)
  1 sibling, 3 replies; 56+ messages in thread
From: Dmitry Vyukov @ 2020-04-14 11:12 UTC (permalink / raw)
  To: Qian Cai
  Cc: Linus Torvalds, Stephen Rothwell, Andrew Morton, Peter Xu, LKML,
	Linux-MM, Jens Axboe, Christoph Lameter, Johannes Weiner,
	syzkaller, Dan Rue

On Tue, Apr 14, 2020 at 12:06 AM Qian Cai <cai@lca.pw> wrote:
> > On Apr 9, 2020, at 7:29 PM, Stephen Rothwell <sfr@canb.auug.org.au> wrote:
> > Hi Linus,
> >
> > On Thu, 9 Apr 2020 09:32:32 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote:
> >>
> >> On Thu, Apr 9, 2020 at 5:55 AM Dmitry Vyukov <dvyukov@google.com> wrote:
> >>>
> >>> linux-next is boot-broken for more than a month and bugs are piling
> >>> onto bugs, I've seen at least 3 different ones.
> >>> syzbot can't get any working linux-next build for testing for a very
> >>> long time now.
> >>
> >> Ouch.
> >>
> >> Ok, that's not good. It means that linux-next has basically only done
> >> build-testing this whole cycle.
> >
> > Well, there are other CI's beyond syzbot .. Does syzbot only build/test
> > a single kernel arch/config?
> >
> >> Stephen, Dmitry - is there some way linux-next could possibly kick out
> >> trees more aggressively if syzbot can't even boot?

Hello all,

Sorry for corona/holiday-delays. I will try to answer/comment on all
things in this thread in this email.

AI: we need to CC linux-next@ on linux-next build/boot failures. I
will work on this.
We have functionality to CC given emails on _all_ bugs on the given
tree, but we don't have this for build/boot bugs only. I will try to
add this soon.
Stephen, do you want to be CCed as well? Or just linux-next@?

> So old bugs generally should be aged out

This actually happens now.
Bugs without reproducers are auto-closed after 60-120 days since last
occurrence (based on past frequency). And for linux-next the range is
40-60 days.
Bugs with reproducers are not auto-closed. But they are fix bisected
and cause bisected, both of which are only ~66% correct, but still
frequently provide a useful signal. Also bugs with reproducers are
just generally easier to handle.

Another important distinction from Bugzilla is that syzbot dashboard
has up-to-date "Last crash time" information. Click on the "Last"
column here:
https://syzkaller.appspot.com/upstream
It's very easy to ignore everything that happened months ago for
starters, if that's the concern.

So it's not as perfect as it would be with a dedicated human team
attached, but I would say it's now in a reasonable shape with ~400
open bugs that happened within the last month.
And now we have data to confirm that "old" does not mean "irrelevant".
Our leader:
BUG: please report to dccp@vger.kernel.org => prev = 0, last = 0 at
net/dccp/ccids/lib/packet_history.c:LINE/tfrc_rx_hist_sample_rtt()
https://syzkaller.appspot.com/bug?id=0881c535c265ca965edc49c0ac3d0a9850d26eb1
was first triggered 964 days ago, but pretty much still there all that time.

> It would be nice if there was some way we could triage Syzkaller
> bugs into different buckets.

Though, yes, I am afraid of stepping onto the slippery slope of
implementing a full-fledged bug tracking system, I think syzbot will
gather more bug tracker features and tags will happen. We still have
https://github.com/google/syzkaller/issues/608 open and it's mainly
the question of allocating resources for implementation and figuring
out the actual tags hierarchy.
For login and credentials, I guess we will go with just "whoever can
send emails is a root" because we are doing this already anyways
(closing a bug is more critical than changing a tag) :)

Re panic_on_warn.
We don't have a dedicated engineer to sheriff and give manual
consideration and judgement to each case. And as Qian noted, in such
circumstances it's reasonable to don't trust anything after a warning.
Some notorious examples: LOCKDEP warnings disable LOCKDEP; so if we
boot in such state with eyes closed and then try to do fuzzing, or
"better" test a patch for a LOCKDEP error, or do bisection of a
LOCKDEP error, we will immediately give bogus testing results or
bisection culprit.
Or a warning about hung task may re-appear later during testing and
confuse results again.
Or if we ignore KASAN warning, we boot potentially with corrupted
memory with who-knows-what consequences.
A "normal" WARNING may be benign (misuse of WARNING), or maybe not.
Impossible to figure out automatically. And in the end, if we ignore
that, who/when will notice and fix that?
We get this far with this black-and-white criteria for kernel bugs. I
think it had some positive effects on a number of areas, as we go
forward I think it's better to extend panic_on_warn to more testing
systems. Then non-fatal bugs will be no different from fatal bugs
during boot, which we need to handle in a reasonable timeframe anyway.
Which gets me to the next "interesting" point.

> Well, there are other CI's beyond syzbot.
> On the other hand, this makes me worry who is testing on linux-next every day.

How do these use-after-free's and locking bugs get past the
unit-testing systems (which syzbot is not) and remain unnoticed for so
long?...
syzbot uses the dumbest VMs (GCE), so everything it triggers during
boot should be triggerable pretty much everywhere.
It seems to be an action point for the testing systems. "Boot to ssh"
is not the best criteria. Again if there is a LOCKDEP error, we are
not catching any more LOCKDEP errors during subsequent testing. If
there is a use-after-free, that's a serious error on its own and KASAN
produces only 1 error by default as well. And as far as I understand,
lots of kernel testing systems don't even enable KASAN, which is very
wrong.
I've talked to +Dan Rue re this few days ago. Hopefully LKFT will
start catching these as part of unit testing. Which should help with
syzbot testing as well.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
  2020-04-14 11:12           ` Dmitry Vyukov
@ 2020-04-14 11:59             ` Qian Cai
  2020-04-14 12:05               ` Dmitry Vyukov
  2020-04-14 19:28             ` Dan Rue
  2020-04-16  0:34             ` Stephen Rothwell
  2 siblings, 1 reply; 56+ messages in thread
From: Qian Cai @ 2020-04-14 11:59 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Linus Torvalds, Stephen Rothwell, Andrew Morton, Peter Xu, LKML,
	Linux-MM, Jens Axboe, Christoph Lameter, Johannes Weiner,
	syzkaller, Dan Rue



> On Apr 14, 2020, at 7:13 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
> 
> How do these use-after-free's and locking bugs get past the
> unit-testing systems (which syzbot is not) and remain unnoticed for so
> long?...
> syzbot uses the dumbest VMs (GCE), so everything it triggers during
> boot should be triggerable pretty much everywhere.

There are many reasons that any early testing would not be able to catch ALL the syzbot blockers.

The Kconfigs are different. For example, I don’t have openvswitch enabled, so would miss that ovs rcu-list lockdep warning. Same for that use-after-free in net/bluetooth and a warning in sound subsystem.

But, notifying Linux-next ML is a good start, so at least we could ask Paul or Steve to pull out the commit which enabling rcu-list debugging by default with PROVE_RCU.

I learned through that restricted kconfig to some degree of minimal could save a lot of troubles late on especially those options that I have no way to exercise like net/bluetooth and sound currently. It is going to be extra works though because those default options in Linux-next or even defconfigs are not always pleasant and would want to enable something I don’t need if not given human intervention.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
  2020-04-14 11:59             ` Qian Cai
@ 2020-04-14 12:05               ` Dmitry Vyukov
  0 siblings, 0 replies; 56+ messages in thread
From: Dmitry Vyukov @ 2020-04-14 12:05 UTC (permalink / raw)
  To: Qian Cai
  Cc: Linus Torvalds, Stephen Rothwell, Andrew Morton, Peter Xu, LKML,
	Linux-MM, Jens Axboe, Christoph Lameter, Johannes Weiner,
	syzkaller, Dan Rue

On Tue, Apr 14, 2020 at 1:59 PM Qian Cai <cai@lca.pw> wrote:
> > On Apr 14, 2020, at 7:13 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
> >
> > How do these use-after-free's and locking bugs get past the
> > unit-testing systems (which syzbot is not) and remain unnoticed for so
> > long?...
> > syzbot uses the dumbest VMs (GCE), so everything it triggers during
> > boot should be triggerable pretty much everywhere.
>
> There are many reasons that any early testing would not be able to catch ALL the syzbot blockers.
>
> The Kconfigs are different. For example, I don’t have openvswitch enabled, so would miss that ovs rcu-list lockdep warning. Same for that use-after-free in net/bluetooth and a warning in sound subsystem.
>
> But, notifying Linux-next ML is a good start, so at least we could ask Paul or Steve to pull out the commit which enabling rcu-list debugging by default with PROVE_RCU.
>
> I learned through that restricted kconfig to some degree of minimal could save a lot of troubles late on especially those options that I have no way to exercise like net/bluetooth and sound currently. It is going to be extra works though because those default options in Linux-next or even defconfigs are not always pleasant and would want to enable something I don’t need if not given human intervention.

We only try to enable what we can reach. There is significant reach
for sound and net/bluetooth even without any hardware. So I would
assume generic testing systems like KernelCI, LKFT, CKI should enable
these as well. Hopefully we don't have all of the sound and
net/bluetooth completely untested in linux-next.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 1/2] mm/mempolicy: Allow lookup_node() to handle fatal signal
  2020-04-14 11:04       ` Michal Hocko
@ 2020-04-14 13:49         ` Peter Xu
  2020-04-14 14:18           ` Michal Hocko
  2020-04-20 12:47         ` Michal Hocko
  1 sibling, 1 reply; 56+ messages in thread
From: Peter Xu @ 2020-04-14 13:49 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Linus Torvalds, Linux Kernel Mailing List, Linux-MM,
	Andrew Morton, syzbot+693dc11fcb53120b5559

On Tue, Apr 14, 2020 at 01:04:29PM +0200, Michal Hocko wrote:

[...]

> @@ -1247,6 +1248,10 @@ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
>  }
>  EXPORT_SYMBOL_GPL(fixup_user_fault);
>  
> +/*
> + * Please note that this function, unlike __get_user_pages will not
> + * return 0 for nr_pages > 0 without FOLL_NOWAIT
> + */
>  static __always_inline long __get_user_pages_locked(struct task_struct *tsk,
>  						struct mm_struct *mm,
>  						unsigned long start,
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 48ba9729062e..1965e2681877 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -927,10 +927,7 @@ static int lookup_node(struct mm_struct *mm, unsigned long addr)
>  
>  	int locked = 1;
>  	err = get_user_pages_locked(addr & PAGE_MASK, 1, 0, &p, &locked);
> -	if (err == 0) {
> -		/* E.g. GUP interrupted by fatal signal */
> -		err = -EFAULT;
> -	} else if (err > 0) {
> +	if (err > 0) {
>  		err = page_to_nid(p);
>  		put_page(p);
>  	}

Hi, Michal,

IIUC this is not the only place that we check against ret==0 for gup.
For example, the other direct caller of the same function,
get_vaddr_frames(), which will set -EFAULT too if ret==0.  So do we
want to change all the places and don't check against zero explicitly?

I'm now thinking whether this would be good even if we refactored gup
and only allow it to return either >0 as number of page pinned, or <0
for all the rest.  I'm not sure how others will see this, but the
answer is probably the same at least to me as before for this issue.

As a caller, I'll see gup as a black box.  Even if the gup function
guarantees that the retcode won't be zero and documented it, I (as a
caller) will be using that to index page array so I'd still better to
check that value before I do anything (because it's meaningless to
index an array with zero size), and a convertion of "ret==0" -->
"-EFAULT" (or some other failures) in this case still makes sense.
While removing that doesn't help a lot, imho, but instead make it
slightly unsafer.

Maybe that's also why ret==0 hasn't been reworked for years?  Maybe
there is just never a reason strong enough to do that explicitly,
because it's still good to check against ret==0 after all...

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 1/2] mm/mempolicy: Allow lookup_node() to handle fatal signal
  2020-04-14 13:49         ` Peter Xu
@ 2020-04-14 14:18           ` Michal Hocko
  0 siblings, 0 replies; 56+ messages in thread
From: Michal Hocko @ 2020-04-14 14:18 UTC (permalink / raw)
  To: Peter Xu
  Cc: Linus Torvalds, Linux Kernel Mailing List, Linux-MM,
	Andrew Morton, syzbot+693dc11fcb53120b5559

On Tue 14-04-20 09:49:06, Peter Xu wrote:
> On Tue, Apr 14, 2020 at 01:04:29PM +0200, Michal Hocko wrote:
> 
> [...]
> 
> > @@ -1247,6 +1248,10 @@ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
> >  }
> >  EXPORT_SYMBOL_GPL(fixup_user_fault);
> >  
> > +/*
> > + * Please note that this function, unlike __get_user_pages will not
> > + * return 0 for nr_pages > 0 without FOLL_NOWAIT
> > + */
> >  static __always_inline long __get_user_pages_locked(struct task_struct *tsk,
> >  						struct mm_struct *mm,
> >  						unsigned long start,
> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > index 48ba9729062e..1965e2681877 100644
> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
> > @@ -927,10 +927,7 @@ static int lookup_node(struct mm_struct *mm, unsigned long addr)
> >  
> >  	int locked = 1;
> >  	err = get_user_pages_locked(addr & PAGE_MASK, 1, 0, &p, &locked);
> > -	if (err == 0) {
> > -		/* E.g. GUP interrupted by fatal signal */
> > -		err = -EFAULT;
> > -	} else if (err > 0) {
> > +	if (err > 0) {
> >  		err = page_to_nid(p);
> >  		put_page(p);
> >  	}
> 
> Hi, Michal,
> 
> IIUC this is not the only place that we check against ret==0 for gup.
> For example, the other direct caller of the same function,
> get_vaddr_frames(), which will set -EFAULT too if ret==0.  So do we
> want to change all the places and don't check against zero explicitly?

This would require to analyze each such a call. For example
get_vaddr_frames has to handle get_user_pages_locked returning 0 because
it allows callers to specify FOLL_NOWAIT. Whether EFAULT is a proper
return value for that case is a question I didn't really get to analyze.

> I'm now thinking whether this would be good even if we refactored gup
> and only allow it to return either >0 as number of page pinned, or <0
> for all the rest.  I'm not sure how others will see this, but the
> answer is probably the same at least to me as before for this issue.

I would consider a semantic without that special case for FOLL_NOWAIT
much more clear but I do not really understand the historical background
for it TBH so I do not dare to touch that.

> As a caller, I'll see gup as a black box.  Even if the gup function
> guarantees that the retcode won't be zero and documented it, I (as a
> caller) will be using that to index page array so I'd still better to
> check that value before I do anything (because it's meaningless to
> index an array with zero size), and a convertion of "ret==0" -->
> "-EFAULT" (or some other failures) in this case still makes sense.
> While removing that doesn't help a lot, imho, but instead make it
> slightly unsafer.

Well, my experience tells me that people really love to copy&paste code
and error handling and if the error handling is bogus it just spreads
all over the place until it really defines a new standard which is close
to impossible to get rid of. So if the error handling can be done
properly then I would really prefer it. In the above case it is clearly
misleading, because fatal signal should be never reflected by err==0.
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
  2020-04-14 11:12           ` Dmitry Vyukov
  2020-04-14 11:59             ` Qian Cai
@ 2020-04-14 19:28             ` Dan Rue
  2020-04-15 11:09               ` Dmitry Vyukov
  2020-04-16  0:34             ` Stephen Rothwell
  2 siblings, 1 reply; 56+ messages in thread
From: Dan Rue @ 2020-04-14 19:28 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Qian Cai, Linus Torvalds, Stephen Rothwell, Andrew Morton,
	Peter Xu, LKML, Linux-MM, Jens Axboe, Christoph Lameter,
	Johannes Weiner, syzkaller

On Tue, Apr 14, 2020 at 01:12:50PM +0200, Dmitry Vyukov wrote:
> On Tue, Apr 14, 2020 at 12:06 AM Qian Cai <cai@lca.pw> wrote:
> > Well, there are other CI's beyond syzbot.
> > On the other hand, this makes me worry who is testing on linux-next every day.
> 
> How do these use-after-free's and locking bugs get past the
> unit-testing systems (which syzbot is not) and remain unnoticed for so
> long?...
> syzbot uses the dumbest VMs (GCE), so everything it triggers during
> boot should be triggerable pretty much everywhere.
> It seems to be an action point for the testing systems. "Boot to ssh"
> is not the best criteria. Again if there is a LOCKDEP error, we are
> not catching any more LOCKDEP errors during subsequent testing. If
> there is a use-after-free, that's a serious error on its own and KASAN
> produces only 1 error by default as well. And as far as I understand,
> lots of kernel testing systems don't even enable KASAN, which is very
> wrong.
> I've talked to +Dan Rue re this few days ago. Hopefully LKFT will
> start catching these as part of unit testing. Which should help with
> syzbot testing as well.

LKFT has recently added testing with KASAN enabled and improved the
kernel log parsing to catch more of this class of errors while
performing our regular functional testing.

Incidentally, -next was also broken for us from March 25 through April 5
due to a perf build failure[0], which eventually made itself all the way
down into v5.6 release and I believe the first two 5.6.x stable
releases.

For -next, LKFT's gap is primarily reporting. We do build and run over
30k tests on every -next daily release, but we send out issues manually
when we see them because triaging is still a manual effort. We're
working to build better automated reporting. If anyone is interested in
watching LKFT's -next results more closely (warning, it's a bit noisy),
please let me know. Watching the results at https://lkft.linaro.org
provides some overall health indications, but again, it gets pretty
difficult to figure out signal from noise once you start drilling down
without sufficient context of the system.

Dan

[0] https://lore.kernel.org/stable/CA+G9fYsZjmf34pQT1DeLN_DDwvxCWEkbzBfF0q2VERHb25dfZQ@mail.gmail.com/

-- 
Linaro LKFT
https://lkft.linaro.org


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
  2020-04-14 19:28             ` Dan Rue
@ 2020-04-15 11:09               ` Dmitry Vyukov
  2020-04-15 16:23                 ` Dan Rue
  0 siblings, 1 reply; 56+ messages in thread
From: Dmitry Vyukov @ 2020-04-15 11:09 UTC (permalink / raw)
  To: Dan Rue
  Cc: Qian Cai, Linus Torvalds, Stephen Rothwell, Andrew Morton,
	Peter Xu, LKML, Linux-MM, Jens Axboe, Christoph Lameter,
	Johannes Weiner, syzkaller

On Tue, Apr 14, 2020 at 9:28 PM Dan Rue <dan.rue@linaro.org> wrote:
>
> On Tue, Apr 14, 2020 at 01:12:50PM +0200, Dmitry Vyukov wrote:
> > On Tue, Apr 14, 2020 at 12:06 AM Qian Cai <cai@lca.pw> wrote:
> > > Well, there are other CI's beyond syzbot.
> > > On the other hand, this makes me worry who is testing on linux-next every day.
> >
> > How do these use-after-free's and locking bugs get past the
> > unit-testing systems (which syzbot is not) and remain unnoticed for so
> > long?...
> > syzbot uses the dumbest VMs (GCE), so everything it triggers during
> > boot should be triggerable pretty much everywhere.
> > It seems to be an action point for the testing systems. "Boot to ssh"
> > is not the best criteria. Again if there is a LOCKDEP error, we are
> > not catching any more LOCKDEP errors during subsequent testing. If
> > there is a use-after-free, that's a serious error on its own and KASAN
> > produces only 1 error by default as well. And as far as I understand,
> > lots of kernel testing systems don't even enable KASAN, which is very
> > wrong.
> > I've talked to +Dan Rue re this few days ago. Hopefully LKFT will
> > start catching these as part of unit testing. Which should help with
> > syzbot testing as well.
>
> LKFT has recently added testing with KASAN enabled and improved the
> kernel log parsing to catch more of this class of errors while
> performing our regular functional testing.
>
> Incidentally, -next was also broken for us from March 25 through April 5
> due to a perf build failure[0], which eventually made itself all the way
> down into v5.6 release and I believe the first two 5.6.x stable
> releases.
>
> For -next, LKFT's gap is primarily reporting. We do build and run over
> 30k tests on every -next daily release, but we send out issues manually
> when we see them because triaging is still a manual effort. We're
> working to build better automated reporting. If anyone is interested in
> watching LKFT's -next results more closely (warning, it's a bit noisy),
> please let me know. Watching the results at https://lkft.linaro.org
> provides some overall health indications, but again, it gets pretty
> difficult to figure out signal from noise once you start drilling down
> without sufficient context of the system.

What kind of failures and noise do you get? Is it flaky tests?
I would assume build failures are ~0% flaky/noisy. And boot failures
are maybe ~1% flaky/noisy due to some infra issues.

I can't find any actual test failure logs in the UI. I've got to this page:
https://qa-reports.linaro.org/lkft/linux-mainline-oe/build/v5.7-rc1-24-g8632e9b5645b/testrun/1363280/suite/kselftest/tests/
which seem to contain failed tests on mainline. But I still can't find
the actual test failure logs.


> Dan
>
> [0] https://lore.kernel.org/stable/CA+G9fYsZjmf34pQT1DeLN_DDwvxCWEkbzBfF0q2VERHb25dfZQ@mail.gmail.com/
>
> --
> Linaro LKFT
> https://lkft.linaro.org


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
  2020-04-15 11:09               ` Dmitry Vyukov
@ 2020-04-15 16:23                 ` Dan Rue
  0 siblings, 0 replies; 56+ messages in thread
From: Dan Rue @ 2020-04-15 16:23 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Qian Cai, Linus Torvalds, Stephen Rothwell, Andrew Morton,
	Peter Xu, LKML, Linux-MM, Jens Axboe, Christoph Lameter,
	Johannes Weiner, syzkaller

On Wed, Apr 15, 2020 at 01:09:32PM +0200, Dmitry Vyukov wrote:
> On Tue, Apr 14, 2020 at 9:28 PM Dan Rue <dan.rue@linaro.org> wrote:
> >
> > On Tue, Apr 14, 2020 at 01:12:50PM +0200, Dmitry Vyukov wrote:
> > > On Tue, Apr 14, 2020 at 12:06 AM Qian Cai <cai@lca.pw> wrote:
> > > > Well, there are other CI's beyond syzbot.
> > > > On the other hand, this makes me worry who is testing on linux-next every day.
> > >
> > > How do these use-after-free's and locking bugs get past the
> > > unit-testing systems (which syzbot is not) and remain unnoticed for so
> > > long?...
> > > syzbot uses the dumbest VMs (GCE), so everything it triggers during
> > > boot should be triggerable pretty much everywhere.
> > > It seems to be an action point for the testing systems. "Boot to ssh"
> > > is not the best criteria. Again if there is a LOCKDEP error, we are
> > > not catching any more LOCKDEP errors during subsequent testing. If
> > > there is a use-after-free, that's a serious error on its own and KASAN
> > > produces only 1 error by default as well. And as far as I understand,
> > > lots of kernel testing systems don't even enable KASAN, which is very
> > > wrong.
> > > I've talked to +Dan Rue re this few days ago. Hopefully LKFT will
> > > start catching these as part of unit testing. Which should help with
> > > syzbot testing as well.
> >
> > LKFT has recently added testing with KASAN enabled and improved the
> > kernel log parsing to catch more of this class of errors while
> > performing our regular functional testing.
> >
> > Incidentally, -next was also broken for us from March 25 through April 5
> > due to a perf build failure[0], which eventually made itself all the way
> > down into v5.6 release and I believe the first two 5.6.x stable
> > releases.
> >
> > For -next, LKFT's gap is primarily reporting. We do build and run over
> > 30k tests on every -next daily release, but we send out issues manually
> > when we see them because triaging is still a manual effort. We're
> > working to build better automated reporting. If anyone is interested in
> > watching LKFT's -next results more closely (warning, it's a bit noisy),
> > please let me know. Watching the results at https://lkft.linaro.org
> > provides some overall health indications, but again, it gets pretty
> > difficult to figure out signal from noise once you start drilling down
> > without sufficient context of the system.
> 
> What kind of failures and noise do you get? Is it flaky tests?
> I would assume build failures are ~0% flaky/noisy. And boot failures
> are maybe ~1% flaky/noisy due to some infra issues.

Right - infrastructure problems aside (which are the easy part), tests
are quite flaky/noisy.

I guess we're getting quite off topic now, but in LKFT's case we run
tests that are available from the likes of LTP, kselftest, and a variety
of other test suites. Every test was written by a developer with certain
assumptions in place - many of which we violate when we run them on a
small arm board, for example. And many may just be low quality to begin
with, but they often work well enough for the original author's
use-case.

In such cases, we mark them (manually at this point) as a known issue.
For example, here are our kselftest known issues:
https://github.com/Linaro/qa-reports-known-issues/blob/master/kselftests-production.yaml

These lists are quite a chore to keep up to date, and so they tend to
lag reality. What's needed (and what we're working toward) is more
sophisticated analytics on top of our results to determine actual
regressions.

I'll give just one example, randomly selected but typical. Here's a
timer test that sometimes passes and sometimes fails, which compares how
much time something takes with a hard coded value of what the author
expects. Running on small arm hosts or under qemu, the following check
sometimes fails:
https://github.com/torvalds/linux/blob/master/tools/testing/selftests/timers/rtcpie.c#L104-L111

There are _many_ such tests - hundreds or thousands, which rely on hard
coded expectations and are quite hard to "fix". But we run them all
because most of them haven't failed yet, and if they do we'll find out
why.

We ignore the tests which either always fail, or which sometimes fail,
in general. I'm sure there are some legitimate bugs in that set of
failures, but they're probably not "regressions" so just as syzkaller
lets old bugs close automatically, we ignore tests that have a history
of failing.

> 
> I can't find any actual test failure logs in the UI. I've got to this page:
> https://qa-reports.linaro.org/lkft/linux-mainline-oe/build/v5.7-rc1-24-g8632e9b5645b/testrun/1363280/suite/kselftest/tests/
> which seem to contain failed tests on mainline. But I still can't find
> the actual test failure logs.

From the link you gave, if you go up one level to
https://qa-reports.linaro.org/lkft/linux-mainline-oe/build/v5.7-rc1-24-g8632e9b5645b/testrun/1363280/,
you will see links to the "Log File" which takes you to
https://qa-reports.linaro.org/lkft/linux-mainline-oe/build/v5.7-rc1-24-g8632e9b5645b/testrun/1363280/log.

In some test suite cases (perhaps just LTP), we have logs per test. In
most, we just have one large log of the entire run. Even when we have a
log-per-test, it may miss some asynchronous dmesg output I expect,
causing an investigator to look at the whole log anyway.

Dan

> 
> 
> > Dan
> >
> > [0] https://lore.kernel.org/stable/CA+G9fYsZjmf34pQT1DeLN_DDwvxCWEkbzBfF0q2VERHb25dfZQ@mail.gmail.com/
> >
> > --
> > Linaro LKFT
> > https://lkft.linaro.org

-- 
Linaro LKFT
https://lkft.linaro.org


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
  2020-04-14 11:12           ` Dmitry Vyukov
  2020-04-14 11:59             ` Qian Cai
  2020-04-14 19:28             ` Dan Rue
@ 2020-04-16  0:34             ` Stephen Rothwell
  2020-05-11 15:29               ` Dmitry Vyukov
  2 siblings, 1 reply; 56+ messages in thread
From: Stephen Rothwell @ 2020-04-16  0:34 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Qian Cai, Linus Torvalds, Andrew Morton, Peter Xu, LKML,
	Linux-MM, Jens Axboe, Christoph Lameter, Johannes Weiner,
	syzkaller, Dan Rue

[-- Attachment #1: Type: text/plain, Size: 480 bytes --]

Hi Dmitry,

On Tue, 14 Apr 2020 13:12:50 +0200 Dmitry Vyukov <dvyukov@google.com> wrote:
>
> AI: we need to CC linux-next@ on linux-next build/boot failures. I
> will work on this.
> We have functionality to CC given emails on _all_ bugs on the given
> tree, but we don't have this for build/boot bugs only. I will try to
> add this soon.
> Stephen, do you want to be CCed as well? Or just linux-next@?

Please cc me as well, thanks.

-- 
Cheers,
Stephen Rothwell

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 1/2] mm/mempolicy: Allow lookup_node() to handle fatal signal
  2020-04-14 11:04       ` Michal Hocko
  2020-04-14 13:49         ` Peter Xu
@ 2020-04-20 12:47         ` Michal Hocko
  2020-04-20 17:31           ` Linus Torvalds
  1 sibling, 1 reply; 56+ messages in thread
From: Michal Hocko @ 2020-04-20 12:47 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Xu, Linux Kernel Mailing List, Linux-MM, Andrew Morton,
	syzbot+693dc11fcb53120b5559

Any opinion on this Linus? Should I just repost the patch?

On Tue 14-04-20 13:04:32, Michal Hocko wrote:
> On Thu 09-04-20 09:42:20, Linus Torvalds wrote:
> > On Thu, Apr 9, 2020 at 12:03 AM Michal Hocko <mhocko@kernel.org> wrote:
> > >
> > > This patch however doesn't go all the way to revert it because 0 return
> > > value is impossible.
> > 
> > I'm not convinced it's impossible.
> 
> __get_user_pages is documented as
>  * -- If nr_pages is 0, returns 0.
>  * -- If nr_pages is >0, but no pages were pinned, returns -errno.
>  * -- If nr_pages is >0, and some pages were pinned, returns the number of
>  *    pages pinned. Again, this may be less than nr_pages.
> 
> but let me double check the actual code... There seem to be only one
> exception the above rule AFAICS. faultin_page returning EBUSY will
> be overriden to either 0 for the first page or return the number of
> already pinned pages. So nr_pages > 0 && ret = 0 is indeed possible
> from __get_user_pages :/ That will be the case only for VM_FAULT_RETRY,
> thoug.
> 
> Now __get_user_pages_locked behaves differently. It keeps retrying the
> fault until it succeeds unless FOLL_NOWAIT is specified. Then it would
> return 0. Why we need to return 0 is not really clear to me but it
> seem to be a long term behavior. I believe we need to document it.
> 
> > And if it is, then the current code is harmless.
> 
> Yes from the above it seems that the check is indeed harmless becasue
> this path doesn't use FOLL_NOWAIT and so it will never see 0 return.
> I find a reference to EINTR confusing so I would still love to change
> that.
> 
> > Now, I do agree that we probably should go through and clarify the
> > whole range of different get_user_pages() cases of returning zero (or
> > not doing so), but right now it's so confusing that I'd prefer to keep
> > that (possibly unnecessary) belt-and-suspenders check for zero in
> > there.
> >
> > If/when somebody actually does a real audit and the result is "these
> > functions cannot return zero" and it's documented, then we can remove
> > those checks.
> 
> Would you mind this patch instead?
> 
> commit bc6c0fa7c7fb5eb54963dca65ae4a62ba04c9efa
> Author: Michal Hocko <mhocko@suse.com>
> Date:   Thu Apr 9 08:26:57 2020 +0200
> 
>     mm, mempolicy: fix up gup usage in lookup_node
>     
>     ba841078cd05 ("mm/mempolicy: Allow lookup_node() to handle fatal signal") has
>     added a special casing for 0 return value because that was a possible
>     gup return value when interrupted by fatal signal. This has been fixed
>     by ae46d2aa6a7f ("mm/gup: Let __get_user_pages_locked() return -EINTR
>     for fatal signal") in the mean time so ba841078cd05 can be reverted.
>     
>     This patch however doesn't go all the way to revert it because the check
>     for 0 is wrong and confusing here. Firstly it is inherently unsafe to
>     access the page when get_user_pages_locked returns 0 (aka no page
>     returned).
>     Fortunatelly this will not happen because get_user_pages_locked will not
>     return 0 when nr_pages > 0 unless FOLL_NOWAIT is specified which is not
>     the case here. Document this potential error code in gup code while we
>     are at it.
>     
>     Signed-off-by: Michal Hocko <mhocko@suse.com>
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index 50681f0286de..a8575b880baf 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -980,6 +980,7 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
>   * -- If nr_pages is >0, but no pages were pinned, returns -errno.
>   * -- If nr_pages is >0, and some pages were pinned, returns the number of
>   *    pages pinned. Again, this may be less than nr_pages.
> + * -- 0 return value is possible when the fault would need to be retried.
>   *
>   * The caller is responsible for releasing returned @pages, via put_page().
>   *
> @@ -1247,6 +1248,10 @@ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
>  }
>  EXPORT_SYMBOL_GPL(fixup_user_fault);
>  
> +/*
> + * Please note that this function, unlike __get_user_pages will not
> + * return 0 for nr_pages > 0 without FOLL_NOWAIT
> + */
>  static __always_inline long __get_user_pages_locked(struct task_struct *tsk,
>  						struct mm_struct *mm,
>  						unsigned long start,
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 48ba9729062e..1965e2681877 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -927,10 +927,7 @@ static int lookup_node(struct mm_struct *mm, unsigned long addr)
>  
>  	int locked = 1;
>  	err = get_user_pages_locked(addr & PAGE_MASK, 1, 0, &p, &locked);
> -	if (err == 0) {
> -		/* E.g. GUP interrupted by fatal signal */
> -		err = -EFAULT;
> -	} else if (err > 0) {
> +	if (err > 0) {
>  		err = page_to_nid(p);
>  		put_page(p);
>  	}
> -- 
> Michal Hocko
> SUSE Labs

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 1/2] mm/mempolicy: Allow lookup_node() to handle fatal signal
  2020-04-20 12:47         ` Michal Hocko
@ 2020-04-20 17:31           ` Linus Torvalds
  2020-04-21  7:09             ` Michal Hocko
  0 siblings, 1 reply; 56+ messages in thread
From: Linus Torvalds @ 2020-04-20 17:31 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Peter Xu, Linux Kernel Mailing List, Linux-MM, Andrew Morton,
	syzbot+693dc11fcb53120b5559

On Mon, Apr 20, 2020 at 5:48 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> Any opinion on this Linus? Should I just repost the patch?

I'm ok with the patch, but it's not exactly urgent and I was assuming
it would go the usual path through the -mm tree.

                 Linus


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 1/2] mm/mempolicy: Allow lookup_node() to handle fatal signal
  2020-04-20 17:31           ` Linus Torvalds
@ 2020-04-21  7:09             ` Michal Hocko
  0 siblings, 0 replies; 56+ messages in thread
From: Michal Hocko @ 2020-04-21  7:09 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Xu, Linux Kernel Mailing List, Linux-MM, Andrew Morton,
	syzbot+693dc11fcb53120b5559

On Mon 20-04-20 10:31:17, Linus Torvalds wrote:
> On Mon, Apr 20, 2020 at 5:48 AM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > Any opinion on this Linus? Should I just repost the patch?
> 
> I'm ok with the patch, but it's not exactly urgent and I was assuming
> it would go the usual path through the -mm tree.

OK, I will repost then. Thanks!
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
  2020-04-16  0:34             ` Stephen Rothwell
@ 2020-05-11 15:29               ` Dmitry Vyukov
  0 siblings, 0 replies; 56+ messages in thread
From: Dmitry Vyukov @ 2020-05-11 15:29 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: Qian Cai, Linus Torvalds, Andrew Morton, Peter Xu, LKML,
	Linux-MM, Jens Axboe, Christoph Lameter, Johannes Weiner,
	syzkaller, Dan Rue

On Thu, Apr 16, 2020 at 2:34 AM Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>
> Hi Dmitry,
>
> On Tue, 14 Apr 2020 13:12:50 +0200 Dmitry Vyukov <dvyukov@google.com> wrote:
> >
> > AI: we need to CC linux-next@ on linux-next build/boot failures. I
> > will work on this.
> > We have functionality to CC given emails on _all_ bugs on the given
> > tree, but we don't have this for build/boot bugs only. I will try to
> > add this soon.
> > Stephen, do you want to be CCed as well? Or just linux-next@?
>
> Please cc me as well, thanks.

To close the loop: I implemented and deployed 2 improvements:

1. Notion of per-repo "build maintainers":
https://github.com/google/syzkaller/commit/88cb3e92ba25303ab67aaceb083fe7304fccd32f
Now linux-next@ and Stephen should be CCed on all linux-next breakages
automatically.

2. Finding maintainers for build errors:
https://github.com/google/syzkaller/commit/65a44e22ba217ef7272b9d3735e9d12cfaa204f6
syzbot will attempt to extract the broken file name from make output
and then run get_matainers.pl on it to find relevant emails.


^ permalink raw reply	[flat|nested] 56+ messages in thread

end of thread, other threads:[~2020-05-11 15:29 UTC | newest]

Thread overview: 56+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-08  1:40 [PATCH 0/2] mm: Two small fixes for recent syzbot reports Peter Xu
2020-04-08  1:40 ` [PATCH 1/2] mm/mempolicy: Allow lookup_node() to handle fatal signal Peter Xu
2020-04-08 10:21   ` Michal Hocko
2020-04-08 14:20     ` Peter Xu
2020-04-08 14:30       ` Michal Hocko
2020-04-08 15:24         ` Peter Xu
2020-04-08 15:26           ` Michal Hocko
2020-04-09  7:02   ` Michal Hocko
2020-04-09 12:52     ` Peter Xu
2020-04-09 13:00       ` Peter Xu
2020-04-09 13:53       ` Michal Hocko
2020-04-09 16:42     ` Linus Torvalds
2020-04-14 11:04       ` Michal Hocko
2020-04-14 13:49         ` Peter Xu
2020-04-14 14:18           ` Michal Hocko
2020-04-20 12:47         ` Michal Hocko
2020-04-20 17:31           ` Linus Torvalds
2020-04-21  7:09             ` Michal Hocko
2020-04-08  1:40 ` [PATCH 2/2] mm/gup: Mark lock taken only after a successful retake Peter Xu
2020-04-09  0:47 ` [PATCH 0/2] mm: Two small fixes for recent syzbot reports Andrew Morton
2020-04-09 11:49   ` Matthew Wilcox
2020-04-09 13:00     ` Dmitry Vyukov
2020-04-09 18:16       ` Andrew Morton
2020-04-09 18:53         ` Linus Torvalds
2020-04-09 19:12           ` Andrew Morton
2020-04-09 19:46             ` Linus Torvalds
2020-04-09 19:56               ` Matthew Wilcox
2020-04-09 19:58                 ` Linus Torvalds
2020-04-09 20:27                   ` Eric Biggers
2020-04-09 20:34                     ` Linus Torvalds
2020-04-09 23:34                       ` Stephen Rothwell
2020-04-10  1:11                       ` Theodore Y. Ts'o
2020-04-09 12:55   ` Dmitry Vyukov
2020-04-09 16:32     ` Linus Torvalds
2020-04-09 16:58       ` Qian Cai
2020-04-09 17:05         ` Linus Torvalds
2020-04-09 17:58           ` Qian Cai
2020-04-09 18:06             ` Linus Torvalds
2020-04-09 21:14               ` Qian Cai
2020-04-10 13:12                 ` Tetsuo Handa
2020-04-10 14:26                   ` Qian Cai
2020-04-10 17:26                     ` Andrew Morton
2020-04-10 19:46                       ` Qian Cai
2020-04-09 23:29       ` Stephen Rothwell
2020-04-13 22:06         ` Qian Cai
2020-04-13 23:05           ` Jens Axboe
2020-04-14 11:12           ` Dmitry Vyukov
2020-04-14 11:59             ` Qian Cai
2020-04-14 12:05               ` Dmitry Vyukov
2020-04-14 19:28             ` Dan Rue
2020-04-15 11:09               ` Dmitry Vyukov
2020-04-15 16:23                 ` Dan Rue
2020-04-16  0:34             ` Stephen Rothwell
2020-05-11 15:29               ` Dmitry Vyukov
2020-04-14  4:07         ` Hillf Danton
2020-04-14  4:31           ` Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).