From: Al Viro <viro@zeniv.linux.org.uk>
To: Finn Thain <fthain@linux-m68k.org>
Cc: linux-arch@vger.kernel.org, linux-alpha@vger.kernel.org,
	linux-ia64@vger.kernel.org, linux-hexagon@vger.kernel.org,
	linux-m68k@lists.linux-m68k.org, Michal Simek <monstr@monstr.eu>,
	Dinh Nguyen <dinguyen@kernel.org>,
	openrisc@lists.librecores.org, linux-parisc@vger.kernel.org,
	linux-riscv@lists.infradead.org, sparclinux@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH 04/10] m68k: fix livelock in uaccess
Date: Sun, 5 Feb 2023 20:39:44 +0000
Message-ID: <Y+AUEJpWYdUzW0OD@ZenIV>
In-Reply-To: <92a4aa45-0a7c-a389-798a-2f3e3cfa516f@linux-m68k.org>

On Sun, Feb 05, 2023 at 05:18:08PM +1100, Finn Thain wrote:

> That could be a bug I was chasing back in 2021 but never found. The mmap
> stressors in stress-ng were triggering a crash on Mac Quadras, though
> only rarely. Sometimes it would run all day without a failure.
> 
> Last year when I started using GCC 12 to build the kernel, I saw the same 
> workload fail again but the failure mode had become a silent hang/livelock 
> instead of the oopses I got with GCC 6.
> 
> When I press the NMI button after the livelock I always see 
> do_page_fault() in the backtrace. So I've been testing your patch. I've
> been running the same stress-ng reproducer for about 12 hours now with no
> failures, which looks promising.
> 
> In case that stress-ng testing is of use:
> Tested-by: Finn Thain <fthain@linux-m68k.org>
> 
> BTW, how did you identify that bug in do_page_fault()? If it's the same bug
> I was chasing, it could be an old one. The stress-ng logs I collected last
> year include a crash from a v4.14 build.

Went to reread the current state of mm/gup.c, decided to reread handle_mm_fault()
and its callers, and noticed fault_signal_pending(), which hadn't been there
when I last crawled through that area.  Realized what it had replaced and went
to check whether everything had been converted (arch/um got missed, BTW).
Noticed the difference between the architectures: the first hit was on alpha,
without the "sod off to no_context if it's a user fault" logic, and the last
was xtensa, with it.  Checked the log for xtensa and found the commit from 2021
adding that part; looked at arm and arm64 and found commits from 2017 doing the
same thing, then, on x86, Linus' commit from 2014 adding the x86 counterpart...
Figuring out what all of those had been for wasn't particularly hard, and it
was easy to check which architectures still needed the same thing...
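The fault-handling logic described above can be illustrated with a toy
user-space model in C.  This is a sketch under stated assumptions, not kernel
code: the enum, on_signal_pending() and simulate_kernel_uaccess() are
hypothetical names.  It assumes the added check sits in the
fault_signal_pending() branch of the arch fault handler and has roughly the
shape "if (!user_mode(regs)) goto no_context; return;", and that while the
signal stays pending every retried fault is abandoned without the page being
mapped.  Under those assumptions a handler that simply returns sends a
kernel-mode uaccess straight back to the faulting instruction, and the
pending signal is never acted on because the task never gets back to user
mode.

```c
#include <assert.h>
#include <stdbool.h>

/* What the (hypothetical, simplified) fault handler tail decides to do. */
enum fault_action {
	FAULT_RETURN,     /* just return; the signal is acted on at return to user mode */
	FAULT_NO_CONTEXT, /* kernel-mode fault: take the exception-table fixup */
};

/*
 * Model of the tail of an arch do_page_fault() once the fault has been
 * abandoned because a signal is pending (what fault_signal_pending()
 * would report).  'patched' selects the variant with the extra
 * "kernel mode -> no_context" check.
 */
static enum fault_action on_signal_pending(bool user_mode, bool patched)
{
	if (patched && !user_mode)
		return FAULT_NO_CONTEXT;
	return FAULT_RETURN;
}

/*
 * Simulate a kernel-mode uaccess hitting a fault while a fatal signal is
 * pending.  In this model the signal stays pending and every fault is
 * abandoned, so the page never gets mapped.  Returns the number of faults
 * taken, or -1 after max_faults (standing in for the livelock).
 */
static int simulate_kernel_uaccess(bool patched, int max_faults)
{
	assert(max_faults > 0);
	for (int faults = 1; faults <= max_faults; faults++) {
		if (on_signal_pending(false, patched) == FAULT_NO_CONTEXT)
			return faults; /* fixup runs; the uaccess reports failure */
		/* FAULT_RETURN: back to the same kernel instruction, which faults again */
	}
	return -1;
}
```

In the unpatched model the loop ends only because of the artificial
max_faults bound; with the extra check the very first fault goes to the
exception-table fixup, so the uaccess fails instead of spinning.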

BTW, since these patches would be much easier to backport than any unification
work, I think the right thing to do would be to have further unification done on
top of them.

Thread overview:
2023-01-31 20:02 [RFC][PATCHSET] VM_FAULT_RETRY fixes Al Viro
2023-01-31 20:03 ` [PATCH 01/10] alpha: fix livelock in uaccess Al Viro
2023-03-07  0:48   ` patchwork-bot+linux-riscv
2023-01-31 20:03 ` [PATCH 02/10] hexagon: " Al Viro
2023-02-10  2:59   ` Brian Cain
2023-01-31 20:04 ` [PATCH 03/10] ia64: " Al Viro
2023-01-31 20:04 ` [PATCH 04/10] m68k: " Al Viro
2023-02-05  6:18   ` Finn Thain
2023-02-05 18:51     ` Linus Torvalds
2023-02-07  3:07       ` Finn Thain
2023-02-05 20:39     ` Al Viro [this message]
2023-02-05 20:41       ` Linus Torvalds
2023-02-06 12:08   ` Geert Uytterhoeven
2023-01-31 20:05 ` [PATCH 05/10] microblaze: " Al Viro
2023-01-31 20:05 ` [PATCH 06/10] nios2: " Al Viro
2023-01-31 20:06 ` [PATCH 07/10] openrisc: " Al Viro
2023-01-31 20:06 ` [PATCH 08/10] parisc: " Al Viro
2023-02-06 16:58   ` Helge Deller
2023-02-28 17:34     ` Al Viro
2023-02-28 18:26       ` Helge Deller
2023-02-28 19:14         ` Al Viro
2023-02-28 19:32           ` Helge Deller
2023-02-28 20:00             ` Helge Deller
2023-02-28 20:22               ` Helge Deller
2023-02-28 22:57                 ` Al Viro
2023-03-01  4:00                   ` Helge Deller
2023-03-02 17:53                     ` Al Viro
2023-02-28 15:22   ` Guenter Roeck
2023-02-28 19:18     ` Michael Schmitz
2023-01-31 20:06 ` [PATCH 09/10] riscv: " Al Viro
2023-02-06 20:06   ` Björn Töpel
2023-02-07 16:11   ` Geert Uytterhoeven
2023-01-31 20:07 ` [PATCH 10/10] sparc: " Al Viro
2023-01-31 20:24 ` [RFC][PATCHSET] VM_FAULT_RETRY fixes Linus Torvalds
2023-01-31 21:10   ` Al Viro
2023-01-31 21:19     ` Linus Torvalds
2023-01-31 21:49       ` Al Viro
2023-02-01  0:00         ` Linus Torvalds
2023-02-01 19:48           ` Peter Xu
2023-02-01 22:18             ` Al Viro
2023-02-02  0:57               ` Al Viro
2023-02-02 22:56               ` Peter Xu
2023-02-04  0:26                 ` Al Viro
2023-02-05  5:10                   ` Al Viro
2023-02-04  0:47         ` [loongarch oddities] " Al Viro
2023-02-01  8:21       ` Helge Deller
2023-02-01 19:51         ` Linus Torvalds
2023-02-02  6:58       ` Al Viro
2023-02-02  8:54         ` Michael Cree
2023-02-02  9:56           ` John Paul Adrian Glaubitz
2023-02-02 15:20           ` Al Viro
2023-02-02 20:20             ` Al Viro
2023-02-02 20:34         ` Linus Torvalds
2023-02-01 10:50 ` Mark Rutland
2023-02-06 12:08   ` Geert Uytterhoeven