All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC][PATCHSET] VM_FAULT_RETRY fixes
@ 2023-01-31 20:02 ` Al Viro
  0 siblings, 0 replies; 120+ messages in thread
From: Al Viro @ 2023-01-31 20:02 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-alpha, linux-ia64, linux-hexagon, linux-m68k, Michal Simek,
	Dinh Nguyen, openrisc, linux-parisc, linux-riscv, sparclinux,
	Linus Torvalds

	Page fault handlers generally react to VM_FAULT_RETRY from
handle_mm_fault() by repeating the whole thing (starting with locking
mmap) with FAULT_FLAG_TRIED added to flags.

	However, there are two cases when that's not the right thing
to do:

1) fault has happened in userland and we have a pending signal.  In that
case we'd better return from fault handler immediately.

2) fault has happened in kernel (e.g. in something like copy_from_user())
and we have a pending *fatal* signal.  Solution is to handle that as if
handle_mm_fault() had failed; we have come from kernel mode, so we'd
better have an exception table entry for the fauling insn.  Find it
and deal with it; from the copy_from_user() (or whatever it was that
triggered our fault) caller's POV it's indistinguishable from running
into an unmapped area, so it will fail.  The process is not going to
survive anyway.

Quietly returning from #PF handler in the second case is asking for
livelock - one common case when handle_mm_fault() returns VM_FAULT_RETRY
is when it needs to wait for page lock and gets hit by a fatal signal.
Running into that in any uaccess primitive will end up repeating the
faulting insn again and again, as long as we hit that case in
handle_mm_fault().  Eventually it might get out (e.g. trylock
manages to get page lock without hitting the "wait for it" codepath),
but it's obviously not a good situation.

On x86 it had been noticed and fixed back in 2014, in 26178ec11ef3 "x86:
mm: consolidate VM_FAULT_RETRY handling".  Some of the other architectures
had it dealt with later - e.g. arm in 2017, the fix is 746a272e44141
"ARM: 8692/1: mm: abort uaccess retries upon fatal signal"; xtensa -
in 2021, the fix is 7b9acbb6aad4f "xtensa: fix uaccess-related livelock
in do_page_fault", etc.

However, it never had been done on a bunch of architectures - the
current mainline still has that bug on alpha, hexagon, itanic, m68k,
microblaze, nios2, openrisc, parisc, riscv and sparc (both sparc32 and
sparc64).  Fixes are trivial, but I've no way to test them for most
of those architectures.

^ permalink raw reply	[flat|nested] 120+ messages in thread

* [RFC][PATCHSET] VM_FAULT_RETRY fixes
@ 2023-01-31 20:02 ` Al Viro
  0 siblings, 0 replies; 120+ messages in thread
From: Al Viro @ 2023-01-31 20:02 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-alpha, linux-ia64, linux-hexagon, linux-m68k, Michal Simek,
	Dinh Nguyen, openrisc, linux-parisc, linux-riscv, sparclinux,
	Linus Torvalds

	Page fault handlers generally react to VM_FAULT_RETRY from
handle_mm_fault() by repeating the whole thing (starting with locking
mmap) with FAULT_FLAG_TRIED added to flags.

	However, there are two cases when that's not the right thing
to do:

1) fault has happened in userland and we have a pending signal.  In that
case we'd better return from fault handler immediately.

2) fault has happened in kernel (e.g. in something like copy_from_user())
and we have a pending *fatal* signal.  Solution is to handle that as if
handle_mm_fault() had failed; we have come from kernel mode, so we'd
better have an exception table entry for the fauling insn.  Find it
and deal with it; from the copy_from_user() (or whatever it was that
triggered our fault) caller's POV it's indistinguishable from running
into an unmapped area, so it will fail.  The process is not going to
survive anyway.

Quietly returning from #PF handler in the second case is asking for
livelock - one common case when handle_mm_fault() returns VM_FAULT_RETRY
is when it needs to wait for page lock and gets hit by a fatal signal.
Running into that in any uaccess primitive will end up repeating the
faulting insn again and again, as long as we hit that case in
handle_mm_fault().  Eventually it might get out (e.g. trylock
manages to get page lock without hitting the "wait for it" codepath),
but it's obviously not a good situation.

On x86 it had been noticed and fixed back in 2014, in 26178ec11ef3 "x86:
mm: consolidate VM_FAULT_RETRY handling".  Some of the other architectures
had it dealt with later - e.g. arm in 2017, the fix is 746a272e44141
"ARM: 8692/1: mm: abort uaccess retries upon fatal signal"; xtensa -
in 2021, the fix is 7b9acbb6aad4f "xtensa: fix uaccess-related livelock
in do_page_fault", etc.

However, it never had been done on a bunch of architectures - the
current mainline still has that bug on alpha, hexagon, itanic, m68k,
microblaze, nios2, openrisc, parisc, riscv and sparc (both sparc32 and
sparc64).  Fixes are trivial, but I've no way to test them for most
of those architectures.

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 120+ messages in thread

* [PATCH 01/10] alpha: fix livelock in uaccess
  2023-01-31 20:02 ` Al Viro
@ 2023-01-31 20:03   ` Al Viro
  -1 siblings, 0 replies; 120+ messages in thread
From: Al Viro @ 2023-01-31 20:03 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-alpha, linux-ia64, linux-hexagon, linux-m68k, Michal Simek,
	Dinh Nguyen, openrisc, linux-parisc, linux-riscv, sparclinux,
	Linus Torvalds

alpha equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
to page tables.  In such case we must *not* return to the faulting insn -
that would repeat the entire thing without making any progress; what we need
instead is to treat that as failed (user) memory access.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 arch/alpha/mm/fault.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/alpha/mm/fault.c b/arch/alpha/mm/fault.c
index ef427a6bdd1a..7b01ae4f3bc6 100644
--- a/arch/alpha/mm/fault.c
+++ b/arch/alpha/mm/fault.c
@@ -152,8 +152,11 @@ do_page_fault(unsigned long address, unsigned long mmcsr,
 	   the fault.  */
 	fault = handle_mm_fault(vma, address, flags, regs);
 
-	if (fault_signal_pending(fault, regs))
+	if (fault_signal_pending(fault, regs)) {
+		if (!user_mode(regs))
+			goto no_context;
 		return;
+	}
 
 	/* The fault is fully completed (including releasing mmap lock) */
 	if (fault & VM_FAULT_COMPLETED)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 120+ messages in thread

* [PATCH 01/10] alpha: fix livelock in uaccess
@ 2023-01-31 20:03   ` Al Viro
  0 siblings, 0 replies; 120+ messages in thread
From: Al Viro @ 2023-01-31 20:03 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-alpha, linux-ia64, linux-hexagon, linux-m68k, Michal Simek,
	Dinh Nguyen, openrisc, linux-parisc, linux-riscv, sparclinux,
	Linus Torvalds

alpha equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
to page tables.  In such case we must *not* return to the faulting insn -
that would repeat the entire thing without making any progress; what we need
instead is to treat that as failed (user) memory access.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 arch/alpha/mm/fault.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/alpha/mm/fault.c b/arch/alpha/mm/fault.c
index ef427a6bdd1a..7b01ae4f3bc6 100644
--- a/arch/alpha/mm/fault.c
+++ b/arch/alpha/mm/fault.c
@@ -152,8 +152,11 @@ do_page_fault(unsigned long address, unsigned long mmcsr,
 	   the fault.  */
 	fault = handle_mm_fault(vma, address, flags, regs);
 
-	if (fault_signal_pending(fault, regs))
+	if (fault_signal_pending(fault, regs)) {
+		if (!user_mode(regs))
+			goto no_context;
 		return;
+	}
 
 	/* The fault is fully completed (including releasing mmap lock) */
 	if (fault & VM_FAULT_COMPLETED)
-- 
2.30.2


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 120+ messages in thread

* [PATCH 02/10] hexagon: fix livelock in uaccess
  2023-01-31 20:02 ` Al Viro
@ 2023-01-31 20:03   ` Al Viro
  -1 siblings, 0 replies; 120+ messages in thread
From: Al Viro @ 2023-01-31 20:03 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-alpha, linux-ia64, linux-hexagon, linux-m68k, Michal Simek,
	Dinh Nguyen, openrisc, linux-parisc, linux-riscv, sparclinux,
	Linus Torvalds

hexagon equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
to page tables.  In such case we must *not* return to the faulting insn -
that would repeat the entire thing without making any progress; what we need
instead is to treat that as failed (user) memory access.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 arch/hexagon/mm/vm_fault.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/hexagon/mm/vm_fault.c b/arch/hexagon/mm/vm_fault.c
index f73c7cbfe326..4b578d02fd01 100644
--- a/arch/hexagon/mm/vm_fault.c
+++ b/arch/hexagon/mm/vm_fault.c
@@ -93,8 +93,11 @@ void do_page_fault(unsigned long address, long cause, struct pt_regs *regs)
 
 	fault = handle_mm_fault(vma, address, flags, regs);
 
-	if (fault_signal_pending(fault, regs))
+	if (fault_signal_pending(fault, regs)) {
+		if (!user_mode(regs))
+			goto no_context;
 		return;
+	}
 
 	/* The fault is fully completed (including releasing mmap lock) */
 	if (fault & VM_FAULT_COMPLETED)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 120+ messages in thread

* [PATCH 02/10] hexagon: fix livelock in uaccess
@ 2023-01-31 20:03   ` Al Viro
  0 siblings, 0 replies; 120+ messages in thread
From: Al Viro @ 2023-01-31 20:03 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-alpha, linux-ia64, linux-hexagon, linux-m68k, Michal Simek,
	Dinh Nguyen, openrisc, linux-parisc, linux-riscv, sparclinux,
	Linus Torvalds

hexagon equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
to page tables.  In such case we must *not* return to the faulting insn -
that would repeat the entire thing without making any progress; what we need
instead is to treat that as failed (user) memory access.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 arch/hexagon/mm/vm_fault.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/hexagon/mm/vm_fault.c b/arch/hexagon/mm/vm_fault.c
index f73c7cbfe326..4b578d02fd01 100644
--- a/arch/hexagon/mm/vm_fault.c
+++ b/arch/hexagon/mm/vm_fault.c
@@ -93,8 +93,11 @@ void do_page_fault(unsigned long address, long cause, struct pt_regs *regs)
 
 	fault = handle_mm_fault(vma, address, flags, regs);
 
-	if (fault_signal_pending(fault, regs))
+	if (fault_signal_pending(fault, regs)) {
+		if (!user_mode(regs))
+			goto no_context;
 		return;
+	}
 
 	/* The fault is fully completed (including releasing mmap lock) */
 	if (fault & VM_FAULT_COMPLETED)
-- 
2.30.2


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 120+ messages in thread

* [PATCH 03/10] ia64: fix livelock in uaccess
  2023-01-31 20:02 ` Al Viro
@ 2023-01-31 20:04   ` Al Viro
  -1 siblings, 0 replies; 120+ messages in thread
From: Al Viro @ 2023-01-31 20:04 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-alpha, linux-ia64, linux-hexagon, linux-m68k, Michal Simek,
	Dinh Nguyen, openrisc, linux-parisc, linux-riscv, sparclinux,
	Linus Torvalds

ia64 equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
to page tables.  In such case we must *not* return to the faulting insn -
that would repeat the entire thing without making any progress; what we need
instead is to treat that as failed (user) memory access.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 arch/ia64/mm/fault.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/ia64/mm/fault.c b/arch/ia64/mm/fault.c
index ef78c2d66cdd..85c4d9ac8686 100644
--- a/arch/ia64/mm/fault.c
+++ b/arch/ia64/mm/fault.c
@@ -136,8 +136,11 @@ ia64_do_page_fault (unsigned long address, unsigned long isr, struct pt_regs *re
 	 */
 	fault = handle_mm_fault(vma, address, flags, regs);
 
-	if (fault_signal_pending(fault, regs))
+	if (fault_signal_pending(fault, regs)) {
+		if (!user_mode(regs))
+			goto no_context;
 		return;
+	}
 
 	/* The fault is fully completed (including releasing mmap lock) */
 	if (fault & VM_FAULT_COMPLETED)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 120+ messages in thread

* [PATCH 03/10] ia64: fix livelock in uaccess
@ 2023-01-31 20:04   ` Al Viro
  0 siblings, 0 replies; 120+ messages in thread
From: Al Viro @ 2023-01-31 20:04 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-alpha, linux-ia64, linux-hexagon, linux-m68k, Michal Simek,
	Dinh Nguyen, openrisc, linux-parisc, linux-riscv, sparclinux,
	Linus Torvalds

ia64 equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
to page tables.  In such case we must *not* return to the faulting insn -
that would repeat the entire thing without making any progress; what we need
instead is to treat that as failed (user) memory access.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 arch/ia64/mm/fault.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/ia64/mm/fault.c b/arch/ia64/mm/fault.c
index ef78c2d66cdd..85c4d9ac8686 100644
--- a/arch/ia64/mm/fault.c
+++ b/arch/ia64/mm/fault.c
@@ -136,8 +136,11 @@ ia64_do_page_fault (unsigned long address, unsigned long isr, struct pt_regs *re
 	 */
 	fault = handle_mm_fault(vma, address, flags, regs);
 
-	if (fault_signal_pending(fault, regs))
+	if (fault_signal_pending(fault, regs)) {
+		if (!user_mode(regs))
+			goto no_context;
 		return;
+	}
 
 	/* The fault is fully completed (including releasing mmap lock) */
 	if (fault & VM_FAULT_COMPLETED)
-- 
2.30.2


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 120+ messages in thread

* [PATCH 04/10] m68k: fix livelock in uaccess
  2023-01-31 20:02 ` Al Viro
@ 2023-01-31 20:04   ` Al Viro
  -1 siblings, 0 replies; 120+ messages in thread
From: Al Viro @ 2023-01-31 20:04 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-alpha, linux-ia64, linux-hexagon, linux-m68k, Michal Simek,
	Dinh Nguyen, openrisc, linux-parisc, linux-riscv, sparclinux,
	Linus Torvalds

m68k equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
to page tables.  In such case we must *not* return to the faulting insn -
that would repeat the entire thing without making any progress; what we need
instead is to treat that as failed (user) memory access.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 arch/m68k/mm/fault.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/m68k/mm/fault.c b/arch/m68k/mm/fault.c
index 4d2837eb3e2a..228128e45c67 100644
--- a/arch/m68k/mm/fault.c
+++ b/arch/m68k/mm/fault.c
@@ -138,8 +138,11 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
 	fault = handle_mm_fault(vma, address, flags, regs);
 	pr_debug("handle_mm_fault returns %x\n", fault);
 
-	if (fault_signal_pending(fault, regs))
+	if (fault_signal_pending(fault, regs)) {
+		if (!user_mode(regs))
+			goto no_context;
 		return 0;
+	}
 
 	/* The fault is fully completed (including releasing mmap lock) */
 	if (fault & VM_FAULT_COMPLETED)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 120+ messages in thread

* [PATCH 04/10] m68k: fix livelock in uaccess
@ 2023-01-31 20:04   ` Al Viro
  0 siblings, 0 replies; 120+ messages in thread
From: Al Viro @ 2023-01-31 20:04 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-alpha, linux-ia64, linux-hexagon, linux-m68k, Michal Simek,
	Dinh Nguyen, openrisc, linux-parisc, linux-riscv, sparclinux,
	Linus Torvalds

m68k equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
to page tables.  In such case we must *not* return to the faulting insn -
that would repeat the entire thing without making any progress; what we need
instead is to treat that as failed (user) memory access.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 arch/m68k/mm/fault.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/m68k/mm/fault.c b/arch/m68k/mm/fault.c
index 4d2837eb3e2a..228128e45c67 100644
--- a/arch/m68k/mm/fault.c
+++ b/arch/m68k/mm/fault.c
@@ -138,8 +138,11 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
 	fault = handle_mm_fault(vma, address, flags, regs);
 	pr_debug("handle_mm_fault returns %x\n", fault);
 
-	if (fault_signal_pending(fault, regs))
+	if (fault_signal_pending(fault, regs)) {
+		if (!user_mode(regs))
+			goto no_context;
 		return 0;
+	}
 
 	/* The fault is fully completed (including releasing mmap lock) */
 	if (fault & VM_FAULT_COMPLETED)
-- 
2.30.2


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 120+ messages in thread

* [PATCH 05/10] microblaze: fix livelock in uaccess
  2023-01-31 20:02 ` Al Viro
@ 2023-01-31 20:05   ` Al Viro
  -1 siblings, 0 replies; 120+ messages in thread
From: Al Viro @ 2023-01-31 20:05 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-alpha, linux-ia64, linux-hexagon, linux-m68k, Michal Simek,
	Dinh Nguyen, openrisc, linux-parisc, linux-riscv, sparclinux,
	Linus Torvalds

microblaze equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
to page tables.  In such case we must *not* return to the faulting insn -
that would repeat the entire thing without making any progress; what we need
instead is to treat that as failed (user) memory access.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 arch/microblaze/mm/fault.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/microblaze/mm/fault.c b/arch/microblaze/mm/fault.c
index 5c40c3ebe52f..32d9717039ba 100644
--- a/arch/microblaze/mm/fault.c
+++ b/arch/microblaze/mm/fault.c
@@ -219,8 +219,11 @@ void do_page_fault(struct pt_regs *regs, unsigned long address,
 	 */
 	fault = handle_mm_fault(vma, address, flags, regs);
 
-	if (fault_signal_pending(fault, regs))
+	if (fault_signal_pending(fault, regs)) {
+		if (!user_mode(regs))
+			goto no_context;
 		return;
+	}
 
 	/* The fault is fully completed (including releasing mmap lock) */
 	if (fault & VM_FAULT_COMPLETED)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 120+ messages in thread

* [PATCH 05/10] microblaze: fix livelock in uaccess
@ 2023-01-31 20:05   ` Al Viro
  0 siblings, 0 replies; 120+ messages in thread
From: Al Viro @ 2023-01-31 20:05 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-alpha, linux-ia64, linux-hexagon, linux-m68k, Michal Simek,
	Dinh Nguyen, openrisc, linux-parisc, linux-riscv, sparclinux,
	Linus Torvalds

microblaze equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
to page tables.  In such case we must *not* return to the faulting insn -
that would repeat the entire thing without making any progress; what we need
instead is to treat that as failed (user) memory access.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 arch/microblaze/mm/fault.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/microblaze/mm/fault.c b/arch/microblaze/mm/fault.c
index 5c40c3ebe52f..32d9717039ba 100644
--- a/arch/microblaze/mm/fault.c
+++ b/arch/microblaze/mm/fault.c
@@ -219,8 +219,11 @@ void do_page_fault(struct pt_regs *regs, unsigned long address,
 	 */
 	fault = handle_mm_fault(vma, address, flags, regs);
 
-	if (fault_signal_pending(fault, regs))
+	if (fault_signal_pending(fault, regs)) {
+		if (!user_mode(regs))
+			goto no_context;
 		return;
+	}
 
 	/* The fault is fully completed (including releasing mmap lock) */
 	if (fault & VM_FAULT_COMPLETED)
-- 
2.30.2


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 120+ messages in thread

* [PATCH 06/10] nios2: fix livelock in uaccess
  2023-01-31 20:02 ` Al Viro
@ 2023-01-31 20:05   ` Al Viro
  -1 siblings, 0 replies; 120+ messages in thread
From: Al Viro @ 2023-01-31 20:05 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-alpha, linux-ia64, linux-hexagon, linux-m68k, Michal Simek,
	Dinh Nguyen, openrisc, linux-parisc, linux-riscv, sparclinux,
	Linus Torvalds

nios2 equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
to page tables.  In such case we must *not* return to the faulting insn -
that would repeat the entire thing without making any progress; what we need
instead is to treat that as failed (user) memory access.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 arch/nios2/mm/fault.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/nios2/mm/fault.c b/arch/nios2/mm/fault.c
index edaca0a6c1c1..ca64eccea551 100644
--- a/arch/nios2/mm/fault.c
+++ b/arch/nios2/mm/fault.c
@@ -136,8 +136,11 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long cause,
 	 */
 	fault = handle_mm_fault(vma, address, flags, regs);
 
-	if (fault_signal_pending(fault, regs))
+	if (fault_signal_pending(fault, regs)) {
+		if (!user_mode(regs))
+			goto no_context;
 		return;
+	}
 
 	/* The fault is fully completed (including releasing mmap lock) */
 	if (fault & VM_FAULT_COMPLETED)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 120+ messages in thread

* [PATCH 06/10] nios2: fix livelock in uaccess
@ 2023-01-31 20:05   ` Al Viro
  0 siblings, 0 replies; 120+ messages in thread
From: Al Viro @ 2023-01-31 20:05 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-alpha, linux-ia64, linux-hexagon, linux-m68k, Michal Simek,
	Dinh Nguyen, openrisc, linux-parisc, linux-riscv, sparclinux,
	Linus Torvalds

nios2 equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
to page tables.  In such case we must *not* return to the faulting insn -
that would repeat the entire thing without making any progress; what we need
instead is to treat that as failed (user) memory access.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 arch/nios2/mm/fault.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/nios2/mm/fault.c b/arch/nios2/mm/fault.c
index edaca0a6c1c1..ca64eccea551 100644
--- a/arch/nios2/mm/fault.c
+++ b/arch/nios2/mm/fault.c
@@ -136,8 +136,11 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long cause,
 	 */
 	fault = handle_mm_fault(vma, address, flags, regs);
 
-	if (fault_signal_pending(fault, regs))
+	if (fault_signal_pending(fault, regs)) {
+		if (!user_mode(regs))
+			goto no_context;
 		return;
+	}
 
 	/* The fault is fully completed (including releasing mmap lock) */
 	if (fault & VM_FAULT_COMPLETED)
-- 
2.30.2


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 120+ messages in thread

* [PATCH 07/10] openrisc: fix livelock in uaccess
  2023-01-31 20:02 ` Al Viro
@ 2023-01-31 20:06   ` Al Viro
  -1 siblings, 0 replies; 120+ messages in thread
From: Al Viro @ 2023-01-31 20:06 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-alpha, linux-ia64, linux-hexagon, linux-m68k, Michal Simek,
	Dinh Nguyen, openrisc, linux-parisc, linux-riscv, sparclinux,
	Linus Torvalds

openrisc equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
to page tables.  In such case we must *not* return to the faulting insn -
that would repeat the entire thing without making any progress; what we need
instead is to treat that as failed (user) memory access.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 arch/openrisc/mm/fault.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/openrisc/mm/fault.c b/arch/openrisc/mm/fault.c
index b4762d66e9ef..6734fee3134f 100644
--- a/arch/openrisc/mm/fault.c
+++ b/arch/openrisc/mm/fault.c
@@ -162,8 +162,11 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long address,
 
 	fault = handle_mm_fault(vma, address, flags, regs);
 
-	if (fault_signal_pending(fault, regs))
+	if (fault_signal_pending(fault, regs)) {
+		if (!user_mode(regs))
+			goto no_context;
 		return;
+	}
 
 	/* The fault is fully completed (including releasing mmap lock) */
 	if (fault & VM_FAULT_COMPLETED)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 120+ messages in thread

* [PATCH 07/10] openrisc: fix livelock in uaccess
@ 2023-01-31 20:06   ` Al Viro
  0 siblings, 0 replies; 120+ messages in thread
From: Al Viro @ 2023-01-31 20:06 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-alpha, linux-ia64, linux-hexagon, linux-m68k, Michal Simek,
	Dinh Nguyen, openrisc, linux-parisc, linux-riscv, sparclinux,
	Linus Torvalds

openrisc equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
to page tables.  In such case we must *not* return to the faulting insn -
that would repeat the entire thing without making any progress; what we need
instead is to treat that as failed (user) memory access.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 arch/openrisc/mm/fault.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/openrisc/mm/fault.c b/arch/openrisc/mm/fault.c
index b4762d66e9ef..6734fee3134f 100644
--- a/arch/openrisc/mm/fault.c
+++ b/arch/openrisc/mm/fault.c
@@ -162,8 +162,11 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long address,
 
 	fault = handle_mm_fault(vma, address, flags, regs);
 
-	if (fault_signal_pending(fault, regs))
+	if (fault_signal_pending(fault, regs)) {
+		if (!user_mode(regs))
+			goto no_context;
 		return;
+	}
 
 	/* The fault is fully completed (including releasing mmap lock) */
 	if (fault & VM_FAULT_COMPLETED)
-- 
2.30.2


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 120+ messages in thread

* [PATCH 08/10] parisc: fix livelock in uaccess
  2023-01-31 20:02 ` Al Viro
@ 2023-01-31 20:06   ` Al Viro
  -1 siblings, 0 replies; 120+ messages in thread
From: Al Viro @ 2023-01-31 20:06 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-alpha, linux-ia64, linux-hexagon, linux-m68k, Michal Simek,
	Dinh Nguyen, openrisc, linux-parisc, linux-riscv, sparclinux,
	Linus Torvalds

parisc equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
to page tables.  In such case we must *not* return to the faulting insn -
that would repeat the entire thing without making any progress; what we need
instead is to treat that as failed (user) memory access.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 arch/parisc/mm/fault.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/parisc/mm/fault.c b/arch/parisc/mm/fault.c
index 869204e97ec9..bb30ff6a3e19 100644
--- a/arch/parisc/mm/fault.c
+++ b/arch/parisc/mm/fault.c
@@ -308,8 +308,11 @@ void do_page_fault(struct pt_regs *regs, unsigned long code,
 
 	fault = handle_mm_fault(vma, address, flags, regs);
 
-	if (fault_signal_pending(fault, regs))
+	if (fault_signal_pending(fault, regs)) {
+		if (!user_mode(regs))
+			goto no_context;
 		return;
+	}
 
 	/* The fault is fully completed (including releasing mmap lock) */
 	if (fault & VM_FAULT_COMPLETED)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 120+ messages in thread

* [PATCH 08/10] parisc: fix livelock in uaccess
@ 2023-01-31 20:06   ` Al Viro
  0 siblings, 0 replies; 120+ messages in thread
From: Al Viro @ 2023-01-31 20:06 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-alpha, linux-ia64, linux-hexagon, linux-m68k, Michal Simek,
	Dinh Nguyen, openrisc, linux-parisc, linux-riscv, sparclinux,
	Linus Torvalds

parisc equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
to page tables.  In such case we must *not* return to the faulting insn -
that would repeat the entire thing without making any progress; what we need
instead is to treat that as failed (user) memory access.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 arch/parisc/mm/fault.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/parisc/mm/fault.c b/arch/parisc/mm/fault.c
index 869204e97ec9..bb30ff6a3e19 100644
--- a/arch/parisc/mm/fault.c
+++ b/arch/parisc/mm/fault.c
@@ -308,8 +308,11 @@ void do_page_fault(struct pt_regs *regs, unsigned long code,
 
 	fault = handle_mm_fault(vma, address, flags, regs);
 
-	if (fault_signal_pending(fault, regs))
+	if (fault_signal_pending(fault, regs)) {
+		if (!user_mode(regs))
+			goto no_context;
 		return;
+	}
 
 	/* The fault is fully completed (including releasing mmap lock) */
 	if (fault & VM_FAULT_COMPLETED)
-- 
2.30.2


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 120+ messages in thread

* [PATCH 09/10] riscv: fix livelock in uaccess
  2023-01-31 20:02 ` Al Viro
@ 2023-01-31 20:06   ` Al Viro
  -1 siblings, 0 replies; 120+ messages in thread
From: Al Viro @ 2023-01-31 20:06 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-alpha, linux-ia64, linux-hexagon, linux-m68k, Michal Simek,
	Dinh Nguyen, openrisc, linux-parisc, linux-riscv, sparclinux,
	Linus Torvalds

riscv equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
to page tables.  In such case we must *not* return to the faulting insn -
that would repeat the entire thing without making any progress; what we need
instead is to treat that as failed (user) memory access.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 arch/riscv/mm/fault.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/riscv/mm/fault.c b/arch/riscv/mm/fault.c
index d86f7cebd4a7..c91d85349d39 100644
--- a/arch/riscv/mm/fault.c
+++ b/arch/riscv/mm/fault.c
@@ -324,8 +324,11 @@ asmlinkage void do_page_fault(struct pt_regs *regs)
 	 * signal first. We do not need to release the mmap_lock because it
 	 * would already be released in __lock_page_or_retry in mm/filemap.c.
 	 */
-	if (fault_signal_pending(fault, regs))
+	if (fault_signal_pending(fault, regs)) {
+		if (!user_mode(regs))
+			no_context(regs, addr);
 		return;
+	}
 
 	/* The fault is fully completed (including releasing mmap lock) */
 	if (fault & VM_FAULT_COMPLETED)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 120+ messages in thread

* [PATCH 09/10] riscv: fix livelock in uaccess
@ 2023-01-31 20:06   ` Al Viro
  0 siblings, 0 replies; 120+ messages in thread
From: Al Viro @ 2023-01-31 20:06 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-alpha, linux-ia64, linux-hexagon, linux-m68k, Michal Simek,
	Dinh Nguyen, openrisc, linux-parisc, linux-riscv, sparclinux,
	Linus Torvalds

riscv equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
to page tables.  In such case we must *not* return to the faulting insn -
that would repeat the entire thing without making any progress; what we need
instead is to treat that as failed (user) memory access.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 arch/riscv/mm/fault.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/riscv/mm/fault.c b/arch/riscv/mm/fault.c
index d86f7cebd4a7..c91d85349d39 100644
--- a/arch/riscv/mm/fault.c
+++ b/arch/riscv/mm/fault.c
@@ -324,8 +324,11 @@ asmlinkage void do_page_fault(struct pt_regs *regs)
 	 * signal first. We do not need to release the mmap_lock because it
 	 * would already be released in __lock_page_or_retry in mm/filemap.c.
 	 */
-	if (fault_signal_pending(fault, regs))
+	if (fault_signal_pending(fault, regs)) {
+		if (!user_mode(regs))
+			no_context(regs, addr);
 		return;
+	}
 
 	/* The fault is fully completed (including releasing mmap lock) */
 	if (fault & VM_FAULT_COMPLETED)
-- 
2.30.2


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 120+ messages in thread

* [PATCH 10/10] sparc: fix livelock in uaccess
  2023-01-31 20:02 ` Al Viro
@ 2023-01-31 20:07   ` Al Viro
  -1 siblings, 0 replies; 120+ messages in thread
From: Al Viro @ 2023-01-31 20:07 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-alpha, linux-ia64, linux-hexagon, linux-m68k, Michal Simek,
	Dinh Nguyen, openrisc, linux-parisc, linux-riscv, sparclinux,
	Linus Torvalds

sparc equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
to page tables.  In such case we must *not* return to the faulting insn -
that would repeat the entire thing without making any progress; what we need
instead is to treat that as failed (user) memory access.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 arch/sparc/mm/fault_32.c | 5 ++++-
 arch/sparc/mm/fault_64.c | 7 ++++++-
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/sparc/mm/fault_32.c b/arch/sparc/mm/fault_32.c
index 91259f291c54..179295b14664 100644
--- a/arch/sparc/mm/fault_32.c
+++ b/arch/sparc/mm/fault_32.c
@@ -187,8 +187,11 @@ asmlinkage void do_sparc_fault(struct pt_regs *regs, int text_fault, int write,
 	 */
 	fault = handle_mm_fault(vma, address, flags, regs);
 
-	if (fault_signal_pending(fault, regs))
+	if (fault_signal_pending(fault, regs)) {
+		if (!from_user)
+			goto no_context;
 		return;
+	}
 
 	/* The fault is fully completed (including releasing mmap lock) */
 	if (fault & VM_FAULT_COMPLETED)
diff --git a/arch/sparc/mm/fault_64.c b/arch/sparc/mm/fault_64.c
index 4acc12eafbf5..d91305de694c 100644
--- a/arch/sparc/mm/fault_64.c
+++ b/arch/sparc/mm/fault_64.c
@@ -424,8 +424,13 @@ asmlinkage void __kprobes do_sparc64_fault(struct pt_regs *regs)
 
 	fault = handle_mm_fault(vma, address, flags, regs);
 
-	if (fault_signal_pending(fault, regs))
+	if (fault_signal_pending(fault, regs)) {
+		if (regs->tstate & TSTATE_PRIV) {
+			insn = get_fault_insn(regs, insn);
+			goto handle_kernel_fault;
+		}
 		goto exit_exception;
+	}
 
 	/* The fault is fully completed (including releasing mmap lock) */
 	if (fault & VM_FAULT_COMPLETED)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 120+ messages in thread

* [PATCH 10/10] sparc: fix livelock in uaccess
@ 2023-01-31 20:07   ` Al Viro
  0 siblings, 0 replies; 120+ messages in thread
From: Al Viro @ 2023-01-31 20:07 UTC (permalink / raw)
  To: linux-arch
  Cc: linux-alpha, linux-ia64, linux-hexagon, linux-m68k, Michal Simek,
	Dinh Nguyen, openrisc, linux-parisc, linux-riscv, sparclinux,
	Linus Torvalds

sparc equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
to page tables.  In such case we must *not* return to the faulting insn -
that would repeat the entire thing without making any progress; what we need
instead is to treat that as failed (user) memory access.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 arch/sparc/mm/fault_32.c | 5 ++++-
 arch/sparc/mm/fault_64.c | 7 ++++++-
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/sparc/mm/fault_32.c b/arch/sparc/mm/fault_32.c
index 91259f291c54..179295b14664 100644
--- a/arch/sparc/mm/fault_32.c
+++ b/arch/sparc/mm/fault_32.c
@@ -187,8 +187,11 @@ asmlinkage void do_sparc_fault(struct pt_regs *regs, int text_fault, int write,
 	 */
 	fault = handle_mm_fault(vma, address, flags, regs);
 
-	if (fault_signal_pending(fault, regs))
+	if (fault_signal_pending(fault, regs)) {
+		if (!from_user)
+			goto no_context;
 		return;
+	}
 
 	/* The fault is fully completed (including releasing mmap lock) */
 	if (fault & VM_FAULT_COMPLETED)
diff --git a/arch/sparc/mm/fault_64.c b/arch/sparc/mm/fault_64.c
index 4acc12eafbf5..d91305de694c 100644
--- a/arch/sparc/mm/fault_64.c
+++ b/arch/sparc/mm/fault_64.c
@@ -424,8 +424,13 @@ asmlinkage void __kprobes do_sparc64_fault(struct pt_regs *regs)
 
 	fault = handle_mm_fault(vma, address, flags, regs);
 
-	if (fault_signal_pending(fault, regs))
+	if (fault_signal_pending(fault, regs)) {
+		if (regs->tstate & TSTATE_PRIV) {
+			insn = get_fault_insn(regs, insn);
+			goto handle_kernel_fault;
+		}
 		goto exit_exception;
+	}
 
 	/* The fault is fully completed (including releasing mmap lock) */
 	if (fault & VM_FAULT_COMPLETED)
-- 
2.30.2


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 120+ messages in thread

* Re: [RFC][PATCHSET] VM_FAULT_RETRY fixes
  2023-01-31 20:02 ` Al Viro
  (?)
@ 2023-01-31 20:24   ` Linus Torvalds
  -1 siblings, 0 replies; 120+ messages in thread
From: Linus Torvalds @ 2023-01-31 20:24 UTC (permalink / raw)
  To: Al Viro, Peter Xu
  Cc: linux-arch, linux-alpha, linux-ia64, linux-hexagon, linux-m68k,
	Michal Simek, Dinh Nguyen, openrisc, linux-parisc, linux-riscv,
	sparclinux

[ Adding Peter Xu to the participants, he did a lot of "add generic
helpers" code a few years ago. Peter, see

    https://lore.kernel.org/linux-arch/Y9lz6yk113LmC9SI@ZenIV/

  for context ]

On Tue, Jan 31, 2023 at 12:03 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> However, it never had been done on a bunch of architectures - the
> current mainline still has that bug on alpha, hexagon, itanic, m68k,
> microblaze, nios2, openrisc, parisc, riscv and sparc (both sparc32 and
> sparc64).  Fixes are trivial, but I've no way to test them for most
> of those architectures.

Gaah. The patches look simple and "trivially correct", but I hate how
we duplicate basically the same (complicated) logic across
architectures.

Peter improved a bit on things, but I think we could do better.

Yes, all architectures basically have their own special code for
special error registers etc, and for various errata and/or special
handling for vmalloc addresses of vdso's etc.

And they aren't always translated to the generic flags, looking at
alpha, for example, we have code like this:

        si_code = SEGV_ACCERR;
        if (cause < 0) {
                if (!(vma->vm_flags & VM_EXEC))
                        goto bad_area;
        } else if (!cause) {
                /* Allow reads even for write-only mappings */
                if (!(vma->vm_flags & (VM_READ | VM_WRITE)))
                        goto bad_area;
        } else {
                if (!(vma->vm_flags & VM_WRITE))
                        goto bad_area;
                flags |= FAULT_FLAG_WRITE;
        }

because it uses the alpha internal "cause < 0" logic, instead of
having translated it into FAULT_FLAG_INSTRUCTION.

But *if* the alpha code were to just translate it into the
FAULT_FLAG_xyz namespace, apretty much *all* of the alpha
do_page_fault() could have been then done by a completely generic
"generic_page_fault()" that has all of the retry logic, all of the
si_code logic, etc etc.

So in a perfect world, we'd only have the special errata handling (*)
and the "translate to FAULT_FLAG_xyz" code in the architecture code,
and then just call that generic_page_fault() function for what really
is pretty much generic.

And then we would never again have these kinds of "architecture got
the retry wrong" issues.

Would anybody be interested in trying that? Just converting one or two
architectures to a new world order?

But if not, Al's patches all look "obviously fine" to me, but that's
because they basically all do the same thing except for that odd
special TSTATE_PRIV thing for sparc-64 - why can't that use
'!user_mode(regs)' like everybody else?)

                   Linus

(*) we really have a *lot* of architectures that have gotten
'prefetch' wrong and have errata for prefetch: alpha, arm and x86 all
have magic "bogus fault on prefetch" cases.

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC][PATCHSET] VM_FAULT_RETRY fixes
@ 2023-01-31 20:24   ` Linus Torvalds
  0 siblings, 0 replies; 120+ messages in thread
From: Linus Torvalds @ 2023-01-31 20:24 UTC (permalink / raw)
  To: Al Viro, Peter Xu
  Cc: linux-arch, linux-alpha, linux-ia64, linux-hexagon, linux-m68k,
	Michal Simek, Dinh Nguyen, openrisc, linux-parisc, linux-riscv,
	sparclinux

[ Adding Peter Xu to the participants, he did a lot of "add generic
helpers" code a few years ago. Peter, see

    https://lore.kernel.org/linux-arch/Y9lz6yk113LmC9SI@ZenIV/

  for context ]

On Tue, Jan 31, 2023 at 12:03 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> However, it never had been done on a bunch of architectures - the
> current mainline still has that bug on alpha, hexagon, itanic, m68k,
> microblaze, nios2, openrisc, parisc, riscv and sparc (both sparc32 and
> sparc64).  Fixes are trivial, but I've no way to test them for most
> of those architectures.

Gaah. The patches look simple and "trivially correct", but I hate how
we duplicate basically the same (complicated) logic across
architectures.

Peter improved a bit on things, but I think we could do better.

Yes, all architectures basically have their own special code for
special error registers etc, and for various errata and/or special
handling for vmalloc addresses of vdso's etc.

And they aren't always translated to the generic flags, looking at
alpha, for example, we have code like this:

        si_code = SEGV_ACCERR;
        if (cause < 0) {
                if (!(vma->vm_flags & VM_EXEC))
                        goto bad_area;
        } else if (!cause) {
                /* Allow reads even for write-only mappings */
                if (!(vma->vm_flags & (VM_READ | VM_WRITE)))
                        goto bad_area;
        } else {
                if (!(vma->vm_flags & VM_WRITE))
                        goto bad_area;
                flags |= FAULT_FLAG_WRITE;
        }

because it uses the alpha internal "cause < 0" logic, instead of
having translated it into FAULT_FLAG_INSTRUCTION.

But *if* the alpha code were to just translate it into the
FAULT_FLAG_xyz namespace, apretty much *all* of the alpha
do_page_fault() could have been then done by a completely generic
"generic_page_fault()" that has all of the retry logic, all of the
si_code logic, etc etc.

So in a perfect world, we'd only have the special errata handling (*)
and the "translate to FAULT_FLAG_xyz" code in the architecture code,
and then just call that generic_page_fault() function for what really
is pretty much generic.

And then we would never again have these kinds of "architecture got
the retry wrong" issues.

Would anybody be interested in trying that? Just converting one or two
architectures to a new world order?

But if not, Al's patches all look "obviously fine" to me, but that's
because they basically all do the same thing except for that odd
special TSTATE_PRIV thing for sparc-64 - why can't that use
'!user_mode(regs)' like everybody else?)

                   Linus

(*) we really have a *lot* of architectures that have gotten
'prefetch' wrong and have errata for prefetch: alpha, arm and x86 all
have magic "bogus fault on prefetch" cases.

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC][PATCHSET] VM_FAULT_RETRY fixes
@ 2023-01-31 20:24   ` Linus Torvalds
  0 siblings, 0 replies; 120+ messages in thread
From: Linus Torvalds @ 2023-01-31 20:24 UTC (permalink / raw)
  To: Al Viro, Peter Xu
  Cc: linux-arch, linux-alpha, linux-ia64, linux-hexagon, linux-m68k,
	Michal Simek, Dinh Nguyen, openrisc, linux-parisc, linux-riscv,
	sparclinux

[ Adding Peter Xu to the participants, he did a lot of "add generic
helpers" code a few years ago. Peter, see

    https://lore.kernel.org/linux-arch/Y9lz6yk113LmC9SI@ZenIV/

  for context ]

On Tue, Jan 31, 2023 at 12:03 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> However, it never had been done on a bunch of architectures - the
> current mainline still has that bug on alpha, hexagon, itanic, m68k,
> microblaze, nios2, openrisc, parisc, riscv and sparc (both sparc32 and
> sparc64).  Fixes are trivial, but I've no way to test them for most
> of those architectures.

Gaah. The patches look simple and "trivially correct", but I hate how
we duplicate basically the same (complicated) logic across
architectures.

Peter improved a bit on things, but I think we could do better.

Yes, all architectures basically have their own special code for
special error registers etc, and for various errata and/or special
handling for vmalloc addresses of vdso's etc.

And they aren't always translated to the generic flags, looking at
alpha, for example, we have code like this:

        si_code = SEGV_ACCERR;
        if (cause < 0) {
                if (!(vma->vm_flags & VM_EXEC))
                        goto bad_area;
        } else if (!cause) {
                /* Allow reads even for write-only mappings */
                if (!(vma->vm_flags & (VM_READ | VM_WRITE)))
                        goto bad_area;
        } else {
                if (!(vma->vm_flags & VM_WRITE))
                        goto bad_area;
                flags |= FAULT_FLAG_WRITE;
        }

because it uses the alpha internal "cause < 0" logic, instead of
having translated it into FAULT_FLAG_INSTRUCTION.

But *if* the alpha code were to just translate it into the
FAULT_FLAG_xyz namespace, apretty much *all* of the alpha
do_page_fault() could have been then done by a completely generic
"generic_page_fault()" that has all of the retry logic, all of the
si_code logic, etc etc.

So in a perfect world, we'd only have the special errata handling (*)
and the "translate to FAULT_FLAG_xyz" code in the architecture code,
and then just call that generic_page_fault() function for what really
is pretty much generic.

And then we would never again have these kinds of "architecture got
the retry wrong" issues.

Would anybody be interested in trying that? Just converting one or two
architectures to a new world order?

But if not, Al's patches all look "obviously fine" to me, but that's
because they basically all do the same thing except for that odd
special TSTATE_PRIV thing for sparc-64 - why can't that use
'!user_mode(regs)' like everybody else?)

                   Linus

(*) we really have a *lot* of architectures that have gotten
'prefetch' wrong and have errata for prefetch: alpha, arm and x86 all
have magic "bogus fault on prefetch" cases.

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC][PATCHSET] VM_FAULT_RETRY fixes
  2023-01-31 20:24   ` Linus Torvalds
@ 2023-01-31 21:10     ` Al Viro
  -1 siblings, 0 replies; 120+ messages in thread
From: Al Viro @ 2023-01-31 21:10 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Xu, linux-arch, linux-alpha, linux-ia64, linux-hexagon,
	linux-m68k, Michal Simek, Dinh Nguyen, openrisc, linux-parisc,
	linux-riscv, sparclinux

On Tue, Jan 31, 2023 at 12:24:54PM -0800, Linus Torvalds wrote:

> But *if* the alpha code were to just translate it into the
> FAULT_FLAG_xyz namespace, apretty much *all* of the alpha
> do_page_fault() could have been then done by a completely generic
> "generic_page_fault()" that has all of the retry logic, all of the
> si_code logic, etc etc.

Umm...  What about the semantics of get_user() of unmapped address?
Some architectures do quiet EFAULT; some (including alpha) hit
the sucker with SIGBUS, no matter what.

Examples:
arm
        if (likely(!(fault & (VM_FAULT_ERROR | VM_FAULT_BADMAP | VM_FAULT_BADACCESS))))
		return 0;

	/*
	 * If we are in kernel mode at this point, we
	 * have no context to handle this fault with.
	 */
	if (!user_mode(regs))
		goto no_context;
	...
alpha
 do_sigbus:
        mmap_read_unlock(mm);
        /* Send a sigbus, regardless of whether we were in kernel
           or user mode.  */
        force_sig_fault(SIGBUS, BUS_ADRERR, (void __user *) address);
        if (!user_mode(regs))
                goto no_context;
        return;

Are we free to modify that behaviour, or is that part of arch-specific
ABI?

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC][PATCHSET] VM_FAULT_RETRY fixes
@ 2023-01-31 21:10     ` Al Viro
  0 siblings, 0 replies; 120+ messages in thread
From: Al Viro @ 2023-01-31 21:10 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Xu, linux-arch, linux-alpha, linux-ia64, linux-hexagon,
	linux-m68k, Michal Simek, Dinh Nguyen, openrisc, linux-parisc,
	linux-riscv, sparclinux

On Tue, Jan 31, 2023 at 12:24:54PM -0800, Linus Torvalds wrote:

> But *if* the alpha code were to just translate it into the
> FAULT_FLAG_xyz namespace, apretty much *all* of the alpha
> do_page_fault() could have been then done by a completely generic
> "generic_page_fault()" that has all of the retry logic, all of the
> si_code logic, etc etc.

Umm...  What about the semantics of get_user() of unmapped address?
Some architectures do quiet EFAULT; some (including alpha) hit
the sucker with SIGBUS, no matter what.

Examples:
arm
        if (likely(!(fault & (VM_FAULT_ERROR | VM_FAULT_BADMAP | VM_FAULT_BADACCESS))))
		return 0;

	/*
	 * If we are in kernel mode at this point, we
	 * have no context to handle this fault with.
	 */
	if (!user_mode(regs))
		goto no_context;
	...
alpha
 do_sigbus:
        mmap_read_unlock(mm);
        /* Send a sigbus, regardless of whether we were in kernel
           or user mode.  */
        force_sig_fault(SIGBUS, BUS_ADRERR, (void __user *) address);
        if (!user_mode(regs))
                goto no_context;
        return;

Are we free to modify that behaviour, or is that part of arch-specific
ABI?

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC][PATCHSET] VM_FAULT_RETRY fixes
  2023-01-31 21:10     ` Al Viro
  (?)
@ 2023-01-31 21:19       ` Linus Torvalds
  -1 siblings, 0 replies; 120+ messages in thread
From: Linus Torvalds @ 2023-01-31 21:19 UTC (permalink / raw)
  To: Al Viro
  Cc: Peter Xu, linux-arch, linux-alpha, linux-ia64, linux-hexagon,
	linux-m68k, Michal Simek, Dinh Nguyen, openrisc, linux-parisc,
	linux-riscv, sparclinux

On Tue, Jan 31, 2023 at 1:10 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Umm...  What about the semantics of get_user() of unmapped address?
> Some architectures do quiet EFAULT; some (including alpha) hit
> the sucker with SIGBUS, no matter what.

I think we should strive to just make this all common.

The reason alpha is different is almost certainly not intentional, but
a combination of "pure accident" and "nobody actually cares".

> Are we free to modify that behaviour, or is that part of arch-specific
> ABI?

I'd just unify this all, probably with a preference for existing
semantics on x86 (because of "biggest and most varied user base").

That whole "send SIGBUS even for kernel faults" is certainly bogus and
against the usual rules. And I may well be to blame for it (I have
this memory of disliking how EFAULT as a return code didn't actually
return the faulting address). And realistically, it's also just not
something that any normal application will ever hit.  Giving invalid
addresses to system calls is basically always a bug, although there
are always special software that do all the crazy corner cases (ie
things like emulators tend to do odd things).

I doubt such special software exists on Linux/alpha, though.

So I wouldn't worry about those kinds of oddities overmuch.

*If* somebody then finds a load that cares, we can always fix it
later, and I'll go "mea culpa, I didn't think it would matter, and I
was wrong".

         Linus

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC][PATCHSET] VM_FAULT_RETRY fixes
@ 2023-01-31 21:19       ` Linus Torvalds
  0 siblings, 0 replies; 120+ messages in thread
From: Linus Torvalds @ 2023-01-31 21:19 UTC (permalink / raw)
  To: Al Viro
  Cc: Peter Xu, linux-arch, linux-alpha, linux-ia64, linux-hexagon,
	linux-m68k, Michal Simek, Dinh Nguyen, openrisc, linux-parisc,
	linux-riscv, sparclinux

On Tue, Jan 31, 2023 at 1:10 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Umm...  What about the semantics of get_user() of unmapped address?
> Some architectures do quiet EFAULT; some (including alpha) hit
> the sucker with SIGBUS, no matter what.

I think we should strive to just make this all common.

The reason alpha is different is almost certainly not intentional, but
a combination of "pure accident" and "nobody actually cares".

> Are we free to modify that behaviour, or is that part of arch-specific
> ABI?

I'd just unify this all, probably with a preference for existing
semantics on x86 (because of "biggest and most varied user base").

That whole "send SIGBUS even for kernel faults" is certainly bogus and
against the usual rules. And I may well be to blame for it (I have
this memory of disliking how EFAULT as a return code didn't actually
return the faulting address). And realistically, it's also just not
something that any normal application will ever hit.  Giving invalid
addresses to system calls is basically always a bug, although there
are always special software that do all the crazy corner cases (ie
things like emulators tend to do odd things).

I doubt such special software exists on Linux/alpha, though.

So I wouldn't worry about those kinds of oddities overmuch.

*If* somebody then finds a load that cares, we can always fix it
later, and I'll go "mea culpa, I didn't think it would matter, and I
was wrong".

         Linus

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC][PATCHSET] VM_FAULT_RETRY fixes
@ 2023-01-31 21:19       ` Linus Torvalds
  0 siblings, 0 replies; 120+ messages in thread
From: Linus Torvalds @ 2023-01-31 21:19 UTC (permalink / raw)
  To: Al Viro
  Cc: Peter Xu, linux-arch, linux-alpha, linux-ia64, linux-hexagon,
	linux-m68k, Michal Simek, Dinh Nguyen, openrisc, linux-parisc,
	linux-riscv, sparclinux

On Tue, Jan 31, 2023 at 1:10 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Umm...  What about the semantics of get_user() of unmapped address?
> Some architectures do quiet EFAULT; some (including alpha) hit
> the sucker with SIGBUS, no matter what.

I think we should strive to just make this all common.

The reason alpha is different is almost certainly not intentional, but
a combination of "pure accident" and "nobody actually cares".

> Are we free to modify that behaviour, or is that part of arch-specific
> ABI?

I'd just unify this all, probably with a preference for existing
semantics on x86 (because of "biggest and most varied user base").

That whole "send SIGBUS even for kernel faults" is certainly bogus and
against the usual rules. And I may well be to blame for it (I have
this memory of disliking how EFAULT as a return code didn't actually
return the faulting address). And realistically, it's also just not
something that any normal application will ever hit.  Giving invalid
addresses to system calls is basically always a bug, although there
are always special software that do all the crazy corner cases (ie
things like emulators tend to do odd things).

I doubt such special software exists on Linux/alpha, though.

So I wouldn't worry about those kinds of oddities overmuch.

*If* somebody then finds a load that cares, we can always fix it
later, and I'll go "mea culpa, I didn't think it would matter, and I
was wrong".

         Linus

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC][PATCHSET] VM_FAULT_RETRY fixes
  2023-01-31 21:19       ` Linus Torvalds
@ 2023-01-31 21:49         ` Al Viro
  -1 siblings, 0 replies; 120+ messages in thread
From: Al Viro @ 2023-01-31 21:49 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Xu, linux-arch, linux-alpha, linux-ia64, linux-hexagon,
	linux-m68k, Michal Simek, Dinh Nguyen, openrisc, linux-parisc,
	linux-riscv, sparclinux

On Tue, Jan 31, 2023 at 01:19:59PM -0800, Linus Torvalds wrote:
> On Tue, Jan 31, 2023 at 1:10 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > Umm...  What about the semantics of get_user() of unmapped address?
> > Some architectures do quiet EFAULT; some (including alpha) hit
> > the sucker with SIGBUS, no matter what.
> 
> I think we should strive to just make this all common.
> 
> The reason alpha is different is almost certainly not intentional, but
> a combination of "pure accident" and "nobody actually cares".
> 
> > Are we free to modify that behaviour, or is that part of arch-specific
> > ABI?
> 
> I'd just unify this all, probably with a preference for existing
> semantics on x86 (because of "biggest and most varied user base").
> 
> That whole "send SIGBUS even for kernel faults" is certainly bogus and
> against the usual rules. And I may well be to blame for it (I have
> this memory of disliking how EFAULT as a return code didn't actually
> return the faulting address). And realistically, it's also just not
> something that any normal application will ever hit.  Giving invalid
> addresses to system calls is basically always a bug, although there
> are always special software that do all the crazy corner cases (ie
> things like emulators tend to do odd things).
> 
> I doubt such special software exists on Linux/alpha, though.
> 
> So I wouldn't worry about those kinds of oddities overmuch.
> 
> *If* somebody then finds a load that cares, we can always fix it
> later, and I'll go "mea culpa, I didn't think it would matter, and I
> was wrong".

FWIW, from digging through the current tree:

alpha, openrisc, sparc and xtensa send SIGBUS.
m68k: not sure, do_page_fault() callers there are delicate.
mips: really interesting -
        /* Kernel mode? Handle exceptions or die */
        if (!user_mode(regs))
                goto no_context;

        /*
         * Send a sigbus, regardless of whether we were in kernel
         * or user mode.
... which is obviously a rudiment of SIGBUS variant, but nowadays
it's EFAULT.

Everything else seems to be going with EFAULT.

PS: mips used to be SIGBUS, until this
commit 1d50e5e7a6e0325b1a652c4be296a71dc54a6e96
Author: Andrew Morton <akpm@osdl.org>
Date:   Fri Feb 20 01:33:18 2004 -0800

    [PATCH] MIPS mega-patch
    
    From: Ralf Baechle <ralf@linux-mips.org>
    
    Below following 125547 lines of patches, all to arch/mips and
    include/asm-mips.  I'm going to send the remaining stuff of which the one
    or other bit may need to be discussed in smaller bits.

IOW, details are buried somewhere in historical mips tree, assuming
it survives...

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC][PATCHSET] VM_FAULT_RETRY fixes
@ 2023-01-31 21:49         ` Al Viro
  0 siblings, 0 replies; 120+ messages in thread
From: Al Viro @ 2023-01-31 21:49 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Xu, linux-arch, linux-alpha, linux-ia64, linux-hexagon,
	linux-m68k, Michal Simek, Dinh Nguyen, openrisc, linux-parisc,
	linux-riscv, sparclinux

On Tue, Jan 31, 2023 at 01:19:59PM -0800, Linus Torvalds wrote:
> On Tue, Jan 31, 2023 at 1:10 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > Umm...  What about the semantics of get_user() of unmapped address?
> > Some architectures do quiet EFAULT; some (including alpha) hit
> > the sucker with SIGBUS, no matter what.
> 
> I think we should strive to just make this all common.
> 
> The reason alpha is different is almost certainly not intentional, but
> a combination of "pure accident" and "nobody actually cares".
> 
> > Are we free to modify that behaviour, or is that part of arch-specific
> > ABI?
> 
> I'd just unify this all, probably with a preference for existing
> semantics on x86 (because of "biggest and most varied user base").
> 
> That whole "send SIGBUS even for kernel faults" is certainly bogus and
> against the usual rules. And I may well be to blame for it (I have
> this memory of disliking how EFAULT as a return code didn't actually
> return the faulting address). And realistically, it's also just not
> something that any normal application will ever hit.  Giving invalid
> addresses to system calls is basically always a bug, although there
> are always special software that do all the crazy corner cases (ie
> things like emulators tend to do odd things).
> 
> I doubt such special software exists on Linux/alpha, though.
> 
> So I wouldn't worry about those kinds of oddities overmuch.
> 
> *If* somebody then finds a load that cares, we can always fix it
> later, and I'll go "mea culpa, I didn't think it would matter, and I
> was wrong".

FWIW, from digging through the current tree:

alpha, openrisc, sparc and xtensa send SIGBUS.
m68k: not sure, do_page_fault() callers there are delicate.
mips: really interesting -
        /* Kernel mode? Handle exceptions or die */
        if (!user_mode(regs))
                goto no_context;

        /*
         * Send a sigbus, regardless of whether we were in kernel
         * or user mode.
... which is obviously a rudiment of SIGBUS variant, but nowadays
it's EFAULT.

Everything else seems to be going with EFAULT.

PS: mips used to be SIGBUS, until this
commit 1d50e5e7a6e0325b1a652c4be296a71dc54a6e96
Author: Andrew Morton <akpm@osdl.org>
Date:   Fri Feb 20 01:33:18 2004 -0800

    [PATCH] MIPS mega-patch
    
    From: Ralf Baechle <ralf@linux-mips.org>
    
    Below following 125547 lines of patches, all to arch/mips and
    include/asm-mips.  I'm going to send the remaining stuff of which the one
    or other bit may need to be discussed in smaller bits.

IOW, details are buried somewhere in historical mips tree, assuming
it survives...

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC][PATCHSET] VM_FAULT_RETRY fixes
  2023-01-31 21:49         ` Al Viro
  (?)
@ 2023-02-01  0:00           ` Linus Torvalds
  -1 siblings, 0 replies; 120+ messages in thread
From: Linus Torvalds @ 2023-02-01  0:00 UTC (permalink / raw)
  To: Al Viro
  Cc: Peter Xu, linux-arch, linux-alpha, linux-ia64, linux-hexagon,
	linux-m68k, Michal Simek, Dinh Nguyen, openrisc, linux-parisc,
	linux-riscv, sparclinux

On Tue, Jan 31, 2023 at 1:49 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Everything else seems to be going with EFAULT.

So I think fo kernel faults it's always basically up to the exception
handler, and sending a signal regardless of that is just wrong.

Of course, an exception handler might choose to send a signal, but it
just shouldn't be the do_page_fault() handler that does it.

For user faults, I think the rule ends up being that "if there's no
mapping, or if there is a protection fault, then we should do
SIGSEGV".

If there's an actual mapping in place, and the mapping has the right
permissions for the access, but fails for some *other* reason, then we
send SIGBUS.

So a shared mmap past the end of the file (but within the area that
was used for mmap) would SIGBUS, while a write to a read-only mapping
would SIGSEGV.

Things like unaligned accesses also might SIGBUS rather than SIGSEGV.

And I'm not surprised that there are exceptions to the rule, because
almost nothing really cares. In most situations, it's a fatal error.
And even when catching them, the end result is generally the same
(either "print out helpful message and die", or "longjump to some safe
code").

So most of the time it's probably not going to matter all that much
which signal gets sent in practice.

            Linus

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC][PATCHSET] VM_FAULT_RETRY fixes
@ 2023-02-01  0:00           ` Linus Torvalds
  0 siblings, 0 replies; 120+ messages in thread
From: Linus Torvalds @ 2023-02-01  0:00 UTC (permalink / raw)
  To: Al Viro
  Cc: Peter Xu, linux-arch, linux-alpha, linux-ia64, linux-hexagon,
	linux-m68k, Michal Simek, Dinh Nguyen, openrisc, linux-parisc,
	linux-riscv, sparclinux

On Tue, Jan 31, 2023 at 1:49 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Everything else seems to be going with EFAULT.

So I think fo kernel faults it's always basically up to the exception
handler, and sending a signal regardless of that is just wrong.

Of course, an exception handler might choose to send a signal, but it
just shouldn't be the do_page_fault() handler that does it.

For user faults, I think the rule ends up being that "if there's no
mapping, or if there is a protection fault, then we should do
SIGSEGV".

If there's an actual mapping in place, and the mapping has the right
permissions for the access, but fails for some *other* reason, then we
send SIGBUS.

So a shared mmap past the end of the file (but within the area that
was used for mmap) would SIGBUS, while a write to a read-only mapping
would SIGSEGV.

Things like unaligned accesses also might SIGBUS rather than SIGSEGV.

And I'm not surprised that there are exceptions to the rule, because
almost nothing really cares. In most situations, it's a fatal error.
And even when catching them, the end result is generally the same
(either "print out helpful message and die", or "longjump to some safe
code").

So most of the time it's probably not going to matter all that much
which signal gets sent in practice.

            Linus

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC][PATCHSET] VM_FAULT_RETRY fixes
@ 2023-02-01  0:00           ` Linus Torvalds
  0 siblings, 0 replies; 120+ messages in thread
From: Linus Torvalds @ 2023-02-01  0:00 UTC (permalink / raw)
  To: Al Viro
  Cc: Peter Xu, linux-arch, linux-alpha, linux-ia64, linux-hexagon,
	linux-m68k, Michal Simek, Dinh Nguyen, openrisc, linux-parisc,
	linux-riscv, sparclinux

On Tue, Jan 31, 2023 at 1:49 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Everything else seems to be going with EFAULT.

So I think fo kernel faults it's always basically up to the exception
handler, and sending a signal regardless of that is just wrong.

Of course, an exception handler might choose to send a signal, but it
just shouldn't be the do_page_fault() handler that does it.

For user faults, I think the rule ends up being that "if there's no
mapping, or if there is a protection fault, then we should do
SIGSEGV".

If there's an actual mapping in place, and the mapping has the right
permissions for the access, but fails for some *other* reason, then we
send SIGBUS.

So a shared mmap past the end of the file (but within the area that
was used for mmap) would SIGBUS, while a write to a read-only mapping
would SIGSEGV.

Things like unaligned accesses also might SIGBUS rather than SIGSEGV.

And I'm not surprised that there are exceptions to the rule, because
almost nothing really cares. In most situations, it's a fatal error.
And even when catching them, the end result is generally the same
(either "print out helpful message and die", or "longjump to some safe
code").

So most of the time it's probably not going to matter all that much
which signal gets sent in practice.

            Linus

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC][PATCHSET] VM_FAULT_RETRY fixes
  2023-01-31 21:19       ` Linus Torvalds
  (?)
@ 2023-02-01  8:21         ` Helge Deller
  -1 siblings, 0 replies; 120+ messages in thread
From: Helge Deller @ 2023-02-01  8:21 UTC (permalink / raw)
  To: Linus Torvalds, Al Viro
  Cc: Peter Xu, linux-arch, linux-alpha, linux-ia64, linux-hexagon,
	linux-m68k, Michal Simek, Dinh Nguyen, openrisc, linux-parisc,
	linux-riscv, sparclinux

On 1/31/23 22:19, Linus Torvalds wrote:
> On Tue, Jan 31, 2023 at 1:10 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>>
>> Umm...  What about the semantics of get_user() of unmapped address?
>> Some architectures do quiet EFAULT; some (including alpha) hit
>> the sucker with SIGBUS, no matter what.
>
> I think we should strive to just make this all common.
>
> The reason alpha is different is almost certainly not intentional, but
> a combination of "pure accident" and "nobody actually cares".
>
>> Are we free to modify that behaviour, or is that part of arch-specific
>> ABI?
>
> I'd just unify this all, probably with a preference for existing
> semantics on x86 (because of "biggest and most varied user base").
>
> That whole "send SIGBUS even for kernel faults" is certainly bogus and
> against the usual rules. And I may well be to blame for it (I have
> this memory of disliking how EFAULT as a return code didn't actually
> return the faulting address). And realistically, it's also just not
> something that any normal application will ever hit.  Giving invalid
> addresses to system calls is basically always a bug, although there
> are always special software that do all the crazy corner cases (ie
> things like emulators tend to do odd things).
>
> I doubt such special software exists on Linux/alpha, though.
>
> So I wouldn't worry about those kinds of oddities overmuch.
>
> *If* somebody then finds a load that cares, we can always fix it
> later, and I'll go "mea culpa, I didn't think it would matter, and I
> was wrong".

AFAICS, the only applications which really care about the return
code are
- testsuites like LTP (i.e. the fstat05 testcase)
- some (not just debian) packages execute tests at the end of the
   build process and verify the return codes, i.e. libsigsegv.

The differences between the architectures is currently often taken care
of via #ifdefs, so if the return code is harmonized across platforms
it would at least help to simplify such testcases.

Helge

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC][PATCHSET] VM_FAULT_RETRY fixes
@ 2023-02-01  8:21         ` Helge Deller
  0 siblings, 0 replies; 120+ messages in thread
From: Helge Deller @ 2023-02-01  8:21 UTC (permalink / raw)
  To: Linus Torvalds, Al Viro
  Cc: Peter Xu, linux-arch, linux-alpha, linux-ia64, linux-hexagon,
	linux-m68k, Michal Simek, Dinh Nguyen, openrisc, linux-parisc,
	linux-riscv, sparclinux

On 1/31/23 22:19, Linus Torvalds wrote:
> On Tue, Jan 31, 2023 at 1:10 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>>
>> Umm...  What about the semantics of get_user() of unmapped address?
>> Some architectures do quiet EFAULT; some (including alpha) hit
>> the sucker with SIGBUS, no matter what.
>
> I think we should strive to just make this all common.
>
> The reason alpha is different is almost certainly not intentional, but
> a combination of "pure accident" and "nobody actually cares".
>
>> Are we free to modify that behaviour, or is that part of arch-specific
>> ABI?
>
> I'd just unify this all, probably with a preference for existing
> semantics on x86 (because of "biggest and most varied user base").
>
> That whole "send SIGBUS even for kernel faults" is certainly bogus and
> against the usual rules. And I may well be to blame for it (I have
> this memory of disliking how EFAULT as a return code didn't actually
> return the faulting address). And realistically, it's also just not
> something that any normal application will ever hit.  Giving invalid
> addresses to system calls is basically always a bug, although there
> are always special software that do all the crazy corner cases (ie
> things like emulators tend to do odd things).
>
> I doubt such special software exists on Linux/alpha, though.
>
> So I wouldn't worry about those kinds of oddities overmuch.
>
> *If* somebody then finds a load that cares, we can always fix it
> later, and I'll go "mea culpa, I didn't think it would matter, and I
> was wrong".

AFAICS, the only applications which really care about the return
code are
- testsuites like LTP (i.e. the fstat05 testcase)
- some (not just debian) packages execute tests at the end of the
   build process and verify the return codes, i.e. libsigsegv.

The differences between the architectures is currently often taken care
of via #ifdefs, so if the return code is harmonized across platforms
it would at least help to simplify such testcases.

Helge

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC][PATCHSET] VM_FAULT_RETRY fixes
@ 2023-02-01  8:21         ` Helge Deller
  0 siblings, 0 replies; 120+ messages in thread
From: Helge Deller @ 2023-02-01  8:21 UTC (permalink / raw)
  To: Linus Torvalds, Al Viro
  Cc: Peter Xu, linux-arch, linux-alpha, linux-ia64, linux-hexagon,
	linux-m68k, Michal Simek, Dinh Nguyen, openrisc, linux-parisc,
	linux-riscv, sparclinux

On 1/31/23 22:19, Linus Torvalds wrote:
> On Tue, Jan 31, 2023 at 1:10 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>>
>> Umm...  What about the semantics of get_user() of unmapped address?
>> Some architectures do quiet EFAULT; some (including alpha) hit
>> the sucker with SIGBUS, no matter what.
>
> I think we should strive to just make this all common.
>
> The reason alpha is different is almost certainly not intentional, but
> a combination of "pure accident" and "nobody actually cares".
>
>> Are we free to modify that behaviour, or is that part of arch-specific
>> ABI?
>
> I'd just unify this all, probably with a preference for existing
> semantics on x86 (because of "biggest and most varied user base").
>
> That whole "send SIGBUS even for kernel faults" is certainly bogus and
> against the usual rules. And I may well be to blame for it (I have
> this memory of disliking how EFAULT as a return code didn't actually
> return the faulting address). And realistically, it's also just not
> something that any normal application will ever hit.  Giving invalid
> addresses to system calls is basically always a bug, although there
> are always special software that do all the crazy corner cases (ie
> things like emulators tend to do odd things).
>
> I doubt such special software exists on Linux/alpha, though.
>
> So I wouldn't worry about those kinds of oddities overmuch.
>
> *If* somebody then finds a load that cares, we can always fix it
> later, and I'll go "mea culpa, I didn't think it would matter, and I
> was wrong".

AFAICS, the only applications which really care about the return
code are
- testsuites like LTP (i.e. the fstat05 testcase)
- some (not just debian) packages execute tests at the end of the
   build process and verify the return codes, i.e. libsigsegv.

The differences between the architectures is currently often taken care
of via #ifdefs, so if the return code is harmonized across platforms
it would at least help to simplify such testcases.

Helge

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC][PATCHSET] VM_FAULT_RETRY fixes
  2023-01-31 20:02 ` Al Viro
  (?)
@ 2023-02-01 10:50   ` Mark Rutland
  -1 siblings, 0 replies; 120+ messages in thread
From: Mark Rutland @ 2023-02-01 10:50 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-arch, linux-alpha, linux-ia64, linux-hexagon, linux-m68k,
	Michal Simek, Dinh Nguyen, openrisc, linux-parisc, linux-riscv,
	sparclinux, Linus Torvalds

On Tue, Jan 31, 2023 at 08:02:51PM +0000, Al Viro wrote:
> On x86 it had been noticed and fixed back in 2014, in 26178ec11ef3 "x86:
> mm: consolidate VM_FAULT_RETRY handling".  Some of the other architectures
> had it dealt with later - e.g. arm in 2017, the fix is 746a272e44141
> "ARM: 8692/1: mm: abort uaccess retries upon fatal signal"; xtensa -
> in 2021, the fix is 7b9acbb6aad4f "xtensa: fix uaccess-related livelock
> in do_page_fault", etc.
> 
> However, it never had been done on a bunch of architectures - the
> current mainline still has that bug on alpha, hexagon, itanic, m68k,
> microblaze, nios2, openrisc, parisc, riscv and sparc (both sparc32 and
> sparc64).  Fixes are trivial, but I've no way to test them for most
> of those architectures.

FWIW, when I fixed arm and arm64 back in 2017, I did report the issue here with
a test case (and again in 2021, with maintainers all explciitly Cc'd):

  https://lore.kernel.org/lkml/20170822102527.GA14671@leverpostej/
  https://lore.kernel.org/linux-arch/20210121123140.GD48431@C02TD0UTHF1T.local/

... so if anyone has access to those architectures, that test might be useful
for verifying the fix.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC][PATCHSET] VM_FAULT_RETRY fixes
@ 2023-02-01 10:50   ` Mark Rutland
  0 siblings, 0 replies; 120+ messages in thread
From: Mark Rutland @ 2023-02-01 10:50 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-arch, linux-alpha, linux-ia64, linux-hexagon, linux-m68k,
	Michal Simek, Dinh Nguyen, openrisc, linux-parisc, linux-riscv,
	sparclinux, Linus Torvalds

On Tue, Jan 31, 2023 at 08:02:51PM +0000, Al Viro wrote:
> On x86 it had been noticed and fixed back in 2014, in 26178ec11ef3 "x86:
> mm: consolidate VM_FAULT_RETRY handling".  Some of the other architectures
> had it dealt with later - e.g. arm in 2017, the fix is 746a272e44141
> "ARM: 8692/1: mm: abort uaccess retries upon fatal signal"; xtensa -
> in 2021, the fix is 7b9acbb6aad4f "xtensa: fix uaccess-related livelock
> in do_page_fault", etc.
> 
> However, it never had been done on a bunch of architectures - the
> current mainline still has that bug on alpha, hexagon, itanic, m68k,
> microblaze, nios2, openrisc, parisc, riscv and sparc (both sparc32 and
> sparc64).  Fixes are trivial, but I've no way to test them for most
> of those architectures.

FWIW, when I fixed arm and arm64 back in 2017, I did report the issue here with
a test case (and again in 2021, with maintainers all explciitly Cc'd):

  https://lore.kernel.org/lkml/20170822102527.GA14671@leverpostej/
  https://lore.kernel.org/linux-arch/20210121123140.GD48431@C02TD0UTHF1T.local/

... so if anyone has access to those architectures, that test might be useful
for verifying the fix.

Thanks,
Mark.

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC][PATCHSET] VM_FAULT_RETRY fixes
@ 2023-02-01 10:50   ` Mark Rutland
  0 siblings, 0 replies; 120+ messages in thread
From: Mark Rutland @ 2023-02-01 10:50 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-arch, linux-alpha, linux-ia64, linux-hexagon, linux-m68k,
	Michal Simek, Dinh Nguyen, openrisc, linux-parisc, linux-riscv,
	sparclinux, Linus Torvalds

On Tue, Jan 31, 2023 at 08:02:51PM +0000, Al Viro wrote:
> On x86 it had been noticed and fixed back in 2014, in 26178ec11ef3 "x86:
> mm: consolidate VM_FAULT_RETRY handling".  Some of the other architectures
> had it dealt with later - e.g. arm in 2017, the fix is 746a272e44141
> "ARM: 8692/1: mm: abort uaccess retries upon fatal signal"; xtensa -
> in 2021, the fix is 7b9acbb6aad4f "xtensa: fix uaccess-related livelock
> in do_page_fault", etc.
> 
> However, it never had been done on a bunch of architectures - the
> current mainline still has that bug on alpha, hexagon, itanic, m68k,
> microblaze, nios2, openrisc, parisc, riscv and sparc (both sparc32 and
> sparc64).  Fixes are trivial, but I've no way to test them for most
> of those architectures.

FWIW, when I fixed arm and arm64 back in 2017, I did report the issue here with
a test case (and again in 2021, with maintainers all explciitly Cc'd):

  https://lore.kernel.org/lkml/20170822102527.GA14671@leverpostej/
  https://lore.kernel.org/linux-arch/20210121123140.GD48431@C02TD0UTHF1T.local/

... so if anyone has access to those architectures, that test might be useful
for verifying the fix.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC][PATCHSET] VM_FAULT_RETRY fixes
  2023-02-01  0:00           ` Linus Torvalds
  (?)
@ 2023-02-01 19:48             ` Peter Xu
  -1 siblings, 0 replies; 120+ messages in thread
From: Peter Xu @ 2023-02-01 19:48 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Al Viro, linux-arch, linux-alpha, linux-ia64, linux-hexagon,
	linux-m68k, Michal Simek, Dinh Nguyen, openrisc, linux-parisc,
	linux-riscv, sparclinux

On Tue, Jan 31, 2023 at 04:00:22PM -0800, Linus Torvalds wrote:
> So most of the time it's probably not going to matter all that much
> which signal gets sent in practice.

I do also see a common pattern of the possibility to have a generic fault
handler like generic_page_fault().

It probably should start with taking the mmap_sem until providing some
retval that is much easier to digest further by the arch-dependent code, so
it can directly do something rather than parsing the bitmask in a
duplicated way (hence the new retval should hopefully not a bitmask anymore
but a "what to do").

Maybe it can be something like:

/**
 * enum page_fault_retval - Higher level fault retval, generalized from
 * vm_fault_reason above that is only used by hardware page fault handlers.
 * It generalizes the bitmask-versioned retval into something that the arch
 * dependent code should react upon.
 *
 * @PF_RET_COMPLETED:		The page fault is completed successfully
 * @PF_RET_BAD_AREA:		The page fault address falls in a bad area
 *				(e.g., vma not found, expand_stack() fails..)
 * @PF_RET_ACCESS_ERR:		The page fault has access errors
 *				(e.g., write fault on !VM_WRITE vmas)
 * @PF_RET_KERN_FIXUP:		The page fault requires kernel fixups
 *				(e.g., during copy_to_user() but fault failed?)
 * @PF_RET_HWPOISON:		The page fault encountered poisoned pages
 * @PF_RET_SIGNAL:		The page fault encountered poisoned pages
 * ...
 */
enum page_fault_retval {
	PF_RET_DONE = 0,
	PF_RET_BAD_AREA,
	PF_RET_ACCESS_ERR,
	PF_RET_KERN_FIXUP,
        PF_RET_HWPOISON,
        PF_RET_SIGNAL,
	...
};

As a start we may still want to return some more information (perhaps still
the vm_fault_t alongside?  Or another union that will provide different
information based on different PF_RET_*).  One major thing is I see how we
handle VM_FAULT_HWPOISON and also the fact that we encode something more
into the bitmask on page sizes (VM_FAULT_HINDEX_MASK).

So the generic helper could, hopefully, hide the complexity of:

  - Taking and releasing of mmap lock
  - find_vma(), and also relevant checks on access or stack handling
  - handle_mm_fault() itself (of course...)
  - detect signals
  - handle page fault retries (so, in the new layer of retval there should
    have nothing telling it to retry; it should always be the ultimate result)
  - parse different errors into "what the arch code should do", and
    generalize the common ones, e.g.
    - OOM, do pagefault_out_of_memory() for user-mode
    - VM_FAULT_SIGSEGV, which should be able to merge into PF_RET_BAD_AREA?
    - ...

It'll simplify things if we can unify some small details like whether the
-EFAULT above should contain a sigbus.

A trivial detail I found when I was looking at this is, x86_64 passes in
different signals to kernelmode_fixup_or_oops() - in do_user_addr_fault()
there're three call sites and each of them pass over a differerent signal.
IIUC that will only make a difference if there's a nested page fault during
the vsyscall emulation (but I may be wrong too because I'm new to this
code), and I have no idea when it'll happen and whether that needs to be
strictly followed.

Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC][PATCHSET] VM_FAULT_RETRY fixes
@ 2023-02-01 19:48             ` Peter Xu
  0 siblings, 0 replies; 120+ messages in thread
From: Peter Xu @ 2023-02-01 19:48 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Al Viro, linux-arch, linux-alpha, linux-ia64, linux-hexagon,
	linux-m68k, Michal Simek, Dinh Nguyen, openrisc, linux-parisc,
	linux-riscv, sparclinux

On Tue, Jan 31, 2023 at 04:00:22PM -0800, Linus Torvalds wrote:
> So most of the time it's probably not going to matter all that much
> which signal gets sent in practice.

I do also see a common pattern of the possibility to have a generic fault
handler like generic_page_fault().

It probably should start with taking the mmap_sem until providing some
retval that is much easier to digest further by the arch-dependent code, so
it can directly do something rather than parsing the bitmask in a
duplicated way (hence the new retval should hopefully not a bitmask anymore
but a "what to do").

Maybe it can be something like:

/**
 * enum page_fault_retval - Higher level fault retval, generalized from
 * vm_fault_reason above that is only used by hardware page fault handlers.
 * It generalizes the bitmask-versioned retval into something that the arch
 * dependent code should react upon.
 *
 * @PF_RET_COMPLETED:		The page fault is completed successfully
 * @PF_RET_BAD_AREA:		The page fault address falls in a bad area
 *				(e.g., vma not found, expand_stack() fails..)
 * @PF_RET_ACCESS_ERR:		The page fault has access errors
 *				(e.g., write fault on !VM_WRITE vmas)
 * @PF_RET_KERN_FIXUP:		The page fault requires kernel fixups
 *				(e.g., during copy_to_user() but fault failed?)
 * @PF_RET_HWPOISON:		The page fault encountered poisoned pages
 * @PF_RET_SIGNAL:		The page fault encountered poisoned pages
 * ...
 */
enum page_fault_retval {
	PF_RET_DONE = 0,
	PF_RET_BAD_AREA,
	PF_RET_ACCESS_ERR,
	PF_RET_KERN_FIXUP,
        PF_RET_HWPOISON,
        PF_RET_SIGNAL,
	...
};

As a start we may still want to return some more information (perhaps still
the vm_fault_t alongside?  Or another union that will provide different
information based on different PF_RET_*).  One major thing is I see how we
handle VM_FAULT_HWPOISON and also the fact that we encode something more
into the bitmask on page sizes (VM_FAULT_HINDEX_MASK).

So the generic helper could, hopefully, hide the complexity of:

  - Taking and releasing of mmap lock
  - find_vma(), and also relevant checks on access or stack handling
  - handle_mm_fault() itself (of course...)
  - detect signals
  - handle page fault retries (so, in the new layer of retval there should
    have nothing telling it to retry; it should always be the ultimate result)
  - parse different errors into "what the arch code should do", and
    generalize the common ones, e.g.
    - OOM, do pagefault_out_of_memory() for user-mode
    - VM_FAULT_SIGSEGV, which should be able to merge into PF_RET_BAD_AREA?
    - ...

It'll simplify things if we can unify some small details like whether the
-EFAULT above should contain a sigbus.

A trivial detail I found when I was looking at this is, x86_64 passes in
different signals to kernelmode_fixup_or_oops() - in do_user_addr_fault()
there're three call sites and each of them pass over a differerent signal.
IIUC that will only make a difference if there's a nested page fault during
the vsyscall emulation (but I may be wrong too because I'm new to this
code), and I have no idea when it'll happen and whether that needs to be
strictly followed.

Thanks,

-- 
Peter Xu


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC][PATCHSET] VM_FAULT_RETRY fixes
@ 2023-02-01 19:48             ` Peter Xu
  0 siblings, 0 replies; 120+ messages in thread
From: Peter Xu @ 2023-02-01 19:48 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Al Viro, linux-arch, linux-alpha, linux-ia64, linux-hexagon,
	linux-m68k, Michal Simek, Dinh Nguyen, openrisc, linux-parisc,
	linux-riscv, sparclinux

On Tue, Jan 31, 2023 at 04:00:22PM -0800, Linus Torvalds wrote:
> So most of the time it's probably not going to matter all that much
> which signal gets sent in practice.

I do also see a common pattern of the possibility to have a generic fault
handler like generic_page_fault().

It probably should start with taking the mmap_sem until providing some
retval that is much easier to digest further by the arch-dependent code, so
it can directly do something rather than parsing the bitmask in a
duplicated way (hence the new retval should hopefully not a bitmask anymore
but a "what to do").

Maybe it can be something like:

/**
 * enum page_fault_retval - Higher level fault retval, generalized from
 * vm_fault_reason above that is only used by hardware page fault handlers.
 * It generalizes the bitmask-versioned retval into something that the arch
 * dependent code should react upon.
 *
 * @PF_RET_COMPLETED:		The page fault is completed successfully
 * @PF_RET_BAD_AREA:		The page fault address falls in a bad area
 *				(e.g., vma not found, expand_stack() fails..)
 * @PF_RET_ACCESS_ERR:		The page fault has access errors
 *				(e.g., write fault on !VM_WRITE vmas)
 * @PF_RET_KERN_FIXUP:		The page fault requires kernel fixups
 *				(e.g., during copy_to_user() but fault failed?)
 * @PF_RET_HWPOISON:		The page fault encountered poisoned pages
 * @PF_RET_SIGNAL:		The page fault encountered poisoned pages
 * ...
 */
enum page_fault_retval {
	PF_RET_DONE = 0,
	PF_RET_BAD_AREA,
	PF_RET_ACCESS_ERR,
	PF_RET_KERN_FIXUP,
        PF_RET_HWPOISON,
        PF_RET_SIGNAL,
	...
};

As a start we may still want to return some more information (perhaps still
the vm_fault_t alongside?  Or another union that will provide different
information based on different PF_RET_*).  One major thing is I see how we
handle VM_FAULT_HWPOISON and also the fact that we encode something more
into the bitmask on page sizes (VM_FAULT_HINDEX_MASK).

So the generic helper could, hopefully, hide the complexity of:

  - Taking and releasing of mmap lock
  - find_vma(), and also relevant checks on access or stack handling
  - handle_mm_fault() itself (of course...)
  - detect signals
  - handle page fault retries (so, in the new layer of retval there should
    have nothing telling it to retry; it should always be the ultimate result)
  - parse different errors into "what the arch code should do", and
    generalize the common ones, e.g.
    - OOM, do pagefault_out_of_memory() for user-mode
    - VM_FAULT_SIGSEGV, which should be able to merge into PF_RET_BAD_AREA?
    - ...

It'll simplify things if we can unify some small details like whether the
-EFAULT above should contain a sigbus.

A trivial detail I found when I was looking at this is, x86_64 passes in
different signals to kernelmode_fixup_or_oops() - in do_user_addr_fault()
there're three call sites and each of them pass over a differerent signal.
IIUC that will only make a difference if there's a nested page fault during
the vsyscall emulation (but I may be wrong too because I'm new to this
code), and I have no idea when it'll happen and whether that needs to be
strictly followed.

Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC][PATCHSET] VM_FAULT_RETRY fixes
  2023-02-01  8:21         ` Helge Deller
  (?)
@ 2023-02-01 19:51           ` Linus Torvalds
  -1 siblings, 0 replies; 120+ messages in thread
From: Linus Torvalds @ 2023-02-01 19:51 UTC (permalink / raw)
  To: Helge Deller
  Cc: Al Viro, Peter Xu, linux-arch, linux-alpha, linux-ia64,
	linux-hexagon, linux-m68k, Michal Simek, Dinh Nguyen, openrisc,
	linux-parisc, linux-riscv, sparclinux

On Wed, Feb 1, 2023 at 12:21 AM Helge Deller <deller@gmx.de> wrote:
>
> AFAICS, the only applications which really care about the return
> code are
> - testsuites like LTP (i.e. the fstat05 testcase)

Those have actually shown issues with various library implementations,
exactly because real system calls act very differently in this area
from library wrappers.

Things like the vdso implementation of gettimeofday() get a SIGSEGV if
the timeval or timezone pointer is invalid, while the "real system
call" version returns -1/EFAULT instead.

And very similar things happen when glibc ends up wrapping system
calls and converting buffers manually. At some point, glibc had a
special 'struct stat' and basically converted the native system call
to it, so you did 'stat()' on something, and it ended up actually
using a private on-stack buffer for the system call, followed by a
"convert that kernel 'struct stat' to the glibc 'struct stat'" phase.
So once again, instead of -1/EFAULT, you'd first have a successful
system call, and then get a SIGSEGV  in glibc.

And as you say, test suites would notice. But no actual normal app
would ever care.

Of course, there's always the abnormal apps. There _are_ the odd cases
that actually catch faults and fix them up, and can then be confused
by changes like that.

It's very very rare, but it happens - things like emulators do tend to
do some really strange things.

         Linus

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC][PATCHSET] VM_FAULT_RETRY fixes
@ 2023-02-01 19:51           ` Linus Torvalds
  0 siblings, 0 replies; 120+ messages in thread
From: Linus Torvalds @ 2023-02-01 19:51 UTC (permalink / raw)
  To: Helge Deller
  Cc: Al Viro, Peter Xu, linux-arch, linux-alpha, linux-ia64,
	linux-hexagon, linux-m68k, Michal Simek, Dinh Nguyen, openrisc,
	linux-parisc, linux-riscv, sparclinux

On Wed, Feb 1, 2023 at 12:21 AM Helge Deller <deller@gmx.de> wrote:
>
> AFAICS, the only applications which really care about the return
> code are
> - testsuites like LTP (i.e. the fstat05 testcase)

Those have actually shown issues with various library implementations,
exactly because real system calls act very differently in this area
from library wrappers.

Things like the vdso implementation of gettimeofday() get a SIGSEGV if
the timeval or timezone pointer is invalid, while the "real system
call" version returns -1/EFAULT instead.

And very similar things happen when glibc ends up wrapping system
calls and converting buffers manually. At some point, glibc had a
special 'struct stat' and basically converted the native system call
to it, so you did 'stat()' on something, and it ended up actually
using a private on-stack buffer for the system call, followed by a
"convert that kernel 'struct stat' to the glibc 'struct stat'" phase.
So once again, instead of -1/EFAULT, you'd first have a successful
system call, and then get a SIGSEGV  in glibc.

And as you say, test suites would notice. But no actual normal app
would ever care.

Of course, there's always the abnormal apps. There _are_ the odd cases
that actually catch faults and fix them up, and can then be confused
by changes like that.

It's very very rare, but it happens - things like emulators do tend to
do some really strange things.

         Linus

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC][PATCHSET] VM_FAULT_RETRY fixes
@ 2023-02-01 19:51           ` Linus Torvalds
  0 siblings, 0 replies; 120+ messages in thread
From: Linus Torvalds @ 2023-02-01 19:51 UTC (permalink / raw)
  To: Helge Deller
  Cc: Al Viro, Peter Xu, linux-arch, linux-alpha, linux-ia64,
	linux-hexagon, linux-m68k, Michal Simek, Dinh Nguyen, openrisc,
	linux-parisc, linux-riscv, sparclinux

On Wed, Feb 1, 2023 at 12:21 AM Helge Deller <deller@gmx.de> wrote:
>
> AFAICS, the only applications which really care about the return
> code are
> - testsuites like LTP (i.e. the fstat05 testcase)

Those have actually shown issues with various library implementations,
exactly because real system calls act very differently in this area
from library wrappers.

Things like the vdso implementation of gettimeofday() get a SIGSEGV if
the timeval or timezone pointer is invalid, while the "real system
call" version returns -1/EFAULT instead.

And very similar things happen when glibc ends up wrapping system
calls and converting buffers manually. At some point, glibc had a
special 'struct stat' and basically converted the native system call
to it, so you did 'stat()' on something, and it ended up actually
using a private on-stack buffer for the system call, followed by a
"convert that kernel 'struct stat' to the glibc 'struct stat'" phase.
So once again, instead of -1/EFAULT, you'd first have a successful
system call, and then get a SIGSEGV  in glibc.

And as you say, test suites would notice. But no actual normal app
would ever care.

Of course, there's always the abnormal apps. There _are_ the odd cases
that actually catch faults and fix them up, and can then be confused
by changes like that.

It's very very rare, but it happens - things like emulators do tend to
do some really strange things.

         Linus

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC][PATCHSET] VM_FAULT_RETRY fixes
  2023-02-01 19:48             ` Peter Xu
  (?)
@ 2023-02-01 22:18               ` Al Viro
  -1 siblings, 0 replies; 120+ messages in thread
From: Al Viro @ 2023-02-01 22:18 UTC (permalink / raw)
  To: Peter Xu
  Cc: Linus Torvalds, linux-arch, linux-alpha, linux-ia64,
	linux-hexagon, linux-m68k, Michal Simek, Dinh Nguyen, openrisc,
	linux-parisc, linux-riscv, sparclinux

On Wed, Feb 01, 2023 at 02:48:22PM -0500, Peter Xu wrote:

> I do also see a common pattern of the possibility to have a generic fault
> handler like generic_page_fault().
> 
> It probably should start with taking the mmap_sem until providing some
> retval that is much easier to digest further by the arch-dependent code, so
> it can directly do something rather than parsing the bitmask in a
> duplicated way (hence the new retval should hopefully not a bitmask anymore
> but a "what to do").
> 
> Maybe it can be something like:
> 
> /**
>  * enum page_fault_retval - Higher level fault retval, generalized from
>  * vm_fault_reason above that is only used by hardware page fault handlers.
>  * It generalizes the bitmask-versioned retval into something that the arch
>  * dependent code should react upon.
>  *
>  * @PF_RET_COMPLETED:		The page fault is completed successfully
>  * @PF_RET_BAD_AREA:		The page fault address falls in a bad area
>  *				(e.g., vma not found, expand_stack() fails..)

FWIW, there's a fun discrepancy - VM_FAULT_SIGSEGV may yield SEGV_MAPERR
or SEGV_ACCERR; depends upon the architecture.  Not that there'd been
many places that return VM_FAULT_SIGSEGV these days...  Good thing, too,
since otherwise e.g. csky would oops...

>  * @PF_RET_ACCESS_ERR:		The page fault has access errors
>  *				(e.g., write fault on !VM_WRITE vmas)
>  * @PF_RET_KERN_FIXUP:		The page fault requires kernel fixups
>  *				(e.g., during copy_to_user() but fault failed?)
>  * @PF_RET_HWPOISON:		The page fault encountered poisoned pages
>  * @PF_RET_SIGNAL:		The page fault encountered poisoned pages

??

>  * ...
>  */
> enum page_fault_retval {
> 	PF_RET_DONE = 0,
> 	PF_RET_BAD_AREA,
> 	PF_RET_ACCESS_ERR,
> 	PF_RET_KERN_FIXUP,
>         PF_RET_HWPOISON,
>         PF_RET_SIGNAL,
> 	...
> };
> 
> As a start we may still want to return some more information (perhaps still
> the vm_fault_t alongside?  Or another union that will provide different
> information based on different PF_RET_*).  One major thing is I see how we
> handle VM_FAULT_HWPOISON and also the fact that we encode something more
> into the bitmask on page sizes (VM_FAULT_HINDEX_MASK).
> 
> So the generic helper could, hopefully, hide the complexity of:
> 
>   - Taking and releasing of mmap lock
>   - find_vma(), and also relevant checks on access or stack handling

Umm...  arm is a bit special here:
                if (addr < FIRST_USER_ADDRESS)
			return VM_FAULT_BADMAP;
with no counterparts elsewhere.

>   - handle_mm_fault() itself (of course...)
>   - detect signals
>   - handle page fault retries (so, in the new layer of retval there should
>     have nothing telling it to retry; it should always be the ultimate result)

agreed.

    - unlock mmap; don't leave that to caller.

>   - parse different errors into "what the arch code should do", and
>     generalize the common ones, e.g.
>     - OOM, do pagefault_out_of_memory() for user-mode
>     - VM_FAULT_SIGSEGV, which should be able to merge into PF_RET_BAD_AREA?
>     - ...

AFAICS, all errors in kernel mode => no_context.

> It'll simplify things if we can unify some small details like whether the
> -EFAULT above should contain a sigbus.
> 
> A trivial detail I found when I was looking at this is, x86_64 passes in
> different signals to kernelmode_fixup_or_oops() - in do_user_addr_fault()
> there're three call sites and each of them pass over a differerent signal.
> IIUC that will only make a difference if there's a nested page fault during
> the vsyscall emulation (but I may be wrong too because I'm new to this
> code), and I have no idea when it'll happen and whether that needs to be
> strictly followed.

From my (very incomplete so far) dig through that pile:
	Q: do we still have the cases when handle_mm_fault() does
not return any of VM_FAULT_COMPLETED | VM_FAULT_RETRY | VM_FAULT_ERROR?
That gets treated as unlock + VM_FAULT_COMPLETED, but do we still need
that?
	Q: can VM_FAULT_RETRY be mixed with anything in VM_FAULT_ERROR?
What locking, if that happens?
	* details of storing the fault details (for ptrace, mostly)
vary a lot; no chance to unify, AFAICS.
	* requirements for vma flags also differ; e.g. read fault on
alpha is explicitly OK with absence of VM_READ if VM_WRITE is there.
Probably should go by way of arm and pass the mask that must
have non-empty intersection with vma->vm_flags?  Because *that*
is very likely to be a part of ABI - mmap(2) callers that rely
upon the flags being OK for given architecture are quite possible.
	* mmap lock is also quite variable in how it's taken;
x86 and arm have fun dance with trylock/search for exception handler/etc.
Other architectures do not; OTOH, there's a prefetch stuck in itanic
variant, with comment about mmap_sem being performance-critical...
	* logics for stack expansion includes this twist:
        if (!(vma->vm_flags & VM_GROWSDOWN))
                goto map_err;
        if (user_mode(regs)) {
                /* Accessing the stack below usp is always a bug.  The
                   "+ 256" is there due to some instructions doing
                   pre-decrement on the stack and that doesn't show up
                   until later.  */
                if (address + 256 < rdusp())
                        goto map_err;
        }
        if (expand_stack(vma, address))
                goto map_err;
That's m68k; ISTR similar considerations elsewhere, but I could be
wrong.

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC][PATCHSET] VM_FAULT_RETRY fixes
@ 2023-02-01 22:18               ` Al Viro
  0 siblings, 0 replies; 120+ messages in thread
From: Al Viro @ 2023-02-01 22:18 UTC (permalink / raw)
  To: Peter Xu
  Cc: Linus Torvalds, linux-arch, linux-alpha, linux-ia64,
	linux-hexagon, linux-m68k, Michal Simek, Dinh Nguyen, openrisc,
	linux-parisc, linux-riscv, sparclinux

On Wed, Feb 01, 2023 at 02:48:22PM -0500, Peter Xu wrote:

> I do also see a common pattern of the possibility to have a generic fault
> handler like generic_page_fault().
> 
> It probably should start with taking the mmap_sem until providing some
> retval that is much easier to digest further by the arch-dependent code, so
> it can directly do something rather than parsing the bitmask in a
> duplicated way (hence the new retval should hopefully not a bitmask anymore
> but a "what to do").
> 
> Maybe it can be something like:
> 
> /**
>  * enum page_fault_retval - Higher level fault retval, generalized from
>  * vm_fault_reason above that is only used by hardware page fault handlers.
>  * It generalizes the bitmask-versioned retval into something that the arch
>  * dependent code should react upon.
>  *
>  * @PF_RET_COMPLETED:		The page fault is completed successfully
>  * @PF_RET_BAD_AREA:		The page fault address falls in a bad area
>  *				(e.g., vma not found, expand_stack() fails..)

FWIW, there's a fun discrepancy - VM_FAULT_SIGSEGV may yield SEGV_MAPERR
or SEGV_ACCERR; depends upon the architecture.  Not that there'd been
many places that return VM_FAULT_SIGSEGV these days...  Good thing, too,
since otherwise e.g. csky would oops...

>  * @PF_RET_ACCESS_ERR:		The page fault has access errors
>  *				(e.g., write fault on !VM_WRITE vmas)
>  * @PF_RET_KERN_FIXUP:		The page fault requires kernel fixups
>  *				(e.g., during copy_to_user() but fault failed?)
>  * @PF_RET_HWPOISON:		The page fault encountered poisoned pages
>  * @PF_RET_SIGNAL:		The page fault encountered poisoned pages

??

>  * ...
>  */
> enum page_fault_retval {
> 	PF_RET_DONE = 0,
> 	PF_RET_BAD_AREA,
> 	PF_RET_ACCESS_ERR,
> 	PF_RET_KERN_FIXUP,
>         PF_RET_HWPOISON,
>         PF_RET_SIGNAL,
> 	...
> };
> 
> As a start we may still want to return some more information (perhaps still
> the vm_fault_t alongside?  Or another union that will provide different
> information based on different PF_RET_*).  One major thing is I see how we
> handle VM_FAULT_HWPOISON and also the fact that we encode something more
> into the bitmask on page sizes (VM_FAULT_HINDEX_MASK).
> 
> So the generic helper could, hopefully, hide the complexity of:
> 
>   - Taking and releasing of mmap lock
>   - find_vma(), and also relevant checks on access or stack handling

Umm...  arm is a bit special here:
                if (addr < FIRST_USER_ADDRESS)
			return VM_FAULT_BADMAP;
with no counterparts elsewhere.

>   - handle_mm_fault() itself (of course...)
>   - detect signals
>   - handle page fault retries (so, in the new layer of retval there should
>     have nothing telling it to retry; it should always be the ultimate result)

agreed.

    - unlock mmap; don't leave that to caller.

>   - parse different errors into "what the arch code should do", and
>     generalize the common ones, e.g.
>     - OOM, do pagefault_out_of_memory() for user-mode
>     - VM_FAULT_SIGSEGV, which should be able to merge into PF_RET_BAD_AREA?
>     - ...

AFAICS, all errors in kernel mode => no_context.

> It'll simplify things if we can unify some small details like whether the
> -EFAULT above should contain a sigbus.
> 
> A trivial detail I found when I was looking at this is, x86_64 passes in
> different signals to kernelmode_fixup_or_oops() - in do_user_addr_fault()
> there're three call sites and each of them pass over a differerent signal.
> IIUC that will only make a difference if there's a nested page fault during
> the vsyscall emulation (but I may be wrong too because I'm new to this
> code), and I have no idea when it'll happen and whether that needs to be
> strictly followed.

From my (very incomplete so far) dig through that pile:
	Q: do we still have the cases when handle_mm_fault() does
not return any of VM_FAULT_COMPLETED | VM_FAULT_RETRY | VM_FAULT_ERROR?
That gets treated as unlock + VM_FAULT_COMPLETED, but do we still need
that?
	Q: can VM_FAULT_RETRY be mixed with anything in VM_FAULT_ERROR?
What locking, if that happens?
	* details of storing the fault details (for ptrace, mostly)
vary a lot; no chance to unify, AFAICS.
	* requirements for vma flags also differ; e.g. read fault on
alpha is explicitly OK with absence of VM_READ if VM_WRITE is there.
Probably should go by way of arm and pass the mask that must
have non-empty intersection with vma->vm_flags?  Because *that*
is very likely to be a part of ABI - mmap(2) callers that rely
upon the flags being OK for given architecture are quite possible.
	* mmap lock is also quite variable in how it's taken;
x86 and arm have fun dance with trylock/search for exception handler/etc.
Other architectures do not; OTOH, there's a prefetch stuck in itanic
variant, with comment about mmap_sem being performance-critical...
	* logics for stack expansion includes this twist:
        if (!(vma->vm_flags & VM_GROWSDOWN))
                goto map_err;
        if (user_mode(regs)) {
                /* Accessing the stack below usp is always a bug.  The
                   "+ 256" is there due to some instructions doing
                   pre-decrement on the stack and that doesn't show up
                   until later.  */
                if (address + 256 < rdusp())
                        goto map_err;
        }
        if (expand_stack(vma, address))
                goto map_err;
That's m68k; ISTR similar considerations elsewhere, but I could be
wrong.

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC][PATCHSET] VM_FAULT_RETRY fixes
@ 2023-02-01 22:18               ` Al Viro
  0 siblings, 0 replies; 120+ messages in thread
From: Al Viro @ 2023-02-01 22:18 UTC (permalink / raw)
  To: Peter Xu
  Cc: Linus Torvalds, linux-arch, linux-alpha, linux-ia64,
	linux-hexagon, linux-m68k, Michal Simek, Dinh Nguyen, openrisc,
	linux-parisc, linux-riscv, sparclinux

On Wed, Feb 01, 2023 at 02:48:22PM -0500, Peter Xu wrote:

> I do also see a common pattern of the possibility to have a generic fault
> handler like generic_page_fault().
> 
> It probably should start with taking the mmap_sem until providing some
> retval that is much easier to digest further by the arch-dependent code, so
> it can directly do something rather than parsing the bitmask in a
> duplicated way (hence the new retval should hopefully not a bitmask anymore
> but a "what to do").
> 
> Maybe it can be something like:
> 
> /**
>  * enum page_fault_retval - Higher level fault retval, generalized from
>  * vm_fault_reason above that is only used by hardware page fault handlers.
>  * It generalizes the bitmask-versioned retval into something that the arch
>  * dependent code should react upon.
>  *
>  * @PF_RET_COMPLETED:		The page fault is completed successfully
>  * @PF_RET_BAD_AREA:		The page fault address falls in a bad area
>  *				(e.g., vma not found, expand_stack() fails..)

FWIW, there's a fun discrepancy - VM_FAULT_SIGSEGV may yield SEGV_MAPERR
or SEGV_ACCERR; depends upon the architecture.  Not that there'd been
many places that return VM_FAULT_SIGSEGV these days...  Good thing, too,
since otherwise e.g. csky would oops...

>  * @PF_RET_ACCESS_ERR:		The page fault has access errors
>  *				(e.g., write fault on !VM_WRITE vmas)
>  * @PF_RET_KERN_FIXUP:		The page fault requires kernel fixups
>  *				(e.g., during copy_to_user() but fault failed?)
>  * @PF_RET_HWPOISON:		The page fault encountered poisoned pages
>  * @PF_RET_SIGNAL:		The page fault encountered poisoned pages

??

>  * ...
>  */
> enum page_fault_retval {
> 	PF_RET_DONE = 0,
> 	PF_RET_BAD_AREA,
> 	PF_RET_ACCESS_ERR,
> 	PF_RET_KERN_FIXUP,
>         PF_RET_HWPOISON,
>         PF_RET_SIGNAL,
> 	...
> };
> 
> As a start we may still want to return some more information (perhaps still
> the vm_fault_t alongside?  Or another union that will provide different
> information based on different PF_RET_*).  One major thing is I see how we
> handle VM_FAULT_HWPOISON and also the fact that we encode something more
> into the bitmask on page sizes (VM_FAULT_HINDEX_MASK).
> 
> So the generic helper could, hopefully, hide the complexity of:
> 
>   - Taking and releasing of mmap lock
>   - find_vma(), and also relevant checks on access or stack handling

Umm...  arm is a bit special here:
                if (addr < FIRST_USER_ADDRESS)
			return VM_FAULT_BADMAP;
with no counterparts elsewhere.

>   - handle_mm_fault() itself (of course...)
>   - detect signals
>   - handle page fault retries (so, in the new layer of retval there should
>     have nothing telling it to retry; it should always be the ultimate result)

agreed.

    - unlock mmap; don't leave that to caller.

>   - parse different errors into "what the arch code should do", and
>     generalize the common ones, e.g.
>     - OOM, do pagefault_out_of_memory() for user-mode
>     - VM_FAULT_SIGSEGV, which should be able to merge into PF_RET_BAD_AREA?
>     - ...

AFAICS, all errors in kernel mode => no_context.

> It'll simplify things if we can unify some small details like whether the
> -EFAULT above should contain a sigbus.
> 
> A trivial detail I found when I was looking at this is, x86_64 passes in
> different signals to kernelmode_fixup_or_oops() - in do_user_addr_fault()
> there're three call sites and each of them pass over a differerent signal.
> IIUC that will only make a difference if there's a nested page fault during
> the vsyscall emulation (but I may be wrong too because I'm new to this
> code), and I have no idea when it'll happen and whether that needs to be
> strictly followed.

From my (very incomplete so far) dig through that pile:
	Q: do we still have the cases when handle_mm_fault() does
not return any of VM_FAULT_COMPLETED | VM_FAULT_RETRY | VM_FAULT_ERROR?
That gets treated as unlock + VM_FAULT_COMPLETED, but do we still need
that?
	Q: can VM_FAULT_RETRY be mixed with anything in VM_FAULT_ERROR?
What locking, if that happens?
	* details of storing the fault details (for ptrace, mostly)
vary a lot; no chance to unify, AFAICS.
	* requirements for vma flags also differ; e.g. read fault on
alpha is explicitly OK with absence of VM_READ if VM_WRITE is there.
Probably should go by way of arm and pass the mask that must
have non-empty intersection with vma->vm_flags?  Because *that*
is very likely to be a part of ABI - mmap(2) callers that rely
upon the flags being OK for given architecture are quite possible.
	* mmap lock is also quite variable in how it's taken;
x86 and arm have fun dance with trylock/search for exception handler/etc.
Other architectures do not; OTOH, there's a prefetch stuck in itanic
variant, with comment about mmap_sem being performance-critical...
	* logics for stack expansion includes this twist:
        if (!(vma->vm_flags & VM_GROWSDOWN))
                goto map_err;
        if (user_mode(regs)) {
                /* Accessing the stack below usp is always a bug.  The
                   "+ 256" is there due to some instructions doing
                   pre-decrement on the stack and that doesn't show up
                   until later.  */
                if (address + 256 < rdusp())
                        goto map_err;
        }
        if (expand_stack(vma, address))
                goto map_err;
That's m68k; ISTR similar considerations elsewhere, but I could be
wrong.

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC][PATCHSET] VM_FAULT_RETRY fixes
  2023-02-01 22:18               ` Al Viro
  (?)
@ 2023-02-02  0:57                 ` Al Viro
  -1 siblings, 0 replies; 120+ messages in thread
From: Al Viro @ 2023-02-02  0:57 UTC (permalink / raw)
  To: Peter Xu
  Cc: Linus Torvalds, linux-arch, linux-alpha, linux-ia64,
	linux-hexagon, linux-m68k, Michal Simek, Dinh Nguyen, openrisc,
	linux-parisc, linux-riscv, sparclinux

On Wed, Feb 01, 2023 at 10:18:11PM +0000, Al Viro wrote:
> 	* logics for stack expansion includes this twist:
>         if (!(vma->vm_flags & VM_GROWSDOWN))
>                 goto map_err;
>         if (user_mode(regs)) {
>                 /* Accessing the stack below usp is always a bug.  The
>                    "+ 256" is there due to some instructions doing
>                    pre-decrement on the stack and that doesn't show up
>                    until later.  */
>                 if (address + 256 < rdusp())
>                         goto map_err;
>         }
>         if (expand_stack(vma, address))
>                 goto map_err;
> That's m68k; ISTR similar considerations elsewhere, but I could be
> wrong.

Hell, yes - 
        if (!(vma->vm_flags & VM_GROWSDOWN))
                goto bad_area;
        if (!(fault_code & FAULT_CODE_WRITE)) {
                /* Non-faulting loads shouldn't expand stack. */
                insn = get_fault_insn(regs, insn);
                if ((insn & 0xc0800000) == 0xc0800000) {
                        unsigned char asi;

                        if (insn & 0x2000)
                                asi = (regs->tstate >> 24);
                        else
                                asi = (insn >> 5);
                        if ((asi & 0xf2) == 0x82)
                                goto bad_area;
                }
        }
        if (expand_stack(vma, address))
                goto bad_area;

Note that it's very much not a bug - it's a nonfaulting (== speculative)
load, and the place where we are heading from bad_area in this case is
this in do_kernel_fault():
        if (!(fault_code & (FAULT_CODE_WRITE|FAULT_CODE_ITLB)) &&
            (insn & 0xc0800000) == 0xc0800000) {
                if (insn & 0x2000)
                        asi = (regs->tstate >> 24);
                else  
                        asi = (insn >> 5);
                if ((asi & 0xf2) == 0x82) {
                        if (insn & 0x1000000) {
                                handle_ldf_stq(insn, regs);
                        } else {
                                /* This was a non-faulting load. Just clear the
                                 * destination register(s) and continue with the next
                                 * instruction. -jj
                                 */
                                handle_ld_nf(insn, regs);
                        }
                        return;

(the name is misguiding - it covers userland stuff as well; in this
particular case the triggering instruction is non-priveleged)

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC][PATCHSET] VM_FAULT_RETRY fixes
@ 2023-02-02  0:57                 ` Al Viro
  0 siblings, 0 replies; 120+ messages in thread
From: Al Viro @ 2023-02-02  0:57 UTC (permalink / raw)
  To: Peter Xu
  Cc: Linus Torvalds, linux-arch, linux-alpha, linux-ia64,
	linux-hexagon, linux-m68k, Michal Simek, Dinh Nguyen, openrisc,
	linux-parisc, linux-riscv, sparclinux

On Wed, Feb 01, 2023 at 10:18:11PM +0000, Al Viro wrote:
> 	* logics for stack expansion includes this twist:
>         if (!(vma->vm_flags & VM_GROWSDOWN))
>                 goto map_err;
>         if (user_mode(regs)) {
>                 /* Accessing the stack below usp is always a bug.  The
>                    "+ 256" is there due to some instructions doing
>                    pre-decrement on the stack and that doesn't show up
>                    until later.  */
>                 if (address + 256 < rdusp())
>                         goto map_err;
>         }
>         if (expand_stack(vma, address))
>                 goto map_err;
> That's m68k; ISTR similar considerations elsewhere, but I could be
> wrong.

Hell, yes - 
        if (!(vma->vm_flags & VM_GROWSDOWN))
                goto bad_area;
        if (!(fault_code & FAULT_CODE_WRITE)) {
                /* Non-faulting loads shouldn't expand stack. */
                insn = get_fault_insn(regs, insn);
                if ((insn & 0xc0800000) == 0xc0800000) {
                        unsigned char asi;

                        if (insn & 0x2000)
                                asi = (regs->tstate >> 24);
                        else
                                asi = (insn >> 5);
                        if ((asi & 0xf2) == 0x82)
                                goto bad_area;
                }
        }
        if (expand_stack(vma, address))
                goto bad_area;

Note that it's very much not a bug - it's a nonfaulting (== speculative)
load, and the place where we are heading from bad_area in this case is
this in do_kernel_fault():
        if (!(fault_code & (FAULT_CODE_WRITE|FAULT_CODE_ITLB)) &&
            (insn & 0xc0800000) == 0xc0800000) {
                if (insn & 0x2000)
                        asi = (regs->tstate >> 24);
                else  
                        asi = (insn >> 5);
                if ((asi & 0xf2) == 0x82) {
                        if (insn & 0x1000000) {
                                handle_ldf_stq(insn, regs);
                        } else {
                                /* This was a non-faulting load. Just clear the
                                 * destination register(s) and continue with the next
                                 * instruction. -jj
                                 */
                                handle_ld_nf(insn, regs);
                        }
                        return;

(the name is misguiding - it covers userland stuff as well; in this
particular case the triggering instruction is non-priveleged)

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC][PATCHSET] VM_FAULT_RETRY fixes
@ 2023-02-02  0:57                 ` Al Viro
  0 siblings, 0 replies; 120+ messages in thread
From: Al Viro @ 2023-02-02  0:57 UTC (permalink / raw)
  To: Peter Xu
  Cc: Linus Torvalds, linux-arch, linux-alpha, linux-ia64,
	linux-hexagon, linux-m68k, Michal Simek, Dinh Nguyen, openrisc,
	linux-parisc, linux-riscv, sparclinux

On Wed, Feb 01, 2023 at 10:18:11PM +0000, Al Viro wrote:
> 	* logics for stack expansion includes this twist:
>         if (!(vma->vm_flags & VM_GROWSDOWN))
>                 goto map_err;
>         if (user_mode(regs)) {
>                 /* Accessing the stack below usp is always a bug.  The
>                    "+ 256" is there due to some instructions doing
>                    pre-decrement on the stack and that doesn't show up
>                    until later.  */
>                 if (address + 256 < rdusp())
>                         goto map_err;
>         }
>         if (expand_stack(vma, address))
>                 goto map_err;
> That's m68k; ISTR similar considerations elsewhere, but I could be
> wrong.

Hell, yes - 
        if (!(vma->vm_flags & VM_GROWSDOWN))
                goto bad_area;
        if (!(fault_code & FAULT_CODE_WRITE)) {
                /* Non-faulting loads shouldn't expand stack. */
                insn = get_fault_insn(regs, insn);
                if ((insn & 0xc0800000) = 0xc0800000) {
                        unsigned char asi;

                        if (insn & 0x2000)
                                asi = (regs->tstate >> 24);
                        else
                                asi = (insn >> 5);
                        if ((asi & 0xf2) = 0x82)
                                goto bad_area;
                }
        }
        if (expand_stack(vma, address))
                goto bad_area;

Note that it's very much not a bug - it's a nonfaulting (= speculative)
load, and the place where we are heading from bad_area in this case is
this in do_kernel_fault():
        if (!(fault_code & (FAULT_CODE_WRITE|FAULT_CODE_ITLB)) &&
            (insn & 0xc0800000) = 0xc0800000) {
                if (insn & 0x2000)
                        asi = (regs->tstate >> 24);
                else  
                        asi = (insn >> 5);
                if ((asi & 0xf2) = 0x82) {
                        if (insn & 0x1000000) {
                                handle_ldf_stq(insn, regs);
                        } else {
                                /* This was a non-faulting load. Just clear the
                                 * destination register(s) and continue with the next
                                 * instruction. -jj
                                 */
                                handle_ld_nf(insn, regs);
                        }
                        return;

(the name is misguiding - it covers userland stuff as well; in this
particular case the triggering instruction is non-priveleged)

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC][PATCHSET] VM_FAULT_RETRY fixes
  2023-01-31 21:19       ` Linus Torvalds
@ 2023-02-02  6:58         ` Al Viro
  -1 siblings, 0 replies; 120+ messages in thread
From: Al Viro @ 2023-02-02  6:58 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Peter Xu, linux-arch, linux-alpha, Richard Henderson

On Tue, Jan 31, 2023 at 01:19:59PM -0800, Linus Torvalds wrote:
> On Tue, Jan 31, 2023 at 1:10 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > Umm...  What about the semantics of get_user() of unmapped address?
> > Some architectures do quiet EFAULT; some (including alpha) hit
> > the sucker with SIGBUS, no matter what.
> 
> I think we should strive to just make this all common.
> 
> The reason alpha is different is almost certainly not intentional, but
> a combination of "pure accident" and "nobody actually cares".

BTW, speaking of alpha page faults - maybe I'm misreading the manual,
but it seems to imply that interrupts are *not* disabled when entering
page fault handler:

Table 24–2 Entry Point Address Registers
Entry Point Value in a0             Value in a1   Value in a2         PS<IPL>
entArith    Exception summary       Register mask UNPREDICTABLE       Unchanged
entIF       Fault or trap type code UNPREDICTABLE UNPREDICTABLE       Unchanged
entInt      Interrupt type          Vector        Interrupt parameter Priority of interrupt
entMM       VA                      MMCSR         Cause               Unchanged
entSys      p0                      p1            p2                  Unchanged
entUna      VA                      Opcode        Src/Dst             Unchanged

So there's nothing to prevent an interrupt hitting just as we reach
entMM, with interrupt handler stepping on a vmalloc'ed area and
triggering another page fault.

If that is correct, this
                /* Synchronize this task's top level page-table
                   with the "reference" page table from init.  */
                long index = pgd_index(address);
                pgd_t *pgd, *pgd_k;

                pgd = current->active_mm->pgd + index;
                pgd_k = swapper_pg_dir + index;
                if (!pgd_present(*pgd) && pgd_present(*pgd_k)) {
                        pgd_val(*pgd) = pgd_val(*pgd_k);
                        return;
                }
                goto no_context;
is not just missing local_irq_save()/local_irq_restore() around that
fragment - if it finds pgd already present, it needs to check pte
before deciding to proceed to no_context.

Suppose we access vmalloc area and corresponding pgd is not
present in current->active_mm.  Just as we get to entMM,
an interrupt arrives and proceeds to access something
covered by the same pgd.  OK, current->active_mm is still
not present, we get another page fault and do_page_fault()
gets to the quoted code.  pgd is copied from the swapper_pg_dir,
do_page_fault() returns and we get back to the instruction in
interrupt handler that had triggered the second #PF.  This
time around it succeeds.  Once the interrupt handler completes
we are back to entMM.  Once *that* gets to do_page_fault()
we hit the quoted code again.  Only this time around pgd
*is* present and instead of returning we get to no_context.
And since it's been a normal access to vmalloc'ed memory,
there's nothing to be found in exception table.  Oops...

	AFAICS, one way to deal with that is to treat
(unlikely) pgd_present(*pgd) as "get to pte, return if
it looks legitimate, proceed to no_context if it isn't".

Other bugs in the same area:
	* we ought to compare address with VMALLOC_START,
not TASK_SIZE.
	* we ought to do that *before* checking for
kernel threads/pagefault_disable() being in effect.

Wait a minute - pgd_present() on alpha has become constant 1
since a73c948952cc "alpha: use pgtable-nopud instead of 4level-fixup"

So that thing had been completely broken for 3 years and nobody
had noticed.  And that's really completely broken - it stopped
copying top-level entries since that commit.

That's also not hard to fix, but...
	* CONFIG_ALPHA_LARGE_VMALLOC had been racy all along
	* it had very limited use (need for >8Gb of vmalloc
space)
	* it had stopped working in late 2019 and nobody cared.

How about removing that kludge?  Richard?

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC][PATCHSET] VM_FAULT_RETRY fixes
@ 2023-02-02  6:58         ` Al Viro
  0 siblings, 0 replies; 120+ messages in thread
From: Al Viro @ 2023-02-02  6:58 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Peter Xu, linux-arch, linux-alpha, Richard Henderson

On Tue, Jan 31, 2023 at 01:19:59PM -0800, Linus Torvalds wrote:
> On Tue, Jan 31, 2023 at 1:10 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > Umm...  What about the semantics of get_user() of unmapped address?
> > Some architectures do quiet EFAULT; some (including alpha) hit
> > the sucker with SIGBUS, no matter what.
> 
> I think we should strive to just make this all common.
> 
> The reason alpha is different is almost certainly not intentional, but
> a combination of "pure accident" and "nobody actually cares".

BTW, speaking of alpha page faults - maybe I'm misreading the manual,
but it seems to imply that interrupts are *not* disabled when entering
page fault handler:

Table 24–2 Entry Point Address Registers
Entry Point Value in a0             Value in a1   Value in a2         PS<IPL>
entArith    Exception summary       Register mask UNPREDICTABLE       Unchanged
entIF       Fault or trap type code UNPREDICTABLE UNPREDICTABLE       Unchanged
entInt      Interrupt type          Vector        Interrupt parameter Priority of interrupt
entMM       VA                      MMCSR         Cause               Unchanged
entSys      p0                      p1            p2                  Unchanged
entUna      VA                      Opcode        Src/Dst             Unchanged

So there's nothing to prevent an interrupt hitting just as we reach
entMM, with interrupt handler stepping on a vmalloc'ed area and
triggering another page fault.

If that is correct, this
                /* Synchronize this task's top level page-table
                   with the "reference" page table from init.  */
                long index = pgd_index(address);
                pgd_t *pgd, *pgd_k;

                pgd = current->active_mm->pgd + index;
                pgd_k = swapper_pg_dir + index;
                if (!pgd_present(*pgd) && pgd_present(*pgd_k)) {
                        pgd_val(*pgd) = pgd_val(*pgd_k);
                        return;
                }
                goto no_context;
is not just missing local_irq_save()/local_irq_restore() around that
fragment - if it finds pgd already present, it needs to check pte
before deciding to proceed to no_context.

Suppose we access vmalloc area and corresponding pgd is not
present in current->active_mm.  Just as we get to entMM,
an interrupt arrives and proceeds to access something
covered by the same pgd.  OK, current->active_mm is still
not present, we get another page fault and do_page_fault()
gets to the quoted code.  pgd is copied from the swapper_pg_dir,
do_page_fault() returns and we get back to the instruction in
interrupt handler that had triggered the second #PF.  This
time around it succeeds.  Once the interrupt handler completes
we are back to entMM.  Once *that* gets to do_page_fault()
we hit the quoted code again.  Only this time around pgd
*is* present and instead of returning we get to no_context.
And since it's been a normal access to vmalloc'ed memory,
there's nothing to be found in exception table.  Oops...

	AFAICS, one way to deal with that is to treat
(unlikely) pgd_present(*pgd) as "get to pte, return if
it looks legitimate, proceed to no_context if it isn't".

Other bugs in the same area:
	* we ought to compare address with VMALLOC_START,
not TASK_SIZE.
	* we ought to do that *before* checking for
kernel threads/pagefault_disable() being in effect.

Wait a minute - pgd_present() on alpha has become constant 1
since a73c948952cc "alpha: use pgtable-nopud instead of 4level-fixup"

So that thing had been completely broken for 3 years and nobody
had noticed.  And that's really completely broken - it stopped
copying top-level entries since that commit.

That's also not hard to fix, but...
	* CONFIG_ALPHA_LARGE_VMALLOC had been racy all along
	* it had very limited use (need for >8Gb of vmalloc
space)
	* it had stopped working in late 2019 and nobody cared.

How about removing that kludge?  Richard?

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC][PATCHSET] VM_FAULT_RETRY fixes
  2023-02-02  6:58         ` Al Viro
  (?)
@ 2023-02-02  8:54         ` Michael Cree
  2023-02-02  9:56           ` John Paul Adrian Glaubitz
  2023-02-02 15:20           ` Al Viro
  -1 siblings, 2 replies; 120+ messages in thread
From: Michael Cree @ 2023-02-02  8:54 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Peter Xu, linux-arch, linux-alpha, Richard Henderson

On Thu, Feb 02, 2023 at 06:58:03AM +0000, Al Viro wrote:
> Other bugs in the same area:
> 	* we ought to compare address with VMALLOC_START,
> not TASK_SIZE.
> 	* we ought to do that *before* checking for
> kernel threads/pagefault_disable() being in effect.
> 
> Wait a minute - pgd_present() on alpha has become constant 1
> since a73c948952cc "alpha: use pgtable-nopud instead of 4level-fixup"
> 
> So that thing had been completely broken for 3 years and nobody
> had noticed.  

I have never noticed because I haven't been able to run a 5.9 or
newer kernel on Alpha reliably so have been running a 5.8 kernel
for quite some time.  Bad commit is about
25788738eb9ce46fe6a0fd84a3ceef5c795d41f0 but bisection proved very
difficult because the bug might only show up once per day (memory
corruption in user space causing gcc in big long builds to ICE
reporting corrupted structures, or similar) and so could never be
entirely sure a kernel marked as good was truly good in bisection
and went wrong a number of times.  There is a good possibility the bad
commit is one of facdaa917c4d5a376d or 25788738eb9ce46fe6.

Cheers,
Michael.

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC][PATCHSET] VM_FAULT_RETRY fixes
  2023-02-02  8:54         ` Michael Cree
@ 2023-02-02  9:56           ` John Paul Adrian Glaubitz
  2023-02-02 15:20           ` Al Viro
  1 sibling, 0 replies; 120+ messages in thread
From: John Paul Adrian Glaubitz @ 2023-02-02  9:56 UTC (permalink / raw)
  To: Michael Cree, Al Viro
  Cc: Linus Torvalds, Peter Xu, linux-arch, linux-alpha, Richard Henderson

Hi Michael!

On Thu, 2023-02-02 at 21:54 +1300, Michael Cree wrote:
> I have never noticed because I haven't been able to run a 5.9 or
> newer kernel on Alpha reliably so have been running a 5.8 kernel
> for quite some time.

Didn't people on the debian-alpha mailing list report that newer 6.x
kernels work fine on alpha again? Did you try a recent kernel yourself?

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer
`. `'   Physicist
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC][PATCHSET] VM_FAULT_RETRY fixes
  2023-02-02  8:54         ` Michael Cree
  2023-02-02  9:56           ` John Paul Adrian Glaubitz
@ 2023-02-02 15:20           ` Al Viro
  2023-02-02 20:20             ` Al Viro
  1 sibling, 1 reply; 120+ messages in thread
From: Al Viro @ 2023-02-02 15:20 UTC (permalink / raw)
  To: Michael Cree, Linus Torvalds, Peter Xu, linux-arch, linux-alpha,
	Richard Henderson

On Thu, Feb 02, 2023 at 09:54:35PM +1300, Michael Cree wrote:
> On Thu, Feb 02, 2023 at 06:58:03AM +0000, Al Viro wrote:
> > Other bugs in the same area:
> > 	* we ought to compare address with VMALLOC_START,
> > not TASK_SIZE.
> > 	* we ought to do that *before* checking for
> > kernel threads/pagefault_disable() being in effect.
> > 
> > Wait a minute - pgd_present() on alpha has become constant 1
> > since a73c948952cc "alpha: use pgtable-nopud instead of 4level-fixup"
> > 
> > So that thing had been completely broken for 3 years and nobody
> > had noticed.  
> 
> I have never noticed because I haven't been able to run a 5.9 or
> newer kernel on Alpha reliably so have been running a 5.8 kernel
> for quite some time.

Er...  For one thing, commit in question went into 5.5; for another
you would only have noticed if you had CONFIG_ALPHA_LARGE_VMALLOC
in your .config, so I rather doubt it's the same problem.

Normally alpha has one PGDIR_SIZE worth of vmalloc space, and it's shared
between all processes - that one slot in top-level table is filled before
we spawn any threads so all of them end up sharing the reference to
the same middle-level table, which is where vmalloc stuff gets mapped.

So normally we have 8Gb of vmalloc space and, as usual for 64bit
boxen, no need to play with vmalloc handling on page faults.  The trouble
is with a kludge that tries to give more than 8Gb; it gives just under
2Tb (2Tb-8Gb, actually - 255 slots out 1024 in top-level table).  That
does need assistance from #PF handler, and that assistance (ifdefed
under CONFIG_ALPHA_LARGE_VMALLOC) what I'd been refering to.

To quote Kconfig,
====
# LARGE_VMALLOC is racy, if you *really* need it then fix it first
config ALPHA_LARGE_VMALLOC
        bool
        help
          Process creation and other aspects of virtual memory management can
          be streamlined if we restrict the kernel to one PGD for all vmalloc
          allocations.  This equates to about 8GB.

          Under normal circumstances, this is so far and above what is needed
          as to be laughable.  However, there are certain applications (such
          as benchmark-grade in-kernel web serving) that can make use of as
          much vmalloc space as is available.

          Say N unless you know you need gobs and gobs of vmalloc space.
====
"Racy" probably had been about something along the lines of the scenario
I'd mentioned just upthread, but in 5.5 that "racy" escalated to "does not
work at all" - if you ever hit a vmalloc-related fault, you are going
to get an oops.  You still get 8Gb, but beyond that it's broken.

And it's almost certainly not the problem you are seeing...

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC][PATCHSET] VM_FAULT_RETRY fixes
  2023-02-02 15:20           ` Al Viro
@ 2023-02-02 20:20             ` Al Viro
  0 siblings, 0 replies; 120+ messages in thread
From: Al Viro @ 2023-02-02 20:20 UTC (permalink / raw)
  To: Michael Cree, Linus Torvalds, Peter Xu, linux-arch, linux-alpha,
	Richard Henderson

On Thu, Feb 02, 2023 at 03:20:37PM +0000, Al Viro wrote:

> "Racy" probably had been about something along the lines of the scenario
> I'd mentioned just upthread, but in 5.5 that "racy" escalated to "does not
> work at all" - if you ever hit a vmalloc-related fault, you are going
> to get an oops.  You still get 8Gb, but beyond that it's broken.
> 
> And it's almost certainly not the problem you are seeing...

Incidentally, direct store to pgd_val() in there looks fishy wrt barriers -
shouldn't there be an smp_wmb() before we shove that sucker into our
top-level table's entry?

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC][PATCHSET] VM_FAULT_RETRY fixes
  2023-02-02  6:58         ` Al Viro
  (?)
  (?)
@ 2023-02-02 20:34         ` Linus Torvalds
  -1 siblings, 0 replies; 120+ messages in thread
From: Linus Torvalds @ 2023-02-02 20:34 UTC (permalink / raw)
  To: Al Viro; +Cc: Peter Xu, linux-arch, linux-alpha, Richard Henderson

On Wed, Feb 1, 2023 at 10:58 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> BTW, speaking of alpha page faults - maybe I'm misreading the manual,
> but it seems to imply that interrupts are *not* disabled when entering
> page fault handler:

Yeah, I think you are right.

And we *thought* we didn't need to disable interrupts like we do on
x86, because (unlike x86), we get the fault address in an
interrupt-safe way as an argument from palcode, rather than having to
read it from a register.

But now interrupts can race with that vmalloc case.

> is not just missing local_irq_save()/local_irq_restore() around that
> fragment - if it finds pgd already present, it needs to check pte
> before deciding to proceed to no_context.

Well, the logic there is that if the pgd was already present, then
something *else* wasn't present, so no_context makes perfect sense.

But that assumption does not hold for the "we raced with an interrupt"
case, so yes, it's broken.

And as you point out, it's doubly broken because "pgd_present()"
doesn't actually do what it historically did, and what that code
*thinks* it still does.

So yeah, this looks all broken.

            Linus

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC][PATCHSET] VM_FAULT_RETRY fixes
  2023-02-01 22:18               ` Al Viro
  (?)
@ 2023-02-02 22:56                 ` Peter Xu
  -1 siblings, 0 replies; 120+ messages in thread
From: Peter Xu @ 2023-02-02 22:56 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, linux-arch, linux-alpha, linux-ia64,
	linux-hexagon, linux-m68k, Michal Simek, Dinh Nguyen, openrisc,
	linux-parisc, linux-riscv, sparclinux

On Wed, Feb 01, 2023 at 10:18:11PM +0000, Al Viro wrote:
> On Wed, Feb 01, 2023 at 02:48:22PM -0500, Peter Xu wrote:
> 
> > I do also see a common pattern of the possibility to have a generic fault
> > handler like generic_page_fault().
> > 
> > It probably should start with taking the mmap_sem until providing some
> > retval that is much easier to digest further by the arch-dependent code, so
> > it can directly do something rather than parsing the bitmask in a
> > duplicated way (hence the new retval should hopefully not a bitmask anymore
> > but a "what to do").
> > 
> > Maybe it can be something like:
> > 
> > /**
> >  * enum page_fault_retval - Higher level fault retval, generalized from
> >  * vm_fault_reason above that is only used by hardware page fault handlers.
> >  * It generalizes the bitmask-versioned retval into something that the arch
> >  * dependent code should react upon.
> >  *
> >  * @PF_RET_COMPLETED:		The page fault is completed successfully
> >  * @PF_RET_BAD_AREA:		The page fault address falls in a bad area
> >  *				(e.g., vma not found, expand_stack() fails..)
> 
> FWIW, there's a fun discrepancy - VM_FAULT_SIGSEGV may yield SEGV_MAPERR
> or SEGV_ACCERR; depends upon the architecture.  Not that there'd been
> many places that return VM_FAULT_SIGSEGV these days...  Good thing, too,
> since otherwise e.g. csky would oops...
> 
> >  * @PF_RET_ACCESS_ERR:		The page fault has access errors
> >  *				(e.g., write fault on !VM_WRITE vmas)
> >  * @PF_RET_KERN_FIXUP:		The page fault requires kernel fixups
> >  *				(e.g., during copy_to_user() but fault failed?)
> >  * @PF_RET_HWPOISON:		The page fault encountered poisoned pages
> >  * @PF_RET_SIGNAL:		The page fault encountered poisoned pages
> 
> ??
> 
> >  * ...
> >  */
> > enum page_fault_retval {
> > 	PF_RET_DONE = 0,
> > 	PF_RET_BAD_AREA,
> > 	PF_RET_ACCESS_ERR,
> > 	PF_RET_KERN_FIXUP,
> >         PF_RET_HWPOISON,
> >         PF_RET_SIGNAL,
> > 	...
> > };
> > 
> > As a start we may still want to return some more information (perhaps still
> > the vm_fault_t alongside?  Or another union that will provide different
> > information based on different PF_RET_*).  One major thing is I see how we
> > handle VM_FAULT_HWPOISON and also the fact that we encode something more
> > into the bitmask on page sizes (VM_FAULT_HINDEX_MASK).
> > 
> > So the generic helper could, hopefully, hide the complexity of:
> > 
> >   - Taking and releasing of mmap lock
> >   - find_vma(), and also relevant checks on access or stack handling
> 
> Umm...  arm is a bit special here:
>                 if (addr < FIRST_USER_ADDRESS)
> 			return VM_FAULT_BADMAP;
> with no counterparts elsewhere.

For this specific case IIUC it's the same as bad_area.  VM_FAULT_BADMAP is
further handled later in do_page_fault() there for either arm/arm64.

This reminded me this, on how arm defines the private retvals, while the
generic ones grows and probably no one noticed they can collapse already..

#define VM_FAULT_BADMAP		0x010000
#define VM_FAULT_BADACCESS	0x020000

enum vm_fault_reason {
        ...
	VM_FAULT_HINDEX_MASK    = (__force vm_fault_t)0x0f0000,
};

VM_FAULT_HINDEX_MASK is only used by VM_FAULT_HWPOISON_LARGE, so I think
arm[64] could expect some surprise when it hit hugetlb hwpoison pages...
maybe I should prepare a patch for arm.

> 
> >   - handle_mm_fault() itself (of course...)
> >   - detect signals
> >   - handle page fault retries (so, in the new layer of retval there should
> >     have nothing telling it to retry; it should always be the ultimate result)
> 
> agreed.
> 
>     - unlock mmap; don't leave that to caller.
> 
> >   - parse different errors into "what the arch code should do", and
> >     generalize the common ones, e.g.
> >     - OOM, do pagefault_out_of_memory() for user-mode
> >     - VM_FAULT_SIGSEGV, which should be able to merge into PF_RET_BAD_AREA?
> >     - ...
> 
> AFAICS, all errors in kernel mode => no_context.
> 
> > It'll simplify things if we can unify some small details like whether the
> > -EFAULT above should contain a sigbus.
> > 
> > A trivial detail I found when I was looking at this is, x86_64 passes in
> > different signals to kernelmode_fixup_or_oops() - in do_user_addr_fault()
> > there're three call sites and each of them pass over a differerent signal.
> > IIUC that will only make a difference if there's a nested page fault during
> > the vsyscall emulation (but I may be wrong too because I'm new to this
> > code), and I have no idea when it'll happen and whether that needs to be
> > strictly followed.
> 
> From my (very incomplete so far) dig through that pile:
> 	Q: do we still have the cases when handle_mm_fault() does
> not return any of VM_FAULT_COMPLETED | VM_FAULT_RETRY | VM_FAULT_ERROR?
> That gets treated as unlock + VM_FAULT_COMPLETED, but do we still need
> that?
> 	Q: can VM_FAULT_RETRY be mixed with anything in VM_FAULT_ERROR?
> What locking, if that happens?

For this one, I don't think they can be mixed.  IMHO RETRY only binds with
a wait, so if we didn't wait and found issue, we return ERROR; if we
decided to wait, we will try nothing more besides return after wait with
the RETRY.  We should just never check any error at all if the wait
happened.  Otherwise there's a bug of potential deadlock.

I'll skip some details in this email either above or below; I agree
there're so many trivial details to take care of to not break a thing.

IMHO it'll be merely impossible to merge things across most (if not to say,
all) archs.  It will need to be start from one or at least a few that still
shares a major common base - I would still rely on x86 as a start - then we
try to use the helper in as much archs as possible.

Even on x86, I do also see challenges so I'm not sure whether a common
enough routine can be abstracted indeed.  But I believe there's a way to do
this because obviously we still see tons of duplicated logics falling
around.  It may definitely need time to think out where's the best spot to
start, and how to gradually move towards covering more archs starting from
one.

Thanks,

> 	* details of storing the fault details (for ptrace, mostly)
> vary a lot; no chance to unify, AFAICS.
> 	* requirements for vma flags also differ; e.g. read fault on
> alpha is explicitly OK with absence of VM_READ if VM_WRITE is there.
> Probably should go by way of arm and pass the mask that must
> have non-empty intersection with vma->vm_flags?  Because *that*
> is very likely to be a part of ABI - mmap(2) callers that rely
> upon the flags being OK for given architecture are quite possible.
> 	* mmap lock is also quite variable in how it's taken;
> x86 and arm have fun dance with trylock/search for exception handler/etc.
> Other architectures do not; OTOH, there's a prefetch stuck in itanic
> variant, with comment about mmap_sem being performance-critical...
> 	* logics for stack expansion includes this twist:
>         if (!(vma->vm_flags & VM_GROWSDOWN))
>                 goto map_err;
>         if (user_mode(regs)) {
>                 /* Accessing the stack below usp is always a bug.  The
>                    "+ 256" is there due to some instructions doing
>                    pre-decrement on the stack and that doesn't show up
>                    until later.  */
>                 if (address + 256 < rdusp())
>                         goto map_err;
>         }
>         if (expand_stack(vma, address))
>                 goto map_err;
> That's m68k; ISTR similar considerations elsewhere, but I could be
> wrong.
> 

-- 
Peter Xu


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC][PATCHSET] VM_FAULT_RETRY fixes
@ 2023-02-02 22:56                 ` Peter Xu
  0 siblings, 0 replies; 120+ messages in thread
From: Peter Xu @ 2023-02-02 22:56 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, linux-arch, linux-alpha, linux-ia64,
	linux-hexagon, linux-m68k, Michal Simek, Dinh Nguyen, openrisc,
	linux-parisc, linux-riscv, sparclinux

On Wed, Feb 01, 2023 at 10:18:11PM +0000, Al Viro wrote:
> On Wed, Feb 01, 2023 at 02:48:22PM -0500, Peter Xu wrote:
> 
> > I do also see a common pattern of the possibility to have a generic fault
> > handler like generic_page_fault().
> > 
> > It probably should start with taking the mmap_sem until providing some
> > retval that is much easier to digest further by the arch-dependent code, so
> > it can directly do something rather than parsing the bitmask in a
> > duplicated way (hence the new retval should hopefully not a bitmask anymore
> > but a "what to do").
> > 
> > Maybe it can be something like:
> > 
> > /**
> >  * enum page_fault_retval - Higher level fault retval, generalized from
> >  * vm_fault_reason above that is only used by hardware page fault handlers.
> >  * It generalizes the bitmask-versioned retval into something that the arch
> >  * dependent code should react upon.
> >  *
> >  * @PF_RET_COMPLETED:		The page fault is completed successfully
> >  * @PF_RET_BAD_AREA:		The page fault address falls in a bad area
> >  *				(e.g., vma not found, expand_stack() fails..)
> 
> FWIW, there's a fun discrepancy - VM_FAULT_SIGSEGV may yield SEGV_MAPERR
> or SEGV_ACCERR; depends upon the architecture.  Not that there'd been
> many places that return VM_FAULT_SIGSEGV these days...  Good thing, too,
> since otherwise e.g. csky would oops...
> 
> >  * @PF_RET_ACCESS_ERR:		The page fault has access errors
> >  *				(e.g., write fault on !VM_WRITE vmas)
> >  * @PF_RET_KERN_FIXUP:		The page fault requires kernel fixups
> >  *				(e.g., during copy_to_user() but fault failed?)
> >  * @PF_RET_HWPOISON:		The page fault encountered poisoned pages
> >  * @PF_RET_SIGNAL:		The page fault encountered poisoned pages
> 
> ??
> 
> >  * ...
> >  */
> > enum page_fault_retval {
> > 	PF_RET_DONE = 0,
> > 	PF_RET_BAD_AREA,
> > 	PF_RET_ACCESS_ERR,
> > 	PF_RET_KERN_FIXUP,
> >         PF_RET_HWPOISON,
> >         PF_RET_SIGNAL,
> > 	...
> > };
> > 
> > As a start we may still want to return some more information (perhaps still
> > the vm_fault_t alongside?  Or another union that will provide different
> > information based on different PF_RET_*).  One major thing is I see how we
> > handle VM_FAULT_HWPOISON and also the fact that we encode something more
> > into the bitmask on page sizes (VM_FAULT_HINDEX_MASK).
> > 
> > So the generic helper could, hopefully, hide the complexity of:
> > 
> >   - Taking and releasing of mmap lock
> >   - find_vma(), and also relevant checks on access or stack handling
> 
> Umm...  arm is a bit special here:
>                 if (addr < FIRST_USER_ADDRESS)
> 			return VM_FAULT_BADMAP;
> with no counterparts elsewhere.

For this specific case IIUC it's the same as bad_area.  VM_FAULT_BADMAP is
further handled later in do_page_fault() there for either arm/arm64.

This reminded me this, on how arm defines the private retvals, while the
generic ones grows and probably no one noticed they can collapse already..

#define VM_FAULT_BADMAP		0x010000
#define VM_FAULT_BADACCESS	0x020000

enum vm_fault_reason {
        ...
	VM_FAULT_HINDEX_MASK    = (__force vm_fault_t)0x0f0000,
};

VM_FAULT_HINDEX_MASK is only used by VM_FAULT_HWPOISON_LARGE, so I think
arm[64] could expect some surprise when it hit hugetlb hwpoison pages...
maybe I should prepare a patch for arm.

> 
> >   - handle_mm_fault() itself (of course...)
> >   - detect signals
> >   - handle page fault retries (so, in the new layer of retval there should
> >     have nothing telling it to retry; it should always be the ultimate result)
> 
> agreed.
> 
>     - unlock mmap; don't leave that to caller.
> 
> >   - parse different errors into "what the arch code should do", and
> >     generalize the common ones, e.g.
> >     - OOM, do pagefault_out_of_memory() for user-mode
> >     - VM_FAULT_SIGSEGV, which should be able to merge into PF_RET_BAD_AREA?
> >     - ...
> 
> AFAICS, all errors in kernel mode => no_context.
> 
> > It'll simplify things if we can unify some small details like whether the
> > -EFAULT above should contain a sigbus.
> > 
> > A trivial detail I found when I was looking at this is, x86_64 passes in
> > different signals to kernelmode_fixup_or_oops() - in do_user_addr_fault()
> > there're three call sites and each of them pass over a differerent signal.
> > IIUC that will only make a difference if there's a nested page fault during
> > the vsyscall emulation (but I may be wrong too because I'm new to this
> > code), and I have no idea when it'll happen and whether that needs to be
> > strictly followed.
> 
> From my (very incomplete so far) dig through that pile:
> 	Q: do we still have the cases when handle_mm_fault() does
> not return any of VM_FAULT_COMPLETED | VM_FAULT_RETRY | VM_FAULT_ERROR?
> That gets treated as unlock + VM_FAULT_COMPLETED, but do we still need
> that?
> 	Q: can VM_FAULT_RETRY be mixed with anything in VM_FAULT_ERROR?
> What locking, if that happens?

For this one, I don't think they can be mixed.  IMHO RETRY only binds with
a wait, so if we didn't wait and found issue, we return ERROR; if we
decided to wait, we will try nothing more besides return after wait with
the RETRY.  We should just never check any error at all if the wait
happened.  Otherwise there's a bug of potential deadlock.

I'll skip some details in this email either above or below; I agree
there're so many trivial details to take care of to not break a thing.

IMHO it'll be merely impossible to merge things across most (if not to say,
all) archs.  It will need to be start from one or at least a few that still
shares a major common base - I would still rely on x86 as a start - then we
try to use the helper in as much archs as possible.

Even on x86, I do also see challenges so I'm not sure whether a common
enough routine can be abstracted indeed.  But I believe there's a way to do
this because obviously we still see tons of duplicated logics falling
around.  It may definitely need time to think out where's the best spot to
start, and how to gradually move towards covering more archs starting from
one.

Thanks,

> 	* details of storing the fault details (for ptrace, mostly)
> vary a lot; no chance to unify, AFAICS.
> 	* requirements for vma flags also differ; e.g. read fault on
> alpha is explicitly OK with absence of VM_READ if VM_WRITE is there.
> Probably should go by way of arm and pass the mask that must
> have non-empty intersection with vma->vm_flags?  Because *that*
> is very likely to be a part of ABI - mmap(2) callers that rely
> upon the flags being OK for given architecture are quite possible.
> 	* mmap lock is also quite variable in how it's taken;
> x86 and arm have fun dance with trylock/search for exception handler/etc.
> Other architectures do not; OTOH, there's a prefetch stuck in itanic
> variant, with comment about mmap_sem being performance-critical...
> 	* logics for stack expansion includes this twist:
>         if (!(vma->vm_flags & VM_GROWSDOWN))
>                 goto map_err;
>         if (user_mode(regs)) {
>                 /* Accessing the stack below usp is always a bug.  The
>                    "+ 256" is there due to some instructions doing
>                    pre-decrement on the stack and that doesn't show up
>                    until later.  */
>                 if (address + 256 < rdusp())
>                         goto map_err;
>         }
>         if (expand_stack(vma, address))
>                 goto map_err;
> That's m68k; ISTR similar considerations elsewhere, but I could be
> wrong.
> 

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC][PATCHSET] VM_FAULT_RETRY fixes
@ 2023-02-02 22:56                 ` Peter Xu
  0 siblings, 0 replies; 120+ messages in thread
From: Peter Xu @ 2023-02-02 22:56 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, linux-arch, linux-alpha, linux-ia64,
	linux-hexagon, linux-m68k, Michal Simek, Dinh Nguyen, openrisc,
	linux-parisc, linux-riscv, sparclinux

On Wed, Feb 01, 2023 at 10:18:11PM +0000, Al Viro wrote:
> On Wed, Feb 01, 2023 at 02:48:22PM -0500, Peter Xu wrote:
> 
> > I do also see a common pattern of the possibility to have a generic fault
> > handler like generic_page_fault().
> > 
> > It probably should start with taking the mmap_sem until providing some
> > retval that is much easier to digest further by the arch-dependent code, so
> > it can directly do something rather than parsing the bitmask in a
> > duplicated way (hence the new retval should hopefully not a bitmask anymore
> > but a "what to do").
> > 
> > Maybe it can be something like:
> > 
> > /**
> >  * enum page_fault_retval - Higher level fault retval, generalized from
> >  * vm_fault_reason above that is only used by hardware page fault handlers.
> >  * It generalizes the bitmask-versioned retval into something that the arch
> >  * dependent code should react upon.
> >  *
> >  * @PF_RET_COMPLETED:		The page fault is completed successfully
> >  * @PF_RET_BAD_AREA:		The page fault address falls in a bad area
> >  *				(e.g., vma not found, expand_stack() fails..)
> 
> FWIW, there's a fun discrepancy - VM_FAULT_SIGSEGV may yield SEGV_MAPERR
> or SEGV_ACCERR; depends upon the architecture.  Not that there'd been
> many places that return VM_FAULT_SIGSEGV these days...  Good thing, too,
> since otherwise e.g. csky would oops...
> 
> >  * @PF_RET_ACCESS_ERR:		The page fault has access errors
> >  *				(e.g., write fault on !VM_WRITE vmas)
> >  * @PF_RET_KERN_FIXUP:		The page fault requires kernel fixups
> >  *				(e.g., during copy_to_user() but fault failed?)
> >  * @PF_RET_HWPOISON:		The page fault encountered poisoned pages
> >  * @PF_RET_SIGNAL:		The page fault encountered poisoned pages
> 
> ??
> 
> >  * ...
> >  */
> > enum page_fault_retval {
> > 	PF_RET_DONE = 0,
> > 	PF_RET_BAD_AREA,
> > 	PF_RET_ACCESS_ERR,
> > 	PF_RET_KERN_FIXUP,
> >         PF_RET_HWPOISON,
> >         PF_RET_SIGNAL,
> > 	...
> > };
> > 
> > As a start we may still want to return some more information (perhaps still
> > the vm_fault_t alongside?  Or another union that will provide different
> > information based on different PF_RET_*).  One major thing is I see how we
> > handle VM_FAULT_HWPOISON and also the fact that we encode something more
> > into the bitmask on page sizes (VM_FAULT_HINDEX_MASK).
> > 
> > So the generic helper could, hopefully, hide the complexity of:
> > 
> >   - Taking and releasing of mmap lock
> >   - find_vma(), and also relevant checks on access or stack handling
> 
> Umm...  arm is a bit special here:
>                 if (addr < FIRST_USER_ADDRESS)
> 			return VM_FAULT_BADMAP;
> with no counterparts elsewhere.

For this specific case IIUC it's the same as bad_area.  VM_FAULT_BADMAP is
further handled later in do_page_fault() there for either arm/arm64.

This reminded me this, on how arm defines the private retvals, while the
generic ones grows and probably no one noticed they can collapse already..

#define VM_FAULT_BADMAP		0x010000
#define VM_FAULT_BADACCESS	0x020000

enum vm_fault_reason {
        ...
	VM_FAULT_HINDEX_MASK    = (__force vm_fault_t)0x0f0000,
};

VM_FAULT_HINDEX_MASK is only used by VM_FAULT_HWPOISON_LARGE, so I think
arm[64] could expect some surprise when it hit hugetlb hwpoison pages...
maybe I should prepare a patch for arm.

> 
> >   - handle_mm_fault() itself (of course...)
> >   - detect signals
> >   - handle page fault retries (so, in the new layer of retval there should
> >     have nothing telling it to retry; it should always be the ultimate result)
> 
> agreed.
> 
>     - unlock mmap; don't leave that to caller.
> 
> >   - parse different errors into "what the arch code should do", and
> >     generalize the common ones, e.g.
> >     - OOM, do pagefault_out_of_memory() for user-mode
> >     - VM_FAULT_SIGSEGV, which should be able to merge into PF_RET_BAD_AREA?
> >     - ...
> 
> AFAICS, all errors in kernel mode => no_context.
> 
> > It'll simplify things if we can unify some small details like whether the
> > -EFAULT above should contain a sigbus.
> > 
> > A trivial detail I found when I was looking at this is, x86_64 passes in
> > different signals to kernelmode_fixup_or_oops() - in do_user_addr_fault()
> > there're three call sites and each of them pass over a differerent signal.
> > IIUC that will only make a difference if there's a nested page fault during
> > the vsyscall emulation (but I may be wrong too because I'm new to this
> > code), and I have no idea when it'll happen and whether that needs to be
> > strictly followed.
> 
> From my (very incomplete so far) dig through that pile:
> 	Q: do we still have the cases when handle_mm_fault() does
> not return any of VM_FAULT_COMPLETED | VM_FAULT_RETRY | VM_FAULT_ERROR?
> That gets treated as unlock + VM_FAULT_COMPLETED, but do we still need
> that?
> 	Q: can VM_FAULT_RETRY be mixed with anything in VM_FAULT_ERROR?
> What locking, if that happens?

For this one, I don't think they can be mixed.  IMHO RETRY only binds with
a wait, so if we didn't wait and found issue, we return ERROR; if we
decided to wait, we will try nothing more besides return after wait with
the RETRY.  We should just never check any error at all if the wait
happened.  Otherwise there's a bug of potential deadlock.

I'll skip some details in this email either above or below; I agree
there're so many trivial details to take care of to not break a thing.

IMHO it'll be merely impossible to merge things across most (if not to say,
all) archs.  It will need to be start from one or at least a few that still
shares a major common base - I would still rely on x86 as a start - then we
try to use the helper in as much archs as possible.

Even on x86, I do also see challenges so I'm not sure whether a common
enough routine can be abstracted indeed.  But I believe there's a way to do
this because obviously we still see tons of duplicated logics falling
around.  It may definitely need time to think out where's the best spot to
start, and how to gradually move towards covering more archs starting from
one.

Thanks,

> 	* details of storing the fault details (for ptrace, mostly)
> vary a lot; no chance to unify, AFAICS.
> 	* requirements for vma flags also differ; e.g. read fault on
> alpha is explicitly OK with absence of VM_READ if VM_WRITE is there.
> Probably should go by way of arm and pass the mask that must
> have non-empty intersection with vma->vm_flags?  Because *that*
> is very likely to be a part of ABI - mmap(2) callers that rely
> upon the flags being OK for given architecture are quite possible.
> 	* mmap lock is also quite variable in how it's taken;
> x86 and arm have fun dance with trylock/search for exception handler/etc.
> Other architectures do not; OTOH, there's a prefetch stuck in itanic
> variant, with comment about mmap_sem being performance-critical...
> 	* logics for stack expansion includes this twist:
>         if (!(vma->vm_flags & VM_GROWSDOWN))
>                 goto map_err;
>         if (user_mode(regs)) {
>                 /* Accessing the stack below usp is always a bug.  The
>                    "+ 256" is there due to some instructions doing
>                    pre-decrement on the stack and that doesn't show up
>                    until later.  */
>                 if (address + 256 < rdusp())
>                         goto map_err;
>         }
>         if (expand_stack(vma, address))
>                 goto map_err;
> That's m68k; ISTR similar considerations elsewhere, but I could be
> wrong.
> 

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC][PATCHSET] VM_FAULT_RETRY fixes
  2023-02-02 22:56                 ` Peter Xu
  (?)
@ 2023-02-04  0:26                   ` Al Viro
  -1 siblings, 0 replies; 120+ messages in thread
From: Al Viro @ 2023-02-04  0:26 UTC (permalink / raw)
  To: Peter Xu
  Cc: Linus Torvalds, linux-arch, linux-alpha, linux-ia64,
	linux-hexagon, linux-m68k, Michal Simek, Dinh Nguyen, openrisc,
	linux-parisc, linux-riscv, sparclinux

On Thu, Feb 02, 2023 at 05:56:22PM -0500, Peter Xu wrote:

> IMHO it'll be merely impossible to merge things across most (if not to say,
> all) archs.  It will need to be start from one or at least a few that still
> shares a major common base - I would still rely on x86 as a start - then we
> try to use the helper in as much archs as possible.
> 
> Even on x86, I do also see challenges so I'm not sure whether a common
> enough routine can be abstracted indeed.  But I believe there's a way to do
> this because obviously we still see tons of duplicated logics falling
> around.  It may definitely need time to think out where's the best spot to
> start, and how to gradually move towards covering more archs starting from
> one.

FWIW, after going through everything from alpha to loongarch (in alphabetic
order, skipping the itanic) the following seems to be suitable for all of
them:

generic_fault(address, flags, vm_flags, regs)
{
	struct mm_struct *mm = current->mm;
	struct vm_area_struct *vma;
	vm_fault_t fault;

	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);

	if (unlikely(!mmap_read_trylock(mm))) {
		if (!(flags & FAULT_FLAG_USER) &&
		    !search_exception_tables(instruction_pointer(regs))) {
			/*
			 * Fault from code in kernel from
			 * which we do not expect faults.
			 */
			return KERN;
		}
retry:
		mmap_read_lock(mm);
	} else {
		might_sleep();
#ifdef CONFIG_DEBUG_VM
		if (!(flags & FAULT_FLAG_USER) &&
		    !search_exception_tables(instruction_pointer(regs)))
			return KERN;
#endif
	}
	vma = find_vma(mm, address);
	if (!vma)
		goto Eunmapped;
	if (unlikely(vma->vm_start > address)) {
		if (!(vma->vm_flags & VM_GROWSDOWN))
			goto Eunmapped;
		if (addr < FIRST_USER_ADDRESS)
			goto Eunmapped;
		if (expand_stack(vma, address))
			goto Eunmapped;
	}

	/* Ok, we have a good vm_area for this memory access, so
	   we can handle it.  */
	if (!(vma->vm_flags & vm_flags))
		goto Eaccess;

	/* If for any reason at all we couldn't handle the fault,
	   make sure we exit gracefully rather than endlessly redo
	   the fault.  */
	fault = handle_mm_fault(vma, address, flags, regs);

	if (unlikely(fault & VM_FAULT_RETRY)) {
		if (!(flags & FAULT_FLAG_USER)) {
			if (fatal_signal_pending(current))
				return KERN;
		} else {
			if (signal_pending(current))
				return FOAD;
		}
		flags |= FAULT_FLAG_TRIED;
		goto retry;
	}

	if (fault & VM_FAULT_COMPLETED)
		return DONE;

	mmap_read_unlock(mm);

	if (likely(!(fault & VM_FAULT_ERROR)))
		return DONE;

	if (!(flags & FAULT_FLAG_USER))
		return KERN;

	if (fault & VM_FAULT_OOM) {
		pagefault_out_of_memory();
		return FOAD;
	}

	if (fault & VM_FAULT_SIGSEGV)
		return SIGSEGV;

	if (fault & VM_FAULT_SIGBUS)
		return SIGBUS;

	if (fault & VM_FAULT_HWPOISON)
		return POISON + PAGE_SHIFT;	// POISON == 256

	if (fault & VM_FAULT_HWPOISON_LARGE)
		return POISON + hstate_index_to_shift(VM_FAULT_GET_HINDEX(fault));

	BUG();

Eunmapped:
	mmap_read_unlock(mm);
	return flags & FAULT_FLAG_USER ? MAPERR : KERN;
Eaccess:
	mmap_read_unlock(mm);
	return flags & FAULT_FLAG_USER ? ACCERR : KERN;
}

possible return values (and that's obviously not the identifiers to be
used for real; for now I'm just looking for feasibility of it all):
	DONE		success, nothing else to be done
	FOAD		OOM/fatal signal with VM_FAULT_RETRY/
			signal with VM_FAULT_RETRY from userland - nothing
			to be done here.
	KERN		kernel mode failed fault, fixup or oops
	MAPERR		unmapped address, SIGSEGV/SEGV_MAPERR for you
	ACCERR		nothing in vm_flags present in ->vm_flags of vma;
			SIGSEGV/SEGV_ACCERR
	SIGSEGV		VM_FAULT_SIGSEGV; some architectures treat that
			as SEGV_MAPERR, some as SEGV_ACCERR.
	SIGBUS		VM_FAULT_SIGBUS; SIGBUS/BUS_ADRERR
	POISON + shift	VM_FAULT_HWPOISON and VM_FAULT_HWPOISON_LARGE, with
			log2(affected page size) encoded into return value.

This is obviously not even close to final helper, but... alpha, arc, arm, arm64,
csky, hexagon, loongarch convert to that cleanly.

Itanic very much does not (due to weird dual stacks, awful address space layout,
etc.), but then git rm arch/ia64 is long overdue.

Fairly typical look after conversion:

arc: 
{
	struct task_struct *tsk = current;
	struct mm_struct *mm = tsk->mm;
	unsigned int mask;
	unsigned int flags;
	unsigned int res;

	/*
	 * NOTE! We MUST NOT take any locks for this case. We may
	 * be in an interrupt or a critical region, and should
	 * only copy the information from the master page table,
	 * nothing more.
	 */
	if (address >= VMALLOC_START && !user_mode(regs)) {
		if (unlikely(handle_kernel_vaddr_fault(address)))
			goto no_context;
		else
			return;
	}

	/*
	 * If we're in an interrupt or have no user
	 * context, we must not take the fault..
	 */
	if (faulthandler_disabled() || !mm)
		goto no_context;

	flags = FAULT_FLAG_DEFAULT;
	if (user_mode(regs))
		flags |= FAULT_FLAG_USER;
	mask = VM_READ;
	if (regs->ecr_cause & ECR_C_PROTV_STORE) {	/* ST/EX */
		flags |= FAULT_FLAG_WRITE;
		mask = VM_WRITE;
	} else if ((regs->ecr_vec == ECR_V_PROTV) &&
	         (regs->ecr_cause == ECR_C_PROTV_INST_FETCH)) {
		mask = VM_EXEC;
	}

	res = generic_fault(address, flags, mask, regs);
	if (likely(res == DONE))
		return;
	if (res == FOAD)
		return;
	if (res == KERN) {
no_context:
		if (fixup_exception(regs))
			return;
		die("Oops", regs, address);
	}

	tsk->thread.fault_address = address;
	if (res == SIGBUS)
		force_sig_fault(SIGBUS, BUS_ADRERR, (void __user *) address);
	else
		force_sig_fault(SIGSEGV, res == ACCERR ? SEGV_ACCERR : SEGV_MAPERR,
				(void __user *) address);
}

Or this arm64:

{
	const struct fault_info *inf;
	struct mm_struct *mm = current->mm;
	unsigned long vm_flags;
	unsigned int mm_flags = FAULT_FLAG_DEFAULT;
	unsigned long addr = untagged_addr(far);
	unsigned int res;

	if (kprobe_page_fault(regs, esr))
		return 0;

	/*
	 * If we're in an interrupt or have no user context, we must not take
	 * the fault.
	 */
	if (faulthandler_disabled() || !mm)
		goto no_context;

	if (user_mode(regs))
		mm_flags |= FAULT_FLAG_USER;

	/*
	 * vm_flags tells us what bits we must have in vma->vm_flags
	 * for the fault to be benign, __do_page_fault() would check
	 * vma->vm_flags & vm_flags and returns an error if the
	 * intersection is empty
	 */
	if (is_el0_instruction_abort(esr)) {
		/* It was exec fault */
		vm_flags = VM_EXEC;
		mm_flags |= FAULT_FLAG_INSTRUCTION;
	} else if (is_write_abort(esr)) {
		/* It was write fault */
		vm_flags = VM_WRITE;
		mm_flags |= FAULT_FLAG_WRITE;
	} else {
		/* It was read fault */
		vm_flags = VM_READ;
		/* Write implies read */
		vm_flags |= VM_WRITE;
		/* If EPAN is absent then exec implies read */
		if (!cpus_have_const_cap(ARM64_HAS_EPAN))
			vm_flags |= VM_EXEC;
	}

	if (is_ttbr0_addr(addr) && is_el1_permission_fault(addr, esr, regs)) {
		if (is_el1_instruction_abort(esr))
			die_kernel_fault("execution of user memory",
					 addr, esr, regs);

		if (!search_exception_tables(regs->pc))
			die_kernel_fault("access to user memory outside uaccess routines",
					 addr, esr, regs);
	}

	res = generic_fault(addr, mm_flags, vm_flags, regs);
	if (likely(res == DONE))
		return 0;
	if (res == FOAD)
		return 0;
	if (res == KERN) {
no_context:
		__do_kernel_fault(addr, esr, regs);
		return 0;
	}
	inf = esr_to_fault_info(esr);
	set_thread_esr(addr, esr);
	if (res == SIGBUS) {
		/*
		 * We had some memory, but were unable to successfully fix up
		 * this page fault.
		 */
		arm64_force_sig_fault(SIGBUS, BUS_ADRERR, far, inf->name);
	} else if (res > POISON) {
		arm64_force_sig_mceerr(BUS_MCEERR_AR, far, res - POISON, inf->name);
	} else {
		/*
		 * Something tried to access memory that isn't in our memory
		 * map.
		 */
		arm64_force_sig_fault(SIGSEGV,
				      res == ACCERR ? SEGV_ACCERR : SEGV_MAPERR,
				      far, inf->name);
	}

	return 0;
}

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC][PATCHSET] VM_FAULT_RETRY fixes
@ 2023-02-04  0:26                   ` Al Viro
  0 siblings, 0 replies; 120+ messages in thread
From: Al Viro @ 2023-02-04  0:26 UTC (permalink / raw)
  To: Peter Xu
  Cc: Linus Torvalds, linux-arch, linux-alpha, linux-ia64,
	linux-hexagon, linux-m68k, Michal Simek, Dinh Nguyen, openrisc,
	linux-parisc, linux-riscv, sparclinux

On Thu, Feb 02, 2023 at 05:56:22PM -0500, Peter Xu wrote:

> IMHO it'll be merely impossible to merge things across most (if not to say,
> all) archs.  It will need to be start from one or at least a few that still
> shares a major common base - I would still rely on x86 as a start - then we
> try to use the helper in as much archs as possible.
> 
> Even on x86, I do also see challenges so I'm not sure whether a common
> enough routine can be abstracted indeed.  But I believe there's a way to do
> this because obviously we still see tons of duplicated logics falling
> around.  It may definitely need time to think out where's the best spot to
> start, and how to gradually move towards covering more archs starting from
> one.

FWIW, after going through everything from alpha to loongarch (in alphabetic
order, skipping the itanic) the following seems to be suitable for all of
them:

generic_fault(address, flags, vm_flags, regs)
{
	struct mm_struct *mm = current->mm;
	struct vm_area_struct *vma;
	vm_fault_t fault;

	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);

	if (unlikely(!mmap_read_trylock(mm))) {
		if (!(flags & FAULT_FLAG_USER) &&
		    !search_exception_tables(instruction_pointer(regs))) {
			/*
			 * Fault from code in kernel from
			 * which we do not expect faults.
			 */
			return KERN;
		}
retry:
		mmap_read_lock(mm);
	} else {
		might_sleep();
#ifdef CONFIG_DEBUG_VM
		if (!(flags & FAULT_FLAG_USER) &&
		    !search_exception_tables(instruction_pointer(regs)))
			return KERN;
#endif
	}
	vma = find_vma(mm, address);
	if (!vma)
		goto Eunmapped;
	if (unlikely(vma->vm_start > address)) {
		if (!(vma->vm_flags & VM_GROWSDOWN))
			goto Eunmapped;
		if (addr < FIRST_USER_ADDRESS)
			goto Eunmapped;
		if (expand_stack(vma, address))
			goto Eunmapped;
	}

	/* Ok, we have a good vm_area for this memory access, so
	   we can handle it.  */
	if (!(vma->vm_flags & vm_flags))
		goto Eaccess;

	/* If for any reason at all we couldn't handle the fault,
	   make sure we exit gracefully rather than endlessly redo
	   the fault.  */
	fault = handle_mm_fault(vma, address, flags, regs);

	if (unlikely(fault & VM_FAULT_RETRY)) {
		if (!(flags & FAULT_FLAG_USER)) {
			if (fatal_signal_pending(current))
				return KERN;
		} else {
			if (signal_pending(current))
				return FOAD;
		}
		flags |= FAULT_FLAG_TRIED;
		goto retry;
	}

	if (fault & VM_FAULT_COMPLETED)
		return DONE;

	mmap_read_unlock(mm);

	if (likely(!(fault & VM_FAULT_ERROR)))
		return DONE;

	if (!(flags & FAULT_FLAG_USER))
		return KERN;

	if (fault & VM_FAULT_OOM) {
		pagefault_out_of_memory();
		return FOAD;
	}

	if (fault & VM_FAULT_SIGSEGV)
		return SIGSEGV;

	if (fault & VM_FAULT_SIGBUS)
		return SIGBUS;

	if (fault & VM_FAULT_HWPOISON)
		return POISON + PAGE_SHIFT;	// POISON == 256

	if (fault & VM_FAULT_HWPOISON_LARGE)
		return POISON + hstate_index_to_shift(VM_FAULT_GET_HINDEX(fault));

	BUG();

Eunmapped:
	mmap_read_unlock(mm);
	return flags & FAULT_FLAG_USER ? MAPERR : KERN;
Eaccess:
	mmap_read_unlock(mm);
	return flags & FAULT_FLAG_USER ? ACCERR : KERN;
}

possible return values (and that's obviously not the identifiers to be
used for real; for now I'm just looking for feasibility of it all):
	DONE		success, nothing else to be done
	FOAD		OOM/fatal signal with VM_FAULT_RETRY/
			signal with VM_FAULT_RETRY from userland - nothing
			to be done here.
	KERN		kernel mode failed fault, fixup or oops
	MAPERR		unmapped address, SIGSEGV/SEGV_MAPERR for you
	ACCERR		nothing in vm_flags present in ->vm_flags of vma;
			SIGSEGV/SEGV_ACCERR
	SIGSEGV		VM_FAULT_SIGSEGV; some architectures treat that
			as SEGV_MAPERR, some as SEGV_ACCERR.
	SIGBUS		VM_FAULT_SIGBUS; SIGBUS/BUS_ADRERR
	POISON + shift	VM_FAULT_HWPOISON and VM_FAULT_HWPOISON_LARGE, with
			log2(affected page size) encoded into return value.

This is obviously not even close to final helper, but... alpha, arc, arm, arm64,
csky, hexagon, loongarch convert to that cleanly.

Itanic very much does not (due to weird dual stacks, awful address space layout,
etc.), but then git rm arch/ia64 is long overdue.

Fairly typical look after conversion:

arc: 
{
	struct task_struct *tsk = current;
	struct mm_struct *mm = tsk->mm;
	unsigned int mask;
	unsigned int flags;
	unsigned int res;

	/*
	 * NOTE! We MUST NOT take any locks for this case. We may
	 * be in an interrupt or a critical region, and should
	 * only copy the information from the master page table,
	 * nothing more.
	 */
	if (address >= VMALLOC_START && !user_mode(regs)) {
		if (unlikely(handle_kernel_vaddr_fault(address)))
			goto no_context;
		else
			return;
	}

	/*
	 * If we're in an interrupt or have no user
	 * context, we must not take the fault..
	 */
	if (faulthandler_disabled() || !mm)
		goto no_context;

	flags = FAULT_FLAG_DEFAULT;
	if (user_mode(regs))
		flags |= FAULT_FLAG_USER;
	mask = VM_READ;
	if (regs->ecr_cause & ECR_C_PROTV_STORE) {	/* ST/EX */
		flags |= FAULT_FLAG_WRITE;
		mask = VM_WRITE;
	} else if ((regs->ecr_vec == ECR_V_PROTV) &&
	         (regs->ecr_cause == ECR_C_PROTV_INST_FETCH)) {
		mask = VM_EXEC;
	}

	res = generic_fault(address, flags, mask, regs);
	if (likely(res == DONE))
		return;
	if (res == FOAD)
		return;
	if (res == KERN) {
no_context:
		if (fixup_exception(regs))
			return;
		die("Oops", regs, address);
	}

	tsk->thread.fault_address = address;
	if (res == SIGBUS)
		force_sig_fault(SIGBUS, BUS_ADRERR, (void __user *) address);
	else
		force_sig_fault(SIGSEGV, res == ACCERR ? SEGV_ACCERR : SEGV_MAPERR,
				(void __user *) address);
}

Or this arm64:

{
	const struct fault_info *inf;
	struct mm_struct *mm = current->mm;
	unsigned long vm_flags;
	unsigned int mm_flags = FAULT_FLAG_DEFAULT;
	unsigned long addr = untagged_addr(far);
	unsigned int res;

	if (kprobe_page_fault(regs, esr))
		return 0;

	/*
	 * If we're in an interrupt or have no user context, we must not take
	 * the fault.
	 */
	if (faulthandler_disabled() || !mm)
		goto no_context;

	if (user_mode(regs))
		mm_flags |= FAULT_FLAG_USER;

	/*
	 * vm_flags tells us what bits we must have in vma->vm_flags
	 * for the fault to be benign, __do_page_fault() would check
	 * vma->vm_flags & vm_flags and returns an error if the
	 * intersection is empty
	 */
	if (is_el0_instruction_abort(esr)) {
		/* It was exec fault */
		vm_flags = VM_EXEC;
		mm_flags |= FAULT_FLAG_INSTRUCTION;
	} else if (is_write_abort(esr)) {
		/* It was write fault */
		vm_flags = VM_WRITE;
		mm_flags |= FAULT_FLAG_WRITE;
	} else {
		/* It was read fault */
		vm_flags = VM_READ;
		/* Write implies read */
		vm_flags |= VM_WRITE;
		/* If EPAN is absent then exec implies read */
		if (!cpus_have_const_cap(ARM64_HAS_EPAN))
			vm_flags |= VM_EXEC;
	}

	if (is_ttbr0_addr(addr) && is_el1_permission_fault(addr, esr, regs)) {
		if (is_el1_instruction_abort(esr))
			die_kernel_fault("execution of user memory",
					 addr, esr, regs);

		if (!search_exception_tables(regs->pc))
			die_kernel_fault("access to user memory outside uaccess routines",
					 addr, esr, regs);
	}

	res = generic_fault(addr, mm_flags, vm_flags, regs);
	if (likely(res == DONE))
		return 0;
	if (res == FOAD)
		return 0;
	if (res == KERN) {
no_context:
		__do_kernel_fault(addr, esr, regs);
		return 0;
	}
	inf = esr_to_fault_info(esr);
	set_thread_esr(addr, esr);
	if (res == SIGBUS) {
		/*
		 * We had some memory, but were unable to successfully fix up
		 * this page fault.
		 */
		arm64_force_sig_fault(SIGBUS, BUS_ADRERR, far, inf->name);
	} else if (res > POISON) {
		arm64_force_sig_mceerr(BUS_MCEERR_AR, far, res - POISON, inf->name);
	} else {
		/*
		 * Something tried to access memory that isn't in our memory
		 * map.
		 */
		arm64_force_sig_fault(SIGSEGV,
				      res == ACCERR ? SEGV_ACCERR : SEGV_MAPERR,
				      far, inf->name);
	}

	return 0;
}

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC][PATCHSET] VM_FAULT_RETRY fixes
@ 2023-02-04  0:26                   ` Al Viro
  0 siblings, 0 replies; 120+ messages in thread
From: Al Viro @ 2023-02-04  0:26 UTC (permalink / raw)
  To: Peter Xu
  Cc: Linus Torvalds, linux-arch, linux-alpha, linux-ia64,
	linux-hexagon, linux-m68k, Michal Simek, Dinh Nguyen, openrisc,
	linux-parisc, linux-riscv, sparclinux

On Thu, Feb 02, 2023 at 05:56:22PM -0500, Peter Xu wrote:

> IMHO it'll be merely impossible to merge things across most (if not to say,
> all) archs.  It will need to be start from one or at least a few that still
> shares a major common base - I would still rely on x86 as a start - then we
> try to use the helper in as much archs as possible.
> 
> Even on x86, I do also see challenges so I'm not sure whether a common
> enough routine can be abstracted indeed.  But I believe there's a way to do
> this because obviously we still see tons of duplicated logics falling
> around.  It may definitely need time to think out where's the best spot to
> start, and how to gradually move towards covering more archs starting from
> one.

FWIW, after going through everything from alpha to loongarch (in alphabetic
order, skipping the itanic) the following seems to be suitable for all of
them:

generic_fault(address, flags, vm_flags, regs)
{
	struct mm_struct *mm = current->mm;
	struct vm_area_struct *vma;
	vm_fault_t fault;

	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);

	if (unlikely(!mmap_read_trylock(mm))) {
		if (!(flags & FAULT_FLAG_USER) &&
		    !search_exception_tables(instruction_pointer(regs))) {
			/*
			 * Fault from code in kernel from
			 * which we do not expect faults.
			 */
			return KERN;
		}
retry:
		mmap_read_lock(mm);
	} else {
		might_sleep();
#ifdef CONFIG_DEBUG_VM
		if (!(flags & FAULT_FLAG_USER) &&
		    !search_exception_tables(instruction_pointer(regs)))
			return KERN;
#endif
	}
	vma = find_vma(mm, address);
	if (!vma)
		goto Eunmapped;
	if (unlikely(vma->vm_start > address)) {
		if (!(vma->vm_flags & VM_GROWSDOWN))
			goto Eunmapped;
		if (addr < FIRST_USER_ADDRESS)
			goto Eunmapped;
		if (expand_stack(vma, address))
			goto Eunmapped;
	}

	/* Ok, we have a good vm_area for this memory access, so
	   we can handle it.  */
	if (!(vma->vm_flags & vm_flags))
		goto Eaccess;

	/* If for any reason at all we couldn't handle the fault,
	   make sure we exit gracefully rather than endlessly redo
	   the fault.  */
	fault = handle_mm_fault(vma, address, flags, regs);

	if (unlikely(fault & VM_FAULT_RETRY)) {
		if (!(flags & FAULT_FLAG_USER)) {
			if (fatal_signal_pending(current))
				return KERN;
		} else {
			if (signal_pending(current))
				return FOAD;
		}
		flags |= FAULT_FLAG_TRIED;
		goto retry;
	}

	if (fault & VM_FAULT_COMPLETED)
		return DONE;

	mmap_read_unlock(mm);

	if (likely(!(fault & VM_FAULT_ERROR)))
		return DONE;

	if (!(flags & FAULT_FLAG_USER))
		return KERN;

	if (fault & VM_FAULT_OOM) {
		pagefault_out_of_memory();
		return FOAD;
	}

	if (fault & VM_FAULT_SIGSEGV)
		return SIGSEGV;

	if (fault & VM_FAULT_SIGBUS)
		return SIGBUS;

	if (fault & VM_FAULT_HWPOISON)
		return POISON + PAGE_SHIFT;	// POISON = 256

	if (fault & VM_FAULT_HWPOISON_LARGE)
		return POISON + hstate_index_to_shift(VM_FAULT_GET_HINDEX(fault));

	BUG();

Eunmapped:
	mmap_read_unlock(mm);
	return flags & FAULT_FLAG_USER ? MAPERR : KERN;
Eaccess:
	mmap_read_unlock(mm);
	return flags & FAULT_FLAG_USER ? ACCERR : KERN;
}

possible return values (and that's obviously not the identifiers to be
used for real; for now I'm just looking for feasibility of it all):
	DONE		success, nothing else to be done
	FOAD		OOM/fatal signal with VM_FAULT_RETRY/
			signal with VM_FAULT_RETRY from userland - nothing
			to be done here.
	KERN		kernel mode failed fault, fixup or oops
	MAPERR		unmapped address, SIGSEGV/SEGV_MAPERR for you
	ACCERR		nothing in vm_flags present in ->vm_flags of vma;
			SIGSEGV/SEGV_ACCERR
	SIGSEGV		VM_FAULT_SIGSEGV; some architectures treat that
			as SEGV_MAPERR, some as SEGV_ACCERR.
	SIGBUS		VM_FAULT_SIGBUS; SIGBUS/BUS_ADRERR
	POISON + shift	VM_FAULT_HWPOISON and VM_FAULT_HWPOISON_LARGE, with
			log2(affected page size) encoded into return value.

This is obviously not even close to final helper, but... alpha, arc, arm, arm64,
csky, hexagon, loongarch convert to that cleanly.

Itanic very much does not (due to weird dual stacks, awful address space layout,
etc.), but then git rm arch/ia64 is long overdue.

Fairly typical look after conversion:

arc: 
{
	struct task_struct *tsk = current;
	struct mm_struct *mm = tsk->mm;
	unsigned int mask;
	unsigned int flags;
	unsigned int res;

	/*
	 * NOTE! We MUST NOT take any locks for this case. We may
	 * be in an interrupt or a critical region, and should
	 * only copy the information from the master page table,
	 * nothing more.
	 */
	if (address >= VMALLOC_START && !user_mode(regs)) {
		if (unlikely(handle_kernel_vaddr_fault(address)))
			goto no_context;
		else
			return;
	}

	/*
	 * If we're in an interrupt or have no user
	 * context, we must not take the fault..
	 */
	if (faulthandler_disabled() || !mm)
		goto no_context;

	flags = FAULT_FLAG_DEFAULT;
	if (user_mode(regs))
		flags |= FAULT_FLAG_USER;
	mask = VM_READ;
	if (regs->ecr_cause & ECR_C_PROTV_STORE) {	/* ST/EX */
		flags |= FAULT_FLAG_WRITE;
		mask = VM_WRITE;
	} else if ((regs->ecr_vec = ECR_V_PROTV) &&
	         (regs->ecr_cause = ECR_C_PROTV_INST_FETCH)) {
		mask = VM_EXEC;
	}

	res = generic_fault(address, flags, mask, regs);
	if (likely(res = DONE))
		return;
	if (res = FOAD)
		return;
	if (res = KERN) {
no_context:
		if (fixup_exception(regs))
			return;
		die("Oops", regs, address);
	}

	tsk->thread.fault_address = address;
	if (res = SIGBUS)
		force_sig_fault(SIGBUS, BUS_ADRERR, (void __user *) address);
	else
		force_sig_fault(SIGSEGV, res = ACCERR ? SEGV_ACCERR : SEGV_MAPERR,
				(void __user *) address);
}

Or this arm64:

{
	const struct fault_info *inf;
	struct mm_struct *mm = current->mm;
	unsigned long vm_flags;
	unsigned int mm_flags = FAULT_FLAG_DEFAULT;
	unsigned long addr = untagged_addr(far);
	unsigned int res;

	if (kprobe_page_fault(regs, esr))
		return 0;

	/*
	 * If we're in an interrupt or have no user context, we must not take
	 * the fault.
	 */
	if (faulthandler_disabled() || !mm)
		goto no_context;

	if (user_mode(regs))
		mm_flags |= FAULT_FLAG_USER;

	/*
	 * vm_flags tells us what bits we must have in vma->vm_flags
	 * for the fault to be benign, __do_page_fault() would check
	 * vma->vm_flags & vm_flags and returns an error if the
	 * intersection is empty
	 */
	if (is_el0_instruction_abort(esr)) {
		/* It was exec fault */
		vm_flags = VM_EXEC;
		mm_flags |= FAULT_FLAG_INSTRUCTION;
	} else if (is_write_abort(esr)) {
		/* It was write fault */
		vm_flags = VM_WRITE;
		mm_flags |= FAULT_FLAG_WRITE;
	} else {
		/* It was read fault */
		vm_flags = VM_READ;
		/* Write implies read */
		vm_flags |= VM_WRITE;
		/* If EPAN is absent then exec implies read */
		if (!cpus_have_const_cap(ARM64_HAS_EPAN))
			vm_flags |= VM_EXEC;
	}

	if (is_ttbr0_addr(addr) && is_el1_permission_fault(addr, esr, regs)) {
		if (is_el1_instruction_abort(esr))
			die_kernel_fault("execution of user memory",
					 addr, esr, regs);

		if (!search_exception_tables(regs->pc))
			die_kernel_fault("access to user memory outside uaccess routines",
					 addr, esr, regs);
	}

	res = generic_fault(addr, mm_flags, vm_flags, regs);
	if (likely(res = DONE))
		return 0;
	if (res = FOAD)
		return 0;
	if (res = KERN) {
no_context:
		__do_kernel_fault(addr, esr, regs);
		return 0;
	}
	inf = esr_to_fault_info(esr);
	set_thread_esr(addr, esr);
	if (res = SIGBUS) {
		/*
		 * We had some memory, but were unable to successfully fix up
		 * this page fault.
		 */
		arm64_force_sig_fault(SIGBUS, BUS_ADRERR, far, inf->name);
	} else if (res > POISON) {
		arm64_force_sig_mceerr(BUS_MCEERR_AR, far, res - POISON, inf->name);
	} else {
		/*
		 * Something tried to access memory that isn't in our memory
		 * map.
		 */
		arm64_force_sig_fault(SIGSEGV,
				      res = ACCERR ? SEGV_ACCERR : SEGV_MAPERR,
				      far, inf->name);
	}

	return 0;
}

^ permalink raw reply	[flat|nested] 120+ messages in thread

* [loongarch oddities] Re: [RFC][PATCHSET] VM_FAULT_RETRY fixes
  2023-01-31 21:49         ` Al Viro
  (?)
  (?)
@ 2023-02-04  0:47         ` Al Viro
  -1 siblings, 0 replies; 120+ messages in thread
From: Al Viro @ 2023-02-04  0:47 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Peter Xu, linux-arch, Huacai Chen

	Speaking of really weird stuff, on loongarch we have
this:
        /*
         * We fault-in kernel-space virtual memory on-demand. The  
         * 'reference' page table is init_mm.pgd.
         *
         * NOTE! We MUST NOT take any locks for this case. We may
         * be in an interrupt or a critical region, and should
         * only copy the information from the master page table,
         * nothing more.
         */
        if (address & __UA_LIMIT) {
                if (!user_mode(regs))
                        no_context(regs, address);
                else   
                        do_sigsegv(regs, write, address, si_code);
                return;
        }

That looks as if it had started off as usual vmalloc page tables
propagation in #PF, but... it obviously does nothing of that sort.
What's going on there and why do we even bother?  After all,
we'd better have no vma above that address, so the normal logics
would seem to be fine with that case...

Another really weird part there is
        /*
         * If we're in an interrupt or have no user
         * context, we must not take the fault..
         */
        if (faulthandler_disabled() || !mm) {
                do_sigsegv(regs, write, address, si_code);
                return;
        }
There should be no way to have this condition for userland
page fault and do_sigsegv() is starting with
        /* Kernel mode? Handle exceptions or die */
        if (!user_mode(regs)) {
                no_context(regs, address);  
                return;
        }
	<force SIGSEGV>
What's wrong with just
        if (faulthandler_disabled() || !mm) {
		no_context(regs, address);
		return;
	}
Am I missing something subtle here?

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC][PATCHSET] VM_FAULT_RETRY fixes
  2023-02-04  0:26                   ` Al Viro
  (?)
@ 2023-02-05  5:10                     ` Al Viro
  -1 siblings, 0 replies; 120+ messages in thread
From: Al Viro @ 2023-02-05  5:10 UTC (permalink / raw)
  To: Peter Xu
  Cc: Linus Torvalds, linux-arch, linux-alpha, linux-ia64,
	linux-hexagon, linux-m68k, Michal Simek, Dinh Nguyen, openrisc,
	linux-parisc, linux-riscv, sparclinux

On Sat, Feb 04, 2023 at 12:26:39AM +0000, Al Viro wrote:
> On Thu, Feb 02, 2023 at 05:56:22PM -0500, Peter Xu wrote:
> 
> > IMHO it'll be merely impossible to merge things across most (if not to say,
> > all) archs.  It will need to be start from one or at least a few that still
> > shares a major common base - I would still rely on x86 as a start - then we
> > try to use the helper in as much archs as possible.
> > 
> > Even on x86, I do also see challenges so I'm not sure whether a common
> > enough routine can be abstracted indeed.  But I believe there's a way to do
> > this because obviously we still see tons of duplicated logics falling
> > around.  It may definitely need time to think out where's the best spot to
> > start, and how to gradually move towards covering more archs starting from
> > one.
> 
> FWIW, after going through everything from alpha to loongarch (in alphabetic
> order, skipping the itanic) the following seems to be suitable for all of
> them:
> 
> generic_fault(address, flags, vm_flags, regs)
[snip]

After looking through other architectures: that needs changes.

AFAICS, the right approach would be to add a pointer to (uninitialized)
kernel_siginfo.  And let it pass the signal number, etc. through that.
That way all "we want to raise a signal" return values fold together.
*IF* the page fault wants to do something extra on SIGSEGV, but not on
SIGBUS (we have several such), it can key that upon info.si_signo.

Speaking of SIGSEGV, there's a fun situation with VM_FAULT_SIGSEGV:

1) Only x86 and ppc generate VM_FAULT_SIGSEGV in handle_mm_fault()
without hitting WARN_ON_ONCE().  And neither actually returns it
to page fault handler - the conditions that would've led to that
return value (pkey stuff) are checked in the page fault handler
and handle_mm_fault() is not called in such cases.

2) on alpha, hexagon, itanic, loongarch, microblaze, mips, nios2,
openrisc, sparc, um and xtensa #PF handler would end up with SEGV_ACCERR
if it saw VM_FAULT_SIGNAL.

3) on arm, arm64, arc, m68k, powerpc, s390, sh and x86 - SEGV_MAPERR
(except that neither x86 nor powerpc #PF ever see it)

4) on csky and riscv it would end up with BUG()

5) on parisc it's complicated^Wbroken - it tries to decide which
one to use after having unlocked mmap and looking at vma in process.

As it is, VM_FAULT_SIGSEGV had ended up as "we need to report some
error to get_user_pages() and similar callers in cases when
page fault handler would've detected pkey problem and refused
to call handle_mm_fault()", with several places where it's
"we had been called with bogus arguments; printk and return
something to indicate the things had gone wrong".  It used to
have other uses, but this is what it had become now - grep and
you'll see.

AFAICS, all checks for it in page fault handlers are pretty much
dead code...

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC][PATCHSET] VM_FAULT_RETRY fixes
@ 2023-02-05  5:10                     ` Al Viro
  0 siblings, 0 replies; 120+ messages in thread
From: Al Viro @ 2023-02-05  5:10 UTC (permalink / raw)
  To: Peter Xu
  Cc: Linus Torvalds, linux-arch, linux-alpha, linux-ia64,
	linux-hexagon, linux-m68k, Michal Simek, Dinh Nguyen, openrisc,
	linux-parisc, linux-riscv, sparclinux

On Sat, Feb 04, 2023 at 12:26:39AM +0000, Al Viro wrote:
> On Thu, Feb 02, 2023 at 05:56:22PM -0500, Peter Xu wrote:
> 
> > IMHO it'll be merely impossible to merge things across most (if not to say,
> > all) archs.  It will need to be start from one or at least a few that still
> > shares a major common base - I would still rely on x86 as a start - then we
> > try to use the helper in as much archs as possible.
> > 
> > Even on x86, I do also see challenges so I'm not sure whether a common
> > enough routine can be abstracted indeed.  But I believe there's a way to do
> > this because obviously we still see tons of duplicated logics falling
> > around.  It may definitely need time to think out where's the best spot to
> > start, and how to gradually move towards covering more archs starting from
> > one.
> 
> FWIW, after going through everything from alpha to loongarch (in alphabetic
> order, skipping the itanic) the following seems to be suitable for all of
> them:
> 
> generic_fault(address, flags, vm_flags, regs)
[snip]

After looking through other architectures: that needs changes.

AFAICS, the right approach would be to add a pointer to (uninitialized)
kernel_siginfo.  And let it pass the signal number, etc. through that.
That way all "we want to raise a signal" return values fold together.
*IF* the page fault wants to do something extra on SIGSEGV, but not on
SIGBUS (we have several such), it can key that upon info.si_signo.

Speaking of SIGSEGV, there's a fun situation with VM_FAULT_SIGSEGV:

1) Only x86 and ppc generate VM_FAULT_SIGSEGV in handle_mm_fault()
without hitting WARN_ON_ONCE().  And neither actually returns it
to page fault handler - the conditions that would've led to that
return value (pkey stuff) are checked in the page fault handler
and handle_mm_fault() is not called in such cases.

2) on alpha, hexagon, itanic, loongarch, microblaze, mips, nios2,
openrisc, sparc, um and xtensa #PF handler would end up with SEGV_ACCERR
if it saw VM_FAULT_SIGNAL.

3) on arm, arm64, arc, m68k, powerpc, s390, sh and x86 - SEGV_MAPERR
(except that neither x86 nor powerpc #PF ever see it)

4) on csky and riscv it would end up with BUG()

5) on parisc it's complicated^Wbroken - it tries to decide which
one to use after having unlocked mmap and looking at vma in process.

As it is, VM_FAULT_SIGSEGV had ended up as "we need to report some
error to get_user_pages() and similar callers in cases when
page fault handler would've detected pkey problem and refused
to call handle_mm_fault()", with several places where it's
"we had been called with bogus arguments; printk and return
something to indicate the things had gone wrong".  It used to
have other uses, but this is what it had become now - grep and
you'll see.

AFAICS, all checks for it in page fault handlers are pretty much
dead code...

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC][PATCHSET] VM_FAULT_RETRY fixes
@ 2023-02-05  5:10                     ` Al Viro
  0 siblings, 0 replies; 120+ messages in thread
From: Al Viro @ 2023-02-05  5:10 UTC (permalink / raw)
  To: Peter Xu
  Cc: Linus Torvalds, linux-arch, linux-alpha, linux-ia64,
	linux-hexagon, linux-m68k, Michal Simek, Dinh Nguyen, openrisc,
	linux-parisc, linux-riscv, sparclinux

On Sat, Feb 04, 2023 at 12:26:39AM +0000, Al Viro wrote:
> On Thu, Feb 02, 2023 at 05:56:22PM -0500, Peter Xu wrote:
> 
> > IMHO it'll be merely impossible to merge things across most (if not to say,
> > all) archs.  It will need to be start from one or at least a few that still
> > shares a major common base - I would still rely on x86 as a start - then we
> > try to use the helper in as much archs as possible.
> > 
> > Even on x86, I do also see challenges so I'm not sure whether a common
> > enough routine can be abstracted indeed.  But I believe there's a way to do
> > this because obviously we still see tons of duplicated logics falling
> > around.  It may definitely need time to think out where's the best spot to
> > start, and how to gradually move towards covering more archs starting from
> > one.
> 
> FWIW, after going through everything from alpha to loongarch (in alphabetic
> order, skipping the itanic) the following seems to be suitable for all of
> them:
> 
> generic_fault(address, flags, vm_flags, regs)
[snip]

After looking through other architectures: that needs changes.

AFAICS, the right approach would be to add a pointer to (uninitialized)
kernel_siginfo.  And let it pass the signal number, etc. through that.
That way all "we want to raise a signal" return values fold together.
*IF* the page fault wants to do something extra on SIGSEGV, but not on
SIGBUS (we have several such), it can key that upon info.si_signo.

Speaking of SIGSEGV, there's a fun situation with VM_FAULT_SIGSEGV:

1) Only x86 and ppc generate VM_FAULT_SIGSEGV in handle_mm_fault()
without hitting WARN_ON_ONCE().  And neither actually returns it
to page fault handler - the conditions that would've led to that
return value (pkey stuff) are checked in the page fault handler
and handle_mm_fault() is not called in such cases.

2) on alpha, hexagon, itanic, loongarch, microblaze, mips, nios2,
openrisc, sparc, um and xtensa #PF handler would end up with SEGV_ACCERR
if it saw VM_FAULT_SIGNAL.

3) on arm, arm64, arc, m68k, powerpc, s390, sh and x86 - SEGV_MAPERR
(except that neither x86 nor powerpc #PF ever see it)

4) on csky and riscv it would end up with BUG()

5) on parisc it's complicated^Wbroken - it tries to decide which
one to use after having unlocked mmap and looking at vma in process.

As it is, VM_FAULT_SIGSEGV had ended up as "we need to report some
error to get_user_pages() and similar callers in cases when
page fault handler would've detected pkey problem and refused
to call handle_mm_fault()", with several places where it's
"we had been called with bogus arguments; printk and return
something to indicate the things had gone wrong".  It used to
have other uses, but this is what it had become now - grep and
you'll see.

AFAICS, all checks for it in page fault handlers are pretty much
dead code...

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 04/10] m68k: fix livelock in uaccess
  2023-01-31 20:04   ` Al Viro
  (?)
@ 2023-02-05  6:18     ` Finn Thain
  -1 siblings, 0 replies; 120+ messages in thread
From: Finn Thain @ 2023-02-05  6:18 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-arch, linux-alpha, linux-ia64, linux-hexagon, linux-m68k,
	Michal Simek, Dinh Nguyen, openrisc, linux-parisc, linux-riscv,
	sparclinux, Linus Torvalds

Hello Al,

On Tue, 31 Jan 2023, Al Viro wrote:

> m68k equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY 
> handling" If e.g. get_user() triggers a page fault and a fatal signal is 
> caught, we might end up with handle_mm_fault() returning VM_FAULT_RETRY 
> and not doing anything to page tables.  In such case we must *not* 
> return to the faulting insn - that would repeat the entire thing without 
> making any progress; what we need instead is to treat that as failed 
> (user) memory access.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

That could be a bug I was chasing back in 2021 but never found. The mmap 
stressors in stress-ng were triggering a crash on a Mac Quadras, though 
only rarely. Sometimes it would run all day without a failure.

Last year when I started using GCC 12 to build the kernel, I saw the same 
workload fail again but the failure mode had become a silent hang/livelock 
instead of the oopses I got with GCC 6.

When I press the NMI button after the livelock I always see 
do_page_fault() in the backtrace. So I've been testing your patch. I've 
been running the same stress-ng reproducer for about 12 hours now with no 
failures which looks promising.

In case that stress-ng testing is of use:
Tested-by: Finn Thain <fthain@linux-m68k.org>

BTW, how did you identify that bug in do_page_fault()? If its the same bug 
I was chasing, it could be an old one. The stress-ng logs I collected last 
year include a crash from a v4.14 build.

> ---
>  arch/m68k/mm/fault.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/m68k/mm/fault.c b/arch/m68k/mm/fault.c
> index 4d2837eb3e2a..228128e45c67 100644
> --- a/arch/m68k/mm/fault.c
> +++ b/arch/m68k/mm/fault.c
> @@ -138,8 +138,11 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
>  	fault = handle_mm_fault(vma, address, flags, regs);
>  	pr_debug("handle_mm_fault returns %x\n", fault);
>  
> -	if (fault_signal_pending(fault, regs))
> +	if (fault_signal_pending(fault, regs)) {
> +		if (!user_mode(regs))
> +			goto no_context;
>  		return 0;
> +	}
>  
>  	/* The fault is fully completed (including releasing mmap lock) */
>  	if (fault & VM_FAULT_COMPLETED)
> 

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 04/10] m68k: fix livelock in uaccess
@ 2023-02-05  6:18     ` Finn Thain
  0 siblings, 0 replies; 120+ messages in thread
From: Finn Thain @ 2023-02-05  6:18 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-arch, linux-alpha, linux-ia64, linux-hexagon, linux-m68k,
	Michal Simek, Dinh Nguyen, openrisc, linux-parisc, linux-riscv,
	sparclinux, Linus Torvalds

Hello Al,

On Tue, 31 Jan 2023, Al Viro wrote:

> m68k equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY 
> handling" If e.g. get_user() triggers a page fault and a fatal signal is 
> caught, we might end up with handle_mm_fault() returning VM_FAULT_RETRY 
> and not doing anything to page tables.  In such case we must *not* 
> return to the faulting insn - that would repeat the entire thing without 
> making any progress; what we need instead is to treat that as failed 
> (user) memory access.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

That could be a bug I was chasing back in 2021 but never found. The mmap 
stressors in stress-ng were triggering a crash on a Mac Quadras, though 
only rarely. Sometimes it would run all day without a failure.

Last year when I started using GCC 12 to build the kernel, I saw the same 
workload fail again but the failure mode had become a silent hang/livelock 
instead of the oopses I got with GCC 6.

When I press the NMI button after the livelock I always see 
do_page_fault() in the backtrace. So I've been testing your patch. I've 
been running the same stress-ng reproducer for about 12 hours now with no 
failures which looks promising.

In case that stress-ng testing is of use:
Tested-by: Finn Thain <fthain@linux-m68k.org>

BTW, how did you identify that bug in do_page_fault()? If its the same bug 
I was chasing, it could be an old one. The stress-ng logs I collected last 
year include a crash from a v4.14 build.

> ---
>  arch/m68k/mm/fault.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/m68k/mm/fault.c b/arch/m68k/mm/fault.c
> index 4d2837eb3e2a..228128e45c67 100644
> --- a/arch/m68k/mm/fault.c
> +++ b/arch/m68k/mm/fault.c
> @@ -138,8 +138,11 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
>  	fault = handle_mm_fault(vma, address, flags, regs);
>  	pr_debug("handle_mm_fault returns %x\n", fault);
>  
> -	if (fault_signal_pending(fault, regs))
> +	if (fault_signal_pending(fault, regs)) {
> +		if (!user_mode(regs))
> +			goto no_context;
>  		return 0;
> +	}
>  
>  	/* The fault is fully completed (including releasing mmap lock) */
>  	if (fault & VM_FAULT_COMPLETED)
> 

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 04/10] m68k: fix livelock in uaccess
@ 2023-02-05  6:18     ` Finn Thain
  0 siblings, 0 replies; 120+ messages in thread
From: Finn Thain @ 2023-02-05  6:18 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-arch, linux-alpha, linux-ia64, linux-hexagon, linux-m68k,
	Michal Simek, Dinh Nguyen, openrisc, linux-parisc, linux-riscv,
	sparclinux, Linus Torvalds

Hello Al,

On Tue, 31 Jan 2023, Al Viro wrote:

> m68k equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY 
> handling" If e.g. get_user() triggers a page fault and a fatal signal is 
> caught, we might end up with handle_mm_fault() returning VM_FAULT_RETRY 
> and not doing anything to page tables.  In such case we must *not* 
> return to the faulting insn - that would repeat the entire thing without 
> making any progress; what we need instead is to treat that as failed 
> (user) memory access.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

That could be a bug I was chasing back in 2021 but never found. The mmap 
stressors in stress-ng were triggering a crash on a Mac Quadras, though 
only rarely. Sometimes it would run all day without a failure.

Last year when I started using GCC 12 to build the kernel, I saw the same 
workload fail again but the failure mode had become a silent hang/livelock 
instead of the oopses I got with GCC 6.

When I press the NMI button after the livelock I always see 
do_page_fault() in the backtrace. So I've been testing your patch. I've 
been running the same stress-ng reproducer for about 12 hours now with no 
failures which looks promising.

In case that stress-ng testing is of use:
Tested-by: Finn Thain <fthain@linux-m68k.org>

BTW, how did you identify that bug in do_page_fault()? If its the same bug 
I was chasing, it could be an old one. The stress-ng logs I collected last 
year include a crash from a v4.14 build.

> ---
>  arch/m68k/mm/fault.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/m68k/mm/fault.c b/arch/m68k/mm/fault.c
> index 4d2837eb3e2a..228128e45c67 100644
> --- a/arch/m68k/mm/fault.c
> +++ b/arch/m68k/mm/fault.c
> @@ -138,8 +138,11 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
>  	fault = handle_mm_fault(vma, address, flags, regs);
>  	pr_debug("handle_mm_fault returns %x\n", fault);
>  
> -	if (fault_signal_pending(fault, regs))
> +	if (fault_signal_pending(fault, regs)) {
> +		if (!user_mode(regs))
> +			goto no_context;
>  		return 0;
> +	}
>  
>  	/* The fault is fully completed (including releasing mmap lock) */
>  	if (fault & VM_FAULT_COMPLETED)
> 

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 04/10] m68k: fix livelock in uaccess
  2023-02-05  6:18     ` Finn Thain
  (?)
@ 2023-02-05 18:51       ` Linus Torvalds
  -1 siblings, 0 replies; 120+ messages in thread
From: Linus Torvalds @ 2023-02-05 18:51 UTC (permalink / raw)
  To: Finn Thain
  Cc: Al Viro, linux-arch, linux-alpha, linux-ia64, linux-hexagon,
	linux-m68k, Michal Simek, Dinh Nguyen, openrisc, linux-parisc,
	linux-riscv, sparclinux

On Sat, Feb 4, 2023 at 10:16 PM Finn Thain <fthain@linux-m68k.org> wrote:
>
> That could be a bug I was chasing back in 2021 but never found. The mmap
> stressors in stress-ng were triggering a crash on a Mac Quadras, though
> only rarely. Sometimes it would run all day without a failure.
>
> Last year when I started using GCC 12 to build the kernel, I saw the same
> workload fail again but the failure mode had become a silent hang/livelock
> instead of the oopses I got with GCC 6.
>
> When I press the NMI button after the livelock I always see
> do_page_fault() in the backtrace. So I've been testing your patch. I've
> been running the same stress-ng reproducer for about 12 hours now with no
> failures which looks promising.
>
> In case that stress-ng testing is of use:
> Tested-by: Finn Thain <fthain@linux-m68k.org>

Could you test the thing that Mark Rutland pointed to? He had an
actual test-case for this for the arm64 fixes some years ago.

See

   https://lore.kernel.org/all/Y9pD+TMP+%2FSyfeJm@FVFF77S0Q05N/

for his email with links to his old test-case?

                Linus

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 04/10] m68k: fix livelock in uaccess
@ 2023-02-05 18:51       ` Linus Torvalds
  0 siblings, 0 replies; 120+ messages in thread
From: Linus Torvalds @ 2023-02-05 18:51 UTC (permalink / raw)
  To: Finn Thain
  Cc: Al Viro, linux-arch, linux-alpha, linux-ia64, linux-hexagon,
	linux-m68k, Michal Simek, Dinh Nguyen, openrisc, linux-parisc,
	linux-riscv, sparclinux

On Sat, Feb 4, 2023 at 10:16 PM Finn Thain <fthain@linux-m68k.org> wrote:
>
> That could be a bug I was chasing back in 2021 but never found. The mmap
> stressors in stress-ng were triggering a crash on a Mac Quadras, though
> only rarely. Sometimes it would run all day without a failure.
>
> Last year when I started using GCC 12 to build the kernel, I saw the same
> workload fail again but the failure mode had become a silent hang/livelock
> instead of the oopses I got with GCC 6.
>
> When I press the NMI button after the livelock I always see
> do_page_fault() in the backtrace. So I've been testing your patch. I've
> been running the same stress-ng reproducer for about 12 hours now with no
> failures which looks promising.
>
> In case that stress-ng testing is of use:
> Tested-by: Finn Thain <fthain@linux-m68k.org>

Could you test the thing that Mark Rutland pointed to? He had an
actual test-case for this for the arm64 fixes some years ago.

See

   https://lore.kernel.org/all/Y9pD+TMP+%2FSyfeJm@FVFF77S0Q05N/

for his email with links to his old test-case?

                Linus

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 04/10] m68k: fix livelock in uaccess
@ 2023-02-05 18:51       ` Linus Torvalds
  0 siblings, 0 replies; 120+ messages in thread
From: Linus Torvalds @ 2023-02-05 18:51 UTC (permalink / raw)
  To: Finn Thain
  Cc: Al Viro, linux-arch, linux-alpha, linux-ia64, linux-hexagon,
	linux-m68k, Michal Simek, Dinh Nguyen, openrisc, linux-parisc,
	linux-riscv, sparclinux

On Sat, Feb 4, 2023 at 10:16 PM Finn Thain <fthain@linux-m68k.org> wrote:
>
> That could be a bug I was chasing back in 2021 but never found. The mmap
> stressors in stress-ng were triggering a crash on a Mac Quadras, though
> only rarely. Sometimes it would run all day without a failure.
>
> Last year when I started using GCC 12 to build the kernel, I saw the same
> workload fail again but the failure mode had become a silent hang/livelock
> instead of the oopses I got with GCC 6.
>
> When I press the NMI button after the livelock I always see
> do_page_fault() in the backtrace. So I've been testing your patch. I've
> been running the same stress-ng reproducer for about 12 hours now with no
> failures which looks promising.
>
> In case that stress-ng testing is of use:
> Tested-by: Finn Thain <fthain@linux-m68k.org>

Could you test the thing that Mark Rutland pointed to? He had an
actual test-case for this for the arm64 fixes some years ago.

See

   https://lore.kernel.org/all/Y9pD+TMP+%2FSyfeJm@FVFF77S0Q05N/

for his email with links to his old test-case?

                Linus

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 04/10] m68k: fix livelock in uaccess
  2023-02-05  6:18     ` Finn Thain
  (?)
@ 2023-02-05 20:39       ` Al Viro
  -1 siblings, 0 replies; 120+ messages in thread
From: Al Viro @ 2023-02-05 20:39 UTC (permalink / raw)
  To: Finn Thain
  Cc: linux-arch, linux-alpha, linux-ia64, linux-hexagon, linux-m68k,
	Michal Simek, Dinh Nguyen, openrisc, linux-parisc, linux-riscv,
	sparclinux, Linus Torvalds

On Sun, Feb 05, 2023 at 05:18:08PM +1100, Finn Thain wrote:

> That could be a bug I was chasing back in 2021 but never found. The mmap 
> stressors in stress-ng were triggering a crash on a Mac Quadras, though 
> only rarely. Sometimes it would run all day without a failure.
> 
> Last year when I started using GCC 12 to build the kernel, I saw the same 
> workload fail again but the failure mode had become a silent hang/livelock 
> instead of the oopses I got with GCC 6.
> 
> When I press the NMI button after the livelock I always see 
> do_page_fault() in the backtrace. So I've been testing your patch. I've 
> been running the same stress-ng reproducer for about 12 hours now with no 
> failures which looks promising.
> 
> In case that stress-ng testing is of use:
> Tested-by: Finn Thain <fthain@linux-m68k.org>
> 
> BTW, how did you identify that bug in do_page_fault()? If its the same bug 
> I was chasing, it could be an old one. The stress-ng logs I collected last 
> year include a crash from a v4.14 build.

Went to reread the current state of mm/gup.c, decided to reread handle_mm_fault()
and its callers, noticed fault_signal_pending() which hadn't been there back
when I last crawled through that area, realized what it had replaced, went
to check if everything had been converted (arch/um got missed, BTW).  Noticed
the difference between the architectures (the first hit was on alpha, without
the "sod off to no_context if it's a user fault" logics, the last - xtensa, with
it).  Checked the log for xtensa, found the commit from 2021 adding that part;
looked on arm and arm64, found commits from 2017 doing the same thing, then,
on x86, Linus' commit from 2014 adding the x86 counterpart...  Figuring out
what all of those had been for wasn't particularly hard, and it was easy
to check which architectures still needed the same thing...

BTW, since these patches would be much easier to backport than any unification
work, I think the right thing to do would be to have further unification done on
top of them.

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 04/10] m68k: fix livelock in uaccess
@ 2023-02-05 20:39       ` Al Viro
  0 siblings, 0 replies; 120+ messages in thread
From: Al Viro @ 2023-02-05 20:39 UTC (permalink / raw)
  To: Finn Thain
  Cc: linux-arch, linux-alpha, linux-ia64, linux-hexagon, linux-m68k,
	Michal Simek, Dinh Nguyen, openrisc, linux-parisc, linux-riscv,
	sparclinux, Linus Torvalds

On Sun, Feb 05, 2023 at 05:18:08PM +1100, Finn Thain wrote:

> That could be a bug I was chasing back in 2021 but never found. The mmap 
> stressors in stress-ng were triggering a crash on a Mac Quadras, though 
> only rarely. Sometimes it would run all day without a failure.
> 
> Last year when I started using GCC 12 to build the kernel, I saw the same 
> workload fail again but the failure mode had become a silent hang/livelock 
> instead of the oopses I got with GCC 6.
> 
> When I press the NMI button after the livelock I always see 
> do_page_fault() in the backtrace. So I've been testing your patch. I've 
> been running the same stress-ng reproducer for about 12 hours now with no 
> failures which looks promising.
> 
> In case that stress-ng testing is of use:
> Tested-by: Finn Thain <fthain@linux-m68k.org>
> 
> BTW, how did you identify that bug in do_page_fault()? If its the same bug 
> I was chasing, it could be an old one. The stress-ng logs I collected last 
> year include a crash from a v4.14 build.

Went to reread the current state of mm/gup.c, decided to reread handle_mm_fault()
and its callers, noticed fault_signal_pending() which hadn't been there back
when I last crawled through that area, realized what it had replaced, went
to check if everything had been converted (arch/um got missed, BTW).  Noticed
the difference between the architectures (the first hit was on alpha, without
the "sod off to no_context if it's a user fault" logics, the last - xtensa, with
it).  Checked the log for xtensa, found the commit from 2021 adding that part;
looked on arm and arm64, found commits from 2017 doing the same thing, then,
on x86, Linus' commit from 2014 adding the x86 counterpart...  Figuring out
what all of those had been for wasn't particularly hard, and it was easy
to check which architectures still needed the same thing...

BTW, since these patches would be much easier to backport than any unification
work, I think the right thing to do would be to have further unification done on
top of them.

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 04/10] m68k: fix livelock in uaccess
@ 2023-02-05 20:39       ` Al Viro
  0 siblings, 0 replies; 120+ messages in thread
From: Al Viro @ 2023-02-05 20:39 UTC (permalink / raw)
  To: Finn Thain
  Cc: linux-arch, linux-alpha, linux-ia64, linux-hexagon, linux-m68k,
	Michal Simek, Dinh Nguyen, openrisc, linux-parisc, linux-riscv,
	sparclinux, Linus Torvalds

On Sun, Feb 05, 2023 at 05:18:08PM +1100, Finn Thain wrote:

> That could be a bug I was chasing back in 2021 but never found. The mmap 
> stressors in stress-ng were triggering a crash on a Mac Quadras, though 
> only rarely. Sometimes it would run all day without a failure.
> 
> Last year when I started using GCC 12 to build the kernel, I saw the same 
> workload fail again but the failure mode had become a silent hang/livelock 
> instead of the oopses I got with GCC 6.
> 
> When I press the NMI button after the livelock I always see 
> do_page_fault() in the backtrace. So I've been testing your patch. I've 
> been running the same stress-ng reproducer for about 12 hours now with no 
> failures which looks promising.
> 
> In case that stress-ng testing is of use:
> Tested-by: Finn Thain <fthain@linux-m68k.org>
> 
> BTW, how did you identify that bug in do_page_fault()? If its the same bug 
> I was chasing, it could be an old one. The stress-ng logs I collected last 
> year include a crash from a v4.14 build.

Went to reread the current state of mm/gup.c, decided to reread handle_mm_fault()
and its callers, noticed fault_signal_pending() which hadn't been there back
when I last crawled through that area, realized what it had replaced, went
to check if everything had been converted (arch/um got missed, BTW).  Noticed
the difference between the architectures (the first hit was on alpha, without
the "sod off to no_context if it's a user fault" logics, the last - xtensa, with
it).  Checked the log for xtensa, found the commit from 2021 adding that part;
looked on arm and arm64, found commits from 2017 doing the same thing, then,
on x86, Linus' commit from 2014 adding the x86 counterpart...  Figuring out
what all of those had been for wasn't particularly hard, and it was easy
to check which architectures still needed the same thing...

BTW, since these patches would be much easier to backport than any unification
work, I think the right thing to do would be to have further unification done on
top of them.

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 04/10] m68k: fix livelock in uaccess
  2023-02-05 20:39       ` Al Viro
  (?)
@ 2023-02-05 20:41         ` Linus Torvalds
  -1 siblings, 0 replies; 120+ messages in thread
From: Linus Torvalds @ 2023-02-05 20:41 UTC (permalink / raw)
  To: Al Viro
  Cc: Finn Thain, linux-arch, linux-alpha, linux-ia64, linux-hexagon,
	linux-m68k, Michal Simek, Dinh Nguyen, linux-parisc, linux-riscv,
	sparclinux

On Sun, Feb 5, 2023 at 12:39 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> BTW, since these patches would be much easier to backport than any unification
> work, I think the right thing to do would be to have further unification done on
> top of them.

Ack. I'm not NAKing the patches, I was just hoping that we also have
some way forward.

So "fix the issues, then unify" sounds like the right thing to do to me.

               Linus

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 04/10] m68k: fix livelock in uaccess
@ 2023-02-05 20:41         ` Linus Torvalds
  0 siblings, 0 replies; 120+ messages in thread
From: Linus Torvalds @ 2023-02-05 20:41 UTC (permalink / raw)
  To: Al Viro
  Cc: Finn Thain, linux-arch, linux-alpha, linux-ia64, linux-hexagon,
	linux-m68k, Michal Simek, Dinh Nguyen, linux-parisc, linux-riscv,
	sparclinux

On Sun, Feb 5, 2023 at 12:39 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> BTW, since these patches would be much easier to backport than any unification
> work, I think the right thing to do would be to have further unification done on
> top of them.

Ack. I'm not NAKing the patches, I was just hoping that we also have
some way forward.

So "fix the issues, then unify" sounds like the right thing to do to me.

               Linus

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 04/10] m68k: fix livelock in uaccess
@ 2023-02-05 20:41         ` Linus Torvalds
  0 siblings, 0 replies; 120+ messages in thread
From: Linus Torvalds @ 2023-02-05 20:41 UTC (permalink / raw)
  To: Al Viro
  Cc: Finn Thain, linux-arch, linux-alpha, linux-ia64, linux-hexagon,
	linux-m68k, Michal Simek, Dinh Nguyen, linux-parisc, linux-riscv,
	sparclinux

On Sun, Feb 5, 2023 at 12:39 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> BTW, since these patches would be much easier to backport than any unification
> work, I think the right thing to do would be to have further unification done on
> top of them.

Ack. I'm not NAKing the patches, I was just hoping that we also have
some way forward.

So "fix the issues, then unify" sounds like the right thing to do to me.

               Linus

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC][PATCHSET] VM_FAULT_RETRY fixes
  2023-02-01 10:50   ` Mark Rutland
  (?)
@ 2023-02-06 12:08     ` Geert Uytterhoeven
  -1 siblings, 0 replies; 120+ messages in thread
From: Geert Uytterhoeven @ 2023-02-06 12:08 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Al Viro, linux-arch, linux-alpha, linux-ia64, linux-hexagon,
	linux-m68k, Michal Simek, Dinh Nguyen, openrisc, linux-parisc,
	linux-riscv, sparclinux, Linus Torvalds

Hi Mark,

On Wed, Feb 1, 2023 at 11:52 AM Mark Rutland <mark.rutland@arm.com> wrote:
> On Tue, Jan 31, 2023 at 08:02:51PM +0000, Al Viro wrote:
> > On x86 it had been noticed and fixed back in 2014, in 26178ec11ef3 "x86:
> > mm: consolidate VM_FAULT_RETRY handling".  Some of the other architectures
> > had it dealt with later - e.g. arm in 2017, the fix is 746a272e44141
> > "ARM: 8692/1: mm: abort uaccess retries upon fatal signal"; xtensa -
> > in 2021, the fix is 7b9acbb6aad4f "xtensa: fix uaccess-related livelock
> > in do_page_fault", etc.
> >
> > However, it never had been done on a bunch of architectures - the
> > current mainline still has that bug on alpha, hexagon, itanic, m68k,
> > microblaze, nios2, openrisc, parisc, riscv and sparc (both sparc32 and
> > sparc64).  Fixes are trivial, but I've no way to test them for most
> > of those architectures.
>
> FWIW, when I fixed arm and arm64 back in 2017, I did report the issue here with
> a test case (and again in 2021, with maintainers all explciitly Cc'd):
>
>   https://lore.kernel.org/lkml/20170822102527.GA14671@leverpostej/
>   https://lore.kernel.org/linux-arch/20210121123140.GD48431@C02TD0UTHF1T.local/
>
> ... so if anyone has access to those architectures, that test might be useful
> for verifying the fix.

Thanks a lot! This showed the problem on m68k, and confirmed Al's fix.

I'll give it a try on a few more systems later...

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC][PATCHSET] VM_FAULT_RETRY fixes
@ 2023-02-06 12:08     ` Geert Uytterhoeven
  0 siblings, 0 replies; 120+ messages in thread
From: Geert Uytterhoeven @ 2023-02-06 12:08 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Al Viro, linux-arch, linux-alpha, linux-ia64, linux-hexagon,
	linux-m68k, Michal Simek, Dinh Nguyen, openrisc, linux-parisc,
	linux-riscv, sparclinux, Linus Torvalds

Hi Mark,

On Wed, Feb 1, 2023 at 11:52 AM Mark Rutland <mark.rutland@arm.com> wrote:
> On Tue, Jan 31, 2023 at 08:02:51PM +0000, Al Viro wrote:
> > On x86 it had been noticed and fixed back in 2014, in 26178ec11ef3 "x86:
> > mm: consolidate VM_FAULT_RETRY handling".  Some of the other architectures
> > had it dealt with later - e.g. arm in 2017, the fix is 746a272e44141
> > "ARM: 8692/1: mm: abort uaccess retries upon fatal signal"; xtensa -
> > in 2021, the fix is 7b9acbb6aad4f "xtensa: fix uaccess-related livelock
> > in do_page_fault", etc.
> >
> > However, it never had been done on a bunch of architectures - the
> > current mainline still has that bug on alpha, hexagon, itanic, m68k,
> > microblaze, nios2, openrisc, parisc, riscv and sparc (both sparc32 and
> > sparc64).  Fixes are trivial, but I've no way to test them for most
> > of those architectures.
>
> FWIW, when I fixed arm and arm64 back in 2017, I did report the issue here with
> a test case (and again in 2021, with maintainers all explciitly Cc'd):
>
>   https://lore.kernel.org/lkml/20170822102527.GA14671@leverpostej/
>   https://lore.kernel.org/linux-arch/20210121123140.GD48431@C02TD0UTHF1T.local/
>
> ... so if anyone has access to those architectures, that test might be useful
> for verifying the fix.

Thanks a lot! This showed the problem on m68k, and confirmed Al's fix.

I'll give it a try on a few more systems later...

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC][PATCHSET] VM_FAULT_RETRY fixes
@ 2023-02-06 12:08     ` Geert Uytterhoeven
  0 siblings, 0 replies; 120+ messages in thread
From: Geert Uytterhoeven @ 2023-02-06 12:08 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Al Viro, linux-arch, linux-alpha, linux-ia64, linux-hexagon,
	linux-m68k, Michal Simek, Dinh Nguyen, openrisc, linux-parisc,
	linux-riscv, sparclinux, Linus Torvalds

Hi Mark,

On Wed, Feb 1, 2023 at 11:52 AM Mark Rutland <mark.rutland@arm.com> wrote:
> On Tue, Jan 31, 2023 at 08:02:51PM +0000, Al Viro wrote:
> > On x86 it had been noticed and fixed back in 2014, in 26178ec11ef3 "x86:
> > mm: consolidate VM_FAULT_RETRY handling".  Some of the other architectures
> > had it dealt with later - e.g. arm in 2017, the fix is 746a272e44141
> > "ARM: 8692/1: mm: abort uaccess retries upon fatal signal"; xtensa -
> > in 2021, the fix is 7b9acbb6aad4f "xtensa: fix uaccess-related livelock
> > in do_page_fault", etc.
> >
> > However, it never had been done on a bunch of architectures - the
> > current mainline still has that bug on alpha, hexagon, itanic, m68k,
> > microblaze, nios2, openrisc, parisc, riscv and sparc (both sparc32 and
> > sparc64).  Fixes are trivial, but I've no way to test them for most
> > of those architectures.
>
> FWIW, when I fixed arm and arm64 back in 2017, I did report the issue here with
> a test case (and again in 2021, with maintainers all explciitly Cc'd):
>
>   https://lore.kernel.org/lkml/20170822102527.GA14671@leverpostej/
>   https://lore.kernel.org/linux-arch/20210121123140.GD48431@C02TD0UTHF1T.local/
>
> ... so if anyone has access to those architectures, that test might be useful
> for verifying the fix.

Thanks a lot! This showed the problem on m68k, and confirmed Al's fix.

I'll give it a try on a few more systems later...

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 04/10] m68k: fix livelock in uaccess
  2023-01-31 20:04   ` Al Viro
  (?)
@ 2023-02-06 12:08     ` Geert Uytterhoeven
  -1 siblings, 0 replies; 120+ messages in thread
From: Geert Uytterhoeven @ 2023-02-06 12:08 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-arch, linux-alpha, linux-ia64, linux-hexagon, linux-m68k,
	Michal Simek, Dinh Nguyen, openrisc, linux-parisc, linux-riscv,
	sparclinux, Linus Torvalds

On Tue, Jan 31, 2023 at 9:06 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> m68k equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
> If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
> end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
> to page tables.  In such case we must *not* return to the faulting insn -
> that would repeat the entire thing without making any progress; what we need
> instead is to treat that as failed (user) memory access.
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 04/10] m68k: fix livelock in uaccess
@ 2023-02-06 12:08     ` Geert Uytterhoeven
  0 siblings, 0 replies; 120+ messages in thread
From: Geert Uytterhoeven @ 2023-02-06 12:08 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-arch, linux-alpha, linux-ia64, linux-hexagon, linux-m68k,
	Michal Simek, Dinh Nguyen, openrisc, linux-parisc, linux-riscv,
	sparclinux, Linus Torvalds

On Tue, Jan 31, 2023 at 9:06 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> m68k equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
> If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
> end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
> to page tables.  In such case we must *not* return to the faulting insn -
> that would repeat the entire thing without making any progress; what we need
> instead is to treat that as failed (user) memory access.
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 04/10] m68k: fix livelock in uaccess
@ 2023-02-06 12:08     ` Geert Uytterhoeven
  0 siblings, 0 replies; 120+ messages in thread
From: Geert Uytterhoeven @ 2023-02-06 12:08 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-arch, linux-alpha, linux-ia64, linux-hexagon, linux-m68k,
	Michal Simek, Dinh Nguyen, openrisc, linux-parisc, linux-riscv,
	sparclinux, Linus Torvalds

On Tue, Jan 31, 2023 at 9:06 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> m68k equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
> If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
> end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
> to page tables.  In such case we must *not* return to the faulting insn -
> that would repeat the entire thing without making any progress; what we need
> instead is to treat that as failed (user) memory access.
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 08/10] parisc: fix livelock in uaccess
  2023-01-31 20:06   ` Al Viro
  (?)
@ 2023-02-06 16:58     ` Helge Deller
  -1 siblings, 0 replies; 120+ messages in thread
From: Helge Deller @ 2023-02-06 16:58 UTC (permalink / raw)
  To: Al Viro, linux-arch
  Cc: linux-alpha, linux-ia64, linux-hexagon, linux-m68k, Michal Simek,
	Dinh Nguyen, openrisc, linux-parisc, linux-riscv, sparclinux,
	Linus Torvalds

Hi Al,

On 1/31/23 21:06, Al Viro wrote:
> parisc equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
> If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
> end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
> to page tables.  In such case we must *not* return to the faulting insn -
> that would repeat the entire thing without making any progress; what we need
> instead is to treat that as failed (user) memory access.
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
>   arch/parisc/mm/fault.c | 5 ++++-
>   1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/arch/parisc/mm/fault.c b/arch/parisc/mm/fault.c
> index 869204e97ec9..bb30ff6a3e19 100644
> --- a/arch/parisc/mm/fault.c
> +++ b/arch/parisc/mm/fault.c
> @@ -308,8 +308,11 @@ void do_page_fault(struct pt_regs *regs, unsigned long code,
>
>   	fault = handle_mm_fault(vma, address, flags, regs);
>
> -	if (fault_signal_pending(fault, regs))
> +	if (fault_signal_pending(fault, regs)) {
> +		if (!user_mode(regs))
> +			goto no_context;
>   		return;
> +	}

The testcase in
   https://lore.kernel.org/lkml/20170822102527.GA14671@leverpostej/
   https://lore.kernel.org/linux-arch/20210121123140.GD48431@C02TD0UTHF1T.local/
does hang with and without above patch on parisc.
It does not consume CPU in that state and can be killed with ^C.

Any idea?

Helge

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 08/10] parisc: fix livelock in uaccess
@ 2023-02-06 16:58     ` Helge Deller
  0 siblings, 0 replies; 120+ messages in thread
From: Helge Deller @ 2023-02-06 16:58 UTC (permalink / raw)
  To: Al Viro, linux-arch
  Cc: linux-alpha, linux-ia64, linux-hexagon, linux-m68k, Michal Simek,
	Dinh Nguyen, openrisc, linux-parisc, linux-riscv, sparclinux,
	Linus Torvalds

Hi Al,

On 1/31/23 21:06, Al Viro wrote:
> parisc equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
> If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
> end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
> to page tables.  In such case we must *not* return to the faulting insn -
> that would repeat the entire thing without making any progress; what we need
> instead is to treat that as failed (user) memory access.
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
>   arch/parisc/mm/fault.c | 5 ++++-
>   1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/arch/parisc/mm/fault.c b/arch/parisc/mm/fault.c
> index 869204e97ec9..bb30ff6a3e19 100644
> --- a/arch/parisc/mm/fault.c
> +++ b/arch/parisc/mm/fault.c
> @@ -308,8 +308,11 @@ void do_page_fault(struct pt_regs *regs, unsigned long code,
>
>   	fault = handle_mm_fault(vma, address, flags, regs);
>
> -	if (fault_signal_pending(fault, regs))
> +	if (fault_signal_pending(fault, regs)) {
> +		if (!user_mode(regs))
> +			goto no_context;
>   		return;
> +	}

The testcase in
   https://lore.kernel.org/lkml/20170822102527.GA14671@leverpostej/
   https://lore.kernel.org/linux-arch/20210121123140.GD48431@C02TD0UTHF1T.local/
does hang with and without above patch on parisc.
It does not consume CPU in that state and can be killed with ^C.

Any idea?

Helge

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 08/10] parisc: fix livelock in uaccess
@ 2023-02-06 16:58     ` Helge Deller
  0 siblings, 0 replies; 120+ messages in thread
From: Helge Deller @ 2023-02-06 16:58 UTC (permalink / raw)
  To: Al Viro, linux-arch
  Cc: linux-alpha, linux-ia64, linux-hexagon, linux-m68k, Michal Simek,
	Dinh Nguyen, openrisc, linux-parisc, linux-riscv, sparclinux,
	Linus Torvalds

Hi Al,

On 1/31/23 21:06, Al Viro wrote:
> parisc equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
> If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
> end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
> to page tables.  In such case we must *not* return to the faulting insn -
> that would repeat the entire thing without making any progress; what we need
> instead is to treat that as failed (user) memory access.
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
>   arch/parisc/mm/fault.c | 5 ++++-
>   1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/arch/parisc/mm/fault.c b/arch/parisc/mm/fault.c
> index 869204e97ec9..bb30ff6a3e19 100644
> --- a/arch/parisc/mm/fault.c
> +++ b/arch/parisc/mm/fault.c
> @@ -308,8 +308,11 @@ void do_page_fault(struct pt_regs *regs, unsigned long code,
>
>   	fault = handle_mm_fault(vma, address, flags, regs);
>
> -	if (fault_signal_pending(fault, regs))
> +	if (fault_signal_pending(fault, regs)) {
> +		if (!user_mode(regs))
> +			goto no_context;
>   		return;
> +	}

The testcase in
   https://lore.kernel.org/lkml/20170822102527.GA14671@leverpostej/
   https://lore.kernel.org/linux-arch/20210121123140.GD48431@C02TD0UTHF1T.local/
does hang with and without above patch on parisc.
It does not consume CPU in that state and can be killed with ^C.

Any idea?

Helge

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 09/10] riscv: fix livelock in uaccess
  2023-01-31 20:06   ` Al Viro
  (?)
@ 2023-02-06 20:06     ` Björn Töpel
  -1 siblings, 0 replies; 120+ messages in thread
From: Björn Töpel @ 2023-02-06 20:06 UTC (permalink / raw)
  To: Al Viro, linux-arch
  Cc: linux-alpha, linux-ia64, linux-hexagon, linux-m68k, Michal Simek,
	Dinh Nguyen, openrisc, linux-parisc, linux-riscv, sparclinux,
	Linus Torvalds, Mark Rutland

Al Viro <viro@zeniv.linux.org.uk> writes:

> riscv equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
> If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
> end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
> to page tables.  In such case we must *not* return to the faulting insn -
> that would repeat the entire thing without making any progress; what we need
> instead is to treat that as failed (user) memory access.
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Reproduced with Mark's userland program -- thanks!

Tested-by: Björn Töpel <bjorn@kernel.org>

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 09/10] riscv: fix livelock in uaccess
@ 2023-02-06 20:06     ` Björn Töpel
  0 siblings, 0 replies; 120+ messages in thread
From: Björn Töpel @ 2023-02-06 20:06 UTC (permalink / raw)
  To: Al Viro, linux-arch
  Cc: linux-alpha, linux-ia64, linux-hexagon, linux-m68k, Michal Simek,
	Dinh Nguyen, openrisc, linux-parisc, linux-riscv, sparclinux,
	Linus Torvalds, Mark Rutland

Al Viro <viro@zeniv.linux.org.uk> writes:

> riscv equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
> If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
> end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
> to page tables.  In such case we must *not* return to the faulting insn -
> that would repeat the entire thing without making any progress; what we need
> instead is to treat that as failed (user) memory access.
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Reproduced with Mark's userland program -- thanks!

Tested-by: Björn Töpel <bjorn@kernel.org>

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 09/10] riscv: fix livelock in uaccess
@ 2023-02-06 20:06     ` Björn Töpel
  0 siblings, 0 replies; 120+ messages in thread
From: Björn Töpel @ 2023-02-06 20:06 UTC (permalink / raw)
  To: Al Viro, linux-arch
  Cc: linux-alpha, linux-ia64, linux-hexagon, linux-m68k, Michal Simek,
	Dinh Nguyen, openrisc, linux-parisc, linux-riscv, sparclinux,
	Linus Torvalds, Mark Rutland

Al Viro <viro@zeniv.linux.org.uk> writes:

> riscv equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
> If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
> end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
> to page tables.  In such case we must *not* return to the faulting insn -
> that would repeat the entire thing without making any progress; what we need
> instead is to treat that as failed (user) memory access.
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Reproduced with Mark's userland program -- thanks!

Tested-by: Björn Töpel <bjorn@kernel.org>

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 04/10] m68k: fix livelock in uaccess
  2023-02-05 18:51       ` Linus Torvalds
  (?)
@ 2023-02-07  3:07         ` Finn Thain
  -1 siblings, 0 replies; 120+ messages in thread
From: Finn Thain @ 2023-02-07  3:07 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Al Viro, linux-arch, linux-alpha, linux-ia64, linux-hexagon,
	linux-m68k, Michal Simek, Dinh Nguyen, openrisc, linux-parisc,
	linux-riscv, sparclinux

On Sun, 5 Feb 2023, Linus Torvalds wrote:

> On Sat, Feb 4, 2023 at 10:16 PM Finn Thain <fthain@linux-m68k.org> wrote:
> >
> > That could be a bug I was chasing back in 2021 but never found. The mmap
> > stressors in stress-ng were triggering a crash on a Mac Quadras, though
> > only rarely. Sometimes it would run all day without a failure.
> >
> > Last year when I started using GCC 12 to build the kernel, I saw the same
> > workload fail again but the failure mode had become a silent hang/livelock
> > instead of the oopses I got with GCC 6.
> >
> > When I press the NMI button after the livelock I always see
> > do_page_fault() in the backtrace. So I've been testing your patch. I've
> > been running the same stress-ng reproducer for about 12 hours now with no
> > failures which looks promising.
> >
> > In case that stress-ng testing is of use:
> > Tested-by: Finn Thain <fthain@linux-m68k.org>
> 
> Could you test the thing that Mark Rutland pointed to? He had an
> actual test-case for this for the arm64 fixes some years ago.
> 
> See
> 
>    https://lore.kernel.org/all/Y9pD+TMP+%2FSyfeJm@FVFF77S0Q05N/
> 
> for his email with links to his old test-case?
> 

With qemu-system-m68k v7.2 runing Linux v6.1, killing Mark's program did 
not trigger any failure.

With my Quadra 650 running the same binaries, killing Mark's program 
produces the BUG splat below. I can confirm that Al's patch fixes it.

[  199.060000] BUG: scheduling while atomic: test/48/0x0003e2e6
[  199.060000] Modules linked in:
[  199.060000] CPU: 0 PID: 48 Comm: test Tainted: G        W          6.1.0-mac #3
[  199.060000] Stack from 00a6204c:
[  199.060000]         00a6204c 004c1979 004c1979 00000000 00a6215e 00a6206c 00408d62 004c1979
[  199.060000]         00a62074 0003ad4a 00a620a8 004099e2 00983480 00000000 00a6215e 00000000
[  199.060000]         00000002 00983480 00a62000 0040970a 00a6209c 00a6209c 00a620c0 00a620c0
[  199.060000]         00409b94 00000000 00bcf0c0 00a62198 00a357fc 00a6216c 00120e5c 00bcf0d0
[  199.060000]         00000003 00000001 00000001 0123f039 00000000 00a357fc 00aead38 005870dc
[  199.060000]         005870dc 00a31200 00000000 00000000 00000000 00010000 0000c000 00000000
[  199.060000] Call Trace: [<00408d62>] dump_stack+0x10/0x16
[  199.060000]  [<0003ad4a>] __schedule_bug+0x5e/0x70
[  199.060000]  [<004099e2>] __schedule+0x2d8/0x446
[  199.060000]  [<0040970a>] __schedule+0x0/0x446
[  199.060000]  [<00409b94>] schedule+0x44/0x8e
[  199.060000]  [<00120e5c>] handle_userfault+0x298/0x3de
[  199.060000]  [<00010000>] zer_rp2+0x14/0x18
[  199.060000]  [<0000c000>] cu_dmrs+0x0/0x16
[  199.060000]  [<00001200>] kernel_pg_dir+0x200/0x1000
[  199.060000]  [<00010000>] zer_rp2+0x14/0x18
[  199.060000]  [<0000c000>] cu_dmrs+0x0/0x16
[  199.060000]  [<00001200>] kernel_pg_dir+0x200/0x1000
[  199.060000]  [<00010000>] zer_rp2+0x14/0x18
[  199.060000]  [<0000c000>] cu_dmrs+0x0/0x16
[  199.060000]  [<000ab0a0>] handle_mm_fault+0xa34/0xa56
[  199.060000]  [<000c0000>] __alloc_pages_bulk+0x26/0x3f8
[  199.060000]  [<00007056>] do_page_fault+0xd8/0x28a
[  199.060000]  [<00006098>] buserr_c+0x1a6/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<00001000>] kernel_pg_dir+0x0/0x1000
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<00001000>] kernel_pg_dir+0x0/0x1000
[  199.060000]  [<00008001>] mac_irq_disable+0x3b/0x98
[  199.060000]  [<00001000>] kernel_pg_dir+0x0/0x1000
[  199.060000]  [<00008001>] mac_irq_disable+0x3b/0x98
[  199.060000] 

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 04/10] m68k: fix livelock in uaccess
@ 2023-02-07  3:07         ` Finn Thain
  0 siblings, 0 replies; 120+ messages in thread
From: Finn Thain @ 2023-02-07  3:07 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Al Viro, linux-arch, linux-alpha, linux-ia64, linux-hexagon,
	linux-m68k, Michal Simek, Dinh Nguyen, openrisc, linux-parisc,
	linux-riscv, sparclinux

On Sun, 5 Feb 2023, Linus Torvalds wrote:

> On Sat, Feb 4, 2023 at 10:16 PM Finn Thain <fthain@linux-m68k.org> wrote:
> >
> > That could be a bug I was chasing back in 2021 but never found. The mmap
> > stressors in stress-ng were triggering a crash on a Mac Quadras, though
> > only rarely. Sometimes it would run all day without a failure.
> >
> > Last year when I started using GCC 12 to build the kernel, I saw the same
> > workload fail again but the failure mode had become a silent hang/livelock
> > instead of the oopses I got with GCC 6.
> >
> > When I press the NMI button after the livelock I always see
> > do_page_fault() in the backtrace. So I've been testing your patch. I've
> > been running the same stress-ng reproducer for about 12 hours now with no
> > failures which looks promising.
> >
> > In case that stress-ng testing is of use:
> > Tested-by: Finn Thain <fthain@linux-m68k.org>
> 
> Could you test the thing that Mark Rutland pointed to? He had an
> actual test-case for this for the arm64 fixes some years ago.
> 
> See
> 
>    https://lore.kernel.org/all/Y9pD+TMP+%2FSyfeJm@FVFF77S0Q05N/
> 
> for his email with links to his old test-case?
> 

With qemu-system-m68k v7.2 runing Linux v6.1, killing Mark's program did 
not trigger any failure.

With my Quadra 650 running the same binaries, killing Mark's program 
produces the BUG splat below. I can confirm that Al's patch fixes it.

[  199.060000] BUG: scheduling while atomic: test/48/0x0003e2e6
[  199.060000] Modules linked in:
[  199.060000] CPU: 0 PID: 48 Comm: test Tainted: G        W          6.1.0-mac #3
[  199.060000] Stack from 00a6204c:
[  199.060000]         00a6204c 004c1979 004c1979 00000000 00a6215e 00a6206c 00408d62 004c1979
[  199.060000]         00a62074 0003ad4a 00a620a8 004099e2 00983480 00000000 00a6215e 00000000
[  199.060000]         00000002 00983480 00a62000 0040970a 00a6209c 00a6209c 00a620c0 00a620c0
[  199.060000]         00409b94 00000000 00bcf0c0 00a62198 00a357fc 00a6216c 00120e5c 00bcf0d0
[  199.060000]         00000003 00000001 00000001 0123f039 00000000 00a357fc 00aead38 005870dc
[  199.060000]         005870dc 00a31200 00000000 00000000 00000000 00010000 0000c000 00000000
[  199.060000] Call Trace: [<00408d62>] dump_stack+0x10/0x16
[  199.060000]  [<0003ad4a>] __schedule_bug+0x5e/0x70
[  199.060000]  [<004099e2>] __schedule+0x2d8/0x446
[  199.060000]  [<0040970a>] __schedule+0x0/0x446
[  199.060000]  [<00409b94>] schedule+0x44/0x8e
[  199.060000]  [<00120e5c>] handle_userfault+0x298/0x3de
[  199.060000]  [<00010000>] zer_rp2+0x14/0x18
[  199.060000]  [<0000c000>] cu_dmrs+0x0/0x16
[  199.060000]  [<00001200>] kernel_pg_dir+0x200/0x1000
[  199.060000]  [<00010000>] zer_rp2+0x14/0x18
[  199.060000]  [<0000c000>] cu_dmrs+0x0/0x16
[  199.060000]  [<00001200>] kernel_pg_dir+0x200/0x1000
[  199.060000]  [<00010000>] zer_rp2+0x14/0x18
[  199.060000]  [<0000c000>] cu_dmrs+0x0/0x16
[  199.060000]  [<000ab0a0>] handle_mm_fault+0xa34/0xa56
[  199.060000]  [<000c0000>] __alloc_pages_bulk+0x26/0x3f8
[  199.060000]  [<00007056>] do_page_fault+0xd8/0x28a
[  199.060000]  [<00006098>] buserr_c+0x1a6/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<00001000>] kernel_pg_dir+0x0/0x1000
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<00001000>] kernel_pg_dir+0x0/0x1000
[  199.060000]  [<00008001>] mac_irq_disable+0x3b/0x98
[  199.060000]  [<00001000>] kernel_pg_dir+0x0/0x1000
[  199.060000]  [<00008001>] mac_irq_disable+0x3b/0x98
[  199.060000] 

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 04/10] m68k: fix livelock in uaccess
@ 2023-02-07  3:07         ` Finn Thain
  0 siblings, 0 replies; 120+ messages in thread
From: Finn Thain @ 2023-02-07  3:07 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Al Viro, linux-arch, linux-alpha, linux-ia64, linux-hexagon,
	linux-m68k, Michal Simek, Dinh Nguyen, openrisc, linux-parisc,
	linux-riscv, sparclinux

On Sun, 5 Feb 2023, Linus Torvalds wrote:

> On Sat, Feb 4, 2023 at 10:16 PM Finn Thain <fthain@linux-m68k.org> wrote:
> >
> > That could be a bug I was chasing back in 2021 but never found. The mmap
> > stressors in stress-ng were triggering a crash on a Mac Quadras, though
> > only rarely. Sometimes it would run all day without a failure.
> >
> > Last year when I started using GCC 12 to build the kernel, I saw the same
> > workload fail again but the failure mode had become a silent hang/livelock
> > instead of the oopses I got with GCC 6.
> >
> > When I press the NMI button after the livelock I always see
> > do_page_fault() in the backtrace. So I've been testing your patch. I've
> > been running the same stress-ng reproducer for about 12 hours now with no
> > failures which looks promising.
> >
> > In case that stress-ng testing is of use:
> > Tested-by: Finn Thain <fthain@linux-m68k.org>
> 
> Could you test the thing that Mark Rutland pointed to? He had an
> actual test-case for this for the arm64 fixes some years ago.
> 
> See
> 
>    https://lore.kernel.org/all/Y9pD+TMP+%2FSyfeJm@FVFF77S0Q05N/
> 
> for his email with links to his old test-case?
> 

With qemu-system-m68k v7.2 runing Linux v6.1, killing Mark's program did 
not trigger any failure.

With my Quadra 650 running the same binaries, killing Mark's program 
produces the BUG splat below. I can confirm that Al's patch fixes it.

[  199.060000] BUG: scheduling while atomic: test/48/0x0003e2e6
[  199.060000] Modules linked in:
[  199.060000] CPU: 0 PID: 48 Comm: test Tainted: G        W          6.1.0-mac #3
[  199.060000] Stack from 00a6204c:
[  199.060000]         00a6204c 004c1979 004c1979 00000000 00a6215e 00a6206c 00408d62 004c1979
[  199.060000]         00a62074 0003ad4a 00a620a8 004099e2 00983480 00000000 00a6215e 00000000
[  199.060000]         00000002 00983480 00a62000 0040970a 00a6209c 00a6209c 00a620c0 00a620c0
[  199.060000]         00409b94 00000000 00bcf0c0 00a62198 00a357fc 00a6216c 00120e5c 00bcf0d0
[  199.060000]         00000003 00000001 00000001 0123f039 00000000 00a357fc 00aead38 005870dc
[  199.060000]         005870dc 00a31200 00000000 00000000 00000000 00010000 0000c000 00000000
[  199.060000] Call Trace: [<00408d62>] dump_stack+0x10/0x16
[  199.060000]  [<0003ad4a>] __schedule_bug+0x5e/0x70
[  199.060000]  [<004099e2>] __schedule+0x2d8/0x446
[  199.060000]  [<0040970a>] __schedule+0x0/0x446
[  199.060000]  [<00409b94>] schedule+0x44/0x8e
[  199.060000]  [<00120e5c>] handle_userfault+0x298/0x3de
[  199.060000]  [<00010000>] zer_rp2+0x14/0x18
[  199.060000]  [<0000c000>] cu_dmrs+0x0/0x16
[  199.060000]  [<00001200>] kernel_pg_dir+0x200/0x1000
[  199.060000]  [<00010000>] zer_rp2+0x14/0x18
[  199.060000]  [<0000c000>] cu_dmrs+0x0/0x16
[  199.060000]  [<00001200>] kernel_pg_dir+0x200/0x1000
[  199.060000]  [<00010000>] zer_rp2+0x14/0x18
[  199.060000]  [<0000c000>] cu_dmrs+0x0/0x16
[  199.060000]  [<000ab0a0>] handle_mm_fault+0xa34/0xa56
[  199.060000]  [<000c0000>] __alloc_pages_bulk+0x26/0x3f8
[  199.060000]  [<00007056>] do_page_fault+0xd8/0x28a
[  199.060000]  [<00006098>] buserr_c+0x1a6/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<000056fc>] do_040writeback1+0x0/0x1d8
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<000062cc>] buserr_c+0x3da/0x6b0
[  199.060000]  [<00001000>] kernel_pg_dir+0x0/0x1000
[  199.060000]  [<00002ac0>] buserr+0x20/0x28
[  199.060000]  [<00001000>] kernel_pg_dir+0x0/0x1000
[  199.060000]  [<00008001>] mac_irq_disable+0x3b/0x98
[  199.060000]  [<00001000>] kernel_pg_dir+0x0/0x1000
[  199.060000]  [<00008001>] mac_irq_disable+0x3b/0x98
[  199.060000] 

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 09/10] riscv: fix livelock in uaccess
  2023-01-31 20:06   ` Al Viro
  (?)
@ 2023-02-07 16:11     ` Geert Uytterhoeven
  -1 siblings, 0 replies; 120+ messages in thread
From: Geert Uytterhoeven @ 2023-02-07 16:11 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-arch, linux-alpha, linux-ia64, linux-hexagon, linux-m68k,
	Michal Simek, Dinh Nguyen, openrisc, linux-parisc, linux-riscv,
	sparclinux, Linus Torvalds

On Tue, Jan 31, 2023 at 9:07 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> riscv equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
> If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
> end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
> to page tables.  In such case we must *not* return to the faulting insn -
> that would repeat the entire thing without making any progress; what we need
> instead is to treat that as failed (user) memory access.
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 09/10] riscv: fix livelock in uaccess
@ 2023-02-07 16:11     ` Geert Uytterhoeven
  0 siblings, 0 replies; 120+ messages in thread
From: Geert Uytterhoeven @ 2023-02-07 16:11 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-arch, linux-alpha, linux-ia64, linux-hexagon, linux-m68k,
	Michal Simek, Dinh Nguyen, openrisc, linux-parisc, linux-riscv,
	sparclinux, Linus Torvalds

On Tue, Jan 31, 2023 at 9:07 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> riscv equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
> If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
> end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
> to page tables.  In such case we must *not* return to the faulting insn -
> that would repeat the entire thing without making any progress; what we need
> instead is to treat that as failed (user) memory access.
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 09/10] riscv: fix livelock in uaccess
@ 2023-02-07 16:11     ` Geert Uytterhoeven
  0 siblings, 0 replies; 120+ messages in thread
From: Geert Uytterhoeven @ 2023-02-07 16:11 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-arch, linux-alpha, linux-ia64, linux-hexagon, linux-m68k,
	Michal Simek, Dinh Nguyen, openrisc, linux-parisc, linux-riscv,
	sparclinux, Linus Torvalds

On Tue, Jan 31, 2023 at 9:07 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> riscv equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
> If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
> end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
> to page tables.  In such case we must *not* return to the faulting insn -
> that would repeat the entire thing without making any progress; what we need
> instead is to treat that as failed (user) memory access.
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 120+ messages in thread

* RE: [PATCH 02/10] hexagon: fix livelock in uaccess
  2023-01-31 20:03   ` Al Viro
@ 2023-02-10  2:59     ` Brian Cain
  -1 siblings, 0 replies; 120+ messages in thread
From: Brian Cain @ 2023-02-10  2:59 UTC (permalink / raw)
  To: Al Viro, linux-arch
  Cc: linux-alpha, linux-ia64, linux-hexagon, linux-m68k, Michal Simek,
	Dinh Nguyen, openrisc, linux-parisc, linux-riscv, sparclinux,
	Linus Torvalds

> -----Original Message-----
> From: Al Viro <viro@ftp.linux.org.uk> On Behalf Of Al Viro
> Sent: Tuesday, January 31, 2023 2:04 PM
> To: linux-arch@vger.kernel.org
> Cc: linux-alpha@vger.kernel.org; linux-ia64@vger.kernel.org; linux-
> hexagon@vger.kernel.org; linux-m68k@lists.linux-m68k.org; Michal Simek
> <monstr@monstr.eu>; Dinh Nguyen <dinguyen@kernel.org>;
> openrisc@lists.librecores.org; linux-parisc@vger.kernel.org; linux-
> riscv@lists.infradead.org; sparclinux@vger.kernel.org; Linus Torvalds
> <torvalds@linux-foundation.org>
> Subject: [PATCH 02/10] hexagon: fix livelock in uaccess
> 
> WARNING: This email originated from outside of Qualcomm. Please be wary of
> any links or attachments, and do not enable macros.
> 
> hexagon equivalent of 26178ec11ef3 "x86: mm: consolidate
> VM_FAULT_RETRY handling"
> If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
> end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing
> anything
> to page tables.  In such case we must *not* return to the faulting insn -
> that would repeat the entire thing without making any progress; what we
> need
> instead is to treat that as failed (user) memory access.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
>  arch/hexagon/mm/vm_fault.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/hexagon/mm/vm_fault.c b/arch/hexagon/mm/vm_fault.c
> index f73c7cbfe326..4b578d02fd01 100644
> --- a/arch/hexagon/mm/vm_fault.c
> +++ b/arch/hexagon/mm/vm_fault.c
> @@ -93,8 +93,11 @@ void do_page_fault(unsigned long address, long cause,
> struct pt_regs *regs)
> 
>         fault = handle_mm_fault(vma, address, flags, regs);
> 
> -       if (fault_signal_pending(fault, regs))
> +       if (fault_signal_pending(fault, regs)) {
> +               if (!user_mode(regs))
> +                       goto no_context;
>                 return;
> +       }
> 
>         /* The fault is fully completed (including releasing mmap lock) */
>         if (fault & VM_FAULT_COMPLETED)
> --
> 2.30.2

Acked-by: Brian Cain <bcain@quicinc.com>

^ permalink raw reply	[flat|nested] 120+ messages in thread

* RE: [PATCH 02/10] hexagon: fix livelock in uaccess
@ 2023-02-10  2:59     ` Brian Cain
  0 siblings, 0 replies; 120+ messages in thread
From: Brian Cain @ 2023-02-10  2:59 UTC (permalink / raw)
  To: Al Viro, linux-arch
  Cc: linux-alpha, linux-ia64, linux-hexagon, linux-m68k, Michal Simek,
	Dinh Nguyen, openrisc, linux-parisc, linux-riscv, sparclinux,
	Linus Torvalds

> -----Original Message-----
> From: Al Viro <viro@ftp.linux.org.uk> On Behalf Of Al Viro
> Sent: Tuesday, January 31, 2023 2:04 PM
> To: linux-arch@vger.kernel.org
> Cc: linux-alpha@vger.kernel.org; linux-ia64@vger.kernel.org; linux-
> hexagon@vger.kernel.org; linux-m68k@lists.linux-m68k.org; Michal Simek
> <monstr@monstr.eu>; Dinh Nguyen <dinguyen@kernel.org>;
> openrisc@lists.librecores.org; linux-parisc@vger.kernel.org; linux-
> riscv@lists.infradead.org; sparclinux@vger.kernel.org; Linus Torvalds
> <torvalds@linux-foundation.org>
> Subject: [PATCH 02/10] hexagon: fix livelock in uaccess
> 
> WARNING: This email originated from outside of Qualcomm. Please be wary of
> any links or attachments, and do not enable macros.
> 
> hexagon equivalent of 26178ec11ef3 "x86: mm: consolidate
> VM_FAULT_RETRY handling"
> If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
> end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing
> anything
> to page tables.  In such case we must *not* return to the faulting insn -
> that would repeat the entire thing without making any progress; what we
> need
> instead is to treat that as failed (user) memory access.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
>  arch/hexagon/mm/vm_fault.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/hexagon/mm/vm_fault.c b/arch/hexagon/mm/vm_fault.c
> index f73c7cbfe326..4b578d02fd01 100644
> --- a/arch/hexagon/mm/vm_fault.c
> +++ b/arch/hexagon/mm/vm_fault.c
> @@ -93,8 +93,11 @@ void do_page_fault(unsigned long address, long cause,
> struct pt_regs *regs)
> 
>         fault = handle_mm_fault(vma, address, flags, regs);
> 
> -       if (fault_signal_pending(fault, regs))
> +       if (fault_signal_pending(fault, regs)) {
> +               if (!user_mode(regs))
> +                       goto no_context;
>                 return;
> +       }
> 
>         /* The fault is fully completed (including releasing mmap lock) */
>         if (fault & VM_FAULT_COMPLETED)
> --
> 2.30.2

Acked-by: Brian Cain <bcain@quicinc.com>

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 08/10] parisc: fix livelock in uaccess
  2023-01-31 20:06   ` Al Viro
  (?)
@ 2023-02-28 15:22     ` Guenter Roeck
  -1 siblings, 0 replies; 120+ messages in thread
From: Guenter Roeck @ 2023-02-28 15:22 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-arch, linux-alpha, linux-ia64, linux-hexagon, linux-m68k,
	Michal Simek, Dinh Nguyen, openrisc, linux-parisc, linux-riscv,
	sparclinux, Linus Torvalds

On Tue, Jan 31, 2023 at 08:06:27PM +0000, Al Viro wrote:
> parisc equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
> If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
> end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
> to page tables.  In such case we must *not* return to the faulting insn -
> that would repeat the entire thing without making any progress; what we need
> instead is to treat that as failed (user) memory access.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
>  arch/parisc/mm/fault.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/parisc/mm/fault.c b/arch/parisc/mm/fault.c
> index 869204e97ec9..bb30ff6a3e19 100644
> --- a/arch/parisc/mm/fault.c
> +++ b/arch/parisc/mm/fault.c
> @@ -308,8 +308,11 @@ void do_page_fault(struct pt_regs *regs, unsigned long code,
>  
>  	fault = handle_mm_fault(vma, address, flags, regs);
>  
> -	if (fault_signal_pending(fault, regs))
> +	if (fault_signal_pending(fault, regs)) {
> +		if (!user_mode(regs))
> +			goto no_context;

0-day rightfully complains that this leaves 'msg' uninitialized.

arch/parisc/mm/fault.c:427 do_page_fault() error: uninitialized symbol 'msg'

Guenter

>  		return;
> +	}
>  
>  	/* The fault is fully completed (including releasing mmap lock) */
>  	if (fault & VM_FAULT_COMPLETED)

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 08/10] parisc: fix livelock in uaccess
@ 2023-02-28 15:22     ` Guenter Roeck
  0 siblings, 0 replies; 120+ messages in thread
From: Guenter Roeck @ 2023-02-28 15:22 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-arch, linux-alpha, linux-ia64, linux-hexagon, linux-m68k,
	Michal Simek, Dinh Nguyen, openrisc, linux-parisc, linux-riscv,
	sparclinux, Linus Torvalds

On Tue, Jan 31, 2023 at 08:06:27PM +0000, Al Viro wrote:
> parisc equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
> If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
> end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
> to page tables.  In such case we must *not* return to the faulting insn -
> that would repeat the entire thing without making any progress; what we need
> instead is to treat that as failed (user) memory access.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
>  arch/parisc/mm/fault.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/parisc/mm/fault.c b/arch/parisc/mm/fault.c
> index 869204e97ec9..bb30ff6a3e19 100644
> --- a/arch/parisc/mm/fault.c
> +++ b/arch/parisc/mm/fault.c
> @@ -308,8 +308,11 @@ void do_page_fault(struct pt_regs *regs, unsigned long code,
>  
>  	fault = handle_mm_fault(vma, address, flags, regs);
>  
> -	if (fault_signal_pending(fault, regs))
> +	if (fault_signal_pending(fault, regs)) {
> +		if (!user_mode(regs))
> +			goto no_context;

0-day rightfully complains that this leaves 'msg' uninitialized.

arch/parisc/mm/fault.c:427 do_page_fault() error: uninitialized symbol 'msg'

Guenter

>  		return;
> +	}
>  
>  	/* The fault is fully completed (including releasing mmap lock) */
>  	if (fault & VM_FAULT_COMPLETED)

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 08/10] parisc: fix livelock in uaccess
@ 2023-02-28 15:22     ` Guenter Roeck
  0 siblings, 0 replies; 120+ messages in thread
From: Guenter Roeck @ 2023-02-28 15:22 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-arch, linux-alpha, linux-ia64, linux-hexagon, linux-m68k,
	Michal Simek, Dinh Nguyen, openrisc, linux-parisc, linux-riscv,
	sparclinux, Linus Torvalds

On Tue, Jan 31, 2023 at 08:06:27PM +0000, Al Viro wrote:
> parisc equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
> If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
> end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
> to page tables.  In such case we must *not* return to the faulting insn -
> that would repeat the entire thing without making any progress; what we need
> instead is to treat that as failed (user) memory access.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
>  arch/parisc/mm/fault.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/parisc/mm/fault.c b/arch/parisc/mm/fault.c
> index 869204e97ec9..bb30ff6a3e19 100644
> --- a/arch/parisc/mm/fault.c
> +++ b/arch/parisc/mm/fault.c
> @@ -308,8 +308,11 @@ void do_page_fault(struct pt_regs *regs, unsigned long code,
>  
>  	fault = handle_mm_fault(vma, address, flags, regs);
>  
> -	if (fault_signal_pending(fault, regs))
> +	if (fault_signal_pending(fault, regs)) {
> +		if (!user_mode(regs))
> +			goto no_context;

0-day rightfully complains that this leaves 'msg' uninitialized.

arch/parisc/mm/fault.c:427 do_page_fault() error: uninitialized symbol 'msg'

Guenter

>  		return;
> +	}
>  
>  	/* The fault is fully completed (including releasing mmap lock) */
>  	if (fault & VM_FAULT_COMPLETED)

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 08/10] parisc: fix livelock in uaccess
  2023-02-06 16:58     ` Helge Deller
@ 2023-02-28 17:34       ` Al Viro
  -1 siblings, 0 replies; 120+ messages in thread
From: Al Viro @ 2023-02-28 17:34 UTC (permalink / raw)
  To: Helge Deller
  Cc: linux-arch, linux-alpha, linux-ia64, linux-hexagon, linux-m68k,
	Michal Simek, Dinh Nguyen, openrisc, linux-parisc, linux-riscv,
	sparclinux, Linus Torvalds

On Mon, Feb 06, 2023 at 05:58:02PM +0100, Helge Deller wrote:
> Hi Al,
> 
> On 1/31/23 21:06, Al Viro wrote:
> > parisc equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
> > If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
> > end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
> > to page tables.  In such case we must *not* return to the faulting insn -
> > that would repeat the entire thing without making any progress; what we need
> > instead is to treat that as failed (user) memory access.
> > 
> > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> > ---
> >   arch/parisc/mm/fault.c | 5 ++++-
> >   1 file changed, 4 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/parisc/mm/fault.c b/arch/parisc/mm/fault.c
> > index 869204e97ec9..bb30ff6a3e19 100644
> > --- a/arch/parisc/mm/fault.c
> > +++ b/arch/parisc/mm/fault.c
> > @@ -308,8 +308,11 @@ void do_page_fault(struct pt_regs *regs, unsigned long code,
> > 
> >   	fault = handle_mm_fault(vma, address, flags, regs);
> > 
> > -	if (fault_signal_pending(fault, regs))
> > +	if (fault_signal_pending(fault, regs)) {
> > +		if (!user_mode(regs))
> > +			goto no_context;
> >   		return;
> > +	}
> 
> The testcase in
>   https://lore.kernel.org/lkml/20170822102527.GA14671@leverpostej/
>   https://lore.kernel.org/linux-arch/20210121123140.GD48431@C02TD0UTHF1T.local/
> does hang with and without above patch on parisc.
> It does not consume CPU in that state and can be killed with ^C.
> 
> Any idea?

	Still trying to resurrect the parisc box to test on it...
FWIW, right now I've locally confirmed that mainline has the bug
in question and that patch fixes it for alpha, sparc32 and sparc64;
hexagon, m68k and riscv got acks from other folks; microblaze,
nios2 and openrisc I can't test at all (no hardware, no qemu setup);
same for parisc64.  Itanic and parisc32 I might be able to test,
if I manage to resurrect the hardware.

	Just to confirm: your "can be killed with ^C" had been on the
mainline parisc kernel (with userfaultfd enable, of course, or it wouldn't
hang up at all), right?  Was it 32bit or 64bit kernel?

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 08/10] parisc: fix livelock in uaccess
@ 2023-02-28 17:34       ` Al Viro
  0 siblings, 0 replies; 120+ messages in thread
From: Al Viro @ 2023-02-28 17:34 UTC (permalink / raw)
  To: Helge Deller
  Cc: linux-arch, linux-alpha, linux-ia64, linux-hexagon, linux-m68k,
	Michal Simek, Dinh Nguyen, openrisc, linux-parisc, linux-riscv,
	sparclinux, Linus Torvalds

On Mon, Feb 06, 2023 at 05:58:02PM +0100, Helge Deller wrote:
> Hi Al,
> 
> On 1/31/23 21:06, Al Viro wrote:
> > parisc equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
> > If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
> > end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
> > to page tables.  In such case we must *not* return to the faulting insn -
> > that would repeat the entire thing without making any progress; what we need
> > instead is to treat that as failed (user) memory access.
> > 
> > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> > ---
> >   arch/parisc/mm/fault.c | 5 ++++-
> >   1 file changed, 4 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/parisc/mm/fault.c b/arch/parisc/mm/fault.c
> > index 869204e97ec9..bb30ff6a3e19 100644
> > --- a/arch/parisc/mm/fault.c
> > +++ b/arch/parisc/mm/fault.c
> > @@ -308,8 +308,11 @@ void do_page_fault(struct pt_regs *regs, unsigned long code,
> > 
> >   	fault = handle_mm_fault(vma, address, flags, regs);
> > 
> > -	if (fault_signal_pending(fault, regs))
> > +	if (fault_signal_pending(fault, regs)) {
> > +		if (!user_mode(regs))
> > +			goto no_context;
> >   		return;
> > +	}
> 
> The testcase in
>   https://lore.kernel.org/lkml/20170822102527.GA14671@leverpostej/
>   https://lore.kernel.org/linux-arch/20210121123140.GD48431@C02TD0UTHF1T.local/
> does hang with and without above patch on parisc.
> It does not consume CPU in that state and can be killed with ^C.
> 
> Any idea?

	Still trying to resurrect the parisc box to test on it...
FWIW, right now I've locally confirmed that mainline has the bug
in question and that patch fixes it for alpha, sparc32 and sparc64;
hexagon, m68k and riscv got acks from other folks; microblaze,
nios2 and openrisc I can't test at all (no hardware, no qemu setup);
same for parisc64.  Itanic and parisc32 I might be able to test,
if I manage to resurrect the hardware.

	Just to confirm: your "can be killed with ^C" had been on the
mainline parisc kernel (with userfaultfd enable, of course, or it wouldn't
hang up at all), right?  Was it 32bit or 64bit kernel?

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 08/10] parisc: fix livelock in uaccess
  2023-02-28 17:34       ` Al Viro
  (?)
@ 2023-02-28 18:26       ` Helge Deller
  2023-02-28 19:14         ` Al Viro
  -1 siblings, 1 reply; 120+ messages in thread
From: Helge Deller @ 2023-02-28 18:26 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-arch, linux-parisc

On 2/28/23 18:34, Al Viro wrote:
> On Mon, Feb 06, 2023 at 05:58:02PM +0100, Helge Deller wrote:
>> Hi Al,
>>
>> On 1/31/23 21:06, Al Viro wrote:
>>> parisc equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
>>> If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
>>> end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
>>> to page tables.  In such case we must *not* return to the faulting insn -
>>> that would repeat the entire thing without making any progress; what we need
>>> instead is to treat that as failed (user) memory access.
>>>
>>> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
>>> ---
>>>    arch/parisc/mm/fault.c | 5 ++++-
>>>    1 file changed, 4 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/parisc/mm/fault.c b/arch/parisc/mm/fault.c
>>> index 869204e97ec9..bb30ff6a3e19 100644
>>> --- a/arch/parisc/mm/fault.c
>>> +++ b/arch/parisc/mm/fault.c
>>> @@ -308,8 +308,11 @@ void do_page_fault(struct pt_regs *regs, unsigned long code,
>>>
>>>    	fault = handle_mm_fault(vma, address, flags, regs);
>>>
>>> -	if (fault_signal_pending(fault, regs))
>>> +	if (fault_signal_pending(fault, regs)) {
>>> +		if (!user_mode(regs))
>>> +			goto no_context;
>>>    		return;
>>> +	}
>>
>> The testcase in
>>    https://lore.kernel.org/lkml/20170822102527.GA14671@leverpostej/
>>    https://lore.kernel.org/linux-arch/20210121123140.GD48431@C02TD0UTHF1T.local/
>> does hang with and without above patch on parisc.
>> It does not consume CPU in that state and can be killed with ^C.
>>
>> Any idea?
>
> 	Still trying to resurrect the parisc box to test on it...
> FWIW, right now I've locally confirmed that mainline has the bug
> in question and that patch fixes it for alpha, sparc32 and sparc64;
> hexagon, m68k and riscv got acks from other folks; microblaze,
> nios2 and openrisc I can't test at all (no hardware, no qemu setup);
> same for parisc64.  Itanic and parisc32 I might be able to test,
> if I manage to resurrect the hardware.

I can test both parisc32 and parisc64.

> 	Just to confirm: your "can be killed with ^C" had been on the
> mainline parisc kernel (with userfaultfd enable, of course, or it wouldn't
> hang up at all), right?

It was a recent mainline kernel with your patch.

> Was it 32bit or 64bit kernel?

I don't remember. I think I tried both.

Helge

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 08/10] parisc: fix livelock in uaccess
  2023-02-28 18:26       ` Helge Deller
@ 2023-02-28 19:14         ` Al Viro
  2023-02-28 19:32           ` Helge Deller
  0 siblings, 1 reply; 120+ messages in thread
From: Al Viro @ 2023-02-28 19:14 UTC (permalink / raw)
  To: Helge Deller; +Cc: linux-arch, linux-parisc

On Tue, Feb 28, 2023 at 07:26:48PM +0100, Helge Deller wrote:

> I can test both parisc32 and parisc64.
> 
> > 	Just to confirm: your "can be killed with ^C" had been on the
> > mainline parisc kernel (with userfaultfd enable, of course, or it wouldn't
> > hang up at all), right?
> 
> It was a recent mainline kernel with your patch.

Er...  Reproducer *is* supposed to block; the bug is that on the kernel
without this patch it would (AFAICS) be impossible to kill - not even
with kill -9.  With bug fixed the behaviour should be "blocks and is
killable", i.e. what you've reported.

What does it do on unpatched kernel?  *IF* the big is there, it would
block and be unkillable by any means.  Could you verify that?

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 08/10] parisc: fix livelock in uaccess
  2023-02-28 15:22     ` Guenter Roeck
  (?)
@ 2023-02-28 19:18       ` Michael Schmitz
  -1 siblings, 0 replies; 120+ messages in thread
From: Michael Schmitz @ 2023-02-28 19:18 UTC (permalink / raw)
  To: Guenter Roeck, Al Viro
  Cc: linux-arch, linux-alpha, linux-ia64, linux-hexagon, linux-m68k,
	Michal Simek, Dinh Nguyen, openrisc, linux-parisc, linux-riscv,
	sparclinux, Linus Torvalds

Guenter,

On 1/03/23 04:22, Guenter Roeck wrote:
> On Tue, Jan 31, 2023 at 08:06:27PM +0000, Al Viro wrote:
>> parisc equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
>> If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
>> end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
>> to page tables.  In such case we must *not* return to the faulting insn -
>> that would repeat the entire thing without making any progress; what we need
>> instead is to treat that as failed (user) memory access.
>>
>> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
>> ---
>>   arch/parisc/mm/fault.c | 5 ++++-
>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/parisc/mm/fault.c b/arch/parisc/mm/fault.c
>> index 869204e97ec9..bb30ff6a3e19 100644
>> --- a/arch/parisc/mm/fault.c
>> +++ b/arch/parisc/mm/fault.c
>> @@ -308,8 +308,11 @@ void do_page_fault(struct pt_regs *regs, unsigned long code,
>>   
>>   	fault = handle_mm_fault(vma, address, flags, regs);
>>   
>> -	if (fault_signal_pending(fault, regs))
>> +	if (fault_signal_pending(fault, regs)) {
>> +		if (!user_mode(regs))
>> +			goto no_context;
> 0-day rightfully complains that this leaves 'msg' uninitialized.
>
> arch/parisc/mm/fault.c:427 do_page_fault() error: uninitialized symbol 'msg'
>
> Guenter

What happens if you initialize msg to "Page fault: no context" right at 
the start of do_page_fault (and drop the assignment a few lines down as 
that's now redundant)?

(Wondering if the zero page access on parisc could cause a trip right 
back into do_page_fault, ad infinitum...)

Cheers,

     Michael


>>   		return;
>> +	}
>>   
>>   	/* The fault is fully completed (including releasing mmap lock) */
>>   	if (fault & VM_FAULT_COMPLETED)

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 08/10] parisc: fix livelock in uaccess
@ 2023-02-28 19:18       ` Michael Schmitz
  0 siblings, 0 replies; 120+ messages in thread
From: Michael Schmitz @ 2023-02-28 19:18 UTC (permalink / raw)
  To: Guenter Roeck, Al Viro
  Cc: linux-arch, linux-alpha, linux-ia64, linux-hexagon, linux-m68k,
	Michal Simek, Dinh Nguyen, openrisc, linux-parisc, linux-riscv,
	sparclinux, Linus Torvalds

Guenter,

On 1/03/23 04:22, Guenter Roeck wrote:
> On Tue, Jan 31, 2023 at 08:06:27PM +0000, Al Viro wrote:
>> parisc equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
>> If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
>> end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
>> to page tables.  In such case we must *not* return to the faulting insn -
>> that would repeat the entire thing without making any progress; what we need
>> instead is to treat that as failed (user) memory access.
>>
>> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
>> ---
>>   arch/parisc/mm/fault.c | 5 ++++-
>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/parisc/mm/fault.c b/arch/parisc/mm/fault.c
>> index 869204e97ec9..bb30ff6a3e19 100644
>> --- a/arch/parisc/mm/fault.c
>> +++ b/arch/parisc/mm/fault.c
>> @@ -308,8 +308,11 @@ void do_page_fault(struct pt_regs *regs, unsigned long code,
>>   
>>   	fault = handle_mm_fault(vma, address, flags, regs);
>>   
>> -	if (fault_signal_pending(fault, regs))
>> +	if (fault_signal_pending(fault, regs)) {
>> +		if (!user_mode(regs))
>> +			goto no_context;
> 0-day rightfully complains that this leaves 'msg' uninitialized.
>
> arch/parisc/mm/fault.c:427 do_page_fault() error: uninitialized symbol 'msg'
>
> Guenter

What happens if you initialize msg to "Page fault: no context" right at 
the start of do_page_fault (and drop the assignment a few lines down as 
that's now redundant)?

(Wondering if the zero page access on parisc could cause a trip right 
back into do_page_fault, ad infinitum...)

Cheers,

     Michael


>>   		return;
>> +	}
>>   
>>   	/* The fault is fully completed (including releasing mmap lock) */
>>   	if (fault & VM_FAULT_COMPLETED)

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 08/10] parisc: fix livelock in uaccess
@ 2023-02-28 19:18       ` Michael Schmitz
  0 siblings, 0 replies; 120+ messages in thread
From: Michael Schmitz @ 2023-02-28 19:18 UTC (permalink / raw)
  To: Guenter Roeck, Al Viro
  Cc: linux-arch, linux-alpha, linux-ia64, linux-hexagon, linux-m68k,
	Michal Simek, Dinh Nguyen, openrisc, linux-parisc, linux-riscv,
	sparclinux, Linus Torvalds

Guenter,

On 1/03/23 04:22, Guenter Roeck wrote:
> On Tue, Jan 31, 2023 at 08:06:27PM +0000, Al Viro wrote:
>> parisc equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
>> If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
>> end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
>> to page tables.  In such case we must *not* return to the faulting insn -
>> that would repeat the entire thing without making any progress; what we need
>> instead is to treat that as failed (user) memory access.
>>
>> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
>> ---
>>   arch/parisc/mm/fault.c | 5 ++++-
>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/parisc/mm/fault.c b/arch/parisc/mm/fault.c
>> index 869204e97ec9..bb30ff6a3e19 100644
>> --- a/arch/parisc/mm/fault.c
>> +++ b/arch/parisc/mm/fault.c
>> @@ -308,8 +308,11 @@ void do_page_fault(struct pt_regs *regs, unsigned long code,
>>   
>>   	fault = handle_mm_fault(vma, address, flags, regs);
>>   
>> -	if (fault_signal_pending(fault, regs))
>> +	if (fault_signal_pending(fault, regs)) {
>> +		if (!user_mode(regs))
>> +			goto no_context;
> 0-day rightfully complains that this leaves 'msg' uninitialized.
>
> arch/parisc/mm/fault.c:427 do_page_fault() error: uninitialized symbol 'msg'
>
> Guenter

What happens if you initialize msg to "Page fault: no context" right at 
the start of do_page_fault (and drop the assignment a few lines down as 
that's now redundant)?

(Wondering if the zero page access on parisc could cause a trip right 
back into do_page_fault, ad infinitum...)

Cheers,

     Michael


>>   		return;
>> +	}
>>   
>>   	/* The fault is fully completed (including releasing mmap lock) */
>>   	if (fault & VM_FAULT_COMPLETED)

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 08/10] parisc: fix livelock in uaccess
  2023-02-28 19:14         ` Al Viro
@ 2023-02-28 19:32           ` Helge Deller
  2023-02-28 20:00             ` Helge Deller
  0 siblings, 1 reply; 120+ messages in thread
From: Helge Deller @ 2023-02-28 19:32 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-arch, linux-parisc

Hi Al,

On 2/28/23 20:14, Al Viro wrote:
> On Tue, Feb 28, 2023 at 07:26:48PM +0100, Helge Deller wrote:
>
>> I can test both parisc32 and parisc64.
>>
>>> 	Just to confirm: your "can be killed with ^C" had been on the
>>> mainline parisc kernel (with userfaultfd enable, of course, or it wouldn't
>>> hang up at all), right?
>>
>> It was a recent mainline kernel with your patch.
>
> Er...  Reproducer *is* supposed to block; the bug is that on the kernel
> without this patch it would (AFAICS) be impossible to kill - not even
> with kill -9.  With bug fixed the behaviour should be "blocks and is
> killable", i.e. what you've reported.
>
> What does it do on unpatched kernel?  *IF* the big is there, it would
> block and be unkillable by any means.  Could you verify that?

I just tried 32- and 64-bit kernels.
With and without your patch on top of kernel 6.2.
The result is always the same:
The process hangs, but does not consume CPU and can be killed with ^C or kill command.

strace output is:

userfaultfd(0)                          = 3
ioctl(3, UFFDIO_API, {api=0xaa, features=0 => features=UFFD_FEATURE_EVENT_FORK|UFFD_FEATURE_EVENT_REMAP|UFFD_FEATURE_EVENT_REMOVE|UFFD_FEATURE_MISSING_HUGETLBFS|UFFD_FEATURE_MISSING_SHMEM|UFFD_FEATURE_EVENT_UNMAP|UFFD_FEATURE_SIGBUS|UFFD_FEATURE_THREAD_ID|0x800, ioctls=1<<_UFFDIO_REGISTER|1<<_UFFDIO_UNREGISTER|1<<_UFFDIO_API}) = 0
ioctl(3, UFFDIO_REGISTER, {range={start=0xf7afa000, len=0x1000}, mode=UFFDIO_REGISTER_MODE_MISSING, ioctls=1<<_UFFDIO_WAKE|1<<_UFFDIO_COPY|1<<_UFFDIO_ZEROPAGE}) = 0
fstatfs64(0, 88, {f_type=DEVPTS_SUPER_MAGIC, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0, 0]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_NOSUID|ST_NOEXEC|ST_RELATIME}) = 0

<here it hangs until I kill the process with SIGTERM>

--- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=859, si_uid=0} ---
+++ killed by SIGTERM +++
Terminated

It seems the code flow doesn't reaches fault_signal_pending(), or it returns false...

Helge

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 08/10] parisc: fix livelock in uaccess
  2023-02-28 19:32           ` Helge Deller
@ 2023-02-28 20:00             ` Helge Deller
  2023-02-28 20:22               ` Helge Deller
  0 siblings, 1 reply; 120+ messages in thread
From: Helge Deller @ 2023-02-28 20:00 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-arch, linux-parisc

On 2/28/23 20:32, Helge Deller wrote:
> Hi Al,
>
> On 2/28/23 20:14, Al Viro wrote:
>> On Tue, Feb 28, 2023 at 07:26:48PM +0100, Helge Deller wrote:
>>
>>> I can test both parisc32 and parisc64.
>>>
>>>>     Just to confirm: your "can be killed with ^C" had been on the
>>>> mainline parisc kernel (with userfaultfd enable, of course, or it wouldn't
>>>> hang up at all), right?
>>>
>>> It was a recent mainline kernel with your patch.
>>
>> Er...  Reproducer *is* supposed to block; the bug is that on the kernel
>> without this patch it would (AFAICS) be impossible to kill - not even
>> with kill -9.  With bug fixed the behaviour should be "blocks and is
>> killable", i.e. what you've reported.
>>
>> What does it do on unpatched kernel?  *IF* the big is there, it would
>> block and be unkillable by any means.  Could you verify that?
>
> I just tried 32- and 64-bit kernels.
> With and without your patch on top of kernel 6.2.
> The result is always the same:
> The process hangs, but does not consume CPU and can be killed with ^C or kill command.
>
> strace output is:
>
> userfaultfd(0)                          = 3
> ioctl(3, UFFDIO_API, {api=0xaa, features=0 => features=UFFD_FEATURE_EVENT_FORK|UFFD_FEATURE_EVENT_REMAP|UFFD_FEATURE_EVENT_REMOVE|UFFD_FEATURE_MISSING_HUGETLBFS|UFFD_FEATURE_MISSING_SHMEM|UFFD_FEATURE_EVENT_UNMAP|UFFD_FEATURE_SIGBUS|UFFD_FEATURE_THREAD_ID|0x800, ioctls=1<<_UFFDIO_REGISTER|1<<_UFFDIO_UNREGISTER|1<<_UFFDIO_API}) = 0
> ioctl(3, UFFDIO_REGISTER, {range={start=0xf7afa000, len=0x1000}, mode=UFFDIO_REGISTER_MODE_MISSING, ioctls=1<<_UFFDIO_WAKE|1<<_UFFDIO_COPY|1<<_UFFDIO_ZEROPAGE}) = 0
> fstatfs64(0, 88, {f_type=DEVPTS_SUPER_MAGIC, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0, 0]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_NOSUID|ST_NOEXEC|ST_RELATIME}) = 0

Ok, one step further...

We get fooled by glibc, which replaces this line in the testcase:
         ret = fstatfs(0, (struct statfs *)mem);
and instead executes the 64-bit variant fstatfs64() inside a wrapper in glibc on another address.

I replaced that line in the testcase to use the 32-bit syscall with the right (userfault-specified) address by
         ret = syscall(__NR_fstatfs, 0, mem);
and now I see:
userfaultfd(0)                          = 3
ioctl(3, UFFDIO_API, {api=0xaa, features=0 => features=UFFD_FEATURE_EVENT_FORK|UFFD_FEATURE_EVENT_REMAP|UFFD_FEATURE_EVENT_REMOVE|UFFD_FEATURE_MISSING_HUGETLBFS|UFFD_FEATURE_MISSING_SHMEM|UFFD_FEATURE_EVENT_UNMAP|UFFD_FEATURE_SIGBUS|UFFD_FEATURE_THREAD_ID|0x800, ioctls=1<<_UFFDIO_REGISTER|1<<_UFFDIO_UNREGISTER|1<<_UFFDIO_API}) = 0
ioctl(3, UFFDIO_REGISTER, {range={start=0xf7afa000, len=0x1000}, mode=UFFDIO_REGISTER_MODE_MISSING, ioctls=1<<_UFFDIO_WAKE|1<<_UFFDIO_COPY|1<<_UFFDIO_ZEROPAGE}) = 0
fstatfs(0,
<here it hangs now and is unkillable>

Needs further testing.

Helge

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 08/10] parisc: fix livelock in uaccess
  2023-02-28 20:00             ` Helge Deller
@ 2023-02-28 20:22               ` Helge Deller
  2023-02-28 22:57                 ` Al Viro
  0 siblings, 1 reply; 120+ messages in thread
From: Helge Deller @ 2023-02-28 20:22 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-arch, linux-parisc

Hi Al,

On 2/28/23 21:00, Helge Deller wrote:
> On 2/28/23 20:32, Helge Deller wrote:
>> Hi Al,
>>
>> On 2/28/23 20:14, Al Viro wrote:
>>> On Tue, Feb 28, 2023 at 07:26:48PM +0100, Helge Deller wrote:
>>>
>>>> I can test both parisc32 and parisc64.
>>>>
>>>>>     Just to confirm: your "can be killed with ^C" had been on the
>>>>> mainline parisc kernel (with userfaultfd enable, of course, or it wouldn't
>>>>> hang up at all), right?
>>>>
>>>> It was a recent mainline kernel with your patch.
>>>
>>> Er...  Reproducer *is* supposed to block; the bug is that on the kernel
>>> without this patch it would (AFAICS) be impossible to kill - not even
>>> with kill -9.  With bug fixed the behaviour should be "blocks and is
>>> killable", i.e. what you've reported.
>>>
>>> What does it do on unpatched kernel?  *IF* the big is there, it would
>>> block and be unkillable by any means.  Could you verify that?
>>
>> I just tried 32- and 64-bit kernels.
>> With and without your patch on top of kernel 6.2.
>> The result is always the same:
>> The process hangs, but does not consume CPU and can be killed with ^C or kill command.
>>
>> strace output is:
>>
>> userfaultfd(0)                          = 3
>> ioctl(3, UFFDIO_API, {api=0xaa, features=0 => features=UFFD_FEATURE_EVENT_FORK|UFFD_FEATURE_EVENT_REMAP|UFFD_FEATURE_EVENT_REMOVE|UFFD_FEATURE_MISSING_HUGETLBFS|UFFD_FEATURE_MISSING_SHMEM|UFFD_FEATURE_EVENT_UNMAP|UFFD_FEATURE_SIGBUS|UFFD_FEATURE_THREAD_ID|0x800, ioctls=1<<_UFFDIO_REGISTER|1<<_UFFDIO_UNREGISTER|1<<_UFFDIO_API}) = 0
>> ioctl(3, UFFDIO_REGISTER, {range={start=0xf7afa000, len=0x1000}, mode=UFFDIO_REGISTER_MODE_MISSING, ioctls=1<<_UFFDIO_WAKE|1<<_UFFDIO_COPY|1<<_UFFDIO_ZEROPAGE}) = 0
>> fstatfs64(0, 88, {f_type=DEVPTS_SUPER_MAGIC, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0, 0]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_NOSUID|ST_NOEXEC|ST_RELATIME}) = 0
>
> Ok, one step further...
>
> We get fooled by glibc, which replaces this line in the testcase:
>          ret = fstatfs(0, (struct statfs *)mem);
> and instead executes the 64-bit variant fstatfs64() inside a wrapper in glibc on another address.
>
> I replaced that line in the testcase to use the 32-bit syscall with the right (userfault-specified) address by
>          ret = syscall(__NR_fstatfs, 0, mem);
> and now I see:
> userfaultfd(0)                          = 3
> ioctl(3, UFFDIO_API, {api=0xaa, features=0 => features=UFFD_FEATURE_EVENT_FORK|UFFD_FEATURE_EVENT_REMAP|UFFD_FEATURE_EVENT_REMOVE|UFFD_FEATURE_MISSING_HUGETLBFS|UFFD_FEATURE_MISSING_SHMEM|UFFD_FEATURE_EVENT_UNMAP|UFFD_FEATURE_SIGBUS|UFFD_FEATURE_THREAD_ID|0x800, ioctls=1<<_UFFDIO_REGISTER|1<<_UFFDIO_UNREGISTER|1<<_UFFDIO_API}) = 0
> ioctl(3, UFFDIO_REGISTER, {range={start=0xf7afa000, len=0x1000}, mode=UFFDIO_REGISTER_MODE_MISSING, ioctls=1<<_UFFDIO_WAKE|1<<_UFFDIO_COPY|1<<_UFFDIO_ZEROPAGE}) = 0
> fstatfs(0,
> <here it hangs now and is unkillable>
>
> Needs further testing.

Now I can confirm (with the adjusted reproducer), that your patch
allows to kill the process with SIGKILL, while without your patch
it's not possibe to kill the process at all.
I've tested with a 32- and 64-bit parisc kernel.

You may add
Tested-by: Helge Deller <deller@gmx.de> # parisc
to the patch.

If you want me to take the patch (with the warning regarding missing msg variable fixed)
through the parisc tree, please let me know.

Thank you!
Helge

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 08/10] parisc: fix livelock in uaccess
  2023-02-28 20:22               ` Helge Deller
@ 2023-02-28 22:57                 ` Al Viro
  2023-03-01  4:00                   ` Helge Deller
  0 siblings, 1 reply; 120+ messages in thread
From: Al Viro @ 2023-02-28 22:57 UTC (permalink / raw)
  To: Helge Deller; +Cc: linux-arch, linux-parisc

On Tue, Feb 28, 2023 at 09:22:31PM +0100, Helge Deller wrote:

> Now I can confirm (with the adjusted reproducer), that your patch
> allows to kill the process with SIGKILL, while without your patch
> it's not possibe to kill the process at all.
> I've tested with a 32- and 64-bit parisc kernel.
> 
> You may add
> Tested-by: Helge Deller <deller@gmx.de> # parisc
> to the patch.
> 
> If you want me to take the patch (with the warning regarding missing msg variable fixed)
> through the parisc tree, please let me know.

What message do you prefer there?  It matters only for the case when
we are hitting an oops there, but...

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 08/10] parisc: fix livelock in uaccess
  2023-02-28 22:57                 ` Al Viro
@ 2023-03-01  4:00                   ` Helge Deller
  2023-03-02 17:53                     ` Al Viro
  0 siblings, 1 reply; 120+ messages in thread
From: Helge Deller @ 2023-03-01  4:00 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-arch, linux-parisc

On 2/28/23 23:57, Al Viro wrote:
> On Tue, Feb 28, 2023 at 09:22:31PM +0100, Helge Deller wrote:
>
>> Now I can confirm (with the adjusted reproducer), that your patch
>> allows to kill the process with SIGKILL, while without your patch
>> it's not possibe to kill the process at all.
>> I've tested with a 32- and 64-bit parisc kernel.
>>
>> You may add
>> Tested-by: Helge Deller <deller@gmx.de> # parisc
>> to the patch.
>>
>> If you want me to take the patch (with the warning regarding missing msg variable fixed)
>> through the parisc tree, please let me know.
>
> What message do you prefer there?  It matters only for the case when
> we are hitting an oops there, but...

I have no preference on that message. Maybe "Page fault: fault signal on kernel memory"
is too generic, so I'm fine with any better idea/wording you come up with.

Helge

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 08/10] parisc: fix livelock in uaccess
  2023-03-01  4:00                   ` Helge Deller
@ 2023-03-02 17:53                     ` Al Viro
  0 siblings, 0 replies; 120+ messages in thread
From: Al Viro @ 2023-03-02 17:53 UTC (permalink / raw)
  To: Helge Deller; +Cc: linux-arch, linux-parisc, Linus Torvalds

On Wed, Mar 01, 2023 at 05:00:11AM +0100, Helge Deller wrote:
> On 2/28/23 23:57, Al Viro wrote:
> > On Tue, Feb 28, 2023 at 09:22:31PM +0100, Helge Deller wrote:
> > 
> > > Now I can confirm (with the adjusted reproducer), that your patch
> > > allows to kill the process with SIGKILL, while without your patch
> > > it's not possibe to kill the process at all.
> > > I've tested with a 32- and 64-bit parisc kernel.
> > > 
> > > You may add
> > > Tested-by: Helge Deller <deller@gmx.de> # parisc
> > > to the patch.
> > > 
> > > If you want me to take the patch (with the warning regarding missing msg variable fixed)
> > > through the parisc tree, please let me know.
> > 
> > What message do you prefer there?  It matters only for the case when
> > we are hitting an oops there, but...
> 
> I have no preference on that message. Maybe "Page fault: fault signal on kernel memory"
> is too generic, so I'm fine with any better idea/wording you come up with.

Ended up going with that...  Series reordered and pushed out in #fixes.
Status:
	m68k, riscv, hexagon, parisc: tested by arch maintainers
	alpha, sparc: locally tested (both sparc32 and sparc64 parts)
	ia64, microblaze, nios2, openrisc: builds, but I have no way to test the
resulting kernels (itanic box appears to be pining for fjords and I've no emulator
setup for that, the other three I don't even have userland to test with).

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 01/10] alpha: fix livelock in uaccess
  2023-01-31 20:03   ` Al Viro
@ 2023-03-07  0:48     ` patchwork-bot+linux-riscv
  -1 siblings, 0 replies; 120+ messages in thread
From: patchwork-bot+linux-riscv @ 2023-03-07  0:48 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-riscv, linux-arch, linux-alpha, linux-ia64, linux-hexagon,
	linux-m68k, monstr, dinguyen, openrisc, linux-parisc, sparclinux,
	torvalds

Hello:

This series was applied to riscv/linux.git (fixes)
by Al Viro <viro@zeniv.linux.org.uk>:

On Tue, 31 Jan 2023 20:03:28 +0000 you wrote:
> alpha equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
> If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
> end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
> to page tables.  In such case we must *not* return to the faulting insn -
> that would repeat the entire thing without making any progress; what we need
> instead is to treat that as failed (user) memory access.
> 
> [...]

Here is the summary with links:
  - [01/10] alpha: fix livelock in uaccess
    (no matching commit)
  - [02/10] hexagon: fix livelock in uaccess
    (no matching commit)
  - [03/10] ia64: fix livelock in uaccess
    (no matching commit)
  - [04/10] m68k: fix livelock in uaccess
    (no matching commit)
  - [05/10] microblaze: fix livelock in uaccess
    (no matching commit)
  - [06/10] nios2: fix livelock in uaccess
    (no matching commit)
  - [07/10] openrisc: fix livelock in uaccess
    (no matching commit)
  - [08/10] parisc: fix livelock in uaccess
    (no matching commit)
  - [09/10] riscv: fix livelock in uaccess
    https://git.kernel.org/riscv/c/d835eb3a57de
  - [10/10] sparc: fix livelock in uaccess
    (no matching commit)

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 01/10] alpha: fix livelock in uaccess
@ 2023-03-07  0:48     ` patchwork-bot+linux-riscv
  0 siblings, 0 replies; 120+ messages in thread
From: patchwork-bot+linux-riscv @ 2023-03-07  0:48 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-riscv, linux-arch, linux-alpha, linux-ia64, linux-hexagon,
	linux-m68k, monstr, dinguyen, openrisc, linux-parisc, sparclinux,
	torvalds

Hello:

This series was applied to riscv/linux.git (fixes)
by Al Viro <viro@zeniv.linux.org.uk>:

On Tue, 31 Jan 2023 20:03:28 +0000 you wrote:
> alpha equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
> If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
> end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
> to page tables.  In such case we must *not* return to the faulting insn -
> that would repeat the entire thing without making any progress; what we need
> instead is to treat that as failed (user) memory access.
> 
> [...]

Here is the summary with links:
  - [01/10] alpha: fix livelock in uaccess
    (no matching commit)
  - [02/10] hexagon: fix livelock in uaccess
    (no matching commit)
  - [03/10] ia64: fix livelock in uaccess
    (no matching commit)
  - [04/10] m68k: fix livelock in uaccess
    (no matching commit)
  - [05/10] microblaze: fix livelock in uaccess
    (no matching commit)
  - [06/10] nios2: fix livelock in uaccess
    (no matching commit)
  - [07/10] openrisc: fix livelock in uaccess
    (no matching commit)
  - [08/10] parisc: fix livelock in uaccess
    (no matching commit)
  - [09/10] riscv: fix livelock in uaccess
    https://git.kernel.org/riscv/c/d835eb3a57de
  - [10/10] sparc: fix livelock in uaccess
    (no matching commit)

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 120+ messages in thread

end of thread, other threads:[~2023-03-07  0:49 UTC | newest]

Thread overview: 120+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-31 20:02 [RFC][PATCHSET] VM_FAULT_RETRY fixes Al Viro
2023-01-31 20:02 ` Al Viro
2023-01-31 20:03 ` [PATCH 01/10] alpha: fix livelock in uaccess Al Viro
2023-01-31 20:03   ` Al Viro
2023-03-07  0:48   ` patchwork-bot+linux-riscv
2023-03-07  0:48     ` patchwork-bot+linux-riscv
2023-01-31 20:03 ` [PATCH 02/10] hexagon: " Al Viro
2023-01-31 20:03   ` Al Viro
2023-02-10  2:59   ` Brian Cain
2023-02-10  2:59     ` Brian Cain
2023-01-31 20:04 ` [PATCH 03/10] ia64: " Al Viro
2023-01-31 20:04   ` Al Viro
2023-01-31 20:04 ` [PATCH 04/10] m68k: " Al Viro
2023-01-31 20:04   ` Al Viro
2023-02-05  6:18   ` Finn Thain
2023-02-05  6:18     ` Finn Thain
2023-02-05  6:18     ` Finn Thain
2023-02-05 18:51     ` Linus Torvalds
2023-02-05 18:51       ` Linus Torvalds
2023-02-05 18:51       ` Linus Torvalds
2023-02-07  3:07       ` Finn Thain
2023-02-07  3:07         ` Finn Thain
2023-02-07  3:07         ` Finn Thain
2023-02-05 20:39     ` Al Viro
2023-02-05 20:39       ` Al Viro
2023-02-05 20:39       ` Al Viro
2023-02-05 20:41       ` Linus Torvalds
2023-02-05 20:41         ` Linus Torvalds
2023-02-05 20:41         ` Linus Torvalds
2023-02-06 12:08   ` Geert Uytterhoeven
2023-02-06 12:08     ` Geert Uytterhoeven
2023-02-06 12:08     ` Geert Uytterhoeven
2023-01-31 20:05 ` [PATCH 05/10] microblaze: " Al Viro
2023-01-31 20:05   ` Al Viro
2023-01-31 20:05 ` [PATCH 06/10] nios2: " Al Viro
2023-01-31 20:05   ` Al Viro
2023-01-31 20:06 ` [PATCH 07/10] openrisc: " Al Viro
2023-01-31 20:06   ` Al Viro
2023-01-31 20:06 ` [PATCH 08/10] parisc: " Al Viro
2023-01-31 20:06   ` Al Viro
2023-02-06 16:58   ` Helge Deller
2023-02-06 16:58     ` Helge Deller
2023-02-06 16:58     ` Helge Deller
2023-02-28 17:34     ` Al Viro
2023-02-28 17:34       ` Al Viro
2023-02-28 18:26       ` Helge Deller
2023-02-28 19:14         ` Al Viro
2023-02-28 19:32           ` Helge Deller
2023-02-28 20:00             ` Helge Deller
2023-02-28 20:22               ` Helge Deller
2023-02-28 22:57                 ` Al Viro
2023-03-01  4:00                   ` Helge Deller
2023-03-02 17:53                     ` Al Viro
2023-02-28 15:22   ` Guenter Roeck
2023-02-28 15:22     ` Guenter Roeck
2023-02-28 15:22     ` Guenter Roeck
2023-02-28 19:18     ` Michael Schmitz
2023-02-28 19:18       ` Michael Schmitz
2023-02-28 19:18       ` Michael Schmitz
2023-01-31 20:06 ` [PATCH 09/10] riscv: " Al Viro
2023-01-31 20:06   ` Al Viro
2023-02-06 20:06   ` Björn Töpel
2023-02-06 20:06     ` Björn Töpel
2023-02-06 20:06     ` Björn Töpel
2023-02-07 16:11   ` Geert Uytterhoeven
2023-02-07 16:11     ` Geert Uytterhoeven
2023-02-07 16:11     ` Geert Uytterhoeven
2023-01-31 20:07 ` [PATCH 10/10] sparc: " Al Viro
2023-01-31 20:07   ` Al Viro
2023-01-31 20:24 ` [RFC][PATCHSET] VM_FAULT_RETRY fixes Linus Torvalds
2023-01-31 20:24   ` Linus Torvalds
2023-01-31 20:24   ` Linus Torvalds
2023-01-31 21:10   ` Al Viro
2023-01-31 21:10     ` Al Viro
2023-01-31 21:19     ` Linus Torvalds
2023-01-31 21:19       ` Linus Torvalds
2023-01-31 21:19       ` Linus Torvalds
2023-01-31 21:49       ` Al Viro
2023-01-31 21:49         ` Al Viro
2023-02-01  0:00         ` Linus Torvalds
2023-02-01  0:00           ` Linus Torvalds
2023-02-01  0:00           ` Linus Torvalds
2023-02-01 19:48           ` Peter Xu
2023-02-01 19:48             ` Peter Xu
2023-02-01 19:48             ` Peter Xu
2023-02-01 22:18             ` Al Viro
2023-02-01 22:18               ` Al Viro
2023-02-01 22:18               ` Al Viro
2023-02-02  0:57               ` Al Viro
2023-02-02  0:57                 ` Al Viro
2023-02-02  0:57                 ` Al Viro
2023-02-02 22:56               ` Peter Xu
2023-02-02 22:56                 ` Peter Xu
2023-02-02 22:56                 ` Peter Xu
2023-02-04  0:26                 ` Al Viro
2023-02-04  0:26                   ` Al Viro
2023-02-04  0:26                   ` Al Viro
2023-02-05  5:10                   ` Al Viro
2023-02-05  5:10                     ` Al Viro
2023-02-05  5:10                     ` Al Viro
2023-02-04  0:47         ` [loongarch oddities] " Al Viro
2023-02-01  8:21       ` Helge Deller
2023-02-01  8:21         ` Helge Deller
2023-02-01  8:21         ` Helge Deller
2023-02-01 19:51         ` Linus Torvalds
2023-02-01 19:51           ` Linus Torvalds
2023-02-01 19:51           ` Linus Torvalds
2023-02-02  6:58       ` Al Viro
2023-02-02  6:58         ` Al Viro
2023-02-02  8:54         ` Michael Cree
2023-02-02  9:56           ` John Paul Adrian Glaubitz
2023-02-02 15:20           ` Al Viro
2023-02-02 20:20             ` Al Viro
2023-02-02 20:34         ` Linus Torvalds
2023-02-01 10:50 ` Mark Rutland
2023-02-01 10:50   ` Mark Rutland
2023-02-01 10:50   ` Mark Rutland
2023-02-06 12:08   ` Geert Uytterhoeven
2023-02-06 12:08     ` Geert Uytterhoeven
2023-02-06 12:08     ` Geert Uytterhoeven

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.