* [PATCH 00/20] exit cleanups
@ 2021-10-20 17:32 ` Eric W. Biederman
  0 siblings, 0 replies; 110+ messages in thread
From: Eric W. Biederman @ 2021-10-20 17:32 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro, Kees Cook,
	Andy Lutomirski, Jonas Bonn, Stefan Kristiansson, Stafford Horne,
	openrisc, Nick Hu, Greentime Hu, Vincent Chen, Heiko Carstens,
	Vasily Gorbik, Christian Borntraeger, linux-s390, Yoshinori Sato,
	Rich Felker, linux-sh, linux-xtensa, Chris Zankel, Max Filippov,
	David Miller, sparclinux, Thomas Bogendoerfer, Maciej Rozycki,
	linux-mips, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, linuxppc-dev, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Greg Kroah-Hartman


While looking at some issues related to the exit path in the kernel I
found several instances where the code is not using the existing
abstractions properly.

This set of changes introduces force_fatal_sig, a way of sending
a signal without allowing it to be caught, and corrects the
misuse of the existing abstractions that I found.

Much of the misuse of the existing abstractions consists of silly things
such as doing something after calling a noreturn function, rolling BUG by
hand, doing more work than necessary to terminate a kernel thread, or
calling do_exit(SIGKILL) instead of calling force_sig(SIGKILL).

It is my plan after sending all of these changes out for review to place
them in a topic branch for sending to Linus.  This is especially important
for the changes that depend upon the new helper force_fatal_sig.

Eric W. Biederman (20):
      exit/doublefault: Remove apparently bogus comment about rewind_stack_do_exit
      exit: Remove calls of do_exit after noreturn versions of die
      reboot: Remove the unreachable panic after do_exit in reboot(2)
      signal/sparc32: Remove unreachable do_exit in do_sparc_fault
      signal/mips: Update (_save|_restore)_fp_context to fail with -EFAULT
      signal/sh: Use force_sig(SIGKILL) instead of do_group_exit(SIGKILL)
      signal/powerpc: On swapcontext failure force SIGSEGV
      signal/sparc: In setup_tsb_params convert open coded BUG into BUG
      signal/vm86_32: Replace open coded BUG_ON with an actual BUG_ON
      signal/vm86_32: Properly send SIGSEGV when the vm86 state cannot be saved.
      signal/s390: Use force_sigsegv in default_trap_handler
      exit/kthread: Have kernel threads return instead of calling do_exit
      signal: Implement force_fatal_sig
      exit/syscall_user_dispatch: Send ordinary signals on failure
      signal/sparc32: Exit with a fatal signal when try_to_clear_window_buffer fails
      signal/sparc32: In setup_rt_frame and setup_frame use force_fatal_sig
      signal/x86: In emulate_vsyscall force a signal instead of calling do_exit
      exit/rtl8723bs: Replace the macro thread_exit with a simple return 0
      exit/rtl8712: Replace the macro thread_exit with a simple return 0
      exit/r8188eu: Replace the macro thread_exit with a simple return 0

 arch/mips/kernel/r2300_fpu.S                       |  4 ++--
 arch/mips/kernel/syscall.c                         |  9 --------
 arch/nds32/kernel/traps.c                          |  2 +-
 arch/nds32/mm/fault.c                              |  6 +----
 arch/openrisc/kernel/traps.c                       |  2 +-
 arch/openrisc/mm/fault.c                           |  4 +---
 arch/powerpc/kernel/signal_32.c                    |  6 +++--
 arch/powerpc/kernel/signal_64.c                    |  9 +++++---
 arch/s390/include/asm/kdebug.h                     |  2 +-
 arch/s390/kernel/dumpstack.c                       |  2 +-
 arch/s390/kernel/traps.c                           |  2 +-
 arch/s390/mm/fault.c                               |  2 --
 arch/sh/kernel/cpu/fpu.c                           | 10 +++++----
 arch/sh/kernel/traps.c                             |  2 +-
 arch/sh/mm/fault.c                                 |  2 --
 arch/sparc/kernel/signal_32.c                      |  4 ++--
 arch/sparc/kernel/windows.c                        |  6 +++--
 arch/sparc/mm/fault_32.c                           |  1 -
 arch/sparc/mm/tsb.c                                |  2 +-
 arch/x86/entry/vsyscall/vsyscall_64.c              |  3 ++-
 arch/x86/kernel/doublefault_32.c                   |  3 ---
 arch/x86/kernel/signal.c                           |  6 ++++-
 arch/x86/kernel/vm86_32.c                          |  8 +++----
 arch/xtensa/kernel/traps.c                         |  2 +-
 arch/xtensa/mm/fault.c                             |  3 +--
 drivers/firmware/stratix10-svc.c                   |  4 ++--
 drivers/soc/ti/wkup_m3_ipc.c                       |  2 +-
 drivers/staging/r8188eu/core/rtw_cmd.c             |  2 +-
 drivers/staging/r8188eu/core/rtw_mp.c              |  2 +-
 drivers/staging/r8188eu/include/osdep_service.h    |  2 --
 drivers/staging/rtl8712/osdep_service.h            |  1 -
 drivers/staging/rtl8712/rtl8712_cmd.c              |  2 +-
 drivers/staging/rtl8723bs/core/rtw_cmd.c           |  2 +-
 drivers/staging/rtl8723bs/core/rtw_xmit.c          |  2 +-
 drivers/staging/rtl8723bs/hal/rtl8723bs_xmit.c     |  2 +-
 .../rtl8723bs/include/osdep_service_linux.h        |  2 --
 fs/ocfs2/journal.c                                 |  5 +----
 include/linux/sched/signal.h                       |  1 +
 kernel/entry/syscall_user_dispatch.c               | 12 ++++++----
 kernel/kthread.c                                   |  2 +-
 kernel/reboot.c                                    |  1 -
 kernel/signal.c                                    | 26 ++++++++++++++--------
 net/batman-adv/tp_meter.c                          |  2 +-
 43 files changed, 83 insertions(+), 91 deletions(-)

Eric

^ permalink raw reply	[flat|nested] 110+ messages in thread

* [PATCH 01/20] exit/doublefault: Remove apparently bogus comment about rewind_stack_do_exit
  2021-10-20 17:32 ` Eric W. Biederman
  (?)
  (?)
@ 2021-10-20 17:43 ` Eric W. Biederman
  2021-10-21 16:02   ` Kees Cook
  -1 siblings, 1 reply; 110+ messages in thread
From: Eric W. Biederman @ 2021-10-20 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro, Kees Cook,
	Eric W. Biederman, Andy Lutomirski

I do not see panic calling rewind_stack_do_exit anywhere, nor can I
find anywhere in the history where doublefault_shim has called
rewind_stack_do_exit.  So I don't think this comment was ever actually
correct.

Cc: Andy Lutomirski <luto@kernel.org>
Fixes: 7d8d8cfdee9a ("x86/doublefault/32: Rewrite the x86_32 #DF handler and unify with 64-bit")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 arch/x86/kernel/doublefault_32.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/x86/kernel/doublefault_32.c b/arch/x86/kernel/doublefault_32.c
index d1d49e3d536b..3b58d8703094 100644
--- a/arch/x86/kernel/doublefault_32.c
+++ b/arch/x86/kernel/doublefault_32.c
@@ -77,9 +77,6 @@ asmlinkage noinstr void __noreturn doublefault_shim(void)
 	 * some way to reconstruct CR3.  We could make a credible guess based
 	 * on cpu_tlbstate, but that would be racy and would not account for
 	 * PTI.
-	 *
-	 * Instead, don't bother.  We can return through
-	 * rewind_stack_do_exit() instead.
 	 */
 	panic("cannot return from double fault\n");
 }
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 110+ messages in thread

* [PATCH 02/20] exit: Remove calls of do_exit after noreturn versions of die
  2021-10-20 17:32 ` Eric W. Biederman
@ 2021-10-20 17:43   ` Eric W. Biederman
  -1 siblings, 0 replies; 110+ messages in thread
From: Eric W. Biederman @ 2021-10-20 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro, Kees Cook,
	Eric W. Biederman, Jonas Bonn, Stefan Kristiansson,
	Stafford Horne, openrisc, Nick Hu, Greentime Hu, Vincent Chen,
	Heiko Carstens, Vasily Gorbik, Christian Borntraeger, linux-s390,
	Yoshinori Sato, Rich Felker, linux-sh, linux-xtensa,
	Chris Zankel, Max Filippov

On nds32, openrisc, s390, sh, and xtensa the function die never
returns.  Mark die __noreturn so that no one expects die to return.
Remove the do_exit calls after die as they will never be reached.
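
The effect of the annotation can be sketched in userspace C. This is a
minimal illustration, assuming GCC or Clang, where the attribute is
spelled __attribute__((noreturn)); die_sketch(), parse_digit() and
exit_code_of_bad_digit() are hypothetical names, not kernel functions.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Userspace stand-in for a die() that never returns to its caller. */
static void die_sketch(const char *msg) __attribute__((noreturn));

static void die_sketch(const char *msg)
{
	fprintf(stderr, "fatal: %s\n", msg);
	_exit(3);
}

/* Convert an ASCII digit to its value; anything else is fatal. */
static int parse_digit(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	die_sketch("not a digit");
	/*
	 * Nothing is needed here: because die_sketch() is declared
	 * noreturn, the compiler knows control cannot fall off the end
	 * of this function, so no trailing exit call or return is
	 * required.
	 */
}

/* Run the failing path in a child and report its exit code (-1 on error). */
static int exit_code_of_bad_digit(void)
{
	pid_t pid = fork();
	int status;

	if (pid < 0)
		return -1;
	if (pid == 0) {
		parse_digit('x');
		_exit(0);	/* not reached: parse_digit() dies first */
	}
	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

Any statement placed after the die_sketch() call would be dead code, in
the same way as the do_exit calls this patch removes.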

Cc: Jonas Bonn <jonas@southpole.se>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Stafford Horne <shorne@gmail.com>
Cc: openrisc@lists.librecores.org
Cc: Nick Hu <nickhu@andestech.com>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: linux-s390@vger.kernel.org
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: linux-sh@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: Chris Zankel <chris@zankel.net>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Fixes: 2.3.16
Fixes: 2.3.99-pre8
Fixes: 3f65ce4d141e ("[PATCH] xtensa: Architecture support for Tensilica Xtensa Part 5")
Fixes: 664eec400bf8 ("nds32: MMU fault handling and page table management")
Fixes: 61e85e367535 ("OpenRISC: Memory management")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 arch/nds32/kernel/traps.c      | 2 +-
 arch/nds32/mm/fault.c          | 6 +-----
 arch/openrisc/kernel/traps.c   | 2 +-
 arch/openrisc/mm/fault.c       | 4 +---
 arch/s390/include/asm/kdebug.h | 2 +-
 arch/s390/kernel/dumpstack.c   | 2 +-
 arch/s390/mm/fault.c           | 2 --
 arch/sh/kernel/traps.c         | 2 +-
 arch/sh/mm/fault.c             | 2 --
 arch/xtensa/kernel/traps.c     | 2 +-
 arch/xtensa/mm/fault.c         | 3 +--
 11 files changed, 9 insertions(+), 20 deletions(-)

diff --git a/arch/nds32/kernel/traps.c b/arch/nds32/kernel/traps.c
index f06421c645af..ca75d475eda4 100644
--- a/arch/nds32/kernel/traps.c
+++ b/arch/nds32/kernel/traps.c
@@ -118,7 +118,7 @@ DEFINE_SPINLOCK(die_lock);
 /*
  * This function is protected against re-entrancy.
  */
-void die(const char *str, struct pt_regs *regs, int err)
+void __noreturn die(const char *str, struct pt_regs *regs, int err)
 {
 	struct task_struct *tsk = current;
 	static int die_counter;
diff --git a/arch/nds32/mm/fault.c b/arch/nds32/mm/fault.c
index f02524eb6d56..1d139b117168 100644
--- a/arch/nds32/mm/fault.c
+++ b/arch/nds32/mm/fault.c
@@ -13,7 +13,7 @@
 
 #include <asm/tlbflush.h>
 
-extern void die(const char *str, struct pt_regs *regs, long err);
+extern void __noreturn die(const char *str, struct pt_regs *regs, long err);
 
 /*
  * This is useful to dump out the page tables associated with
@@ -299,10 +299,6 @@ void do_page_fault(unsigned long entry, unsigned long addr,
 
 	show_pte(mm, addr);
 	die("Oops", regs, error_code);
-	bust_spinlocks(0);
-	do_exit(SIGKILL);
-
-	return;
 
 	/*
 	 * We ran out of memory, or some other thing happened to us that made
diff --git a/arch/openrisc/kernel/traps.c b/arch/openrisc/kernel/traps.c
index aa1e709405ac..0898cb159fac 100644
--- a/arch/openrisc/kernel/traps.c
+++ b/arch/openrisc/kernel/traps.c
@@ -197,7 +197,7 @@ void nommu_dump_state(struct pt_regs *regs,
 }
 
 /* This is normally the 'Oops' routine */
-void die(const char *str, struct pt_regs *regs, long err)
+void __noreturn die(const char *str, struct pt_regs *regs, long err)
 {
 
 	console_verbose();
diff --git a/arch/openrisc/mm/fault.c b/arch/openrisc/mm/fault.c
index c730d1a51686..f0fa6394a58e 100644
--- a/arch/openrisc/mm/fault.c
+++ b/arch/openrisc/mm/fault.c
@@ -32,7 +32,7 @@ unsigned long pte_errors;	/* updated by do_page_fault() */
  */
 volatile pgd_t *current_pgd[NR_CPUS];
 
-extern void die(char *, struct pt_regs *, long);
+extern void __noreturn die(char *, struct pt_regs *, long);
 
 /*
  * This routine handles page faults.  It determines the address,
@@ -248,8 +248,6 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long address,
 
 	die("Oops", regs, write_acc);
 
-	do_exit(SIGKILL);
-
 	/*
 	 * We ran out of memory, or some other thing happened to us that made
 	 * us unable to handle the page fault gracefully.
diff --git a/arch/s390/include/asm/kdebug.h b/arch/s390/include/asm/kdebug.h
index d5327f064799..4377238e4752 100644
--- a/arch/s390/include/asm/kdebug.h
+++ b/arch/s390/include/asm/kdebug.h
@@ -23,6 +23,6 @@ enum die_val {
 	DIE_NMI_IPI,
 };
 
-extern void die(struct pt_regs *, const char *);
+extern void __noreturn die(struct pt_regs *, const char *);
 
 #endif
diff --git a/arch/s390/kernel/dumpstack.c b/arch/s390/kernel/dumpstack.c
index db1bc00229ca..f45e66b8bed6 100644
--- a/arch/s390/kernel/dumpstack.c
+++ b/arch/s390/kernel/dumpstack.c
@@ -192,7 +192,7 @@ void show_regs(struct pt_regs *regs)
 
 static DEFINE_SPINLOCK(die_lock);
 
-void die(struct pt_regs *regs, const char *str)
+void __noreturn die(struct pt_regs *regs, const char *str)
 {
 	static int die_counter;
 
diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
index 212632d57db9..d30f5986fa85 100644
--- a/arch/s390/mm/fault.c
+++ b/arch/s390/mm/fault.c
@@ -260,7 +260,6 @@ static noinline void do_no_context(struct pt_regs *regs)
 		       " in virtual user address space\n");
 	dump_fault_info(regs);
 	die(regs, "Oops");
-	do_exit(SIGKILL);
 }
 
 static noinline void do_low_address(struct pt_regs *regs)
@@ -270,7 +269,6 @@ static noinline void do_low_address(struct pt_regs *regs)
 	if (regs->psw.mask & PSW_MASK_PSTATE) {
 		/* Low-address protection hit in user mode 'cannot happen'. */
 		die (regs, "Low-address protection");
-		do_exit(SIGKILL);
 	}
 
 	do_no_context(regs);
diff --git a/arch/sh/kernel/traps.c b/arch/sh/kernel/traps.c
index e76b22157099..cbe3201d4f21 100644
--- a/arch/sh/kernel/traps.c
+++ b/arch/sh/kernel/traps.c
@@ -20,7 +20,7 @@
 
 static DEFINE_SPINLOCK(die_lock);
 
-void die(const char *str, struct pt_regs *regs, long err)
+void __noreturn die(const char *str, struct pt_regs *regs, long err)
 {
 	static int die_counter;
 
diff --git a/arch/sh/mm/fault.c b/arch/sh/mm/fault.c
index 88a1f453d73e..1e1aa75df3ca 100644
--- a/arch/sh/mm/fault.c
+++ b/arch/sh/mm/fault.c
@@ -238,8 +238,6 @@ no_context(struct pt_regs *regs, unsigned long error_code,
 	show_fault_oops(regs, address);
 
 	die("Oops", regs, error_code);
-	bust_spinlocks(0);
-	do_exit(SIGKILL);
 }
 
 static void
diff --git a/arch/xtensa/kernel/traps.c b/arch/xtensa/kernel/traps.c
index 874b6efc6fb3..fb056a191339 100644
--- a/arch/xtensa/kernel/traps.c
+++ b/arch/xtensa/kernel/traps.c
@@ -527,7 +527,7 @@ void show_stack(struct task_struct *task, unsigned long *sp, const char *loglvl)
 
 DEFINE_SPINLOCK(die_lock);
 
-void die(const char * str, struct pt_regs * regs, long err)
+void __noreturn die(const char * str, struct pt_regs * regs, long err)
 {
 	static int die_counter;
 	const char *pr = "";
diff --git a/arch/xtensa/mm/fault.c b/arch/xtensa/mm/fault.c
index 95a74890c7e9..fd6a70635962 100644
--- a/arch/xtensa/mm/fault.c
+++ b/arch/xtensa/mm/fault.c
@@ -238,7 +238,7 @@ void do_page_fault(struct pt_regs *regs)
 void
 bad_page_fault(struct pt_regs *regs, unsigned long address, int sig)
 {
-	extern void die(const char*, struct pt_regs*, long);
+	extern void __noreturn die(const char*, struct pt_regs*, long);
 	const struct exception_table_entry *entry;
 
 	/* Are we prepared to handle this kernel fault?  */
@@ -257,5 +257,4 @@ bad_page_fault(struct pt_regs *regs, unsigned long address, int sig)
 		 "address %08lx\n pc = %08lx, ra = %08lx\n",
 		 address, regs->pc, regs->areg[0]);
 	die("Oops", regs, sig);
-	do_exit(sig);
 }
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 110+ messages in thread

 	const struct exception_table_entry *entry;
 
 	/* Are we prepared to handle this kernel fault?  */
@@ -257,5 +257,4 @@ bad_page_fault(struct pt_regs *regs, unsigned long address, int sig)
 		 "address %08lx\n pc = %08lx, ra = %08lx\n",
 		 address, regs->pc, regs->areg[0]);
 	die("Oops", regs, sig);
-	do_exit(sig);
 }
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 110+ messages in thread

* [PATCH 03/20] reboot: Remove the unreachable panic after do_exit in reboot(2)
  2021-10-20 17:32 ` Eric W. Biederman
                   ` (3 preceding siblings ...)
  (?)
@ 2021-10-20 17:43 ` Eric W. Biederman
  2021-10-21 16:05   ` Kees Cook
  -1 siblings, 1 reply; 110+ messages in thread
From: Eric W. Biederman @ 2021-10-20 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro, Kees Cook,
	Eric W. Biederman

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/reboot.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/kernel/reboot.c b/kernel/reboot.c
index f7440c0c7e43..d6e0f9fb7f04 100644
--- a/kernel/reboot.c
+++ b/kernel/reboot.c
@@ -359,7 +359,6 @@ SYSCALL_DEFINE4(reboot, int, magic1, int, magic2, unsigned int, cmd,
 	case LINUX_REBOOT_CMD_HALT:
 		kernel_halt();
 		do_exit(0);
-		panic("cannot halt");
 
 	case LINUX_REBOOT_CMD_POWER_OFF:
 		kernel_power_off();
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 110+ messages in thread

* [PATCH 04/20] signal/sparc32: Remove unreachable do_exit in do_sparc_fault
  2021-10-20 17:32 ` Eric W. Biederman
                   ` (4 preceding siblings ...)
  (?)
@ 2021-10-20 17:43 ` Eric W. Biederman
  2021-10-21 16:05   ` Kees Cook
  -1 siblings, 1 reply; 110+ messages in thread
From: Eric W. Biederman @ 2021-10-20 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro, Kees Cook,
	Eric W. Biederman, David Miller, sparclinux

The call to do_exit in do_sparc_fault immediately follows a call to
unhandled_fault.  The function unhandled_fault never returns.  This
means the call to do_exit can never be reached.
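
The argument can be sketched in userspace C, as a rough analogy rather
than kernel code.  The names unhandled_fault_demo, fault_path_demo and
run_fault_path_demo are hypothetical, and _exit() merely stands in for
the kernel's exit path.  The only point illustrated is that a statement
placed after a call to a function that never returns is dead code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for unhandled_fault(): it never returns. */
_Noreturn void unhandled_fault_demo(int code)
{
	_exit(code);
}

/* Hypothetical stand-in for the tail of do_sparc_fault(). */
void fault_path_demo(int code)
{
	unhandled_fault_demo(code);
	/* Nothing placed here can run; a do_exit() here would be dead code. */
}

/*
 * Run the fault path in a child process and report how the child ended.
 * Returns the child's exit status, or -1 on error.
 */
int run_fault_path_demo(int code)
{
	int status = 0;
	pid_t pid;

	/* 255 is reserved for the "callee returned" fallback below. */
	assert(code >= 0 && code < 255);

	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		fault_path_demo(code);
		_exit(255);	/* only reachable if the callee returned */
	}
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_fault_path_demo(3) reports the status chosen inside the
never-returning function, not the fallback value after it.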

Cc: David Miller <davem@davemloft.net>
Cc: sparclinux@vger.kernel.org
Fixes: 2.3.41
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 arch/sparc/mm/fault_32.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/sparc/mm/fault_32.c b/arch/sparc/mm/fault_32.c
index fa858626b85b..90dc4ae315c8 100644
--- a/arch/sparc/mm/fault_32.c
+++ b/arch/sparc/mm/fault_32.c
@@ -248,7 +248,6 @@ asmlinkage void do_sparc_fault(struct pt_regs *regs, int text_fault, int write,
 	}
 
 	unhandled_fault(address, tsk, regs);
-	do_exit(SIGKILL);
 
 /*
  * We ran out of memory, or some other thing happened to us that made
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 110+ messages in thread

* [PATCH 05/20] signal/mips: Update (_save|_restore)_fp_context to fail with -EFAULT
  2021-10-20 17:32 ` Eric W. Biederman
                   ` (5 preceding siblings ...)
  (?)
@ 2021-10-20 17:43 ` Eric W. Biederman
  2021-10-21 16:06   ` Kees Cook
                     ` (2 more replies)
  -1 siblings, 3 replies; 110+ messages in thread
From: Eric W. Biederman @ 2021-10-20 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro, Kees Cook,
	Eric W. Biederman, Thomas Bogendoerfer, Maciej Rozycki,
	linux-mips

When an instruction to save or restore a register from the stack fails
in _save_fp_context or _restore_fp_context, return with -EFAULT.  This
change was made to r2300_fpu.S[1] but it looks like it got lost with
the introduction of EX2[2].  This is also what the other implementation
of _save_fp_context and _restore_fp_context in r4k_fpu.S does, and
what is needed for the callers to be able to handle the error.

Furthermore, calling do_exit(SIGSEGV) from bad_stack is wrong because
it does not terminate the entire process; it terminates only a single
thread.

As the changed code was the only caller of
arch/mips/kernel/syscall.c:bad_stack, remove the problematic and now
unused helper function.

Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Maciej Rozycki <macro@orcam.me.uk>
Cc: linux-mips@vger.kernel.org
[1] 35938a00ba86 ("MIPS: Fix ISA I FP sigcontext access violation handling")
[2] f92722dc4545 ("MIPS: Correct MIPS I FP sigcontext layout")
Fixes: f92722dc4545 ("MIPS: Correct MIPS I FP sigcontext layout")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 arch/mips/kernel/r2300_fpu.S | 4 ++--
 arch/mips/kernel/syscall.c   | 9 ---------
 2 files changed, 2 insertions(+), 11 deletions(-)

diff --git a/arch/mips/kernel/r2300_fpu.S b/arch/mips/kernel/r2300_fpu.S
index 12e58053544f..cbf6db98cfb3 100644
--- a/arch/mips/kernel/r2300_fpu.S
+++ b/arch/mips/kernel/r2300_fpu.S
@@ -29,8 +29,8 @@
 #define EX2(a,b)						\
 9:	a,##b;							\
 	.section __ex_table,"a";				\
-	PTR	9b,bad_stack;					\
-	PTR	9b+4,bad_stack;					\
+	PTR	9b,fault;					\
+	PTR	9b+4,fault;					\
 	.previous
 
 	.set	mips1
diff --git a/arch/mips/kernel/syscall.c b/arch/mips/kernel/syscall.c
index 2afa3eef486a..5512cd586e6e 100644
--- a/arch/mips/kernel/syscall.c
+++ b/arch/mips/kernel/syscall.c
@@ -240,12 +240,3 @@ SYSCALL_DEFINE3(cachectl, char *, addr, int, nbytes, int, op)
 {
 	return -ENOSYS;
 }
-
-/*
- * If we ever come here the user sp is bad.  Zap the process right away.
- * Due to the bad stack signaling wouldn't work.
- */
-asmlinkage void bad_stack(void)
-{
-	do_exit(SIGSEGV);
-}
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 110+ messages in thread

* [PATCH 06/20] signal/sh: Use force_sig(SIGKILL) instead of do_group_exit(SIGKILL)
  2021-10-20 17:32 ` Eric W. Biederman
                   ` (6 preceding siblings ...)
  (?)
@ 2021-10-20 17:43 ` Eric W. Biederman
  2021-10-20 19:57   ` Linus Torvalds
  2021-10-21 16:08   ` Kees Cook
  -1 siblings, 2 replies; 110+ messages in thread
From: Eric W. Biederman @ 2021-10-20 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro, Kees Cook,
	Eric W. Biederman, Yoshinori Sato, Rich Felker, linux-sh

Today the sh code allocates memory the first time a process uses
the fpu.  If that memory allocation fails, kill the affected task
with force_sig(SIGKILL) rather than do_group_exit(SIGKILL).

Calling do_group_exit from an exception handler can potentially lead
to deadlocks as do_group_exit is not designed to be called from
interrupt context.  Instead use force_sig(SIGKILL) to kill the
userspace process.  Sending signals in general and force_sig in
particular has been tested from interrupt context so there should be
no problems.

Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: linux-sh@vger.kernel.org
Fixes: 0ea820cf9bf5 ("sh: Move over to dynamically allocated FPU context.")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 arch/sh/kernel/cpu/fpu.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/sh/kernel/cpu/fpu.c b/arch/sh/kernel/cpu/fpu.c
index ae354a2931e7..fd6db0ab1928 100644
--- a/arch/sh/kernel/cpu/fpu.c
+++ b/arch/sh/kernel/cpu/fpu.c
@@ -62,18 +62,20 @@ void fpu_state_restore(struct pt_regs *regs)
 	}
 
 	if (!tsk_used_math(tsk)) {
-		local_irq_enable();
+		int ret;
 		/*
 		 * does a slab alloc which can sleep
 		 */
-		if (init_fpu(tsk)) {
+		local_irq_enable();
+		ret = init_fpu(tsk);
+		local_irq_disable();
+		if (ret) {
 			/*
 			 * ran out of memory!
 			 */
-			do_group_exit(SIGKILL);
+			force_sig(SIGKILL);
 			return;
 		}
-		local_irq_disable();
 	}
 
 	grab_fpu(regs);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 110+ messages in thread

* [PATCH 07/20] signal/powerpc: On swapcontext failure force SIGSEGV
  2021-10-20 17:32 ` Eric W. Biederman
@ 2021-10-20 17:43   ` Eric W. Biederman
  -1 siblings, 0 replies; 110+ messages in thread
From: Eric W. Biederman @ 2021-10-20 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro, Kees Cook,
	Eric W. Biederman, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, linuxppc-dev

If the register state may be partial and corrupted, call
force_sigsegv(SIGSEGV) instead of calling do_exit.  This properly kills
the process with SIGSEGV and does not let any more userspace code
execute, instead of just killing one thread of the process and
potentially confusing everything.

Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
History-tree: git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Fixes: 756f1ae8a44e ("PPC32: Rework signal code and add a swapcontext system call.")
Fixes: 04879b04bf50 ("[PATCH] ppc64: VMX (Altivec) support & signal32 rework, from Ben Herrenschmidt")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 arch/powerpc/kernel/signal_32.c | 6 ++++--
 arch/powerpc/kernel/signal_64.c | 9 ++++++---
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/signal_32.c b/arch/powerpc/kernel/signal_32.c
index 0608581967f0..666f3da41232 100644
--- a/arch/powerpc/kernel/signal_32.c
+++ b/arch/powerpc/kernel/signal_32.c
@@ -1062,8 +1062,10 @@ SYSCALL_DEFINE3(swapcontext, struct ucontext __user *, old_ctx,
 	 * or if another thread unmaps the region containing the context.
 	 * We kill the task with a SIGSEGV in this situation.
 	 */
-	if (do_setcontext(new_ctx, regs, 0))
-		do_exit(SIGSEGV);
+	if (do_setcontext(new_ctx, regs, 0)) {
+		force_sigsegv(SIGSEGV);
+		return -EFAULT;
+	}
 
 	set_thread_flag(TIF_RESTOREALL);
 	return 0;
diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index 1831bba0582e..d8de622c9e4a 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -703,15 +703,18 @@ SYSCALL_DEFINE3(swapcontext, struct ucontext __user *, old_ctx,
 	 * We kill the task with a SIGSEGV in this situation.
 	 */
 
-	if (__get_user_sigset(&set, &new_ctx->uc_sigmask))
-		do_exit(SIGSEGV);
+	if (__get_user_sigset(&set, &new_ctx->uc_sigmask)) {
+		force_sigsegv(SIGSEGV);
+		return -EFAULT;
+	}
 	set_current_blocked(&set);
 
 	if (!user_read_access_begin(new_ctx, ctx_size))
 		return -EFAULT;
 	if (__unsafe_restore_sigcontext(current, NULL, 0, &new_ctx->uc_mcontext)) {
 		user_read_access_end();
-		do_exit(SIGSEGV);
+		force_sigsegv(SIGSEGV);
+		return -EFAULT;
 	}
 	user_read_access_end();
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 110+ messages in thread

* [PATCH 08/20] signal/sparc: In setup_tsb_params convert open coded BUG into BUG
  2021-10-20 17:32 ` Eric W. Biederman
                   ` (8 preceding siblings ...)
  (?)
@ 2021-10-20 17:43 ` Eric W. Biederman
  2021-10-21 16:12   ` Kees Cook
  -1 siblings, 1 reply; 110+ messages in thread
From: Eric W. Biederman @ 2021-10-20 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro, Kees Cook,
	Eric W. Biederman, David Miller, sparclinux

The function setup_tsb_params has exactly one caller, tsb_grow.  The
function tsb_grow passes in a tsb_bytes value that is between 8192 and
1048576 inclusive, and is guaranteed to be a power of 2.  The function
setup_tsb_params verifies this property with a switch statement and
then prints an error and causes the task to exit if this is not true.

In practice that print statement can never be reached because tsb_grow
never passes in a bad tsb_bytes.  So if tsb_bytes ever gets a bad value
that is a kernel bug.

So replace the do_exit, which is effectively an open coded version of
BUG(), with an actual call to BUG().  This makes it clearer that this
is a case that can never, and should never, happen.
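
A userspace sketch of the same idea, offered as an analogy only:
BUG_DEMO, tsb_size_index_demo and tsb_bytes_demo are hypothetical names,
abort() stands in for BUG(), and the index mapping is an assumption made
for illustration rather than the actual contents of setup_tsb_params.
The default branch is the "impossible" case described above.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the kernel's BUG(). */
#define BUG_DEMO() \
	do { \
		fprintf(stderr, "BUG at %s:%d\n", __FILE__, __LINE__); \
		abort(); \
	} while (0)

/* Map a size in bytes to an index 0..7, where size == 8192 << index. */
unsigned int tsb_size_index_demo(unsigned long tsb_bytes)
{
	switch (tsb_bytes) {
	case 8192UL << 0: return 0;
	case 8192UL << 1: return 1;
	case 8192UL << 2: return 2;
	case 8192UL << 3: return 3;
	case 8192UL << 4: return 4;
	case 8192UL << 5: return 5;
	case 8192UL << 6: return 6;
	case 8192UL << 7: return 7;
	default:
		/* The only caller never passes anything else: a real bug. */
		BUG_DEMO();
	}
}

/* Inverse helper; assert() plays a role similar to BUG_ON() here. */
unsigned long tsb_bytes_demo(unsigned int index)
{
	assert(index < 8);
	return 8192UL << index;
}
```

Marking the impossible branch as a bug, instead of quietly exiting,
documents the invariant at the point where it is relied upon.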

Cc: David Miller <davem@davemloft.net>
Cc: sparclinux@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 arch/sparc/mm/tsb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/sparc/mm/tsb.c b/arch/sparc/mm/tsb.c
index 0dce4b7ff73e..912205787161 100644
--- a/arch/sparc/mm/tsb.c
+++ b/arch/sparc/mm/tsb.c
@@ -266,7 +266,7 @@ static void setup_tsb_params(struct mm_struct *mm, unsigned long tsb_idx, unsign
 	default:
 		printk(KERN_ERR "TSB[%s:%d]: Impossible TSB size %lu, killing process.\n",
 		       current->comm, current->pid, tsb_bytes);
-		do_exit(SIGSEGV);
+		BUG();
 	}
 	tte |= pte_sz_bits(page_sz);
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 110+ messages in thread

* [PATCH 09/20] signal/vm86_32: Replace open coded BUG_ON with an actual BUG_ON
  2021-10-20 17:32 ` Eric W. Biederman
                   ` (9 preceding siblings ...)
  (?)
@ 2021-10-20 17:43 ` Eric W. Biederman
  2021-10-21 16:15   ` Kees Cook
  2021-11-12 15:40   ` Eric W. Biederman
  -1 siblings, 2 replies; 110+ messages in thread
From: Eric W. Biederman @ 2021-10-20 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro, Kees Cook,
	Eric W. Biederman, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	x86, H Peter Anvin

The function save_v86_state is only called when userspace was
operating in vm86 mode before entering the kernel.  Not having vm86
state in the task_struct should never happen.  So transform the hand
rolled BUG_ON into an actual BUG_ON to make it clear what is
happening.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: x86@kernel.org
Cc: H Peter Anvin <hpa@zytor.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 arch/x86/kernel/vm86_32.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c
index e5a7a10a0164..63486da77272 100644
--- a/arch/x86/kernel/vm86_32.c
+++ b/arch/x86/kernel/vm86_32.c
@@ -106,10 +106,8 @@ void save_v86_state(struct kernel_vm86_regs *regs, int retval)
 	 */
 	local_irq_enable();
 
-	if (!vm86 || !vm86->user_vm86) {
-		pr_alert("no user_vm86: BAD\n");
-		do_exit(SIGSEGV);
-	}
+	BUG_ON(!vm86 || !vm86->user_vm86);
+
 	set_flags(regs->pt.flags, VEFLAGS, X86_EFLAGS_VIF | vm86->veflags_mask);
 	user = vm86->user_vm86;
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 110+ messages in thread

* [PATCH 10/20] signal/vm86_32: Properly send SIGSEGV when the vm86 state cannot be saved.
  2021-10-20 17:32 ` Eric W. Biederman
                   ` (10 preceding siblings ...)
  (?)
@ 2021-10-20 17:43 ` Eric W. Biederman
  2021-10-21 16:16   ` Kees Cook
                     ` (2 more replies)
  -1 siblings, 3 replies; 110+ messages in thread
From: Eric W. Biederman @ 2021-10-20 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro, Kees Cook,
	Eric W. Biederman, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	x86, H Peter Anvin

Instead of pretending to send SIGSEGV by calling do_exit(SIGSEGV),
call force_sigsegv(SIGSEGV) to force the process to take a SIGSEGV
and terminate.

Update handle_signal to return immediately when save_v86_state fails
and kills the process.  Returning immediately without doing anything
except killing the process with SIGSEGV is also what signal_setup_done
does when setup_rt_frame fails.  Plus it is always ok to return
immediately without delivering a signal to a userspace handler when a
fatal signal has killed the current process.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: x86@kernel.org
Cc: H Peter Anvin <hpa@zytor.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 arch/x86/kernel/signal.c  | 6 +++++-
 arch/x86/kernel/vm86_32.c | 2 +-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index f4d21e470083..25a230f705c1 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -785,8 +785,12 @@ handle_signal(struct ksignal *ksig, struct pt_regs *regs)
 	bool stepping, failed;
 	struct fpu *fpu = &current->thread.fpu;
 
-	if (v8086_mode(regs))
+	if (v8086_mode(regs)) {
 		save_v86_state((struct kernel_vm86_regs *) regs, VM86_SIGNAL);
+		/* Has save_v86_state failed and killed the process? */
+		if (fatal_signal_pending(current))
+			return;
+	}
 
 	/* Are we from a system call? */
 	if (syscall_get_nr(current, regs) != -1) {
diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c
index 63486da77272..040fd01be8b3 100644
--- a/arch/x86/kernel/vm86_32.c
+++ b/arch/x86/kernel/vm86_32.c
@@ -159,7 +159,7 @@ void save_v86_state(struct kernel_vm86_regs *regs, int retval)
 	user_access_end();
 Efault:
 	pr_alert("could not access userspace vm86 info\n");
-	do_exit(SIGSEGV);
+	force_sigsegv(SIGSEGV);
 }
 
 static int do_vm86_irq_handling(int subfunction, int irqnumber);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 110+ messages in thread

* [PATCH 11/20] signal/s390: Use force_sigsegv in default_trap_handler
  2021-10-20 17:32 ` Eric W. Biederman
                   ` (11 preceding siblings ...)
  (?)
@ 2021-10-20 17:43 ` Eric W. Biederman
  2021-10-21 16:17   ` Kees Cook
  2021-10-26  9:38   ` Christian Borntraeger
  -1 siblings, 2 replies; 110+ messages in thread
From: Eric W. Biederman @ 2021-10-20 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro, Kees Cook,
	Eric W. Biederman, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, linux-s390

Reading the history, it is unclear why default_trap_handler calls
do_exit.  It is not even mentioned in the commit where the change
happened.  My best guess is that because it is unknown why the
exception happened it was desired to guarantee the process never
returned to userspace.

Using do_exit(SIGSEGV) has the problem that it will only terminate one
thread of a process, leaving the process in an undefined state.

Use force_sigsegv(SIGSEGV) instead, which effectively has the same
behavior except that it uses the ordinary signal mechanism,
terminates all threads of a process, and is generally well defined.
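
The difference between ending one thread and ending the whole process
can be sketched with POSIX threads.  This is a userspace analogy under
stated assumptions, not the kernel mechanism: pthread_exit() plays the
part of do_exit(), a SIGTERM with its default action plays the part of a
forced fatal signal, and every function name below is hypothetical.
Older toolchains may need -pthread to build it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Analogue of do_exit(): only the calling thread goes away. */
static void *exit_this_thread_only(void *arg)
{
	(void)arg;
	pthread_exit(NULL);
}

/* Analogue of a fatal signal: the default action ends every thread. */
static void *raise_fatal_signal(void *arg)
{
	(void)arg;
	raise(SIGTERM);
	return NULL;
}

/*
 * Returns 0 if the process outlived its failing thread, the signal
 * number if the whole process was killed, or -1 on error.
 */
int run_thread_failure_demo(int use_signal)
{
	int status = 0;
	pid_t pid;

	assert(use_signal == 0 || use_signal == 1);

	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		pthread_t t;
		void *(*fn)(void *) = use_signal ? raise_fatal_signal
						 : exit_this_thread_only;

		signal(SIGTERM, SIG_DFL);	/* do not inherit SIG_IGN */
		if (pthread_create(&t, NULL, fn, NULL) != 0)
			_exit(1);
		pthread_join(t, NULL);
		_exit(42);	/* the main thread is still alive */
	}
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	if (WIFSIGNALED(status))
		return WTERMSIG(status);
	return (WIFEXITED(status) && WEXITSTATUS(status) == 42) ? 0 : -1;
}
```

With use_signal set to 0 the main thread keeps running after the worker
exits; with use_signal set to 1 the parent observes the whole process
terminated by the signal.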

Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: linux-s390@vger.kernel.org
Fixes: ca2ab03237ec ("[PATCH] s390: core changes")
History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 arch/s390/kernel/traps.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/s390/kernel/traps.c b/arch/s390/kernel/traps.c
index bcefc2173de4..51729ea2cf8e 100644
--- a/arch/s390/kernel/traps.c
+++ b/arch/s390/kernel/traps.c
@@ -84,7 +84,7 @@ static void default_trap_handler(struct pt_regs *regs)
 {
 	if (user_mode(regs)) {
 		report_user_fault(regs, SIGSEGV, 0);
-		do_exit(SIGSEGV);
+		force_sigsegv(SIGSEGV);
 	} else
 		die(regs, "Unknown program exception");
 }
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 110+ messages in thread

* [PATCH 12/20] exit/kthread: Have kernel threads return instead of calling do_exit
  2021-10-20 17:32 ` Eric W. Biederman
                   ` (12 preceding siblings ...)
  (?)
@ 2021-10-20 17:43 ` Eric W. Biederman
  2021-10-21 11:12   ` Christoph Hellwig
  2021-10-21 16:21   ` Kees Cook
  -1 siblings, 2 replies; 110+ messages in thread
From: Eric W. Biederman @ 2021-10-20 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro, Kees Cook,
	Eric W. Biederman

In 2009 Oleg reworked[1] the kernel threads so that it is not
necessary to call do_exit if you are not using kthread_stop().  Remove
the explicit calls of do_exit and complete_and_exit (with a NULL
completion) that were previously necessary.
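
POSIX threads offer a loose userspace parallel: returning from a
thread's start routine is equivalent to calling pthread_exit() with the
returned value.  The sketch below is an analogy only and does not use
the kthread API; worker_returns, worker_exits and run_worker_demo are
hypothetical names.  Older toolchains may need -pthread to build it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* New style: the thread function simply returns its status. */
static void *worker_returns(void *arg)
{
	return arg;
}

/* Old style: the thread function calls the exit primitive itself. */
static void *worker_exits(void *arg)
{
	pthread_exit(arg);
}

/* Start one worker, wait for it, and hand back the status it produced. */
int run_worker_demo(int explicit_exit, int status)
{
	pthread_t t;
	void *ret = NULL;

	assert(status >= 0);	/* negative values are reserved for errors */

	if (pthread_create(&t, NULL,
			   explicit_exit ? worker_exits : worker_returns,
			   (void *)(intptr_t)status) != 0)
		return -1;
	if (pthread_join(t, &ret) != 0)
		return -1;
	return (int)(intptr_t)ret;
}
```

Both styles hand the same status to whoever waits for the thread, which
is why the plain return is the simpler choice.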

[1] 63706172f332 ("kthreads: rework kthread_stop()")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 drivers/firmware/stratix10-svc.c | 4 ++--
 drivers/soc/ti/wkup_m3_ipc.c     | 2 +-
 fs/ocfs2/journal.c               | 5 +----
 kernel/kthread.c                 | 2 +-
 net/batman-adv/tp_meter.c        | 2 +-
 5 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/drivers/firmware/stratix10-svc.c b/drivers/firmware/stratix10-svc.c
index 2a7687911c09..29c0a616b317 100644
--- a/drivers/firmware/stratix10-svc.c
+++ b/drivers/firmware/stratix10-svc.c
@@ -520,7 +520,7 @@ static int svc_normal_to_secure_thread(void *data)
  * physical address of memory block reserved by secure monitor software at
  * secure world.
  *
- * svc_normal_to_secure_shm_thread() calls do_exit() directly since it is a
+ * svc_normal_to_secure_shm_thread() terminates directly since it is a
  * standlone thread for which no one will call kthread_stop() or return when
  * 'kthread_should_stop()' is true.
  */
@@ -544,7 +544,7 @@ static int svc_normal_to_secure_shm_thread(void *data)
 	}
 
 	complete(&sh_mem->sync_complete);
-	do_exit(0);
+	return 0;
 }
 
 /**
diff --git a/drivers/soc/ti/wkup_m3_ipc.c b/drivers/soc/ti/wkup_m3_ipc.c
index 09abd17065ba..0733443a2631 100644
--- a/drivers/soc/ti/wkup_m3_ipc.c
+++ b/drivers/soc/ti/wkup_m3_ipc.c
@@ -426,7 +426,7 @@ static void wkup_m3_rproc_boot_thread(struct wkup_m3_ipc *m3_ipc)
 	else
 		m3_ipc_state = m3_ipc;
 
-	do_exit(0);
+	return 0;
 }
 
 static int wkup_m3_ipc_probe(struct platform_device *pdev)
diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
index 4f15750aac5d..329986f12db3 100644
--- a/fs/ocfs2/journal.c
+++ b/fs/ocfs2/journal.c
@@ -1497,10 +1497,7 @@ static int __ocfs2_recovery_thread(void *arg)
 	if (quota_enabled)
 		kfree(rm_quota);
 
-	/* no one is callint kthread_stop() for us so the kthread() api
-	 * requires that we call do_exit().  And it isn't exported, but
-	 * complete_and_exit() seems to be a minimal wrapper around it. */
-	complete_and_exit(NULL, status);
+	return status;
 }
 
 void ocfs2_recovery_thread(struct ocfs2_super *osb, int node_num)
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 5b37a8567168..33e17beaa682 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -433,7 +433,7 @@ struct task_struct *__kthread_create_on_node(int (*threadfn)(void *data),
  * If thread is going to be bound on a particular cpu, give its node
  * in @node, to get NUMA affinity for kthread stack, or else give NUMA_NO_NODE.
  * When woken, the thread will run @threadfn() with @data as its
- * argument. @threadfn() can either call do_exit() directly if it is a
+ * argument. @threadfn() can either return directly if it is a
  * standalone thread for which no one will call kthread_stop(), or
  * return when 'kthread_should_stop()' is true (which means
  * kthread_stop() has been called).  The return value should be zero
diff --git a/net/batman-adv/tp_meter.c b/net/batman-adv/tp_meter.c
index 56b9fe97b3b4..1252540cde17 100644
--- a/net/batman-adv/tp_meter.c
+++ b/net/batman-adv/tp_meter.c
@@ -890,7 +890,7 @@ static int batadv_tp_send(void *arg)
 
 	batadv_tp_vars_put(tp_vars);
 
-	do_exit(0);
+	return 0;
 }
 
 /**
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 110+ messages in thread

* [PATCH 13/20] signal: Implement force_fatal_sig
  2021-10-20 17:32 ` Eric W. Biederman
                   ` (13 preceding siblings ...)
  (?)
@ 2021-10-20 17:43 ` Eric W. Biederman
  2021-10-20 20:05   ` Linus Torvalds
  2021-10-21 16:24   ` Kees Cook
  -1 siblings, 2 replies; 110+ messages in thread
From: Eric W. Biederman @ 2021-10-20 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro, Kees Cook,
	Eric W. Biederman

Add a simple helper force_fatal_sig that causes a signal to be
delivered to a process as if the signal handler was set to SIG_DFL.

Reimplement force_sigsegv based upon this new helper.  This fixes
force_sigsegv so that when it forces the default signal handler
to be used the code now forces the signal to be unblocked as well.

Reusing the tested logic in force_sig_info_to_task that was built for
force_sig_seccomp makes the implementation trivial.

This is interesting both because it makes force_sigsegv simpler and
because there are a couple of buggy places in the kernel that call
do_exit(SIGILL) or do_exit(SIGSYS) because there is no straightforward
way today for those places to simply force the exit of a process with
the chosen signal.  Creating force_fatal_sig allows those places to be
implemented with normal signal exits.
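
The intended effect can be approximated from userspace, purely as an
illustration: reset the handler to SIG_DFL, unblock the signal, then
raise it.  This is not how the kernel helper is implemented (that is
shown in the diff below); force_fatal_sig_demo and
run_force_fatal_sig_demo are hypothetical names, and the child process
deliberately catches and blocks the signal first to show that neither
setting survives.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static void ignore_handler(int sig)
{
	(void)sig;
}

/* Userspace approximation: default handler, unblocked, then delivered. */
void force_fatal_sig_demo(int sig)
{
	struct sigaction sa;
	sigset_t set;

	assert(sig > 0);

	sigemptyset(&sa.sa_mask);
	sa.sa_flags = 0;
	sa.sa_handler = SIG_DFL;
	sigaction(sig, &sa, NULL);

	sigemptyset(&set);
	sigaddset(&set, sig);
	sigprocmask(SIG_UNBLOCK, &set, NULL);

	raise(sig);
}

/* Returns the terminating signal of the child, 0 if it survived, -1 on error. */
int run_force_fatal_sig_demo(int sig)
{
	int status = 0;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		struct sigaction sa;
		sigset_t set;

		/* Catch and block the signal, as a hostile process might. */
		sigemptyset(&sa.sa_mask);
		sa.sa_flags = 0;
		sa.sa_handler = ignore_handler;
		sigaction(sig, &sa, NULL);
		sigemptyset(&set);
		sigaddset(&set, sig);
		sigprocmask(SIG_BLOCK, &set, NULL);

		force_fatal_sig_demo(sig);
		_exit(0);	/* reached only if the signal was not fatal */
	}
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

The parent sees the child terminated by the chosen signal even though
the child had installed a handler and blocked that signal beforehand.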

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 include/linux/sched/signal.h |  1 +
 kernel/signal.c              | 26 +++++++++++++++++---------
 2 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index e5f4ce622ee6..e2dc9f119ada 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -338,6 +338,7 @@ extern int kill_pid(struct pid *pid, int sig, int priv);
 extern __must_check bool do_notify_parent(struct task_struct *, int);
 extern void __wake_up_parent(struct task_struct *p, struct task_struct *parent);
 extern void force_sig(int);
+extern void force_fatal_sig(int);
 extern int send_sig(int, struct task_struct *, int);
 extern int zap_other_threads(struct task_struct *p);
 extern struct sigqueue *sigqueue_alloc(void);
diff --git a/kernel/signal.c b/kernel/signal.c
index 952741f6d0f9..6a5e1802b9a2 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1662,6 +1662,19 @@ void force_sig(int sig)
 }
 EXPORT_SYMBOL(force_sig);
 
+void force_fatal_sig(int sig)
+{
+	struct kernel_siginfo info;
+
+	clear_siginfo(&info);
+	info.si_signo = sig;
+	info.si_errno = 0;
+	info.si_code = SI_KERNEL;
+	info.si_pid = 0;
+	info.si_uid = 0;
+	force_sig_info_to_task(&info, current, true);
+}
+
 /*
  * When things go south during signal handling, we
  * will force a SIGSEGV. And if the signal that caused
@@ -1670,15 +1683,10 @@ EXPORT_SYMBOL(force_sig);
  */
 void force_sigsegv(int sig)
 {
-	struct task_struct *p = current;
-
-	if (sig == SIGSEGV) {
-		unsigned long flags;
-		spin_lock_irqsave(&p->sighand->siglock, flags);
-		p->sighand->action[sig - 1].sa.sa_handler = SIG_DFL;
-		spin_unlock_irqrestore(&p->sighand->siglock, flags);
-	}
-	force_sig(SIGSEGV);
+	if (sig == SIGSEGV)
+		force_fatal_sig(SIGSEGV);
+	else
+		force_sig(SIGSEGV);
 }
 
 int force_sig_fault_to_task(int sig, int code, void __user *addr
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 110+ messages in thread

* [PATCH 14/20] exit/syscall_user_dispatch: Send ordinary signals on failure
  2021-10-20 17:32 ` Eric W. Biederman
                   ` (14 preceding siblings ...)
  (?)
@ 2021-10-20 17:44 ` Eric W. Biederman
  2021-10-21 16:25   ` Kees Cook
  2021-10-21 16:35   ` Gabriel Krisman Bertazi
  -1 siblings, 2 replies; 110+ messages in thread
From: Eric W. Biederman @ 2021-10-20 17:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro, Kees Cook,
	Eric W. Biederman, Gabriel Krisman Bertazi, Thomas Gleixner,
	Peter Zijlstra, Andy Lutomirski

Use force_fatal_sig instead of calling do_exit directly.  This ensures
the ordinary signal handling path gets invoked, core dumps as
appropriate get created, and for multi-threaded processes all of the
threads are terminated not just a single thread.
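
For illustration only, here is a rough userspace analogy of the behaviour
described above. It is not the kernel implementation. The helper names
force_fatal_userspace and child_status_after_force are made up for this
sketch, which assumes a POSIX system and a signal whose default action is
to terminate the process. The child first catches and blocks the signal,
then the helper resets the handler to SIG_DFL, unblocks the signal and
raises it, so the earlier handler never gets a chance to run.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* sigaction(), sigprocmask() */
#endif
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* A handler that would normally let the process survive the signal. */
static void survive_handler(int sig)
{
	(void)sig;
}

/*
 * Userspace analogy only (hypothetical helper): make 'sig' fatal for the
 * calling process no matter how it was handled or masked before.
 */
static void force_fatal_userspace(int sig)
{
	struct sigaction dfl;
	sigset_t set;

	assert(sig > 0);
	memset(&dfl, 0, sizeof(dfl));
	dfl.sa_handler = SIG_DFL;
	sigemptyset(&dfl.sa_mask);
	sigaction(sig, &dfl, NULL);		/* drop any installed handler */

	sigemptyset(&set);
	sigaddset(&set, sig);
	sigprocmask(SIG_UNBLOCK, &set, NULL);	/* make it deliverable */

	raise(sig);				/* default action terminates us */
}

/*
 * Fork a child that catches and blocks 'sig', then forces it.
 * Returns the child's wait status, or -1 on error.
 */
static int child_status_after_force(int sig)
{
	int status = -1;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		struct sigaction sa;
		sigset_t set;

		memset(&sa, 0, sizeof(sa));
		sa.sa_handler = survive_handler;
		sigemptyset(&sa.sa_mask);
		sigaction(sig, &sa, NULL);

		sigemptyset(&set);
		sigaddset(&set, sig);
		sigprocmask(SIG_BLOCK, &set, NULL);

		force_fatal_userspace(sig);
		_exit(0);	/* only reached if the signal was not fatal */
	}
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return status;
}
```

Called as child_status_after_force(SIGUSR1), the returned status should
report death by SIGUSR1 through WIFSIGNALED() and WTERMSIG(), even though
the child had installed a handler and blocked the signal beforehand.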

When asked, Gabriel Krisman Bertazi <krisman@collabora.com> said [1]:
> ebiederm@xmission.com (Eric W. Biederman) asked:
>
> > Why does syscall_user_dispatch call do_exit(SIGSEGV) and
> > do_exit(SIGSYS) instead of force_sig(SIGSEGV) and force_sig(SIGSYS)?
> >
> > Looking at the code these cases are not expected to happen, so I would
> > be surprised if userspace depends on any particular behaviour on the
> > failure path so I think we can change this.
>
> Hi Eric,
>
> There is not really a good reason, and the use case that originated the
> feature doesn't rely on it.
>
> Unless I'm missing yet another problem and others correct me, I think
> it makes sense to change it as you described.
>
> > Is using do_exit in this way something you copied from seccomp?
>
> I'm not sure, it's been a while, but I think it might be just that.  The
> first prototype of SUD was implemented as a seccomp mode.

If at some point it becomes interesting we could relax
"force_fatal_sig(SIGSEGV)" to instead say
"force_sig_fault(SIGSEGV, SEGV_MAPERR, sd->selector)".

I avoid doing that in this patch to avoid making it possible
to catch currently uncatchable signals.

Cc: Gabriel Krisman Bertazi <krisman@collabora.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
[1] https://lkml.kernel.org/r/87mtr6gdvi.fsf@collabora.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/entry/syscall_user_dispatch.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/kernel/entry/syscall_user_dispatch.c b/kernel/entry/syscall_user_dispatch.c
index c240302f56e2..4508201847d2 100644
--- a/kernel/entry/syscall_user_dispatch.c
+++ b/kernel/entry/syscall_user_dispatch.c
@@ -47,14 +47,18 @@ bool syscall_user_dispatch(struct pt_regs *regs)
 		 * access_ok() is performed once, at prctl time, when
 		 * the selector is loaded by userspace.
 		 */
-		if (unlikely(__get_user(state, sd->selector)))
-			do_exit(SIGSEGV);
+		if (unlikely(__get_user(state, sd->selector))) {
+			force_fatal_sig(SIGSEGV);
+			return true;
+		}
 
 		if (likely(state == SYSCALL_DISPATCH_FILTER_ALLOW))
 			return false;
 
-		if (state != SYSCALL_DISPATCH_FILTER_BLOCK)
-			do_exit(SIGSYS);
+		if (state != SYSCALL_DISPATCH_FILTER_BLOCK) {
+			force_fatal_sig(SIGSYS);
+			return true;
+		}
 	}
 
 	sd->on_dispatch = true;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 110+ messages in thread

* [PATCH 15/20] signal/sparc32: Exit with a fatal signal when try_to_clear_window_buffer fails
  2021-10-20 17:32 ` Eric W. Biederman
                   ` (15 preceding siblings ...)
  (?)
@ 2021-10-20 17:44 ` Eric W. Biederman
  2021-10-21 16:34   ` Kees Cook
  -1 siblings, 1 reply; 110+ messages in thread
From: Eric W. Biederman @ 2021-10-20 17:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro, Kees Cook,
	Eric W. Biederman, David Miller, sparclinux

The function try_to_clear_window_buffer is only called from
rtrap_32.c.  After it is called the signal pending state is retested,
and signals are handled if TIF_SIGPENDING is set.  This allows
try_to_clear_window_buffer to call force_fatal_sig and then rely on
the signal being delivered to kill the process, without any danger of
returning to userspace, or otherwise using possibly corrupt state on
failure.

The functional difference between force_fatal_sig and do_exit is that
do_exit will only terminate a single thread, and will never trigger a
core-dump.  A multi-threaded program for which a single thread
terminates unexpectedly is hard to reason about.  Calling force_fatal_sig
does not give userspace a chance to catch the signal, but otherwise
is an ordinary fatal signal exit, and it will trigger a coredump
of the offending process if core dumps are enabled.
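
As a hedged illustration of how the difference looks from a parent
process, the sketch below decodes a wait status. It assumes the
conventional Linux encoding (terminating signal in the low 7 bits, 0x80
as the core-dump flag, exit code in the next byte) and uses the glibc
WCOREDUMP macro. The struct exit_info and decode_status names are
invented for this example.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1	/* WCOREDUMP on glibc */
#endif
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/wait.h>

/* Hypothetical summary of what a parent learns from wait(). */
struct exit_info {
	bool signaled;		/* true if a signal terminated the child */
	int  value;		/* signal number, or exit code otherwise */
	bool core_dumped;	/* only meaningful when signaled is true */
};

static struct exit_info decode_status(int status)
{
	struct exit_info info = { false, 0, false };

	if (WIFSIGNALED(status)) {
		info.signaled = true;
		info.value = WTERMSIG(status);
		info.core_dumped = WCOREDUMP(status) != 0;
	} else if (WIFEXITED(status)) {
		info.value = WEXITSTATUS(status);
	}
	return info;
}
```

Under that encoding, a raw status of SIGILL (what do_exit(SIGILL) records)
decodes as "killed by SIGILL, no core dump", while SIGILL | 0x80 decodes
as the same signal with the core-dump flag set.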

Cc: David Miller <davem@davemloft.net>
Cc: sparclinux@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 arch/sparc/kernel/windows.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/sparc/kernel/windows.c b/arch/sparc/kernel/windows.c
index 69a6ba6e9293..bbbd40cc6b28 100644
--- a/arch/sparc/kernel/windows.c
+++ b/arch/sparc/kernel/windows.c
@@ -121,8 +121,10 @@ void try_to_clear_window_buffer(struct pt_regs *regs, int who)
 
 		if ((sp & 7) ||
 		    copy_to_user((char __user *) sp, &tp->reg_window[window],
-				 sizeof(struct reg_window32)))
-			do_exit(SIGILL);
+				 sizeof(struct reg_window32))) {
+			force_fatal_sig(SIGILL);
+			return;
+		}
 	}
 	tp->w_saved = 0;
 }
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 110+ messages in thread

* [PATCH 16/20] signal/sparc32: In setup_rt_frame and setup_frame use force_fatal_sig
  2021-10-20 17:32 ` Eric W. Biederman
                   ` (16 preceding siblings ...)
  (?)
@ 2021-10-20 17:44 ` Eric W. Biederman
  2021-10-21 16:34   ` Kees Cook
  -1 siblings, 1 reply; 110+ messages in thread
From: Eric W. Biederman @ 2021-10-20 17:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro, Kees Cook,
	Eric W. Biederman, David Miller, sparclinux

Modify the 32bit version of setup_rt_frame and setup_frame to act
similarly to the 64bit version of setup_rt_frame and fail with a signal
instead of calling do_exit.

Replacing do_exit(SIGILL) with force_fatal_sig(SIGILL) ensures that
the process will be terminated cleanly when the stack frame is
invalid, instead of just killing off a single thread and leaving the
process in a weird state.

Cc: David Miller <davem@davemloft.net>
Cc: sparclinux@vger.kernel.org
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 arch/sparc/kernel/signal_32.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/sparc/kernel/signal_32.c b/arch/sparc/kernel/signal_32.c
index 02f3ad55dfe3..cd677bc564a7 100644
--- a/arch/sparc/kernel/signal_32.c
+++ b/arch/sparc/kernel/signal_32.c
@@ -244,7 +244,7 @@ static int setup_frame(struct ksignal *ksig, struct pt_regs *regs,
 		get_sigframe(ksig, regs, sigframe_size);
 
 	if (invalid_frame_pointer(sf, sigframe_size)) {
-		do_exit(SIGILL);
+		force_fatal_sig(SIGILL);
 		return -EINVAL;
 	}
 
@@ -336,7 +336,7 @@ static int setup_rt_frame(struct ksignal *ksig, struct pt_regs *regs,
 	sf = (struct rt_signal_frame __user *)
 		get_sigframe(ksig, regs, sigframe_size);
 	if (invalid_frame_pointer(sf, sigframe_size)) {
-		do_exit(SIGILL);
+		force_fatal_sig(SIGILL);
 		return -EINVAL;
 	}
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 110+ messages in thread

* [PATCH 17/20] signal/x86: In emulate_vsyscall force a signal instead of calling do_exit
  2021-10-20 17:32 ` Eric W. Biederman
                   ` (17 preceding siblings ...)
  (?)
@ 2021-10-20 17:44 ` Eric W. Biederman
  2021-10-21 16:36   ` Kees Cook
  -1 siblings, 1 reply; 110+ messages in thread
From: Eric W. Biederman @ 2021-10-20 17:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro, Kees Cook,
	Eric W. Biederman, Andy Lutomirski

Directly calling do_exit with a signal number has the problem that
all of the side effects of the signal don't happen, such as
killing all of the threads of a process instead of just the
calling thread.

So replace do_exit(SIGSYS) with force_fatal_sig(SIGSYS) which
causes the signal handling to take its normal path and work
as expected.

Cc: Andy Lutomirski <luto@kernel.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 arch/x86/entry/vsyscall/vsyscall_64.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c
index 1b40b9297083..0b6b277ee050 100644
--- a/arch/x86/entry/vsyscall/vsyscall_64.c
+++ b/arch/x86/entry/vsyscall/vsyscall_64.c
@@ -226,7 +226,8 @@ bool emulate_vsyscall(unsigned long error_code,
 	if ((!tmp && regs->orig_ax != syscall_nr) || regs->ip != address) {
 		warn_bad_vsyscall(KERN_DEBUG, regs,
 				  "seccomp tried to change syscall nr or ip");
-		do_exit(SIGSYS);
+		force_fatal_sig(SIGSYS);
+		return true;
 	}
 	regs->orig_ax = -1;
 	if (tmp)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 110+ messages in thread

* [PATCH 18/20] exit/rtl8723bs: Replace the macro thread_exit with a simple return 0
  2021-10-20 17:32 ` Eric W. Biederman
                   ` (18 preceding siblings ...)
  (?)
@ 2021-10-20 17:44 ` Eric W. Biederman
  2021-10-21  7:06   ` Greg KH
  2021-10-21 16:37   ` Kees Cook
  -1 siblings, 2 replies; 110+ messages in thread
From: Eric W. Biederman @ 2021-10-20 17:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro, Kees Cook,
	Eric W. Biederman

Every place thread_exit is called is at the end of a function started
with kthread_run.  The code in kthread_run has arranged things so a
kernel thread can just return and do_exit will be called.

So just have the threads return instead of calling complete_and_exit.
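
A minimal sketch of why a plain return is enough, written as an ordinary
C model rather than real kernel code. The names model_kthread_entry and
model_do_exit are stand-ins invented here. The only assumption is the one
stated above: the wrapper that kthread_run arranges calls the thread
function and then passes its return value to do_exit.

```c
#include <assert.h>
#include <stddef.h>

/* Records what the stand-in for do_exit() received. */
static int last_exit_code = -1;

static void model_do_exit(int code)
{
	last_exit_code = code;
}

/* Hypothetical stand-in for the wrapper set up by kthread_run. */
static void model_kthread_entry(int (*threadfn)(void *), void *data)
{
	int ret;

	assert(threadfn != NULL);
	ret = threadfn(data);	/* run the driver's thread function */
	model_do_exit(ret);	/* the wrapper exits on its behalf */
}

/* A thread function that simply returns when its work is done. */
static int model_worker(void *context)
{
	int *work_done = context;

	*work_done = 1;
	return 0;
}
```

Running model_kthread_entry(model_worker, &flag) sets the flag and leaves
last_exit_code at 0, which is the same result the removed
complete_and_exit(NULL, 0) was spelling out by hand.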

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 drivers/staging/rtl8723bs/core/rtw_cmd.c                | 2 +-
 drivers/staging/rtl8723bs/core/rtw_xmit.c               | 2 +-
 drivers/staging/rtl8723bs/hal/rtl8723bs_xmit.c          | 2 +-
 drivers/staging/rtl8723bs/include/osdep_service_linux.h | 2 --
 4 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/staging/rtl8723bs/core/rtw_cmd.c b/drivers/staging/rtl8723bs/core/rtw_cmd.c
index d494c06dab96..8e69f9c10f5c 100644
--- a/drivers/staging/rtl8723bs/core/rtw_cmd.c
+++ b/drivers/staging/rtl8723bs/core/rtw_cmd.c
@@ -524,7 +524,7 @@ int rtw_cmd_thread(void *context)
 	complete(&pcmdpriv->terminate_cmdthread_comp);
 	atomic_set(&(pcmdpriv->cmdthd_running), false);
 
-	thread_exit();
+	return 0;
 }
 
 /*
diff --git a/drivers/staging/rtl8723bs/core/rtw_xmit.c b/drivers/staging/rtl8723bs/core/rtw_xmit.c
index 79e4d7df1ef5..0c357bc2478c 100644
--- a/drivers/staging/rtl8723bs/core/rtw_xmit.c
+++ b/drivers/staging/rtl8723bs/core/rtw_xmit.c
@@ -2491,7 +2491,7 @@ int rtw_xmit_thread(void *context)
 
 	complete(&padapter->xmitpriv.terminate_xmitthread_comp);
 
-	thread_exit();
+	return 0;
 }
 
 void rtw_sctx_init(struct submit_ctx *sctx, int timeout_ms)
diff --git a/drivers/staging/rtl8723bs/hal/rtl8723bs_xmit.c b/drivers/staging/rtl8723bs/hal/rtl8723bs_xmit.c
index 156d6aba18ca..2b9a41b12d1f 100644
--- a/drivers/staging/rtl8723bs/hal/rtl8723bs_xmit.c
+++ b/drivers/staging/rtl8723bs/hal/rtl8723bs_xmit.c
@@ -435,7 +435,7 @@ int rtl8723bs_xmit_thread(void *context)
 
 	complete(&pxmitpriv->SdioXmitTerminate);
 
-	thread_exit();
+	return 0;
 }
 
 s32 rtl8723bs_mgnt_xmit(
diff --git a/drivers/staging/rtl8723bs/include/osdep_service_linux.h b/drivers/staging/rtl8723bs/include/osdep_service_linux.h
index 3492ec1efd1e..188ed7e26550 100644
--- a/drivers/staging/rtl8723bs/include/osdep_service_linux.h
+++ b/drivers/staging/rtl8723bs/include/osdep_service_linux.h
@@ -45,8 +45,6 @@
 		spinlock_t	lock;
 	};
 
-	#define thread_exit() complete_and_exit(NULL, 0)
-
 static inline struct list_head *get_next(struct list_head	*list)
 {
 	return list->next;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 110+ messages in thread

* [PATCH 19/20] exit/rtl8712: Replace the macro thread_exit with a simple return 0
  2021-10-20 17:32 ` Eric W. Biederman
                   ` (19 preceding siblings ...)
  (?)
@ 2021-10-20 17:44 ` Eric W. Biederman
  2021-10-21  7:07   ` Greg KH
  2021-10-21 16:37   ` Kees Cook
  -1 siblings, 2 replies; 110+ messages in thread
From: Eric W. Biederman @ 2021-10-20 17:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro, Kees Cook,
	Eric W. Biederman

The macro thread_exit is called at the end of a function started
with kthread_run.  The code in kthread_run has arranged things so a
kernel thread can just return and do_exit will be called.

So just have the cmd_thread return instead of calling complete_and_exit.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 drivers/staging/rtl8712/osdep_service.h | 1 -
 drivers/staging/rtl8712/rtl8712_cmd.c   | 2 +-
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/staging/rtl8712/osdep_service.h b/drivers/staging/rtl8712/osdep_service.h
index d33ddffb7ad9..0d9bb42cbc58 100644
--- a/drivers/staging/rtl8712/osdep_service.h
+++ b/drivers/staging/rtl8712/osdep_service.h
@@ -37,7 +37,6 @@ struct	__queue	{
 
 #define _pkt struct sk_buff
 #define _buffer unsigned char
-#define thread_exit() complete_and_exit(NULL, 0)
 
 #define _init_queue(pqueue)				\
 	do {						\
diff --git a/drivers/staging/rtl8712/rtl8712_cmd.c b/drivers/staging/rtl8712/rtl8712_cmd.c
index e9294e1ed06e..2326aae6709e 100644
--- a/drivers/staging/rtl8712/rtl8712_cmd.c
+++ b/drivers/staging/rtl8712/rtl8712_cmd.c
@@ -393,7 +393,7 @@ int r8712_cmd_thread(void *context)
 		r8712_free_cmd_obj(pcmd);
 	} while (1);
 	complete(&pcmdpriv->terminate_cmdthread_comp);
-	thread_exit();
+	return 0;
 }
 
 void r8712_event_handle(struct _adapter *padapter, __le32 *peventbuf)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 110+ messages in thread

* [PATCH 20/20] exit/r8188eu: Replace the macro thread_exit with a simple return 0
  2021-10-20 17:32 ` Eric W. Biederman
                   ` (20 preceding siblings ...)
  (?)
@ 2021-10-20 17:44 ` Eric W. Biederman
  2021-10-21  7:07   ` Greg KH
  2021-10-21 16:37   ` Kees Cook
  -1 siblings, 2 replies; 110+ messages in thread
From: Eric W. Biederman @ 2021-10-20 17:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro, Kees Cook,
	Eric W. Biederman

The macro thread_exit is called at the end of functions started
with kthread_run.  The code in kthread_run has arranged things so a
kernel thread can just return and do_exit will be called.

So just have rtw_cmd_thread and mp_xmit_packet_thread return instead
of calling complete_and_exit.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 drivers/staging/r8188eu/core/rtw_cmd.c          | 2 +-
 drivers/staging/r8188eu/core/rtw_mp.c           | 2 +-
 drivers/staging/r8188eu/include/osdep_service.h | 2 --
 3 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/r8188eu/core/rtw_cmd.c b/drivers/staging/r8188eu/core/rtw_cmd.c
index ce73ac7cf973..d37c9463eecc 100644
--- a/drivers/staging/r8188eu/core/rtw_cmd.c
+++ b/drivers/staging/r8188eu/core/rtw_cmd.c
@@ -347,7 +347,7 @@ int rtw_cmd_thread(void *context)
 
 	up(&pcmdpriv->terminate_cmdthread_sema);
 
-	thread_exit();
+	return 0;
 }
 
 u8 rtw_setstandby_cmd(struct adapter *padapter, uint action)
diff --git a/drivers/staging/r8188eu/core/rtw_mp.c b/drivers/staging/r8188eu/core/rtw_mp.c
index dabdd0406f30..3945c4efe45a 100644
--- a/drivers/staging/r8188eu/core/rtw_mp.c
+++ b/drivers/staging/r8188eu/core/rtw_mp.c
@@ -580,7 +580,7 @@ static int mp_xmit_packet_thread(void *context)
 	pmptx->pallocated_buf = NULL;
 	pmptx->stop = 1;
 
-	thread_exit();
+	return 0;
 }
 
 void fill_txdesc_for_mp(struct adapter *padapter, struct tx_desc *ptxdesc)
diff --git a/drivers/staging/r8188eu/include/osdep_service.h b/drivers/staging/r8188eu/include/osdep_service.h
index 029aa4e92c9b..afbffb551f9b 100644
--- a/drivers/staging/r8188eu/include/osdep_service.h
+++ b/drivers/staging/r8188eu/include/osdep_service.h
@@ -49,8 +49,6 @@ struct	__queue	{
 	spinlock_t lock;
 };
 
-#define thread_exit() complete_and_exit(NULL, 0)
-
 static inline struct list_head *get_list_head(struct __queue *queue)
 {
 	return (&(queue->queue));
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 110+ messages in thread

* Re: [PATCH 06/20] signal/sh: Use force_sig(SIGKILL) instead of do_group_exit(SIGKILL)
  2021-10-20 17:43 ` [PATCH 06/20] signal/sh: Use force_sig(SIGKILL) instead of do_group_exit(SIGKILL) Eric W. Biederman
@ 2021-10-20 19:57   ` Linus Torvalds
  2021-10-27 14:24     ` Rich Felker
  2021-10-21 16:08   ` Kees Cook
  1 sibling, 1 reply; 110+ messages in thread
From: Linus Torvalds @ 2021-10-20 19:57 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Kernel Mailing List, linux-arch, Oleg Nesterov, Al Viro,
	Kees Cook, Yoshinori Sato, Rich Felker, Linux-sh list

On Wed, Oct 20, 2021 at 7:44 AM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> +                       force_sig(SIGKILL);

I wonder if SIGFPE would be a more intuitive thing.

Doesn't really matter, this is a "doesn't happen" event anyway, but
that was just my reaction to reading the patch.

            Linus

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 13/20] signal: Implement force_fatal_sig
  2021-10-20 17:43 ` [PATCH 13/20] signal: Implement force_fatal_sig Eric W. Biederman
@ 2021-10-20 20:05   ` Linus Torvalds
  2021-10-20 21:25     ` Eric W. Biederman
  2021-10-25 22:41     ` Andy Lutomirski
  2021-10-21 16:24   ` Kees Cook
  1 sibling, 2 replies; 110+ messages in thread
From: Linus Torvalds @ 2021-10-20 20:05 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Kernel Mailing List, linux-arch, Oleg Nesterov, Al Viro, Kees Cook

On Wed, Oct 20, 2021 at 7:45 AM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> Add a simple helper force_fatal_sig that causes a signal to be
> delivered to a process as if the signal handler was set to SIG_DFL.
>
> Reimplement force_sigsegv based upon this new helper.

Can you just make the old force_sigsegv() go away? The odd special
casing of SIGSEGV was odd to begin with, I think everybody really just
wanted this new "force_fatal_sig()" and allow any signal - not making
SIGSEGV special.

Also, I think it should set SIGKILL in p->pending.signal or something
like that - because we want this to trigger fatal_signal_pending(),
don't we?

Right now fatal_signal_pending() is only true for SIGKILL, I think.

               Linus

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 13/20] signal: Implement force_fatal_sig
  2021-10-20 20:05   ` Linus Torvalds
@ 2021-10-20 21:25     ` Eric W. Biederman
  2021-10-25 22:41     ` Andy Lutomirski
  1 sibling, 0 replies; 110+ messages in thread
From: Eric W. Biederman @ 2021-10-20 21:25 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, linux-arch, Oleg Nesterov, Al Viro, Kees Cook

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Wed, Oct 20, 2021 at 7:45 AM Eric W. Biederman <ebiederm@xmission.com> wrote:
>>
>> Add a simple helper force_fatal_sig that causes a signal to be
>> delivered to a process as if the signal handler was set to SIG_DFL.
>>
>> Reimplement force_sigsegv based upon this new helper.
>
> Can you just make the old force_sigsegv() go away? The odd special
> casing of SIGSEGV was odd to begin with, I think everybody really just
> wanted this new "force_fatal_sig()" and allow any signal - not making
> SIGSEGV special.

There remains the original case that signal_setup_done deals with
generically.  When sending a signal fails the code attempts to send
SIGSEGV, and if sending SIGSEGV fails the signal delivery code
terminates the process with SIGSEGV.

To keep dependencies to a minimum and to allow for the possibility of
backports I used "force_sigsegv(SIGSEGV)" instead of
"force_fatal_sig(SIGSEGV)".  I will be happy to add an additional
patch that converts all of those cases to force_fatal_sig.

> Also, I think it should set SIGKILL in p->pending.signal or something
> like that - because we want this to trigger fatal_signal_pending(),
> don't we?
>
> Right now fatal_signal_pending() is only true for SIGKILL, I think.

In general when a fatal signal is delivered the function complete_signal
individually delivers SIGKILL to the threads, making
fatal_signal_pending true.

For signals like SIGSYS that generate a coredump that is not currently
true, but in the cases I looked at signal_pending() was enough to
get the code to get_signal(), which dequeues the signals and starts
processing them.
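
To make the two cases above concrete, here is a deliberately simplified
model in plain C. It is not the kernel code: struct model_thread and the
model_* functions are invented for this sketch, each thread's pending set
is a single bitmask, and the signal passed in is assumed to be fatal
(default disposition, not blocked, not traced).

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1	/* SIGSYS, SIGBUS, ... under strict modes */
#endif
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-thread state: one bit per pending signal. */
struct model_thread {
	unsigned long pending;
};

static unsigned long sig_bit(int sig)
{
	assert(sig >= 1 && sig <= 32);
	return 1UL << (sig - 1);
}

/* Signals whose default action also produces a core dump. */
static bool model_sig_coredump(int sig)
{
	switch (sig) {
	case SIGQUIT: case SIGILL: case SIGTRAP: case SIGABRT: case SIGFPE:
	case SIGSEGV: case SIGBUS: case SIGSYS: case SIGXCPU: case SIGXFSZ:
		return true;
	default:
		return false;
	}
}

/* Model of delivering a fatal signal to thread 'target' of a group. */
static void model_complete_signal(struct model_thread *threads, size_t nr,
				  size_t target, int sig)
{
	size_t i;

	threads[target].pending |= sig_bit(sig);
	if (model_sig_coredump(sig))
		return;		/* left for get_signal() to process */
	for (i = 0; i < nr; i++)
		threads[i].pending |= sig_bit(SIGKILL);
}

static bool model_signal_pending(const struct model_thread *t)
{
	return t->pending != 0;
}

static bool model_fatal_signal_pending(const struct model_thread *t)
{
	return (t->pending & sig_bit(SIGKILL)) != 0;
}
```

In this model a SIGTERM makes model_fatal_signal_pending true for every
thread, whereas a SIGSYS only makes model_signal_pending true for the
target thread, which mirrors the distinction drawn in the reply.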

I have a branch queued up for the next merge window that implements per
signal_struct coredumps.  Assuming that does not trigger any user space
regressions I can remove the coredump special case in complete_signal.
That will in turn mean that force_sig_info_to_task does not need to
change sa_handler or blocked, or clear SIGNAL_UNKILLABLE, as all of the
cases where that matters today will just wind up with complete_signal
setting a per-thread SIGKILL.



I keep playing with the idea of having fatal_signal_pending depend on a
different flag than the per thread bit for SIGKILL in the per thread
signal set.  That might make it clearer that complete_signal has started
killing the process and it is the start of killing the process that
triggers fatal_signal_pending.

So far the way fatal_signal_pending works hasn't really been a problem
so I keep putting away ideas of cleaner implementations.

Eric



^ permalink raw reply	[flat|nested] 110+ messages in thread

* [PATCH 21/20] signal: Replace force_sigsegv(SIGSEGV) with force_fatal_sig(SIGSEGV)
  2021-10-20 17:32 ` Eric W. Biederman
  (?)
@ 2021-10-20 21:51   ` Eric W. Biederman
  -1 siblings, 0 replies; 110+ messages in thread
From: Eric W. Biederman @ 2021-10-20 21:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro, Kees Cook,
	Andy Lutomirski, Jonas Bonn, Stefan Kristiansson, Stafford Horne,
	openrisc, Nick Hu, Greentime Hu, Vincent Chen, Heiko Carstens,
	Vasily Gorbik, Christian Borntraeger, linux-s390, Yoshinori Sato,
	Rich Felker, linux-sh, linux-xtensa, Chris Zankel, Max Filippov,
	David Miller, sparclinux, Thomas Bogendoerfer, Maciej Rozycki,
	linux-mips, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, linuxppc-dev, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Greg Kroah-Hartman


Now that force_fatal_sig exists it is unnecessary and a bit confusing
to use force_sigsegv in cases where the simpler force_fatal_sig is
wanted.  So change every instance we can to make the code clearer.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 arch/arc/kernel/process.c       | 2 +-
 arch/m68k/kernel/traps.c        | 2 +-
 arch/powerpc/kernel/signal_32.c | 2 +-
 arch/powerpc/kernel/signal_64.c | 4 ++--
 arch/s390/kernel/traps.c        | 2 +-
 arch/um/kernel/trap.c           | 2 +-
 arch/x86/kernel/vm86_32.c       | 2 +-
 fs/exec.c                       | 2 +-
 8 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/arc/kernel/process.c b/arch/arc/kernel/process.c
index 3793876f42d9..8e90052f6f05 100644
--- a/arch/arc/kernel/process.c
+++ b/arch/arc/kernel/process.c
@@ -294,7 +294,7 @@ int elf_check_arch(const struct elf32_hdr *x)
 	eflags = x->e_flags;
 	if ((eflags & EF_ARC_OSABI_MSK) != EF_ARC_OSABI_CURRENT) {
 		pr_err("ABI mismatch - you need newer toolchain\n");
-		force_sigsegv(SIGSEGV);
+		force_fatal_sig(SIGSEGV);
 		return 0;
 	}
 
diff --git a/arch/m68k/kernel/traps.c b/arch/m68k/kernel/traps.c
index 5b19fcdcd69e..74045d164ddb 100644
--- a/arch/m68k/kernel/traps.c
+++ b/arch/m68k/kernel/traps.c
@@ -1150,7 +1150,7 @@ asmlinkage void set_esp0(unsigned long ssp)
  */
 asmlinkage void fpsp040_die(void)
 {
-	force_sigsegv(SIGSEGV);
+	force_fatal_sig(SIGSEGV);
 }
 
 #ifdef CONFIG_M68KFPU_EMU
diff --git a/arch/powerpc/kernel/signal_32.c b/arch/powerpc/kernel/signal_32.c
index 666f3da41232..933ab95805a6 100644
--- a/arch/powerpc/kernel/signal_32.c
+++ b/arch/powerpc/kernel/signal_32.c
@@ -1063,7 +1063,7 @@ SYSCALL_DEFINE3(swapcontext, struct ucontext __user *, old_ctx,
 	 * We kill the task with a SIGSEGV in this situation.
 	 */
 	if (do_setcontext(new_ctx, regs, 0)) {
-		force_sigsegv(SIGSEGV);
+		force_fatal_sig(SIGSEGV);
 		return -EFAULT;
 	}
 
diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index d8de622c9e4a..8ead9b3f47c6 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -704,7 +704,7 @@ SYSCALL_DEFINE3(swapcontext, struct ucontext __user *, old_ctx,
 	 */
 
 	if (__get_user_sigset(&set, &new_ctx->uc_sigmask)) {
-		force_sigsegv(SIGSEGV);
+		force_fatal_sig(SIGSEGV);
 		return -EFAULT;
 	}
 	set_current_blocked(&set);
@@ -713,7 +713,7 @@ SYSCALL_DEFINE3(swapcontext, struct ucontext __user *, old_ctx,
 		return -EFAULT;
 	if (__unsafe_restore_sigcontext(current, NULL, 0, &new_ctx->uc_mcontext)) {
 		user_read_access_end();
-		force_sigsegv(SIGSEGV);
+		force_fatal_sig(SIGSEGV);
 		return -EFAULT;
 	}
 	user_read_access_end();
diff --git a/arch/s390/kernel/traps.c b/arch/s390/kernel/traps.c
index 51729ea2cf8e..01a7c68dcfb6 100644
--- a/arch/s390/kernel/traps.c
+++ b/arch/s390/kernel/traps.c
@@ -84,7 +84,7 @@ static void default_trap_handler(struct pt_regs *regs)
 {
 	if (user_mode(regs)) {
 		report_user_fault(regs, SIGSEGV, 0);
-		force_sigsegv(SIGSEGV);
+		force_fatal_sig(SIGSEGV);
 	} else
 		die(regs, "Unknown program exception");
 }
diff --git a/arch/um/kernel/trap.c b/arch/um/kernel/trap.c
index 3198c4767387..c32efb09db21 100644
--- a/arch/um/kernel/trap.c
+++ b/arch/um/kernel/trap.c
@@ -158,7 +158,7 @@ static void bad_segv(struct faultinfo fi, unsigned long ip)
 
 void fatal_sigsegv(void)
 {
-	force_sigsegv(SIGSEGV);
+	force_fatal_sig(SIGSEGV);
 	do_signal(&current->thread.regs);
 	/*
 	 * This is to tell gcc that we're not returning - do_signal
diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c
index 040fd01be8b3..7ff0f622abd4 100644
--- a/arch/x86/kernel/vm86_32.c
+++ b/arch/x86/kernel/vm86_32.c
@@ -159,7 +159,7 @@ void save_v86_state(struct kernel_vm86_regs *regs, int retval)
 	user_access_end();
 Efault:
 	pr_alert("could not access userspace vm86 info\n");
-	force_sigsegv(SIGSEGV);
+	force_fatal_sig(SIGSEGV);
 }
 
 static int do_vm86_irq_handling(int subfunction, int irqnumber);
diff --git a/fs/exec.c b/fs/exec.c
index a098c133d8d7..ac7b51b51f38 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1852,7 +1852,7 @@ static int bprm_execve(struct linux_binprm *bprm,
 	 * SIGSEGV.
 	 */
 	if (bprm->point_of_no_return && !fatal_signal_pending(current))
-		force_sigsegv(SIGSEGV);
+		force_fatal_sig(SIGSEGV);
 
 out_unmark:
 	current->fs->in_exec = 0;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 110+ messages in thread

* Re: [PATCH 18/20] exit/rtl8723bs: Replace the macro thread_exit with a simple return 0
  2021-10-20 17:44 ` [PATCH 18/20] exit/rtl8723bs: Replace the macro thread_exit with a simple return 0 Eric W. Biederman
@ 2021-10-21  7:06   ` Greg KH
  2021-10-21 15:06     ` Eric W. Biederman
  2021-10-21 16:37   ` Kees Cook
  1 sibling, 1 reply; 110+ messages in thread
From: Greg KH @ 2021-10-21  7:06 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro,
	Kees Cook

On Wed, Oct 20, 2021 at 12:44:04PM -0500, Eric W. Biederman wrote:
> Every place thread_exit is called is at the end of a function started
> with kthread_run.  The code in kthread_run has arranged things so a
> kernel thread can just return and do_exit will be called.
> 
> So just have the threads return instead of calling complete_and_exit.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  drivers/staging/rtl8723bs/core/rtw_cmd.c                | 2 +-
>  drivers/staging/rtl8723bs/core/rtw_xmit.c               | 2 +-
>  drivers/staging/rtl8723bs/hal/rtl8723bs_xmit.c          | 2 +-
>  drivers/staging/rtl8723bs/include/osdep_service_linux.h | 2 --
>  4 files changed, 3 insertions(+), 5 deletions(-)

You "forgot" to cc: the linux-staging list and the staging driver
maintainer on these drivers/staging/ changes...

Anyway, they look fine to me, but you will get some conflicts with some
of these changes based on cleanups already in my staging-next tree (in
linux-next if you want to see them).  But feel free to take these all in
your tree if that makes it easier:

Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 19/20] exit/rtl8712: Replace the macro thread_exit with a simple return 0
  2021-10-20 17:44 ` [PATCH 19/20] exit/rtl8712: " Eric W. Biederman
@ 2021-10-21  7:07   ` Greg KH
  2021-10-21 16:37   ` Kees Cook
  1 sibling, 0 replies; 110+ messages in thread
From: Greg KH @ 2021-10-21  7:07 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro,
	Kees Cook

On Wed, Oct 20, 2021 at 12:44:05PM -0500, Eric W. Biederman wrote:
> The macro thread_exit is called at the end of a function started
> with kthread_run.  The code in kthread_run has arranged things so a
> kernel thread can just return and do_exit will be called.
> 
> So just have the cmd_thread return instead of calling complete_and_exit.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 20/20] exit/r8188eu: Replace the macro thread_exit with a simple return 0
  2021-10-20 17:44 ` [PATCH 20/20] exit/r8188eu: " Eric W. Biederman
@ 2021-10-21  7:07   ` Greg KH
  2021-10-21 16:37   ` Kees Cook
  1 sibling, 0 replies; 110+ messages in thread
From: Greg KH @ 2021-10-21  7:07 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro,
	Kees Cook

On Wed, Oct 20, 2021 at 12:44:06PM -0500, Eric W. Biederman wrote:
> The macro thread_exit is called at the end of functions started
> with kthread_run.  The code in kthread_run has arranged things so a
> kernel thread can just return and do_exit will be called.
> 
> So just have rtw_cmd_thread and mp_xmit_packet_thread return instead
> of calling complete_and_exit.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 21/20] signal: Replace force_sigsegv(SIGSEGV) with force_fatal_sig(SIGSEGV)
  2021-10-20 21:51   ` Eric W. Biederman
  (?)
@ 2021-10-21  8:09     ` Geert Uytterhoeven
  -1 siblings, 0 replies; 110+ messages in thread
From: Geert Uytterhoeven @ 2021-10-21  8:09 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Rich Felker, Linux-sh list, Linux Kernel Mailing List,
	Max Filippov, Paul Mackerras, Greentime Hu, H Peter Anvin,
	sparclinux, Vincent Chen, Linux-Arch, linux-s390, Yoshinori Sato,
	Christian Borntraeger, Ingo Molnar,
	open list:TENSILICA XTENSA PORT (xtensa),
	Kees Cook, Vasily Gorbik, Heiko Carstens, Openrisc,
	Borislav Petkov, Al Viro, Andy Lutomirski, Oleg Nesterov,
	Thomas Gleixner, Chris Zankel, Jonas Bonn, Nick Hu,
	Greg Kroah-Hartman, Linus Torvalds,
	open list:BROADCOM NVRAM DRIVER, Thomas Bogendoerfer,
	linuxppc-dev, David Miller, Maciej Rozycki

Hi Eric,

Patch 21/20?

On Wed, Oct 20, 2021 at 11:52 PM Eric W. Biederman
<ebiederm@xmission.com> wrote:
> Now that force_fatal_sig exists it is unnecessary and a bit confusing
> to use force_sigsegv in cases where the simpler force_fatal_sig is
> wanted.  So change every instance we can to make the code clearer.
>
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

>  arch/m68k/kernel/traps.c        | 2 +-

Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 21/20] signal: Replace force_sigsegv(SIGSEGV) with force_fatal_sig(SIGSEGV)
  2021-10-20 21:51   ` Eric W. Biederman
  (?)
@ 2021-10-21  8:32     ` Philippe Mathieu-Daudé
  -1 siblings, 0 replies; 110+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-10-21  8:32 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: open list, linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro,
	Kees Cook, Andy Lutomirski, Jonas Bonn, Stefan Kristiansson,
	Stafford Horne, openrisc, Nick Hu, Greentime Hu, Vincent Chen,
	Heiko Carstens, Vasily Gorbik, Christian Borntraeger, linux-s390,
	Yoshinori Sato, Rich Felker, linux-sh, linux-xtensa,
	Chris Zankel, Max Filippov, David Miller, sparclinux,
	Thomas Bogendoerfer, Maciej Rozycki,
	open list:BROADCOM NVRAM DRIVER, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Greg Kroah-Hartman

On Wed, Oct 20, 2021 at 11:52 PM Eric W. Biederman
<ebiederm@xmission.com> wrote:
>
>
> Now that force_fatal_sig exists it is unnecessary and a bit confusing
> to use force_sigsegv in cases where the simpler force_fatal_sig is
> wanted.  So change every instance we can to make the code clearer.
>
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  arch/arc/kernel/process.c       | 2 +-
>  arch/m68k/kernel/traps.c        | 2 +-
>  arch/powerpc/kernel/signal_32.c | 2 +-
>  arch/powerpc/kernel/signal_64.c | 4 ++--
>  arch/s390/kernel/traps.c        | 2 +-
>  arch/um/kernel/trap.c           | 2 +-
>  arch/x86/kernel/vm86_32.c       | 2 +-
>  fs/exec.c                       | 2 +-
>  8 files changed, 9 insertions(+), 9 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 12/20] exit/kthread: Have kernel threads return instead of calling do_exit
  2021-10-20 17:43 ` [PATCH 12/20] exit/kthread: Have kernel threads return instead of calling do_exit Eric W. Biederman
@ 2021-10-21 11:12   ` Christoph Hellwig
  2021-10-21 15:11     ` Eric W. Biederman
  2021-10-21 16:21   ` Kees Cook
  1 sibling, 1 reply; 110+ messages in thread
From: Christoph Hellwig @ 2021-10-21 11:12 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro,
	Kees Cook

On Wed, Oct 20, 2021 at 12:43:58PM -0500, Eric W. Biederman wrote:
> In 2009 Oleg reworked[1] the kernel threads so that it is not
> necessary to call do_exit if you are not using kthread_stop().  Remove
> the explicit calls of do_exit and complete_and_exit (with a NULL
> completion) that were previously necessary.

With this we should also be able to drop the export for do_exit.
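
The pattern described in the quoted text can be modeled in plain
userspace C.  This is only a sketch under stated assumptions:
model_kthread, model_do_exit, and worker_returns are hypothetical names,
not kernel APIs, and the model assumes (as the quoted description says)
that the thread entry wrapper calls do_exit on the thread function's
behalf once that function returns.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the kernel's exit path.  The real do_exit
 * never returns; this model only records the code so it can be checked.
 */
static int model_exit_code = -1;
static int model_exit_calls;

static void model_do_exit(int code)
{
	model_exit_code = code;
	model_exit_calls++;
}

/*
 * Hypothetical model of the thread entry wrapper: it runs the thread
 * function and then exits with whatever that function returned.
 */
static int model_kthread(int (*threadfn)(void *), void *data)
{
	int ret = threadfn(data);

	model_do_exit(ret);
	return model_exit_code;
}

/* After the cleanup: the thread function simply returns its status. */
static int worker_returns(void *data)
{
	int *work_done = data;

	*work_done = 1;
	return 0;
}
```

Because the wrapper already performs the exit, a thread function that
ends with "return 0" and one that ends with an explicit exit call reach
the same place; the explicit call only adds a second path to maintain.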

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 21/20] signal: Replace force_sigsegv(SIGSEGV) with force_fatal_sig(SIGSEGV)
  2021-10-21  8:09     ` Geert Uytterhoeven
  (?)
@ 2021-10-21 13:33       ` Eric W. Biederman
  -1 siblings, 0 replies; 110+ messages in thread
From: Eric W. Biederman @ 2021-10-21 13:33 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Linux Kernel Mailing List, Rich Felker,
	open list:TENSILICA XTENSA PORT (xtensa),
	Benjamin Herrenschmidt, open list:BROADCOM NVRAM DRIVER,
	Max Filippov, Paul Mackerras, H Peter Anvin, sparclinux,
	Vincent Chen, Thomas Gleixner, Linux-Arch, linux-s390,
	Yoshinori Sato, Michael Ellerman, Linux-sh list,
	Christian Borntraeger, Ingo Molnar, Jonas Bonn, Kees Cook,
	Vasily Gorbik, Heiko Carstens, Openrisc, Borislav Petkov,
	Al Viro, Andy Lutomirski, Chris Zankel, Thomas Bogendoerfer,
	Nick Hu, linuxppc-dev, Oleg Nesterov, Greg Kroah-Hartman,
	Maciej Rozycki, Linus Torvalds, David Miller, Greentime Hu

Geert Uytterhoeven <geert@linux-m68k.org> writes:

> Hi Eric,
>
> Patch 21/20?

In reviewing another part of the patchset Linus asked if force_sigsegv
could go away.  It can't completely but I can get this far.

Given that it is just a cleanup it makes most sense to me as an
additional patch on top of what is already here.


> On Wed, Oct 20, 2021 at 11:52 PM Eric W. Biederman
> <ebiederm@xmission.com> wrote:
>> Now that force_fatal_sig exists it is unnecessary and a bit confusing
>> to use force_sigsegv in cases where the simpler force_fatal_sig is
>> wanted.  So change every instance we can to make the code clearer.
>>
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>
>>  arch/m68k/kernel/traps.c        | 2 +-
>
> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>

Thank you.

Eric
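
For readers unfamiliar with the two helpers, the difference can be
sketched as a small userspace model.  This is an assumption-laden
simplification, not the kernel implementation: struct model_task and the
model_* functions are hypothetical, and the model only captures the one
property discussed here, namely that force_sigsegv resets the SIGSEGV
handler only when its argument is SIGSEGV, while force_fatal_sig always
prevents the signal from being caught.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/* Hypothetical, simplified view of one task's SIGSEGV disposition. */
struct model_task {
	bool segv_handler_installed;	/* userspace registered a handler */
	bool handler_ran;		/* the handler got to run */
	bool killed;			/* the default action ended the task */
};

/* Deliver SIGSEGV according to the current disposition. */
static void model_deliver_segv(struct model_task *t)
{
	if (t->segv_handler_installed)
		t->handler_ran = true;
	else
		t->killed = true;
}

/*
 * Model of force_sigsegv(sig): the handler is reset only when the
 * signal passed in is itself SIGSEGV.
 */
static void model_force_sigsegv(struct model_task *t, int sig)
{
	if (sig == SIGSEGV)
		t->segv_handler_installed = false;
	model_deliver_segv(t);
}

/* Model of force_fatal_sig(SIGSEGV): the handler is always reset. */
static void model_force_fatal_segv(struct model_task *t)
{
	t->segv_handler_installed = false;
	model_deliver_segv(t);
}
```

In this model force_sigsegv(SIGSEGV) and force_fatal_sig(SIGSEGV) behave
identically, which is why the call sites in the patch can be converted
mechanically.  A call such as model_force_sigsegv(&t, SIGILL) still lets
a handler run, which is the case the patch leaves alone.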

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 18/20] exit/rtl8723bs: Replace the macro thread_exit with a simple return 0
  2021-10-21  7:06   ` Greg KH
@ 2021-10-21 15:06     ` Eric W. Biederman
  0 siblings, 0 replies; 110+ messages in thread
From: Eric W. Biederman @ 2021-10-21 15:06 UTC (permalink / raw)
  To: Greg KH
  Cc: linux-kernel, linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro,
	Kees Cook

Greg KH <gregkh@linuxfoundation.org> writes:

> On Wed, Oct 20, 2021 at 12:44:04PM -0500, Eric W. Biederman wrote:
>> Every place thread_exit is called is at the end of a function started
>> with kthread_run.  The code in kthread_run has arranged things so a
>> kernel thread can just return and do_exit will be called.
>> 
>> So just have the threads return instead of calling complete_and_exit.
>> 
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>> ---
>>  drivers/staging/rtl8723bs/core/rtw_cmd.c                | 2 +-
>>  drivers/staging/rtl8723bs/core/rtw_xmit.c               | 2 +-
>>  drivers/staging/rtl8723bs/hal/rtl8723bs_xmit.c          | 2 +-
>>  drivers/staging/rtl8723bs/include/osdep_service_linux.h | 2 --
>>  4 files changed, 3 insertions(+), 5 deletions(-)
>
> You "forgot" to cc: the linux-staging list and the staging driver
> maintainer on these drivers/staging/ changes...

Yes I did.  Sorry about that.

> Anyway, they look fine to me, but you will get some conflicts with some
> of these changes based on cleanups already in my staging-next tree (in
> linux-next if you want to see them).  But feel free to take these all in
> your tree if that makes it easier:

I just did a test merge and there was one file that was completely
removed and one file which had changes a line or two above where my code
changed.  So nothing too difficult to resolve.

I don't really mind either way.  But keeping them all in one tree makes
them easier to keep track of, and allows me to do things like see if
I can remove EXPORT_SYMBOL(do_exit) as Christoph suggested.

Eric

> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 12/20] exit/kthread: Have kernel threads return instead of calling do_exit
  2021-10-21 11:12   ` Christoph Hellwig
@ 2021-10-21 15:11     ` Eric W. Biederman
  0 siblings, 0 replies; 110+ messages in thread
From: Eric W. Biederman @ 2021-10-21 15:11 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-kernel, linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro,
	Kees Cook

Christoph Hellwig <hch@infradead.org> writes:

> On Wed, Oct 20, 2021 at 12:43:58PM -0500, Eric W. Biederman wrote:
>> In 2009 Oleg reworked[1] the kernel threads so that it is not
>> necessary to call do_exit if you are not using kthread_stop().  Remove
>> the explicit calls of do_exit and complete_and_exit (with a NULL
>> completion) that were previously necessary.
>
> With this we should also be able to drop the export for do_exit.

Good point.

After this set of changes I don't see any calls of do_exit in drivers
or other awkward places so that would make a good addition.

Eric


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 02/20] exit: Remove calls of do_exit after noreturn versions of die
  2021-10-20 17:43   ` [OpenRISC] " Eric W. Biederman
@ 2021-10-21 16:02     ` Kees Cook
  -1 siblings, 0 replies; 110+ messages in thread
From: Kees Cook @ 2021-10-21 16:02 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro,
	Jonas Bonn, Stefan Kristiansson, Stafford Horne, openrisc,
	Nick Hu, Greentime Hu, Vincent Chen, Heiko Carstens,
	Vasily Gorbik, Christian Borntraeger, linux-s390, Yoshinori Sato,
	Rich Felker, linux-sh, linux-xtensa, Chris Zankel, Max Filippov

On Wed, Oct 20, 2021 at 12:43:48PM -0500, Eric W. Biederman wrote:
> On nds32, openrisc, s390, sh, and xtensa the function die never
> returns.  Mark die __noreturn so that no one expects die to return.
> Remove the do_exit calls after die as they will never be reached.

Maybe note that the "bust_spinlocks" calls are also redundant, since
they're in die(). I note that there is a "mismatch" between the do_exit()
in die() (SIGSEGV) and after die() (SIGKILL). This patch makes no
behavioral change (the first caller would "win"), but I thought I'd note
it in case some architecture would prefer a different signal.
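
The "first caller wins" point can be illustrated with a userspace model.
This is a sketch only: the model_* names are hypothetical, and
setjmp/longjmp stands in for the task actually ending, which a userspace
test cannot do without killing itself.

```c
#include <assert.h>
#include <setjmp.h>
#include <signal.h>

static jmp_buf model_exit_point;	/* where the model "task" ends */
static int model_exit_signal;		/* code passed to the first exit call */

/* Hypothetical stand-in for do_exit(): never returns to its caller. */
static _Noreturn void model_do_exit(int code)
{
	model_exit_signal = code;
	longjmp(model_exit_point, 1);
}

/* Hypothetical die() in the style described: it exits with SIGSEGV. */
static _Noreturn void model_die(void)
{
	model_do_exit(SIGSEGV);
}

/* Old-style caller: the second exit call is dead code. */
static void model_trap_handler(void)
{
	model_die();
	model_do_exit(SIGKILL);	/* never reached */
}

/* Run the handler and report which code the task exited with. */
static int model_run(void)
{
	if (setjmp(model_exit_point) == 0)
		model_trap_handler();
	return model_exit_signal;
}
```

In this model the task always exits with SIGSEGV, so deleting the
trailing SIGKILL call changes nothing observable.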

Reviewed-by: Kees Cook <keescook@chromium.org>

-Kees

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 01/20] exit/doublefault: Remove apparently bogus comment about rewind_stack_do_exit
  2021-10-20 17:43 ` [PATCH 01/20] exit/doublefault: Remove apparently bogus comment about rewind_stack_do_exit Eric W. Biederman
@ 2021-10-21 16:02   ` Kees Cook
  0 siblings, 0 replies; 110+ messages in thread
From: Kees Cook @ 2021-10-21 16:02 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro,
	Andy Lutomirski

On Wed, Oct 20, 2021 at 12:43:47PM -0500, Eric W. Biederman wrote:
> I do not see panic calling rewind_stack_do_exit anywhere, nor can I
> find anywhere in the history where doublefault_shim has called
> rewind_stack_do_exit.  So I don't think this comment was ever actually
> correct.
> 
> Cc: Andy Lutomirski <luto@kernel.org>
> Fixes: 7d8d8cfdee9a ("x86/doublefault/32: Rewrite the x86_32 #DF handler and unify with 64-bit")
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 03/20] reboot: Remove the unreachable panic after do_exit in reboot(2)
  2021-10-20 17:43 ` [PATCH 03/20] reboot: Remove the unreachable panic after do_exit in reboot(2) Eric W. Biederman
@ 2021-10-21 16:05   ` Kees Cook
  0 siblings, 0 replies; 110+ messages in thread
From: Kees Cook @ 2021-10-21 16:05 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro

On Wed, Oct 20, 2021 at 12:43:49PM -0500, Eric W. Biederman wrote:
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  kernel/reboot.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/kernel/reboot.c b/kernel/reboot.c
> index f7440c0c7e43..d6e0f9fb7f04 100644
> --- a/kernel/reboot.c
> +++ b/kernel/reboot.c
> @@ -359,7 +359,6 @@ SYSCALL_DEFINE4(reboot, int, magic1, int, magic2, unsigned int, cmd,
>  	case LINUX_REBOOT_CMD_HALT:
>  		kernel_halt();
>  		do_exit(0);
> -		panic("cannot halt");

This looks like it was here for robustness (i.e. panic if do_exit
somehow fails)? But since do_exit() is marked __noreturn, it doesn't
make sense to keep it. If we wanted to keep _something_ here, maybe just
add unreachable() ?
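
A rough userspace sketch of that suggestion follows.  It assumes GCC or
Clang (for __builtin_unreachable), and every model_* name is
hypothetical; longjmp is used only so that the "exit" can be observed
from a test.

```c
#include <assert.h>
#include <setjmp.h>

/* Assumption: GCC or Clang, which provide __builtin_unreachable(). */
#define model_unreachable() __builtin_unreachable()

static jmp_buf model_exit_point;
static int model_halted;
static int model_exit_code = -1;

/* Hypothetical stand-in for kernel_halt(). */
static void model_kernel_halt(void)
{
	model_halted = 1;
}

/* Hypothetical stand-in for do_exit(): never returns to its caller. */
static _Noreturn void model_do_exit(int code)
{
	model_exit_code = code;
	longjmp(model_exit_point, 1);
}

/* Model of the halt case with the marker in place of panic(). */
static int model_reboot_halt(void)
{
	if (setjmp(model_exit_point) != 0)
		return model_exit_code;

	model_kernel_halt();
	model_do_exit(0);
	model_unreachable();	/* documents that control never gets here */
}
```

The marker generates no code to run; it only records for the reader and
the compiler that nothing follows the exit call.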

Reviewed-by: Kees Cook <keescook@chromium.org>

>  
>  	case LINUX_REBOOT_CMD_POWER_OFF:
>  		kernel_power_off();
> -- 
> 2.20.1
> 

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 04/20] signal/sparc32: Remove unreachable do_exit in do_sparc_fault
  2021-10-20 17:43 ` [PATCH 04/20] signal/sparc32: Remove unreachable do_exit in do_sparc_fault Eric W. Biederman
@ 2021-10-21 16:05   ` Kees Cook
  0 siblings, 0 replies; 110+ messages in thread
From: Kees Cook @ 2021-10-21 16:05 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro,
	David Miller, sparclinux

On Wed, Oct 20, 2021 at 12:43:50PM -0500, Eric W. Biederman wrote:
> The call to do_exit in do_sparc_fault immediately follows a call to
> unhandled_fault.  The function unhandled_fault never returns.  This
> means the call to do_exit can never be reached.

Same thought: replace with unreachable() just to make this more
self-documenting? Either way:

Reviewed-by: Kees Cook <keescook@chromium.org>

> 
> Cc: David Miller <davem@davemloft.net>
> Cc: sparclinux@vger.kernel.org
> Fixes: 2.3.41
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  arch/sparc/mm/fault_32.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/arch/sparc/mm/fault_32.c b/arch/sparc/mm/fault_32.c
> index fa858626b85b..90dc4ae315c8 100644
> --- a/arch/sparc/mm/fault_32.c
> +++ b/arch/sparc/mm/fault_32.c
> @@ -248,7 +248,6 @@ asmlinkage void do_sparc_fault(struct pt_regs *regs, int text_fault, int write,
>  	}
>  
>  	unhandled_fault(address, tsk, regs);
> -	do_exit(SIGKILL);
>  
>  /*
>   * We ran out of memory, or some other thing happened to us that made
> -- 
> 2.20.1
> 

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 05/20] signal/mips: Update (_save|_restore)_fp_context to fail with -EFAULT
  2021-10-20 17:43 ` [PATCH 05/20] signal/mips: Update (_save|_restore)_fp_context to fail with -EFAULT Eric W. Biederman
@ 2021-10-21 16:06   ` Kees Cook
  2021-10-24  4:24   ` Maciej W. Rozycki
  2021-10-24 15:27   ` Thomas Bogendoerfer
  2 siblings, 0 replies; 110+ messages in thread
From: Kees Cook @ 2021-10-21 16:06 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro,
	Thomas Bogendoerfer, Maciej Rozycki, linux-mips

On Wed, Oct 20, 2021 at 12:43:51PM -0500, Eric W. Biederman wrote:
> When an instruction to save or restore a register from the stack fails
> in _save_fp_context or _restore_fp_context return with -EFAULT.  This
> change was made to r2300_fpu.S[1] but it looks like it got lost with
> the introduction of EX2[2].  This is also what the other implementation
> of _save_fp_context and _restore_fp_context in r4k_fpu.S does, and
> what is needed for the callers to be able to handle the error.
> 
> Furthermore calling do_exit(SIGSEGV) from bad_stack is wrong because
> it does not terminate the entire process; it just terminates a single
> thread.
> 
> As the changed code was the only caller of arch/mips/kernel/syscall.c:bad_stack
> remove the problematic and now unused helper function.
> 
> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
> Cc: Maciej Rozycki <macro@orcam.me.uk>
> Cc: linux-mips@vger.kernel.org
> [1] 35938a00ba86 ("MIPS: Fix ISA I FP sigcontext access violation handling")
> [2] f92722dc4545 ("MIPS: Correct MIPS I FP sigcontext layout")
> Fixes: f92722dc4545 ("MIPS: Correct MIPS I FP sigcontext layout")
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 06/20] signal/sh: Use force_sig(SIGKILL) instead of do_group_exit(SIGKILL)
  2021-10-20 17:43 ` [PATCH 06/20] signal/sh: Use force_sig(SIGKILL) instead of do_group_exit(SIGKILL) Eric W. Biederman
  2021-10-20 19:57   ` Linus Torvalds
@ 2021-10-21 16:08   ` Kees Cook
  1 sibling, 0 replies; 110+ messages in thread
From: Kees Cook @ 2021-10-21 16:08 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro,
	Yoshinori Sato, Rich Felker, linux-sh

On Wed, Oct 20, 2021 at 12:43:52PM -0500, Eric W. Biederman wrote:
> Today the sh code allocates memory the first time a process uses
> the fpu.  If that memory allocation fails, kill the affected task
> with force_sig(SIGKILL) rather than do_group_exit(SIGKILL).
> 
> Calling do_group_exit from an exception handler can potentially lead
> to deadlocks as do_group_exit is not designed to be called from
> interrupt context.  Instead use force_sig(SIGKILL) to kill the
> userspace process.  Sending signals in general and force_sig in
> particular has been tested from interrupt context so there should be
> no problems.
> 
> Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
> Cc: Rich Felker <dalias@libc.org>
> Cc: linux-sh@vger.kernel.org
> Fixes: 0ea820cf9bf5 ("sh: Move over to dynamically allocated FPU context.")
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Looks sane; there should be no observable changes.

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 07/20] signal/powerpc: On swapcontext failure force SIGSEGV
  2021-10-20 17:43   ` Eric W. Biederman
@ 2021-10-21 16:09     ` Kees Cook
  -1 siblings, 0 replies; 110+ messages in thread
From: Kees Cook @ 2021-10-21 16:09 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	linuxppc-dev

On Wed, Oct 20, 2021 at 12:43:53PM -0500, Eric W. Biederman wrote:
> If the register state may be partial and corrupted instead of calling
> do_exit, call force_sigsegv(SIGSEGV).  Which properly kills the
> process with SIGSEGV and does not let any more userspace code execute,
> instead of just killing one thread of the process and potentially
> confusing everything.
> 
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: linuxppc-dev@lists.ozlabs.org
> History-tree: git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
> Fixes: 756f1ae8a44e ("PPC32: Rework signal code and add a swapcontext system call.")
> Fixes: 04879b04bf50 ("[PATCH] ppc64: VMX (Altivec) support & signal32 rework, from Ben Herrenschmidt")
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

This looks right to me.

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 08/20] signal/sparc: In setup_tsb_params convert open coded BUG into BUG
  2021-10-20 17:43 ` [PATCH 08/20] signal/sparc: In setup_tsb_params convert open coded BUG into BUG Eric W. Biederman
@ 2021-10-21 16:12   ` Kees Cook
  0 siblings, 0 replies; 110+ messages in thread
From: Kees Cook @ 2021-10-21 16:12 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro,
	David Miller, sparclinux

On Wed, Oct 20, 2021 at 12:43:54PM -0500, Eric W. Biederman wrote:
> The function setup_tsb_params has exactly one caller tsb_grow.  The
> function tsb_grow passes in a tsb_bytes value that is between 8192 and
> 1048576 inclusive, and is guaranteed to be a power of 2.  The function
> setup_tsb_params verifies this property with a switch statement and
> then prints an error and causes the task to exit if this is not true.
> 
> In practice that print statement can never be reached because tsb_grow
> never passes in a bad tsb_size.  So if tsb_size ever gets a bad value
> that is a kernel bug.
> 
> So replace the do_exit which is effectively an open coded version of
> BUG() with an actual call to BUG().  Making it clearer that this
> is a case that can never, and should never happen.
> 
> Cc: David Miller <davem@davemloft.net>
> Cc: sparclinux@vger.kernel.org
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  arch/sparc/mm/tsb.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/sparc/mm/tsb.c b/arch/sparc/mm/tsb.c
> index 0dce4b7ff73e..912205787161 100644
> --- a/arch/sparc/mm/tsb.c
> +++ b/arch/sparc/mm/tsb.c
> @@ -266,7 +266,7 @@ static void setup_tsb_params(struct mm_struct *mm, unsigned long tsb_idx, unsign
>  	default:
>  		printk(KERN_ERR "TSB[%s:%d]: Impossible TSB size %lu, killing process.\n",
>  		       current->comm, current->pid, tsb_bytes);
> -		do_exit(SIGSEGV);
> +		BUG();
>  	}
>  	tte |= pte_sz_bits(page_sz);

Given the other uses of BUG() here, this seems okay.
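
As a rough userspace sketch of the invariant described in the commit
message (a power of 2 between 8192 and 1048576 inclusive), the check
could look like the following. The helper names tsb_bytes_is_valid()
and checked_tsb_bytes() are hypothetical and do not exist in
arch/sparc/mm/tsb.c; assert() stands in for BUG() only loosely.

```c
#include <assert.h>
#include <stdbool.h>

/* True when tsb_bytes is a power of 2 in the range [8192, 1048576]. */
static bool tsb_bytes_is_valid(unsigned long tsb_bytes)
{
	bool power_of_two = tsb_bytes != 0 &&
			    (tsb_bytes & (tsb_bytes - 1)) == 0;

	return power_of_two && tsb_bytes >= 8192UL && tsb_bytes <= 1048576UL;
}

/* Userspace stand-in for the BUG() in the default: branch. */
static unsigned long checked_tsb_bytes(unsigned long tsb_bytes)
{
	assert(tsb_bytes_is_valid(tsb_bytes));
	return tsb_bytes;
}
```

Because the only caller is said to always pass a valid size, a failure
of this check would indicate a kernel bug rather than a userspace
error, which is the argument for BUG() over killing the task.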

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 09/20] signal/vm86_32: Replace open coded BUG_ON with an actual BUG_ON
  2021-10-20 17:43 ` [PATCH 09/20] signal/vm86_32: Replace open coded BUG_ON with an actual BUG_ON Eric W. Biederman
@ 2021-10-21 16:15   ` Kees Cook
  2021-11-12 15:40   ` Eric W. Biederman
  1 sibling, 0 replies; 110+ messages in thread
From: Kees Cook @ 2021-10-21 16:15 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H Peter Anvin

On Wed, Oct 20, 2021 at 12:43:55PM -0500, Eric W. Biederman wrote:
> The function save_v86_state is only called when userspace was
> operating in vm86 mode before entering the kernel.  Not having vm86
> state in the task_struct should never happen.  So transform the hand
> rolled BUG_ON into an actual BUG_ON to make it clear what is
> happening.

If this is actually not a state userspace can put itself into:

Reviewed-by: Kees Cook <keescook@chromium.org>

Otherwise, this should be a WARN+kill.

> 
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: x86@kernel.org
> Cc: H Peter Anvin <hpa@zytor.com>
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  arch/x86/kernel/vm86_32.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c
> index e5a7a10a0164..63486da77272 100644
> --- a/arch/x86/kernel/vm86_32.c
> +++ b/arch/x86/kernel/vm86_32.c
> @@ -106,10 +106,8 @@ void save_v86_state(struct kernel_vm86_regs *regs, int retval)
>  	 */
>  	local_irq_enable();
>  
> -	if (!vm86 || !vm86->user_vm86) {
> -		pr_alert("no user_vm86: BAD\n");
> -		do_exit(SIGSEGV);
> -	}
> +	BUG_ON(!vm86 || !vm86->user_vm86);
> +
>  	set_flags(regs->pt.flags, VEFLAGS, X86_EFLAGS_VIF | vm86->veflags_mask);
>  	user = vm86->user_vm86;
>  
> -- 
> 2.20.1
> 

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 10/20] signal/vm86_32: Properly send SIGSEGV when the vm86 state cannot be saved.
  2021-10-20 17:43 ` [PATCH 10/20] signal/vm86_32: Properly send SIGSEGV when the vm86 state cannot be saved Eric W. Biederman
@ 2021-10-21 16:16   ` Kees Cook
  2021-10-21 17:02     ` Eric W. Biederman
  2021-10-21 23:08   ` Andy Lutomirski
       [not found]   ` <875ytkygfj.fsf_-_@disp2133>
  2 siblings, 1 reply; 110+ messages in thread
From: Kees Cook @ 2021-10-21 16:16 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H Peter Anvin

On Wed, Oct 20, 2021 at 12:43:56PM -0500, Eric W. Biederman wrote:
> Instead of pretending to send SIGSEGV by calling do_exit(SIGSEGV)
> call force_sigsegv(SIGSEGV) to force the process to take a SIGSEGV
> and terminate.
> 
> Update handle_signal to return immediately when save_v86_state fails
> and kills the process.  Returning immediately without doing anything
> except killing the process with SIGSEGV is also what signal_setup_done
> does when setup_rt_frame fails.  Plus it is always ok to return
> immediately without delivering a signal to a userspace handler when a
> fatal signal has killed the current process.

Do the tools/testing/selftests/x86 tests all pass after these changes? I
know Andy has a bunch of weird corner cases in there.

Reviewed-by: Kees Cook <keescook@chromium.org>

> 
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: x86@kernel.org
> Cc: H Peter Anvin <hpa@zytor.com>
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  arch/x86/kernel/signal.c  | 6 +++++-
>  arch/x86/kernel/vm86_32.c | 2 +-
>  2 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
> index f4d21e470083..25a230f705c1 100644
> --- a/arch/x86/kernel/signal.c
> +++ b/arch/x86/kernel/signal.c
> @@ -785,8 +785,12 @@ handle_signal(struct ksignal *ksig, struct pt_regs *regs)
>  	bool stepping, failed;
>  	struct fpu *fpu = &current->thread.fpu;
>  
> -	if (v8086_mode(regs))
> +	if (v8086_mode(regs)) {
>  		save_v86_state((struct kernel_vm86_regs *) regs, VM86_SIGNAL);
> +		/* Has save_v86_state failed and killed the process? */
> +		if (fatal_signal_pending(current))
> +			return;
> +	}
>  
>  	/* Are we from a system call? */
>  	if (syscall_get_nr(current, regs) != -1) {
> diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c
> index 63486da77272..040fd01be8b3 100644
> --- a/arch/x86/kernel/vm86_32.c
> +++ b/arch/x86/kernel/vm86_32.c
> @@ -159,7 +159,7 @@ void save_v86_state(struct kernel_vm86_regs *regs, int retval)
>  	user_access_end();
>  Efault:
>  	pr_alert("could not access userspace vm86 info\n");
> -	do_exit(SIGSEGV);
> +	force_sigsegv(SIGSEGV);
>  }
>  
>  static int do_vm86_irq_handling(int subfunction, int irqnumber);
> -- 
> 2.20.1
> 

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 11/20] signal/s390: Use force_sigsegv in default_trap_handler
  2021-10-20 17:43 ` [PATCH 11/20] signal/s390: Use force_sigsegv in default_trap_handler Eric W. Biederman
@ 2021-10-21 16:17   ` Kees Cook
  2021-10-26  9:38   ` Christian Borntraeger
  1 sibling, 0 replies; 110+ messages in thread
From: Kees Cook @ 2021-10-21 16:17 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro,
	Heiko Carstens, Vasily Gorbik, Christian Borntraeger, linux-s390

On Wed, Oct 20, 2021 at 12:43:57PM -0500, Eric W. Biederman wrote:
> Reading the history it is unclear why default_trap_handler calls
> do_exit.  It is not even mentioned in the commit where the change
> happened.  My best guess is that because it is unknown why the
> exception happened it was desired to guarantee the process never
> returned to userspace.
> 
> Using do_exit(SIGSEGV) has the problem that it will only terminate one
> thread of a process, leaving the process in an undefined state.
> 
> Use force_sigsegv(SIGSEGV) instead which effectively has the same
> behavior except that it uses the ordinary signal mechanism and
> terminates all threads of a process and is generally well defined.
> 
> Cc: Heiko Carstens <hca@linux.ibm.com>
> Cc: Vasily Gorbik <gor@linux.ibm.com>
> Cc: Christian Borntraeger <borntraeger@de.ibm.com>
> Cc: linux-s390@vger.kernel.org
> Fixes: ca2ab03237ec ("[PATCH] s390: core changes")
> History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Reviewed-by: Kees Cook <keescook@chromium.org>
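
The difference between ending one thread and ending the whole process
can be seen from userspace with a small sketch. This is only an analogy
under stated assumptions: pthread_exit() plays the role of do_exit()
(one thread goes away), and kill(getpid(), SIGKILL) plays the role of a
forced fatal signal (every thread goes away). The names kill_mode,
worker() and run_child() are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>

enum kill_mode { EXIT_ONE_THREAD, SIGNAL_WHOLE_PROCESS };

static void *worker(void *arg)
{
	enum kill_mode mode = *(enum kill_mode *)arg;

	if (mode == EXIT_ONE_THREAD)
		pthread_exit(NULL);		/* ends only this thread */
	else
		kill(getpid(), SIGKILL);	/* ends every thread in the process */
	return NULL;
}

/* Runs one worker in a child process and returns the raw wait status. */
static int run_child(enum kill_mode mode)
{
	int status = 0;
	pid_t pid = fork();

	assert(pid >= 0);
	if (pid == 0) {
		pthread_t t;

		if (pthread_create(&t, NULL, worker, &mode) != 0)
			_exit(1);
		pthread_join(t, NULL);
		/* Only reached if the rest of the process survived. */
		_exit(42);
	}
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return status;
}
```

With EXIT_ONE_THREAD the main thread keeps running and the child exits
normally; with SIGNAL_WHOLE_PROCESS the parent sees the child terminated
by the signal. Older glibc versions may need -pthread when linking.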

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 12/20] exit/kthread: Have kernel threads return instead of calling do_exit
  2021-10-20 17:43 ` [PATCH 12/20] exit/kthread: Have kernel threads return instead of calling do_exit Eric W. Biederman
  2021-10-21 11:12   ` Christoph Hellwig
@ 2021-10-21 16:21   ` Kees Cook
  1 sibling, 0 replies; 110+ messages in thread
From: Kees Cook @ 2021-10-21 16:21 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro

On Wed, Oct 20, 2021 at 12:43:58PM -0500, Eric W. Biederman wrote:
> In 2009 Oleg reworked[1] the kernel threads so that it is not
> necessary to call do_exit if you are not using kthread_stop().  Remove
> the explicit calls of do_exit and complete_and_exit (with a NULL
> completion) that were previously necessary.
> 
> [1] 63706172f332 ("kthreads: rework kthread_stop()")
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Looks sensible. Can you check that tools/testing/selftests/firmware
still passes? That test does a fair bit of kthread waiting, etc.

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 13/20] signal: Implement force_fatal_sig
  2021-10-20 17:43 ` [PATCH 13/20] signal: Implement force_fatal_sig Eric W. Biederman
  2021-10-20 20:05   ` Linus Torvalds
@ 2021-10-21 16:24   ` Kees Cook
  2021-10-21 16:33     ` Eric W. Biederman
  1 sibling, 1 reply; 110+ messages in thread
From: Kees Cook @ 2021-10-21 16:24 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro

On Wed, Oct 20, 2021 at 12:43:59PM -0500, Eric W. Biederman wrote:
> Add a simple helper force_fatal_sig that causes a signal to be
> delivered to a process as if the signal handler was set to SIG_DFL.
> 
> Reimplement force_sigsegv based upon this new helper.  This fixes
> force_sigsegv so that when it forces the default signal handler
> to be used the code now forces the signal to be unblocked as well.
> 
> Reusing the tested logic in force_sig_info_to_task that was built for
> force_sig_seccomp this makes the implementation trivial.
> 
> This is interesting both because it makes force_sigsegv simpler and
> because there are a couple of buggy places in the kernel that call
> do_exit(SIGILL) or do_exit(SIGSYS) because there is no
> straightforward way today for those places to simply force the exit of a
> process with the chosen signal.  Creating force_fatal_sig allows
> those places to be implemented with normal signal exits.

I assume this is talking about seccomp()? :) Should a patch be included
in this series to change those?

> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 02/20] exit: Remove calls of do_exit after noreturn versions of die
  2021-10-21 16:02     ` [OpenRISC] " Kees Cook
@ 2021-10-21 16:25       ` Eric W. Biederman
  -1 siblings, 0 replies; 110+ messages in thread
From: Eric W. Biederman @ 2021-10-21 16:25 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-kernel, linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro,
	Jonas Bonn, Stefan Kristiansson, Stafford Horne, openrisc,
	Nick Hu, Greentime Hu, Vincent Chen, Heiko Carstens,
	Vasily Gorbik, Christian Borntraeger, linux-s390, Yoshinori Sato,
	Rich Felker, linux-sh, linux-xtensa, Chris Zankel, Max Filippov

Kees Cook <keescook@chromium.org> writes:

> On Wed, Oct 20, 2021 at 12:43:48PM -0500, Eric W. Biederman wrote:
>> On nds32, openrisc, s390, sh, and xtensa the function die never
>> returns.  Mark die __noreturn so that no one expects die to return.
>> Remove the do_exit calls after die as they will never be reached.
>
> Maybe note that the "bust_spinlocks" calls are also redundant, since
> they're in die(). I note that there is a "mismatch" between the do_exit()
> in die() (SIGSEGV) and after die() (SIGKILL). This patch makes no
> behavioral change (the first caller would "win"), but I thought I'd note
> it in case some architecture would prefer a different signal.

If someone has some strong preferences in the matter of which signal a
wait on a process that has oopsed should return, please let me know.

My next step in cleaning up the uses of do_exit looks like it is going
to be getting all of the architectures to use the same signal for oopses
(aka die), and then introducing a helper (called something like
"make_task_dead" or "oops_task_exit" ) that will replace do_exit on the
oops path and not take a signal number at all.

That helper I can then remove the ptrace break point from and possibly
some of the coredump logic as well.  Ultimately it will be something
we can optimize for the case when we know there is a kernel bug and we
just want the task to exit so the rest of the system can limp along
as best as it can.

Eric

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 14/20] exit/syscall_user_dispatch: Send ordinary signals on failure
  2021-10-20 17:44 ` [PATCH 14/20] exit/syscall_user_dispatch: Send ordinary signals on failure Eric W. Biederman
@ 2021-10-21 16:25   ` Kees Cook
  2021-10-21 16:37     ` Eric W. Biederman
  2021-10-25 22:32     ` Andy Lutomirski
  2021-10-21 16:35   ` Gabriel Krisman Bertazi
  1 sibling, 2 replies; 110+ messages in thread
From: Kees Cook @ 2021-10-21 16:25 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro,
	Gabriel Krisman Bertazi, Thomas Gleixner, Peter Zijlstra,
	Andy Lutomirski

On Wed, Oct 20, 2021 at 12:44:00PM -0500, Eric W. Biederman wrote:
> Use force_fatal_sig instead of calling do_exit directly.  This ensures
> the ordinary signal handling path gets invoked, core dumps as
> appropriate get created, and for multi-threaded processes all of the
> threads are terminated not just a single thread.
> 
> When asked Gabriel Krisman Bertazi <krisman@collabora.com> said [1]:
> > ebiederm@xmission.com (Eric W. Biederman) asked:
> >
> > > Why does do_syscal_user_dispatch call do_exit(SIGSEGV) and
> > > do_exit(SIGSYS) instead of force_sig(SIGSEGV) and force_sig(SIGSYS)?
> > >
> > > Looking at the code these cases are not expected to happen, so I would
> > > be surprised if userspace depends on any particular behaviour on the
> > > failure path so I think we can change this.
> >
> > Hi Eric,
> >
> > There is not really a good reason, and the use case that originated the
> > feature doesn't rely on it.
> >
> > Unless I'm missing yet another problem and others correct me, I think
> > it makes sense to change it as you described.
> >
> > > Is using do_exit in this way something you copied from seccomp?
> >
> > I'm not sure, it's been a while, but I think it might be just that.  The
> > first prototype of SUD was implemented as a seccomp mode.
> 
> If at some point it becomes interesting we could relax
> "force_fatal_sig(SIGSEGV)" to instead say
> "force_sig_fault(SIGSEGV, SEGV_MAPERR, sd->selector)".
> 
> I avoid doing that in this patch to avoid making it possible
> to catch currently uncatchable signals.
> 
> Cc: Gabriel Krisman Bertazi <krisman@collabora.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Andy Lutomirski <luto@kernel.org>
> [1] https://lkml.kernel.org/r/87mtr6gdvi.fsf@collabora.com
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Yeah, looks good. Should be no visible behavior change.

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 13/20] signal: Implement force_fatal_sig
  2021-10-21 16:24   ` Kees Cook
@ 2021-10-21 16:33     ` Eric W. Biederman
  2021-10-21 16:39       ` Kees Cook
  0 siblings, 1 reply; 110+ messages in thread
From: Eric W. Biederman @ 2021-10-21 16:33 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-kernel, linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro

Kees Cook <keescook@chromium.org> writes:

> On Wed, Oct 20, 2021 at 12:43:59PM -0500, Eric W. Biederman wrote:
>> This is interesting both because it makes force_sigsegv simpler and
>> because there are a couple of buggy places in the kernel that call
>> do_exit(SIGILL) or do_exit(SIGSYS) because there is no
>> straightforward way today for those places to simply force the exit of a
>> process with the chosen signal.  Creating force_fatal_sig allows
>> those places to be implemented with normal signal exits.
>
> I assume this is talking about seccomp()? :) Should a patch be included
> in this series to change those?

Actually it is not talking about seccomp.  As far as I can tell seccomp
is deliberately only killing a single thread when it calls do_exit.

I am thinking about places where we really want the entire process to
die and not just a single thread.  Please see the following changes
where I actually use force_fatal_sig.

Eric

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 15/20] signal/sparc32: Exit with a fatal signal when try_to_clear_window_buffer fails
  2021-10-20 17:44 ` [PATCH 15/20] signal/sparc32: Exit with a fatal signal when try_to_clear_window_buffer fails Eric W. Biederman
@ 2021-10-21 16:34   ` Kees Cook
  2021-10-21 16:56     ` Eric W. Biederman
  0 siblings, 1 reply; 110+ messages in thread
From: Kees Cook @ 2021-10-21 16:34 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro,
	David Miller, sparclinux

On Wed, Oct 20, 2021 at 12:44:01PM -0500, Eric W. Biederman wrote:
> The function try_to_clear_window_buffer is only called from
> rtrap_32.c.  After it is called the signal pending state is retested,

nit: rtrap_32.S

> and signals are handled if TIF_SIGPENDING is set.  This allows
> try_to_clear_window_buffer to call force_fatal_sig and then rely on
> the signal being delivered to kill the process, without any danger of
> returning to userspace, or otherwise using possible corrupt state on
> failure.

The TIF_SIGPENDING test happens in do_notify_resume(), though I see
other code before that:

...
        call    try_to_clear_window_buffer
        add    %sp, STACKFRAME_SZ, %o0

        b       signal_p
...
signal_p:
        andcc   %g2, _TIF_DO_NOTIFY_RESUME_MASK, %g0
        bz,a    ret_trap_continue
        ld     [%sp + STACKFRAME_SZ + PT_PSR], %t_psr

        mov     %g2, %o2
        mov     %l6, %o1
        call    do_notify_resume

Will the ret_trap_continue always be skipped?

Also I see the "tp->w_saved = 0" never happens due to the "return" in
try_to_clear_window_buffer. Is that okay? Only synchronize_user_stack()
uses it, and that could be called in do_sigreturn(). Should the "return"
be removed?

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 16/20] signal/sparc32: In setup_rt_frame and setup_fram use force_fatal_sig
  2021-10-20 17:44 ` [PATCH 16/20] signal/sparc32: In setup_rt_frame and setup_fram use force_fatal_sig Eric W. Biederman
@ 2021-10-21 16:34   ` Kees Cook
  0 siblings, 0 replies; 110+ messages in thread
From: Kees Cook @ 2021-10-21 16:34 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro,
	David Miller, sparclinux

On Wed, Oct 20, 2021 at 12:44:02PM -0500, Eric W. Biederman wrote:
> Modify the 32bit version of setup_rt_frame and setup_frame to act
> similar to the 64bit version of setup_rt_frame and fail with a signal
> instead of calling do_exit.
> 
> Replacing do_exit(SIGILL) with force_fatal_sig(SIGILL) ensures that
> the process will be terminated cleanly when the stack frame is
> invalid, instead of just killing off a single thread and leaving the
> process in a weird state.
> 
> Cc: David Miller <davem@davemloft.net>
> Cc: sparclinux@vger.kernel.org
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

Nicely already had the return path written. :)

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 14/20] exit/syscall_user_dispatch: Send ordinary signals on failure
  2021-10-20 17:44 ` [PATCH 14/20] exit/syscall_user_dispatch: Send ordinary signals on failure Eric W. Biederman
  2021-10-21 16:25   ` Kees Cook
@ 2021-10-21 16:35   ` Gabriel Krisman Bertazi
  1 sibling, 0 replies; 110+ messages in thread
From: Gabriel Krisman Bertazi @ 2021-10-21 16:35 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro,
	Kees Cook, Thomas Gleixner, Peter Zijlstra, Andy Lutomirski

"Eric W. Biederman" <ebiederm@xmission.com> writes:

> Use force_fatal_sig instead of calling do_exit directly.  This ensures
> the ordinary signal handling path gets invoked, core dumps as
> appropriate get created, and for multi-threaded processes all of the
> threads are terminated not just a single thread.
>
> When asked Gabriel Krisman Bertazi <krisman@collabora.com> said [1]:
>> ebiederm@xmission.com (Eric W. Biederman) asked:
>>
>> > Why does do_syscal_user_dispatch call do_exit(SIGSEGV) and
>> > do_exit(SIGSYS) instead of force_sig(SIGSEGV) and force_sig(SIGSYS)?
>> >
>> > Looking at the code these cases are not expected to happen, so I would
>> > be surprised if userspace depends on any particular behaviour on the
>> > failure path so I think we can change this.
>>
>> Hi Eric,
>>
>> There is not really a good reason, and the use case that originated the
>> feature doesn't rely on it.
>>
>> Unless I'm missing yet another problem and others correct me, I think
>> it makes sense to change it as you described.
>>
>> > Is using do_exit in this way something you copied from seccomp?
>>
>> I'm not sure, it's been a while, but I think it might be just that.  The
>> first prototype of SUD was implemented as a seccomp mode.
>
> If at some point it becomes interesting we could relax
> "force_fatal_sig(SIGSEGV)" to instead say
> "force_sig_fault(SIGSEGV, SEGV_MAPERR, sd->selector)".
>
> I avoid doing that in this patch to avoid making it possible
> to catch currently uncatchable signals.
>
> Cc: Gabriel Krisman Bertazi <krisman@collabora.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Andy Lutomirski <luto@kernel.org>
> [1] https://lkml.kernel.org/r/87mtr6gdvi.fsf@collabora.com
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  kernel/entry/syscall_user_dispatch.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
>

Hi Eric,

Feel free to add:

Reviewed-by: Gabriel Krisman Bertazi <krisman@collabora.com>

Thanks,

-- 
Gabriel Krisman Bertazi

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 17/20] signal/x86: In emulate_vsyscall force a signal instead of calling do_exit
  2021-10-20 17:44 ` [PATCH 17/20] signal/x86: In emulate_vsyscall force a signal instead of calling do_exit Eric W. Biederman
@ 2021-10-21 16:36   ` Kees Cook
  0 siblings, 0 replies; 110+ messages in thread
From: Kees Cook @ 2021-10-21 16:36 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro,
	Andy Lutomirski

On Wed, Oct 20, 2021 at 12:44:03PM -0500, Eric W. Biederman wrote:
> Directly calling do_exit with a signal number has the problem that
> all of the side effects of the signal don't happen, such as
> killing all of the threads of a process instead of just the
> calling thread.
> 
> So replace do_exit(SIGSYS) with force_fatal_sig(SIGSYS) which
> causes the signal handling to take its normal path and work
> as expected.
> 
> Cc: Andy Lutomirski <luto@kernel.org>
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  arch/x86/entry/vsyscall/vsyscall_64.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c
> index 1b40b9297083..0b6b277ee050 100644
> --- a/arch/x86/entry/vsyscall/vsyscall_64.c
> +++ b/arch/x86/entry/vsyscall/vsyscall_64.c
> @@ -226,7 +226,8 @@ bool emulate_vsyscall(unsigned long error_code,
>  	if ((!tmp && regs->orig_ax != syscall_nr) || regs->ip != address) {
>  		warn_bad_vsyscall(KERN_DEBUG, regs,
>  				  "seccomp tried to change syscall nr or ip");
> -		do_exit(SIGSYS);
> +		force_fatal_sig(SIGSYS);
> +		return true;
>  	}
>  	regs->orig_ax = -1;
>  	if (tmp)

This looks correct to me, but please double-check the x86 selftests if
you haven't already.

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 18/20] exit/rtl8723bs: Replace the macro thread_exit with a simple return 0
  2021-10-20 17:44 ` [PATCH 18/20] exit/rtl8723bs: Replace the macro thread_exit with a simple return 0 Eric W. Biederman
  2021-10-21  7:06   ` Greg KH
@ 2021-10-21 16:37   ` Kees Cook
  1 sibling, 0 replies; 110+ messages in thread
From: Kees Cook @ 2021-10-21 16:37 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro

On Wed, Oct 20, 2021 at 12:44:04PM -0500, Eric W. Biederman wrote:
> Every place thread_exit is called is at the end of a function started
> with kthread_run.  The code in kthread_run has arranged things so a
> kernel thread can just return and do_exit will be called.
> 
> So just have the threads return instead of calling complete_and_exit.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 14/20] exit/syscall_user_dispatch: Send ordinary signals on failure
  2021-10-21 16:25   ` Kees Cook
@ 2021-10-21 16:37     ` Eric W. Biederman
  2021-10-21 16:40       ` Kees Cook
  2021-10-25 22:32     ` Andy Lutomirski
  1 sibling, 1 reply; 110+ messages in thread
From: Eric W. Biederman @ 2021-10-21 16:37 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-kernel, linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro,
	Gabriel Krisman Bertazi, Thomas Gleixner, Peter Zijlstra,
	Andy Lutomirski

Kees Cook <keescook@chromium.org> writes:

> On Wed, Oct 20, 2021 at 12:44:00PM -0500, Eric W. Biederman wrote:
>> Use force_fatal_sig instead of calling do_exit directly.  This ensures
>> the ordinary signal handling path gets invoked, core dumps as
>> appropriate get created, and for multi-threaded processes all of the
>> threads are terminated not just a single thread.
>
> Yeah, looks good. Should be no visible behavior change.

It is observable in that an entire multi-threaded process gets
terminated instead of a single thread.  But since these should be
extraordinary events, I don't expect there is anyone who wants to
have a thread of their process survive.

Eric
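
[Editorial sketch, not part of the original message.]  The observable
difference described above can be approximated from userspace.  This is
only an analogue under stated assumptions: pthread_exit() stands in for
the old single-thread do_exit() behaviour, raise(SIGSYS) with the default
disposition stands in for force_fatal_sig(SIGSYS), and the names
exit_one_thread, raise_fatal_signal and run_child are hypothetical
helpers invented for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Ends only the calling thread; the rest of the process keeps running. */
static void *exit_one_thread(void *arg)
{
	(void)arg;
	pthread_exit(NULL);
}

/* Takes a fatal signal with its default action: the whole process dies. */
static void *raise_fatal_signal(void *arg)
{
	(void)arg;
	raise(SIGSYS);
	return NULL;
}

/*
 * Fork a child, run fn in a second thread there, and return the child's
 * wait status (or -1 on error).
 */
static int run_child(void *(*fn)(void *))
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		pthread_t t;

		/* Make sure SIGSYS has its default (fatal) disposition. */
		signal(SIGSYS, SIG_DFL);
		if (pthread_create(&t, NULL, fn, NULL) != 0)
			_exit(2);
		pthread_join(t, NULL);
		_exit(0);	/* reached only if the main thread survived */
	}
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return status;
}
```

Built with -pthread, run_child(exit_one_thread) should report a normal
exit with status 0 because the main thread survives, while
run_child(raise_fatal_signal) should report termination by SIGSYS for
the whole child process.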


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 19/20] exit/rtl8712: Replace the macro thread_exit with a simple return 0
  2021-10-20 17:44 ` [PATCH 19/20] exit/rtl8712: " Eric W. Biederman
  2021-10-21  7:07   ` Greg KH
@ 2021-10-21 16:37   ` Kees Cook
  1 sibling, 0 replies; 110+ messages in thread
From: Kees Cook @ 2021-10-21 16:37 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro

On Wed, Oct 20, 2021 at 12:44:05PM -0500, Eric W. Biederman wrote:
> The macro thread_exit is called at the end of a function started
> with kthread_run.  The code in kthread_run has arranged things so a
> kernel thread can just return and do_exit will be called.
> 
> So just have the cmd_thread return instead of calling complete_and_exit.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 20/20] exit/r8188eu: Replace the macro thread_exit with a simple return 0
  2021-10-20 17:44 ` [PATCH 20/20] exit/r8188eu: " Eric W. Biederman
  2021-10-21  7:07   ` Greg KH
@ 2021-10-21 16:37   ` Kees Cook
  1 sibling, 0 replies; 110+ messages in thread
From: Kees Cook @ 2021-10-21 16:37 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro

On Wed, Oct 20, 2021 at 12:44:06PM -0500, Eric W. Biederman wrote:
> The macro thread_exit is called at the end of functions started
> with kthread_run.  The code in kthread_run has arranged things so a
> kernel thread can just return and do_exit will be called.
> 
> So just have rtw_cmd_thread and mp_xmit_packet_thread return instead
> of calling complete_and_exit.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 13/20] signal: Implement force_fatal_sig
  2021-10-21 16:33     ` Eric W. Biederman
@ 2021-10-21 16:39       ` Kees Cook
  0 siblings, 0 replies; 110+ messages in thread
From: Kees Cook @ 2021-10-21 16:39 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro

On Thu, Oct 21, 2021 at 11:33:43AM -0500, Eric W. Biederman wrote:
> Kees Cook <keescook@chromium.org> writes:
> 
> > On Wed, Oct 20, 2021 at 12:43:59PM -0500, Eric W. Biederman wrote:
> >> This is interesting both because it makes force_sigsegv simpler and
> >> because there are a couple of buggy places in the kernel that call
> >> do_exit(SIGILL) or do_exit(SIGSYS) because there is no straight
> >> forward way today for those places to simply force the exit of a
> >> process with the chosen signal.  Creating force_fatal_sig allows
> >> those places to be implemented with normal signal exits.
> >
> > I assume this is talking about seccomp()? :) Should a patch be included
> > in this series to change those?
> 
> Actually it is not talking about seccomp.  As far as I can tell seccomp
> is deliberately only killing a single thread when it calls do_exit.

Okay, I wasn't entirely sure, but yes, seccomp wants to keep the "kill
only 1 thread" option, which is weird, but useful for the threaded
seccomp monitor case.

> I am thinking about places where we really want the entire process to
> die and not just a single thread.  Please see the following changes
> where I actually use force_fatal_sig.

Yeah, I saw that now. Thanks!

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 14/20] exit/syscall_user_dispatch: Send ordinary signals on failure
  2021-10-21 16:37     ` Eric W. Biederman
@ 2021-10-21 16:40       ` Kees Cook
  2021-10-21 17:05         ` Eric W. Biederman
  0 siblings, 1 reply; 110+ messages in thread
From: Kees Cook @ 2021-10-21 16:40 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro,
	Gabriel Krisman Bertazi, Thomas Gleixner, Peter Zijlstra,
	Andy Lutomirski

On Thu, Oct 21, 2021 at 11:37:23AM -0500, Eric W. Biederman wrote:
> Kees Cook <keescook@chromium.org> writes:
> 
> > On Wed, Oct 20, 2021 at 12:44:00PM -0500, Eric W. Biederman wrote:
> >> Use force_fatal_sig instead of calling do_exit directly.  This ensures
> >> the ordinary signal handling path gets invoked, core dumps as
> >> appropriate get created, and for multi-threaded processes all of the
> >> threads are terminated not just a single thread.
> >
> > Yeah, looks good. Should be no visible behavior change.
> 
> It is observable in that an entire multi-threaded process gets
> terminated instead of a single thread.  But since these should be
> extraordinary events, I don't expect there is anyone who wants to
> have a thread of their process survive.

Right -- sorry, I should have said that more clearly: "Besides the
single thread death now taking the whole process, there's no behavior
change (i.e. the signal delivery)." Still looks good to me.

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 15/20] signal/sparc32: Exit with a fatal signal when try_to_clear_window_buffer fails
  2021-10-21 16:34   ` Kees Cook
@ 2021-10-21 16:56     ` Eric W. Biederman
  0 siblings, 0 replies; 110+ messages in thread
From: Eric W. Biederman @ 2021-10-21 16:56 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-kernel, linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro,
	David Miller, sparclinux

Kees Cook <keescook@chromium.org> writes:

> On Wed, Oct 20, 2021 at 12:44:01PM -0500, Eric W. Biederman wrote:
>> The function try_to_clear_window_buffer is only called from
>> rtrap_32.c.  After it is called the signal pending state is retested,
>
> nit: rtrap_32.S
>
>> and signals are handled if TIF_SIGPENDING is set.  This allows
>> try_to_clear_window_buffer to call force_fatal_sig and then rely on
>> the signal being delivered to kill the process, without any danger of
>> returning to userspace, or otherwise using possibly corrupt state on
>> failure.
>
> The TIF_SIGPENDING test happens in do_notify_resume(), though I see
> other code before that:
>
> ...
>         call    try_to_clear_window_buffer
>         add    %sp, STACKFRAME_SZ, %o0
>
>         b       signal_p
> ...
> signal_p:
>         andcc   %g2, _TIF_DO_NOTIFY_RESUME_MASK, %g0
>         bz,a    ret_trap_continue
>         ld     [%sp + STACKFRAME_SZ + PT_PSR], %t_psr
>
>         mov     %g2, %o2
>         mov     %l6, %o1
>         call    do_notify_resume
>
> Will the ret_trap_continue always be skipped?

The ret_trap_continue is the break out of the loop.  So unless the code
is not properly setting the signal to be pending the code should be good.

> Also I see the "tp->w_saved = 0" never happens due to the "return" in
> try_to_clear_window_buffer. Is that okay?

It should be.  As you point out the w_saved value is only used in
generating signal frames.  The code in get_signal should never
return and should call do_group_exit which calls do_exit, so building
signal frames, which happens after get_signal returns, should never be
reached.

Further this is the same way the code makes it to do_exit today.

Also looking at it I think the logic is that w_saved == 0
says that the register windows have been saved on the user mode stack,
and that clearly has not happened so I think it would in general
be a bug to clear w_saved on failure.

> Only synchronize_user_stack()
> uses it, and that could be called in do_sigreturn(). Should the "return"
> be removed?

Of course I could be wrong, if David or someone else who knows sparc32
better than me wants to set me straight I would really appreciate it.

Eric


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 10/20] signal/vm86_32: Properly send SIGSEGV when the vm86 state cannot be saved.
  2021-10-21 16:16   ` Kees Cook
@ 2021-10-21 17:02     ` Eric W. Biederman
  2021-10-21 20:33       ` Kees Cook
  0 siblings, 1 reply; 110+ messages in thread
From: Eric W. Biederman @ 2021-10-21 17:02 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-kernel, linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H Peter Anvin

Kees Cook <keescook@chromium.org> writes:

> On Wed, Oct 20, 2021 at 12:43:56PM -0500, Eric W. Biederman wrote:
>> Instead of pretending to send SIGSEGV by calling do_exit(SIGSEGV)
>> call force_sigsegv(SIGSEGV) to force the process to take a SIGSEGV
>> and terminate.
>> 
>> Update handle_signal to return immediately when save_v86_state fails
>> and kills the process.  Returning immediately without doing anything
>> except killing the process with SIGSEGV is also what signal_setup_done
>> does when setup_rt_frame fails.  Plus it is always ok to return
>> immediately without delivering a signal to a userspace handler when a
>> fatal signal has killed the current process.
>
> Do the tools/testing/selftests/x86 tests all pass after these changes? I
> know Andy has a bunch of weird corner cases in there.

That would require a 32bit userspace wouldn't it?

It is a good idea so I will see if I can dig such a box up, but I
unfortunately don't have an up-to-date 32bit box handy, or even
an up-to-date box with a 32bit userspace.

It has been about 20 years since I have done much with 32bit x86.

How hard is it to run the tests under tools/testing/selftests/...?
Last time I tried it was a royal pain.  I am hoping it is better this
round.

Eric

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 14/20] exit/syscall_user_dispatch: Send ordinary signals on failure
  2021-10-21 16:40       ` Kees Cook
@ 2021-10-21 17:05         ` Eric W. Biederman
  0 siblings, 0 replies; 110+ messages in thread
From: Eric W. Biederman @ 2021-10-21 17:05 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-kernel, linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro,
	Gabriel Krisman Bertazi, Thomas Gleixner, Peter Zijlstra,
	Andy Lutomirski

Kees Cook <keescook@chromium.org> writes:

> On Thu, Oct 21, 2021 at 11:37:23AM -0500, Eric W. Biederman wrote:
>> Kees Cook <keescook@chromium.org> writes:
>> 
>> > On Wed, Oct 20, 2021 at 12:44:00PM -0500, Eric W. Biederman wrote:
>> >> Use force_fatal_sig instead of calling do_exit directly.  This ensures
>> >> the ordinary signal handling path gets invoked, core dumps as
>> >> appropriate get created, and for multi-threaded processes all of the
>> >> threads are terminated not just a single thread.
>> >
>> > Yeah, looks good. Should be no visible behavior change.
>> 
>> It is observable in that an entire multi-threaded process gets
>> terminated instead of a single thread.  But since these should be
>> extraordinary events, I don't expect there is anyone who wants to
>> have a thread of their process survive.
>
> Right -- sorry, I should have said that more clearly: "Besides the
> single thread death now taking the whole process, there's no behavior
> change (i.e. the signal delivery)." Still looks good to me.

Yes.  I just didn't want that single vs multi-thread case to sneak up on
people.  Especially since that is part of the questionable behavior that
I am sorting out.

Eric


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 10/20] signal/vm86_32: Properly send SIGSEGV when the vm86 state cannot be saved.
  2021-10-21 17:02     ` Eric W. Biederman
@ 2021-10-21 20:33       ` Kees Cook
  0 siblings, 0 replies; 110+ messages in thread
From: Kees Cook @ 2021-10-21 20:33 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H Peter Anvin

On Thu, Oct 21, 2021 at 12:02:49PM -0500, Eric W. Biederman wrote:
> Kees Cook <keescook@chromium.org> writes:
> 
> > On Wed, Oct 20, 2021 at 12:43:56PM -0500, Eric W. Biederman wrote:
> >> Instead of pretending to send SIGSEGV by calling do_exit(SIGSEGV)
> >> call force_sigsegv(SIGSEGV) to force the process to take a SIGSEGV
> >> and terminate.
> >> 
> >> Update handle_signal to return immediately when save_v86_state fails
> >> and kills the process.  Returning immediately without doing anything
> >> except killing the process with SIGSEGV is also what signal_setup_done
> >> does when setup_rt_frame fails.  Plus it is always ok to return
> >> immediately without delivering a signal to a userspace handler when a
> >> fatal signal has killed the current process.
> >
> > Do the tools/testing/selftests/x86 tests all pass after these changes? I
> > know Andy has a bunch of weird corner cases in there.
> 
> That would require a 32bit userspace wouldn't it?
> 
> It is a good idea so I will see if I can dig such a box up, but I
> unfortunately don't have an up-to-date 32bit box handy, or even
> an up-to-date box with a 32bit userspace.
> 
> It has been about 20 years since I have done much with 32bit x86.

I've done recent ia32 testing just under qemu with a 32bit x86 image.
Since I've got this set up already, I'll give it a spin...

> How hard is it to run the tests under tools/testing/selftests/...?
> Last time I tried it was a royal pain.  I am hoping it is better this
> round.

It _is_ a little weird. :P I do it like this, pulled from the larger docs[1]:

# Build host
$ make -C tools/testing/selftests gen_tar TARGETS="x86" FORMAT=.xz
$ scp $(find tools/testing/selftests -name kselftests.tar.xz) target:

# Target host
$ mkdir kselftests && cd kselftests
$ tar -xaf ../kselftests.tar.xz
$ ./run_kselftest.sh


[1] https://www.kernel.org/doc/html/latest/dev-tools/kselftest.html

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 10/20] signal/vm86_32: Properly send SIGSEGV when the vm86 state cannot be saved.
  2021-10-20 17:43 ` [PATCH 10/20] signal/vm86_32: Properly send SIGSEGV when the vm86 state cannot be saved Eric W. Biederman
  2021-10-21 16:16   ` Kees Cook
@ 2021-10-21 23:08   ` Andy Lutomirski
  2021-10-24 16:06     ` Eric W. Biederman
       [not found]   ` <875ytkygfj.fsf_-_@disp2133>
  2 siblings, 1 reply; 110+ messages in thread
From: Andy Lutomirski @ 2021-10-21 23:08 UTC (permalink / raw)
  To: Eric W. Biederman, Linux Kernel Mailing List
  Cc: linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro, Kees Cook,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	the arch/x86 maintainers, H. Peter Anvin



On Wed, Oct 20, 2021, at 10:43 AM, Eric W. Biederman wrote:
> Instead of pretending to send SIGSEGV by calling do_exit(SIGSEGV)
> call force_sigsegv(SIGSEGV) to force the process to take a SIGSEGV
> and terminate.

Why?  I realize it's more polite, but is this useful enough to justify the need for testing and potential security impacts?

>
> Update handle_signal to return immediately when save_v86_state fails
> and kills the process.  Returning immediately without doing anything
> except killing the process with SIGSEGV is also what signal_setup_done
> does when setup_rt_frame fails.  Plus it is always ok to return
> immediately without delivering a signal to a userspace handler when a
> fatal signal has killed the current process.
>

I can mostly understand the individual sentences, but I don't understand what you're getting at.  If a fatal signal has killed the current process and we are guaranteed not to hit the exit-to-usermode path, then, sure, it's safe to return unless we're worried that the core dump code will explode.

But, unless it's fixed elsewhere in your series, force_sigsegv() is itself quite racy, or at least looks racy -- it can race against another thread calling sigaction() and changing the action to something other than SIG_DFL.  So it does not appear to actually reliably kill the caller, especially if exposed to a malicious user program.



> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: x86@kernel.org
> Cc: H Peter Anvin <hpa@zytor.com>
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  arch/x86/kernel/signal.c  | 6 +++++-
>  arch/x86/kernel/vm86_32.c | 2 +-
>  2 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
> index f4d21e470083..25a230f705c1 100644
> --- a/arch/x86/kernel/signal.c
> +++ b/arch/x86/kernel/signal.c
> @@ -785,8 +785,12 @@ handle_signal(struct ksignal *ksig, struct pt_regs *regs)
>  	bool stepping, failed;
>  	struct fpu *fpu = &current->thread.fpu;
> 
> -	if (v8086_mode(regs))
> +	if (v8086_mode(regs)) {
>  		save_v86_state((struct kernel_vm86_regs *) regs, VM86_SIGNAL);
> +		/* Has save_v86_state failed and killed the process? */
> +		if (fatal_signal_pending(current))
> +			return;

This might be an ABI break, or at least it could be if anyone cared about vm86.  Imagine this wasn't guarded by if (v8086_mode) and was just if (fatal_signal_pending(current)) return;  Then all the other processing gets skipped if a fatal signal is pending (e.g. from a concurrent kill), which could cause visible oddities in a core dump, I think.  Maybe it's minor.

> +	}
> 
>  	/* Are we from a system call? */
>  	if (syscall_get_nr(current, regs) != -1) {
> diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c
> index 63486da77272..040fd01be8b3 100644
> --- a/arch/x86/kernel/vm86_32.c
> +++ b/arch/x86/kernel/vm86_32.c
> @@ -159,7 +159,7 @@ void save_v86_state(struct kernel_vm86_regs *regs, int retval)
>  	user_access_end();
>  Efault:
>  	pr_alert("could not access userspace vm86 info\n");
> -	do_exit(SIGSEGV);
> +	force_sigsegv(SIGSEGV);

This causes us to run unwitting kernel code with the vm86 garbage still loaded into the relevant architectural areas (see the chunk of save_v86_state that's inside preempt_disable()).  So NAK, especially since the aforementioned race might cause the exit-to-usermode path to actually run with who-knows-what consequences.

If you really want to make this change, please arrange for save_v86_state() to switch out of vm86 mode *before* anything that might fail so that it's guaranteed to at least put the task in a sane state.  And write an explicit test case that tests it.  I could help with the latter if you do the former.

--Andy

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 05/20] signal/mips: Update (_save|_restore)_fp_context to fail with -EFAULT
  2021-10-20 17:43 ` [PATCH 05/20] signal/mips: Update (_save|_restore)_fp_context to fail with -EFAULT Eric W. Biederman
  2021-10-21 16:06   ` Kees Cook
@ 2021-10-24  4:24   ` Maciej W. Rozycki
  2021-10-25 20:55     ` Eric W. Biederman
  2021-10-24 15:27   ` Thomas Bogendoerfer
  2 siblings, 1 reply; 110+ messages in thread
From: Maciej W. Rozycki @ 2021-10-24  4:24 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro,
	Kees Cook, Thomas Bogendoerfer, linux-mips

On Wed, 20 Oct 2021, Eric W. Biederman wrote:

> When an instruction to save or restore a register from the stack fails
> in _save_fp_context or _restore_fp_context return with -EFAULT.  This
> change was made to r2300_fpu.S[1] but it looks like it got lost with
> the introduction of EX2[2].  This is also what the other implementation
> of _save_fp_context and _restore_fp_context in r4k_fpu.S does, and
> what is needed for the callers to be able to handle the error.

 Umm, right, good catch, thanks!  I think this ought to be backported.

Acked-by: Maciej W. Rozycki <macro@orcam.me.uk>

  Maciej

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 05/20] signal/mips: Update (_save|_restore)_fp_context to fail with -EFAULT
  2021-10-20 17:43 ` [PATCH 05/20] signal/mips: Update (_save|_restore)_fp_context to fail with -EFAULT Eric W. Biederman
  2021-10-21 16:06   ` Kees Cook
  2021-10-24  4:24   ` Maciej W. Rozycki
@ 2021-10-24 15:27   ` Thomas Bogendoerfer
  2 siblings, 0 replies; 110+ messages in thread
From: Thomas Bogendoerfer @ 2021-10-24 15:27 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro,
	Kees Cook, Maciej Rozycki, linux-mips

On Wed, Oct 20, 2021 at 12:43:51PM -0500, Eric W. Biederman wrote:
> When an instruction to save or restore a register from the stack fails
> in _save_fp_context or _restore_fp_context return with -EFAULT.  This
> change was made to r2300_fpu.S[1] but it looks like it got lost with
> the introduction of EX2[2].  This is also what the other implementation
> of _save_fp_context and _restore_fp_context in r4k_fpu.S does, and
> what is needed for the callers to be able to handle the error.
> 
> Furthermore calling do_exit(SIGSEGV) from bad_stack is wrong because
> it does not terminate the entire process it just terminates a single
> thread.
> 
> As the changed code was the only caller of arch/mips/kernel/syscall.c:bad_stack
> remove the problematic and now unused helper function.
> 
> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
> Cc: Maciej Rozycki <macro@orcam.me.uk>
> Cc: linux-mips@vger.kernel.org
> [1] 35938a00ba86 ("MIPS: Fix ISA I FP sigcontext access violation handling")
> [2] f92722dc4545 ("MIPS: Correct MIPS I FP sigcontext layout")
> Fixes: f92722dc4545 ("MIPS: Correct MIPS I FP sigcontext layout")
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  arch/mips/kernel/r2300_fpu.S | 4 ++--
>  arch/mips/kernel/syscall.c   | 9 ---------
>  2 files changed, 2 insertions(+), 11 deletions(-)

Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>

-- 
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea.                                                [ RFC1925, 2.3 ]

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 10/20] signal/vm86_32: Properly send SIGSEGV when the vm86 state cannot be saved.
  2021-10-21 23:08   ` Andy Lutomirski
@ 2021-10-24 16:06     ` Eric W. Biederman
  0 siblings, 0 replies; 110+ messages in thread
From: Eric W. Biederman @ 2021-10-24 16:06 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linux Kernel Mailing List, linux-arch, Linus Torvalds,
	Oleg Nesterov, Al Viro, Kees Cook, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, the arch/x86 maintainers, H. Peter Anvin

"Andy Lutomirski" <luto@kernel.org> writes:

> On Wed, Oct 20, 2021, at 10:43 AM, Eric W. Biederman wrote:
>> Instead of pretending to send SIGSEGV by calling do_exit(SIGSEGV)
>> call force_sigsegv(SIGSEGV) to force the process to take a SIGSEGV
>> and terminate.
>
> Why?  I realize it's more polite, but is this useful enough to justify
> the need for testing and potential security impacts?

The why is that do_exit as an interface needs to be refactored.

As it exists right now "do_exit" is bad enough that on a couple of older
architectures do_exit in a random location results in being able to
read/write the kernel stack using ptrace.

So to address the issues I need to get everything that really
shouldn't be using do_exit to use something else.

>> Update handle_signal to return immediately when save_v86_state fails
>> and kills the process.  Returning immediately without doing anything
>> except killing the process with SIGSEGV is also what signal_setup_done
>> does when setup_rt_frame fails.  Plus it is always ok to return
>> immediately without delivering a signal to a userspace handler when a
>> fatal signal has killed the current process.
>>
>
> I can mostly understand the individual sentences, but I don't
> understand what you're getting at.  If a fatal signal has killed the
> current process and we are guaranteed not to hit the exit-to-usermode
> path, then, sure, it's safe to return unless we're worried that the
> core dump code will explode.
>
> But, unless it's fixed elsewhere in your series, force_sigsegv() is
> itself quite racy, or at least looks racy -- it can race against
> another thread calling sigaction() and changing the action to
> something other than SIG_DFL.  So it does not appear to actually
> reliably kill the caller, especially if exposed to a malicious user
> program.

You are correct about the races.  I have changes in the works to make
the races go away but that is not an excuse for pushing a change that
is buggy without them.



>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Cc: Borislav Petkov <bp@alien8.de>
>> Cc: x86@kernel.org
>> Cc: H Peter Anvin <hpa@zytor.com>
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>> ---
>>  arch/x86/kernel/signal.c  | 6 +++++-
>>  arch/x86/kernel/vm86_32.c | 2 +-
>>  2 files changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
>> index f4d21e470083..25a230f705c1 100644
>> --- a/arch/x86/kernel/signal.c
>> +++ b/arch/x86/kernel/signal.c
>> @@ -785,8 +785,12 @@ handle_signal(struct ksignal *ksig, struct pt_regs *regs)
>>  	bool stepping, failed;
>>  	struct fpu *fpu = &current->thread.fpu;
>> 
>> -	if (v8086_mode(regs))
>> +	if (v8086_mode(regs)) {
>>  		save_v86_state((struct kernel_vm86_regs *) regs, VM86_SIGNAL);
>> +		/* Has save_v86_state failed and killed the process? */
>> +		if (fatal_signal_pending(current))
>> +			return;
>
> This might be an ABI break, or at least it could be if anyone cared
> about vm86.  Imagine this wasn't guarded by if (v8086_mode) and was
> just if (fatal_signal_pending(current)) return; Then all the other
> processing gets skipped if a fatal signal is pending (e.g. from a
> concurrent kill), which could cause visible oddities in a core dump, I
> think.  Maybe it's minor.

I believe it is minor, because the test happens before anything is
written to userspace.  The worst case is a signal gets dequeued and
then not written to userspace.

On second thought I am not certain this test is even necessary.  Especially
if the change you suggest be made to save_v86_state is made so that
the kernel is out of v86 state and kernel things can safely happen.

>> +	}
>> 
>>  	/* Are we from a system call? */
>>  	if (syscall_get_nr(current, regs) != -1) {
>> diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c
>> index 63486da77272..040fd01be8b3 100644
>> --- a/arch/x86/kernel/vm86_32.c
>> +++ b/arch/x86/kernel/vm86_32.c
>> @@ -159,7 +159,7 @@ void save_v86_state(struct kernel_vm86_regs *regs, int retval)
>>  	user_access_end();
>>  Efault:
>>  	pr_alert("could not access userspace vm86 info\n");
>> -	do_exit(SIGSEGV);
>> +	force_sigsegv(SIGSEGV);
>
> This causes us to run unwitting kernel code with the vm86 garbage
> still loaded into the relevant architectural areas (see the chunk of
> save_v86_state that's inside preempt_disable()).  So NAK, especially
> since the aforementioned race might cause the exit-to-usermode path to
> actually run with who-knows-what consequences.

Fair.  I suspect it might even make the current do_exit call run
with who-knows-what consequence.

> If you really want to make this change, please arrange for
> save_v86_state() to switch out of vm86 mode *before* anything that
> might fail so that it's guaranteed to at least put the task in a sane
> state.  And write an explicit test case that tests it.  I could help
> with the latter if you do the former.

I do really want to remove this do_exit.  If the error was caused by a
kernel malfunction we could do something like die.

As it is the code is effectively hand rolling die/oops for a userspace
caused condition.  Which is quite nasty from a maintenance point of
view.


I think your suggested changes to save_v86_state are much more robust
than my idea of simply calling force_sig... and expecting the kernel
to exit immediately.   Having to go another pass through the
exit_to_usermode_loop does not look like it is very hard to make
it robust against a kernel in a random state.

I could close the race today by replacing the force_sigsegv(SIGSEGV)
with force_sig(SIGKILL).  And that removes the coredump path from
the equation so is a bit interesting, but it really is unsatisfactory.


I will dig in and see what can be done including writing a test so that
this code path gracefully handles -EFAULT rather than tries to walk
through the rest of the kernel in a problematic state.


This change as proposed does not get this save_v86_state case to using
ordinary mechanisms to handle the problem, so as written it does not
solve the problem it set out to solve.

Eric


* Re: [PATCH 05/20] signal/mips: Update (_save|_restore)_fp_context to fail with -EFAULT
  2021-10-24  4:24   ` Maciej W. Rozycki
@ 2021-10-25 20:55     ` Eric W. Biederman
  0 siblings, 0 replies; 110+ messages in thread
From: Eric W. Biederman @ 2021-10-25 20:55 UTC (permalink / raw)
  To: Maciej W. Rozycki
  Cc: linux-kernel, linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro,
	Kees Cook, Thomas Bogendoerfer, linux-mips

"Maciej W. Rozycki" <macro@orcam.me.uk> writes:

> On Wed, 20 Oct 2021, Eric W. Biederman wrote:
>
>> When an instruction to save or restore a register from the stack fails
>> in _save_fp_context or _restore_fp_context return with -EFAULT.  This
>> change was made to r2300_fpu.S[1] but it looks like it got lost with
>> the introduction of EX2[2].  This is also what the other implementation
>> of _save_fp_context and _restore_fp_context in r4k_fpu.S does, and
>> what is needed for the callers to be able to handle the error.
>
>  Umm, right, good catch, thanks!  I think this ought to be backported.
>
> Acked-by: Maciej W. Rozycki <macro@orcam.me.uk>
>

I will add a Cc: stable so it can be backported after it is merged.

Eric


* Re: [PATCH v2 10/32] signal/vm86_32: Properly send SIGSEGV when the vm86 state cannot be saved.
       [not found]   ` <875ytkygfj.fsf_-_@disp2133>
@ 2021-10-25 21:12     ` Linus Torvalds
  2021-10-25 21:28       ` Eric W. Biederman
  2021-10-25 22:25     ` Andy Lutomirski
  1 sibling, 1 reply; 110+ messages in thread
From: Linus Torvalds @ 2021-10-25 21:12 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andy Lutomirski, Linux Kernel Mailing List, linux-arch,
	Oleg Nesterov, Al Viro, Kees Cook, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, the arch/x86 maintainers, H Peter Anvin

On Mon, Oct 25, 2021 at 1:54 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> Update save_v86_state to always complete all of its work except
> possibly some of the copies to userspace even if save_v86_state takes
> a fault.  This ensures that the kernel is always in a sane state, even
> if userspace has done something silly.

Well, honestly, with this change, you might as well replace the
force_sigsegv() with just a plain "force_sig()", and make it something
the process can catch.

The only thing that "force_sigsegv()" does is to make SIGSEGV
uncatchable. In contrast, a plain "force_sig()" just means that it
can't be ignored - but it can be caught, and it is fatal only when not
caught.

And with the "always complete the non-vm86 state restore" part change,
there's really no reason for it to not be caught.

Of course, the other case (where we have no state information for the
"enter vm86 mode" case) is still fatal, and is a "this should never
happen". But the "cannot write to the vm86 save state" thing isn't
technically fatal.

It should even be possible to write a test for it: passing a read-only
pointer to the vm86() system call. The vm86 entry will work (because
it only reads the vm86 state from it), but then at vm86 exit, writing
the state back will fail.

Anybody?

                 Linus
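
The distinction drawn above (a forced signal cannot be ignored, but it can be caught) is visible from userspace with an ordinary page fault. The following is only a sketch under stated assumptions: it is not the vm86() test being asked for, it assumes Linux behaviour for a synchronous SIGSEGV, and fault_in_child(), on_sigsegv() and the mode names are hypothetical helpers invented for this illustration. It borrows the read-only-pointer idea by writing to a PROT_READ mapping.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical modes for this illustration only. */
enum sigsegv_mode { MODE_IGNORE, MODE_CATCH };

/* Handler used in MODE_CATCH: leave with a recognisable exit code. */
static void on_sigsegv(int sig)
{
	(void)sig;
	_exit(42);
}

/*
 * Fork a child, set its SIGSEGV disposition, and make it write to a
 * read-only page.  Returns the child's wait status, or -1 on failure.
 */
static int fault_in_child(enum sigsegv_mode mode)
{
	pid_t pid;
	int status;

	assert(mode == MODE_IGNORE || mode == MODE_CATCH);

	pid = fork();
	if (pid < 0)
		return -1;

	if (pid == 0) {
		struct sigaction sa = { 0 };
		void *map;
		volatile char *page;

		sa.sa_handler = (mode == MODE_CATCH) ? on_sigsegv : SIG_IGN;
		sigemptyset(&sa.sa_mask);
		if (sigaction(SIGSEGV, &sa, NULL) != 0)
			_exit(1);

		map = mmap(NULL, 4096, PROT_READ,
			   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (map == MAP_FAILED)
			_exit(2);

		page = map;
		page[0] = 1;	/* write to a read-only page: faults */
		_exit(3);	/* only reached if the write did not fault */
	}

	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return status;
}
```

With MODE_CATCH the child is expected to leave through the handler; with MODE_IGNORE the SIG_IGN disposition is expected not to save it and the child dies from SIGSEGV. A signal made uncatchable, as force_sigsegv() is described above, would be the third case, which this sketch cannot produce from userspace.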


* Re: [PATCH v2 10/32] signal/vm86_32: Properly send SIGSEGV when the vm86 state cannot be saved.
  2021-10-25 21:12     ` [PATCH v2 10/32] " Linus Torvalds
@ 2021-10-25 21:28       ` Eric W. Biederman
  0 siblings, 0 replies; 110+ messages in thread
From: Eric W. Biederman @ 2021-10-25 21:28 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Linux Kernel Mailing List, linux-arch,
	Oleg Nesterov, Al Viro, Kees Cook, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, the arch/x86 maintainers, H Peter Anvin

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Mon, Oct 25, 2021 at 1:54 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>>
>> Update save_v86_state to always complete all of its work except
>> possibly some of the copies to userspace even if save_v86_state takes
>> a fault.  This ensures that the kernel is always in a sane state, even
>> if userspace has done something silly.
>
> Well, honestly, with this change, you might as well replace the
> force_sigsegv() with just a plain "force_sig()", and make it something
> the process can catch.

The trouble is I don't think there is enough information made available
for user space to do anything with the SIGSEGV.  My memory is that
applications like dosemu very much have a SIGSEGV handler.

So I think if it ever happened it could be quite confusing.  Not to
mention the pr_alert message.

But I guess if a test is written like you suggest we can include enough
information for someone to make sense of things.

> The only thing that "force_sigsegv()" does is to make SIGSEGV
> uncatchable. In contrast, a plain "force_sig()" just means that it
> can't be ignored - but it can be caught, and it is fatal only when not
> caught.
>
> And with the "always complete the non-vm86 state restore" part change,
> there's really no reason for it to not be caught.
>
> Of course, the other case (where we have no state information for the
> "enter vm86 mode" case) is still fatal, and is a "this should never
> happen". But the "cannot write to the vm86 save state" thing isn't
> technically fatal.
>
> It should even be possible to write a test for it: passing a read-only
> pointer to the vm86() system call. The vm86 entry will work (because
> it only reads the vm86 state from it), but then at vm86 exit, writing
> the state back will fail.
>
> Anybody?

I am enthusiastic about writing a test, but I will plod in that
direction just so I can get this sorted out.

Eric


* Re: [PATCH v2 10/32] signal/vm86_32: Properly send SIGSEGV when the vm86 state cannot be saved.
       [not found]   ` <875ytkygfj.fsf_-_@disp2133>
  2021-10-25 21:12     ` [PATCH v2 10/32] " Linus Torvalds
@ 2021-10-25 22:25     ` Andy Lutomirski
  2021-10-25 23:45       ` Linus Torvalds
  1 sibling, 1 reply; 110+ messages in thread
From: Andy Lutomirski @ 2021-10-25 22:25 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro,
	Kees Cook, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H Peter Anvin

On 10/25/21 13:53, Eric W. Biederman wrote:
> 
> Update save_v86_state to always complete all of its work except
> possibly some of the copies to userspace even if save_v86_state takes
> a fault.  This ensures that the kernel is always in a sane state, even
> if userspace has done something silly.
> 
> When save_v86_state takes a fault update it to force userspace to take
> a SIGSEGV and terminate the userspace application.
> 
> As Andy pointed out in review of the first version of this change
> there are races between sigaction and the application terminating.  Now
> that the code has been modified to always perform all save_v86_state's
> work (except possibly copying to userspace) those races do not matter
> from a kernel perspective.
> 
> Forcing the userspace application to terminate (by resetting its
> handler to SIG_DFL) is there to keep everything as close to the current
> behavior as possible while removing the unique (and difficult to
> maintain) use of do_exit.
> 
> If this new SIGSEGV happens during handle_signal the next time around
> the exit_to_user_mode_loop, SIGSEGV will be delivered to userspace.
> 
> All of the callers of handle_vm86_trap and handle_vm86_fault run the
> exit_to_user_mode_loop before they return to userspace, so any signal sent
> to the current task during their execution will be delivered to the
> current task before that task exits to usermode.
> 
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: x86@kernel.org
> Cc: H Peter Anvin <hpa@zytor.com>
> v1: https://lkml.kernel.org/r/20211020174406.17889-10-ebiederm@xmission.com
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
> ---
>   arch/x86/kernel/vm86_32.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
> 
> Andy, does this look better?

Conceptually yes, but:

> 
> I think by just completing all of the work that isn't copying to
> userspace this makes save_v86_state much more robust.
> 
> diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c
> index 63486da77272..933cafab7832 100644
> --- a/arch/x86/kernel/vm86_32.c
> +++ b/arch/x86/kernel/vm86_32.c
> @@ -140,6 +140,7 @@ void save_v86_state(struct kernel_vm86_regs *regs, int retval)
>   
>   	user_access_end();
>   
> +exit_vm86:
>   	preempt_disable();
>   	tsk->thread.sp0 = vm86->saved_sp0;
>   	tsk->thread.sysenter_cs = __KERNEL_CS;
> @@ -159,7 +160,8 @@ void save_v86_state(struct kernel_vm86_regs *regs, int retval)
>   	user_access_end();
>   Efault:
>   	pr_alert("could not access userspace vm86 info\n");
> -	do_exit(SIGSEGV);
> +	force_sigsegv(SIGSEGV);
> +	goto exit_vm86;
>   }
>   
>   static int do_vm86_irq_handling(int subfunction, int irqnumber);
> 

I think the result would be nicer if, instead of adding an extra goto, 
you just literally moved all the cleanup under the unsafe_put_user()s 
above them.  Unless I missed something, none of the put_user stuff reads 
any state that is written by the cleanup code.

--Andy
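
The control flow of the v2 patch quoted above can be modelled in plain C. This is a toy sketch, not the kernel function: struct toy_task, toy_put_user() and toy_save_v86_state() are hypothetical stand-ins, and a NULL destination plays the role of an unwritable userspace pointer. The point it tries to show is that the cleanup under the exit_vm86 label runs whether or not the copy-out faults.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the pieces of task state the patch touches. */
struct toy_task {
	bool in_vm86;		/* vm86 state still loaded */
	bool sigsegv_forced;	/* stand-in for force_sigsegv(SIGSEGV) */
	unsigned long sp0;
	unsigned long saved_sp0;
};

/* Stand-in for the unsafe_put_user() sequence: fails on a NULL slot. */
static int toy_put_user(int *dst, int val)
{
	if (dst == NULL)
		return -1;
	*dst = val;
	return 0;
}

static void toy_save_v86_state(struct toy_task *tsk, int *user_slot,
			       int retval)
{
	assert(tsk != NULL);

	/* The Efault: path only records the signal; it no longer exits. */
	if (toy_put_user(user_slot, retval) != 0)
		tsk->sigsegv_forced = true;

	/* The exit_vm86: part, reached on both paths. */
	tsk->sp0 = tsk->saved_sp0;
	tsk->in_vm86 = false;
}
```

In this model a failed copy leaves the task with a pending fatal signal but out of vm86 state, which is the property the review of v1 asked for.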


* Re: [PATCH 14/20] exit/syscall_user_dispatch: Send ordinary signals on failure
  2021-10-21 16:25   ` Kees Cook
  2021-10-21 16:37     ` Eric W. Biederman
@ 2021-10-25 22:32     ` Andy Lutomirski
  1 sibling, 0 replies; 110+ messages in thread
From: Andy Lutomirski @ 2021-10-25 22:32 UTC (permalink / raw)
  To: Kees Cook, Eric W. Biederman
  Cc: linux-kernel, linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro,
	Gabriel Krisman Bertazi, Thomas Gleixner, Peter Zijlstra

On 10/21/21 09:25, Kees Cook wrote:
> On Wed, Oct 20, 2021 at 12:44:00PM -0500, Eric W. Biederman wrote:
>> Use force_fatal_sig instead of calling do_exit directly.  This ensures
>> the ordinary signal handling path gets invoked, core dumps as
>> appropriate get created, and for multi-threaded processes all of the
>> threads are terminated not just a single thread.
>>
>> When asked Gabriel Krisman Bertazi <krisman@collabora.com> said [1]:
>>> ebiederm@xmission.com (Eric W. Biederman) asked:
>>>
>>>> Why does do_syscal_user_dispatch call do_exit(SIGSEGV) and
>>>> do_exit(SIGSYS) instead of force_sig(SIGSEGV) and force_sig(SIGSYS)?
>>>>
>>>> Looking at the code these cases are not expected to happen, so I would
>>>> be surprised if userspace depends on any particular behaviour on the
>>>> failure path so I think we can change this.
>>>
>>> Hi Eric,
>>>
>>> There is not really a good reason, and the use case that originated the
>>> feature doesn't rely on it.
>>>
>>> Unless I'm missing yet another problem and others correct me, I think
>>> it makes sense to change it as you described.
>>>
>>>> Is using do_exit in this way something you copied from seccomp?
>>>
>>> I'm not sure, it's been a while, but I think it might be just that.  The
>>> first prototype of SUD was implemented as a seccomp mode.
>>
>> If at some point it becomes interesting we could relax
>> "force_fatal_sig(SIGSEGV)" to instead say
>> "force_sig_fault(SIGSEGV, SEGV_MAPERR, sd->selector)".
>>
>> I avoid doing that in this patch to avoid making it possible
>> to catch currently uncatchable signals.
>>
>> Cc: Gabriel Krisman Bertazi <krisman@collabora.com>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Peter Zijlstra <peterz@infradead.org>
>> Cc: Andy Lutomirski <luto@kernel.org>
>> [1] https://lkml.kernel.org/r/87mtr6gdvi.fsf@collabora.com
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> 
> Yeah, looks good. Should be no visible behavior change.
> 
> Reviewed-by: Kees Cook <keescook@chromium.org>
> 

I'm confused.  Before this series, this error path would unconditionally 
kill the task (other than the race condition in force_sigsegv(), but at 
least a well-behaved task would get killed).  Now a signal handler might 
be invoked, and it would be invoked after the syscall that triggered the 
fault got processed as a no-op.  If the signal handler never returns, 
that's fine, but if the signal handler *does* return, the process might 
be in an odd state.  For SIGSYS, this behavior is probably fine, but 
having SIGSEGV swallow a syscall seems like a mistake.

Maybe rewind (approximately!) the syscall?  Or actually send SIGSYS?  Or 
actually make the signal uncatchable?

--Andy


* Re: [PATCH 13/20] signal: Implement force_fatal_sig
  2021-10-20 20:05   ` Linus Torvalds
  2021-10-20 21:25     ` Eric W. Biederman
@ 2021-10-25 22:41     ` Andy Lutomirski
  2021-10-25 23:15       ` Linus Torvalds
  1 sibling, 1 reply; 110+ messages in thread
From: Andy Lutomirski @ 2021-10-25 22:41 UTC (permalink / raw)
  To: Linus Torvalds, Eric W. Biederman, Linux Kernel Mailing List, linux-arch
  Cc: Oleg Nesterov, Al Viro, Kees Cook

On 10/20/21 13:05, Linus Torvalds wrote:
> On Wed, Oct 20, 2021 at 7:45 AM Eric W. Biederman <ebiederm@xmission.com> wrote:
>>
>> Add a simple helper force_fatal_sig that causes a signal to be
>> delivered to a process as if the signal handler was set to SIG_DFL.
>>
>> Reimplement force_sigsegv based upon this new helper.
> 
> Can you just make the old force_sigsegv() go away? The odd special
> casing of SIGSEGV was odd to begin with, I think everybody really just
> wanted this new "force_fatal_sig()" and allow any signal - not making
> SIGSEGV special.
> 

I'm rather nervous about all this, and I'm also nervous about the 
existing code.  A quick skim is finding plenty of code paths that assume 
force_sigsegv (or a do_exit that this series touches) are genuinely 
unrecoverable.  For example:

- rseq: the *kernel* will be fine if a signal is handled, but the 
userspace process may be in a very strange state.

- bprm_execve: The comment says it best:

         /*
          * If past the point of no return ensure the code never
          * returns to the userspace process.  Use an existing fatal
          * signal if present otherwise terminate the process with
          * SIGSEGV.
          */
         if (bprm->point_of_no_return && !fatal_signal_pending(current))
                 force_sigsegv(SIGSEGV);

- vm86: already discussed

Now force_sigsegv() at least tries to kill the task, but not very well. 
With the whole series applied and force_sigsegv() gone, these errors 
become handleable, and that needs real care.

(I don't think bprm_execve() is exploitable.  It looks like it's 
attackable in the window between setting point_of_no_return and 
unshare_sighand(), but I'm not seeing any useful way to attack it unless 
a core dump is already in progress or a *different* fatal signal is 
already pending, and in either of those cases we're fine.)


* Re: [PATCH 13/20] signal: Implement force_fatal_sig
  2021-10-25 22:41     ` Andy Lutomirski
@ 2021-10-25 23:15       ` Linus Torvalds
  2021-10-26  4:45         ` Eric W. Biederman
  2021-10-26  4:57         ` Eric W. Biederman
  0 siblings, 2 replies; 110+ messages in thread
From: Linus Torvalds @ 2021-10-25 23:15 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Eric W. Biederman, Linux Kernel Mailing List, linux-arch,
	Oleg Nesterov, Al Viro, Kees Cook

On Mon, Oct 25, 2021 at 3:41 PM Andy Lutomirski <luto@kernel.org> wrote:
>
> I'm rather nervous about all this, and I'm also nervous about the
> existing code.  A quick skim is finding plenty of code paths that assume
> force_sigsegv (or a do_exit that this series touches) are genuinely
> unrecoverable.

I was going to say "what are you talking about", because clearly Eric
kept it all fatal.

But then looked at that patch a bit more before I claimed you were wrong.

And yeah, Eric's force_fatal_sig() is completely broken.

It claims to force a fatal signal, but doesn't actually do that at
all, and is completely misnamed.

It just uses "force_sig_info_to_task()", which still allows user space
to catch signals - so it's not "fatal" in the least. It only punches
through SIG_IGN and blocked signals.

So yeah, that's broken.

I do still think that that could be the behavior we possibly want for
that "can't write updated vm86 state back" situation, but for
something that is called "fatal", it really needs to be fatal.

            Linus


* Re: [PATCH v2 10/32] signal/vm86_32: Properly send SIGSEGV when the vm86 state cannot be saved.
  2021-10-25 22:25     ` Andy Lutomirski
@ 2021-10-25 23:45       ` Linus Torvalds
  2021-10-26  0:21         ` Andy Lutomirski
  0 siblings, 1 reply; 110+ messages in thread
From: Linus Torvalds @ 2021-10-25 23:45 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Eric W. Biederman, Linux Kernel Mailing List, linux-arch,
	Oleg Nesterov, Al Viro, Kees Cook, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, the arch/x86 maintainers, H Peter Anvin

On Mon, Oct 25, 2021 at 3:25 PM Andy Lutomirski <luto@kernel.org> wrote:
>
> I think the result would be nicer if, instead of adding an extra goto,
> you just literally moved all the cleanup under the unsafe_put_user()s
> above them.  Unless I missed something, none of the put_user stuff reads
> any state that is written by the cleanup code.

Sure it does:

        memcpy(&regs->pt, &vm86->regs32, sizeof(struct pt_regs));

is very much part of the cleanup code, and overwrites that regs->pt thing.

Which is exactly what we're writing back to user space in that
unsafe_put_user() thing.

That said, thinking more about this, and looking at it again, I take
back my statement that we could just make it a catchable SIGSEGV
instead.

If we can't write the vm86 state to user space, we will have
fundamentally lost it, and while it's not fatal to the kernel, and
while we've recovered the original 32-bit state, it's not something
that user space can sanely recover from because the register state at
the end of the vm86 work has now been irrecoverably thrown away.

So I think Eric's patch is fine.

Except, as mentioned as part of the other patch, the "force_sigsegv()"
conversion to use "force_fatal_sig()" was broken, because that
function wasn't actually fatal at all.

             Linus
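
The ordering point about the memcpy() can be shown with a toy model. This is a sketch only: the toy_* types and helpers are hypothetical and reduce the register set to two fields, with an assignment standing in for the unsafe_put_user() sequence. It compares copying out before the restore with copying out after it.

```c
#include <assert.h>
#include <string.h>

/* Toy stand-ins; the real structures carry far more state. */
struct toy_pt_regs {
	unsigned long ip;
	unsigned long sp;
};

struct toy_kernel_vm86_regs {
	struct toy_pt_regs pt;		/* live registers, vm86 values on entry */
};

struct toy_vm86 {
	struct toy_pt_regs regs32;	/* saved 32-bit state */
};

/* Stand-in for the unsafe_put_user() sequence. */
static void toy_copy_out(const struct toy_kernel_vm86_regs *regs,
			 struct toy_pt_regs *user)
{
	*user = regs->pt;
}

/* Stand-in for the cleanup step quoted above. */
static void toy_restore(struct toy_kernel_vm86_regs *regs,
			const struct toy_vm86 *vm86)
{
	memcpy(&regs->pt, &vm86->regs32, sizeof(struct toy_pt_regs));
}

/* Existing order: userspace sees the vm86 register values. */
static void toy_copy_then_restore(struct toy_kernel_vm86_regs *regs,
				  const struct toy_vm86 *vm86,
				  struct toy_pt_regs *user)
{
	assert(regs != NULL && vm86 != NULL && user != NULL);
	toy_copy_out(regs, user);
	toy_restore(regs, vm86);
}

/* Cleanup moved first: userspace sees the 32-bit values instead. */
static void toy_restore_then_copy(struct toy_kernel_vm86_regs *regs,
				  const struct toy_vm86 *vm86,
				  struct toy_pt_regs *user)
{
	assert(regs != NULL && vm86 != NULL && user != NULL);
	toy_restore(regs, vm86);
	toy_copy_out(regs, user);
}
```

In the second ordering the vm86 values are overwritten before anything reads them, which is why the cleanup cannot simply be hoisted above the copy-out.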


* Re: [PATCH v2 10/32] signal/vm86_32: Properly send SIGSEGV when the vm86 state cannot be saved.
  2021-10-25 23:45       ` Linus Torvalds
@ 2021-10-26  0:21         ` Andy Lutomirski
  0 siblings, 0 replies; 110+ messages in thread
From: Andy Lutomirski @ 2021-10-26  0:21 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric W. Biederman, Linux Kernel Mailing List, linux-arch,
	Oleg Nesterov, Al Viro, Kees Cook, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, the arch/x86 maintainers, H. Peter Anvin



On Mon, Oct 25, 2021, at 4:45 PM, Linus Torvalds wrote:
> On Mon, Oct 25, 2021 at 3:25 PM Andy Lutomirski <luto@kernel.org> wrote:
>>
>> I think the result would be nicer if, instead of adding an extra goto,
>> you just literally moved all the cleanup under the unsafe_put_user()s
>> above them.  Unless I missed something, none of the put_user stuff reads
>> any state that is written by the cleanup code.
>
> Sure it does:
>
>         memcpy(&regs->pt, &vm86->regs32, sizeof(struct pt_regs));
>
> is very much part of the cleanup code, and overwrites that regs->pt thing.
>
> Which is exactly what we're writing back to user space in that
> unsafe_put_user() thing.

D’oh, right.

>
> That said, thinking more about this, and looking at it again, I take
> back my statement that we could just make it a catchable SIGSEGV
> instead.
>
> If we can't write the vm86 state to user space, we will have
> fundamentally lost it, and while it's not fatal to the kernel, and
> while we've recovered the original 32-bit state, it's not something
> that user space can sanely recover from because the register state at
> the end of the vm86 work has now been irrecoverably thrown away.

There’s “recoverable” and there’s “recoverable”.  Sure, the vm86 state is
gone, but the process is getting a signal that doesn’t indicate that one
can freely return and carry on as if nothing happened.  But one can catch
the signal and go on to do something else.

>
> So I think Eric's patch is fine.

Me too.

>
> Except, as mentioned as part of the other patch, the "force_sigsegv()"
> conversion to use "force_fatal_sig()" was broken, because that
> function wasn't actually fatal at all.
>
>              Linus


* Re: [PATCH 13/20] signal: Implement force_fatal_sig
  2021-10-25 23:15       ` Linus Torvalds
@ 2021-10-26  4:45         ` Eric W. Biederman
  2021-10-26  4:57         ` Eric W. Biederman
  1 sibling, 0 replies; 110+ messages in thread
From: Eric W. Biederman @ 2021-10-26  4:45 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Linux Kernel Mailing List, linux-arch,
	Oleg Nesterov, Al Viro, Kees Cook

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Mon, Oct 25, 2021 at 3:41 PM Andy Lutomirski <luto@kernel.org> wrote:
>>
>> I'm rather nervous about all this, and I'm also nervous about the
>> existing code.  A quick skim is finding plenty of code paths that assume
>> force_sigsegv (or a do_exit that this series touches) are genuinely
>> unrecoverable.
>
> I was going to say "what are you talking about", because clearly Eric
> kept it all fatal.
>
> But then looked at that patch a bit more before I claimed you were wrong.
>
> And yeah, Eric's force_fatal_sig() is completely broken.
>
> It claims to force a fatal signal, but doesn't actually do that at
> all, and is completely misnamed.
>
> It just uses "force_sig_info_to_task()", which still allows user space
> to catch signals - so it's not "fatal" in the least. It only punches
> through SIG_IGN and blocked signals.
>
> So yeah, that's broken.
>
> I do still think that that could be the behavior we possibly want for
> that "can't write updated vm86 state back" situation, but for
> something that is called "fatal", it really needs to be fatal.

Once the code gets as far as force_sig_info_to_task the only
bit that is really missing to make the signals fatal is:

diff --git a/kernel/signal.c b/kernel/signal.c
index 6a5e1802b9a2..fde043f1e59d 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1048,7 +1048,6 @@ static void complete_signal(int sig, struct task_struct *p, enum pid_type type)
                /*
                 * This signal will be fatal to the whole group.
                 */
-               if (!sig_kernel_coredump(sig)) {
                        /*
                         * Start a group exit and wake everybody up.
                         * This way we don't have other threads
@@ -1065,7 +1064,6 @@ static void complete_signal(int sig, struct task_struct *p, enum pid_type type)
                                signal_wake_up(t, 1);
                        } while_each_thread(p, t);
                        return;
-               }
        }
 
        /*

AKA the only real bit missing is the interaction with the coredump code.

Now we can't just delete sig_kernel_coredump; a replacement has to be
written.  And the easiest replacement depends on my other set of
changes that are already in linux-next to make coredumps per
signal_struct instead of per mm.

Which means that in a release or two force_fatal_sig will reliably do
what the name says.




So the question is: Should I rename force_fatal_sig to something else in
the meantime?  What should I name it?




I do intend to fix that bit in complete_signal, as well as updating the
code in force_sig_info_to_task so that it doesn't need to change the
blocked state or the signal handler.

These special cases have been annoying me for years and now Andy has
found how they are actually hurting us.  So I do intend to fix that code
as quickly as being careful and code review allows.  Which I think means
one additional development cycle after this one.

Eric
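
The gate that the diff above would delete can be sketched as a predicate. This is a hedged illustration, not kernel code: toy_sig_coredump() and toy_group_exit_at_send() are hypothetical names, and the list of core-dumping signals is an assumption based on the usual Linux default actions rather than something stated in this thread.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/*
 * Assumed set of signals whose default action dumps core on Linux.
 * Stand-in for sig_kernel_coredump(); not copied from the kernel.
 */
static bool toy_sig_coredump(int sig)
{
	switch (sig) {
	case SIGQUIT:
	case SIGILL:
	case SIGTRAP:
	case SIGABRT:
	case SIGFPE:
	case SIGSEGV:
	case SIGBUS:
	case SIGSYS:
	case SIGXCPU:
	case SIGXFSZ:
		return true;
	default:
		return false;
	}
}

/*
 * For a signal already known to be fatal to the whole group: does the
 * send path start the group exit itself, or is that left to the
 * receiving side so that a core dump can be taken first?
 */
static bool toy_group_exit_at_send(int sig)
{
	assert(sig > 0);
	return !toy_sig_coredump(sig);
}
```

Under these assumptions SIGKILL and SIGTERM begin the group exit at send time, while SIGSEGV does not, which is the coredump interaction described above.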



* Re: [PATCH 13/20] signal: Implement force_fatal_sig
  2021-10-25 23:15       ` Linus Torvalds
  2021-10-26  4:45         ` Eric W. Biederman
@ 2021-10-26  4:57         ` Eric W. Biederman
  2021-10-26 16:15           ` Linus Torvalds
  1 sibling, 1 reply; 110+ messages in thread
From: Eric W. Biederman @ 2021-10-26  4:57 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Linux Kernel Mailing List, linux-arch,
	Oleg Nesterov, Al Viro, Kees Cook

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Mon, Oct 25, 2021 at 3:41 PM Andy Lutomirski <luto@kernel.org> wrote:
>>
>> I'm rather nervous about all this, and I'm also nervous about the
>> existing code.  A quick skim is finding plenty of code paths that assume
>> force_sigsegv (or a do_exit that this series touches) are genuinely
>> unrecoverable.
>
> I was going to say "what are you talking about", because clearly Eric
> kept it all fatal.
>
> But then looked at that patch a bit more before I claimed you were wrong.
>
> And yeah, Eric's force_fatal_sig() is completely broken.
>
> It claims to force a fatal signal, but doesn't actually do that at
> all, and is completely misnamed.
>
> It just uses "force_sig_info_to_task()", which still allows user space
> to catch signals - so it's not "fatal" in the least. It only punches
> through SIG_IGN and blocked signals.

Rereading this I think you might be misreading something.

force_sig_info_to_task takes a sigdfl parameter which I am setting in
force_fatal_sig.

When that sigdfl parameter is set force_sig_info_to_task always changes
the signal handler to SIG_DFL, and always unblocks the signal.

Because the siglock remains held over send_signal none of those
properties can change during send_signal.  Which means that as long
as we are not talking about a coredump signal complete_signal is
guaranteed to recognize the signal as fatal immediately.

For coredump signals there is a race where siglock is dropped before
get_signal is called that could result in the signal handler being
changed or the signal being blocked.  Which is why I pointed out the
problem is coredumps.

But assuming userspace does not change something in that narrow window
the signal will most definitely be fatal to the target process.

Just as soon as I know if we can have per signal_struct coredumps
without causing regressions I will close the final race.  I can do it
either way but the code is much less complicated with per signal_struct
coredumps.

Hoisting the current zap_threads from fs/coredump.c into complete_signal
is a pain and a half.  Meanwhile the per signal_struct part is already
there, and the code just needs a few tweaks to allow get_signal to act
as the coredump rendezvous location.

Eric
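
The effect of the sigdfl parameter described above can be written as a small state model. This is a simplified sketch, not the kernel implementation: the toy_* names are hypothetical, and the rule that a blocked or ignored signal also falls back to the default action without sigdfl is an assumption consistent with the "punches through SIG_IGN and blocked signals" description earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one signal's state in the target task. */
enum toy_disposition { TOY_DEFAULT, TOY_IGNORE, TOY_HANDLER };

struct toy_sigstate {
	enum toy_disposition disp;
	bool blocked;
};

enum toy_outcome { TOY_HANDLER_RUNS, TOY_DEFAULT_ACTION };

/*
 * Model of forcing a signal.  With sigdfl set the disposition is always
 * reset and the signal always unblocked.  Without it, only a blocked or
 * ignored signal is reset (assumption, see lead-in).
 */
static enum toy_outcome toy_force_sig(struct toy_sigstate *st, bool sigdfl)
{
	assert(st != NULL);

	if (st->blocked || st->disp == TOY_IGNORE || sigdfl) {
		st->disp = TOY_DEFAULT;
		st->blocked = false;
	}

	return st->disp == TOY_HANDLER ? TOY_HANDLER_RUNS
				       : TOY_DEFAULT_ACTION;
}
```

In this model the only case where a userspace handler still runs is an installed, unblocked handler with sigdfl clear. The model deliberately omits the window between sending and dequeueing a coredump signal that is described above.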


* Re: [PATCH 11/20] signal/s390: Use force_sigsegv in default_trap_handler
  2021-10-20 17:43 ` [PATCH 11/20] signal/s390: Use force_sigsegv in default_trap_handler Eric W. Biederman
  2021-10-21 16:17   ` Kees Cook
@ 2021-10-26  9:38   ` Christian Borntraeger
  2021-10-28 15:56     ` Eric W. Biederman
  1 sibling, 1 reply; 110+ messages in thread
From: Christian Borntraeger @ 2021-10-26  9:38 UTC (permalink / raw)
  To: Eric W. Biederman, linux-kernel
  Cc: linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro, Kees Cook,
	Heiko Carstens, Vasily Gorbik, linux-s390

Am 20.10.21 um 19:43 schrieb Eric W. Biederman:
> Reading the history it is unclear why default_trap_handler calls
> do_exit.  It is not even mentioned in the commit where the change
> happened.  My best guess is that because it is unknown why the
> exception happened it was desired to guarantee the process never
> returned to userspace.
> 
> Using do_exit(SIGSEGV) has the problem that it will only terminate one
> thread of a process, leaving the process in an undefined state.
> 
> Use force_sigsegv(SIGSEGV) instead which effectively has the same
> behavior except that it uses the ordinary signal mechanism and
> terminates all threads of a process and is generally well defined.

Do I get that right, that programs cannot block SIGSEGV from force_sigsegv
with a signal handler? That's how I read the code. If this is true
then

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
> 
> Cc: Heiko Carstens <hca@linux.ibm.com>
> Cc: Vasily Gorbik <gor@linux.ibm.com>
> Cc: Christian Borntraeger <borntraeger@de.ibm.com>
> Cc: linux-s390@vger.kernel.org
> Fixes: ca2ab03237ec ("[PATCH] s390: core changes")
> History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>   arch/s390/kernel/traps.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/s390/kernel/traps.c b/arch/s390/kernel/traps.c
> index bcefc2173de4..51729ea2cf8e 100644
> --- a/arch/s390/kernel/traps.c
> +++ b/arch/s390/kernel/traps.c
> @@ -84,7 +84,7 @@ static void default_trap_handler(struct pt_regs *regs)
>   {
>   	if (user_mode(regs)) {
>   		report_user_fault(regs, SIGSEGV, 0);
> -		do_exit(SIGSEGV);
> +		force_sigsegv(SIGSEGV);
>   	} else
>   		die(regs, "Unknown program exception");
>   }
> 


* Re: [PATCH 13/20] signal: Implement force_fatal_sig
  2021-10-26  4:57         ` Eric W. Biederman
@ 2021-10-26 16:15           ` Linus Torvalds
  2021-10-28 16:33             ` Eric W. Biederman
  0 siblings, 1 reply; 110+ messages in thread
From: Linus Torvalds @ 2021-10-26 16:15 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andy Lutomirski, Linux Kernel Mailing List, linux-arch,
	Oleg Nesterov, Al Viro, Kees Cook

On Mon, Oct 25, 2021 at 9:58 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> Rereading this I think you might be misreading something.

Gaah. Yes, indeed.

> force_sig_info_to_task takes a sigdfl parameter which I am setting in
> force_fatal_sig.

.. and I realized that the first time I read through it, but then when
I read through it due to Andy saying it worries him, I missed it and
thought the handler didn't get reset.

So the patch is fine.

             Linus


* Re: [PATCH 06/20] signal/sh: Use force_sig(SIGKILL) instead of do_group_exit(SIGKILL)
  2021-10-20 19:57   ` Linus Torvalds
@ 2021-10-27 14:24     ` Rich Felker
  0 siblings, 0 replies; 110+ messages in thread
From: Rich Felker @ 2021-10-27 14:24 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric W. Biederman, Linux Kernel Mailing List, linux-arch,
	Oleg Nesterov, Al Viro, Kees Cook, Yoshinori Sato, Linux-sh list

On Wed, Oct 20, 2021 at 09:57:58AM -1000, Linus Torvalds wrote:
> On Wed, Oct 20, 2021 at 7:44 AM Eric W. Biederman <ebiederm@xmission.com> wrote:
> >
> > +                       force_sig(SIGKILL);
> 
> I wonder if SIGFPE would be a more intuitive thing.
> 
> Doesn't really matter, this is a "doesn't happen" event anyway, but
> that was just my reaction to reading the patch.

I think SIGKILL makes more sense unless there's a way the process
could handle the resulting SIGFPE and recover. I'd actually like to
see the lazy allocation of FPU state just removed (the amount of space
saved is tiny relative to the complexity cost and the negative aspects
of unrecoverable late failure) but for now let's just go with this.

Rich

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 11/20] signal/s390: Use force_sigsegv in default_trap_handler
  2021-10-26  9:38   ` Christian Borntraeger
@ 2021-10-28 15:56     ` Eric W. Biederman
  2021-10-29 19:32       ` Eric W. Biederman
  0 siblings, 1 reply; 110+ messages in thread
From: Eric W. Biederman @ 2021-10-28 15:56 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: linux-kernel, linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro,
	Kees Cook, Heiko Carstens, Vasily Gorbik, linux-s390

Christian Borntraeger <borntraeger@de.ibm.com> writes:

> Am 20.10.21 um 19:43 schrieb Eric W. Biederman:
>> Reading the history it is unclear why default_trap_handler calls
>> do_exit.  It is not even mentioned in the commit where the change
>> happened.  My best guess is that because it is unknown why the
>> exception happened it was desired to guarantee the process never
>> returned to userspace.
>>
>> Using do_exit(SIGSEGV) has the problem that it will only terminate one
>> thread of a process, leaving the process in an undefined state.
>>
>> Use force_sigsegv(SIGSEGV) instead which effectively has the same
>> behavior except that it uses the ordinary signal mechanism and
>> terminates all threads of a process and is generally well defined.
>
> Do I get that right, that programs can not block SIGSEGV from force_sigsegv
> with a signal handler? That's how I read the code. If this is true
> then
>
> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>

99% true, and it is what force_sigsegv(SIGSEGV) intends to do.

Andy Lutomirski pointed at a race where a thread can call sigaction
and change the signal handler after force_sigsegv has run but before
the process dequeues the SIGSEGV.

In principle it isn't too hard to close that race, and I was hoping to
be able to tell you that I had it sorted by the time I replied.
Unfortunately it looks like it will take another week or two, so it will
probably not be ready by the merge window.

I am definitely going to close that race.

Eric


>> Cc: Heiko Carstens <hca@linux.ibm.com>
>> Cc: Vasily Gorbik <gor@linux.ibm.com>
>> Cc: Christian Borntraeger <borntraeger@de.ibm.com>
>> Cc: linux-s390@vger.kernel.org
>> Fixes: ca2ab03237ec ("[PATCH] s390: core changes")
>> History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>> ---
>>   arch/s390/kernel/traps.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/s390/kernel/traps.c b/arch/s390/kernel/traps.c
>> index bcefc2173de4..51729ea2cf8e 100644
>> --- a/arch/s390/kernel/traps.c
>> +++ b/arch/s390/kernel/traps.c
>> @@ -84,7 +84,7 @@ static void default_trap_handler(struct pt_regs *regs)
>>   {
>>   	if (user_mode(regs)) {
>>   		report_user_fault(regs, SIGSEGV, 0);
>> -		do_exit(SIGSEGV);
>> +		force_sigsegv(SIGSEGV);
>>   	} else
>>   		die(regs, "Unknown program exception");
>>   }
>>

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 13/20] signal: Implement force_fatal_sig
  2021-10-26 16:15           ` Linus Torvalds
@ 2021-10-28 16:33             ` Eric W. Biederman
  0 siblings, 0 replies; 110+ messages in thread
From: Eric W. Biederman @ 2021-10-28 16:33 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Linux Kernel Mailing List, linux-arch,
	Oleg Nesterov, Al Viro, Kees Cook

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Mon, Oct 25, 2021 at 9:58 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>>
>> Rereading this I think you might be misreading something.
>
> Gaah. Yes, indeed.
>
>> force_siginfo_to_task takes a sigdfl parameter which I am setting in
>> force_fatal_signal.
>
> .. and I realized that the first time I read through it, but then when
> I read through it due to Andy saying it worries him, I missed it and
> thought the handler didn't get reset.
>
> So the patch is fine.

Thank you.

Andy is right that there is a race with sigaction changing the signal
handler.

To make complete_signal reliably recognize fatal signals is going to
take a bit of work.  None of it particularly hard but there are a lot of
pieces that need to be changed carefully.  Part of the problem is that
recognizing fatal signals early started out as an optimization, and
there remain places in complete_signal and prepare_signal that assume it
is always possible to not recognize fatal signals early and let
get_signal handle things.

The coredump handling of it is the biggest challenge.  Computing a
destination thread with wants_signal in complete_signal before dealing
with fatal signals (which don't care) means that wants_signal can
cause the early recognition of fatal signals to fail.

The entire blocked vs real_blocked set of signal masks is interesting.
The current way those signal masks work does not appear to allow knowing
when blocked is being overridden and blocked is stored in real_blocked.
That is something I think needs to be known if fatal signals are always
going to be recognized early.

Given that race it looks like it is important to make the guarantee that
fatal signals are always recognized early, so the rest of the kernel
can count on that property.

But I aim to get this set of changes into linux-next first, as I don't
think anything will melt down if we leave that race for a bit longer.

Eric

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 11/20] signal/s390: Use force_sigsegv in default_trap_handler
  2021-10-28 15:56     ` Eric W. Biederman
@ 2021-10-29 19:32       ` Eric W. Biederman
  0 siblings, 0 replies; 110+ messages in thread
From: Eric W. Biederman @ 2021-10-29 19:32 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: linux-kernel, linux-arch, Linus Torvalds, Oleg Nesterov, Al Viro,
	Kees Cook, Heiko Carstens, Vasily Gorbik, linux-s390

ebiederm@xmission.com (Eric W. Biederman) writes:

> Christian Borntraeger <borntraeger@de.ibm.com> writes:
>
>> Am 20.10.21 um 19:43 schrieb Eric W. Biederman:
>>> Reading the history it is unclear why default_trap_handler calls
>>> do_exit.  It is not even mentioned in the commit where the change
>>> happened.  My best guess is that because it is unknown why the
>>> exception happened it was desired to guarantee the process never
>>> returned to userspace.
>>>
>>> Using do_exit(SIGSEGV) has the problem that it will only terminate one
>>> thread of a process, leaving the process in an undefined state.
>>>
>>> Use force_sigsegv(SIGSEGV) instead which effectively has the same
>>> behavior except that it uses the ordinary signal mechanism and
>>> terminates all threads of a process and is generally well defined.
>>
>> Do I get that right, that programs can not block SIGSEGV from force_sigsegv
>> with a signal handler? That's how I read the code. If this is true
>> then
>>
>> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
>
> 99% true, and it is what force_sigsegv(SIGSEGV) intends to do.
>
> Andy Lutomirski pointed at a race where a thread can call sigaction
> and change the signal handler after force_sigsegv has run but before
> the process dequeues the SIGSEGV.

I now have a simple patch that closes the sigaction vs force_sig race,
which I am adding to this set of changes.  So now I can say programs
cannot block force_sigsegv(SIGSEGV) with a signal handler or by any other
method.
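
A related behavior can be observed from userspace with an ordinary fault,
which the kernel also delivers through the force_sig family. This is only a
sketch under stated assumptions: it runs on Linux, it exercises a page fault
rather than the s390 default_trap_handler path, and
fatal_signal_despite_block is a hypothetical helper name. It shows that
blocking and ignoring SIGSEGV does not stop a forced SIGSEGV; it does not
demonstrate the handler-reset behavior specific to force_sigsegv(SIGSEGV).

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child that ignores and blocks SIGSEGV, then
 * touches an inaccessible page.  Returns the signal that terminated the
 * child, 0 if the child exited normally, or -1 on setup failure.
 */
static int fatal_signal_despite_block(void)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;

	if (pid == 0) {
		struct rlimit no_core = { 0, 0 };
		sigset_t set;
		volatile char *page;

		setrlimit(RLIMIT_CORE, &no_core);	/* avoid leaving a core file */

		signal(SIGSEGV, SIG_IGN);
		sigemptyset(&set);
		sigaddset(&set, SIGSEGV);
		sigprocmask(SIG_BLOCK, &set, NULL);

		/* A PROT_NONE page: any access to it faults. */
		page = mmap(NULL, 4096, PROT_NONE,
			    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (page == MAP_FAILED)
			_exit(2);

		page[0] = 1;	/* faults; the kernel forces SIGSEGV */
		_exit(0);	/* not reached if the signal was forced */
	}

	if (waitpid(pid, &status, 0) != pid)
		return -1;

	return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

On Linux the helper is expected to return SIGSEGV, because a forced signal
that is blocked or ignored has its disposition reset to the default and is
unblocked before delivery.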

Eric

>>> Cc: Heiko Carstens <hca@linux.ibm.com>
>>> Cc: Vasily Gorbik <gor@linux.ibm.com>
>>> Cc: Christian Borntraeger <borntraeger@de.ibm.com>
>>> Cc: linux-s390@vger.kernel.org
>>> Fixes: ca2ab03237ec ("[PATCH] s390: core changes")
>>> History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
>>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>>> ---
>>>   arch/s390/kernel/traps.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/arch/s390/kernel/traps.c b/arch/s390/kernel/traps.c
>>> index bcefc2173de4..51729ea2cf8e 100644
>>> --- a/arch/s390/kernel/traps.c
>>> +++ b/arch/s390/kernel/traps.c
>>> @@ -84,7 +84,7 @@ static void default_trap_handler(struct pt_regs *regs)
>>>   {
>>>   	if (user_mode(regs)) {
>>>   		report_user_fault(regs, SIGSEGV, 0);
>>> -		do_exit(SIGSEGV);
>>> +		force_sigsegv(SIGSEGV);
>>>   	} else
>>>   		die(regs, "Unknown program exception");
>>>   }
>>>

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 09/20] signal/vm86_32: Replace open coded BUG_ON with an actual BUG_ON
  2021-10-20 17:43 ` [PATCH 09/20] signal/vm86_32: Replace open coded BUG_ON with an actual BUG_ON Eric W. Biederman
  2021-10-21 16:15   ` Kees Cook
@ 2021-11-12 15:40   ` Eric W. Biederman
  2021-11-12 17:51     ` Brian Gerst
  1 sibling, 1 reply; 110+ messages in thread
From: Eric W. Biederman @ 2021-11-12 15:40 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Oleg Nesterov, Al Viro, Kees Cook,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H Peter Anvin, Andy Lutomirski

"Eric W. Biederman" <ebiederm@xmission.com> writes:

> The function save_v86_state is only called when userspace was
> operating in vm86 mode before entering the kernel.  Not having vm86
> state in the task_struct should never happen.  So transform the hand
> rolled BUG_ON into an actual BUG_ON to make it clear what is
> happening.

Now that this change has been merged into Linus' tree I have a report
that it is possible to trigger this new BUG_ON.  Which obviously is not
good.

We could revert the change but I think that would just be shooting the
messenger.

Does anyone have an idea where to start to track down what is going on?

A very quick skim through the code suggests that the only code path
that calls save_v86_state that has not already accessed
current->thread.vm86 is handle_signal.

Another quick look suggests that the only place where X86_VM_MASK gets
set in eflags is in do_sys_vm86.  So it appears do_sys_vm86 must
be called before v8086_mode returns true in handle_signal.

Which seems to suggest that the BUG_ON can't trigger.

But that is obviously wrong.

I will keep digging but if anyone has some ideas that would be appreciated.

Eric


> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: x86@kernel.org
> Cc: H Peter Anvin <hpa@zytor.com>
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  arch/x86/kernel/vm86_32.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c
> index e5a7a10a0164..63486da77272 100644
> --- a/arch/x86/kernel/vm86_32.c
> +++ b/arch/x86/kernel/vm86_32.c
> @@ -106,10 +106,8 @@ void save_v86_state(struct kernel_vm86_regs *regs, int retval)
>  	 */
>  	local_irq_enable();
>  
> -	if (!vm86 || !vm86->user_vm86) {
> -		pr_alert("no user_vm86: BAD\n");
> -		do_exit(SIGSEGV);
> -	}
> +	BUG_ON(!vm86 || !vm86->user_vm86);
> +
>  	set_flags(regs->pt.flags, VEFLAGS, X86_EFLAGS_VIF | vm86->veflags_mask);
>  	user = vm86->user_vm86;

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 09/20] signal/vm86_32: Replace open coded BUG_ON with an actual BUG_ON
  2021-11-12 15:40   ` Eric W. Biederman
@ 2021-11-12 17:51     ` Brian Gerst
  2021-11-12 19:57       ` Eric W. Biederman
  0 siblings, 1 reply; 110+ messages in thread
From: Brian Gerst @ 2021-11-12 17:51 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Kernel Mailing List, Linus Torvalds, Oleg Nesterov,
	Al Viro, Kees Cook, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, the arch/x86 maintainers, H Peter Anvin,
	Andy Lutomirski

On Fri, Nov 12, 2021 at 10:41 AM Eric W. Biederman
<ebiederm@xmission.com> wrote:
>
> "Eric W. Biederman" <ebiederm@xmission.com> writes:
>
> > The function save_v86_state is only called when userspace was
> > operating in vm86 mode before entering the kernel.  Not having vm86
> > state in the task_struct should never happen.  So transform the hand
> > rolled BUG_ON into an actual BUG_ON to make it clear what is
> > happening.
>
> Now that this change has been merged into Linus' tree I have a report
> that it is possible to trigger this new BUG_ON.  Which obviously is not
> good.
>
> We could revert the change but I think that would just be shooting the
> messenger.
>
> Does anyone have an idea where to start to track down what is going on?
>
> A very quick skim through the code suggests that the only code path
> that calls save_v86_state that has not already accessed
> current->thread.vm86 is handle_signal.
>
> Another quick look suggests that the only place where X86_VM_MASK gets
> set in eflags is in do_sys_vm86.  So it appears do_sys_vm86 must
> be called before v8086_mode returns true in handle_signal.
>
> Which seems to suggest that the bug on can't trigger.
>
> But that is obviously wrong.
>
> I will keep digging but if anyone has some ideas that would be appreciated.
>
> Eric

It's possible that a null pointer was passed to the vm86 syscall.
Since vm86 mode usually requires memory to be mapped at address 0 this
wouldn't trigger a fault when reading the vm86_struct data.  It should
be fine to remove !vm86->user_vm86 from the BUG_ON(), since the write
to userspace can handle a fault.

--
Brian Gerst

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 09/20] signal/vm86_32: Replace open coded BUG_ON with an actual BUG_ON
  2021-11-12 17:51     ` Brian Gerst
@ 2021-11-12 19:57       ` Eric W. Biederman
  2021-11-12 20:40         ` Linus Torvalds
  0 siblings, 1 reply; 110+ messages in thread
From: Eric W. Biederman @ 2021-11-12 19:57 UTC (permalink / raw)
  To: Brian Gerst
  Cc: Linux Kernel Mailing List, Linus Torvalds, Oleg Nesterov,
	Al Viro, Kees Cook, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, the arch/x86 maintainers, H Peter Anvin,
	Andy Lutomirski

Brian Gerst <brgerst@gmail.com> writes:

> On Fri, Nov 12, 2021 at 10:41 AM Eric W. Biederman
> <ebiederm@xmission.com> wrote:
>>
>> "Eric W. Biederman" <ebiederm@xmission.com> writes:
>>
>> > The function save_v86_state is only called when userspace was
>> > operating in vm86 mode before entering the kernel.  Not having vm86
>> > state in the task_struct should never happen.  So transform the hand
>> > rolled BUG_ON into an actual BUG_ON to make it clear what is
>> > happening.
>>
>> Now that this change has been merged into Linus' tree I have a report
>> that it is possible to trigger this new BUG_ON.  Which obviously is not
>> good.
>>
>> We could revert the change but I think that would just be shooting the
>> messenger.
>>
>> Does anyone have an idea where to start to track down what is going on?
>>
>> A very quick skim through the code suggests that the only code path
>> that calls save_v86_state that has not already accessed
>> current->thread.vm86 is handle_signal.
>>
>> Another quick look suggests that the only place where X86_VM_MASK gets
>> set in eflags is in do_sys_vm86.  So it appears do_sys_vm86 must
>> be called before v8086_mode returns true in handle_signal.
>>
>> Which seems to suggest that the bug on can't trigger.
>>
>> But that is obviously wrong.
>>
>> I will keep digging but if anyone has some ideas that would be appreciated.
>>
>> Eric
>
> It's possible that a null pointer was passed to the vm86 syscall.
> Since vm86 mode usually requires memory to be mapped at address 0 this
> wouldn't trigger a fault when reading the vm86_struct data.  It should
> be fine to remove !vm86->user_vm86 from the BUG_ON(), since the write
> to userspace can handle a fault.

Agreed, and that is plausible.

Still user space would have had to have mapped address 0 to get that
value set in do_sys_vm86.

I will see about cooking up a patch for that case regardless.  I am not
quite convinced, but perhaps the easiest way to tell is to simply remove
the unnecessary test and ask the fuzzer folks to see if they can still
hit the BUG_ON.

Eric


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 09/20] signal/vm86_32: Replace open coded BUG_ON with an actual BUG_ON
  2021-11-12 19:57       ` Eric W. Biederman
@ 2021-11-12 20:40         ` Linus Torvalds
  2021-11-12 21:03           ` Eric W. Biederman
  0 siblings, 1 reply; 110+ messages in thread
From: Linus Torvalds @ 2021-11-12 20:40 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Brian Gerst, Linux Kernel Mailing List, Oleg Nesterov, Al Viro,
	Kees Cook, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	the arch/x86 maintainers, H Peter Anvin, Andy Lutomirski

On Fri, Nov 12, 2021 at 11:57 AM Eric W. Biederman
<ebiederm@xmission.com> wrote:
>
> Still user space would have had to have mapped address 0 to get that
> value set in do_sys_vm86.

You have to map address 0 anyway just to get vm86 mode to work.

vm86 mode fundamentally requires the low 1MB of virtual memory to be
mapped, since there is no virtual memory offset in the vm86 model.
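
The reason the low 1MB matters follows from 8086-style address formation,
which vm86 mode uses: the linear address is simply the segment shifted left
by four bits plus the offset, with no base that could relocate it. A small
sketch of that arithmetic; vm86_linear is an illustrative name, not a kernel
function:

```c
#include <stdint.h>

/* Real-mode / vm86 address formation: linear = segment * 16 + offset. */
static uint32_t vm86_linear(uint16_t seg, uint16_t off)
{
	return ((uint32_t)seg << 4) + off;
}
```

With this formula segment 0, offset 0 is linear address 0, and the largest
reachable address is vm86_linear(0xffff, 0xffff), slightly above the 1MB
mark, so everything a vm86 task can touch sits at the very bottom of the
address space.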

             Linus

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 09/20] signal/vm86_32: Replace open coded BUG_ON with an actual BUG_ON
  2021-11-12 20:40         ` Linus Torvalds
@ 2021-11-12 21:03           ` Eric W. Biederman
  2021-11-12 21:23             ` Linus Torvalds
  0 siblings, 1 reply; 110+ messages in thread
From: Eric W. Biederman @ 2021-11-12 21:03 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Brian Gerst, Linux Kernel Mailing List, Oleg Nesterov, Al Viro,
	Kees Cook, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	the arch/x86 maintainers, H Peter Anvin, Andy Lutomirski

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Fri, Nov 12, 2021 at 11:57 AM Eric W. Biederman
> <ebiederm@xmission.com> wrote:
>>
>> Still user space would have had to have mapped address 0 to get that
>> value set in do_sys_vm86.
>
> You have to map address 0 anyway just to get vm86 mode to work.
>
> vm86 mode fundamentally requires the low 1MB of virtual memory to be
> mapped, since there is no virtual memory offset in the vm86 model.

True.

However that also means if struct vm86plus_struct is at address 0
instead of the 16bit interrupt table something is badly wrong.

Still, if we are going to check for userspace being silly, that check
should be in do_sys_vm86.

I have managed to get the fuzzer that hit the problem to run and with
the test for !vm86->user_vm86 removed the BUG_ON is not being hit.

I am going to keep running it for a bit just to make certain, and
then I will put together a proper patch.

Eric

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 09/20] signal/vm86_32: Replace open coded BUG_ON with an actual BUG_ON
  2021-11-12 21:03           ` Eric W. Biederman
@ 2021-11-12 21:23             ` Linus Torvalds
  2021-11-12 21:24               ` Linus Torvalds
  0 siblings, 1 reply; 110+ messages in thread
From: Linus Torvalds @ 2021-11-12 21:23 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Brian Gerst, Linux Kernel Mailing List, Oleg Nesterov, Al Viro,
	Kees Cook, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	the arch/x86 maintainers, H Peter Anvin, Andy Lutomirski

On Fri, Nov 12, 2021 at 1:04 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> Still if we are going to check for userspace being silly that it should
> be in do_sys_vm86.

Sure, something like

        if (!user_vm86)
                return -EINVAL;

in do_sys_vm86() sounds fine to me.

It could in theory break some odd test-case, but I can't see anybody
putting the vm86 save area at 0 in a real situation.

But I could see some quick test hack doing it - the IVT at boot is
actually not at zero, but at fffxxxxx. 8086 is magic.
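
The entry-point check suggested above can be modelled outside the kernel.
This is only a sketch: struct model_vm86 and model_do_sys_vm86 are
hypothetical stand-ins invented for illustration, not the kernel's types or
the real do_sys_vm86. The point is that a NULL save area is rejected with
-EINVAL before any state is recorded, so later code never needs to test it.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-task vm86 state; not the kernel's struct. */
struct model_vm86 {
	void *user_vm86;	/* userspace save area supplied by the caller */
};

/*
 * Hypothetical model of the entry-point check: reject a NULL save area
 * with -EINVAL before any state is recorded.
 */
static int model_do_sys_vm86(struct model_vm86 *vm86, void *user_vm86)
{
	if (!user_vm86)
		return -EINVAL;

	vm86->user_vm86 = user_vm86;
	return 0;
}
```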

              Linus

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 09/20] signal/vm86_32: Replace open coded BUG_ON with an actual BUG_ON
  2021-11-12 21:23             ` Linus Torvalds
@ 2021-11-12 21:24               ` Linus Torvalds
  2021-11-12 21:37                 ` [GIT PULL ] signal/vm86_32: Remove pointless test in BUG_ON Eric W. Biederman
  2021-11-12 21:43                 ` [PATCH 09/20] signal/vm86_32: Replace open coded BUG_ON with an actual BUG_ON Eric W. Biederman
  0 siblings, 2 replies; 110+ messages in thread
From: Linus Torvalds @ 2021-11-12 21:24 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Brian Gerst, Linux Kernel Mailing List, Oleg Nesterov, Al Viro,
	Kees Cook, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	the arch/x86 maintainers, H Peter Anvin, Andy Lutomirski

On Fri, Nov 12, 2021 at 1:23 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> But I could see some quick test hack doing it - the IVT at boot is
> actually not at zero, but at fffxxxxx. 8086 is magic.

.. and it's been too long, and I'm too lazy to check - it may be that
vm86 mode doesn't even do that magic boot-time address thing.

It's not like we really care about vm86 mode any more, since pretty
much nobody uses it.

                  Linus

^ permalink raw reply	[flat|nested] 110+ messages in thread

* [GIT PULL ] signal/vm86_32: Remove pointless test in BUG_ON
  2021-11-12 21:24               ` Linus Torvalds
@ 2021-11-12 21:37                 ` Eric W. Biederman
  2021-11-13 19:15                   ` pr-tracker-bot
  2021-11-12 21:43                 ` [PATCH 09/20] signal/vm86_32: Replace open coded BUG_ON with an actual BUG_ON Eric W. Biederman
  1 sibling, 1 reply; 110+ messages in thread
From: Eric W. Biederman @ 2021-11-12 21:37 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Brian Gerst, Linux Kernel Mailing List, Oleg Nesterov, Al Viro,
	Kees Cook, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	the arch/x86 maintainers, H Peter Anvin, Andy Lutomirski


Linus,

Please pull the exit-cleanups-for-v5.16 branch from the git tree:

  git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git exit-cleanups-for-v5.16

  HEAD: c7a9b6471c8ee6a2180fc5f2f7a1e284754bdfc5 signal/vm86_32: Remove pointless test in BUG_ON


This branch has only one unpulled change.  Just the removal of an
unnecessary test from a BUG_ON.  Which in my running of the fuzzer
locally fixes the issue.

kernel test robot <oliver.sang@intel.com> writes[1]:
>
> Greeting,
>
> FYI, we noticed the following commit (built with gcc-9):
>
> commit: 1a4d21a23c4ca7467726be7db9ae8077a62b2c62 ("signal/vm86_32: Replace open coded BUG_ON with an actual BUG_ON")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> in testcase: trinity
> version: trinity-static-i386-x86_64-1c734c75-1_2020-01-06
> with following parameters:
>
>
> [ 70.645554][ T3747] kernel BUG at arch/x86/kernel/vm86_32.c:109!
> [ 70.646185][ T3747] invalid opcode: 0000 [#1] SMP
> [ 70.646682][ T3747] CPU: 0 PID: 3747 Comm: trinity-c6 Not tainted 5.15.0-rc1-00009-g1a4d21a23c4c #1
> [ 70.647598][ T3747] EIP: save_v86_state (arch/x86/kernel/vm86_32.c:109 (discriminator 3))
> [ 70.648113][ T3747] Code: 89 c3 64 8b 35 60 b8 25 c2 83 ec 08 89 55 f0 8b 96 10 19 00 00 89 55 ec e8 c6 2d 0c 00 fb 8b 55 ec 85 d2 74 05 83 3a 00 75 02 <0f> 0b 8b 86 10 19 00 00 8b 4b 38 8b 78 48 31 cf 89 f8 8b 7a 4c 81
> [ 70.650136][ T3747] EAX: 00000001 EBX: f5f49fac ECX: 0000000b EDX: f610b600
> [ 70.650852][ T3747] ESI: f5f79cc0 EDI: f5f79cc0 EBP: f5f49f04 ESP: f5f49ef0
> [ 70.651593][ T3747] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010246
> [ 70.652413][ T3747] CR0: 80050033 CR2: 00004000 CR3: 35fc7000 CR4: 000406d0
> [ 70.653169][ T3747] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> [ 70.653897][ T3747] DR6: fffe0ff0 DR7: 00000400
> [ 70.654382][ T3747] Call Trace:
> [ 70.654719][ T3747] arch_do_signal_or_restart (arch/x86/kernel/signal.c:792 arch/x86/kernel/signal.c:867)
> [ 70.655288][ T3747] exit_to_user_mode_prepare (kernel/entry/common.c:174 kernel/entry/common.c:209)
> [ 70.655854][ T3747] irqentry_exit_to_user_mode (kernel/entry/common.c:126 kernel/entry/common.c:317)
> [ 70.656450][ T3747] irqentry_exit (kernel/entry/common.c:406)
> [ 70.656897][ T3747] exc_page_fault (arch/x86/mm/fault.c:1535)
> [ 70.657369][ T3747] ? sysvec_kvm_asyncpf_interrupt (arch/x86/mm/fault.c:1488)
> [ 70.657989][ T3747] handle_exception (arch/x86/entry/entry_32.S:1085)

vm86_32.c:109 is: "BUG_ON(!vm86 || !vm86->user_vm86)"

When trying to understand the failure Brian Gerst pointed out[2] that
the code does not need protection against vm86->user_vm86 being NULL.
The copy_from_user code already handles that case if the address
is going to fault.

Looking further I realized that if we care about not allowing struct
vm86plus_struct at address 0 it should be do_sys_vm86 (the system
call) that does the filtering, not code way down deep in save_v86_state
when the emulation has completed.

So let's just remove the silly case of attempting to filter a
userspace address with a BUG_ON.  Existing userspace can't break and
it won't make the kernel any more attackable as the userspace access
helpers will handle it, if it isn't a good userspace pointer.
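
The general principle that the userspace access helpers turn a bad pointer
into an error rather than a crash is visible from any system call. A
minimal sketch, assuming Linux; errno_for_bad_user_pointer is a hypothetical
helper, and it exercises write(2) on a pipe rather than the vm86 path, so
it illustrates the mechanism only by analogy:

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: hand the kernel an unreadable user pointer via
 * write(2) on a pipe and return the resulting errno (0 if the write
 * unexpectedly succeeded, -1 on setup failure).
 */
static int errno_for_bad_user_pointer(void)
{
	int fds[2];
	int err = 0;
	void *bad;

	if (pipe(fds) != 0)
		return -1;

	/* A PROT_NONE page is a valid mapping that cannot be read. */
	bad = mmap(NULL, 4096, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (bad == MAP_FAILED) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	/* The kernel's copy from userspace faults and reports an error. */
	if (write(fds[1], bad, 1) < 0)
		err = errno;

	munmap(bad, 4096);
	close(fds[0]);
	close(fds[1]);
	return err;
}
```

The helper is expected to return EFAULT: the kernel notices the fault while
copying and fails the call, with no need for an explicit pointer test.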

I ran the reproducer the fuzzer gave me before I made this change
and it reproduced the failure; after I made this change I have not seen
the reported failure.  So it does look like this fixes the reported
issue.

[1] https://lkml.kernel.org/r/20211112074030.GB19820@xsang-OptiPlex-9020
[2] https://lkml.kernel.org/r/CAMzpN2jkK5sAv-Kg_kVnCEyVySiqeTdUORcC=AdG1gV6r8nUew@mail.gmail.com
Suggested-by: Brian Gerst <brgerst@gmail.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Tested-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 arch/x86/kernel/vm86_32.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c
index f14f69d7aa3c..cce1c89cb7df 100644
--- a/arch/x86/kernel/vm86_32.c
+++ b/arch/x86/kernel/vm86_32.c
@@ -106,7 +106,7 @@ void save_v86_state(struct kernel_vm86_regs *regs, int retval)
 	 */
 	local_irq_enable();
 
-	BUG_ON(!vm86 || !vm86->user_vm86);
+	BUG_ON(!vm86);
 
 	set_flags(regs->pt.flags, VEFLAGS, X86_EFLAGS_VIF | vm86->veflags_mask);
 	user = vm86->user_vm86;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 110+ messages in thread

* Re: [PATCH 09/20] signal/vm86_32: Replace open coded BUG_ON with an actual BUG_ON
  2021-11-12 21:24               ` Linus Torvalds
  2021-11-12 21:37                 ` [GIT PULL ] signal/vm86_32: Remove pointless test in BUG_ON Eric W. Biederman
@ 2021-11-12 21:43                 ` Eric W. Biederman
  1 sibling, 0 replies; 110+ messages in thread
From: Eric W. Biederman @ 2021-11-12 21:43 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Brian Gerst, Linux Kernel Mailing List, Oleg Nesterov, Al Viro,
	Kees Cook, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	the arch/x86 maintainers, H Peter Anvin, Andy Lutomirski

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Fri, Nov 12, 2021 at 1:23 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> But I could see some quick test hack doing it - the IVT at boot is
>> actually not at zero, but at fffxxxxx. 8086 is magic.
>
> .. and it's been too long, and I'm too lazy to check - it may be that
> vm86 mode doesn't even do that magic boot-time address thing.
>
> It's not like we really care about vm86 mode any more, since pretty
> much nobody uses it.

As I recall at boot CS == 0xffff0000 EIP == 0x0000fff0 and the cpu is in
16bit mode.  Which means the cpu runs the instructions in the last
16 bytes of memory at boot up.  Which is just enough for a jump somewhere
else.  Such as 64K backwards where there is enough space to actually have
enough code to do something.
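
The arithmetic behind "the last 16 bytes" can be checked directly. This
sketch takes the values quoted above at face value and treats 0xffff0000 as
the base address of the code segment, which is an assumption on my part;
the function names are illustrative only.

```c
#include <stdint.h>

/* Linear address of the first instruction fetched: CS base plus EIP. */
static uint32_t reset_linear(uint32_t cs_base, uint32_t eip)
{
	return cs_base + eip;
}

/* Bytes from `linear` up to the top of the 32-bit address space. */
static uint64_t bytes_to_top(uint32_t linear)
{
	return (UINT64_C(1) << 32) - linear;
}
```

With those inputs reset_linear(0xffff0000, 0x0000fff0) is 0xfffffff0, and
bytes_to_top(0xfffffff0) is 16, which leaves room for little more than a
jump.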

I don't think vm86 even attempts to emulate that behavior as it is only
concerned about 16bit only cpus and emulation.

In the nobody cares camp I have just sent you a pull request to remove
the ancient (except it wasn't a BUG_ON) and problematic test in the
BUG_ON.

I think that is enough to resolve this.

Eric

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [GIT PULL ] signal/vm86_32: Remove pointless test in BUG_ON
  2021-11-12 21:37                 ` [GIT PULL ] signal/vm86_32: Remove pointless test in BUG_ON Eric W. Biederman
@ 2021-11-13 19:15                   ` pr-tracker-bot
  0 siblings, 0 replies; 110+ messages in thread
From: pr-tracker-bot @ 2021-11-13 19:15 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linus Torvalds, Brian Gerst, Linux Kernel Mailing List,
	Oleg Nesterov, Al Viro, Kees Cook, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, the arch/x86 maintainers, H Peter Anvin,
	Andy Lutomirski

The pull request you sent on Fri, 12 Nov 2021 15:37:11 -0600:

> git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git exit-cleanups-for-v5.16

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/d4fa09e514cdb51fc7a2289c445c44ba0c87117b

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html

^ permalink raw reply	[flat|nested] 110+ messages in thread

end of thread, other threads:[~2021-11-13 19:15 UTC | newest]

Thread overview: 110+ messages
2021-10-20 17:32 [PATCH 00/20] exit cleanups Eric W. Biederman
2021-10-20 17:32 ` [OpenRISC] " Eric W. Biederman
2021-10-20 17:32 ` Eric W. Biederman
2021-10-20 17:43 ` [PATCH 01/20] exit/doublefault: Remove apparently bogus comment about rewind_stack_do_exit Eric W. Biederman
2021-10-21 16:02   ` Kees Cook
2021-10-20 17:43 ` [PATCH 02/20] exit: Remove calls of do_exit after noreturn versions of die Eric W. Biederman
2021-10-20 17:43   ` [OpenRISC] " Eric W. Biederman
2021-10-21 16:02   ` Kees Cook
2021-10-21 16:02     ` [OpenRISC] " Kees Cook
2021-10-21 16:25     ` Eric W. Biederman
2021-10-21 16:25       ` [OpenRISC] " Eric W. Biederman
2021-10-20 17:43 ` [PATCH 03/20] reboot: Remove the unreachable panic after do_exit in reboot(2) Eric W. Biederman
2021-10-21 16:05   ` Kees Cook
2021-10-20 17:43 ` [PATCH 04/20] signal/sparc32: Remove unreachable do_exit in do_sparc_fault Eric W. Biederman
2021-10-21 16:05   ` Kees Cook
2021-10-20 17:43 ` [PATCH 05/20] signal/mips: Update (_save|_restore)_fp_context to fail with -EFAULT Eric W. Biederman
2021-10-21 16:06   ` Kees Cook
2021-10-24  4:24   ` Maciej W. Rozycki
2021-10-25 20:55     ` Eric W. Biederman
2021-10-24 15:27   ` Thomas Bogendoerfer
2021-10-20 17:43 ` [PATCH 06/20] signal/sh: Use force_sig(SIGKILL) instead of do_group_exit(SIGKILL) Eric W. Biederman
2021-10-20 19:57   ` Linus Torvalds
2021-10-27 14:24     ` Rich Felker
2021-10-21 16:08   ` Kees Cook
2021-10-20 17:43 ` [PATCH 07/20] signal/powerpc: On swapcontext failure force SIGSEGV Eric W. Biederman
2021-10-20 17:43   ` Eric W. Biederman
2021-10-21 16:09   ` Kees Cook
2021-10-21 16:09     ` Kees Cook
2021-10-20 17:43 ` [PATCH 08/20] signal/sparc: In setup_tsb_params convert open coded BUG into BUG Eric W. Biederman
2021-10-21 16:12   ` Kees Cook
2021-10-20 17:43 ` [PATCH 09/20] signal/vm86_32: Replace open coded BUG_ON with an actual BUG_ON Eric W. Biederman
2021-10-21 16:15   ` Kees Cook
2021-11-12 15:40   ` Eric W. Biederman
2021-11-12 17:51     ` Brian Gerst
2021-11-12 19:57       ` Eric W. Biederman
2021-11-12 20:40         ` Linus Torvalds
2021-11-12 21:03           ` Eric W. Biederman
2021-11-12 21:23             ` Linus Torvalds
2021-11-12 21:24               ` Linus Torvalds
2021-11-12 21:37                 ` [GIT PULL ] signal/vm86_32: Remove pointless test in BUG_ON Eric W. Biederman
2021-11-13 19:15                   ` pr-tracker-bot
2021-11-12 21:43                 ` [PATCH 09/20] signal/vm86_32: Replace open coded BUG_ON with an actual BUG_ON Eric W. Biederman
2021-10-20 17:43 ` [PATCH 10/20] signal/vm86_32: Properly send SIGSEGV when the vm86 state cannot be saved Eric W. Biederman
2021-10-21 16:16   ` Kees Cook
2021-10-21 17:02     ` Eric W. Biederman
2021-10-21 20:33       ` Kees Cook
2021-10-21 23:08   ` Andy Lutomirski
2021-10-24 16:06     ` Eric W. Biederman
     [not found]   ` <875ytkygfj.fsf_-_@disp2133>
2021-10-25 21:12     ` [PATCH v2 10/32] " Linus Torvalds
2021-10-25 21:28       ` Eric W. Biederman
2021-10-25 22:25     ` Andy Lutomirski
2021-10-25 23:45       ` Linus Torvalds
2021-10-26  0:21         ` Andy Lutomirski
2021-10-20 17:43 ` [PATCH 11/20] signal/s390: Use force_sigsegv in default_trap_handler Eric W. Biederman
2021-10-21 16:17   ` Kees Cook
2021-10-26  9:38   ` Christian Borntraeger
2021-10-28 15:56     ` Eric W. Biederman
2021-10-29 19:32       ` Eric W. Biederman
2021-10-20 17:43 ` [PATCH 12/20] exit/kthread: Have kernel threads return instead of calling do_exit Eric W. Biederman
2021-10-21 11:12   ` Christoph Hellwig
2021-10-21 15:11     ` Eric W. Biederman
2021-10-21 16:21   ` Kees Cook
2021-10-20 17:43 ` [PATCH 13/20] signal: Implement force_fatal_sig Eric W. Biederman
2021-10-20 20:05   ` Linus Torvalds
2021-10-20 21:25     ` Eric W. Biederman
2021-10-25 22:41     ` Andy Lutomirski
2021-10-25 23:15       ` Linus Torvalds
2021-10-26  4:45         ` Eric W. Biederman
2021-10-26  4:57         ` Eric W. Biederman
2021-10-26 16:15           ` Linus Torvalds
2021-10-28 16:33             ` Eric W. Biederman
2021-10-21 16:24   ` Kees Cook
2021-10-21 16:33     ` Eric W. Biederman
2021-10-21 16:39       ` Kees Cook
2021-10-20 17:44 ` [PATCH 14/20] exit/syscall_user_dispatch: Send ordinary signals on failure Eric W. Biederman
2021-10-21 16:25   ` Kees Cook
2021-10-21 16:37     ` Eric W. Biederman
2021-10-21 16:40       ` Kees Cook
2021-10-21 17:05         ` Eric W. Biederman
2021-10-25 22:32     ` Andy Lutomirski
2021-10-21 16:35   ` Gabriel Krisman Bertazi
2021-10-20 17:44 ` [PATCH 15/20] signal/sparc32: Exit with a fatal signal when try_to_clear_window_buffer fails Eric W. Biederman
2021-10-21 16:34   ` Kees Cook
2021-10-21 16:56     ` Eric W. Biederman
2021-10-20 17:44 ` [PATCH 16/20] signal/sparc32: In setup_rt_frame and setup_fram use force_fatal_sig Eric W. Biederman
2021-10-21 16:34   ` Kees Cook
2021-10-20 17:44 ` [PATCH 17/20] signal/x86: In emulate_vsyscall force a signal instead of calling do_exit Eric W. Biederman
2021-10-21 16:36   ` Kees Cook
2021-10-20 17:44 ` [PATCH 18/20] exit/rtl8723bs: Replace the macro thread_exit with a simple return 0 Eric W. Biederman
2021-10-21  7:06   ` Greg KH
2021-10-21 15:06     ` Eric W. Biederman
2021-10-21 16:37   ` Kees Cook
2021-10-20 17:44 ` [PATCH 19/20] exit/rtl8712: " Eric W. Biederman
2021-10-21  7:07   ` Greg KH
2021-10-21 16:37   ` Kees Cook
2021-10-20 17:44 ` [PATCH 20/20] exit/r8188eu: " Eric W. Biederman
2021-10-21  7:07   ` Greg KH
2021-10-21 16:37   ` Kees Cook
2021-10-20 21:51 ` [PATCH 21/20] signal: Replace force_sigsegv(SIGSEGV) with force_fatal_sig(SIGSEGV) Eric W. Biederman
2021-10-20 21:51   ` [OpenRISC] " Eric W. Biederman
2021-10-20 21:51   ` Eric W. Biederman
2021-10-21  8:09   ` Geert Uytterhoeven
2021-10-21  8:09     ` [OpenRISC] " Geert Uytterhoeven
2021-10-21  8:09     ` Geert Uytterhoeven
2021-10-21 13:33     ` Eric W. Biederman
2021-10-21 13:33       ` [OpenRISC] " Eric W. Biederman
2021-10-21 13:33       ` Eric W. Biederman
2021-10-21  8:32   ` Philippe Mathieu-Daudé
2021-10-21  8:32     ` [OpenRISC] " Philippe Mathieu-Daudé
2021-10-21  8:32     ` Philippe Mathieu-Daudé
