* [PATCH] do_exit(): panic() when double fault detected
@ 2020-12-06 13:10 Vladimir Kondratiev
  2020-12-06 15:37 ` [NEEDS-REVIEW] " Dave Hansen
  2020-12-07 10:40 ` Andy Shevchenko
  0 siblings, 2 replies; 4+ messages in thread
From: Vladimir Kondratiev @ 2020-12-06 13:10 UTC (permalink / raw)
  To: Jonathan Corbet, Luis Chamberlain, Kees Cook, Iurii Zaikin,
	Paul E. McKenney, Andrew Morton, Randy Dunlap, Thomas Gleixner,
	Mauro Carvalho Chehab, Mike Kravetz, Guilherme G. Piccoli,
	Andy Shevchenko, Kars Mulder, Lorenzo Pieralisi,
	Kishon Vijay Abraham I, Arvind Sankar, Joe Perches,
	Rafael Aquini, Eric W. Biederman, Christian Brauner,
	Alexei Starovoitov, Peter Zijlstra (Intel),
	Davidlohr Bueso, Michel Lespinasse, Jann Horn, chenqiwu,
	Minchan Kim, Christophe Leroy
  Cc: Vladimir Kondratiev, linux-doc, linux-kernel, linux-fsdevel

A double fault detected in do_exit() is a symptom of compromised
integrity. For safety-critical systems, it may be better to
panic() in this case to minimize risk.

Signed-off-by: Vladimir Kondratiev <vladimir.kondratiev@intel.com>
---
 Documentation/admin-guide/kernel-parameters.txt | 5 +++++
 include/linux/kernel.h                          | 1 +
 kernel/exit.c                                   | 7 +++++++
 kernel/sysctl.c                                 | 9 +++++++++
 4 files changed, 22 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 44fde25bb221..6cb2a63c47f3 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3521,6 +3521,11 @@
 			extra details on the taint flags that users can pick
 			to compose the bitmask to assign to panic_on_taint.
 
+	panic_on_double_fault
+			panic() when a double fault is detected in do_exit().
+			Useful on safety-critical systems; a double fault is
+			a symptom of compromised kernel integrity.
+
 	panic_on_warn	panic() instead of WARN().  Useful to cause kdump
 			on a WARN().
 
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 2f05e9128201..0d8822259a36 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -539,6 +539,7 @@ extern int sysctl_panic_on_rcu_stall;
 extern int sysctl_panic_on_stackoverflow;
 
 extern bool crash_kexec_post_notifiers;
+extern int panic_on_double_fault;
 
 /*
  * panic_cpu is used for synchronizing panic() and crash_kexec() execution. It
diff --git a/kernel/exit.c b/kernel/exit.c
index 1f236ed375f8..e67ae43644f9 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -68,6 +68,9 @@
 #include <asm/unistd.h>
 #include <asm/mmu_context.h>
 
+int panic_on_double_fault __read_mostly;
+core_param(panic_on_double_fault, panic_on_double_fault, int, 0644);
+
 static void __unhash_process(struct task_struct *p, bool group_dead)
 {
 	nr_threads--;
@@ -757,6 +760,10 @@ void __noreturn do_exit(long code)
 	 */
 	if (unlikely(tsk->flags & PF_EXITING)) {
 		pr_alert("Fixing recursive fault but reboot is needed!\n");
+		if (panic_on_double_fault)
+			panic("Double fault detected in %s[%d]\n",
+			      current->comm, task_pid_nr(current));
+
 		futex_exit_recursive(tsk);
 		set_current_state(TASK_UNINTERRUPTIBLE);
 		schedule();
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index afad085960b8..869a2ca41e8e 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -2600,6 +2600,15 @@ static struct ctl_table kern_table[] = {
 		.extra2		= &one_thousand,
 	},
 #endif
+	{
+		.procname	= "panic_on_double_fault",
+		.data		= &panic_on_double_fault,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_ONE,
+	},
 	{
 		.procname	= "panic_on_warn",
 		.data		= &panic_on_warn,
-- 
2.27.0
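
For readers following the kernel/exit.c hunk, the decision it adds can
be modeled in user space roughly as below. This is only an illustrative
sketch, not kernel code: the enum and the function name
recursive_exit_action() are hypothetical, and PF_EXITING is given the
value it has in include/linux/sched.h.

```c
/* Same value as PF_EXITING in include/linux/sched.h. */
#define PF_EXITING 0x00000004

/* Hypothetical names, used only for this illustration. */
enum exit_action {
	EXIT_PROCEED,	/* first pass through do_exit(): continue teardown */
	EXIT_PARK,	/* recursive fault: park the task (existing behavior) */
	EXIT_PANIC,	/* recursive fault with the proposed knob set: panic() */
};

/*
 * Model of the branch touched by the patch: a task that re-enters
 * do_exit() with PF_EXITING already set either panics (knob set) or is
 * parked in TASK_UNINTERRUPTIBLE forever (knob clear).
 */
static enum exit_action recursive_exit_action(unsigned int task_flags,
					      int panic_on_double_fault)
{
	if (!(task_flags & PF_EXITING))
		return EXIT_PROCEED;
	return panic_on_double_fault ? EXIT_PANIC : EXIT_PARK;
}
```

Calling recursive_exit_action(PF_EXITING, 0) models what the kernel does
without the patch: the task is parked and never becomes waitable.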

---------------------------------------------------------------------
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.



* Re: [NEEDS-REVIEW] [PATCH] do_exit(): panic() when double fault detected
  2020-12-06 13:10 [PATCH] do_exit(): panic() when double fault detected Vladimir Kondratiev
@ 2020-12-06 15:37 ` Dave Hansen
  2020-12-06 22:05   ` Jann Horn
  2020-12-07 10:40 ` Andy Shevchenko
  1 sibling, 1 reply; 4+ messages in thread
From: Dave Hansen @ 2020-12-06 15:37 UTC (permalink / raw)
  To: Vladimir Kondratiev, Jonathan Corbet, Luis Chamberlain,
	Kees Cook, Iurii Zaikin, Paul E. McKenney, Andrew Morton,
	Randy Dunlap, Thomas Gleixner, Mauro Carvalho Chehab,
	Mike Kravetz, Guilherme G. Piccoli, Andy Shevchenko, Kars Mulder,
	Lorenzo Pieralisi, Kishon Vijay Abraham I, Arvind Sankar,
	Joe Perches, Rafael Aquini, Eric W. Biederman, Christian Brauner,
	Alexei Starovoitov, Peter Zijlstra (Intel),
	Davidlohr Bueso, Michel Lespinasse, Jann Horn, chenqiwu,
	Minchan Kim, Christophe Leroy
  Cc: linux-doc, linux-kernel, linux-fsdevel

On 12/6/20 5:10 AM, Vladimir Kondratiev wrote:
> A double fault detected in do_exit() is a symptom of compromised
> integrity. For safety-critical systems, it may be better to
> panic() in this case to minimize risk.

Does this fix a real problem that you have observed in practice?

Or, is this a general "hardening" which you think is a good practice?

What does this have to do specifically with safety critical systems?

The kernel generally tries to fix things up and keep running whenever
possible, if for no other reason than it helps debug problems.  If that
is an undesirable property for your systems, then I think you have a
much bigger problem than faults during exit().

This option, "panic_on_double_fault", doesn't actually panic on all
double-faults, which means to me that it's dangerously named.  There's
even an unprivileged selftest (tools/testing/selftests/x86/sigreturn.c)
which can cause double faults all day long.


* Re: [NEEDS-REVIEW] [PATCH] do_exit(): panic() when double fault detected
  2020-12-06 15:37 ` [NEEDS-REVIEW] " Dave Hansen
@ 2020-12-06 22:05   ` Jann Horn
  0 siblings, 0 replies; 4+ messages in thread
From: Jann Horn @ 2020-12-06 22:05 UTC (permalink / raw)
  To: Dave Hansen, Vladimir Kondratiev
  Cc: Jonathan Corbet, Luis Chamberlain, Kees Cook, Iurii Zaikin,
	Paul E. McKenney, Andrew Morton, Randy Dunlap, Thomas Gleixner,
	Mauro Carvalho Chehab, Mike Kravetz, Guilherme G. Piccoli,
	Andy Shevchenko, Kars Mulder, Lorenzo Pieralisi,
	Kishon Vijay Abraham I, Arvind Sankar, Joe Perches,
	Rafael Aquini, Eric W. Biederman, Christian Brauner,
	Alexei Starovoitov, Peter Zijlstra (Intel),
	Davidlohr Bueso, Michel Lespinasse, chenqiwu, Minchan Kim,
	Christophe Leroy, open list:DOCUMENTATION, kernel list,
	linux-fsdevel

On Sun, Dec 6, 2020 at 4:37 PM Dave Hansen <dave.hansen@intel.com> wrote:
> On 12/6/20 5:10 AM, Vladimir Kondratiev wrote:
> > A double fault detected in do_exit() is a symptom of compromised
> > integrity. For safety-critical systems, it may be better to
> > panic() in this case to minimize risk.
>
> Does this fix a real problem that you have observed in practice?
>
> Or, is this a general "hardening" which you think is a good practice?
>
> What does this have to do specifically with safety critical systems?
>
> The kernel generally tries to fix things up and keep running whenever
> possible, if for no other reason than it helps debug problems.  If that
> is an undesirable property for your systems, then I think you have a
> much bigger problem than faults during exit().
>
> This option, "panic_on_double_fault", doesn't actually panic on all
> double-faults, which means to me that it's dangerously named.

I wonder whether part of the idea here is that normally, when the
kernel fixes up a kernel crash by killing the offending task, a
service management process in userspace (e.g. the init daemon) can
potentially detect this case because it looks as if the task died with
SIGBUS or something. (I don't think that actually always works in
practice though, since AFAICS kernel crashes only lead to the *task*
being killed, not the entire process, and I think killing a single
worker thread of a multithreaded process might just cause the rest of
the userspace process to lock up. Not sure whether that's intentional
or something that should ideally be changed.)
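
The detection described above, where a supervisor notices that a task
died from a signal, might look roughly like the following in user space.
This is a minimal sketch under stated assumptions: the helper name
reap_child_signal() is hypothetical, the child kills itself with a
signal as a stand-in for the kernel killing it after a crash, and it
only covers the case where the whole child process dies, not the
single-thread case mentioned in the parenthetical.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that terminates itself with `sig`, then reap it the way
 * a supervisor would. Returns the terminating signal number, 0 if the
 * child exited normally, or -1 on error.
 */
static int reap_child_signal(int sig)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		/* Child: stand-in for a task killed after a kernel crash. */
		signal(sig, SIG_DFL);
		raise(sig);
		_exit(0);	/* only reached if the signal was not fatal */
	}
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	if (WIFSIGNALED(status))
		return WTERMSIG(status);
	return 0;
}
```

A supervisor can only take this path once the task has become waitable,
so waitpid() would simply keep blocking if the task never gets that far.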

But if the kernel gives up on going through with do_exit() (because it
crashed in do_exit() before being able to mark the task as waitable),
the process may, to userspace, appear to still be alive even though
it's not actually doing anything anymore; and if the kernel doesn't
tell userspace that the process is no longer functional, userspace
can't restore the system to a working state.

But as Dave said, this best-effort fixup is probably not the kind of
behavior you'd want in a "safety critical" system anyway; for example,
often the offending thread will have held some critical spinlock or
mutex or something, and then the rest of the system piles on into a
gigantic deadlock involving the lock in question and possibly multiple
locks that nest around it. You might be better off enabling
panic_on_oops, ideally with something like pstore-based logging of the
crash, and then quickly bringing everything back to a clean state
instead of continuing from an unstable state and then possibly
blocking somewhere.
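
As a rough illustration of the panic_on_oops suggestion, the knob can be
set at runtime by writing to /proc/sys/kernel/panic_on_oops. The sketch
below assumes root privileges for the real path; the helper names
write_int_knob() and read_int_knob() are hypothetical and work on any
file that holds a single integer.

```c
#include <stdio.h>

/*
 * Write an integer to a sysctl-style file. Returns 0 on success, -1 on
 * failure (for example when the file cannot be opened for writing).
 */
static int write_int_knob(const char *path, int value)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fprintf(f, "%d\n", value) > 0;
	/* Buffered write errors only show up when the stream is closed. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Read the integer back. Returns 0 on success and stores it in *value. */
static int read_int_knob(const char *path, int *value)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = fscanf(f, "%d", value) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```

For example, write_int_knob("/proc/sys/kernel/panic_on_oops", 1) would
enable the behavior until the next reboot; pstore configuration is a
separate step and is not covered here.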


* Re: [PATCH] do_exit(): panic() when double fault detected
  2020-12-06 13:10 [PATCH] do_exit(): panic() when double fault detected Vladimir Kondratiev
  2020-12-06 15:37 ` [NEEDS-REVIEW] " Dave Hansen
@ 2020-12-07 10:40 ` Andy Shevchenko
  1 sibling, 0 replies; 4+ messages in thread
From: Andy Shevchenko @ 2020-12-07 10:40 UTC (permalink / raw)
  To: Vladimir Kondratiev
  Cc: Jonathan Corbet, Luis Chamberlain, Kees Cook, Iurii Zaikin,
	Paul E. McKenney, Andrew Morton, Randy Dunlap, Thomas Gleixner,
	Mauro Carvalho Chehab, Mike Kravetz, Guilherme G. Piccoli,
	Andy Shevchenko, Kars Mulder, Lorenzo Pieralisi,
	Kishon Vijay Abraham I, Arvind Sankar, Joe Perches,
	Rafael Aquini, Eric W. Biederman, Christian Brauner,
	Alexei Starovoitov, Peter Zijlstra (Intel),
	Davidlohr Bueso, Michel Lespinasse, Jann Horn, chenqiwu,
	Minchan Kim, Christophe Leroy, Linux Documentation List,
	Linux Kernel Mailing List, Linux FS Devel

On Sun, Dec 6, 2020 at 3:16 PM Vladimir Kondratiev
<vladimir.kondratiev@intel.com> wrote:

> ---------------------------------------------------------------------
> Intel Israel (74) Limited
>
> This e-mail and any attachments may contain confidential material for
> the sole use of the intended recipient(s). Any review or distribution
> by others is strictly prohibited. If you are not the intended
> recipient, please contact the sender and delete all copies.

You have a problematic footer. No one will apply or touch this material anyway.

-- 
With Best Regards,
Andy Shevchenko

