All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/1] mm: sysctl: add panic_on_inconsistent_mm sysctl
@ 2020-01-29 18:08 Grzegorz Halat
  2020-01-29 18:39 ` Qian Cai
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Grzegorz Halat @ 2020-01-29 18:08 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linux-fsdevel, linux-doc, ghalat, ssaner, atomlin,
	oleksandr, vbendel, kirill, khlebnikov, borntraeger,
	Andrew Morton, Iurii Zaikin, Kees Cook, Luis Chamberlain,
	Jonathan Corbet, Tetsuo Handa, Qian Cai

Memory management subsystem performs various checks at runtime,
if an inconsistency is detected then such event is being logged and kernel
continues to run. While debugging such problems it is helpful to collect
memory dump as early as possible. Currently, there is no easy way to panic
kernel when such error is detected.

It was proposed[1] to panic the kernel if panic_on_oops is set but this
approach was not accepted. One of alternative proposals was introduction of
a new sysctl.

Add a new sysctl - panic_on_inconsistent_mm. If the sysctl is set then the
kernel will be crashed when an inconsistency is detected by memory
management. This currently means panic when bad page or bad PTE
is detected(this may be extended to other places in MM).

Another use case of this sysctl may be in security-wise environments,
it may be more desired to crash machine than continue to run with
potentially damaged data structures.

Changes since v1 [2]:
- rename the sysctl to panic_on_inconsistent_mm
- move the sysctl from kernel to vm table
- print modules in print_bad_pte() only before calling panic

[1] https://lore.kernel.org/linux-mm/1426495021-6408-1-git-send-email-borntraeger@de.ibm.com/
[2] https://lore.kernel.org/lkml/20200127101100.92588-1-ghalat@redhat.com/

Signed-off-by: Grzegorz Halat <ghalat@redhat.com>
---
 Documentation/admin-guide/sysctl/vm.rst | 14 ++++++++++++++
 include/linux/kernel.h                  |  1 +
 kernel/sysctl.c                         |  9 +++++++++
 mm/memory.c                             |  8 ++++++++
 mm/page_alloc.c                         |  4 +++-
 5 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 64aeee1009ca..57f7926a64b8 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -61,6 +61,7 @@ Currently, these files are in /proc/sys/vm:
 - overcommit_memory
 - overcommit_ratio
 - page-cluster
+- panic_on_inconsistent_mm
 - panic_on_oom
 - percpu_pagelist_fraction
 - stat_interval
@@ -741,6 +742,19 @@ extra faults and I/O delays for following faults if they would have been part of
 that consecutive pages readahead would have brought in.
 
 
+panic_on_inconsistent_mm
+========================
+
+Controls the kernel's behaviour when inconsistency is detected
+by memory management code, for example bad page state or bad PTE.
+
+0: try to continue operation.
+
+1: panic immediately.
+
+The default value is 0.
+
+
 panic_on_oom
 ============
 
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 0d9db2a14f44..b3bd94c558ab 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -518,6 +518,7 @@ extern int oops_in_progress;		/* If set, an oops, panic(), BUG() or die() is in
 extern int panic_timeout;
 extern unsigned long panic_print;
 extern int panic_on_oops;
+extern int panic_on_inconsistent_mm;
 extern int panic_on_unrecovered_nmi;
 extern int panic_on_io_nmi;
 extern int panic_on_warn;
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 70665934d53e..a9733311e3a1 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1303,6 +1303,15 @@ static struct ctl_table vm_table[] = {
 		.extra1		= SYSCTL_ZERO,
 		.extra2		= &two,
 	},
+	{
+		.procname	= "panic_on_inconsistent_mm",
+		.data		= &panic_on_inconsistent_mm,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_ONE,
+	},
 	{
 		.procname	= "panic_on_oom",
 		.data		= &sysctl_panic_on_oom,
diff --git a/mm/memory.c b/mm/memory.c
index 45442d9a4f52..b29a18077a6a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -71,6 +71,7 @@
 #include <linux/dax.h>
 #include <linux/oom.h>
 #include <linux/numa.h>
+#include <linux/module.h>
 
 #include <trace/events/kmem.h>
 
@@ -88,6 +89,8 @@
 #warning Unfortunate NUMA and NUMA Balancing config, growing page-frame for last_cpupid.
 #endif
 
+int panic_on_inconsistent_mm __read_mostly;
+
 #ifndef CONFIG_NEED_MULTIPLE_NODES
 /* use the per-pgdat data instead for discontigmem - mbligh */
 unsigned long max_mapnr;
@@ -543,6 +546,11 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr,
 		 vma->vm_ops ? vma->vm_ops->fault : NULL,
 		 vma->vm_file ? vma->vm_file->f_op->mmap : NULL,
 		 mapping ? mapping->a_ops->readpage : NULL);
+
+	if (panic_on_inconsistent_mm) {
+		print_modules();
+		panic("Bad page map detected");
+	}
 	dump_stack();
 	add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE);
 }
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d047bf7d8fd4..a20cd3ece5ba 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -643,9 +643,11 @@ static void bad_page(struct page *page, const char *reason,
 	if (bad_flags)
 		pr_alert("bad because of flags: %#lx(%pGp)\n",
 						bad_flags, &bad_flags);
-	dump_page_owner(page);
 
+	dump_page_owner(page);
 	print_modules();
+	if (panic_on_inconsistent_mm)
+		panic("Bad page state detected");
 	dump_stack();
 out:
 	/* Leave bad fields for debug, except PageBuddy could make trouble */
-- 
2.21.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/1] mm: sysctl: add panic_on_inconsistent_mm sysctl
  2020-01-29 18:08 [PATCH 1/1] mm: sysctl: add panic_on_inconsistent_mm sysctl Grzegorz Halat
@ 2020-01-29 18:39 ` Qian Cai
  2020-02-03 14:08   ` Christian Borntraeger
  2020-01-29 19:06 ` Qian Cai
  2020-01-30 14:44 ` Vlastimil Babka
  2 siblings, 1 reply; 10+ messages in thread
From: Qian Cai @ 2020-01-29 18:39 UTC (permalink / raw)
  To: Grzegorz Halat
  Cc: Linux Kernel Mailing List, Linux-MM, linux-fsdevel, linux-doc,
	ssaner, atomlin, oleksandr, vbendel, kirill, khlebnikov,
	borntraeger, Andrew Morton, Iurii Zaikin, Kees Cook,
	Luis Chamberlain, Jonathan Corbet, Tetsuo Handa



> On Jan 29, 2020, at 1:08 PM, Grzegorz Halat <ghalat@redhat.com> wrote:
> 
> Memory management subsystem performs various checks at runtime,
> if an inconsistency is detected then such event is being logged and kernel
> continues to run. While debugging such problems it is helpful to collect
> memory dump as early as possible. Currently, there is no easy way to panic
> kernel when such error is detected.
> 
> It was proposed[1] to panic the kernel if panic_on_oops is set but this
> approach was not accepted. One of alternative proposals was introduction of
> a new sysctl.
> 
> Add a new sysctl - panic_on_inconsistent_mm. If the sysctl is set then the
> kernel will be crashed when an inconsistency is detected by memory
> management. This currently means panic when bad page or bad PTE
> is detected(this may be extended to other places in MM).
> 
> Another use case of this sysctl may be in security-wise environments,
> it may be more desired to crash machine than continue to run with
> potentially damaged data structures.

It is annoying that I have to repeat my feedback, but I don’t know why
admins want to enable this by allowing normal users to crash the systems
more easily through recoverable MM bugs where I am sure we have plenty.
How does that improve the security?

If this is mainly for debugging purpose, then debugfs would suite better?

> 
> Changes since v1 [2]:
> - rename the sysctl to panic_on_inconsistent_mm
> - move the sysctl from kernel to vm table
> - print modules in print_bad_pte() only before calling panic
> 
> [1] https://lore.kernel.org/linux-mm/1426495021-6408-1-git-send-email-borntraeger@de.ibm.com/
> [2] https://lore.kernel.org/lkml/20200127101100.92588-1-ghalat@redhat.com/
> 
> Signed-off-by: Grzegorz Halat <ghalat@redhat.com>
> ---
> Documentation/admin-guide/sysctl/vm.rst | 14 ++++++++++++++
> include/linux/kernel.h                  |  1 +
> kernel/sysctl.c                         |  9 +++++++++
> mm/memory.c                             |  8 ++++++++
> mm/page_alloc.c                         |  4 +++-
> 5 files changed, 35 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
> index 64aeee1009ca..57f7926a64b8 100644
> --- a/Documentation/admin-guide/sysctl/vm.rst
> +++ b/Documentation/admin-guide/sysctl/vm.rst
> @@ -61,6 +61,7 @@ Currently, these files are in /proc/sys/vm:
> - overcommit_memory
> - overcommit_ratio
> - page-cluster
> +- panic_on_inconsistent_mm
> - panic_on_oom
> - percpu_pagelist_fraction
> - stat_interval
> @@ -741,6 +742,19 @@ extra faults and I/O delays for following faults if they would have been part of
> that consecutive pages readahead would have brought in.
> 
> 
> +panic_on_inconsistent_mm
> +========================
> +
> +Controls the kernel's behaviour when inconsistency is detected
> +by memory management code, for example bad page state or bad PTE.
> +
> +0: try to continue operation.
> +
> +1: panic immediately.
> +
> +The default value is 0.
> +
> +
> panic_on_oom
> ============
> 
> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> index 0d9db2a14f44..b3bd94c558ab 100644
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -518,6 +518,7 @@ extern int oops_in_progress;		/* If set, an oops, panic(), BUG() or die() is in
> extern int panic_timeout;
> extern unsigned long panic_print;
> extern int panic_on_oops;
> +extern int panic_on_inconsistent_mm;
> extern int panic_on_unrecovered_nmi;
> extern int panic_on_io_nmi;
> extern int panic_on_warn;
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 70665934d53e..a9733311e3a1 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -1303,6 +1303,15 @@ static struct ctl_table vm_table[] = {
> 		.extra1		= SYSCTL_ZERO,
> 		.extra2		= &two,
> 	},
> +	{
> +		.procname	= "panic_on_inconsistent_mm",
> +		.data		= &panic_on_inconsistent_mm,
> +		.maxlen		= sizeof(int),
> +		.mode		= 0644,
> +		.proc_handler	= proc_dointvec_minmax,
> +		.extra1		= SYSCTL_ZERO,
> +		.extra2		= SYSCTL_ONE,
> +	},
> 	{
> 		.procname	= "panic_on_oom",
> 		.data		= &sysctl_panic_on_oom,
> diff --git a/mm/memory.c b/mm/memory.c
> index 45442d9a4f52..b29a18077a6a 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -71,6 +71,7 @@
> #include <linux/dax.h>
> #include <linux/oom.h>
> #include <linux/numa.h>
> +#include <linux/module.h>
> 
> #include <trace/events/kmem.h>
> 
> @@ -88,6 +89,8 @@
> #warning Unfortunate NUMA and NUMA Balancing config, growing page-frame for last_cpupid.
> #endif
> 
> +int panic_on_inconsistent_mm __read_mostly;
> +
> #ifndef CONFIG_NEED_MULTIPLE_NODES
> /* use the per-pgdat data instead for discontigmem - mbligh */
> unsigned long max_mapnr;
> @@ -543,6 +546,11 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr,
> 		 vma->vm_ops ? vma->vm_ops->fault : NULL,
> 		 vma->vm_file ? vma->vm_file->f_op->mmap : NULL,
> 		 mapping ? mapping->a_ops->readpage : NULL);
> +
> +	if (panic_on_inconsistent_mm) {
> +		print_modules();
> +		panic("Bad page map detected");
> +	}
> 	dump_stack();
> 	add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE);
> }
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index d047bf7d8fd4..a20cd3ece5ba 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -643,9 +643,11 @@ static void bad_page(struct page *page, const char *reason,
> 	if (bad_flags)
> 		pr_alert("bad because of flags: %#lx(%pGp)\n",
> 						bad_flags, &bad_flags);
> -	dump_page_owner(page);
> 
> +	dump_page_owner(page);
> 	print_modules();
> +	if (panic_on_inconsistent_mm)
> +		panic("Bad page state detected");
> 	dump_stack();
> out:
> 	/* Leave bad fields for debug, except PageBuddy could make trouble */
> -- 
> 2.21.1
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/1] mm: sysctl: add panic_on_inconsistent_mm sysctl
  2020-01-29 18:08 [PATCH 1/1] mm: sysctl: add panic_on_inconsistent_mm sysctl Grzegorz Halat
  2020-01-29 18:39 ` Qian Cai
@ 2020-01-29 19:06 ` Qian Cai
  2020-01-30 14:44 ` Vlastimil Babka
  2 siblings, 0 replies; 10+ messages in thread
From: Qian Cai @ 2020-01-29 19:06 UTC (permalink / raw)
  To: Grzegorz Halat
  Cc: Linux Kernel Mailing List, Linux-MM, linux-fsdevel, linux-doc,
	ssaner, atomlin, oleksandr, vbendel, kirill, khlebnikov,
	borntraeger, Andrew Morton, Iurii Zaikin, Kees Cook,
	Luis Chamberlain, Jonathan Corbet, Tetsuo Handa



> On Jan 29, 2020, at 1:08 PM, Grzegorz Halat <ghalat@redhat.com> wrote:
> 
> Memory management subsystem performs various checks at runtime,
> if an inconsistency is detected then such event is being logged and kernel
> continues to run. While debugging such problems it is helpful to collect
> memory dump as early as possible. Currently, there is no easy way to panic
> kernel when such error is detected.

Also, why can’t you have a simple script that checking for the tainted flags
periodically, and then trigger the crash dump once it happened?

> 
> It was proposed[1] to panic the kernel if panic_on_oops is set but this
> approach was not accepted. One of alternative proposals was introduction of
> a new sysctl.
> 
> Add a new sysctl - panic_on_inconsistent_mm. If the sysctl is set then the
> kernel will be crashed when an inconsistency is detected by memory
> management. This currently means panic when bad page or bad PTE
> is detected(this may be extended to other places in MM).
> 
> Another use case of this sysctl may be in security-wise environments,
> it may be more desired to crash machine than continue to run with
> potentially damaged data structures.
> 
> Changes since v1 [2]:
> - rename the sysctl to panic_on_inconsistent_mm
> - move the sysctl from kernel to vm table
> - print modules in print_bad_pte() only before calling panic
> 
> [1] https://lore.kernel.org/linux-mm/1426495021-6408-1-git-send-email-borntraeger@de.ibm.com/
> [2] https://lore.kernel.org/lkml/20200127101100.92588-1-ghalat@redhat.com/
> 
> Signed-off-by: Grzegorz Halat <ghalat@redhat.com>
> ---
> Documentation/admin-guide/sysctl/vm.rst | 14 ++++++++++++++
> include/linux/kernel.h                  |  1 +
> kernel/sysctl.c                         |  9 +++++++++
> mm/memory.c                             |  8 ++++++++
> mm/page_alloc.c                         |  4 +++-
> 5 files changed, 35 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
> index 64aeee1009ca..57f7926a64b8 100644
> --- a/Documentation/admin-guide/sysctl/vm.rst
> +++ b/Documentation/admin-guide/sysctl/vm.rst
> @@ -61,6 +61,7 @@ Currently, these files are in /proc/sys/vm:
> - overcommit_memory
> - overcommit_ratio
> - page-cluster
> +- panic_on_inconsistent_mm
> - panic_on_oom
> - percpu_pagelist_fraction
> - stat_interval
> @@ -741,6 +742,19 @@ extra faults and I/O delays for following faults if they would have been part of
> that consecutive pages readahead would have brought in.
> 
> 
> +panic_on_inconsistent_mm
> +========================
> +
> +Controls the kernel's behaviour when inconsistency is detected
> +by memory management code, for example bad page state or bad PTE.
> +
> +0: try to continue operation.
> +
> +1: panic immediately.
> +
> +The default value is 0.
> +
> +
> panic_on_oom
> ============
> 
> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> index 0d9db2a14f44..b3bd94c558ab 100644
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -518,6 +518,7 @@ extern int oops_in_progress;		/* If set, an oops, panic(), BUG() or die() is in
> extern int panic_timeout;
> extern unsigned long panic_print;
> extern int panic_on_oops;
> +extern int panic_on_inconsistent_mm;
> extern int panic_on_unrecovered_nmi;
> extern int panic_on_io_nmi;
> extern int panic_on_warn;
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 70665934d53e..a9733311e3a1 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -1303,6 +1303,15 @@ static struct ctl_table vm_table[] = {
> 		.extra1		= SYSCTL_ZERO,
> 		.extra2		= &two,
> 	},
> +	{
> +		.procname	= "panic_on_inconsistent_mm",
> +		.data		= &panic_on_inconsistent_mm,
> +		.maxlen		= sizeof(int),
> +		.mode		= 0644,
> +		.proc_handler	= proc_dointvec_minmax,
> +		.extra1		= SYSCTL_ZERO,
> +		.extra2		= SYSCTL_ONE,
> +	},
> 	{
> 		.procname	= "panic_on_oom",
> 		.data		= &sysctl_panic_on_oom,
> diff --git a/mm/memory.c b/mm/memory.c
> index 45442d9a4f52..b29a18077a6a 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -71,6 +71,7 @@
> #include <linux/dax.h>
> #include <linux/oom.h>
> #include <linux/numa.h>
> +#include <linux/module.h>
> 
> #include <trace/events/kmem.h>
> 
> @@ -88,6 +89,8 @@
> #warning Unfortunate NUMA and NUMA Balancing config, growing page-frame for last_cpupid.
> #endif
> 
> +int panic_on_inconsistent_mm __read_mostly;
> +
> #ifndef CONFIG_NEED_MULTIPLE_NODES
> /* use the per-pgdat data instead for discontigmem - mbligh */
> unsigned long max_mapnr;
> @@ -543,6 +546,11 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr,
> 		 vma->vm_ops ? vma->vm_ops->fault : NULL,
> 		 vma->vm_file ? vma->vm_file->f_op->mmap : NULL,
> 		 mapping ? mapping->a_ops->readpage : NULL);
> +
> +	if (panic_on_inconsistent_mm) {
> +		print_modules();
> +		panic("Bad page map detected");
> +	}
> 	dump_stack();
> 	add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE);
> }
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index d047bf7d8fd4..a20cd3ece5ba 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -643,9 +643,11 @@ static void bad_page(struct page *page, const char *reason,
> 	if (bad_flags)
> 		pr_alert("bad because of flags: %#lx(%pGp)\n",
> 						bad_flags, &bad_flags);
> -	dump_page_owner(page);
> 
> +	dump_page_owner(page);
> 	print_modules();
> +	if (panic_on_inconsistent_mm)
> +		panic("Bad page state detected");
> 	dump_stack();
> out:
> 	/* Leave bad fields for debug, except PageBuddy could make trouble */
> -- 
> 2.21.1
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/1] mm: sysctl: add panic_on_inconsistent_mm sysctl
  2020-01-29 18:08 [PATCH 1/1] mm: sysctl: add panic_on_inconsistent_mm sysctl Grzegorz Halat
  2020-01-29 18:39 ` Qian Cai
  2020-01-29 19:06 ` Qian Cai
@ 2020-01-30 14:44 ` Vlastimil Babka
  2020-01-30 19:28   ` Kees Cook
  2 siblings, 1 reply; 10+ messages in thread
From: Vlastimil Babka @ 2020-01-30 14:44 UTC (permalink / raw)
  To: Grzegorz Halat, linux-kernel
  Cc: linux-mm, linux-fsdevel, linux-doc, ssaner, atomlin, oleksandr,
	vbendel, kirill, khlebnikov, borntraeger, Andrew Morton,
	Iurii Zaikin, Kees Cook, Luis Chamberlain, Jonathan Corbet,
	Tetsuo Handa, Qian Cai

On 1/29/20 7:08 PM, Grzegorz Halat wrote:
> Memory management subsystem performs various checks at runtime,
> if an inconsistency is detected then such event is being logged and kernel
> continues to run. While debugging such problems it is helpful to collect
> memory dump as early as possible. Currently, there is no easy way to panic
> kernel when such error is detected.
> 
> It was proposed[1] to panic the kernel if panic_on_oops is set but this
> approach was not accepted. One of alternative proposals was introduction of
> a new sysctl.
> 
> Add a new sysctl - panic_on_inconsistent_mm. If the sysctl is set then the
> kernel will be crashed when an inconsistency is detected by memory
> management. This currently means panic when bad page or bad PTE
> is detected(this may be extended to other places in MM).

I wonder, should enabling the sysctl also effectively convert VM_WARN...
to VM_BUG... ?

> Another use case of this sysctl may be in security-wise environments,
> it may be more desired to crash machine than continue to run with
> potentially damaged data structures.
> 
> Changes since v1 [2]:
> - rename the sysctl to panic_on_inconsistent_mm
> - move the sysctl from kernel to vm table
> - print modules in print_bad_pte() only before calling panic
> 
> [1] https://lore.kernel.org/linux-mm/1426495021-6408-1-git-send-email-borntraeger@de.ibm.com/
> [2] https://lore.kernel.org/lkml/20200127101100.92588-1-ghalat@redhat.com/
> 
> Signed-off-by: Grzegorz Halat <ghalat@redhat.com>
> ---
>  Documentation/admin-guide/sysctl/vm.rst | 14 ++++++++++++++
>  include/linux/kernel.h                  |  1 +
>  kernel/sysctl.c                         |  9 +++++++++
>  mm/memory.c                             |  8 ++++++++
>  mm/page_alloc.c                         |  4 +++-
>  5 files changed, 35 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
> index 64aeee1009ca..57f7926a64b8 100644
> --- a/Documentation/admin-guide/sysctl/vm.rst
> +++ b/Documentation/admin-guide/sysctl/vm.rst
> @@ -61,6 +61,7 @@ Currently, these files are in /proc/sys/vm:
>  - overcommit_memory
>  - overcommit_ratio
>  - page-cluster
> +- panic_on_inconsistent_mm
>  - panic_on_oom
>  - percpu_pagelist_fraction
>  - stat_interval
> @@ -741,6 +742,19 @@ extra faults and I/O delays for following faults if they would have been part of
>  that consecutive pages readahead would have brought in.
>  
>  
> +panic_on_inconsistent_mm
> +========================
> +
> +Controls the kernel's behaviour when inconsistency is detected
> +by memory management code, for example bad page state or bad PTE.
> +
> +0: try to continue operation.
> +
> +1: panic immediately.
> +
> +The default value is 0.
> +
> +
>  panic_on_oom
>  ============
>  
> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> index 0d9db2a14f44..b3bd94c558ab 100644
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -518,6 +518,7 @@ extern int oops_in_progress;		/* If set, an oops, panic(), BUG() or die() is in
>  extern int panic_timeout;
>  extern unsigned long panic_print;
>  extern int panic_on_oops;
> +extern int panic_on_inconsistent_mm;
>  extern int panic_on_unrecovered_nmi;
>  extern int panic_on_io_nmi;
>  extern int panic_on_warn;
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 70665934d53e..a9733311e3a1 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -1303,6 +1303,15 @@ static struct ctl_table vm_table[] = {
>  		.extra1		= SYSCTL_ZERO,
>  		.extra2		= &two,
>  	},
> +	{
> +		.procname	= "panic_on_inconsistent_mm",
> +		.data		= &panic_on_inconsistent_mm,
> +		.maxlen		= sizeof(int),
> +		.mode		= 0644,
> +		.proc_handler	= proc_dointvec_minmax,
> +		.extra1		= SYSCTL_ZERO,
> +		.extra2		= SYSCTL_ONE,
> +	},
>  	{
>  		.procname	= "panic_on_oom",
>  		.data		= &sysctl_panic_on_oom,
> diff --git a/mm/memory.c b/mm/memory.c
> index 45442d9a4f52..b29a18077a6a 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -71,6 +71,7 @@
>  #include <linux/dax.h>
>  #include <linux/oom.h>
>  #include <linux/numa.h>
> +#include <linux/module.h>
>  
>  #include <trace/events/kmem.h>
>  
> @@ -88,6 +89,8 @@
>  #warning Unfortunate NUMA and NUMA Balancing config, growing page-frame for last_cpupid.
>  #endif
>  
> +int panic_on_inconsistent_mm __read_mostly;
> +
>  #ifndef CONFIG_NEED_MULTIPLE_NODES
>  /* use the per-pgdat data instead for discontigmem - mbligh */
>  unsigned long max_mapnr;
> @@ -543,6 +546,11 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr,
>  		 vma->vm_ops ? vma->vm_ops->fault : NULL,
>  		 vma->vm_file ? vma->vm_file->f_op->mmap : NULL,
>  		 mapping ? mapping->a_ops->readpage : NULL);
> +
> +	if (panic_on_inconsistent_mm) {
> +		print_modules();
> +		panic("Bad page map detected");
> +	}
>  	dump_stack();
>  	add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE);
>  }
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index d047bf7d8fd4..a20cd3ece5ba 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -643,9 +643,11 @@ static void bad_page(struct page *page, const char *reason,
>  	if (bad_flags)
>  		pr_alert("bad because of flags: %#lx(%pGp)\n",
>  						bad_flags, &bad_flags);
> -	dump_page_owner(page);
>  
> +	dump_page_owner(page);
>  	print_modules();
> +	if (panic_on_inconsistent_mm)
> +		panic("Bad page state detected");
>  	dump_stack();
>  out:
>  	/* Leave bad fields for debug, except PageBuddy could make trouble */
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/1] mm: sysctl: add panic_on_inconsistent_mm sysctl
  2020-01-30 14:44 ` Vlastimil Babka
@ 2020-01-30 19:28   ` Kees Cook
  2020-02-03 12:19     ` Vlastimil Babka
  0 siblings, 1 reply; 10+ messages in thread
From: Kees Cook @ 2020-01-30 19:28 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Grzegorz Halat, linux-kernel, linux-mm, linux-fsdevel, linux-doc,
	ssaner, atomlin, oleksandr, vbendel, kirill, khlebnikov,
	borntraeger, Andrew Morton, Iurii Zaikin, Luis Chamberlain,
	Jonathan Corbet, Tetsuo Handa, Qian Cai

On Thu, Jan 30, 2020 at 03:44:50PM +0100, Vlastimil Babka wrote:
> On 1/29/20 7:08 PM, Grzegorz Halat wrote:
> > Memory management subsystem performs various checks at runtime,
> > if an inconsistency is detected then such event is being logged and kernel
> > continues to run. While debugging such problems it is helpful to collect
> > memory dump as early as possible. Currently, there is no easy way to panic
> > kernel when such error is detected.
> > 
> > It was proposed[1] to panic the kernel if panic_on_oops is set but this
> > approach was not accepted. One of alternative proposals was introduction of
> > a new sysctl.
> > 
> > Add a new sysctl - panic_on_inconsistent_mm. If the sysctl is set then the
> > kernel will be crashed when an inconsistency is detected by memory
> > management. This currently means panic when bad page or bad PTE
> > is detected(this may be extended to other places in MM).
> 
> I wonder, should enabling the sysctl also effectively convert VM_WARN...
> to VM_BUG... ?

There is already panic_on_warn sysctl... wouldn't that be sufficient?

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/1] mm: sysctl: add panic_on_inconsistent_mm sysctl
  2020-01-30 19:28   ` Kees Cook
@ 2020-02-03 12:19     ` Vlastimil Babka
  0 siblings, 0 replies; 10+ messages in thread
From: Vlastimil Babka @ 2020-02-03 12:19 UTC (permalink / raw)
  To: Kees Cook
  Cc: Grzegorz Halat, linux-kernel, linux-mm, linux-fsdevel, linux-doc,
	ssaner, atomlin, oleksandr, vbendel, kirill, khlebnikov,
	borntraeger, Andrew Morton, Iurii Zaikin, Luis Chamberlain,
	Jonathan Corbet, Tetsuo Handa, Qian Cai

On 1/30/20 8:28 PM, Kees Cook wrote:
> On Thu, Jan 30, 2020 at 03:44:50PM +0100, Vlastimil Babka wrote:
>> On 1/29/20 7:08 PM, Grzegorz Halat wrote:
>>> Memory management subsystem performs various checks at runtime,
>>> if an inconsistency is detected then such event is being logged and kernel
>>> continues to run. While debugging such problems it is helpful to collect
>>> memory dump as early as possible. Currently, there is no easy way to panic
>>> kernel when such error is detected.
>>>
>>> It was proposed[1] to panic the kernel if panic_on_oops is set but this
>>> approach was not accepted. One of alternative proposals was introduction of
>>> a new sysctl.
>>>
>>> Add a new sysctl - panic_on_inconsistent_mm. If the sysctl is set then the
>>> kernel will be crashed when an inconsistency is detected by memory
>>> management. This currently means panic when bad page or bad PTE
>>> is detected(this may be extended to other places in MM).
>>
>> I wonder, should enabling the sysctl also effectively convert VM_WARN...
>> to VM_BUG... ?
> 
> There is already panic_on_warn sysctl... wouldn't that be sufficient?

True, forgot about it. Thanks.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/1] mm: sysctl: add panic_on_inconsistent_mm sysctl
  2020-01-29 18:39 ` Qian Cai
@ 2020-02-03 14:08   ` Christian Borntraeger
  2020-02-03 15:06       ` Qian Cai
  0 siblings, 1 reply; 10+ messages in thread
From: Christian Borntraeger @ 2020-02-03 14:08 UTC (permalink / raw)
  To: Qian Cai, Grzegorz Halat
  Cc: Linux Kernel Mailing List, Linux-MM, linux-fsdevel, linux-doc,
	ssaner, atomlin, oleksandr, vbendel, kirill, khlebnikov,
	Andrew Morton, Iurii Zaikin, Kees Cook, Luis Chamberlain,
	Jonathan Corbet, Tetsuo Handa



On 29.01.20 19:39, Qian Cai wrote:
> 
> 
>> On Jan 29, 2020, at 1:08 PM, Grzegorz Halat <ghalat@redhat.com> wrote:
>>
>> Memory management subsystem performs various checks at runtime,
>> if an inconsistency is detected then such event is being logged and kernel
>> continues to run. While debugging such problems it is helpful to collect
>> memory dump as early as possible. Currently, there is no easy way to panic
>> kernel when such error is detected.
>>
>> It was proposed[1] to panic the kernel if panic_on_oops is set but this
>> approach was not accepted. One of alternative proposals was introduction of
>> a new sysctl.
>>
>> Add a new sysctl - panic_on_inconsistent_mm. If the sysctl is set then the
>> kernel will be crashed when an inconsistency is detected by memory
>> management. This currently means panic when bad page or bad PTE
>> is detected(this may be extended to other places in MM).
>>
>> Another use case of this sysctl may be in security-wise environments,
>> it may be more desired to crash machine than continue to run with
>> potentially damaged data structures.
> 
> It is annoying that I have to repeat my feedback, but I don’t know why
> admins want to enable this by allowing normal users to crash the systems
> more easily through recoverable MM bugs where I am sure we have plenty.
> How does that improve the security?

There are cases where data corruption is a no-go, while "one node going down" 
is acceptable.
And then there is also the case for payed service providers that often need
a dump at the time of the problem to understand rare issues.

So I DO see value in such a thing. We should just piggy-back on panic_on_warn
I guess.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/1] mm: sysctl: add panic_on_inconsistent_mm sysctl
  2020-02-03 14:08   ` Christian Borntraeger
@ 2020-02-03 15:06       ` Qian Cai
  0 siblings, 0 replies; 10+ messages in thread
From: Qian Cai @ 2020-02-03 15:06 UTC (permalink / raw)
  To: Christian Borntraeger, Grzegorz Halat
  Cc: Linux Kernel Mailing List, Linux-MM, linux-fsdevel, linux-doc,
	ssaner, atomlin, oleksandr, vbendel, kirill, khlebnikov,
	Andrew Morton, Iurii Zaikin, Kees Cook, Luis Chamberlain,
	Jonathan Corbet, Tetsuo Handa

On Mon, 2020-02-03 at 15:08 +0100, Christian Borntraeger wrote:
> 
> On 29.01.20 19:39, Qian Cai wrote:
> > 
> > 
> > > On Jan 29, 2020, at 1:08 PM, Grzegorz Halat <ghalat@redhat.com> wrote:
> > > 
> > > Memory management subsystem performs various checks at runtime,
> > > if an inconsistency is detected then such event is being logged and kernel
> > > continues to run. While debugging such problems it is helpful to collect
> > > memory dump as early as possible. Currently, there is no easy way to panic
> > > kernel when such error is detected.
> > > 
> > > It was proposed[1] to panic the kernel if panic_on_oops is set but this
> > > approach was not accepted. One of alternative proposals was introduction of
> > > a new sysctl.
> > > 
> > > Add a new sysctl - panic_on_inconsistent_mm. If the sysctl is set then the
> > > kernel will be crashed when an inconsistency is detected by memory
> > > management. This currently means panic when bad page or bad PTE
> > > is detected(this may be extended to other places in MM).
> > > 
> > > Another use case of this sysctl may be in security-wise environments,
> > > it may be more desired to crash machine than continue to run with
> > > potentially damaged data structures.
> > 
> > It is annoying that I have to repeat my feedback, but I don’t know why
> > admins want to enable this by allowing normal users to crash the systems
> > more easily through recoverable MM bugs where I am sure we have plenty.
> > How does that improve the security?
> 
> There are cases where data corruption is a no-go, while "one node going down" 
> is acceptable.
> And then there is also the case for payed service providers that often need
> a dump at the time of the problem to understand rare issues.
> 
> So I DO see value in such a thing. We should just piggy-back on panic_on_warn
> I guess.
> 

Indeed, so change pr_alert() to pr_warn() there then?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/1] mm: sysctl: add panic_on_inconsistent_mm sysctl
@ 2020-02-03 15:06       ` Qian Cai
  0 siblings, 0 replies; 10+ messages in thread
From: Qian Cai @ 2020-02-03 15:06 UTC (permalink / raw)
  To: Christian Borntraeger, Grzegorz Halat
  Cc: Linux Kernel Mailing List, Linux-MM, linux-fsdevel, linux-doc,
	ssaner, atomlin, oleksandr, vbendel, kirill, khlebnikov,
	Andrew Morton, Iurii Zaikin, Kees Cook, Luis Chamberlain,
	Jonathan Corbet, Tetsuo Handa

On Mon, 2020-02-03 at 15:08 +0100, Christian Borntraeger wrote:
> 
> On 29.01.20 19:39, Qian Cai wrote:
> > 
> > 
> > > On Jan 29, 2020, at 1:08 PM, Grzegorz Halat <ghalat@redhat.com> wrote:
> > > 
> > > Memory management subsystem performs various checks at runtime,
> > > if an inconsistency is detected then such event is being logged and kernel
> > > continues to run. While debugging such problems it is helpful to collect
> > > memory dump as early as possible. Currently, there is no easy way to panic
> > > kernel when such error is detected.
> > > 
> > > It was proposed[1] to panic the kernel if panic_on_oops is set but this
> > > approach was not accepted. One of alternative proposals was introduction of
> > > a new sysctl.
> > > 
> > > Add a new sysctl - panic_on_inconsistent_mm. If the sysctl is set then the
> > > kernel will be crashed when an inconsistency is detected by memory
> > > management. This currently means panic when bad page or bad PTE
> > > is detected(this may be extended to other places in MM).
> > > 
> > > Another use case of this sysctl may be in security-wise environments,
> > > it may be more desired to crash machine than continue to run with
> > > potentially damaged data structures.
> > 
> > It is annoying that I have to repeat my feedback, but I don’t know why
> > admins want to enable this by allowing normal users to crash the systems
> > more easily through recoverable MM bugs where I am sure we have plenty.
> > How does that improve the security?
> 
> There are cases where data corruption is a no-go, while "one node going down" 
> is acceptable.
> And then there is also the case for payed service providers that often need
> a dump at the time of the problem to understand rare issues.
> 
> So I DO see value in such a thing. We should just piggy-back on panic_on_warn
> I guess.
> 

Indeed, so change pr_alert() to pr_warn() there then?


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/1] mm: sysctl: add panic_on_inconsistent_mm sysctl
  2020-02-03 15:06       ` Qian Cai
  (?)
@ 2020-02-05 22:23       ` Kees Cook
  -1 siblings, 0 replies; 10+ messages in thread
From: Kees Cook @ 2020-02-05 22:23 UTC (permalink / raw)
  To: Qian Cai
  Cc: Christian Borntraeger, Grzegorz Halat, Linux Kernel Mailing List,
	Linux-MM, linux-fsdevel, linux-doc, ssaner, atomlin, oleksandr,
	vbendel, kirill, khlebnikov, Andrew Morton, Iurii Zaikin,
	Luis Chamberlain, Jonathan Corbet, Tetsuo Handa

On Mon, Feb 03, 2020 at 10:06:11AM -0500, Qian Cai wrote:
> On Mon, 2020-02-03 at 15:08 +0100, Christian Borntraeger wrote:
> > 
> > On 29.01.20 19:39, Qian Cai wrote:
> > > 
> > > 
> > > > On Jan 29, 2020, at 1:08 PM, Grzegorz Halat <ghalat@redhat.com> wrote:
> > > > 
> > > > Memory management subsystem performs various checks at runtime,
> > > > if an inconsistency is detected then such event is being logged and kernel
> > > > continues to run. While debugging such problems it is helpful to collect
> > > > memory dump as early as possible. Currently, there is no easy way to panic
> > > > kernel when such error is detected.
> > > > 
> > > > It was proposed[1] to panic the kernel if panic_on_oops is set but this
> > > > approach was not accepted. One of alternative proposals was introduction of
> > > > a new sysctl.
> > > > 
> > > > Add a new sysctl - panic_on_inconsistent_mm. If the sysctl is set then the
> > > > kernel will be crashed when an inconsistency is detected by memory
> > > > management. This currently means panic when bad page or bad PTE
> > > > is detected(this may be extended to other places in MM).
> > > > 
> > > > Another use case of this sysctl may be in security-wise environments,
> > > > it may be more desired to crash machine than continue to run with
> > > > potentially damaged data structures.
> > > 
> > > It is annoying that I have to repeat my feedback, but I don’t know why
> > > admins want to enable this by allowing normal users to crash the systems
> > > more easily through recoverable MM bugs where I am sure we have plenty.
> > > How does that improve the security?
> > 
> > There are cases where data corruption is a no-go, while "one node going down" 
> > is acceptable.
> > And then there is also the case for payed service providers that often need
> > a dump at the time of the problem to understand rare issues.
> > 
> > So I DO see value in such a thing. We should just piggy-back on panic_on_warn
> > I guess.
> > 
> 
> Indeed, so change pr_alert() to pr_warn() there then?

pr_* are just printk levels. If you want a full trace and to hook to
panic_on_warn, you want WARN_ON(condition) (or WARN_ON_ONCE(), etc).

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2020-02-05 22:23 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-29 18:08 [PATCH 1/1] mm: sysctl: add panic_on_inconsistent_mm sysctl Grzegorz Halat
2020-01-29 18:39 ` Qian Cai
2020-02-03 14:08   ` Christian Borntraeger
2020-02-03 15:06     ` Qian Cai
2020-02-03 15:06       ` Qian Cai
2020-02-05 22:23       ` Kees Cook
2020-01-29 19:06 ` Qian Cai
2020-01-30 14:44 ` Vlastimil Babka
2020-01-30 19:28   ` Kees Cook
2020-02-03 12:19     ` Vlastimil Babka

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.