linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/1] mm: sysctl: add panic_on_mm_error sysctl
@ 2020-01-27 10:11 Grzegorz Halat
  2020-01-27 10:28 ` Oleksandr Natalenko
                   ` (4 more replies)
  0 siblings, 5 replies; 8+ messages in thread
From: Grzegorz Halat @ 2020-01-27 10:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linux-fsdevel, linux-doc, ghalat, ssaner, atomlin,
	oleksandr, vbendel, kirill, khlebnikov, borntraeger,
	Andrew Morton, Iurii Zaikin, Kees Cook, Luis Chamberlain,
	Jonathan Corbet

Memory management subsystem performs various checks at runtime,
if an inconsistency is detected then such event is being logged and kernel
continues to run. While debugging such problems it is helpful to collect
memory dump as early as possible. Currently, there is no easy way to panic
kernel when such error is detected.

It was proposed[1] to panic the kernel if panic_on_oops is set but this
approach was not accepted. One of alternative proposals was introduction of
a new sysctl.

The patch adds panic_on_mm_error sysctl. If the sysctl is set then the
kernel will be crashed when an inconsistency is detected by memory
management. This currently means panic when bad page or bad PTE
is detected(this may be extended to other places in MM).

Another use case of this sysctl may be in security-wise environments,
it may be more desired to crash machine than continue to run with
potentially damaged data structures.

[1] https://marc.info/?l=linux-mm&m=142649500728327&w=2

Signed-off-by: Grzegorz Halat <ghalat@redhat.com>
---
 Documentation/admin-guide/sysctl/kernel.rst | 12 ++++++++++++
 include/linux/kernel.h                      |  1 +
 kernel/sysctl.c                             |  9 +++++++++
 mm/memory.c                                 |  7 +++++++
 mm/page_alloc.c                             |  4 +++-
 5 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index def074807cee..2fecd6b2547e 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -61,6 +61,7 @@ show up in /proc/sys/kernel:
 - overflowgid
 - overflowuid
 - panic
+- panic_on_mm_error
 - panic_on_oops
 - panic_on_stackoverflow
 - panic_on_unrecovered_nmi
@@ -611,6 +612,17 @@ an IO error.
    and you can use this option to take a crash dump.
 
 
+panic_on_mm_error:
+==============
+
+Controls the kernel's behaviour when inconsistency is detected
+by memory management code, for example bad page state or bad PTE.
+
+0: try to continue operation.
+
+1: panic immediately.
+
+
 panic_on_oops:
 ==============
 
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 0d9db2a14f44..5f9d408512ff 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -518,6 +518,7 @@ extern int oops_in_progress;		/* If set, an oops, panic(), BUG() or die() is in
 extern int panic_timeout;
 extern unsigned long panic_print;
 extern int panic_on_oops;
+extern int panic_on_mm_error;
 extern int panic_on_unrecovered_nmi;
 extern int panic_on_io_nmi;
 extern int panic_on_warn;
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 70665934d53e..6477e1cce28b 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1238,6 +1238,15 @@ static struct ctl_table kern_table[] = {
 		.extra1		= SYSCTL_ZERO,
 		.extra2		= SYSCTL_ONE,
 	},
+	{
+		.procname	= "panic_on_mm_error",
+		.data		= &panic_on_mm_error,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_ONE,
+	},
 #if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
 	{
 		.procname	= "timer_migration",
diff --git a/mm/memory.c b/mm/memory.c
index 45442d9a4f52..cce74ff39447 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -71,6 +71,7 @@
 #include <linux/dax.h>
 #include <linux/oom.h>
 #include <linux/numa.h>
+#include <linux/module.h>
 
 #include <trace/events/kmem.h>
 
@@ -88,6 +89,8 @@
 #warning Unfortunate NUMA and NUMA Balancing config, growing page-frame for last_cpupid.
 #endif
 
+int panic_on_mm_error __read_mostly;
+
 #ifndef CONFIG_NEED_MULTIPLE_NODES
 /* use the per-pgdat data instead for discontigmem - mbligh */
 unsigned long max_mapnr;
@@ -543,6 +546,10 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr,
 		 vma->vm_ops ? vma->vm_ops->fault : NULL,
 		 vma->vm_file ? vma->vm_file->f_op->mmap : NULL,
 		 mapping ? mapping->a_ops->readpage : NULL);
+
+	print_modules();
+	if (panic_on_mm_error)
+		panic("Bad page map detected");
 	dump_stack();
 	add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE);
 }
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d047bf7d8fd4..2ea6a65ba011 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -643,9 +643,11 @@ static void bad_page(struct page *page, const char *reason,
 	if (bad_flags)
 		pr_alert("bad because of flags: %#lx(%pGp)\n",
 						bad_flags, &bad_flags);
-	dump_page_owner(page);
 
+	dump_page_owner(page);
 	print_modules();
+	if (panic_on_mm_error)
+		panic("Bad page state detected");
 	dump_stack();
 out:
 	/* Leave bad fields for debug, except PageBuddy could make trouble */
-- 
2.21.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/1] mm: sysctl: add panic_on_mm_error sysctl
  2020-01-27 10:11 [PATCH 1/1] mm: sysctl: add panic_on_mm_error sysctl Grzegorz Halat
@ 2020-01-27 10:28 ` Oleksandr Natalenko
  2020-01-27 10:30 ` Aaron Tomlin
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Oleksandr Natalenko @ 2020-01-27 10:28 UTC (permalink / raw)
  To: Grzegorz Halat
  Cc: linux-kernel, linux-mm, linux-fsdevel, linux-doc, ssaner,
	atomlin, vbendel, kirill, khlebnikov, borntraeger, Andrew Morton,
	Iurii Zaikin, Kees Cook, Luis Chamberlain, Jonathan Corbet

On Mon, Jan 27, 2020 at 11:11:00AM +0100, Grzegorz Halat wrote:
> Memory management subsystem performs various checks at runtime,
> if an inconsistency is detected then such event is being logged and kernel
> continues to run. While debugging such problems it is helpful to collect
> memory dump as early as possible. Currently, there is no easy way to panic
> kernel when such error is detected.
> 
> It was proposed[1] to panic the kernel if panic_on_oops is set but this
> approach was not accepted. One of alternative proposals was introduction of
> a new sysctl.
> 
> The patch adds panic_on_mm_error sysctl. If the sysctl is set then the
> kernel will be crashed when an inconsistency is detected by memory
> management. This currently means panic when bad page or bad PTE
> is detected(this may be extended to other places in MM).
> 
> Another use case of this sysctl may be in security-wise environments,
> it may be more desired to crash machine than continue to run with
> potentially damaged data structures.
> 
> [1] https://marc.info/?l=linux-mm&m=142649500728327&w=2
> 
> Signed-off-by: Grzegorz Halat <ghalat@redhat.com>
> ---
>  Documentation/admin-guide/sysctl/kernel.rst | 12 ++++++++++++
>  include/linux/kernel.h                      |  1 +
>  kernel/sysctl.c                             |  9 +++++++++
>  mm/memory.c                                 |  7 +++++++
>  mm/page_alloc.c                             |  4 +++-
>  5 files changed, 32 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> index def074807cee..2fecd6b2547e 100644
> --- a/Documentation/admin-guide/sysctl/kernel.rst
> +++ b/Documentation/admin-guide/sysctl/kernel.rst
> @@ -61,6 +61,7 @@ show up in /proc/sys/kernel:
>  - overflowgid
>  - overflowuid
>  - panic
> +- panic_on_mm_error
>  - panic_on_oops
>  - panic_on_stackoverflow
>  - panic_on_unrecovered_nmi
> @@ -611,6 +612,17 @@ an IO error.
>     and you can use this option to take a crash dump.
>  
>  
> +panic_on_mm_error:
> +==============
> +
> +Controls the kernel's behaviour when inconsistency is detected
> +by memory management code, for example bad page state or bad PTE.
> +
> +0: try to continue operation.
> +
> +1: panic immediately.
> +
> +
>  panic_on_oops:
>  ==============
>  
> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> index 0d9db2a14f44..5f9d408512ff 100644
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -518,6 +518,7 @@ extern int oops_in_progress;		/* If set, an oops, panic(), BUG() or die() is in
>  extern int panic_timeout;
>  extern unsigned long panic_print;
>  extern int panic_on_oops;
> +extern int panic_on_mm_error;
>  extern int panic_on_unrecovered_nmi;
>  extern int panic_on_io_nmi;
>  extern int panic_on_warn;
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 70665934d53e..6477e1cce28b 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -1238,6 +1238,15 @@ static struct ctl_table kern_table[] = {
>  		.extra1		= SYSCTL_ZERO,
>  		.extra2		= SYSCTL_ONE,
>  	},
> +	{
> +		.procname	= "panic_on_mm_error",
> +		.data		= &panic_on_mm_error,
> +		.maxlen		= sizeof(int),
> +		.mode		= 0644,
> +		.proc_handler	= proc_dointvec_minmax,
> +		.extra1		= SYSCTL_ZERO,
> +		.extra2		= SYSCTL_ONE,
> +	},
>  #if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
>  	{
>  		.procname	= "timer_migration",
> diff --git a/mm/memory.c b/mm/memory.c
> index 45442d9a4f52..cce74ff39447 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -71,6 +71,7 @@
>  #include <linux/dax.h>
>  #include <linux/oom.h>
>  #include <linux/numa.h>
> +#include <linux/module.h>
>  
>  #include <trace/events/kmem.h>
>  
> @@ -88,6 +89,8 @@
>  #warning Unfortunate NUMA and NUMA Balancing config, growing page-frame for last_cpupid.
>  #endif
>  
> +int panic_on_mm_error __read_mostly;
> +
>  #ifndef CONFIG_NEED_MULTIPLE_NODES
>  /* use the per-pgdat data instead for discontigmem - mbligh */
>  unsigned long max_mapnr;
> @@ -543,6 +546,10 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr,
>  		 vma->vm_ops ? vma->vm_ops->fault : NULL,
>  		 vma->vm_file ? vma->vm_file->f_op->mmap : NULL,
>  		 mapping ? mapping->a_ops->readpage : NULL);
> +
> +	print_modules();
> +	if (panic_on_mm_error)
> +		panic("Bad page map detected");
>  	dump_stack();
>  	add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE);
>  }
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index d047bf7d8fd4..2ea6a65ba011 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -643,9 +643,11 @@ static void bad_page(struct page *page, const char *reason,
>  	if (bad_flags)
>  		pr_alert("bad because of flags: %#lx(%pGp)\n",
>  						bad_flags, &bad_flags);
> -	dump_page_owner(page);
>  
> +	dump_page_owner(page);
>  	print_modules();
> +	if (panic_on_mm_error)
> +		panic("Bad page state detected");
>  	dump_stack();
>  out:
>  	/* Leave bad fields for debug, except PageBuddy could make trouble */
> -- 
> 2.21.1
> 

Reviewed-by: Oleksandr Natalenko <oleksandr@redhat.com>

-- 
  Best regards,
    Oleksandr Natalenko (post-factum)
    Senior Software Maintenance Engineer


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/1] mm: sysctl: add panic_on_mm_error sysctl
  2020-01-27 10:11 [PATCH 1/1] mm: sysctl: add panic_on_mm_error sysctl Grzegorz Halat
  2020-01-27 10:28 ` Oleksandr Natalenko
@ 2020-01-27 10:30 ` Aaron Tomlin
  2020-01-27 10:42 ` Tetsuo Handa
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Aaron Tomlin @ 2020-01-27 10:30 UTC (permalink / raw)
  To: Grzegorz Halat
  Cc: linux-kernel, linux-mm, linux-fsdevel, linux-doc, ssaner,
	oleksandr, vbendel, kirill, khlebnikov, borntraeger,
	Andrew Morton, Iurii Zaikin, Kees Cook, Luis Chamberlain,
	Jonathan Corbet

On Mon 2020-01-27 11:11 +0100, Grzegorz Halat wrote:
> Memory management subsystem performs various checks at runtime,
> if an inconsistency is detected then such event is being logged and kernel
> continues to run. While debugging such problems it is helpful to collect
> memory dump as early as possible. Currently, there is no easy way to panic
> kernel when such error is detected.
> 
> It was proposed[1] to panic the kernel if panic_on_oops is set but this
> approach was not accepted. One of alternative proposals was introduction of
> a new sysctl.
> 
> The patch adds panic_on_mm_error sysctl. If the sysctl is set then the
> kernel will be crashed when an inconsistency is detected by memory
> management. This currently means panic when bad page or bad PTE
> is detected(this may be extended to other places in MM).
> 
> Another use case of this sysctl may be in security-wise environments,
> it may be more desired to crash machine than continue to run with
> potentially damaged data structures.
> 
> [1] https://marc.info/?l=linux-mm&m=142649500728327&w=2
> 
> Signed-off-by: Grzegorz Halat <ghalat@redhat.com>
> ---
>  Documentation/admin-guide/sysctl/kernel.rst | 12 ++++++++++++
>  include/linux/kernel.h                      |  1 +
>  kernel/sysctl.c                             |  9 +++++++++
>  mm/memory.c                                 |  7 +++++++
>  mm/page_alloc.c                             |  4 +++-
>  5 files changed, 32 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> index def074807cee..2fecd6b2547e 100644
> --- a/Documentation/admin-guide/sysctl/kernel.rst
> +++ b/Documentation/admin-guide/sysctl/kernel.rst
> @@ -61,6 +61,7 @@ show up in /proc/sys/kernel:
>  - overflowgid
>  - overflowuid
>  - panic
> +- panic_on_mm_error
>  - panic_on_oops
>  - panic_on_stackoverflow
>  - panic_on_unrecovered_nmi
> @@ -611,6 +612,17 @@ an IO error.
>     and you can use this option to take a crash dump.
>  
>  
> +panic_on_mm_error:
> +==============
> +
> +Controls the kernel's behaviour when inconsistency is detected
> +by memory management code, for example bad page state or bad PTE.
> +
> +0: try to continue operation.
> +
> +1: panic immediately.
> +
> +
>  panic_on_oops:
>  ==============
>  
> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> index 0d9db2a14f44..5f9d408512ff 100644
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -518,6 +518,7 @@ extern int oops_in_progress;		/* If set, an oops, panic(), BUG() or die() is in
>  extern int panic_timeout;
>  extern unsigned long panic_print;
>  extern int panic_on_oops;
> +extern int panic_on_mm_error;
>  extern int panic_on_unrecovered_nmi;
>  extern int panic_on_io_nmi;
>  extern int panic_on_warn;
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 70665934d53e..6477e1cce28b 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -1238,6 +1238,15 @@ static struct ctl_table kern_table[] = {
>  		.extra1		= SYSCTL_ZERO,
>  		.extra2		= SYSCTL_ONE,
>  	},
> +	{
> +		.procname	= "panic_on_mm_error",
> +		.data		= &panic_on_mm_error,
> +		.maxlen		= sizeof(int),
> +		.mode		= 0644,
> +		.proc_handler	= proc_dointvec_minmax,
> +		.extra1		= SYSCTL_ZERO,
> +		.extra2		= SYSCTL_ONE,
> +	},
>  #if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
>  	{
>  		.procname	= "timer_migration",
> diff --git a/mm/memory.c b/mm/memory.c
> index 45442d9a4f52..cce74ff39447 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -71,6 +71,7 @@
>  #include <linux/dax.h>
>  #include <linux/oom.h>
>  #include <linux/numa.h>
> +#include <linux/module.h>
>  
>  #include <trace/events/kmem.h>
>  
> @@ -88,6 +89,8 @@
>  #warning Unfortunate NUMA and NUMA Balancing config, growing page-frame for last_cpupid.
>  #endif
>  
> +int panic_on_mm_error __read_mostly;
> +
>  #ifndef CONFIG_NEED_MULTIPLE_NODES
>  /* use the per-pgdat data instead for discontigmem - mbligh */
>  unsigned long max_mapnr;
> @@ -543,6 +546,10 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr,
>  		 vma->vm_ops ? vma->vm_ops->fault : NULL,
>  		 vma->vm_file ? vma->vm_file->f_op->mmap : NULL,
>  		 mapping ? mapping->a_ops->readpage : NULL);
> +
> +	print_modules();
> +	if (panic_on_mm_error)
> +		panic("Bad page map detected");
>  	dump_stack();
>  	add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE);
>  }
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index d047bf7d8fd4..2ea6a65ba011 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -643,9 +643,11 @@ static void bad_page(struct page *page, const char *reason,
>  	if (bad_flags)
>  		pr_alert("bad because of flags: %#lx(%pGp)\n",
>  						bad_flags, &bad_flags);
> -	dump_page_owner(page);
>  
> +	dump_page_owner(page);
>  	print_modules();
> +	if (panic_on_mm_error)
> +		panic("Bad page state detected");
>  	dump_stack();
>  out:
>  	/* Leave bad fields for debug, except PageBuddy could make trouble */

Reviewed-by: Aaron Tomlin <atomlin@redhat.com>

-- 
Aaron Tomlin


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/1] mm: sysctl: add panic_on_mm_error sysctl
  2020-01-27 10:11 [PATCH 1/1] mm: sysctl: add panic_on_mm_error sysctl Grzegorz Halat
  2020-01-27 10:28 ` Oleksandr Natalenko
  2020-01-27 10:30 ` Aaron Tomlin
@ 2020-01-27 10:42 ` Tetsuo Handa
  2020-01-27 10:55   ` Aaron Tomlin
  2020-01-27 11:32   ` Grzegorz Halat
  2020-01-27 12:26 ` Qian Cai
  2020-01-27 12:51 ` Konstantin Khlebnikov
  4 siblings, 2 replies; 8+ messages in thread
From: Tetsuo Handa @ 2020-01-27 10:42 UTC (permalink / raw)
  To: Grzegorz Halat
  Cc: linux-kernel, linux-mm, linux-fsdevel, linux-doc, ssaner,
	atomlin, oleksandr, vbendel, kirill, khlebnikov, borntraeger,
	Andrew Morton, Iurii Zaikin, Kees Cook, Luis Chamberlain,
	Jonathan Corbet

On 2020/01/27 19:11, Grzegorz Halat wrote:
> diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> index def074807cee..2fecd6b2547e 100644
> --- a/Documentation/admin-guide/sysctl/kernel.rst
> +++ b/Documentation/admin-guide/sysctl/kernel.rst
> @@ -61,6 +61,7 @@ show up in /proc/sys/kernel:
>  - overflowgid
>  - overflowuid
>  - panic
> +- panic_on_mm_error

Maybe panic_on_inconsistent_mm is better, for an MM error sounds too generic
(e.g. is page allocation failure an error, is OOM killer an error,
is NULL pointer dereference an error, is use-after-free access an error) ?

Also, should this be in /proc/sys/vm/ than /proc/sys/kernel/ ?

>  - panic_on_oops
>  - panic_on_stackoverflow
>  - panic_on_unrecovered_nmi

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/1] mm: sysctl: add panic_on_mm_error sysctl
  2020-01-27 10:42 ` Tetsuo Handa
@ 2020-01-27 10:55   ` Aaron Tomlin
  2020-01-27 11:32   ` Grzegorz Halat
  1 sibling, 0 replies; 8+ messages in thread
From: Aaron Tomlin @ 2020-01-27 10:55 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Grzegorz Halat, linux-kernel, linux-mm, linux-fsdevel, linux-doc,
	ssaner, oleksandr, vbendel, kirill, khlebnikov, borntraeger,
	Andrew Morton, Iurii Zaikin, Kees Cook, Luis Chamberlain,
	Jonathan Corbet

On Mon 2020-01-27 19:42 +0900, Tetsuo Handa wrote:
> On 2020/01/27 19:11, Grzegorz Halat wrote:
> > diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> > index def074807cee..2fecd6b2547e 100644
> > --- a/Documentation/admin-guide/sysctl/kernel.rst
> > +++ b/Documentation/admin-guide/sysctl/kernel.rst
> > @@ -61,6 +61,7 @@ show up in /proc/sys/kernel:
> >  - overflowgid
> >  - overflowuid
> >  - panic
> > +- panic_on_mm_error
> 
> Maybe panic_on_inconsistent_mm is better, for an MM error sounds too generic
> (e.g. is page allocation failure an error, is OOM killer an error,
> is NULL pointer dereference an error, is use-after-free access an error) ?

Fair enough; 'panic_on_inconsistent_mm' is more concise.

-- 
Aaron Tomlin


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/1] mm: sysctl: add panic_on_mm_error sysctl
  2020-01-27 10:42 ` Tetsuo Handa
  2020-01-27 10:55   ` Aaron Tomlin
@ 2020-01-27 11:32   ` Grzegorz Halat
  1 sibling, 0 replies; 8+ messages in thread
From: Grzegorz Halat @ 2020-01-27 11:32 UTC (permalink / raw)
  To: Tetsuo Handa, Aaron Tomlin
  Cc: linux-kernel, linux-mm, linux-fsdevel, linux-doc, Stan Saner,
	Oleksandr Natalenko, Vratislav Bendel, kirill, khlebnikov,
	borntraeger, Andrew Morton, Iurii Zaikin, Kees Cook,
	Luis Chamberlain, Jonathan Corbet

On Mon, 27 Jan 2020 at 11:43, Tetsuo Handa wrote:
>
> Maybe panic_on_inconsistent_mm is better, for an MM error sounds too generic
> (e.g. is page allocation failure an error, is OOM killer an error,
> is NULL pointer dereference an error, is use-after-free access an error) ?
>
yes, panic_on_inconsistent_mm is better

> Also, should this be in /proc/sys/vm/ than /proc/sys/kernel/ ?
Agreed

I will wait a day or two for more feedback and send V2 with sysctl
named as 'vm.panic_on_inconsistent_mm'.

Thanks,
--
Grzegorz Halat


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/1] mm: sysctl: add panic_on_mm_error sysctl
  2020-01-27 10:11 [PATCH 1/1] mm: sysctl: add panic_on_mm_error sysctl Grzegorz Halat
                   ` (2 preceding siblings ...)
  2020-01-27 10:42 ` Tetsuo Handa
@ 2020-01-27 12:26 ` Qian Cai
  2020-01-27 12:51 ` Konstantin Khlebnikov
  4 siblings, 0 replies; 8+ messages in thread
From: Qian Cai @ 2020-01-27 12:26 UTC (permalink / raw)
  To: Grzegorz Halat
  Cc: linux-kernel, linux-mm, linux-fsdevel, linux-doc, ssaner,
	atomlin, oleksandr, vbendel, kirill, khlebnikov, borntraeger,
	Andrew Morton, Iurii Zaikin, Kees Cook, Luis Chamberlain,
	Jonathan Corbet



> On Jan 27, 2020, at 5:11 AM, Grzegorz Halat <ghalat@redhat.com> wrote:
> 
> Memory management subsystem performs various checks at runtime,
> if an inconsistency is detected then such event is being logged and kernel
> continues to run. While debugging such problems it is helpful to collect
> memory dump as early as possible. Currently, there is no easy way to panic
> kernel when such error is detected.
> 
> It was proposed[1] to panic the kernel if panic_on_oops is set but this
> approach was not accepted. One of alternative proposals was introduction of
> a new sysctl.
> 
> The patch adds panic_on_mm_error sysctl. If the sysctl is set then the
> kernel will be crashed when an inconsistency is detected by memory
> management. This currently means panic when bad page or bad PTE
> is detected(this may be extended to other places in MM).
> 
> Another use case of this sysctl may be in security-wise environments,
> it may be more desired to crash machine than continue to run with
> potentially damaged data structures.

Well, on the other hand, this will allow a normal user to more easily crash the system due to a recoverable bug which could result in local DoS.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/1] mm: sysctl: add panic_on_mm_error sysctl
  2020-01-27 10:11 [PATCH 1/1] mm: sysctl: add panic_on_mm_error sysctl Grzegorz Halat
                   ` (3 preceding siblings ...)
  2020-01-27 12:26 ` Qian Cai
@ 2020-01-27 12:51 ` Konstantin Khlebnikov
  4 siblings, 0 replies; 8+ messages in thread
From: Konstantin Khlebnikov @ 2020-01-27 12:51 UTC (permalink / raw)
  To: Grzegorz Halat, linux-kernel
  Cc: linux-mm, linux-fsdevel, linux-doc, ssaner, atomlin, oleksandr,
	vbendel, kirill, borntraeger, Andrew Morton, Iurii Zaikin,
	Kees Cook, Luis Chamberlain, Jonathan Corbet

On 27/01/2020 13.11, Grzegorz Halat wrote:
> Memory management subsystem performs various checks at runtime,
> if an inconsistency is detected then such event is being logged and kernel
> continues to run. While debugging such problems it is helpful to collect
> memory dump as early as possible. Currently, there is no easy way to panic
> kernel when such error is detected.
> 
> It was proposed[1] to panic the kernel if panic_on_oops is set but this
> approach was not accepted. One of alternative proposals was introduction of
> a new sysctl.
> 
> The patch adds panic_on_mm_error sysctl. If the sysctl is set then the
> kernel will be crashed when an inconsistency is detected by memory
> management. This currently means panic when bad page or bad PTE
> is detected(this may be extended to other places in MM).
> 
> Another use case of this sysctl may be in security-wise environments,
> it may be more desired to crash machine than continue to run with
> potentially damaged data structures.
> 
> [1] https://marc.info/?l=linux-mm&m=142649500728327&w=2
> 
> Signed-off-by: Grzegorz Halat <ghalat@redhat.com>
> ---
>   Documentation/admin-guide/sysctl/kernel.rst | 12 ++++++++++++
>   include/linux/kernel.h                      |  1 +
>   kernel/sysctl.c                             |  9 +++++++++
>   mm/memory.c                                 |  7 +++++++
>   mm/page_alloc.c                             |  4 +++-
>   5 files changed, 32 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> index def074807cee..2fecd6b2547e 100644
> --- a/Documentation/admin-guide/sysctl/kernel.rst
> +++ b/Documentation/admin-guide/sysctl/kernel.rst
> @@ -61,6 +61,7 @@ show up in /proc/sys/kernel:
>   - overflowgid
>   - overflowuid
>   - panic
> +- panic_on_mm_error
>   - panic_on_oops
>   - panic_on_stackoverflow
>   - panic_on_unrecovered_nmi
> @@ -611,6 +612,17 @@ an IO error.
>      and you can use this option to take a crash dump.
>   
>   
> +panic_on_mm_error:
> +==============
> +
> +Controls the kernel's behaviour when inconsistency is detected
> +by memory management code, for example bad page state or bad PTE.
> +
> +0: try to continue operation.
> +
> +1: panic immediately.
> +
> +
>   panic_on_oops:
>   ==============
>   
> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> index 0d9db2a14f44..5f9d408512ff 100644
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -518,6 +518,7 @@ extern int oops_in_progress;		/* If set, an oops, panic(), BUG() or die() is in
>   extern int panic_timeout;
>   extern unsigned long panic_print;
>   extern int panic_on_oops;
> +extern int panic_on_mm_error;
>   extern int panic_on_unrecovered_nmi;
>   extern int panic_on_io_nmi;
>   extern int panic_on_warn;
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 70665934d53e..6477e1cce28b 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -1238,6 +1238,15 @@ static struct ctl_table kern_table[] = {
>   		.extra1		= SYSCTL_ZERO,
>   		.extra2		= SYSCTL_ONE,
>   	},
> +	{
> +		.procname	= "panic_on_mm_error",
> +		.data		= &panic_on_mm_error,
> +		.maxlen		= sizeof(int),
> +		.mode		= 0644,
> +		.proc_handler	= proc_dointvec_minmax,
> +		.extra1		= SYSCTL_ZERO,
> +		.extra2		= SYSCTL_ONE,
> +	},
>   #if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
>   	{
>   		.procname	= "timer_migration",
> diff --git a/mm/memory.c b/mm/memory.c
> index 45442d9a4f52..cce74ff39447 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -71,6 +71,7 @@
>   #include <linux/dax.h>
>   #include <linux/oom.h>
>   #include <linux/numa.h>
> +#include <linux/module.h>
>   
>   #include <trace/events/kmem.h>
>   
> @@ -88,6 +89,8 @@
>   #warning Unfortunate NUMA and NUMA Balancing config, growing page-frame for last_cpupid.
>   #endif
>   
> +int panic_on_mm_error __read_mostly;
> +
>   #ifndef CONFIG_NEED_MULTIPLE_NODES
>   /* use the per-pgdat data instead for discontigmem - mbligh */
>   unsigned long max_mapnr;
> @@ -543,6 +546,10 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr,
>   		 vma->vm_ops ? vma->vm_ops->fault : NULL,
>   		 vma->vm_file ? vma->vm_file->f_op->mmap : NULL,
>   		 mapping ? mapping->a_ops->readpage : NULL);
> +
> +	print_modules();

I'm not sure that printing all modules for each bad pte is a good idea.
It already prints 'readpage' which is enough to guess filesystem or driver.

Maybe print modules under if (panic...) below?

> +	if (panic_on_mm_error)
> +		panic("Bad page map detected");
>   	dump_stack();
>   	add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE);
>   }
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index d047bf7d8fd4..2ea6a65ba011 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -643,9 +643,11 @@ static void bad_page(struct page *page, const char *reason,
>   	if (bad_flags)
>   		pr_alert("bad because of flags: %#lx(%pGp)\n",
>   						bad_flags, &bad_flags);
> -	dump_page_owner(page);
>   
> +	dump_page_owner(page);
>   	print_modules();
> +	if (panic_on_mm_error)
> +		panic("Bad page state detected");
>   	dump_stack();
>   out:
>   	/* Leave bad fields for debug, except PageBuddy could make trouble */
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2020-01-27 12:51 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-27 10:11 [PATCH 1/1] mm: sysctl: add panic_on_mm_error sysctl Grzegorz Halat
2020-01-27 10:28 ` Oleksandr Natalenko
2020-01-27 10:30 ` Aaron Tomlin
2020-01-27 10:42 ` Tetsuo Handa
2020-01-27 10:55   ` Aaron Tomlin
2020-01-27 11:32   ` Grzegorz Halat
2020-01-27 12:26 ` Qian Cai
2020-01-27 12:51 ` Konstantin Khlebnikov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).