linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mm: introduce oom_kill_disable sysctl knob
@ 2020-11-06 20:32 Minchan Kim
  2020-11-06 20:46 ` Randy Dunlap
  2020-11-09  7:37 ` Michal Hocko
  0 siblings, 2 replies; 7+ messages in thread
From: Minchan Kim @ 2020-11-06 20:32 UTC (permalink / raw)
  To: Andrew Morton; +Cc: LKML, linux-mm, Michal Hocko, Minchan Kim

It's hard to have some tests to be supposed to work under heavy
memory pressure(e.g., injecting some memory hogger) because
out-of-memory killer easily kicks out one of processes so system
is broken or system loses the memory pressure state since it has
plenty of free memory soon so.
Even though we could mark existing process's oom_adj to -1000,
it couldn't cover upcoming processes to be forked for the job.

This knob is handy to keep system memory pressure.

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 Documentation/admin-guide/sysctl/vm.rst | 14 ++++++++++++++
 include/linux/mm.h                      |  2 ++
 include/linux/oom.h                     |  1 +
 kernel/sysctl.c                         |  9 +++++++++
 mm/oom_kill.c                           | 24 ++++++++++++++++++++++++
 5 files changed, 50 insertions(+)

diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index f455fa00c00f..49dcedfaf0c0 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -694,6 +694,20 @@ is used in oom_kill_allocating_task.
 
 The default value is 0.
 
+oom_kill_disable
+================
+
+This disables or enables OOM killing in out-of-memory situations.
+
+If this is set to one, the OOM killer is disabled so OOM kill never
+hapens in out-of-memory situation. It could cause system dangerous
+state due to memory allocation failure so user should be careful to
+use it.
+
+If this is set to zero, the OOM killer is enabled so OOM kill happens
+in out-of-memory situations.
+
+The default value is 0.
 
 overcommit_kbytes
 =================
diff --git a/include/linux/mm.h b/include/linux/mm.h
index db6ae4d3fb4e..a98400cee341 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -215,6 +215,8 @@ int overcommit_kbytes_handler(struct ctl_table *, int, void *, size_t *,
 		loff_t *);
 int overcommit_policy_handler(struct ctl_table *, int, void *, size_t *,
 		loff_t *);
+int oom_kill_disable_handler(struct ctl_table *, int, void *, size_t *,
+		loff_t *);
 
 #define nth_page(page,n) pfn_to_page(page_to_pfn((page)) + (n))
 
diff --git a/include/linux/oom.h b/include/linux/oom.h
index 2db9a1432511..0f378498e6aa 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -126,5 +126,6 @@ extern struct task_struct *find_lock_task_mm(struct task_struct *p);
 /* sysctls */
 extern int sysctl_oom_dump_tasks;
 extern int sysctl_oom_kill_allocating_task;
+extern int sysctl_oom_kill_disable;
 extern int sysctl_panic_on_oom;
 #endif /* _INCLUDE_LINUX_OOM_H */
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index afad085960b8..1fe872fe1c05 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -2690,6 +2690,15 @@ static struct ctl_table vm_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec,
 	},
+	{
+		.procname	= "oom_kill_disable",
+		.data		= &sysctl_oom_kill_disable,
+		.maxlen		= sizeof(sysctl_oom_kill_disable),
+		.mode		= 0644,
+		.proc_handler	= oom_kill_disable_handler,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_ONE,
+	},
 	{
 		.procname	= "oom_dump_tasks",
 		.data		= &sysctl_oom_dump_tasks,
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 8b84661a6410..0f48cdeeb1e7 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -53,6 +53,7 @@
 
 int sysctl_panic_on_oom;
 int sysctl_oom_kill_allocating_task;
+int sysctl_oom_kill_disable;
 int sysctl_oom_dump_tasks = 1;
 
 /*
@@ -72,6 +73,29 @@ static inline bool is_memcg_oom(struct oom_control *oc)
 	return oc->memcg != NULL;
 }
 
+int oom_kill_disable_handler(struct ctl_table *table, int write, void *buffer,
+		size_t *lenp, loff_t *ppos)
+{
+	int ret;
+
+	ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
+	if (ret || !write)
+		goto out;
+	if (sysctl_oom_kill_disable == 1) {
+		if (!oom_killer_disable(HZ))
+			ret = -EBUSY;
+	} else {
+		if (mutex_lock_killable(&oom_lock)) {
+			ret = -EBUSY;
+			goto out;
+		}
+		oom_killer_enable();
+		mutex_unlock(&oom_lock);
+	}
+out:
+	return ret;
+}
+
 #ifdef CONFIG_NUMA
 /**
  * oom_cpuset_eligible() - check task eligiblity for kill
-- 
2.29.1.341.ge80a0c044ae-goog



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] mm: introduce oom_kill_disable sysctl knob
  2020-11-06 20:32 [PATCH] mm: introduce oom_kill_disable sysctl knob Minchan Kim
@ 2020-11-06 20:46 ` Randy Dunlap
  2020-11-06 22:59   ` Minchan Kim
  2020-11-09  7:37 ` Michal Hocko
  1 sibling, 1 reply; 7+ messages in thread
From: Randy Dunlap @ 2020-11-06 20:46 UTC (permalink / raw)
  To: Minchan Kim, Andrew Morton; +Cc: LKML, linux-mm, Michal Hocko

Hi,

Fix a few typos:

On 11/6/20 12:32 PM, Minchan Kim wrote:
> ---
>  Documentation/admin-guide/sysctl/vm.rst | 14 ++++++++++++++
>  include/linux/mm.h                      |  2 ++
>  include/linux/oom.h                     |  1 +
>  kernel/sysctl.c                         |  9 +++++++++
>  mm/oom_kill.c                           | 24 ++++++++++++++++++++++++
>  5 files changed, 50 insertions(+)
> 
> diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
> index f455fa00c00f..49dcedfaf0c0 100644
> --- a/Documentation/admin-guide/sysctl/vm.rst
> +++ b/Documentation/admin-guide/sysctl/vm.rst
> @@ -694,6 +694,20 @@ is used in oom_kill_allocating_task.
>  
>  The default value is 0.
>  
> +oom_kill_disable
> +================
> +
> +This disables or enables OOM killing in out-of-memory situations.
> +
> +If this is set to one, the OOM killer is disabled so OOM kill never
> +hapens in out-of-memory situation. It could cause system dangerous

   happens                            It could cause a dangerous system

> +state due to memory allocation failure so user should be careful to

                                                            careful when
> +use it.

   using it.

> +
> +If this is set to zero, the OOM killer is enabled so OOM kill happens
> +in out-of-memory situations.
> +
> +The default value is 0.
>  
>  overcommit_kbytes
>  =================

> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 8b84661a6410..0f48cdeeb1e7 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c

>  #ifdef CONFIG_NUMA
>  /**
>   * oom_cpuset_eligible() - check task eligiblity for kill

                                         eligibility

but that's not in your patch, so don't bother with it. :)


-- 
~Randy



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] mm: introduce oom_kill_disable sysctl knob
  2020-11-06 20:46 ` Randy Dunlap
@ 2020-11-06 22:59   ` Minchan Kim
  0 siblings, 0 replies; 7+ messages in thread
From: Minchan Kim @ 2020-11-06 22:59 UTC (permalink / raw)
  To: Randy Dunlap; +Cc: Andrew Morton, LKML, linux-mm, Michal Hocko

On Fri, Nov 06, 2020 at 12:46:47PM -0800, Randy Dunlap wrote:
> Hi,
> 
> Fix a few typos:
> 
> On 11/6/20 12:32 PM, Minchan Kim wrote:
> > ---
> >  Documentation/admin-guide/sysctl/vm.rst | 14 ++++++++++++++
> >  include/linux/mm.h                      |  2 ++
> >  include/linux/oom.h                     |  1 +
> >  kernel/sysctl.c                         |  9 +++++++++
> >  mm/oom_kill.c                           | 24 ++++++++++++++++++++++++
> >  5 files changed, 50 insertions(+)
> > 
> > diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
> > index f455fa00c00f..49dcedfaf0c0 100644
> > --- a/Documentation/admin-guide/sysctl/vm.rst
> > +++ b/Documentation/admin-guide/sysctl/vm.rst
> > @@ -694,6 +694,20 @@ is used in oom_kill_allocating_task.
> >  
> >  The default value is 0.
> >  
> > +oom_kill_disable
> > +================
> > +
> > +This disables or enables OOM killing in out-of-memory situations.
> > +
> > +If this is set to one, the OOM killer is disabled so OOM kill never
> > +hapens in out-of-memory situation. It could cause system dangerous
> 
>    happens                            It could cause a dangerous system
> 
> > +state due to memory allocation failure so user should be careful to
> 
>                                                             careful when
> > +use it.
> 
>    using it.
> 
> > +
> > +If this is set to zero, the OOM killer is enabled so OOM kill happens
> > +in out-of-memory situations.
> > +
> > +The default value is 0.
> >  
> >  overcommit_kbytes
> >  =================
> 
> > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > index 8b84661a6410..0f48cdeeb1e7 100644
> > --- a/mm/oom_kill.c
> > +++ b/mm/oom_kill.c
> 
> >  #ifdef CONFIG_NUMA
> >  /**
> >   * oom_cpuset_eligible() - check task eligiblity for kill
> 
>                                          eligibility
> 
> but that's not in your patch, so don't bother with it. :)

Sure, I will resend it with fixing after waiting some feedback.

Thanks for the review, Randy.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] mm: introduce oom_kill_disable sysctl knob
  2020-11-06 20:32 [PATCH] mm: introduce oom_kill_disable sysctl knob Minchan Kim
  2020-11-06 20:46 ` Randy Dunlap
@ 2020-11-09  7:37 ` Michal Hocko
  2020-11-09 15:39   ` Minchan Kim
  1 sibling, 1 reply; 7+ messages in thread
From: Michal Hocko @ 2020-11-09  7:37 UTC (permalink / raw)
  To: Minchan Kim; +Cc: Andrew Morton, LKML, linux-mm

On Fri 06-11-20 12:32:38, Minchan Kim wrote:
> It's hard to have some tests to be supposed to work under heavy
> memory pressure(e.g., injecting some memory hogger) because
> out-of-memory killer easily kicks out one of processes so system
> is broken or system loses the memory pressure state since it has
> plenty of free memory soon so.

I do not follow the reasoning here. So you want to test for a close to
no memory available situation and the oom killer stands in the way
because it puts a relief?

> Even though we could mark existing process's oom_adj to -1000,
> it couldn't cover upcoming processes to be forked for the job.

Why?

> This knob is handy to keep system memory pressure.

This sounds like a very dubious reason to introduce a knob to cripple
the system.

I can see some reason to control the oom handling policy because the
effect of the oom killer is really disruptive but a global on/off switch
sounds like a too coarse interface. Really what kind of production
environment would ever go with oom killer disabled completely?

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] mm: introduce oom_kill_disable sysctl knob
  2020-11-09  7:37 ` Michal Hocko
@ 2020-11-09 15:39   ` Minchan Kim
  2020-11-09 16:06     ` Michal Hocko
  0 siblings, 1 reply; 7+ messages in thread
From: Minchan Kim @ 2020-11-09 15:39 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Andrew Morton, LKML, linux-mm

On Mon, Nov 09, 2020 at 08:37:06AM +0100, Michal Hocko wrote:
> On Fri 06-11-20 12:32:38, Minchan Kim wrote:
> > It's hard to have some tests to be supposed to work under heavy
> > memory pressure(e.g., injecting some memory hogger) because
> > out-of-memory killer easily kicks out one of processes so system
> > is broken or system loses the memory pressure state since it has
> > plenty of free memory soon so.
> 
> I do not follow the reasoning here. So you want to test for a close to
> no memory available situation and the oom killer stands in the way
> because it puts a relief?

Yub, technically, I'd like to have consistent memory pressure to cause
direct reclaims on proesses on the system and swapping in/out.

> 
> > Even though we could mark existing process's oom_adj to -1000,
> > it couldn't cover upcoming processes to be forked for the job.
> 
> Why?

Thing is the system has out-of-control processes created on demand.
so only option to prevent OOM is echo -1000 > `pidof the process`
since they are forked. However, I have no idea when they are forked
so should race with OOM with /proc polling and OOM is frequently
ahead of me.

> 
> > This knob is handy to keep system memory pressure.
> 
> This sounds like a very dubious reason to introduce a knob to cripple
> the system.
> 
> I can see some reason to control the oom handling policy because the
> effect of the oom killer is really disruptive but a global on/off switch
> sounds like a too coarse interface. Really what kind of production
> environment would ever go with oom killer disabled completely?

I don't think shipping production system will use it. It would be
just testing only option.
My intention uses such heavy memory load to see various system behaviors
before the production launching because it usually happens in real workload
once we shipped but hard to generate such a corner case without artificial
memory pressure.

Any suggestion?


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] mm: introduce oom_kill_disable sysctl knob
  2020-11-09 15:39   ` Minchan Kim
@ 2020-11-09 16:06     ` Michal Hocko
  2020-11-09 16:27       ` Minchan Kim
  0 siblings, 1 reply; 7+ messages in thread
From: Michal Hocko @ 2020-11-09 16:06 UTC (permalink / raw)
  To: Minchan Kim; +Cc: Andrew Morton, LKML, linux-mm

On Mon 09-11-20 07:39:33, Minchan Kim wrote:
> On Mon, Nov 09, 2020 at 08:37:06AM +0100, Michal Hocko wrote:
> > On Fri 06-11-20 12:32:38, Minchan Kim wrote:
> > > It's hard to have some tests to be supposed to work under heavy
> > > memory pressure(e.g., injecting some memory hogger) because
> > > out-of-memory killer easily kicks out one of processes so system
> > > is broken or system loses the memory pressure state since it has
> > > plenty of free memory soon so.
> > 
> > I do not follow the reasoning here. So you want to test for a close to
> > no memory available situation and the oom killer stands in the way
> > because it puts a relief?
> 
> Yub, technically, I'd like to have consistent memory pressure to cause
> direct reclaims on proesses on the system and swapping in/out.
 
> > 
> > > Even though we could mark existing process's oom_adj to -1000,
> > > it couldn't cover upcoming processes to be forked for the job.
> > 
> > Why?
> 
> Thing is the system has out-of-control processes created on demand.
> so only option to prevent OOM is echo -1000 > `pidof the process`
> since they are forked. However, I have no idea when they are forked
> so should race with OOM with /proc polling and OOM is frequently
> ahead of me.

I am still confused. Why would you want all/most processes to be hidden
from the oom killer?
 
> > > This knob is handy to keep system memory pressure.
> > 
> > This sounds like a very dubious reason to introduce a knob to cripple
> > the system.
> > 
> > I can see some reason to control the oom handling policy because the
> > effect of the oom killer is really disruptive but a global on/off switch
> > sounds like a too coarse interface. Really what kind of production
> > environment would ever go with oom killer disabled completely?
> 
> I don't think shipping production system will use it. It would be
> just testing only option.

Then it doesn't really belong to the kernel IMHO.

> My intention uses such heavy memory load to see various system behaviors
> before the production launching because it usually happens in real workload
> once we shipped but hard to generate such a corner case without artificial
> memory pressure.

But changing the oom behavior will result in a completely different
system behavior. So you would be testing something that doesn't really
happen in any production system.

> Any suggestion?

Not really because I still do not understand your objective. You can
generate memory pressure and tune it up for specific testing scenario.
Sure there will be a some interference from the background noise (kernel
subsystems reacting to external events, processes created etc.) but why
that is a problem? This is normal to any running system.

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] mm: introduce oom_kill_disable sysctl knob
  2020-11-09 16:06     ` Michal Hocko
@ 2020-11-09 16:27       ` Minchan Kim
  0 siblings, 0 replies; 7+ messages in thread
From: Minchan Kim @ 2020-11-09 16:27 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Andrew Morton, LKML, linux-mm

On Mon, Nov 09, 2020 at 05:06:18PM +0100, Michal Hocko wrote:
> On Mon 09-11-20 07:39:33, Minchan Kim wrote:
> > On Mon, Nov 09, 2020 at 08:37:06AM +0100, Michal Hocko wrote:
> > > On Fri 06-11-20 12:32:38, Minchan Kim wrote:
> > > > It's hard to have some tests to be supposed to work under heavy
> > > > memory pressure(e.g., injecting some memory hogger) because
> > > > out-of-memory killer easily kicks out one of processes so system
> > > > is broken or system loses the memory pressure state since it has
> > > > plenty of free memory soon so.
> > > 
> > > I do not follow the reasoning here. So you want to test for a close to
> > > no memory available situation and the oom killer stands in the way
> > > because it puts a relief?
> > 
> > Yub, technically, I'd like to have consistent memory pressure to cause
> > direct reclaims on proesses on the system and swapping in/out.
>  
> > > 
> > > > Even though we could mark existing process's oom_adj to -1000,
> > > > it couldn't cover upcoming processes to be forked for the job.
> > > 
> > > Why?
> > 
> > Thing is the system has out-of-control processes created on demand.
> > so only option to prevent OOM is echo -1000 > `pidof the process`
> > since they are forked. However, I have no idea when they are forked
> > so should race with OOM with /proc polling and OOM is frequently
> > ahead of me.
> 
> I am still confused. Why would you want all/most processes to be hidden
> from the oom killer?

If one of processes in the system is killed, the memory pressure
disappear.

>  
> > > > This knob is handy to keep system memory pressure.
> > > 
> > > This sounds like a very dubious reason to introduce a knob to cripple
> > > the system.
> > > 
> > > I can see some reason to control the oom handling policy because the
> > > effect of the oom killer is really disruptive but a global on/off switch
> > > sounds like a too coarse interface. Really what kind of production
> > > environment would ever go with oom killer disabled completely?
> > 
> > I don't think shipping production system will use it. It would be
> > just testing only option.
> 
> Then it doesn't really belong to the kernel IMHO.
> 
> > My intention uses such heavy memory load to see various system behaviors
> > before the production launching because it usually happens in real workload
> > once we shipped but hard to generate such a corner case without artificial
> > memory pressure.
> 
> But changing the oom behavior will result in a completely different
> system behavior. So you would be testing something that doesn't really
> happen in any production system.

Since OOM is not instantly reacted, it still provides a good chance how
the system reacts on such memory pressure until someeone releases the
memory. For example, Android already gives lots of system processes to 
-1000 which could effectively disable the OOM at certain point.

> 
> > Any suggestion?
> 
> Not really because I still do not understand your objective. You can
> generate memory pressure and tune it up for specific testing scenario.
> Sure there will be a some interference from the background noise (kernel
> subsystems reacting to external events, processes created etc.) but why
> that is a problem? This is normal to any running system.

Putting more pressure from background processes is okay for the goal but
not okay for relieving th memory pressure since we lost the testing
environment.


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2020-11-09 16:27 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-06 20:32 [PATCH] mm: introduce oom_kill_disable sysctl knob Minchan Kim
2020-11-06 20:46 ` Randy Dunlap
2020-11-06 22:59   ` Minchan Kim
2020-11-09  7:37 ` Michal Hocko
2020-11-09 15:39   ` Minchan Kim
2020-11-09 16:06     ` Michal Hocko
2020-11-09 16:27       ` Minchan Kim

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).