* [PATCH] mm, oom: remove sleep from under oom_lock
@ 2018-07-09  7:47 ` Michal Hocko
  0 siblings, 0 replies; 7+ messages in thread
From: Michal Hocko @ 2018-07-09  7:47 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Tetsuo Handa, David Rientjes, linux-mm, LKML, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Tetsuo has pointed out that since 27ae357fa82b ("mm, oom: fix concurrent
munlock and oom reaper unmap, v3") we have strong synchronization
between the oom killer and the victim's exit path because both have to
take the oom_lock. Therefore the original heuristic of sleeping for a
short time in out_of_memory no longer serves its original purpose.

Moreover, Tetsuo has noticed that the short sleep can do more harm than
good. Hammering the system with many processes can lead to starvation:
the task holding the oom_lock can block for a long time (minutes) and
stall any further progress, because the oom_reaper depends on the
oom_lock as well.

Drop the short sleep from out_of_memory when we hold the lock. Keep the
sleep when the trylock fails, to throttle the concurrent OOM paths a
bit. This should be solved in a more reasonable way (e.g. a sleep
proportional to the time spent in active reclaim), but that is a much
more complex thing to achieve. This is a quick fixup to remove stale
code.

Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/oom_kill.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 8ba6cb88cf58..ed9d473c571e 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -1077,15 +1077,9 @@ bool out_of_memory(struct oom_control *oc)
 		dump_header(oc, NULL);
 		panic("Out of memory and no killable processes...\n");
 	}
-	if (oc->chosen && oc->chosen != (void *)-1UL) {
+	if (oc->chosen && oc->chosen != (void *)-1UL)
 		oom_kill_process(oc, !is_memcg_oom(oc) ? "Out of memory" :
 				 "Memory cgroup out of memory");
-		/*
-		 * Give the killed process a good chance to exit before trying
-		 * to allocate memory again.
-		 */
-		schedule_timeout_killable(1);
-	}
 	return !!oc->chosen;
 }
 
-- 
2.18.0
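
The behaviour described in the changelog can be illustrated with a small userspace sketch. It is not kernel code: `oom_lock_sketch`, `alloc_slowpath_sketch()`, `out_of_memory_sketch()`, `throttle_sketch()` and the two counters are hypothetical stand-ins, with a pthread mutex in place of oom_lock and `nanosleep()` in place of the kernel's one-jiffy sleep. The point it shows is only the shape of the change: the caller that wins the trylock does its work and releases the lock without sleeping, while a caller that loses the trylock backs off briefly.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical userspace stand-in for the kernel's oom_lock. */
static pthread_mutex_t oom_lock_sketch = PTHREAD_MUTEX_INITIALIZER;

/* Observation counters for this sketch only; the patch has nothing like them. */
static int kills_done;              /* modified only under oom_lock_sketch */
static atomic_int throttled_sleeps; /* bumped by callers that lost the trylock */

/* Stand-in for out_of_memory(): "kill" a victim and return, with no sleep. */
static bool out_of_memory_sketch(void)
{
	kills_done++;
	return true;
}

/* Short back-off used only when the lock is NOT held (assumed 1 ms here). */
static void throttle_sketch(void)
{
	struct timespec ts = { .tv_sec = 0, .tv_nsec = 1000000 };

	nanosleep(&ts, NULL);
	atomic_fetch_add(&throttled_sleeps, 1);
}

/* Returns true if this caller ran the (sketched) oom killer itself. */
static bool alloc_slowpath_sketch(void)
{
	bool progress;
	int err;

	if (pthread_mutex_trylock(&oom_lock_sketch) != 0) {
		/* Somebody else is handling the OOM: throttle and retry later. */
		throttle_sketch();
		return false;
	}

	progress = out_of_memory_sketch();
	err = pthread_mutex_unlock(&oom_lock_sketch);
	assert(err == 0);
	(void)err;
	return progress;
}
```

With the sleep moved out of the locked region, the lock hold time in this sketch is bounded by the work done in `out_of_memory_sketch()` alone, which is the property the changelog is after.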



* Re: [PATCH] mm, oom: remove sleep from under oom_lock
  2018-07-09  7:47 ` Michal Hocko
  (?)
@ 2018-07-09 22:49 ` David Rientjes
  2018-07-10  9:43   ` Michal Hocko
  -1 siblings, 1 reply; 7+ messages in thread
From: David Rientjes @ 2018-07-09 22:49 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Andrew Morton, Tetsuo Handa, linux-mm, LKML, Michal Hocko

On Mon, 9 Jul 2018, Michal Hocko wrote:

> From: Michal Hocko <mhocko@suse.com>
> 
> Tetsuo has pointed out that since 27ae357fa82b ("mm, oom: fix concurrent
> munlock and oom reaper unmap, v3") we have strong synchronization
> between the oom killer and the victim's exit path because both have to
> take the oom_lock. Therefore the original heuristic of sleeping for a
> short time in out_of_memory no longer serves its original purpose.
> 
> Moreover, Tetsuo has noticed that the short sleep can do more harm than
> good. Hammering the system with many processes can lead to starvation:
> the task holding the oom_lock can block for a long time (minutes) and
> stall any further progress, because the oom_reaper depends on the
> oom_lock as well.
> 
> Drop the short sleep from out_of_memory when we hold the lock. Keep the
> sleep when the trylock fails, to throttle the concurrent OOM paths a
> bit. This should be solved in a more reasonable way (e.g. a sleep
> proportional to the time spent in active reclaim), but that is a much
> more complex thing to achieve. This is a quick fixup to remove stale
> code.
> 
> Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Signed-off-by: Michal Hocko <mhocko@suse.com>

This reminds me:

mm/oom_kill.c

int sysctl_oom_dump_tasks = 1;

DEFINE_MUTEX(oom_lock);

#ifdef CONFIG_NUMA

Would you mind documenting oom_lock to specify what it's protecting?


* Re: [PATCH] mm, oom: remove sleep from under oom_lock
  2018-07-09 22:49 ` David Rientjes
@ 2018-07-10  9:43   ` Michal Hocko
  2018-07-10 18:55     ` David Rientjes
  0 siblings, 1 reply; 7+ messages in thread
From: Michal Hocko @ 2018-07-10  9:43 UTC (permalink / raw)
  To: David Rientjes; +Cc: Andrew Morton, Tetsuo Handa, linux-mm, LKML

On Mon 09-07-18 15:49:53, David Rientjes wrote:
> On Mon, 9 Jul 2018, Michal Hocko wrote:
> 
> > From: Michal Hocko <mhocko@suse.com>
> > 
> > Tetsuo has pointed out that since 27ae357fa82b ("mm, oom: fix concurrent
> > munlock and oom reaper unmap, v3") we have strong synchronization
> > between the oom killer and the victim's exit path because both have to
> > take the oom_lock. Therefore the original heuristic of sleeping for a
> > short time in out_of_memory no longer serves its original purpose.
> > 
> > Moreover, Tetsuo has noticed that the short sleep can do more harm than
> > good. Hammering the system with many processes can lead to starvation:
> > the task holding the oom_lock can block for a long time (minutes) and
> > stall any further progress, because the oom_reaper depends on the
> > oom_lock as well.
> > 
> > Drop the short sleep from out_of_memory when we hold the lock. Keep the
> > sleep when the trylock fails, to throttle the concurrent OOM paths a
> > bit. This should be solved in a more reasonable way (e.g. a sleep
> > proportional to the time spent in active reclaim), but that is a much
> > more complex thing to achieve. This is a quick fixup to remove stale
> > code.
> > 
> > Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> 
> This reminds me:
> 
> mm/oom_kill.c
> 
> int sysctl_oom_dump_tasks = 1;
> 
> DEFINE_MUTEX(oom_lock);
> 
> #ifdef CONFIG_NUMA
> 
> Would you mind documenting oom_lock to specify what it's protecting?

What do you think about the following?

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index ed9d473c571e..32e6f7becb40 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -53,6 +53,14 @@ int sysctl_panic_on_oom;
 int sysctl_oom_kill_allocating_task;
 int sysctl_oom_dump_tasks = 1;
 
+/*
+ * Serializes oom killer invocations (out_of_memory()) from all contexts to
+ * prevent over-eager oom killing (e.g. when the oom killer is invoked
+ * from different domains).
+ *
+ * oom_killer_disable() relies on this lock to stabilize oom_killer_disabled
+ * and mark_oom_victim().
+ */
 DEFINE_MUTEX(oom_lock);
 
 #ifdef CONFIG_NUMA
-- 
Michal Hocko
SUSE Labs
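
One way to read the "serializes invocations to prevent over-eager killing" part of the proposed comment is sketched below in userspace C. Everything here is an assumption made for illustration: `serial_lock_sketch` stands in for oom_lock, `victim_pending` for "a victim has already been selected and has not exited yet", and `run_concurrent_ooms_sketch()` is a hypothetical driver. Because the check and the kill happen under one lock, several concurrent invocations result in a single kill rather than one per caller.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for oom_lock. */
static pthread_mutex_t serial_lock_sketch = PTHREAD_MUTEX_INITIALIZER;

/* Both fields are protected by serial_lock_sketch. */
static bool victim_pending;  /* a victim was already picked and is exiting */
static int victims_killed;   /* how many kills were actually issued */

/* One oom killer invocation: kill only if no victim is already pending. */
static void *oom_invoke_sketch(void *unused)
{
	(void)unused;

	pthread_mutex_lock(&serial_lock_sketch);
	if (!victim_pending) {
		victim_pending = true;
		victims_killed++;
	}
	pthread_mutex_unlock(&serial_lock_sketch);
	return NULL;
}

/*
 * Run nthreads concurrent invocations (at most 64 in this sketch).
 * Returns the number of victims killed, or -1 if a thread could not
 * be created.
 */
static int run_concurrent_ooms_sketch(int nthreads)
{
	pthread_t tids[64];
	int i, started = 0;

	assert(nthreads > 0 && nthreads <= 64);

	for (i = 0; i < nthreads; i++) {
		if (pthread_create(&tids[i], NULL, oom_invoke_sketch, NULL) != 0)
			break;
		started++;
	}
	for (i = 0; i < started; i++)
		pthread_join(tids[i], NULL);

	return started == nthreads ? victims_killed : -1;
}
```

Without the mutex, two callers could both observe `victim_pending == false` and each issue a kill, which is the over-eager behaviour the comment describes.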


* Re: [PATCH] mm, oom: remove sleep from under oom_lock
  2018-07-10  9:43   ` Michal Hocko
@ 2018-07-10 18:55     ` David Rientjes
  2018-07-10 21:12       ` David Rientjes
  0 siblings, 1 reply; 7+ messages in thread
From: David Rientjes @ 2018-07-10 18:55 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Andrew Morton, Tetsuo Handa, linux-mm, LKML

On Tue, 10 Jul 2018, Michal Hocko wrote:

> What do you think about the following?
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index ed9d473c571e..32e6f7becb40 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -53,6 +53,14 @@ int sysctl_panic_on_oom;
>  int sysctl_oom_kill_allocating_task;
>  int sysctl_oom_dump_tasks = 1;
>  
> +/*
> + * Serializes oom killer invocations (out_of_memory()) from all contexts to
> + * prevent over-eager oom killing (e.g. when the oom killer is invoked
> + * from different domains).
> + *
> + * oom_killer_disable() relies on this lock to stabilize oom_killer_disabled
> + * and mark_oom_victim().
> + */
>  DEFINE_MUTEX(oom_lock);
>  
>  #ifdef CONFIG_NUMA

I think it's better, thanks.  However, does it address the question about 
why __oom_reap_task_mm() needs oom_lock protection?  Perhaps it would be 
helpful to mention synchronization between reaping triggered from 
oom_reaper and by exit_mmap().


* Re: [PATCH] mm, oom: remove sleep from under oom_lock
  2018-07-10 18:55     ` David Rientjes
@ 2018-07-10 21:12       ` David Rientjes
  2018-07-11  8:59         ` Michal Hocko
  0 siblings, 1 reply; 7+ messages in thread
From: David Rientjes @ 2018-07-10 21:12 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Andrew Morton, Tetsuo Handa, linux-mm, LKML

On Tue, 10 Jul 2018, David Rientjes wrote:

> I think it's better, thanks.  However, does it address the question about 
> why __oom_reap_task_mm() needs oom_lock protection?  Perhaps it would be 
> helpful to mention synchronization between reaping triggered from 
> oom_reaper and by exit_mmap().
> 

Actually, can't we remove the need to take oom_lock in exit_mmap() if 
__oom_reap_task_mm() can do a test and set on MMF_UNSTABLE and, if already 
set, bail out immediately?
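
For illustration only, the test-and-set idea can be sketched in userspace with C11 atomics. This is not the kernel's implementation: `struct mm_sketch`, `MMF_UNSTABLE_SKETCH` (including its bit position) and `reap_once_sketch()` are hypothetical names, and `atomic_fetch_or()` is used here as a stand-in for an atomic test-and-set on a flags word. The first caller to set the bit does the reaping; any later caller sees the bit already set and bails out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit; the real MMF_UNSTABLE value is not assumed here. */
#define MMF_UNSTABLE_SKETCH (1UL << 0)

/* Minimal stand-in for the parts of an mm this sketch needs. */
struct mm_sketch {
	atomic_ulong flags;
	int reap_count;  /* touched only by the caller that won the flag */
};

/*
 * Atomically set the flag and inspect its previous value.
 * Returns true if this caller did the reaping, false if it bailed out
 * because another caller had already set the flag.
 */
static bool reap_once_sketch(struct mm_sketch *mm)
{
	unsigned long old;

	assert(mm != NULL);

	old = atomic_fetch_or(&mm->flags, MMF_UNSTABLE_SKETCH);
	if (old & MMF_UNSTABLE_SKETCH)
		return false;

	mm->reap_count++;  /* stands in for the actual unmapping work */
	return true;
}
```

Note what the sketch does not provide: the caller that bails out returns immediately and does not wait for the winner to finish. Whether that is acceptable for exit_mmap() is the open question, and the sketch does not answer it.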


* Re: [PATCH] mm, oom: remove sleep from under oom_lock
  2018-07-10 21:12       ` David Rientjes
@ 2018-07-11  8:59         ` Michal Hocko
  0 siblings, 0 replies; 7+ messages in thread
From: Michal Hocko @ 2018-07-11  8:59 UTC (permalink / raw)
  To: David Rientjes; +Cc: Andrew Morton, Tetsuo Handa, linux-mm, LKML

On Tue 10-07-18 14:12:28, David Rientjes wrote:
> On Tue, 10 Jul 2018, David Rientjes wrote:
> 
> > I think it's better, thanks.  However, does it address the question about 
> > why __oom_reap_task_mm() needs oom_lock protection?  Perhaps it would be 
> > helpful to mention synchronization between reaping triggered from 
> > oom_reaper and by exit_mmap().
> > 
> 
> Actually, can't we remove the need to take oom_lock in exit_mmap() if 
> __oom_reap_task_mm() can do a test and set on MMF_UNSTABLE and, if already 
> set, bail out immediately?

I think we do not really depend on oom_lock anymore in
__oom_reap_task_mm. The race it was originally added for (mmget_not_zero
vs. exit path) is no longer a problem. I have not evaluated it in depth,
though. There are just too many things going on in parallel.

Tetsuo has proposed some patches to remove the lock, but those patches
had other problems. If we have a simple patch to remove the oom_lock
from the oom reaper then I will review it. I am not sure I can come up
with a patch myself in a few days.
-- 
Michal Hocko
SUSE Labs
