From: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
To: Andrew Morton <akpm@linux-foundation.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	David Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@kernel.org>,
	linux-mm@kvack.org, Kirill Tkhai <ktkhai@virtuozzo.com>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH] memcg: killed threads should not invoke memcg OOM killer
Date: Wed, 9 Jan 2019 19:56:57 +0900
Message-ID: <935ae77c-9663-c3a4-c73a-fa69f9a3065f@i-love.sakura.ne.jp>
In-Reply-To: <20190107133720.GH31793@dhcp22.suse.cz>

On 2019/01/07 22:37, Michal Hocko wrote:
> On Mon 07-01-19 22:07:43, Tetsuo Handa wrote:
>> On 2019/01/07 20:41, Michal Hocko wrote:
>>> On Sun 06-01-19 15:02:24, Tetsuo Handa wrote:
>>>> Michal and Johannes, can we please stop this stupid behavior now?
>>>
>>> I have proposed a patch with a much more limited scope which is still
>>> waiting for feedback. I haven't heard it wouldn't be working so far.
>>>
>>
>> You mean
>>
>>   mutex_lock_killable would take care of exiting task already. I would
>>   then still prefer to check for mark_oom_victim because that is not racy
>>   with the exit path clearing signals. I can update my patch to use
>>   _killable lock variant if we are really going with the memcg specific
>>   fix.
>>
>> ? No response for two months.
> 
> I mean http://lkml.kernel.org/r/20181022071323.9550-1-mhocko@kernel.org
> which has died in nit picking. I am not very interested to go back there
> and spend a lot of time with it again. If you do not respect my opinion
> as the maintainer of this code then find somebody else to push it
> through.
> 

OK. It turned out that Michal's comment is independent of this patch,
so both Michal's patch and mine can be applied. Here is my patch.

From 0fb58415770a83d6c40d471e1840f8bc4a35ca83 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Wed, 26 Dec 2018 19:13:35 +0900
Subject: [PATCH] memcg: killed threads should not invoke memcg OOM killer

If $N > $M, a single process with $N threads in a memcg group can easily
kill all $M processes in that memcg group, because mem_cgroup_out_of_memory()
does not check whether the current thread still needs to invoke the memcg
OOM killer.

  T1@P1     |T2...$N@P1|P2...$M   |OOM reaper
  ----------+----------+----------+----------
                        # all sleeping
  try_charge()
    mem_cgroup_out_of_memory()
      mutex_lock(oom_lock)
             try_charge()
               mem_cgroup_out_of_memory()
                 mutex_lock(oom_lock)
      out_of_memory()
        select_bad_process()
        oom_kill_process(P1)
        wake_oom_reaper()
                                   oom_reap_task() # ignores P1
      mutex_unlock(oom_lock)
                 out_of_memory()
                   select_bad_process(P2...$M)
                        # all killed by T2...$N@P1
                   wake_oom_reaper()
                                   oom_reap_task() # ignores P2...$M
                 mutex_unlock(oom_lock)

We don't need to invoke the memcg OOM killer if the current thread was
killed while waiting for oom_lock, because mem_cgroup_oom_synchronize(true)
and memory_max_write() can bail out upon SIGKILL, and try_charge() allows
already killed/exiting threads to make forward progress.

If memcg OOM events in different domains are pending, already OOM-killed
threads needlessly wait for those events to be handled. An out_of_memory()
call is slow because it involves printk(); with slow serial consoles, it
might take more than a second. Therefore, allowing killed processes to
reach mmput() (via do_exit() -> exit_mm()) quickly helps them reach
__mmput(), which can reclaim more memory than the OOM reaper can, sooner.

At first, Michal thought that a fatal signal check is racy compared with a
tsk_is_oom_victim() check. But there is actually no such race: while the
current thread holds oom_lock inside mem_cgroup_out_of_memory(), that is, up
to the moment mutex_unlock(&oom_lock) is called after returning from
out_of_memory(), fatal_signal_pending() == F && tsk_is_oom_victim() == T
cannot happen. On the other hand, fatal_signal_pending() == T &&
tsk_is_oom_victim() == F can happen, and bailing out on that condition will
save some process from being OOM-killed needlessly.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 mm/memcontrol.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b860dd4f7..b0d3bf3 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1389,8 +1389,13 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	};
 	bool ret;
 
-	mutex_lock(&oom_lock);
-	ret = out_of_memory(&oc);
+	if (mutex_lock_killable(&oom_lock))
+		return true;
+	/*
+	 * A few threads which were not waiting at mutex_lock_killable() can
+	 * fail to bail out. Therefore, check again after holding oom_lock.
+	 */
+	ret = fatal_signal_pending(current) || out_of_memory(&oc);
 	mutex_unlock(&oom_lock);
 	return ret;
 }
-- 
1.8.3.1
