From: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> To: Andrew Morton <akpm@linux-foundation.org>, Johannes Weiner <hannes@cmpxchg.org>, David Rientjes <rientjes@google.com> Cc: Michal Hocko <mhocko@kernel.org>, linux-mm@kvack.org, Kirill Tkhai <ktkhai@virtuozzo.com>, Linus Torvalds <torvalds@linux-foundation.org> Subject: Re: [PATCH] memcg: killed threads should not invoke memcg OOM killer Date: Wed, 9 Jan 2019 19:56:57 +0900 [thread overview] Message-ID: <935ae77c-9663-c3a4-c73a-fa69f9a3065f@i-love.sakura.ne.jp> (raw) In-Reply-To: <20190107133720.GH31793@dhcp22.suse.cz> On 2019/01/07 22:37, Michal Hocko wrote: > On Mon 07-01-19 22:07:43, Tetsuo Handa wrote: >> On 2019/01/07 20:41, Michal Hocko wrote: >>> On Sun 06-01-19 15:02:24, Tetsuo Handa wrote: >>>> Michal and Johannes, can we please stop this stupid behavior now? >>> >>> I have proposed a patch with a much more limited scope which is still >>> waiting for feedback. I haven't heard it wouldn't be working so far. >>> >> >> You mean >> >> mutex_lock_killable would take care of exiting task already. I would >> then still prefer to check for mark_oom_victim because that is not racy >> with the exit path clearing signals. I can update my patch to use >> _killable lock variant if we are really going with the memcg specific >> fix. >> >> ? No response for two months. > > I mean http://lkml.kernel.org/r/20181022071323.9550-1-mhocko@kernel.org > which has died in nit picking. I am not very interested to go back there > and spend a lot of time with it again. If you do not respect my opinion > as the maintainer of this code then find somebody else to push it > through. > OK. It turned out that Michal's comment is independent with this patch. We can apply both Michal's patch and my patch, and here is my patch. >From 0fb58415770a83d6c40d471e1840f8bc4a35ca83 Mon Sep 17 00:00:00 2001 From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Date: Wed, 26 Dec 2018 19:13:35 +0900 Subject: [PATCH] memcg: killed threads should not invoke memcg OOM killer If $N > $M, a single process with $N threads in a memcg group can easily kill all $M processes in that memcg group, for mem_cgroup_out_of_memory() does not check if current thread needs to invoke the memcg OOM killer. T1@P1 |T2...$N@P1|P2...$M |OOM reaper ----------+----------+----------+---------- # all sleeping try_charge() mem_cgroup_out_of_memory() mutex_lock(oom_lock) try_charge() mem_cgroup_out_of_memory() mutex_lock(oom_lock) out_of_memory() select_bad_process() oom_kill_process(P1) wake_oom_reaper() oom_reap_task() # ignores P1 mutex_unlock(oom_lock) out_of_memory() select_bad_process(P2...$M) # all killed by T2...$N@P1 wake_oom_reaper() oom_reap_task() # ignores P2...$M mutex_unlock(oom_lock) We don't need to invoke the memcg OOM killer if current thread was killed when waiting for oom_lock, for mem_cgroup_oom_synchronize(true) and memory_max_write() can bail out upon SIGKILL, and try_charge() allows already killed/exiting threads to make forward progress. If memcg OOM events in different domains are pending, already OOM-killed threads needlessly wait for pending memcg OOM events in different domains. An out_of_memory() call is slow because it involves printk(). With slow serial consoles, out_of_memory() might take more than a second. Therefore, allowing killed processes to quickly call mmput() from exit_mm() from do_exit() will help calling __mmput() (which can reclaim more memory than the OOM reaper can reclaim) quickly. At first Michal thought that fatal signal check is racy compared to tsk_is_oom_victim() check. But actually there is no such race, for by the moment mutex_unlock(&oom_lock) is called after returning from out_of_memory(), fatal_signal_pending() == F && tsk_is_oom_victim() == T can't happen if current thread is holding oom_lock inside mem_cgroup_out_of_memory(). On the other hand, fatal_signal_pending() == T && tsk_is_oom_victim() == F can happen, and bailing out upon that condition will save some process from needlessly being OOM-killed. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> --- mm/memcontrol.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index b860dd4f7..b0d3bf3 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1389,8 +1389,13 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask, }; bool ret; - mutex_lock(&oom_lock); - ret = out_of_memory(&oc); + if (mutex_lock_killable(&oom_lock)) + return true; + /* + * A few threads which were not waiting at mutex_lock_killable() can + * fail to bail out. Therefore, check again after holding oom_lock. + */ + ret = fatal_signal_pending(current) || out_of_memory(&oc); mutex_unlock(&oom_lock); return ret; } -- 1.8.3.1
WARNING: multiple messages have this Message-ID (diff)
From: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> To: Andrew Morton <akpm@linux-foundation.org>, Johannes Weiner <hannes@cmpxchg.org>, David Rientjes <rientjes@google.com> Cc: Michal Hocko <mhocko@kernel.org>, linux-mm@kvack.org, Kirill Tkhai <ktkhai@virtuozzo.com>, Linus Torvalds <torvalds@linux-foundation.org> Subject: Re: [PATCH] memcg: killed threads should not invoke memcg OOM killer Date: Wed, 9 Jan 2019 19:56:57 +0900 [thread overview] Message-ID: <935ae77c-9663-c3a4-c73a-fa69f9a3065f@i-love.sakura.ne.jp> (raw) Message-ID: <20190109105657.A9rqqP6me8yPS54e-PzCofrsSG3RBIW_PXTWe7aqQlw@z> (raw) In-Reply-To: <20190107133720.GH31793@dhcp22.suse.cz> On 2019/01/07 22:37, Michal Hocko wrote: > On Mon 07-01-19 22:07:43, Tetsuo Handa wrote: >> On 2019/01/07 20:41, Michal Hocko wrote: >>> On Sun 06-01-19 15:02:24, Tetsuo Handa wrote: >>>> Michal and Johannes, can we please stop this stupid behavior now? >>> >>> I have proposed a patch with a much more limited scope which is still >>> waiting for feedback. I haven't heard it wouldn't be working so far. >>> >> >> You mean >> >> mutex_lock_killable would take care of exiting task already. I would >> then still prefer to check for mark_oom_victim because that is not racy >> with the exit path clearing signals. I can update my patch to use >> _killable lock variant if we are really going with the memcg specific >> fix. >> >> ? No response for two months. > > I mean http://lkml.kernel.org/r/20181022071323.9550-1-mhocko@kernel.org > which has died in nit picking. I am not very interested to go back there > and spend a lot of time with it again. If you do not respect my opinion > as the maintainer of this code then find somebody else to push it > through. > OK. It turned out that Michal's comment is independent with this patch. We can apply both Michal's patch and my patch, and here is my patch. From 0fb58415770a83d6c40d471e1840f8bc4a35ca83 Mon Sep 17 00:00:00 2001 From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Date: Wed, 26 Dec 2018 19:13:35 +0900 Subject: [PATCH] memcg: killed threads should not invoke memcg OOM killer If $N > $M, a single process with $N threads in a memcg group can easily kill all $M processes in that memcg group, for mem_cgroup_out_of_memory() does not check if current thread needs to invoke the memcg OOM killer. T1@P1 |T2...$N@P1|P2...$M |OOM reaper ----------+----------+----------+---------- # all sleeping try_charge() mem_cgroup_out_of_memory() mutex_lock(oom_lock) try_charge() mem_cgroup_out_of_memory() mutex_lock(oom_lock) out_of_memory() select_bad_process() oom_kill_process(P1) wake_oom_reaper() oom_reap_task() # ignores P1 mutex_unlock(oom_lock) out_of_memory() select_bad_process(P2...$M) # all killed by T2...$N@P1 wake_oom_reaper() oom_reap_task() # ignores P2...$M mutex_unlock(oom_lock) We don't need to invoke the memcg OOM killer if current thread was killed when waiting for oom_lock, for mem_cgroup_oom_synchronize(true) and memory_max_write() can bail out upon SIGKILL, and try_charge() allows already killed/exiting threads to make forward progress. If memcg OOM events in different domains are pending, already OOM-killed threads needlessly wait for pending memcg OOM events in different domains. An out_of_memory() call is slow because it involves printk(). With slow serial consoles, out_of_memory() might take more than a second. Therefore, allowing killed processes to quickly call mmput() from exit_mm() from do_exit() will help calling __mmput() (which can reclaim more memory than the OOM reaper can reclaim) quickly. At first Michal thought that fatal signal check is racy compared to tsk_is_oom_victim() check. But actually there is no such race, for by the moment mutex_unlock(&oom_lock) is called after returning from out_of_memory(), fatal_signal_pending() == F && tsk_is_oom_victim() == T can't happen if current thread is holding oom_lock inside mem_cgroup_out_of_memory(). On the other hand, fatal_signal_pending() == T && tsk_is_oom_victim() == F can happen, and bailing out upon that condition will save some process from needlessly being OOM-killed. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> --- mm/memcontrol.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index b860dd4f7..b0d3bf3 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1389,8 +1389,13 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask, }; bool ret; - mutex_lock(&oom_lock); - ret = out_of_memory(&oc); + if (mutex_lock_killable(&oom_lock)) + return true; + /* + * A few threads which were not waiting at mutex_lock_killable() can + * fail to bail out. Therefore, check again after holding oom_lock. + */ + ret = fatal_signal_pending(current) || out_of_memory(&oc); mutex_unlock(&oom_lock); return ret; } -- 1.8.3.1
next prev parent reply other threads:[~2019-01-09 10:57 UTC|newest] Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top 2018-12-26 10:13 [PATCH] memcg: killed threads should not invoke memcg OOM killer Tetsuo Handa 2018-12-28 10:22 ` Kirill Tkhai 2018-12-28 11:00 ` Tetsuo Handa 2018-12-28 11:28 ` Kirill Tkhai 2019-01-06 6:02 ` Tetsuo Handa 2019-01-06 6:02 ` Tetsuo Handa 2019-01-07 11:41 ` Michal Hocko 2019-01-07 13:07 ` Tetsuo Handa 2019-01-07 13:37 ` Michal Hocko 2019-01-07 14:20 ` Tetsuo Handa 2019-01-09 10:56 ` Tetsuo Handa [this message] 2019-01-09 10:56 ` Tetsuo Handa 2019-01-15 10:17 ` [PATCH v2] " Tetsuo Handa 2019-01-15 11:55 ` Michal Hocko
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=935ae77c-9663-c3a4-c73a-fa69f9a3065f@i-love.sakura.ne.jp \ --to=penguin-kernel@i-love.sakura.ne.jp \ --cc=akpm@linux-foundation.org \ --cc=hannes@cmpxchg.org \ --cc=ktkhai@virtuozzo.com \ --cc=linux-mm@kvack.org \ --cc=mhocko@kernel.org \ --cc=rientjes@google.com \ --cc=torvalds@linux-foundation.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.