* [PATCH 0/2] oom, memcg: do not report racy no-eligible OOM
@ 2019-01-07 14:38 Michal Hocko
From: Michal Hocko @ 2019-01-07 14:38 UTC (permalink / raw)
To: linux-mm; +Cc: Tetsuo Handa, Johannes Weiner, Andrew Morton, LKML

Hi,
I have posted this as an RFC previously [1]. Tetsuo has pointed out some
issues with patch 1 which I have hopefully fixed. Other than that, this is
just a rebase on top of the Linus tree.

The original cover: this is a follow-up to [2], which was nacked mostly
because Tetsuo was able to find a simple workload that can trigger a race
where a no-eligible-task failure is reported without a good reason. I
believe patch 2 addresses that issue and we do not have to play dirty
games with throttling just because of the race. I still believe that the
patch proposed in [2] is a useful one, but that can be addressed later.

This series comprises two patches. The first one is something I meant to
do a long time ago but never found the time for. We need it here to
handle the CLONE_VM without CLONE_SIGHAND cases. The second patch closes
the race.

Feedback is appreciated, of course.

[1] http://lkml.kernel.org/r/20181022071323.9550-1-mhocko@kernel.org
[2] http://lkml.kernel.org/r/20181010151135.25766-1-mhocko@kernel.org

^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH 1/2] mm, oom: marks all killed tasks as oom victims
@ 2019-01-07 14:38 ` Michal Hocko
From: Michal Hocko @ 2019-01-07 14:38 UTC (permalink / raw)
To: linux-mm; +Cc: Tetsuo Handa, Johannes Weiner, Andrew Morton, LKML, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Historically we have called mark_oom_victim only on the main task selected
as the oom victim, because oom victims have access to memory reserves and
granting that access to all killed tasks could deplete memory reserves
very quickly and cause even larger problems.

Since only partial access to memory reserves is allowed these days, this
risk no longer exists, so all tasks killed along with the oom victim can
be marked as well.

The primary motivation for this is that process groups which do not share
signals would then behave more like standard thread groups wrt oom
handling (i.e. tsk_is_oom_victim will work the same way for them).
- Use find_lock_task_mm to stabilize mm as suggested by Tetsuo

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/oom_kill.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index f0e8cd9edb1a..0246c7a4e44e 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -892,6 +892,7 @@ static void __oom_kill_process(struct task_struct *victim)
 	 */
 	rcu_read_lock();
 	for_each_process(p) {
+		struct task_struct *t;
 		if (!process_shares_mm(p, mm))
 			continue;
 		if (same_thread_group(p, victim))
@@ -911,6 +912,11 @@ static void __oom_kill_process(struct task_struct *victim)
 		if (unlikely(p->flags & PF_KTHREAD))
 			continue;
 		do_send_sig_info(SIGKILL, SEND_SIG_PRIV, p, PIDTYPE_TGID);
+		t = find_lock_task_mm(p);
+		if (!t)
+			continue;
+		mark_oom_victim(t);
+		task_unlock(t);
 	}
 	rcu_read_unlock();
-- 
2.20.1
* Re: [PATCH 1/2] mm, oom: marks all killed tasks as oom victims
@ 2019-01-07 20:58 ` Tetsuo Handa
From: Tetsuo Handa @ 2019-01-07 20:58 UTC (permalink / raw)
To: Michal Hocko, linux-mm; +Cc: Johannes Weiner, Andrew Morton, LKML, Michal Hocko

On 2019/01/07 23:38, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
>
> Historically we have called mark_oom_victim only on the main task
> selected as the oom victim because oom victims have access to memory
> reserves and granting the access to all killed tasks could deplete
> memory reserves very quickly and cause even larger problems.
>
> Since only a partial access to memory reserves is allowed there is no
> longer this risk and so all tasks killed along with the oom victim
> can be considered as well.
>
> The primary motivation for that is that process groups which do not
> share signals would behave more like standard thread groups wrt oom
> handling (aka tsk_is_oom_victim will work the same way for them).
>
> [...]
>
> @@ -911,6 +912,11 @@ static void __oom_kill_process(struct task_struct *victim)
>  		if (unlikely(p->flags & PF_KTHREAD))
>  			continue;
>  		do_send_sig_info(SIGKILL, SEND_SIG_PRIV, p, PIDTYPE_TGID);
> +		t = find_lock_task_mm(p);
> +		if (!t)
> +			continue;
> +		mark_oom_victim(t);
> +		task_unlock(t);

Thank you for updating this patch. This patch is correct from the point of
view of avoiding the TIF_MEMDIE race. But if I recall correctly, the
reason we did not do this was to avoid depleting memory reserves. And we
still grant full access to memory reserves in the CONFIG_MMU=n case.
Shouldn't the changelog mention the CONFIG_MMU=n case?

>  	}
>  	rcu_read_unlock();
* Re: [PATCH 1/2] mm, oom: marks all killed tasks as oom victims
@ 2019-01-08  8:11 ` Michal Hocko
From: Michal Hocko @ 2019-01-08 8:11 UTC (permalink / raw)
To: Tetsuo Handa; +Cc: linux-mm, Johannes Weiner, Andrew Morton, LKML

On Tue 08-01-19 05:58:41, Tetsuo Handa wrote:
> On 2019/01/07 23:38, Michal Hocko wrote:
> > [...]
>
> Thank you for updating this patch. This patch is correct from the point
> of view of avoiding the TIF_MEMDIE race. But if I recall correctly, the
> reason we did not do this was to avoid depleting memory reserves. And we
> still grant full access to memory reserves in the CONFIG_MMU=n case.
> Shouldn't the changelog mention the CONFIG_MMU=n case?

Like so many times before: does nommu matter in this context at all? You
keep bringing it up without actually trying to understand that nommu is so
special that the reserves on those architectures are of very limited use.
I do not really see much point in mentioning nommu in every oom patch. Or
do you know of a nommu oom killer bug out there? I would be more than
curious. Seriously.
-- 
Michal Hocko
SUSE Labs
* [PATCH 2/2] memcg: do not report racy no-eligible OOM tasks
@ 2019-01-07 14:38 ` Michal Hocko
From: Michal Hocko @ 2019-01-07 14:38 UTC (permalink / raw)
To: linux-mm; +Cc: Tetsuo Handa, Johannes Weiner, Andrew Morton, LKML, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Tetsuo has reported [1] that a single process group memcg might easily
swamp the log with no-eligible oom victim reports due to a race between
the memcg charge and the oom_reaper:

Thread 1                  Thread 2                    oom_reaper
try_charge                try_charge
                          mem_cgroup_out_of_memory
                          mutex_lock(oom_lock)
mem_cgroup_out_of_memory
mutex_lock(oom_lock)
                          out_of_memory
                          select_bad_process
                          oom_kill_process(current)
                          wake_oom_reaper
                                                      oom_reap_task
                                                      MMF_OOM_SKIP -> victim
                          mutex_unlock(oom_lock)
out_of_memory
select_bad_process # no task

If Thread 1 did not race it would bail out from try_charge and force the
charge. We can achieve the same by checking tsk_is_oom_victim inside the
oom_lock and therefore close the race.
[1] http://lkml.kernel.org/r/bb2074c0-34fe-8c2c-1c7d-db71338f1e7f@i-love.sakura.ne.jp

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/memcontrol.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index af7f18b32389..90eb2e2093e7 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1387,10 +1387,22 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
 		.gfp_mask = gfp_mask,
 		.order = order,
 	};
-	bool ret;
+	bool ret = true;
 
 	mutex_lock(&oom_lock);
+
+	/*
+	 * multi-threaded tasks might race with oom_reaper and gain
+	 * MMF_OOM_SKIP before reaching out_of_memory which can lead
+	 * to out_of_memory failure if the task is the last one in
+	 * memcg which would be a false positive failure reported
+	 */
+	if (tsk_is_oom_victim(current))
+		goto unlock;
+
 	ret = out_of_memory(&oc);
+
+unlock:
 	mutex_unlock(&oom_lock);
 	return ret;
 }
-- 
2.20.1
* Re: [PATCH 2/2] memcg: do not report racy no-eligible OOM tasks
@ 2019-01-07 20:59 ` Tetsuo Handa
From: Tetsuo Handa @ 2019-01-07 20:59 UTC (permalink / raw)
To: Michal Hocko, linux-mm; +Cc: Johannes Weiner, Andrew Morton, LKML, Michal Hocko

On 2019/01/07 23:38, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
>
> Tetsuo has reported [1] that a single process group memcg might easily
> swamp the log with no-eligible oom victim reports due to a race between
> the memcg charge and the oom_reaper

This explanation is outdated. I reported that one memcg OOM killer can
kill all processes in that memcg. I expect the changelog to be updated.

> [...]
>
> -	bool ret;
> +	bool ret = true;
>
>  	mutex_lock(&oom_lock);

And because of "[PATCH 1/2] mm, oom: marks all killed tasks as oom
victims", mark_oom_victim() will be called on the current thread even if
we used mutex_lock_killable(&oom_lock) here, like you said:

  mutex_lock_killable would take care of exiting task already. I would
  then still prefer to check for mark_oom_victim because that is not racy
  with the exit path clearing signals. I can update my patch to use
  _killable lock variant if we are really going with the memcg specific
  fix.

If the current thread is not yet killed by the OOM killer but can
terminate without invoking the OOM killer, using
mutex_lock_killable(&oom_lock) here saves some processes. What is the race
you are referring to by "racy with the exit path clearing signals"?

> +
> +	/*
> +	 * multi-threaded tasks might race with oom_reaper and gain
> +	 * MMF_OOM_SKIP before reaching out_of_memory which can lead
> +	 * to out_of_memory failure if the task is the last one in
> +	 * memcg which would be a false positive failure reported
> +	 */

Not only an out_of_memory() failure. The current thread needlessly tries
to select the next OOM victim. An out_of_memory() failure is nothing but a
result of the no-eligible-candidate case.

> +	if (tsk_is_oom_victim(current))
> +		goto unlock;
> +
>  	ret = out_of_memory(&oc);
> +
> +unlock:
>  	mutex_unlock(&oom_lock);
>  	return ret;
>  }
* Re: [PATCH 2/2] memcg: do not report racy no-eligible OOM tasks
@ 2019-01-08  8:14 ` Michal Hocko
From: Michal Hocko @ 2019-01-08 8:14 UTC (permalink / raw)
To: Tetsuo Handa; +Cc: linux-mm, Johannes Weiner, Andrew Morton, LKML

On Tue 08-01-19 05:59:49, Tetsuo Handa wrote:
> On 2019/01/07 23:38, Michal Hocko wrote:
> > Tetsuo has reported [1] that a single process group memcg might easily
> > swamp the log with no-eligible oom victim reports due to a race between
> > the memcg charge and the oom_reaper
>
> This explanation is outdated. I reported that one memcg OOM killer can
> kill all processes in that memcg. I expect the changelog to be updated.

I am open to refinements. Any specific wording do you have in mind?

> [...]
>
> If the current thread is not yet killed by the OOM killer but can
> terminate without invoking the OOM killer, using
> mutex_lock_killable(&oom_lock) here saves some processes. What is the
> race you are referring to by "racy with the exit path clearing signals"?

This is unrelated to the patch.

> Not only an out_of_memory() failure. The current thread needlessly tries
> to select the next OOM victim. An out_of_memory() failure is nothing but
> a result of the no-eligible-candidate case.

So? Let me ask again: does this solve the issue you are seeing? I really
do not want to end up in an endless nit-picking thread again and would
like to move on.
-- 
Michal Hocko
SUSE Labs
* Re: [PATCH 2/2] memcg: do not report racy no-eligible OOM tasks
@ 2019-01-08 10:39 ` Tetsuo Handa
From: Tetsuo Handa @ 2019-01-08 10:39 UTC (permalink / raw)
To: Michal Hocko; +Cc: linux-mm, Johannes Weiner, Andrew Morton, LKML

On 2019/01/08 17:14, Michal Hocko wrote:
> > If the current thread is not yet killed by the OOM killer but can
> > terminate without invoking the OOM killer, using
> > mutex_lock_killable(&oom_lock) here saves some processes. What is the
> > race you are referring to by "racy with the exit path clearing
> > signals"?
>
> This is unrelated to the patch.

Ultimately related! This is the reasoning why your patch should be
preferred over my patch.

For example, if memcg OOM events in different domains are pending, already
OOM-killed threads needlessly wait for pending memcg OOM events in
different domains. An out_of_memory() call is slow because it involves
printk(). With slow serial consoles, out_of_memory() might take more than
a second.

I consider that allowing killed processes to call mmput() from exit_mm()
from do_exit() quickly (instead of waiting for pending memcg OOM events in
different domains at mem_cgroup_out_of_memory()) helps calling __mmput()
(which can reclaim more memory than the OOM reaper can reclaim) quickly.

Unless what you call "racy" is problematic, I don't see reasons not to
apply my patch. So, please answer what you are referring to with "racy".
* Re: [PATCH 2/2] memcg: do not report racy no-eligible OOM tasks
@ 2019-01-08 11:46 ` Michal Hocko
From: Michal Hocko @ 2019-01-08 11:46 UTC (permalink / raw)
To: Tetsuo Handa; +Cc: linux-mm, Johannes Weiner, Andrew Morton, LKML

On Tue 08-01-19 19:39:58, Tetsuo Handa wrote:
> On 2019/01/08 17:14, Michal Hocko wrote:
> > [...]
> >
> > This is unrelated to the patch.
>
> Ultimately related! This is the reasoning why your patch should be
> preferred over my patch.

No! I have said that I do not mind using mutex_lock_killable on top of
this patch. I just want to keep this fix minimal.
-- 
Michal Hocko
SUSE Labs
* Re: [PATCH 2/2] memcg: do not report racy no-eligible OOM tasks
@ 2019-01-08  8:35 ` kbuild test robot
From: kbuild test robot @ 2019-01-08 8:35 UTC (permalink / raw)
To: Michal Hocko
Cc: kbuild-all, linux-mm, Tetsuo Handa, Johannes Weiner, Andrew Morton, LKML, Michal Hocko

[-- Attachment #1: Type: text/plain, Size: 8106 bytes --]

Hi Michal,

I love your patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v5.0-rc1 next-20190108]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Michal-Hocko/oom-memcg-do-not-report-racy-no-eligible-OOM/20190108-092805
config: x86_64-allmodconfig (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64

All warnings (new ones prefixed by >>):

   include/linux/rcupdate.h:659:9: warning: context imbalance in 'find_lock_task_mm' - wrong count at exit
   include/linux/sched/mm.h:141:37: warning: dereference of noderef expression
   mm/oom_kill.c:225:28: warning: context imbalance in 'oom_badness' - unexpected unlock
   mm/oom_kill.c:406:9: warning: context imbalance in 'dump_tasks' - different lock contexts for basic block
>> mm/oom_kill.c:918:17: warning: context imbalance in '__oom_kill_process' - unexpected unlock

vim +/__oom_kill_process +918 mm/oom_kill.c

1af8bb43 Michal Hocko          2016-07-28  845  
5989ad7b Roman Gushchin        2018-08-21  846  static void __oom_kill_process(struct task_struct *victim)
^1da177e Linus Torvalds        2005-04-16  847  {
5989ad7b Roman Gushchin        2018-08-21  848  	struct task_struct *p;
647f2bdf David Rientjes        2012-03-21  849  	struct mm_struct *mm;
bb29902a Tetsuo Handa          2016-03-25  850  	bool can_oom_reap = true;
^1da177e Linus Torvalds        2005-04-16  851  
6b0c81b3 David Rientjes        2012-07-31  852  	p = find_lock_task_mm(victim);
6b0c81b3 David Rientjes        2012-07-31  853  	if (!p) {
6b0c81b3 David Rientjes        2012-07-31  854  		put_task_struct(victim);
647f2bdf David Rientjes        2012-03-21  855  		return;
6b0c81b3 David Rientjes        2012-07-31  856  	} else if (victim != p) {
6b0c81b3 David Rientjes        2012-07-31  857  		get_task_struct(p);
6b0c81b3 David Rientjes        2012-07-31  858  		put_task_struct(victim);
6b0c81b3 David Rientjes        2012-07-31  859  		victim = p;
6b0c81b3 David Rientjes        2012-07-31  860  	}
647f2bdf David Rientjes        2012-03-21  861  
880b7689 Tetsuo Handa          2015-11-05  862  	/* Get a reference to safely compare mm after task_unlock(victim) */
647f2bdf David Rientjes        2012-03-21  863  	mm = victim->mm;
f1f10076 Vegard Nossum         2017-02-27  864  	mmgrab(mm);
8e675f7a Konstantin Khlebnikov 2017-07-06  865  
8e675f7a Konstantin Khlebnikov 2017-07-06  866  	/* Raise event before sending signal: task reaper must see this */
8e675f7a Konstantin Khlebnikov 2017-07-06  867  	count_vm_event(OOM_KILL);
fe6bdfc8 Roman Gushchin        2018-06-14  868  	memcg_memory_event_mm(mm, MEMCG_OOM_KILL);
8e675f7a Konstantin Khlebnikov 2017-07-06  869  
426fb5e7 Tetsuo Handa          2015-11-05  870  	/*
cd04ae1e Michal Hocko          2017-09-06  871  	 * We should send SIGKILL before granting access to memory reserves
cd04ae1e Michal Hocko          2017-09-06  872  	 * in order to prevent the OOM victim from depleting the memory
cd04ae1e Michal Hocko          2017-09-06  873  	 * reserves from the user space under its control.
426fb5e7 Tetsuo Handa          2015-11-05  874  	 */
079b22dc Eric W. Biederman     2018-09-03  875  	do_send_sig_info(SIGKILL, SEND_SIG_PRIV, victim, PIDTYPE_TGID);
16e95196 Johannes Weiner       2015-06-24  876  	mark_oom_victim(victim);
eca56ff9 Jerome Marchand       2016-01-14  877  	pr_err("Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB, shmem-rss:%lukB\n",
647f2bdf David Rientjes        2012-03-21  878  		task_pid_nr(victim), victim->comm, K(victim->mm->total_vm),
647f2bdf David Rientjes        2012-03-21  879  		K(get_mm_counter(victim->mm, MM_ANONPAGES)),
eca56ff9 Jerome Marchand       2016-01-14  880  		K(get_mm_counter(victim->mm, MM_FILEPAGES)),
eca56ff9 Jerome Marchand       2016-01-14  881  		K(get_mm_counter(victim->mm, MM_SHMEMPAGES)));
647f2bdf David Rientjes        2012-03-21  882  	task_unlock(victim);
647f2bdf David Rientjes        2012-03-21  883  
647f2bdf David Rientjes        2012-03-21  884  	/*
647f2bdf David Rientjes        2012-03-21  885  	 * Kill all user processes sharing victim->mm in other thread groups, if
647f2bdf David Rientjes        2012-03-21  886  	 * any. They don't get access to memory reserves, though, to avoid
647f2bdf David Rientjes        2012-03-21  887  	 * depletion of all memory. This prevents mm->mmap_sem livelock when an
647f2bdf David Rientjes        2012-03-21  888  	 * oom killed thread cannot exit because it requires the semaphore and
647f2bdf David Rientjes        2012-03-21  889  	 * its contended by another thread trying to allocate memory itself.
647f2bdf David Rientjes        2012-03-21  890  	 * That thread will now get access to memory reserves since it has a
647f2bdf David Rientjes        2012-03-21  891  	 * pending fatal signal.
647f2bdf David Rientjes        2012-03-21  892  	 */
4d4048be Oleg Nesterov         2014-01-21  893  	rcu_read_lock();
c319025a Oleg Nesterov         2015-11-05  894  	for_each_process(p) {
00508538 Michal Hocko          2019-01-07  895  		struct task_struct *t;
4d7b3394 Oleg Nesterov         2015-11-05  896  		if (!process_shares_mm(p, mm))
c319025a Oleg Nesterov         2015-11-05  897  			continue;
c319025a Oleg Nesterov         2015-11-05  898  		if (same_thread_group(p, victim))
c319025a Oleg Nesterov         2015-11-05  899  			continue;
1b51e65e Michal Hocko          2016-10-07  900  		if (is_global_init(p)) {
aac45363 Michal Hocko          2016-03-25  901  			can_oom_reap = false;
862e3073 Michal Hocko          2016-10-07  902  			set_bit(MMF_OOM_SKIP, &mm->flags);
a373966d Michal Hocko          2016-07-28  903  			pr_info("oom killer %d (%s) has mm pinned by %d (%s)\n",
a373966d Michal Hocko          2016-07-28  904  				task_pid_nr(victim), victim->comm,
a373966d Michal Hocko          2016-07-28  905  				task_pid_nr(p), p->comm);
647f2bdf David Rientjes        2012-03-21  906  			continue;
aac45363 Michal Hocko          2016-03-25  907  		}
1b51e65e Michal Hocko          2016-10-07  908  		/*
1b51e65e Michal Hocko          2016-10-07  909  		 * No use_mm() user needs to read from the userspace so we are
1b51e65e Michal Hocko          2016-10-07  910  		 * ok to reap it.
1b51e65e Michal Hocko          2016-10-07  911  		 */
1b51e65e Michal Hocko          2016-10-07  912  		if (unlikely(p->flags & PF_KTHREAD))
1b51e65e Michal Hocko          2016-10-07  913  			continue;
079b22dc Eric W. Biederman     2018-09-03  914  		do_send_sig_info(SIGKILL, SEND_SIG_PRIV, p, PIDTYPE_TGID);
00508538 Michal Hocko          2019-01-07  915  		t = find_lock_task_mm(p);
00508538 Michal Hocko          2019-01-07  916  		if (!t)
00508538 Michal Hocko          2019-01-07  917  			continue;
00508538 Michal Hocko          2019-01-07 @918  		mark_oom_victim(t);
00508538 Michal Hocko          2019-01-07  919  		task_unlock(t);
647f2bdf David Rientjes        2012-03-21  920  	}
6b0c81b3 David Rientjes        2012-07-31  921  	rcu_read_unlock();
647f2bdf David Rientjes        2012-03-21  922  
aac45363 Michal Hocko          2016-03-25  923  	if (can_oom_reap)
36324a99 Michal Hocko          2016-03-25  924  		wake_oom_reaper(victim);
aac45363 Michal Hocko          2016-03-25  925  
880b7689 Tetsuo Handa          2015-11-05  926  	mmdrop(mm);
6b0c81b3 David Rientjes        2012-07-31  927  	put_task_struct(victim);
^1da177e Linus Torvalds        2005-04-16  928  }
647f2bdf David Rientjes        2012-03-21  929  #undef K
^1da177e Linus Torvalds        2005-04-16  930  

:::::: The code at line 918 was first introduced by commit
:::::: 00508538cb045f28a2f60e5d2dff98b77b0da725 mm, oom: marks all killed tasks as oom victims

:::::: TO: Michal Hocko <mhocko@suse.com>
:::::: CC: 0day robot <lkp@intel.com>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 67238 bytes --]
* Re: [PATCH 2/2] memcg: do not report racy no-eligible OOM tasks
@ 2019-01-08  9:39 ` Michal Hocko
From: Michal Hocko @ 2019-01-08 9:39 UTC (permalink / raw)
To: kbuild test robot
Cc: kbuild-all, linux-mm, Tetsuo Handa, Johannes Weiner, Andrew Morton, LKML

On Tue 08-01-19 16:35:42, kbuild test robot wrote:
[...]
> All warnings (new ones prefixed by >>):
>
>    include/linux/rcupdate.h:659:9: warning: context imbalance in 'find_lock_task_mm' - wrong count at exit
>    include/linux/sched/mm.h:141:37: warning: dereference of noderef expression
>    mm/oom_kill.c:225:28: warning: context imbalance in 'oom_badness' - unexpected unlock
>    mm/oom_kill.c:406:9: warning: context imbalance in 'dump_tasks' - different lock contexts for basic block
> >> mm/oom_kill.c:918:17: warning: context imbalance in '__oom_kill_process' - unexpected unlock

What exactly does this warning say? I do not see anything wrong with the
code. find_lock_task_mm returns a locked task when t != NULL and
mark_oom_victim doesn't do anything about the locking. Am I missing
something, or is the warning just confused?

[...]
> 00508538 Michal Hocko 2019-01-07  915  		t = find_lock_task_mm(p);
> 00508538 Michal Hocko 2019-01-07  916  		if (!t)
> 00508538 Michal Hocko 2019-01-07  917  			continue;
> 00508538 Michal Hocko 2019-01-07 @918  		mark_oom_victim(t);
> 00508538 Michal Hocko 2019-01-07  919  		task_unlock(t);
> 647f2bdf David Rientjes 2012-03-21  920  	}
-- 
Michal Hocko
SUSE Labs
* Re: [kbuild-all] [PATCH 2/2] memcg: do not report racy no-eligible OOM tasks 2019-01-08 9:39 ` Michal Hocko @ 2019-01-11 0:23 ` Rong Chen 0 siblings, 0 replies; 28+ messages in thread From: Rong Chen @ 2019-01-11 0:23 UTC (permalink / raw) To: Michal Hocko, kbuild test robot Cc: Tetsuo Handa, LKML, linux-mm, kbuild-all, Johannes Weiner, Andrew Morton On 01/08/2019 05:39 PM, Michal Hocko wrote: > On Tue 08-01-19 16:35:42, kbuild test robot wrote: > [...] >> All warnings (new ones prefixed by >>): >> >> include/linux/rcupdate.h:659:9: warning: context imbalance in 'find_lock_task_mm' - wrong count at exit >> include/linux/sched/mm.h:141:37: warning: dereference of noderef expression >> mm/oom_kill.c:225:28: warning: context imbalance in 'oom_badness' - unexpected unlock >> mm/oom_kill.c:406:9: warning: context imbalance in 'dump_tasks' - different lock contexts for basic block >>>> mm/oom_kill.c:918:17: warning: context imbalance in '__oom_kill_process' - unexpected unlock > What exactly does this warning say? I do not see anything wrong about > the code. find_lock_task_mm returns a locked task when t != NULL and > mark_oom_victim doesn't do anything about the locking. Am I missing > something or the warning is just confused? Thanks for your reply. It looks like a false positive. We'll look into it. Best Regards, Rong Chen > > [...] >> 00508538 Michal Hocko 2019-01-07 915 t = find_lock_task_mm(p); >> 00508538 Michal Hocko 2019-01-07 916 if (!t) >> 00508538 Michal Hocko 2019-01-07 917 continue; >> 00508538 Michal Hocko 2019-01-07 @918 mark_oom_victim(t); >> 00508538 Michal Hocko 2019-01-07 919 task_unlock(t); >> 647f2bdf David Rientjes 2012-03-21 920 } ^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH 3/2] memcg: Facilitate termination of memcg OOM victims.
  2019-01-07 14:38 [PATCH 0/2] oom, memcg: do not report racy no-eligible OOM Michal Hocko
  2019-01-07 14:38 ` [PATCH 1/2] mm, oom: marks all killed tasks as oom victims Michal Hocko
  2019-01-07 14:38 ` [PATCH 2/2] memcg: do not report racy no-eligible OOM tasks Michal Hocko
@ 2019-01-08 14:21 ` Tetsuo Handa
  2019-01-08 14:38   ` Michal Hocko
  2019-01-09 11:03 ` [PATCH 0/2] oom, memcg: do not report racy no-eligible OOM Michal Hocko
  3 siblings, 1 reply; 28+ messages in thread
From: Tetsuo Handa @ 2019-01-08 14:21 UTC (permalink / raw)
  To: Michal Hocko, linux-mm; +Cc: Johannes Weiner, Andrew Morton, LKML

From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

If memcg OOM events in different domains are pending, already OOM-killed
threads needlessly wait for pending memcg OOM events in different domains.
An out_of_memory() call is slow because it involves printk(). With slow
serial consoles, out_of_memory() might take more than a second. Therefore,
allowing killed processes to quickly call mmput() from exit_mm() from
do_exit() will help calling __mmput() (which can reclaim more memory than
the OOM reaper can reclaim) quickly.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 mm/memcontrol.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 90eb2e2..a7d3ba9 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1389,14 +1389,19 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	};
 	bool ret = true;
 
-	mutex_lock(&oom_lock);
-
 	/*
-	 * multi-threaded tasks might race with oom_reaper and gain
-	 * MMF_OOM_SKIP before reaching out_of_memory which can lead
-	 * to out_of_memory failure if the task is the last one in
-	 * memcg which would be a false possitive failure reported
+	 * Multi-threaded tasks might race with oom_reaper() and gain
+	 * MMF_OOM_SKIP before reaching out_of_memory(). But if current
+	 * thread was already killed or is ready to terminate, there is
+	 * no need to call out_of_memory() nor wait for oom_reaper() to
+	 * set MMF_OOM_SKIP. These three checks minimize possibility of
+	 * needlessly calling out_of_memory() and try to call exit_mm()
+	 * as soon as possible.
 	 */
+	if (mutex_lock_killable(&oom_lock))
+		return true;
+	if (fatal_signal_pending(current))
+		goto unlock;
 	if (tsk_is_oom_victim(current))
 		goto unlock;
-- 
1.8.3.1
* Re: [PATCH 3/2] memcg: Facilitate termination of memcg OOM victims.
  2019-01-08 14:21 ` [PATCH 3/2] memcg: Facilitate termination of memcg OOM victims Tetsuo Handa
@ 2019-01-08 14:38   ` Michal Hocko
  0 siblings, 0 replies; 28+ messages in thread
From: Michal Hocko @ 2019-01-08 14:38 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: linux-mm, Johannes Weiner, Andrew Morton, LKML

On Tue 08-01-19 23:21:23, Tetsuo Handa wrote:
> From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> 
> If memcg OOM events in different domains are pending, already OOM-killed
> threads needlessly wait for pending memcg OOM events in different domains.
> An out_of_memory() call is slow because it involves printk(). With slow
> serial consoles, out_of_memory() might take more than a second. Therefore,
> allowing killed processes to quickly call mmput() from exit_mm() from
> do_exit() will help calling __mmput() (which can reclaim more memory than
> the OOM reaper can reclaim) quickly.

Can you post it separately out of this thread please? It is really a
separate topic and I do not want to end up with back and forth without
making further progress.

> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> ---
>  mm/memcontrol.c | 17 +++++++++++------
>  1 file changed, 11 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 90eb2e2..a7d3ba9 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1389,14 +1389,19 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
>  	};
>  	bool ret = true;
>  
> -	mutex_lock(&oom_lock);
> -
>  	/*
> -	 * multi-threaded tasks might race with oom_reaper and gain
> -	 * MMF_OOM_SKIP before reaching out_of_memory which can lead
> -	 * to out_of_memory failure if the task is the last one in
> -	 * memcg which would be a false possitive failure reported
> +	 * Multi-threaded tasks might race with oom_reaper() and gain
> +	 * MMF_OOM_SKIP before reaching out_of_memory(). But if current
> +	 * thread was already killed or is ready to terminate, there is
> +	 * no need to call out_of_memory() nor wait for oom_reaper() to
> +	 * set MMF_OOM_SKIP. These three checks minimize possibility of
> +	 * needlessly calling out_of_memory() and try to call exit_mm()
> +	 * as soon as possible.
>  	 */
> +	if (mutex_lock_killable(&oom_lock))
> +		return true;
> +	if (fatal_signal_pending(current))
> +		goto unlock;
>  	if (tsk_is_oom_victim(current))
>  		goto unlock;
> -- 
> 1.8.3.1
-- 
Michal Hocko
SUSE Labs
* Re: [PATCH 0/2] oom, memcg: do not report racy no-eligible OOM 2019-01-07 14:38 [PATCH 0/2] oom, memcg: do not report racy no-eligible OOM Michal Hocko ` (2 preceding siblings ...) 2019-01-08 14:21 ` [PATCH 3/2] memcg: Facilitate termination of memcg OOM victims Tetsuo Handa @ 2019-01-09 11:03 ` Michal Hocko 2019-01-09 11:34 ` Tetsuo Handa 3 siblings, 1 reply; 28+ messages in thread From: Michal Hocko @ 2019-01-09 11:03 UTC (permalink / raw) To: linux-mm; +Cc: Tetsuo Handa, Johannes Weiner, Andrew Morton, LKML Tetsuo, can you confirm that these two patches are fixing the issue you have reported please? -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 0/2] oom, memcg: do not report racy no-eligible OOM
  2019-01-09 11:03 ` [PATCH 0/2] oom, memcg: do not report racy no-eligible OOM Michal Hocko
@ 2019-01-09 11:34   ` Tetsuo Handa
  2019-01-09 12:02     ` Michal Hocko
  0 siblings, 1 reply; 28+ messages in thread
From: Tetsuo Handa @ 2019-01-09 11:34 UTC (permalink / raw)
  To: Michal Hocko, linux-mm; +Cc: Johannes Weiner, Andrew Morton, LKML

On 2019/01/09 20:03, Michal Hocko wrote:
> Tetsuo,
> can you confirm that these two patches are fixing the issue you have
> reported please?

My patch fixes the issue better than your "[PATCH 2/2] memcg: do not
report racy no-eligible OOM tasks" does. You can post "[PATCH 1/2] mm,
oom: marks all killed tasks as oom victims" based on a report that we
needlessly select more OOM victims because MMF_OOM_SKIP is quickly set
by the OOM reaper. In fact, updating oom_reap_task() / exit_mmap() to
use

	mutex_lock(&oom_lock);
	set_bit(MMF_OOM_SKIP, &mm->flags);
	mutex_unlock(&oom_lock);

will mostly close the race as well.
* Re: [PATCH 0/2] oom, memcg: do not report racy no-eligible OOM 2019-01-09 11:34 ` Tetsuo Handa @ 2019-01-09 12:02 ` Michal Hocko 2019-01-10 23:59 ` Tetsuo Handa 0 siblings, 1 reply; 28+ messages in thread From: Michal Hocko @ 2019-01-09 12:02 UTC (permalink / raw) To: Tetsuo Handa; +Cc: linux-mm, Johannes Weiner, Andrew Morton, LKML On Wed 09-01-19 20:34:46, Tetsuo Handa wrote: > On 2019/01/09 20:03, Michal Hocko wrote: > > Tetsuo, > > can you confirm that these two patches are fixing the issue you have > > reported please? > > > > My patch fixes the issue better than your "[PATCH 2/2] memcg: do not > report racy no-eligible OOM tasks" does. OK, so we are stuck again. Hooray! -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 0/2] oom, memcg: do not report racy no-eligible OOM 2019-01-09 12:02 ` Michal Hocko @ 2019-01-10 23:59 ` Tetsuo Handa 2019-01-11 10:25 ` Tetsuo Handa 0 siblings, 1 reply; 28+ messages in thread From: Tetsuo Handa @ 2019-01-10 23:59 UTC (permalink / raw) To: Andrew Morton; +Cc: Michal Hocko, linux-mm, Johannes Weiner, LKML Michal Hocko wrote: > On Wed 09-01-19 20:34:46, Tetsuo Handa wrote: > > On 2019/01/09 20:03, Michal Hocko wrote: > > > Tetsuo, > > > can you confirm that these two patches are fixing the issue you have > > > reported please? > > > > > > > My patch fixes the issue better than your "[PATCH 2/2] memcg: do not > > report racy no-eligible OOM tasks" does. > > OK, so we are stuck again. Hooray! Andrew, will you pick up "[PATCH 3/2] memcg: Facilitate termination of memcg OOM victims." ? Since mm-oom-marks-all-killed-tasks-as-oom-victims.patch does not call mark_oom_victim() when task_will_free_mem() == true, memcg-do-not-report-racy-no-eligible-oom-tasks.patch does not close the race whereas my patch closes the race better. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 0/2] oom, memcg: do not report racy no-eligible OOM
  2019-01-10 23:59 ` Tetsuo Handa
@ 2019-01-11 10:25   ` Tetsuo Handa
  2019-01-11 11:33     ` Michal Hocko
  0 siblings, 1 reply; 28+ messages in thread
From: Tetsuo Handa @ 2019-01-11 10:25 UTC (permalink / raw)
  To: Andrew Morton, Michal Hocko; +Cc: linux-mm, Johannes Weiner, LKML

On 2019/01/11 8:59, Tetsuo Handa wrote:
> Michal Hocko wrote:
>> On Wed 09-01-19 20:34:46, Tetsuo Handa wrote:
>>> On 2019/01/09 20:03, Michal Hocko wrote:
>>>> Tetsuo,
>>>> can you confirm that these two patches are fixing the issue you have
>>>> reported please?
>>>
>>> My patch fixes the issue better than your "[PATCH 2/2] memcg: do not
>>> report racy no-eligible OOM tasks" does.
>>
>> OK, so we are stuck again. Hooray!
> 
> Andrew, will you pick up "[PATCH 3/2] memcg: Facilitate termination of memcg OOM victims." ?
> Since mm-oom-marks-all-killed-tasks-as-oom-victims.patch does not call mark_oom_victim()
> when task_will_free_mem() == true, memcg-do-not-report-racy-no-eligible-oom-tasks.patch
> does not close the race whereas my patch closes the race better.

I confirmed that mm-oom-marks-all-killed-tasks-as-oom-victims.patch and
memcg-do-not-report-racy-no-eligible-oom-tasks.patch are completely failing
to fix the issue I am reporting. :-(

Reproducer:
----------
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <sched.h>
#include <sys/mman.h>

#define NUMTHREADS 256
#define MMAPSIZE 4 * 10485760
#define STACKSIZE 4096

static int pipe_fd[2] = { EOF, EOF };

static int memory_eater(void *unused)
{
	int fd = open("/dev/zero", O_RDONLY);
	char *buf = mmap(NULL, MMAPSIZE, PROT_WRITE | PROT_READ,
			 MAP_ANONYMOUS | MAP_SHARED, EOF, 0);
	read(pipe_fd[0], buf, 1);
	read(fd, buf, MMAPSIZE);
	pause();
	return 0;
}

int main(int argc, char *argv[])
{
	int i;
	char *stack;
	FILE *fp;
	const unsigned long size = 1048576UL * 200;
	mkdir("/sys/fs/cgroup/memory/test1", 0755);
	fp = fopen("/sys/fs/cgroup/memory/test1/memory.limit_in_bytes", "w");
	fprintf(fp, "%lu\n", size);
	fclose(fp);
	fp = fopen("/sys/fs/cgroup/memory/test1/tasks", "w");
	fprintf(fp, "%u\n", getpid());
	fclose(fp);
	if (setgid(-2) || setuid(-2) || pipe(pipe_fd))
		return 1;
	stack = mmap(NULL, STACKSIZE * NUMTHREADS, PROT_WRITE | PROT_READ,
		     MAP_ANONYMOUS | MAP_SHARED, EOF, 0);
	for (i = 0; i < NUMTHREADS; i++)
		if (clone(memory_eater, stack + (i + 1) * STACKSIZE,
			  CLONE_VM | CLONE_FS | CLONE_FILES, NULL) == -1)
			break;
	close(pipe_fd[1]);
	pause(); // Manually enter Ctrl-C immediately after dump_header() started.
	return 0;
}
----------

Complete log is at http://I-love.SAKURA.ne.jp/tmp/serial-20190111.txt.xz :
----------
[   71.146532][ T9694] a.out invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
[   71.151647][ T9694] CPU: 1 PID: 9694 Comm: a.out Kdump: loaded Not tainted 5.0.0-rc1-next-20190111 #272
(...snipped...)
[   71.304689][ T9694] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/test1,task_memcg=/test1,task=a.out,pid=9692,uid=-2
[   71.304703][ T9694] Memory cgroup out of memory: Kill process 9692 (a.out) score 904 or sacrifice child
[   71.309149][   T54] oom_reaper: reaped process 9750 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:185532kB
[   71.328523][ T9748] a.out invoked oom-killer: gfp_mask=0x7080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=0, oom_score_adj=0
[   71.328552][ T9748] CPU: 4 PID: 9748 Comm: a.out Kdump: loaded Not tainted 5.0.0-rc1-next-20190111 #272
(...snipped...)
[   71.328785][ T9748] Out of memory and no killable processes...
[   71.329194][ T9771] a.out invoked oom-killer: gfp_mask=0x7080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=0, oom_score_adj=0
(...snipped...)
[   99.696592][ T9924] Out of memory and no killable processes...
[   99.699001][ T9838] a.out invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
(...snipped...)
[   99.833413][ T9838] Out of memory and no killable processes...
----------

$ grep -F 'Out of memory and no killable processes...' serial-20190111.txt | wc -l
213
* Re: [PATCH 0/2] oom, memcg: do not report racy no-eligible OOM 2019-01-11 10:25 ` Tetsuo Handa @ 2019-01-11 11:33 ` Michal Hocko 2019-01-11 12:40 ` Tetsuo Handa 0 siblings, 1 reply; 28+ messages in thread From: Michal Hocko @ 2019-01-11 11:33 UTC (permalink / raw) To: Tetsuo Handa; +Cc: Andrew Morton, linux-mm, Johannes Weiner, LKML On Fri 11-01-19 19:25:22, Tetsuo Handa wrote: > On 2019/01/11 8:59, Tetsuo Handa wrote: > > Michal Hocko wrote: > >> On Wed 09-01-19 20:34:46, Tetsuo Handa wrote: > >>> On 2019/01/09 20:03, Michal Hocko wrote: > >>>> Tetsuo, > >>>> can you confirm that these two patches are fixing the issue you have > >>>> reported please? > >>>> > >>> > >>> My patch fixes the issue better than your "[PATCH 2/2] memcg: do not > >>> report racy no-eligible OOM tasks" does. > >> > >> OK, so we are stuck again. Hooray! > > > > Andrew, will you pick up "[PATCH 3/2] memcg: Facilitate termination of memcg OOM victims." ? > > Since mm-oom-marks-all-killed-tasks-as-oom-victims.patch does not call mark_oom_victim() > > when task_will_free_mem() == true, memcg-do-not-report-racy-no-eligible-oom-tasks.patch > > does not close the race whereas my patch closes the race better. > > > > I confirmed that mm-oom-marks-all-killed-tasks-as-oom-victims.patch and > memcg-do-not-report-racy-no-eligible-oom-tasks.patch are completely failing > to fix the issue I am reporting. :-( OK, this is really interesting. This means that we are racing when marking all the tasks sharing the mm with the clone syscall. Does fatal_signal_pending handle this better? -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 0/2] oom, memcg: do not report racy no-eligible OOM 2019-01-11 11:33 ` Michal Hocko @ 2019-01-11 12:40 ` Tetsuo Handa 2019-01-11 13:34 ` Michal Hocko 0 siblings, 1 reply; 28+ messages in thread From: Tetsuo Handa @ 2019-01-11 12:40 UTC (permalink / raw) To: Michal Hocko; +Cc: Andrew Morton, linux-mm, Johannes Weiner, LKML On 2019/01/11 20:33, Michal Hocko wrote: > On Fri 11-01-19 19:25:22, Tetsuo Handa wrote: >> On 2019/01/11 8:59, Tetsuo Handa wrote: >>> Michal Hocko wrote: >>>> On Wed 09-01-19 20:34:46, Tetsuo Handa wrote: >>>>> On 2019/01/09 20:03, Michal Hocko wrote: >>>>>> Tetsuo, >>>>>> can you confirm that these two patches are fixing the issue you have >>>>>> reported please? >>>>>> >>>>> >>>>> My patch fixes the issue better than your "[PATCH 2/2] memcg: do not >>>>> report racy no-eligible OOM tasks" does. >>>> >>>> OK, so we are stuck again. Hooray! >>> >>> Andrew, will you pick up "[PATCH 3/2] memcg: Facilitate termination of memcg OOM victims." ? >>> Since mm-oom-marks-all-killed-tasks-as-oom-victims.patch does not call mark_oom_victim() >>> when task_will_free_mem() == true, memcg-do-not-report-racy-no-eligible-oom-tasks.patch >>> does not close the race whereas my patch closes the race better. >>> >> >> I confirmed that mm-oom-marks-all-killed-tasks-as-oom-victims.patch and >> memcg-do-not-report-racy-no-eligible-oom-tasks.patch are completely failing >> to fix the issue I am reporting. :-( > > OK, this is really interesting. This means that we are racing > when marking all the tasks sharing the mm with the clone syscall. Nothing interesting. This is NOT a race between clone() and the OOM killer. :-( By the moment the OOM killer is invoked, all clone() requests are already completed. 
Did you notice that there is no

  "Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB, shmem-rss:%lukB\n"

line between

  [   71.304703][ T9694] Memory cgroup out of memory: Kill process 9692 (a.out) score 904 or sacrifice child

and

  [   71.309149][   T54] oom_reaper: reaped process 9750 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:185532kB

? Then, you will find that [ T9694] failed to reach the for_each_process(p)
loop inside __oom_kill_process() in the first round of the out_of_memory()
call, because find_lock_task_mm() == NULL at __oom_kill_process(): Ctrl-C made
that victim complete exit_mm() before find_lock_task_mm() was called.

Then, in the second round of the out_of_memory() call, [ T9750] (which is
fatal_signal_pending() == T && tsk_is_oom_victim() == F) hit the
task_will_free_mem(current) path, called mark_oom_victim() and woke up the
OOM reaper. Then, before the third round of the out_of_memory() call started,
the OOM reaper set MMF_OOM_SKIP.

When the third round of the out_of_memory() call started, [ T9748] could not
hit the task_will_free_mem(current) path because MMF_OOM_SKIP was already set,
and oom_badness() ignored any mm which already has MMF_OOM_SKIP. As a result,
[ T9748] failed to find a candidate. And this step repeats for up to the
number of threads (213 times for this run).

> Does fatal_signal_pending handle this better?

Of course. My patch handles it perfectly. Even if we raced with clone()
requests, why do we need to care about threads doing clone() requests? Such
threads are not inside try_charge(), and therefore such threads can't
contribute to this issue by calling out_of_memory() from try_charge().
* Re: [PATCH 0/2] oom, memcg: do not report racy no-eligible OOM 2019-01-11 12:40 ` Tetsuo Handa @ 2019-01-11 13:34 ` Michal Hocko 2019-01-11 14:31 ` Tetsuo Handa 0 siblings, 1 reply; 28+ messages in thread From: Michal Hocko @ 2019-01-11 13:34 UTC (permalink / raw) To: Tetsuo Handa; +Cc: Andrew Morton, linux-mm, Johannes Weiner, LKML On Fri 11-01-19 21:40:52, Tetsuo Handa wrote: [...] > Did you notice that there is no > > "Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB, shmem-rss:%lukB\n" > > line between > > [ 71.304703][ T9694] Memory cgroup out of memory: Kill process 9692 (a.out) score 904 or sacrifice child > > and > > [ 71.309149][ T54] oom_reaper: reaped process 9750 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:185532kB > > ? Then, you will find that [ T9694] failed to reach for_each_process(p) loop inside > __oom_kill_process() in the first round of out_of_memory() call because > find_lock_task_mm() == NULL at __oom_kill_process() because Ctrl-C made that victim > complete exit_mm() before find_lock_task_mm() is called. OK, so we haven't killed anything because the victim has exited by the time we wanted to do so. We still have other tasks sharing that mm pending and not killed because nothing has killed them yet, right? How come the oom reaper could act on this oom event at all then? What am I missing? -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 0/2] oom, memcg: do not report racy no-eligible OOM 2019-01-11 13:34 ` Michal Hocko @ 2019-01-11 14:31 ` Tetsuo Handa 2019-01-11 15:07 ` Michal Hocko 0 siblings, 1 reply; 28+ messages in thread From: Tetsuo Handa @ 2019-01-11 14:31 UTC (permalink / raw) To: Michal Hocko; +Cc: Andrew Morton, linux-mm, Johannes Weiner, LKML On 2019/01/11 22:34, Michal Hocko wrote: > On Fri 11-01-19 21:40:52, Tetsuo Handa wrote: > [...] >> Did you notice that there is no >> >> "Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB, shmem-rss:%lukB\n" >> >> line between >> >> [ 71.304703][ T9694] Memory cgroup out of memory: Kill process 9692 (a.out) score 904 or sacrifice child >> >> and >> >> [ 71.309149][ T54] oom_reaper: reaped process 9750 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:185532kB >> >> ? Then, you will find that [ T9694] failed to reach for_each_process(p) loop inside >> __oom_kill_process() in the first round of out_of_memory() call because >> find_lock_task_mm() == NULL at __oom_kill_process() because Ctrl-C made that victim >> complete exit_mm() before find_lock_task_mm() is called. > > OK, so we haven't killed anything because the victim has exited by the > time we wanted to do so. We still have other tasks sharing that mm > pending and not killed because nothing has killed them yet, right? The OOM killer invoked by [ T9694] called printk() but didn't kill anything. Instead, SIGINT from Ctrl-C killed all thread groups sharing current->mm. > > How come the oom reaper could act on this oom event at all then? > > What am I missing? > The OOM killer invoked by [ T9750] did not call printk() but hit task_will_free_mem(current) in out_of_memory() and invoked the OOM reaper, without calling mark_oom_victim() on all thread groups sharing current->mm. Did you notice that I wrote that Since mm-oom-marks-all-killed-tasks-as-oom-victims.patch does not call mark_oom_victim() when task_will_free_mem() == true, ? 
:-(
* Re: [PATCH 0/2] oom, memcg: do not report racy no-eligible OOM 2019-01-11 14:31 ` Tetsuo Handa @ 2019-01-11 15:07 ` Michal Hocko 2019-01-11 15:37 ` Tetsuo Handa 0 siblings, 1 reply; 28+ messages in thread From: Michal Hocko @ 2019-01-11 15:07 UTC (permalink / raw) To: Andrew Morton, Tetsuo Handa; +Cc: linux-mm, Johannes Weiner, LKML On Fri 11-01-19 23:31:18, Tetsuo Handa wrote: > On 2019/01/11 22:34, Michal Hocko wrote: > > On Fri 11-01-19 21:40:52, Tetsuo Handa wrote: > > [...] > >> Did you notice that there is no > >> > >> "Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB, shmem-rss:%lukB\n" > >> > >> line between > >> > >> [ 71.304703][ T9694] Memory cgroup out of memory: Kill process 9692 (a.out) score 904 or sacrifice child > >> > >> and > >> > >> [ 71.309149][ T54] oom_reaper: reaped process 9750 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:185532kB > >> > >> ? Then, you will find that [ T9694] failed to reach for_each_process(p) loop inside > >> __oom_kill_process() in the first round of out_of_memory() call because > >> find_lock_task_mm() == NULL at __oom_kill_process() because Ctrl-C made that victim > >> complete exit_mm() before find_lock_task_mm() is called. > > > > OK, so we haven't killed anything because the victim has exited by the > > time we wanted to do so. We still have other tasks sharing that mm > > pending and not killed because nothing has killed them yet, right? > > The OOM killer invoked by [ T9694] called printk() but didn't kill anything. > Instead, SIGINT from Ctrl-C killed all thread groups sharing current->mm. I still do not get it. Those other processes are not sharing signals. Or is it due to injecting the signal too all of them with the proper timing? > > How come the oom reaper could act on this oom event at all then? > > > > What am I missing? 
> > > > The OOM killer invoked by [ T9750] did not call printk() but hit > task_will_free_mem(current) in out_of_memory() and invoked the OOM reaper, > without calling mark_oom_victim() on all thread groups sharing current->mm. > Did you notice that I wrote that OK, now it starts making sense to me finally. I got hooked up in find_lock_task_mm failing in __oom_kill_process because we do see "Memory cgroup out of memory" and that happens _after_ task_will_free_mem. So the whole oom_reaper scenario didn't make much sense to me. > Since mm-oom-marks-all-killed-tasks-as-oom-victims.patch does not call mark_oom_victim() > when task_will_free_mem() == true, > > ? :-( No, I got lost in your writeup. While the task_will_free_mem is fixable but this would get us to even uglier code so I agree that the approach by my two patches is not feasible. I really wanted to have this heuristic based on the oom victim rather than signal pending because one lesson I've learned over time was that checks for fatal signals can lead to odd corner cases. Memcg is less prone to those issues because we can bypass the charge but still. Anyway, could you update your patch and abstract if (unlikely(tsk_is_oom_victim(current) || fatal_signal_pending(current) || current->flags & PF_EXITING)) in try_charge and reuse it in mem_cgroup_out_of_memory under the oom_lock with an explanation please? Andrew, please drop my 2 patches please. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 0/2] oom, memcg: do not report racy no-eligible OOM 2019-01-11 15:07 ` Michal Hocko @ 2019-01-11 15:37 ` Tetsuo Handa 2019-01-11 16:45 ` Michal Hocko 0 siblings, 1 reply; 28+ messages in thread From: Tetsuo Handa @ 2019-01-11 15:37 UTC (permalink / raw) To: Michal Hocko, Andrew Morton; +Cc: linux-mm, Johannes Weiner, LKML On 2019/01/12 0:07, Michal Hocko wrote: > On Fri 11-01-19 23:31:18, Tetsuo Handa wrote: >> The OOM killer invoked by [ T9694] called printk() but didn't kill anything. >> Instead, SIGINT from Ctrl-C killed all thread groups sharing current->mm. > > I still do not get it. Those other processes are not sharing signals. > Or is it due to injecting the signal too all of them with the proper > timing? Pressing Ctrl-C between after task_will_free_mem(p) in oom_kill_process() and before __oom_kill_process() (e.g. dump_header()) made fatal_signal_pending() = T for all of them. > Anyway, could you update your patch and abstract > if (unlikely(tsk_is_oom_victim(current) || > fatal_signal_pending(current) || > current->flags & PF_EXITING)) > > in try_charge and reuse it in mem_cgroup_out_of_memory under the > oom_lock with an explanation please? I don't think doing so makes sense, for tsk_is_oom_victim(current) = T && fatal_signal_pending(current) == F can't happen for mem_cgroup_out_of_memory() under the oom_lock, and current->flags cannot get PF_EXITING when current is inside mem_cgroup_out_of_memory(). fatal_signal_pending(current) alone is appropriate for mem_cgroup_out_of_memory() under the oom_lock because tsk_is_oom_victim(current) = F && fatal_signal_pending(current) == T can happen there. Also, doing so might become wrong in future, for mem_cgroup_out_of_memory() is also called from memory_max_write() which does not bail out upon PF_EXITING. I don't think we can call memory_max_write() after current thread got PF_EXITING, but nobody knows what change will happen in future. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 0/2] oom, memcg: do not report racy no-eligible OOM
  2019-01-11 15:37 ` Tetsuo Handa
@ 2019-01-11 16:45   ` Michal Hocko
  2019-01-12 10:52     ` Tetsuo Handa
  0 siblings, 1 reply; 28+ messages in thread
From: Michal Hocko @ 2019-01-11 16:45 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: Andrew Morton, linux-mm, Johannes Weiner, LKML

On Sat 12-01-19 00:37:05, Tetsuo Handa wrote:
> On 2019/01/12 0:07, Michal Hocko wrote:
> > On Fri 11-01-19 23:31:18, Tetsuo Handa wrote:
> >> The OOM killer invoked by [ T9694] called printk() but didn't kill anything.
> >> Instead, SIGINT from Ctrl-C killed all thread groups sharing current->mm.
> >
> > I still do not get it. Those other processes are not sharing signals.
> > Or is it due to injecting the signal to all of them with the proper
> > timing?
> 
> Pressing Ctrl-C after task_will_free_mem(p) in oom_kill_process() and
> before __oom_kill_process() (e.g. dump_header()) made fatal_signal_pending() = T
> for all of them.
> 
> > Anyway, could you update your patch and abstract
> > 	if (unlikely(tsk_is_oom_victim(current) ||
> > 		     fatal_signal_pending(current) ||
> > 		     current->flags & PF_EXITING))
> >
> > in try_charge and reuse it in mem_cgroup_out_of_memory under the
> > oom_lock with an explanation please?
> 
> I don't think doing so makes sense, for
> 
>   tsk_is_oom_victim(current) = T && fatal_signal_pending(current) == F
> 
> can't happen for mem_cgroup_out_of_memory() under the oom_lock, and
> current->flags cannot get PF_EXITING when current is inside
> mem_cgroup_out_of_memory(). fatal_signal_pending(current) alone is
> appropriate for mem_cgroup_out_of_memory() under the oom_lock because
> 
>   tsk_is_oom_victim(current) = F && fatal_signal_pending(current) == T
> 
> can happen there.

I meant to use the same check consistently. If we can bypass the charge
under a list of conditions in the charge path we should surely be able
to do the same for the oom path. I will not insist, but unless there is
a strong reason I would prefer that.

> Also, doing so might become wrong in future, for mem_cgroup_out_of_memory()
> is also called from memory_max_write() which does not bail out upon
> PF_EXITING. I don't think we can call memory_max_write() after the current
> thread got PF_EXITING, but nobody knows what change will happen in future.

No, this makes no sense whatsoever.

-- 
Michal Hocko
SUSE Labs
* Re: [PATCH 0/2] oom, memcg: do not report racy no-eligible OOM
  2019-01-11 16:45                       ` Michal Hocko
@ 2019-01-12 10:52                         ` Tetsuo Handa
  2019-01-13 17:36                           ` Michal Hocko
  0 siblings, 1 reply; 28+ messages in thread
From: Tetsuo Handa @ 2019-01-12 10:52 UTC (permalink / raw)
To: Michal Hocko; +Cc: Andrew Morton, linux-mm, Johannes Weiner, LKML

On 2019/01/12 1:45, Michal Hocko wrote:
>>> Anyway, could you update your patch and abstract
>>>
>>>	if (unlikely(tsk_is_oom_victim(current) ||
>>>		     fatal_signal_pending(current) ||
>>>		     current->flags & PF_EXITING))
>>>
>>> in try_charge and reuse it in mem_cgroup_out_of_memory under the
>>> oom_lock with an explanation please?
>>
>> I don't think doing so makes sense, for
>>
>>	tsk_is_oom_victim(current) == T && fatal_signal_pending(current) == F
>>
>> can't happen for mem_cgroup_out_of_memory() under the oom_lock, and
>> current->flags cannot get PF_EXITING while current is inside
>> mem_cgroup_out_of_memory(). fatal_signal_pending(current) alone is
>> appropriate for mem_cgroup_out_of_memory() under the oom_lock because
>>
>>	tsk_is_oom_victim(current) == F && fatal_signal_pending(current) == T
>>
>> can happen there.
>
> I meant to use the same check consistently. If we can bypass the charge
> under a list of conditions in the charge path, we should surely be able
> to do the same for the oom path. I will not insist, but unless there is
> a strong reason I would prefer that.

You mean something like this? I'm not sure this change is safe.

 mm/memcontrol.c | 27 +++++++++++++++++++++++----
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 17189da..1733d019 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -248,6 +248,12 @@ enum res_type {
 	     iter != NULL;				\
 	     iter = mem_cgroup_iter(NULL, iter, NULL))
 
+static inline bool can_ignore_limit(void)
+{
+	return tsk_is_oom_victim(current) || fatal_signal_pending(current) ||
+	       (current->flags & PF_EXITING);
+}
+
 /* Some nice accessors for the vmpressure. */
 struct vmpressure *memcg_to_vmpressure(struct mem_cgroup *memcg)
 {
@@ -1395,7 +1401,7 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	 * A few threads which were not waiting at mutex_lock_killable() can
 	 * fail to bail out. Therefore, check again after holding oom_lock.
 	 */
-	ret = fatal_signal_pending(current) || out_of_memory(&oc);
+	ret = can_ignore_limit() || out_of_memory(&oc);
 	mutex_unlock(&oom_lock);
 	return ret;
 }
@@ -1724,6 +1730,10 @@ static enum oom_status mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int
 	mem_cgroup_unmark_under_oom(memcg);
 
 	if (mem_cgroup_out_of_memory(memcg, mask, order))
+		/*
+		 * Returning OOM_SUCCESS upon can_ignore_limit() is OK, for
+		 * the caller will check can_ignore_limit() again.
+		 */
 		ret = OOM_SUCCESS;
 	else
 		ret = OOM_FAILED;
@@ -1783,6 +1793,11 @@ bool mem_cgroup_oom_synchronize(bool handle)
 		finish_wait(&memcg_oom_waitq, &owait.wait);
 		mem_cgroup_out_of_memory(memcg, current->memcg_oom_gfp_mask,
 					 current->memcg_oom_order);
+		/*
+		 * Returning upon can_ignore_limit() is OK, for the caller is
+		 * already killed... CheckMe: Is this assumption correct?
+		 * Page fault can't happen after getting PF_EXITING?
+		 */
 	} else {
 		schedule();
 		mem_cgroup_unmark_under_oom(memcg);
@@ -2215,9 +2230,7 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	 * bypass the last charges so that they can exit quickly and
 	 * free their memory.
 	 */
-	if (unlikely(tsk_is_oom_victim(current) ||
-		     fatal_signal_pending(current) ||
-		     current->flags & PF_EXITING))
+	if (unlikely(can_ignore_limit()))
 		goto force;
 
 	/*
@@ -5527,6 +5540,12 @@ static ssize_t memory_max_write(struct kernfs_open_file *of,
 		memcg_memory_event(memcg, MEMCG_OOM);
 		if (!mem_cgroup_out_of_memory(memcg, GFP_KERNEL, 0))
 			break;
+		/*
+		 * There is no need to check can_ignore_limit() here, for
+		 * signal_pending(current) above will break anyway.
+		 */
+		if (unlikely(can_ignore_limit()))
+			break;
 	}
 
 	memcg_wb_domain_size_changed(memcg);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread
* Re: [PATCH 0/2] oom, memcg: do not report racy no-eligible OOM
  2019-01-12 10:52                         ` Tetsuo Handa
@ 2019-01-13 17:36                           ` Michal Hocko
  0 siblings, 0 replies; 28+ messages in thread
From: Michal Hocko @ 2019-01-13 17:36 UTC (permalink / raw)
To: Tetsuo Handa; +Cc: Andrew Morton, linux-mm, Johannes Weiner, LKML

On Sat 12-01-19 19:52:50, Tetsuo Handa wrote:
> On 2019/01/12 1:45, Michal Hocko wrote:
> >>> Anyway, could you update your patch and abstract
> >>>
> >>>	if (unlikely(tsk_is_oom_victim(current) ||
> >>>		     fatal_signal_pending(current) ||
> >>>		     current->flags & PF_EXITING))
> >>>
> >>> in try_charge and reuse it in mem_cgroup_out_of_memory under the
> >>> oom_lock with an explanation please?
> >>
> >> I don't think doing so makes sense, for
> >>
> >>	tsk_is_oom_victim(current) == T && fatal_signal_pending(current) == F
> >>
> >> can't happen for mem_cgroup_out_of_memory() under the oom_lock, and
> >> current->flags cannot get PF_EXITING while current is inside
> >> mem_cgroup_out_of_memory(). fatal_signal_pending(current) alone is
> >> appropriate for mem_cgroup_out_of_memory() under the oom_lock because
> >>
> >>	tsk_is_oom_victim(current) == F && fatal_signal_pending(current) == T
> >>
> >> can happen there.
> >
> > I meant to use the same check consistently. If we can bypass the charge
> > under a list of conditions in the charge path, we should surely be able
> > to do the same for the oom path. I will not insist, but unless there is
> > a strong reason I would prefer that.
>
> You mean something like this? I'm not sure this change is safe.
>
>  mm/memcontrol.c | 27 +++++++++++++++++++++++----
>  1 file changed, 23 insertions(+), 4 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 17189da..1733d019 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -248,6 +248,12 @@ enum res_type {
>  	     iter != NULL;				\
>  	     iter = mem_cgroup_iter(NULL, iter, NULL))
>  
> +static inline bool can_ignore_limit(void)
> +{
> +	return tsk_is_oom_victim(current) || fatal_signal_pending(current) ||
> +	       (current->flags & PF_EXITING);
> +}
> +
>  /* Some nice accessors for the vmpressure. */
>  struct vmpressure *memcg_to_vmpressure(struct mem_cgroup *memcg)
>  {
> @@ -1395,7 +1401,7 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
>  	 * A few threads which were not waiting at mutex_lock_killable() can
>  	 * fail to bail out. Therefore, check again after holding oom_lock.
>  	 */
> -	ret = fatal_signal_pending(current) || out_of_memory(&oc);
> +	ret = can_ignore_limit() || out_of_memory(&oc);
>  	mutex_unlock(&oom_lock);
>  	return ret;
>  }
> @@ -2215,9 +2230,7 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
>  	 * bypass the last charges so that they can exit quickly and
>  	 * free their memory.
>  	 */
> -	if (unlikely(tsk_is_oom_victim(current) ||
> -		     fatal_signal_pending(current) ||
> -		     current->flags & PF_EXITING))
> +	if (unlikely(can_ignore_limit()))
>  		goto force;
>  
>  	/*

I meant something as simple as this, indeed. I would just
s@can_ignore_limit@should_force_charge@ but this is a minor thing.
--
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 28+ messages in thread
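The `s@can_ignore_limit@should_force_charge@` substitution suggested here is a plain sed rename. Applied mechanically to the file the patch touches, it would look like this (hypothetical invocation, assuming a tree with the patch above already applied):

```shell
# Rename the helper introduced by the patch, as Michal suggests.
# GNU sed's -i edits the file in place.
sed -i 's/can_ignore_limit/should_force_charge/g' mm/memcontrol.c
```

Since the helper was introduced by this very patch, a global substitution is safe: there are no pre-existing users of either name elsewhere in the tree to collide with.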
end of thread, other threads:[~2019-01-13 17:36 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-07 14:38 [PATCH 0/2] oom, memcg: do not report racy no-eligible OOM Michal Hocko
2019-01-07 14:38 ` [PATCH 1/2] mm, oom: marks all killed tasks as oom victims Michal Hocko
2019-01-07 20:58   ` Tetsuo Handa
2019-01-08  8:11     ` Michal Hocko
2019-01-07 14:38 ` [PATCH 2/2] memcg: do not report racy no-eligible OOM tasks Michal Hocko
2019-01-07 20:59   ` Tetsuo Handa
2019-01-08  8:14     ` Michal Hocko
2019-01-08 10:39       ` Tetsuo Handa
2019-01-08 11:46         ` Michal Hocko
2019-01-08  8:35   ` kbuild test robot
2019-01-08  9:39     ` Michal Hocko
2019-01-11  0:23       ` [kbuild-all] " Rong Chen
2019-01-08 14:21   ` [PATCH 3/2] memcg: Facilitate termination of memcg OOM victims Tetsuo Handa
2019-01-08 14:38     ` Michal Hocko
2019-01-09 11:03   ` [PATCH 0/2] oom, memcg: do not report racy no-eligible OOM Michal Hocko
2019-01-09 11:34     ` Tetsuo Handa
2019-01-09 12:02       ` Michal Hocko
2019-01-10 23:59         ` Tetsuo Handa
2019-01-11 10:25           ` Tetsuo Handa
2019-01-11 11:33           ` Michal Hocko
2019-01-11 12:40             ` Tetsuo Handa
2019-01-11 13:34               ` Michal Hocko
2019-01-11 14:31                 ` Tetsuo Handa
2019-01-11 15:07                   ` Michal Hocko
2019-01-11 15:37                     ` Tetsuo Handa
2019-01-11 16:45                       ` Michal Hocko
2019-01-12 10:52                         ` Tetsuo Handa
2019-01-13 17:36                           ` Michal Hocko