From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lf0-f70.google.com (mail-lf0-f70.google.com [209.85.215.70]) by kanga.kvack.org (Postfix) with ESMTP id 7E9686B0262 for ; Tue, 12 Jul 2016 03:09:07 -0400 (EDT) Received: by mail-lf0-f70.google.com with SMTP id 33so4090254lfw.1 for ; Tue, 12 Jul 2016 00:09:07 -0700 (PDT) Received: from mx2.suse.de (mx2.suse.de. [195.135.220.15]) by mx.google.com with ESMTPS id 19si1652412wmv.99.2016.07.12.00.09.05 for (version=TLS1 cipher=AES128-SHA bits=128/128); Tue, 12 Jul 2016 00:09:05 -0700 (PDT) Date: Tue, 12 Jul 2016 09:09:04 +0200 From: Michal Hocko Subject: Re: [PATCH 3/6] mm,oom: Use list of mm_struct used by OOM victims. Message-ID: <20160712070903.GB14586@dhcp22.suse.cz> References: <201607080058.BFI87504.JtFOOFQFVHSLOM@I-love.SAKURA.ne.jp> <201607080103.CDH12401.LFOHStQFOOFVJM@I-love.SAKURA.ne.jp> <20160711125051.GF1811@dhcp22.suse.cz> <201607121500.AGE04699.FFQOFHVSOtOLMJ@I-love.SAKURA.ne.jp> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <201607121500.AGE04699.FFQOFHVSOtOLMJ@I-love.SAKURA.ne.jp> Sender: owner-linux-mm@kvack.org List-ID: To: Tetsuo Handa Cc: linux-mm@kvack.org, akpm@linux-foundation.org, oleg@redhat.com, rientjes@google.com, vdavydov@parallels.com, mst@redhat.com On Tue 12-07-16 15:00:41, Tetsuo Handa wrote: > Michal Hocko wrote: > > > diff --git a/kernel/fork.c b/kernel/fork.c > > > index 7926993..8e469e0 100644 > > > --- a/kernel/fork.c > > > +++ b/kernel/fork.c > > > @@ -722,6 +722,10 @@ static inline void __mmput(struct mm_struct *mm) > > > } > > > if (mm->binfmt) > > > module_put(mm->binfmt->module); > > > +#ifndef CONFIG_MMU > > > + if (mm->oom_mm.victim) > > > + exit_oom_mm(mm); > > > +#endif > > > > This ifdef is not really needed. There is no reason we should wait for > > the oom_reaper to unlink the mm. > > Oleg wanted to avoid adding OOM related hooks if possible > ( http://lkml.kernel.org/r/20160705205231.GA25340@redhat.com ), > but I thought that calling exit_oom_mm() from here is better for CONFIG_MMU=n case > ( http://lkml.kernel.org/r/201607062043.FEC86485.JFFVLtFOQOSHMO@I-love.SAKURA.ne.jp ). __mmput is a MM part of the exit path so I think sticking oom related things there is not harmful. Yes we do take a global lock (btw. the lock contention could be reduced if you preserve the existing spinlock and use it for enqueing. Besides that the ifdef is really ugly. > I think that not calling exit_oom_mm() from here is better for CONFIG_MMU=y case. > Calling exit_oom_mm() from here will require !list_empty() check after holding > oom_lock at oom_reaper(). Instead, we can do > > +#ifdef CONFIG_MMU > + if (mm->oom_mm.victim) > + set_bit(MMF_OOM_REAPED, &mm->flags); > +#else > + if (mm->oom_mm.victim) > + exit_oom_mm(mm); > +#endif > > here and let oom_has_pending_mm() check for MMF_OOM_REAPED. Yes that would be possible but why should we make this more complicated than necessary. It is natural that exit_oom_mm is called after the address space has been torn down. This would be the most common path. We have oom_reaper as a backup if this doesn't happen in time. It really doesn't make much sense to keep mm on the list artificially if the oom_reaper should just skip over it because it is already empty. So please let's make it as simple as possible. [...] > > > @@ -653,6 +657,9 @@ subsys_initcall(oom_init) > > > */ > > > void mark_oom_victim(struct task_struct *tsk) > > > { > > > + struct mm_struct *mm = tsk->mm; > > > + struct task_struct *old_tsk = mm->oom_mm.victim; > > > + > > > WARN_ON(oom_killer_disabled); > > > /* OOM killer might race with memcg OOM */ > > > if (test_and_set_tsk_thread_flag(tsk, TIF_MEMDIE)) > > > @@ -666,6 +673,18 @@ void mark_oom_victim(struct task_struct *tsk) > > > */ > > > __thaw_task(tsk); > > > atomic_inc(&oom_victims); > > > + /* > > > + * Since mark_oom_victim() is called from multiple threads, > > > + * connect this mm to oom_mm_list only if not yet connected. > > > + */ > > > + get_task_struct(tsk); > > > + mm->oom_mm.victim = tsk; > > > + if (!old_tsk) { > > > + atomic_inc(&mm->mm_count); > > > + list_add_tail(&mm->oom_mm.list, &oom_mm_list); > > > + } else { > > > + put_task_struct(old_tsk); > > > + } > > > > Isn't this overcomplicated? Why do we need to replace the old task by > > the current one? > > I'm not sure whether task_in_oom_domain(mm->oom_mm.victim, memcg, nodemask) in > oom_has_pending_mm() will work as expected, especially when all threads in > one thread group (which mm->oom_mm.victim belongs to) reached TASK_DEAD state. > ( http://lkml.kernel.org/r/201607042150.CIB00512.FSOtMHLOOVFFQJ@I-love.SAKURA.ne.jp ) > > I guess that task_in_oom_domain() will return false, and that mm will be selected > by another thread group (which mm->oom_mm.victim does not belongs to). Therefore, > I think we need to replace the old task with the new task (at least when > task_in_oom_domain() returned false) at mark_oom_victim(). Can we do that in a separate patch then. It would make this patch easier to review and we can discuss about this corner case without distracting from the main point of this patch series. > If task_in_oom_domain(mm->oom_mm.victim, memcg, nodemask) in oom_has_pending_mm() > does not work as expected even if we replace the old task with the new task at > mark_oom_victim(), I think we after all need to use something like > > struct task_struct { > (...snipped...) > + struct mm_struct *oom_mm; /* current->mm as of getting TIF_MEMDIE */ > + struct task_struct *oom_mm_list; /* Connected to oom_mm_list global list. */ > -#ifdef CONFIG_MMU > - struct task_struct *oom_reaper_list; > -#endif > (...snipped...) > }; > > or your signal_struct->oom_mm approach. -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org