linux-kernel.vger.kernel.org archive mirror
From: Michal Hocko <mhocko@kernel.org>
To: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: linux-mm@kvack.org, rientjes@google.com, oleg@redhat.com,
	vdavydov@parallels.com, akpm@linux-foundation.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/10 -v3] Handle oom bypass more gracefully
Date: Tue, 7 Jun 2016 17:05:34 +0200
Message-ID: <20160607150534.GO12305@dhcp22.suse.cz>
In-Reply-To: <201606072330.AHH81886.OOMVHFOFLtFSQJ@I-love.SAKURA.ne.jp>

On Tue 07-06-16 23:30:20, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > > To be honest, I don't think we need to apply this pile.
> > 
> > So you do not think that the current pile is making the code easier to
> > understand and more robust as well as the semantic more consistent?
> 
> Right. It is getting too complicated for me to understand.

Yeah, this code is indeed very complicated, with subtle side effects. I
believe there are far fewer side effects with these patches applied.
I might be biased, of course, and that is for others to judge.

> Below patch on top of 4.7-rc2 will do the job and can do for
> CONFIG_MMU=n kernels as well.
[...]
> @@ -179,7 +184,7 @@ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
>  	 * unkillable or have been already oom reaped.
>  	 */
>  	adj = (long)p->signal->oom_score_adj;
> -	if (adj == OOM_SCORE_ADJ_MIN ||
> +	if (adj == OOM_SCORE_ADJ_MIN || p->signal->oom_killed ||
>  			test_bit(MMF_OOM_REAPED, &p->mm->flags)) {
>  		task_unlock(p);
>  		return 0;
[...]
> @@ -284,7 +289,8 @@ enum oom_scan_t oom_scan_process_thread(struct oom_control *oc,
>  	 * Don't allow any other task to have access to the reserves.
>  	 */
>  	if (!is_sysrq_oom(oc) && atomic_read(&task->signal->oom_victims))
> -		return OOM_SCAN_ABORT;
> +		return timer_pending(&oomkiller_victim_wait_timer) ?
> +			OOM_SCAN_ABORT : OOM_SCAN_CONTINUE;
>  
>  	/*
>  	 * If task is allocating a lot of memory and has been marked to be
> @@ -678,6 +684,8 @@ void mark_oom_victim(struct task_struct *tsk)
>  	if (test_and_set_tsk_thread_flag(tsk, TIF_MEMDIE))
>  		return;
>  	atomic_inc(&tsk->signal->oom_victims);
> +	mod_timer(&oomkiller_victim_wait_timer, jiffies + 3 * HZ);
> +	tsk->signal->oom_killed = true;
>  	/*
>  	 * Make sure that the task is woken up from uninterruptible sleep
>  	 * if it is frozen because OOM killer wouldn't be able to free

OK, so you are arming the timer on each mark_oom_victim, regardless
of the oom context. This means that you have replaced one potential
lockup with other potential livelocks. Tasks from different oom domains
might interfere here...
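
To make the concern concrete, here is a rough userspace model of the two
quoted hunks. It is only a sketch and not kernel code: HZ is assumed to
be 100, the model_* names are made up for illustration, and "domain B"
simply stands for any unrelated oom context that keeps marking victims.

```c
#include <assert.h>
#include <stdbool.h>

#define HZ 100UL	/* assumed tick rate, for this model only */

/* One global deadline shared by every oom domain, as in the quoted patch. */
static unsigned long jiffies_now;
static unsigned long timer_expires;
static bool timer_armed;

/* Simplified stand-in for timer_pending(): armed and not yet expired. */
static bool model_timer_pending(void)
{
	return timer_armed && jiffies_now < timer_expires;
}

/* Models the mark_oom_victim() hunk: every victim re-arms the same timer. */
static void model_mark_oom_victim(void)
{
	timer_expires = jiffies_now + 3 * HZ;
	timer_armed = true;
}

enum model_scan { MODEL_SCAN_CONTINUE, MODEL_SCAN_ABORT };

/* Models the oom_scan_process_thread() hunk for an existing victim. */
static enum model_scan model_scan_existing_victim(void)
{
	return model_timer_pending() ? MODEL_SCAN_ABORT : MODEL_SCAN_CONTINUE;
}

/*
 * Domain A marks a victim at tick 0 and that victim gets stuck. Domain B
 * marks a new victim every b_period ticks (0 means never). Returns the
 * tick at which domain A may scan past its stuck victim, or limit if
 * that never happens within limit ticks.
 */
static unsigned long model_ticks_until_a_continues(unsigned long b_period,
						   unsigned long limit)
{
	assert(limit > 0);
	jiffies_now = 0;
	timer_armed = false;
	model_mark_oom_victim();		/* domain A's victim */

	while (jiffies_now < limit) {
		if (b_period && jiffies_now && jiffies_now % b_period == 0)
			model_mark_oom_victim();	/* unrelated domain B */
		if (model_scan_existing_victim() == MODEL_SCAN_CONTINUE)
			return jiffies_now;
		jiffies_now++;
	}
	return limit;
}
```

In this model, domain A gets past its stuck victim after 3 * HZ ticks
when it is left alone, but never within the limit once domain B marks a
victim more often than every 3 * HZ ticks, because each of those calls
pushes the shared deadline out again.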

Also, this code doesn't even seem simpler. It is surely fewer lines of
code, but it is really hard to see how the timer would behave for
different oom contexts.

> > > What is missing for
> > > handling subtle and unlikely issues is "eligibility check for not to select
> > > the same victim forever" (i.e. always set MMF_OOM_REAPED or OOM_SCORE_ADJ_MIN,
> > > and check them before exercising the shortcuts).
> > 
> > Which is a hard problem as we do not have enough context for that. Most
> > situations are covered now because we are much less optimistic when
> > bypassing the oom killer and basically most sane situations are oom
> > reapable.
> 
> What is wrong with above patch? How much difference is there compared to
> calling schedule_timeout_killable(HZ) in oom_kill_process() before
> releasing oom_lock and later checking MMF_OOM_REAPED after re-taking
> oom_lock when we can't wake up the OOM reaper?

I fail to see how this is much different, really. Your patch is checking
timer_pending with a global context in the same path, and that is imho
much harder to reason about than something which is task->mm based.
 
> > > Current 4.7-rc1 code will be sufficient (and sometimes even better than
> > > involving user visible changes / selecting next OOM victim without delay)
> > > if we started with "decision by timer" (e.g.
> > > http://lkml.kernel.org/r/201601072026.JCJ95845.LHQOFOOSMFtVFJ@I-love.SAKURA.ne.jp )
> > > approach.
> > > 
> > > As long as you insist on "decision by feedback from the OOM reaper",
> > > we have to guarantee that the OOM reaper is always invoked in order to
> > > handle subtle and unlikely cases.
> > 
> > And I still believe that a decision based by a feedback is a better
> > solution than a timeout. So I am pretty much for exploring that way
> > until we really find out we cannot really go forward any longer.
> 
> I'm OK with "a decision based by a feedback" but you don't like waking up
> the OOM reaper ("invoking the oom reaper just to find out what we know
> already and it is unlikely to change after oom_kill_process just doesn't
> make much sense."). So what feedback mechanisms are possible other than
> timeout like above patch?

Is this about patch 10? Well, yes, there is a case where the oom reaper
cannot be invoked and we have no feedback. Then we have no other way
than to wait for some time. I believe it is easier to wait in the oom
context directly than to add a global timer. Both approaches would need
some code in the oom victim selection code, and it is much easier to
reason about the victim specific context than a global one, as mentioned
above.
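
As a rough illustration of what "victim specific context" means here,
the following userspace sketch keeps all state on the victim's own mm.
It is not the code from patch 10. The model_* names and MODEL_HZ are
hypothetical, the oom_reaped field merely stands in for MMF_OOM_REAPED,
and setting it after a bounded wait is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_HZ 100UL	/* assumed tick rate, for this model only */

struct model_mm {
	bool oom_reaped;	/* stands in for MMF_OOM_REAPED */
};

struct model_victim {
	struct model_mm *mm;
};

/* A victim blocks further selection only until its own mm is flagged. */
static bool model_victim_blocks_selection(const struct model_victim *v)
{
	return !v->mm->oom_reaped;
}

/*
 * When the oom reaper cannot run, wait a bounded time in the caller's
 * own oom context and then give up on this one mm. The addition stands
 * in for a sleep such as schedule_timeout_killable(HZ). Returns the
 * tick at which the wait is over.
 */
static unsigned long model_wait_then_hide(struct model_victim *v,
					  unsigned long now)
{
	assert(v && v->mm);
	now += MODEL_HZ;
	v->mm->oom_reaped = true;
	return now;
}

/* Returns true when giving up on A's mm leaves unrelated victim B alone. */
static bool model_domains_are_independent(void)
{
	struct model_mm mm_a = { false }, mm_b = { false };
	struct model_victim a = { &mm_a }, b = { &mm_b };
	unsigned long now = model_wait_then_hide(&a, 0);

	return now == MODEL_HZ &&
	       !model_victim_blocks_selection(&a) &&
	       model_victim_blocks_selection(&b);
}
```

The outcome for a given victim depends only on that victim's mm, so a
victim in another oom domain neither shortens nor extends the wait.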
-- 
Michal Hocko
SUSE Labs
