linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@kernel.org>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Vladimir Davydov <vdavydov.dev@gmail.com>,
	linux-mm@kvack.org, Greg Thelen <gthelen@google.com>,
	Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
	Dmitry Vyukov <dvyukov@google.com>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] memcg, oom: be careful about races when warning about no reclaimable task
Date: Wed, 8 Aug 2018 08:44:14 +0200	[thread overview]
Message-ID: <20180808064414.GA27972@dhcp22.suse.cz> (raw)
In-Reply-To: <20180807205425.GA5928@cmpxchg.org>

On Tue 07-08-18 16:54:25, Johannes Weiner wrote:
> On Tue, Aug 07, 2018 at 10:23:32PM +0200, Michal Hocko wrote:
> > On Tue 07-08-18 16:02:47, Johannes Weiner wrote:
> > > On Tue, Aug 07, 2018 at 09:25:53AM +0200, Michal Hocko wrote:
> > > > From: Michal Hocko <mhocko@suse.com>
> > > > 
> > > > "memcg, oom: move out_of_memory back to the charge path" has added a
> > > > warning triggered when the oom killer cannot find any eligible task
> > > > and so there is no way to reclaim the oom memcg under its hard limit.
> > > > Further charges for such a memcg are forced and therefore the hard limit
> > > > isolation is weakened.
> > > > 
> > > > The current warning is however too eager to trigger  even when we are not
> > > > really hitting the above condition. Syzbot[1] and Greg Thelen have noticed
> > > > that we can hit this condition even when there is still oom victim
> > > > pending. E.g. the following race is possible:
> > > > 
> > > > memcg has two tasks taskA, taskB.
> > > > 
> > > > CPU1 (taskA)			CPU2			CPU3 (taskB)
> > > > try_charge
> > > >   mem_cgroup_out_of_memory				try_charge
> > > >       select_bad_process(taskB)
> > > >       oom_kill_process		oom_reap_task
> > > > 				# No real memory reaped
> > > >     				  			  mem_cgroup_out_of_memory
> > > > 				# set taskB -> MMF_OOM_SKIP
> > > >   # retry charge
> > > >   mem_cgroup_out_of_memory
> > > >     oom_lock						    oom_lock
> > > >     select_bad_process(self)
> > > >     oom_kill_process(self)
> > > >     oom_unlock
> > > > 							    # no eligible task
> > > > 
> > > > In fact syzbot test triggered this situation by placing multiple tasks
> > > > into a memcg with hard limit set to 0. So no task really had any memory
> > > > charged to the memcg
> > > > 
> > > > : Memory cgroup stats for /ile0: cache:0KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
> > > > : Tasks state (memory values in pages):
> > > > : [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
> > > > : [   6569]     0  6562     9427        1    53248        0             0 syz-executor0
> > > > : [   6576]     0  6576     9426        0    61440        0             0 syz-executor6
> > > > : [   6578]     0  6578     9426      534    61440        0             0 syz-executor4
> > > > : [   6579]     0  6579     9426        0    57344        0             0 syz-executor5
> > > > : [   6582]     0  6582     9426        0    61440        0             0 syz-executor7
> > > > : [   6584]     0  6584     9426        0    57344        0             0 syz-executor1
> > > > 
> > > > so in principle there is indeed nothing reclaimable in this memcg and
> > > > this looks like a misconfiguration. On the other hand we can clearly
> > > > kill all those tasks so it is a bit early to warn and scare users. Do
> > > > that by checking that the current is the oom victim and bypass the
> > > > warning then. The victim is allowed to force charge and terminate to
> > > > release its temporal charge along the way.
> > > > 
> > > > [1] http://lkml.kernel.org/r/0000000000005e979605729c1564@google.com
> > > > Fixes: "memcg, oom: move out_of_memory back to the charge path"
> > > > Noticed-by: Greg Thelen <gthelen@google.com>
> > > > Reported-and-tested-by: syzbot+bab151e82a4e973fa325@syzkaller.appspotmail.com
> > > > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > > > ---
> > > >  mm/memcontrol.c | 3 ++-
> > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > > index 4603ad75c9a9..1b6eed1bc404 100644
> > > > --- a/mm/memcontrol.c
> > > > +++ b/mm/memcontrol.c
> > > > @@ -1703,7 +1703,8 @@ static enum oom_status mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int
> > > >  		return OOM_ASYNC;
> > > >  	}
> > > >  
> > > > -	if (mem_cgroup_out_of_memory(memcg, mask, order))
> > > > +	if (mem_cgroup_out_of_memory(memcg, mask, order) ||
> > > > +			tsk_is_oom_victim(current))
> > > >  		return OOM_SUCCESS;
> > > >  
> > > >  	WARN(1,"Memory cgroup charge failed because of no reclaimable memory! "
> > > 
> > > This is really ugly. :(
> > > 
> > > If that check is only there to suppress the warning when the limit is
> > > 0, this should really be a separate branch around the warning, with a
> > > fat comment that this is a ridiculous cornercase, and not look like it
> > > is an essential part of the memcg reclaim/oom process.
> > 
> > I do not mind having it in a separate branch. Btw. this is not just about
> > hard limit set to 0. Similar can happen anytime we are getting out of
> > oom victims. The likelihood goes up with the remote memcg charging
> > merged recently.
> 
> What the global OOM killer does in that situation is dump the header
> anyway:
> 
> 	/* Found nothing?!?! Either we hang forever, or we panic. */
> 	if (!oc->chosen && !is_sysrq_oom(oc) && !is_memcg_oom(oc)) {
> 		dump_header(oc, NULL);
> 		panic("Out of memory and no killable processes...\n");
> 	}
> 
> I think that would make sense here as well - without the panic,
> obviously, but we can add our own pr_err() line following the header.
> 
> That gives us the exact memory situation of the cgroup and who is
> trying to allocate and from what context, but in a format that is
> known to users without claiming right away that it's a kernel issue.

I was considering doing that initially but then decided that warning is
less noisy and still a good "let us know" trigger. It doesn't give us
the whole picture which is obviously a downside but we would at least
know that something is going south one have the trace to who that might
be should this be a bug rather than a misconfiguration.

But I do not mind doing dump_header as well. Care to send a patch?

-- 
Michal Hocko
SUSE Labs

  reply	other threads:[~2018-08-08  6:44 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-07  7:25 [PATCH] memcg, oom: be careful about races when warning about no reclaimable task Michal Hocko
2018-08-07 10:15 ` Tetsuo Handa
2018-08-07 11:04   ` Michal Hocko
2018-08-07 20:19   ` Johannes Weiner
2018-08-07 20:38     ` Tetsuo Handa
2018-08-08 12:57       ` Tetsuo Handa
2018-08-08 13:16         ` Michal Hocko
2018-08-07 20:02 ` Johannes Weiner
2018-08-07 20:23   ` Michal Hocko
2018-08-07 20:54     ` Johannes Weiner
2018-08-08  6:44       ` Michal Hocko [this message]
2018-08-08  7:12         ` Michal Hocko
2018-08-08  7:13           ` [PATCH 1/2] " Michal Hocko
2018-08-08  7:13           ` [PATCH 2/2] memcg, oom: emit oom report when there is no eligible task Michal Hocko
2018-08-08 14:45             ` Johannes Weiner
2018-08-08 16:17               ` Michal Hocko
2018-08-21 14:06                 ` Michal Hocko
2018-08-21 17:20                   ` Johannes Weiner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180808064414.GA27972@dhcp22.suse.cz \
    --to=mhocko@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=dvyukov@google.com \
    --cc=gthelen@google.com \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=penguin-kernel@I-love.SAKURA.ne.jp \
    --cc=vdavydov.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).