LKML Archive on lore.kernel.org
 help / color / Atom feed
From: Michal Hocko <mhocko@kernel.org>
To: Dmitry Vyukov <dvyukov@google.com>
Cc: syzbot <syzbot+bab151e82a4e973fa325@syzkaller.appspotmail.com>,
	cgroups@vger.kernel.org, Johannes Weiner <hannes@cmpxchg.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>,
	syzkaller-bugs <syzkaller-bugs@googlegroups.com>,
	Vladimir Davydov <vdavydov.dev@gmail.com>,
	Dmitry Torokhov <dtor@google.com>
Subject: Re: WARNING in try_charge
Date: Mon, 6 Aug 2018 19:30:00 +0200
Message-ID: <20180806173000.GA10003@dhcp22.suse.cz> (raw)
In-Reply-To: <CACT4Y+Y+fwJ1-v89kw3vgyHO_BWk97WOVOE1BsZdqDiiSFcGzQ@mail.gmail.com>

On Mon 06-08-18 16:58:01, Dmitry Vyukov wrote:
> On Mon, Aug 6, 2018 at 4:21 PM, Michal Hocko <mhocko@kernel.org> wrote:
> > On Mon 06-08-18 13:57:38, Dmitry Vyukov wrote:
> >> On Mon, Aug 6, 2018 at 1:02 PM, Michal Hocko <mhocko@kernel.org> wrote:
> > [...]
> >> >> A much
> >> >> friendlier for user way to say this would be print a message at the
> >> >> point of misconfiguration saying what exactly is wrong, e.g. "pid $PID
> >> >> misconfigures cgroup /cgroup/path with mem.limit=0" without a stack
> >> >> trace (does not give any useful info for user). And return EINVAL if
> >> >> it can't fly at all? And then leave the "or a kernel bug" part for the
> >> >> WARNING each occurrence of which we do want to be reported to kernel
> >> >> developers.
> >> >
> >> > But this is not applicable here. Your misconfiguration is quite obvious
> >> > because you simply set the hard limit to 0. This is not the only
> >> > situation when this can happen. There is no clear point to tell, you are
> >> > doing this wrong. If it was we would do it at that point obviously.
> >>
> >> But, isn't there a point were hard limit is set to 0? I would expect
> >> there is a something like cgroup file write handler with a value of 0
> >> or something.
> >
> > Yeah, but this is only one instance of the problem. Other is that the
> > memcg is not reclaimable for any other reasons. And we do not know what
> > those might be
> >
> >>
> >> > If you have a strong reason to believe that this is an abuse of WARN I
> >> > am all happy to change that. But I haven't heard any yet, to be honest.
> >>
> >> WARN must not be used for anything that is not kernel bugs. If this is
> >> not kernel bug, WARN must not be used here.
> >
> > This is rather strong wording without any backing arguments. I strongly
> > doubt 90% of existing WARN* match this expectation. WARN* has
> > traditionally been a way to tell that something suspicious is going on.
> > Those situation are mostly likely not fatal but it is good to know they
> > are happening.
> >
> > Sure there is that panic_on_warn thingy which you seem to be using and I
> > suspect it is a reason why you are so careful about warnings in general
> > but my experience tells me that this configuration is barely usable
> > except for testing (which is your case).
> >
> > But as I've said, I do not insist on WARN here. All I care about is to
> > warn user that something might go south and this may be either due to
> > misconfiguration or a subtly wrong memcg reclaim/OOM handler behavior.
> 
> I am a bit lost. Can limit=0 legally lead to the warnings? Or there is
> also a kernel bug on top of that and it's actually a kernel bug that
> provokes the warning?

As I've tried to tell already. I cannot tell for sure. It is the killed
oom victim which triggered thw warning and that shouldn't really
happen. Considering this doesn't reproduce with the current linux next
nor linus tree and the oom code has changed since the version you have
tested then I would suspect there was something wrong with the memcg oom
code. But maybe the test doesn't really reproduce reliably.

> If it's a kernel bug, then I propose to stop arguing about
> configuration and concentrate on the bug.
> If it's just the misconfiguration that triggers the warning,  then can
> we separate the 2 causes of the warning (user misconfiguration and
> kernel bugs)? Say, return EINVAL when mem limit is set to 0 (and print
> a line to console if necessary)? Or if the limit=0 is somehow not
> possible/desirable to detect right away, check limit=0 at the point of
> the warning and don't want?

No we simply cannot. There is numerous situations when this can trigger.
Say you set the hard limit to N and then try to fault in shmem file with
the size >= N. No oom killer will help to reclaim memory. Or say you
migrate the all tasks away from the memcg and then somebody triggers the
memcg OOM in that group. There is simply nobody to kill. See the point?
There is simply no direct contection between the configuration and
actual problem. Too many things might happen between those two points.
Let me repeat. We do warn because we want to hear if this happens. WARN
tends to be a good way to get that attention. If you strongly believe
this is an abuse I won't mind seeing a patch to turn it into something
different.

-- 
Michal Hocko
SUSE Labs

  reply index

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-04 13:33 syzbot
2018-08-04 13:45 ` Tetsuo Handa
2018-08-05 11:33   ` Tetsuo Handa
2018-08-05  8:14 ` syzbot
2018-08-06  9:15 ` Michal Hocko
2018-08-06  9:30   ` Dmitry Vyukov
2018-08-06  9:48     ` Michal Hocko
2018-08-06 10:34       ` Dmitry Vyukov
2018-08-06 11:02         ` Michal Hocko
2018-08-06 11:57           ` Dmitry Vyukov
2018-08-06 14:21             ` Michal Hocko
2018-08-06 14:58               ` Dmitry Vyukov
2018-08-06 17:30                 ` Michal Hocko [this message]
2018-08-06 17:53                   ` Dmitry Vyukov
2018-08-06 15:07               ` Dmitry Vyukov
2018-08-06 15:31               ` Johannes Weiner
2018-08-06 10:39       ` Dmitry Vyukov
2018-08-06 10:47         ` Tetsuo Handa
2018-08-06 11:09           ` Michal Hocko
2018-08-06 11:27           ` syzbot
2018-08-06 11:32             ` Michal Hocko
2018-08-06 11:58               ` Dmitry Vyukov
2018-08-06 14:41               ` Tetsuo Handa
2018-08-06 14:58                 ` Michal Hocko
2018-08-06 15:12                   ` Tetsuo Handa
2018-08-06 14:54               ` David Howells
2018-08-06 15:04                 ` Tetsuo Handa
2018-08-06 11:00         ` syzbot
2018-08-06 15:32         ` Tetsuo Handa
2018-08-06 15:42           ` syzbot
2018-08-06 16:02             ` Tetsuo Handa
2018-08-06 17:44             ` Michal Hocko
2018-08-06 17:49               ` Dmitry Vyukov
2018-08-06 17:56               ` Michal Hocko
2018-08-06 18:13                 ` Michal Hocko
2018-08-06 18:23                   ` syzbot
2018-08-06 18:55                     ` Michal Hocko
2018-08-06 19:12                       ` syzbot
2018-08-06 19:45                         ` Michal Hocko
2018-08-06 19:46                           ` Michal Hocko
2018-08-07 11:18                       ` Dmitry Vyukov
2018-08-07 11:25                         ` Michal Hocko
2018-08-06 18:39                   ` Michal Hocko
2018-08-06 20:26                 ` Tetsuo Handa
2018-08-06 20:34                   ` Michal Hocko
2018-08-06 20:46                     ` Tetsuo Handa
2018-08-06 20:55                       ` Michal Hocko
2018-08-06 21:50                         ` Tetsuo Handa
2018-08-07 10:19                           ` Tetsuo Handa
2018-08-09 13:57 ` Tetsuo Handa
2018-08-09 15:07   ` Michal Hocko
2018-08-09 21:05     ` Tetsuo Handa
2018-08-09 15:34   ` Johannes Weiner

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180806173000.GA10003@dhcp22.suse.cz \
    --to=mhocko@kernel.org \
    --cc=cgroups@vger.kernel.org \
    --cc=dtor@google.com \
    --cc=dvyukov@google.com \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=syzbot+bab151e82a4e973fa325@syzkaller.appspotmail.com \
    --cc=syzkaller-bugs@googlegroups.com \
    --cc=vdavydov.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git