From: Dmitry Vyukov
Date: Mon, 6 Aug 2018 19:53:39 +0200
Subject: Re: WARNING in try_charge
To: Michal Hocko
Cc: syzbot, cgroups@vger.kernel.org, Johannes Weiner, LKML, Linux-MM, syzkaller-bugs, Vladimir Davydov, Dmitry Torokhov
In-Reply-To: <20180806173000.GA10003@dhcp22.suse.cz>
References: <0000000000005e979605729c1564@google.com> <20180806091552.GE19540@dhcp22.suse.cz> <20180806094827.GH19540@dhcp22.suse.cz> <20180806110224.GI19540@dhcp22.suse.cz> <20180806142124.GP19540@dhcp22.suse.cz> <20180806173000.GA10003@dhcp22.suse.cz>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Aug 6, 2018 at 7:30 PM, Michal Hocko wrote:
>> >> >> A much friendlier for user
>> >> >> way to say this would be print a message at the
>> >> >> point of misconfiguration saying what exactly is wrong, e.g. "pid $PID
>> >> >> misconfigures cgroup /cgroup/path with mem.limit=0" without a stack
>> >> >> trace (does not give any useful info for user). And return EINVAL if
>> >> >> it can't fly at all? And then leave the "or a kernel bug" part for the
>> >> >> WARNING each occurrence of which we do want to be reported to kernel
>> >> >> developers.
>> >> >
>> >> > But this is not applicable here. Your misconfiguration is quite obvious
>> >> > because you simply set the hard limit to 0. This is not the only
>> >> > situation when this can happen. There is no clear point to tell, you are
>> >> > doing this wrong. If it was we would do it at that point obviously.
>> >> >>
>> >> >> But, isn't there a point where the hard limit is set to 0? I would expect
>> >> >> there is something like a cgroup file write handler with a value of 0
>> >> >> or something.
>> >
>> > Yeah, but this is only one instance of the problem. Another is that the
>> > memcg is not reclaimable for any other reasons. And we do not know what
>> > those might be.
>> >
>> >>
>> >> > If you have a strong reason to believe that this is an abuse of WARN I
>> >> > am all happy to change that. But I haven't heard any yet, to be honest.
>> >>
>> >> WARN must not be used for anything that is not kernel bugs. If this is
>> >> not kernel bug, WARN must not be used here.
>> >
>> > This is rather strong wording without any backing arguments. I strongly
>> > doubt 90% of existing WARN* match this expectation. WARN* has
>> > traditionally been a way to tell that something suspicious is going on.
>> > Those situations are most likely not fatal but it is good to know they
>> > are happening.
>> >
>> > Sure there is that panic_on_warn thingy which you seem to be using and I
>> > suspect it is a reason why you are so careful about warnings in general
>> > but my experience tells me that this configuration is barely usable
>> > except for testing (which is your case).
>> >
>> > But as I've said, I do not insist on WARN here. All I care about is to
>> > warn user that something might go south and this may be either due to
>> > misconfiguration or a subtly wrong memcg reclaim/OOM handler behavior.
>>
>> I am a bit lost. Can limit=0 legally lead to the warnings? Or there is
>> also a kernel bug on top of that and it's actually a kernel bug that
>> provokes the warning?
>
> As I've tried to tell already. I cannot tell for sure. It is the killed
> oom victim which triggered the warning and that shouldn't really
> happen. Considering this doesn't reproduce with the current linux next
> nor linus tree and the oom code has changed since the version you have
> tested then I would suspect there was something wrong with the memcg oom
> code. But maybe the test doesn't really reproduce reliably.
>
>> If it's a kernel bug, then I propose to stop arguing about
>> configuration and concentrate on the bug.
>> If it's just the misconfiguration that triggers the warning, then can
>> we separate the 2 causes of the warning (user misconfiguration and
>> kernel bugs)? Say, return EINVAL when mem limit is set to 0 (and print
>> a line to console if necessary)? Or if the limit=0 is somehow not
>> possible/desirable to detect right away, check limit=0 at the point of
>> the warning and don't warn?
>
> No we simply cannot. There are numerous situations when this can trigger.
> Say you set the hard limit to N and then try to fault in shmem file with
> the size >= N. No oom killer will help to reclaim memory. Or say you
> migrate all the tasks away from the memcg and then somebody triggers the
> memcg OOM in that group. There is simply nobody to kill. See the point?
> There is simply no direct connection between the configuration and
> actual problem. Too many things might happen between those two points.
> Let me repeat. We do warn because we want to hear if this happens. WARN
> tends to be a good way to get that attention. If you strongly believe
> this is an abuse I won't mind seeing a patch to turn it into something
> different.

I don't believe it is an abuse, I don't know this code well.
Let's assume the misconfiguration is a red-herring for now then.
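
For reference, the two options discussed above could be sketched in
user-space C roughly as follows. This is only an illustration under
stated assumptions: struct fake_memcg, fake_set_limit() and
fake_classify_oom_failure() are hypothetical names made up here, and
none of this is the actual memcg code or any kernel API. The first
function rejects limit=0 at configuration time with a one-line message
and EINVAL; the second checks limit=0 at the point where the warning
would fire.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for a memory cgroup; NOT the kernel's struct mem_cgroup. */
struct fake_memcg {
	const char *path;          /* illustrative cgroup path */
	unsigned long limit_pages; /* hard limit, in pages */
	unsigned int nr_tasks;     /* tasks currently attached (not inspected below) */
};

enum report_kind {
	REPORT_MISCONFIG, /* one-line message, no stack trace */
	REPORT_WARN,      /* loud warning meant to reach kernel developers */
};

/*
 * Option 1 (hypothetical): refuse limit=0 when the limit is written,
 * naming the offending pid and cgroup instead of dumping a stack trace.
 */
static int fake_set_limit(struct fake_memcg *cg, int pid, unsigned long limit_pages)
{
	assert(cg != NULL);
	if (limit_pages == 0) {
		fprintf(stderr, "pid %d misconfigures cgroup %s with mem.limit=0\n",
			pid, cg->path);
		return -EINVAL;
	}
	cg->limit_pages = limit_pages;
	return 0;
}

/*
 * Option 2 (hypothetical): decide how to report at the point where a
 * charge failed and the OOM handling made no progress.
 */
static enum report_kind fake_classify_oom_failure(const struct fake_memcg *cg)
{
	assert(cg != NULL);
	if (cg->limit_pages == 0)
		return REPORT_MISCONFIG;
	/*
	 * Everything else still lands here, including the cases from the
	 * quoted reply: a shmem file of size >= the limit, or a memcg with
	 * no tasks left to kill. A limit=0 check alone cannot tell those
	 * apart from a real kernel bug.
	 */
	return REPORT_WARN;
}
```

Note that a memcg with a non-zero limit and nr_tasks == 0 is still
classified as REPORT_WARN in this sketch, which is the objection raised
in the quoted text: the limit=0 check covers only one of the ways to
get here.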