From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1752045AbdJXS7G (ORCPT <rfc822;w@1wt.eu>);
        Tue, 24 Oct 2017 14:59:06 -0400
Received: from gum.cmpxchg.org ([85.214.110.215]:44948 "EHLO gum.cmpxchg.org"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1751591AbdJXS7F (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
        Tue, 24 Oct 2017 14:59:05 -0400
Date: Tue, 24 Oct 2017 14:58:54 -0400
From: Johannes Weiner <hannes@cmpxchg.org>
To: Michal Hocko <mhocko@kernel.org>
Cc: Greg Thelen <gthelen@google.com>, Shakeel Butt <shakeelb@google.com>,
        Alexander Viro <viro@zeniv.linux.org.uk>,
        Vladimir Davydov <vdavydov.dev@gmail.com>,
        Andrew Morton <akpm@linux-foundation.org>,
        Linux MM <linux-mm@kvack.org>, linux-fsdevel@vger.kernel.org,
        LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] fs, mm: account filp and names caches to kmemcg
Message-ID: <20171024185854.GA6154@cmpxchg.org>
References: <20171010141733.GB16710@cmpxchg.org>
 <20171010142434.bpiqmsbb7gttrlcb@dhcp22.suse.cz>
 <20171012190312.GA5075@cmpxchg.org>
 <20171013063555.pa7uco43mod7vrkn@dhcp22.suse.cz>
 <20171013070001.mglwdzdrqjt47clz@dhcp22.suse.cz>
 <20171013152421.yf76n7jui3z5bbn4@dhcp22.suse.cz>
 <20171024160637.GB32340@cmpxchg.org>
 <20171024162213.n6jrpz3t5pldkgxy@dhcp22.suse.cz>
 <20171024172330.GA3973@cmpxchg.org>
 <20171024175558.uxqtxwhjgu6ceadk@dhcp22.suse.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20171024175558.uxqtxwhjgu6ceadk@dhcp22.suse.cz>
User-Agent: Mutt/1.9.1 (2017-09-22)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Oct 24, 2017 at 07:55:58PM +0200, Michal Hocko wrote:
> On Tue 24-10-17 13:23:30, Johannes Weiner wrote:
> > On Tue, Oct 24, 2017 at 06:22:13PM +0200, Michal Hocko wrote:
> [...]
> > > What would prevent a runaway in case the only process in the memcg is
> > > oom unkillable then?
> > 
> > In such a scenario, the page fault handler would busy-loop right now.
> > 
> > Disabling oom kills is a privileged operation with dire consequences
> > if used incorrectly. You can panic the kernel with it. Why should the
> > cgroup OOM killer implement protective semantics around this setting?
> > Breaching the limit in such a setup is entirely acceptable.
> > 
> > Really, I think it's an enormous mistake to start modeling semantics
> > based on the most contrived and non-sensical edge case configurations.
> > Start the discussion with what is sane and what most users should
> > optimally experience, and keep the cornercases simple.
> 
> I am not really seeing your concern about the semantic. The most
> important property of the hard limit is to protect from runaways and
> stop them if they happen. Users can use the softer variant (high limit)
> if they are not afraid of those scenarios. It is not so insane to
> imagine that a master task (which I can easily imagine would be oom
> disabled) has a leak and runaway as a result.

Then you're screwed either way. Where do you return -ENOMEM in a page
fault path that cannot OOM kill anything? Your choice is between
maintaining the hard limit semantics or going into an infinite loop.

I fail to see how this setup has any impact on the semantics we pick
here. And even if it were real, it's really not what most users do.

> We are not talking only about the page fault path. There are other
> allocation paths to consume a lot of memory and spill over and break
> the isolation restriction. So it makes much more sense to me to fail
> the allocation in such a situation rather than allow the runaway to
> continue. Just consider that such a situation shouldn't happen in
> the first place because there should always be an eligible task to
> kill - who would own all the memory otherwise?

Okay, then let's just stick to the current behavior.