From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1757211AbdIHVJZ (ORCPT <rfc822;w@1wt.eu>);
        Fri, 8 Sep 2017 17:09:25 -0400
Received: from resqmta-po-05v.sys.comcast.net ([96.114.154.164]:56398 "EHLO
        resqmta-po-05v.sys.comcast.net" rhost-flags-OK-OK-OK-OK)
        by vger.kernel.org with ESMTP id S1757192AbdIHVJX (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Fri, 8 Sep 2017 17:09:23 -0400
Date: Fri, 8 Sep 2017 16:07:21 -0500 (CDT)
From: Christopher Lameter <cl@linux.com>
X-X-Sender: cl@nuc-kabylake
To: David Rientjes <rientjes@google.com>
cc: Michal Hocko <mhocko@kernel.org>, Johannes Weiner <hannes@cmpxchg.org>,
        Roman Gushchin <guro@fb.com>, linux-mm@kvack.org,
        Vladimir Davydov <vdavydov.dev@gmail.com>,
        Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
        Andrew Morton <akpm@linux-foundation.org>, Tejun Heo <tj@kernel.org>,
        kernel-team@fb.com, cgroups@vger.kernel.org, linux-doc@vger.kernel.org,
        linux-kernel@vger.kernel.org
Subject: Re: [v7 5/5] mm, oom: cgroup v2 mount option to disable cgroup-aware
 OOM killer
In-Reply-To: <alpine.DEB.2.10.1709071502430.143767@chino.kir.corp.google.com>
Message-ID: <alpine.DEB.2.20.1709081601310.27965@nuc-kabylake>
References: <20170904142108.7165-1-guro@fb.com> <20170904142108.7165-6-guro@fb.com> <20170905134412.qdvqcfhvbdzmarna@dhcp22.suse.cz> <20170905215344.GA27427@cmpxchg.org> <20170906082859.qlqenftxuib64j35@dhcp22.suse.cz> <alpine.DEB.2.20.1709071122360.20082@nuc-kabylake>
 <alpine.DEB.2.10.1709071502430.143767@chino.kir.corp.google.com>
User-Agent: Alpine 2.20 (DEB 67 2015-01-07)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
X-CMAE-Envelope: MS4wfLzGWi2r+9E7lghwWQxFMfk7dPlH5dyg6qVcVoIbiIdegIOk/0bRM8e6+aj/+9/G3Yg7preVJXAudxkO8LGXbA84/B1ocbUcTUxHsUb5wm25fRiLTSFc
 08iFifod8bpUBZNb3xxxzKSyFJVvq79SeJd3Unb3MDE6e18Jp5CPmVtUViLH9T4YLPfCNPEJVuPkQss/keeC9lvHBwhp+fIbNAdJVrp8FT+ItfdOVH1b5fXN
 lX/ahMDSl7ZV3W0I5QqTW8Yzh1OyY0Ljty36dlGLb0XTkwFIBNjlHh2w2C8r3823aczPrKFPF8X1+Dm76S/iqDINXmyn+o3nldj8uUs27gk1BZCSzffu5ixk
 TB0kgiBcP3Ut8GAEt/ux5zwwSxXi3qjZaHZIKFuIHAjZvJGooKlZW/xPo++/O59ZqMtK3VSgZ5Sp03TZGGJaJ0ypxPvsSXoP5joTke1zTsqGyNF1GYfD2bCy
 QK94zCjNmKpJxEsfpZictr0h1n6BbgtNczw5jnV0J2CIGhC0Z+IgTTPNkOaRofDugKbd9ajJk5BbmnhG
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, 7 Sep 2017, David Rientjes wrote:

> > It has *nothing* to do with zillions of tasks. Its amusing that the SGI
> > ghost is still haunting the discussion here. The company died a couple of
> > years ago finally (ok somehow HP has an "SGI" brand now I believe). But
> > there are multiple companies that have large NUMA configurations and they
> > all have configurations where they want to restrict allocations of a
> > process to subset of system memory. This is even more important now that
> > we get new forms of memory (NVDIMM, PCI-E device memory etc). You need to
> > figure out what to do with allocations that fail because the *allowed*
> > memory pools are empty.
> >
>
> We already had CONSTRAINT_CPUSET at the time, this was requested by Paul
> and acked by him in https://marc.info/?l=linux-mm&m=118306851418425.

Ok. Certainly there were scalability issues (lots of them) and the sysctl
may have helped there if set globally. But the ability to kill the
allocating task was primarily used in cpusets for constrained allocations.

The issue of scaling is irrelevant in the context of deciding what to do
about the sysctl. You can address the issue differently if it still
exists. Systems with very high NUMA node counts (hundreds to a
thousand nodes) have fallen out of fashion somewhat, so I doubt that this
is still an issue. And none of the old stakeholders are speaking up.

What is the current approach for an OOM occurring in a cpuset or cgroup
with a restricted NUMA node set?