Date: Tue, 17 Jul 2018 13:52:22 -0700
From: Roman Gushchin
Subject: Re: cgroup-aware OOM killer, how to move forward
Message-ID: <20180717205221.GA19862@castle.DHCP.thefacebook.com>
To: David Rientjes
Cc: Michal Hocko, linux-mm@kvack.org, akpm@linux-foundation.org, hannes@cmpxchg.org, tj@kernel.org, gthelen@google.com

On Tue, Jul 17, 2018 at 01:41:33PM -0700, David Rientjes wrote:
> On Tue, 17 Jul 2018, Roman Gushchin wrote:
>
> > > > Let me show my proposal with examples. Let's say we have the following
> > > > hierarchy, and the biggest process (or the process with the highest
> > > > oom_score_adj) is in D.
> > > >
> > > >      /
> > > >      |
> > > >      A
> > > >      |
> > > >      B
> > > >     / \
> > > >    C   D
> > > >
> > > > Let's look at different examples and the intended behavior:
> > > > 1) system-wide OOM
> > > >    - default settings: the biggest process is killed
> > > >    - D/memory.group_oom=1: all processes in D are killed
> > > >    - A/memory.group_oom=1: all processes in A are killed
> > > > 2) memcg oom in B
> > > >    - default settings: the biggest process is killed
> > > >    - A/memory.group_oom=1: the biggest process is killed
> > >
> > > Huh? Why would you even consider A here when the oom is below it?
> > > /me confused
> >
> > I do not.
> > This is exactly a counter-example: A's memory.group_oom
> > is not considered at all in this case,
> > because A is above the ooming cgroup.
>
> I think the confusion is that this says A/memory.group_oom=1 and then the
> biggest process is killed, which doesn't seem like it matches the
> description you want to give memory.group_oom.

It matches perfectly: the description says that the kernel will look for
the highest-level cgroup with group_oom set, walking up no further than
the OOM domain. Here B is the OOM domain, so A's settings are irrelevant.
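To make that rule concrete, here is a small self-contained userspace
model of the walk (illustrative only, not the actual patch: the struct,
the group_oom field, and the function name are all made up to mirror
the proposed memory.group_oom semantics):

#include <stdio.h>

/* Model only: each node stands for a memory cgroup. */
struct cgroup {
	const char *name;
	struct cgroup *parent;
	int group_oom;	/* mirrors the proposed memory.group_oom knob */
};

/*
 * Walk from the victim's cgroup up to the OOM domain (the cgroup whose
 * limit was hit, or the root for a system-wide OOM) and remember the
 * highest-level cgroup with group_oom set.  Ancestors above the OOM
 * domain are never examined.  NULL means: kill only the victim task.
 */
static struct cgroup *oom_group(struct cgroup *victim,
				struct cgroup *oom_domain)
{
	struct cgroup *group = NULL;
	struct cgroup *c;

	for (c = victim; c; c = c->parent) {
		if (c->group_oom)
			group = c;
		if (c == oom_domain)
			break;
	}
	return group;
}

int main(void)
{
	struct cgroup root = { "/", NULL, 0 };
	struct cgroup a = { "A", &root, 1 };	/* A/memory.group_oom=1 */
	struct cgroup b = { "B", &a, 0 };
	struct cgroup d = { "D", &b, 0 };	/* biggest process is here */
	struct cgroup *g;

	/* System-wide OOM: the walk reaches A, so all of A is killed. */
	g = oom_group(&d, &root);
	printf("system-wide OOM: kill %s\n", g ? g->name : "victim only");

	/* Memcg OOM in B: the walk stops at B; A is never examined. */
	g = oom_group(&d, &b);
	printf("memcg OOM in B:  kill %s\n", g ? g->name : "victim only");

	return 0;
}

This prints "kill A" for the system-wide case and "kill victim only"
for the memcg OOM in B, which is exactly the table above.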
> > > > - B/memory.group_oom=1: all processes in B are killed
> > > > - B/memory.group_oom=0 &&
> > > >   D/memory.group_oom=1: all processes in D are killed
> > >
> > > What about?
> > > - B/memory.group_oom=1 && D/memory.group_oom=0
> >
> > All tasks in B are killed.
> >
> > Group_oom set to 1 means that the workload can't tolerate the killing
> > of a random process, so in this case it's better to guarantee
> > consistency for B.
>
> This example is missing the usecase that I was referring to, i.e. killing
> all processes attached to a subtree because the limit on a common ancestor
> has been reached.
>
> In your example, I would think that the memory.group_oom setting of /A and
> /A/B are meaningless because there are no processes attached to them.
>
> IIUC, your proposal is to select the victim by whatever means, check the
> memory.group_oom setting of that victim, and then either kill the victim
> or all processes attached to that local mem cgroup depending on the
> setting.

Sorry, I don't get what you're saying.
In cgroup v2, processes can't be attached to A or B,
and there is no such thing as a "local mem cgroup" at all.

Thanks!
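P.S. The cgroup v2 "no internal processes" rule is easy to demonstrate
from userspace. The sketch below is illustrative only: it assumes a
cgroup2 hierarchy mounted at /sys/fs/cgroup, with a cgroup A that
already has a controller enabled in A/cgroup.subtree_control (i.e. A is
an internal node with children), so attaching a process to A is
rejected by the kernel.

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	const char *procs = "/sys/fs/cgroup/A/cgroup.procs";
	char buf[32];
	int fd, len;

	fd = open(procs, O_WRONLY);
	if (fd < 0) {
		perror("open");	/* needs root and the A cgroup to exist */
		return 1;
	}

	len = snprintf(buf, sizeof(buf), "%d\n", (int)getpid());

	/*
	 * Expected to fail: once controllers are enabled in A's
	 * subtree_control, processes may only live in A's leaves.
	 */
	if (write(fd, buf, len) < 0)
		printf("attach to A rejected: %s\n", strerror(errno));
	else
		printf("attach to A unexpectedly succeeded\n");

	close(fd);
	return 0;
}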