From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
    id S1751657AbdJBT2V (ORCPT ); Mon, 2 Oct 2017 15:28:21 -0400
Received: from mx2.suse.de ([195.135.220.15]:44592 "EHLO mx1.suse.de"
    rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP
    id S1751104AbdJBT2T (ORCPT ); Mon, 2 Oct 2017 15:28:19 -0400
Date: Mon, 2 Oct 2017 21:28:14 +0200
From: Michal Hocko
To: Shakeel Butt
Cc: Tim Hockin, Roman Gushchin, Johannes Weiner, Tejun Heo,
    kernel-team@fb.com, David Rientjes, Linux MM, Vladimir Davydov,
    Tetsuo Handa, Andrew Morton, Cgroups, linux-doc@vger.kernel.org,
    "linux-kernel@vger.kernel.org"
Subject: Re: [v8 0/4] cgroup-aware OOM killer
Message-ID: <20171002192814.sad75tqklp3nmr4m@dhcp22.suse.cz>
References: <20170926133040.uupv3ibkt3jtbotf@dhcp22.suse.cz>
    <20170926172610.GA26694@cmpxchg.org>
    <20170927074319.o3k26kja43rfqmvb@dhcp22.suse.cz>
    <20170927162300.GA5623@castle.DHCP.thefacebook.com>
    <20171002122434.llbaarb6yw3o3mx3@dhcp22.suse.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: NeoMutt/20170609 (1.8.3)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon 02-10-17 12:00:43, Shakeel Butt wrote:
> > Yes and nobody is disputing that, really. I guess the main disconnect
> > here is that different people want to have more detailed control over
> > the victim selection while the patchset tries to handle the most
> > simplistic scenario when a no userspace control over the selection is
> > required. And I would claim that this will be a last majority of setups
> > and we should address it first.
>
> IMHO the disconnect/disagreement is which memcgs should be compared
> with each other for oom victim selection. Let's forget about oom
> priority and just take size into the account.
> Should the oom selection algorithm, compare the leaves of the
> hierarchy or should it compare siblings? For the single user system,
> comparing leaves makes sense while in a multi user system, siblings
> should be compared for victim selection.

This is simply not true. This is not about single vs. multi user
systems. This is about how the memcg hierarchy is organized (please have
a look at the example I've provided previously). I would dare to claim
that comparing siblings is a weaker semantic just because it puts
stronger constraints on how the hierarchy is organized. Especially when
cgroup v2 is single hierarchy based (so we cannot create intermediate
cgroup nodes for other controllers because we would automatically get a
cumulative memory consumption).

I am sorry to cut the rest of your proposal because it simply goes
beyond the scope of the proposed solution while the usecase you are
mentioning is still possible. If we want to compare intermediate nodes
(which seems to be the case) then we can always provide a knob to opt-in
- be it your oom_gang or others.

I am sorry but I would really appreciate it if we could focus on getting
step 1 done before diverging into details about potential improvements
and a better control over the selection. This whole thing is an opt-in
so there is no risk of a regression.
-- 
Michal Hocko
SUSE Labs
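
The leaves-versus-siblings question in this thread can be made concrete with a small sketch. It is not the patchset's code and not the example referred to in the message. The `Memcg` class, the group names, and the usage numbers are all hypothetical, and size is the only criterion (no oom priority). It only shows that the two strategies can pick different victims for the same usage figures, depending on how the hierarchy is laid out.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Memcg:
    """Hypothetical stand-in for a memory cgroup; not a kernel structure."""
    name: str
    own_usage: int = 0  # memory charged directly to this group (arbitrary units)
    children: List["Memcg"] = field(default_factory=list)

    def usage(self) -> int:
        # Hierarchical usage: own charge plus everything charged below.
        return self.own_usage + sum(c.usage() for c in self.children)


def leaves(cg: Memcg) -> Iterator[Memcg]:
    """Yield every leaf group under cg (cg itself if it has no children)."""
    if not cg.children:
        yield cg
    else:
        for child in cg.children:
            yield from leaves(child)


def pick_by_leaves(root: Memcg) -> Memcg:
    """Compare all leaves against each other and take the largest."""
    return max(leaves(root), key=lambda cg: cg.usage())


def pick_by_siblings(root: Memcg) -> Memcg:
    """At each level descend into the largest sibling until a leaf is reached."""
    cg = root
    while cg.children:
        cg = max(cg.children, key=lambda c: c.usage())
    return cg


# Hypothetical layout: A is an intermediate node grouping three small
# workloads, B is a single larger workload directly under the root.
root = Memcg("root", children=[
    Memcg("A", children=[Memcg("A1", 4), Memcg("A2", 3), Memcg("A3", 2)]),
    Memcg("B", 5),
])

print(pick_by_leaves(root).name)    # B  (largest single leaf)
print(pick_by_siblings(root).name)  # A1 (A's cumulative 9 beats B's 5)
```

With sibling comparison, merely introducing the intermediate node A changes the victim, because A accumulates the usage of everything beneath it.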