From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 18 Sep 2017 08:02:54 -0700
From: Roman Gushchin
To: Michal Hocko
CC: David Rientjes, linux-mm@kvack.org, Vladimir Davydov, Johannes Weiner,
 Tetsuo Handa, Andrew Morton, Tejun Heo, kernel-team@fb.com,
 cgroups@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org
Subject: Re: [v8 0/4] cgroup-aware OOM killer
Message-ID: <20170918150254.GA24257@castle.DHCP.thefacebook.com>
References: <20170913122914.5gdksbmkolum7ita@dhcp22.suse.cz>
 <20170913215607.GA19259@castle>
 <20170914134014.wqemev2kgychv7m5@dhcp22.suse.cz>
 <20170914160548.GA30441@castle>
 <20170915105826.hq5afcu2ij7hevb4@dhcp22.suse.cz>
 <20170915152301.GA29379@castle>
 <20170915210807.GA5238@castle>
 <20170918062045.kcfsboxvfmlg2wjo@dhcp22.suse.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
In-Reply-To: <20170918062045.kcfsboxvfmlg2wjo@dhcp22.suse.cz>
User-Agent: Mutt/1.8.3 (2017-05-23)
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Sep 18, 2017 at 08:20:45AM +0200, Michal Hocko wrote:
> On Fri 15-09-17 14:08:07, Roman Gushchin wrote:
> > On Fri, Sep 15, 2017 at 12:55:55PM -0700, David Rientjes wrote:
> > > On Fri, 15 Sep 2017, Roman Gushchin wrote:
> > > >
> > > > > But then you just enforce a structural restriction on your configuration
> > > > > because
> > > > >        root
> > > > >        /  \
> > > > >       A    D
> > > > >      /\
> > > > >     B  C
> > > > >
> > > > > is a different thing than
> > > > >        root
> > > > >        / | \
> > > > >       B  C  D
> > > > >
> > > >
> > > > I actually don't have a strong argument against an approach to select
> > > > largest leaf or kill-all-set memcg. I think, in practice there will be
> > > > no much difference.
> > > >
> > > > The only real concern I have is that then we have to do the same with
> > > > oom_priorities (select largest priority tree-wide), and this will limit
> > > > an ability to enforce the priority by parent cgroup.
> > > >
> > >
> > > Yes, oom_priority cannot select the largest priority tree-wide for exactly
> > > that reason. We need the ability to control from which subtree the kill
> > > occurs in ancestor cgroups. If multiple jobs are allocated their own
> > > cgroups and they can own memory.oom_priority for their own subcontainers,
> > > this becomes quite powerful so they can define their own oom priorities.
> > > Otherwise, they can easily override the oom priorities of other cgroups.
> >
> > I believe, it's a solvable problem: we can require CAP_SYS_RESOURCE to set
> > the oom_priority below parent's value, or something like this.
>
> As said in other email. We can make priorities hierarchical (in the same
> sense as hard limit or others) so that children cannot override their
> parent.

You mean they can set the knob to any value, but the parent's value is
enforced if it's greater than the child's value?

If so, this sounds logical to me. Then we have size-based comparison and
priority-based comparison with similar rules, and all use cases are covered.

Ok, can we stick with this design?
Then I'll put oom_priorities back in place and post a (hopefully) final version.

Thanks!
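
For illustration only, a minimal sketch of the hierarchical rule as worded in
the reply above: a child may write any value to the knob, but an ancestor's
value is enforced whenever it is greater. This is not code from the patch set.
The Cgroup class, the effective_priority() helper and all priority values are
hypothetical, and the sketch deliberately says nothing about how the resulting
value would be used to pick a victim.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cgroup:
    """Hypothetical stand-in for a memory cgroup; not a kernel structure."""
    name: str
    oom_priority: int = 0              # raw value written to the knob
    parent: Optional["Cgroup"] = None


def effective_priority(cg: Cgroup) -> int:
    """Return the raw priority, raised to the largest value set by any ancestor."""
    prio = cg.oom_priority
    node = cg.parent
    while node is not None:
        prio = max(prio, node.oom_priority)
        node = node.parent
    return prio


# First layout from the quoted diagram: root -> {A, D}, A -> {B, C}.
# The numbers are made up for the example.
root = Cgroup("root")
a = Cgroup("A", oom_priority=10, parent=root)
d = Cgroup("D", oom_priority=5, parent=root)
b = Cgroup("B", oom_priority=0, parent=a)
c = Cgroup("C", oom_priority=50, parent=a)

for cg in (a, b, c, d):
    print(cg.name, cg.oom_priority, effective_priority(cg))
```

With these made-up values, B's raw 0 is raised to A's 10, while C keeps its own
50 and D keeps its own 5, so a child can raise its value but cannot go below
what its parent has set.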