Date: Wed, 30 Aug 2017 12:22:40 +0100
From: Roman Gushchin
To: Michal Hocko
CC: Vladimir Davydov, Johannes Weiner, Tetsuo Handa, David Rientjes, Tejun Heo
Subject: Re: [v6 2/4] mm, oom: cgroup-aware OOM killer
Message-ID: <20170830112240.GA4751@castle.dhcp.TheFacebook.com>
In-Reply-To: <20170825081402.GG25498@dhcp22.suse.cz>
References: <20170823165201.24086-1-guro@fb.com>
 <20170823165201.24086-3-guro@fb.com>
 <20170824114706.GG5943@dhcp22.suse.cz>
 <20170824122846.GA15916@castle.DHCP.thefacebook.com>
 <20170824125811.GK5943@dhcp22.suse.cz>
 <20170824135842.GA21167@castle.DHCP.thefacebook.com>
 <20170824141336.GP5943@dhcp22.suse.cz>
 <20170824145801.GA23457@castle.DHCP.thefacebook.com>
 <20170825081402.GG25498@dhcp22.suse.cz>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Aug 25, 2017 at 10:14:03AM +0200, Michal Hocko wrote:
> On Thu 24-08-17 15:58:01, Roman Gushchin wrote:
> > On Thu, Aug 24, 2017 at 04:13:37PM +0200, Michal Hocko wrote:
> > > On Thu 24-08-17 14:58:42, Roman Gushchin wrote:
> [...]
> > > > Both ways are not ideal, and sum of the processes is not ideal too.
> > > > Especially, if you take oom_score_adj into account. Will you respect it?
> > >
> > > Yes, and I do not see any reason why we shouldn't.
> >
> > It makes things even more complicated.
> > Right now task's oom_score can be in (~ -total_memory, ~ +2*total_memory) range,
> > and it you're starting summing it, it can be multiplied by number of tasks...
> > Weird.
>
> oom_score_adj is just a normalized bias so if tasks inside oom will use
> it the whole memcg will get accumulated bias from all such tasks so it
> is not completely off. I agree that the more tasks use the bias the more
> biased the whole memcg will be. This might or might not be a problem.
> As you are trying to reimplement the existing oom killer implementation
> I do not think we cannot simply ignore API which people are used to.
>
> If this was a configurable oom policy then I could see how ignoring
> oom_score_adj is acceptable because it would be an explicit opt-in.
>
> > It also will be different in case of system and memcg-wide OOM.
>
> Why, we do honor oom_score_adj for the memcg OOM now and in fact the
> kernel memcg OOM killer shouldn't be very much different from the global
> one except for the tasks scope.
>
> > > > I've started actually with such approach, but then found it weird.
> > > >
> > > > > Besides that you have
> > > > > to check each task for over-killing anyway. So I do not see any
> > > > > performance merits here.
> > > >
> > > > It's an implementation detail, and we can hopefully get rid of it at some point.
> > >
> > > Well, we might do some estimations and ignore oom scopes but I that
> > > sounds really complicated and error prone. Unless we have anything like
> > > that then I would start from tasks and build up the necessary to make a
> > > decision at the higher level.
> >
> > Seriously speaking, do you have an example, when summing per-process
> > oom_score will work better?
>
> The primary reason I am pushing for this is to have the common iterator
> code path (which we have since Vladimir has unified memcg and global oom
> paths) and only parametrize the value calculation and victim selection.
>
> > Especially, if we're talking about customizing oom_score calculation,
> > it makes no sence to me. How you will sum process timestamps?
>
> Well, I meant you could sum oom_badness for your particular
> implementation. If we need some other policy then this wouldn't work and
> that's why I've said that I would like to preserve the current common
> code and only parametrize value calculation and victim selection...

I've spent some time implementing such a version.

It really did become shorter, and more existing code was reused;
however, I ran into a couple of serious issues:

1) Simple summing of per-task oom_score doesn't make sense.
   First, we calculate oom_score per task, while we should sum per-process
   values or, better, per-mm-struct values. We could take only the
   thread-group leader's score into account, but that is also not 100%
   accurate. And, again, there is the question of what to do with per-task
   oom_score_adj if we don't take the task's oom_score into account.

   Using memcg stats still looks to me like a more accurate and consistent
   way of estimating a memcg's memory footprint.

2) If we treat tasks from non-kill-all cgroups as separate OOM entities
   and compare them with memcgs that have the kill-all flag set, we
   definitely need a per-task oom_priority to provide a clear way to
   compare entities. Otherwise we need a per-memcg size-based
   oom_score_adj, which is not the best idea, as we agreed earlier.

Thanks!

Roman
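
A toy illustration of point 1, in plain Python rather than kernel code. It models tasks that share an address space and compares three ways of summing their footprint: per task, per thread-group leader, and per mm. The `Task` type, its fields, and the page counts are all hypothetical. The case of two processes sharing one mm (for example via CLONE_VM without CLONE_THREAD) is an assumption about one way the leader-only sum can go wrong; the message above does not spell out the reason.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical stand-in for a task; not a kernel structure."""
    tid: int       # thread id
    tgid: int      # thread-group (process) id; the leader has tid == tgid
    mm_id: int     # identifier of the address space the task uses
    mm_pages: int  # pages attributed to that address space


def sum_per_task(tasks):
    """Naive sum: every task contributes its mm's full footprint."""
    return sum(t.mm_pages for t in tasks)


def sum_per_leader(tasks):
    """Count only thread-group leaders."""
    return sum(t.mm_pages for t in tasks if t.tid == t.tgid)


def sum_per_mm(tasks):
    """Count each distinct address space exactly once."""
    pages_by_mm = {t.mm_id: t.mm_pages for t in tasks}
    return sum(pages_by_mm.values())


tasks = [
    # One process with four threads sharing mm 1 (1000 pages).
    Task(10, 10, 1, 1000),
    Task(11, 10, 1, 1000),
    Task(12, 10, 1, 1000),
    Task(13, 10, 1, 1000),
    # A separate process that also uses mm 1 (assumed scenario).
    Task(20, 20, 1, 1000),
    # An unrelated process with its own mm 2 (500 pages).
    Task(30, 30, 2, 500),
]

print("per task:  ", sum_per_task(tasks))    # 5500
print("per leader:", sum_per_leader(tasks))  # 2500
print("per mm:    ", sum_per_mm(tasks))      # 1500
```

In this model the per-task sum overcounts the shared mm five times, the leader-only sum still counts it twice, and only the per-mm sum matches the 1500 pages actually in use.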