From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Fri, 25 Aug 2017 11:39:51 +0100
From: Roman Gushchin
To: Michal Hocko
CC: Vladimir Davydov, Johannes Weiner, Tetsuo Handa, David Rientjes,
 Tejun Heo
Subject: Re: [v6 2/4] mm, oom: cgroup-aware OOM killer
Message-ID: <20170825103951.GA3185@castle.dhcp.TheFacebook.com>
References: <20170823165201.24086-1-guro@fb.com>
 <20170823165201.24086-3-guro@fb.com>
 <20170824114706.GG5943@dhcp22.suse.cz>
 <20170824122846.GA15916@castle.DHCP.thefacebook.com>
 <20170824125811.GK5943@dhcp22.suse.cz>
 <20170824135842.GA21167@castle.DHCP.thefacebook.com>
 <20170824141336.GP5943@dhcp22.suse.cz>
 <20170824145801.GA23457@castle.DHCP.thefacebook.com>
 <20170825081402.GG25498@dhcp22.suse.cz>
In-Reply-To: <20170825081402.GG25498@dhcp22.suse.cz>
User-Agent: Mutt/1.8.3 (2017-05-23)
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Aug 25, 2017 at 10:14:03AM +0200, Michal Hocko wrote:
> On Thu 24-08-17 15:58:01, Roman Gushchin wrote:
> > On Thu, Aug 24, 2017 at 04:13:37PM +0200, Michal Hocko wrote:
> > > On Thu 24-08-17 14:58:42, Roman Gushchin wrote:
> [...]
> > > > Both ways are not ideal, and sum of the processes is not ideal too.
> > > > Especially, if you take oom_score_adj into account.
> > > > Will you respect it?
> > >
> > > Yes, and I do not see any reason why we shouldn't.
> >
> > It makes things even more complicated.
> > Right now task's oom_score can be in (~ -total_memory, ~ +2*total_memory) range,
> > and it you're starting summing it, it can be multiplied by number of tasks...
> > Weird.
>
> oom_score_adj is just a normalized bias so if tasks inside oom will use
> it the whole memcg will get accumulated bias from all such tasks so it
> is not completely off. I agree that the more tasks use the bias the more
> biased the whole memcg will be. This might or might not be a problem.
> As you are trying to reimplement the existing oom killer implementation
> I do not think we cannot simply ignore API which people are used to.
>
> If this was a configurable oom policy then I could see how ignoring
> oom_score_adj is acceptable because it would be an explicit opt-in.
>
> > It also will be different in case of system and memcg-wide OOM.
>
> Why, we do honor oom_score_adj for the memcg OOM now and in fact the
> kernel memcg OOM killer shouldn't be very much different from the global
> one except for the tasks scope.

Assume you have two tasks (2 GB and 1 GB) in a cgroup with a 3 GB limit.
The second task has oom_score_adj set to +100. Total memory is, say, 64 GB.

In case of a memcg-wide OOM the first task will be selected;
in case of a system-wide OOM, the second. The bias is scaled by the
memory size of the OOM scope, so the same +100 weighs much more
against 64 GB than against 3 GB.

Personally I don't like this, but it looks like we have to respect
oom_score_adj set to -1000, so I'll alter my patch.

>
> > > > I've started actually with such approach, but then found it weird.
> > > >
> > > > > Besides that you have
> > > > > to check each task for over-killing anyway. So I do not see any
> > > > > performance merits here.
> > > >
> > > > It's an implementation detail, and we can hopefully get rid of it at some point.
> > >
> > > Well, we might do some estimations and ignore oom scopes but I that
> > > sounds really complicated and error prone.
> > > Unless we have anything like
> > > that then I would start from tasks and build up the necessary to make a
> > > decision at the higher level.
> >
> > Seriously speaking, do you have an example, when summing per-process
> > oom_score will work better?
>
> The primary reason I am pushing for this is to have the common iterator
> code path (which we have since Vladimir has unified memcg and global oom
> paths) and only parametrize the value calculation and victim selection.

I agree, but I'm not sure that we can (and have to) totally unify the way
oom_score is calculated for processes and cgroups.

But I'd like to see a unified oom_priority approach. This would allow us
to define an OOM killing order in a clear way, and to use size-based
tiebreaking for items of the same priority. Root-cgroup processes would be
compared with other memory consumers by oom_priority first and by
oom_score afterwards.

What do you think about it?

Thanks!
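
To make the two-task oom_score_adj example above concrete, here is a rough sketch of the arithmetic. It assumes a simplified badness model in which the score is the task's memory usage plus oom_score_adj scaled by the memory size of the OOM scope (the cgroup limit for a memcg-wide OOM, total memory for a system-wide one), and it ignores swap and page tables. The names badness() and pick_victim() are made up for illustration and are not kernel functions.

```python
# Simplified model (an assumption, not kernel code):
#   score = usage + oom_score_adj * scope_size / 1000
# where scope_size is the memcg limit or the total system memory.

GIB = 1 << 30


def badness(usage_bytes, oom_score_adj, scope_bytes):
    """Return a simplified badness score, in bytes."""
    return usage_bytes + oom_score_adj * scope_bytes // 1000


def pick_victim(tasks, scope_bytes):
    """Return the name of the task with the highest simplified score."""
    worst = max(tasks, key=lambda t: badness(t["usage"], t["adj"], scope_bytes))
    return worst["name"]


# The two tasks from the example: 2 GB with no bias, 1 GB with +100.
tasks = [
    {"name": "first", "usage": 2 * GIB, "adj": 0},
    {"name": "second", "usage": 1 * GIB, "adj": 100},
]

memcg_victim = pick_victim(tasks, 3 * GIB)    # memcg-wide OOM, 3 GB limit
global_victim = pick_victim(tasks, 64 * GIB)  # system-wide OOM, 64 GB total

print(memcg_victim, global_victim)
```

Under these assumptions the +100 bias adds about 0.3 GB in the memcg scope (1.3 GB total, still below the first task's 2 GB) but about 6.4 GB in the system-wide scope, which is why the selected task differs between the two cases.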
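
The oom_priority idea above (priority first, size as a tiebreaker) could be sketched roughly as follows. This is only an illustration: the Consumer type, the kill_order() helper, and the convention that a larger oom_priority means "kill earlier" are all assumptions made for the example; the thread does not fix the direction of the knob or any interface.

```python
from typing import List, NamedTuple


class Consumer(NamedTuple):
    """A memory consumer: a cgroup or a root-cgroup process (hypothetical type)."""
    name: str
    oom_priority: int  # assumed: larger value means killed earlier
    oom_score: int     # size-based score, used only to break ties


def kill_order(consumers: List[Consumer]) -> List[str]:
    """Order consumers by oom_priority first, then by oom_score."""
    ranked = sorted(
        consumers,
        key=lambda c: (c.oom_priority, c.oom_score),
        reverse=True,
    )
    return [c.name for c in ranked]


consumers = [
    Consumer("cgroup-A", oom_priority=10, oom_score=500),
    Consumer("cgroup-B", oom_priority=10, oom_score=900),
    Consumer("root-task", oom_priority=20, oom_score=100),
]

order = kill_order(consumers)
print(order)
```

With this ordering a small root-cgroup process with a higher priority value is picked before larger cgroups, and the two cgroups with equal priority are ordered by size.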