From mboxrd@z Thu Jan 1 00:00:00 1970
From: Roman Gushchin
Cc: Roman Gushchin, Michal Hocko, Vladimir Davydov, Johannes Weiner, Tejun Heo, Tetsuo Handa
Subject: [v3 5/6] mm, oom: don't mark all oom victims tasks with TIF_MEMDIE
Date: Wed, 21 Jun 2017 22:19:15 +0100
Message-ID: <1498079956-24467-6-git-send-email-guro@fb.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1498079956-24467-1-git-send-email-guro@fb.com>
References: <1498079956-24467-1-git-send-email-guro@fb.com>
MIME-Version: 1.0
Content-Type: text/plain
X-Mailing-List: linux-kernel@vger.kernel.org

We want to limit the number of tasks that have access to the memory
reserves. To ensure progress, it is enough to have one such process
at a time.

If we need to kill the whole cgroup, let's give access to the memory
reserves only to the first process in the list, which is (usually)
the biggest process. This gives us a good chance that all other
processes will be able to quit without access to the memory reserves.

Otherwise, to keep making forward progress, let's grant access to the
memory reserves to tasks that can't be reaped by the oom_reaper.
As this is done from the oom reaper thread, which handles the
oom reaper queue sequentially, there is no high risk of having too
many such processes at the same time.

To implement this, we need to stop using the TIF_MEMDIE flag as a
universal marker for oom victim tasks. That is not a big issue, as we
have the oom_mm pointer and tsk_is_oom_victim(), which are better
suited to the job.

Signed-off-by: Roman Gushchin
Cc: Michal Hocko
Cc: Vladimir Davydov
Cc: Johannes Weiner
Cc: Tejun Heo
Cc: Tetsuo Handa
Cc: kernel-team@fb.com
Cc: cgroups@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
---
 kernel/exit.c |  2 +-
 mm/oom_kill.c | 31 ++++++++++++++++++++++---------
 2 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index d211425..5b95d74 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -554,7 +554,7 @@ static void exit_mm(void)
 	task_unlock(current);
 	mm_update_next_owner(mm);
 	mmput(mm);
-	if (test_thread_flag(TIF_MEMDIE))
+	if (tsk_is_oom_victim(current))
 		exit_oom_victim();
 }
 
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 489ab69..b55bd18 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -556,8 +556,18 @@ static void oom_reap_task(struct task_struct *tsk)
 	struct mm_struct *mm = tsk->signal->oom_mm;
 
 	/* Retry the down_read_trylock(mmap_sem) a few times */
-	while (attempts++ < MAX_OOM_REAP_RETRIES && !__oom_reap_task_mm(tsk, mm))
+	while (attempts++ < MAX_OOM_REAP_RETRIES &&
+	       !__oom_reap_task_mm(tsk, mm)) {
+
+		/*
+		 * If the task has no access to the memory reserves,
+		 * grant it to help the task to exit.
+		 */
+		if (!test_tsk_thread_flag(tsk, TIF_MEMDIE))
+			set_tsk_thread_flag(tsk, TIF_MEMDIE);
+
 		schedule_timeout_idle(HZ/10);
+	}
 
 	if (attempts <= MAX_OOM_REAP_RETRIES)
 		goto done;
@@ -647,16 +657,13 @@ static inline void wake_oom_reaper(struct task_struct *tsk)
  */
 static void mark_oom_victim(struct task_struct *tsk)
 {
-	struct mm_struct *mm = tsk->mm;
-
 	WARN_ON(oom_killer_disabled);
-	/* OOM killer might race with memcg OOM */
-	if (test_and_set_tsk_thread_flag(tsk, TIF_MEMDIE))
-		return;
 
 	/* oom_mm is bound to the signal struct life time. */
-	if (!cmpxchg(&tsk->signal->oom_mm, NULL, mm))
-		mmgrab(tsk->signal->oom_mm);
+	if (cmpxchg(&tsk->signal->oom_mm, NULL, tsk->mm) != NULL)
+		return;
+
+	mmgrab(tsk->signal->oom_mm);
 
 	/*
 	 * Make sure that the task is woken up from uninterruptible sleep
@@ -665,7 +672,13 @@ static void mark_oom_victim(struct task_struct *tsk)
 	 * that TIF_MEMDIE tasks should be ignored.
 	 */
 	__thaw_task(tsk);
-	atomic_inc(&oom_victims);
+
+	/*
+	 * If there are no oom victims in flight,
+	 * give the task an access to the memory reserves.
+	 */
+	if (atomic_inc_return(&oom_victims) == 1)
+		set_tsk_thread_flag(tsk, TIF_MEMDIE);
 }
 
 /**
-- 
2.7.4
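
For readers who want to poke at the policy outside the kernel, here is a
minimal userspace sketch of the two rules described in the changelog: only
the first victim in flight gets the reserves, and the reaper grants them to
a task it fails to reap. This is not kernel code. Every name prefixed with
model_ is hypothetical, C11 atomics stand in for cmpxchg() and
atomic_inc_return(), a plain bool stands in for TIF_MEMDIE, and the sleep
between reap attempts is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the task state the patch touches. */
struct model_task {
	_Atomic(void *) oom_mm;	/* models tsk->signal->oom_mm */
	void *mm;		/* models tsk->mm */
	bool memdie;		/* models the TIF_MEMDIE thread flag */
};

/* Models the global oom_victims counter. */
static atomic_int model_oom_victims;

static void model_task_init(struct model_task *t, void *mm)
{
	atomic_init(&t->oom_mm, NULL);
	t->mm = mm;
	t->memdie = false;
}

/*
 * Models the patched mark_oom_victim(). Returns true if this call
 * marked the task, false if the task was already marked.
 */
static bool model_mark_oom_victim(struct model_task *t)
{
	void *expected = NULL;

	/* Stand-in for cmpxchg(&tsk->signal->oom_mm, NULL, tsk->mm). */
	if (!atomic_compare_exchange_strong(&t->oom_mm, &expected, t->mm))
		return false;

	/* Stand-in for atomic_inc_return(): first victim in flight only. */
	if (atomic_fetch_add(&model_oom_victims, 1) + 1 == 1)
		t->memdie = true;
	return true;
}

/* Hypothetical reap callbacks, standing in for __oom_reap_task_mm(). */
static bool model_reap_never(struct model_task *t)
{
	(void)t;
	return false;
}

static bool model_reap_at_once(struct model_task *t)
{
	(void)t;
	return true;
}

/*
 * Models the patched retry loop in oom_reap_task(). Returns true if
 * the task was reaped within max_retries attempts.
 */
static bool model_reap_task(struct model_task *t, int max_retries,
			    bool (*try_reap)(struct model_task *))
{
	int attempts = 0;

	assert(try_reap != NULL);
	while (attempts++ < max_retries && !try_reap(t)) {
		/* A failed attempt grants the reserves to help the task exit. */
		if (!t->memdie)
			t->memdie = true;
		/* The kernel sleeps here: schedule_timeout_idle(HZ/10). */
	}
	return attempts <= max_retries;
}
```

In this model, marking two fresh tasks in a row sets memdie only on the
first one, marking the same task twice is a no-op, and a task whose reap
attempts all fail ends up with memdie set. A task that is reaped on the
first attempt never gets the flag from the reaper.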