Date: Tue, 9 Oct 2018 16:09:25 +0200
From: Michal Hocko
To: Tetsuo Handa
Cc: ytk.lee@samsung.com, "linux-mm@kvack.org", "linux-kernel@vger.kernel.org", Oleg Nesterov, David Rientjes, Vladimir Davydov, Andrew Morton, Linus Torvalds
Subject: Re: [PATCH] mm, oom_adj: avoid meaningless loop to find processes sharing mm
Message-ID: <20181009140925.GS8528@dhcp22.suse.cz>
In-Reply-To: <0ab96b81-042e-b9d9-8d63-b423941d8072@i-love.sakura.ne.jp>

On Tue 09-10-18 22:51:00, Tetsuo Handa wrote:
> On 2018/10/09 22:26, Michal Hocko wrote:
> > On Tue 09-10-18 22:14:24, Tetsuo Handa wrote:
> >> On 2018/10/09 21:58, Michal Hocko wrote:
> >>> On Tue 09-10-18 21:52:12, Tetsuo Handa wrote:
> >>>> On 2018/10/09 20:10, Michal Hocko wrote:
> >>>>> On Tue 09-10-18 19:00:44, Tetsuo Handa wrote:
> >>>>>>> 2) add OOM_SCORE_ADJ_MIN and do not kill tasks sharing mm and do not
> >>>>>>> reap the mm in the rare case of the race.
> >>>>>>
> >>>>>> That is no problem. The mistake we made in 4.6 was that we updated oom_score_adj
> >>>>>> to -1000 (and allowed unprivileged users to OOM-lockup the system).
> >>>>>
> >>>>> I do not follow.
> >>>>>
> >>>>
> >>>> http://tomoyo.osdn.jp/cgi-bin/lxr/source/mm/oom_kill.c?v=linux-4.6.7#L493
> >>>
> >>> Ahh, so you are not referring to the current upstream code. Do you see
> >>> any specific problem with the current one (well, except for the possible
> >>> race which I have tried to evaluate).
> >>>
> >>
> >> Yes. "task_will_free_mem(current) in out_of_memory() returns false due to MMF_OOM_SKIP
> >> being already set" is a problem for clone(CLONE_VM without CLONE_THREAD/CLONE_SIGHAND)
> >> with the current code.
> >
> > a) I fail to see how that is related to your previous post and b) could
> > you be more specific. Is there any other scenario from the two described
> > in my earlier email?
> >
>
> I do not follow. Just reverting commit 44a70adec910d692 and commit 97fd49c2355ffded
> is sufficient for closing the copy_process() versus __set_oom_adj() race.

Please go back and see why this has been done in the first place.

> We went too far towards complete "struct mm_struct" based OOM handling. But stepping
> back to "struct signal_struct" based OOM handling solves Yong-Taek's for_each_process()
> latency problem and your copy_process() versus __set_oom_adj() race problem and my
> task_will_free_mem(current) race problem.

And again, I have put together an evaluation of the race and tried to see
what its effect is. Then you started firing off hard-to-follow notes, and it
is not clear whether the analysis/conclusions are wrong/incomplete. So can we
get back to that analysis and stick to the topic, please?
-- 
Michal Hocko
SUSE Labs
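
For readers less familiar with the clone(CLONE_VM without CLONE_THREAD/CLONE_SIGHAND) case referred to in the quoted text, the following is a minimal userspace sketch, assuming Linux with glibc. It only shows that such a child is a separate process (its own PID and thread group) that nevertheless shares the parent's address space; it does not reproduce the race or touch oom_score_adj. The names shared_mm_demo, child_fn, shared_value and child_tgid are hypothetical and appear in neither the thread nor the kernel.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* needed for clone() and CLONE_VM */
#endif
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHILD_STACK_SIZE (256 * 1024)

/* Both variables live in the address space that parent and child share. */
static volatile int shared_value;
static volatile pid_t child_tgid;

/* Runs in the child: writes into memory that the parent can also see. */
static int child_fn(void *arg)
{
    (void)arg;
    shared_value = 42;
    /* Raw syscall so the result never comes from a cached PID. */
    child_tgid = (pid_t)syscall(SYS_getpid);
    return 0;
}

/*
 * Hypothetical helper: creates a child with CLONE_VM but without
 * CLONE_THREAD/CLONE_SIGHAND, waits for it, and reports what the parent
 * observed. Returns 0 on success, -1 on failure.
 */
int shared_mm_demo(int *value_seen, int *distinct_process)
{
    assert(value_seen != NULL && distinct_process != NULL);

    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -1;

    shared_value = 0;
    child_tgid = 0;

    /* The stack grows down on common architectures, so pass its top. */
    pid_t pid = clone(child_fn, stack + CHILD_STACK_SIZE,
                      CLONE_VM | SIGCHLD, NULL);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    int status = 0;
    pid_t waited = waitpid(pid, &status, 0);
    free(stack);
    if (waited != pid || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    /* The child's store is visible here because the mm is shared. */
    *value_seen = shared_value;
    /* The child had its own thread group ID, distinct from the parent's. */
    *distinct_process = (child_tgid == pid && pid != getpid());
    return 0;
}
```

Calling shared_mm_demo() from a small main() and printing the two outputs is enough to try it. Because the child is its own thread group, the kernel gives it its own "struct signal_struct" while the "struct mm_struct" is shared, which is the combination the signal_struct versus mm_struct argument above is about.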