From: Sameer Nanda
Date: Fri, 6 Dec 2013 09:54:56 -0800
Subject: Re: [PATCH] Fix race between oom kill and task exit
To: Oleg Nesterov
Cc: David Rientjes, Andrew Morton, Michal Hocko, William Dauchy,
    Johannes Weiner, "Ma, Xindong", "rusty@rustcorp.com.au",
    linux-mm@kvack.org, linux-kernel@vger.kernel.org, Peter Zijlstra,
    Greg KH, "Tu, Xiaobing", azurIt
In-Reply-To: <20131206151944.GC2674@redhat.com>
References: <3917C05D9F83184EAA45CE249FF1B1DD0253093A@SHSMSX103.ccr.corp.intel.com>
    <20131128063505.GN3556@cmpxchg.org>
    <20131128120018.GL2761@dhcp22.suse.cz>
    <20131128183830.GD20740@redhat.com>
    <20131202141203.GA31402@redhat.com>
    <20131205172931.GA26018@redhat.com>
    <20131206151944.GC2674@redhat.com>

On Fri, Dec 6, 2013 at 7:19 AM, Oleg Nesterov wrote:
> On 12/05, David Rientjes wrote:
>>
>> On Thu, 5 Dec 2013, Oleg Nesterov wrote:
>>
>> > > Your v2 series looks good and I suspect anybody trying them doesn't have
>> > > additional reports of the infinite loop? Should they be marked for
>> > > stable?
>> >
>> > Unlikely...
>> >
>> > I think the patch from Sameer makes more sense for stable as a temporary
>> > (and obviously incomplete) fix.
>>
>> There's a problem because none of this is currently even in linux-next. I
>> think we could make a case for getting Sameer's patch at
>> http://marc.info/?l=linux-kernel&m=138436313021133 to be merged for
>> stable,
>
> Probably.
>
> Ah, I just noticed that this change
>
> -	if (p->flags & PF_EXITING) {
> +	if (p->flags & PF_EXITING || !pid_alive(p)) {
>
> is not needed. !pid_alive(p) obviously implies PF_EXITING.

Ah right.

>
>> but then we'd have to revert it in linux-next
>
> Or perhaps Sameer can just send his fix to stable/gregkh.
>
> Just the changelog should clearly explain that this is the minimal
> workaround for stable. Once again it doesn't (and can't) fix all
> problems even in oom_kill_process() paths, but it helps anyway to
> avoid the easy-to-trigger hang.

I don't mind doing that if that seems to be the consensus. FWIW, I've
already added my patch to the Chrome OS kernel repo.

>
>> before merging your
>> series at http://marc.info/?l=linux-kernel&m=138616217925981.
>
> Just in case, I won't mind to rediff my patches on top of Sameer's
> patch and then add git-revert patch.
>
>> All of the
>> issues you present in that series seem to be stable material, so why not
>> just go ahead with your series and mark it for stable for 3.13?
>
> OK... I can do this too.
>
> I do not really like this because it adds thread_head/node but doesn't
> remove the old ->thread_group. We will do this later, but obviously
> this is not the stable material.
>
> IOW, if we send this to stable, thread_head/node/for_each_thread will
> be only used by oom_kill.c.
>
> And this is risky. For example, 1/4 depends on (at least) another patch
> I sent in preparation for this change, commit 81907739851
> "kernel/fork.c:copy_process(): don't add the uninitialized
> child to thread/task/pid lists", perhaps on something else.
>
> So personally I'd prefer to simply send the workaround for stable.
>
> Oleg.
>

-- 
Sameer