From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754391AbaK0M3N (ORCPT ); Thu, 27 Nov 2014 07:29:13 -0500 Received: from cantor2.suse.de ([195.135.220.15]:35588 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751468AbaK0M3L (ORCPT ); Thu, 27 Nov 2014 07:29:11 -0500 Date: Thu, 27 Nov 2014 13:29:08 +0100 From: Michal Hocko To: Oleg Nesterov Cc: Cong Wang , "Rafael J. Wysocki" , Tejun Heo , David Rientjes , Andrew Morton , linux-kernel@vger.kernel.org Subject: Re: oom && coredump Message-ID: <20141127122908.GB18833@dhcp22.suse.cz> References: <20141017171904.GA12263@redhat.com> <20141020184657.GA505@dhcp22.suse.cz> <20141020190620.GA21882@redhat.com> <20141020195618.GA606@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20141020195618.GA606@redhat.com> User-Agent: Mutt/1.5.23 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org [Sorry this one has somehow completely fallen off my radar] On Mon 20-10-14 21:56:18, Oleg Nesterov wrote: > On 10/20, Oleg Nesterov wrote: > > > > And I agree that it > > is hardly possible to close this race, and this patch makes the things > > better. > > speaking of "partial" fixes for oom problems... > > Perhaps the patch below makes sense? Sure, it is racy, but probably > better than nothing. And in any case (imo) this SIGNAL_GROUP_COREDUMP > check doesn't look bad, the coredumping task can consume more memory, > and we can't assume it is going to actually exit soon. I am not familiar with this area much (it is too scary...). I guess the issue is that OOM killer might try to kill a task which is currently in the middle of cordumping which is not killable, right? And if it is blocked on memory allocation then we are effectively dead locked. Right? Wouldn't it be better to make coredumping killable? Is this even possible? > And at least we can kill that ugly and wrong ptrace check. Why is the ptrace check wrong? PF_EXITING should be set after ptrace_event(PTRACE_EVENT_EXIT, code). But then I can see unlikely(tsk->flags & PF_EXITING) check right after PTRACE_EVENT_EXIT notification. Is this the thing? But I do not get why we do not have to care about PTRACE_EVENT_EXIT once SIGNAL_GROUP_COREDUMP is checked anymore or how they are related? What prevents the original issue when the OOM victim is blocked by ptrace for ever? > What do you think? > > Oleg. > > --- x/mm/oom_kill.c > +++ x/mm/oom_kill.c > @@ -254,6 +254,12 @@ static enum oom_constraint constrained_alloc(struct zonelist *zonelist, > } > #endif > > +static inline bool task_will_free_mem(struct task_struct *task) > +{ > + return (task->flags & PF_EXITING) && > + !(task->signal->flags & SIGNAL_GROUP_COREDUMP); > +} > + > enum oom_scan_t oom_scan_process_thread(struct task_struct *task, > unsigned long totalpages, const nodemask_t *nodemask, > bool force_kill) > @@ -281,14 +287,9 @@ enum oom_scan_t oom_scan_process_thread(struct task_struct *task, > if (oom_task_origin(task)) > return OOM_SCAN_SELECT; > > - if (task->flags & PF_EXITING && !force_kill) { > - /* > - * If this task is not being ptraced on exit, then wait for it > - * to finish before killing some other task unnecessarily. > - */ > - if (!(task->group_leader->ptrace & PT_TRACE_EXIT)) > - return OOM_SCAN_ABORT; > - } > + if (task_will_free_mem(task) && !force_kill) > + return OOM_SCAN_ABORT; > + > return OOM_SCAN_OK; > } > > @@ -426,7 +427,7 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order, > * If the task is already exiting, don't alarm the sysadmin or kill > * its children or threads, just set TIF_MEMDIE so it can die quickly > */ > - if (p->flags & PF_EXITING) { > + if (task_will_free_mem(p)) { > set_tsk_thread_flag(p, TIF_MEMDIE); > put_task_struct(p); > return; > @@ -632,7 +633,7 @@ void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask, > * select it. The goal is to allow it to allocate so that it may > * quickly exit and free its memory. > */ > - if (fatal_signal_pending(current) || current->flags & PF_EXITING) { > + if (fatal_signal_pending(current) || task_will_free_mem(current)) { > set_thread_flag(TIF_MEMDIE); > return; > } > -- Michal Hocko SUSE Labs