Date: Thu, 14 Apr 2016 13:31:08 +0200
From: Michal Hocko
Subject: Re: [PATCH] mm,oom: Clarify reason to kill other threads sharing the victim's memory.
Message-ID: <20160414113108.GE2850@dhcp22.suse.cz>
References: <1460631391-8628-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp>
 <1460631391-8628-2-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp>
In-Reply-To: <1460631391-8628-2-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp>
To: Tetsuo Handa
Cc: Andrew Morton, Oleg Nesterov, David Rientjes, linux-mm@kvack.org

On Thu 14-04-16 19:56:31, Tetsuo Handa wrote:
> The current comment for "Kill all user processes sharing victim->mm in
> other thread groups" does not make it clear that doing so is a
> best-effort avoidance.
>
> I have tried to update that logic along with TIF_MEMDIE several times,
> but those patches have not been accepted yet. Therefore, this patch
> changes only the comment so that we can apply it now.
>
> Signed-off-by: Tetsuo Handa
> ---
>  mm/oom_kill.c | 29 ++++++++++++++++++++++-------
>  1 file changed, 22 insertions(+), 7 deletions(-)
>
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index e78818d..43d0002 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -814,13 +814,28 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
>  	task_unlock(victim);
>
>  	/*
> -	 * Kill all user processes sharing victim->mm in other thread groups, if
> -	 * any. They don't get access to memory reserves, though, to avoid
> -	 * depletion of all memory. This prevents mm->mmap_sem livelock when an
	   ^^^^^^^^^

this was useful information which you have dropped. Why?

> -	 * oom killed thread cannot exit because it requires the semaphore and
> -	 * its contended by another thread trying to allocate memory itself.
> -	 * That thread will now get access to memory reserves since it has a
> -	 * pending fatal signal.
> +	 * Kill all user processes sharing victim->mm in other thread groups,
> +	 * if any. This reduces possibility of hitting mm->mmap_sem livelock
> +	 * when an OOM victim thread cannot exit because it requires the
> +	 * mm->mmap_sem for read at exit_mm() while another thread is trying
> +	 * to allocate memory with that mm->mmap_sem held for write.
> +	 *
> +	 * Any thread except the victim thread itself which is killed by
> +	 * this heuristic does not get access to memory reserves as of now,
> +	 * but it will get access to memory reserves by calling out_of_memory()
> +	 * or mem_cgroup_out_of_memory() since it has a pending fatal signal.
> +	 *
> +	 * Note that this heuristic is not perfect because it is possible that
> +	 * a thread which shares victim->mm and is doing memory allocation with
> +	 * victim->mm->mmap_sem held for write is marked as OOM_SCORE_ADJ_MIN.

Is this really helpful? I would rather be explicit that we _do not care_
about these configurations. It is just a PITA to maintain and it doesn't
make any sense. So rather than trying to document all the weird things
that might happen I would welcome a warning: "mm shared with
OOM_SCORE_ADJ_MIN task. Something is broken in your configuration!"
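E.g. something along these lines in the for_each_process() loop at the
end of oom_kill_process() - a completely untested sketch only, which
assumes the helpers the loop already relies on (process_shares_mm() and
friends) and abbreviates the existing set of checks; only the pr_warn()
would be new:

	rcu_read_lock();
	for_each_process(p) {
		/* Only tasks sharing the victim's mm are interesting. */
		if (!process_shares_mm(p, mm))
			continue;
		if (same_thread_group(p, victim))
			continue;
		/* Kernel threads may only borrow the mm; do not kill them. */
		if (unlikely(p->flags & PF_KTHREAD))
			continue;
		/*
		 * Sharing the mm with an OOM_SCORE_ADJ_MIN task is a broken
		 * configuration. Complain loudly rather than grow heuristics
		 * to handle it gracefully.
		 */
		if (p->signal->oom_score_adj == OOM_SCORE_ADJ_MIN) {
			pr_warn("%d (%s): mm shared with OOM_SCORE_ADJ_MIN task. Something is broken in your configuration!\n",
				task_pid_nr(p), p->comm);
			continue;
		}

		do_send_sig_info(SIGKILL, SEND_SIG_FORCED, p, true);
	}
	rcu_read_unlock();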
> +	 * Also, it is possible that a thread which shares victim->mm and is
> +	 * doing memory allocation with victim->mm->mmap_sem held for write
> +	 * (possibly the victim thread itself which got TIF_MEMDIE) is blocked
> +	 * at unkillable locks from direct reclaim paths because nothing
> +	 * prevents TIF_MEMDIE threads which already started direct reclaim
> +	 * paths from being blocked at unkillable locks. In such cases, the
> +	 * OOM reaper will be unable to reap victim->mm and we will need to
> +	 * select a different OOM victim.

This is a more general problem and not related to this particular code.
Whenever we select a victim and call mark_oom_victim we hope it will
eventually get out of its kernel code path (unless it was running in
userspace), so I am not sure this is placed properly.

>  	 */
>  	rcu_read_lock();
>  	for_each_process(p) {
> --
> 1.8.3.1

-- 
Michal Hocko
SUSE Labs