From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751476AbcFXPGv (ORCPT );
	Fri, 24 Jun 2016 11:06:51 -0400
Received: from mail-wm0-f65.google.com ([74.125.82.65]:36386 "EHLO
	mail-wm0-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751019AbcFXPGq (ORCPT );
	Fri, 24 Jun 2016 11:06:46 -0400
Date: Fri, 24 Jun 2016 17:06:37 +0200
From: Michal Hocko
To: Oleg Nesterov
Cc: Linus Torvalds , Andy Lutomirski , Andy Lutomirski ,
	the arch/x86 maintainers , Linux Kernel Mailing List ,
	"linux-arch@vger.kernel.org" , Borislav Petkov , Nadav Amit ,
	Kees Cook , Brian Gerst , "kernel-hardening@lists.openwall.com" ,
	Josh Poimboeuf , Jann Horn , Heiko Carstens
Subject: Re: [PATCH v3 00/13] Virtually mapped stacks with guard pages (x86, core)
Message-ID: <20160624150637.GD20203@dhcp22.suse.cz>
References: <20160623143126.GA16664@redhat.com>
	<20160623170352.GA17372@redhat.com>
	<20160623185221.GA17983@redhat.com>
	<20160624140558.GA20208@dhcp22.suse.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20160624140558.GA20208@dhcp22.suse.cz>
User-Agent: Mutt/1.6.0 (2016-04-01)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri 24-06-16 16:05:58, Michal Hocko wrote:
> On Thu 23-06-16 20:52:21, Oleg Nesterov wrote:
> > On 06/23, Linus Torvalds wrote:
> > >
> > > On Thu, Jun 23, 2016 at 10:03 AM, Oleg Nesterov wrote:
> > > >
> > > > Let me quote my previous email ;)
> > > >
> > > > And we can't free/nullify it when the parent/debugger reaps a zombie,
> > > > say, mark_oom_victim() expects that get_task_struct() protects
> > > > thread_info as well.
> > > >
> > > > probably we can fix all such users though...
> > >
> > > TIF_MEMDIE is indeed a potential problem, but I don't think
> > > mark_oom_victim() is actually problematic.
> > >
> > > mark_oom_victim() is called with either "current",
> >
> > This is no longer true in the -mm tree.
> >
> > But I agree, this is fixable (and in fact I still hope TIF_MEMDIE will
> > die, at least in its current form).
>
> We can move the flag to the task_struct. There are still some bits left
> there. This would be trivial, so the oom usage wouldn't stand in the
> way.

Here is the patch. I've found two bugs where TIF_MEMDIE was checked on
current rather than on the given task. I will separate those into their
own patches (I was just too lazy to do that now). If the approach looks
reasonable then I will repost next week.
---
From 1baaa1f8f9568f95d8feccb28cf1994f8ca0df9f Mon Sep 17 00:00:00 2001
From: Michal Hocko
Date: Fri, 24 Jun 2016 16:46:18 +0200
Subject: [PATCH] mm, oom: move TIF_MEMDIE to the task_struct

There is interest in dropping the thread_info->flags usage for further
cleanups. TIF_MEMDIE stands in the way, so let's move it out of the
thread_info and into the task_struct. We cannot reuse task_struct::flags
because the oom killer sets the flag on a !current task without any
locking, so let's add a new task_struct::memdie instead. It has to be an
atomic_t because it is set and cleared by tasks other than the owner and
therefore needs atomic updates.
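
A side note for reviewers, not part of the change itself: the
mark_oom_victim()/exit_oom_victim() hunks below rely on
atomic_add_unless() acting as a saturating test-and-set (0 -> 1 exactly
once) and test-and-clear (1 -> 0 exactly once). Here is a minimal,
compilable userspace sketch of those semantics; it approximates the
kernel's atomic_t and atomic_add_unless() with C11 atomics, and all the
names in it are illustrative only:

#include <stdatomic.h>
#include <stdio.h>

struct task { _Atomic int memdie; };

/* add @a to @v unless @v is already @u; return 1 iff the add happened */
static int atomic_add_unless(_Atomic int *v, int a, int u)
{
	int c = atomic_load(v);

	while (c != u) {
		/* on failure, c is reloaded with the current value; retry */
		if (atomic_compare_exchange_weak(v, &c, c + a))
			return 1;
	}
	return 0;
}

static void mark_oom_victim(struct task *tsk)
{
	/* 0 -> 1 exactly once, like test_and_set_tsk_thread_flag() */
	if (!atomic_add_unless(&tsk->memdie, 1, 1))
		return;	/* somebody else already marked it */
	printf("marked as oom victim\n");
}

static void exit_oom_victim(struct task *tsk)
{
	/* 1 -> 0 exactly once, like test_and_clear_tsk_thread_flag() */
	if (!atomic_add_unless(&tsk->memdie, -1, 0))
		return;	/* already clear */
	printf("oom victim exited\n");
}

int main(void)
{
	struct task t = { .memdie = 0 };

	mark_oom_victim(&t);	/* marks */
	mark_oom_victim(&t);	/* no-op, memdie is already 1 */
	exit_oom_victim(&t);	/* clears */
	exit_oom_victim(&t);	/* no-op, memdie is already 0 */
	return 0;
}

The point is that atomic_add_unless(&memdie, 1, 1) can succeed for at
most one caller while memdie is 0, which preserves the once-only
semantics the TIF_MEMDIE bit used to provide.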
Signed-off-by: Michal Hocko
---
 arch/alpha/include/asm/thread_info.h      |  1 -
 arch/arc/include/asm/thread_info.h        |  2 --
 arch/arm/include/asm/thread_info.h        |  1 -
 arch/arm64/include/asm/thread_info.h      |  1 -
 arch/avr32/include/asm/thread_info.h      |  2 --
 arch/blackfin/include/asm/thread_info.h   |  1 -
 arch/c6x/include/asm/thread_info.h        |  1 -
 arch/cris/include/asm/thread_info.h       |  1 -
 arch/frv/include/asm/thread_info.h        |  1 -
 arch/h8300/include/asm/thread_info.h      |  1 -
 arch/hexagon/include/asm/thread_info.h    |  1 -
 arch/ia64/include/asm/thread_info.h       |  1 -
 arch/m32r/include/asm/thread_info.h       |  1 -
 arch/m68k/include/asm/thread_info.h       |  1 -
 arch/metag/include/asm/thread_info.h      |  1 -
 arch/microblaze/include/asm/thread_info.h |  1 -
 arch/mips/include/asm/thread_info.h       |  1 -
 arch/mn10300/include/asm/thread_info.h    |  1 -
 arch/nios2/include/asm/thread_info.h      |  1 -
 arch/openrisc/include/asm/thread_info.h   |  1 -
 arch/parisc/include/asm/thread_info.h     |  1 -
 arch/powerpc/include/asm/thread_info.h    |  1 -
 arch/s390/include/asm/thread_info.h       |  1 -
 arch/score/include/asm/thread_info.h      |  1 -
 arch/sh/include/asm/thread_info.h         |  1 -
 arch/sparc/include/asm/thread_info_32.h   |  1 -
 arch/sparc/include/asm/thread_info_64.h   |  1 -
 arch/tile/include/asm/thread_info.h       |  2 --
 arch/um/include/asm/thread_info.h         |  2 --
 arch/unicore32/include/asm/thread_info.h  |  1 -
 arch/x86/include/asm/thread_info.h        |  1 -
 arch/xtensa/include/asm/thread_info.h     |  1 -
 drivers/staging/android/lowmemorykiller.c |  2 +-
 fs/ext4/mballoc.c                         |  2 +-
 include/linux/sched.h                     |  2 ++
 kernel/cpuset.c                           | 12 ++++++------
 kernel/exit.c                             |  2 +-
 kernel/freezer.c                          |  2 +-
 mm/ksm.c                                  |  4 ++--
 mm/memcontrol.c                           |  2 +-
 mm/oom_kill.c                             | 20 ++++++++++----------
 mm/page_alloc.c                           |  6 +++---
 42 files changed, 28 insertions(+), 62 deletions(-)

diff --git a/arch/alpha/include/asm/thread_info.h b/arch/alpha/include/asm/thread_info.h
index 32e920a83ae5..126eaaf6559d 100644
--- a/arch/alpha/include/asm/thread_info.h
+++ b/arch/alpha/include/asm/thread_info.h
@@ -65,7 +65,6 @@ register struct thread_info *__current_thread_info __asm__("$8");
 #define TIF_NEED_RESCHED	3	/* rescheduling necessary */
 #define TIF_SYSCALL_AUDIT	4	/* syscall audit active */
 #define TIF_DIE_IF_KERNEL	9	/* dik recursion lock */
-#define TIF_MEMDIE		13	/* is terminating due to OOM killer */
 #define TIF_POLLING_NRFLAG	14	/* idle is polling for TIF_NEED_RESCHED */
 
 #define _TIF_SYSCALL_TRACE	(1<memdie) &&
 		    time_before_eq(jiffies, lowmem_deathpending_timeout)) {
 			task_unlock(p);
 			rcu_read_unlock();
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index c1ab3ec30423..ddc12f571c50 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4815,7 +4815,7 @@ void ext4_free_blocks(handle_t *handle, struct inode *inode,
 #endif
 	trace_ext4_mballoc_free(sb, inode, block_group, bit, count_clusters);
 
-	/* __GFP_NOFAIL: retry infinitely, ignore TIF_MEMDIE and memcg limit. */
+	/* __GFP_NOFAIL: retry infinitely, ignore memdie tasks and memcg limit. */
 	err = ext4_mb_load_buddy_gfp(sb, block_group, &e4b,
 				     GFP_NOFS|__GFP_NOFAIL);
 	if (err)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 6d81a1eb974a..4c91fc0c2e8e 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1856,6 +1856,8 @@ struct task_struct {
 	unsigned long task_state_change;
 #endif
 	int pagefault_disabled;
+	/* oom victim - give it access to memory reserves */
+	atomic_t memdie;
 #ifdef CONFIG_MMU
 	struct task_struct *oom_reaper_list;
 #endif
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 73e93e53884d..857fac0b973d 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -1038,9 +1038,9 @@ static void cpuset_change_task_nodemask(struct task_struct *tsk,
 	 * Allow tasks that have access to memory reserves because they have
 	 * been OOM killed to get memory anywhere.
 	 */
-	if (unlikely(test_thread_flag(TIF_MEMDIE)))
+	if (unlikely(atomic_read(&tsk->memdie)))
 		return;
-	if (current->flags & PF_EXITING) /* Let dying task have memory */
+	if (tsk->flags & PF_EXITING) /* Let dying task have memory */
 		return;
 
 	task_lock(tsk);
@@ -2496,12 +2496,12 @@ static struct cpuset *nearest_hardwall_ancestor(struct cpuset *cs)
 * If we're in interrupt, yes, we can always allocate.  If @node is set in
 * current's mems_allowed, yes.  If it's not a __GFP_HARDWALL request and this
 * node is set in the nearest hardwalled cpuset ancestor to current's cpuset,
- * yes.  If current has access to memory reserves due to TIF_MEMDIE, yes.
+ * yes.  If current has access to memory reserves due to memdie, yes.
 * Otherwise, no.
 *
 * GFP_USER allocations are marked with the __GFP_HARDWALL bit,
 * and do not allow allocations outside the current tasks cpuset
- * unless the task has been OOM killed as is marked TIF_MEMDIE.
+ * unless the task has been OOM killed as is marked memdie.
 * GFP_KERNEL allocations are not so marked, so can escape to the
 * nearest enclosing hardwalled ancestor cpuset.
 *
@@ -2524,7 +2524,7 @@ static struct cpuset *nearest_hardwall_ancestor(struct cpuset *cs)
 * affect that:
 *	in_interrupt - any node ok (current task context irrelevant)
 *	GFP_ATOMIC   - any node ok
- *	TIF_MEMDIE   - any node ok
+ *	memdie       - any node ok
 *	GFP_KERNEL   - any node in enclosing hardwalled cpuset ok
 *	GFP_USER     - only nodes in current tasks mems allowed ok.
 */
@@ -2542,7 +2542,7 @@ bool __cpuset_node_allowed(int node, gfp_t gfp_mask)
 	 * Allow tasks that have access to memory reserves because they have
 	 * been OOM killed to get memory anywhere.
 	 */
-	if (unlikely(test_thread_flag(TIF_MEMDIE)))
+	if (unlikely(atomic_read(&current->memdie)))
 		return true;
 	if (gfp_mask & __GFP_HARDWALL)	/* If hardwall request, stop here */
 		return false;
diff --git a/kernel/exit.c b/kernel/exit.c
index 9e6e1356e6bb..8bfdda9bc99a 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -434,7 +434,7 @@ static void exit_mm(struct task_struct *tsk)
 	task_unlock(tsk);
 	mm_update_next_owner(mm);
 	mmput(mm);
-	if (test_thread_flag(TIF_MEMDIE))
+	if (atomic_read(&current->memdie))
 		exit_oom_victim(tsk);
 }
diff --git a/kernel/freezer.c b/kernel/freezer.c
index a8900a3bc27a..e1bd9f2780fe 100644
--- a/kernel/freezer.c
+++ b/kernel/freezer.c
@@ -42,7 +42,7 @@ bool freezing_slow_path(struct task_struct *p)
 	if (p->flags & (PF_NOFREEZE | PF_SUSPEND_TASK))
 		return false;
 
-	if (test_thread_flag(TIF_MEMDIE))
+	if (atomic_read(&p->memdie))
 		return false;
 
 	if (pm_nosig_freezing || cgroup_freezing(p))
diff --git a/mm/ksm.c b/mm/ksm.c
index 73d43bafd9fb..8d5a295fb955 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -396,11 +396,11 @@ static int break_ksm(struct vm_area_struct *vma, unsigned long addr)
 	 *
 	 * VM_FAULT_OOM: at the time of writing (late July 2009), setting
 	 * aside mem_cgroup limits, VM_FAULT_OOM would only be set if the
-	 * current task has TIF_MEMDIE set, and will be OOM killed on return
+	 * current task has memdie set, and will be OOM killed on return
 	 * to user; and ksmd, having no mm, would never be chosen for that.
 	 *
 	 * But if the mm is in a limited mem_cgroup, then the fault may fail
-	 * with VM_FAULT_OOM even if the current task is not TIF_MEMDIE; and
+	 * with VM_FAULT_OOM even if the current task is not memdie; and
 	 * even ksmd can fail in this way - though it's usually breaking ksm
 	 * just to undo a merge it made a moment before, so unlikely to oom.
 	 *
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 3e8f9e5e9291..df411de17a75 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1987,7 +1987,7 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	 * bypass the last charges so that they can exit quickly and
 	 * free their memory.
 	 */
-	if (unlikely(test_thread_flag(TIF_MEMDIE) ||
+	if (unlikely(atomic_read(&current->memdie) ||
 		     fatal_signal_pending(current) ||
 		     current->flags & PF_EXITING))
 		goto force;
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 4c21f744daa6..9d24007cdb82 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -473,7 +473,7 @@ static bool __oom_reap_task(struct task_struct *tsk)
 	 * [...]
 	 *				out_of_memory
 	 *				  select_bad_process
-	 *				    # no TIF_MEMDIE task selects new victim
+	 *				    # no memdie task selects new victim
 	 *  unmap_page_range # frees some memory
 	 */
 	mutex_lock(&oom_lock);
@@ -593,7 +593,7 @@ static void oom_reap_task(struct task_struct *tsk)
 	}
 
 	/*
-	 * Clear TIF_MEMDIE because the task shouldn't be sitting on a
+	 * Clear memdie because the task shouldn't be sitting on a
 	 * reasonably reclaimable memory anymore or it is not a good candidate
 	 * for the oom victim right now because it cannot release its memory
 	 * itself nor by the oom reaper.
@@ -669,14 +669,14 @@ void mark_oom_victim(struct task_struct *tsk)
 {
 	WARN_ON(oom_killer_disabled);
 	/* OOM killer might race with memcg OOM */
-	if (test_and_set_tsk_thread_flag(tsk, TIF_MEMDIE))
+	if (!atomic_add_unless(&tsk->memdie, 1, 1))
 		return;
 	atomic_inc(&tsk->signal->oom_victims);
 	/*
 	 * Make sure that the task is woken up from uninterruptible sleep
 	 * if it is frozen because OOM killer wouldn't be able to free
 	 * any memory and livelock. freezing_slow_path will tell the freezer
-	 * that TIF_MEMDIE tasks should be ignored.
+	 * that memdie tasks should be ignored.
 	 */
 	__thaw_task(tsk);
 	atomic_inc(&oom_victims);
@@ -687,7 +687,7 @@ void mark_oom_victim(struct task_struct *tsk)
  */
 void exit_oom_victim(struct task_struct *tsk)
 {
-	if (!test_and_clear_tsk_thread_flag(tsk, TIF_MEMDIE))
+	if (!atomic_add_unless(&tsk->memdie, -1, 0))
 		return;
 
 	atomic_dec(&tsk->signal->oom_victims);
@@ -771,7 +771,7 @@ bool task_will_free_mem(struct task_struct *task)
 	 * If the process has passed exit_mm we have to skip it because
 	 * we have lost a link to other tasks sharing this mm, we do not
 	 * have anything to reap and the task might then get stuck waiting
-	 * for parent as zombie and we do not want it to hold TIF_MEMDIE
+	 * for parent as zombie and we do not want it to hold memdie
 	 */
 	p = find_lock_task_mm(task);
 	if (!p)
@@ -836,7 +836,7 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
 
 	/*
 	 * If the task is already exiting, don't alarm the sysadmin or kill
-	 * its children or threads, just set TIF_MEMDIE so it can die quickly
+	 * its children or threads, just set memdie so it can die quickly
 	 */
 	if (task_will_free_mem(p)) {
 		mark_oom_victim(p);
@@ -893,7 +893,7 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
 	mm = victim->mm;
 	atomic_inc(&mm->mm_count);
 	/*
-	 * We should send SIGKILL before setting TIF_MEMDIE in order to prevent
+	 * We should send SIGKILL before setting memdie in order to prevent
 	 * the OOM victim from depleting the memory reserves from the user
 	 * space under its control.
 	 */
@@ -1016,7 +1016,7 @@ bool out_of_memory(struct oom_control *oc)
 	 * quickly exit and free its memory.
 	 *
 	 * But don't select if current has already released its mm and cleared
-	 * TIF_MEMDIE flag at exit_mm(), otherwise an OOM livelock may occur.
+	 * memdie flag at exit_mm(), otherwise an OOM livelock may occur.
 	 */
 	if (current->mm && task_will_free_mem(current)) {
 		mark_oom_victim(current);
@@ -1096,7 +1096,7 @@ void pagefault_out_of_memory(void)
 		 * be a racing OOM victim for which oom_killer_disable()
 		 * is waiting for.
 		 */
-		WARN_ON(test_thread_flag(TIF_MEMDIE));
+		WARN_ON(atomic_read(&current->memdie));
 	}
 
 	mutex_unlock(&oom_lock);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 89128d64d662..6c550afde6a4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3050,7 +3050,7 @@ void warn_alloc_failed(gfp_t gfp_mask, unsigned int order, const char *fmt, ...)
 	 * of allowed nodes.
 	 */
 	if (!(gfp_mask & __GFP_NOMEMALLOC))
-		if (test_thread_flag(TIF_MEMDIE) ||
+		if (atomic_read(&current->memdie) ||
 		    (current->flags & (PF_MEMALLOC | PF_EXITING)))
 			filter &= ~SHOW_MEM_FILTER_NODES;
 	if (in_interrupt() || !(gfp_mask & __GFP_DIRECT_RECLAIM))
@@ -3428,7 +3428,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
 			alloc_flags |= ALLOC_NO_WATERMARKS;
 		else if (!in_interrupt() &&
 			 ((current->flags & PF_MEMALLOC) ||
-			  unlikely(test_thread_flag(TIF_MEMDIE))))
+			  unlikely(atomic_read(&current->memdie))))
 			alloc_flags |= ALLOC_NO_WATERMARKS;
 	}
 #ifdef CONFIG_CMA
@@ -3637,7 +3637,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	}
 
 	/* Avoid allocations with no watermarks from looping endlessly */
-	if (test_thread_flag(TIF_MEMDIE) && !(gfp_mask & __GFP_NOFAIL))
+	if (atomic_read(&current->memdie) && !(gfp_mask & __GFP_NOFAIL))
 		goto nopage;
 
 	/*
-- 
2.8.1

-- 
Michal Hocko
SUSE Labs