From: Michal Hocko <mhocko@kernel.org>
Subject: [PATCH 0/2] mm, oom: do not grant oom victims full memory reserves access
Date: Thu, 27 Jul 2017 11:03:55 +0200
Message-Id: <20170727090357.3205-1-mhocko@kernel.org>
To: Andrew Morton
Cc: David Rientjes, Johannes Weiner, Roman Gushchin, Tetsuo Handa, linux-mm@kvack.org, LKML

Hi,
this is part of a larger series I posted back in October last year [1]. I have
dropped patch 3 because it was incorrect, and patch 4 is not applicable
without it.

The primary reason to apply patch 1 is to remove the risk of complete memory
depletion by oom victims. This is a theoretical risk right now, but there is
a demand for a memcg-aware oom killer, which might kill all processes inside
a memcg, and that can be a lot of tasks. That would make the risk quite real.

The issue is addressed by limiting access to memory reserves: we no longer
use TIF_MEMDIE to grant the access and use tsk_is_oom_victim instead. See
patch 1 for more details. Patch 2 is a trivial follow-up cleanup.

I would still like to get rid of TIF_MEMDIE completely, but I do not have
time to do it now and it is not a pressing issue.

[1] http://lkml.kernel.org/r/20161004090009.7974-1-mhocko@kernel.org

From: Michal Hocko <mhocko@kernel.org>
Subject: [PATCH 1/2] mm, oom: do not rely on TIF_MEMDIE for memory reserves access
Date: Thu, 27 Jul 2017 11:03:56 +0200
Message-Id: <20170727090357.3205-2-mhocko@kernel.org>
In-Reply-To: <20170727090357.3205-1-mhocko@kernel.org>
References: <20170727090357.3205-1-mhocko@kernel.org>
To: Andrew Morton
Cc: David Rientjes, Johannes Weiner, Roman Gushchin, Tetsuo Handa, linux-mm@kvack.org, LKML, Michal Hocko

From: Michal Hocko

For ages we have been relying on the TIF_MEMDIE thread flag to mark OOM
victims and then, among other things, to give these threads full access to
memory reserves.
There are a few shortcomings of this implementation, though. First of all,
and the most serious one, is that the full access to memory reserves is quite
dangerous because we leave no safety room for the system to operate and
potentially do last emergency steps to move on.

Secondly, this flag is per task_struct while the OOM killer operates on
mm_struct granularity, so all processes sharing the given mm are killed.
Giving full access to all these task_structs could lead to a quick memory
reserves depletion. We have tried to reduce this risk by giving TIF_MEMDIE
only to the main thread and the currently allocating task, but that doesn't
really solve the problem while it surely opens up room for corner cases -
e.g. GFP_NO{FS,IO} requests might loop inside the allocator without access
to memory reserves because a particular thread was not the group leader.

Now that we have the oom reaper, and all oom victims are reapable after
1b51e65eab64 ("oom, oom_reaper: allow to reap mm shared by the kthreads"),
we can be more conservative and grant only partial access to memory reserves
because there are reasonable chances of parallel memory freeing. We still
want some access to reserves because we do not want other consumers to eat
up the victim's freed memory. oom victims will still contend with __GFP_HIGH
users, but those shouldn't be so aggressive as to starve oom victims
completely.

Introduce an ALLOC_OOM flag and give all tsk_is_oom_victim tasks access to
half of the reserves. This makes the access to reserves independent of which
task has passed through mark_oom_victim. Also drop any usage of TIF_MEMDIE
from the page allocator proper and replace it by tsk_is_oom_victim as well,
which will finally make page_alloc.c completely TIF_MEMDIE-free.

CONFIG_MMU=n doesn't have the oom reaper, so let's stick to the original
ALLOC_NO_WATERMARKS approach there. Be careful, though, because such victims
might still deplete all the memory reserves, so keep the semantic as close
to the original implementation as possible: give them access to memory
reserves only up to exit_mm (when tsk->mm is cleared) rather than while
tsk_is_oom_victim, which holds until the signal struct is gone.

There is a demand to make the oom killer memcg aware, which will imply many
tasks killed at once. This change will allow such a usecase without worrying
about complete memory reserves depletion.
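[Editor's illustration: the "half of the reserves" arithmetic can be seen in
isolation before diving into the diff. This is a minimal, standalone
userspace sketch of the watermark scaling the patch introduces in
__zone_watermark_ok(); the function name effective_min and the standalone
setting are illustrative, not kernel code.]

	#include <stdio.h>

	/* Illustrative allocator flags, mirroring the values in mm/internal.h. */
	#define ALLOC_OOM    0x08	/* oom victim: may dip into half of the reserve */
	#define ALLOC_HARDER 0x10	/* atomic-style access: a quarter of it */

	/*
	 * Sketch of the scaling: the minimum number of pages an allocation
	 * must leave free is reduced for privileged requests, more so for
	 * OOM victims than for ordinary ALLOC_HARDER users.
	 */
	static long effective_min(long min, unsigned int alloc_flags)
	{
		if (alloc_flags & ALLOC_OOM)
			min -= min / 2;		/* victims may use half the reserve */
		else if (alloc_flags & ALLOC_HARDER)
			min -= min / 4;		/* harder users get a quarter */
		return min;
	}

	int main(void)
	{
		long min = 1024;	/* pretend min watermark, in pages */

		printf("normal:       keep %ld pages free\n", effective_min(min, 0));
		printf("ALLOC_HARDER: keep %ld pages free\n", effective_min(min, ALLOC_HARDER));
		printf("ALLOC_OOM:    keep %ld pages free\n", effective_min(min, ALLOC_OOM));
		return 0;
	}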
Signed-off-by: Michal Hocko
---
 mm/internal.h   | 11 +++++++++++
 mm/oom_kill.c   |  9 +++++----
 mm/page_alloc.c | 55 +++++++++++++++++++++++++++++++++++++++++++++----------
 3 files changed, 61 insertions(+), 14 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 24d88f084705..1ebcb1ed01b5 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -480,6 +480,17 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone,
 /* Mask to get the watermark bits */
 #define ALLOC_WMARK_MASK	(ALLOC_NO_WATERMARKS-1)
 
+/*
+ * Only MMU archs have async oom victim reclaim - aka oom_reaper so we
+ * cannot assume a reduced access to memory reserves is sufficient for
+ * !MMU
+ */
+#ifdef CONFIG_MMU
+#define ALLOC_OOM		0x08
+#else
+#define ALLOC_OOM		ALLOC_NO_WATERMARKS
+#endif
+
 #define ALLOC_HARDER		0x10 /* try to alloc harder */
 #define ALLOC_HIGH		0x20 /* __GFP_HIGH set */
 #define ALLOC_CPUSET		0x40 /* check for correct cpuset */

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 9e8b4f030c1c..c9f3569a76c7 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -824,7 +824,8 @@ static void oom_kill_process(struct oom_control *oc, const char *message)
 	/*
 	 * If the task is already exiting, don't alarm the sysadmin or kill
-	 * its children or threads, just set TIF_MEMDIE so it can die quickly
+	 * its children or threads, just give it access to memory reserves
+	 * so it can die quickly
 	 */
 	task_lock(p);
 	if (task_will_free_mem(p)) {
@@ -889,9 +890,9 @@ static void oom_kill_process(struct oom_control *oc, const char *message)
 	count_memcg_event_mm(mm, OOM_KILL);
 
 	/*
-	 * We should send SIGKILL before setting TIF_MEMDIE in order to prevent
-	 * the OOM victim from depleting the memory reserves from the user
-	 * space under its control.
+	 * We should send SIGKILL before granting access to memory reserves
+	 * in order to prevent the OOM victim from depleting the memory
+	 * reserves from the user space under its control.
 	 */
 	do_send_sig_info(SIGKILL, SEND_SIG_FORCED, victim, true);
 	mark_oom_victim(victim);

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 80e4adb4c360..5e5911f40014 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2930,7 +2930,7 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
 {
 	long min = mark;
 	int o;
-	const bool alloc_harder = (alloc_flags & ALLOC_HARDER);
+	const bool alloc_harder = (alloc_flags & (ALLOC_HARDER|ALLOC_OOM));
 
 	/* free_pages may go negative - that's OK */
 	free_pages -= (1 << order) - 1;
@@ -2943,10 +2943,19 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
 	 * the high-atomic reserves. This will over-estimate the size of the
 	 * atomic reserve but it avoids a search.
 	 */
-	if (likely(!alloc_harder))
+	if (likely(!alloc_harder)) {
 		free_pages -= z->nr_reserved_highatomic;
-	else
-		min -= min / 4;
+	} else {
+		/*
+		 * OOM victims can try even harder than normal ALLOC_HARDER
+		 * users
+		 */
+		if (alloc_flags & ALLOC_OOM)
+			min -= min / 2;
+		else
+			min -= min / 4;
+	}
 
 #ifdef CONFIG_CMA
 	/* If allocation can't use CMA areas don't use free CMA pages */
@@ -3184,7 +3193,7 @@ static void warn_alloc_show_mem(gfp_t gfp_mask, nodemask_t *nodemask)
 	 * of allowed nodes.
 	 */
 	if (!(gfp_mask & __GFP_NOMEMALLOC))
-		if (test_thread_flag(TIF_MEMDIE) ||
+		if (tsk_is_oom_victim(current) ||
 		    (current->flags & (PF_MEMALLOC | PF_EXITING)))
 			filter &= ~SHOW_MEM_FILTER_NODES;
 	if (in_interrupt() || !(gfp_mask & __GFP_DIRECT_RECLAIM))
@@ -3603,6 +3612,22 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
 	return alloc_flags;
 }
 
+static bool oom_reserves_allowed(struct task_struct *tsk)
+{
+	if (!tsk_is_oom_victim(tsk))
+		return false;
+
+	/*
+	 * !MMU doesn't have the oom reaper so we shouldn't risk the memory
+	 * reserves depletion and shouldn't give access to memory reserves
+	 * past exit_mm
+	 */
+	if (!IS_ENABLED(CONFIG_MMU) && !tsk->mm)
+		return false;
+
+	return true;
+}
+
 bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
 {
 	if (unlikely(gfp_mask & __GFP_NOMEMALLOC))
@@ -3614,7 +3639,7 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
 		return true;
 	if (!in_interrupt() &&
 	    ((current->flags & PF_MEMALLOC) ||
-	     unlikely(test_thread_flag(TIF_MEMDIE))))
+	     oom_reserves_allowed(current)))
 		return true;
 
 	return false;
@@ -3770,6 +3795,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	unsigned long alloc_start = jiffies;
 	unsigned int stall_timeout = 10 * HZ;
 	unsigned int cpuset_mems_cookie;
+	bool reserves;
 
 	/*
 	 * In the slowpath, we sanity check order to avoid ever trying to
@@ -3875,15 +3901,24 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	if (gfp_mask & __GFP_KSWAPD_RECLAIM)
 		wake_all_kswapds(order, ac);
 
-	if (gfp_pfmemalloc_allowed(gfp_mask))
-		alloc_flags = ALLOC_NO_WATERMARKS;
+	/*
+	 * Distinguish requests which really need access to whole memory
+	 * reserves from oom victims which can live with their own reserve
+	 */
+	reserves = gfp_pfmemalloc_allowed(gfp_mask);
+	if (reserves) {
+		if (tsk_is_oom_victim(current))
+			alloc_flags = ALLOC_OOM;
+		else
+			alloc_flags = ALLOC_NO_WATERMARKS;
+	}
 
 	/*
 	 * Reset the zonelist iterators if memory policies can be ignored.
 	 * These allocations are high priority and system rather than user
 	 * orientated.
 	 */
-	if (!(alloc_flags & ALLOC_CPUSET) || (alloc_flags & ALLOC_NO_WATERMARKS)) {
+	if (!(alloc_flags & ALLOC_CPUSET) || reserves) {
 		ac->zonelist = node_zonelist(numa_node_id(), gfp_mask);
 		ac->preferred_zoneref = first_zones_zonelist(ac->zonelist,
 					ac->high_zoneidx, ac->nodemask);
@@ -3960,7 +3995,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 		goto got_pg;
 
 	/* Avoid allocations with no watermarks from looping endlessly */
-	if (test_thread_flag(TIF_MEMDIE) &&
+	if (tsk_is_oom_victim(current) &&
 	    (alloc_flags == ALLOC_NO_WATERMARKS ||
 	     (gfp_mask & __GFP_NOMEMALLOC)))
 		goto nopage;
-- 
2.13.2
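[Editor's note before the follow-up patch: the reason tsk_is_oom_victim
naturally covers every thread sharing the victim's mm, while TIF_MEMDIE
covered only individual threads, is that the oom mark lives on the shared
signal_struct. A simplified sketch of the 4.13-era helpers, condensed from
include/linux/oom.h and mm/oom_kill.c (not a verbatim copy):]

	/* Sketch: the victim mark is per signal_struct, not per thread. */
	static inline bool tsk_is_oom_victim(struct task_struct *tsk)
	{
		/* true for every thread of the killed process */
		return tsk->signal->oom_mm != NULL;
	}

	static void mark_oom_victim(struct task_struct *tsk)
	{
		struct mm_struct *mm = tsk->mm;

		/* legacy per-thread mark, still needed e.g. by the freezer */
		set_tsk_thread_flag(tsk, TIF_MEMDIE);

		/* oom_mm is bound to the signal struct's lifetime */
		if (!cmpxchg(&tsk->signal->oom_mm, NULL, mm))
			mmgrab(mm);	/* pin the mm until the signal_struct is freed */
	}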
From: Michal Hocko <mhocko@kernel.org>
Subject: [PATCH 2/2] mm: replace TIF_MEMDIE checks by tsk_is_oom_victim
Date: Thu, 27 Jul 2017 11:03:57 +0200
Message-Id: <20170727090357.3205-3-mhocko@kernel.org>
In-Reply-To: <20170727090357.3205-1-mhocko@kernel.org>
References: <20170727090357.3205-1-mhocko@kernel.org>
To: Andrew Morton
Cc: David Rientjes, Johannes Weiner, Roman Gushchin, Tetsuo Handa, linux-mm@kvack.org, LKML, Michal Hocko

From: Michal Hocko

TIF_MEMDIE is set only on tasks which were either directly selected by the
OOM killer or which passed through mark_oom_victim from the allocator path.
tsk_is_oom_victim is more generic and allows us to identify all tasks
(threads) which share the mm with the oom victim.

Please note that the freezer still needs to check TIF_MEMDIE, because we
cannot thaw tasks which do not participate in oom_victims counting;
otherwise a !TIF_MEMDIE task could interfere after oom_disable returns.

Signed-off-by: Michal Hocko
---
 kernel/cgroup/cpuset.c | 8 ++++----
 mm/memcontrol.c        | 2 +-
 mm/oom_kill.c          | 2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index ca8376e5008c..1cc53dff0d94 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -2497,12 +2497,12 @@ static struct cpuset *nearest_hardwall_ancestor(struct cpuset *cs)
  * If we're in interrupt, yes, we can always allocate. If @node is set in
  * current's mems_allowed, yes. If it's not a __GFP_HARDWALL request and this
  * node is set in the nearest hardwalled cpuset ancestor to current's cpuset,
- * yes. If current has access to memory reserves due to TIF_MEMDIE, yes.
+ * yes. If current has access to memory reserves as an oom victim, yes.
  * Otherwise, no.
  *
  * GFP_USER allocations are marked with the __GFP_HARDWALL bit,
  * and do not allow allocations outside the current tasks cpuset
- * unless the task has been OOM killed as is marked TIF_MEMDIE.
+ * unless the task has been OOM killed.
  * GFP_KERNEL allocations are not so marked, so can escape to the
  * nearest enclosing hardwalled ancestor cpuset.
  *
@@ -2525,7 +2525,7 @@ static struct cpuset *nearest_hardwall_ancestor(struct cpuset *cs)
  * affect that:
  *	in_interrupt - any node ok (current task context irrelevant)
  *	GFP_ATOMIC   - any node ok
- *	TIF_MEMDIE   - any node ok
+ *	tsk_is_oom_victim - any node ok
  *	GFP_KERNEL   - any node in enclosing hardwalled cpuset ok
  *	GFP_USER     - only nodes in current tasks mems allowed ok.
  */
@@ -2543,7 +2543,7 @@ bool __cpuset_node_allowed(int node, gfp_t gfp_mask)
 	 * Allow tasks that have access to memory reserves because they have
 	 * been OOM killed to get memory anywhere.
 	 */
-	if (unlikely(test_thread_flag(TIF_MEMDIE)))
+	if (unlikely(tsk_is_oom_victim(current)))
 		return true;
 	if (gfp_mask & __GFP_HARDWALL)	/* If hardwall request, stop here */
 		return false;

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 544d47e5cbbd..86a48affb938 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1896,7 +1896,7 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	 * bypass the last charges so that they can exit quickly and
 	 * free their memory.
 	 */
-	if (unlikely(test_thread_flag(TIF_MEMDIE) ||
+	if (unlikely(tsk_is_oom_victim(current) ||
 		     fatal_signal_pending(current) ||
 		     current->flags & PF_EXITING))
 		goto force;

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index c9f3569a76c7..65cc2f9aaa05 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -483,7 +483,7 @@ static bool __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
 	 * [...]
 	 * out_of_memory
 	 *   select_bad_process
-	 *     # no TIF_MEMDIE task selects new victim
+	 *     # no TIF_MEMDIE, selects new victim
 	 *   unmap_page_range # frees some memory
 	 */
 	mutex_lock(&oom_lock);
-- 
2.13.2

From: Tetsuo Handa
Subject: Re: [PATCH 2/2] mm: replace TIF_MEMDIE checks by tsk_is_oom_victim
Date: Thu, 27 Jul 2017 23:01:05 +0900
Message-Id: <201707272301.EII82876.tOOJOFLMHFQSFV@I-love.SAKURA.ne.jp>
In-Reply-To: <20170727090357.3205-3-mhocko@kernel.org>
References: <20170727090357.3205-1-mhocko@kernel.org> <20170727090357.3205-3-mhocko@kernel.org>
To: mhocko@kernel.org, akpm@linux-foundation.org
Cc: rientjes@google.com, hannes@cmpxchg.org, guro@fb.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, mhocko@suse.com

Michal Hocko wrote:
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 544d47e5cbbd..86a48affb938 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1896,7 +1896,7 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
>  	 * bypass the last charges so that they can exit quickly and
>  	 * free their memory.
>  	 */
> -	if (unlikely(test_thread_flag(TIF_MEMDIE) ||
> +	if (unlikely(tsk_is_oom_victim(current) ||
>  		     fatal_signal_pending(current) ||
>  		     current->flags & PF_EXITING))
>  		goto force;

Did we check http://lkml.kernel.org/r/20160909140508.GO4844@dhcp22.suse.cz ?

> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index c9f3569a76c7..65cc2f9aaa05 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -483,7 +483,7 @@ static bool __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
>  	 * [...]
>  	 * out_of_memory
>  	 *   select_bad_process
> -	 *     # no TIF_MEMDIE task selects new victim
> +	 *     # no TIF_MEMDIE, selects new victim
>  	 *   unmap_page_range # frees some memory
>  	 */
>  	mutex_lock(&oom_lock);

This comment is wrong. No MMF_OOM_SKIP mm selects new victim.
From: Tetsuo Handa
Subject: Re: [PATCH 2/2] mm: replace TIF_MEMDIE checks by tsk_is_oom_victim
Date: Thu, 27 Jul 2017 23:08:42 +0900
Message-Id: <201707272308.GHD90652.OSHQtOOVJFLFMF@I-love.SAKURA.ne.jp>
In-Reply-To: <201707272301.EII82876.tOOJOFLMHFQSFV@I-love.SAKURA.ne.jp>
References: <20170727090357.3205-1-mhocko@kernel.org> <20170727090357.3205-3-mhocko@kernel.org> <201707272301.EII82876.tOOJOFLMHFQSFV@I-love.SAKURA.ne.jp>
To: mhocko@kernel.org, akpm@linux-foundation.org
Cc: rientjes@google.com, hannes@cmpxchg.org, guro@fb.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, mhocko@suse.com

Tetsuo Handa wrote:
> Michal Hocko wrote:
> > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > @@ -483,7 +483,7 @@ static bool __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
> >  	 * [...]
> >  	 * out_of_memory
> >  	 *   select_bad_process
> > -	 *     # no TIF_MEMDIE task selects new victim
> > +	 *     # no TIF_MEMDIE, selects new victim
> >  	 *   unmap_page_range # frees some memory
> >  	 */
> >  	mutex_lock(&oom_lock);
>
> This comment is wrong. No MMF_OOM_SKIP mm selects new victim.

Oops. "MMF_OOM_SKIP mm selects new victim." according to
http://lkml.kernel.org/r/201706271952.FEB21375.SFJFHOQLOtVOMF@I-love.SAKURA.ne.jp .
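[Editor's context for the MMF_OOM_SKIP point above: a victim whose mm has
already been reaped and flagged MMF_OOM_SKIP no longer blocks selection of a
new victim. A simplified sketch of the 4.13-era victim-selection rule,
loosely modeled on oom_evaluate_task() in mm/oom_kill.c (not a verbatim
copy):]

	/*
	 * Sketch: a task whose mm carries MMF_OOM_SKIP (set once the oom
	 * reaper is done with it, or reaping failed) is ignored, so the OOM
	 * killer may pick a new victim. A victim whose mm is not yet
	 * MMF_OOM_SKIP aborts the scan instead.
	 */
	static int oom_evaluate_task(struct task_struct *task, void *arg)
	{
		struct oom_control *oc = arg;

		if (!is_sysrq_oom(oc) && tsk_is_oom_victim(task)) {
			if (test_bit(MMF_OOM_SKIP, &task->signal->oom_mm->flags))
				return 0;	/* skip: does not block a new kill */
			return 1;		/* abort: wait for this victim to exit */
		}

		/* ... otherwise compute oom_badness() and track the best candidate ... */
		return 0;
	}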
From: Michal Hocko
Subject: Re: [PATCH 2/2] mm: replace TIF_MEMDIE checks by tsk_is_oom_victim
Date: Thu, 27 Jul 2017 16:18:01 +0200
Message-ID: <20170727141801.GA31031@dhcp22.suse.cz>
In-Reply-To: <201707272301.EII82876.tOOJOFLMHFQSFV@I-love.SAKURA.ne.jp>
References: <20170727090357.3205-1-mhocko@kernel.org> <20170727090357.3205-3-mhocko@kernel.org> <201707272301.EII82876.tOOJOFLMHFQSFV@I-love.SAKURA.ne.jp>
To: Tetsuo Handa
Cc: akpm@linux-foundation.org, rientjes@google.com, hannes@cmpxchg.org, guro@fb.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org

On Thu 27-07-17 23:01:05, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > [mm/memcontrol.c hunk replacing test_thread_flag(TIF_MEMDIE) with
> > tsk_is_oom_victim(current) in try_charge(), quoted above]
>
> Did we check http://lkml.kernel.org/r/20160909140508.GO4844@dhcp22.suse.cz ?

I will double check.

> > [mm/oom_kill.c comment hunk, quoted above]
>
> This comment is wrong. No MMF_OOM_SKIP mm selects new victim.

This hunk shouldn't make it to the patch.

-- 
Michal Hocko
SUSE Labs
From: Michal Hocko
Subject: Re: [PATCH 2/2] mm: replace TIF_MEMDIE checks by tsk_is_oom_victim
Date: Thu, 27 Jul 2017 16:45:44 +0200
Message-ID: <20170727144544.GC31031@dhcp22.suse.cz>
In-Reply-To: <201707272301.EII82876.tOOJOFLMHFQSFV@I-love.SAKURA.ne.jp>
References: <20170727090357.3205-1-mhocko@kernel.org> <20170727090357.3205-3-mhocko@kernel.org> <201707272301.EII82876.tOOJOFLMHFQSFV@I-love.SAKURA.ne.jp>
To: Tetsuo Handa
Cc: akpm@linux-foundation.org, rientjes@google.com, hannes@cmpxchg.org, guro@fb.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org

On Thu 27-07-17 23:01:05, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > [mm/memcontrol.c hunk replacing test_thread_flag(TIF_MEMDIE) with
> > tsk_is_oom_victim(current) in try_charge(), quoted above]
>
> Did we check http://lkml.kernel.org/r/20160909140508.GO4844@dhcp22.suse.cz ?

OK, so your concern was

> Does this test_thread_flag(TIF_MEMDIE) (or tsk_is_oom_victim(current)) make sense?
>
> If current thread is OOM-killed, SIGKILL must be pending before arriving at
> do_exit() and PF_EXITING must be set after arriving at do_exit().
> But I can't find locations which do memory allocation between clearing
> SIGKILL and setting PF_EXITING.

I can't find them either, and maybe there are none. But why do we care in
this particular patch, which merely replaces a TIF_MEMDIE check by
tsk_is_oom_victim? The code will surely not become less valid. If you
believe this check is redundant, then send a patch with a clear
justification. But I would say that, at least from the robustness point of
view, I would just keep it there. We do not really have any control over
what happens between clearing signals and setting PF_EXITING.

-- 
Michal Hocko
SUSE Labs
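[Editor's illustration of the window under discussion: the three predicates
in try_charge() cover overlapping phases of a killed task's teardown. Below
is the patch's own check with the phases spelled out; the comments are an
interpretation of this thread, not kernel documentation.]

	/*
	 * Bypass the charge limit so a dying task can exit quickly:
	 *
	 *   tsk_is_oom_victim(current)    - true from mark_oom_victim() until
	 *                                   the signal_struct is freed, so it
	 *                                   also covers any window after
	 *                                   SIGKILL has been dequeued
	 *   fatal_signal_pending(current) - true from SIGKILL delivery until
	 *                                   the signal is dequeued on the way
	 *                                   into do_exit()
	 *   PF_EXITING                    - set once the task is inside
	 *                                   do_exit()
	 *
	 * The debated gap is between SIGKILL being dequeued and PF_EXITING
	 * being set; keeping the oom-victim check makes the bypass robust
	 * even if an allocation ever happens there.
	 */
	if (unlikely(tsk_is_oom_victim(current) ||
		     fatal_signal_pending(current) ||
		     current->flags & PF_EXITING))
		goto force;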
From: Roman Gushchin
Subject: Re: [PATCH 2/2] mm: replace TIF_MEMDIE checks by tsk_is_oom_victim
Date: Thu, 27 Jul 2017 15:55:13 +0100
Message-ID: <20170727145513.GA1185@castle.DHCP.thefacebook.com>
In-Reply-To: <20170727144544.GC31031@dhcp22.suse.cz>
References: <20170727090357.3205-1-mhocko@kernel.org> <20170727090357.3205-3-mhocko@kernel.org> <201707272301.EII82876.tOOJOFLMHFQSFV@I-love.SAKURA.ne.jp> <20170727144544.GC31031@dhcp22.suse.cz>
To: Michal Hocko
Cc: Tetsuo Handa, akpm@linux-foundation.org, rientjes@google.com, hannes@cmpxchg.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org

On Thu, Jul 27, 2017 at 04:45:44PM +0200, Michal Hocko wrote:
> On Thu 27-07-17 23:01:05, Tetsuo Handa wrote:
> > Did we check http://lkml.kernel.org/r/20160909140508.GO4844@dhcp22.suse.cz ?
>
> [...]
>
> I can't find them either, and maybe there are none. But why do we care in
> this particular patch, which merely replaces a TIF_MEMDIE check by
> tsk_is_oom_victim? The code will surely not become less valid. If you
> believe this check is redundant, then send a patch with a clear
> justification. But I would say that, at least from the robustness point of
> view, I would just keep it there. We do not really have any control over
> what happens between clearing signals and setting PF_EXITING.

I agree, this check is probably redundant, but it really makes no
difference; let's keep it bullet-proof. If we care about performance here,
we can rearrange the checks:

	if (unlikely(fatal_signal_pending(current) ||
		     current->flags & PF_EXITING) ||
	    tsk_is_oom_victim(current))
		goto force;

Roman
From: kbuild test robot
Subject: Re: [PATCH 2/2] mm: replace TIF_MEMDIE checks by tsk_is_oom_victim
Date: Sat, 29 Jul 2017 16:33:35 +0800
Message-ID: <201707291609.lv5NwUPI%fengguang.wu@intel.com>
In-Reply-To: <20170727090357.3205-3-mhocko@kernel.org>
To: Michal Hocko
Cc: kbuild-all@01.org, Andrew Morton, David Rientjes, Johannes Weiner, Roman Gushchin, Tetsuo Handa, linux-mm@kvack.org, LKML, Michal Hocko

Hi Michal,

[auto build test ERROR on cgroup/for-next]
[also build test ERROR on v4.13-rc2 next-20170728]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:      https://github.com/0day-ci/linux/commits/Michal-Hocko/mm-oom-do-not-rely-on-TIF_MEMDIE-for-memory-reserves-access/20170728-101955
base:     https://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git for-next
config:   i386-randconfig-c0-07291424 (attached as .config)
compiler: gcc-4.9 (Debian 4.9.4-2) 4.9.4
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386

All errors (new ones prefixed by >>):

   In file included from include/linux/ioport.h:12:0,
                    from include/linux/device.h:16,
                    from include/linux/node.h:17,
                    from include/linux/cpu.h:16,
                    from kernel/cgroup/cpuset.c:25:
   kernel/cgroup/cpuset.c: In function '__cpuset_node_allowed':
>> include/linux/compiler.h:123:18: error: implicit declaration of function 'tsk_is_oom_victim' [-Werror=implicit-function-declaration]
     static struct ftrace_likely_data \
                   ^
   include/linux/compiler.h:156:30: note: in definition of macro '__trace_if'
     if (__builtin_constant_p(!!(cond)) ? !!(cond) : \
                                 ^
   kernel/cgroup/cpuset.c:2546:2: note: in expansion of macro 'if'
     if (unlikely(tsk_is_oom_victim(current)))
     ^
   include/linux/compiler.h:146:24: note: in expansion of macro '__branch_check__'
    # define unlikely(x) (__branch_check__(x, 0, __builtin_constant_p(x)))
                          ^
   kernel/cgroup/cpuset.c:2546:6: note: in expansion of macro 'unlikely'
     if (unlikely(tsk_is_oom_victim(current)))
         ^
   cc1: some warnings being treated as errors
   [the same error is reported a second time for the kernel//cgroup/cpuset.c
   build; the duplicate block is omitted here]

vim +/tsk_is_oom_victim +123 include/linux/compiler.h

   120
   121	#define __branch_check__(x, expect, is_constant) ({		\
   122			int ______r;					\
 @ 123			static struct ftrace_likely_data		\
   124				__attribute__((__aligned__(4)))		\
   125				__attribute__((section("_ftrace_annotated_branch"))) \
   126				______f = {				\
   127				.data.func = __func__,			\
   128				.data.file = __FILE__,			\
   129				.data.line = __LINE__,			\
   130			};						\
   131			______r = __builtin_expect(!!(x), expect);	\
   132			ftrace_likely_update(&______f, ______r,		\
   133						expect, is_constant);	\
   134			______r;					\
   135	})
   136

:::::: The code at line 123 was first introduced by commit
:::::: 134e6a034cb004ed5acd3048792de70ced1c6cf5 tracing: Show number of constants profiled in likely profiler

:::::: TO: Steven Rostedt (VMware)
:::::: CC: Steven Rostedt (VMware)

---
0-DAY kernel test infrastructure              Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                 Intel Corporation

[.config.gz attachment (application/gzip, base64-encoded) omitted]
BVaT+Jg4k2RHn7wmCQ9IjZhBsg2M74tvRcGzyhxBT62aU3H6OFOJqqACUcLOmQ0nhjIHoNFW EtAFLPwa2h4AKllUwT05+GKtUIAz/m8IUrnxuEvWFNKT/iOqPMPhvedhTuBwW+SUeIzDQG4M cbgO0ovv6uNDTyF7xf3TQSkFeqIrG6ruA0Xo8s7l4eV0ObydTw/dgS8jtK1Ch/HmLf728t4R 3KCZ9G/i8/1yeOnlsEF/Hd9+b5OwhTZxk6VNnAK3ouP/pjsHbpyP2Y7vRck8HiI52k54Ud8l fdEV6gGwLEk/rWiH53396dFfF8xS58ujqolVulg7OkGFcDXDFbjhCkbjORXMvSKDA200MlMA tHAnK5OJmNnpaypUKTGsPCXWrwhEOpn0h0TJ2kiLXr6wWEpKasrNVyVH+elmuTSVdy1sHyxs 8FoFOtPhAwxwpXCChyxVl/7XjH9glOmQqgQa8J5VmjBNMjS2Dsq7b6tXIv1xiG8rp0WZ9SkU 7pLR2JjGCuA+shcpG8zI6C1pMJj03UBQJtSOBhWy4cyMJMmsWPshsDSheStpgOXtpUBkBMn1 ToRGJgT10259vQu+rgf9gZmTCq6KkWWGxG7GVsIaDXASswDQyq0AgNnYtJoAwHwyGbiZdzTU eq4okCd13i4Y98mcGYCZDs1uCrmejSx3QgAsmJEP/rrc2hT4DueUdzEg5nbmnyozIZ2xCpGz WZWXzBCewDO7P3DLtBs720ZJXqAGQgJfm9NSp3hH51pJZDAcz4xRUQAzMwOm0hhNRxZgPrVZ ljQoRuMhxUBkbHNjWXW0Sbu586EtZuv7Vkz7FQb92YAavjrpU2oldlMZnwC6KpzmtsvpoO+O qp73l7dnuPeMWy34dXhRprOikZcbbEnC4OyJq+c8dcCwb24m8e332bxrSxEfH6sWlJJFc8uG jTHqnEQjNjDk60IUdUGqEKY6tQrRuKqTFaf+8WrK5mttBPoO623h2wuTviekPCYQmnneduFk PKbuTkBM5kM00DAdLxR0ZNwDAepBmXloFrm0Iel0OBrZhnlsNyFzPyBiNjRtRoNifGOy93rt 6Pobndjjx8vLZ+s5XUUNPfzfx+H14bPR7/wHFRxhKP4okqRZXop3Vtzg/eV0/iM8vl/Oxx8f VRB6RVP8un8/fEmA8PDYS06nt95vUMPvvZ9NC+9GC/9EidTcPKuBeS7r387hXWxGfSv9lga4 l161oFZ3Zb4fsR2nnpFcrkY6yLhe84f758svY2PV0POlV95fDr309Hq8WD1my2g87o+duRz1 B74EYxo57O63j5fj4/Hy2R0Ulg5HVpD9WJq3bhzioWz6nUgxNFeM/m0PYgzvDjvMAL+hrypE DJsR4rAsLmj+9XK4f/84H14Or5feBwyKNYncmUTeTmLT4DrdTalLgGdbnNGpmlGL1TMR5FQn Ip2GgjALc9SLthyOJbRci4Vfw70YeYK9sQR2cJ9+y7IiFPORZwEo5HxKIxfx4IZMtIsIO2lQ kI6GgxndOcTRT+h05FiuAmRKzjsipqb0YlUMWQFLgfX7BvvcnOUiGc77dpJGG0dGaFGowdCK 8fRVMAytQFCXRdmfmEu7bsENyJjI0k0oynZjb9CbvJAwXfRYFtCZYd9FN5tjMBjbbNxoZCoI ZSBGY1vSo0A3ZMiE6mtQ7z0xuRwFsG1MATSekJkoN2IymA0NdeI2yJKxFb95G6XAiJgxF7fJ dDBrdnl6//R6uOjnBnEereHBZ+rY8bf5CFn353ObJ6teFClbZb68imw1Gjj+B6PJcNwn9rmq pnOkd1UZaTCBJ6vb4H81Kc3fng9/OSyDYms23SOEvz48H187I/JPTAWwyipOH/1oVKG3yk0h abRELTCqd2m0uBNL4T4Y66v27XSBA/rYeTiGAmbbekYVuJqMFVckeEv56oPvNE/8JC3mg357 jxbnwzveDsTaWRT9aT9d2YsDnhrUoRcXVh+LZGBeg/q3yxskI5tITKZWUnn12ykEsNGNM6oY oNnyzzah7v0jJ8Dse5SHw/6Ufj98Lxgc0lP6tnpFyxZnnRXn01/HF2QRUBv9eHzXNkHEvZbw EBVhXEb7rUe8WC77ZICl3XziBAwDylmnk/Lw8oYMIjnJabKb96fOqZcW/T7FVSvEjXnW3QnT iUD9No+zTC6sH3seWja9CCp4tiryjPa2QwKZ51QwAFU2Kpd2A8qi2XaO3KaRGX8BflaZRLqS PCQN2HwQ7MYWy49wCRcIGQgPkUu2jqwGTvfnR6N+oyaO9MAgTDoThQV9MkYshIb2bq8KnlPX k+U9AT/00WMWRmBQUsc7YtDUeCmdOrpZsTUUg6N6fMVagivqKKBRHiSziVs3PMI9BQBjhwPg 5TdMz2JcbZjaCEOrst0+K/81aAgLjBe7sN2AlQEWnKQBp900tP0VlM0DyawUfCKSKEqUZZ4k 9vRoHJPxjS8zKeIXUZlwWt+nCVZRyjN6dDUBT3e064pGJ0UwmHlcVzRFGgmPxlHjCy4k5r6h hb+aRuQB2opdo5CpN7GiwqOUn2IkFVZyIqO7Rn2/y+j4rlXJaFWy/aJIKZXz0vSXhB9qI1vG aQiEG3trm9+hJ2eJZ3aEOhJLk4q4KidhZ4ejk7T4+PGuFCbt5q5Mp23TrEWQ7teYpnsjFkMb BT9QObcfzrJ0H+vIk+2+MZFYlt6YQBXAritc//CWAi3RgIB+GgeL7scdzj9P5xd10b3ox3fX 371kxj0t400WYvThpFFtdY0ps7DM7bA2FWi/4FjaVYXXbJMZ3gH1zxYg2zoWmrA6r+jcZdz5 WrXarbR/lPrN3RO6qODdgVmaSQzhx74KlmBb5hmI2HSaQbhwIsiooCBFEu1sg0LdhSp0H6HM Co1DFX7s86XthFKnQIIBTBm1pyorMyvmUhiEC+axSREBdJ4vlhLqzmjLtuXtPliu9CVGEqzy fJVETedogfeSq4VQwJGMZuGCGBd5eDrfN4ENXaHcEa2D1d41+fMAzsZof4vxUbS3kzktqHq1 fUmjnRzul5SECzAjK3Z3BYCtKDDBWZA49SikiIJNSeccAZKxW+EY2QjMrqQ64qCcthxU3ZKD iTJlw2lFCqqLWDiz62Nv+IOvi9DivfC3P/eO2KcLNQH2vcBhcgHnWS1fO6gKgeHRh9aI1ZB9 PjQVlA0YA/l3yKs8s0ystf1vK7ww0GQHFlJ327gGKgg1MQ1OJ63Es2pVTVDL2dQ05SbbC5YB WgVxoEdGU/tGW2OZgME1+OuMJ82otTt26BtjrN88in1rDpl4e+1qSBW6wk77xWHzI9hxZUnh qkBr6DuLgu4UtYyXosnh1h5lGkSKQxWmw6cvmbfIt00ujeiC6ifawqMnrhY2oGWYdb1jlP6K EI66jP4ejXeyomqg1EbWNWyZyv124AKGTqlAWmcP28h8Kca+/bVURwy5wbbA7rI7e4s1MNi4 
/1/YlTU3juPg9/0VqXnardqZiZ2jk4c8UBJtq60rOmInL65M2ttJzSbpylE78+8XAEmJB+iu mqm08UEkRZEgCIKAzrgEf5x5M7Hgmh/afNL7h8e9s9FadCQWQs7sV1DYfs9uMpLngTjPu/ry /PzYG85f6yKX3HXgO+B3WYdssWMyB2d19/tC9L9XPV8vYE6vlB084VBufBb8bW4IYzxKvOJ1 dXryhcPzGnVS0Ievfnl6f724OLv8dfYLxzj0C8tDruqNOLK25JyEcOF2E7x/877//PYKSyvz 7lPSCmuYA2nte6naIG4HevveJRKxCzBYWu5cgSAINjBF1kprdq9lWzmpMvQe2WinZeO2iQiT sOJPmolnK/qeDWI3LGFuJ3YtmkQtn6jqT9D3JShLJMqgqb1kvS31RRqby/qchftjDELKDAqE zajandpGNwf5cuJk33GxL9xphcNyYZ8Oesg8ipxFkXhjLs65jb3HMosVfB5tjG3895DTA435 ec+cn0cLvowglyfn0Sovz/gDFa8AfrvoMp1yOQzcJn4J3h0EJY6wHWdCc56dzd17kT7In/0g l+jSnI2RY1XvfWFDnseay4VVsfFTvrwznhx8HgNw/gQ2fsmXNzuJ0CPNmnntWtf5xa5laINL w4v8sGzasbIMOZWwzUw5Oqgvg53uY0TaWvS5Gx90xG4xKVvOm+EN01LIn7KAlsNfQzAcOTRc VNyty5GjGvI+0g851xX90K69VIEI4ZIarIfr/dvL/r9Hj/cPf6q092YBweTxaM1cFGLZ+Z6/ P96eXj7+VIcKz/v372GEA5XFifyNneWkq0kBXqoU9Ubuj7qCDjwQcoxXSilVsS4dtvjCVYp1 His+Rkb6+vwD1v1fP56e90egqD38+U6v8KDob9ZbTCXSJimvFtxVQlnhBTTSf4ERM5CJ3jYG a7wcul7tjSylqRWlevJqdjw/tS1Abd6ACEFbecnrta0UGRUMXNxpbgV6b4aPJ7W90JLgqjeV fRxo9oCWIiAxuHfnt1cxwtYbdyaoAJQYGt4xXXmY6pa6KrgNB0WJ3AjYQqiOaGraZXR+B2m6 841Vk+sWxudGijX5zfLxbSjqLepTFFsiJI7apvpQV8d/zTgu/9azagGqbeQC+48pSNtRtv/j 8/t3ZyZRt8ttj1GHXfuDKgdxCpDOaZj4LPQBXgm0w7+69F1V6y11lAPji/ovQCytXPj0tobN qjAhFh2oTr7CV+4iZOioYqEf817ScCxAHrJj2mWjo29+8LuMqGOytj+bqU0HGtbxZsEQghEE 8mnA0fbTAvW8NrJp5hfbFYKNxqri7dPgKWVZwOANm2SQaCPwEGQNm3ulqHtP33ACYQzJqnlU 9B/m4TCZrIMr/3oQhjnzhfUshInTsGGNiGmVL3VMgrBL6L1wh70o6k0geHiQHqfXw27jRdZK RZZRu1+cnkfohfj5Q8n+1f3Ld/eWR73o0UAzND/xSBZtdoBvbANCsLuCOdiLzmmdEj0jRIO+ HmA4zY/tE8G6x9ghpcXYRIJcRXl3N6IY5BQEaHMNshdEdFY7MgU5QYDXjk3LIeuCZi5oGj6l kIAvkY1mn6lrieyvpS5Mhsw4rEe/rDK1QB0QEdiqtZSNZ5lSDhPosDrK6qN/vv94ekEn1vd/ Hz1/fuz/2sM/9h8Pv/32279CZaDtYUXv5ZbNPKMH3XS7zJ1B6rlw9mw2CgPRUW/wZODAi5Hd MVgxbJPFDWtjnHw9sADs6Gj7TUi0QtrJUKfqcUkRTT4KfDfEJxYP0wHUULnzw2KZETi+77Rk sJqnZRbBL0+gXRkpGdAZGIBVykxiutYqi4Tr1hJWifHoy8P/Ohdp8Op5uCQ2OUvulj6FjK+5 o+MoIG0lJioCFWN0DoEVy1ElvI+LMOMWY3W53UW4/KGHSuxbIO49ayEo06HroYfNNJ/PnCf1 F7FI8roLJ78e59dac2sDnc3jVOZ10JvwsIy/Eml6dCfbtm5BsHxVKihnWFfWR8NhGy/zApdr l6J0I5pjHlCKNSpN14PTUQSR453qIu8ZCnRsHrHs8YAtcBb8vMG2om4V4PJMMwTNbn0sHCbs Uar0lr/7jmcGdjlBiPeKnAoxBIm3Ci+GSrXiMLpsRbPieczmbeENKQbcbfJ+hZFYO78eBZek yVEqLCc3CSUrve1SGs7ISfuLoBCYjU4eF7oirktTRXviqlUxkNx2q6Z4+SJbirPkXRmkMAvE 72j58KfHCdDB26Zhp1lF0VDdAKNo3Pqd8ox/h1+QZgw/tv8lwm9snbUwH5g7nGqvQcNaBG1Q q3tY7moD45UpbqpYD1T1VbklWX+hrgLNdFWHn84AowobdqPcJbCswDdQgRO94z0HkzBFqshV X80gKpBcaCnQT7KKxMgMQ9WwMZVGO1opSmGHJsWa3A/oPmos9dQAtSdSjUteG4sxGBkemcs/ n8bjQNId0PrDMZjcwVDoBSxmTbDcTccXZV4H7+7NhdIeAitY4O1wwNMxyyhDdgkI1VUp2O2o PVNHPmeJthhizXeGowRFHNvoBRE1rVffxTjtKJXi84XsXv3+/cOxTxTrzPawxGpJo4GdkC0L iN55wXCTaaEAPS7e422Cx71xnIw8+Eosm9nGklJ6fsrpjNi4ldxmQ9n4Te6pc1eyaBzdi8A1 oH3tXHElOtkXuUwshCZ57wwPIg6DHY+NSC3suVcUo8pvqXCts6jV5ZmkhCuzk8tTCrqL22xe P8IYvU1+4PCTKjnohKS6ho4u4/hA9lV+Csky8pXITgL6CpqQYILiJQXP7NUJTA3CSQ1rO7/M nIjJ+PuQZWNIOqH9SvI7Epr206O10TBW9a4aCs7SQbj9bFgyLzGJTRT5sipjMZN03XzFlpEG nQJ3eacWdemIOBzNaa95OIu0aItbYx4fOkutxZBzeltDhgI7Bo79lF2bU1qWLDmN0a9xt82S 1K226XFeehfjJ4DZJXAacVYPMJOUxS14Ak+zi4FNJ6FDl/Re7jwcauNCEKo+eCMXRz8l6tkd by+OJ/OGj8EnmvGYmkFXcx5FReHqxJKCBsXqeDk5cbDZsEZ8CI5GRshXT8Z+NB4aVhPtOBR6 W0bHL2hk4vWCtBEHBFMNwqDEaZRXoETF7BOqJlLCD+BVOdkfmPfB8aU3TY3jA6oCV+FiEjkm 6vYPn294ZyY41NIpG01BsJ7AOomqOwC4ytj+cgF7j6nHZLbz8z5qhy+NsC8MwC5bQe9JlQqR zVarvSIxoHZHTuckKrwTmpiLpoEW/hQhD/IKGjdQzO3mVu2K3dieAdMBiLbWXeMK2QXMPXRZ 6+qhjWVDBY05T6kYzNKs1vMD43jqD+divYde/TI6mmzrVhkdrC5QUUGNCpW+/f3j4/Xo4fVt f/T6dvS4/+8PO8m4DiEqiqWwA+g75HlIVwcTITFkBdU9zZuVrRL7SPjQysnuaBFD1tbZfI40 
[base64-encoded attachment omitted]

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 31 Jul 2017 08:46:26 +0200
From: Michal Hocko
Subject: Re: [PATCH 2/2] mm: replace TIF_MEMDIE checks by tsk_is_oom_victim
Message-ID: <20170731064625.GB13036@dhcp22.suse.cz>
In-Reply-To: <201707291609.lv5NwUPI%fengguang.wu@intel.com>

On Sat 29-07-17 16:33:35, kbuild test robot wrote:
> Hi Michal,
>
> [auto build test ERROR on cgroup/for-next]
> [also build test ERROR on v4.13-rc2 next-20170728]
> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
>
> url:    https://github.com/0day-ci/linux/commits/Michal-Hocko/mm-oom-do-not-rely-on-TIF_MEMDIE-for-memory-reserves-access/20170728-101955
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git for-next
> config: i386-randconfig-c0-07291424 (attached as .config)
> compiler: gcc-4.9 (Debian 4.9.4-2) 4.9.4
> reproduce:
>         # save the attached .config to linux build tree
>         make ARCH=i386
>
> All errors (new ones prefixed by >>):
>
>    In file included from include/linux/ioport.h:12:0,
>                     from include/linux/device.h:16,
>                     from include/linux/node.h:17,
>                     from include/linux/cpu.h:16,
>                     from kernel/cgroup/cpuset.c:25:
>    kernel/cgroup/cpuset.c: In function '__cpuset_node_allowed':
> >> include/linux/compiler.h:123:18: error: implicit declaration of function 'tsk_is_oom_victim' [-Werror=implicit-function-declaration]

Thanks for the report. We need this
---
commit 638b5ab1ed275f23b52a71941b66c8966d332cd7
Author: Michal Hocko
Date:   Mon Jul 31 08:45:53 2017 +0200

    fold me - fix implicit declaration of function 'tsk_is_oom_victim'

    reported by 0day

    Signed-off-by: Michal Hocko

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 1cc53dff0d94..734ae4fa9775 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -56,6 +56,7 @@
 #include <linux/time64.h>
 #include <linux/backing-dev.h>
 #include <linux/sort.h>
+#include <linux/oom.h>
 
 #include <linux/uaccess.h>
 #include <linux/atomic.h>
-- 
Michal Hocko
SUSE Labs
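[Note: for reference, tsk_is_oom_victim() is a short static inline
helper; in 4.13-era kernels it is defined in include/linux/oom.h roughly
as below (approximate, quoted from memory), which is why cpuset.c needs
the extra include once __cpuset_node_allowed() starts using it:

        static inline bool tsk_is_oom_victim(struct task_struct *tsk)
        {
                /*
                 * signal->oom_mm is set once by the oom killer, so the
                 * test is stable for every thread of the victim process,
                 * unlike the per-thread TIF_MEMDIE flag.
                 */
                return tsk->signal->oom_mm;
        }
]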
From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 1 Aug 2017 14:16:44 +0200
From: Michal Hocko
Subject: Re: [PATCH 0/2] mm, oom: do not grant oom victims full memory reserves access
Message-ID: <20170801121643.GI15774@dhcp22.suse.cz>
In-Reply-To: <20170727090357.3205-1-mhocko@kernel.org>

On Thu 27-07-17 11:03:55, Michal Hocko wrote:
> Hi,
> this is a part of a larger series I posted back in Oct last year [1]. I
> have dropped patch 3 because it was incorrect and patch 4 is not
> applicable without it.
>
> The primary reason to apply patch 1 is to remove a risk of the complete
> memory depletion by oom victims. While this is a theoretical risk right
> now there is a demand for memcg aware oom killer which might kill all
> processes inside a memcg which can be a lot of tasks. That would make
> the risk quite real.
>
> This issue is addressed by limiting access to memory reserves. We no
> longer use TIF_MEMDIE to grant the access and use tsk_is_oom_victim
> instead. See Patch 1 for more details. Patch 2 is a trivial follow up
> cleanup.

Any comments, concerns? Can we merge it?

> I would still like to get rid of TIF_MEMDIE completely but I do not have
> time to do it now and it is not a pressing issue.
>
> [1] http://lkml.kernel.org/r/20161004090009.7974-1-mhocko@kernel.org
-- 
Michal Hocko
SUSE Labs

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 1 Aug 2017 13:23:44 +0100
From: Roman Gushchin
Subject: Re: [PATCH 0/2] mm, oom: do not grant oom victims full memory reserves access
Message-ID: <20170801122344.GA8457@castle.DHCP.thefacebook.com>
In-Reply-To: <20170801121643.GI15774@dhcp22.suse.cz>

On Tue, Aug 01, 2017 at 02:16:44PM +0200, Michal Hocko wrote:
> On Thu 27-07-17 11:03:55, Michal Hocko wrote:
> > Hi,
> > this is a part of a larger series I posted back in Oct last year [1]. I
> > have dropped patch 3 because it was incorrect and patch 4 is not
> > applicable without it.
[...]
> Any comments, concerns? Can we merge it?

I've rebased the cgroup-aware OOM killer and ran some tests.
Everything works well.

Thanks!

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 1 Aug 2017 14:29:05 +0200
From: Michal Hocko
Subject: Re: [PATCH 0/2] mm, oom: do not grant oom victims full memory reserves access
Message-ID: <20170801122905.GL15774@dhcp22.suse.cz>
In-Reply-To: <20170801122344.GA8457@castle.DHCP.thefacebook.com>

On Tue 01-08-17 13:23:44, Roman Gushchin wrote:
> On Tue, Aug 01, 2017 at 02:16:44PM +0200, Michal Hocko wrote:
[...]
> > Any comments, concerns? Can we merge it?
>
> I've rebased the cgroup-aware OOM killer and ran some tests.
> Everything works well.

Thanks for your testing. Can I assume your Tested-by?
-- 
Michal Hocko
SUSE Labs

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 1 Aug 2017 13:42:38 +0100
From: Roman Gushchin
Subject: Re: [PATCH 0/2] mm, oom: do not grant oom victims full memory reserves access
Message-ID: <20170801124238.GA9497@castle.dhcp.TheFacebook.com>
In-Reply-To: <20170801122905.GL15774@dhcp22.suse.cz>

On Tue, Aug 01, 2017 at 02:29:05PM +0200, Michal Hocko wrote:
> On Tue 01-08-17 13:23:44, Roman Gushchin wrote:
[...]
> > I've rebased the cgroup-aware OOM killer and ran some tests.
> > Everything works well.
>
> Thanks for your testing. Can I assume your Tested-by?

Sure.

I wonder if we can get rid of TIF_MEMDIE completely, if we count OOM
victims on a per-signal_struct rather than a per-thread basis? Say,
assign oom_mm using cmpxchg, and call exit_oom_victim() from
__exit_signal()? __thaw_task() can be called from mark_oom_victim()
unconditionally.

Do you see any problems with this approach?
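[Note: a rough sketch of the idea Roman floats here, purely illustrative
and not a tested patch; the helper and field names follow the 4.13-era
sources, but the __exit_signal() hook is hypothetical:

        /* mark_oom_victim() already publishes the victim's mm exactly
         * once per process, so the accounting could piggyback on that
         * cmpxchg instead of on the per-thread TIF_MEMDIE flag: */
        if (cmpxchg(&tsk->signal->oom_mm, NULL, mm) == NULL) {
                mmgrab(mm);
                atomic_inc(&oom_victims);
        }
        __thaw_task(tsk);       /* unconditionally, as suggested above */

        /* ...and the release would move to __exit_signal() in
         * kernel/exit.c, run by whichever thread dies last: */
        if (group_dead && tsk->signal->oom_mm)
                exit_oom_victim();

Michal's reply below explains why the release half is the hard part.]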
From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 1 Aug 2017 14:54:57 +0200
From: Michal Hocko
Subject: Re: [PATCH 0/2] mm, oom: do not grant oom victims full memory reserves access
Message-ID: <20170801125457.GM15774@dhcp22.suse.cz>
In-Reply-To: <20170801124238.GA9497@castle.dhcp.TheFacebook.com>

On Tue 01-08-17 13:42:38, Roman Gushchin wrote:
> On Tue, Aug 01, 2017 at 02:29:05PM +0200, Michal Hocko wrote:
[...]
> > Thanks for your testing. Can I assume your Tested-by?
>
> Sure.

Thanks!

> I wonder if we can get rid of TIF_MEMDIE completely, if we count OOM
> victims on a per-signal_struct rather than a per-thread basis? Say,
> assign oom_mm using cmpxchg, and call exit_oom_victim() from
> __exit_signal()? __thaw_task() can be called from mark_oom_victim()
> unconditionally.
>
> Do you see any problems with this approach?

Ohh, I wish we could do that. All my previous attempts failed though. I
have always hit the problem of telling when the last thread of the
process is exiting, in order to know when to call exit_oom_victim and
release the oom disable barrier. Maybe things have changed since I last
tried, but this is tricky code. I will certainly get back to it some day
but not likely anytime soon.
-- 
Michal Hocko
SUSE Labs
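[Note: the coupling Michal refers to sits in the exit path. Roughly, in
4.13-era kernel/exit.c (abridged), the exiting thread itself has to know
it was the marked victim in order to lift the oom_killer_disable()
barrier, and TIF_MEMDIE is currently the only marker carrying that
information:

        static void exit_mm(void)
        {
                struct mm_struct *mm = current->mm;

                /* ... tear down and drop the address space ... */
                mmput(mm);
                if (test_thread_flag(TIF_MEMDIE))
                        exit_oom_victim();
        }

Moving exit_oom_victim() to __exit_signal() would have to preserve this
"exactly one wakeup per victim" property.]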
From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH 1/2] mm, oom: do not rely on TIF_MEMDIE for memory reserves access
From: Tetsuo Handa
In-Reply-To: <20170727090357.3205-2-mhocko@kernel.org>
Message-Id: <201708020030.ACB04683.JLHMFVOSFFOtOQ@I-love.SAKURA.ne.jp>
Date: Wed, 2 Aug 2017 00:30:33 +0900

Michal Hocko wrote:
> CONFIG_MMU=n doesn't have oom reaper so let's stick to the original
> ALLOC_NO_WATERMARKS approach but be careful because they still might
> deplete all the memory reserves so keep the semantic as close to the
> original implementation as possible and give them access to memory
> reserves only up to exit_mm (when tsk->mm is cleared) rather than while
> tsk_is_oom_victim which is until signal struct is gone.

Currently memory allocations from __mmput() can use memory reserves but
this patch changes __mmput() not to use memory reserves. You say "keep
the semantic as close to the original implementation as possible" but
this change is not guaranteed to be safe.

> @@ -2943,10 +2943,19 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
>  	 * the high-atomic reserves. This will over-estimate the size of the
>  	 * atomic reserve but it avoids a search.
>  	 */
> -	if (likely(!alloc_harder))
> +	if (likely(!alloc_harder)) {
>  		free_pages -= z->nr_reserved_highatomic;
> -	else
> -		min -= min / 4;
> +	} else {
> +		/*
> +		 * OOM victims can try even harder than normal ALLOC_HARDER
> +		 * users
> +		 */
> +		if (alloc_flags & ALLOC_OOM)

ALLOC_OOM is ALLOC_NO_WATERMARKS if CONFIG_MMU=n. I wonder whether this
test makes sense for ALLOC_NO_WATERMARKS.

> +			min -= min / 2;
> +		else
> +			min -= min / 4;
> +	}
> +
>  
>  #ifdef CONFIG_CMA
>  	/* If allocation can't use CMA areas don't use free CMA pages */
> @@ -3603,6 +3612,22 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
>  	return alloc_flags;
>  }
>  
> +static bool oom_reserves_allowed(struct task_struct *tsk)
> +{
> +	if (!tsk_is_oom_victim(tsk))
> +		return false;
> +
> +	/*
> +	 * !MMU doesn't have oom reaper so we shouldn't risk the memory reserves
> +	 * depletion and shouldn't give access to memory reserves passed the
> +	 * exit_mm
> +	 */
> +	if (!IS_ENABLED(CONFIG_MMU) && !tsk->mm)
> +		return false;

Branching based on CONFIG_MMU is ugly. I suggest timeout-based next OOM
victim selection if CONFIG_MMU=n. Then we no longer need to worry about
memory reserves depletion and we can treat both configurations equally.

> +
> +	return true;
> +}
> +
>  bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
>  {
>  	if (unlikely(gfp_mask & __GFP_NOMEMALLOC))
> @@ -3770,6 +3795,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  	unsigned long alloc_start = jiffies;
>  	unsigned int stall_timeout = 10 * HZ;
>  	unsigned int cpuset_mems_cookie;
> +	bool reserves;
>  
>  	/*
>  	 * In the slowpath, we sanity check order to avoid ever trying to
> @@ -3875,15 +3901,24 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  	if (gfp_mask & __GFP_KSWAPD_RECLAIM)
>  		wake_all_kswapds(order, ac);
>  
> -	if (gfp_pfmemalloc_allowed(gfp_mask))
> -		alloc_flags = ALLOC_NO_WATERMARKS;
> +	/*
> +	 * Distinguish requests which really need access to whole memory
> +	 * reserves from oom victims which can live with their own reserve
> +	 */
> +	reserves = gfp_pfmemalloc_allowed(gfp_mask);
> +	if (reserves) {
> +		if (tsk_is_oom_victim(current))
> +			alloc_flags = ALLOC_OOM;

If reserves == true due to reasons other than tsk_is_oom_victim(current) == true
(e.g. __GFP_MEMALLOC), why dare to reduce it?

> +		else
> +			alloc_flags = ALLOC_NO_WATERMARKS;
> +	}

If CONFIG_MMU=n, doing this test is silly.

	if (tsk_is_oom_victim(current))
		alloc_flags = ALLOC_NO_WATERMARKS;
	else
		alloc_flags = ALLOC_NO_WATERMARKS;

>  
>  	/*
>  	 * Reset the zonelist iterators if memory policies can be ignored.
>  	 * These allocations are high priority and system rather than user
>  	 * orientated.
>  	 */
> -	if (!(alloc_flags & ALLOC_CPUSET) || (alloc_flags & ALLOC_NO_WATERMARKS)) {
> +	if (!(alloc_flags & ALLOC_CPUSET) || reserves) {
>  		ac->zonelist = node_zonelist(numa_node_id(), gfp_mask);
>  		ac->preferred_zoneref = first_zones_zonelist(ac->zonelist,
>  					ac->high_zoneidx, ac->nodemask);
> @@ -3960,7 +3995,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  		goto got_pg;
>  
>  	/* Avoid allocations with no watermarks from looping endlessly */
> -	if (test_thread_flag(TIF_MEMDIE) &&
> +	if (tsk_is_oom_victim(current) &&
>  	    (alloc_flags == ALLOC_NO_WATERMARKS ||
>  	    (gfp_mask & __GFP_NOMEMALLOC)))
>  		goto nopage;

And you are silently changing this to "!costly __GFP_DIRECT_RECLAIM
allocations never fail (even when selected as OOM victims)" (i.e.
updating the too-small-to-fail memory allocation rule), because
alloc_flags == ALLOC_NO_WATERMARKS is never true for OOM victims if
CONFIG_MMU=y.

Applying this change might disturb memory allocation behavior. I don't
like this patch.
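[Note: Tetsuo's last objection is the subtle one. Under the v1 patch a
CONFIG_MMU=y oom victim carries ALLOC_OOM rather than
ALLOC_NO_WATERMARKS, so the quoted bail-out can no longer trigger for
it:

        /* Avoid allocations with no watermarks from looping endlessly */
        if (tsk_is_oom_victim(current) &&
            (alloc_flags == ALLOC_NO_WATERMARKS ||  /* never true: the
                                                       victim now carries
                                                       ALLOC_OOM */
             (gfp_mask & __GFP_NOMEMALLOC)))
                goto nopage;

A victim whose reduced reserve is already depleted would therefore keep
looping in the allocator instead of failing the allocation. Michal
acknowledges this below and changes the test to ALLOC_OOM.]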
From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 1 Aug 2017 18:52:42 +0200
From: Michal Hocko
Subject: Re: [PATCH 1/2] mm, oom: do not rely on TIF_MEMDIE for memory reserves access
Message-ID: <20170801165242.GA15518@dhcp22.suse.cz>
In-Reply-To: <201708020030.ACB04683.JLHMFVOSFFOtOQ@I-love.SAKURA.ne.jp>

On Wed 02-08-17 00:30:33, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > CONFIG_MMU=n doesn't have oom reaper so let's stick to the original
> > ALLOC_NO_WATERMARKS approach but be careful because they still might
> > deplete all the memory reserves so keep the semantic as close to the
> > original implementation as possible and give them access to memory
> > reserves only up to exit_mm (when tsk->mm is cleared) rather than while
> > tsk_is_oom_victim which is until signal struct is gone.
>
> Currently memory allocations from __mmput() can use memory reserves but
> this patch changes __mmput() not to use memory reserves. You say "keep
> the semantic as close to the original implementation as possible" but
> this change is not guaranteed to be safe.

Yeah, it cannot. That's why I've said as close as possible rather than
equivalent. On the other hand I am wondering whether you have anything
specific in mind or whether this is just a formalistic nitpicking^Wremark.

> ALLOC_OOM is ALLOC_NO_WATERMARKS if CONFIG_MMU=n. I wonder whether this
> test makes sense for ALLOC_NO_WATERMARKS.

Yeah, it would be pointless because get_page_from_freelist will then
ignore the result of the watermark check for ALLOC_NO_WATERMARKS. It is
not harmful though. I didn't find a much better way without making the
code harder to read. Do you have any suggestion?

> Branching based on CONFIG_MMU is ugly. I suggest timeout-based next OOM
> victim selection if CONFIG_MMU=n.

I suggest we do not argue about nommu without actually optimizing for or
fixing nommu, which we are not doing here. I am even not sure memory
reserves can ever be depleted for that config. Anyway I will go with the
following instead:

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5e5911f40014..3510e06b3bf3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3618,11 +3618,10 @@ static bool oom_reserves_allowed(struct task_struct *tsk)
 		return false;
 
 	/*
-	 * !MMU doesn't have oom reaper so we shouldn't risk the memory reserves
-	 * depletion and shouldn't give access to memory reserves passed the
-	 * exit_mm
+	 * !MMU doesn't have oom reaper so give access to memory reserves
+	 * only to the thread with TIF_MEMDIE set
 	 */
-	if (!IS_ENABLED(CONFIG_MMU) && !tsk->mm)
+	if (!IS_ENABLED(CONFIG_MMU) && !test_thread_flag(TIF_MEMDIE))
 		return false;
 
 	return true;

This should preserve the original semantic. Is that acceptable for you?

> > +	reserves = gfp_pfmemalloc_allowed(gfp_mask);
> > +	if (reserves) {
> > +		if (tsk_is_oom_victim(current))
> > +			alloc_flags = ALLOC_OOM;
>
> If reserves == true due to reasons other than tsk_is_oom_victim(current) == true
> (e.g. __GFP_MEMALLOC), why dare to reduce it?

Well, the comment above tries to explain. I assume that the oom victim is
special here: a) it is on the way to die and b) we know that something
will be freeing memory in the background, so I assume this is acceptable.

> If CONFIG_MMU=n, doing this test is silly.
>
> 	if (tsk_is_oom_victim(current))
> 		alloc_flags = ALLOC_NO_WATERMARKS;
> 	else
> 		alloc_flags = ALLOC_NO_WATERMARKS;

I am pretty sure any compiler can see the outcome is the same so the
check would be dropped in that case. I primarily wanted to prevent an
additional ifdefery. I am open to suggestions for a better layout though.

> And you are silently changing this to "!costly __GFP_DIRECT_RECLAIM
> allocations never fail (even when selected as OOM victims)" (i.e.
> updating the too-small-to-fail memory allocation rule), because
> alloc_flags == ALLOC_NO_WATERMARKS is never true for OOM victims if
> CONFIG_MMU=y.

Oops, that is an oversight during the rebase.

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5e5911f40014..6593ff9de1d9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3996,7 +3996,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 
 	/* Avoid allocations with no watermarks from looping endlessly */
 	if (tsk_is_oom_victim(current) &&
-	    (alloc_flags == ALLOC_NO_WATERMARKS ||
+	    (alloc_flags == ALLOC_OOM ||
 	     (gfp_mask & __GFP_NOMEMALLOC)))
 		goto nopage;

Does this look better?

> Applying this change might disturb memory allocation behavior. I don't
> like this patch.

Do you see anything apart from nommu that would be an unfixable roadblock?
-- 
Michal Hocko
SUSE Labs
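[Note: to put numbers on the "half of the reserves" idea that the
__zone_watermark_ok() change in patch 1 implements: the zone's min
watermark is lowered per allocation class, so assuming a zone with
min = 1024 pages the effective floors become:

        long min = 1024;                /* assumed min watermark, pages */

        long harder = min - min / 4;    /* 768: ALLOC_HARDER (atomic)   */
        long victim = min - min / 2;    /* 512: ALLOC_OOM oom victims   */
                                        /*   0: ALLOC_NO_WATERMARKS     */

An oom victim may thus dig twice as deep below min as an atomic
allocation, but unlike ALLOC_NO_WATERMARKS it can never empty the
reserves completely.]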
From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 2 Aug 2017 08:10:22 +0200
From: Michal Hocko
Subject: Re: [PATCH 1/2] mm, oom: do not rely on TIF_MEMDIE for memory reserves access
Message-ID: <20170802061022.GA25318@dhcp22.suse.cz>
In-Reply-To: <20170801165242.GA15518@dhcp22.suse.cz>

On Tue 01-08-17 18:52:42, Michal Hocko wrote:
> On Wed 02-08-17 00:30:33, Tetsuo Handa wrote:
[...]
> > > -	if (gfp_pfmemalloc_allowed(gfp_mask))
> > > -		alloc_flags = ALLOC_NO_WATERMARKS;
> > > +	/*
> > > +	 * Distinguish requests which really need access to whole memory
> > > +	 * reserves from oom victims which can live with their own reserve
> > > +	 */
> > > +	reserves = gfp_pfmemalloc_allowed(gfp_mask);
> > > +	if (reserves) {
> > > +		if (tsk_is_oom_victim(current))
> > > +			alloc_flags = ALLOC_OOM;
> >
> > If reserves == true due to reasons other than tsk_is_oom_victim(current) == true
> > (e.g. __GFP_MEMALLOC), why dare to reduce it?
>
> Well, the comment above tries to explain. I assume that the oom victim is
> special here: a) it is on the way to die and b) we know that something
> will be freeing memory in the background, so I assume this is acceptable.

I was thinking about this some more. It is not that hard to achieve the
original semantic. The code is slightly uglier but acceptable, I guess.
What do you think about the following?
---
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3510e06b3bf3..7ae0f6d45614 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3627,21 +3627,31 @@ static bool oom_reserves_allowed(struct task_struct *tsk)
 	return true;
 }
 
-bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
+/*
+ * Distinguish requests which really need access to full memory
+ * reserves from oom victims which can live with a portion of it
+ */
+static inline int __gfp_pfmemalloc_flags(gfp_t gfp_mask)
 {
 	if (unlikely(gfp_mask & __GFP_NOMEMALLOC))
-		return false;
-
+		return 0;
 	if (gfp_mask & __GFP_MEMALLOC)
-		return true;
+		return ALLOC_NO_WATERMARKS;
 	if (in_serving_softirq() && (current->flags & PF_MEMALLOC))
-		return true;
-	if (!in_interrupt() &&
-	    ((current->flags & PF_MEMALLOC) ||
-	    oom_reserves_allowed(current)))
-		return true;
+		return ALLOC_NO_WATERMARKS;
+	if (!in_interrupt()) {
+		if (current->flags & PF_MEMALLOC)
+			return ALLOC_NO_WATERMARKS;
+		else if (oom_reserves_allowed(current))
+			return ALLOC_OOM;
+	}
 
-	return false;
+	return 0;
+}
+
+bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
+{
+	return __gfp_pfmemalloc_flags(gfp_mask) > 0;
 }
 
 /*
@@ -3794,7 +3804,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	unsigned long alloc_start = jiffies;
 	unsigned int stall_timeout = 10 * HZ;
 	unsigned int cpuset_mems_cookie;
-	bool reserves;
+	int reserves;
 
 	/*
 	 * In the slowpath, we sanity check order to avoid ever trying to
@@ -3900,17 +3910,9 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	if (gfp_mask & __GFP_KSWAPD_RECLAIM)
 		wake_all_kswapds(order, ac);
 
-	/*
-	 * Distinguish requests which really need access to whole memory
-	 * reserves from oom victims which can live with their own reserve
-	 */
-	reserves = gfp_pfmemalloc_allowed(gfp_mask);
-	if (reserves) {
-		if (tsk_is_oom_victim(current))
-			alloc_flags = ALLOC_OOM;
-		else
-			alloc_flags = ALLOC_NO_WATERMARKS;
-	}
+	reserves = __gfp_pfmemalloc_flags(gfp_mask);
+	if (reserves)
+		alloc_flags = reserves;
 
 	/*
 	 * Reset the zonelist iterators if memory policies can be ignored.
-- 
Michal Hocko
SUSE Labs

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 2 Aug 2017 10:29:14 +0200
From: Michal Hocko
Subject: [PATCH v2 1/2] mm, oom: do not rely on TIF_MEMDIE for memory reserves access
Message-ID: <20170802082914.GF2524@dhcp22.suse.cz>
In-Reply-To: <20170727090357.3205-2-mhocko@kernel.org>

With the follow-up changes based on Tetsuo's review.
I have dropped Roman's Tested-by because at least the allocation exit for oom victim behavior has changed. --- commit b1cfab26a98d48d69dc106d0e0ab616728b04992 Author: Michal Hocko Date: Wed Jul 26 16:38:40 2017 +0200 mm, oom: do not rely on TIF_MEMDIE for memory reserves access For ages we have been relying on TIF_MEMDIE thread flag to mark OOM victims and then, among other things, to give these threads full access to memory reserves. There are few shortcomings of this implementation, though. First of all and the most serious one is that the full access to memory reserves is quite dangerous because we leave no safety room for the system to operate and potentially do last emergency steps to move on. Secondly this flag is per task_struct while the OOM killer operates on mm_struct granularity so all processes sharing the given mm are killed. Giving the full access to all these task_structs could lead to a quick memory reserves depletion. We have tried to reduce this risk by giving TIF_MEMDIE only to the main thread and the currently allocating task but that doesn't really solve this problem while it surely opens up a room for corner cases - e.g. GFP_NO{FS,IO} requests might loop inside the allocator without access to memory reserves because a particular thread was not the group leader. Now that we have the oom reaper and that all oom victims are reapable after 1b51e65eab64 ("oom, oom_reaper: allow to reap mm shared by the kthreads") we can be more conservative and grant only partial access to memory reserves because there are reasonable chances of the parallel memory freeing. We still want some access to reserves because we do not want other consumers to eat up the victim's freed memory. oom victims will still contend with __GFP_HIGH users but those shouldn't be so aggressive to starve oom victims completely. Introduce ALLOC_OOM flag and give all tsk_is_oom_victim tasks access to the half of the reserves. This makes the access to reserves independent on which task has passed through mark_oom_victim. Also drop any usage of TIF_MEMDIE from the page allocator proper and replace it by tsk_is_oom_victim as well which will make page_alloc.c completely TIF_MEMDIE free finally. CONFIG_MMU=n doesn't have oom reaper so let's stick to the original ALLOC_NO_WATERMARKS approach. There is a demand to make the oom killer memcg aware which will imply many tasks killed at once. This change will allow such a usecase without worrying about complete memory reserves depletion. 
Changes since v1
- do not play tricks with nommu and grant access to memory reserves as
  long as TIF_MEMDIE is set
- break out from the allocation properly for oom victims as per Tetsuo
- distinguish oom victims from other consumers of memory reserves in
  __gfp_pfmemalloc_flags - per Tetsuo

Signed-off-by: Michal Hocko

diff --git a/mm/internal.h b/mm/internal.h
index 24d88f084705..1ebcb1ed01b5 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -480,6 +480,17 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone,
 /* Mask to get the watermark bits */
 #define ALLOC_WMARK_MASK	(ALLOC_NO_WATERMARKS-1)
 
+/*
+ * Only MMU archs have async oom victim reclaim - aka oom_reaper so we
+ * cannot assume a reduced access to memory reserves is sufficient for
+ * !MMU
+ */
+#ifdef CONFIG_MMU
+#define ALLOC_OOM		0x08
+#else
+#define ALLOC_OOM		ALLOC_NO_WATERMARKS
+#endif
+
 #define ALLOC_HARDER		0x10 /* try to alloc harder */
 #define ALLOC_HIGH		0x20 /* __GFP_HIGH set */
 #define ALLOC_CPUSET		0x40 /* check for correct cpuset */
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 9e8b4f030c1c..c9f3569a76c7 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -824,7 +824,8 @@ static void oom_kill_process(struct oom_control *oc, const char *message)
 
 	/*
 	 * If the task is already exiting, don't alarm the sysadmin or kill
-	 * its children or threads, just set TIF_MEMDIE so it can die quickly
+	 * its children or threads, just give it access to memory reserves
+	 * so it can die quickly
 	 */
 	task_lock(p);
 	if (task_will_free_mem(p)) {
@@ -889,9 +890,9 @@ static void oom_kill_process(struct oom_control *oc, const char *message)
 	count_memcg_event_mm(mm, OOM_KILL);
 
 	/*
-	 * We should send SIGKILL before setting TIF_MEMDIE in order to prevent
-	 * the OOM victim from depleting the memory reserves from the user
-	 * space under its control.
+	 * We should send SIGKILL before granting access to memory reserves
+	 * in order to prevent the OOM victim from depleting the memory
+	 * reserves from the user space under its control.
 	 */
 	do_send_sig_info(SIGKILL, SEND_SIG_FORCED, victim, true);
 	mark_oom_victim(victim);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 80e4adb4c360..7ae0f6d45614 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2930,7 +2930,7 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
 {
 	long min = mark;
 	int o;
-	const bool alloc_harder = (alloc_flags & ALLOC_HARDER);
+	const bool alloc_harder = (alloc_flags & (ALLOC_HARDER|ALLOC_OOM));
 
 	/* free_pages may go negative - that's OK */
 	free_pages -= (1 << order) - 1;
@@ -2943,10 +2943,19 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
 	 * the high-atomic reserves. This will over-estimate the size of the
 	 * atomic reserve but it avoids a search.
 	 */
-	if (likely(!alloc_harder))
+	if (likely(!alloc_harder)) {
 		free_pages -= z->nr_reserved_highatomic;
-	else
-		min -= min / 4;
+	} else {
+		/*
+		 * OOM victims can try even harder than normal ALLOC_HARDER
+		 * users
+		 */
+		if (alloc_flags & ALLOC_OOM)
+			min -= min / 2;
+		else
+			min -= min / 4;
+	}
+
 
 #ifdef CONFIG_CMA
 	/* If allocation can't use CMA areas don't use free CMA pages */
@@ -3184,7 +3193,7 @@ static void warn_alloc_show_mem(gfp_t gfp_mask, nodemask_t *nodemask)
 	 * of allowed nodes.
 	 */
 	if (!(gfp_mask & __GFP_NOMEMALLOC))
-		if (test_thread_flag(TIF_MEMDIE) ||
+		if (tsk_is_oom_victim(current) ||
 		    (current->flags & (PF_MEMALLOC | PF_EXITING)))
 			filter &= ~SHOW_MEM_FILTER_NODES;
 	if (in_interrupt() || !(gfp_mask & __GFP_DIRECT_RECLAIM))
@@ -3603,21 +3612,46 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
 	return alloc_flags;
 }
 
-bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
+static bool oom_reserves_allowed(struct task_struct *tsk)
 {
-	if (unlikely(gfp_mask & __GFP_NOMEMALLOC))
+	if (!tsk_is_oom_victim(tsk))
+		return false;
+
+	/*
+	 * !MMU doesn't have oom reaper so give access to memory reserves
+	 * only to the thread with TIF_MEMDIE set
+	 */
+	if (!IS_ENABLED(CONFIG_MMU) && !test_thread_flag(TIF_MEMDIE))
 		return false;
 
+	return true;
+}
+
+/*
+ * Distinguish requests which really need access to full memory
+ * reserves from oom victims which can live with a portion of it
+ */
+static inline int __gfp_pfmemalloc_flags(gfp_t gfp_mask)
+{
+	if (unlikely(gfp_mask & __GFP_NOMEMALLOC))
+		return 0;
 	if (gfp_mask & __GFP_MEMALLOC)
-		return true;
+		return ALLOC_NO_WATERMARKS;
 	if (in_serving_softirq() && (current->flags & PF_MEMALLOC))
-		return true;
-	if (!in_interrupt() &&
-	    ((current->flags & PF_MEMALLOC) ||
-	    unlikely(test_thread_flag(TIF_MEMDIE))))
-		return true;
+		return ALLOC_NO_WATERMARKS;
+	if (!in_interrupt()) {
+		if (current->flags & PF_MEMALLOC)
+			return ALLOC_NO_WATERMARKS;
+		else if (oom_reserves_allowed(current))
+			return ALLOC_OOM;
+	}
 
-	return false;
+	return 0;
+}
+
+bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
+{
+	return __gfp_pfmemalloc_flags(gfp_mask) > 0;
 }
 
 /*
@@ -3770,6 +3804,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	unsigned long alloc_start = jiffies;
 	unsigned int stall_timeout = 10 * HZ;
 	unsigned int cpuset_mems_cookie;
+	int reserves;
 
 	/*
 	 * In the slowpath, we sanity check order to avoid ever trying to
@@ -3875,15 +3910,16 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	if (gfp_mask & __GFP_KSWAPD_RECLAIM)
 		wake_all_kswapds(order, ac);
 
-	if (gfp_pfmemalloc_allowed(gfp_mask))
-		alloc_flags = ALLOC_NO_WATERMARKS;
+	reserves = __gfp_pfmemalloc_flags(gfp_mask);
+	if (reserves)
+		alloc_flags = reserves;
 
 	/*
 	 * Reset the zonelist iterators if memory policies can be ignored.
 	 * These allocations are high priority and system rather than user
 	 * orientated.
 	 */
-	if (!(alloc_flags & ALLOC_CPUSET) || (alloc_flags & ALLOC_NO_WATERMARKS)) {
+	if (!(alloc_flags & ALLOC_CPUSET) || reserves) {
 		ac->zonelist = node_zonelist(numa_node_id(), gfp_mask);
 		ac->preferred_zoneref = first_zones_zonelist(ac->zonelist,
 					ac->high_zoneidx, ac->nodemask);
@@ -3960,8 +3996,8 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 		goto got_pg;
 
 	/* Avoid allocations with no watermarks from looping endlessly */
-	if (test_thread_flag(TIF_MEMDIE) &&
-	    (alloc_flags == ALLOC_NO_WATERMARKS ||
+	if (tsk_is_oom_victim(current) &&
+	    (alloc_flags == ALLOC_OOM ||
 	     (gfp_mask & __GFP_NOMEMALLOC)))
 		goto nopage;
-- 
Michal Hocko
SUSE Labs
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-oi0-f72.google.com (mail-oi0-f72.google.com [209.85.218.72]) by kanga.kvack.org (Postfix) with ESMTP id 1E35D6B064B for ; Wed, 2 Aug 2017 21:40:03 -0400 (EDT) Received: by mail-oi0-f72.google.com with SMTP id z19so12537oia.13 for ; Wed, 02 Aug 2017 18:40:03 -0700 (PDT) Received: from www262.sakura.ne.jp (www262.sakura.ne.jp. [2001:e42:101:1:202:181:97:72]) by mx.google.com with ESMTPS id w132si22073794oib.514.2017.08.02.18.40.00 for (version=TLS1 cipher=AES128-SHA bits=128/128); Wed, 02 Aug 2017 18:40:01 -0700 (PDT) Subject: Re: [PATCH 1/2] mm, oom: do not rely on TIF_MEMDIE for memory reserves access From: Tetsuo Handa References: <20170727090357.3205-1-mhocko@kernel.org> <20170727090357.3205-2-mhocko@kernel.org> <201708020030.ACB04683.JLHMFVOSFFOtOQ@I-love.SAKURA.ne.jp> <20170801165242.GA15518@dhcp22.suse.cz> In-Reply-To: <20170801165242.GA15518@dhcp22.suse.cz> Message-Id: <201708031039.GDG05288.OQJOHtLVFMSFFO@I-love.SAKURA.ne.jp> Date: Thu, 3 Aug 2017 10:39:42 +0900 Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-linux-mm@kvack.org List-ID: To: mhocko@kernel.org Cc: akpm@linux-foundation.org, rientjes@google.com, hannes@cmpxchg.org, guro@fb.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org Michal Hocko wrote: > On Wed 02-08-17 00:30:33, Tetsuo Handa wrote: > > > @@ -3603,6 +3612,22 @@ gfp_to_alloc_flags(gfp_t gfp_mask) > > > return alloc_flags; > > > } > > > > > > +static bool oom_reserves_allowed(struct task_struct *tsk) > > > +{ > > > + if (!tsk_is_oom_victim(tsk)) > > > + return false; > > > + > > > + /* > > > + * !MMU doesn't have oom reaper so we shouldn't risk the memory reserves > > > + * depletion and shouldn't give access to memory reserves passed the > > > + * exit_mm > > > + */ > > > + if (!IS_ENABLED(CONFIG_MMU) && !tsk->mm) > > > + return false; > > > > Branching based on CONFIG_MMU is ugly. I suggest timeout based next OOM > > victim selection if CONFIG_MMU=n. > > I suggest we do not argue about nommu without actually optimizing for or > fixing nommu, which we are not doing here. I am not even sure memory reserves > can ever be depleted for that config. I don't think memory reserves can be depleted in a CONFIG_MMU=n environment. But the reason the OOM reaper was introduced is not limited to handling depletion of memory reserves. The OOM reaper was introduced because OOM victims might get stuck indirectly waiting for other threads doing memory allocation. You said > Yes, exit_aio is the only blocking call I know of currently. But I would > like this to be as robust as possible and so I do not want to rely on > the current implementation. This can change in future and I can > guarantee that nobody will think about the oom path when adding > something to the final __mmput path. at http://lkml.kernel.org/r/20170726054533.GA960@dhcp22.suse.cz , but how can you guarantee that nobody will think about the oom path when adding something to the final __mmput() path without thinking about the possibility of getting stuck waiting for memory allocation in a CONFIG_MMU=n environment? As long as the possibility of getting stuck remains, you should not assume that something you don't want will not happen. It's time to make it possible to treat CONFIG_MMU=n kernels like CONFIG_MMU=y kernels. If it is technically impossible (or is not worthwhile) to implement the OOM reaper for CONFIG_MMU=n kernels, I'm fine with a timeout based approach like the one shown below.
Then, we no longer need to use branching based on CONFIG_MMU. include/linux/mm_types.h | 3 +++ mm/oom_kill.c | 20 +++++++++++++++----- 2 files changed, 18 insertions(+), 5 deletions(-) diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 7f384bb..374a2ae 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -504,6 +504,9 @@ struct mm_struct { atomic_long_t hugetlb_usage; #endif struct work_struct async_put_work; +#ifndef CONFIG_MMU + unsigned long oom_victim_wait_timer; +#endif } __randomize_layout; extern struct mm_struct init_mm; diff --git a/mm/oom_kill.c b/mm/oom_kill.c index 9e8b4f0..dd6239d 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -53,6 +53,17 @@ DEFINE_MUTEX(oom_lock); +static bool should_ignore_this_mm(struct mm_struct *mm) +{ +#ifndef CONFIG_MMU + if (!mm->oom_victim_wait_timer) + mm->oom_victim_wait_timer = jiffies; + else if (time_after(jiffies, mm->oom_victim_wait_timer + HZ)) + return true; +#endif + return test_bit(MMF_OOM_SKIP, &mm->flags); +}; + #ifdef CONFIG_NUMA /** * has_intersects_mems_allowed() - check task eligiblity for kill @@ -188,9 +199,8 @@ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg, * the middle of vfork */ adj = (long)p->signal->oom_score_adj; - if (adj == OOM_SCORE_ADJ_MIN || - test_bit(MMF_OOM_SKIP, &p->mm->flags) || - in_vfork(p)) { + if (adj == OOM_SCORE_ADJ_MIN || should_ignore_this_mm(p->mm) || + in_vfork(p)) { task_unlock(p); return 0; } @@ -303,7 +313,7 @@ static int oom_evaluate_task(struct task_struct *task, void *arg) * any memory is quite low. */ if (!is_sysrq_oom(oc) && tsk_is_oom_victim(task)) { - if (test_bit(MMF_OOM_SKIP, &task->signal->oom_mm->flags)) + if (should_ignore_this_mm(task->signal->oom_mm)) goto next; goto abort; } @@ -783,7 +793,7 @@ static bool task_will_free_mem(struct task_struct *task) * This task has already been drained by the oom reaper so there are * only small chances it will free some more */ - if (test_bit(MMF_OOM_SKIP, &mm->flags)) + if (should_ignore_this_mm(mm)) return false; if (atomic_read(&mm->mm_users) <= 1) -- 1.8.3.1 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-f69.google.com (mail-wm0-f69.google.com [74.125.82.69]) by kanga.kvack.org (Postfix) with ESMTP id 91A656B0667 for ; Thu, 3 Aug 2017 03:06:11 -0400 (EDT) Received: by mail-wm0-f69.google.com with SMTP id g71so1051903wmg.13 for ; Thu, 03 Aug 2017 00:06:11 -0700 (PDT) Received: from mx1.suse.de (mx2.suse.de. 
[195.135.220.15]) by mx.google.com with ESMTPS id g53si986529wrg.284.2017.08.03.00.06.09 for (version=TLS1 cipher=AES128-SHA bits=128/128); Thu, 03 Aug 2017 00:06:09 -0700 (PDT) Date: Thu, 3 Aug 2017 09:06:06 +0200 From: Michal Hocko Subject: Re: [PATCH 1/2] mm, oom: do not rely on TIF_MEMDIE for memory reserves access Message-ID: <20170803070606.GA12521@dhcp22.suse.cz> References: <20170727090357.3205-1-mhocko@kernel.org> <20170727090357.3205-2-mhocko@kernel.org> <201708020030.ACB04683.JLHMFVOSFFOtOQ@I-love.SAKURA.ne.jp> <20170801165242.GA15518@dhcp22.suse.cz> <201708031039.GDG05288.OQJOHtLVFMSFFO@I-love.SAKURA.ne.jp> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <201708031039.GDG05288.OQJOHtLVFMSFFO@I-love.SAKURA.ne.jp> Sender: owner-linux-mm@kvack.org List-ID: To: Tetsuo Handa Cc: akpm@linux-foundation.org, rientjes@google.com, hannes@cmpxchg.org, guro@fb.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org On Thu 03-08-17 10:39:42, Tetsuo Handa wrote: > Michal Hocko wrote: > > On Wed 02-08-17 00:30:33, Tetsuo Handa wrote: > > > > @@ -3603,6 +3612,22 @@ gfp_to_alloc_flags(gfp_t gfp_mask) > > > > return alloc_flags; > > > > } > > > > > > > > +static bool oom_reserves_allowed(struct task_struct *tsk) > > > > +{ > > > > + if (!tsk_is_oom_victim(tsk)) > > > > + return false; > > > > + > > > > + /* > > > > + * !MMU doesn't have oom reaper so we shouldn't risk the memory reserves > > > > + * depletion and shouldn't give access to memory reserves passed the > > > > + * exit_mm > > > > + */ > > > > + if (!IS_ENABLED(CONFIG_MMU) && !tsk->mm) > > > > + return false; > > > > > > Branching based on CONFIG_MMU is ugly. I suggest timeout based next OOM > > > victim selection if CONFIG_MMU=n. > > > > I suggest we do not argue about nommu without actually optimizing for or > > fixing nommu, which we are not doing here. I am not even sure memory reserves > > can ever be depleted for that config. > > I don't think memory reserves can be depleted in a CONFIG_MMU=n environment. > But the reason the OOM reaper was introduced is not limited to handling > depletion of memory reserves. The OOM reaper was introduced because > OOM victims might get stuck indirectly waiting for other threads doing > memory allocation. You said > > > Yes, exit_aio is the only blocking call I know of currently. But I would > > like this to be as robust as possible and so I do not want to rely on > > the current implementation. This can change in future and I can > > guarantee that nobody will think about the oom path when adding > > something to the final __mmput path. > > at http://lkml.kernel.org/r/20170726054533.GA960@dhcp22.suse.cz , but > how can you guarantee that nobody will think about the oom path > when adding something to the final __mmput() path without thinking > about the possibility of getting stuck waiting for memory allocation in > a CONFIG_MMU=n environment? Look, I really appreciate your sentiment for the nommu platform but with an absolute lack of _any_ oom reports on that platform that I am aware of, nor any reports about lockups during oom, I am less than thrilled to add code to fix a problem which might not even exist. Nommu is usually very special with a very specific workload running (e.g. no overcommit) so I strongly suspect that any OOM theories are highly academic. All I do care about is to not regress nommu as much as possible. So can we get back to the proposed patch and updates I have done to address your review feedback please?
-- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-oi0-f70.google.com (mail-oi0-f70.google.com [209.85.218.70]) by kanga.kvack.org (Postfix) with ESMTP id AA8CF6B066F for ; Thu, 3 Aug 2017 04:03:38 -0400 (EDT) Received: by mail-oi0-f70.google.com with SMTP id s21so556815oie.5 for ; Thu, 03 Aug 2017 01:03:38 -0700 (PDT) Received: from www262.sakura.ne.jp (www262.sakura.ne.jp. [2001:e42:101:1:202:181:97:72]) by mx.google.com with ESMTPS id z204si21389198oiz.14.2017.08.03.01.03.37 for (version=TLS1 cipher=AES128-SHA bits=128/128); Thu, 03 Aug 2017 01:03:37 -0700 (PDT) Subject: Re: [PATCH 1/2] mm, oom: do not rely on TIF_MEMDIE for memory reserves access From: Tetsuo Handa References: <20170727090357.3205-2-mhocko@kernel.org> <201708020030.ACB04683.JLHMFVOSFFOtOQ@I-love.SAKURA.ne.jp> <20170801165242.GA15518@dhcp22.suse.cz> <201708031039.GDG05288.OQJOHtLVFMSFFO@I-love.SAKURA.ne.jp> <20170803070606.GA12521@dhcp22.suse.cz> In-Reply-To: <20170803070606.GA12521@dhcp22.suse.cz> Message-Id: <201708031703.HGC35950.LSJFOHQFtFMOVO@I-love.SAKURA.ne.jp> Date: Thu, 3 Aug 2017 17:03:20 +0900 Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-linux-mm@kvack.org List-ID: To: mhocko@kernel.org Cc: akpm@linux-foundation.org, rientjes@google.com, hannes@cmpxchg.org, guro@fb.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org Michal Hocko wrote: > Look, I really appreciate your sentiment for the nommu platform but with > an absolute lack of _any_ oom reports on that platform that I am aware > of, nor any reports about lockups during oom, I am less than thrilled to > add code to fix a problem which might not even exist. Nommu is usually > very special with a very specific workload running (e.g. no overcommit) > so I strongly suspect that any OOM theories are highly academic. If you believe that there is really no oom report, get rid of the OOM killer completely. > > All I do care about is to not regress nommu as much as possible. So can > we get back to the proposed patch and updates I have done to address > your review feedback please? No unless we get rid of the OOM killer if CONFIG_MMU=n. diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 170db4d..e931969 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -3312,7 +3312,8 @@ void warn_alloc(gfp_t gfp_mask, nodemask_t *nodemask, const char *fmt, ...) goto out; /* Exhausted what can be done so it's blamo time */ - if (out_of_memory(&oc) || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) { + if ((IS_ENABLED(CONFIG_MMU) && out_of_memory(&oc)) || + WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) { *did_some_progress = 1; /* -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-f70.google.com (mail-wm0-f70.google.com [74.125.82.70]) by kanga.kvack.org (Postfix) with ESMTP id A18196B0679 for ; Thu, 3 Aug 2017 04:21:06 -0400 (EDT) Received: by mail-wm0-f70.google.com with SMTP id k68so1268785wmd.14 for ; Thu, 03 Aug 2017 01:21:06 -0700 (PDT) Received: from mx1.suse.de (mx2.suse.de.
[195.135.220.15]) by mx.google.com with ESMTPS id f66si934943wmd.164.2017.08.03.01.21.05 for (version=TLS1 cipher=AES128-SHA bits=128/128); Thu, 03 Aug 2017 01:21:05 -0700 (PDT) Date: Thu, 3 Aug 2017 10:21:04 +0200 From: Michal Hocko Subject: Re: [PATCH 1/2] mm, oom: do not rely on TIF_MEMDIE for memory reserves access Message-ID: <20170803082104.GE12521@dhcp22.suse.cz> References: <20170727090357.3205-2-mhocko@kernel.org> <201708020030.ACB04683.JLHMFVOSFFOtOQ@I-love.SAKURA.ne.jp> <20170801165242.GA15518@dhcp22.suse.cz> <201708031039.GDG05288.OQJOHtLVFMSFFO@I-love.SAKURA.ne.jp> <20170803070606.GA12521@dhcp22.suse.cz> <201708031703.HGC35950.LSJFOHQFtFMOVO@I-love.SAKURA.ne.jp> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <201708031703.HGC35950.LSJFOHQFtFMOVO@I-love.SAKURA.ne.jp> Sender: owner-linux-mm@kvack.org List-ID: To: Tetsuo Handa Cc: akpm@linux-foundation.org, rientjes@google.com, hannes@cmpxchg.org, guro@fb.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org On Thu 03-08-17 17:03:20, Tetsuo Handa wrote: > Michal Hocko wrote: > > Look, I really appreciate your sentiment for the nommu platform but with > > an absolute lack of _any_ oom reports on that platform that I am aware > > of, nor any reports about lockups during oom, I am less than thrilled to > > add code to fix a problem which might not even exist. Nommu is usually > > very special with a very specific workload running (e.g. no overcommit) > > so I strongly suspect that any OOM theories are highly academic. > > If you believe that there is really no oom report, get rid of the OOM > killer completely. I am not a user or even an owner of such a platform. As I've said, all I care about is to not regress for those guys and I believe that the patch doesn't change nommu behavior in any risky way. If it does, point it out and I will try to address it. > > All I do care about is to not regress nommu as much as possible. So can > > we get back to the proposed patch and updates I have done to address > > your review feedback please? > > No unless we get rid of the OOM killer if CONFIG_MMU=n. Are you saying that you are going to nack the patch based on this reasoning? This is just ridiculous. -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wr0-f200.google.com (mail-wr0-f200.google.com [209.85.128.200]) by kanga.kvack.org (Postfix) with ESMTP id 6C8486B0685 for ; Thu, 3 Aug 2017 05:37:04 -0400 (EDT) Received: by mail-wr0-f200.google.com with SMTP id g32so1243612wrd.8 for ; Thu, 03 Aug 2017 02:37:04 -0700 (PDT) Received: from outbound-smtp02.blacknight.com (outbound-smtp02.blacknight.com.
[81.17.249.8]) by mx.google.com with ESMTPS id 140si1028242wmm.132.2017.08.03.02.37.02 for (version=TLS1 cipher=AES128-SHA bits=128/128); Thu, 03 Aug 2017 02:37:02 -0700 (PDT) Received: from mail.blacknight.com (pemlinmail03.blacknight.ie [81.17.254.16]) by outbound-smtp02.blacknight.com (Postfix) with ESMTPS id EF5B598CA1 for ; Thu, 3 Aug 2017 09:37:01 +0000 (UTC) Date: Thu, 3 Aug 2017 10:37:01 +0100 From: Mel Gorman Subject: Re: [PATCH v2 1/2] mm, oom: do not rely on TIF_MEMDIE for memory reserves access Message-ID: <20170803093701.icju4mxmto3ls3ch@techsingularity.net> References: <20170727090357.3205-1-mhocko@kernel.org> <20170727090357.3205-2-mhocko@kernel.org> <20170802082914.GF2524@dhcp22.suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: <20170802082914.GF2524@dhcp22.suse.cz> Sender: owner-linux-mm@kvack.org List-ID: To: Michal Hocko Cc: Andrew Morton , David Rientjes , Johannes Weiner , Roman Gushchin , Tetsuo Handa , linux-mm@kvack.org, LKML On Wed, Aug 02, 2017 at 10:29:14AM +0200, Michal Hocko wrote: > For ages we have been relying on TIF_MEMDIE thread flag to mark OOM > victims and then, among other things, to give these threads full > access to memory reserves. There are few shortcomings of this > implementation, though. > I believe the original intent was that allocations from the exit path would be small and relatively short-lived. > First of all and the most serious one is that the full access to memory > reserves is quite dangerous because we leave no safety room for the > system to operate and potentially do last emergency steps to move on. > > Secondly this flag is per task_struct while the OOM killer operates > on mm_struct granularity so all processes sharing the given mm are > killed. Giving the full access to all these task_structs could lead to > a quick memory reserves depletion. This is a valid concern. > We have tried to reduce this risk by > giving TIF_MEMDIE only to the main thread and the currently allocating > task but that doesn't really solve this problem while it surely opens up > a room for corner cases - e.g. GFP_NO{FS,IO} requests might loop inside > the allocator without access to memory reserves because a particular > thread was not the group leader. > > Now that we have the oom reaper and that all oom victims are reapable > after 1b51e65eab64 ("oom, oom_reaper: allow to reap mm shared by the > kthreads") we can be more conservative and grant only partial access to > memory reserves because there are reasonable chances of the parallel > memory freeing. We still want some access to reserves because we do not > want other consumers to eat up the victim's freed memory. oom victims > will still contend with __GFP_HIGH users but those shouldn't be so > aggressive to starve oom victims completely. > > Introduce ALLOC_OOM flag and give all tsk_is_oom_victim tasks access to > the half of the reserves. This makes the access to reserves independent > on which task has passed through mark_oom_victim. Also drop any > usage of TIF_MEMDIE from the page allocator proper and replace it by > tsk_is_oom_victim as well which will make page_alloc.c completely > TIF_MEMDIE free finally. > > CONFIG_MMU=n doesn't have oom reaper so let's stick to the original > ALLOC_NO_WATERMARKS approach. > > There is a demand to make the oom killer memcg aware which will imply > many tasks killed at once. This change will allow such a usecase without > worrying about complete memory reserves depletion. 
> > Changes since v1 > - do not play tricks with nommu and grant access to memory reserves as > long as TIF_MEMDIE is set > - break out from allocation properly for oom victims as per Tetsuo > - distinguish oom victims from other consumers of memory reserves in > __gfp_pfmemalloc_flags - per Tetsuo > > Signed-off-by: Michal Hocko > > diff --git a/mm/internal.h b/mm/internal.h > index 24d88f084705..1ebcb1ed01b5 100644 > --- a/mm/internal.h > +++ b/mm/internal.h > @@ -480,6 +480,17 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone, > /* Mask to get the watermark bits */ > #define ALLOC_WMARK_MASK (ALLOC_NO_WATERMARKS-1) > > +/* > + * Only MMU archs have async oom victim reclaim - aka oom_reaper so we > + * cannot assume a reduced access to memory reserves is sufficient for > + * !MMU > + */ > +#ifdef CONFIG_MMU > +#define ALLOC_OOM 0x08 > +#else > +#define ALLOC_OOM ALLOC_NO_WATERMARKS > +#endif > + > #define ALLOC_HARDER 0x10 /* try to alloc harder */ > #define ALLOC_HIGH 0x20 /* __GFP_HIGH set */ > #define ALLOC_CPUSET 0x40 /* check for correct cpuset */ Ok, no collision with the wmark indexes so that should be fine. While I didn't check, I suspect that !MMU users also have relatively few CPUs to allow major contention. > diff --git a/mm/oom_kill.c b/mm/oom_kill.c > index 9e8b4f030c1c..c9f3569a76c7 100644 > --- a/mm/oom_kill.c > +++ b/mm/oom_kill.c > @@ -824,7 +824,8 @@ static void oom_kill_process(struct oom_control *oc, const char *message) > > /* > * If the task is already exiting, don't alarm the sysadmin or kill > - * its children or threads, just set TIF_MEMDIE so it can die quickly > + * its children or threads, just give it access to memory reserves > + * so it can die quickly > */ > task_lock(p); > if (task_will_free_mem(p)) { > @@ -889,9 +890,9 @@ static void oom_kill_process(struct oom_control *oc, const char *message) > count_memcg_event_mm(mm, OOM_KILL); > > /* > - * We should send SIGKILL before setting TIF_MEMDIE in order to prevent > - * the OOM victim from depleting the memory reserves from the user > - * space under its control. > + * We should send SIGKILL before granting access to memory reserves > + * in order to prevent the OOM victim from depleting the memory > + * reserves from the user space under its control. > */ > do_send_sig_info(SIGKILL, SEND_SIG_FORCED, victim, true); > mark_oom_victim(victim); There is the secondary effect that some operations get aborted entirely if a SIGKILL is pending but that's neither here nor there. > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 80e4adb4c360..7ae0f6d45614 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -2930,7 +2930,7 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark, > { > long min = mark; > int o; > - const bool alloc_harder = (alloc_flags & ALLOC_HARDER); > + const bool alloc_harder = (alloc_flags & (ALLOC_HARDER|ALLOC_OOM)); > > /* free_pages may go negative - that's OK */ > free_pages -= (1 << order) - 1; > @@ -2943,10 +2943,19 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark, > * the high-atomic reserves. This will over-estimate the size of the > * atomic reserve but it avoids a search.
> */ > - if (likely(!alloc_harder)) > + if (likely(!alloc_harder)) { > free_pages -= z->nr_reserved_highatomic; > - else > - min -= min / 4; > + } else { > + /* > + * OOM victims can try even harder than normal ALLOC_HARDER > + * users > + */ > + if (alloc_flags & ALLOC_OOM) > + min -= min / 2; > + else > + min -= min / 4; > + } > + > > #ifdef CONFIG_CMA > /* If allocation can't use CMA areas don't use free CMA pages */ I see no problem here although the comment could be slightly better. It suggests at OOM victims can try harder but not why /* * OOM victims can try even harder than normal ALLOC_HARDER users on the * grounds that it's definitely going to be in the exit path shortly and * free memory. Any allocation it makes during the free path will be * small and short-lived. */ > @@ -3184,7 +3193,7 @@ static void warn_alloc_show_mem(gfp_t gfp_mask, nodemask_t *nodemask) > * of allowed nodes. > */ > if (!(gfp_mask & __GFP_NOMEMALLOC)) > - if (test_thread_flag(TIF_MEMDIE) || > + if (tsk_is_oom_victim(current) || > (current->flags & (PF_MEMALLOC | PF_EXITING))) > filter &= ~SHOW_MEM_FILTER_NODES; > if (in_interrupt() || !(gfp_mask & __GFP_DIRECT_RECLAIM)) > @@ -3603,21 +3612,46 @@ gfp_to_alloc_flags(gfp_t gfp_mask) > return alloc_flags; > } > > -bool gfp_pfmemalloc_allowed(gfp_t gfp_mask) > +static bool oom_reserves_allowed(struct task_struct *tsk) > { > - if (unlikely(gfp_mask & __GFP_NOMEMALLOC)) > + if (!tsk_is_oom_victim(tsk)) > + return false; > + > + /* > + * !MMU doesn't have oom reaper so give access to memory reserves > + * only to the thread with TIF_MEMDIE set > + */ > + if (!IS_ENABLED(CONFIG_MMU) && !test_thread_flag(TIF_MEMDIE)) > return false; > > + return true; > +} > + Ok, there is a chance that a task selected as an OOM kill victim may be in the middle of a __GFP_NOMEMALLOC allocation but I can't actually see a problem wiith that. __GFP_NOMEMALLOC users are not going to be in the exit path (which we care about for an OOM killed task) and the caller should always be able to handle a failure. > +/* > + * Distinguish requests which really need access to full memory > + * reserves from oom victims which can live with a portion of it > + */ > +static inline int __gfp_pfmemalloc_flags(gfp_t gfp_mask) > +{ > + if (unlikely(gfp_mask & __GFP_NOMEMALLOC)) > + return 0; > if (gfp_mask & __GFP_MEMALLOC) > - return true; > + return ALLOC_NO_WATERMARKS; > if (in_serving_softirq() && (current->flags & PF_MEMALLOC)) > - return true; > - if (!in_interrupt() && > - ((current->flags & PF_MEMALLOC) || > - unlikely(test_thread_flag(TIF_MEMDIE)))) > - return true; > + return ALLOC_NO_WATERMARKS; > + if (!in_interrupt()) { > + if (current->flags & PF_MEMALLOC) > + return ALLOC_NO_WATERMARKS; > + else if (oom_reserves_allowed(current)) > + return ALLOC_OOM; > + } > > - return false; > + return 0; > +} > + > +bool gfp_pfmemalloc_allowed(gfp_t gfp_mask) > +{ > + return __gfp_pfmemalloc_flags(gfp_mask) > 0; > } Very subtle sign casing error here. If the flags ever use the high bit, this wraps and fails. It "shouldn't be possible" but you could just remove the "> 0" there to be on the safe side or have __gfp_pfmemalloc_flags return unsigned. > > /* > @@ -3770,6 +3804,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, > unsigned long alloc_start = jiffies; > unsigned int stall_timeout = 10 * HZ; > unsigned int cpuset_mems_cookie; > + int reserves; > This should be explicitly named to indicate it's about flags and not the number of reserve pages or something else wacky. 
> /* > * In the slowpath, we sanity check order to avoid ever trying to > @@ -3875,15 +3910,16 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, > if (gfp_mask & __GFP_KSWAPD_RECLAIM) > wake_all_kswapds(order, ac); > > - if (gfp_pfmemalloc_allowed(gfp_mask)) > - alloc_flags = ALLOC_NO_WATERMARKS; > + reserves = __gfp_pfmemalloc_flags(gfp_mask); > + if (reserves) > + alloc_flags = reserves; > And if it's reserve_flags you can save a branch with reserve_flags = __gfp_pfmemalloc_flags(gfp_mask); alloc_pags |= reserve_flags; It won't make much difference considering how branch-intensive the allocator is anyway. > /* > * Reset the zonelist iterators if memory policies can be ignored. > * These allocations are high priority and system rather than user > * orientated. > */ > - if (!(alloc_flags & ALLOC_CPUSET) || (alloc_flags & ALLOC_NO_WATERMARKS)) { > + if (!(alloc_flags & ALLOC_CPUSET) || reserves) { > ac->zonelist = node_zonelist(numa_node_id(), gfp_mask); > ac->preferred_zoneref = first_zones_zonelist(ac->zonelist, > ac->high_zoneidx, ac->nodemask); > @@ -3960,8 +3996,8 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, > goto got_pg; > > /* Avoid allocations with no watermarks from looping endlessly */ > - if (test_thread_flag(TIF_MEMDIE) && > - (alloc_flags == ALLOC_NO_WATERMARKS || > + if (tsk_is_oom_victim(current) && > + (alloc_flags == ALLOC_OOM || > (gfp_mask & __GFP_NOMEMALLOC))) > goto nopage; > Mostly I only found nit-picks so whether you address them or not Acked-by: Mel Gorman -- Mel Gorman SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wr0-f198.google.com (mail-wr0-f198.google.com [209.85.128.198]) by kanga.kvack.org (Postfix) with ESMTP id EA14D6B0693 for ; Thu, 3 Aug 2017 07:00:33 -0400 (EDT) Received: by mail-wr0-f198.google.com with SMTP id g28so1513892wrg.3 for ; Thu, 03 Aug 2017 04:00:33 -0700 (PDT) Received: from mx1.suse.de (mx2.suse.de. [195.135.220.15]) by mx.google.com with ESMTPS id 1si1330644wrf.473.2017.08.03.04.00.32 for (version=TLS1 cipher=AES128-SHA bits=128/128); Thu, 03 Aug 2017 04:00:32 -0700 (PDT) Date: Thu, 3 Aug 2017 13:00:30 +0200 From: Michal Hocko Subject: Re: [PATCH v2 1/2] mm, oom: do not rely on TIF_MEMDIE for memory reserves access Message-ID: <20170803110030.GJ12521@dhcp22.suse.cz> References: <20170727090357.3205-1-mhocko@kernel.org> <20170727090357.3205-2-mhocko@kernel.org> <20170802082914.GF2524@dhcp22.suse.cz> <20170803093701.icju4mxmto3ls3ch@techsingularity.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170803093701.icju4mxmto3ls3ch@techsingularity.net> Sender: owner-linux-mm@kvack.org List-ID: To: Mel Gorman Cc: Andrew Morton , David Rientjes , Johannes Weiner , Roman Gushchin , Tetsuo Handa , linux-mm@kvack.org, LKML On Thu 03-08-17 10:37:01, Mel Gorman wrote: > On Wed, Aug 02, 2017 at 10:29:14AM +0200, Michal Hocko wrote: > > For ages we have been relying on TIF_MEMDIE thread flag to mark OOM > > victims and then, among other things, to give these threads full > > access to memory reserves. There are few shortcomings of this > > implementation, though. > > > > I believe the original intent was that allocations from the exit path > would be small and relatively short-lived. Yes. 
> > > First of all and the most serious one is that the full access to memory > > reserves is quite dangerous because we leave no safety room for the > > system to operate and potentially do last emergency steps to move on. > > > > Secondly this flag is per task_struct while the OOM killer operates > > on mm_struct granularity so all processes sharing the given mm are > > killed. Giving the full access to all these task_structs could lead to > > a quick memory reserves depletion. > > This is a valid concern. > > > We have tried to reduce this risk by > > giving TIF_MEMDIE only to the main thread and the currently allocating > > task but that doesn't really solve this problem while it surely opens up > > a room for corner cases - e.g. GFP_NO{FS,IO} requests might loop inside > > the allocator without access to memory reserves because a particular > > thread was not the group leader. > > > > Now that we have the oom reaper and that all oom victims are reapable > > after 1b51e65eab64 ("oom, oom_reaper: allow to reap mm shared by the > > kthreads") we can be more conservative and grant only partial access to > > memory reserves because there are reasonable chances of the parallel > > memory freeing. We still want some access to reserves because we do not > > want other consumers to eat up the victim's freed memory. oom victims > > will still contend with __GFP_HIGH users but those shouldn't be so > > aggressive to starve oom victims completely. > > > > Introduce ALLOC_OOM flag and give all tsk_is_oom_victim tasks access to > > the half of the reserves. This makes the access to reserves independent > > on which task has passed through mark_oom_victim. Also drop any > > usage of TIF_MEMDIE from the page allocator proper and replace it by > > tsk_is_oom_victim as well which will make page_alloc.c completely > > TIF_MEMDIE free finally. > > > > CONFIG_MMU=n doesn't have oom reaper so let's stick to the original > > ALLOC_NO_WATERMARKS approach. > > > > There is a demand to make the oom killer memcg aware which will imply > > many tasks killed at once. This change will allow such a usecase without > > worrying about complete memory reserves depletion. > > > > Changes since v1 > > - do not play tricks with nommu and grant access to memory reserves as > > long as TIF_MEMDIE is set > > - break out from allocation properly for oom victims as per Tetsuo > > - distinguish oom victims from other consumers of memory reserves in > > __gfp_pfmemalloc_flags - per Tetsuo > > > > Signed-off-by: Michal Hocko > > > > diff --git a/mm/internal.h b/mm/internal.h > > index 24d88f084705..1ebcb1ed01b5 100644 > > --- a/mm/internal.h > > +++ b/mm/internal.h > > @@ -480,6 +480,17 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone, > > /* Mask to get the watermark bits */ > > #define ALLOC_WMARK_MASK (ALLOC_NO_WATERMARKS-1) > > > > +/* > > + * Only MMU archs have async oom victim reclaim - aka oom_reaper so we > > + * cannot assume a reduced access to memory reserves is sufficient for > > + * !MMU > > + */ > > +#ifdef CONFIG_MMU > > +#define ALLOC_OOM 0x08 > > +#else > > +#define ALLOC_OOM ALLOC_NO_WATERMARKS > > +#endif > > + > > #define ALLOC_HARDER 0x10 /* try to alloc harder */ > > #define ALLOC_HIGH 0x20 /* __GFP_HIGH set */ > > #define ALLOC_CPUSET 0x40 /* check for correct cpuset */ > > Ok, no collision with the wmark indexes so that should be fine. While I > didn't check, I suspect that !MMU users also have relatively few CPUs to > allow major contention. 
Well, I didn't try to improve the !MMU case because a) I do not know whether there is a real problem with oom depletion there and b) I have no way to test this. So I only focused on keeping the status quo for nommu. [...] > > @@ -2943,10 +2943,19 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark, > > * the high-atomic reserves. This will over-estimate the size of the > > * atomic reserve but it avoids a search. > > */ > > - if (likely(!alloc_harder)) > > + if (likely(!alloc_harder)) { > > free_pages -= z->nr_reserved_highatomic; > > - else > > - min -= min / 4; > > + } else { > > + /* > > + * OOM victims can try even harder than normal ALLOC_HARDER > > + * users > > + */ > > + if (alloc_flags & ALLOC_OOM) > > + min -= min / 2; > > + else > > + min -= min / 4; > > + } > > + > > > > #ifdef CONFIG_CMA > > /* If allocation can't use CMA areas don't use free CMA pages */ > > I see no problem here although the comment could be slightly better. It > suggests that OOM victims can try harder but not why > > /* > * OOM victims can try even harder than normal ALLOC_HARDER users on the > * grounds that it's definitely going to be in the exit path shortly and > * free memory. Any allocation it makes during the free path will be > * small and short-lived. > */ OK, I have replaced the comment. > > @@ -3603,21 +3612,46 @@ gfp_to_alloc_flags(gfp_t gfp_mask) > > return alloc_flags; > > } > > > > -bool gfp_pfmemalloc_allowed(gfp_t gfp_mask) > > +static bool oom_reserves_allowed(struct task_struct *tsk) > > { > > - if (unlikely(gfp_mask & __GFP_NOMEMALLOC)) > > + if (!tsk_is_oom_victim(tsk)) > > + return false; > > + > > + /* > > + * !MMU doesn't have oom reaper so give access to memory reserves > > + * only to the thread with TIF_MEMDIE set > > + */ > > + if (!IS_ENABLED(CONFIG_MMU) && !test_thread_flag(TIF_MEMDIE)) > > return false; > > > > + return true; > > +} > > + > > Ok, there is a chance that a task selected as an OOM kill victim may be > in the middle of a __GFP_NOMEMALLOC allocation but I can't actually see a > problem with that. __GFP_NOMEMALLOC users are not going to be in the exit > path (which we care about for an OOM killed task) and the caller should > always be able to handle a failure. Not sure I understand. If the oom victim is doing __GFP_NOMEMALLOC then we haven't been doing ALLOC_NO_WATERMARKS even before. So I preserve the behavior here. Even though I am not sure this is a deliberate behavior or more a result of the evolution of the code. > > +/* > > + * Distinguish requests which really need access to full memory > > + * reserves from oom victims which can live with a portion of it > > + */ > > +static inline int __gfp_pfmemalloc_flags(gfp_t gfp_mask) > > +{ > > + if (unlikely(gfp_mask & __GFP_NOMEMALLOC)) > > + return 0; > > if (gfp_mask & __GFP_MEMALLOC) > > - return true; > > + return ALLOC_NO_WATERMARKS; > > if (in_serving_softirq() && (current->flags & PF_MEMALLOC)) > > - return true; > > - if (!in_interrupt() && > > - ((current->flags & PF_MEMALLOC) || > > - unlikely(test_thread_flag(TIF_MEMDIE)))) > > - return true; > > + return ALLOC_NO_WATERMARKS; > > + if (!in_interrupt()) { > > + if (current->flags & PF_MEMALLOC) > > + return ALLOC_NO_WATERMARKS; > > + else if (oom_reserves_allowed(current)) > > + return ALLOC_OOM; > > + } > > > > - return false; > > + return 0; > > +} > > + > > +bool gfp_pfmemalloc_allowed(gfp_t gfp_mask) > > +{ > > + return __gfp_pfmemalloc_flags(gfp_mask) > 0; > > } > > Very subtle sign casting error here.
If the flags ever use the high bit, > this wraps and fails. It "shouldn't be possible" but you could just remove > the "> 0" there to be on the safe side or have __gfp_pfmemalloc_flags > return unsigned. what about return !!__gfp_pfmemalloc_flags(gfp_mask); > > /* > > @@ -3770,6 +3804,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, > > unsigned long alloc_start = jiffies; > > unsigned int stall_timeout = 10 * HZ; > > unsigned int cpuset_mems_cookie; > > + int reserves; > > > > This should be explicitly named to indicate it's about flags and not the > number of reserve pages or something else wacky. s@reserves@reserve_flags@? > > /* > > * In the slowpath, we sanity check order to avoid ever trying to > > @@ -3875,15 +3910,16 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, > > if (gfp_mask & __GFP_KSWAPD_RECLAIM) > > wake_all_kswapds(order, ac); > > > > - if (gfp_pfmemalloc_allowed(gfp_mask)) > > - alloc_flags = ALLOC_NO_WATERMARKS; > > + reserves = __gfp_pfmemalloc_flags(gfp_mask); > > + if (reserves) > > + alloc_flags = reserves; > > > > And if it's reserve_flags you can save a branch with > > reserve_flags = __gfp_pfmemalloc_flags(gfp_mask); > alloc_pags |= reserve_flags; > > It won't make much difference considering how branch-intensive the allocator > is anyway. I was actually considering that but rather didn't want to do it because I wanted to reset alloc_flags rather than create strange ALLOC_$FOO combinations which would be harder to evaluate. > > /* > > * Reset the zonelist iterators if memory policies can be ignored. > > * These allocations are high priority and system rather than user > > * orientated. > > */ > > - if (!(alloc_flags & ALLOC_CPUSET) || (alloc_flags & ALLOC_NO_WATERMARKS)) { > > + if (!(alloc_flags & ALLOC_CPUSET) || reserves) { > > ac->zonelist = node_zonelist(numa_node_id(), gfp_mask); > > ac->preferred_zoneref = first_zones_zonelist(ac->zonelist, > > ac->high_zoneidx, ac->nodemask); > > @@ -3960,8 +3996,8 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, > > goto got_pg; > > > > /* Avoid allocations with no watermarks from looping endlessly */ > > - if (test_thread_flag(TIF_MEMDIE) && > > - (alloc_flags == ALLOC_NO_WATERMARKS || > > + if (tsk_is_oom_victim(current) && > > + (alloc_flags == ALLOC_OOM || > > (gfp_mask & __GFP_NOMEMALLOC))) > > goto nopage; > > > > Mostly I only found nit-picks so whether you address them or not > > Acked-by: Mel Gorman Thanks a lot for your review. Here is an incremental diff on top --- commit 8483306e108a25441d796a8da3df5b85d633d859 Author: Michal Hocko Date: Thu Aug 3 12:58:49 2017 +0200 fold me "mm, oom: do not rely on TIF_MEMDIE for memory reserves access" - clarify access to memory reserves in __zone_watermark_ok - per Mel - make int->bool conversion in gfp_pfmemalloc_allowed more robust - per Mel - s@reserves@reserve_flags@ in __alloc_pages_slowpath - per Mel diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 7ae0f6d45614..90e331e4c077 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2948,7 +2948,9 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark, } else { /* * OOM victims can try even harder than normal ALLOC_HARDER - * users + * users on the grounds that it's definitely going to be in + * the exit path shortly and free memory. Any allocation it + * makes during the free path will be small and short-lived. 
*/ if (alloc_flags & ALLOC_OOM) min -= min / 2; @@ -3651,7 +3653,7 @@ static inline int __gfp_pfmemalloc_flags(gfp_t gfp_mask) bool gfp_pfmemalloc_allowed(gfp_t gfp_mask) { - return __gfp_pfmemalloc_flags(gfp_mask) > 0; + return !!__gfp_pfmemalloc_flags(gfp_mask); } /* @@ -3804,7 +3806,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, unsigned long alloc_start = jiffies; unsigned int stall_timeout = 10 * HZ; unsigned int cpuset_mems_cookie; - int reserves; + int reserve_flags; /* * In the slowpath, we sanity check order to avoid ever trying to @@ -3910,16 +3912,16 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, if (gfp_mask & __GFP_KSWAPD_RECLAIM) wake_all_kswapds(order, ac); - reserves = __gfp_pfmemalloc_flags(gfp_mask); - if (reserves) - alloc_flags = reserves; + reserve_flags = __gfp_pfmemalloc_flags(gfp_mask); + if (reserve_flags) + alloc_flags = reserve_flags; /* * Reset the zonelist iterators if memory policies can be ignored. * These allocations are high priority and system rather than user * orientated. */ - if (!(alloc_flags & ALLOC_CPUSET) || reserves) { + if (!(alloc_flags & ALLOC_CPUSET) || reserve_flags) { ac->zonelist = node_zonelist(numa_node_id(), gfp_mask); ac->preferred_zoneref = first_zones_zonelist(ac->zonelist, ac->high_zoneidx, ac->nodemask); -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wr0-f198.google.com (mail-wr0-f198.google.com [209.85.128.198]) by kanga.kvack.org (Postfix) with ESMTP id D10D86B06A3 for ; Thu, 3 Aug 2017 08:22:27 -0400 (EDT) Received: by mail-wr0-f198.google.com with SMTP id z36so1740527wrb.13 for ; Thu, 03 Aug 2017 05:22:27 -0700 (PDT) Received: from outbound-smtp08.blacknight.com (outbound-smtp08.blacknight.com. [46.22.139.13]) by mx.google.com with ESMTPS id 33si1685317edp.326.2017.08.03.05.22.26 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 03 Aug 2017 05:22:26 -0700 (PDT) Received: from mail.blacknight.com (pemlinmail04.blacknight.ie [81.17.254.17]) by outbound-smtp08.blacknight.com (Postfix) with ESMTPS id D6F6A1C18F8 for ; Thu, 3 Aug 2017 13:22:25 +0100 (IST) Date: Thu, 3 Aug 2017 13:22:25 +0100 From: Mel Gorman Subject: Re: [PATCH v2 1/2] mm, oom: do not rely on TIF_MEMDIE for memory reserves access Message-ID: <20170803122225.3ycluy5ixl5edlfn@techsingularity.net> References: <20170727090357.3205-1-mhocko@kernel.org> <20170727090357.3205-2-mhocko@kernel.org> <20170802082914.GF2524@dhcp22.suse.cz> <20170803093701.icju4mxmto3ls3ch@techsingularity.net> <20170803110030.GJ12521@dhcp22.suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: <20170803110030.GJ12521@dhcp22.suse.cz> Sender: owner-linux-mm@kvack.org List-ID: To: Michal Hocko Cc: Andrew Morton , David Rientjes , Johannes Weiner , Roman Gushchin , Tetsuo Handa , linux-mm@kvack.org, LKML On Thu, Aug 03, 2017 at 01:00:30PM +0200, Michal Hocko wrote: > > Ok, no collision with the wmark indexes so that should be fine. While I > > didn't check, I suspect that !MMU users also have relatively few CPUs to > > allow major contention. > > Well, I didn't try to improve the !MMU case because a) I do not know > whether there is a real problem with oom depletion there and b) I have > no way to test this. 
So I only focused on keeping the status quo for > nommu. > I've no problem with that. > > > @@ -3603,21 +3612,46 @@ gfp_to_alloc_flags(gfp_t gfp_mask) > > > return alloc_flags; > > > } > > > > > > -bool gfp_pfmemalloc_allowed(gfp_t gfp_mask) > > > +static bool oom_reserves_allowed(struct task_struct *tsk) > > > { > > > - if (unlikely(gfp_mask & __GFP_NOMEMALLOC)) > > > + if (!tsk_is_oom_victim(tsk)) > > > + return false; > > > + > > > + /* > > > + * !MMU doesn't have oom reaper so give access to memory reserves > > > + * only to the thread with TIF_MEMDIE set > > > + */ > > > + if (!IS_ENABLED(CONFIG_MMU) && !test_thread_flag(TIF_MEMDIE)) > > > return false; > > > > > > + return true; > > > +} > > > + > > > > Ok, there is a chance that a task selected as an OOM kill victim may be > > in the middle of a __GFP_NOMEMALLOC allocation but I can't actually see a > > problem with that. __GFP_NOMEMALLOC users are not going to be in the exit > > path (which we care about for an OOM killed task) and the caller should > > always be able to handle a failure. > > Not sure I understand. If the oom victim is doing __GFP_NOMEMALLOC then > we haven't been doing ALLOC_NO_WATERMARKS even before. So I preserve the > behavior here. Even though I am not sure this is a deliberate behavior > or more a result of the evolution of the code. > The behaviour is fine as far as I can tell. > > > +bool gfp_pfmemalloc_allowed(gfp_t gfp_mask) > > > +{ > > > + return __gfp_pfmemalloc_flags(gfp_mask) > 0; > > > } > > > > Very subtle sign casting error here. If the flags ever use the high bit, > > this wraps and fails. It "shouldn't be possible" but you could just remove > > the "> 0" there to be on the safe side or have __gfp_pfmemalloc_flags > > return unsigned. > > what about > return !!__gfp_pfmemalloc_flags(gfp_mask); > You could but it's overkill. Any value cast to bool should be safe as it's meant to be immune from truncation concerns. > > > /* > > > @@ -3770,6 +3804,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, > > > unsigned long alloc_start = jiffies; > > > unsigned int stall_timeout = 10 * HZ; > > > unsigned int cpuset_mems_cookie; > > > + int reserves; > > > > > > > This should be explicitly named to indicate it's about flags and not the > > number of reserve pages or something else wacky. > > s@reserves@reserve_flags@? > That'll do. > > > /* > > > * In the slowpath, we sanity check order to avoid ever trying to > > > @@ -3875,15 +3910,16 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, > > > if (gfp_mask & __GFP_KSWAPD_RECLAIM) > > > wake_all_kswapds(order, ac); > > > > > > - if (gfp_pfmemalloc_allowed(gfp_mask)) > > > - alloc_flags = ALLOC_NO_WATERMARKS; > > > + reserves = __gfp_pfmemalloc_flags(gfp_mask); > > > + if (reserves) > > > + alloc_flags = reserves; > > > > > > > And if it's reserve_flags you can save a branch with > > > > reserve_flags = __gfp_pfmemalloc_flags(gfp_mask); > > alloc_flags |= reserve_flags; > > > > It won't make much difference considering how branch-intensive the allocator > > is anyway. > > I was actually considering that but rather didn't want to do it because > I wanted to reset alloc_flags rather than create strange ALLOC_$FOO > combinations which would be harder to evaluate. > Ok, it does implicitly clear flags like ALLOC_CPUSET which is fine in this context but it must be remembered in the future if an alloc flag is ever introduced that has meaning even for oom kill.
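The trade-off in this last exchange can be made concrete with a small standalone sketch (flag values copied from the mm/internal.h hunk earlier in the thread; the variant_* helper names are made up for illustration):

#include <stdio.h>

#define ALLOC_OOM    0x08
#define ALLOC_HARDER 0x10
#define ALLOC_CPUSET 0x40

/* Michal's variant: plain assignment resets alloc_flags, implicitly
 * dropping previously set bits like ALLOC_CPUSET for reserve
 * allocations. */
static unsigned int variant_assign(unsigned int alloc_flags,
				   unsigned int reserve_flags)
{
	if (reserve_flags)
		alloc_flags = reserve_flags;
	return alloc_flags;
}

/* Mel's branchless variant: OR-ing keeps the previously set bits. */
static unsigned int variant_or(unsigned int alloc_flags,
			       unsigned int reserve_flags)
{
	return alloc_flags | reserve_flags;
}

int main(void)
{
	unsigned int base = ALLOC_CPUSET | ALLOC_HARDER;

	printf("assign: 0x%02x\n", variant_assign(base, ALLOC_OOM));
	printf("or    : 0x%02x\n", variant_or(base, ALLOC_OOM));
	return 0;
}

With ALLOC_OOM as the reserve flag, the assignment form yields 0x08 and the OR form 0x58; only the former clears ALLOC_CPUSET, which is the behavior Michal wanted and the caveat Mel notes above.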
> > Mostly I only found nit-picks so whether you address them or not > > > > Acked-by: Mel Gorman > > Thanks a lot for your review. Here is an incremental diff on top Looks fine. I am not a major fan of the !! because I think it's unnecessary but it's not worth making a big deal out of. It's a well-recognised idiom. -- Mel Gorman SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lf0-f69.google.com (mail-lf0-f69.google.com [209.85.215.69]) by kanga.kvack.org (Postfix) with ESMTP id 1DBC56B02F4 for ; Mon, 7 Aug 2017 10:22:26 -0400 (EDT) Received: by mail-lf0-f69.google.com with SMTP id g21so1046450lfg.3 for ; Mon, 07 Aug 2017 07:22:26 -0700 (PDT) Received: from mx1.suse.de (mx2.suse.de. [195.135.220.15]) by mx.google.com with ESMTPS id m29si3177444lfj.628.2017.08.07.07.22.24 for (version=TLS1 cipher=AES128-SHA bits=128/128); Mon, 07 Aug 2017 07:22:24 -0700 (PDT) Date: Mon, 7 Aug 2017 16:21:21 +0200 From: Michal Hocko Subject: Re: [PATCH 0/2] mm, oom: do not grant oom victims full memory reserves access Message-ID: <20170807142121.GK32434@dhcp22.suse.cz> References: <20170727090357.3205-1-mhocko@kernel.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170727090357.3205-1-mhocko@kernel.org> Sender: owner-linux-mm@kvack.org List-ID: To: Andrew Morton Cc: David Rientjes , Johannes Weiner , Roman Gushchin , Tetsuo Handa , linux-mm@kvack.org, LKML On Thu 27-07-17 11:03:55, Michal Hocko wrote: > Hi, > this is a part of a larger series I posted back in Oct last year [1]. I > have dropped patch 3 because it was incorrect and patch 4 is not > applicable without it. > > The primary reason to apply patch 1 is to remove a risk of the complete > memory depletion by oom victims. While this is a theoretical risk right > now there is a demand for memcg aware oom killer which might kill all > processes inside a memcg which can be a lot of tasks. That would make > the risk quite real. > > This issue is addressed by limiting access to memory reserves. We no > longer use TIF_MEMDIE to grant the access and use tsk_is_oom_victim > instead. See Patch 1 for more details. Patch 2 is a trivial follow up > cleanup. > > I would still like to get rid of TIF_MEMDIE completely but I do not have > time to do it now and it is not a pressing issue. > > [1] http://lkml.kernel.org/r/20161004090009.7974-1-mhocko@kernel.org Are there any more comments/questions? I will resubmit if not. -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org