* [patch] mm, oom: remove gfp helper function @ 2014-11-26 22:17 David Rientjes 2014-11-27 10:25 ` Michal Hocko 2014-12-01 23:23 ` Johannes Weiner 0 siblings, 2 replies; 10+ messages in thread From: David Rientjes @ 2014-11-26 22:17 UTC (permalink / raw) To: Andrew Morton Cc: Qiang Huang, Johannes Weiner, Michal Hocko, linux-kernel, linux-mm Commit b9921ecdee66 ("mm: add a helper function to check may oom condition") was added because the gfp criteria for oom killing was checked in both the page allocator and memcg. That was true for about nine months, but then commit 0029e19ebf84 ("mm: memcontrol: remove explicit OOM parameter in charge path") removed the memcg usecase. Fold the implementation into its only caller. Signed-off-by: David Rientjes <rientjes@google.com> --- include/linux/oom.h | 5 ----- mm/page_alloc.c | 2 +- 2 files changed, 1 insertion(+), 6 deletions(-) diff --git a/include/linux/oom.h b/include/linux/oom.h --- a/include/linux/oom.h +++ b/include/linux/oom.h @@ -85,11 +85,6 @@ static inline void oom_killer_enable(void) oom_killer_disabled = false; } -static inline bool oom_gfp_allowed(gfp_t gfp_mask) -{ - return (gfp_mask & __GFP_FS) && !(gfp_mask & __GFP_NORETRY); -} - extern struct task_struct *find_lock_task_mm(struct task_struct *p); /* sysctls */ diff --git a/mm/page_alloc.c b/mm/page_alloc.c --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2706,7 +2706,7 @@ rebalance: * running out of options and have to consider going OOM */ if (!did_some_progress) { - if (oom_gfp_allowed(gfp_mask)) { + if ((gfp_mask & __GFP_FS) && !(gfp_mask & __GFP_NORETRY)) { if (oom_killer_disabled) goto nopage; /* Coredumps can quickly deplete all memory reserves */ ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [patch] mm, oom: remove gfp helper function 2014-11-26 22:17 [patch] mm, oom: remove gfp helper function David Rientjes @ 2014-11-27 10:25 ` Michal Hocko 2014-12-01 23:30 ` Johannes Weiner 2014-12-01 23:23 ` Johannes Weiner 1 sibling, 1 reply; 10+ messages in thread From: Michal Hocko @ 2014-11-27 10:25 UTC (permalink / raw) To: David Rientjes Cc: Andrew Morton, Qiang Huang, Johannes Weiner, linux-kernel, linux-mm On Wed 26-11-14 14:17:32, David Rientjes wrote: > Commit b9921ecdee66 ("mm: add a helper function to check may oom > condition") was added because the gfp criteria for oom killing was > checked in both the page allocator and memcg. > > That was true for about nine months, but then commit 0029e19ebf84 ("mm: > memcontrol: remove explicit OOM parameter in charge path") removed the > memcg usecase. > > Fold the implementation into its only caller. I don't care much whether the check is open coded or hidden behind the helper but I would really appreciate a comment explaining why we care about these two particular gfp flags. The code is like that since ages - excavation work would lead us back to 2002 resp. 2003. Let's save other others people time and do not repeat the same exercise again. What about a comment like the following? > Signed-off-by: David Rientjes <rientjes@google.com> > --- > include/linux/oom.h | 5 ----- > mm/page_alloc.c | 2 +- > 2 files changed, 1 insertion(+), 6 deletions(-) > [...] > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -2706,7 +2706,7 @@ rebalance: > * running out of options and have to consider going OOM > */ > if (!did_some_progress) { > - if (oom_gfp_allowed(gfp_mask)) { /* * Do not attempt to trigger OOM killer for !__GFP_FS * allocations because it would be premature to kill * anything just because the reclaim is stuck on * dirty/writeback pages. * __GFP_NORETRY allocations might fail and so the OOM * would be more harmful than useful. */ > + if ((gfp_mask & __GFP_FS) && !(gfp_mask & __GFP_NORETRY)) { > if (oom_killer_disabled) > goto nopage; > /* Coredumps can quickly deplete all memory reserves */ -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [patch] mm, oom: remove gfp helper function 2014-11-27 10:25 ` Michal Hocko @ 2014-12-01 23:30 ` Johannes Weiner 2014-12-03 15:52 ` Michal Hocko 0 siblings, 1 reply; 10+ messages in thread From: Johannes Weiner @ 2014-12-01 23:30 UTC (permalink / raw) To: Michal Hocko Cc: David Rientjes, Andrew Morton, Qiang Huang, linux-kernel, linux-mm On Thu, Nov 27, 2014 at 11:25:47AM +0100, Michal Hocko wrote: > On Wed 26-11-14 14:17:32, David Rientjes wrote: > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > --- a/mm/page_alloc.c > > +++ b/mm/page_alloc.c > > @@ -2706,7 +2706,7 @@ rebalance: > > * running out of options and have to consider going OOM > > */ > > if (!did_some_progress) { > > - if (oom_gfp_allowed(gfp_mask)) { > /* > * Do not attempt to trigger OOM killer for !__GFP_FS > * allocations because it would be premature to kill > * anything just because the reclaim is stuck on > * dirty/writeback pages. > * __GFP_NORETRY allocations might fail and so the OOM > * would be more harmful than useful. > */ I don't think we need to explain the individual flags, but it would indeed be useful to remark here that we shouldn't OOM kill from allocations contexts with (severely) limited reclaim abilities. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [patch] mm, oom: remove gfp helper function 2014-12-01 23:30 ` Johannes Weiner @ 2014-12-03 15:52 ` Michal Hocko 2014-12-03 18:15 ` Johannes Weiner 2014-12-03 23:10 ` Andrew Morton 0 siblings, 2 replies; 10+ messages in thread From: Michal Hocko @ 2014-12-03 15:52 UTC (permalink / raw) To: Johannes Weiner Cc: David Rientjes, Andrew Morton, Qiang Huang, linux-kernel, linux-mm On Mon 01-12-14 18:30:40, Johannes Weiner wrote: > On Thu, Nov 27, 2014 at 11:25:47AM +0100, Michal Hocko wrote: > > On Wed 26-11-14 14:17:32, David Rientjes wrote: > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > > --- a/mm/page_alloc.c > > > +++ b/mm/page_alloc.c > > > @@ -2706,7 +2706,7 @@ rebalance: > > > * running out of options and have to consider going OOM > > > */ > > > if (!did_some_progress) { > > > - if (oom_gfp_allowed(gfp_mask)) { > > /* > > * Do not attempt to trigger OOM killer for !__GFP_FS > > * allocations because it would be premature to kill > > * anything just because the reclaim is stuck on > > * dirty/writeback pages. > > * __GFP_NORETRY allocations might fail and so the OOM > > * would be more harmful than useful. > > */ > > I don't think we need to explain the individual flags, but it would > indeed be useful to remark here that we shouldn't OOM kill from > allocations contexts with (severely) limited reclaim abilities. Is __GFP_NORETRY really related to limited reclaim abilities? I thought it was merely a way to tell the allocator to fail rather than spend too much time reclaiming. If you are referring to __GFP_FS part then I have no objections to be less specific, of course, but __GFP_IO would fall into the same category but we are not checking for it. I have no idea why we consider the first and not the later one, to be honest... -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [patch] mm, oom: remove gfp helper function 2014-12-03 15:52 ` Michal Hocko @ 2014-12-03 18:15 ` Johannes Weiner 2014-12-04 15:17 ` Michal Hocko 2014-12-03 23:10 ` Andrew Morton 1 sibling, 1 reply; 10+ messages in thread From: Johannes Weiner @ 2014-12-03 18:15 UTC (permalink / raw) To: Michal Hocko Cc: David Rientjes, Andrew Morton, Qiang Huang, linux-kernel, linux-mm On Wed, Dec 03, 2014 at 04:52:22PM +0100, Michal Hocko wrote: > On Mon 01-12-14 18:30:40, Johannes Weiner wrote: > > On Thu, Nov 27, 2014 at 11:25:47AM +0100, Michal Hocko wrote: > > > On Wed 26-11-14 14:17:32, David Rientjes wrote: > > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > > > --- a/mm/page_alloc.c > > > > +++ b/mm/page_alloc.c > > > > @@ -2706,7 +2706,7 @@ rebalance: > > > > * running out of options and have to consider going OOM > > > > */ > > > > if (!did_some_progress) { > > > > - if (oom_gfp_allowed(gfp_mask)) { > > > /* > > > * Do not attempt to trigger OOM killer for !__GFP_FS > > > * allocations because it would be premature to kill > > > * anything just because the reclaim is stuck on > > > * dirty/writeback pages. > > > * __GFP_NORETRY allocations might fail and so the OOM > > > * would be more harmful than useful. > > > */ > > > > I don't think we need to explain the individual flags, but it would > > indeed be useful to remark here that we shouldn't OOM kill from > > allocations contexts with (severely) limited reclaim abilities. > > Is __GFP_NORETRY really related to limited reclaim abilities? I thought > it was merely a way to tell the allocator to fail rather than spend too > much time reclaiming. And you wouldn't call that "limited reclaim ability"? I guess it's a matter of phrasing, but the point is that we don't want anybody to OOM kill that didn't exhaust all other options that are usually available to allocators. This includes the ability to enter the FS, the ability to do IO in general, and the ability to retry reclaim. Possibly more. > If you are referring to __GFP_FS part then I have > no objections to be less specific, of course, but __GFP_IO would fall > into the same category but we are not checking for it. I have no idea > why we consider the first and not the later one, to be honest... Which proves my point that we should document high-level intent rather than implementation. Suddenly, that missing __GFP_IO is sticking out like a sore thumb... ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [patch] mm, oom: remove gfp helper function 2014-12-03 18:15 ` Johannes Weiner @ 2014-12-04 15:17 ` Michal Hocko 2014-12-04 20:19 ` Johannes Weiner 0 siblings, 1 reply; 10+ messages in thread From: Michal Hocko @ 2014-12-04 15:17 UTC (permalink / raw) To: Johannes Weiner Cc: David Rientjes, Andrew Morton, Qiang Huang, linux-kernel, linux-mm On Wed 03-12-14 13:15:09, Johannes Weiner wrote: > On Wed, Dec 03, 2014 at 04:52:22PM +0100, Michal Hocko wrote: > > On Mon 01-12-14 18:30:40, Johannes Weiner wrote: > > > On Thu, Nov 27, 2014 at 11:25:47AM +0100, Michal Hocko wrote: > > > > On Wed 26-11-14 14:17:32, David Rientjes wrote: > > > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > > > > --- a/mm/page_alloc.c > > > > > +++ b/mm/page_alloc.c > > > > > @@ -2706,7 +2706,7 @@ rebalance: > > > > > * running out of options and have to consider going OOM > > > > > */ > > > > > if (!did_some_progress) { > > > > > - if (oom_gfp_allowed(gfp_mask)) { > > > > /* > > > > * Do not attempt to trigger OOM killer for !__GFP_FS > > > > * allocations because it would be premature to kill > > > > * anything just because the reclaim is stuck on > > > > * dirty/writeback pages. > > > > * __GFP_NORETRY allocations might fail and so the OOM > > > > * would be more harmful than useful. > > > > */ > > > > > > I don't think we need to explain the individual flags, but it would > > > indeed be useful to remark here that we shouldn't OOM kill from > > > allocations contexts with (severely) limited reclaim abilities. > > > > Is __GFP_NORETRY really related to limited reclaim abilities? I thought > > it was merely a way to tell the allocator to fail rather than spend too > > much time reclaiming. > > And you wouldn't call that "limited reclaim ability"? I really do not want to go into language lawyering here. But to me the reclaim ability is what the reclaim is capable to do with the given gfp. And __GFP_NORETRY is completely irrelevant for the reclaim. It tells the allocator how hard it should try (similar like __GFP_REPEAT or __GFP_NOFAIL) unlike __GFP_FS which restricts the reclaim in its operation. > I guess it's a > matter of phrasing, but the point is that we don't want anybody to OOM > kill that didn't exhaust all other options that are usually available > to allocators. This includes the ability to enter the FS, the ability > to do IO in general, and the ability to retry reclaim. Possibly more. Right. > > If you are referring to __GFP_FS part then I have > > no objections to be less specific, of course, but __GFP_IO would fall > > into the same category but we are not checking for it. I have no idea > > why we consider the first and not the later one, to be honest... > > Which proves my point that we should document high-level intent rather > than implementation. Suddenly, that missing __GFP_IO is sticking out > like a sore thumb... I am obviously not insisting on the above wording. I am for everything that would clarify the test and do not force me to go through several hops of the git blame to find the original intention again after year when I forget this again. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [patch] mm, oom: remove gfp helper function 2014-12-04 15:17 ` Michal Hocko @ 2014-12-04 20:19 ` Johannes Weiner 2014-12-05 14:05 ` Michal Hocko 0 siblings, 1 reply; 10+ messages in thread From: Johannes Weiner @ 2014-12-04 20:19 UTC (permalink / raw) To: Michal Hocko Cc: David Rientjes, Andrew Morton, Qiang Huang, linux-kernel, linux-mm On Thu, Dec 04, 2014 at 04:17:58PM +0100, Michal Hocko wrote: > On Wed 03-12-14 13:15:09, Johannes Weiner wrote: > > On Wed, Dec 03, 2014 at 04:52:22PM +0100, Michal Hocko wrote: > > > On Mon 01-12-14 18:30:40, Johannes Weiner wrote: > > > > On Thu, Nov 27, 2014 at 11:25:47AM +0100, Michal Hocko wrote: > > > > > On Wed 26-11-14 14:17:32, David Rientjes wrote: > > > > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > > > > > --- a/mm/page_alloc.c > > > > > > +++ b/mm/page_alloc.c > > > > > > @@ -2706,7 +2706,7 @@ rebalance: > > > > > > * running out of options and have to consider going OOM > > > > > > */ > > > > > > if (!did_some_progress) { > > > > > > - if (oom_gfp_allowed(gfp_mask)) { > > > > > /* > > > > > * Do not attempt to trigger OOM killer for !__GFP_FS > > > > > * allocations because it would be premature to kill > > > > > * anything just because the reclaim is stuck on > > > > > * dirty/writeback pages. > > > > > * __GFP_NORETRY allocations might fail and so the OOM > > > > > * would be more harmful than useful. > > > > > */ > > > > > > > > I don't think we need to explain the individual flags, but it would > > > > indeed be useful to remark here that we shouldn't OOM kill from > > > > allocations contexts with (severely) limited reclaim abilities. > > > > > > Is __GFP_NORETRY really related to limited reclaim abilities? I thought > > > it was merely a way to tell the allocator to fail rather than spend too > > > much time reclaiming. > > > > And you wouldn't call that "limited reclaim ability"? > > I really do not want to go into language lawyering here. But to me the > reclaim ability is what the reclaim is capable to do with the given gfp. I was explicitely talking about the reclaim abilities of the allocation context, in the context of the allocator. Surely that includes how often the task can call try_to_free_pages()? > And __GFP_NORETRY is completely irrelevant for the reclaim. It tells the > allocator how hard it should try (similar like __GFP_REPEAT or > __GFP_NOFAIL) unlike __GFP_FS which restricts the reclaim in its > operation. Tells the allocator how hard to try *what*? It's not just looking at the freelists in a tight loop. It's reclaiming memory. You're still thinking about the implementation and get distracted by the detail that the allocator's reclaim logic is living in a separate code file. Look at the architecture. The GFP flags apply to the freelist manager and the reclaim code as one big allocator machine that goes off to get pages, from wherever. And anything that is taking pages that are not already on the freelist must be reclaiming them from somewhere else, by definition. Thus, the allocation context's reclaim ability includes whether it can free or migrate used pages on its own (__GFP_WAIT), whether it can scan and migrate pages indefinitely or not (__GFP_NOFAIL and __GFP_NORETRY), and whether it can write and wait for the pages it looks at (__GFP_FS and __GFP_IO). OOM killing is just another way to reclaim, but the point here is that it's so disruptive that we want to avoid it unless all other means to reclaim memory are exhausted. And yes, continuing to scan pages *is* a means for the allocator to reclaim memory, and we don't OOM kill when that is still making progress! That being said, you ARE right that __GFP_FS and __GFP_NORETRY apply to different components of the allocator. While they should have the same effect on OOM-killing, the fact that they are checked together in the same branch is a strong signal of how weird this code actually is. How about the following? It changes the code flow to clarify what's actually going on there and gets rid of oom_gfp_allowed() altogether, instead of awkwardly trying to explain something that has no meaning. Btw, it looks like there is a bug with oom_killer_disabled, because it will return NULL for __GFP_NOFAIL. --- From: Johannes Weiner <hannes@cmpxchg.org> Subject: [patch] mm: page_alloc: embed OOM killing naturally into allocation slowpath The OOM killing invocation does a lot of duplicative checks against the task's allocation context. Rework it to take advantage of the existing checks in the allocator slowpath. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> --- include/linux/oom.h | 5 ---- mm/page_alloc.c | 80 +++++++++++++++++++++++------------------------------ 2 files changed, 35 insertions(+), 50 deletions(-) diff --git a/include/linux/oom.h b/include/linux/oom.h index e8d6e1058723..4971874f54db 100644 --- a/include/linux/oom.h +++ b/include/linux/oom.h @@ -85,11 +85,6 @@ static inline void oom_killer_enable(void) oom_killer_disabled = false; } -static inline bool oom_gfp_allowed(gfp_t gfp_mask) -{ - return (gfp_mask & __GFP_FS) && !(gfp_mask & __GFP_NORETRY); -} - extern struct task_struct *find_lock_task_mm(struct task_struct *p); /* sysctls */ diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 616a2c956b4b..2df99ca56e28 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2232,12 +2232,21 @@ static inline struct page * __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, struct zonelist *zonelist, enum zone_type high_zoneidx, nodemask_t *nodemask, struct zone *preferred_zone, - int classzone_idx, int migratetype) + int classzone_idx, int migratetype, unsigned long *did_some_progress) { struct page *page; - /* Acquire the per-zone oom lock for each zone */ + *did_some_progress = 0; + + if (oom_killer_disabled) + return NULL; + + /* + * Acquire the per-zone oom lock for each zone. If that + * fails, somebody else is making progress for us. + */ if (!oom_zonelist_trylock(zonelist, gfp_mask)) { + *did_some_progress = 1; schedule_timeout_uninterruptible(1); return NULL; } @@ -2263,12 +2272,18 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, goto out; if (!(gfp_mask & __GFP_NOFAIL)) { + /* Coredumps can quickly deplete all memory reserves */ + if (current->flags & PF_DUMPCORE) + goto out; /* The OOM killer will not help higher order allocs */ if (order > PAGE_ALLOC_COSTLY_ORDER) goto out; /* The OOM killer does not needlessly kill tasks for lowmem */ if (high_zoneidx < ZONE_NORMAL) goto out; + /* The OOM killer does not compensate for light reclaim */ + if (!(gfp_mask & __GFP_FS)) + goto out; /* * GFP_THISNODE contains __GFP_NORETRY and we never hit this. * Sanity check for bare calls of __GFP_THISNODE, not real OOM. @@ -2281,7 +2296,7 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, } /* Exhausted what can be done so it's blamo time */ out_of_memory(zonelist, gfp_mask, order, nodemask, false); - + *did_some_progress = 1; out: oom_zonelist_unlock(zonelist, gfp_mask); return page; @@ -2571,7 +2586,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, (gfp_mask & GFP_THISNODE) == GFP_THISNODE) goto nopage; -restart: if (!(gfp_mask & __GFP_NO_KSWAPD)) wake_all_kswapds(order, zonelist, high_zoneidx, preferred_zone, nodemask); @@ -2701,51 +2715,27 @@ rebalance: if (page) goto got_pg; - /* - * If we failed to make any progress reclaiming, then we are - * running out of options and have to consider going OOM - */ - if (!did_some_progress) { - if (oom_gfp_allowed(gfp_mask)) { - if (oom_killer_disabled) - goto nopage; - /* Coredumps can quickly deplete all memory reserves */ - if ((current->flags & PF_DUMPCORE) && - !(gfp_mask & __GFP_NOFAIL)) - goto nopage; - page = __alloc_pages_may_oom(gfp_mask, order, - zonelist, high_zoneidx, - nodemask, preferred_zone, - classzone_idx, migratetype); - if (page) - goto got_pg; - - if (!(gfp_mask & __GFP_NOFAIL)) { - /* - * The oom killer is not called for high-order - * allocations that may fail, so if no progress - * is being made, there are no other options and - * retrying is unlikely to help. - */ - if (order > PAGE_ALLOC_COSTLY_ORDER) - goto nopage; - /* - * The oom killer is not called for lowmem - * allocations to prevent needlessly killing - * innocent tasks. - */ - if (high_zoneidx < ZONE_NORMAL) - goto nopage; - } - - goto restart; - } - } - /* Check if we should retry the allocation */ pages_reclaimed += did_some_progress; if (should_alloc_retry(gfp_mask, order, did_some_progress, pages_reclaimed)) { + /* + * If we fail to make progress by freeing individual + * pages, but the allocation wants us to keep going, + * start OOM killing tasks. + */ + if (!did_some_progress) { + page = __alloc_pages_may_oom(gfp_mask, order, zonelist, + high_zoneidx, nodemask, + preferred_zone, classzone_idx, + migratetype,&did_some_progress); + if (page) + goto got_pg; + if (!did_some_progress) { + BUG_ON(gfp_mask & __GFP_NOFAIL); + goto nopage; + } + } /* Wait for some write requests to complete then retry */ wait_iff_congested(preferred_zone, BLK_RW_ASYNC, HZ/50); goto rebalance; -- 2.1.3 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [patch] mm, oom: remove gfp helper function 2014-12-04 20:19 ` Johannes Weiner @ 2014-12-05 14:05 ` Michal Hocko 0 siblings, 0 replies; 10+ messages in thread From: Michal Hocko @ 2014-12-05 14:05 UTC (permalink / raw) To: Johannes Weiner Cc: David Rientjes, Andrew Morton, Qiang Huang, linux-kernel, linux-mm On Thu 04-12-14 15:19:05, Johannes Weiner wrote: [...] > How about the following? It changes the code flow to clarify what's > actually going on there and gets rid of oom_gfp_allowed() altogether, > instead of awkwardly trying to explain something that has no meaning. Yes this makes a lot of sense. > Btw, it looks like there is a bug with oom_killer_disabled, because it > will return NULL for __GFP_NOFAIL. Right! __GFP_NOFAIL allocation after oom is disabled cannot be guaranteed. > --- > From: Johannes Weiner <hannes@cmpxchg.org> > Subject: [patch] mm: page_alloc: embed OOM killing naturally into allocation > slowpath > > The OOM killing invocation does a lot of duplicative checks against > the task's allocation context. Rework it to take advantage of the > existing checks in the allocator slowpath. Nice! Just document the __GFP_NOFAIL fix here as well, please. > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.cz> I will rebase my oom vs pm-freezer patch (http://marc.info/?l=linux-mm&m=141634503316543&w=2) which touches this area on top of your patch. Thanks! > --- > include/linux/oom.h | 5 ---- > mm/page_alloc.c | 80 +++++++++++++++++++++++------------------------------ > 2 files changed, 35 insertions(+), 50 deletions(-) > > diff --git a/include/linux/oom.h b/include/linux/oom.h > index e8d6e1058723..4971874f54db 100644 > --- a/include/linux/oom.h > +++ b/include/linux/oom.h > @@ -85,11 +85,6 @@ static inline void oom_killer_enable(void) > oom_killer_disabled = false; > } > > -static inline bool oom_gfp_allowed(gfp_t gfp_mask) > -{ > - return (gfp_mask & __GFP_FS) && !(gfp_mask & __GFP_NORETRY); > -} > - > extern struct task_struct *find_lock_task_mm(struct task_struct *p); > > /* sysctls */ > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 616a2c956b4b..2df99ca56e28 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -2232,12 +2232,21 @@ static inline struct page * > __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, > struct zonelist *zonelist, enum zone_type high_zoneidx, > nodemask_t *nodemask, struct zone *preferred_zone, > - int classzone_idx, int migratetype) > + int classzone_idx, int migratetype, unsigned long *did_some_progress) > { > struct page *page; > > - /* Acquire the per-zone oom lock for each zone */ > + *did_some_progress = 0; > + > + if (oom_killer_disabled) > + return NULL; > + > + /* > + * Acquire the per-zone oom lock for each zone. If that > + * fails, somebody else is making progress for us. > + */ > if (!oom_zonelist_trylock(zonelist, gfp_mask)) { > + *did_some_progress = 1; > schedule_timeout_uninterruptible(1); > return NULL; > } > @@ -2263,12 +2272,18 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, > goto out; > > if (!(gfp_mask & __GFP_NOFAIL)) { > + /* Coredumps can quickly deplete all memory reserves */ > + if (current->flags & PF_DUMPCORE) > + goto out; > /* The OOM killer will not help higher order allocs */ > if (order > PAGE_ALLOC_COSTLY_ORDER) > goto out; > /* The OOM killer does not needlessly kill tasks for lowmem */ > if (high_zoneidx < ZONE_NORMAL) > goto out; > + /* The OOM killer does not compensate for light reclaim */ > + if (!(gfp_mask & __GFP_FS)) > + goto out; > /* > * GFP_THISNODE contains __GFP_NORETRY and we never hit this. > * Sanity check for bare calls of __GFP_THISNODE, not real OOM. > @@ -2281,7 +2296,7 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, > } > /* Exhausted what can be done so it's blamo time */ > out_of_memory(zonelist, gfp_mask, order, nodemask, false); > - > + *did_some_progress = 1; > out: > oom_zonelist_unlock(zonelist, gfp_mask); > return page; > @@ -2571,7 +2586,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, > (gfp_mask & GFP_THISNODE) == GFP_THISNODE) > goto nopage; > > -restart: > if (!(gfp_mask & __GFP_NO_KSWAPD)) > wake_all_kswapds(order, zonelist, high_zoneidx, > preferred_zone, nodemask); > @@ -2701,51 +2715,27 @@ rebalance: > if (page) > goto got_pg; > > - /* > - * If we failed to make any progress reclaiming, then we are > - * running out of options and have to consider going OOM > - */ > - if (!did_some_progress) { > - if (oom_gfp_allowed(gfp_mask)) { > - if (oom_killer_disabled) > - goto nopage; > - /* Coredumps can quickly deplete all memory reserves */ > - if ((current->flags & PF_DUMPCORE) && > - !(gfp_mask & __GFP_NOFAIL)) > - goto nopage; > - page = __alloc_pages_may_oom(gfp_mask, order, > - zonelist, high_zoneidx, > - nodemask, preferred_zone, > - classzone_idx, migratetype); > - if (page) > - goto got_pg; > - > - if (!(gfp_mask & __GFP_NOFAIL)) { > - /* > - * The oom killer is not called for high-order > - * allocations that may fail, so if no progress > - * is being made, there are no other options and > - * retrying is unlikely to help. > - */ > - if (order > PAGE_ALLOC_COSTLY_ORDER) > - goto nopage; > - /* > - * The oom killer is not called for lowmem > - * allocations to prevent needlessly killing > - * innocent tasks. > - */ > - if (high_zoneidx < ZONE_NORMAL) > - goto nopage; > - } > - > - goto restart; > - } > - } > - > /* Check if we should retry the allocation */ > pages_reclaimed += did_some_progress; > if (should_alloc_retry(gfp_mask, order, did_some_progress, > pages_reclaimed)) { > + /* > + * If we fail to make progress by freeing individual > + * pages, but the allocation wants us to keep going, > + * start OOM killing tasks. > + */ > + if (!did_some_progress) { > + page = __alloc_pages_may_oom(gfp_mask, order, zonelist, > + high_zoneidx, nodemask, > + preferred_zone, classzone_idx, > + migratetype,&did_some_progress); > + if (page) > + goto got_pg; > + if (!did_some_progress) { > + BUG_ON(gfp_mask & __GFP_NOFAIL); > + goto nopage; > + } > + } > /* Wait for some write requests to complete then retry */ > wait_iff_congested(preferred_zone, BLK_RW_ASYNC, HZ/50); > goto rebalance; > -- > 2.1.3 > -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [patch] mm, oom: remove gfp helper function 2014-12-03 15:52 ` Michal Hocko 2014-12-03 18:15 ` Johannes Weiner @ 2014-12-03 23:10 ` Andrew Morton 1 sibling, 0 replies; 10+ messages in thread From: Andrew Morton @ 2014-12-03 23:10 UTC (permalink / raw) To: Michal Hocko Cc: Johannes Weiner, David Rientjes, Qiang Huang, linux-kernel, linux-mm On Wed, 3 Dec 2014 16:52:22 +0100 Michal Hocko <mhocko@suse.cz> wrote: > On Mon 01-12-14 18:30:40, Johannes Weiner wrote: > > On Thu, Nov 27, 2014 at 11:25:47AM +0100, Michal Hocko wrote: > > > On Wed 26-11-14 14:17:32, David Rientjes wrote: > > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > > > --- a/mm/page_alloc.c > > > > +++ b/mm/page_alloc.c > > > > @@ -2706,7 +2706,7 @@ rebalance: > > > > * running out of options and have to consider going OOM > > > > */ > > > > if (!did_some_progress) { > > > > - if (oom_gfp_allowed(gfp_mask)) { > > > /* > > > * Do not attempt to trigger OOM killer for !__GFP_FS > > > * allocations because it would be premature to kill > > > * anything just because the reclaim is stuck on > > > * dirty/writeback pages. > > > * __GFP_NORETRY allocations might fail and so the OOM > > > * would be more harmful than useful. > > > */ > > > > I don't think we need to explain the individual flags, but it would > > indeed be useful to remark here that we shouldn't OOM kill from > > allocations contexts with (severely) limited reclaim abilities. > > Is __GFP_NORETRY really related to limited reclaim abilities? I thought > it was merely a way to tell the allocator to fail rather than spend too > much time reclaiming. That's my understanding of __GFP_NORETRY. However it seems that I really didn't have a plan: : commit 75908778d91e92ca3c9ed587c4550866f4c903fc : Author: Andrew Morton <akpm@digeo.com> : Date: Sun Apr 20 00:28:12 2003 -0700 : : [PATCH] implement __GFP_REPEAT, __GFP_NOFAIL, __GFP_NORETRY : : This is a cleanup patch. : : There are quite a lot of places in the kernel which will infinitely retry a : memory allocation. : : Generally, they get it wrong. Some do yield(), the semantics of which have : changed over time. Some do schedule(), which can lock up if the caller is : SCHED_FIFO/RR. Some do schedule_timeout(), etc. : : And often it is unnecessary, because the page allocator will do the retry : internally anyway. But we cannot rely on that - this behaviour may change : (-aa and -rmap kernels do not do this, for instance). : : So it is good to formalise and to centralise this operation. If an : allocation specifies __GFP_REPEAT then the page allocator must infinitely : retry the allocation. : : The semantics of __GFP_REPEAT are "try harder". The allocation _may_ fail : (the 2.4 -aa and -rmap VM's do not retry infinitely by default). : : The semantics of __GFP_NOFAIL are "cannot fail". It is a no-op in this VM, : but needs to be honoured (or fix up the callers) if the VM ischanged to not : retry infinitely by default. : : The semantics of __GFP_NOREPEAT are "try once, don't loop". This isn't used : at present (although perhaps it should be, in swapoff). It is mainly for : completeness. (that's a braino in the changelog: it should be s/__GFP_NOREPEAT/__GFP_NORETRY/) > If you are referring to __GFP_FS part then I have > no objections to be less specific, of course, but __GFP_IO would fall > into the same category but we are not checking for it. I have no idea > why we consider the first and not the later one, to be honest... (__GFP_FS && !__GFP_IO) doesn't make much sense and probably doesn't happen. "__GFP_FS implies __GFP_IO" is OK. Anyway, yes, This particular piece of __alloc_pages_slowpath() sorely needs documenting please. Once we manage to work out why we're doing what we're doing! ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [patch] mm, oom: remove gfp helper function 2014-11-26 22:17 [patch] mm, oom: remove gfp helper function David Rientjes 2014-11-27 10:25 ` Michal Hocko @ 2014-12-01 23:23 ` Johannes Weiner 1 sibling, 0 replies; 10+ messages in thread From: Johannes Weiner @ 2014-12-01 23:23 UTC (permalink / raw) To: David Rientjes Cc: Andrew Morton, Qiang Huang, Michal Hocko, linux-kernel, linux-mm On Wed, Nov 26, 2014 at 02:17:32PM -0800, David Rientjes wrote: > Commit b9921ecdee66 ("mm: add a helper function to check may oom > condition") was added because the gfp criteria for oom killing was > checked in both the page allocator and memcg. > > That was true for about nine months, but then commit 0029e19ebf84 ("mm: > memcontrol: remove explicit OOM parameter in charge path") removed the > memcg usecase. > > Fold the implementation into its only caller. > > Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2014-12-05 14:05 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2014-11-26 22:17 [patch] mm, oom: remove gfp helper function David Rientjes 2014-11-27 10:25 ` Michal Hocko 2014-12-01 23:30 ` Johannes Weiner 2014-12-03 15:52 ` Michal Hocko 2014-12-03 18:15 ` Johannes Weiner 2014-12-04 15:17 ` Michal Hocko 2014-12-04 20:19 ` Johannes Weiner 2014-12-05 14:05 ` Michal Hocko 2014-12-03 23:10 ` Andrew Morton 2014-12-01 23:23 ` Johannes Weiner
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).