* + mm-page_allocc-prevent-unending-loop-in-__alloc_pages_slowpath.patch added to -mm tree
@ 2011-05-20 22:23 akpm
0 siblings, 0 replies; only message in thread
From: akpm @ 2011-05-20 22:23 UTC (permalink / raw)
To: mm-commits; +Cc: abarry, mgorman, minchan.kim, riel, stable
The patch titled
mm/page_alloc.c: prevent unending loop in __alloc_pages_slowpath()
has been added to the -mm tree. Its filename is
mm-page_allocc-prevent-unending-loop-in-__alloc_pages_slowpath.patch
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/SubmitChecklist when testing your code ***
See http://userweb.kernel.org/~akpm/stuff/added-to-mm.txt to find
out what to do about this
The current -mm tree may be found at http://userweb.kernel.org/~akpm/mmotm/
------------------------------------------------------
Subject: mm/page_alloc.c: prevent unending loop in __alloc_pages_slowpath()
From: Andrew Barry <abarry@cray.com>
I believe I found a problem in __alloc_pages_slowpath, which allows a
process to get stuck endlessly looping, even when lots of memory is
available.
Running an I/O and memory intensive stress-test I see a 0-order page
allocation with __GFP_IO and __GFP_WAIT, running on a system with very
little free memory. Right about the same time that the stress-test gets
killed by the OOM-killer, the utility trying to allocate memory gets stuck
in __alloc_pages_slowpath even though most of the systems memory was freed
by the oom-kill of the stress-test.
The utility ends up looping from the rebalance label down through the
wait_iff_congested continiously. Because order=0,
__alloc_pages_direct_compact skips the call to get_page_from_freelist.
Because all of the reclaimable memory on the system has already been
reclaimed, __alloc_pages_direct_reclaim skips the call to
get_page_from_freelist. Since there is no __GFP_FS flag, the block with
__alloc_pages_may_oom is skipped. The loop hits the wait_iff_congested,
then jumps back to rebalance without ever trying to
get_page_from_freelist. This loop repeats infinitely.
The test case is pretty pathological. Running a mix of I/O stress-tests
that do a lot of fork() and consume all of the system memory, I can pretty
reliably hit this on 600 nodes, in about 12 hours. 32GB/node.
Signed-off-by: Andrew Barry <abarry@cray.com>
Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: Rik van Riel<riel@redhat.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/page_alloc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff -puN mm/page_alloc.c~mm-page_allocc-prevent-unending-loop-in-__alloc_pages_slowpath mm/page_alloc.c
--- a/mm/page_alloc.c~mm-page_allocc-prevent-unending-loop-in-__alloc_pages_slowpath
+++ a/mm/page_alloc.c
@@ -2106,6 +2106,7 @@ restart:
first_zones_zonelist(zonelist, high_zoneidx, NULL,
&preferred_zone);
+rebalance:
/* This is the last chance, in general, before the goto nopage. */
page = get_page_from_freelist(gfp_mask, nodemask, order, zonelist,
high_zoneidx, alloc_flags & ~ALLOC_NO_WATERMARKS,
@@ -2113,7 +2114,6 @@ restart:
if (page)
goto got_pg;
-rebalance:
/* Allocate without watermarks if the context allows */
if (alloc_flags & ALLOC_NO_WATERMARKS) {
page = __alloc_pages_high_priority(gfp_mask, order,
_
Patches currently in -mm which might be from abarry@cray.com are
mm-page_allocc-prevent-unending-loop-in-__alloc_pages_slowpath.patch
^ permalink raw reply [flat|nested] only message in thread
only message in thread, other threads:[~2011-05-20 22:23 UTC | newest]
Thread overview: (only message) (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-05-20 22:23 + mm-page_allocc-prevent-unending-loop-in-__alloc_pages_slowpath.patch added to -mm tree akpm
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.