[patch] mm: clear pages_scanned only if draining a pcp adds pages to the buddy allocator

* [patch] mm: clear pages_scanned only if draining a pcp adds pages to the buddy allocator
@ 2011-01-23 22:58 David Rientjes
  2011-01-24 17:09 ` Rik van Riel
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: David Rientjes @ 2011-01-23 22:58 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, Johannes Weiner, Minchan Kim, Wu Fengguang,
	KAMEZAWA Hiroyuki, KOSAKI Motohiro, Rik van Riel, Jens Axboe,
	linux-mm

0e093d99763e (writeback: do not sleep on the congestion queue if there
are no congested BDIs or if significant congestion is not being
encountered in the current zone) uncovered a livelock in the page
allocator that resulted in tasks infinitely looping trying to find memory
and kswapd running at 100% cpu.

The issue occurs because drain_all_pages() is called immediately
following direct reclaim when no memory is freed and try_to_free_pages()
returns non-zero because all zones in the zonelist do not have their
all_unreclaimable flag set.

When draining the per-cpu pagesets back to the buddy allocator for each
zone, the zone->pages_scanned counter is cleared to avoid erroneously
setting zone->all_unreclaimable later.  The problem is that no pages may
actually be drained and, thus, the unreclaimable logic never fails direct
reclaim so the oom killer may be invoked.

This apparently only manifested after wait_iff_congested() was introduced
and the zone was full of anonymous memory that would not congest the
backing store.  The page allocator would infinitely loop if there were no
other tasks waiting to be scheduled and clear zone->pages_scanned because
of drain_all_pages() as the result of this change before kswapd could
scan enough pages to trigger the reclaim logic.  Additionally, with every
loop of the page allocator and in the reclaim path, kswapd would be
kicked and would end up running at 100% cpu.  In this scenario, current
and kswapd are all running continuously with kswapd incrementing
zone->pages_scanned and current clearing it.

The problem is even more pronounced when current swaps some of its memory
to swap cache and the reclaimable logic then considers all active
anonymous memory in the all_unreclaimable logic, which requires a much
higher zone->pages_scanned value for try_to_free_pages() to return zero
that is never attainable in this scenario.

Before wait_iff_congested(), the page allocator would incur an
unconditional timeout and allow kswapd to elevate zone->pages_scanned to
a level that the oom killer would be called the next time it loops.

The fix is to only attempt to drain pcp pages if there is actually a
quantity to be drained.  The unconditional clearing of
zone->pages_scanned in free_pcppages_bulk() need not be changed since
other callers already ensure that draining will occur.  This patch
ensures that free_pcppages_bulk() will actually free memory before
calling into it from drain_all_pages() so zone->pages_scanned is only
cleared if appropriate.

Signed-off-by: David Rientjes <rientjes@google.com>
---
 mm/page_alloc.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1088,8 +1088,10 @@ static void drain_pages(unsigned int cpu)
 		pset = per_cpu_ptr(zone->pageset, cpu);
 
 		pcp = &pset->pcp;
-		free_pcppages_bulk(zone, pcp->count, pcp);
-		pcp->count = 0;
+		if (pcp->count) {
+			free_pcppages_bulk(zone, pcp->count, pcp);
+			pcp->count = 0;
+		}
 		local_irq_restore(flags);
 	}
 }

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 5+ messages in thread