From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756726Ab2K0Utu (ORCPT ); Tue, 27 Nov 2012 15:49:50 -0500 Received: from zene.cmpxchg.org ([85.214.230.12]:35197 "EHLO zene.cmpxchg.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756513Ab2K0Utp (ORCPT ); Tue, 27 Nov 2012 15:49:45 -0500 From: Johannes Weiner To: Andrew Morton Cc: Mel Gorman , Rik van Riel , George Spelvin , Johannes Hirte , Tomas Racek , Jan Kara , Dave Hansen , Josh Boyer , Valdis.Kletnieks@vt.edu, Jiri Slaby , Thorsten Leemhuis , Zdenek Kabelac , Bruno Wolff III , Linus Torvalds , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [patch] mm: vmscan: fix kswapd endless loop on higher order allocation Date: Tue, 27 Nov 2012 15:48:35 -0500 Message-Id: <1354049315-12874-2-git-send-email-hannes@cmpxchg.org> X-Mailer: git-send-email 1.7.11.7 In-Reply-To: <1354049315-12874-1-git-send-email-hannes@cmpxchg.org> References: <1354049315-12874-1-git-send-email-hannes@cmpxchg.org> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Kswapd does not in all places have the same criteria for a balanced zone. Zones are only being reclaimed when their high watermark is breached, but compaction checks loop over the zonelist again when the zone does not meet the low watermark plus two times the size of the allocation. This gets kswapd stuck in an endless loop over a small zone, like the DMA zone, where the high watermark is smaller than the compaction requirement. Add a function, zone_balanced(), that checks the watermark, and, for higher order allocations, if compaction has enough free memory. Then use it uniformly to check for balanced zones. This makes sure that when the compaction watermark is not met, at least reclaim happens and progress is made - or the zone is declared unreclaimable at some point and skipped entirely. Reported-by: George Spelvin Reported-by: Johannes Hirte Reported-by: Tomas Racek Signed-off-by: Johannes Weiner Tested-by: Johannes Hirte Reviewed-by: Rik van Riel --- mm/vmscan.c | 27 ++++++++++++++++++--------- 1 file changed, 18 insertions(+), 9 deletions(-) diff --git a/mm/vmscan.c b/mm/vmscan.c index 48550c6..3b0aef4 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -2397,6 +2397,19 @@ static void age_active_anon(struct zone *zone, struct scan_control *sc) } while (memcg); } +static bool zone_balanced(struct zone *zone, int order, + unsigned long balance_gap, int classzone_idx) +{ + if (!zone_watermark_ok_safe(zone, order, high_wmark_pages(zone) + + balance_gap, classzone_idx, 0)) + return false; + + if (COMPACTION_BUILD && order && !compaction_suitable(zone, order)) + return false; + + return true; +} + /* * pgdat_balanced is used when checking if a node is balanced for high-order * allocations. Only zones that meet watermarks and are in a zone allowed @@ -2475,8 +2488,7 @@ static bool prepare_kswapd_sleep(pg_data_t *pgdat, int order, long remaining, continue; } - if (!zone_watermark_ok_safe(zone, order, high_wmark_pages(zone), - i, 0)) + if (!zone_balanced(zone, order, 0, i)) all_zones_ok = false; else balanced += zone->present_pages; @@ -2585,8 +2597,7 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order, break; } - if (!zone_watermark_ok_safe(zone, order, - high_wmark_pages(zone), 0, 0)) { + if (!zone_balanced(zone, order, 0, 0)) { end_zone = i; break; } else { @@ -2662,9 +2673,8 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order, testorder = 0; if ((buffer_heads_over_limit && is_highmem_idx(i)) || - !zone_watermark_ok_safe(zone, testorder, - high_wmark_pages(zone) + balance_gap, - end_zone, 0)) { + !zone_balanced(zone, testorder, + balance_gap, end_zone)) { shrink_zone(zone, &sc); reclaim_state->reclaimed_slab = 0; @@ -2691,8 +2701,7 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order, continue; } - if (!zone_watermark_ok_safe(zone, testorder, - high_wmark_pages(zone), end_zone, 0)) { + if (!zone_balanced(zone, testorder, 0, end_zone)) { all_zones_ok = 0; /* * We are still under min water mark. This -- 1.7.11.7