mm-commits.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* + mm-vmscan-only-read-new_classzone_idx-from-pgdat-when-reclaiming-successfully.patch added to -mm tree
@ 2011-06-28 21:53 akpm
  0 siblings, 0 replies; only message in thread
From: akpm @ 2011-06-28 21:53 UTC (permalink / raw)
  To: mm-commits
  Cc: mgorman, P, hannes, kosaki.motohiro, luto, minchan.kim, riel, stable


The patch titled
     mm: vmscan: only read new_classzone_idx from pgdat when reclaiming successfully
has been added to the -mm tree.  Its filename is
     mm-vmscan-only-read-new_classzone_idx-from-pgdat-when-reclaiming-successfully.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/SubmitChecklist when testing your code ***

See http://userweb.kernel.org/~akpm/stuff/added-to-mm.txt to find
out what to do about this

The current -mm tree may be found at http://userweb.kernel.org/~akpm/mmotm/

------------------------------------------------------
Subject: mm: vmscan: only read new_classzone_idx from pgdat when reclaiming successfully
From: Mel Gorman <mgorman@suse.de>

During allocator-intensive workloads, kswapd will be woken frequently
causing free memory to oscillate between the high and min watermark.  This
is expected behaviour.  Unfortunately, if the highest zone is small, a
problem occurs.

When balance_pgdat() returns, it may be at a lower classzone_idx than it
started because the highest zone was unreclaimable.  Before checking if it
should go to sleep though, it checks pgdat->classzone_idx which when there
is no other activity will be MAX_NR_ZONES-1.  It interprets this as it has
been woken up while reclaiming, skips scheduling and reclaims again.  As
there is no useful reclaim work to do, it enters into a loop of shrinking
slab consuming loads of CPU until the highest zone becomes reclaimable for
a long period of time.

There are two problems here.  1) If the returned classzone or order is
lower, it'll continue reclaiming without scheduling.  2) if the highest
zone was marked unreclaimable but balance_pgdat() returns immediately at
DEF_PRIORITY, the new lower classzone is not communicated back to kswapd()
for sleeping.

This patch does two things that are related.  If the end_zone is
unreclaimable, this information is communicated back.  Second, if the
classzone or order was reduced due to failing to reclaim, new information
is not read from pgdat and instead an attempt is made to go to sleep.  Due
to this, it is also necessary that pgdat->classzone_idx be initialised
each time to pgdat->nr_zones - 1 to avoid re-reads being interpreted as
wakeups.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reported-by: Pádraig Brady <P@draigBrady.com>
Tested-by: Pádraig Brady <P@draigBrady.com>
Tested-by: Andrew Lutomirski <luto@mit.edu>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/vmscan.c |   34 +++++++++++++++++++++-------------
 1 file changed, 21 insertions(+), 13 deletions(-)

diff -puN mm/vmscan.c~mm-vmscan-only-read-new_classzone_idx-from-pgdat-when-reclaiming-successfully mm/vmscan.c
--- a/mm/vmscan.c~mm-vmscan-only-read-new_classzone_idx-from-pgdat-when-reclaiming-successfully
+++ a/mm/vmscan.c
@@ -2451,7 +2451,6 @@ loop_again:
 			if (!zone_watermark_ok_safe(zone, order,
 					high_wmark_pages(zone), 0, 0)) {
 				end_zone = i;
-				*classzone_idx = i;
 				break;
 			}
 		}
@@ -2531,8 +2530,11 @@ loop_again:
 			    total_scanned > sc.nr_reclaimed + sc.nr_reclaimed / 2)
 				sc.may_writepage = 1;
 
-			if (zone->all_unreclaimable)
+			if (zone->all_unreclaimable) {
+				if (end_zone && end_zone == i)
+					end_zone--;
 				continue;
+			}
 
 			if (!zone_watermark_ok_safe(zone, order,
 					high_wmark_pages(zone), end_zone, 0)) {
@@ -2712,8 +2714,8 @@ static void kswapd_try_to_sleep(pg_data_
  */
 static int kswapd(void *p)
 {
-	unsigned long order;
-	int classzone_idx;
+	unsigned long order, new_order;
+	int classzone_idx, new_classzone_idx;
 	pg_data_t *pgdat = (pg_data_t*)p;
 	struct task_struct *tsk = current;
 
@@ -2743,17 +2745,23 @@ static int kswapd(void *p)
 	tsk->flags |= PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD;
 	set_freezable();
 
-	order = 0;
-	classzone_idx = MAX_NR_ZONES - 1;
+	order = new_order = 0;
+	classzone_idx = new_classzone_idx = pgdat->nr_zones - 1;
 	for ( ; ; ) {
-		unsigned long new_order;
-		int new_classzone_idx;
 		int ret;
 
-		new_order = pgdat->kswapd_max_order;
-		new_classzone_idx = pgdat->classzone_idx;
-		pgdat->kswapd_max_order = 0;
-		pgdat->classzone_idx = MAX_NR_ZONES - 1;
+		/*
+		 * If the last balance_pgdat was unsuccessful it's unlikely a
+		 * new request of a similar or harder type will succeed soon
+		 * so consider going to sleep on the basis we reclaimed at
+		 */
+		if (classzone_idx >= new_classzone_idx && order == new_order) {
+			new_order = pgdat->kswapd_max_order;
+			new_classzone_idx = pgdat->classzone_idx;
+			pgdat->kswapd_max_order =  0;
+			pgdat->classzone_idx = pgdat->nr_zones - 1;
+		}
+
 		if (order < new_order || classzone_idx > new_classzone_idx) {
 			/*
 			 * Don't sleep if someone wants a larger 'order'
@@ -2766,7 +2774,7 @@ static int kswapd(void *p)
 			order = pgdat->kswapd_max_order;
 			classzone_idx = pgdat->classzone_idx;
 			pgdat->kswapd_max_order = 0;
-			pgdat->classzone_idx = MAX_NR_ZONES - 1;
+			pgdat->classzone_idx = pgdat->nr_zones - 1;
 		}
 
 		ret = try_to_freeze();
_

Patches currently in -mm which might be from mgorman@suse.de are

mm-vmscan-correct-check-for-kswapd-sleeping-in-sleeping_prematurely.patch
mm-vmscan-do-not-apply-pressure-to-slab-if-we-are-not-applying-pressure-to-zone.patch
mm-vmscan-evaluate-the-watermarks-against-the-correct-classzone.patch
mm-vmscan-only-read-new_classzone_idx-from-pgdat-when-reclaiming-successfully.patch

--
To unsubscribe from this list: send the line "unsubscribe mm-commits" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] only message in thread

only message in thread, other threads:[~2011-06-28 21:54 UTC | newest]

Thread overview: (only message) (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-06-28 21:53 + mm-vmscan-only-read-new_classzone_idx-from-pgdat-when-reclaiming-successfully.patch added to -mm tree akpm

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).