linux-kernel.vger.kernel.org archive mirror
* [PATCHv2 1/5] vmscan: separate sc.swap_cluster_max and sc.nr_max_reclaim
@ 2009-11-01 15:08 KOSAKI Motohiro
  2009-11-01 15:09 ` [PATCHv2 2/5] vmscan: Kill hibernation specific reclaim logic and unify it KOSAKI Motohiro
                   ` (4 more replies)
  0 siblings, 5 replies; 32+ messages in thread
From: KOSAKI Motohiro @ 2009-11-01 15:08 UTC (permalink / raw)
  To: Rafael J. Wysocki, Rik van Riel, LKML, linux-mm, Andrew Morton
  Cc: kosaki.motohiro

Currently, sc.swap_cluster_max has two meanings:

 1) the reclaim batch size passed to isolate_lru_pages()
 2) the reclaim bail-out threshold

The two meanings are pretty much unrelated, so let's separate them into
two fields. This patch doesn't change any behavior.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Reviewed-by: Rik van Riel <riel@redhat.com>
---
 mm/vmscan.c |   21 +++++++++++++++------
 1 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index f805958..6a3eb9f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -55,6 +55,9 @@ struct scan_control {
 	/* Number of pages freed so far during a call to shrink_zones() */
 	unsigned long nr_reclaimed;
 
+	/* How many pages shrink_list() should reclaim */
+	unsigned long nr_to_reclaim;
+
 	/* This context's GFP mask */
 	gfp_t gfp_mask;
 
@@ -1585,6 +1588,7 @@ static void shrink_zone(int priority, struct zone *zone,
 	enum lru_list l;
 	unsigned long nr_reclaimed = sc->nr_reclaimed;
 	unsigned long swap_cluster_max = sc->swap_cluster_max;
+	unsigned long nr_to_reclaim = sc->nr_to_reclaim;
 	struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc);
 	int noswap = 0;
 
@@ -1634,8 +1638,7 @@ static void shrink_zone(int priority, struct zone *zone,
 		 * with multiple processes reclaiming pages, the total
 		 * freeing target can get unreasonably large.
 		 */
-		if (nr_reclaimed > swap_cluster_max &&
-			priority < DEF_PRIORITY && !current_is_kswapd())
+		if (nr_reclaimed > nr_to_reclaim && priority < DEF_PRIORITY)
 			break;
 	}
 
@@ -1733,6 +1736,7 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
 	struct zoneref *z;
 	struct zone *zone;
 	enum zone_type high_zoneidx = gfp_zone(sc->gfp_mask);
+	unsigned long writeback_threshold;
 
 	delayacct_freepages_start();
 
@@ -1768,7 +1772,7 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
 			}
 		}
 		total_scanned += sc->nr_scanned;
-		if (sc->nr_reclaimed >= sc->swap_cluster_max) {
+		if (sc->nr_reclaimed >= sc->nr_to_reclaim) {
 			ret = sc->nr_reclaimed;
 			goto out;
 		}
@@ -1780,8 +1784,8 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
 		 * that's undesirable in laptop mode, where we *want* lumpy
 		 * writeout.  So in laptop mode, write out the whole world.
 		 */
-		if (total_scanned > sc->swap_cluster_max +
-					sc->swap_cluster_max / 2) {
+		writeback_threshold = sc->nr_to_reclaim + sc->nr_to_reclaim / 2;
+		if (total_scanned > writeback_threshold) {
 			wakeup_flusher_threads(laptop_mode ? 0 : total_scanned);
 			sc->may_writepage = 1;
 		}
@@ -1827,6 +1831,7 @@ unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
 		.gfp_mask = gfp_mask,
 		.may_writepage = !laptop_mode,
 		.swap_cluster_max = SWAP_CLUSTER_MAX,
+		.nr_to_reclaim = SWAP_CLUSTER_MAX,
 		.may_unmap = 1,
 		.may_swap = 1,
 		.swappiness = vm_swappiness,
@@ -1885,6 +1890,7 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
 		.may_unmap = 1,
 		.may_swap = !noswap,
 		.swap_cluster_max = SWAP_CLUSTER_MAX,
+		.nr_to_reclaim = SWAP_CLUSTER_MAX,
 		.swappiness = swappiness,
 		.order = 0,
 		.mem_cgroup = mem_cont,
@@ -1932,6 +1938,7 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order)
 		.may_unmap = 1,
 		.may_swap = 1,
 		.swap_cluster_max = SWAP_CLUSTER_MAX,
+		.nr_to_reclaim = ULONG_MAX,
 		.swappiness = vm_swappiness,
 		.order = order,
 		.mem_cgroup = NULL,
@@ -2549,7 +2556,9 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
 		.may_unmap = !!(zone_reclaim_mode & RECLAIM_SWAP),
 		.may_swap = 1,
 		.swap_cluster_max = max_t(unsigned long, nr_pages,
-					SWAP_CLUSTER_MAX),
+				       SWAP_CLUSTER_MAX),
+		.nr_to_reclaim = max_t(unsigned long, nr_pages,
+				       SWAP_CLUSTER_MAX),
 		.gfp_mask = gfp_mask,
 		.swappiness = vm_swappiness,
 		.order = order,
-- 
1.6.2.5





* [PATCHv2 2/5] vmscan: Kill hibernation specific reclaim logic and unify it
  2009-11-01 15:08 [PATCHv2 1/5] vmscan: separate sc.swap_cluster_max and sc.nr_max_reclaim KOSAKI Motohiro
@ 2009-11-01 15:09 ` KOSAKI Motohiro
  2009-11-01 15:12   ` Rik van Riel
                     ` (2 more replies)
  2009-11-01 15:11 ` [PATCHv2 3/5] vmscan: Stop zone_reclaim()'s wrong swap_cluster_max usage KOSAKI Motohiro
                   ` (3 subsequent siblings)
  4 siblings, 3 replies; 32+ messages in thread
From: KOSAKI Motohiro @ 2009-11-01 15:09 UTC (permalink / raw)
  To: LKML
  Cc: kosaki.motohiro, Rafael J. Wysocki, Rik van Riel, linux-mm,
	Andrew Morton

shrink_all_zones() was introduced by commit d6277db4ab (swsusp: rework
memory shrinker) to improve hibernation performance, and sc.swap_cluster_max
was introduced by commit a06fe4d307 (Speed freeing memory for suspend).

Commit a06fe4d307 said:

   Without the patch:
   Freed  14600 pages in  1749 jiffies = 32.61 MB/s (Anomolous!)
   Freed  88563 pages in 14719 jiffies = 23.50 MB/s
   Freed 205734 pages in 32389 jiffies = 24.81 MB/s

   With the patch:
   Freed  68252 pages in   496 jiffies = 537.52 MB/s
   Freed 116464 pages in   569 jiffies = 798.54 MB/s
   Freed 209699 pages in   705 jiffies = 1161.89 MB/s

At that time, the patch was well worth it. However, modern hardware trends
and recent VM improvements have eroded its benefit. For several reasons,
I think we should remove shrink_all_zones() entirely.

Details:

1) In the old days, shrink_zone()'s slowness was mainly caused by needless
  io-throttling when there was no I/O congestion,
  but the current shrink_zone() is sane and not slow.

2) shrink_all_zones() tries to shrink all pages at once, but that doesn't
  work well on NUMA systems.
  example)
    The system has 4GB of memory, each node has 2GB, and hibernation needs 1GB.

    optimal)
       steal 500MB from each node.
    shrink_all_zones)
       steal 1GB from node-0.

  Oh, the cache balancing logic was broken. ;)
  Unfortunately, desktop systems have been moving to NUMA nowadays.
  (Side note: if hibernation required 2GB, shrink_all_zones() could never
   succeed on the above machine.)

3) If a node has several pages with I/O in flight, shrink_all_zones() produces
  a pretty bad result (see the rough sketch after this list).

  scenario) hibernation needs 1GB

  1) shrink_all_zones() tries to reclaim 1GB from Node-0
  2) but it only reclaims 990MB
  3) stupidly, shrink_all_zones() then tries to reclaim 1GB from Node-1
  4) it reclaims 990MB

  Oh well, it reclaimed almost twice as much as required.
  On the other hand, the current shrink_zone() has sane bail-out logic,
  so it doesn't do overkill reclaim, and we no longer have shrink_all_zones()'s risk.

4) The SplitLRU VM always keeps the active/inactive ratio very carefully.
  Shrinking only the inactive list breaks that assumption, creates an
  unnecessary OOM risk, and is obviously suboptimal.
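
A rough sketch of the problem described in 3), in illustrative pseudo-C
only (reclaim_from_node() is a made-up helper, not a real kernel function):

	unsigned long reclaimed = 0;
	int nid;

	for_each_online_node(nid) {
		/*
		 * Each node is asked for the full 1GB target instead of
		 * the remaining amount, so two nodes that each free
		 * 990MB end up freeing ~2GB for a 1GB request.
		 */
		reclaimed += reclaim_from_node(nid, nr_to_reclaim);
		if (reclaimed >= nr_to_reclaim)
			break;
	}

The unified do_try_to_free_pages() path instead checks the running total
against sc->nr_to_reclaim and bails out as soon as the target is met.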

So this patch changes shrink_all_memory() into just a wrapper around
do_try_to_free_pages(). That brings good reviewability and debuggability,
and solves the above problems.

Side note: unifying the reclaim logic has two good side effects.
 - It fixes a recursive reclaim bug in shrink_all_memory(): it forgot to
   set PF_MEMALLOC, which means the system could get stuck in a deadlock.
 - shrink_all_memory() now has lockdep awareness, which brings good
   debuggability.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Rik van Riel <riel@redhat.com>
---
 mm/vmscan.c |  156 +++++++++++------------------------------------------------
 1 files changed, 28 insertions(+), 128 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 6a3eb9f..9862c04 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -58,6 +58,8 @@ struct scan_control {
 	/* How many pages shrink_list() should reclaim */
 	unsigned long nr_to_reclaim;
 
+	unsigned long hibernation_mode;
+
 	/* This context's GFP mask */
 	gfp_t gfp_mask;
 
@@ -1748,7 +1750,8 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
 	if (scanning_global_lru(sc)) {
 		for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) {
 
-			if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))
+			if (!sc->hibernation_mode &&
+			    !cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))
 				continue;
 
 			lru_pages += zone_reclaimable_pages(zone);
@@ -1791,7 +1794,8 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
 		}
 
 		/* Take a nap, wait for some writeback to complete */
-		if (sc->nr_scanned && priority < DEF_PRIORITY - 2)
+		if (!sc->hibernation_mode && sc->nr_scanned &&
+		    priority < DEF_PRIORITY - 2)
 			congestion_wait(BLK_RW_ASYNC, HZ/10);
 	}
 	/* top priority shrink_zones still had more to do? don't OOM, then */
@@ -2262,148 +2266,44 @@ unsigned long zone_reclaimable_pages(struct zone *zone)
 
 #ifdef CONFIG_HIBERNATION
 /*
- * Helper function for shrink_all_memory().  Tries to reclaim 'nr_pages' pages
- * from LRU lists system-wide, for given pass and priority.
- *
- * For pass > 3 we also try to shrink the LRU lists that contain a few pages
- */
-static void shrink_all_zones(unsigned long nr_pages, int prio,
-				      int pass, struct scan_control *sc)
-{
-	struct zone *zone;
-	unsigned long nr_reclaimed = 0;
-	struct zone_reclaim_stat *reclaim_stat;
-
-	for_each_populated_zone(zone) {
-		enum lru_list l;
-
-		if (zone_is_all_unreclaimable(zone) && prio != DEF_PRIORITY)
-			continue;
-
-		for_each_evictable_lru(l) {
-			enum zone_stat_item ls = NR_LRU_BASE + l;
-			unsigned long lru_pages = zone_page_state(zone, ls);
-
-			/* For pass = 0, we don't shrink the active list */
-			if (pass == 0 && (l == LRU_ACTIVE_ANON ||
-						l == LRU_ACTIVE_FILE))
-				continue;
-
-			reclaim_stat = get_reclaim_stat(zone, sc);
-			reclaim_stat->nr_saved_scan[l] +=
-						(lru_pages >> prio) + 1;
-			if (reclaim_stat->nr_saved_scan[l]
-						>= nr_pages || pass > 3) {
-				unsigned long nr_to_scan;
-
-				reclaim_stat->nr_saved_scan[l] = 0;
-				nr_to_scan = min(nr_pages, lru_pages);
-				nr_reclaimed += shrink_list(l, nr_to_scan, zone,
-								sc, prio);
-				if (nr_reclaimed >= nr_pages) {
-					sc->nr_reclaimed += nr_reclaimed;
-					return;
-				}
-			}
-		}
-	}
-	sc->nr_reclaimed += nr_reclaimed;
-}
-
-/*
- * Try to free `nr_pages' of memory, system-wide, and return the number of
+ * Try to free `nr_to_reclaim' of memory, system-wide, and return the number of
  * freed pages.
  *
  * Rather than trying to age LRUs the aim is to preserve the overall
  * LRU order by reclaiming preferentially
  * inactive > active > active referenced > active mapped
  */
-unsigned long shrink_all_memory(unsigned long nr_pages)
+unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
 {
-	unsigned long lru_pages, nr_slab;
-	int pass;
 	struct reclaim_state reclaim_state;
 	struct scan_control sc = {
-		.gfp_mask = GFP_KERNEL,
-		.may_unmap = 0,
+		.gfp_mask = GFP_HIGHUSER_MOVABLE,
+		.may_swap = 1,
+		.may_unmap = 1,
 		.may_writepage = 1,
+		.swap_cluster_max = SWAP_CLUSTER_MAX,
+		.nr_to_reclaim = nr_to_reclaim,
+		.hibernation_mode = 1,
+		.swappiness = vm_swappiness,
+		.order = 0,
 		.isolate_pages = isolate_pages_global,
-		.nr_reclaimed = 0,
 	};
+	struct zonelist * zonelist = node_zonelist(numa_node_id(), sc.gfp_mask);
+	struct task_struct *p = current;
+	unsigned long nr_reclaimed;
 
-	current->reclaim_state = &reclaim_state;
-
-	lru_pages = global_reclaimable_pages();
-	nr_slab = global_page_state(NR_SLAB_RECLAIMABLE);
-	/* If slab caches are huge, it's better to hit them first */
-	while (nr_slab >= lru_pages) {
-		reclaim_state.reclaimed_slab = 0;
-		shrink_slab(nr_pages, sc.gfp_mask, lru_pages);
-		if (!reclaim_state.reclaimed_slab)
-			break;
-
-		sc.nr_reclaimed += reclaim_state.reclaimed_slab;
-		if (sc.nr_reclaimed >= nr_pages)
-			goto out;
-
-		nr_slab -= reclaim_state.reclaimed_slab;
-	}
-
-	/*
-	 * We try to shrink LRUs in 5 passes:
-	 * 0 = Reclaim from inactive_list only
-	 * 1 = Reclaim from active list but don't reclaim mapped
-	 * 2 = 2nd pass of type 1
-	 * 3 = Reclaim mapped (normal reclaim)
-	 * 4 = 2nd pass of type 3
-	 */
-	for (pass = 0; pass < 5; pass++) {
-		int prio;
-
-		/* Force reclaiming mapped pages in the passes #3 and #4 */
-		if (pass > 2)
-			sc.may_unmap = 1;
-
-		for (prio = DEF_PRIORITY; prio >= 0; prio--) {
-			unsigned long nr_to_scan = nr_pages - sc.nr_reclaimed;
-
-			sc.nr_scanned = 0;
-			sc.swap_cluster_max = nr_to_scan;
-			shrink_all_zones(nr_to_scan, prio, pass, &sc);
-			if (sc.nr_reclaimed >= nr_pages)
-				goto out;
-
-			reclaim_state.reclaimed_slab = 0;
-			shrink_slab(sc.nr_scanned, sc.gfp_mask,
-				    global_reclaimable_pages());
-			sc.nr_reclaimed += reclaim_state.reclaimed_slab;
-			if (sc.nr_reclaimed >= nr_pages)
-				goto out;
-
-			if (sc.nr_scanned && prio < DEF_PRIORITY - 2)
-				congestion_wait(BLK_RW_ASYNC, HZ / 10);
-		}
-	}
-
-	/*
-	 * If sc.nr_reclaimed = 0, we could not shrink LRUs, but there may be
-	 * something in slab caches
-	 */
-	if (!sc.nr_reclaimed) {
-		do {
-			reclaim_state.reclaimed_slab = 0;
-			shrink_slab(nr_pages, sc.gfp_mask,
-				    global_reclaimable_pages());
-			sc.nr_reclaimed += reclaim_state.reclaimed_slab;
-		} while (sc.nr_reclaimed < nr_pages &&
-				reclaim_state.reclaimed_slab > 0);
-	}
+	p->flags |= PF_MEMALLOC;
+	lockdep_set_current_reclaim_state(sc.gfp_mask);
+	reclaim_state.reclaimed_slab = 0;
+	p->reclaim_state = &reclaim_state;
 
+	nr_reclaimed = do_try_to_free_pages(zonelist, &sc);
 
-out:
-	current->reclaim_state = NULL;
+	p->reclaim_state = NULL;
+	lockdep_clear_current_reclaim_state();
+	p->flags &= ~PF_MEMALLOC;
 
-	return sc.nr_reclaimed;
+	return nr_reclaimed;
 }
 #endif /* CONFIG_HIBERNATION */
 
-- 
1.6.2.5





* [PATCHv2 3/5] vmscan: Stop zone_reclaim()'s wrong swap_cluster_max usage
  2009-11-01 15:08 [PATCHv2 1/5] vmscan: separate sc.swap_cluster_max and sc.nr_max_reclaim KOSAKI Motohiro
  2009-11-01 15:09 ` [PATCHv2 2/5] vmscan: Kill hibernation specific reclaim logic and unify it KOSAKI Motohiro
@ 2009-11-01 15:11 ` KOSAKI Motohiro
  2009-11-01 17:51   ` Rik van Riel
  2009-11-02  0:40   ` Minchan Kim
  2009-11-01 15:12 ` [PATCHv2 4/5] vmscan: Kill sc.swap_cluster_max KOSAKI Motohiro
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 32+ messages in thread
From: KOSAKI Motohiro @ 2009-11-01 15:11 UTC (permalink / raw)
  To: LKML
  Cc: kosaki.motohiro, Rafael J. Wysocki, Rik van Riel, linux-mm,
	Andrew Morton, Christoph Lameter, Mel Gorman

In the old days we didn't have sc.nr_to_reclaim, which led to
sc.swap_cluster_max being misused.

A huge sc.swap_cluster_max can cause unnecessary OOM with
no performance benefit.

Now we can remove this dangerous usage.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/vmscan.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 9862c04..e463d48 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2455,8 +2455,7 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
 		.may_writepage = !!(zone_reclaim_mode & RECLAIM_WRITE),
 		.may_unmap = !!(zone_reclaim_mode & RECLAIM_SWAP),
 		.may_swap = 1,
-		.swap_cluster_max = max_t(unsigned long, nr_pages,
-				       SWAP_CLUSTER_MAX),
+		.swap_cluster_max = SWAP_CLUSTER_MAX,
 		.nr_to_reclaim = max_t(unsigned long, nr_pages,
 				       SWAP_CLUSTER_MAX),
 		.gfp_mask = gfp_mask,
-- 
1.6.2.5





* [PATCHv2 4/5] vmscan: Kill sc.swap_cluster_max
  2009-11-01 15:08 [PATCHv2 1/5] vmscan: separate sc.swap_cluster_max and sc.nr_max_reclaim KOSAKI Motohiro
  2009-11-01 15:09 ` [PATCHv2 2/5] vmscan: Kill hibernation specific reclaim logic and unify it KOSAKI Motohiro
  2009-11-01 15:11 ` [PATCHv2 3/5] vmscan: Stop zone_reclaim()'s wrong swap_cluster_max usage KOSAKI Motohiro
@ 2009-11-01 15:12 ` KOSAKI Motohiro
  2009-11-01 17:56   ` Rik van Riel
  2009-11-02  0:46   ` Minchan Kim
  2009-11-01 15:13 ` [PATCHv2 5/5][nit fix] vmscan Make consistent of reclaim bale out between do_try_to_free_page and shrink_zone KOSAKI Motohiro
  2009-11-02  0:35 ` [PATCHv2 1/5] vmscan: separate sc.swap_cluster_max and sc.nr_max_reclaim Minchan Kim
  4 siblings, 2 replies; 32+ messages in thread
From: KOSAKI Motohiro @ 2009-11-01 15:12 UTC (permalink / raw)
  To: LKML
  Cc: kosaki.motohiro, Rafael J. Wysocki, Rik van Riel, linux-mm,
	Andrew Morton

Now all callers set swap_cluster_max = SWAP_CLUSTER_MAX,
so we can remove it entirely.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/vmscan.c |   26 ++++++--------------------
 1 files changed, 6 insertions(+), 20 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index e463d48..7bdf4f0 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -71,12 +71,6 @@ struct scan_control {
 	/* Can pages be swapped as part of reclaim? */
 	int may_swap;
 
-	/* This context's SWAP_CLUSTER_MAX. If freeing memory for
-	 * suspend, we effectively ignore SWAP_CLUSTER_MAX.
-	 * In this context, it doesn't matter that we scan the
-	 * whole list at once. */
-	int swap_cluster_max;
-
 	int swappiness;
 
 	int all_unreclaimable;
@@ -1127,7 +1121,7 @@ static unsigned long shrink_inactive_list(unsigned long max_scan,
 		unsigned long nr_anon;
 		unsigned long nr_file;
 
-		nr_taken = sc->isolate_pages(sc->swap_cluster_max,
+		nr_taken = sc->isolate_pages(SWAP_CLUSTER_MAX,
 			     &page_list, &nr_scan, sc->order, mode,
 				zone, sc->mem_cgroup, 0, file);
 
@@ -1562,15 +1556,14 @@ static void get_scan_ratio(struct zone *zone, struct scan_control *sc,
  * until we collected @swap_cluster_max pages to scan.
  */
 static unsigned long nr_scan_try_batch(unsigned long nr_to_scan,
-				       unsigned long *nr_saved_scan,
-				       unsigned long swap_cluster_max)
+				       unsigned long *nr_saved_scan)
 {
 	unsigned long nr;
 
 	*nr_saved_scan += nr_to_scan;
 	nr = *nr_saved_scan;
 
-	if (nr >= swap_cluster_max)
+	if (nr >= SWAP_CLUSTER_MAX)
 		*nr_saved_scan = 0;
 	else
 		nr = 0;
@@ -1589,7 +1582,6 @@ static void shrink_zone(int priority, struct zone *zone,
 	unsigned long percent[2];	/* anon @ 0; file @ 1 */
 	enum lru_list l;
 	unsigned long nr_reclaimed = sc->nr_reclaimed;
-	unsigned long swap_cluster_max = sc->swap_cluster_max;
 	unsigned long nr_to_reclaim = sc->nr_to_reclaim;
 	struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc);
 	int noswap = 0;
@@ -1612,8 +1604,7 @@ static void shrink_zone(int priority, struct zone *zone,
 			scan = (scan * percent[file]) / 100;
 		}
 		nr[l] = nr_scan_try_batch(scan,
-					  &reclaim_stat->nr_saved_scan[l],
-					  swap_cluster_max);
+					  &reclaim_stat->nr_saved_scan[l]);
 	}
 
 	trace_printk("sz p=%d %d:%s %lu:%lu [%lu %lu %lu %lu] \n",
@@ -1625,7 +1616,8 @@ static void shrink_zone(int priority, struct zone *zone,
 					nr[LRU_INACTIVE_FILE]) {
 		for_each_evictable_lru(l) {
 			if (nr[l]) {
-				nr_to_scan = min(nr[l], swap_cluster_max);
+				nr_to_scan = min_t(unsigned long,
+						   nr[l], SWAP_CLUSTER_MAX);
 				nr[l] -= nr_to_scan;
 
 				nr_reclaimed += shrink_list(l, nr_to_scan,
@@ -1834,7 +1826,6 @@ unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
 	struct scan_control sc = {
 		.gfp_mask = gfp_mask,
 		.may_writepage = !laptop_mode,
-		.swap_cluster_max = SWAP_CLUSTER_MAX,
 		.nr_to_reclaim = SWAP_CLUSTER_MAX,
 		.may_unmap = 1,
 		.may_swap = 1,
@@ -1859,7 +1850,6 @@ unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
 		.may_writepage = !laptop_mode,
 		.may_unmap = 1,
 		.may_swap = !noswap,
-		.swap_cluster_max = SWAP_CLUSTER_MAX,
 		.swappiness = swappiness,
 		.order = 0,
 		.mem_cgroup = mem,
@@ -1893,7 +1883,6 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
 		.may_writepage = !laptop_mode,
 		.may_unmap = 1,
 		.may_swap = !noswap,
-		.swap_cluster_max = SWAP_CLUSTER_MAX,
 		.nr_to_reclaim = SWAP_CLUSTER_MAX,
 		.swappiness = swappiness,
 		.order = 0,
@@ -1941,7 +1930,6 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order)
 		.gfp_mask = GFP_KERNEL,
 		.may_unmap = 1,
 		.may_swap = 1,
-		.swap_cluster_max = SWAP_CLUSTER_MAX,
 		.nr_to_reclaim = ULONG_MAX,
 		.swappiness = vm_swappiness,
 		.order = order,
@@ -2281,7 +2269,6 @@ unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
 		.may_swap = 1,
 		.may_unmap = 1,
 		.may_writepage = 1,
-		.swap_cluster_max = SWAP_CLUSTER_MAX,
 		.nr_to_reclaim = nr_to_reclaim,
 		.hibernation_mode = 1,
 		.swappiness = vm_swappiness,
@@ -2455,7 +2442,6 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
 		.may_writepage = !!(zone_reclaim_mode & RECLAIM_WRITE),
 		.may_unmap = !!(zone_reclaim_mode & RECLAIM_SWAP),
 		.may_swap = 1,
-		.swap_cluster_max = SWAP_CLUSTER_MAX,
 		.nr_to_reclaim = max_t(unsigned long, nr_pages,
 				       SWAP_CLUSTER_MAX),
 		.gfp_mask = gfp_mask,
-- 
1.6.2.5





* Re: [PATCHv2 2/5] vmscan: Kill hibernation specific reclaim logic and unify it
  2009-11-01 15:09 ` [PATCHv2 2/5] vmscan: Kill hibernation specific reclaim logic and unify it KOSAKI Motohiro
@ 2009-11-01 15:12   ` Rik van Riel
  2009-11-01 21:38   ` Rafael J. Wysocki
  2009-11-01 22:01   ` Nigel Cunningham
  2 siblings, 0 replies; 32+ messages in thread
From: Rik van Riel @ 2009-11-01 15:12 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: LKML, Rafael J. Wysocki, linux-mm, Andrew Morton

On 11/01/2009 10:09 AM, KOSAKI Motohiro wrote:

> Signed-off-by: KOSAKI Motohiro<kosaki.motohiro@jp.fujitsu.com>
> Cc: Rafael J. Wysocki<rjw@sisk.pl>
> Cc: Rik van Riel<riel@redhat.com>

Reviewed-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed.


* [PATCHv2 5/5][nit fix] vmscan Make consistent of reclaim bale out between do_try_to_free_page and shrink_zone
  2009-11-01 15:08 [PATCHv2 1/5] vmscan: separate sc.swap_cluster_max and sc.nr_max_reclaim KOSAKI Motohiro
                   ` (2 preceding siblings ...)
  2009-11-01 15:12 ` [PATCHv2 4/5] vmscan: Kill sc.swap_cluster_max KOSAKI Motohiro
@ 2009-11-01 15:13 ` KOSAKI Motohiro
  2009-11-01 17:58   ` Rik van Riel
  2009-11-02  0:48   ` Minchan Kim
  2009-11-02  0:35 ` [PATCHv2 1/5] vmscan: separate sc.swap_cluster_max and sc.nr_max_reclaim Minchan Kim
  4 siblings, 2 replies; 32+ messages in thread
From: KOSAKI Motohiro @ 2009-11-01 15:13 UTC (permalink / raw)
  To: LKML; +Cc: kosaki.motohiro, Rik van Riel, linux-mm, Andrew Morton

Fix a small inconsistency between ">" and ">=".

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/vmscan.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 7bdf4f0..e6ea011 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1632,7 +1632,7 @@ static void shrink_zone(int priority, struct zone *zone,
 		 * with multiple processes reclaiming pages, the total
 		 * freeing target can get unreasonably large.
 		 */
-		if (nr_reclaimed > nr_to_reclaim && priority < DEF_PRIORITY)
+		if (nr_reclaimed >= nr_to_reclaim && priority < DEF_PRIORITY)
 			break;
 	}
 
-- 
1.6.2.5





* Re: [PATCHv2 3/5] vmscan: Stop zone_reclaim()'s wrong swap_cluster_max usage
  2009-11-01 15:11 ` [PATCHv2 3/5] vmscan: Stop zone_reclaim()'s wrong swap_cluster_max usage KOSAKI Motohiro
@ 2009-11-01 17:51   ` Rik van Riel
  2009-11-02  0:40   ` Minchan Kim
  1 sibling, 0 replies; 32+ messages in thread
From: Rik van Riel @ 2009-11-01 17:51 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: LKML, Rafael J. Wysocki, linux-mm, Andrew Morton,
	Christoph Lameter, Mel Gorman

On 11/01/2009 10:11 AM, KOSAKI Motohiro wrote:
> In old days, we didn't have sc.nr_to_reclaim and it brought
> sc.swap_cluster_max misuse.
>
> huge sc.swap_cluster_max might makes unnecessary OOM and
> no performance benefit.
>
> Now, we can remove above dangerous one.
>
> Signed-off-by: KOSAKI Motohiro<kosaki.motohiro@jp.fujitsu.com>

Reviewed-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed.


* Re: [PATCHv2 4/5] vmscan: Kill sc.swap_cluster_max
  2009-11-01 15:12 ` [PATCHv2 4/5] vmscan: Kill sc.swap_cluster_max KOSAKI Motohiro
@ 2009-11-01 17:56   ` Rik van Riel
  2009-11-02  0:46   ` Minchan Kim
  1 sibling, 0 replies; 32+ messages in thread
From: Rik van Riel @ 2009-11-01 17:56 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: LKML, Rafael J. Wysocki, linux-mm, Andrew Morton

On 11/01/2009 10:12 AM, KOSAKI Motohiro wrote:
> Now, all caller is settng to swap_cluster_max = SWAP_CLUSTER_MAX.
> Then, we can remove it perfectly.
>
> Signed-off-by: KOSAKI Motohiro<kosaki.motohiro@jp.fujitsu.com>

Reviewed-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed.


* Re: [PATCHv2 5/5][nit fix] vmscan Make consistent of reclaim bale out between do_try_to_free_page and shrink_zone
  2009-11-01 15:13 ` [PATCHv2 5/5][nit fix] vmscan Make consistent of reclaim bale out between do_try_to_free_page and shrink_zone KOSAKI Motohiro
@ 2009-11-01 17:58   ` Rik van Riel
  2009-11-02  0:48   ` Minchan Kim
  1 sibling, 0 replies; 32+ messages in thread
From: Rik van Riel @ 2009-11-01 17:58 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: LKML, linux-mm, Andrew Morton

On 11/01/2009 10:13 AM, KOSAKI Motohiro wrote:
> Fix small inconsistent of ">" and">=".

Nice catch.

> Signed-off-by: KOSAKI Motohiro<kosaki.motohiro@jp.fujitsu.com>

Reviewed-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed.


* Re: [PATCHv2 2/5] vmscan: Kill hibernation specific reclaim logic and unify it
  2009-11-01 15:09 ` [PATCHv2 2/5] vmscan: Kill hibernation specific reclaim logic and unify it KOSAKI Motohiro
  2009-11-01 15:12   ` Rik van Riel
@ 2009-11-01 21:38   ` Rafael J. Wysocki
  2009-11-02 15:35     ` KOSAKI Motohiro
  2009-11-01 22:01   ` Nigel Cunningham
  2 siblings, 1 reply; 32+ messages in thread
From: Rafael J. Wysocki @ 2009-11-01 21:38 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: LKML, Rik van Riel, linux-mm, Andrew Morton

On Sunday 01 November 2009, KOSAKI Motohiro wrote:
> shrink_all_zone() was introduced by commit d6277db4ab (swsusp: rework
> memory shrinker) for hibernate performance improvement. and sc.swap_cluster_max
> was introduced by commit a06fe4d307 (Speed freeing memory for suspend).
> 
> commit a06fe4d307 said
> 
>    Without the patch:
>    Freed  14600 pages in  1749 jiffies = 32.61 MB/s (Anomolous!)
>    Freed  88563 pages in 14719 jiffies = 23.50 MB/s
>    Freed 205734 pages in 32389 jiffies = 24.81 MB/s
> 
>    With the patch:
>    Freed  68252 pages in   496 jiffies = 537.52 MB/s
>    Freed 116464 pages in   569 jiffies = 798.54 MB/s
>    Freed 209699 pages in   705 jiffies = 1161.89 MB/s
> 
> At that time, their patch was pretty worth. However, Modern Hardware
> trend and recent VM improvement broke its worth. From several reason,
> I think we should remove shrink_all_zones() at all.
> 
> detail:
> 
> 1) Old days, shrink_zone()'s slowness was mainly caused by stupid io-throttle
>   at no i/o congestion.
>   but current shrink_zone() is sane, not slow.
> 
> 2) shrink_all_zone() try to shrink all pages at a time. but it doesn't works
>   fine on numa system.
>   example)
>     System has 4GB memory and each node have 2GB. and hibernate need 1GB.
> 
>     optimal)
>        steal 500MB from each node.
>     shrink_all_zones)
>        steal 1GB from node-0.
> 
>   Oh, Cache balancing logic was broken. ;)
>   Unfortunately, Desktop system moved ahead NUMA at nowadays.
>   (Side note, if hibernate require 2GB, shrink_all_zones() never success
>    on above machine)
> 
> 3) if the node has several I/O flighting pages, shrink_all_zones() makes
>   pretty bad result.
> 
>   schenario) hibernate need 1GB
> 
>   1) shrink_all_zones() try to reclaim 1GB from Node-0
>   2) but it only reclaimed 990MB
>   3) stupidly, shrink_all_zones() try to reclaim 1GB from Node-1
>   4) it reclaimed 990MB
> 
>   Oh, well. it reclaimed twice much than required.
>   In the other hand, current shrink_zone() has sane baling out logic.
>   then, it doesn't make overkill reclaim. then, we lost shrink_zones()'s risk.
> 
> 4) SplitLRU VM always keep active/inactive ratio very carefully. inactive list only
>   shrinking break its assumption. it makes unnecessary OOM risk. it obviously suboptimal.
> 
> Then, This patch changed shrink_all_memory() to only the wrapper function of 
> do_try_to_free_pages(). it bring good reviewability and debuggability, and solve 
> above problems.
> 
> side note: Reclaim logic unificication makes two good side effect.
>  - Fix recursive reclaim bug on shrink_all_memory().
>    it did forgot to use PF_MEMALLOC. it mean the system be able to stuck into deadlock.
>  - Now, shrink_all_memory() got lockdep awareness. it bring good debuggability.

As I said previously, I don't really see a reason to keep shrink_all_memory().

Do you think that removing it will result in performance degradation?

Rafael


* Re: [PATCHv2 2/5] vmscan: Kill hibernation specific reclaim logic and unify it
  2009-11-01 15:09 ` [PATCHv2 2/5] vmscan: Kill hibernation specific reclaim logic and unify it KOSAKI Motohiro
  2009-11-01 15:12   ` Rik van Riel
  2009-11-01 21:38   ` Rafael J. Wysocki
@ 2009-11-01 22:01   ` Nigel Cunningham
  2009-11-02 15:35     ` KOSAKI Motohiro
  2 siblings, 1 reply; 32+ messages in thread
From: Nigel Cunningham @ 2009-11-01 22:01 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: LKML, Rafael J. Wysocki, Rik van Riel, linux-mm, Andrew Morton

Hi.

KOSAKI Motohiro wrote:
> shrink_all_zone() was introduced by commit d6277db4ab (swsusp: rework
> memory shrinker) for hibernate performance improvement. and sc.swap_cluster_max
> was introduced by commit a06fe4d307 (Speed freeing memory for suspend).
> 
> commit a06fe4d307 said
> 
>    Without the patch:
>    Freed  14600 pages in  1749 jiffies = 32.61 MB/s (Anomolous!)
>    Freed  88563 pages in 14719 jiffies = 23.50 MB/s
>    Freed 205734 pages in 32389 jiffies = 24.81 MB/s
> 
>    With the patch:
>    Freed  68252 pages in   496 jiffies = 537.52 MB/s
>    Freed 116464 pages in   569 jiffies = 798.54 MB/s
>    Freed 209699 pages in   705 jiffies = 1161.89 MB/s
> 
> At that time, their patch was pretty worth. However, Modern Hardware
> trend and recent VM improvement broke its worth. From several reason,
> I think we should remove shrink_all_zones() at all.
> 
> detail:
> 
> 1) Old days, shrink_zone()'s slowness was mainly caused by stupid io-throttle
>   at no i/o congestion.
>   but current shrink_zone() is sane, not slow.
> 
> 2) shrink_all_zone() try to shrink all pages at a time. but it doesn't works
>   fine on numa system.
>   example)
>     System has 4GB memory and each node have 2GB. and hibernate need 1GB.
> 
>     optimal)
>        steal 500MB from each node.
>     shrink_all_zones)
>        steal 1GB from node-0.

I haven't given much thought to numa awareness in hibernate code, but I
can say that the shrink_all_memory interface is woefully inadequate as
far as zone awareness goes. Since lowmem needs to be atomically restored
before we can restore highmem, we really need to be able to ask for a
particular number of pages of a particular zone type to be freed.

>   Oh, Cache balancing logic was broken. ;)
>   Unfortunately, Desktop system moved ahead NUMA at nowadays.
>   (Side note, if hibernate require 2GB, shrink_all_zones() never success
>    on above machine)
> 
> 3) if the node has several I/O flighting pages, shrink_all_zones() makes
>   pretty bad result.
> 
>   schenario) hibernate need 1GB
> 
>   1) shrink_all_zones() try to reclaim 1GB from Node-0
>   2) but it only reclaimed 990MB
>   3) stupidly, shrink_all_zones() try to reclaim 1GB from Node-1
>   4) it reclaimed 990MB
> 
>   Oh, well. it reclaimed twice much than required.
>   In the other hand, current shrink_zone() has sane baling out logic.
>   then, it doesn't make overkill reclaim. then, we lost shrink_zones()'s risk.

Yes, this is bad.

> 4) SplitLRU VM always keep active/inactive ratio very carefully. inactive list only
>   shrinking break its assumption. it makes unnecessary OOM risk. it obviously suboptimal.

I don't follow your logic here. Without being a mm expert, I'd imagine
that it shouldn't matter that much if most of the inactive list gets freed.

Regards,

Nigel


* Re: [PATCHv2 1/5] vmscan: separate sc.swap_cluster_max and sc.nr_max_reclaim
  2009-11-01 15:08 [PATCHv2 1/5] vmscan: separate sc.swap_cluster_max and sc.nr_max_reclaim KOSAKI Motohiro
                   ` (3 preceding siblings ...)
  2009-11-01 15:13 ` [PATCHv2 5/5][nit fix] vmscan Make consistent of reclaim bale out between do_try_to_free_page and shrink_zone KOSAKI Motohiro
@ 2009-11-02  0:35 ` Minchan Kim
  2009-11-02  0:48   ` Minchan Kim
  2009-11-02 15:35   ` KOSAKI Motohiro
  4 siblings, 2 replies; 32+ messages in thread
From: Minchan Kim @ 2009-11-02  0:35 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Rafael J. Wysocki, Rik van Riel, LKML, linux-mm, Andrew Morton

Hi, Kosaki.

On Mon, 2 Nov 2009 00:08:44 +0900 (JST)
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:

> Currently, sc.scap_cluster_max has double meanings.
> 
>  1) reclaim batch size as isolate_lru_pages()'s argument
>  2) reclaim baling out thresolds
> 
> The two meanings pretty unrelated. Thus, Let's separate it.
> this patch doesn't change any behavior.
> 
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Cc: Rafael J. Wysocki <rjw@sisk.pl>
> Reviewed-by: Rik van Riel <riel@redhat.com>
> ---
>  mm/vmscan.c |   21 +++++++++++++++------
>  1 files changed, 15 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index f805958..6a3eb9f 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -55,6 +55,9 @@ struct scan_control {
>  	/* Number of pages freed so far during a call to shrink_zones() */
>  	unsigned long nr_reclaimed;
>  
> +	/* How many pages shrink_list() should reclaim */
> +	unsigned long nr_to_reclaim;

If you're trying to split the meanings of swap_cluster_max,
how about renaming 'swap_cluster_max' too?

It still carries the 'batch size' meaning. ;)
I hope we can change that while we're at it.

> +
>  	/* This context's GFP mask */
>  	gfp_t gfp_mask;
>  
> @@ -1585,6 +1588,7 @@ static void shrink_zone(int priority, struct zone *zone,
>  	enum lru_list l;
>  	unsigned long nr_reclaimed = sc->nr_reclaimed;
>  	unsigned long swap_cluster_max = sc->swap_cluster_max;
> +	unsigned long nr_to_reclaim = sc->nr_to_reclaim;
>  	struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc);
>  	int noswap = 0;
>  
> @@ -1634,8 +1638,7 @@ static void shrink_zone(int priority, struct zone *zone,
>  		 * with multiple processes reclaiming pages, the total
>  		 * freeing target can get unreasonably large.
>  		 */

How about adding the following comment, or a better one:
'kswapd doesn't bail out because it has a 'MAX' nr_to_reclaim'.

> -		if (nr_reclaimed > swap_cluster_max &&
> -			priority < DEF_PRIORITY && !current_is_kswapd())
> +		if (nr_reclaimed > nr_to_reclaim && priority < DEF_PRIORITY)
>  			break;
>  	}
>  
> @@ -1733,6 +1736,7 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
>  	struct zoneref *z;
>  	struct zone *zone;
>  	enum zone_type high_zoneidx = gfp_zone(sc->gfp_mask);
> +	unsigned long writeback_threshold;
>  
>  	delayacct_freepages_start();
>  
> @@ -1768,7 +1772,7 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
>  			}
>  		}
>  		total_scanned += sc->nr_scanned;
> -		if (sc->nr_reclaimed >= sc->swap_cluster_max) {
> +		if (sc->nr_reclaimed >= sc->nr_to_reclaim) {
>  			ret = sc->nr_reclaimed;
>  			goto out;
>  		}
> @@ -1780,8 +1784,8 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
>  		 * that's undesirable in laptop mode, where we *want* lumpy
>  		 * writeout.  So in laptop mode, write out the whole world.
>  		 */
> -		if (total_scanned > sc->swap_cluster_max +
> -					sc->swap_cluster_max / 2) {
> +		writeback_threshold = sc->nr_to_reclaim + sc->nr_to_reclaim / 2;
> +		if (total_scanned > writeback_threshold) {
>  			wakeup_flusher_threads(laptop_mode ? 0 : total_scanned);
>  			sc->may_writepage = 1;
>  		}
> @@ -1827,6 +1831,7 @@ unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
>  		.gfp_mask = gfp_mask,
>  		.may_writepage = !laptop_mode,
>  		.swap_cluster_max = SWAP_CLUSTER_MAX,
> +		.nr_to_reclaim = SWAP_CLUSTER_MAX,
>  		.may_unmap = 1,
>  		.may_swap = 1,
>  		.swappiness = vm_swappiness,
> @@ -1885,6 +1890,7 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
>  		.may_unmap = 1,
>  		.may_swap = !noswap,
>  		.swap_cluster_max = SWAP_CLUSTER_MAX,
> +		.nr_to_reclaim = SWAP_CLUSTER_MAX,
>  		.swappiness = swappiness,
>  		.order = 0,
>  		.mem_cgroup = mem_cont,
> @@ -1932,6 +1938,7 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order)
>  		.may_unmap = 1,
>  		.may_swap = 1,
>  
		.swap_cluster_max = SWAP_CLUSTER_MAX,
Or add comment in here. 

'kswapd doesn't want to be bailed out while reclaim.'

> +		.nr_to_reclaim = ULONG_MAX,
>  		.swappiness = vm_swappiness,
>  		.order = order,
>  		.mem_cgroup = NULL,
> @@ -2549,7 +2556,9 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
>  		.may_unmap = !!(zone_reclaim_mode & RECLAIM_SWAP),
>  		.may_swap = 1,
>  		.swap_cluster_max = max_t(unsigned long, nr_pages,
> -					SWAP_CLUSTER_MAX),
> +				       SWAP_CLUSTER_MAX),
> +		.nr_to_reclaim = max_t(unsigned long, nr_pages,
> +				       SWAP_CLUSTER_MAX),
>  		.gfp_mask = gfp_mask,
>  		.swappiness = vm_swappiness,
>  		.order = order,
> -- 
> 1.6.2.5
> 
> 
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


-- 
Kind regards,
Minchan Kim


* Re: [PATCHv2 3/5] vmscan: Stop zone_reclaim()'s wrong swap_cluster_max usage
  2009-11-01 15:11 ` [PATCHv2 3/5] vmscan: Stop zone_reclaim()'s wrong swap_cluster_max usage KOSAKI Motohiro
  2009-11-01 17:51   ` Rik van Riel
@ 2009-11-02  0:40   ` Minchan Kim
  1 sibling, 0 replies; 32+ messages in thread
From: Minchan Kim @ 2009-11-02  0:40 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: LKML, Rafael J. Wysocki, Rik van Riel, linux-mm, Andrew Morton,
	Christoph Lameter, Mel Gorman

On Mon, 2 Nov 2009 00:11:02 +0900 (JST)
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:

> In old days, we didn't have sc.nr_to_reclaim and it brought
> sc.swap_cluster_max misuse.
> 
> huge sc.swap_cluster_max might makes unnecessary OOM and
> no performance benefit.
> 
> Now, we can remove above dangerous one.
> 
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>

-- 
Kind regards,
Minchan Kim


* Re: [PATCHv2 4/5] vmscan: Kill sc.swap_cluster_max
  2009-11-01 15:12 ` [PATCHv2 4/5] vmscan: Kill sc.swap_cluster_max KOSAKI Motohiro
  2009-11-01 17:56   ` Rik van Riel
@ 2009-11-02  0:46   ` Minchan Kim
  1 sibling, 0 replies; 32+ messages in thread
From: Minchan Kim @ 2009-11-02  0:46 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: LKML, Rafael J. Wysocki, Rik van Riel, linux-mm, Andrew Morton

On Mon, 2 Nov 2009 00:12:02 +0900 (JST)
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:

> Now, all caller is settng to swap_cluster_max = SWAP_CLUSTER_MAX.
> Then, we can remove it perfectly.
> 
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>


-- 
Kind regards,
Minchan Kim


* Re: [PATCHv2 1/5] vmscan: separate sc.swap_cluster_max and  sc.nr_max_reclaim
  2009-11-02  0:35 ` [PATCHv2 1/5] vmscan: separate sc.swap_cluster_max and sc.nr_max_reclaim Minchan Kim
@ 2009-11-02  0:48   ` Minchan Kim
  2009-11-02 15:35   ` KOSAKI Motohiro
  1 sibling, 0 replies; 32+ messages in thread
From: Minchan Kim @ 2009-11-02  0:48 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Rafael J. Wysocki, Rik van Riel, LKML, linux-mm, Andrew Morton

On Mon, Nov 2, 2009 at 9:35 AM, Minchan Kim <minchan.kim@gmail.com> wrote:
> Hi, Kosaki.
>
> On Mon, 2 Nov 2009 00:08:44 +0900 (JST)
> KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
>
>> Currently, sc.scap_cluster_max has double meanings.
>>
>>  1) reclaim batch size as isolate_lru_pages()'s argument
>>  2) reclaim baling out thresolds
>>
>> The two meanings pretty unrelated. Thus, Let's separate it.
>> this patch doesn't change any behavior.
>>
>> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>> Cc: Rafael J. Wysocki <rjw@sisk.pl>
>> Reviewed-by: Rik van Riel <riel@redhat.com>
>> ---
>>  mm/vmscan.c |   21 +++++++++++++++------
>>  1 files changed, 15 insertions(+), 6 deletions(-)
>>
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index f805958..6a3eb9f 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -55,6 +55,9 @@ struct scan_control {
>>       /* Number of pages freed so far during a call to shrink_zones() */
>>       unsigned long nr_reclaimed;
>>
>> +     /* How many pages shrink_list() should reclaim */
>> +     unsigned long nr_to_reclaim;
>
> If you try to divide meaning of swap_cluster_max,
> How about changing 'swap_cluster_max', too?
>
> It has a meaning which represents 'batch size'. ;)
> I hope we change it in this chance.

I see your 4th patch 'Kill sc.swap_cluster_max' now.
It looks good to me. Forget this comment. :)


-- 
Kind regards,
Minchan Kim


* Re: [PATCHv2 5/5][nit fix] vmscan Make consistent of reclaim bale out between do_try_to_free_page and shrink_zone
  2009-11-01 15:13 ` [PATCHv2 5/5][nit fix] vmscan Make consistent of reclaim bale out between do_try_to_free_page and shrink_zone KOSAKI Motohiro
  2009-11-01 17:58   ` Rik van Riel
@ 2009-11-02  0:48   ` Minchan Kim
  1 sibling, 0 replies; 32+ messages in thread
From: Minchan Kim @ 2009-11-02  0:48 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: LKML, Rik van Riel, linux-mm, Andrew Morton

On Mon, 2 Nov 2009 00:13:04 +0900 (JST)
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:

> Fix small inconsistent of ">" and ">=".
> 
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>

-- 
Kind regards,
Minchan Kim


* Re: [PATCHv2 2/5] vmscan: Kill hibernation specific reclaim logic and unify it
  2009-11-01 21:38   ` Rafael J. Wysocki
@ 2009-11-02 15:35     ` KOSAKI Motohiro
  2009-11-02 19:03       ` Rafael J. Wysocki
  0 siblings, 1 reply; 32+ messages in thread
From: KOSAKI Motohiro @ 2009-11-02 15:35 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: kosaki.motohiro, LKML, Rik van Riel, linux-mm, Andrew Morton

> > Then, This patch changed shrink_all_memory() to only the wrapper function of 
> > do_try_to_free_pages(). it bring good reviewability and debuggability, and solve 
> > above problems.
> > 
> > side note: Reclaim logic unificication makes two good side effect.
> >  - Fix recursive reclaim bug on shrink_all_memory().
> >    it did forgot to use PF_MEMALLOC. it mean the system be able to stuck into deadlock.
> >  - Now, shrink_all_memory() got lockdep awareness. it bring good debuggability.
> 
> As I said previously, I don't really see a reason to keep shrink_all_memory().
> 
> Do you think that removing it will result in performance degradation?

Hmm...
Probably I misunderstood what you meant. I thought you were suggesting to kill
all of the hibernation-specific reclaim code, which I did. There's no performance
degradation (at least, I didn't observe any).

But if you hope to kill the shrink_all_memory() function itself, the short answer
is that it's impossible.

The current VM reclaim code needs some preparation from the caller, and that
preparation exists in both alloc_pages_slowpath() and try_to_free_pages(). We
can't omit it.

Please see the shrink_all_memory() code below. It's pretty small; it only does a
little vmscan preparation. I don't think it is hard to maintain.


=====================================================
unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
{
        struct reclaim_state reclaim_state;
        struct scan_control sc = {
                .gfp_mask = GFP_HIGHUSER_MOVABLE,
                .may_swap = 1,
                .may_unmap = 1,
                .may_writepage = 1,
                .nr_to_reclaim = nr_to_reclaim,
                .hibernation_mode = 1,
                .swappiness = vm_swappiness,
                .order = 0,
                .isolate_pages = isolate_pages_global,
        };
        struct zonelist * zonelist = node_zonelist(numa_node_id(), sc.gfp_mask);
        struct task_struct *p = current;
        unsigned long nr_reclaimed;

        p->flags |= PF_MEMALLOC;
        lockdep_set_current_reclaim_state(sc.gfp_mask);
        reclaim_state.reclaimed_slab = 0;
        p->reclaim_state = &reclaim_state;

        nr_reclaimed = do_try_to_free_pages(zonelist, &sc);

        p->reclaim_state = NULL;
        lockdep_clear_current_reclaim_state();
        p->flags &= ~PF_MEMALLOC;

        return nr_reclaimed;
}




* Re: [PATCHv2 1/5] vmscan: separate sc.swap_cluster_max and sc.nr_max_reclaim
  2009-11-02  0:35 ` [PATCHv2 1/5] vmscan: separate sc.swap_cluster_max and sc.nr_max_reclaim Minchan Kim
  2009-11-02  0:48   ` Minchan Kim
@ 2009-11-02 15:35   ` KOSAKI Motohiro
  2009-11-02 23:34     ` Minchan Kim
  1 sibling, 1 reply; 32+ messages in thread
From: KOSAKI Motohiro @ 2009-11-02 15:35 UTC (permalink / raw)
  To: Minchan Kim
  Cc: kosaki.motohiro, Rafael J. Wysocki, Rik van Riel, LKML, linux-mm,
	Andrew Morton

Hi

> > @@ -1932,6 +1938,7 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order)
> >  		.may_unmap = 1,
> >  		.may_swap = 1,
> >  
> 		.swap_cluster_max = SWAP_CLUSTER_MAX,
> Or add comment in here. 
> 
> 'kswapd doesn't want to be bailed out while reclaim.'

OK, reasonable.
How about this?



---
 mm/vmscan.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 7fb3435..84e4da0 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1930,6 +1930,10 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order)
 		.gfp_mask = GFP_KERNEL,
 		.may_unmap = 1,
 		.may_swap = 1,
+		/*
+		 * kswapd doesn't want to be bailed out while reclaim. because
+		 * we want to put equal scanning pressure on each zone.
+		 */
 		.nr_to_reclaim = ULONG_MAX,
 		.swappiness = vm_swappiness,
 		.order = order,
-- 
1.6.2.5






* Re: [PATCHv2 2/5] vmscan: Kill hibernation specific reclaim logic and unify it
  2009-11-01 22:01   ` Nigel Cunningham
@ 2009-11-02 15:35     ` KOSAKI Motohiro
  2009-11-02 19:05       ` Rafael J. Wysocki
  2009-11-02 21:19       ` Nigel Cunningham
  0 siblings, 2 replies; 32+ messages in thread
From: KOSAKI Motohiro @ 2009-11-02 15:35 UTC (permalink / raw)
  To: Nigel Cunningham
  Cc: kosaki.motohiro, LKML, Rafael J. Wysocki, Rik van Riel, linux-mm,
	Andrew Morton

Hi

Thank you for the review :)

> > 2) shrink_all_zone() try to shrink all pages at a time. but it doesn't works
> >   fine on numa system.
> >   example)
> >     System has 4GB memory and each node have 2GB. and hibernate need 1GB.
> > 
> >     optimal)
> >        steal 500MB from each node.
> >     shrink_all_zones)
> >        steal 1GB from node-0.
> 
> I haven't given much thought to numa awareness in hibernate code, but I
> can say that the shrink_all_memory interface is woefully inadequate as
> far as zone awareness goes. Since lowmem needs to be atomically restored
> before we can restore highmem, we really need to be able to ask for a
> particular number of pages of a particular zone type to be freed.

Honestly, I am not a suspend/hibernation expert. Can I ask why the caller needs
to know the per-zone number of freed pages? If hibernation doesn't need highmem,
the following incremental patch prevents highmem reclaim entirely. Is that enough?


---
 mm/vmscan.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index e6ea011..7fb3435 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2265,7 +2265,7 @@ unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
 {
 	struct reclaim_state reclaim_state;
 	struct scan_control sc = {
-		.gfp_mask = GFP_HIGHUSER_MOVABLE,
+		.gfp_mask = GFP_KERNEL,
 		.may_swap = 1,
 		.may_unmap = 1,
 		.may_writepage = 1,
-- 
1.6.2.5





* Re: [PATCHv2 2/5] vmscan: Kill hibernation specific reclaim logic and unify it
  2009-11-02 15:35     ` KOSAKI Motohiro
@ 2009-11-02 19:03       ` Rafael J. Wysocki
  2009-11-03 14:00         ` KOSAKI Motohiro
  0 siblings, 1 reply; 32+ messages in thread
From: Rafael J. Wysocki @ 2009-11-02 19:03 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: LKML, Rik van Riel, linux-mm, Andrew Morton

On Monday 02 November 2009, KOSAKI Motohiro wrote:
> > > Then, This patch changed shrink_all_memory() to only the wrapper function of 
> > > do_try_to_free_pages(). it bring good reviewability and debuggability, and solve 
> > > above problems.
> > > 
> > > side note: Reclaim logic unificication makes two good side effect.
> > >  - Fix recursive reclaim bug on shrink_all_memory().
> > >    it did forgot to use PF_MEMALLOC. it mean the system be able to stuck into deadlock.
> > >  - Now, shrink_all_memory() got lockdep awareness. it bring good debuggability.
> > 
> > As I said previously, I don't really see a reason to keep shrink_all_memory().
> > 
> > Do you think that removing it will result in performance degradation?
> 
> Hmm...
> Probably, I misunderstood your mention. I thought you suggested to kill
> all hibernation specific reclaim code. I did. It's no performance degression.
> (At least, I didn't observe)
> 
> But, if you hope to kill shrink_all_memory() function itsef, the short answer is,
> it's impossible.
> 
> Current VM reclaim code need some preparetion to caller, and there are existing in
> both alloc_pages_slowpath() and try_to_free_pages(). We can't omit its preparation.

Well, my grepping for 'shrink_all_memory' throughout the entire kernel source
code seems to indicate that hibernate_preallocate_memory() is the only current
user of it.  I may be wrong, but I doubt it, unless some new users have been
added since 2.6.31.

In case I'm not wrong, it should be safe to drop it from
hibernate_preallocate_memory(), because it's there for performance reasons
only.  Now, since hibernate_preallocate_memory() appears to be the only user of
it, it should be safe to drop it entirely.

> Please see following shrink_all_memory() code. it's pretty small. it only have
> few vmscan preparation. I don't think it is hard to maintainance.

No, it's not, but I'm really not sure it's worth keeping.

Thanks,
Rafael


> =====================================================
> unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
> {
>         struct reclaim_state reclaim_state;
>         struct scan_control sc = {
>                 .gfp_mask = GFP_HIGHUSER_MOVABLE,
>                 .may_swap = 1,
>                 .may_unmap = 1,
>                 .may_writepage = 1,
>                 .nr_to_reclaim = nr_to_reclaim,
>                 .hibernation_mode = 1,
>                 .swappiness = vm_swappiness,
>                 .order = 0,
>                 .isolate_pages = isolate_pages_global,
>         };
>         struct zonelist * zonelist = node_zonelist(numa_node_id(), sc.gfp_mask);
>         struct task_struct *p = current;
>         unsigned long nr_reclaimed;
> 
>         p->flags |= PF_MEMALLOC;
>         lockdep_set_current_reclaim_state(sc.gfp_mask);
>         reclaim_state.reclaimed_slab = 0;
>         p->reclaim_state = &reclaim_state;
> 
>         nr_reclaimed = do_try_to_free_pages(zonelist, &sc);
> 
>         p->reclaim_state = NULL;
>         lockdep_clear_current_reclaim_state();
>         p->flags &= ~PF_MEMALLOC;
> 
>         return nr_reclaimed;
> }


* Re: [PATCHv2 2/5] vmscan: Kill hibernation specific reclaim logic and unify it
  2009-11-02 15:35     ` KOSAKI Motohiro
@ 2009-11-02 19:05       ` Rafael J. Wysocki
  2009-11-02 21:19       ` Nigel Cunningham
  1 sibling, 0 replies; 32+ messages in thread
From: Rafael J. Wysocki @ 2009-11-02 19:05 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Nigel Cunningham, LKML, Rik van Riel, linux-mm, Andrew Morton

On Monday 02 November 2009, KOSAKI Motohiro wrote:
> Hi
> 
> Thank you for the reviewing :)
> 
> > > 2) shrink_all_zone() try to shrink all pages at a time. but it doesn't works
> > >   fine on numa system.
> > >   example)
> > >     System has 4GB memory and each node have 2GB. and hibernate need 1GB.
> > > 
> > >     optimal)
> > >        steal 500MB from each node.
> > >     shrink_all_zones)
> > >        steal 1GB from node-0.
> > 
> > I haven't given much thought to numa awareness in hibernate code, but I
> > can say that the shrink_all_memory interface is woefully inadequate as
> > far as zone awareness goes. Since lowmem needs to be atomically restored
> > before we can restore highmem, we really need to be able to ask for a
> > particular number of pages of a particular zone type to be freed.
> 
> Honestly, I am not suspend/hibernation expert. Can I ask why caller need to know
> per-zone number of freed pages information? if hibernation don't need highmem.

It does need highmem.  At least the mainline version does.

> following incremental patch prevent highmem reclaim perfectly. Is it enough?

Thanks,
Rafael

 
> ---
>  mm/vmscan.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index e6ea011..7fb3435 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2265,7 +2265,7 @@ unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
>  {
>  	struct reclaim_state reclaim_state;
>  	struct scan_control sc = {
> -		.gfp_mask = GFP_HIGHUSER_MOVABLE,
> +		.gfp_mask = GFP_KERNEL,
>  		.may_swap = 1,
>  		.may_unmap = 1,
>  		.may_writepage = 1,


* Re: [PATCHv2 2/5] vmscan: Kill hibernation specific reclaim logic and unify it
  2009-11-02 15:35     ` KOSAKI Motohiro
  2009-11-02 19:05       ` Rafael J. Wysocki
@ 2009-11-02 21:19       ` Nigel Cunningham
  2009-11-03 11:30         ` Rafael J. Wysocki
  2009-11-03 14:00         ` KOSAKI Motohiro
  1 sibling, 2 replies; 32+ messages in thread
From: Nigel Cunningham @ 2009-11-02 21:19 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: LKML, Rafael J. Wysocki, Rik van Riel, linux-mm, Andrew Morton

Hi.

KOSAKI Motohiro wrote:
>> I haven't given much thought to numa awareness in hibernate code, but I
>> can say that the shrink_all_memory interface is woefully inadequate as
>> far as zone awareness goes. Since lowmem needs to be atomically restored
>> before we can restore highmem, we really need to be able to ask for a
>> particular number of pages of a particular zone type to be freed.
> 
> Honestly, I am not suspend/hibernation expert. Can I ask why caller need to know
> per-zone number of freed pages information? if hibernation don't need highmem.
> following incremental patch prevent highmem reclaim perfectly. Is it enough?

(Disclaimer: I don't think about highmem a lot any more, and might have
forgotten some of the details, or swsusp's algorithms might have
changed. Rafael might need to correct some of this...)

Imagine that you have a system with 1000 pages of lowmem and 5000 pages
of highmem. Of these, 950 lowmem pages are in use and 500 highmem pages
are in use.

In order to be able to save an image, we need to be able to do an
atomic copy of those lowmem pages.

You might think that we could just copy everything into the spare
highmem pages, but we can't because mapping and unmapping the highmem
pages as we copy the data will leave us with an inconsistent copy.
Depending on the configuration, it might (for example) have one page -
say one in the pagetables - reflecting one page being kmapped and another
page - containing the variables that record what kmap slots are used,
for example - recording a different page being kmapped.

What we do, then, is seek to atomically copy the lowmem pages to lowmem.
That requires, however, that we have at least half of the lowmem pages
free. So, then, we need a function that lets us free lowmem pages only.

I hope that makes it clearer.
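
To put some numbers on it (illustration only, not kernel code), the reclaim
target with the figures above works out as follows:

	/*
	 * Illustration only: minimum number of lowmem pages to reclaim so
	 * that every remaining used lowmem page has a free lowmem page to
	 * be copied into (the "at least half of lowmem free" requirement).
	 */
	static unsigned long lowmem_pages_to_reclaim(unsigned long total,
						     unsigned long used)
	{
		unsigned long free = total - used;

		if (free >= used)
			return 0;		/* already enough room */
		return (used - free + 1) / 2;	/* round up */
	}

	/* 1000 pages of lowmem, 950 in use: (950 - 50 + 1) / 2 = 450 to free */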

> ---
>  mm/vmscan.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index e6ea011..7fb3435 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2265,7 +2265,7 @@ unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
>  {
>  	struct reclaim_state reclaim_state;
>  	struct scan_control sc = {
> -		.gfp_mask = GFP_HIGHUSER_MOVABLE,
> +		.gfp_mask = GFP_KERNEL,
>  		.may_swap = 1,
>  		.may_unmap = 1,
>  		.may_writepage = 1,

I don't think so. I think what we really need is:

shrink_memory_type(gfp_mask, pages_needed)

That is, a function that would let us say "Free 489 pages of lowmem" or
"Free 983 pages of highmem" or "Free 340 pages of any kind of memory".
(The latter might be used if we just want to free some pages because the
image as it stands is too big for the storage available).
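
A minimal sketch of the sort of interface I mean (illustrative only, not a
tested patch):

	/*
	 * Purely illustrative, untested sketch: gfp_mask selects the zones we
	 * may reclaim from (GFP_KERNEL for lowmem only, GFP_HIGHUSER_MOVABLE
	 * for any kind of memory), pages_needed is how many pages the caller
	 * wants back.  The scan_control fields are the ones this series uses.
	 */
	unsigned long shrink_memory_type(gfp_t gfp_mask, unsigned long pages_needed)
	{
		struct reclaim_state reclaim_state;
		struct scan_control sc = {
			.gfp_mask	= gfp_mask,
			.may_swap	= 1,
			.may_unmap	= 1,
			.may_writepage	= 1,
			.nr_to_reclaim	= pages_needed,
			.swappiness	= vm_swappiness,
			.order		= 0,
			.isolate_pages	= isolate_pages_global,
		};
		struct zonelist *zonelist = node_zonelist(numa_node_id(), gfp_mask);
		unsigned long nr_reclaimed;

		current->flags |= PF_MEMALLOC;
		reclaim_state.reclaimed_slab = 0;
		current->reclaim_state = &reclaim_state;
		lockdep_set_current_reclaim_state(gfp_mask);

		nr_reclaimed = do_try_to_free_pages(zonelist, &sc);

		lockdep_clear_current_reclaim_state();
		current->reclaim_state = NULL;
		current->flags &= ~PF_MEMALLOC;

		return nr_reclaimed;
	}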

Regards,

Nigel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCHv2 1/5] vmscan: separate sc.swap_cluster_max and sc.nr_max_reclaim
  2009-11-02 15:35   ` KOSAKI Motohiro
@ 2009-11-02 23:34     ` Minchan Kim
  0 siblings, 0 replies; 32+ messages in thread
From: Minchan Kim @ 2009-11-02 23:34 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Minchan Kim, Rafael J. Wysocki, Rik van Riel, LKML, linux-mm,
	Andrew Morton

On Tue, 3 Nov 2009 00:35:30 +0900 (JST)
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:

> Hi
> 
> > > @@ -1932,6 +1938,7 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order)
> > >  		.may_unmap = 1,
> > >  		.may_swap = 1,
> > >  
> > 		.swap_cluster_max = SWAP_CLUSTER_MAX,
> > Or add comment in here. 
> > 
> > 'kswapd doesn't want to be bailed out while reclaim.'
> 
> OK, reasonable.
> How about this?
> 
>
> 
> ---
>  mm/vmscan.c |    4 ++++
>  1 files changed, 4 insertions(+), 0 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 7fb3435..84e4da0 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1930,6 +1930,10 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order)
>  		.gfp_mask = GFP_KERNEL,
>  		.may_unmap = 1,
>  		.may_swap = 1,
> +		/*
> +		 * kswapd doesn't want to be bailed out while reclaiming, because
> +		 * we want to put equal scanning pressure on each zone.
> +		 */
>  		.nr_to_reclaim = ULONG_MAX,
>  		.swappiness = vm_swappiness,
>  		.order = order,
> -- 
> 1.6.2.5
> 


Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
It looks better than mine.
Thanks, Kosaki. :)



-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCHv2 2/5] vmscan: Kill hibernation specific reclaim logic and unify it
  2009-11-02 21:19       ` Nigel Cunningham
@ 2009-11-03 11:30         ` Rafael J. Wysocki
  2009-11-03 21:12           ` Nigel Cunningham
  2009-11-03 14:00         ` KOSAKI Motohiro
  1 sibling, 1 reply; 32+ messages in thread
From: Rafael J. Wysocki @ 2009-11-03 11:30 UTC (permalink / raw)
  To: Nigel Cunningham
  Cc: KOSAKI Motohiro, LKML, Rik van Riel, linux-mm, Andrew Morton

On Monday 02 November 2009, Nigel Cunningham wrote:
> Hi.

Hi,

> KOSAKI Motohiro wrote:
> >> I haven't given much thought to numa awareness in hibernate code, but I
> >> can say that the shrink_all_memory interface is woefully inadequate as
> >> far as zone awareness goes. Since lowmem needs to be atomically restored
> >> before we can restore highmem, we really need to be able to ask for a
> >> particular number of pages of a particular zone type to be freed.
> > 
> > Honestly, I am not suspend/hibernation expert. Can I ask why caller need to know
> > per-zone number of freed pages information? if hibernation don't need highmem.
> > following incremental patch prevent highmem reclaim perfectly. Is it enough?
> 
> (Disclaimer: I don't think about highmem a lot any more, and might have
> forgotten some of the details, or swsusp's algorithms might have
> changed. Rafael might need to correct some of this...)
> 
> Imagine that you have a system with 1000 pages of lowmem and 5000 pages
> of highmem. Of these, 950 lowmem pages are in use and 500 highmem pages
> are in use.
> 
> In order to be able to save an image, we need to be able to do an
> atomic copy of those lowmem pages.
> 
> You might think that we could just copy everything into the spare
> highmem pages, but we can't because mapping and unmapping the highmem
> pages as we copy the data will leave us with an inconsistent copy.

This isn't the case any more for the mainline hibernate code.  We use highmem
for storing image data as well as lowmem.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCHv2 2/5] vmscan: Kill hibernation specific reclaim logic and unify it
  2009-11-02 19:03       ` Rafael J. Wysocki
@ 2009-11-03 14:00         ` KOSAKI Motohiro
  2009-11-03 21:51           ` Rafael J. Wysocki
  0 siblings, 1 reply; 32+ messages in thread
From: KOSAKI Motohiro @ 2009-11-03 14:00 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: kosaki.motohiro, LKML, Rik van Riel, linux-mm, Andrew Morton

> On Monday 02 November 2009, KOSAKI Motohiro wrote:
> > > > Then, This patch changed shrink_all_memory() to only the wrapper function of 
> > > > do_try_to_free_pages(). it bring good reviewability and debuggability, and solve 
> > > > above problems.
> > > > 
> > > > side note: Reclaim logic unificication makes two good side effect.
> > > >  - Fix recursive reclaim bug on shrink_all_memory().
> > > >    it did forgot to use PF_MEMALLOC. it mean the system be able to stuck into deadlock.
> > > >  - Now, shrink_all_memory() got lockdep awareness. it bring good debuggability.
> > > 
> > > As I said previously, I don't really see a reason to keep shrink_all_memory().
> > > 
> > > Do you think that removing it will result in performance degradation?
> > 
> > Hmm...
> > Probably, I misunderstood your mention. I thought you suggested to kill
> > all hibernation specific reclaim code. I did. It's no performance degression.
> > (At least, I didn't observe)
> > 
> > But, if you hope to kill shrink_all_memory() function itsef, the short answer is,
> > it's impossible.
> > 
> > Current VM reclaim code need some preparetion to caller, and there are existing in
> > both alloc_pages_slowpath() and try_to_free_pages(). We can't omit its preparation.
> 
> Well, my grepping for 'shrink_all_memory' throughout the entire kernel source
> code seems to indicate that hibernate_preallocate_memory() is the only current
> user of it.  I may be wrong, but I doubt it, unless some new users have been
> added since 2.6.31.
> 
> In case I'm not wrong, it should be safe to drop it from
> hibernate_preallocate_memory(), because it's there for performance reasons
> only.  Now, since hibernate_preallocate_memory() appears to be the only user of
> it, it should be safe to drop it entirely.

Hmmm...
I tried dropping shrink_all_memory() today, but I got bad results.

In three test runs, the results were:

 2 times: kernel hang-up ;)
 1 time:  success, but about 100x slower than with shrink_all_memory().


Did you try dropping it yourself on your machine? Did it succeed?



> > Please see following shrink_all_memory() code. it's pretty small. it only have
> > few vmscan preparation. I don't think it is hard to maintainance.
> 
> No, it's not, but I'm really not sure it's worth keeping.




^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCHv2 2/5] vmscan: Kill hibernation specific reclaim logic and unify it
  2009-11-02 21:19       ` Nigel Cunningham
  2009-11-03 11:30         ` Rafael J. Wysocki
@ 2009-11-03 14:00         ` KOSAKI Motohiro
  2009-11-03 21:52           ` Rafael J. Wysocki
  1 sibling, 1 reply; 32+ messages in thread
From: KOSAKI Motohiro @ 2009-11-03 14:00 UTC (permalink / raw)
  To: Nigel Cunningham
  Cc: kosaki.motohiro, LKML, Rafael J. Wysocki, Rik van Riel, linux-mm,
	Andrew Morton

> Hi.
> 
> KOSAKI Motohiro wrote:
> >> I haven't given much thought to numa awareness in hibernate code, but I
> >> can say that the shrink_all_memory interface is woefully inadequate as
> >> far as zone awareness goes. Since lowmem needs to be atomically restored
> >> before we can restore highmem, we really need to be able to ask for a
> >> particular number of pages of a particular zone type to be freed.
> > 
> > Honestly, I am not suspend/hibernation expert. Can I ask why caller need to know
> > per-zone number of freed pages information? if hibernation don't need highmem.
> > following incremental patch prevent highmem reclaim perfectly. Is it enough?
> 
> (Disclaimer: I don't think about highmem a lot any more, and might have
> forgotten some of the details, or swsusp's algorithms might have
> changed. Rafael might need to correct some of this...)
> 
> Imagine that you have a system with 1000 pages of lowmem and 5000 pages
> of highmem. Of these, 950 lowmem pages are in use and 500 highmem pages
> are in use.
> 
> In order to be able to save an image, we need to be able to do an
> atomic copy of those lowmem pages.
> 
> You might think that we could just copy everything into the spare
> highmem pages, but we can't because mapping and unmapping the highmem
> pages as we copy the data will leave us with an inconsistent copy.
> Depending on the configuration, it might (for example) have one page -
> say one in the pagetables - reflecting one page being kmapped and another
> page - containing the variables that record what kmap slots are used,
> for example - recording a different page being kmapped.
> 
> What we do, then, is seek to atomically copy the lowmem pages to lowmem.
> That requires, however, that we have at least half of the lowmem pages
> free. So, then, we need a function that lets us free lowmem pages only.
> 
> I hope that makes it clearer.
> 
> > ---
> >  mm/vmscan.c |    2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)
> > 
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index e6ea011..7fb3435 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -2265,7 +2265,7 @@ unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
> >  {
> >  	struct reclaim_state reclaim_state;
> >  	struct scan_control sc = {
> > -		.gfp_mask = GFP_HIGHUSER_MOVABLE,
> > +		.gfp_mask = GFP_KERNEL,
> >  		.may_swap = 1,
> >  		.may_unmap = 1,
> >  		.may_writepage = 1,
> 
> I don't think so. I think what we really need is:
> 
> shrink_memory_type(gfp_mask, pages_needed)
> 
> That is, a function that would let us say "Free 489 pages of lowmem" or
> "Free 983 pages of highmem" or "Free 340 pages of any kind of memory".
> (The latter might be used if we just want to free some pages because the
> image as it stands is too big for the storage available).

I can add a gfp_mask argument to shrink_all_memory(); that's easy.
But I obviously need help from the PM folks to make the change meaningful: I'm
not sure how we should calculate the free memory required for hibernation.

Side note: the current reclaim logic can't do "Free 983 pages of highmem",
but I doubt that's really necessary. I guess we only need "reclaim lowmem"
and/or "reclaim any kind of memory".




^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCHv2 2/5] vmscan: Kill hibernation specific reclaim logic and unify it
  2009-11-03 11:30         ` Rafael J. Wysocki
@ 2009-11-03 21:12           ` Nigel Cunningham
  2009-11-03 22:00             ` Rafael J. Wysocki
  0 siblings, 1 reply; 32+ messages in thread
From: Nigel Cunningham @ 2009-11-03 21:12 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: KOSAKI Motohiro, LKML, Rik van Riel, linux-mm, Andrew Morton

Hi Rafael.

Rafael J. Wysocki wrote:
> On Monday 02 November 2009, Nigel Cunningham wrote:
>> Hi.
> 
> Hi,
> 
>> KOSAKI Motohiro wrote:
>>>> I haven't given much thought to numa awareness in hibernate code, but I
>>>> can say that the shrink_all_memory interface is woefully inadequate as
>>>> far as zone awareness goes. Since lowmem needs to be atomically restored
>>>> before we can restore highmem, we really need to be able to ask for a
>>>> particular number of pages of a particular zone type to be freed.
>>> Honestly, I am not suspend/hibernation expert. Can I ask why caller need to know
>>> per-zone number of freed pages information? if hibernation don't need highmem.
>>> following incremental patch prevent highmem reclaim perfectly. Is it enough?
>> (Disclaimer: I don't think about highmem a lot any more, and might have
>> forgotten some of the details, or swsusp's algorithms might have
>> changed. Rafael might need to correct some of this...)
>>
>> Imagine that you have a system with 1000 pages of lowmem and 5000 pages
>> of highmem. Of these, 950 lowmem pages are in use and 500 highmem pages
>> are in use.
>>
>> In order to be able to save an image, we need to be able to do an
>> atomic copy of those lowmem pages.
>>
>> You might think that we could just copy everything into the spare
>> highmem pages, but we can't because mapping and unmapping the highmem
>> pages as we copy the data will leave us with an inconsistent copy.
> 
> This isn't the case any more for the mainline hibernate code.  We use highmem
> for storing image data as well as lowmem.

Highmem for storing copies of lowmem pages?

Regards,

Nigel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCHv2 2/5] vmscan: Kill hibernation specific reclaim logic and unify it
  2009-11-03 14:00         ` KOSAKI Motohiro
@ 2009-11-03 21:51           ` Rafael J. Wysocki
  0 siblings, 0 replies; 32+ messages in thread
From: Rafael J. Wysocki @ 2009-11-03 21:51 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: LKML, Rik van Riel, linux-mm, Andrew Morton

On Tuesday 03 November 2009, KOSAKI Motohiro wrote:
> > On Monday 02 November 2009, KOSAKI Motohiro wrote:
> > > > > Then, This patch changed shrink_all_memory() to only the wrapper function of 
> > > > > do_try_to_free_pages(). it bring good reviewability and debuggability, and solve 
> > > > > above problems.
> > > > > 
> > > > > side note: Reclaim logic unificication makes two good side effect.
> > > > >  - Fix recursive reclaim bug on shrink_all_memory().
> > > > >    it did forgot to use PF_MEMALLOC. it mean the system be able to stuck into deadlock.
> > > > >  - Now, shrink_all_memory() got lockdep awareness. it bring good debuggability.
> > > > 
> > > > As I said previously, I don't really see a reason to keep shrink_all_memory().
> > > > 
> > > > Do you think that removing it will result in performance degradation?
> > > 
> > > Hmm...
> > > Probably, I misunderstood your mention. I thought you suggested to kill
> > > all hibernation specific reclaim code. I did. It's no performance degression.
> > > (At least, I didn't observe)
> > > 
> > > But, if you hope to kill shrink_all_memory() function itsef, the short answer is,
> > > it's impossible.
> > > 
> > > Current VM reclaim code need some preparetion to caller, and there are existing in
> > > both alloc_pages_slowpath() and try_to_free_pages(). We can't omit its preparation.
> > 
> > Well, my grepping for 'shrink_all_memory' throughout the entire kernel source
> > code seems to indicate that hibernate_preallocate_memory() is the only current
> > user of it.  I may be wrong, but I doubt it, unless some new users have been
> > added since 2.6.31.
> > 
> > In case I'm not wrong, it should be safe to drop it from
> > hibernate_preallocate_memory(), because it's there for performance reasons
> > only.  Now, since hibernate_preallocate_memory() appears to be the only user of
> > it, it should be safe to drop it entirely.
> 
> Hmmm...
> I tried dropping shrink_all_memory() today, but I got bad results.
> 
> In three test runs, the results were:
> 
>  2 times: kernel hang-up ;)
>  1 time:  success, but about 100x slower than with shrink_all_memory().
> 
> 
> Did you try dropping it yourself on your machine? Did it succeed?

Generally, yes, but the performance was hit really badly.

So, the conclusion is that we need shrink_all_memory() for things to work,
which is kind of interesting.

In that case, please feel free to add Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
to the patch.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCHv2 2/5] vmscan: Kill hibernation specific reclaim logic and unify it
  2009-11-03 14:00         ` KOSAKI Motohiro
@ 2009-11-03 21:52           ` Rafael J. Wysocki
  0 siblings, 0 replies; 32+ messages in thread
From: Rafael J. Wysocki @ 2009-11-03 21:52 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Nigel Cunningham, LKML, Rik van Riel, linux-mm, Andrew Morton

On Tuesday 03 November 2009, KOSAKI Motohiro wrote:
> > Hi.
> > 
> > KOSAKI Motohiro wrote:
> > >> I haven't given much thought to numa awareness in hibernate code, but I
> > >> can say that the shrink_all_memory interface is woefully inadequate as
> > >> far as zone awareness goes. Since lowmem needs to be atomically restored
> > >> before we can restore highmem, we really need to be able to ask for a
> > >> particular number of pages of a particular zone type to be freed.
> > > 
> > > Honestly, I am not suspend/hibernation expert. Can I ask why caller need to know
> > > per-zone number of freed pages information? if hibernation don't need highmem.
> > > following incremental patch prevent highmem reclaim perfectly. Is it enough?
> > 
> > (Disclaimer: I don't think about highmem a lot any more, and might have
> > forgotten some of the details, or swsusp's algorithms might have
> > changed. Rafael might need to correct some of this...)
> > 
> > Imagine that you have a system with 1000 pages of lowmem and 5000 pages
> > of highmem. Of these, 950 lowmem pages are in use and 500 highmem pages
> > are in use.
> > 
> > In order to be able to save an image, we need to be able to do an
> > atomic copy of those lowmem pages.
> > 
> > You might think that we could just copy everything into the spare
> > highmem pages, but we can't because mapping and unmapping the highmem
> > pages as we copy the data will leave us with an inconsistent copy.
> > Depending on the configuration, it might (for example) have one page -
> > say one in the pagetables - reflecting one page being kmapped and another
> > page - containing the variables that record what kmap slots are used,
> > for example - recording a different page being kmapped.
> > 
> > What we do, then, is seek to atomically copy the lowmem pages to lowmem.
> > That requires, however, that we have at least half of the lowmem pages
> > free. So, then, we need a function that lets us free lowmem pages only.
> > 
> > I hope that makes it clearer.
> > 
> > > ---
> > >  mm/vmscan.c |    2 +-
> > >  1 files changed, 1 insertions(+), 1 deletions(-)
> > > 
> > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > index e6ea011..7fb3435 100644
> > > --- a/mm/vmscan.c
> > > +++ b/mm/vmscan.c
> > > @@ -2265,7 +2265,7 @@ unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
> > >  {
> > >  	struct reclaim_state reclaim_state;
> > >  	struct scan_control sc = {
> > > -		.gfp_mask = GFP_HIGHUSER_MOVABLE,
> > > +		.gfp_mask = GFP_KERNEL,
> > >  		.may_swap = 1,
> > >  		.may_unmap = 1,
> > >  		.may_writepage = 1,
> > 
> > I don't think so. I think what we really need is:
> > 
> > shrink_memory_type(gfp_mask, pages_needed)
> > 
> > That is, a function that would let us say "Free 489 pages of lowmem" or
> > "Free 983 pages of highmem" or "Free 340 pages of any kind of memory".
> > (The latter might be used if we just want to free some pages because the
> > image as it stands is too big for the storage available).
> 
> I can add a gfp_mask argument to shrink_all_memory(); that's easy.
> But I obviously need help from the PM folks to make the change meaningful: I'm
> not sure how we should calculate the free memory required for hibernation.
> 
> Side note: the current reclaim logic can't do "Free 983 pages of highmem",
> but I doubt that's really necessary. I guess we only need "reclaim lowmem"
> and/or "reclaim any kind of memory".

The latter.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCHv2 2/5] vmscan: Kill hibernation specific reclaim logic and unify it
  2009-11-03 21:12           ` Nigel Cunningham
@ 2009-11-03 22:00             ` Rafael J. Wysocki
  2009-11-12 12:33               ` using highmem for atomic copy of lowmem was " Pavel Machek
  0 siblings, 1 reply; 32+ messages in thread
From: Rafael J. Wysocki @ 2009-11-03 22:00 UTC (permalink / raw)
  To: Nigel Cunningham
  Cc: KOSAKI Motohiro, LKML, Rik van Riel, linux-mm, Andrew Morton

On Tuesday 03 November 2009, Nigel Cunningham wrote:
> Hi Rafael.
> 
> Rafael J. Wysocki wrote:
> > On Monday 02 November 2009, Nigel Cunningham wrote:
> >> Hi.
> > 
> > Hi,
> > 
> >> KOSAKI Motohiro wrote:
> >>>> I haven't given much thought to numa awareness in hibernate code, but I
> >>>> can say that the shrink_all_memory interface is woefully inadequate as
> >>>> far as zone awareness goes. Since lowmem needs to be atomically restored
> >>>> before we can restore highmem, we really need to be able to ask for a
> >>>> particular number of pages of a particular zone type to be freed.
> >>> Honestly, I am not suspend/hibernation expert. Can I ask why caller need to know
> >>> per-zone number of freed pages information? if hibernation don't need highmem.
> >>> following incremental patch prevent highmem reclaim perfectly. Is it enough?
> >> (Disclaimer: I don't think about highmem a lot any more, and might have
> >> forgotten some of the details, or swsusp's algorithms might have
> >> changed. Rafael might need to correct some of this...)
> >>
> >> Imagine that you have a system with 1000 pages of lowmem and 5000 pages
> >> of highmem. Of these, 950 lowmem pages are in use and 500 highmem pages
> >> are in use.
> >>
> >> In order to be able to save an image, we need to be able to do an
> >> atomic copy of those lowmem pages.
> >>
> >> You might think that we could just copy everything into the spare
> >> highmem pages, but we can't because mapping and unmapping the highmem
> >> pages as we copy the data will leave us with an inconsistent copy.
> > 
> > This isn't the case any more for the mainline hibernate code.  We use highmem
> > for storing image data as well as lowmem.
> 
> Highmem for storing copies of lowmem pages?

It is possible in theory, but I don't think it happens in practice given the
way in which the memory is freed.  Still copy_data_page() takes this
possibility into account.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 32+ messages in thread

* using highmem for atomic copy of lowmem was Re: [PATCHv2 2/5] vmscan: Kill hibernation specific reclaim logic and unify it
  2009-11-03 22:00             ` Rafael J. Wysocki
@ 2009-11-12 12:33               ` Pavel Machek
  2009-11-12 23:33                 ` Rafael J. Wysocki
  0 siblings, 1 reply; 32+ messages in thread
From: Pavel Machek @ 2009-11-12 12:33 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Nigel Cunningham, KOSAKI Motohiro, LKML, Rik van Riel, linux-mm,
	Andrew Morton

Hi!

> > >> (Disclaimer: I don't think about highmem a lot any more, and might have
> > >> forgotten some of the details, or swsusp's algorithms might have
> > >> changed. Rafael might need to correct some of this...)
> > >>
> > >> Imagine that you have a system with 1000 pages of lowmem and 5000 pages
> > >> of highmem. Of these, 950 lowmem pages are in use and 500 highmem pages
> > >> are in use.
> > >>
> > >> In order to be able to save an image, we need to be able to do an
> > >> atomic copy of those lowmem pages.
> > >>
> > >> You might think that we could just copy everything into the spare
> > >> highmem pages, but we can't because mapping and unmapping the highmem
> > >> pages as we copy the data will leave us with an inconsistent copy.
> > > 
> > > This isn't the case any more for the mainline hibernate code.  We use highmem
> > > for storing image data as well as lowmem.
> > 
> > Highmem for storing copies of lowmem pages?
> 
> It is possible in theory, but I don't think it happens in practice given the
> way in which the memory is freed.  Still copy_data_page() takes this
> possibility into account.

Yes, it does, but I wonder if it can ever work...?

copy_data_page() takes great care not to modify any memory -- like
using a handmade loop instead of memcpy() -- yet it uses kmap_atomic()
and friends.

If the kmap_atomic()+kunmap_atomic() pair is guaranteed not to change any
memory, that's probably safe, but...

(Actually, plain memcpy() should be usable for copying highmem pages;
those should not contain task_structs or anything else touched by
memcpy()).
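
For reference, the kind of copy I mean looks roughly like this (an
illustrative reconstruction, not the actual copy_data_page()):

	static void copy_highmem_page_example(struct page *d_page, struct page *s_page)
	{
		unsigned long *dst, *src;
		unsigned int i;

		/* map both pages, then copy with an open-coded loop, not memcpy() */
		src = kmap_atomic(s_page, KM_USER0);
		dst = kmap_atomic(d_page, KM_USER1);
		for (i = 0; i < PAGE_SIZE / sizeof(unsigned long); i++)
			dst[i] = src[i];
		kunmap_atomic(dst, KM_USER1);
		kunmap_atomic(src, KM_USER0);
	}
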
								Pavel 

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: using highmem for atomic copy of lowmem was Re: [PATCHv2 2/5] vmscan: Kill hibernation specific reclaim logic and unify it
  2009-11-12 12:33               ` using highmem for atomic copy of lowmem was " Pavel Machek
@ 2009-11-12 23:33                 ` Rafael J. Wysocki
  0 siblings, 0 replies; 32+ messages in thread
From: Rafael J. Wysocki @ 2009-11-12 23:33 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Nigel Cunningham, KOSAKI Motohiro, LKML, Rik van Riel, linux-mm,
	Andrew Morton

On Thursday 12 November 2009, Pavel Machek wrote:
> Hi!
> 
> > > >> (Disclaimer: I don't think about highmem a lot any more, and might have
> > > >> forgotten some of the details, or swsusp's algorithms might have
> > > >> changed. Rafael might need to correct some of this...)
> > > >>
> > > >> Imagine that you have a system with 1000 pages of lowmem and 5000 pages
> > > >> of highmem. Of these, 950 lowmem pages are in use and 500 highmem pages
> > > >> are in use.
> > > >>
> > > >> In order to be able to save an image, we need to be able to do an
> > > >> atomic copy of those lowmem pages.
> > > >>
> > > >> You might think that we could just copy everything into the spare
> > > >> highmem pages, but we can't because mapping and unmapping the highmem
> > > >> pages as we copy the data will leave us with an inconsistent copy.
> > > > 
> > > > This isn't the case any more for the mainline hibernate code.  We use highmem
> > > > for storing image data as well as lowmem.
> > > 
> > > Highmem for storing copies of lowmem pages?
> > 
> > It is possible in theory, but I don't think it happens in practice given the
> > way in which the memory is freed.  Still copy_data_page() takes this
> > possibility into account.
> 
> Yes, it does, but I wonder if it can ever work...?
> 
> copy_data_page() takes great care not to modify any memory -- like
> using handmade loop instead of memcpy() -- yet it uses kmap_atomic()
> and friends.
> 
> If kmap_atomic()+kunmap_atomic() pair is guaranteed not to change any
> memory, thats probably safe, but...

It would only be unsafe if the page being copied was changed at the same time,
but the code is designed to avoid that.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2009-11-12 23:32 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-11-01 15:08 [PATCHv2 1/5] vmscan: separate sc.swap_cluster_max and sc.nr_max_reclaim KOSAKI Motohiro
2009-11-01 15:09 ` [PATCHv2 2/5] vmscan: Kill hibernation specific reclaim logic and unify it KOSAKI Motohiro
2009-11-01 15:12   ` Rik van Riel
2009-11-01 21:38   ` Rafael J. Wysocki
2009-11-02 15:35     ` KOSAKI Motohiro
2009-11-02 19:03       ` Rafael J. Wysocki
2009-11-03 14:00         ` KOSAKI Motohiro
2009-11-03 21:51           ` Rafael J. Wysocki
2009-11-01 22:01   ` Nigel Cunningham
2009-11-02 15:35     ` KOSAKI Motohiro
2009-11-02 19:05       ` Rafael J. Wysocki
2009-11-02 21:19       ` Nigel Cunningham
2009-11-03 11:30         ` Rafael J. Wysocki
2009-11-03 21:12           ` Nigel Cunningham
2009-11-03 22:00             ` Rafael J. Wysocki
2009-11-12 12:33               ` using highmem for atomic copy of lowmem was " Pavel Machek
2009-11-12 23:33                 ` Rafael J. Wysocki
2009-11-03 14:00         ` KOSAKI Motohiro
2009-11-03 21:52           ` Rafael J. Wysocki
2009-11-01 15:11 ` [PATCHv2 3/5] vmscan: Stop zone_reclaim()'s wrong swap_cluster_max usage KOSAKI Motohiro
2009-11-01 17:51   ` Rik van Riel
2009-11-02  0:40   ` Minchan Kim
2009-11-01 15:12 ` [PATCHv2 4/5] vmscan: Kill sc.swap_cluster_max KOSAKI Motohiro
2009-11-01 17:56   ` Rik van Riel
2009-11-02  0:46   ` Minchan Kim
2009-11-01 15:13 ` [PATCHv2 5/5][nit fix] vmscan Make consistent of reclaim bale out between do_try_to_free_page and shrink_zone KOSAKI Motohiro
2009-11-01 17:58   ` Rik van Riel
2009-11-02  0:48   ` Minchan Kim
2009-11-02  0:35 ` [PATCHv2 1/5] vmscan: separate sc.swap_cluster_max and sc.nr_max_reclaim Minchan Kim
2009-11-02  0:48   ` Minchan Kim
2009-11-02 15:35   ` KOSAKI Motohiro
2009-11-02 23:34     ` Minchan Kim
