* [RFC][PATCH 0/5] PM/Hibernate: Rework memory shrinking
@ 2009-05-06 22:40 Rafael J. Wysocki
  2009-05-06 22:41 ` [RFC][PATCH 1/5] PM/Freezer: Disable OOM killer when tasks are frozen Rafael J. Wysocki
                   ` (11 more replies)
  0 siblings, 12 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-06 22:40 UTC (permalink / raw)
  To: pm list; +Cc: Wu Fengguang, Andrew Morton, LKML, Pavel Machek, Nigel Cunningham

Hi,

The following patchset is an attempt to rework the memory shrinking mechanism
used during hibernation to make room for the image.  It is a work in progress
and most likely it's going to be modified, but it has been discussed recently
and I'd like to get comments on the current version.

[1/5] - disable the OOM killer after freezing tasks (this will be dropped if
        it's verified that we can avoid the OOM killing by using
        __GFP_FS|__GFP_WAIT|__GFP_NORETRY|__GFP_NOWARN
        in the next patches; a sketch of that allocation pattern follows below).

[2/5] - drop memory shrinking from the suspend (to RAM) code path

[3/5] - move swsusp_shrink_memory() to snapshot.c

[4/5] - rework swsusp_shrink_memory() (to use memory allocations for applying
        memory pressure)

[5/5] - allocate image pages along with the shrinking.

Details are in the changelogs, please have a look and tell me what you think.
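
For reference, the allocation pattern referred to in [1/5] would look
roughly like the sketch below (illustration only, using exactly the flag
combination quoted above; the current patches stick to
GFP_KERNEL | __GFP_NOWARN instead):

	#include <linux/gfp.h>

	/*
	 * Sketch: grab one page while applying memory pressure.  __GFP_NORETRY
	 * makes the allocator return NULL once direct reclaim stops making
	 * progress, instead of retrying forever or invoking the OOM killer,
	 * and __GFP_NOWARN suppresses the allocation-failure warning.
	 */
	static struct page *grab_one_page(void)
	{
		return alloc_page(__GFP_FS | __GFP_WAIT |
				  __GFP_NORETRY | __GFP_NOWARN);
	}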

Best,
Rafael



-- 
Everyone knows that debugging is twice as hard as writing a program
in the first place.  So if you're as clever as you can be when you write it,
how will you ever debug it? --- Brian Kernighan


^ permalink raw reply	[flat|nested] 205+ messages in thread

* [RFC][PATCH 1/5] PM/Freezer: Disable OOM killer when tasks are frozen
  2009-05-06 22:40 [RFC][PATCH 0/5] PM/Hibernate: Rework memory shrinking Rafael J. Wysocki
  2009-05-06 22:41 ` [RFC][PATCH 1/5] PM/Freezer: Disable OOM killer when tasks are frozen Rafael J. Wysocki
@ 2009-05-06 22:41 ` Rafael J. Wysocki
  2009-05-06 23:00   ` Nigel Cunningham
                     ` (3 more replies)
  2009-05-06 22:42 ` [RFC][PATCH 2/5] PM/Suspend: Do not shrink memory before suspend Rafael J. Wysocki
                   ` (9 subsequent siblings)
  11 siblings, 4 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-06 22:41 UTC (permalink / raw)
  To: pm list; +Cc: Wu Fengguang, Andrew Morton, LKML, Pavel Machek, Nigel Cunningham

From: Rafael J. Wysocki <rjw@sisk.pl>

The OOM killer is not really going to work while tasks are frozen, so
we can just give up calling it in that case.

This will allow us to safely use memory allocations for decreasing
the number of saveable pages in the hibernation core code instead of
using any artificial memory shrinking mechanisms for this purpose.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 include/linux/freezer.h |    2 ++
 kernel/power/process.c  |   12 ++++++++++++
 mm/page_alloc.c         |    5 +++++
 3 files changed, 19 insertions(+)

Index: linux-2.6/kernel/power/process.c
===================================================================
--- linux-2.6.orig/kernel/power/process.c
+++ linux-2.6/kernel/power/process.c
@@ -19,6 +19,8 @@
  */
 #define TIMEOUT	(20 * HZ)
 
+static bool tasks_frozen;
+
 static inline int freezeable(struct task_struct * p)
 {
 	if ((p == current) ||
@@ -120,6 +122,10 @@ int freeze_processes(void)
  Exit:
 	BUG_ON(in_atomic());
 	printk("\n");
+
+	if (!error)
+		tasks_frozen = true;
+
 	return error;
 }
 
@@ -145,6 +151,8 @@ static void thaw_tasks(bool nosig_only)
 
 void thaw_processes(void)
 {
+	tasks_frozen = false;
+
 	printk("Restarting tasks ... ");
 	thaw_tasks(true);
 	thaw_tasks(false);
@@ -152,3 +160,7 @@ void thaw_processes(void)
 	printk("done.\n");
 }
 
+bool processes_are_frozen(void)
+{
+	return tasks_frozen;
+}
Index: linux-2.6/include/linux/freezer.h
===================================================================
--- linux-2.6.orig/include/linux/freezer.h
+++ linux-2.6/include/linux/freezer.h
@@ -50,6 +50,7 @@ extern int thaw_process(struct task_stru
 extern void refrigerator(void);
 extern int freeze_processes(void);
 extern void thaw_processes(void);
+extern bool processes_are_frozen(void);
 
 static inline int try_to_freeze(void)
 {
@@ -170,6 +171,7 @@ static inline int thaw_process(struct ta
 static inline void refrigerator(void) {}
 static inline int freeze_processes(void) { BUG(); return 0; }
 static inline void thaw_processes(void) {}
+static inline bool processes_are_frozen(void) { return false; }
 
 static inline int try_to_freeze(void) { return 0; }
 
Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c
+++ linux-2.6/mm/page_alloc.c
@@ -46,6 +46,7 @@
 #include <linux/page-isolation.h>
 #include <linux/page_cgroup.h>
 #include <linux/debugobjects.h>
+#include <linux/freezer.h>
 
 #include <asm/tlbflush.h>
 #include <asm/div64.h>
@@ -1600,6 +1601,10 @@ nofail_alloc:
 		if (page)
 			goto got_pg;
 	} else if ((gfp_mask & __GFP_FS) && !(gfp_mask & __GFP_NORETRY)) {
+		/* The OOM killer won't work if processes are frozen. */
+		if (processes_are_frozen())
+			goto nopage;
+
 		if (!try_set_zone_oom(zonelist, gfp_mask)) {
 			schedule_timeout_uninterruptible(1);
 			goto restart;
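
With the check above in place, an allocation made after freeze_processes()
has succeeded simply returns NULL once direct reclaim stops making progress,
instead of OOM-killing one of the (frozen) tasks.  A minimal sketch of a
caller relying on that, essentially what [4/5] does (hypothetical code, not
part of this patch):

	/*
	 * Preallocate up to nr_pages page frames, stopping at the first
	 * failure.  NOTE: a real caller (see [4/5]) also records each page
	 * so that it can be freed again later; this sketch does not.
	 */
	static unsigned long grab_pages(unsigned long nr_pages)
	{
		unsigned long nr_alloc = 0;

		while (nr_pages-- > 0) {
			struct page *page = alloc_page(GFP_KERNEL | __GFP_NOWARN);

			if (!page)
				break;	/* reclaim exhausted, no OOM kill */
			nr_alloc++;
		}
		return nr_alloc;
	}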

^ permalink raw reply	[flat|nested] 205+ messages in thread

* [RFC][PATCH 2/5] PM/Suspend: Do not shrink memory before suspend
  2009-05-06 22:40 [RFC][PATCH 0/5] PM/Hibernate: Rework memory shrinking Rafael J. Wysocki
  2009-05-06 22:41 ` [RFC][PATCH 1/5] PM/Freezer: Disable OOM killer when tasks are frozen Rafael J. Wysocki
  2009-05-06 22:41 ` Rafael J. Wysocki
@ 2009-05-06 22:42 ` Rafael J. Wysocki
  2009-05-06 23:01   ` Nigel Cunningham
  2009-05-06 23:01   ` Nigel Cunningham
  2009-05-06 22:42 ` Rafael J. Wysocki
                   ` (8 subsequent siblings)
  11 siblings, 2 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-06 22:42 UTC (permalink / raw)
  To: pm list; +Cc: Wu Fengguang, Andrew Morton, LKML, Pavel Machek, Nigel Cunningham

From: Rafael J. Wysocki <rjw@sisk.pl>

Remove the shrinking of memory from the suspend-to-RAM code, where
it is not really necessary.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 kernel/power/main.c |   20 +-------------------
 1 file changed, 1 insertion(+), 19 deletions(-)

Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -188,9 +188,6 @@ static void suspend_test_finish(const ch
 
 #endif
 
-/* This is just an arbitrary number */
-#define FREE_PAGE_NUMBER (100)
-
 static struct platform_suspend_ops *suspend_ops;
 
 /**
@@ -226,7 +223,6 @@ int suspend_valid_only_mem(suspend_state
 static int suspend_prepare(void)
 {
 	int error;
-	unsigned int free_pages;
 
 	if (!suspend_ops || !suspend_ops->enter)
 		return -EPERM;
@@ -241,24 +237,10 @@ static int suspend_prepare(void)
 	if (error)
 		goto Finish;
 
-	if (suspend_freeze_processes()) {
-		error = -EAGAIN;
-		goto Thaw;
-	}
-
-	free_pages = global_page_state(NR_FREE_PAGES);
-	if (free_pages < FREE_PAGE_NUMBER) {
-		pr_debug("PM: free some memory\n");
-		shrink_all_memory(FREE_PAGE_NUMBER - free_pages);
-		if (nr_free_pages() < FREE_PAGE_NUMBER) {
-			error = -ENOMEM;
-			printk(KERN_ERR "PM: No enough memory\n");
-		}
-	}
+	error = suspend_freeze_processes();
 	if (!error)
 		return 0;
 
- Thaw:
 	suspend_thaw_processes();
 	usermodehelper_enable();
  Finish:

^ permalink raw reply	[flat|nested] 205+ messages in thread

* [RFC][PATCH 3/5] PM/Hibernate: Move memory shrinking to snapshot.c (rev. 2)
  2009-05-06 22:40 [RFC][PATCH 0/5] PM/Hibernate: Rework memory shrinking Rafael J. Wysocki
                   ` (3 preceding siblings ...)
  2009-05-06 22:42 ` Rafael J. Wysocki
@ 2009-05-06 22:42 ` Rafael J. Wysocki
  2009-05-06 22:42 ` Rafael J. Wysocki
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-06 22:42 UTC (permalink / raw)
  To: pm list; +Cc: Wu Fengguang, Andrew Morton, LKML, Pavel Machek, Nigel Cunningham

From: Rafael J. Wysocki <rjw@sisk.pl>

The next patch is going to modify the memory shrinking code so that
it uses memory allocations to apply memory pressure instead of an
artificial memory shrinking mechanism.  For this purpose it is
convenient to move swsusp_shrink_memory() from kernel/power/swsusp.c
to kernel/power/snapshot.c, because the new memory-shrinking code is
going to use things that are local to kernel/power/snapshot.c.

[rev. 2: Make some functions static and remove their headers from
 kernel/power/power.h]

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
---
 kernel/power/power.h    |    4 --
 kernel/power/snapshot.c |   80 ++++++++++++++++++++++++++++++++++++++++++++++--
 kernel/power/swsusp.c   |   76 ---------------------------------------------
 3 files changed, 79 insertions(+), 81 deletions(-)

Index: linux-2.6/kernel/power/snapshot.c
===================================================================
--- linux-2.6.orig/kernel/power/snapshot.c
+++ linux-2.6/kernel/power/snapshot.c
@@ -39,6 +39,14 @@ static int swsusp_page_is_free(struct pa
 static void swsusp_set_page_forbidden(struct page *);
 static void swsusp_unset_page_forbidden(struct page *);
 
+/*
+ * Preferred image size in bytes (tunable via /sys/power/image_size).
+ * When it is set to N, swsusp will do its best to ensure the image
+ * size will not exceed N bytes, but if that is impossible, it will
+ * try to create the smallest image possible.
+ */
+unsigned long image_size = 500 * 1024 * 1024;
+
 /* List of PBEs needed for restoring the pages that were allocated before
  * the suspend and included in the suspend image, but have also been
  * allocated by the "resume" kernel, so their contents cannot be written
@@ -840,7 +848,7 @@ static struct page *saveable_highmem_pag
  *	pages.
  */
 
-unsigned int count_highmem_pages(void)
+static unsigned int count_highmem_pages(void)
 {
 	struct zone *zone;
 	unsigned int n = 0;
@@ -902,7 +910,7 @@ static struct page *saveable_page(struct
  *	pages.
  */
 
-unsigned int count_data_pages(void)
+static unsigned int count_data_pages(void)
 {
 	struct zone *zone;
 	unsigned long pfn, max_zone_pfn;
@@ -1058,6 +1066,74 @@ void swsusp_free(void)
 	buffer = NULL;
 }
 
+/**
+ *	swsusp_shrink_memory -  Try to free as much memory as needed
+ *
+ *	... but do not OOM-kill anyone
+ *
+ *	Notice: all userland should be stopped before it is called, or
+ *	livelock is possible.
+ */
+
+#define SHRINK_BITE	10000
+static inline unsigned long __shrink_memory(long tmp)
+{
+	if (tmp > SHRINK_BITE)
+		tmp = SHRINK_BITE;
+	return shrink_all_memory(tmp);
+}
+
+int swsusp_shrink_memory(void)
+{
+	long tmp;
+	struct zone *zone;
+	unsigned long pages = 0;
+	unsigned int i = 0;
+	char *p = "-\\|/";
+	struct timeval start, stop;
+
+	printk(KERN_INFO "PM: Shrinking memory...  ");
+	do_gettimeofday(&start);
+	do {
+		long size, highmem_size;
+
+		highmem_size = count_highmem_pages();
+		size = count_data_pages() + PAGES_FOR_IO + SPARE_PAGES;
+		tmp = size;
+		size += highmem_size;
+		for_each_populated_zone(zone) {
+			tmp += snapshot_additional_pages(zone);
+			if (is_highmem(zone)) {
+				highmem_size -=
+					zone_page_state(zone, NR_FREE_PAGES);
+			} else {
+				tmp -= zone_page_state(zone, NR_FREE_PAGES);
+				tmp += zone->lowmem_reserve[ZONE_NORMAL];
+			}
+		}
+
+		if (highmem_size < 0)
+			highmem_size = 0;
+
+		tmp += highmem_size;
+		if (tmp > 0) {
+			tmp = __shrink_memory(tmp);
+			if (!tmp)
+				return -ENOMEM;
+			pages += tmp;
+		} else if (size > image_size / PAGE_SIZE) {
+			tmp = __shrink_memory(size - (image_size / PAGE_SIZE));
+			pages += tmp;
+		}
+		printk("\b%c", p[i++%4]);
+	} while (tmp > 0);
+	do_gettimeofday(&stop);
+	printk("\bdone (%lu pages freed)\n", pages);
+	swsusp_show_speed(&start, &stop, pages, "Freed");
+
+	return 0;
+}
+
 #ifdef CONFIG_HIGHMEM
 /**
   *	count_pages_for_highmem - compute the number of non-highmem pages
Index: linux-2.6/kernel/power/swsusp.c
===================================================================
--- linux-2.6.orig/kernel/power/swsusp.c
+++ linux-2.6/kernel/power/swsusp.c
@@ -55,14 +55,6 @@
 
 #include "power.h"
 
-/*
- * Preferred image size in bytes (tunable via /sys/power/image_size).
- * When it is set to N, swsusp will do its best to ensure the image
- * size will not exceed N bytes, but if that is impossible, it will
- * try to create the smallest image possible.
- */
-unsigned long image_size = 500 * 1024 * 1024;
-
 int in_suspend __nosavedata = 0;
 
 /**
@@ -195,74 +187,6 @@ void swsusp_show_speed(struct timeval *s
 			kps / 1000, (kps % 1000) / 10);
 }
 
-/**
- *	swsusp_shrink_memory -  Try to free as much memory as needed
- *
- *	... but do not OOM-kill anyone
- *
- *	Notice: all userland should be stopped before it is called, or
- *	livelock is possible.
- */
-
-#define SHRINK_BITE	10000
-static inline unsigned long __shrink_memory(long tmp)
-{
-	if (tmp > SHRINK_BITE)
-		tmp = SHRINK_BITE;
-	return shrink_all_memory(tmp);
-}
-
-int swsusp_shrink_memory(void)
-{
-	long tmp;
-	struct zone *zone;
-	unsigned long pages = 0;
-	unsigned int i = 0;
-	char *p = "-\\|/";
-	struct timeval start, stop;
-
-	printk(KERN_INFO "PM: Shrinking memory...  ");
-	do_gettimeofday(&start);
-	do {
-		long size, highmem_size;
-
-		highmem_size = count_highmem_pages();
-		size = count_data_pages() + PAGES_FOR_IO + SPARE_PAGES;
-		tmp = size;
-		size += highmem_size;
-		for_each_populated_zone(zone) {
-			tmp += snapshot_additional_pages(zone);
-			if (is_highmem(zone)) {
-				highmem_size -=
-					zone_page_state(zone, NR_FREE_PAGES);
-			} else {
-				tmp -= zone_page_state(zone, NR_FREE_PAGES);
-				tmp += zone->lowmem_reserve[ZONE_NORMAL];
-			}
-		}
-
-		if (highmem_size < 0)
-			highmem_size = 0;
-
-		tmp += highmem_size;
-		if (tmp > 0) {
-			tmp = __shrink_memory(tmp);
-			if (!tmp)
-				return -ENOMEM;
-			pages += tmp;
-		} else if (size > image_size / PAGE_SIZE) {
-			tmp = __shrink_memory(size - (image_size / PAGE_SIZE));
-			pages += tmp;
-		}
-		printk("\b%c", p[i++%4]);
-	} while (tmp > 0);
-	do_gettimeofday(&stop);
-	printk("\bdone (%lu pages freed)\n", pages);
-	swsusp_show_speed(&start, &stop, pages, "Freed");
-
-	return 0;
-}
-
 /*
  * Platforms, like ACPI, may want us to save some memory used by them during
  * hibernation and to restore the contents of this memory during the subsequent
Index: linux-2.6/kernel/power/power.h
===================================================================
--- linux-2.6.orig/kernel/power/power.h
+++ linux-2.6/kernel/power/power.h
@@ -74,7 +74,7 @@ extern asmlinkage int swsusp_arch_resume
 
 extern int create_basic_memory_bitmaps(void);
 extern void free_basic_memory_bitmaps(void);
-extern unsigned int count_data_pages(void);
+extern int swsusp_shrink_memory(void);
 
 /**
  *	Auxiliary structure used for reading the snapshot image data and
@@ -149,7 +149,6 @@ extern int swsusp_swap_in_use(void);
 
 /* kernel/power/disk.c */
 extern int swsusp_check(void);
-extern int swsusp_shrink_memory(void);
 extern void swsusp_free(void);
 extern int swsusp_read(unsigned int *flags_p);
 extern int swsusp_write(unsigned int flags);
@@ -176,7 +175,6 @@ extern int pm_notifier_call_chain(unsign
 #endif
 
 #ifdef CONFIG_HIGHMEM
-unsigned int count_highmem_pages(void);
 int restore_highmem(void);
 #else
 static inline unsigned int count_highmem_pages(void) { return 0; }

^ permalink raw reply	[flat|nested] 205+ messages in thread

* [RFC][PATCH 4/5] PM/Hibernate: Rework shrinking of memory
  2009-05-06 22:40 [RFC][PATCH 0/5] PM/Hibernate: Rework memory shrinking Rafael J. Wysocki
                   ` (5 preceding siblings ...)
  2009-05-06 22:42 ` Rafael J. Wysocki
@ 2009-05-06 22:44 ` Rafael J. Wysocki
  2009-05-06 23:27   ` Nigel Cunningham
  2009-05-06 23:27   ` Nigel Cunningham
  2009-05-06 22:44 ` Rafael J. Wysocki
                   ` (4 subsequent siblings)
  11 siblings, 2 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-06 22:44 UTC (permalink / raw)
  To: pm list; +Cc: Wu Fengguang, Andrew Morton, LKML, Pavel Machek, Nigel Cunningham

From: Rafael J. Wysocki <rjw@sisk.pl>

Rework swsusp_shrink_memory() so that it calls shrink_all_memory()
just once to make some room for the image and then allocates memory
to apply more pressure to the memory management subsystem, if
necessary.

Unfortunately, we don't seem to be able to drop shrink_all_memory()
entirely just yet, because that would lead to huge performance
regressions in some test cases.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 kernel/power/snapshot.c |  145 ++++++++++++++++++++++++++++++++----------------
 1 file changed, 98 insertions(+), 47 deletions(-)

Index: linux-2.6/kernel/power/snapshot.c
===================================================================
--- linux-2.6.orig/kernel/power/snapshot.c
+++ linux-2.6/kernel/power/snapshot.c
@@ -1066,69 +1066,120 @@ void swsusp_free(void)
 	buffer = NULL;
 }
 
+/* Helper functions used for the shrinking of memory. */
+
 /**
- *	swsusp_shrink_memory -  Try to free as much memory as needed
- *
- *	... but do not OOM-kill anyone
+ * preallocate_image_memory - Allocate given number of page frames
+ * @nr_pages: Number of page frames to allocate
  *
- *	Notice: all userland should be stopped before it is called, or
- *	livelock is possible.
+ * Return value: Number of page frames actually allocated
  */
-
-#define SHRINK_BITE	10000
-static inline unsigned long __shrink_memory(long tmp)
+static unsigned long preallocate_image_memory(unsigned long nr_pages)
 {
-	if (tmp > SHRINK_BITE)
-		tmp = SHRINK_BITE;
-	return shrink_all_memory(tmp);
+	unsigned long nr_alloc = 0;
+
+	while (nr_pages-- > 0) {
+		struct page *page;
+
+		page = alloc_image_page(GFP_KERNEL | __GFP_NOWARN);
+		if (!page)
+			break;
+		nr_alloc++;
+	}
+
+	return nr_alloc;
 }
 
+/**
+ * swsusp_shrink_memory -  Make the kernel release as much memory as needed
+ *
+ * To create a hibernation image it is necessary to make a copy of every page
+ * frame in use.  We also need a number of page frames to be free during
+ * hibernation for allocations made while saving the image and for device
+ * drivers, in case they need to allocate memory from their hibernation
+ * callbacks (these two numbers are given by PAGES_FOR_IO and SPARE_PAGES,
+ * respectively, both of which are rough estimates).  To make this happen, we
+ * compute the total number of available page frames and allocate at least
+ *
+ * ([page frames total] + PAGES_FOR_IO + [metadata pages]) / 2 + 2 * SPARE_PAGES
+ *
+ * of them, which corresponds to the maximum size of a hibernation image.
+ *
+ * If image_size is set below the number following from the above formula,
+ * the preallocation of memory is continued until the total number of page
+ * frames in use is below the requested image size or it is impossible to
+ * allocate more memory, whichever happens first.
+ */
 int swsusp_shrink_memory(void)
 {
-	long tmp;
 	struct zone *zone;
-	unsigned long pages = 0;
-	unsigned int i = 0;
-	char *p = "-\\|/";
+	unsigned long saveable, size, max_size, count, pages = 0;
 	struct timeval start, stop;
+	int error = 0;
 
-	printk(KERN_INFO "PM: Shrinking memory...  ");
+	printk(KERN_INFO "PM: Shrinking memory ... ");
 	do_gettimeofday(&start);
-	do {
-		long size, highmem_size;
 
-		highmem_size = count_highmem_pages();
-		size = count_data_pages() + PAGES_FOR_IO + SPARE_PAGES;
-		tmp = size;
-		size += highmem_size;
-		for_each_populated_zone(zone) {
-			tmp += snapshot_additional_pages(zone);
-			if (is_highmem(zone)) {
-				highmem_size -=
-					zone_page_state(zone, NR_FREE_PAGES);
-			} else {
-				tmp -= zone_page_state(zone, NR_FREE_PAGES);
-				tmp += zone->lowmem_reserve[ZONE_NORMAL];
-			}
-		}
+	/* Count the number of saveable data pages. */
+	saveable = count_data_pages() + count_highmem_pages();
+
+	/*
+	 * Compute the total number of page frames we can use (count) and the
+	 * number of pages needed for image metadata (size).
+	 */
+	count = saveable;
+	size = 0;
+	for_each_populated_zone(zone) {
+		size += snapshot_additional_pages(zone);
+		count += zone_page_state(zone, NR_FREE_PAGES);
+		if (!is_highmem(zone))
+			count -= zone->lowmem_reserve[ZONE_NORMAL];
+	}
 
-		if (highmem_size < 0)
-			highmem_size = 0;
+	/* Compute the maximum number of saveable pages to leave in memory. */
+	max_size = (count - (size + PAGES_FOR_IO)) / 2 - 2 * SPARE_PAGES;
+	size = DIV_ROUND_UP(image_size, PAGE_SIZE);
+	if (size > max_size)
+		size = max_size;
+	/*
+	 * If the maximum is not lesser than the current number of saveable
+	 * pages in memory, we don't need to do anything more.
+	 */
+	if (size >= saveable)
+		goto out;
 
-		tmp += highmem_size;
-		if (tmp > 0) {
-			tmp = __shrink_memory(tmp);
-			if (!tmp)
-				return -ENOMEM;
-			pages += tmp;
-		} else if (size > image_size / PAGE_SIZE) {
-			tmp = __shrink_memory(size - (image_size / PAGE_SIZE));
-			pages += tmp;
-		}
-		printk("\b%c", p[i++%4]);
-	} while (tmp > 0);
+	/*
+	 * Let the memory management subsystem know that we're going to need a
+	 * large number of page frames to allocate and make it free some memory.
+	 * NOTE: If this is not done, performance is heavily affected in some
+	 * test cases.
+	 */
+	shrink_all_memory(saveable - size);
+
+	/*
+	 * The number of saveable pages in memory was too high, so apply some
+	 * pressure to decrease it.  First, make room for the largest possible
+	 * image and fail if that doesn't work.  Next, try to decrease the size
+	 * of the image as much as indicated by image_size.
+	 */
+	count -= max_size;
+	pages = preallocate_image_memory(count);
+	if (pages < count)
+		error = -ENOMEM;
+	else
+		pages += preallocate_image_memory(max_size - size);
+
+	/* Release all of the preallocated page frames. */
+	swsusp_free();
+
+	if (error) {
+		printk(KERN_CONT "\n");
+		return error;
+	}
+
+ out:
 	do_gettimeofday(&stop);
-	printk("\bdone (%lu pages freed)\n", pages);
+	printk(KERN_CONT "done (preallocated %lu free pages)\n", pages);
 	swsusp_show_speed(&start, &stop, pages, "Freed");
 
 	return 0;
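
To make the sizing above concrete, here is the arithmetic with hypothetical
numbers (4 KiB pages, 131072 usable page frames, ~1100 metadata pages, and
assuming PAGES_FOR_IO = 1024 and SPARE_PAGES = 256; the real constants may
differ):

	/*
	 * max_size = (131072 - (1100 + 1024)) / 2 - 2 * 256 = 63962
	 *
	 * So at least 131072 - 63962 = 67110 page frames are preallocated and
	 * at most 63962 saveable pages may stay in memory.  The default
	 * image_size of 500 MB corresponds to 128000 pages, which is above
	 * max_size, so the target stays at 63962; a smaller image_size would
	 * push the preallocation further.
	 */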

^ permalink raw reply	[flat|nested] 205+ messages in thread

* [RFC][PATCH 5/5] PM/Hibernate: Do not release preallocated memory unnecessarily
  2009-05-06 22:40 [RFC][PATCH 0/5] PM/Hibernate: Rework memory shrinking Rafael J. Wysocki
                   ` (8 preceding siblings ...)
  2009-05-06 22:48 ` [RFC][PATCH 5/5] PM/Hibernate: Do not release preallocated memory unnecessarily Rafael J. Wysocki
@ 2009-05-06 22:48 ` Rafael J. Wysocki
  2009-05-07 21:48 ` [RFC][PATCH 0/5] PM/Hibernate: Rework memory shrinking (rev. 2) Rafael J. Wysocki
  2009-05-07 21:48   ` Rafael J. Wysocki
  11 siblings, 0 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-06 22:48 UTC (permalink / raw)
  To: pm list; +Cc: Wu Fengguang, Andrew Morton, LKML, Pavel Machek, Nigel Cunningham

From: Rafael J. Wysocki <rjw@sisk.pl>

Since the hibernation code is now going to use allocations of memory
to make enough room for the image, it can also use the page frames
allocated at this stage as image page frames.  The low-level
hibernation code needs to be rearranged for this purpose, but it
allows us to avoid freeing a great number of pages and allocating
these same pages once again later, so it generally is worth doing.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 kernel/power/disk.c     |   15 ++--
 kernel/power/power.h    |    2 
 kernel/power/snapshot.c |  178 +++++++++++++++++++++++++++++++-----------------
 3 files changed, 130 insertions(+), 65 deletions(-)

Index: linux-2.6/kernel/power/snapshot.c
===================================================================
--- linux-2.6.orig/kernel/power/snapshot.c
+++ linux-2.6/kernel/power/snapshot.c
@@ -1033,6 +1033,25 @@ copy_data_pages(struct memory_bitmap *co
 static unsigned int nr_copy_pages;
 /* Number of pages needed for saving the original pfns of the image pages */
 static unsigned int nr_meta_pages;
+/*
+ * Numbers of normal and highmem page frames allocated for hibernation image
+ * before suspending devices.
+ */
+unsigned int alloc_normal, alloc_highmem;
+/*
+ * Memory bitmap used for marking saveable pages (during hibernation) or
+ * hibernation image pages (during restore)
+ */
+static struct memory_bitmap orig_bm;
+/*
+ * Memory bitmap used during hibernation for marking allocated page frames that
+ * will contain copies of saveable pages.  During restore it is initially used
+ * for marking hibernation image pages, but then the set bits from it are
+ * duplicated in @orig_bm and it is released.  On highmem systems it is next
+ * used for marking "safe" highmem pages, but it has to be reinitialized for
+ * this purpose.
+ */
+static struct memory_bitmap copy_bm;
 
 /**
  *	swsusp_free - free pages allocated for the suspend.
@@ -1064,10 +1083,14 @@ void swsusp_free(void)
 	nr_meta_pages = 0;
 	restore_pblist = NULL;
 	buffer = NULL;
+	alloc_normal = 0;
+	alloc_highmem = 0;
 }
 
 /* Helper functions used for the shrinking of memory. */
 
+#define GFP_IMAGE	(GFP_KERNEL | __GFP_NOWARN)
+
 /**
  * preallocate_image_memory - Allocate given number of page frames
  * @nr_pages: Number of page frames to allocate
@@ -1081,9 +1104,14 @@ static unsigned long preallocate_image_m
 	while (nr_pages-- > 0) {
 		struct page *page;
 
-		page = alloc_image_page(GFP_KERNEL | __GFP_NOWARN);
+		page = alloc_image_page(GFP_IMAGE);
 		if (!page)
 			break;
+		memory_bm_set_bit(&copy_bm, page_to_pfn(page));
+		if (PageHighMem(page))
+			alloc_highmem++;
+		else
+			alloc_normal++;
 		nr_alloc++;
 	}
 
@@ -1091,7 +1119,30 @@ static unsigned long preallocate_image_m
 }
 
 /**
- * swsusp_shrink_memory -  Make the kernel release as much memory as needed
+ * free_unnecessary_pages - Release preallocated pages not needed for the image
+ * @size: Anticipated hibernation image size
+ */
+static void free_unnecessary_pages(unsigned long size)
+{
+	memory_bm_position_reset(&copy_bm);
+
+	while (alloc_normal + alloc_highmem > size) {
+		unsigned long pfn = memory_bm_next_pfn(&copy_bm);
+		struct page *page = pfn_to_page(pfn);
+
+		memory_bm_clear_bit(&copy_bm, pfn);
+		if (PageHighMem(page))
+			alloc_highmem--;
+		else
+			alloc_normal--;
+		swsusp_unset_page_forbidden(page);
+		swsusp_unset_page_free(page);
+		__free_page(page);
+	}
+}
+
+/**
+ * hibernate_preallocate_memory - Preallocate memory for hibernation image
  *
  * To create a hibernation image it is necessary to make a copy of every page
  * frame in use.  We also need a number of page frames to be free during
@@ -1110,16 +1161,27 @@ static unsigned long preallocate_image_m
  * frames in use is below the requested image size or it is impossible to
  * allocate more memory, whichever happens first.
  */
-int swsusp_shrink_memory(void)
+int hibernate_preallocate_memory(void)
 {
 	struct zone *zone;
 	unsigned long saveable, size, max_size, count, pages = 0;
 	struct timeval start, stop;
-	int error = 0;
+	int error;
 
-	printk(KERN_INFO "PM: Shrinking memory ... ");
+	printk(KERN_INFO "PM: Preallocating image memory ... ");
 	do_gettimeofday(&start);
 
+	error = memory_bm_create(&orig_bm, GFP_IMAGE, PG_ANY);
+	if (error)
+		goto err_out;
+
+	error = memory_bm_create(&copy_bm, GFP_IMAGE, PG_ANY);
+	if (error)
+		goto err_out;
+
+	alloc_normal = 0;
+	alloc_highmem = 0;
+
 	/* Count the number of saveable data pages. */
 	saveable = count_data_pages() + count_highmem_pages();
 
@@ -1143,10 +1205,12 @@ int swsusp_shrink_memory(void)
 		size = max_size;
 	/*
 	 * If the maximum is not lesser than the current number of saveable
-	 * pages in memory, we don't need to do anything more.
+	 * pages in memory, allocate page frames for the image and we're done.
 	 */
-	if (size >= saveable)
+	if (size >= saveable) {
+		pages = preallocate_image_memory(saveable);
 		goto out;
+	}
 
 	/*
 	 * Let the memory management subsystem know that we're going to need a
@@ -1165,24 +1237,27 @@ int swsusp_shrink_memory(void)
 	count -= max_size;
 	pages = preallocate_image_memory(count);
 	if (pages < count)
-		error = -ENOMEM;
+		goto err_out;
 	else
 		pages += preallocate_image_memory(max_size - size);
 
-	/* Release all of the preallocated page frames. */
-	swsusp_free();
-
-	if (error) {
-		printk(KERN_CONT "\n");
-		return error;
-	}
+	/*
+	 * We only need 'size' page frames for the image but we have allocated
+	 * more.  Release the excessive ones now.
+	 */
+	free_unnecessary_pages(size);
 
  out:
 	do_gettimeofday(&stop);
-	printk(KERN_CONT "done (preallocated %lu free pages)\n", pages);
-	swsusp_show_speed(&start, &stop, pages, "Freed");
+	printk(KERN_CONT "done (allocated %lu pages)\n", pages);
+	swsusp_show_speed(&start, &stop, pages, "Allocated");
 
 	return 0;
+
+ err_out:
+	printk(KERN_CONT "\n");
+	swsusp_free();
+	return -ENOMEM;
 }
 
 #ifdef CONFIG_HIGHMEM
@@ -1193,7 +1268,7 @@ int swsusp_shrink_memory(void)
 
 static unsigned int count_pages_for_highmem(unsigned int nr_highmem)
 {
-	unsigned int free_highmem = count_free_highmem_pages();
+	unsigned int free_highmem = count_free_highmem_pages() + alloc_highmem;
 
 	if (free_highmem >= nr_highmem)
 		nr_highmem = 0;
@@ -1215,19 +1290,17 @@ count_pages_for_highmem(unsigned int nr_
 static int enough_free_mem(unsigned int nr_pages, unsigned int nr_highmem)
 {
 	struct zone *zone;
-	unsigned int free = 0, meta = 0;
+	unsigned int free = alloc_normal;
 
-	for_each_zone(zone) {
-		meta += snapshot_additional_pages(zone);
+	for_each_zone(zone)
 		if (!is_highmem(zone))
 			free += zone_page_state(zone, NR_FREE_PAGES);
-	}
 
 	nr_pages += count_pages_for_highmem(nr_highmem);
-	pr_debug("PM: Normal pages needed: %u + %u + %u, available pages: %u\n",
-		nr_pages, PAGES_FOR_IO, meta, free);
+	pr_debug("PM: Normal pages needed: %u + %u, available pages: %u\n",
+		nr_pages, PAGES_FOR_IO, free);
 
-	return free > nr_pages + PAGES_FOR_IO + meta;
+	return free > nr_pages + PAGES_FOR_IO;
 }
 
 #ifdef CONFIG_HIGHMEM
@@ -1249,7 +1322,7 @@ static inline int get_highmem_buffer(int
  */
 
 static inline unsigned int
-alloc_highmem_image_pages(struct memory_bitmap *bm, unsigned int nr_highmem)
+alloc_highmem_pages(struct memory_bitmap *bm, unsigned int nr_highmem)
 {
 	unsigned int to_alloc = count_free_highmem_pages();
 
@@ -1269,7 +1342,7 @@ alloc_highmem_image_pages(struct memory_
 static inline int get_highmem_buffer(int safe_needed) { return 0; }
 
 static inline unsigned int
-alloc_highmem_image_pages(struct memory_bitmap *bm, unsigned int n) { return 0; }
+alloc_highmem_pages(struct memory_bitmap *bm, unsigned int n) { return 0; }
 #endif /* CONFIG_HIGHMEM */
 
 /**
@@ -1288,51 +1361,36 @@ static int
 swsusp_alloc(struct memory_bitmap *orig_bm, struct memory_bitmap *copy_bm,
 		unsigned int nr_pages, unsigned int nr_highmem)
 {
-	int error;
-
-	error = memory_bm_create(orig_bm, GFP_ATOMIC | __GFP_COLD, PG_ANY);
-	if (error)
-		goto Free;
-
-	error = memory_bm_create(copy_bm, GFP_ATOMIC | __GFP_COLD, PG_ANY);
-	if (error)
-		goto Free;
+	int error = 0;
 
 	if (nr_highmem > 0) {
 		error = get_highmem_buffer(PG_ANY);
 		if (error)
-			goto Free;
-
-		nr_pages += alloc_highmem_image_pages(copy_bm, nr_highmem);
+			goto err_out;
+		if (nr_highmem > alloc_highmem) {
+			nr_highmem -= alloc_highmem;
+			nr_pages += alloc_highmem_pages(copy_bm, nr_highmem);
+		}
 	}
-	while (nr_pages-- > 0) {
-		struct page *page = alloc_image_page(GFP_ATOMIC | __GFP_COLD);
-
-		if (!page)
-			goto Free;
+	if (nr_pages > alloc_normal) {
+		nr_pages -= alloc_normal;
+		while (nr_pages-- > 0) {
+			struct page *page;
 
-		memory_bm_set_bit(copy_bm, page_to_pfn(page));
+			page = alloc_image_page(GFP_ATOMIC | __GFP_COLD);
+			if (!page)
+				goto err_out;
+			memory_bm_set_bit(copy_bm, page_to_pfn(page));
+		}
 	}
+
 	return 0;
 
- Free:
+ err_out:
 	swsusp_free();
-	return -ENOMEM;
+	return error;
 }
 
-/* Memory bitmap used for marking saveable pages (during suspend) or the
- * suspend image pages (during resume)
- */
-static struct memory_bitmap orig_bm;
-/* Memory bitmap used on suspend for marking allocated pages that will contain
- * the copies of saveable pages.  During resume it is initially used for
- * marking the suspend image pages, but then its set bits are duplicated in
- * @orig_bm and it is released.  Next, on systems with high memory, it may be
- * used for marking "safe" highmem pages, but it has to be reinitialized for
- * this purpose.
- */
-static struct memory_bitmap copy_bm;
-
 asmlinkage int swsusp_save(void)
 {
 	unsigned int nr_pages, nr_highmem;
Index: linux-2.6/kernel/power/power.h
===================================================================
--- linux-2.6.orig/kernel/power/power.h
+++ linux-2.6/kernel/power/power.h
@@ -74,7 +74,7 @@ extern asmlinkage int swsusp_arch_resume
 
 extern int create_basic_memory_bitmaps(void);
 extern void free_basic_memory_bitmaps(void);
-extern int swsusp_shrink_memory(void);
+extern int hibernate_preallocate_memory(void);
 
 /**
  *	Auxiliary structure used for reading the snapshot image data and
Index: linux-2.6/kernel/power/disk.c
===================================================================
--- linux-2.6.orig/kernel/power/disk.c
+++ linux-2.6/kernel/power/disk.c
@@ -303,8 +303,8 @@ int hibernation_snapshot(int platform_mo
 	if (error)
 		return error;
 
-	/* Free memory before shutting down devices. */
-	error = swsusp_shrink_memory();
+	/* Preallocate image memory before shutting down devices. */
+	error = hibernate_preallocate_memory();
 	if (error)
 		goto Close;
 
@@ -320,6 +320,10 @@ int hibernation_snapshot(int platform_mo
 	/* Control returns here after successful restore */
 
  Resume_devices:
+	/* We may need to release the preallocated image pages here. */
+	if (error || !in_suspend)
+		swsusp_free();
+
 	device_resume(in_suspend ?
 		(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);
 	resume_console();
@@ -593,7 +597,10 @@ int hibernate(void)
 		goto Thaw;
 
 	error = hibernation_snapshot(hibernation_mode == HIBERNATION_PLATFORM);
-	if (in_suspend && !error) {
+	if (error)
+		goto Thaw;
+
+	if (in_suspend) {
 		unsigned int flags = 0;
 
 		if (hibernation_mode == HIBERNATION_PLATFORM)
@@ -605,8 +612,8 @@ int hibernate(void)
 			power_down();
 	} else {
 		pr_debug("PM: Image restored successfully.\n");
-		swsusp_free();
 	}
+
  Thaw:
 	thaw_processes();
  Finish:
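
Continuing the hypothetical numbers from the note under [4/5], the effect of
this patch is roughly the following (a sketch of the accounting, not exact
figures):

	/*
	 * preallocated by hibernate_preallocate_memory():   ~67110 page frames
	 * anticipated image size ("size"):                   ~63962
	 * released again by free_unnecessary_pages(size):     ~3148
	 *
	 * The frames that stay allocated are already marked in copy_bm, so
	 * swsusp_alloc()/swsusp_save() can reuse them directly instead of
	 * freeing them here and allocating the same pages again after the
	 * devices have been suspended.
	 */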

^ permalink raw reply	[flat|nested] 205+ messages in thread

+	 * pages in memory, allocate page frames for the image and we're done.
 	 */
-	if (size >= saveable)
+	if (size >= saveable) {
+		pages = preallocate_image_memory(saveable);
 		goto out;
+	}
 
 	/*
 	 * Let the memory management subsystem know that we're going to need a
@@ -1165,24 +1237,27 @@ int swsusp_shrink_memory(void)
 	count -= max_size;
 	pages = preallocate_image_memory(count);
 	if (pages < count)
-		error = -ENOMEM;
+		goto err_out;
 	else
 		pages += preallocate_image_memory(max_size - size);
 
-	/* Release all of the preallocated page frames. */
-	swsusp_free();
-
-	if (error) {
-		printk(KERN_CONT "\n");
-		return error;
-	}
+	/*
+	 * We only need 'size' page frames for the image but we have allocated
+	 * more.  Release the excessive ones now.
+	 */
+	free_unnecessary_pages(size);
 
  out:
 	do_gettimeofday(&stop);
-	printk(KERN_CONT "done (preallocated %lu free pages)\n", pages);
-	swsusp_show_speed(&start, &stop, pages, "Freed");
+	printk(KERN_CONT "done (allocated %lu pages)\n", pages);
+	swsusp_show_speed(&start, &stop, pages, "Allocated");
 
 	return 0;
+
+ err_out:
+	printk(KERN_CONT "\n");
+	swsusp_free();
+	return -ENOMEM;
 }
 
 #ifdef CONFIG_HIGHMEM
@@ -1193,7 +1268,7 @@ int swsusp_shrink_memory(void)
 
 static unsigned int count_pages_for_highmem(unsigned int nr_highmem)
 {
-	unsigned int free_highmem = count_free_highmem_pages();
+	unsigned int free_highmem = count_free_highmem_pages() + alloc_highmem;
 
 	if (free_highmem >= nr_highmem)
 		nr_highmem = 0;
@@ -1215,19 +1290,17 @@ count_pages_for_highmem(unsigned int nr_
 static int enough_free_mem(unsigned int nr_pages, unsigned int nr_highmem)
 {
 	struct zone *zone;
-	unsigned int free = 0, meta = 0;
+	unsigned int free = alloc_normal;
 
-	for_each_zone(zone) {
-		meta += snapshot_additional_pages(zone);
+	for_each_zone(zone)
 		if (!is_highmem(zone))
 			free += zone_page_state(zone, NR_FREE_PAGES);
-	}
 
 	nr_pages += count_pages_for_highmem(nr_highmem);
-	pr_debug("PM: Normal pages needed: %u + %u + %u, available pages: %u\n",
-		nr_pages, PAGES_FOR_IO, meta, free);
+	pr_debug("PM: Normal pages needed: %u + %u, available pages: %u\n",
+		nr_pages, PAGES_FOR_IO, free);
 
-	return free > nr_pages + PAGES_FOR_IO + meta;
+	return free > nr_pages + PAGES_FOR_IO;
 }
 
 #ifdef CONFIG_HIGHMEM
@@ -1249,7 +1322,7 @@ static inline int get_highmem_buffer(int
  */
 
 static inline unsigned int
-alloc_highmem_image_pages(struct memory_bitmap *bm, unsigned int nr_highmem)
+alloc_highmem_pages(struct memory_bitmap *bm, unsigned int nr_highmem)
 {
 	unsigned int to_alloc = count_free_highmem_pages();
 
@@ -1269,7 +1342,7 @@ alloc_highmem_image_pages(struct memory_
 static inline int get_highmem_buffer(int safe_needed) { return 0; }
 
 static inline unsigned int
-alloc_highmem_image_pages(struct memory_bitmap *bm, unsigned int n) { return 0; }
+alloc_highmem_pages(struct memory_bitmap *bm, unsigned int n) { return 0; }
 #endif /* CONFIG_HIGHMEM */
 
 /**
@@ -1288,51 +1361,36 @@ static int
 swsusp_alloc(struct memory_bitmap *orig_bm, struct memory_bitmap *copy_bm,
 		unsigned int nr_pages, unsigned int nr_highmem)
 {
-	int error;
-
-	error = memory_bm_create(orig_bm, GFP_ATOMIC | __GFP_COLD, PG_ANY);
-	if (error)
-		goto Free;
-
-	error = memory_bm_create(copy_bm, GFP_ATOMIC | __GFP_COLD, PG_ANY);
-	if (error)
-		goto Free;
+	int error = 0;
 
 	if (nr_highmem > 0) {
 		error = get_highmem_buffer(PG_ANY);
 		if (error)
-			goto Free;
-
-		nr_pages += alloc_highmem_image_pages(copy_bm, nr_highmem);
+			goto err_out;
+		if (nr_highmem > alloc_highmem) {
+			nr_highmem -= alloc_highmem;
+			nr_pages += alloc_highmem_pages(copy_bm, nr_highmem);
+		}
 	}
-	while (nr_pages-- > 0) {
-		struct page *page = alloc_image_page(GFP_ATOMIC | __GFP_COLD);
-
-		if (!page)
-			goto Free;
+	if (nr_pages > alloc_normal) {
+		nr_pages -= alloc_normal;
+		while (nr_pages-- > 0) {
+			struct page *page;
 
-		memory_bm_set_bit(copy_bm, page_to_pfn(page));
+			page = alloc_image_page(GFP_ATOMIC | __GFP_COLD);
+			if (!page)
+				goto err_out;
+			memory_bm_set_bit(copy_bm, page_to_pfn(page));
+		}
 	}
+
 	return 0;
 
- Free:
+ err_out:
 	swsusp_free();
-	return -ENOMEM;
+	return error;
 }
 
-/* Memory bitmap used for marking saveable pages (during suspend) or the
- * suspend image pages (during resume)
- */
-static struct memory_bitmap orig_bm;
-/* Memory bitmap used on suspend for marking allocated pages that will contain
- * the copies of saveable pages.  During resume it is initially used for
- * marking the suspend image pages, but then its set bits are duplicated in
- * @orig_bm and it is released.  Next, on systems with high memory, it may be
- * used for marking "safe" highmem pages, but it has to be reinitialized for
- * this purpose.
- */
-static struct memory_bitmap copy_bm;
-
 asmlinkage int swsusp_save(void)
 {
 	unsigned int nr_pages, nr_highmem;
Index: linux-2.6/kernel/power/power.h
===================================================================
--- linux-2.6.orig/kernel/power/power.h
+++ linux-2.6/kernel/power/power.h
@@ -74,7 +74,7 @@ extern asmlinkage int swsusp_arch_resume
 
 extern int create_basic_memory_bitmaps(void);
 extern void free_basic_memory_bitmaps(void);
-extern int swsusp_shrink_memory(void);
+extern int hibernate_preallocate_memory(void);
 
 /**
  *	Auxiliary structure used for reading the snapshot image data and
Index: linux-2.6/kernel/power/disk.c
===================================================================
--- linux-2.6.orig/kernel/power/disk.c
+++ linux-2.6/kernel/power/disk.c
@@ -303,8 +303,8 @@ int hibernation_snapshot(int platform_mo
 	if (error)
 		return error;
 
-	/* Free memory before shutting down devices. */
-	error = swsusp_shrink_memory();
+	/* Preallocate image memory before shutting down devices. */
+	error = hibernate_preallocate_memory();
 	if (error)
 		goto Close;
 
@@ -320,6 +320,10 @@ int hibernation_snapshot(int platform_mo
 	/* Control returns here after successful restore */
 
  Resume_devices:
+	/* We may need to release the preallocated image pages here. */
+	if (error || !in_suspend)
+		swsusp_free();
+
 	device_resume(in_suspend ?
 		(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);
 	resume_console();
@@ -593,7 +597,10 @@ int hibernate(void)
 		goto Thaw;
 
 	error = hibernation_snapshot(hibernation_mode == HIBERNATION_PLATFORM);
-	if (in_suspend && !error) {
+	if (error)
+		goto Thaw;
+
+	if (in_suspend) {
 		unsigned int flags = 0;
 
 		if (hibernation_mode == HIBERNATION_PLATFORM)
@@ -605,8 +612,8 @@ int hibernate(void)
 			power_down();
 	} else {
 		pr_debug("PM: Image restored successfully.\n");
-		swsusp_free();
 	}
+
  Thaw:
 	thaw_processes();
  Finish:

^ permalink raw reply	[flat|nested] 205+ messages in thread

* Re: [RFC][PATCH 1/5] PM/Freezer: Disable OOM killer when tasks are frozen
  2009-05-06 22:41 ` Rafael J. Wysocki
  2009-05-06 23:00   ` Nigel Cunningham
@ 2009-05-06 23:00   ` Nigel Cunningham
  2009-05-07 12:10     ` Rafael J. Wysocki
  2009-05-07 12:10     ` Rafael J. Wysocki
  2009-05-07  0:36   ` [RFC][PATCH 1/5] PM/Freezer: Disable OOM killer whentasks " Matt Helsley
  2009-05-07  0:36   ` [linux-pm] " Matt Helsley
  3 siblings, 2 replies; 205+ messages in thread
From: Nigel Cunningham @ 2009-05-06 23:00 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: pm list, Wu Fengguang, Andrew Morton, LKML, Pavel Machek

Hi.

On Thu, 2009-05-07 at 00:41 +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rjw@sisk.pl>
> 
> The OOM killer is not really going to work while tasks are frozen, so
> we can just give up calling it in that case.
> 
> This will allow us to safely use memory allocations for decreasing
> the number of saveable pages in the hibernation core code instead of
> using any artificial memory shrinking mechanisms for this purpose.

Should we disable the warning that the nopage path gives if tasks are
frozen? I'm in two minds - if you get problems as a result, it might
help to diagnose them. On the other hand, you don't want tons of
warnings due to the hibernation code trying to allocate memory it can't
get. In TuxOnIce, I currently do all allocations with __GFP_NOWARN.
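
(For illustration, those allocations all boil down to something like the
following -- the helper name here is made up, not the TuxOnIce code:

	static struct page *toi_get_image_page(void)
	{
		return alloc_page(GFP_KERNEL | __GFP_NOWARN);
	}

so a failed allocation just returns NULL instead of triggering the page
allocation warning.)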

Regards,

Nigel


^ permalink raw reply	[flat|nested] 205+ messages in thread

* Re: [RFC][PATCH 2/5] PM/Suspend: Do not shrink memory before suspend
  2009-05-06 22:42 ` [RFC][PATCH 2/5] PM/Suspend: Do not shrink memory before suspend Rafael J. Wysocki
@ 2009-05-06 23:01   ` Nigel Cunningham
  2009-05-06 23:01   ` Nigel Cunningham
  1 sibling, 0 replies; 205+ messages in thread
From: Nigel Cunningham @ 2009-05-06 23:01 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: pm list, Wu Fengguang, Andrew Morton, LKML, Pavel Machek

Hi.

On Thu, 2009-05-07 at 00:42 +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rjw@sisk.pl>
> 
> Remove the shrinking of memory from the suspend-to-RAM code, where
> it is not really necessary.
> 
> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Nigel Cunningham <nigel@tuxonice.net>



^ permalink raw reply	[flat|nested] 205+ messages in thread

* Re: [RFC][PATCH 4/5] PM/Hibernate: Rework shrinking of memory
  2009-05-06 22:44 ` [RFC][PATCH 4/5] PM/Hibernate: Rework shrinking of memory Rafael J. Wysocki
@ 2009-05-06 23:27   ` Nigel Cunningham
  2009-05-07 12:18     ` Rafael J. Wysocki
  2009-05-07 12:18     ` Rafael J. Wysocki
  2009-05-06 23:27   ` Nigel Cunningham
  1 sibling, 2 replies; 205+ messages in thread
From: Nigel Cunningham @ 2009-05-06 23:27 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: pm list, Wu Fengguang, Andrew Morton, LKML, Pavel Machek

Hi.

On Thu, 2009-05-07 at 00:44 +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rjw@sisk.pl>
> 
> Rework swsusp_shrink_memory() so that it calls shrink_all_memory()
> just once to make some room for the image and then allocates memory
> to apply more pressure to the memory management subsystem, if
> necessary.
> 
> Unfortunately, we don't seem to be able to drop shrink_all_memory()
> entirely just yet, because that would lead to huge performance
> regressions in some test cases.

I know it doesn't fit with your current way of doing things, but have
you considered trying larger order allocations as a means of getting
memory freed? I have code in tuxonice_prepare_image.c (look for
extra_pages_allocated) that might be helpful for this purpose.
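
Roughly what I mean is to grab order > 0 blocks while they last and fall
back to smaller orders, e.g. (an untested sketch with invented names, not
the actual TuxOnIce code):

	#include <linux/gfp.h>
	#include <linux/list.h>
	#include <linux/mm.h>

	static LIST_HEAD(grabbed_blocks);	/* invented bookkeeping list */

	/* Pin down roughly nr_pages page frames, preferring high-order blocks. */
	static unsigned long grab_pages_highorder(unsigned long nr_pages)
	{
		unsigned long allocated = 0;
		int order = PAGE_ALLOC_COSTLY_ORDER;

		while (allocated < nr_pages) {
			struct page *page;

			page = alloc_pages(GFP_KERNEL | __GFP_NOWARN |
					   __GFP_NORETRY, order);
			if (!page) {
				if (order-- == 0)
					break;	/* even single pages are gone */
				continue;	/* retry with a smaller order */
			}
			set_page_private(page, order);	/* remember the order */
			list_add(&page->lru, &grabbed_blocks);
			allocated += 1UL << order;
		}
		return allocated;
	}

	/* Give everything back once the image has been handled. */
	static void release_grabbed_blocks(void)
	{
		struct page *page, *next;

		list_for_each_entry_safe(page, next, &grabbed_blocks, lru) {
			list_del(&page->lru);
			__free_pages(page, page_private(page));
		}
	}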

> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> ---
>  kernel/power/snapshot.c |  145 ++++++++++++++++++++++++++++++++----------------
>  1 file changed, 98 insertions(+), 47 deletions(-)
> 
> Index: linux-2.6/kernel/power/snapshot.c
> ===================================================================
> --- linux-2.6.orig/kernel/power/snapshot.c
> +++ linux-2.6/kernel/power/snapshot.c
> @@ -1066,69 +1066,120 @@ void swsusp_free(void)
>  	buffer = NULL;
>  }
>  
> +/* Helper functions used for the shrinking of memory. */
> +
>  /**
> - *	swsusp_shrink_memory -  Try to free as much memory as needed
> - *
> - *	... but do not OOM-kill anyone
> + * preallocate_image_memory - Allocate given number of page frames
> + * @nr_pages: Number of page frames to allocate
>   *
> - *	Notice: all userland should be stopped before it is called, or
> - *	livelock is possible.
> + * Return value: Number of page frames actually allocated
>   */
> -
> -#define SHRINK_BITE	10000
> -static inline unsigned long __shrink_memory(long tmp)
> +static unsigned long preallocate_image_memory(unsigned long nr_pages)
>  {
> -	if (tmp > SHRINK_BITE)
> -		tmp = SHRINK_BITE;
> -	return shrink_all_memory(tmp);
> +	unsigned long nr_alloc = 0;
> +
> +	while (nr_pages-- > 0) {
> +		struct page *page;
> +
> +		page = alloc_image_page(GFP_KERNEL | __GFP_NOWARN);

Ah... now I see you're using __GFP_NOWARN already :)

> +		if (!page)
> +			break;
> +		nr_alloc++;
> +	}
> +
> +	return nr_alloc;
>  }
>  
> +/**
> + * swsusp_shrink_memory -  Make the kernel release as much memory as needed
> + *
> + * To create a hibernation image it is necessary to make a copy of every page
> + * frame in use.  We also need a number of page frames to be free during
> + * hibernation for allocations made while saving the image and for device
> + * drivers, in case they need to allocate memory from their hibernation
> + * callbacks (these two numbers are given by PAGES_FOR_IO and SPARE_PAGES,
> + * respectively, both of which are rough estimates).  To make this happen, we
> + * compute the total number of available page frames and allocate at least
> + *
> + * ([page frames total] + PAGES_FOR_IO + [metadata pages]) / 2 + 2 * SPARE_PAGES
> + *
> + * of them, which corresponds to the maximum size of a hibernation image.
> + *
> + * If image_size is set below the number following from the above formula,
> + * the preallocation of memory is continued until the total number of page
> + * frames in use is below the requested image size or it is impossible to
> + * allocate more memory, whichever happens first.
> + */

You should also be taking into account how much storage is available
here - that would make things more reliable. If compression is being
used, you could also apply an 'expected compression ratio' so that you
don't unnecessarily free memory that will fit once compressed.
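
For example (illustrative arithmetic only -- 'storage_pages' and the 2:1
ratio are made-up inputs, not something the current code knows about):

	/*
	 * Only shrink when even a compressed image would not fit into the
	 * available storage; assumes an expected 2:1 compression ratio.
	 */
	static bool image_needs_shrinking(unsigned long saveable_pages,
					  unsigned long storage_pages)
	{
		unsigned long expected_compressed = saveable_pages / 2;

		return expected_compressed > storage_pages;
	}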

>  int swsusp_shrink_memory(void)
>  {
> -	long tmp;
>  	struct zone *zone;
> -	unsigned long pages = 0;
> -	unsigned int i = 0;
> -	char *p = "-\\|/";
> +	unsigned long saveable, size, max_size, count, pages = 0;
>  	struct timeval start, stop;
> +	int error = 0;
>  
> -	printk(KERN_INFO "PM: Shrinking memory...  ");
> +	printk(KERN_INFO "PM: Shrinking memory ... ");

Without the space is normal, at least to my mind.

>  	do_gettimeofday(&start);
> -	do {
> -		long size, highmem_size;
>  
> -		highmem_size = count_highmem_pages();
> -		size = count_data_pages() + PAGES_FOR_IO + SPARE_PAGES;
> -		tmp = size;
> -		size += highmem_size;
> -		for_each_populated_zone(zone) {
> -			tmp += snapshot_additional_pages(zone);
> -			if (is_highmem(zone)) {
> -				highmem_size -=
> -					zone_page_state(zone, NR_FREE_PAGES);
> -			} else {
> -				tmp -= zone_page_state(zone, NR_FREE_PAGES);
> -				tmp += zone->lowmem_reserve[ZONE_NORMAL];
> -			}
> -		}
> +	/* Count the number of saveable data pages. */
> +	saveable = count_data_pages() + count_highmem_pages();
> +
> +	/*
> +	 * Compute the total number of page frames we can use (count) and the
> +	 * number of pages needed for image metadata (size).
> +	 */
> +	count = saveable;
> +	size = 0;
> +	for_each_populated_zone(zone) {
> +		size += snapshot_additional_pages(zone);
> +		count += zone_page_state(zone, NR_FREE_PAGES);
> +		if (!is_highmem(zone))
> +			count -= zone->lowmem_reserve[ZONE_NORMAL];
> +	}
>  
> -		if (highmem_size < 0)
> -			highmem_size = 0;

You're not taking watermarks into account here - that isn't a problem
with shrink_all_memory because it usually frees more than you ask for
(or has done in the past), but if you're getting exactly what you ask
for, you might run into trouble if more than half of memory is in use to
start with.
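
What I have in mind is roughly this (untested sketch; it assumes the zone
watermark is still exposed as zone->pages_high in this tree):

	#include <linux/mmzone.h>
	#include <linux/vmstat.h>

	/* Count free pages, but leave the allocator's reserves alone. */
	static unsigned long usable_free_pages(void)
	{
		struct zone *zone;
		unsigned long free = 0;

		for_each_populated_zone(zone) {
			unsigned long zone_free;

			zone_free = zone_page_state(zone, NR_FREE_PAGES);
			if (zone_free > zone->pages_high)
				free += zone_free - zone->pages_high;
		}
		return free;
	}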

> +	/* Compute the maximum number of saveable pages to leave in memory. */
> +	max_size = (count - (size + PAGES_FOR_IO)) / 2 - 2 * SPARE_PAGES;
> +	size = DIV_ROUND_UP(image_size, PAGE_SIZE);
> +	if (size > max_size)
> +		size = max_size;
> +	/*
> +	 * If the maximum is not lesser than the current number of saveable

s/lesser/less/

> +	 * pages in memory, we don't need to do anything more.
> +	 */
> +	if (size >= saveable)
> +		goto out;
>  
> -		tmp += highmem_size;
> -		if (tmp > 0) {
> -			tmp = __shrink_memory(tmp);
> -			if (!tmp)
> -				return -ENOMEM;
> -			pages += tmp;
> -		} else if (size > image_size / PAGE_SIZE) {
> -			tmp = __shrink_memory(size - (image_size / PAGE_SIZE));
> -			pages += tmp;
> -		}
> -		printk("\b%c", p[i++%4]);
> -	} while (tmp > 0);
> +	/*
> +	 * Let the memory management subsystem know that we're going to need a
> +	 * large number of page frames to allocate and make it free some memory.
> +	 * NOTE: If this is not done, performance is heavily affected in some
> +	 * test cases.
> +	 */
> +	shrink_all_memory(saveable - size);
> +
> +	/*
> +	 * The number of saveable pages in memory was too high, so apply some
> +	 * pressure to decrease it.  First, make room for the largest possible
> +	 * image and fail if that doesn't work.  Next, try to decrease the size
> +	 * of the image as much as indicated by image_size.
> +	 */
> +	count -= max_size;
> +	pages = preallocate_image_memory(count);
> +	if (pages < count)
> +		error = -ENOMEM;
> +	else
> +		pages += preallocate_image_memory(max_size - size);
> +
> +	/* Release all of the preallocated page frames. */
> +	swsusp_free();
> +
> +	if (error) {
> +		printk(KERN_CONT "\n");
> +		return error;
> +	}
> +
> + out:
>  	do_gettimeofday(&stop);
> -	printk("\bdone (%lu pages freed)\n", pages);
> +	printk(KERN_CONT "done (preallocated %lu free pages)\n", pages);
>  	swsusp_show_speed(&start, &stop, pages, "Freed");
>  
>  	return 0;

Regards,

Nigel


^ permalink raw reply	[flat|nested] 205+ messages in thread

* Re: [linux-pm] [RFC][PATCH 1/5] PM/Freezer: Disable OOM killer whentasks are frozen
  2009-05-06 22:41 ` Rafael J. Wysocki
                     ` (2 preceding siblings ...)
  2009-05-07  0:36   ` [RFC][PATCH 1/5] PM/Freezer: Disable OOM killer whentasks " Matt Helsley
@ 2009-05-07  0:36   ` Matt Helsley
  2009-05-07 12:09     ` Rafael J. Wysocki
  2009-05-07 12:09     ` Rafael J. Wysocki
  3 siblings, 2 replies; 205+ messages in thread
From: Matt Helsley @ 2009-05-07  0:36 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: pm list, Andrew Morton, Wu Fengguang, LKML

On Thu, May 07, 2009 at 12:41:04AM +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rjw@sisk.pl>
> 
> The OOM killer is not really going to work while tasks are frozen, so
> we can just give up calling it in that case.
> 
> This will allow us to safely use memory allocations for decreasing
> the number of saveable pages in the hibernation core code instead of
> using any artificial memory shrinking mechanisms for this purpose.
> 
> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>


> ---
>  include/linux/freezer.h |    2 ++
>  kernel/power/process.c  |   12 ++++++++++++
>  mm/page_alloc.c         |    5 +++++
>  3 files changed, 19 insertions(+)
> 
> Index: linux-2.6/kernel/power/process.c
> ===================================================================
> --- linux-2.6.orig/kernel/power/process.c
> +++ linux-2.6/kernel/power/process.c
> @@ -19,6 +19,8 @@
>   */
>  #define TIMEOUT	(20 * HZ)
> 
> +static bool tasks_frozen;
> +
>  static inline int freezeable(struct task_struct * p)
>  {
>  	if ((p == current) ||
> @@ -120,6 +122,10 @@ int freeze_processes(void)
>   Exit:
>  	BUG_ON(in_atomic());
>  	printk("\n");
> +
> +	if (!error)
> +		tasks_frozen = true;
> +

It's not really about whether some tasks are frozen -- that can 
happen using the cgroup freezer too. The flag really indicates if
all killable tasks are frozen. That can't happen using the cgroup
freezer since the root cgroup can't be frozen. So I think some name changes 
are in order but otherwise the patch looks fine.
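
For example, something along these lines would read better (illustrative
only, names invented here):

	static bool all_killable_tasks_frozen;	/* rather than 'tasks_frozen' */

	/* What the page allocator would check instead of the current flag. */
	static inline bool oom_killer_inhibited(void)
	{
		return all_killable_tasks_frozen;
	}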

Cheers,
	-Matt Helsley

^ permalink raw reply	[flat|nested] 205+ messages in thread

* Re: [linux-pm] [RFC][PATCH 1/5] PM/Freezer: Disable OOM killer whentasks are frozen
  2009-05-07  0:36   ` [linux-pm] " Matt Helsley
@ 2009-05-07 12:09     ` Rafael J. Wysocki
  2009-05-07 12:09     ` Rafael J. Wysocki
  1 sibling, 0 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-07 12:09 UTC (permalink / raw)
  To: Matt Helsley; +Cc: pm list, Andrew Morton, Wu Fengguang, LKML

On Thursday 07 May 2009, Matt Helsley wrote:
> On Thu, May 07, 2009 at 12:41:04AM +0200, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rjw@sisk.pl>
> > 
> > The OOM killer is not really going to work while tasks are frozen, so
> > we can just give up calling it in that case.
> > 
> > This will allow us to safely use memory allocations for decreasing
> > the number of saveable pages in the hibernation core code instead of
> > using any artificial memory shrinking mechanisms for this purpose.
> > 
> > Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> 
> 
> > ---
> >  include/linux/freezer.h |    2 ++
> >  kernel/power/process.c  |   12 ++++++++++++
> >  mm/page_alloc.c         |    5 +++++
> >  3 files changed, 19 insertions(+)
> > 
> > Index: linux-2.6/kernel/power/process.c
> > ===================================================================
> > --- linux-2.6.orig/kernel/power/process.c
> > +++ linux-2.6/kernel/power/process.c
> > @@ -19,6 +19,8 @@
> >   */
> >  #define TIMEOUT	(20 * HZ)
> > 
> > +static bool tasks_frozen;
> > +
> >  static inline int freezeable(struct task_struct * p)
> >  {
> >  	if ((p == current) ||
> > @@ -120,6 +122,10 @@ int freeze_processes(void)
> >   Exit:
> >  	BUG_ON(in_atomic());
> >  	printk("\n");
> > +
> > +	if (!error)
> > +		tasks_frozen = true;
> > +
> 
> It's not really about whether some tasks are frozen -- that can 
> happen using the cgroup freezer too. The flag really indicates if
> all killable tasks are frozen. That can't happen using the cgroup
> freezer since the root cgroup can't be frozen. So I think some name changes 
> are in order but otherwise the patch looks fine.

Well, as I said in the [0/5] message, I'm not sure if the patch is really
necessary.  I'll change the names if it turns out to be.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 205+ messages in thread

* Re: [RFC][PATCH 1/5] PM/Freezer: Disable OOM killer when tasks are frozen
  2009-05-06 23:00   ` Nigel Cunningham
@ 2009-05-07 12:10     ` Rafael J. Wysocki
  2009-05-07 12:10     ` Rafael J. Wysocki
  1 sibling, 0 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-07 12:10 UTC (permalink / raw)
  To: nigel; +Cc: pm list, Wu Fengguang, Andrew Morton, LKML, Pavel Machek

On Thursday 07 May 2009, Nigel Cunningham wrote:
> Hi.

Hi,

> On Thu, 2009-05-07 at 00:41 +0200, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rjw@sisk.pl>
> > 
> > The OOM killer is not really going to work while tasks are frozen, so
> > we can just give up calling it in that case.
> > 
> > This will allow us to safely use memory allocations for decreasing
> > the number of saveable pages in the hibernation core code instead of
> > using any artificial memory shrinking mechanisms for this purpose.
> 
> Should we disable the warning that the nopage path gives if tasks are
> frozen? I'm in two minds - if you get problems as a result, it might
> help to diagnose them. On the other hand, you don't want tons of
> warnings due to the hibernation code trying to allocate memory it can't
> get. In TuxOnIce, I currently do all allocations with __GFP_NOWARN.

Yes, I use __GFP_NOWARN in the next patches too. :-)

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 205+ messages in thread

* Re: [RFC][PATCH 4/5] PM/Hibernate: Rework shrinking of memory
  2009-05-06 23:27   ` Nigel Cunningham
  2009-05-07 12:18     ` Rafael J. Wysocki
@ 2009-05-07 12:18     ` Rafael J. Wysocki
  2009-05-07 20:00       ` Rafael J. Wysocki
                         ` (3 more replies)
  1 sibling, 4 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-07 12:18 UTC (permalink / raw)
  To: nigel; +Cc: pm list, Wu Fengguang, Andrew Morton, LKML, Pavel Machek

On Thursday 07 May 2009, Nigel Cunningham wrote:
> Hi.

Hi,

> On Thu, 2009-05-07 at 00:44 +0200, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rjw@sisk.pl>
> > 
> > Rework swsusp_shrink_memory() so that it calls shrink_all_memory()
> > just once to make some room for the image and then allocates memory
> > to apply more pressure to the memory management subsystem, if
> > necessary.
> > 
> > Unfortunately, we don't seem to be able to drop shrink_all_memory()
> > entirely just yet, because that would lead to huge performance
> > regressions in some test cases.
> 
> I know it doesn't fit with your current way of doing things, but have
> you considered trying larger order allocations as a means of getting
> memory freed?

Actually, I was thinking about that.  What's your experience with this
approach?

> I have code in tuxonice_prepare_image.c (look for extra_pages_allocated) that
> might be helpful for this purpose.

OK, thanks.  I'll have a look at it.

> > Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> > ---
> >  kernel/power/snapshot.c |  145 ++++++++++++++++++++++++++++++++----------------
> >  1 file changed, 98 insertions(+), 47 deletions(-)
> > 
> > Index: linux-2.6/kernel/power/snapshot.c
> > ===================================================================
> > --- linux-2.6.orig/kernel/power/snapshot.c
> > +++ linux-2.6/kernel/power/snapshot.c
> > @@ -1066,69 +1066,120 @@ void swsusp_free(void)
> >  	buffer = NULL;
> >  }
> >  
> > +/* Helper functions used for the shrinking of memory. */
> > +
> >  /**
> > - *	swsusp_shrink_memory -  Try to free as much memory as needed
> > - *
> > - *	... but do not OOM-kill anyone
> > + * preallocate_image_memory - Allocate given number of page frames
> > + * @nr_pages: Number of page frames to allocate
> >   *
> > - *	Notice: all userland should be stopped before it is called, or
> > - *	livelock is possible.
> > + * Return value: Number of page frames actually allocated
> >   */
> > -
> > -#define SHRINK_BITE	10000
> > -static inline unsigned long __shrink_memory(long tmp)
> > +static unsigned long preallocate_image_memory(unsigned long nr_pages)
> >  {
> > -	if (tmp > SHRINK_BITE)
> > -		tmp = SHRINK_BITE;
> > -	return shrink_all_memory(tmp);
> > +	unsigned long nr_alloc = 0;
> > +
> > +	while (nr_pages-- > 0) {
> > +		struct page *page;
> > +
> > +		page = alloc_image_page(GFP_KERNEL | __GFP_NOWARN);
> 
> Ah... now I see you're using __GFP_NOWARN already :)
> 
> > +		if (!page)
> > +			break;
> > +		nr_alloc++;
> > +	}
> > +
> > +	return nr_alloc;
> >  }
> >  
> > +/**
> > + * swsusp_shrink_memory -  Make the kernel release as much memory as needed
> > + *
> > + * To create a hibernation image it is necessary to make a copy of every page
> > + * frame in use.  We also need a number of page frames to be free during
> > + * hibernation for allocations made while saving the image and for device
> > + * drivers, in case they need to allocate memory from their hibernation
> > + * callbacks (these two numbers are given by PAGES_FOR_IO and SPARE_PAGES,
> > + * respectively, both of which are rough estimates).  To make this happen, we
> > + * compute the total number of available page frames and allocate at least
> > + *
> > + * ([page frames total] + PAGES_FOR_IO + [metadata pages]) / 2 + 2 * SPARE_PAGES
> > + *
> > + * of them, which corresponds to the maximum size of a hibernation image.
> > + *
> > + * If image_size is set below the number following from the above formula,
> > + * the preallocation of memory is continued until the total number of page
> > + * frames in use is below the requested image size or it is impossible to
> > + * allocate more memory, whichever happens first.
> > + */
> 
> You should also be taking into account how much storage is available
> here - that would make things more reliable. If compression is being
> used, you could also apply an 'expected compression ratio' so that you
> don't unnecessarily free memory that will fit once compressed.

Currently compression is only done in user space so I don't know in advance
whether or not it's going to be used.

> >  int swsusp_shrink_memory(void)
> >  {
> > -	long tmp;
> >  	struct zone *zone;
> > -	unsigned long pages = 0;
> > -	unsigned int i = 0;
> > -	char *p = "-\\|/";
> > +	unsigned long saveable, size, max_size, count, pages = 0;
> >  	struct timeval start, stop;
> > +	int error = 0;
> >  
> > -	printk(KERN_INFO "PM: Shrinking memory...  ");
> > +	printk(KERN_INFO "PM: Shrinking memory ... ");
> 
> Without the space is normal, at least to my mind.

OK

> >  	do_gettimeofday(&start);
> > -	do {
> > -		long size, highmem_size;
> >  
> > -		highmem_size = count_highmem_pages();
> > -		size = count_data_pages() + PAGES_FOR_IO + SPARE_PAGES;
> > -		tmp = size;
> > -		size += highmem_size;
> > -		for_each_populated_zone(zone) {
> > -			tmp += snapshot_additional_pages(zone);
> > -			if (is_highmem(zone)) {
> > -				highmem_size -=
> > -					zone_page_state(zone, NR_FREE_PAGES);
> > -			} else {
> > -				tmp -= zone_page_state(zone, NR_FREE_PAGES);
> > -				tmp += zone->lowmem_reserve[ZONE_NORMAL];
> > -			}
> > -		}
> > +	/* Count the number of saveable data pages. */
> > +	saveable = count_data_pages() + count_highmem_pages();
> > +
> > +	/*
> > +	 * Compute the total number of page frames we can use (count) and the
> > +	 * number of pages needed for image metadata (size).
> > +	 */
> > +	count = saveable;
> > +	size = 0;
> > +	for_each_populated_zone(zone) {
> > +		size += snapshot_additional_pages(zone);
> > +		count += zone_page_state(zone, NR_FREE_PAGES);
> > +		if (!is_highmem(zone))
> > +			count -= zone->lowmem_reserve[ZONE_NORMAL];
> > +	}
> >  
> > -		if (highmem_size < 0)
> > -			highmem_size = 0;
> 
> You're not taking watermarks into account here - that isn't a problem
> with shrink_all_memory because it usually frees more than you ask for
> (or has done in the past), but if you're getting exactly what you ask
> for, you might run into trouble if more than half of memory is in use to
> start with.

Hmm, why exactly?

> > +	/* Compute the maximum number of saveable pages to leave in memory. */
> > +	max_size = (count - (size + PAGES_FOR_IO)) / 2 - 2 * SPARE_PAGES;
> > +	size = DIV_ROUND_UP(image_size, PAGE_SIZE);
> > +	if (size > max_size)
> > +		size = max_size;
> > +	/*
> > +	 * If the maximum is not lesser than the current number of saveable
> 
> s/lesser/less/

Right, thanks.

> > +	 * pages in memory, we don't need to do anything more.
> > +	 */
> > +	if (size >= saveable)
> > +		goto out;
> >  
> > -		tmp += highmem_size;
> > -		if (tmp > 0) {
> > -			tmp = __shrink_memory(tmp);
> > -			if (!tmp)
> > -				return -ENOMEM;
> > -			pages += tmp;
> > -		} else if (size > image_size / PAGE_SIZE) {
> > -			tmp = __shrink_memory(size - (image_size / PAGE_SIZE));
> > -			pages += tmp;
> > -		}
> > -		printk("\b%c", p[i++%4]);
> > -	} while (tmp > 0);
> > +	/*
> > +	 * Let the memory management subsystem know that we're going to need a
> > +	 * large number of page frames to allocate and make it free some memory.
> > +	 * NOTE: If this is not done, performance is heavily affected in some
> > +	 * test cases.
> > +	 */
> > +	shrink_all_memory(saveable - size);
> > +
> > +	/*
> > +	 * The number of saveable pages in memory was too high, so apply some
> > +	 * pressure to decrease it.  First, make room for the largest possible
> > +	 * image and fail if that doesn't work.  Next, try to decrease the size
> > +	 * of the image as much as indicated by image_size.
> > +	 */
> > +	count -= max_size;
> > +	pages = preallocate_image_memory(count);
> > +	if (pages < count)
> > +		error = -ENOMEM;
> > +	else
> > +		pages += preallocate_image_memory(max_size - size);
> > +
> > +	/* Release all of the preallocated page frames. */
> > +	swsusp_free();
> > +
> > +	if (error) {
> > +		printk(KERN_CONT "\n");
> > +		return error;
> > +	}
> > +
> > + out:
> >  	do_gettimeofday(&stop);
> > -	printk("\bdone (%lu pages freed)\n", pages);
> > +	printk(KERN_CONT "done (preallocated %lu free pages)\n", pages);
> >  	swsusp_show_speed(&start, &stop, pages, "Freed");
> >  
> >  	return 0;

Best,
Rafael

^ permalink raw reply	[flat|nested] 205+ messages in thread

* Re: [RFC][PATCH 4/5] PM/Hibernate: Rework shrinking of memory
  2009-05-07 12:18     ` Rafael J. Wysocki
  2009-05-07 20:00       ` Rafael J. Wysocki
@ 2009-05-07 20:00       ` Rafael J. Wysocki
  2009-05-07 20:53         ` Nigel Cunningham
  2009-05-07 20:53         ` Nigel Cunningham
  2009-05-07 20:51       ` Nigel Cunningham
  2009-05-07 20:51       ` Nigel Cunningham
  3 siblings, 2 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-07 20:00 UTC (permalink / raw)
  To: nigel; +Cc: pm list, Wu Fengguang, Andrew Morton, LKML, Pavel Machek

On Thursday 07 May 2009, Rafael J. Wysocki wrote:
> On Thursday 07 May 2009, Nigel Cunningham wrote:
> > Hi.
> 
> Hi,
> 
> > On Thu, 2009-05-07 at 00:44 +0200, Rafael J. Wysocki wrote:
> > > From: Rafael J. Wysocki <rjw@sisk.pl>
> > > 
> > > Rework swsusp_shrink_memory() so that it calls shrink_all_memory()
> > > just once to make some room for the image and then allocates memory
> > > to apply more pressure to the memory management subsystem, if
> > > necessary.
> > > 
> > > Unfortunately, we don't seem to be able to drop shrink_all_memory()
> > > entirely just yet, because that would lead to huge performance
> > > regressions in some test cases.
> > 
> > I know it doesn't fit with your current way of doing things, but have
> > you considered trying larger order allocations as a means of getting
> > memory freed?
> 
> Actually, I was thinking about that.  What's your experience with this
> approach?
> 
> > I have code in tuxonice_prepare_image.c (look for extra_pages_allocated) that
> > might be helpful for this purpose.
> 
> OK, thanks.  I'll have a look at it.

I have tried it, but the results are even worse than with 0-order allocations
only.

So far, I have got the best results with shrink_all_memory() called once and
followed by allocating as much memory as we want to be free using 0-order
allocations.  Like in this patch. :-)

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 205+ messages in thread

* Re: [RFC][PATCH 4/5] PM/Hibernate: Rework shrinking of memory
  2009-05-07 12:18     ` Rafael J. Wysocki
  2009-05-07 20:00       ` Rafael J. Wysocki
  2009-05-07 20:00       ` Rafael J. Wysocki
@ 2009-05-07 20:51       ` Nigel Cunningham
  2009-05-07 20:51       ` Nigel Cunningham
  3 siblings, 0 replies; 205+ messages in thread
From: Nigel Cunningham @ 2009-05-07 20:51 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: pm list, Wu Fengguang, Andrew Morton, LKML, Pavel Machek

Hi.

On Thu, 2009-05-07 at 14:18 +0200, Rafael J. Wysocki wrote:
> On Thursday 07 May 2009, Nigel Cunningham wrote:
> > Hi.
> 
> Hi,
> 
> > On Thu, 2009-05-07 at 00:44 +0200, Rafael J. Wysocki wrote:
> > > From: Rafael J. Wysocki <rjw@sisk.pl>
> > > 
> > > Rework swsusp_shrink_memory() so that it calls shrink_all_memory()
> > > just once to make some room for the image and then allocates memory
> > > to apply more pressure to the memory management subsystem, if
> > > necessary.
> > > 
> > > Unfortunately, we don't seem to be able to drop shrink_all_memory()
> > > entirely just yet, because that would lead to huge performance
> > > regressions in some test cases.
> > 
> > I know it doesn't fit with your current way of doing things, but have
> > you considered trying larger order allocations as a means of getting
> > memory freed?
> 
> Actually, I was thinking about that.  What's your experience with this
> approach?

I can't give you statistics, but it seems faster and the VM seems to
work better with the implicit hint that we want larger amounts of memory
than just single pages. The main difficulty is making sure that drivers
can still do allocations with order > 0 later. Some of them seem to want
to do that instead of vmallocing.
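
(For reference, a minimal sketch of what the higher-order variant could look
like, assuming the usual alloc_pages() interface; this is not part of the
patch, and the bookkeeping needed to free the pages again is omitted:)

/*
 * Sketch only: try costly-order allocations first and fall back to
 * smaller orders, giving the allocator a hint that large amounts of
 * memory are wanted.
 */
static unsigned long preallocate_higher_order(unsigned long nr_pages)
{
	unsigned long nr_alloc = 0;
	int order = PAGE_ALLOC_COSTLY_ORDER;

	while (nr_pages > 0 && order >= 0) {
		struct page *page;

		page = alloc_pages(GFP_KERNEL | __GFP_NOWARN, order);
		if (!page) {
			order--;	/* this order is exhausted, go smaller */
			continue;
		}
		nr_alloc += 1UL << order;
		nr_pages -= min(nr_pages, 1UL << order);
	}
	return nr_alloc;
}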

> > I have code in tuxonice_prepare_image.c (look for extra_pages_allocated) that
> > might be helpful for this purpose.
> 
> OK, thanks.  I'll have a look at it.
> 
> > > Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> > > ---
> > >  kernel/power/snapshot.c |  145 ++++++++++++++++++++++++++++++++----------------
> > >  1 file changed, 98 insertions(+), 47 deletions(-)
> > > 
> > > Index: linux-2.6/kernel/power/snapshot.c
> > > ===================================================================
> > > --- linux-2.6.orig/kernel/power/snapshot.c
> > > +++ linux-2.6/kernel/power/snapshot.c
> > > @@ -1066,69 +1066,120 @@ void swsusp_free(void)
> > >  	buffer = NULL;
> > >  }
> > >  
> > > +/* Helper functions used for the shrinking of memory. */
> > > +
> > >  /**
> > > - *	swsusp_shrink_memory -  Try to free as much memory as needed
> > > - *
> > > - *	... but do not OOM-kill anyone
> > > + * preallocate_image_memory - Allocate given number of page frames
> > > + * @nr_pages: Number of page frames to allocate
> > >   *
> > > - *	Notice: all userland should be stopped before it is called, or
> > > - *	livelock is possible.
> > > + * Return value: Number of page frames actually allocated
> > >   */
> > > -
> > > -#define SHRINK_BITE	10000
> > > -static inline unsigned long __shrink_memory(long tmp)
> > > +static unsigned long preallocate_image_memory(unsigned long nr_pages)
> > >  {
> > > -	if (tmp > SHRINK_BITE)
> > > -		tmp = SHRINK_BITE;
> > > -	return shrink_all_memory(tmp);
> > > +	unsigned long nr_alloc = 0;
> > > +
> > > +	while (nr_pages-- > 0) {
> > > +		struct page *page;
> > > +
> > > +		page = alloc_image_page(GFP_KERNEL | __GFP_NOWARN);
> > 
> > Ah... now I see you're using __GFP_NOWARN already :)
> > 
> > > +		if (!page)
> > > +			break;
> > > +		nr_alloc++;
> > > +	}
> > > +
> > > +	return nr_alloc;
> > >  }
> > >  
> > > +/**
> > > + * swsusp_shrink_memory -  Make the kernel release as much memory as needed
> > > + *
> > > + * To create a hibernation image it is necessary to make a copy of every page
> > > + * frame in use.  We also need a number of page frames to be free during
> > > + * hibernation for allocations made while saving the image and for device
> > > + * drivers, in case they need to allocate memory from their hibernation
> > > + * callbacks (these two numbers are given by PAGES_FOR_IO and SPARE_PAGES,
> > > + * respectively, both of which are rough estimates).  To make this happen, we
> > > + * compute the total number of available page frames and allocate at least
> > > + *
> > > + * ([page frames total] + PAGES_FOR_IO + [metadata pages]) / 2 + 2 * SPARE_PAGES
> > > + *
> > > + * of them, which corresponds to the maximum size of a hibernation image.
> > > + *
> > > + * If image_size is set below the number following from the above formula,
> > > + * the preallocation of memory is continued until the total number of page
> > > + * frames in use is below the requested image size or it is impossible to
> > > + * allocate more memory, whichever happens first.
> > > + */
> > 
> > You should also be taking into account how much storage is available
> > here - that would make things more reliable. If compression is being
> > used, you could also apply an 'expected compression ratio' so that you
> > don't unnecessarily free memory that will fit once compressed.
> 
> Currently compression is only done in user space so I don't know in advance
> whether or not it's going to be used.

k. I guess it's more worth the effort in our case, but would it be that
hard to do? An extra ioctl, I guess?
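
(A purely hypothetical sketch of such an ioctl for /dev/snapshot; neither the
request number nor the semantics exist anywhere in the patchset:)

/*
 * Hypothetical: user space, which does the compression, could tell the
 * kernel the expected compressed size as a percentage of the original
 * before the image is created, so the preallocation target could be
 * relaxed accordingly (e.g. 60 would allow roughly
 * image_size * 100 / 60 bytes of uncompressed data to be kept).
 */
#define SNAPSHOT_SET_COMPRESSION_RATIO	_IOW(SNAPSHOT_IOC_MAGIC, 42, unsigned int)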

> > >  int swsusp_shrink_memory(void)
> > >  {
> > > -	long tmp;
> > >  	struct zone *zone;
> > > -	unsigned long pages = 0;
> > > -	unsigned int i = 0;
> > > -	char *p = "-\\|/";
> > > +	unsigned long saveable, size, max_size, count, pages = 0;
> > >  	struct timeval start, stop;
> > > +	int error = 0;
> > >  
> > > -	printk(KERN_INFO "PM: Shrinking memory...  ");
> > > +	printk(KERN_INFO "PM: Shrinking memory ... ");
> > 
> > Without the space is normal, at least to my mind.
> 
> OK
> 
> > >  	do_gettimeofday(&start);
> > > -	do {
> > > -		long size, highmem_size;
> > >  
> > > -		highmem_size = count_highmem_pages();
> > > -		size = count_data_pages() + PAGES_FOR_IO + SPARE_PAGES;
> > > -		tmp = size;
> > > -		size += highmem_size;
> > > -		for_each_populated_zone(zone) {
> > > -			tmp += snapshot_additional_pages(zone);
> > > -			if (is_highmem(zone)) {
> > > -				highmem_size -=
> > > -					zone_page_state(zone, NR_FREE_PAGES);
> > > -			} else {
> > > -				tmp -= zone_page_state(zone, NR_FREE_PAGES);
> > > -				tmp += zone->lowmem_reserve[ZONE_NORMAL];
> > > -			}
> > > -		}
> > > +	/* Count the number of saveable data pages. */
> > > +	saveable = count_data_pages() + count_highmem_pages();
> > > +
> > > +	/*
> > > +	 * Compute the total number of page frames we can use (count) and the
> > > +	 * number of pages needed for image metadata (size).
> > > +	 */
> > > +	count = saveable;
> > > +	size = 0;
> > > +	for_each_populated_zone(zone) {
> > > +		size += snapshot_additional_pages(zone);
> > > +		count += zone_page_state(zone, NR_FREE_PAGES);
> > > +		if (!is_highmem(zone))
> > > +			count -= zone->lowmem_reserve[ZONE_NORMAL];
> > > +	}
> > >  
> > > -		if (highmem_size < 0)
> > > -			highmem_size = 0;
> > 
> > You're not taking watermarks into account here - that isn't a problem
> > with shrink_all_memory because it usually frees more than you ask for
> > (or has done in the past), but if you're getting exactly what you ask
> > for, you might run into trouble if more than half of memory is in use to
> > start with.
> 
> Hmm, why exactly?

You can't allocate the memory reserved by the watermarks. Or at least,
you're not supposed to - it's meant for 'emergency' allocations - so
things can still make progress in low memory situations. Personally, I
think watermarks should be irrelevant in the hibernation case. We know
almost exactly what's going on in the system. If our code is well
written, we should be able to account for every page being allocated and
freed. (Okay, maybe some leeway for the lowlevel bio code).
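
(A minimal sketch of one way to account for that in the loop from the patch;
this is not in the patch, and zone->pages_high is the per-zone high watermark
field as it exists in current kernels:)

	/*
	 * Sketch only: treat everything below the high watermark as
	 * unavailable, so the preallocation never eats into the reserves.
	 */
	for_each_populated_zone(zone) {
		size += snapshot_additional_pages(zone);
		count += zone_page_state(zone, NR_FREE_PAGES);
		count -= zone->pages_high;	/* stay above the watermarks */
		if (!is_highmem(zone))
			count -= zone->lowmem_reserve[ZONE_NORMAL];
	}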

> > > +	/* Compute the maximum number of saveable pages to leave in memory. */
> > > +	max_size = (count - (size + PAGES_FOR_IO)) / 2 - 2 * SPARE_PAGES;
> > > +	size = DIV_ROUND_UP(image_size, PAGE_SIZE);
> > > +	if (size > max_size)
> > > +		size = max_size;
> > > +	/*
> > > +	 * If the maximum is not lesser than the current number of saveable
> > 
> > s/lesser/less/
> 
> Right, thanks.
> 
> > > +	 * pages in memory, we don't need to do anything more.
> > > +	 */
> > > +	if (size >= saveable)
> > > +		goto out;
> > >  
> > > -		tmp += highmem_size;
> > > -		if (tmp > 0) {
> > > -			tmp = __shrink_memory(tmp);
> > > -			if (!tmp)
> > > -				return -ENOMEM;
> > > -			pages += tmp;
> > > -		} else if (size > image_size / PAGE_SIZE) {
> > > -			tmp = __shrink_memory(size - (image_size / PAGE_SIZE));
> > > -			pages += tmp;
> > > -		}
> > > -		printk("\b%c", p[i++%4]);
> > > -	} while (tmp > 0);
> > > +	/*
> > > +	 * Let the memory management subsystem know that we're going to need a
> > > +	 * large number of page frames to allocate and make it free some memory.
> > > +	 * NOTE: If this is not done, performance is heavily affected in some
> > > +	 * test cases.
> > > +	 */
> > > +	shrink_all_memory(saveable - size);
> > > +
> > > +	/*
> > > +	 * The number of saveable pages in memory was too high, so apply some
> > > +	 * pressure to decrease it.  First, make room for the largest possible
> > > +	 * image and fail if that doesn't work.  Next, try to decrease the size
> > > +	 * of the image as much as indicated by image_size.
> > > +	 */
> > > +	count -= max_size;
> > > +	pages = preallocate_image_memory(count);
> > > +	if (pages < count)
> > > +		error = -ENOMEM;
> > > +	else
> > > +		pages += preallocate_image_memory(max_size - size);
> > > +
> > > +	/* Release all of the preallocated page frames. */
> > > +	swsusp_free();
> > > +
> > > +	if (error) {
> > > +		printk(KERN_CONT "\n");
> > > +		return error;
> > > +	}
> > > +
> > > + out:
> > >  	do_gettimeofday(&stop);
> > > -	printk("\bdone (%lu pages freed)\n", pages);
> > > +	printk(KERN_CONT "done (preallocated %lu free pages)\n", pages);
> > >  	swsusp_show_speed(&start, &stop, pages, "Freed");
> > >  
> > >  	return 0;

Regards,

Nigel


^ permalink raw reply	[flat|nested] 205+ messages in thread

* Re: [RFC][PATCH 4/5] PM/Hibernate: Rework shrinking of memory
  2009-05-07 20:00       ` Rafael J. Wysocki
@ 2009-05-07 20:53         ` Nigel Cunningham
  2009-05-07 20:53         ` Nigel Cunningham
  1 sibling, 0 replies; 205+ messages in thread
From: Nigel Cunningham @ 2009-05-07 20:53 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: pm list, Wu Fengguang, Andrew Morton, LKML, Pavel Machek

Hi.

On Thu, 2009-05-07 at 22:00 +0200, Rafael J. Wysocki wrote:
> On Thursday 07 May 2009, Rafael J. Wysocki wrote:
> > On Thursday 07 May 2009, Nigel Cunningham wrote:
> > > Hi.
> > 
> > Hi,
> > 
> > > On Thu, 2009-05-07 at 00:44 +0200, Rafael J. Wysocki wrote:
> > > > From: Rafael J. Wysocki <rjw@sisk.pl>
> > > > 
> > > > Rework swsusp_shrink_memory() so that it calls shrink_all_memory()
> > > > just once to make some room for the image and then allocates memory
> > > > to apply more pressure to the memory management subsystem, if
> > > > necessary.
> > > > 
> > > > Unfortunately, we don't seem to be able to drop shrink_all_memory()
> > > > entirely just yet, because that would lead to huge performance
> > > > regressions in some test cases.
> > > 
> > > I know it doesn't fit with your current way of doing things, but have
> > > you considered trying larger order allocations as a means of getting
> > > memory freed?
> > 
> > Actually, I was thinking about that.  What's your experience with this
> > approach?
> > 
> > > I have code in tuxonice_prepare_image.c (look for extra_pages_allocated) that
> > > might be helpful for this purpose.
> > 
> > OK, thanks.  I'll have a look at it.
> 
> I have tried it, but the results are even worse than with 0-order allocations
> only.
> 
> So far, I have got the best results with shrink_all_memory() called once and
> followed by allocating as much memory as we want to be free using 0-order
> allocations.  Like in this patch. :-)

Hmm. That's surprising. It would be interesting to look at what's going
on. Unfortunately, I just don't have the time at the moment to help.

Regards,

Nigel


^ permalink raw reply	[flat|nested] 205+ messages in thread

* [RFC][PATCH 0/5] PM/Hibernate: Rework memory shrinking (rev. 2)
  2009-05-06 22:40 [RFC][PATCH 0/5] PM/Hibernate: Rework memory shrinking Rafael J. Wysocki
@ 2009-05-07 21:48   ` Rafael J. Wysocki
  2009-05-06 22:41 ` Rafael J. Wysocki
                     ` (10 subsequent siblings)
  11 siblings, 0 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-07 21:48 UTC (permalink / raw)
  To: pm list
  Cc: Wu Fengguang, Andrew Morton, LKML, Pavel Machek,
	Nigel Cunningham, David Rientjes, linux-mm

On Thursday 07 May 2009, Rafael J. Wysocki wrote:
> Hi,
> 
> The following patchset is an attempt to rework the memory shrinking mechanism
> used during hibernation to make room for the image.  It is a work in progress
> and most likely it's going to be modified, but it has been discussed recently
> and I'd like to get comments on the current version.
> 
> [1/5] - disable the OOM killer after freezing tasks (this will be dropped if
>         it's verified that we can avoid the OOM killing by using
>         __GFP_FS|__GFP_WAIT|__GFP_NORETRY|__GFP_NOWARN
>         in the next patches).
> 
> [2/5] - drop memory shrinking from the suspend (to RAM) code path
> 
> [3/5] - move swsusp_shrink_memory() to snapshot.c
> 
> [4/5] - rework swsusp_shrink_memory() (to use memory allocations for applying
>         memory pressure)
> 
> [5/5] - allocate image pages along with the shrinking.

Updated patchset follows.

Most importantly, the first patch has been replaced by the one adding
__GFP_NO_OOM_KILL, following Andrew's advice.  The other patches are
slightly changed to address some comments I've received since yesterday.

Please tell me what you think.

Best,
Rafael


^ permalink raw reply	[flat|nested] 205+ messages in thread

* [RFC][PATCH 1/5] mm: Introduce __GFP_NO_OOM_KILL
  2009-05-07 21:48   ` Rafael J. Wysocki
@ 2009-05-07 21:50     ` Rafael J. Wysocki
  -1 siblings, 0 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-07 21:50 UTC (permalink / raw)
  To: pm list
  Cc: Wu Fengguang, Andrew Morton, LKML, Pavel Machek,
	Nigel Cunningham, David Rientjes, linux-mm

From: Andrew Morton <akpm@linux-foundation.org>

> > Remind me: why can't we just allocate N pages at suspend-time?
> 
> We need half of memory free. The reason we can't "just allocate" is
> probably OOM killer; but my memories are quite weak :-(.

hm.  You'd think that with our splendid range of __GFP_foo flags, there
would be some combo which would suit this requirement but I can't
immediately spot one.

We can always add another I guess.  Something like...

[rjw: fixed white space, added comment in page_alloc.c]

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 include/linux/gfp.h |    3 ++-
 mm/page_alloc.c     |    8 ++++++--
 2 files changed, 8 insertions(+), 3 deletions(-)

Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c
+++ linux-2.6/mm/page_alloc.c
@@ -1619,8 +1619,12 @@ nofail_alloc:
 			goto got_pg;
 		}
 
-		/* The OOM killer will not help higher order allocs so fail */
-		if (order > PAGE_ALLOC_COSTLY_ORDER) {
+		/*
+		 * The OOM killer will not help higher order allocs so fail.
+		 * Also fail if the caller doesn't want the OOM killer to run.
+		 */
+		if (order > PAGE_ALLOC_COSTLY_ORDER
+				|| (gfp_mask & __GFP_NO_OOM_KILL)) {
 			clear_zonelist_oom(zonelist, gfp_mask);
 			goto nopage;
 		}
Index: linux-2.6/include/linux/gfp.h
===================================================================
--- linux-2.6.orig/include/linux/gfp.h
+++ linux-2.6/include/linux/gfp.h
@@ -51,8 +51,9 @@ struct vm_area_struct;
 #define __GFP_THISNODE	((__force gfp_t)0x40000u)/* No fallback, no policies */
 #define __GFP_RECLAIMABLE ((__force gfp_t)0x80000u) /* Page is reclaimable */
 #define __GFP_MOVABLE	((__force gfp_t)0x100000u)  /* Page is movable */
+#define __GFP_NO_OOM_KILL ((__force gfp_t)0x200000u)  /* Don't invoke out_of_memory() */
 
-#define __GFP_BITS_SHIFT 21	/* Room for 21 __GFP_FOO bits */
+#define __GFP_BITS_SHIFT 22	/* Number of __GFP_FOO bits */
 #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
 
 /* This equals 0, but use constants in case they ever change */
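
(Sketch of the intended use in the hibernation preallocation path, assuming
the alloc_image_page() helper from patch 4/5:)

		/* Fail the allocation instead of invoking the OOM killer. */
		page = alloc_image_page(GFP_KERNEL | __GFP_NOWARN |
					__GFP_NO_OOM_KILL);
		if (!page)
			break;	/* out of memory, stop preallocating */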


^ permalink raw reply	[flat|nested] 205+ messages in thread

* [RFC][PATCH 2/5] PM/Suspend: Do not shrink memory before suspend
  2009-05-07 21:48   ` Rafael J. Wysocki
@ 2009-05-07 21:51     ` Rafael J. Wysocki
  -1 siblings, 0 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-07 21:51 UTC (permalink / raw)
  To: pm list
  Cc: Wu Fengguang, Andrew Morton, LKML, Pavel Machek,
	Nigel Cunningham, David Rientjes, linux-mm

From: Rafael J. Wysocki <rjw@sisk.pl>

Remove the shrinking of memory from the suspend-to-RAM code, where
it is not really necessary.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Nigel Cunningham <nigel@tuxonice.net>
---
 kernel/power/main.c |   20 +-------------------
 mm/vmscan.c         |    4 ++--
 2 files changed, 3 insertions(+), 21 deletions(-)

Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -188,9 +188,6 @@ static void suspend_test_finish(const ch
 
 #endif
 
-/* This is just an arbitrary number */
-#define FREE_PAGE_NUMBER (100)
-
 static struct platform_suspend_ops *suspend_ops;
 
 /**
@@ -226,7 +223,6 @@ int suspend_valid_only_mem(suspend_state
 static int suspend_prepare(void)
 {
 	int error;
-	unsigned int free_pages;
 
 	if (!suspend_ops || !suspend_ops->enter)
 		return -EPERM;
@@ -241,24 +237,10 @@ static int suspend_prepare(void)
 	if (error)
 		goto Finish;
 
-	if (suspend_freeze_processes()) {
-		error = -EAGAIN;
-		goto Thaw;
-	}
-
-	free_pages = global_page_state(NR_FREE_PAGES);
-	if (free_pages < FREE_PAGE_NUMBER) {
-		pr_debug("PM: free some memory\n");
-		shrink_all_memory(FREE_PAGE_NUMBER - free_pages);
-		if (nr_free_pages() < FREE_PAGE_NUMBER) {
-			error = -ENOMEM;
-			printk(KERN_ERR "PM: No enough memory\n");
-		}
-	}
+	error = suspend_freeze_processes();
 	if (!error)
 		return 0;
 
- Thaw:
 	suspend_thaw_processes();
 	usermodehelper_enable();
  Finish:
Index: linux-2.6/mm/vmscan.c
===================================================================
--- linux-2.6.orig/mm/vmscan.c
+++ linux-2.6/mm/vmscan.c
@@ -2054,7 +2054,7 @@ unsigned long global_lru_pages(void)
 		+ global_page_state(NR_INACTIVE_FILE);
 }
 
-#ifdef CONFIG_PM
+#ifdef CONFIG_HIBERNATION
 /*
  * Helper function for shrink_all_memory().  Tries to reclaim 'nr_pages' pages
  * from LRU lists system-wide, for given pass and priority.
@@ -2194,7 +2194,7 @@ out:
 
 	return sc.nr_reclaimed;
 }
-#endif
+#endif /* CONFIG_HIBERNATION */
 
 /* It's optimal to keep kswapds on the same CPUs as their memory, but
    not required for correctness.  So if the last cpu in a node goes


^ permalink raw reply	[flat|nested] 205+ messages in thread

* [RFC][PATCH 3/5] PM/Hibernate: Move memory shrinking to snapshot.c (rev. 2)
  2009-05-07 21:48   ` Rafael J. Wysocki
@ 2009-05-07 21:51     ` Rafael J. Wysocki
  -1 siblings, 0 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-07 21:51 UTC (permalink / raw)
  To: pm list
  Cc: Wu Fengguang, Andrew Morton, LKML, Pavel Machek,
	Nigel Cunningham, David Rientjes, linux-mm

From: Rafael J. Wysocki <rjw@sisk.pl>

The next patch is going to modify the memory shrinking code so that
it will use memory allocations to free memory instead of relying on an
artificial memory shrinking mechanism for that.  For this purpose it
is convenient to move swsusp_shrink_memory() from
kernel/power/swsusp.c to kernel/power/snapshot.c, because the new
memory-shrinking code is going to use things that are local to
kernel/power/snapshot.c .

[rev. 2: Make some functions static and remove their headers from
 kernel/power/power.h]

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
---
 kernel/power/power.h    |    4 --
 kernel/power/snapshot.c |   80 ++++++++++++++++++++++++++++++++++++++++++++++--
 kernel/power/swsusp.c   |   76 ---------------------------------------------
 3 files changed, 79 insertions(+), 81 deletions(-)

Index: linux-2.6/kernel/power/snapshot.c
===================================================================
--- linux-2.6.orig/kernel/power/snapshot.c
+++ linux-2.6/kernel/power/snapshot.c
@@ -39,6 +39,14 @@ static int swsusp_page_is_free(struct pa
 static void swsusp_set_page_forbidden(struct page *);
 static void swsusp_unset_page_forbidden(struct page *);
 
+/*
+ * Preferred image size in bytes (tunable via /sys/power/image_size).
+ * When it is set to N, swsusp will do its best to ensure the image
+ * size will not exceed N bytes, but if that is impossible, it will
+ * try to create the smallest image possible.
+ */
+unsigned long image_size = 500 * 1024 * 1024;
+
 /* List of PBEs needed for restoring the pages that were allocated before
  * the suspend and included in the suspend image, but have also been
  * allocated by the "resume" kernel, so their contents cannot be written
@@ -840,7 +848,7 @@ static struct page *saveable_highmem_pag
  *	pages.
  */
 
-unsigned int count_highmem_pages(void)
+static unsigned int count_highmem_pages(void)
 {
 	struct zone *zone;
 	unsigned int n = 0;
@@ -902,7 +910,7 @@ static struct page *saveable_page(struct
  *	pages.
  */
 
-unsigned int count_data_pages(void)
+static unsigned int count_data_pages(void)
 {
 	struct zone *zone;
 	unsigned long pfn, max_zone_pfn;
@@ -1058,6 +1066,74 @@ void swsusp_free(void)
 	buffer = NULL;
 }
 
+/**
+ *	swsusp_shrink_memory -  Try to free as much memory as needed
+ *
+ *	... but do not OOM-kill anyone
+ *
+ *	Notice: all userland should be stopped before it is called, or
+ *	livelock is possible.
+ */
+
+#define SHRINK_BITE	10000
+static inline unsigned long __shrink_memory(long tmp)
+{
+	if (tmp > SHRINK_BITE)
+		tmp = SHRINK_BITE;
+	return shrink_all_memory(tmp);
+}
+
+int swsusp_shrink_memory(void)
+{
+	long tmp;
+	struct zone *zone;
+	unsigned long pages = 0;
+	unsigned int i = 0;
+	char *p = "-\\|/";
+	struct timeval start, stop;
+
+	printk(KERN_INFO "PM: Shrinking memory...  ");
+	do_gettimeofday(&start);
+	do {
+		long size, highmem_size;
+
+		highmem_size = count_highmem_pages();
+		size = count_data_pages() + PAGES_FOR_IO + SPARE_PAGES;
+		tmp = size;
+		size += highmem_size;
+		for_each_populated_zone(zone) {
+			tmp += snapshot_additional_pages(zone);
+			if (is_highmem(zone)) {
+				highmem_size -=
+					zone_page_state(zone, NR_FREE_PAGES);
+			} else {
+				tmp -= zone_page_state(zone, NR_FREE_PAGES);
+				tmp += zone->lowmem_reserve[ZONE_NORMAL];
+			}
+		}
+
+		if (highmem_size < 0)
+			highmem_size = 0;
+
+		tmp += highmem_size;
+		if (tmp > 0) {
+			tmp = __shrink_memory(tmp);
+			if (!tmp)
+				return -ENOMEM;
+			pages += tmp;
+		} else if (size > image_size / PAGE_SIZE) {
+			tmp = __shrink_memory(size - (image_size / PAGE_SIZE));
+			pages += tmp;
+		}
+		printk("\b%c", p[i++%4]);
+	} while (tmp > 0);
+	do_gettimeofday(&stop);
+	printk("\bdone (%lu pages freed)\n", pages);
+	swsusp_show_speed(&start, &stop, pages, "Freed");
+
+	return 0;
+}
+
 #ifdef CONFIG_HIGHMEM
 /**
   *	count_pages_for_highmem - compute the number of non-highmem pages
Index: linux-2.6/kernel/power/swsusp.c
===================================================================
--- linux-2.6.orig/kernel/power/swsusp.c
+++ linux-2.6/kernel/power/swsusp.c
@@ -55,14 +55,6 @@
 
 #include "power.h"
 
-/*
- * Preferred image size in bytes (tunable via /sys/power/image_size).
- * When it is set to N, swsusp will do its best to ensure the image
- * size will not exceed N bytes, but if that is impossible, it will
- * try to create the smallest image possible.
- */
-unsigned long image_size = 500 * 1024 * 1024;
-
 int in_suspend __nosavedata = 0;
 
 /**
@@ -195,74 +187,6 @@ void swsusp_show_speed(struct timeval *s
 			kps / 1000, (kps % 1000) / 10);
 }
 
-/**
- *	swsusp_shrink_memory -  Try to free as much memory as needed
- *
- *	... but do not OOM-kill anyone
- *
- *	Notice: all userland should be stopped before it is called, or
- *	livelock is possible.
- */
-
-#define SHRINK_BITE	10000
-static inline unsigned long __shrink_memory(long tmp)
-{
-	if (tmp > SHRINK_BITE)
-		tmp = SHRINK_BITE;
-	return shrink_all_memory(tmp);
-}
-
-int swsusp_shrink_memory(void)
-{
-	long tmp;
-	struct zone *zone;
-	unsigned long pages = 0;
-	unsigned int i = 0;
-	char *p = "-\\|/";
-	struct timeval start, stop;
-
-	printk(KERN_INFO "PM: Shrinking memory...  ");
-	do_gettimeofday(&start);
-	do {
-		long size, highmem_size;
-
-		highmem_size = count_highmem_pages();
-		size = count_data_pages() + PAGES_FOR_IO + SPARE_PAGES;
-		tmp = size;
-		size += highmem_size;
-		for_each_populated_zone(zone) {
-			tmp += snapshot_additional_pages(zone);
-			if (is_highmem(zone)) {
-				highmem_size -=
-					zone_page_state(zone, NR_FREE_PAGES);
-			} else {
-				tmp -= zone_page_state(zone, NR_FREE_PAGES);
-				tmp += zone->lowmem_reserve[ZONE_NORMAL];
-			}
-		}
-
-		if (highmem_size < 0)
-			highmem_size = 0;
-
-		tmp += highmem_size;
-		if (tmp > 0) {
-			tmp = __shrink_memory(tmp);
-			if (!tmp)
-				return -ENOMEM;
-			pages += tmp;
-		} else if (size > image_size / PAGE_SIZE) {
-			tmp = __shrink_memory(size - (image_size / PAGE_SIZE));
-			pages += tmp;
-		}
-		printk("\b%c", p[i++%4]);
-	} while (tmp > 0);
-	do_gettimeofday(&stop);
-	printk("\bdone (%lu pages freed)\n", pages);
-	swsusp_show_speed(&start, &stop, pages, "Freed");
-
-	return 0;
-}
-
 /*
  * Platforms, like ACPI, may want us to save some memory used by them during
  * hibernation and to restore the contents of this memory during the subsequent
Index: linux-2.6/kernel/power/power.h
===================================================================
--- linux-2.6.orig/kernel/power/power.h
+++ linux-2.6/kernel/power/power.h
@@ -74,7 +74,7 @@ extern asmlinkage int swsusp_arch_resume
 
 extern int create_basic_memory_bitmaps(void);
 extern void free_basic_memory_bitmaps(void);
-extern unsigned int count_data_pages(void);
+extern int swsusp_shrink_memory(void);
 
 /**
  *	Auxiliary structure used for reading the snapshot image data and
@@ -149,7 +149,6 @@ extern int swsusp_swap_in_use(void);
 
 /* kernel/power/disk.c */
 extern int swsusp_check(void);
-extern int swsusp_shrink_memory(void);
 extern void swsusp_free(void);
 extern int swsusp_read(unsigned int *flags_p);
 extern int swsusp_write(unsigned int flags);
@@ -176,7 +175,6 @@ extern int pm_notifier_call_chain(unsign
 #endif
 
 #ifdef CONFIG_HIGHMEM
-unsigned int count_highmem_pages(void);
 int restore_highmem(void);
 #else
 static inline unsigned int count_highmem_pages(void) { return 0; }


^ permalink raw reply	[flat|nested] 205+ messages in thread
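
The sizing loop being moved above is easier to follow outside of the
kernel sources.  The following is a minimal user-space model of it,
with invented numbers for the machine state, assumed values standing
in for PAGES_FOR_IO and SPARE_PAGES, highmem handling omitted, and a
toy reclaim function playing the role of shrink_all_memory(); it is a
sketch of the control flow, not kernel code.

#include <stdio.h>

#define PAGE_SIZE	4096UL
#define PAGES_FOR_IO	1024L	/* assumed value */
#define SPARE_PAGES	128L	/* assumed value */
#define SHRINK_BITE	10000L

static long reclaimable = 120000;	/* toy pool that reclaim can free from */
static long saveable = 200000;		/* pages that would go into the image */
static long free_pages = 30000;		/* currently free page frames */
static long meta_pages = 2000;		/* snapshot_additional_pages() stand-in */

/* Plays the role of shrink_all_memory(): frees up to 'want' pages. */
static long toy_shrink(long want)
{
	long freed = want < reclaimable ? want : reclaimable;

	reclaimable -= freed;
	saveable -= freed;
	free_pages += freed;
	return freed;
}

int main(void)
{
	unsigned long image_size = 500 * 1024 * 1024;	/* /sys/power/image_size */
	long tmp;

	do {
		/* Shortfall of free frames needed to copy every saveable page. */
		tmp = saveable + PAGES_FOR_IO + SPARE_PAGES + meta_pages - free_pages;
		if (tmp > 0) {
			tmp = toy_shrink(tmp > SHRINK_BITE ? SHRINK_BITE : tmp);
			if (!tmp)
				return 1;	/* -ENOMEM in the kernel */
		} else if (saveable > (long)(image_size / PAGE_SIZE)) {
			/* Image still above image_size: keep applying pressure. */
			tmp = toy_shrink(saveable - (long)(image_size / PAGE_SIZE));
		}
		printf("saveable=%ld free=%ld\n", saveable, free_pages);
	} while (tmp > 0);

	return 0;
}

The model only demonstrates that the loop reclaims in SHRINK_BITE-sized
steps until there is room for a copy of every saveable page and the
image fits under image_size, which is exactly the behaviour the later
patches in this series replace with preallocation.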

* [RFC][PATCH 3/5] PM/Hibernate: Move memory shrinking to snapshot.c (rev. 2)
  2009-05-07 21:48   ` Rafael J. Wysocki
                     ` (5 preceding siblings ...)
  (?)
@ 2009-05-07 21:51   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-07 21:51 UTC (permalink / raw)
  To: pm list; +Cc: LKML, linux-mm, David Rientjes, Andrew Morton, Wu Fengguang

From: Rafael J. Wysocki <rjw@sisk.pl>

The next patch is going to modify the memory shrinking code so that
it will apply memory pressure by making memory allocations instead of
using an artificial memory shrinking mechanism for that.  For this
purpose it is convenient to move swsusp_shrink_memory() from
kernel/power/swsusp.c to kernel/power/snapshot.c, because the new
memory-shrinking code is going to use things that are local to
kernel/power/snapshot.c .

[rev. 2: Make some functions static and remove their headers from
 kernel/power/power.h]

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
---
 kernel/power/power.h    |    4 --
 kernel/power/snapshot.c |   80 ++++++++++++++++++++++++++++++++++++++++++++++--
 kernel/power/swsusp.c   |   76 ---------------------------------------------
 3 files changed, 79 insertions(+), 81 deletions(-)

Index: linux-2.6/kernel/power/snapshot.c
===================================================================
--- linux-2.6.orig/kernel/power/snapshot.c
+++ linux-2.6/kernel/power/snapshot.c
@@ -39,6 +39,14 @@ static int swsusp_page_is_free(struct pa
 static void swsusp_set_page_forbidden(struct page *);
 static void swsusp_unset_page_forbidden(struct page *);
 
+/*
+ * Preferred image size in bytes (tunable via /sys/power/image_size).
+ * When it is set to N, swsusp will do its best to ensure the image
+ * size will not exceed N bytes, but if that is impossible, it will
+ * try to create the smallest image possible.
+ */
+unsigned long image_size = 500 * 1024 * 1024;
+
 /* List of PBEs needed for restoring the pages that were allocated before
  * the suspend and included in the suspend image, but have also been
  * allocated by the "resume" kernel, so their contents cannot be written
@@ -840,7 +848,7 @@ static struct page *saveable_highmem_pag
  *	pages.
  */
 
-unsigned int count_highmem_pages(void)
+static unsigned int count_highmem_pages(void)
 {
 	struct zone *zone;
 	unsigned int n = 0;
@@ -902,7 +910,7 @@ static struct page *saveable_page(struct
  *	pages.
  */
 
-unsigned int count_data_pages(void)
+static unsigned int count_data_pages(void)
 {
 	struct zone *zone;
 	unsigned long pfn, max_zone_pfn;
@@ -1058,6 +1066,74 @@ void swsusp_free(void)
 	buffer = NULL;
 }
 
+/**
+ *	swsusp_shrink_memory -  Try to free as much memory as needed
+ *
+ *	... but do not OOM-kill anyone
+ *
+ *	Notice: all userland should be stopped before it is called, or
+ *	livelock is possible.
+ */
+
+#define SHRINK_BITE	10000
+static inline unsigned long __shrink_memory(long tmp)
+{
+	if (tmp > SHRINK_BITE)
+		tmp = SHRINK_BITE;
+	return shrink_all_memory(tmp);
+}
+
+int swsusp_shrink_memory(void)
+{
+	long tmp;
+	struct zone *zone;
+	unsigned long pages = 0;
+	unsigned int i = 0;
+	char *p = "-\\|/";
+	struct timeval start, stop;
+
+	printk(KERN_INFO "PM: Shrinking memory...  ");
+	do_gettimeofday(&start);
+	do {
+		long size, highmem_size;
+
+		highmem_size = count_highmem_pages();
+		size = count_data_pages() + PAGES_FOR_IO + SPARE_PAGES;
+		tmp = size;
+		size += highmem_size;
+		for_each_populated_zone(zone) {
+			tmp += snapshot_additional_pages(zone);
+			if (is_highmem(zone)) {
+				highmem_size -=
+					zone_page_state(zone, NR_FREE_PAGES);
+			} else {
+				tmp -= zone_page_state(zone, NR_FREE_PAGES);
+				tmp += zone->lowmem_reserve[ZONE_NORMAL];
+			}
+		}
+
+		if (highmem_size < 0)
+			highmem_size = 0;
+
+		tmp += highmem_size;
+		if (tmp > 0) {
+			tmp = __shrink_memory(tmp);
+			if (!tmp)
+				return -ENOMEM;
+			pages += tmp;
+		} else if (size > image_size / PAGE_SIZE) {
+			tmp = __shrink_memory(size - (image_size / PAGE_SIZE));
+			pages += tmp;
+		}
+		printk("\b%c", p[i++%4]);
+	} while (tmp > 0);
+	do_gettimeofday(&stop);
+	printk("\bdone (%lu pages freed)\n", pages);
+	swsusp_show_speed(&start, &stop, pages, "Freed");
+
+	return 0;
+}
+
 #ifdef CONFIG_HIGHMEM
 /**
   *	count_pages_for_highmem - compute the number of non-highmem pages
Index: linux-2.6/kernel/power/swsusp.c
===================================================================
--- linux-2.6.orig/kernel/power/swsusp.c
+++ linux-2.6/kernel/power/swsusp.c
@@ -55,14 +55,6 @@
 
 #include "power.h"
 
-/*
- * Preferred image size in bytes (tunable via /sys/power/image_size).
- * When it is set to N, swsusp will do its best to ensure the image
- * size will not exceed N bytes, but if that is impossible, it will
- * try to create the smallest image possible.
- */
-unsigned long image_size = 500 * 1024 * 1024;
-
 int in_suspend __nosavedata = 0;
 
 /**
@@ -195,74 +187,6 @@ void swsusp_show_speed(struct timeval *s
 			kps / 1000, (kps % 1000) / 10);
 }
 
-/**
- *	swsusp_shrink_memory -  Try to free as much memory as needed
- *
- *	... but do not OOM-kill anyone
- *
- *	Notice: all userland should be stopped before it is called, or
- *	livelock is possible.
- */
-
-#define SHRINK_BITE	10000
-static inline unsigned long __shrink_memory(long tmp)
-{
-	if (tmp > SHRINK_BITE)
-		tmp = SHRINK_BITE;
-	return shrink_all_memory(tmp);
-}
-
-int swsusp_shrink_memory(void)
-{
-	long tmp;
-	struct zone *zone;
-	unsigned long pages = 0;
-	unsigned int i = 0;
-	char *p = "-\\|/";
-	struct timeval start, stop;
-
-	printk(KERN_INFO "PM: Shrinking memory...  ");
-	do_gettimeofday(&start);
-	do {
-		long size, highmem_size;
-
-		highmem_size = count_highmem_pages();
-		size = count_data_pages() + PAGES_FOR_IO + SPARE_PAGES;
-		tmp = size;
-		size += highmem_size;
-		for_each_populated_zone(zone) {
-			tmp += snapshot_additional_pages(zone);
-			if (is_highmem(zone)) {
-				highmem_size -=
-					zone_page_state(zone, NR_FREE_PAGES);
-			} else {
-				tmp -= zone_page_state(zone, NR_FREE_PAGES);
-				tmp += zone->lowmem_reserve[ZONE_NORMAL];
-			}
-		}
-
-		if (highmem_size < 0)
-			highmem_size = 0;
-
-		tmp += highmem_size;
-		if (tmp > 0) {
-			tmp = __shrink_memory(tmp);
-			if (!tmp)
-				return -ENOMEM;
-			pages += tmp;
-		} else if (size > image_size / PAGE_SIZE) {
-			tmp = __shrink_memory(size - (image_size / PAGE_SIZE));
-			pages += tmp;
-		}
-		printk("\b%c", p[i++%4]);
-	} while (tmp > 0);
-	do_gettimeofday(&stop);
-	printk("\bdone (%lu pages freed)\n", pages);
-	swsusp_show_speed(&start, &stop, pages, "Freed");
-
-	return 0;
-}
-
 /*
  * Platforms, like ACPI, may want us to save some memory used by them during
  * hibernation and to restore the contents of this memory during the subsequent
Index: linux-2.6/kernel/power/power.h
===================================================================
--- linux-2.6.orig/kernel/power/power.h
+++ linux-2.6/kernel/power/power.h
@@ -74,7 +74,7 @@ extern asmlinkage int swsusp_arch_resume
 
 extern int create_basic_memory_bitmaps(void);
 extern void free_basic_memory_bitmaps(void);
-extern unsigned int count_data_pages(void);
+extern int swsusp_shrink_memory(void);
 
 /**
  *	Auxiliary structure used for reading the snapshot image data and
@@ -149,7 +149,6 @@ extern int swsusp_swap_in_use(void);
 
 /* kernel/power/disk.c */
 extern int swsusp_check(void);
-extern int swsusp_shrink_memory(void);
 extern void swsusp_free(void);
 extern int swsusp_read(unsigned int *flags_p);
 extern int swsusp_write(unsigned int flags);
@@ -176,7 +175,6 @@ extern int pm_notifier_call_chain(unsign
 #endif
 
 #ifdef CONFIG_HIGHMEM
-unsigned int count_highmem_pages(void);
 int restore_highmem(void);
 #else
 static inline unsigned int count_highmem_pages(void) { return 0; }

^ permalink raw reply	[flat|nested] 205+ messages in thread

* [RFC][PATCH 3/5] PM/Hibernate: Move memory shrinking to snapshot.c (rev. 2)
@ 2009-05-07 21:51     ` Rafael J. Wysocki
  0 siblings, 0 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-07 21:51 UTC (permalink / raw)
  To: pm list
  Cc: Wu Fengguang, Andrew Morton, LKML, Pavel Machek,
	Nigel Cunningham, David Rientjes, linux-mm

From: Rafael J. Wysocki <rjw@sisk.pl>

The next patch is going to modify the memory shrinking code so that
it will apply memory pressure by making memory allocations instead of
using an artificial memory shrinking mechanism for that.  For this
purpose it is convenient to move swsusp_shrink_memory() from
kernel/power/swsusp.c to kernel/power/snapshot.c, because the new
memory-shrinking code is going to use things that are local to
kernel/power/snapshot.c .

[rev. 2: Make some functions static and remove their headers from
 kernel/power/power.h]

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
---
 kernel/power/power.h    |    4 --
 kernel/power/snapshot.c |   80 ++++++++++++++++++++++++++++++++++++++++++++++--
 kernel/power/swsusp.c   |   76 ---------------------------------------------
 3 files changed, 79 insertions(+), 81 deletions(-)

Index: linux-2.6/kernel/power/snapshot.c
===================================================================
--- linux-2.6.orig/kernel/power/snapshot.c
+++ linux-2.6/kernel/power/snapshot.c
@@ -39,6 +39,14 @@ static int swsusp_page_is_free(struct pa
 static void swsusp_set_page_forbidden(struct page *);
 static void swsusp_unset_page_forbidden(struct page *);
 
+/*
+ * Preferred image size in bytes (tunable via /sys/power/image_size).
+ * When it is set to N, swsusp will do its best to ensure the image
+ * size will not exceed N bytes, but if that is impossible, it will
+ * try to create the smallest image possible.
+ */
+unsigned long image_size = 500 * 1024 * 1024;
+
 /* List of PBEs needed for restoring the pages that were allocated before
  * the suspend and included in the suspend image, but have also been
  * allocated by the "resume" kernel, so their contents cannot be written
@@ -840,7 +848,7 @@ static struct page *saveable_highmem_pag
  *	pages.
  */
 
-unsigned int count_highmem_pages(void)
+static unsigned int count_highmem_pages(void)
 {
 	struct zone *zone;
 	unsigned int n = 0;
@@ -902,7 +910,7 @@ static struct page *saveable_page(struct
  *	pages.
  */
 
-unsigned int count_data_pages(void)
+static unsigned int count_data_pages(void)
 {
 	struct zone *zone;
 	unsigned long pfn, max_zone_pfn;
@@ -1058,6 +1066,74 @@ void swsusp_free(void)
 	buffer = NULL;
 }
 
+/**
+ *	swsusp_shrink_memory -  Try to free as much memory as needed
+ *
+ *	... but do not OOM-kill anyone
+ *
+ *	Notice: all userland should be stopped before it is called, or
+ *	livelock is possible.
+ */
+
+#define SHRINK_BITE	10000
+static inline unsigned long __shrink_memory(long tmp)
+{
+	if (tmp > SHRINK_BITE)
+		tmp = SHRINK_BITE;
+	return shrink_all_memory(tmp);
+}
+
+int swsusp_shrink_memory(void)
+{
+	long tmp;
+	struct zone *zone;
+	unsigned long pages = 0;
+	unsigned int i = 0;
+	char *p = "-\\|/";
+	struct timeval start, stop;
+
+	printk(KERN_INFO "PM: Shrinking memory...  ");
+	do_gettimeofday(&start);
+	do {
+		long size, highmem_size;
+
+		highmem_size = count_highmem_pages();
+		size = count_data_pages() + PAGES_FOR_IO + SPARE_PAGES;
+		tmp = size;
+		size += highmem_size;
+		for_each_populated_zone(zone) {
+			tmp += snapshot_additional_pages(zone);
+			if (is_highmem(zone)) {
+				highmem_size -=
+					zone_page_state(zone, NR_FREE_PAGES);
+			} else {
+				tmp -= zone_page_state(zone, NR_FREE_PAGES);
+				tmp += zone->lowmem_reserve[ZONE_NORMAL];
+			}
+		}
+
+		if (highmem_size < 0)
+			highmem_size = 0;
+
+		tmp += highmem_size;
+		if (tmp > 0) {
+			tmp = __shrink_memory(tmp);
+			if (!tmp)
+				return -ENOMEM;
+			pages += tmp;
+		} else if (size > image_size / PAGE_SIZE) {
+			tmp = __shrink_memory(size - (image_size / PAGE_SIZE));
+			pages += tmp;
+		}
+		printk("\b%c", p[i++%4]);
+	} while (tmp > 0);
+	do_gettimeofday(&stop);
+	printk("\bdone (%lu pages freed)\n", pages);
+	swsusp_show_speed(&start, &stop, pages, "Freed");
+
+	return 0;
+}
+
 #ifdef CONFIG_HIGHMEM
 /**
   *	count_pages_for_highmem - compute the number of non-highmem pages
Index: linux-2.6/kernel/power/swsusp.c
===================================================================
--- linux-2.6.orig/kernel/power/swsusp.c
+++ linux-2.6/kernel/power/swsusp.c
@@ -55,14 +55,6 @@
 
 #include "power.h"
 
-/*
- * Preferred image size in bytes (tunable via /sys/power/image_size).
- * When it is set to N, swsusp will do its best to ensure the image
- * size will not exceed N bytes, but if that is impossible, it will
- * try to create the smallest image possible.
- */
-unsigned long image_size = 500 * 1024 * 1024;
-
 int in_suspend __nosavedata = 0;
 
 /**
@@ -195,74 +187,6 @@ void swsusp_show_speed(struct timeval *s
 			kps / 1000, (kps % 1000) / 10);
 }
 
-/**
- *	swsusp_shrink_memory -  Try to free as much memory as needed
- *
- *	... but do not OOM-kill anyone
- *
- *	Notice: all userland should be stopped before it is called, or
- *	livelock is possible.
- */
-
-#define SHRINK_BITE	10000
-static inline unsigned long __shrink_memory(long tmp)
-{
-	if (tmp > SHRINK_BITE)
-		tmp = SHRINK_BITE;
-	return shrink_all_memory(tmp);
-}
-
-int swsusp_shrink_memory(void)
-{
-	long tmp;
-	struct zone *zone;
-	unsigned long pages = 0;
-	unsigned int i = 0;
-	char *p = "-\\|/";
-	struct timeval start, stop;
-
-	printk(KERN_INFO "PM: Shrinking memory...  ");
-	do_gettimeofday(&start);
-	do {
-		long size, highmem_size;
-
-		highmem_size = count_highmem_pages();
-		size = count_data_pages() + PAGES_FOR_IO + SPARE_PAGES;
-		tmp = size;
-		size += highmem_size;
-		for_each_populated_zone(zone) {
-			tmp += snapshot_additional_pages(zone);
-			if (is_highmem(zone)) {
-				highmem_size -=
-					zone_page_state(zone, NR_FREE_PAGES);
-			} else {
-				tmp -= zone_page_state(zone, NR_FREE_PAGES);
-				tmp += zone->lowmem_reserve[ZONE_NORMAL];
-			}
-		}
-
-		if (highmem_size < 0)
-			highmem_size = 0;
-
-		tmp += highmem_size;
-		if (tmp > 0) {
-			tmp = __shrink_memory(tmp);
-			if (!tmp)
-				return -ENOMEM;
-			pages += tmp;
-		} else if (size > image_size / PAGE_SIZE) {
-			tmp = __shrink_memory(size - (image_size / PAGE_SIZE));
-			pages += tmp;
-		}
-		printk("\b%c", p[i++%4]);
-	} while (tmp > 0);
-	do_gettimeofday(&stop);
-	printk("\bdone (%lu pages freed)\n", pages);
-	swsusp_show_speed(&start, &stop, pages, "Freed");
-
-	return 0;
-}
-
 /*
  * Platforms, like ACPI, may want us to save some memory used by them during
  * hibernation and to restore the contents of this memory during the subsequent
Index: linux-2.6/kernel/power/power.h
===================================================================
--- linux-2.6.orig/kernel/power/power.h
+++ linux-2.6/kernel/power/power.h
@@ -74,7 +74,7 @@ extern asmlinkage int swsusp_arch_resume
 
 extern int create_basic_memory_bitmaps(void);
 extern void free_basic_memory_bitmaps(void);
-extern unsigned int count_data_pages(void);
+extern int swsusp_shrink_memory(void);
 
 /**
  *	Auxiliary structure used for reading the snapshot image data and
@@ -149,7 +149,6 @@ extern int swsusp_swap_in_use(void);
 
 /* kernel/power/disk.c */
 extern int swsusp_check(void);
-extern int swsusp_shrink_memory(void);
 extern void swsusp_free(void);
 extern int swsusp_read(unsigned int *flags_p);
 extern int swsusp_write(unsigned int flags);
@@ -176,7 +175,6 @@ extern int pm_notifier_call_chain(unsign
 #endif
 
 #ifdef CONFIG_HIGHMEM
-unsigned int count_highmem_pages(void);
 int restore_highmem(void);
 #else
 static inline unsigned int count_highmem_pages(void) { return 0; }


^ permalink raw reply	[flat|nested] 205+ messages in thread

* [RFC][PATCH 4/5] PM/Hibernate: Rework shrinking of memory
  2009-05-07 21:48   ` Rafael J. Wysocki
@ 2009-05-07 21:53     ` Rafael J. Wysocki
  -1 siblings, 0 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-07 21:53 UTC (permalink / raw)
  To: pm list
  Cc: Wu Fengguang, Andrew Morton, LKML, Pavel Machek,
	Nigel Cunningham, David Rientjes, linux-mm

From: Rafael J. Wysocki <rjw@sisk.pl>

Rework swsusp_shrink_memory() so that it calls shrink_all_memory()
just once to make some room for the image and then allocates memory
to apply more pressure to the memory management subsystem, if
necessary.

Unfortunately, we don't seem to be able to drop shrink_all_memory()
entirely just yet, because that would lead to huge performance
regressions in some test cases.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 kernel/power/snapshot.c |  144 ++++++++++++++++++++++++++++++++----------------
 1 file changed, 97 insertions(+), 47 deletions(-)

Index: linux-2.6/kernel/power/snapshot.c
===================================================================
--- linux-2.6.orig/kernel/power/snapshot.c
+++ linux-2.6/kernel/power/snapshot.c
@@ -1066,69 +1066,119 @@ void swsusp_free(void)
 	buffer = NULL;
 }
 
+/* Helper functions used for the shrinking of memory. */
+
+#define GFP_IMAGE	(GFP_KERNEL | __GFP_NOWARN | __GFP_NO_OOM_KILL)
+
 /**
- *	swsusp_shrink_memory -  Try to free as much memory as needed
- *
- *	... but do not OOM-kill anyone
+ * preallocate_image_memory - Allocate given number of page frames
+ * @nr_pages: Number of page frames to allocate
  *
- *	Notice: all userland should be stopped before it is called, or
- *	livelock is possible.
+ * Return value: Number of page frames actually allocated
  */
-
-#define SHRINK_BITE	10000
-static inline unsigned long __shrink_memory(long tmp)
+static unsigned long preallocate_image_memory(unsigned long nr_pages)
 {
-	if (tmp > SHRINK_BITE)
-		tmp = SHRINK_BITE;
-	return shrink_all_memory(tmp);
+	unsigned long nr_alloc = 0;
+
+	while (nr_pages > 0) {
+		if (!alloc_image_page(GFP_IMAGE))
+			break;
+		nr_pages--;
+		nr_alloc++;
+	}
+
+	return nr_alloc;
 }
 
+/**
+ * swsusp_shrink_memory -  Make the kernel release as much memory as needed
+ *
+ * To create a hibernation image it is necessary to make a copy of every page
+ * frame in use.  We also need a number of page frames to be free during
+ * hibernation for allocations made while saving the image and for device
+ * drivers, in case they need to allocate memory from their hibernation
+ * callbacks (these two numbers are given by PAGES_FOR_IO and SPARE_PAGES,
+ * respectively, both of which are rough estimates).  To make this happen, we
+ * compute the total number of available page frames and allocate at least
+ *
+ * ([page frames total] + PAGES_FOR_IO + [metadata pages]) / 2 + 2 * SPARE_PAGES
+ *
+ * of them, which corresponds to the maximum size of a hibernation image.
+ *
+ * If image_size is set below the number following from the above formula,
+ * the preallocation of memory is continued until the total number of page
+ * frames in use is below the requested image size or it is impossible to
+ * allocate more memory, whichever happens first.
+ */
 int swsusp_shrink_memory(void)
 {
-	long tmp;
 	struct zone *zone;
-	unsigned long pages = 0;
-	unsigned int i = 0;
-	char *p = "-\\|/";
+	unsigned long saveable, size, max_size, count, pages = 0;
 	struct timeval start, stop;
+	int error = 0;
 
-	printk(KERN_INFO "PM: Shrinking memory...  ");
+	printk(KERN_INFO "PM: Shrinking memory ... ");
 	do_gettimeofday(&start);
-	do {
-		long size, highmem_size;
 
-		highmem_size = count_highmem_pages();
-		size = count_data_pages() + PAGES_FOR_IO + SPARE_PAGES;
-		tmp = size;
-		size += highmem_size;
-		for_each_populated_zone(zone) {
-			tmp += snapshot_additional_pages(zone);
-			if (is_highmem(zone)) {
-				highmem_size -=
-					zone_page_state(zone, NR_FREE_PAGES);
-			} else {
-				tmp -= zone_page_state(zone, NR_FREE_PAGES);
-				tmp += zone->lowmem_reserve[ZONE_NORMAL];
-			}
-		}
+	/* Count the number of saveable data pages. */
+	saveable = count_data_pages() + count_highmem_pages();
 
-		if (highmem_size < 0)
-			highmem_size = 0;
+	/*
+	 * Compute the total number of page frames we can use (count) and the
+	 * number of pages needed for image metadata (size).
+	 */
+	count = saveable;
+	size = 0;
+	for_each_populated_zone(zone) {
+		size += snapshot_additional_pages(zone);
+		count += zone_page_state(zone, NR_FREE_PAGES);
+		count -= zone->pages_min;
+	}
 
-		tmp += highmem_size;
-		if (tmp > 0) {
-			tmp = __shrink_memory(tmp);
-			if (!tmp)
-				return -ENOMEM;
-			pages += tmp;
-		} else if (size > image_size / PAGE_SIZE) {
-			tmp = __shrink_memory(size - (image_size / PAGE_SIZE));
-			pages += tmp;
-		}
-		printk("\b%c", p[i++%4]);
-	} while (tmp > 0);
+	/* Compute the maximum number of saveable pages to leave in memory. */
+	max_size = (count - (size + PAGES_FOR_IO)) / 2 - 2 * SPARE_PAGES;
+	size = DIV_ROUND_UP(image_size, PAGE_SIZE);
+	if (size > max_size)
+		size = max_size;
+	/*
+	 * If the maximum is not lesser than the current number of saveable
+	 * pages in memory, we don't need to do anything more.
+	 */
+	if (size >= saveable)
+		goto out;
+
+	/*
+	 * Let the memory management subsystem know that we're going to need a
+	 * large number of page frames to allocate and make it free some memory.
+	 * NOTE: If this is not done, performance is heavily affected in some
+	 * test cases.
+	 */
+	shrink_all_memory(saveable - size);
+
+	/*
+	 * The number of saveable pages in memory was too high, so apply some
+	 * pressure to decrease it.  First, make room for the largest possible
+	 * image and fail if that doesn't work.  Next, try to decrease the size
+	 * of the image as much as indicated by image_size.
+	 */
+	count -= max_size;
+	pages = preallocate_image_memory(count);
+	if (pages < count)
+		error = -ENOMEM;
+	else
+		pages += preallocate_image_memory(max_size - size);
+
+	/* Release all of the preallocated page frames. */
+	swsusp_free();
+
+	if (error) {
+		printk(KERN_CONT "\n");
+		return error;
+	}
+
+ out:
 	do_gettimeofday(&stop);
-	printk("\bdone (%lu pages freed)\n", pages);
+	printk(KERN_CONT "done (preallocated %lu free pages)\n", pages);
 	swsusp_show_speed(&start, &stop, pages, "Freed");
 
 	return 0;


^ permalink raw reply	[flat|nested] 205+ messages in thread
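
The sizing formula in the kerneldoc comment above is easier to check
with concrete numbers.  Below is a small user-space sketch of the same
arithmetic using invented values; PAGES_FOR_IO and SPARE_PAGES are
assumptions, DIV_ROUND_UP is reimplemented locally, and image_size is
deliberately set to 256 MB (below the 500 MB default) so that both
preallocation phases come into play:

#include <stdio.h>

#define PAGE_SIZE	4096UL
#define PAGES_FOR_IO	1024UL	/* assumed value */
#define SPARE_PAGES	128UL	/* assumed value */
#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

int main(void)
{
	unsigned long saveable = 200000;  /* count_data_pages() + count_highmem_pages() */
	unsigned long free_pages = 50000; /* free page frames minus the pages_min sum */
	unsigned long meta = 2000;        /* snapshot_additional_pages() over all zones */
	unsigned long image_size = 256 * 1024 * 1024;	/* below the default on purpose */
	unsigned long count, size, max_size, to_alloc;

	/* Total page frames usable for the image and its metadata. */
	count = saveable + free_pages;
	size = meta;

	/* Largest number of saveable pages that can still be copied. */
	max_size = (count - (size + PAGES_FOR_IO)) / 2 - 2 * SPARE_PAGES;

	/* Honour the user-requested image size if it is smaller. */
	size = DIV_ROUND_UP(image_size, PAGE_SIZE);
	if (size > max_size)
		size = max_size;

	if (size >= saveable) {
		printf("image already fits, nothing to preallocate\n");
		return 0;
	}

	/*
	 * First preallocate enough to guarantee room for the largest
	 * possible image, then keep allocating to push the image size
	 * down towards image_size.
	 */
	to_alloc = count - max_size;
	printf("must preallocate %lu pages, then up to %lu more\n",
	       to_alloc, max_size - size);
	return 0;
}

With these numbers max_size works out to 123232 pages, the mandatory
preallocation to 126768 pages, and up to 57696 further pages may be
allocated to get the image down to the requested 256 MB.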

* [RFC][PATCH 4/5] PM/Hibernate: Rework shrinking of memory
  2009-05-07 21:48   ` Rafael J. Wysocki
                     ` (7 preceding siblings ...)
  (?)
@ 2009-05-07 21:53   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-07 21:53 UTC (permalink / raw)
  To: pm list; +Cc: LKML, linux-mm, David Rientjes, Andrew Morton, Wu Fengguang

From: Rafael J. Wysocki <rjw@sisk.pl>

Rework swsusp_shrink_memory() so that it calls shrink_all_memory()
just once to make some room for the image and then allocates memory
to apply more pressure to the memory management subsystem, if
necessary.

Unfortunately, we don't seem to be able to drop shrink_all_memory()
entirely just yet, because that would lead to huge performance
regressions in some test cases.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 kernel/power/snapshot.c |  144 ++++++++++++++++++++++++++++++++----------------
 1 file changed, 97 insertions(+), 47 deletions(-)

Index: linux-2.6/kernel/power/snapshot.c
===================================================================
--- linux-2.6.orig/kernel/power/snapshot.c
+++ linux-2.6/kernel/power/snapshot.c
@@ -1066,69 +1066,119 @@ void swsusp_free(void)
 	buffer = NULL;
 }
 
+/* Helper functions used for the shrinking of memory. */
+
+#define GFP_IMAGE	(GFP_KERNEL | __GFP_NOWARN | __GFP_NO_OOM_KILL)
+
 /**
- *	swsusp_shrink_memory -  Try to free as much memory as needed
- *
- *	... but do not OOM-kill anyone
+ * preallocate_image_memory - Allocate given number of page frames
+ * @nr_pages: Number of page frames to allocate
  *
- *	Notice: all userland should be stopped before it is called, or
- *	livelock is possible.
+ * Return value: Number of page frames actually allocated
  */
-
-#define SHRINK_BITE	10000
-static inline unsigned long __shrink_memory(long tmp)
+static unsigned long preallocate_image_memory(unsigned long nr_pages)
 {
-	if (tmp > SHRINK_BITE)
-		tmp = SHRINK_BITE;
-	return shrink_all_memory(tmp);
+	unsigned long nr_alloc = 0;
+
+	while (nr_pages > 0) {
+		if (!alloc_image_page(GFP_IMAGE))
+			break;
+		nr_pages--;
+		nr_alloc++;
+	}
+
+	return nr_alloc;
 }
 
+/**
+ * swsusp_shrink_memory -  Make the kernel release as much memory as needed
+ *
+ * To create a hibernation image it is necessary to make a copy of every page
+ * frame in use.  We also need a number of page frames to be free during
+ * hibernation for allocations made while saving the image and for device
+ * drivers, in case they need to allocate memory from their hibernation
+ * callbacks (these two numbers are given by PAGES_FOR_IO and SPARE_PAGES,
+ * respectively, both of which are rough estimates).  To make this happen, we
+ * compute the total number of available page frames and allocate at least
+ *
+ * ([page frames total] + PAGES_FOR_IO + [metadata pages]) / 2 + 2 * SPARE_PAGES
+ *
+ * of them, which corresponds to the maximum size of a hibernation image.
+ *
+ * If image_size is set below the number following from the above formula,
+ * the preallocation of memory is continued until the total number of page
+ * frames in use is below the requested image size or it is impossible to
+ * allocate more memory, whichever happens first.
+ */
 int swsusp_shrink_memory(void)
 {
-	long tmp;
 	struct zone *zone;
-	unsigned long pages = 0;
-	unsigned int i = 0;
-	char *p = "-\\|/";
+	unsigned long saveable, size, max_size, count, pages = 0;
 	struct timeval start, stop;
+	int error = 0;
 
-	printk(KERN_INFO "PM: Shrinking memory...  ");
+	printk(KERN_INFO "PM: Shrinking memory ... ");
 	do_gettimeofday(&start);
-	do {
-		long size, highmem_size;
 
-		highmem_size = count_highmem_pages();
-		size = count_data_pages() + PAGES_FOR_IO + SPARE_PAGES;
-		tmp = size;
-		size += highmem_size;
-		for_each_populated_zone(zone) {
-			tmp += snapshot_additional_pages(zone);
-			if (is_highmem(zone)) {
-				highmem_size -=
-					zone_page_state(zone, NR_FREE_PAGES);
-			} else {
-				tmp -= zone_page_state(zone, NR_FREE_PAGES);
-				tmp += zone->lowmem_reserve[ZONE_NORMAL];
-			}
-		}
+	/* Count the number of saveable data pages. */
+	saveable = count_data_pages() + count_highmem_pages();
 
-		if (highmem_size < 0)
-			highmem_size = 0;
+	/*
+	 * Compute the total number of page frames we can use (count) and the
+	 * number of pages needed for image metadata (size).
+	 */
+	count = saveable;
+	size = 0;
+	for_each_populated_zone(zone) {
+		size += snapshot_additional_pages(zone);
+		count += zone_page_state(zone, NR_FREE_PAGES);
+		count -= zone->pages_min;
+	}
 
-		tmp += highmem_size;
-		if (tmp > 0) {
-			tmp = __shrink_memory(tmp);
-			if (!tmp)
-				return -ENOMEM;
-			pages += tmp;
-		} else if (size > image_size / PAGE_SIZE) {
-			tmp = __shrink_memory(size - (image_size / PAGE_SIZE));
-			pages += tmp;
-		}
-		printk("\b%c", p[i++%4]);
-	} while (tmp > 0);
+	/* Compute the maximum number of saveable pages to leave in memory. */
+	max_size = (count - (size + PAGES_FOR_IO)) / 2 - 2 * SPARE_PAGES;
+	size = DIV_ROUND_UP(image_size, PAGE_SIZE);
+	if (size > max_size)
+		size = max_size;
+	/*
+	 * If the maximum is not lesser than the current number of saveable
+	 * pages in memory, we don't need to do anything more.
+	 */
+	if (size >= saveable)
+		goto out;
+
+	/*
+	 * Let the memory management subsystem know that we're going to need a
+	 * large number of page frames to allocate and make it free some memory.
+	 * NOTE: If this is not done, performance is heavily affected in some
+	 * test cases.
+	 */
+	shrink_all_memory(saveable - size);
+
+	/*
+	 * The number of saveable pages in memory was too high, so apply some
+	 * pressure to decrease it.  First, make room for the largest possible
+	 * image and fail if that doesn't work.  Next, try to decrease the size
+	 * of the image as much as indicated by image_size.
+	 */
+	count -= max_size;
+	pages = preallocate_image_memory(count);
+	if (pages < count)
+		error = -ENOMEM;
+	else
+		pages += preallocate_image_memory(max_size - size);
+
+	/* Release all of the preallocated page frames. */
+	swsusp_free();
+
+	if (error) {
+		printk(KERN_CONT "\n");
+		return error;
+	}
+
+ out:
 	do_gettimeofday(&stop);
-	printk("\bdone (%lu pages freed)\n", pages);
+	printk(KERN_CONT "done (preallocated %lu free pages)\n", pages);
 	swsusp_show_speed(&start, &stop, pages, "Freed");
 
 	return 0;

^ permalink raw reply	[flat|nested] 205+ messages in thread

* [RFC][PATCH 4/5] PM/Hibernate: Rework shrinking of memory
@ 2009-05-07 21:53     ` Rafael J. Wysocki
  0 siblings, 0 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-07 21:53 UTC (permalink / raw)
  To: pm list
  Cc: Wu Fengguang, Andrew Morton, LKML, Pavel Machek,
	Nigel Cunningham, David Rientjes, linux-mm

From: Rafael J. Wysocki <rjw@sisk.pl>

Rework swsusp_shrink_memory() so that it calls shrink_all_memory()
just once to make some room for the image and then allocates memory
to apply more pressure to the memory management subsystem, if
necessary.

Unfortunately, we don't seem to be able to drop shrink_all_memory()
entirely just yet, because that would lead to huge performance
regressions in some test cases.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 kernel/power/snapshot.c |  144 ++++++++++++++++++++++++++++++++----------------
 1 file changed, 97 insertions(+), 47 deletions(-)

Index: linux-2.6/kernel/power/snapshot.c
===================================================================
--- linux-2.6.orig/kernel/power/snapshot.c
+++ linux-2.6/kernel/power/snapshot.c
@@ -1066,69 +1066,119 @@ void swsusp_free(void)
 	buffer = NULL;
 }
 
+/* Helper functions used for the shrinking of memory. */
+
+#define GFP_IMAGE	(GFP_KERNEL | __GFP_NOWARN | __GFP_NO_OOM_KILL)
+
 /**
- *	swsusp_shrink_memory -  Try to free as much memory as needed
- *
- *	... but do not OOM-kill anyone
+ * preallocate_image_memory - Allocate given number of page frames
+ * @nr_pages: Number of page frames to allocate
  *
- *	Notice: all userland should be stopped before it is called, or
- *	livelock is possible.
+ * Return value: Number of page frames actually allocated
  */
-
-#define SHRINK_BITE	10000
-static inline unsigned long __shrink_memory(long tmp)
+static unsigned long preallocate_image_memory(unsigned long nr_pages)
 {
-	if (tmp > SHRINK_BITE)
-		tmp = SHRINK_BITE;
-	return shrink_all_memory(tmp);
+	unsigned long nr_alloc = 0;
+
+	while (nr_pages > 0) {
+		if (!alloc_image_page(GFP_IMAGE))
+			break;
+		nr_pages--;
+		nr_alloc++;
+	}
+
+	return nr_alloc;
 }
 
+/**
+ * swsusp_shrink_memory -  Make the kernel release as much memory as needed
+ *
+ * To create a hibernation image it is necessary to make a copy of every page
+ * frame in use.  We also need a number of page frames to be free during
+ * hibernation for allocations made while saving the image and for device
+ * drivers, in case they need to allocate memory from their hibernation
+ * callbacks (these two numbers are given by PAGES_FOR_IO and SPARE_PAGES,
+ * respectively, both of which are rough estimates).  To make this happen, we
+ * compute the total number of available page frames and allocate at least
+ *
+ * ([page frames total] + PAGES_FOR_IO + [metadata pages]) / 2 + 2 * SPARE_PAGES
+ *
+ * of them, which corresponds to the maximum size of a hibernation image.
+ *
+ * If image_size is set below the number following from the above formula,
+ * the preallocation of memory is continued until the total number of page
+ * frames in use is below the requested image size or it is impossible to
+ * allocate more memory, whichever happens first.
+ */
 int swsusp_shrink_memory(void)
 {
-	long tmp;
 	struct zone *zone;
-	unsigned long pages = 0;
-	unsigned int i = 0;
-	char *p = "-\\|/";
+	unsigned long saveable, size, max_size, count, pages = 0;
 	struct timeval start, stop;
+	int error = 0;
 
-	printk(KERN_INFO "PM: Shrinking memory...  ");
+	printk(KERN_INFO "PM: Shrinking memory ... ");
 	do_gettimeofday(&start);
-	do {
-		long size, highmem_size;
 
-		highmem_size = count_highmem_pages();
-		size = count_data_pages() + PAGES_FOR_IO + SPARE_PAGES;
-		tmp = size;
-		size += highmem_size;
-		for_each_populated_zone(zone) {
-			tmp += snapshot_additional_pages(zone);
-			if (is_highmem(zone)) {
-				highmem_size -=
-					zone_page_state(zone, NR_FREE_PAGES);
-			} else {
-				tmp -= zone_page_state(zone, NR_FREE_PAGES);
-				tmp += zone->lowmem_reserve[ZONE_NORMAL];
-			}
-		}
+	/* Count the number of saveable data pages. */
+	saveable = count_data_pages() + count_highmem_pages();
 
-		if (highmem_size < 0)
-			highmem_size = 0;
+	/*
+	 * Compute the total number of page frames we can use (count) and the
+	 * number of pages needed for image metadata (size).
+	 */
+	count = saveable;
+	size = 0;
+	for_each_populated_zone(zone) {
+		size += snapshot_additional_pages(zone);
+		count += zone_page_state(zone, NR_FREE_PAGES);
+		count -= zone->pages_min;
+	}
 
-		tmp += highmem_size;
-		if (tmp > 0) {
-			tmp = __shrink_memory(tmp);
-			if (!tmp)
-				return -ENOMEM;
-			pages += tmp;
-		} else if (size > image_size / PAGE_SIZE) {
-			tmp = __shrink_memory(size - (image_size / PAGE_SIZE));
-			pages += tmp;
-		}
-		printk("\b%c", p[i++%4]);
-	} while (tmp > 0);
+	/* Compute the maximum number of saveable pages to leave in memory. */
+	max_size = (count - (size + PAGES_FOR_IO)) / 2 - 2 * SPARE_PAGES;
+	size = DIV_ROUND_UP(image_size, PAGE_SIZE);
+	if (size > max_size)
+		size = max_size;
+	/*
+	 * If the maximum is not lesser than the current number of saveable
+	 * pages in memory, we don't need to do anything more.
+	 */
+	if (size >= saveable)
+		goto out;
+
+	/*
+	 * Let the memory management subsystem know that we're going to need a
+	 * large number of page frames to allocate and make it free some memory.
+	 * NOTE: If this is not done, performance is heavily affected in some
+	 * test cases.
+	 */
+	shrink_all_memory(saveable - size);
+
+	/*
+	 * The number of saveable pages in memory was too high, so apply some
+	 * pressure to decrease it.  First, make room for the largest possible
+	 * image and fail if that doesn't work.  Next, try to decrease the size
+	 * of the image as much as indicated by image_size.
+	 */
+	count -= max_size;
+	pages = preallocate_image_memory(count);
+	if (pages < count)
+		error = -ENOMEM;
+	else
+		pages += preallocate_image_memory(max_size - size);
+
+	/* Release all of the preallocated page frames. */
+	swsusp_free();
+
+	if (error) {
+		printk(KERN_CONT "\n");
+		return error;
+	}
+
+ out:
 	do_gettimeofday(&stop);
-	printk("\bdone (%lu pages freed)\n", pages);
+	printk(KERN_CONT "done (preallocated %lu free pages)\n", pages);
 	swsusp_show_speed(&start, &stop, pages, "Freed");
 
 	return 0;


^ permalink raw reply	[flat|nested] 205+ messages in thread

* [RFC][PATCH 5/5] PM/Hibernate: Do not release preallocated memory unnecessarily
  2009-05-07 21:48   ` Rafael J. Wysocki
@ 2009-05-07 21:55     ` Rafael J. Wysocki
  -1 siblings, 0 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-07 21:55 UTC (permalink / raw)
  To: pm list
  Cc: Wu Fengguang, Andrew Morton, LKML, Pavel Machek,
	Nigel Cunningham, David Rientjes, linux-mm

From: Rafael J. Wysocki <rjw@sisk.pl>

Since the hibernation code is now going to use allocations of memory
to make enough room for the image, it can also use the page frames
allocated at this stage as image page frames.  The low-level
hibernation code needs to be rearranged for this purpose, but it
allows us to avoid freeing a great number of pages and allocating
these same pages once again later, so it generally is worth doing.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 kernel/power/disk.c     |   15 +++-
 kernel/power/power.h    |    2 
 kernel/power/snapshot.c |  173 +++++++++++++++++++++++++++++++-----------------
 3 files changed, 124 insertions(+), 66 deletions(-)

Index: linux-2.6/kernel/power/snapshot.c
===================================================================
--- linux-2.6.orig/kernel/power/snapshot.c
+++ linux-2.6/kernel/power/snapshot.c
@@ -1033,6 +1033,25 @@ copy_data_pages(struct memory_bitmap *co
 static unsigned int nr_copy_pages;
 /* Number of pages needed for saving the original pfns of the image pages */
 static unsigned int nr_meta_pages;
+/*
+ * Numbers of normal and highmem page frames allocated for hibernation image
+ * before suspending devices.
+ */
+unsigned int alloc_normal, alloc_highmem;
+/*
+ * Memory bitmap used for marking saveable pages (during hibernation) or
+ * hibernation image pages (during restore)
+ */
+static struct memory_bitmap orig_bm;
+/*
+ * Memory bitmap used during hibernation for marking allocated page frames that
+ * will contain copies of saveable pages.  During restore it is initially used
+ * for marking hibernation image pages, but then the set bits from it are
+ * duplicated in @orig_bm and it is released.  On highmem systems it is next
+ * used for marking "safe" highmem pages, but it has to be reinitialized for
+ * this purpose.
+ */
+static struct memory_bitmap copy_bm;
 
 /**
  *	swsusp_free - free pages allocated for the suspend.
@@ -1064,6 +1083,8 @@ void swsusp_free(void)
 	nr_meta_pages = 0;
 	restore_pblist = NULL;
 	buffer = NULL;
+	alloc_normal = 0;
+	alloc_highmem = 0;
 }
 
 /* Helper functions used for the shrinking of memory. */
@@ -1081,8 +1102,16 @@ static unsigned long preallocate_image_m
 	unsigned long nr_alloc = 0;
 
 	while (nr_pages > 0) {
-		if (!alloc_image_page(GFP_IMAGE))
-			break;
+ 		struct page *page;
+
+		page = alloc_image_page(GFP_IMAGE);
+ 		if (!page)
+ 			break;
+		memory_bm_set_bit(&copy_bm, page_to_pfn(page));
+		if (PageHighMem(page))
+			alloc_highmem++;
+		else
+			alloc_normal++;
 		nr_pages--;
 		nr_alloc++;
 	}
@@ -1091,7 +1120,30 @@ static unsigned long preallocate_image_m
 }
 
 /**
- * swsusp_shrink_memory -  Make the kernel release as much memory as needed
+ * free_unnecessary_pages - Release preallocated pages not needed for the image
+ * @size: Anticipated hibernation image size
+ */
+static void free_unnecessary_pages(unsigned long size)
+{
+	memory_bm_position_reset(&copy_bm);
+
+	while (alloc_normal + alloc_highmem > size) {
+		unsigned long pfn = memory_bm_next_pfn(&copy_bm);
+		struct page *page = pfn_to_page(pfn);
+
+		memory_bm_clear_bit(&copy_bm, pfn);
+		if (PageHighMem(page))
+			alloc_highmem--;
+		else
+			alloc_normal--;
+		swsusp_unset_page_forbidden(page);
+		swsusp_unset_page_free(page);
+		__free_page(page);
+	}
+}
+
+/**
+ * hibernate_preallocate_memory - Preallocate memory for hibernation image
  *
  * To create a hibernation image it is necessary to make a copy of every page
  * frame in use.  We also need a number of page frames to be free during
@@ -1110,16 +1162,27 @@ static unsigned long preallocate_image_m
  * frames in use is below the requested image size or it is impossible to
  * allocate more memory, whichever happens first.
  */
-int swsusp_shrink_memory(void)
+int hibernate_preallocate_memory(void)
 {
 	struct zone *zone;
 	unsigned long saveable, size, max_size, count, pages = 0;
 	struct timeval start, stop;
-	int error = 0;
+	int error;
 
-	printk(KERN_INFO "PM: Shrinking memory ... ");
+	printk(KERN_INFO "PM: Preallocating image memory ... ");
 	do_gettimeofday(&start);
 
+	error = memory_bm_create(&orig_bm, GFP_IMAGE, PG_ANY);
+	if (error)
+		goto err_out;
+
+	error = memory_bm_create(&copy_bm, GFP_IMAGE, PG_ANY);
+	if (error)
+		goto err_out;
+
+	alloc_normal = 0;
+	alloc_highmem = 0;
+
 	/* Count the number of saveable data pages. */
 	saveable = count_data_pages() + count_highmem_pages();
 
@@ -1142,10 +1205,12 @@ int swsusp_shrink_memory(void)
 		size = max_size;
 	/*
 	 * If the maximum is not lesser than the current number of saveable
-	 * pages in memory, we don't need to do anything more.
+	 * pages in memory, allocate page frames for the image and we're done.
 	 */
-	if (size >= saveable)
+	if (size >= saveable) {
+		pages = preallocate_image_memory(saveable);
 		goto out;
+	}
 
 	/*
 	 * Let the memory management subsystem know that we're going to need a
@@ -1164,24 +1229,27 @@ int swsusp_shrink_memory(void)
 	count -= max_size;
 	pages = preallocate_image_memory(count);
 	if (pages < count)
-		error = -ENOMEM;
+		goto err_out;
 	else
 		pages += preallocate_image_memory(max_size - size);
 
-	/* Release all of the preallocated page frames. */
-	swsusp_free();
-
-	if (error) {
-		printk(KERN_CONT "\n");
-		return error;
-	}
+	/*
+	 * We only need 'size' page frames for the image but we have allocated
+	 * more.  Release the excessive ones now.
+	 */
+	free_unnecessary_pages(size);
 
  out:
 	do_gettimeofday(&stop);
-	printk(KERN_CONT "done (preallocated %lu free pages)\n", pages);
-	swsusp_show_speed(&start, &stop, pages, "Freed");
+	printk(KERN_CONT "done (allocated %lu pages)\n", pages);
+	swsusp_show_speed(&start, &stop, pages, "Allocated");
 
 	return 0;
+
+ err_out:
+	printk(KERN_CONT "\n");
+	swsusp_free();
+	return -ENOMEM;
 }
 
 #ifdef CONFIG_HIGHMEM
@@ -1192,7 +1260,7 @@ int swsusp_shrink_memory(void)
 
 static unsigned int count_pages_for_highmem(unsigned int nr_highmem)
 {
-	unsigned int free_highmem = count_free_highmem_pages();
+	unsigned int free_highmem = count_free_highmem_pages() + alloc_highmem;
 
 	if (free_highmem >= nr_highmem)
 		nr_highmem = 0;
@@ -1214,19 +1282,17 @@ count_pages_for_highmem(unsigned int nr_
 static int enough_free_mem(unsigned int nr_pages, unsigned int nr_highmem)
 {
 	struct zone *zone;
-	unsigned int free = 0, meta = 0;
+	unsigned int free = alloc_normal;
 
-	for_each_zone(zone) {
-		meta += snapshot_additional_pages(zone);
+	for_each_zone(zone)
 		if (!is_highmem(zone))
 			free += zone_page_state(zone, NR_FREE_PAGES);
-	}
 
 	nr_pages += count_pages_for_highmem(nr_highmem);
-	pr_debug("PM: Normal pages needed: %u + %u + %u, available pages: %u\n",
-		nr_pages, PAGES_FOR_IO, meta, free);
+	pr_debug("PM: Normal pages needed: %u + %u, available pages: %u\n",
+		nr_pages, PAGES_FOR_IO, free);
 
-	return free > nr_pages + PAGES_FOR_IO + meta;
+	return free > nr_pages + PAGES_FOR_IO;
 }
 
 #ifdef CONFIG_HIGHMEM
@@ -1248,7 +1314,7 @@ static inline int get_highmem_buffer(int
  */
 
 static inline unsigned int
-alloc_highmem_image_pages(struct memory_bitmap *bm, unsigned int nr_highmem)
+alloc_highmem_pages(struct memory_bitmap *bm, unsigned int nr_highmem)
 {
 	unsigned int to_alloc = count_free_highmem_pages();
 
@@ -1268,7 +1334,7 @@ alloc_highmem_image_pages(struct memory_
 static inline int get_highmem_buffer(int safe_needed) { return 0; }
 
 static inline unsigned int
-alloc_highmem_image_pages(struct memory_bitmap *bm, unsigned int n) { return 0; }
+alloc_highmem_pages(struct memory_bitmap *bm, unsigned int n) { return 0; }
 #endif /* CONFIG_HIGHMEM */
 
 /**
@@ -1287,51 +1353,36 @@ static int
 swsusp_alloc(struct memory_bitmap *orig_bm, struct memory_bitmap *copy_bm,
 		unsigned int nr_pages, unsigned int nr_highmem)
 {
-	int error;
-
-	error = memory_bm_create(orig_bm, GFP_ATOMIC | __GFP_COLD, PG_ANY);
-	if (error)
-		goto Free;
-
-	error = memory_bm_create(copy_bm, GFP_ATOMIC | __GFP_COLD, PG_ANY);
-	if (error)
-		goto Free;
+	int error = 0;
 
 	if (nr_highmem > 0) {
 		error = get_highmem_buffer(PG_ANY);
 		if (error)
-			goto Free;
-
-		nr_pages += alloc_highmem_image_pages(copy_bm, nr_highmem);
+			goto err_out;
+		if (nr_highmem > alloc_highmem) {
+			nr_highmem -= alloc_highmem;
+			nr_pages += alloc_highmem_pages(copy_bm, nr_highmem);
+		}
 	}
-	while (nr_pages-- > 0) {
-		struct page *page = alloc_image_page(GFP_ATOMIC | __GFP_COLD);
-
-		if (!page)
-			goto Free;
+	if (nr_pages > alloc_normal) {
+		nr_pages -= alloc_normal;
+		while (nr_pages-- > 0) {
+			struct page *page;
 
-		memory_bm_set_bit(copy_bm, page_to_pfn(page));
+			page = alloc_image_page(GFP_ATOMIC | __GFP_COLD);
+			if (!page)
+				goto err_out;
+			memory_bm_set_bit(copy_bm, page_to_pfn(page));
+		}
 	}
+
 	return 0;
 
- Free:
+ err_out:
 	swsusp_free();
-	return -ENOMEM;
+	return error;
 }
 
-/* Memory bitmap used for marking saveable pages (during suspend) or the
- * suspend image pages (during resume)
- */
-static struct memory_bitmap orig_bm;
-/* Memory bitmap used on suspend for marking allocated pages that will contain
- * the copies of saveable pages.  During resume it is initially used for
- * marking the suspend image pages, but then its set bits are duplicated in
- * @orig_bm and it is released.  Next, on systems with high memory, it may be
- * used for marking "safe" highmem pages, but it has to be reinitialized for
- * this purpose.
- */
-static struct memory_bitmap copy_bm;
-
 asmlinkage int swsusp_save(void)
 {
 	unsigned int nr_pages, nr_highmem;
Index: linux-2.6/kernel/power/power.h
===================================================================
--- linux-2.6.orig/kernel/power/power.h
+++ linux-2.6/kernel/power/power.h
@@ -74,7 +74,7 @@ extern asmlinkage int swsusp_arch_resume
 
 extern int create_basic_memory_bitmaps(void);
 extern void free_basic_memory_bitmaps(void);
-extern int swsusp_shrink_memory(void);
+extern int hibernate_preallocate_memory(void);
 
 /**
  *	Auxiliary structure used for reading the snapshot image data and
Index: linux-2.6/kernel/power/disk.c
===================================================================
--- linux-2.6.orig/kernel/power/disk.c
+++ linux-2.6/kernel/power/disk.c
@@ -303,8 +303,8 @@ int hibernation_snapshot(int platform_mo
 	if (error)
 		return error;
 
-	/* Free memory before shutting down devices. */
-	error = swsusp_shrink_memory();
+	/* Preallocate image memory before shutting down devices. */
+	error = hibernate_preallocate_memory();
 	if (error)
 		goto Close;
 
@@ -320,6 +320,10 @@ int hibernation_snapshot(int platform_mo
 	/* Control returns here after successful restore */
 
  Resume_devices:
+	/* We may need to release the preallocated image pages here. */
+	if (error || !in_suspend)
+		swsusp_free();
+
 	device_resume(in_suspend ?
 		(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);
 	resume_console();
@@ -593,7 +597,10 @@ int hibernate(void)
 		goto Thaw;
 
 	error = hibernation_snapshot(hibernation_mode == HIBERNATION_PLATFORM);
-	if (in_suspend && !error) {
+	if (error)
+		goto Thaw;
+
+	if (in_suspend) {
 		unsigned int flags = 0;
 
 		if (hibernation_mode == HIBERNATION_PLATFORM)
@@ -605,8 +612,8 @@ int hibernate(void)
 			power_down();
 	} else {
 		pr_debug("PM: Image restored successfully.\n");
-		swsusp_free();
 	}
+
  Thaw:
 	thaw_processes();
  Finish:


^ permalink raw reply	[flat|nested] 205+ messages in thread
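
A rough user-space sketch may help with the new bookkeeping introduced
above: pages are recorded in a bitmap as they are preallocated, so the
excess can later be dropped by free_unnecessary_pages() and the rest
reused as image pages instead of being freed and allocated again.
Everything below is invented scaffolding (malloc() in place of
alloc_image_page(), a byte array in place of the kernel's
memory_bitmap), so it only illustrates the idea:

#include <stdio.h>
#include <stdlib.h>

#define MAX_FRAMES	1024

static void *frame[MAX_FRAMES];			/* the "page frames" themselves */
static unsigned char copy_bm[MAX_FRAMES];	/* 1 = preallocated for the image */
static unsigned long alloc_count;

/* Model of preallocate_image_memory(): allocate and mark nr_pages frames. */
static unsigned long preallocate(unsigned long nr_pages)
{
	unsigned long nr_alloc = 0;
	unsigned long i;

	for (i = 0; i < MAX_FRAMES && nr_alloc < nr_pages; i++) {
		if (copy_bm[i])
			continue;
		frame[i] = malloc(4096);
		if (!frame[i])
			break;			/* allocation failure */
		copy_bm[i] = 1;
		alloc_count++;
		nr_alloc++;
	}
	return nr_alloc;
}

/* Model of free_unnecessary_pages(): drop frames until only 'size' remain. */
static void free_excess(unsigned long size)
{
	unsigned long i;

	for (i = 0; i < MAX_FRAMES && alloc_count > size; i++) {
		if (!copy_bm[i])
			continue;
		copy_bm[i] = 0;
		free(frame[i]);
		frame[i] = NULL;
		alloc_count--;
	}
}

int main(void)
{
	unsigned long got = preallocate(800);	/* worst-case image estimate */

	printf("preallocated %lu frames\n", got);
	free_excess(500);			/* keep only what the image needs */
	printf("kept %lu frames for the image\n", alloc_count);
	return 0;
}

Recording the frames in copy_bm at allocation time is what later lets
swsusp_alloc() skip allocating most of them again, which is the point
of this patch.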

* [RFC][PATCH 5/5] PM/Hibernate: Do not release preallocated memory unnecessarily
  2009-05-07 21:48   ` Rafael J. Wysocki
                     ` (9 preceding siblings ...)
  (?)
@ 2009-05-07 21:55   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-07 21:55 UTC (permalink / raw)
  To: pm list; +Cc: LKML, linux-mm, David Rientjes, Andrew Morton, Wu Fengguang

From: Rafael J. Wysocki <rjw@sisk.pl>

Since the hibernation code is now going to use allocations of memory
to make enough room for the image, it can also use the page frames
allocated at this stage as image page frames.  The low-level
hibernation code needs to be rearranged for this purpose, but it
allows us to avoid freeing a great number of pages and allocating
these same pages once again later, so it generally is worth doing.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 kernel/power/disk.c     |   15 +++-
 kernel/power/power.h    |    2 
 kernel/power/snapshot.c |  173 +++++++++++++++++++++++++++++++-----------------
 3 files changed, 124 insertions(+), 66 deletions(-)

Index: linux-2.6/kernel/power/snapshot.c
===================================================================
--- linux-2.6.orig/kernel/power/snapshot.c
+++ linux-2.6/kernel/power/snapshot.c
@@ -1033,6 +1033,25 @@ copy_data_pages(struct memory_bitmap *co
 static unsigned int nr_copy_pages;
 /* Number of pages needed for saving the original pfns of the image pages */
 static unsigned int nr_meta_pages;
+/*
+ * Numbers of normal and highmem page frames allocated for hibernation image
+ * before suspending devices.
+ */
+unsigned int alloc_normal, alloc_highmem;
+/*
+ * Memory bitmap used for marking saveable pages (during hibernation) or
+ * hibernation image pages (during restore)
+ */
+static struct memory_bitmap orig_bm;
+/*
+ * Memory bitmap used during hibernation for marking allocated page frames that
+ * will contain copies of saveable pages.  During restore it is initially used
+ * for marking hibernation image pages, but then the set bits from it are
+ * duplicated in @orig_bm and it is released.  On highmem systems it is next
+ * used for marking "safe" highmem pages, but it has to be reinitialized for
+ * this purpose.
+ */
+static struct memory_bitmap copy_bm;
 
 /**
  *	swsusp_free - free pages allocated for the suspend.
@@ -1064,6 +1083,8 @@ void swsusp_free(void)
 	nr_meta_pages = 0;
 	restore_pblist = NULL;
 	buffer = NULL;
+	alloc_normal = 0;
+	alloc_highmem = 0;
 }
 
 /* Helper functions used for the shrinking of memory. */
@@ -1081,8 +1102,16 @@ static unsigned long preallocate_image_m
 	unsigned long nr_alloc = 0;
 
 	while (nr_pages > 0) {
-		if (!alloc_image_page(GFP_IMAGE))
-			break;
+ 		struct page *page;
+
+		page = alloc_image_page(GFP_IMAGE);
+ 		if (!page)
+ 			break;
+		memory_bm_set_bit(&copy_bm, page_to_pfn(page));
+		if (PageHighMem(page))
+			alloc_highmem++;
+		else
+			alloc_normal++;
 		nr_pages--;
 		nr_alloc++;
 	}
@@ -1091,7 +1120,30 @@ static unsigned long preallocate_image_m
 }
 
 /**
- * swsusp_shrink_memory -  Make the kernel release as much memory as needed
+ * free_unnecessary_pages - Release preallocated pages not needed for the image
+ * @size: Anticipated hibernation image size
+ */
+static void free_unnecessary_pages(unsigned long size)
+{
+	memory_bm_position_reset(&copy_bm);
+
+	while (alloc_normal + alloc_highmem > size) {
+		unsigned long pfn = memory_bm_next_pfn(&copy_bm);
+		struct page *page = pfn_to_page(pfn);
+
+		memory_bm_clear_bit(&copy_bm, pfn);
+		if (PageHighMem(page))
+			alloc_highmem--;
+		else
+			alloc_normal--;
+		swsusp_unset_page_forbidden(page);
+		swsusp_unset_page_free(page);
+		__free_page(page);
+	}
+}
+
+/**
+ * hibernate_preallocate_memory - Preallocate memory for hibernation image
  *
  * To create a hibernation image it is necessary to make a copy of every page
  * frame in use.  We also need a number of page frames to be free during
@@ -1110,16 +1162,27 @@ static unsigned long preallocate_image_m
  * frames in use is below the requested image size or it is impossible to
  * allocate more memory, whichever happens first.
  */
-int swsusp_shrink_memory(void)
+int hibernate_preallocate_memory(void)
 {
 	struct zone *zone;
 	unsigned long saveable, size, max_size, count, pages = 0;
 	struct timeval start, stop;
-	int error = 0;
+	int error;
 
-	printk(KERN_INFO "PM: Shrinking memory ... ");
+	printk(KERN_INFO "PM: Preallocating image memory ... ");
 	do_gettimeofday(&start);
 
+	error = memory_bm_create(&orig_bm, GFP_IMAGE, PG_ANY);
+	if (error)
+		goto err_out;
+
+	error = memory_bm_create(&copy_bm, GFP_IMAGE, PG_ANY);
+	if (error)
+		goto err_out;
+
+	alloc_normal = 0;
+	alloc_highmem = 0;
+
 	/* Count the number of saveable data pages. */
 	saveable = count_data_pages() + count_highmem_pages();
 
@@ -1142,10 +1205,12 @@ int swsusp_shrink_memory(void)
 		size = max_size;
 	/*
 	 * If the maximum is not lesser than the current number of saveable
-	 * pages in memory, we don't need to do anything more.
+	 * pages in memory, allocate page frames for the image and we're done.
 	 */
-	if (size >= saveable)
+	if (size >= saveable) {
+		pages = preallocate_image_memory(saveable);
 		goto out;
+	}
 
 	/*
 	 * Let the memory management subsystem know that we're going to need a
@@ -1164,24 +1229,27 @@ int swsusp_shrink_memory(void)
 	count -= max_size;
 	pages = preallocate_image_memory(count);
 	if (pages < count)
-		error = -ENOMEM;
+		goto err_out;
 	else
 		pages += preallocate_image_memory(max_size - size);
 
-	/* Release all of the preallocated page frames. */
-	swsusp_free();
-
-	if (error) {
-		printk(KERN_CONT "\n");
-		return error;
-	}
+	/*
+	 * We only need 'size' page frames for the image but we have allocated
+	 * more.  Release the excessive ones now.
+	 */
+	free_unnecessary_pages(size);
 
  out:
 	do_gettimeofday(&stop);
-	printk(KERN_CONT "done (preallocated %lu free pages)\n", pages);
-	swsusp_show_speed(&start, &stop, pages, "Freed");
+	printk(KERN_CONT "done (allocated %lu pages)\n", pages);
+	swsusp_show_speed(&start, &stop, pages, "Allocated");
 
 	return 0;
+
+ err_out:
+	printk(KERN_CONT "\n");
+	swsusp_free();
+	return -ENOMEM;
 }
 
 #ifdef CONFIG_HIGHMEM
@@ -1192,7 +1260,7 @@ int swsusp_shrink_memory(void)
 
 static unsigned int count_pages_for_highmem(unsigned int nr_highmem)
 {
-	unsigned int free_highmem = count_free_highmem_pages();
+	unsigned int free_highmem = count_free_highmem_pages() + alloc_highmem;
 
 	if (free_highmem >= nr_highmem)
 		nr_highmem = 0;
@@ -1214,19 +1282,17 @@ count_pages_for_highmem(unsigned int nr_
 static int enough_free_mem(unsigned int nr_pages, unsigned int nr_highmem)
 {
 	struct zone *zone;
-	unsigned int free = 0, meta = 0;
+	unsigned int free = alloc_normal;
 
-	for_each_zone(zone) {
-		meta += snapshot_additional_pages(zone);
+	for_each_zone(zone)
 		if (!is_highmem(zone))
 			free += zone_page_state(zone, NR_FREE_PAGES);
-	}
 
 	nr_pages += count_pages_for_highmem(nr_highmem);
-	pr_debug("PM: Normal pages needed: %u + %u + %u, available pages: %u\n",
-		nr_pages, PAGES_FOR_IO, meta, free);
+	pr_debug("PM: Normal pages needed: %u + %u, available pages: %u\n",
+		nr_pages, PAGES_FOR_IO, free);
 
-	return free > nr_pages + PAGES_FOR_IO + meta;
+	return free > nr_pages + PAGES_FOR_IO;
 }
 
 #ifdef CONFIG_HIGHMEM
@@ -1248,7 +1314,7 @@ static inline int get_highmem_buffer(int
  */
 
 static inline unsigned int
-alloc_highmem_image_pages(struct memory_bitmap *bm, unsigned int nr_highmem)
+alloc_highmem_pages(struct memory_bitmap *bm, unsigned int nr_highmem)
 {
 	unsigned int to_alloc = count_free_highmem_pages();
 
@@ -1268,7 +1334,7 @@ alloc_highmem_image_pages(struct memory_
 static inline int get_highmem_buffer(int safe_needed) { return 0; }
 
 static inline unsigned int
-alloc_highmem_image_pages(struct memory_bitmap *bm, unsigned int n) { return 0; }
+alloc_highmem_pages(struct memory_bitmap *bm, unsigned int n) { return 0; }
 #endif /* CONFIG_HIGHMEM */
 
 /**
@@ -1287,51 +1353,36 @@ static int
 swsusp_alloc(struct memory_bitmap *orig_bm, struct memory_bitmap *copy_bm,
 		unsigned int nr_pages, unsigned int nr_highmem)
 {
-	int error;
-
-	error = memory_bm_create(orig_bm, GFP_ATOMIC | __GFP_COLD, PG_ANY);
-	if (error)
-		goto Free;
-
-	error = memory_bm_create(copy_bm, GFP_ATOMIC | __GFP_COLD, PG_ANY);
-	if (error)
-		goto Free;
+	int error = 0;
 
 	if (nr_highmem > 0) {
 		error = get_highmem_buffer(PG_ANY);
 		if (error)
-			goto Free;
-
-		nr_pages += alloc_highmem_image_pages(copy_bm, nr_highmem);
+			goto err_out;
+		if (nr_highmem > alloc_highmem) {
+			nr_highmem -= alloc_highmem;
+			nr_pages += alloc_highmem_pages(copy_bm, nr_highmem);
+		}
 	}
-	while (nr_pages-- > 0) {
-		struct page *page = alloc_image_page(GFP_ATOMIC | __GFP_COLD);
-
-		if (!page)
-			goto Free;
+	if (nr_pages > alloc_normal) {
+		nr_pages -= alloc_normal;
+		while (nr_pages-- > 0) {
+			struct page *page;
 
-		memory_bm_set_bit(copy_bm, page_to_pfn(page));
+			page = alloc_image_page(GFP_ATOMIC | __GFP_COLD);
+			if (!page)
+				goto err_out;
+			memory_bm_set_bit(copy_bm, page_to_pfn(page));
+		}
 	}
+
 	return 0;
 
- Free:
+ err_out:
 	swsusp_free();
-	return -ENOMEM;
+	return error;
 }
 
-/* Memory bitmap used for marking saveable pages (during suspend) or the
- * suspend image pages (during resume)
- */
-static struct memory_bitmap orig_bm;
-/* Memory bitmap used on suspend for marking allocated pages that will contain
- * the copies of saveable pages.  During resume it is initially used for
- * marking the suspend image pages, but then its set bits are duplicated in
- * @orig_bm and it is released.  Next, on systems with high memory, it may be
- * used for marking "safe" highmem pages, but it has to be reinitialized for
- * this purpose.
- */
-static struct memory_bitmap copy_bm;
-
 asmlinkage int swsusp_save(void)
 {
 	unsigned int nr_pages, nr_highmem;
Index: linux-2.6/kernel/power/power.h
===================================================================
--- linux-2.6.orig/kernel/power/power.h
+++ linux-2.6/kernel/power/power.h
@@ -74,7 +74,7 @@ extern asmlinkage int swsusp_arch_resume
 
 extern int create_basic_memory_bitmaps(void);
 extern void free_basic_memory_bitmaps(void);
-extern int swsusp_shrink_memory(void);
+extern int hibernate_preallocate_memory(void);
 
 /**
  *	Auxiliary structure used for reading the snapshot image data and
Index: linux-2.6/kernel/power/disk.c
===================================================================
--- linux-2.6.orig/kernel/power/disk.c
+++ linux-2.6/kernel/power/disk.c
@@ -303,8 +303,8 @@ int hibernation_snapshot(int platform_mo
 	if (error)
 		return error;
 
-	/* Free memory before shutting down devices. */
-	error = swsusp_shrink_memory();
+	/* Preallocate image memory before shutting down devices. */
+	error = hibernate_preallocate_memory();
 	if (error)
 		goto Close;
 
@@ -320,6 +320,10 @@ int hibernation_snapshot(int platform_mo
 	/* Control returns here after successful restore */
 
  Resume_devices:
+	/* We may need to release the preallocated image pages here. */
+	if (error || !in_suspend)
+		swsusp_free();
+
 	device_resume(in_suspend ?
 		(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);
 	resume_console();
@@ -593,7 +597,10 @@ int hibernate(void)
 		goto Thaw;
 
 	error = hibernation_snapshot(hibernation_mode == HIBERNATION_PLATFORM);
-	if (in_suspend && !error) {
+	if (error)
+		goto Thaw;
+
+	if (in_suspend) {
 		unsigned int flags = 0;
 
 		if (hibernation_mode == HIBERNATION_PLATFORM)
@@ -605,8 +612,8 @@ int hibernate(void)
 			power_down();
 	} else {
 		pr_debug("PM: Image restored successfully.\n");
-		swsusp_free();
 	}
+
  Thaw:
 	thaw_processes();
  Finish:

^ permalink raw reply	[flat|nested] 205+ messages in thread

* [RFC][PATCH] PM/Freezer: Disable OOM killer when tasks are frozen (was: Re: [RFC][PATCH 1/5] mm: Introduce __GFP_NO_OOM_KILL)
  2009-05-07 21:50     ` Rafael J. Wysocki
@ 2009-05-07 22:24       ` Rafael J. Wysocki
  -1 siblings, 0 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-07 22:24 UTC (permalink / raw)
  To: pm list
  Cc: Wu Fengguang, Andrew Morton, LKML, Pavel Machek,
	Nigel Cunningham, David Rientjes, linux-mm

On Thursday 07 May 2009, Rafael J. Wysocki wrote:
> From: Andrew Morton <akpm@linux-foundation.org>
> 
> > > Remind me: why can't we just allocate N pages at suspend-time?
> > 
> > We need half of memory free. The reason we can't "just allocate" is
> > probably OOM killer; but my memories are quite weak :-(.
> 
> hm.  You'd think that with our splendid range of __GFP_foo flags, there
> would be some combo which would suit this requirement but I can't
> immediately spot one.
> 
> We can always add another I guess.  Something like...
> 
> [rjw: fixed white space, added comment in page_alloc.c]
> 
> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

An alternative to this one is the appended patch.

The idea here is that after freezing the user space totally, there's no point
in letting the OOM killer run, because that won't result in any memory being
freed anyway until the tasks are thawed.

Thanks,
Rafael

---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: PM/Freezer: Disable OOM killer when tasks are frozen

The OOM killer is not really going to work while tasks are frozen, so
we can just give up calling it in that case.

This will allow us to safely use memory allocations for decreasing
the number of saveable pages in the hibernation core code instead of
using any artificial memory shrinking mechanisms for this purpose.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 include/linux/freezer.h |    2 ++
 kernel/power/process.c  |   16 ++++++++++++++++
 mm/page_alloc.c         |    5 +++++
 3 files changed, 23 insertions(+)

Index: linux-2.6/kernel/power/process.c
===================================================================
--- linux-2.6.orig/kernel/power/process.c
+++ linux-2.6/kernel/power/process.c
@@ -19,6 +19,12 @@
  */
 #define TIMEOUT	(20 * HZ)
 
+/*
+ * Set after freeze_processes() has successfully run and reset at the beginning
+ * of thaw_processes().
+ */
+static bool all_tasks_frozen;
+
 static inline int freezeable(struct task_struct * p)
 {
 	if ((p == current) ||
@@ -120,6 +126,10 @@ int freeze_processes(void)
  Exit:
 	BUG_ON(in_atomic());
 	printk("\n");
+
+	if (!error)
+		all_tasks_frozen = true;
+
 	return error;
 }
 
@@ -145,6 +155,8 @@ static void thaw_tasks(bool nosig_only)
 
 void thaw_processes(void)
 {
+	all_tasks_frozen = false;
+
 	printk("Restarting tasks ... ");
 	thaw_tasks(true);
 	thaw_tasks(false);
@@ -152,3 +164,7 @@ void thaw_processes(void)
 	printk("done.\n");
 }
 
+bool killable_tasks_are_frozen(void)
+{
+	return all_tasks_frozen;
+}
Index: linux-2.6/include/linux/freezer.h
===================================================================
--- linux-2.6.orig/include/linux/freezer.h
+++ linux-2.6/include/linux/freezer.h
@@ -50,6 +50,7 @@ extern int thaw_process(struct task_stru
 extern void refrigerator(void);
 extern int freeze_processes(void);
 extern void thaw_processes(void);
+extern bool killable_tasks_are_frozen(void);
 
 static inline int try_to_freeze(void)
 {
@@ -170,6 +171,7 @@ static inline int thaw_process(struct ta
 static inline void refrigerator(void) {}
 static inline int freeze_processes(void) { BUG(); return 0; }
 static inline void thaw_processes(void) {}
+static inline bool killable_tasks_are_frozen(void) { return false; }
 
 static inline int try_to_freeze(void) { return 0; }
 
Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c
+++ linux-2.6/mm/page_alloc.c
@@ -46,6 +46,7 @@
 #include <linux/page-isolation.h>
 #include <linux/page_cgroup.h>
 #include <linux/debugobjects.h>
+#include <linux/freezer.h>
 
 #include <asm/tlbflush.h>
 #include <asm/div64.h>
@@ -1600,6 +1601,10 @@ nofail_alloc:
 		if (page)
 			goto got_pg;
 	} else if ((gfp_mask & __GFP_FS) && !(gfp_mask & __GFP_NORETRY)) {
+		/* The OOM killer won't work if processes are frozen. */
+		if (killable_tasks_are_frozen())
+			goto nopage;
+
 		if (!try_set_zone_oom(zonelist, gfp_mask)) {
 			schedule_timeout_uninterruptible(1);
 			goto restart;
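
A caller-side sketch (illustrative only, not part of the patch): once
freeze_processes() has succeeded, an ordinary allocation made by the
hibernation code simply fails under memory pressure instead of invoking
the OOM killer, so the caller can back off gracefully:

	/*
	 * Hypothetical hibernation-time caller.  With user space frozen,
	 * killable_tasks_are_frozen() returns true, so this returns NULL
	 * rather than triggering out_of_memory().
	 */
	struct page *page = alloc_pages(GFP_KERNEL | __GFP_NOWARN, 0);
	if (!page)
		return -ENOMEM;	/* stop allocating; no tasks were killed */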

^ permalink raw reply	[flat|nested] 205+ messages in thread

* Re: [RFC][PATCH 2/5] PM/Suspend: Do not shrink memory before suspend
  2009-05-07 21:51     ` Rafael J. Wysocki
@ 2009-05-08  8:52       ` Wu Fengguang
  -1 siblings, 0 replies; 205+ messages in thread
From: Wu Fengguang @ 2009-05-08  8:52 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: pm list, Andrew Morton, LKML, Pavel Machek, Nigel Cunningham,
	David Rientjes, linux-mm

On Fri, May 08, 2009 at 05:51:10AM +0800, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rjw@sisk.pl>
> 
> Remove the shrinking of memory from the suspend-to-RAM code, where
> it is not really necessary.
> 
> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> Acked-by: Nigel Cunningham <nigel@tuxonice.net>

Acked-by: Wu Fengguang <fengguang.wu@intel.com>

^ permalink raw reply	[flat|nested] 205+ messages in thread

* Re: [RFC][PATCH 3/5] PM/Hibernate: Move memory shrinking to snapshot.c (rev. 2)
  2009-05-07 21:51     ` Rafael J. Wysocki
@ 2009-05-08  8:53       ` Wu Fengguang
  -1 siblings, 0 replies; 205+ messages in thread
From: Wu Fengguang @ 2009-05-08  8:53 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: pm list, Andrew Morton, LKML, Pavel Machek, Nigel Cunningham,
	David Rientjes, linux-mm

On Fri, May 08, 2009 at 05:51:56AM +0800, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rjw@sisk.pl>
> 
> The next patch is going to modify the memory shrinking code so that
> it will make memory allocations to free memory instead of using an
> artificial memory shrinking mechanism for that.  For this purpose it
> is convenient to move swsusp_shrink_memory() from
> kernel/power/swsusp.c to kernel/power/snapshot.c, because the new
> memory-shrinking code is going to use things that are local to
> kernel/power/snapshot.c .
> 
> [rev. 2: Make some functions static and remove their headers from
>  kernel/power/power.h]
> 
> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> Acked-by: Pavel Machek <pavel@ucw.cz>

Acked-by: Wu Fengguang <fengguang.wu@intel.com> 

^ permalink raw reply	[flat|nested] 205+ messages in thread

* [RFC][PATCH 0/6] PM/Hibernate: Rework memory shrinking (rev. 3)
  2009-05-07 21:48   ` Rafael J. Wysocki
@ 2009-05-10 13:48     ` Rafael J. Wysocki
  -1 siblings, 0 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-10 13:48 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: pm list, Andrew Morton, LKML, Pavel Machek, Nigel Cunningham,
	David Rientjes, linux-mm

On Thursday 07 May 2009, Rafael J. Wysocki wrote:
> On Thursday 07 May 2009, Rafael J. Wysocki wrote:
> > Hi,
> > 
> > The following patchset is an attempt to rework the memory shrinking mechanism
> > used during hibernation to make room for the image.  It is a work in progress
> > and most likely it's going to be modified, but it has been discussed recently
> > and I'd like to get comments on the current version.
> > 
> > [1/5] - disable the OOM kernel after freezing tasks (this will be dropped if
> >         it's verified that we can avoid the OOM killing by using
> >         __GFP_FS|__GFP_WAIT|__GFP_NORETRY|__GFP_NOWARN
> >         in the next patches).
> > 
> > [2/5] - drop memory shrinking from the suspend (to RAM) code path
> > 
> > [3/5] - move swsusp_shrink_memory() to snapshot.c
> > 
> > [4/5] - rework swsusp_shrink_memory() (to use memory allocations for applying
> >         memory pressure)
> > 
> > [5/5] - allocate image pages along with the shrinking.
> 
> Updated patchset follows.

In the meantime I added a patch that attempts to compute the size of the hard
core working set.  I also had to rework the patch reworking
swsusp_shrink_memory() so that it takes highmem into account.

Currently, the patchset consists of the following patches:

[1/6] - disable the OOM killer after freezing tasks (this will be dropped if
        it's verified that we can avoid the OOM killing by using
        __GFP_FS|__GFP_WAIT|__GFP_NORETRY|__GFP_NOWARN
        in the next patches).

[2/6] - drop memory shrinking from the suspend (to RAM) code path

[3/6] - move swsusp_shrink_memory() to snapshot.c

[4/6] - rework swsusp_shrink_memory() (to use memory allocations for applying
        memory pressure)

[5/6] - allocate image pages along with the shrinking

[6/6] - estimate the size of the hard core working set and use it as the lower
        limit of the image size.

Comments welcome.
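
Regarding [6/6], the estimate is necessarily approximate.  A minimal
sketch of one possible approach (illustrative only, not necessarily what
the actual patch does): treat everything saveable minus what currently
sits on the LRU lists as the "hard core" working set, since the LRU
pages could in principle be reclaimed or swapped out before the
snapshot is taken:

	/* Illustrative sketch, not the real [6/6] implementation. */
	static unsigned long minimum_image_size(unsigned long saveable)
	{
		unsigned long reclaimable;

		reclaimable = global_page_state(NR_ACTIVE_ANON)
			+ global_page_state(NR_INACTIVE_ANON)
			+ global_page_state(NR_ACTIVE_FILE)
			+ global_page_state(NR_INACTIVE_FILE);

		/* What cannot be reclaimed bounds the image size from below. */
		return saveable <= reclaimable ? 0 : saveable - reclaimable;
	}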

Thanks,
Rafael


^ permalink raw reply	[flat|nested] 205+ messages in thread

* [RFC][PATCH 1/6] mm: Introduce __GFP_NO_OOM_KILL
  2009-05-10 13:48     ` Rafael J. Wysocki
@ 2009-05-10 13:50       ` Rafael J. Wysocki
  -1 siblings, 0 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-10 13:50 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: pm list, Andrew Morton, LKML, Pavel Machek, Nigel Cunningham,
	David Rientjes, linux-mm

From: Andrew Morton <akpm@linux-foundation.org>

> > Remind me: why can't we just allocate N pages at suspend-time?
> 
> We need half of memory free. The reason we can't "just allocate" is
> probably OOM killer; but my memories are quite weak :-(.

hm.  You'd think that with our splendid range of __GFP_foo flags, there
would be some combo which would suit this requirement but I can't
immediately spot one.

We can always add another I guess.  Something like...

[rjw: fixed white space, added comment in page_alloc.c]

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 include/linux/gfp.h |    3 ++-
 mm/page_alloc.c     |    8 ++++++--
 2 files changed, 8 insertions(+), 3 deletions(-)

Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c
+++ linux-2.6/mm/page_alloc.c
@@ -1619,8 +1619,12 @@ nofail_alloc:
 			goto got_pg;
 		}
 
-		/* The OOM killer will not help higher order allocs so fail */
-		if (order > PAGE_ALLOC_COSTLY_ORDER) {
+		/*
+		 * The OOM killer will not help higher order allocs so fail.
+		 * Also fail if the caller doesn't want the OOM killer to run.
+		 */
+		if (order > PAGE_ALLOC_COSTLY_ORDER
+				|| (gfp_mask & __GFP_NO_OOM_KILL)) {
 			clear_zonelist_oom(zonelist, gfp_mask);
 			goto nopage;
 		}
Index: linux-2.6/include/linux/gfp.h
===================================================================
--- linux-2.6.orig/include/linux/gfp.h
+++ linux-2.6/include/linux/gfp.h
@@ -51,8 +51,9 @@ struct vm_area_struct;
 #define __GFP_THISNODE	((__force gfp_t)0x40000u)/* No fallback, no policies */
 #define __GFP_RECLAIMABLE ((__force gfp_t)0x80000u) /* Page is reclaimable */
 #define __GFP_MOVABLE	((__force gfp_t)0x100000u)  /* Page is movable */
+#define __GFP_NO_OOM_KILL ((__force gfp_t)0x200000u)  /* Don't invoke out_of_memory() */
 
-#define __GFP_BITS_SHIFT 21	/* Room for 21 __GFP_FOO bits */
+#define __GFP_BITS_SHIFT 22	/* Number of __GFP_FOO bits */
 #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
 
 /* This equals 0, but use constants in case they ever change */
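
For illustration (not part of the patch), a caller that wants to apply
memory pressure without ever triggering the OOM killer could then do
something like:

	/* Hypothetical hibernation-time preallocation step. */
	page = alloc_pages(GFP_HIGHUSER | __GFP_NO_OOM_KILL | __GFP_NOWARN, 0);
	if (!page)
		break;	/* out of easily allocatable memory; stop here */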

^ permalink raw reply	[flat|nested] 205+ messages in thread

* [RFC][PATCH 2/6] PM/Suspend: Do not shrink memory before suspend
  2009-05-10 13:48     ` Rafael J. Wysocki
  (?)
@ 2009-05-10 13:50       ` Rafael J. Wysocki
  -1 siblings, 0 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-10 13:50 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: pm list, Andrew Morton, LKML, Pavel Machek, Nigel Cunningham,
	David Rientjes, linux-mm

From: Rafael J. Wysocki <rjw@sisk.pl>

Remove the shrinking of memory from the suspend-to-RAM code, where
it is not really necessary.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Nigel Cunningham <nigel@tuxonice.net>
Acked-by: Wu Fengguang <fengguang.wu@intel.com>
---
 kernel/power/main.c |   20 +-------------------
 mm/vmscan.c         |    4 ++--
 2 files changed, 3 insertions(+), 21 deletions(-)

Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -188,9 +188,6 @@ static void suspend_test_finish(const ch
 
 #endif
 
-/* This is just an arbitrary number */
-#define FREE_PAGE_NUMBER (100)
-
 static struct platform_suspend_ops *suspend_ops;
 
 /**
@@ -226,7 +223,6 @@ int suspend_valid_only_mem(suspend_state
 static int suspend_prepare(void)
 {
 	int error;
-	unsigned int free_pages;
 
 	if (!suspend_ops || !suspend_ops->enter)
 		return -EPERM;
@@ -241,24 +237,10 @@ static int suspend_prepare(void)
 	if (error)
 		goto Finish;
 
-	if (suspend_freeze_processes()) {
-		error = -EAGAIN;
-		goto Thaw;
-	}
-
-	free_pages = global_page_state(NR_FREE_PAGES);
-	if (free_pages < FREE_PAGE_NUMBER) {
-		pr_debug("PM: free some memory\n");
-		shrink_all_memory(FREE_PAGE_NUMBER - free_pages);
-		if (nr_free_pages() < FREE_PAGE_NUMBER) {
-			error = -ENOMEM;
-			printk(KERN_ERR "PM: No enough memory\n");
-		}
-	}
+	error = suspend_freeze_processes();
 	if (!error)
 		return 0;
 
- Thaw:
 	suspend_thaw_processes();
 	usermodehelper_enable();
  Finish:
Index: linux-2.6/mm/vmscan.c
===================================================================
--- linux-2.6.orig/mm/vmscan.c
+++ linux-2.6/mm/vmscan.c
@@ -2054,7 +2054,7 @@ unsigned long global_lru_pages(void)
 		+ global_page_state(NR_INACTIVE_FILE);
 }
 
-#ifdef CONFIG_PM
+#ifdef CONFIG_HIBERNATION
 /*
  * Helper function for shrink_all_memory().  Tries to reclaim 'nr_pages' pages
  * from LRU lists system-wide, for given pass and priority.
@@ -2194,7 +2194,7 @@ out:
 
 	return sc.nr_reclaimed;
 }
-#endif
+#endif /* CONFIG_HIBERNATION */
 
 /* It's optimal to keep kswapds on the same CPUs as their memory, but
    not required for correctness.  So if the last cpu in a node goes

^ permalink raw reply	[flat|nested] 205+ messages in thread

* [RFC][PATCH 3/6] PM/Hibernate: Move memory shrinking to snapshot.c (rev. 2)
  2009-05-10 13:48     ` Rafael J. Wysocki
@ 2009-05-10 13:51       ` Rafael J. Wysocki
  -1 siblings, 0 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-10 13:51 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: pm list, Andrew Morton, LKML, Pavel Machek, Nigel Cunningham,
	David Rientjes, linux-mm

From: Rafael J. Wysocki <rjw@sisk.pl>

The next patch is going to modify the memory shrinking code so that
it uses memory allocations to free memory instead of relying on an
artificial memory shrinking mechanism.  For this purpose it is
convenient to move swsusp_shrink_memory() from kernel/power/swsusp.c
to kernel/power/snapshot.c, because the new memory-shrinking code is
going to use functions that are local to kernel/power/snapshot.c.

[rev. 2: Make some functions static and remove their prototypes from
 kernel/power/power.h]

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Wu Fengguang <fengguang.wu@intel.com>
---
 kernel/power/power.h    |    4 --
 kernel/power/snapshot.c |   80 ++++++++++++++++++++++++++++++++++++++++++++++--
 kernel/power/swsusp.c   |   76 ---------------------------------------------
 3 files changed, 79 insertions(+), 81 deletions(-)

Index: linux-2.6/kernel/power/snapshot.c
===================================================================
--- linux-2.6.orig/kernel/power/snapshot.c
+++ linux-2.6/kernel/power/snapshot.c
@@ -39,6 +39,14 @@ static int swsusp_page_is_free(struct pa
 static void swsusp_set_page_forbidden(struct page *);
 static void swsusp_unset_page_forbidden(struct page *);
 
+/*
+ * Preferred image size in bytes (tunable via /sys/power/image_size).
+ * When it is set to N, swsusp will do its best to ensure the image
+ * size will not exceed N bytes, but if that is impossible, it will
+ * try to create the smallest image possible.
+ */
+unsigned long image_size = 500 * 1024 * 1024;
+
 /* List of PBEs needed for restoring the pages that were allocated before
  * the suspend and included in the suspend image, but have also been
  * allocated by the "resume" kernel, so their contents cannot be written
@@ -840,7 +848,7 @@ static struct page *saveable_highmem_pag
  *	pages.
  */
 
-unsigned int count_highmem_pages(void)
+static unsigned int count_highmem_pages(void)
 {
 	struct zone *zone;
 	unsigned int n = 0;
@@ -902,7 +910,7 @@ static struct page *saveable_page(struct
  *	pages.
  */
 
-unsigned int count_data_pages(void)
+static unsigned int count_data_pages(void)
 {
 	struct zone *zone;
 	unsigned long pfn, max_zone_pfn;
@@ -1058,6 +1066,74 @@ void swsusp_free(void)
 	buffer = NULL;
 }
 
+/**
+ *	swsusp_shrink_memory -  Try to free as much memory as needed
+ *
+ *	... but do not OOM-kill anyone
+ *
+ *	Notice: all userland should be stopped before it is called, or
+ *	livelock is possible.
+ */
+
+#define SHRINK_BITE	10000
+static inline unsigned long __shrink_memory(long tmp)
+{
+	if (tmp > SHRINK_BITE)
+		tmp = SHRINK_BITE;
+	return shrink_all_memory(tmp);
+}
+
+int swsusp_shrink_memory(void)
+{
+	long tmp;
+	struct zone *zone;
+	unsigned long pages = 0;
+	unsigned int i = 0;
+	char *p = "-\\|/";
+	struct timeval start, stop;
+
+	printk(KERN_INFO "PM: Shrinking memory...  ");
+	do_gettimeofday(&start);
+	do {
+		long size, highmem_size;
+
+		highmem_size = count_highmem_pages();
+		size = count_data_pages() + PAGES_FOR_IO + SPARE_PAGES;
+		tmp = size;
+		size += highmem_size;
+		for_each_populated_zone(zone) {
+			tmp += snapshot_additional_pages(zone);
+			if (is_highmem(zone)) {
+				highmem_size -=
+					zone_page_state(zone, NR_FREE_PAGES);
+			} else {
+				tmp -= zone_page_state(zone, NR_FREE_PAGES);
+				tmp += zone->lowmem_reserve[ZONE_NORMAL];
+			}
+		}
+
+		if (highmem_size < 0)
+			highmem_size = 0;
+
+		tmp += highmem_size;
+		if (tmp > 0) {
+			tmp = __shrink_memory(tmp);
+			if (!tmp)
+				return -ENOMEM;
+			pages += tmp;
+		} else if (size > image_size / PAGE_SIZE) {
+			tmp = __shrink_memory(size - (image_size / PAGE_SIZE));
+			pages += tmp;
+		}
+		printk("\b%c", p[i++%4]);
+	} while (tmp > 0);
+	do_gettimeofday(&stop);
+	printk("\bdone (%lu pages freed)\n", pages);
+	swsusp_show_speed(&start, &stop, pages, "Freed");
+
+	return 0;
+}
+
 #ifdef CONFIG_HIGHMEM
 /**
   *	count_pages_for_highmem - compute the number of non-highmem pages
Index: linux-2.6/kernel/power/swsusp.c
===================================================================
--- linux-2.6.orig/kernel/power/swsusp.c
+++ linux-2.6/kernel/power/swsusp.c
@@ -55,14 +55,6 @@
 
 #include "power.h"
 
-/*
- * Preferred image size in bytes (tunable via /sys/power/image_size).
- * When it is set to N, swsusp will do its best to ensure the image
- * size will not exceed N bytes, but if that is impossible, it will
- * try to create the smallest image possible.
- */
-unsigned long image_size = 500 * 1024 * 1024;
-
 int in_suspend __nosavedata = 0;
 
 /**
@@ -195,74 +187,6 @@ void swsusp_show_speed(struct timeval *s
 			kps / 1000, (kps % 1000) / 10);
 }
 
-/**
- *	swsusp_shrink_memory -  Try to free as much memory as needed
- *
- *	... but do not OOM-kill anyone
- *
- *	Notice: all userland should be stopped before it is called, or
- *	livelock is possible.
- */
-
-#define SHRINK_BITE	10000
-static inline unsigned long __shrink_memory(long tmp)
-{
-	if (tmp > SHRINK_BITE)
-		tmp = SHRINK_BITE;
-	return shrink_all_memory(tmp);
-}
-
-int swsusp_shrink_memory(void)
-{
-	long tmp;
-	struct zone *zone;
-	unsigned long pages = 0;
-	unsigned int i = 0;
-	char *p = "-\\|/";
-	struct timeval start, stop;
-
-	printk(KERN_INFO "PM: Shrinking memory...  ");
-	do_gettimeofday(&start);
-	do {
-		long size, highmem_size;
-
-		highmem_size = count_highmem_pages();
-		size = count_data_pages() + PAGES_FOR_IO + SPARE_PAGES;
-		tmp = size;
-		size += highmem_size;
-		for_each_populated_zone(zone) {
-			tmp += snapshot_additional_pages(zone);
-			if (is_highmem(zone)) {
-				highmem_size -=
-					zone_page_state(zone, NR_FREE_PAGES);
-			} else {
-				tmp -= zone_page_state(zone, NR_FREE_PAGES);
-				tmp += zone->lowmem_reserve[ZONE_NORMAL];
-			}
-		}
-
-		if (highmem_size < 0)
-			highmem_size = 0;
-
-		tmp += highmem_size;
-		if (tmp > 0) {
-			tmp = __shrink_memory(tmp);
-			if (!tmp)
-				return -ENOMEM;
-			pages += tmp;
-		} else if (size > image_size / PAGE_SIZE) {
-			tmp = __shrink_memory(size - (image_size / PAGE_SIZE));
-			pages += tmp;
-		}
-		printk("\b%c", p[i++%4]);
-	} while (tmp > 0);
-	do_gettimeofday(&stop);
-	printk("\bdone (%lu pages freed)\n", pages);
-	swsusp_show_speed(&start, &stop, pages, "Freed");
-
-	return 0;
-}
-
 /*
  * Platforms, like ACPI, may want us to save some memory used by them during
  * hibernation and to restore the contents of this memory during the subsequent
Index: linux-2.6/kernel/power/power.h
===================================================================
--- linux-2.6.orig/kernel/power/power.h
+++ linux-2.6/kernel/power/power.h
@@ -74,7 +74,7 @@ extern asmlinkage int swsusp_arch_resume
 
 extern int create_basic_memory_bitmaps(void);
 extern void free_basic_memory_bitmaps(void);
-extern unsigned int count_data_pages(void);
+extern int swsusp_shrink_memory(void);
 
 /**
  *	Auxiliary structure used for reading the snapshot image data and
@@ -149,7 +149,6 @@ extern int swsusp_swap_in_use(void);
 
 /* kernel/power/disk.c */
 extern int swsusp_check(void);
-extern int swsusp_shrink_memory(void);
 extern void swsusp_free(void);
 extern int swsusp_read(unsigned int *flags_p);
 extern int swsusp_write(unsigned int flags);
@@ -176,7 +175,6 @@ extern int pm_notifier_call_chain(unsign
 #endif
 
 #ifdef CONFIG_HIGHMEM
-unsigned int count_highmem_pages(void);
 int restore_highmem(void);
 #else
 static inline unsigned int count_highmem_pages(void) { return 0; }
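
Although this patch only moves code, the batching behaviour of the loop
being moved is easy to miss in the diff: each pass asks shrink_all_memory()
for at most SHRINK_BITE pages and repeats until the estimated excess drops
to zero.  A standalone simulation of just that clamp-and-loop pattern is
below; the fake reclaimable-page pool and all of its numbers are made up
for illustration, and unlike the real code, which re-counts saveable and
free pages on every pass, the model simply decrements a local estimate:

#include <stdio.h>

#define SHRINK_BITE	10000

static long reclaimable = 23000;	/* pretend LRU pages we can free */

/* Stand-in for shrink_all_memory(): frees up to 'request' pages. */
static unsigned long fake_shrink_all_memory(unsigned long request)
{
	unsigned long freed = request < (unsigned long)reclaimable ?
					request : (unsigned long)reclaimable;

	reclaimable -= freed;
	return freed;
}

static unsigned long shrink_in_bites(long excess)
{
	unsigned long total = 0;
	long tmp;

	do {
		/* Clamp each request, exactly like __shrink_memory(). */
		tmp = excess > SHRINK_BITE ? SHRINK_BITE : excess;
		tmp = fake_shrink_all_memory(tmp);
		total += tmp;
		excess -= tmp;
	} while (tmp > 0 && excess > 0);

	return total;
}

int main(void)
{
	/* Ask for 25000 pages; only 23000 are reclaimable, freed in bites. */
	printf("freed %lu pages in bites of %d\n",
	       shrink_in_bites(25000), SHRINK_BITE);
	return 0;
}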

^ permalink raw reply	[flat|nested] 205+ messages in thread

* [RFC][PATCH 4/6] PM/Hibernate: Rework shrinking of memory
  2009-05-10 13:48     ` Rafael J. Wysocki
@ 2009-05-10 13:53       ` Rafael J. Wysocki
  -1 siblings, 0 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-10 13:53 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: pm list, Andrew Morton, LKML, Pavel Machek, Nigel Cunningham,
	David Rientjes, linux-mm

From: Rafael J. Wysocki <rjw@sisk.pl>

Rework swsusp_shrink_memory() so that it calls shrink_all_memory()
just once to make some room for the image and then allocates memory
to apply more pressure to the memory management subsystem, if
necessary.

Unfortunately, we don't seem to be able to drop shrink_all_memory()
entirely just yet, because that would lead to huge performance
regressions in some test cases.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 kernel/power/snapshot.c |  209 +++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 165 insertions(+), 44 deletions(-)

Index: linux-2.6/kernel/power/snapshot.c
===================================================================
--- linux-2.6.orig/kernel/power/snapshot.c
+++ linux-2.6/kernel/power/snapshot.c
@@ -1066,69 +1066,190 @@ void swsusp_free(void)
 	buffer = NULL;
 }
 
+/* Helper functions used for the shrinking of memory. */
+
+#define GFP_IMAGE	(GFP_KERNEL | __GFP_NOWARN | __GFP_NO_OOM_KILL)
+
 /**
- *	swsusp_shrink_memory -  Try to free as much memory as needed
+ * preallocate_image_pages - Allocate a number of pages for hibernation image
+ * @nr_pages: Number of page frames to allocate.
+ * @mask: GFP flags to use for the allocation.
  *
- *	... but do not OOM-kill anyone
+ * Return value: Number of page frames actually allocated
+ */
+static unsigned long preallocate_image_pages(unsigned long nr_pages, gfp_t mask)
+{
+	unsigned long nr_alloc = 0;
+
+	while (nr_pages > 0) {
+		if (!alloc_image_page(mask))
+			break;
+		nr_pages--;
+		nr_alloc++;
+	}
+
+	return nr_alloc;
+}
+
+static unsigned long preallocate_image_memory(unsigned long nr_pages)
+{
+	return preallocate_image_pages(nr_pages, GFP_IMAGE);
+}
+
+#ifdef CONFIG_HIGHMEM
+static unsigned long preallocate_image_highmem(unsigned long nr_pages)
+{
+	return preallocate_image_pages(nr_pages, GFP_IMAGE | __GFP_HIGHMEM);
+}
+
+/**
+ * compute_fraction - Compute approximate fraction x * (a/b)
+ * @x: Number to multiply.
+ * @numerator: Numerator of the fraction (a).
+ * @denominator: Denominator of the fraction (b).
  *
- *	Notice: all userland should be stopped before it is called, or
- *	livelock is possible.
+ * Compute an approximate value of the expression x * (a/b), where a is less
+ * than b, all x, a, b are unsigned longs and x * a may be greater than the
+ * maximum unsigned long.
  */
+static unsigned long compute_fraction(
+	unsigned long x, unsigned long numerator, unsigned long denominator)
+{
+	unsigned long ratio = (numerator << 8) / denominator;
+
+	x *= ratio;
+	return x >> 8;
+}
 
-#define SHRINK_BITE	10000
-static inline unsigned long __shrink_memory(long tmp)
+static unsigned long highmem_fraction(
+	unsigned long size, unsigned long highmem, unsigned long count)
 {
-	if (tmp > SHRINK_BITE)
-		tmp = SHRINK_BITE;
-	return shrink_all_memory(tmp);
+	return highmem > count / 2 ?
+			compute_fraction(size, highmem, count) :
+			size - compute_fraction(size, count - highmem, count);
+}
+#else
+static inline unsigned long preallocate_image_highmem(unsigned long nr_pages)
+{
+	return 0;
 }
 
+static inline unsigned long highmem_fraction(
+	unsigned long size, unsigned long highmem, unsigned long count)
+{
+	return 0;
+}
+#endif /* CONFIG_HIGHMEM */
+
+/**
+ * swsusp_shrink_memory -  Make the kernel release as much memory as needed
+ *
+ * To create a hibernation image it is necessary to make a copy of every page
+ * frame in use.  We also need a number of page frames to be free during
+ * hibernation for allocations made while saving the image and for device
+ * drivers, in case they need to allocate memory from their hibernation
+ * callbacks (these two numbers are given by PAGES_FOR_IO and SPARE_PAGES,
+ * respectively, both of which are rough estimates).  To make this happen, we
+ * compute the total number of available page frames and allocate at least
+ *
+ * ([page frames total] + PAGES_FOR_IO + [metadata pages]) / 2 + 2 * SPARE_PAGES
+ *
+ * of them, which corresponds to the maximum size of a hibernation image.
+ *
+ * If image_size is set below the number following from the above formula,
+ * the preallocation of memory is continued until the total number of page
+ * frames in use is below the requested image size or it is impossible to
+ * allocate more memory, whichever happens first.
+ */
 int swsusp_shrink_memory(void)
 {
-	long tmp;
 	struct zone *zone;
-	unsigned long pages = 0;
-	unsigned int i = 0;
-	char *p = "-\\|/";
+	unsigned long saveable, size, max_size, count, highmem, pages = 0;
+	unsigned long alloc, pages_highmem;
 	struct timeval start, stop;
+	int error = 0;
 
-	printk(KERN_INFO "PM: Shrinking memory...  ");
+	printk(KERN_INFO "PM: Shrinking memory... ");
 	do_gettimeofday(&start);
-	do {
-		long size, highmem_size;
 
-		highmem_size = count_highmem_pages();
-		size = count_data_pages() + PAGES_FOR_IO + SPARE_PAGES;
-		tmp = size;
-		size += highmem_size;
-		for_each_populated_zone(zone) {
-			tmp += snapshot_additional_pages(zone);
-			if (is_highmem(zone)) {
-				highmem_size -=
-					zone_page_state(zone, NR_FREE_PAGES);
-			} else {
-				tmp -= zone_page_state(zone, NR_FREE_PAGES);
-				tmp += zone->lowmem_reserve[ZONE_NORMAL];
-			}
+	/* Count the number of saveable data pages. */
+	highmem = count_highmem_pages();
+	saveable = count_data_pages();
+
+	/*
+	 * Compute the total number of page frames we can use (count) and the
+	 * number of pages needed for image metadata (size).
+	 */
+	count = saveable;
+	saveable += highmem;
+	size = 0;
+	for_each_populated_zone(zone) {
+		size += snapshot_additional_pages(zone);
+		if (is_highmem(zone)) {
+			highmem += zone_page_state(zone, NR_FREE_PAGES);
+		} else {
+			count += zone_page_state(zone, NR_FREE_PAGES);
 		}
+	}
+	count += highmem;
+	count -= totalreserve_pages;
 
-		if (highmem_size < 0)
-			highmem_size = 0;
+	/* Compute the maximum number of saveable pages to leave in memory. */
+	max_size = (count - (size + PAGES_FOR_IO)) / 2 - 2 * SPARE_PAGES;
+	size = DIV_ROUND_UP(image_size, PAGE_SIZE);
+	if (size > max_size)
+		size = max_size;
+	/*
+	 * If the maximum is not less than the current number of saveable pages
+	 * in memory, we don't need to do anything more.
+	 */
+	if (size >= saveable)
+		goto out;
 
-		tmp += highmem_size;
-		if (tmp > 0) {
-			tmp = __shrink_memory(tmp);
-			if (!tmp)
-				return -ENOMEM;
-			pages += tmp;
-		} else if (size > image_size / PAGE_SIZE) {
-			tmp = __shrink_memory(size - (image_size / PAGE_SIZE));
-			pages += tmp;
-		}
-		printk("\b%c", p[i++%4]);
-	} while (tmp > 0);
+	/*
+	 * Let the memory management subsystem know that we're going to need a
+	 * large number of page frames to allocate and make it free some memory.
+	 * NOTE: If this is not done, performance is heavily affected in some
+	 * test cases.
+	 */
+	shrink_all_memory(saveable - size);
+
+	/*
+	 * The number of saveable pages in memory was too high, so apply some
+	 * pressure to decrease it.  First, make room for the largest possible
+	 * image and fail if that doesn't work.  Next, try to decrease the size
+	 * of the image as much as indicated by image_size using allocations
+	 * from highmem and non-highmem zones separately.
+	 *
+	 */
+	pages_highmem = preallocate_image_highmem(highmem / 2);
+	alloc = count - max_size - pages_highmem;
+	pages = preallocate_image_memory(alloc);
+	if (pages < alloc) {
+		error = -ENOMEM;
+		goto free_out;
+	}
+	size = max_size - size;
+	alloc = size;
+	size = preallocate_image_highmem(
+				highmem_fraction(size, highmem, count));
+	pages_highmem += size;
+	alloc -= size;
+	pages += preallocate_image_memory(alloc);
+	pages += pages_highmem;
+
+ free_out:
+	/* Release all of the preallocated page frames. */
+	swsusp_free();
+
+	if (error) {
+		printk(KERN_CONT "\n");
+		return error;
+	}
+
+ out:
 	do_gettimeofday(&stop);
-	printk("\bdone (%lu pages freed)\n", pages);
+	printk(KERN_CONT "done (preallocated %lu free pages)\n", pages);
 	swsusp_show_speed(&start, &stop, pages, "Freed");
 
 	return 0;
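
Two pieces of arithmetic in this patch benefit from concrete numbers: the
fixed-point helper compute_fraction(), which approximates x * (a/b) without
risking an overflow of x * a, and highmem_fraction(), which uses it to split
the preallocation target between highmem and non-highmem.  The helpers below
follow the hunk above; the sample page counts in main() are made up for
illustration only:

#include <stdio.h>

/*
 * Approximate x * (numerator / denominator) without computing the possibly
 * overflowing product x * numerator: scale the ratio into 8 fixed-point
 * bits first, at the cost of a small rounding error.
 */
static unsigned long compute_fraction(unsigned long x,
		unsigned long numerator, unsigned long denominator)
{
	unsigned long ratio = (numerator << 8) / denominator;

	return (x * ratio) >> 8;
}

/*
 * Share of 'size' that should come from highmem.  When highmem is the
 * larger part, the fraction is computed directly; otherwise it is taken
 * as the complement of the non-highmem share.
 */
static unsigned long highmem_fraction(unsigned long size,
		unsigned long highmem, unsigned long count)
{
	return highmem > count / 2 ?
			compute_fraction(size, highmem, count) :
			size - compute_fraction(size, count - highmem, count);
}

int main(void)
{
	/* Made-up example: ~4 GiB worth of 4 KiB page frames. */
	unsigned long count = 1048576, highmem = 700000, size = 200000;

	printf("exact highmem share : %lu\n",
	       (unsigned long)((double)size * highmem / count));
	printf("approx highmem share: %lu\n",
	       highmem_fraction(size, highmem, count));
	return 0;
}

Running it shows the 8-bit ratio lands within about half a percent of the
exact result here, which is plenty for deciding how many of the preallocated
page frames should come from highmem.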

^ permalink raw reply	[flat|nested] 205+ messages in thread

* [RFC][PATCH 5/6] PM/Hibernate: Do not release preallocated memory unnecessarily
  2009-05-10 13:48     ` Rafael J. Wysocki
@ 2009-05-10 13:57       ` Rafael J. Wysocki
  -1 siblings, 0 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-10 13:57 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: pm list, Andrew Morton, LKML, Pavel Machek, Nigel Cunningham,
	David Rientjes, linux-mm

From: Rafael J. Wysocki <rjw@sisk.pl>

Since the hibernation code is now going to allocate memory to make
enough room for the image, it can also use the page frames allocated
at this stage as image page frames.  The low-level hibernation code
needs to be rearranged for this purpose, but doing so allows us to
avoid freeing a great number of pages only to allocate the same pages
again later, so it is generally worth the change.
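
A minimal user-space sketch of the idea follows, with a plain array standing
in for the copy_bm bitmap that the patch uses to remember preallocated page
frames; the real code walks memory bitmaps and tracks highmem and normal
pages separately, so everything below is simplified for illustration:

#include <stdio.h>
#include <stdlib.h>

#define PREALLOCATED	8	/* frames grabbed while applying pressure */
#define IMAGE_NEED	5	/* frames the image will actually use */

int main(void)
{
	void *frame[PREALLOCATED];
	int i;

	/* Preallocation phase: remember every frame we get, as setting its
	 * bit in copy_bm does in the patch. */
	for (i = 0; i < PREALLOCATED; i++)
		frame[i] = malloc(4096);

	/*
	 * Old behaviour: free all PREALLOCATED frames here and allocate
	 * IMAGE_NEED of them again later.  New behaviour: keep IMAGE_NEED
	 * of the frames we already own and release only the excess.
	 */
	for (i = IMAGE_NEED; i < PREALLOCATED; i++) {
		free(frame[i]);
		frame[i] = NULL;
	}

	printf("kept %d of %d preallocated frames for the image\n",
	       IMAGE_NEED, PREALLOCATED);

	for (i = 0; i < IMAGE_NEED; i++)
		free(frame[i]);		/* cleanup for this demo only */
	return 0;
}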

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 kernel/power/disk.c     |   15 ++-
 kernel/power/power.h    |    2 
 kernel/power/snapshot.c |  186 ++++++++++++++++++++++++++++++------------------
 3 files changed, 130 insertions(+), 73 deletions(-)

Index: linux-2.6/kernel/power/snapshot.c
===================================================================
--- linux-2.6.orig/kernel/power/snapshot.c
+++ linux-2.6/kernel/power/snapshot.c
@@ -1033,6 +1033,25 @@ copy_data_pages(struct memory_bitmap *co
 static unsigned int nr_copy_pages;
 /* Number of pages needed for saving the original pfns of the image pages */
 static unsigned int nr_meta_pages;
+/*
+ * Numbers of normal and highmem page frames allocated for hibernation image
+ * before suspending devices.
+ */
+unsigned int alloc_normal, alloc_highmem;
+/*
+ * Memory bitmap used for marking saveable pages (during hibernation) or
+ * hibernation image pages (during restore)
+ */
+static struct memory_bitmap orig_bm;
+/*
+ * Memory bitmap used during hibernation for marking allocated page frames that
+ * will contain copies of saveable pages.  During restore it is initially used
+ * for marking hibernation image pages, but then the set bits from it are
+ * duplicated in @orig_bm and it is released.  On highmem systems it is next
+ * used for marking "safe" highmem pages, but it has to be reinitialized for
+ * this purpose.
+ */
+static struct memory_bitmap copy_bm;
 
 /**
  *	swsusp_free - free pages allocated for the suspend.
@@ -1064,6 +1083,8 @@ void swsusp_free(void)
 	nr_meta_pages = 0;
 	restore_pblist = NULL;
 	buffer = NULL;
+	alloc_normal = 0;
+	alloc_highmem = 0;
 }
 
 /* Helper functions used for the shrinking of memory. */
@@ -1082,8 +1103,16 @@ static unsigned long preallocate_image_p
 	unsigned long nr_alloc = 0;
 
 	while (nr_pages > 0) {
-		if (!alloc_image_page(mask))
-			break;
+ 		struct page *page;
+
+		page = alloc_image_page(mask);
+ 		if (!page)
+ 			break;
+		memory_bm_set_bit(&copy_bm, page_to_pfn(page));
+		if (PageHighMem(page))
+			alloc_highmem++;
+		else
+			alloc_normal++;
 		nr_pages--;
 		nr_alloc++;
 	}
@@ -1142,7 +1171,30 @@ static inline unsigned long highmem_frac
 #endif /* CONFIG_HIGHMEM */
 
 /**
- * swsusp_shrink_memory -  Make the kernel release as much memory as needed
+ * free_unnecessary_pages - Release preallocated pages not needed for the image
+ * @size: Anticipated hibernation image size
+ */
+static void free_unnecessary_pages(unsigned long size)
+{
+	memory_bm_position_reset(&copy_bm);
+
+	while (alloc_normal + alloc_highmem > size) {
+		unsigned long pfn = memory_bm_next_pfn(&copy_bm);
+		struct page *page = pfn_to_page(pfn);
+
+		memory_bm_clear_bit(&copy_bm, pfn);
+		if (PageHighMem(page))
+			alloc_highmem--;
+		else
+			alloc_normal--;
+		swsusp_unset_page_forbidden(page);
+		swsusp_unset_page_free(page);
+		__free_page(page);
+	}
+}
+
+/**
+ * hibernate_preallocate_memory - Preallocate memory for hibernation image
  *
  * To create a hibernation image it is necessary to make a copy of every page
  * frame in use.  We also need a number of page frames to be free during
@@ -1161,19 +1213,30 @@ static inline unsigned long highmem_frac
  * frames in use is below the requested image size or it is impossible to
  * allocate more memory, whichever happens first.
  */
-int swsusp_shrink_memory(void)
+int hibernate_preallocate_memory(void)
 {
 	struct zone *zone;
 	unsigned long saveable, size, max_size, count, highmem, pages = 0;
-	unsigned long alloc, pages_highmem;
+	unsigned long alloc, save_highmem, pages_highmem;
 	struct timeval start, stop;
-	int error = 0;
+	int error;
 
-	printk(KERN_INFO "PM: Shrinking memory... ");
+	printk(KERN_INFO "PM: Preallocating image memory... ");
 	do_gettimeofday(&start);
 
+	error = memory_bm_create(&orig_bm, GFP_IMAGE, PG_ANY);
+	if (error)
+		goto err_out;
+
+	error = memory_bm_create(&copy_bm, GFP_IMAGE, PG_ANY);
+	if (error)
+		goto err_out;
+
+	alloc_normal = 0;
+	alloc_highmem = 0;
+
 	/* Count the number of saveable data pages. */
-	highmem = count_highmem_pages();
+	save_highmem = count_highmem_pages();
 	saveable = count_data_pages();
 
 	/*
@@ -1181,7 +1244,8 @@ int swsusp_shrink_memory(void)
 	 * number of pages needed for image metadata (size).
 	 */
 	count = saveable;
-	saveable += highmem;
+	saveable += save_highmem;
+	highmem = save_highmem;
 	size = 0;
 	for_each_populated_zone(zone) {
 		size += snapshot_additional_pages(zone);
@@ -1201,10 +1265,13 @@ int swsusp_shrink_memory(void)
 		size = max_size;
 	/*
 	 * If the maximum is not less than the current number of saveable pages
-	 * in memory, we don't need to do anything more.
+	 * in memory, allocate page frames for the image and we're done.
 	 */
-	if (size >= saveable)
+	if (size >= saveable) {
+		pages = preallocate_image_highmem(save_highmem);
+		pages += preallocate_image_memory(saveable - pages);
 		goto out;
+	}
 
 	/*
 	 * Let the memory management subsystem know that we're going to need a
@@ -1225,10 +1292,8 @@ int swsusp_shrink_memory(void)
 	pages_highmem = preallocate_image_highmem(highmem / 2);
 	alloc = count - max_size - pages_highmem;
 	pages = preallocate_image_memory(alloc);
-	if (pages < alloc) {
-		error = -ENOMEM;
-		goto free_out;
-	}
+	if (pages < alloc)
+		goto err_out;
 	size = max_size - size;
 	alloc = size;
 	size = preallocate_image_highmem(
@@ -1238,21 +1303,23 @@ int swsusp_shrink_memory(void)
 	pages += preallocate_image_memory(alloc);
 	pages += pages_highmem;
 
- free_out:
-	/* Release all of the preallocated page frames. */
-	swsusp_free();
-
-	if (error) {
-		printk(KERN_CONT "\n");
-		return error;
-	}
+	/*
+	 * We only need 'size' page frames for the image but we have allocated
+	 * more.  Release the excessive ones now.
+	 */
+	free_unnecessary_pages(size);
 
  out:
 	do_gettimeofday(&stop);
-	printk(KERN_CONT "done (preallocated %lu free pages)\n", pages);
-	swsusp_show_speed(&start, &stop, pages, "Freed");
+	printk(KERN_CONT "done (allocated %lu pages)\n", pages);
+	swsusp_show_speed(&start, &stop, pages, "Allocated");
 
 	return 0;
+
+ err_out:
+	printk(KERN_CONT "\n");
+	swsusp_free();
+	return -ENOMEM;
 }
 
 #ifdef CONFIG_HIGHMEM
@@ -1263,7 +1330,7 @@ int swsusp_shrink_memory(void)
 
 static unsigned int count_pages_for_highmem(unsigned int nr_highmem)
 {
-	unsigned int free_highmem = count_free_highmem_pages();
+	unsigned int free_highmem = count_free_highmem_pages() + alloc_highmem;
 
 	if (free_highmem >= nr_highmem)
 		nr_highmem = 0;
@@ -1285,19 +1352,17 @@ count_pages_for_highmem(unsigned int nr_
 static int enough_free_mem(unsigned int nr_pages, unsigned int nr_highmem)
 {
 	struct zone *zone;
-	unsigned int free = 0, meta = 0;
+	unsigned int free = alloc_normal;
 
-	for_each_zone(zone) {
-		meta += snapshot_additional_pages(zone);
+	for_each_zone(zone)
 		if (!is_highmem(zone))
 			free += zone_page_state(zone, NR_FREE_PAGES);
-	}
 
 	nr_pages += count_pages_for_highmem(nr_highmem);
-	pr_debug("PM: Normal pages needed: %u + %u + %u, available pages: %u\n",
-		nr_pages, PAGES_FOR_IO, meta, free);
+	pr_debug("PM: Normal pages needed: %u + %u, available pages: %u\n",
+		nr_pages, PAGES_FOR_IO, free);
 
-	return free > nr_pages + PAGES_FOR_IO + meta;
+	return free > nr_pages + PAGES_FOR_IO;
 }
 
 #ifdef CONFIG_HIGHMEM
@@ -1319,7 +1384,7 @@ static inline int get_highmem_buffer(int
  */
 
 static inline unsigned int
-alloc_highmem_image_pages(struct memory_bitmap *bm, unsigned int nr_highmem)
+alloc_highmem_pages(struct memory_bitmap *bm, unsigned int nr_highmem)
 {
 	unsigned int to_alloc = count_free_highmem_pages();
 
@@ -1339,7 +1404,7 @@ alloc_highmem_image_pages(struct memory_
 static inline int get_highmem_buffer(int safe_needed) { return 0; }
 
 static inline unsigned int
-alloc_highmem_image_pages(struct memory_bitmap *bm, unsigned int n) { return 0; }
+alloc_highmem_pages(struct memory_bitmap *bm, unsigned int n) { return 0; }
 #endif /* CONFIG_HIGHMEM */
 
 /**
@@ -1358,51 +1423,36 @@ static int
 swsusp_alloc(struct memory_bitmap *orig_bm, struct memory_bitmap *copy_bm,
 		unsigned int nr_pages, unsigned int nr_highmem)
 {
-	int error;
-
-	error = memory_bm_create(orig_bm, GFP_ATOMIC | __GFP_COLD, PG_ANY);
-	if (error)
-		goto Free;
-
-	error = memory_bm_create(copy_bm, GFP_ATOMIC | __GFP_COLD, PG_ANY);
-	if (error)
-		goto Free;
+	int error = 0;
 
 	if (nr_highmem > 0) {
 		error = get_highmem_buffer(PG_ANY);
 		if (error)
-			goto Free;
-
-		nr_pages += alloc_highmem_image_pages(copy_bm, nr_highmem);
+			goto err_out;
+		if (nr_highmem > alloc_highmem) {
+			nr_highmem -= alloc_highmem;
+			nr_pages += alloc_highmem_pages(copy_bm, nr_highmem);
+		}
 	}
-	while (nr_pages-- > 0) {
-		struct page *page = alloc_image_page(GFP_ATOMIC | __GFP_COLD);
-
-		if (!page)
-			goto Free;
+	if (nr_pages > alloc_normal) {
+		nr_pages -= alloc_normal;
+		while (nr_pages-- > 0) {
+			struct page *page;
 
-		memory_bm_set_bit(copy_bm, page_to_pfn(page));
+			page = alloc_image_page(GFP_ATOMIC | __GFP_COLD);
+			if (!page)
+				goto err_out;
+			memory_bm_set_bit(copy_bm, page_to_pfn(page));
+		}
 	}
+
 	return 0;
 
- Free:
+ err_out:
 	swsusp_free();
-	return -ENOMEM;
+	return error;
 }
 
-/* Memory bitmap used for marking saveable pages (during suspend) or the
- * suspend image pages (during resume)
- */
-static struct memory_bitmap orig_bm;
-/* Memory bitmap used on suspend for marking allocated pages that will contain
- * the copies of saveable pages.  During resume it is initially used for
- * marking the suspend image pages, but then its set bits are duplicated in
- * @orig_bm and it is released.  Next, on systems with high memory, it may be
- * used for marking "safe" highmem pages, but it has to be reinitialized for
- * this purpose.
- */
-static struct memory_bitmap copy_bm;
-
 asmlinkage int swsusp_save(void)
 {
 	unsigned int nr_pages, nr_highmem;
Index: linux-2.6/kernel/power/power.h
===================================================================
--- linux-2.6.orig/kernel/power/power.h
+++ linux-2.6/kernel/power/power.h
@@ -74,7 +74,7 @@ extern asmlinkage int swsusp_arch_resume
 
 extern int create_basic_memory_bitmaps(void);
 extern void free_basic_memory_bitmaps(void);
-extern int swsusp_shrink_memory(void);
+extern int hibernate_preallocate_memory(void);
 
 /**
  *	Auxiliary structure used for reading the snapshot image data and
Index: linux-2.6/kernel/power/disk.c
===================================================================
--- linux-2.6.orig/kernel/power/disk.c
+++ linux-2.6/kernel/power/disk.c
@@ -303,8 +303,8 @@ int hibernation_snapshot(int platform_mo
 	if (error)
 		return error;
 
-	/* Free memory before shutting down devices. */
-	error = swsusp_shrink_memory();
+	/* Preallocate image memory before shutting down devices. */
+	error = hibernate_preallocate_memory();
 	if (error)
 		goto Close;
 
@@ -320,6 +320,10 @@ int hibernation_snapshot(int platform_mo
 	/* Control returns here after successful restore */
 
  Resume_devices:
+	/* We may need to release the preallocated image pages here. */
+	if (error || !in_suspend)
+		swsusp_free();
+
 	device_resume(in_suspend ?
 		(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);
 	resume_console();
@@ -593,7 +597,10 @@ int hibernate(void)
 		goto Thaw;
 
 	error = hibernation_snapshot(hibernation_mode == HIBERNATION_PLATFORM);
-	if (in_suspend && !error) {
+	if (error)
+		goto Thaw;
+
+	if (in_suspend) {
 		unsigned int flags = 0;
 
 		if (hibernation_mode == HIBERNATION_PLATFORM)
@@ -605,8 +612,8 @@ int hibernate(void)
 			power_down();
 	} else {
 		pr_debug("PM: Image restored successfully.\n");
-		swsusp_free();
 	}
+
  Thaw:
 	thaw_processes();
  Finish:

^ permalink raw reply	[flat|nested] 205+ messages in thread

* [RFC][PATCH 6/6] PM/Hibernate: Estimate hard core working set size
  2009-05-10 13:48     ` Rafael J. Wysocki
@ 2009-05-10 14:12       ` Rafael J. Wysocki
  -1 siblings, 0 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-10 14:12 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: pm list, Andrew Morton, LKML, Pavel Machek, Nigel Cunningham,
	David Rientjes, linux-mm

From: Rafael J. Wysocki <rjw@sisk.pl>

We want to avoid attempting to free too much memory too hard, so
estimate the size of the hard core working set and use it as the
lower limit for preallocating memory.

Not-yet-signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---

The formula used in this patch appears to return numbers that are too low.

Namely, after applying a debug patch printing the values of the variables used
for the preallocation of memory, I got the following results for two test
systems:

i386, MSI Wind U100, 1 GB of RAM total

PM: Preallocating image memory...
count = 253198, max_size = 125563, saveable = 192367
Requested image size: 113064 pages
Hard working set size: 59551 pages
pages_highmem = 16091, alloc = 111544
alloc_highmem = 1612, alloc = 12499
count - pages = 113064
done (allocated 140134 pages)
PM: Allocated 560536 kbytes in 2.84 seconds (197.37 MB/s)

PM: Preallocating image memory...
count = 253178, max_size = 125553, saveable = 123191
Requested image size: 1 pages
Hard working set size: 14684 pages
pages_highmem = 16090, alloc = 111535
alloc_highmem = 14292, alloc = 110869
count - pages = 50135
done (allocated 203043 pages)

In the first run the hard working set size was irrelevant, because the
requested image size was much greater.  In the second run the requested
image size was very small, so the hard working set size was used as the
image size, but the number of pages still allocated after the
preallocation was much greater than the hard working set size (it should
have been smaller).

x86_64, HP nx6325, 1.5 GB of RAM total

[  250.386721] PM: Preallocating image memory...
count = 486414, max_size = 242165, saveable = 186947
[  256.844235] Requested image size: 1 pages
[  256.844392] Hard working set size: 10211 pages
[  256.844537] pages_highmem = 0, alloc = 244249
[  257.328347] alloc_highmem = 0, alloc = 231954
[  258.084074] count - pages = 24330
[  259.050589] done (allocated 462084 pages)
[  259.050653] PM: Allocated 1848336 kbytes in 8.66 seconds (213.43 MB/s)

In this case the hard core working set size was also used as the requested
image size, but the number of pages that were not freed after the
preallocation was still more than twice this number (see the quick check
below).
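
For reference, a quick check of the figures quoted above (the numbers are
taken verbatim from the debug output; nothing here is part of the patch):

#include <stdio.h>

int main(void)
{
	/* Second i386 run: count = 253178, allocated = 203043, estimate = 14684. */
	unsigned long count = 253178, pages = 203043, hard_ws = 14684;

	printf("i386:   count - pages = %lu (%.1fx the working set estimate)\n",
	       count - pages, (double)(count - pages) / hard_ws);

	/* x86_64 run: count = 486414, allocated = 462084, estimate = 10211. */
	count = 486414;
	pages = 462084;
	hard_ws = 10211;
	printf("x86_64: count - pages = %lu (%.1fx the working set estimate)\n",
	       count - pages, (double)(count - pages) / hard_ws);
	return 0;
}

which prints 50135 (about 3.4x) and 24330 (about 2.4x), matching the
"count - pages" lines above.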

---
 kernel/power/snapshot.c |   46 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 45 insertions(+), 1 deletion(-)

Index: linux-2.6/kernel/power/snapshot.c
===================================================================
--- linux-2.6.orig/kernel/power/snapshot.c
+++ linux-2.6/kernel/power/snapshot.c
@@ -1090,6 +1090,8 @@ void swsusp_free(void)
 /* Helper functions used for the shrinking of memory. */
 
 #define GFP_IMAGE	(GFP_KERNEL | __GFP_NOWARN | __GFP_NO_OOM_KILL)
+/* Typical desktop does not have more than 100MB of mapped pages. */
+#define MAX_MMAP_PAGES	(100 << (20 - PAGE_SHIFT))
 
 /**
  * preallocate_image_pages - Allocate a number of pages for hibernation image
@@ -1194,6 +1196,40 @@ static void free_unnecessary_pages(unsig
 }
 
 /**
+ * hard_core_working_set_size - Estimate the size of the hard core working set
+ *
+ * We want to avoid attempting to free too much memory too hard, so estimate the
+ * size of the hard core working set and use it as the lower limit for
+ * preallocating memory.
+ */
+static unsigned long hard_core_working_set_size(void)
+{
+	unsigned long size;
+
+	/*
+	 * Mapped pages are normally few and precious, but their number should
+	 * be bounded for safety.
+	 */
+	size = global_page_state(NR_FILE_MAPPED);
+	size = min_t(unsigned long, size, MAX_MMAP_PAGES);
+
+	/*
+	 * Disk I/O can be much faster than swap I/O, so optimize for
+	 * performance.
+	 */
+	size += global_page_state(NR_ACTIVE_ANON);
+	size += global_page_state(NR_INACTIVE_ANON);
+
+	/* Hard (but normally small) memory requests. */
+	size += global_page_state(NR_SLAB_UNRECLAIMABLE);
+	size += global_page_state(NR_UNEVICTABLE);
+	size += global_page_state(NR_PAGETABLE);
+
+	return size;
+}
+
+
+/**
  * hibernate_preallocate_memory - Preallocate memory for hibernation image
  *
  * To create a hibernation image it is necessary to make a copy of every page
@@ -1282,6 +1318,14 @@ int hibernate_preallocate_memory(void)
 	shrink_all_memory(saveable - size);
 
 	/*
+	 * Estimate the size of the hard core working set and use it as the
+	 * minimum image size.
+	 */
+	pages = hard_core_working_set_size();
+	if (size < pages)
+		size = pages;
+
+	/*
 	 * The number of saveable pages in memory was too high, so apply some
 	 * pressure to decrease it.  First, make room for the largest possible
 	 * image and fail if that doesn't work.  Next, try to decrease the size

^ permalink raw reply	[flat|nested] 205+ messages in thread

* Re: [RFC][PATCH 5/6] PM/Hibernate: Do not release preallocated memory unnecessarily
  2009-05-10 13:57       ` Rafael J. Wysocki
@ 2009-05-10 19:49         ` Rafael J. Wysocki
  -1 siblings, 0 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-10 19:49 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: pm list, Andrew Morton, LKML, Pavel Machek, Nigel Cunningham,
	David Rientjes, linux-mm

On Sunday 10 May 2009, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rjw@sisk.pl>
> 
> Since the hibernation code is now going to use allocations of memory
> to make enough room for the image, it can also use the page frames
> allocated at this stage as image page frames.  The low-level
> hibernation code needs to be rearranged for this purpose, but it
> allows us to avoid freeing a great number of pages and allocating
> these same pages once again later, so it generally is worth doing.
> 
> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

Unfortunately, this patch is not entirely correct.  Namely, the freeing of
unnecessary pages has to take the balance between highmem and the lower zones
into account, and free_unnecessary_pages() was called with a wrong argument.
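
To make the intended accounting concrete, here is a minimal user-space
sketch of the balance computation (the names mirror the appended rev. 2
patch, but the counts are passed in as plain numbers rather than read from
the MM counters):

#include <stdio.h>

struct surplus {
	unsigned long normal;
	unsigned long highmem;
};

/*
 * How many preallocated frames of each kind may be released, given how
 * many saveable pages of each kind need image copies.  Assumes
 * alloc_normal >= save_normal, as in the patch.
 */
static struct surplus balance_frees(unsigned long alloc_normal,
				    unsigned long alloc_highmem,
				    unsigned long save_normal,
				    unsigned long save_highmem)
{
	struct surplus s;

	s.normal = alloc_normal - save_normal;
	if (alloc_highmem > save_highmem) {
		/* Surplus highmem frames can simply be released. */
		s.highmem = alloc_highmem - save_highmem;
	} else {
		/*
		 * Highmem deficit: the missing highmem copies will have to
		 * go into normal frames, so keep that many of them.
		 */
		s.highmem = 0;
		s.normal -= save_highmem - alloc_highmem;
	}
	return s;
}

int main(void)
{
	struct surplus s = balance_frees(200000, 10000, 150000, 30000);

	printf("may free %lu normal and %lu highmem frames\n",
	       s.normal, s.highmem);
	return 0;
}

With the example numbers this prints "may free 30000 normal and 0 highmem
frames": the 20000-page highmem deficit is retained in normal frames.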

Corrected patch is appended.

Thanks,
Rafael

---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: PM/Hibernate: Do not release preallocated memory unnecessarily (rev. 2)

Since the hibernation code is now going to use allocations of memory
to make enough room for the image, it can also use the page frames
allocated at this stage as image page frames.  The low-level
hibernation code needs to be rearranged for this purpose, but it
allows us to avoid freeing a great number of pages and allocating
these same pages once again later, so it generally is worth doing.

[rev. 2: Take highmem into account correctly.]

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 kernel/power/disk.c     |   15 ++-
 kernel/power/power.h    |    2 
 kernel/power/snapshot.c |  204 ++++++++++++++++++++++++++++++++----------------
 3 files changed, 148 insertions(+), 73 deletions(-)

Index: linux-2.6/kernel/power/snapshot.c
===================================================================
--- linux-2.6.orig/kernel/power/snapshot.c
+++ linux-2.6/kernel/power/snapshot.c
@@ -1033,6 +1033,25 @@ copy_data_pages(struct memory_bitmap *co
 static unsigned int nr_copy_pages;
 /* Number of pages needed for saving the original pfns of the image pages */
 static unsigned int nr_meta_pages;
+/*
+ * Numbers of normal and highmem page frames allocated for hibernation image
+ * before suspending devices.
+ */
+unsigned int alloc_normal, alloc_highmem;
+/*
+ * Memory bitmap used for marking saveable pages (during hibernation) or
+ * hibernation image pages (during restore)
+ */
+static struct memory_bitmap orig_bm;
+/*
+ * Memory bitmap used during hibernation for marking allocated page frames that
+ * will contain copies of saveable pages.  During restore it is initially used
+ * for marking hibernation image pages, but then the set bits from it are
+ * duplicated in @orig_bm and it is released.  On highmem systems it is next
+ * used for marking "safe" highmem pages, but it has to be reinitialized for
+ * this purpose.
+ */
+static struct memory_bitmap copy_bm;
 
 /**
  *	swsusp_free - free pages allocated for the suspend.
@@ -1064,6 +1083,8 @@ void swsusp_free(void)
 	nr_meta_pages = 0;
 	restore_pblist = NULL;
 	buffer = NULL;
+	alloc_normal = 0;
+	alloc_highmem = 0;
 }
 
 /* Helper functions used for the shrinking of memory. */
@@ -1082,8 +1103,16 @@ static unsigned long preallocate_image_p
 	unsigned long nr_alloc = 0;
 
 	while (nr_pages > 0) {
-		if (!alloc_image_page(mask))
-			break;
+ 		struct page *page;
+
+		page = alloc_image_page(mask);
+ 		if (!page)
+ 			break;
+		memory_bm_set_bit(&copy_bm, page_to_pfn(page));
+		if (PageHighMem(page))
+			alloc_highmem++;
+		else
+			alloc_normal++;
 		nr_pages--;
 		nr_alloc++;
 	}
@@ -1142,7 +1171,47 @@ static inline unsigned long highmem_frac
 #endif /* CONFIG_HIGHMEM */
 
 /**
- * swsusp_shrink_memory -  Make the kernel release as much memory as needed
+ * free_unnecessary_pages - Release preallocated pages not needed for the image
+ */
+static void free_unnecessary_pages(void)
+{
+	unsigned long save_highmem, to_free_normal, to_free_highmem;
+
+	to_free_normal = alloc_normal - count_data_pages();
+	save_highmem = count_highmem_pages();
+	if (alloc_highmem > save_highmem) {
+		to_free_highmem = alloc_highmem - save_highmem;
+	} else {
+		to_free_highmem = 0;
+		to_free_normal -= save_highmem - alloc_highmem;
+	}
+
+	memory_bm_position_reset(&copy_bm);
+
+	while (to_free_normal > 0 && to_free_highmem > 0) {
+		unsigned long pfn = memory_bm_next_pfn(&copy_bm);
+		struct page *page = pfn_to_page(pfn);
+
+		if (PageHighMem(page)) {
+			if (!to_free_highmem)
+				continue;
+			to_free_highmem--;
+			alloc_highmem--;
+		} else {
+			if (!to_free_normal)
+				continue;
+			to_free_normal--;
+			alloc_normal--;
+		}
+		memory_bm_clear_bit(&copy_bm, pfn);
+		swsusp_unset_page_forbidden(page);
+		swsusp_unset_page_free(page);
+		__free_page(page);
+	}
+}
+
+/**
+ * hibernate_preallocate_memory - Preallocate memory for hibernation image
  *
  * To create a hibernation image it is necessary to make a copy of every page
  * frame in use.  We also need a number of page frames to be free during
@@ -1161,19 +1230,30 @@ static inline unsigned long highmem_frac
  * frames in use is below the requested image size or it is impossible to
  * allocate more memory, whichever happens first.
  */
-int swsusp_shrink_memory(void)
+int hibernate_preallocate_memory(void)
 {
 	struct zone *zone;
 	unsigned long saveable, size, max_size, count, highmem, pages = 0;
-	unsigned long alloc, pages_highmem;
+	unsigned long alloc, save_highmem, pages_highmem;
 	struct timeval start, stop;
-	int error = 0;
+	int error;
 
-	printk(KERN_INFO "PM: Shrinking memory... ");
+	printk(KERN_INFO "PM: Preallocating image memory... ");
 	do_gettimeofday(&start);
 
+	error = memory_bm_create(&orig_bm, GFP_IMAGE, PG_ANY);
+	if (error)
+		goto err_out;
+
+	error = memory_bm_create(&copy_bm, GFP_IMAGE, PG_ANY);
+	if (error)
+		goto err_out;
+
+	alloc_normal = 0;
+	alloc_highmem = 0;
+
 	/* Count the number of saveable data pages. */
-	highmem = count_highmem_pages();
+	save_highmem = count_highmem_pages();
 	saveable = count_data_pages();
 
 	/*
@@ -1181,7 +1261,8 @@ int swsusp_shrink_memory(void)
 	 * number of pages needed for image metadata (size).
 	 */
 	count = saveable;
-	saveable += highmem;
+	saveable += save_highmem;
+	highmem = save_highmem;
 	size = 0;
 	for_each_populated_zone(zone) {
 		size += snapshot_additional_pages(zone);
@@ -1200,10 +1281,13 @@ int swsusp_shrink_memory(void)
 		size = max_size;
 	/*
 	 * If the maximum is not less than the current number of saveable pages
-	 * in memory, we don't need to do anything more.
+	 * in memory, allocate page frames for the image and we're done.
 	 */
-	if (size >= saveable)
+	if (size >= saveable) {
+		pages = preallocate_image_highmem(save_highmem);
+		pages += preallocate_image_memory(saveable - pages);
 		goto out;
+	}
 
 	/*
 	 * Let the memory management subsystem know that we're going to need a
@@ -1224,10 +1308,8 @@ int swsusp_shrink_memory(void)
 	pages_highmem = preallocate_image_highmem(highmem / 2);
 	alloc = count - max_size - pages_highmem;
 	pages = preallocate_image_memory(alloc);
-	if (pages < alloc) {
-		error = -ENOMEM;
-		goto free_out;
-	}
+	if (pages < alloc)
+		goto err_out;
 	size = max_size - size;
 	alloc = size;
 	size = preallocate_image_highmem(
@@ -1237,21 +1319,24 @@ int swsusp_shrink_memory(void)
 	pages += preallocate_image_memory(alloc);
 	pages += pages_highmem;
 
- free_out:
-	/* Release all of the preallocated page frames. */
-	swsusp_free();
-
-	if (error) {
-		printk(KERN_CONT "\n");
-		return error;
-	}
+	/*
+	 * We only need as many page frames for the image as there are saveable
+	 * pages in memory, but we have allocated more.  Release the excessive
+	 * ones now.
+	 */
+	free_unnecessary_pages();
 
  out:
 	do_gettimeofday(&stop);
-	printk(KERN_CONT "done (preallocated %lu free pages)\n", pages);
-	swsusp_show_speed(&start, &stop, pages, "Freed");
+	printk(KERN_CONT "done (allocated %lu pages)\n", pages);
+	swsusp_show_speed(&start, &stop, pages, "Allocated");
 
 	return 0;
+
+ err_out:
+	printk(KERN_CONT "\n");
+	swsusp_free();
+	return -ENOMEM;
 }
 
 #ifdef CONFIG_HIGHMEM
@@ -1262,7 +1347,7 @@ int swsusp_shrink_memory(void)
 
 static unsigned int count_pages_for_highmem(unsigned int nr_highmem)
 {
-	unsigned int free_highmem = count_free_highmem_pages();
+	unsigned int free_highmem = count_free_highmem_pages() + alloc_highmem;
 
 	if (free_highmem >= nr_highmem)
 		nr_highmem = 0;
@@ -1284,19 +1369,17 @@ count_pages_for_highmem(unsigned int nr_
 static int enough_free_mem(unsigned int nr_pages, unsigned int nr_highmem)
 {
 	struct zone *zone;
-	unsigned int free = 0, meta = 0;
+	unsigned int free = alloc_normal;
 
-	for_each_zone(zone) {
-		meta += snapshot_additional_pages(zone);
+	for_each_zone(zone)
 		if (!is_highmem(zone))
 			free += zone_page_state(zone, NR_FREE_PAGES);
-	}
 
 	nr_pages += count_pages_for_highmem(nr_highmem);
-	pr_debug("PM: Normal pages needed: %u + %u + %u, available pages: %u\n",
-		nr_pages, PAGES_FOR_IO, meta, free);
+	pr_debug("PM: Normal pages needed: %u + %u, available pages: %u\n",
+		nr_pages, PAGES_FOR_IO, free);
 
-	return free > nr_pages + PAGES_FOR_IO + meta;
+	return free > nr_pages + PAGES_FOR_IO;
 }
 
 #ifdef CONFIG_HIGHMEM
@@ -1318,7 +1401,7 @@ static inline int get_highmem_buffer(int
  */
 
 static inline unsigned int
-alloc_highmem_image_pages(struct memory_bitmap *bm, unsigned int nr_highmem)
+alloc_highmem_pages(struct memory_bitmap *bm, unsigned int nr_highmem)
 {
 	unsigned int to_alloc = count_free_highmem_pages();
 
@@ -1338,7 +1421,7 @@ alloc_highmem_image_pages(struct memory_
 static inline int get_highmem_buffer(int safe_needed) { return 0; }
 
 static inline unsigned int
-alloc_highmem_image_pages(struct memory_bitmap *bm, unsigned int n) { return 0; }
+alloc_highmem_pages(struct memory_bitmap *bm, unsigned int n) { return 0; }
 #endif /* CONFIG_HIGHMEM */
 
 /**
@@ -1357,51 +1440,36 @@ static int
 swsusp_alloc(struct memory_bitmap *orig_bm, struct memory_bitmap *copy_bm,
 		unsigned int nr_pages, unsigned int nr_highmem)
 {
-	int error;
-
-	error = memory_bm_create(orig_bm, GFP_ATOMIC | __GFP_COLD, PG_ANY);
-	if (error)
-		goto Free;
-
-	error = memory_bm_create(copy_bm, GFP_ATOMIC | __GFP_COLD, PG_ANY);
-	if (error)
-		goto Free;
+	int error = 0;
 
 	if (nr_highmem > 0) {
 		error = get_highmem_buffer(PG_ANY);
 		if (error)
-			goto Free;
-
-		nr_pages += alloc_highmem_image_pages(copy_bm, nr_highmem);
+			goto err_out;
+		if (nr_highmem > alloc_highmem) {
+			nr_highmem -= alloc_highmem;
+			nr_pages += alloc_highmem_pages(copy_bm, nr_highmem);
+		}
 	}
-	while (nr_pages-- > 0) {
-		struct page *page = alloc_image_page(GFP_ATOMIC | __GFP_COLD);
-
-		if (!page)
-			goto Free;
+	if (nr_pages > alloc_normal) {
+		nr_pages -= alloc_normal;
+		while (nr_pages-- > 0) {
+			struct page *page;
 
-		memory_bm_set_bit(copy_bm, page_to_pfn(page));
+			page = alloc_image_page(GFP_ATOMIC | __GFP_COLD);
+			if (!page)
+				goto err_out;
+			memory_bm_set_bit(copy_bm, page_to_pfn(page));
+		}
 	}
+
 	return 0;
 
- Free:
+ err_out:
 	swsusp_free();
-	return -ENOMEM;
+	return error;
 }
 
-/* Memory bitmap used for marking saveable pages (during suspend) or the
- * suspend image pages (during resume)
- */
-static struct memory_bitmap orig_bm;
-/* Memory bitmap used on suspend for marking allocated pages that will contain
- * the copies of saveable pages.  During resume it is initially used for
- * marking the suspend image pages, but then its set bits are duplicated in
- * @orig_bm and it is released.  Next, on systems with high memory, it may be
- * used for marking "safe" highmem pages, but it has to be reinitialized for
- * this purpose.
- */
-static struct memory_bitmap copy_bm;
-
 asmlinkage int swsusp_save(void)
 {
 	unsigned int nr_pages, nr_highmem;
Index: linux-2.6/kernel/power/power.h
===================================================================
--- linux-2.6.orig/kernel/power/power.h
+++ linux-2.6/kernel/power/power.h
@@ -74,7 +74,7 @@ extern asmlinkage int swsusp_arch_resume
 
 extern int create_basic_memory_bitmaps(void);
 extern void free_basic_memory_bitmaps(void);
-extern int swsusp_shrink_memory(void);
+extern int hibernate_preallocate_memory(void);
 
 /**
  *	Auxiliary structure used for reading the snapshot image data and
Index: linux-2.6/kernel/power/disk.c
===================================================================
--- linux-2.6.orig/kernel/power/disk.c
+++ linux-2.6/kernel/power/disk.c
@@ -303,8 +303,8 @@ int hibernation_snapshot(int platform_mo
 	if (error)
 		return error;
 
-	/* Free memory before shutting down devices. */
-	error = swsusp_shrink_memory();
+	/* Preallocate image memory before shutting down devices. */
+	error = hibernate_preallocate_memory();
 	if (error)
 		goto Close;
 
@@ -320,6 +320,10 @@ int hibernation_snapshot(int platform_mo
 	/* Control returns here after successful restore */
 
  Resume_devices:
+	/* We may need to release the preallocated image pages here. */
+	if (error || !in_suspend)
+		swsusp_free();
+
 	device_resume(in_suspend ?
 		(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);
 	resume_console();
@@ -593,7 +597,10 @@ int hibernate(void)
 		goto Thaw;
 
 	error = hibernation_snapshot(hibernation_mode == HIBERNATION_PLATFORM);
-	if (in_suspend && !error) {
+	if (error)
+		goto Thaw;
+
+	if (in_suspend) {
 		unsigned int flags = 0;
 
 		if (hibernation_mode == HIBERNATION_PLATFORM)
@@ -605,8 +612,8 @@ int hibernate(void)
 			power_down();
 	} else {
 		pr_debug("PM: Image restored successfully.\n");
-		swsusp_free();
 	}
+
  Thaw:
 	thaw_processes();
  Finish:


^ permalink raw reply	[flat|nested] 205+ messages in thread

* Re: [RFC][PATCH 5/6] PM/Hibernate: Do not release preallocated memory unnecessarily
  2009-05-10 13:57       ` Rafael J. Wysocki
  (?)
  (?)
@ 2009-05-10 19:49       ` Rafael J. Wysocki
  -1 siblings, 0 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-10 19:49 UTC (permalink / raw)
  To: Wu Fengguang; +Cc: LKML, linux-mm, David Rientjes, pm list, Andrew Morton

On Sunday 10 May 2009, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rjw@sisk.pl>
> 
> Since the hibernation code is now going to use allocations of memory
> to make enough room for the image, it can also use the page frames
> allocated at this stage as image page frames.  The low-level
> hibernation code needs to be rearranged for this purpose, but it
> allows us to avoid freeing a great number of pages and allocating
> these same pages once again later, so it generally is worth doing.
> 
> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

Unfortunately, this patch is not entirely correct.  Namely, the freeing of
unnecessary pages has to take the balance between highmem and the lower zones
into account, and free_unnecessary_pages() was called with a wrong argument.
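
To illustrate the balance in question with made-up numbers (an example only,
not measurements): suppose preallocation ended up with alloc_normal = 60000
and alloc_highmem = 10000 page frames, while count_data_pages() reports 50000
saveable lowmem pages and count_highmem_pages() reports 15000 saveable
highmem pages.  The corrected free_unnecessary_pages() below then computes

	to_free_normal  = 60000 - 50000;	/* 10000 excess lowmem frames */
	to_free_highmem = 0;			/* every highmem frame is needed */
	to_free_normal -= 15000 - 10000;	/* 5000 highmem copies must go to lowmem */

so only 5000 of the excess lowmem frames may really be released; freeing all
10000 would leave no room for the highmem pages that cannot be copied into
highmem frames.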

Corrected patch is appended.

Thanks,
Rafael

---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: PM/Hibernate: Do not release preallocated memory unnecessarily (rev. 2)

Since the hibernation code is now going to use allocations of memory
to make enough room for the image, it can also use the page frames
allocated at this stage as image page frames.  The low-level
hibernation code needs to be rearranged for this purpose, but it
allows us to avoid freeing a great number of pages and allocating
these same pages once again later, so it generally is worth doing.

[rev. 2: Take highmem into account correctly.]

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 kernel/power/disk.c     |   15 ++-
 kernel/power/power.h    |    2 
 kernel/power/snapshot.c |  204 ++++++++++++++++++++++++++++++++----------------
 3 files changed, 148 insertions(+), 73 deletions(-)

Index: linux-2.6/kernel/power/snapshot.c
===================================================================
--- linux-2.6.orig/kernel/power/snapshot.c
+++ linux-2.6/kernel/power/snapshot.c
@@ -1033,6 +1033,25 @@ copy_data_pages(struct memory_bitmap *co
 static unsigned int nr_copy_pages;
 /* Number of pages needed for saving the original pfns of the image pages */
 static unsigned int nr_meta_pages;
+/*
+ * Numbers of normal and highmem page frames allocated for hibernation image
+ * before suspending devices.
+ */
+unsigned int alloc_normal, alloc_highmem;
+/*
+ * Memory bitmap used for marking saveable pages (during hibernation) or
+ * hibernation image pages (during restore)
+ */
+static struct memory_bitmap orig_bm;
+/*
+ * Memory bitmap used during hibernation for marking allocated page frames that
+ * will contain copies of saveable pages.  During restore it is initially used
+ * for marking hibernation image pages, but then the set bits from it are
+ * duplicated in @orig_bm and it is released.  On highmem systems it is next
+ * used for marking "safe" highmem pages, but it has to be reinitialized for
+ * this purpose.
+ */
+static struct memory_bitmap copy_bm;
 
 /**
  *	swsusp_free - free pages allocated for the suspend.
@@ -1064,6 +1083,8 @@ void swsusp_free(void)
 	nr_meta_pages = 0;
 	restore_pblist = NULL;
 	buffer = NULL;
+	alloc_normal = 0;
+	alloc_highmem = 0;
 }
 
 /* Helper functions used for the shrinking of memory. */
@@ -1082,8 +1103,16 @@ static unsigned long preallocate_image_p
 	unsigned long nr_alloc = 0;
 
 	while (nr_pages > 0) {
-		if (!alloc_image_page(mask))
-			break;
+ 		struct page *page;
+
+		page = alloc_image_page(mask);
+ 		if (!page)
+ 			break;
+		memory_bm_set_bit(&copy_bm, page_to_pfn(page));
+		if (PageHighMem(page))
+			alloc_highmem++;
+		else
+			alloc_normal++;
 		nr_pages--;
 		nr_alloc++;
 	}
@@ -1142,7 +1171,47 @@ static inline unsigned long highmem_frac
 #endif /* CONFIG_HIGHMEM */
 
 /**
- * swsusp_shrink_memory -  Make the kernel release as much memory as needed
+ * free_unnecessary_pages - Release preallocated pages not needed for the image
+ */
+static void free_unnecessary_pages(void)
+{
+	unsigned long save_highmem, to_free_normal, to_free_highmem;
+
+	to_free_normal = alloc_normal - count_data_pages();
+	save_highmem = count_highmem_pages();
+	if (alloc_highmem > save_highmem) {
+		to_free_highmem = alloc_highmem - save_highmem;
+	} else {
+		to_free_highmem = 0;
+		to_free_normal -= save_highmem - alloc_highmem;
+	}
+
+	memory_bm_position_reset(&copy_bm);
+
+	while (to_free_normal > 0 && to_free_highmem > 0) {
+		unsigned long pfn = memory_bm_next_pfn(&copy_bm);
+		struct page *page = pfn_to_page(pfn);
+
+		if (PageHighMem(page)) {
+			if (!to_free_highmem)
+				continue;
+			to_free_highmem--;
+			alloc_highmem--;
+		} else {
+			if (!to_free_normal)
+				continue;
+			to_free_normal--;
+			alloc_normal--;
+		}
+		memory_bm_clear_bit(&copy_bm, pfn);
+		swsusp_unset_page_forbidden(page);
+		swsusp_unset_page_free(page);
+		__free_page(page);
+	}
+}
+
+/**
+ * hibernate_preallocate_memory - Preallocate memory for hibernation image
  *
  * To create a hibernation image it is necessary to make a copy of every page
  * frame in use.  We also need a number of page frames to be free during
@@ -1161,19 +1230,30 @@ static inline unsigned long highmem_frac
  * frames in use is below the requested image size or it is impossible to
  * allocate more memory, whichever happens first.
  */
-int swsusp_shrink_memory(void)
+int hibernate_preallocate_memory(void)
 {
 	struct zone *zone;
 	unsigned long saveable, size, max_size, count, highmem, pages = 0;
-	unsigned long alloc, pages_highmem;
+	unsigned long alloc, save_highmem, pages_highmem;
 	struct timeval start, stop;
-	int error = 0;
+	int error;
 
-	printk(KERN_INFO "PM: Shrinking memory... ");
+	printk(KERN_INFO "PM: Preallocating image memory... ");
 	do_gettimeofday(&start);
 
+	error = memory_bm_create(&orig_bm, GFP_IMAGE, PG_ANY);
+	if (error)
+		goto err_out;
+
+	error = memory_bm_create(&copy_bm, GFP_IMAGE, PG_ANY);
+	if (error)
+		goto err_out;
+
+	alloc_normal = 0;
+	alloc_highmem = 0;
+
 	/* Count the number of saveable data pages. */
-	highmem = count_highmem_pages();
+	save_highmem = count_highmem_pages();
 	saveable = count_data_pages();
 
 	/*
@@ -1181,7 +1261,8 @@ int swsusp_shrink_memory(void)
 	 * number of pages needed for image metadata (size).
 	 */
 	count = saveable;
-	saveable += highmem;
+	saveable += save_highmem;
+	highmem = save_highmem;
 	size = 0;
 	for_each_populated_zone(zone) {
 		size += snapshot_additional_pages(zone);
@@ -1200,10 +1281,13 @@ int swsusp_shrink_memory(void)
 		size = max_size;
 	/*
 	 * If the maximum is not less than the current number of saveable pages
-	 * in memory, we don't need to do anything more.
+	 * in memory, allocate page frames for the image and we're done.
 	 */
-	if (size >= saveable)
+	if (size >= saveable) {
+		pages = preallocate_image_highmem(save_highmem);
+		pages += preallocate_image_memory(saveable - pages);
 		goto out;
+	}
 
 	/*
 	 * Let the memory management subsystem know that we're going to need a
@@ -1224,10 +1308,8 @@ int swsusp_shrink_memory(void)
 	pages_highmem = preallocate_image_highmem(highmem / 2);
 	alloc = count - max_size - pages_highmem;
 	pages = preallocate_image_memory(alloc);
-	if (pages < alloc) {
-		error = -ENOMEM;
-		goto free_out;
-	}
+	if (pages < alloc)
+		goto err_out;
 	size = max_size - size;
 	alloc = size;
 	size = preallocate_image_highmem(
@@ -1237,21 +1319,24 @@ int swsusp_shrink_memory(void)
 	pages += preallocate_image_memory(alloc);
 	pages += pages_highmem;
 
- free_out:
-	/* Release all of the preallocated page frames. */
-	swsusp_free();
-
-	if (error) {
-		printk(KERN_CONT "\n");
-		return error;
-	}
+	/*
+	 * We only need as many page frames for the image as there are saveable
+	 * pages in memory, but we have allocated more.  Release the excessive
+	 * ones now.
+	 */
+	free_unnecessary_pages();
 
  out:
 	do_gettimeofday(&stop);
-	printk(KERN_CONT "done (preallocated %lu free pages)\n", pages);
-	swsusp_show_speed(&start, &stop, pages, "Freed");
+	printk(KERN_CONT "done (allocated %lu pages)\n", pages);
+	swsusp_show_speed(&start, &stop, pages, "Allocated");
 
 	return 0;
+
+ err_out:
+	printk(KERN_CONT "\n");
+	swsusp_free();
+	return -ENOMEM;
 }
 
 #ifdef CONFIG_HIGHMEM
@@ -1262,7 +1347,7 @@ int swsusp_shrink_memory(void)
 
 static unsigned int count_pages_for_highmem(unsigned int nr_highmem)
 {
-	unsigned int free_highmem = count_free_highmem_pages();
+	unsigned int free_highmem = count_free_highmem_pages() + alloc_highmem;
 
 	if (free_highmem >= nr_highmem)
 		nr_highmem = 0;
@@ -1284,19 +1369,17 @@ count_pages_for_highmem(unsigned int nr_
 static int enough_free_mem(unsigned int nr_pages, unsigned int nr_highmem)
 {
 	struct zone *zone;
-	unsigned int free = 0, meta = 0;
+	unsigned int free = alloc_normal;
 
-	for_each_zone(zone) {
-		meta += snapshot_additional_pages(zone);
+	for_each_zone(zone)
 		if (!is_highmem(zone))
 			free += zone_page_state(zone, NR_FREE_PAGES);
-	}
 
 	nr_pages += count_pages_for_highmem(nr_highmem);
-	pr_debug("PM: Normal pages needed: %u + %u + %u, available pages: %u\n",
-		nr_pages, PAGES_FOR_IO, meta, free);
+	pr_debug("PM: Normal pages needed: %u + %u, available pages: %u\n",
+		nr_pages, PAGES_FOR_IO, free);
 
-	return free > nr_pages + PAGES_FOR_IO + meta;
+	return free > nr_pages + PAGES_FOR_IO;
 }
 
 #ifdef CONFIG_HIGHMEM
@@ -1318,7 +1401,7 @@ static inline int get_highmem_buffer(int
  */
 
 static inline unsigned int
-alloc_highmem_image_pages(struct memory_bitmap *bm, unsigned int nr_highmem)
+alloc_highmem_pages(struct memory_bitmap *bm, unsigned int nr_highmem)
 {
 	unsigned int to_alloc = count_free_highmem_pages();
 
@@ -1338,7 +1421,7 @@ alloc_highmem_image_pages(struct memory_
 static inline int get_highmem_buffer(int safe_needed) { return 0; }
 
 static inline unsigned int
-alloc_highmem_image_pages(struct memory_bitmap *bm, unsigned int n) { return 0; }
+alloc_highmem_pages(struct memory_bitmap *bm, unsigned int n) { return 0; }
 #endif /* CONFIG_HIGHMEM */
 
 /**
@@ -1357,51 +1440,36 @@ static int
 swsusp_alloc(struct memory_bitmap *orig_bm, struct memory_bitmap *copy_bm,
 		unsigned int nr_pages, unsigned int nr_highmem)
 {
-	int error;
-
-	error = memory_bm_create(orig_bm, GFP_ATOMIC | __GFP_COLD, PG_ANY);
-	if (error)
-		goto Free;
-
-	error = memory_bm_create(copy_bm, GFP_ATOMIC | __GFP_COLD, PG_ANY);
-	if (error)
-		goto Free;
+	int error = 0;
 
 	if (nr_highmem > 0) {
 		error = get_highmem_buffer(PG_ANY);
 		if (error)
-			goto Free;
-
-		nr_pages += alloc_highmem_image_pages(copy_bm, nr_highmem);
+			goto err_out;
+		if (nr_highmem > alloc_highmem) {
+			nr_highmem -= alloc_highmem;
+			nr_pages += alloc_highmem_pages(copy_bm, nr_highmem);
+		}
 	}
-	while (nr_pages-- > 0) {
-		struct page *page = alloc_image_page(GFP_ATOMIC | __GFP_COLD);
-
-		if (!page)
-			goto Free;
+	if (nr_pages > alloc_normal) {
+		nr_pages -= alloc_normal;
+		while (nr_pages-- > 0) {
+			struct page *page;
 
-		memory_bm_set_bit(copy_bm, page_to_pfn(page));
+			page = alloc_image_page(GFP_ATOMIC | __GFP_COLD);
+			if (!page)
+				goto err_out;
+			memory_bm_set_bit(copy_bm, page_to_pfn(page));
+		}
 	}
+
 	return 0;
 
- Free:
+ err_out:
 	swsusp_free();
-	return -ENOMEM;
+	return error;
 }
 
-/* Memory bitmap used for marking saveable pages (during suspend) or the
- * suspend image pages (during resume)
- */
-static struct memory_bitmap orig_bm;
-/* Memory bitmap used on suspend for marking allocated pages that will contain
- * the copies of saveable pages.  During resume it is initially used for
- * marking the suspend image pages, but then its set bits are duplicated in
- * @orig_bm and it is released.  Next, on systems with high memory, it may be
- * used for marking "safe" highmem pages, but it has to be reinitialized for
- * this purpose.
- */
-static struct memory_bitmap copy_bm;
-
 asmlinkage int swsusp_save(void)
 {
 	unsigned int nr_pages, nr_highmem;
Index: linux-2.6/kernel/power/power.h
===================================================================
--- linux-2.6.orig/kernel/power/power.h
+++ linux-2.6/kernel/power/power.h
@@ -74,7 +74,7 @@ extern asmlinkage int swsusp_arch_resume
 
 extern int create_basic_memory_bitmaps(void);
 extern void free_basic_memory_bitmaps(void);
-extern int swsusp_shrink_memory(void);
+extern int hibernate_preallocate_memory(void);
 
 /**
  *	Auxiliary structure used for reading the snapshot image data and
Index: linux-2.6/kernel/power/disk.c
===================================================================
--- linux-2.6.orig/kernel/power/disk.c
+++ linux-2.6/kernel/power/disk.c
@@ -303,8 +303,8 @@ int hibernation_snapshot(int platform_mo
 	if (error)
 		return error;
 
-	/* Free memory before shutting down devices. */
-	error = swsusp_shrink_memory();
+	/* Preallocate image memory before shutting down devices. */
+	error = hibernate_preallocate_memory();
 	if (error)
 		goto Close;
 
@@ -320,6 +320,10 @@ int hibernation_snapshot(int platform_mo
 	/* Control returns here after successful restore */
 
  Resume_devices:
+	/* We may need to release the preallocated image pages here. */
+	if (error || !in_suspend)
+		swsusp_free();
+
 	device_resume(in_suspend ?
 		(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);
 	resume_console();
@@ -593,7 +597,10 @@ int hibernate(void)
 		goto Thaw;
 
 	error = hibernation_snapshot(hibernation_mode == HIBERNATION_PLATFORM);
-	if (in_suspend && !error) {
+	if (error)
+		goto Thaw;
+
+	if (in_suspend) {
 		unsigned int flags = 0;
 
 		if (hibernation_mode == HIBERNATION_PLATFORM)
@@ -605,8 +612,8 @@ int hibernate(void)
 			power_down();
 	} else {
 		pr_debug("PM: Image restored successfully.\n");
-		swsusp_free();
 	}
+
  Thaw:
 	thaw_processes();
  Finish:

^ permalink raw reply	[flat|nested] 205+ messages in thread

* Re: [RFC][PATCH 5/6] PM/Hibernate: Do not release preallocated memory unnecessarily
@ 2009-05-10 19:49         ` Rafael J. Wysocki
  0 siblings, 0 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-10 19:49 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: pm list, Andrew Morton, LKML, Pavel Machek, Nigel Cunningham,
	David Rientjes, linux-mm

On Sunday 10 May 2009, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rjw@sisk.pl>
> 
> Since the hibernation code is now going to use allocations of memory
> to make enough room for the image, it can also use the page frames
> allocated at this stage as image page frames.  The low-level
> hibernation code needs to be rearranged for this purpose, but it
> allows us to avoid freeing a great number of pages and allocating
> these same pages once again later, so it generally is worth doing.
> 
> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

Unfortunately, this patch is not entirely correct.  Namely, the freeing of
unnecessary pages has to take the balance between highmem and the lower zones
into account, and free_unnecessary_pages() was called with a wrong argument.

Corrected patch is appended.

Thanks,
Rafael

---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: PM/Hibernate: Do not release preallocated memory unnecessarily (rev. 2)

Since the hibernation code is now going to use allocations of memory
to make enough room for the image, it can also use the page frames
allocated at this stage as image page frames.  The low-level
hibernation code needs to be rearranged for this purpose, but it
allows us to avoid freeing a great number of pages and allocating
these same pages once again later, so it generally is worth doing.

[rev. 2: Take highmem into account correctly.]

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 kernel/power/disk.c     |   15 ++-
 kernel/power/power.h    |    2 
 kernel/power/snapshot.c |  204 ++++++++++++++++++++++++++++++++----------------
 3 files changed, 148 insertions(+), 73 deletions(-)

Index: linux-2.6/kernel/power/snapshot.c
===================================================================
--- linux-2.6.orig/kernel/power/snapshot.c
+++ linux-2.6/kernel/power/snapshot.c
@@ -1033,6 +1033,25 @@ copy_data_pages(struct memory_bitmap *co
 static unsigned int nr_copy_pages;
 /* Number of pages needed for saving the original pfns of the image pages */
 static unsigned int nr_meta_pages;
+/*
+ * Numbers of normal and highmem page frames allocated for hibernation image
+ * before suspending devices.
+ */
+unsigned int alloc_normal, alloc_highmem;
+/*
+ * Memory bitmap used for marking saveable pages (during hibernation) or
+ * hibernation image pages (during restore)
+ */
+static struct memory_bitmap orig_bm;
+/*
+ * Memory bitmap used during hibernation for marking allocated page frames that
+ * will contain copies of saveable pages.  During restore it is initially used
+ * for marking hibernation image pages, but then the set bits from it are
+ * duplicated in @orig_bm and it is released.  On highmem systems it is next
+ * used for marking "safe" highmem pages, but it has to be reinitialized for
+ * this purpose.
+ */
+static struct memory_bitmap copy_bm;
 
 /**
  *	swsusp_free - free pages allocated for the suspend.
@@ -1064,6 +1083,8 @@ void swsusp_free(void)
 	nr_meta_pages = 0;
 	restore_pblist = NULL;
 	buffer = NULL;
+	alloc_normal = 0;
+	alloc_highmem = 0;
 }
 
 /* Helper functions used for the shrinking of memory. */
@@ -1082,8 +1103,16 @@ static unsigned long preallocate_image_p
 	unsigned long nr_alloc = 0;
 
 	while (nr_pages > 0) {
-		if (!alloc_image_page(mask))
-			break;
+ 		struct page *page;
+
+		page = alloc_image_page(mask);
+ 		if (!page)
+ 			break;
+		memory_bm_set_bit(&copy_bm, page_to_pfn(page));
+		if (PageHighMem(page))
+			alloc_highmem++;
+		else
+			alloc_normal++;
 		nr_pages--;
 		nr_alloc++;
 	}
@@ -1142,7 +1171,47 @@ static inline unsigned long highmem_frac
 #endif /* CONFIG_HIGHMEM */
 
 /**
- * swsusp_shrink_memory -  Make the kernel release as much memory as needed
+ * free_unnecessary_pages - Release preallocated pages not needed for the image
+ */
+static void free_unnecessary_pages(void)
+{
+	unsigned long save_highmem, to_free_normal, to_free_highmem;
+
+	to_free_normal = alloc_normal - count_data_pages();
+	save_highmem = count_highmem_pages();
+	if (alloc_highmem > save_highmem) {
+		to_free_highmem = alloc_highmem - save_highmem;
+	} else {
+		to_free_highmem = 0;
+		to_free_normal -= save_highmem - alloc_highmem;
+	}
+
+	memory_bm_position_reset(&copy_bm);
+
+	while (to_free_normal > 0 && to_free_highmem > 0) {
+		unsigned long pfn = memory_bm_next_pfn(&copy_bm);
+		struct page *page = pfn_to_page(pfn);
+
+		if (PageHighMem(page)) {
+			if (!to_free_highmem)
+				continue;
+			to_free_highmem--;
+			alloc_highmem--;
+		} else {
+			if (!to_free_normal)
+				continue;
+			to_free_normal--;
+			alloc_normal--;
+		}
+		memory_bm_clear_bit(&copy_bm, pfn);
+		swsusp_unset_page_forbidden(page);
+		swsusp_unset_page_free(page);
+		__free_page(page);
+	}
+}
+
+/**
+ * hibernate_preallocate_memory - Preallocate memory for hibernation image
  *
  * To create a hibernation image it is necessary to make a copy of every page
  * frame in use.  We also need a number of page frames to be free during
@@ -1161,19 +1230,30 @@ static inline unsigned long highmem_frac
  * frames in use is below the requested image size or it is impossible to
  * allocate more memory, whichever happens first.
  */
-int swsusp_shrink_memory(void)
+int hibernate_preallocate_memory(void)
 {
 	struct zone *zone;
 	unsigned long saveable, size, max_size, count, highmem, pages = 0;
-	unsigned long alloc, pages_highmem;
+	unsigned long alloc, save_highmem, pages_highmem;
 	struct timeval start, stop;
-	int error = 0;
+	int error;
 
-	printk(KERN_INFO "PM: Shrinking memory... ");
+	printk(KERN_INFO "PM: Preallocating image memory... ");
 	do_gettimeofday(&start);
 
+	error = memory_bm_create(&orig_bm, GFP_IMAGE, PG_ANY);
+	if (error)
+		goto err_out;
+
+	error = memory_bm_create(&copy_bm, GFP_IMAGE, PG_ANY);
+	if (error)
+		goto err_out;
+
+	alloc_normal = 0;
+	alloc_highmem = 0;
+
 	/* Count the number of saveable data pages. */
-	highmem = count_highmem_pages();
+	save_highmem = count_highmem_pages();
 	saveable = count_data_pages();
 
 	/*
@@ -1181,7 +1261,8 @@ int swsusp_shrink_memory(void)
 	 * number of pages needed for image metadata (size).
 	 */
 	count = saveable;
-	saveable += highmem;
+	saveable += save_highmem;
+	highmem = save_highmem;
 	size = 0;
 	for_each_populated_zone(zone) {
 		size += snapshot_additional_pages(zone);
@@ -1200,10 +1281,13 @@ int swsusp_shrink_memory(void)
 		size = max_size;
 	/*
 	 * If the maximum is not less than the current number of saveable pages
-	 * in memory, we don't need to do anything more.
+	 * in memory, allocate page frames for the image and we're done.
 	 */
-	if (size >= saveable)
+	if (size >= saveable) {
+		pages = preallocate_image_highmem(save_highmem);
+		pages += preallocate_image_memory(saveable - pages);
 		goto out;
+	}
 
 	/*
 	 * Let the memory management subsystem know that we're going to need a
@@ -1224,10 +1308,8 @@ int swsusp_shrink_memory(void)
 	pages_highmem = preallocate_image_highmem(highmem / 2);
 	alloc = count - max_size - pages_highmem;
 	pages = preallocate_image_memory(alloc);
-	if (pages < alloc) {
-		error = -ENOMEM;
-		goto free_out;
-	}
+	if (pages < alloc)
+		goto err_out;
 	size = max_size - size;
 	alloc = size;
 	size = preallocate_image_highmem(
@@ -1237,21 +1319,24 @@ int swsusp_shrink_memory(void)
 	pages += preallocate_image_memory(alloc);
 	pages += pages_highmem;
 
- free_out:
-	/* Release all of the preallocated page frames. */
-	swsusp_free();
-
-	if (error) {
-		printk(KERN_CONT "\n");
-		return error;
-	}
+	/*
+	 * We only need as many page frames for the image as there are saveable
+	 * pages in memory, but we have allocated more.  Release the excessive
+	 * ones now.
+	 */
+	free_unnecessary_pages();
 
  out:
 	do_gettimeofday(&stop);
-	printk(KERN_CONT "done (preallocated %lu free pages)\n", pages);
-	swsusp_show_speed(&start, &stop, pages, "Freed");
+	printk(KERN_CONT "done (allocated %lu pages)\n", pages);
+	swsusp_show_speed(&start, &stop, pages, "Allocated");
 
 	return 0;
+
+ err_out:
+	printk(KERN_CONT "\n");
+	swsusp_free();
+	return -ENOMEM;
 }
 
 #ifdef CONFIG_HIGHMEM
@@ -1262,7 +1347,7 @@ int swsusp_shrink_memory(void)
 
 static unsigned int count_pages_for_highmem(unsigned int nr_highmem)
 {
-	unsigned int free_highmem = count_free_highmem_pages();
+	unsigned int free_highmem = count_free_highmem_pages() + alloc_highmem;
 
 	if (free_highmem >= nr_highmem)
 		nr_highmem = 0;
@@ -1284,19 +1369,17 @@ count_pages_for_highmem(unsigned int nr_
 static int enough_free_mem(unsigned int nr_pages, unsigned int nr_highmem)
 {
 	struct zone *zone;
-	unsigned int free = 0, meta = 0;
+	unsigned int free = alloc_normal;
 
-	for_each_zone(zone) {
-		meta += snapshot_additional_pages(zone);
+	for_each_zone(zone)
 		if (!is_highmem(zone))
 			free += zone_page_state(zone, NR_FREE_PAGES);
-	}
 
 	nr_pages += count_pages_for_highmem(nr_highmem);
-	pr_debug("PM: Normal pages needed: %u + %u + %u, available pages: %u\n",
-		nr_pages, PAGES_FOR_IO, meta, free);
+	pr_debug("PM: Normal pages needed: %u + %u, available pages: %u\n",
+		nr_pages, PAGES_FOR_IO, free);
 
-	return free > nr_pages + PAGES_FOR_IO + meta;
+	return free > nr_pages + PAGES_FOR_IO;
 }
 
 #ifdef CONFIG_HIGHMEM
@@ -1318,7 +1401,7 @@ static inline int get_highmem_buffer(int
  */
 
 static inline unsigned int
-alloc_highmem_image_pages(struct memory_bitmap *bm, unsigned int nr_highmem)
+alloc_highmem_pages(struct memory_bitmap *bm, unsigned int nr_highmem)
 {
 	unsigned int to_alloc = count_free_highmem_pages();
 
@@ -1338,7 +1421,7 @@ alloc_highmem_image_pages(struct memory_
 static inline int get_highmem_buffer(int safe_needed) { return 0; }
 
 static inline unsigned int
-alloc_highmem_image_pages(struct memory_bitmap *bm, unsigned int n) { return 0; }
+alloc_highmem_pages(struct memory_bitmap *bm, unsigned int n) { return 0; }
 #endif /* CONFIG_HIGHMEM */
 
 /**
@@ -1357,51 +1440,36 @@ static int
 swsusp_alloc(struct memory_bitmap *orig_bm, struct memory_bitmap *copy_bm,
 		unsigned int nr_pages, unsigned int nr_highmem)
 {
-	int error;
-
-	error = memory_bm_create(orig_bm, GFP_ATOMIC | __GFP_COLD, PG_ANY);
-	if (error)
-		goto Free;
-
-	error = memory_bm_create(copy_bm, GFP_ATOMIC | __GFP_COLD, PG_ANY);
-	if (error)
-		goto Free;
+	int error = 0;
 
 	if (nr_highmem > 0) {
 		error = get_highmem_buffer(PG_ANY);
 		if (error)
-			goto Free;
-
-		nr_pages += alloc_highmem_image_pages(copy_bm, nr_highmem);
+			goto err_out;
+		if (nr_highmem > alloc_highmem) {
+			nr_highmem -= alloc_highmem;
+			nr_pages += alloc_highmem_pages(copy_bm, nr_highmem);
+		}
 	}
-	while (nr_pages-- > 0) {
-		struct page *page = alloc_image_page(GFP_ATOMIC | __GFP_COLD);
-
-		if (!page)
-			goto Free;
+	if (nr_pages > alloc_normal) {
+		nr_pages -= alloc_normal;
+		while (nr_pages-- > 0) {
+			struct page *page;
 
-		memory_bm_set_bit(copy_bm, page_to_pfn(page));
+			page = alloc_image_page(GFP_ATOMIC | __GFP_COLD);
+			if (!page)
+				goto err_out;
+			memory_bm_set_bit(copy_bm, page_to_pfn(page));
+		}
 	}
+
 	return 0;
 
- Free:
+ err_out:
 	swsusp_free();
-	return -ENOMEM;
+	return error;
 }
 
-/* Memory bitmap used for marking saveable pages (during suspend) or the
- * suspend image pages (during resume)
- */
-static struct memory_bitmap orig_bm;
-/* Memory bitmap used on suspend for marking allocated pages that will contain
- * the copies of saveable pages.  During resume it is initially used for
- * marking the suspend image pages, but then its set bits are duplicated in
- * @orig_bm and it is released.  Next, on systems with high memory, it may be
- * used for marking "safe" highmem pages, but it has to be reinitialized for
- * this purpose.
- */
-static struct memory_bitmap copy_bm;
-
 asmlinkage int swsusp_save(void)
 {
 	unsigned int nr_pages, nr_highmem;
Index: linux-2.6/kernel/power/power.h
===================================================================
--- linux-2.6.orig/kernel/power/power.h
+++ linux-2.6/kernel/power/power.h
@@ -74,7 +74,7 @@ extern asmlinkage int swsusp_arch_resume
 
 extern int create_basic_memory_bitmaps(void);
 extern void free_basic_memory_bitmaps(void);
-extern int swsusp_shrink_memory(void);
+extern int hibernate_preallocate_memory(void);
 
 /**
  *	Auxiliary structure used for reading the snapshot image data and
Index: linux-2.6/kernel/power/disk.c
===================================================================
--- linux-2.6.orig/kernel/power/disk.c
+++ linux-2.6/kernel/power/disk.c
@@ -303,8 +303,8 @@ int hibernation_snapshot(int platform_mo
 	if (error)
 		return error;
 
-	/* Free memory before shutting down devices. */
-	error = swsusp_shrink_memory();
+	/* Preallocate image memory before shutting down devices. */
+	error = hibernate_preallocate_memory();
 	if (error)
 		goto Close;
 
@@ -320,6 +320,10 @@ int hibernation_snapshot(int platform_mo
 	/* Control returns here after successful restore */
 
  Resume_devices:
+	/* We may need to release the preallocated image pages here. */
+	if (error || !in_suspend)
+		swsusp_free();
+
 	device_resume(in_suspend ?
 		(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);
 	resume_console();
@@ -593,7 +597,10 @@ int hibernate(void)
 		goto Thaw;
 
 	error = hibernation_snapshot(hibernation_mode == HIBERNATION_PLATFORM);
-	if (in_suspend && !error) {
+	if (error)
+		goto Thaw;
+
+	if (in_suspend) {
 		unsigned int flags = 0;
 
 		if (hibernation_mode == HIBERNATION_PLATFORM)
@@ -605,8 +612,8 @@ int hibernate(void)
 			power_down();
 	} else {
 		pr_debug("PM: Image restored successfully.\n");
-		swsusp_free();
 	}
+
  Thaw:
 	thaw_processes();
  Finish:


^ permalink raw reply	[flat|nested] 205+ messages in thread

* Re: [RFC][PATCH 6/6] PM/Hibernate: Estimate hard core working set size
  2009-05-10 14:12       ` Rafael J. Wysocki
@ 2009-05-10 19:53         ` Rafael J. Wysocki
  -1 siblings, 0 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-10 19:53 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: pm list, Andrew Morton, LKML, Pavel Machek, Nigel Cunningham,
	David Rientjes, linux-mm

On Sunday 10 May 2009, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rjw@sisk.pl>
> 
> We want to avoid attempting to free too much memory too hard, so
> estimate the size of the hard core working set and use it as the
> lower limit for preallocating memory.
> 
> Not-yet-signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> ---
> 
> The formula used in this patch appears to return numbers that are too low.

I was able to improve that a little by taking the reserved saveable pages into
account and by adding reclaimable slab, mlocked pages and inactive file
pages to the "hard core working set".  Still, the resulting number is only about
right for x86_64.  On i386 there is still something we're not taking into
account, and it is something substantial (20000 pages seem to be "missing"
from the balance sheet).
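
One way to chase that down (a debugging sketch only, not part of the patch;
the pr_debug() calls and their layout are just an illustration) is to dump
the individual counters that feed the estimate next to the final result at
the end of minimum_image_size() and compare the per-term numbers between
x86_64 and i386:

	pr_debug("PM: mapped %lu mlock %lu slab %lu+%lu unevict %lu pgtbl %lu\n",
		 global_page_state(NR_FILE_MAPPED),
		 global_page_state(NR_MLOCK),
		 global_page_state(NR_SLAB_UNRECLAIMABLE),
		 global_page_state(NR_SLAB_RECLAIMABLE),
		 global_page_state(NR_UNEVICTABLE),
		 global_page_state(NR_PAGETABLE));
	pr_debug("PM: anon %lu+%lu active_file %lu estimate %lu\n",
		 global_page_state(NR_ACTIVE_ANON),
		 global_page_state(NR_INACTIVE_ANON),
		 global_page_state(NR_ACTIVE_FILE),
		 size);

Whichever term differs by roughly that amount between the two architectures
should point at the class of pages the estimate is not accounting for.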

Updated patch (on top of the corrected [5/6] I've just sent) is appended.

Thanks,
Rafael

---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: PM/Hibernate: Estimate hard core working set size (rev. 2)

We want to avoid attempting to free too much memory too hard, so
estimate the size of the hard core working set and use it as the
lower limit for preallocating memory.

[rev. 2: Take saveable reserved pages into account and add some more
 types of pages to the "hard core working set".]

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 kernel/power/snapshot.c |   64 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 64 insertions(+)

Index: linux-2.6/kernel/power/snapshot.c
===================================================================
--- linux-2.6.orig/kernel/power/snapshot.c
+++ linux-2.6/kernel/power/snapshot.c
@@ -1090,6 +1090,8 @@ void swsusp_free(void)
 /* Helper functions used for the shrinking of memory. */
 
 #define GFP_IMAGE	(GFP_KERNEL | __GFP_NOWARN | __GFP_NO_OOM_KILL)
+/* Typical desktop does not have more than 100MB of mapped pages. */
+#define MAX_MMAP_PAGES	(100 << (20 - PAGE_SHIFT))
 
 /**
  * preallocate_image_pages - Allocate a number of pages for hibernation image
@@ -1211,6 +1213,60 @@ static void free_unnecessary_pages(void)
 }
 
 /**
+ * minimum_image_size - Estimate the minimum acceptable size of an image
+ *
+ * We want to avoid attempting to free too much memory too hard, so estimate the
+ * minimum acceptable size of a hibernation image and use it as the lower limit
+ * for preallocating memory.
+ */
+static unsigned long minimum_image_size(void)
+{
+	struct zone *zone;
+	unsigned long size;
+
+	/*
+	 * Mapped pages are normally few and precious, but their number should
+	 * be bounded for safety.
+	 */
+	size = global_page_state(NR_FILE_MAPPED);
+	size = min_t(unsigned long, size, MAX_MMAP_PAGES);
+
+	/* mlocked pages cannot be swapped out. */
+	size += global_page_state(NR_MLOCK);
+
+	/* Hard (but normally small) memory requests. */
+	size += global_page_state(NR_SLAB_UNRECLAIMABLE);
+	size += global_page_state(NR_SLAB_RECLAIMABLE);
+	size += global_page_state(NR_UNEVICTABLE);
+	size += global_page_state(NR_PAGETABLE);
+
+	/* Saveable pages that are reserved cannot be freed. */
+	for_each_zone(zone) {
+		unsigned long pfn, max_zone_pfn;
+
+		if (is_highmem(zone))
+			continue;
+		mark_free_pages(zone);
+		max_zone_pfn = zone->zone_start_pfn + zone->spanned_pages;
+		for (pfn = zone->zone_start_pfn; pfn < max_zone_pfn; pfn++)
+			if (saveable_page(zone, pfn)
+			    && PageReserved(pfn_to_page(pfn)))
+				size++;
+	}
+
+	/*
+	 * Disk I/O can be much faster than swap I/O, so optimize for
+	 * performance.
+	 */
+	size += global_page_state(NR_ACTIVE_ANON);
+	size += global_page_state(NR_INACTIVE_ANON);
+	size += global_page_state(NR_ACTIVE_FILE);
+
+	return size;
+}
+
+
+/**
  * hibernate_preallocate_memory - Preallocate memory for hibernation image
  *
  * To create a hibernation image it is necessary to make a copy of every page
@@ -1298,6 +1354,14 @@ int hibernate_preallocate_memory(void)
 	shrink_all_memory(saveable - size);
 
 	/*
+	 * Estimate the size of the hard core working set and use it as the
+	 * minimum image size.
+	 */
+	pages = minimum_image_size();
+	if (size < pages)
+		size = pages;
+
+	/*
 	 * The number of saveable pages in memory was too high, so apply some
 	 * pressure to decrease it.  First, make room for the largest possible
 	 * image and fail if that doesn't work.  Next, try to decrease the size
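
As a quick sanity check of the numbers involved (assumed values, not
measurements): with 4 KB pages PAGE_SHIFT is 12, so

	MAX_MMAP_PAGES = 100 << (20 - 12) = 100 << 8 = 25600 pages, i.e. 100 MB,

which is the cap applied to the NR_FILE_MAPPED contribution.  If
minimum_image_size() then comes out at, say, 70000 pages while the target
computed earlier in hibernate_preallocate_memory() is only 50000, the
"if (size < pages) size = pages;" clamp raises the target image size to
70000 pages (roughly 273 MB with 4 KB pages), so the subsequent
preallocation never tries to push the number of saveable pages below the
estimated hard core working set.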

^ permalink raw reply	[flat|nested] 205+ messages in thread

* Re: [RFC][PATCH 6/6] PM/Hibernate: Estimate hard core working set size
  2009-05-10 14:12       ` Rafael J. Wysocki
  (?)
@ 2009-05-10 19:53       ` Rafael J. Wysocki
  -1 siblings, 0 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-10 19:53 UTC (permalink / raw)
  To: Wu Fengguang; +Cc: LKML, linux-mm, David Rientjes, pm list, Andrew Morton

On Sunday 10 May 2009, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rjw@sisk.pl>
> 
> We want to avoid attempting to free too much memory too hard, so
> estimate the size of the hard core working set and use it as the
> lower limit for preallocating memory.
> 
> Not-yet-signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> ---
> 
> The formula used in this patch appears to return numbers that are too low.

I was able to improve that a little by taking the reserved saveable pages into
account and by adding reclaimable slab, mlocked pages and inactive file
pages to the "hard core working set".  Still, the resulting number is only about
right for x86_64.  On i386 there is still something we're not taking into
account, and it is something substantial (20000 pages seem to be "missing"
from the balance sheet).

Updated patch (on top of the corrected [5/6] I've just sent) is appended.

Thanks,
Rafael

---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: PM/Hibernate: Estimate hard core working set size (rev. 2)

We want to avoid attempting to free too much memory too hard, so
estimate the size of the hard core working set and use it as the
lower limit for preallocating memory.

[rev. 2: Take saveable reserved pages into account and add some more
 types of pages to the "hard core working set".]

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 kernel/power/snapshot.c |   64 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 64 insertions(+)

Index: linux-2.6/kernel/power/snapshot.c
===================================================================
--- linux-2.6.orig/kernel/power/snapshot.c
+++ linux-2.6/kernel/power/snapshot.c
@@ -1090,6 +1090,8 @@ void swsusp_free(void)
 /* Helper functions used for the shrinking of memory. */
 
 #define GFP_IMAGE	(GFP_KERNEL | __GFP_NOWARN | __GFP_NO_OOM_KILL)
+/* Typical desktop does not have more than 100MB of mapped pages. */
+#define MAX_MMAP_PAGES	(100 << (20 - PAGE_SHIFT))
 
 /**
  * preallocate_image_pages - Allocate a number of pages for hibernation image
@@ -1211,6 +1213,60 @@ static void free_unnecessary_pages(void)
 }
 
 /**
+ * minimum_image_size - Estimate the minimum acceptable size of an image
+ *
+ * We want to avoid attempting to free too much memory too hard, so estimate the
+ * minimum acceptable size of a hibernation image and use it as the lower limit
+ * for preallocating memory.
+ */
+static unsigned long minimum_image_size(void)
+{
+	struct zone *zone;
+	unsigned long size;
+
+	/*
+	 * Mapped pages are normally few and precious, but their number should
+	 * be bounded for safety.
+	 */
+	size = global_page_state(NR_FILE_MAPPED);
+	size = min_t(unsigned long, size, MAX_MMAP_PAGES);
+
+	/* mlocked pages cannot be swapped out. */
+	size += global_page_state(NR_MLOCK);
+
+	/* Hard (but normally small) memory requests. */
+	size += global_page_state(NR_SLAB_UNRECLAIMABLE);
+	size += global_page_state(NR_SLAB_RECLAIMABLE);
+	size += global_page_state(NR_UNEVICTABLE);
+	size += global_page_state(NR_PAGETABLE);
+
+	/* Saveable pages that are reserved cannot be freed. */
+	for_each_zone(zone) {
+		unsigned long pfn, max_zone_pfn;
+
+		if (is_highmem(zone))
+			continue;
+		mark_free_pages(zone);
+		max_zone_pfn = zone->zone_start_pfn + zone->spanned_pages;
+		for (pfn = zone->zone_start_pfn; pfn < max_zone_pfn; pfn++)
+			if (saveable_page(zone, pfn)
+			    && PageReserved(pfn_to_page(pfn)))
+				size++;
+	}
+
+	/*
+	 * Disk I/O can be much faster than swap I/O, so optimize for
+	 * performance.
+	 */
+	size += global_page_state(NR_ACTIVE_ANON);
+	size += global_page_state(NR_INACTIVE_ANON);
+	size += global_page_state(NR_ACTIVE_FILE);
+
+	return size;
+}
+
+
+/**
  * hibernate_preallocate_memory - Preallocate memory for hibernation image
  *
  * To create a hibernation image it is necessary to make a copy of every page
@@ -1298,6 +1354,14 @@ int hibernate_preallocate_memory(void)
 	shrink_all_memory(saveable - size);
 
 	/*
+	 * Estimate the size of the hard core working set and use it as the
+	 * minimum image size.
+	 */
+	pages = minimum_image_size();
+	if (size < pages)
+		size = pages;
+
+	/*
 	 * The number of saveable pages in memory was too high, so apply some
 	 * pressure to decrease it.  First, make room for the largest possible
 	 * image and fail if that doesn't work.  Next, try to decrease the size

^ permalink raw reply	[flat|nested] 205+ messages in thread

* Re: [RFC][PATCH 6/6] PM/Hibernate: Estimate hard core working set size
@ 2009-05-10 19:53         ` Rafael J. Wysocki
  0 siblings, 0 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-10 19:53 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: pm list, Andrew Morton, LKML, Pavel Machek, Nigel Cunningham,
	David Rientjes, linux-mm

On Sunday 10 May 2009, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rjw@sisk.pl>
> 
> We want to avoid attempting to free too much memory too hard, so
> estimate the size of the hard core working set and use it as the
> lower limit for preallocating memory.
> 
> Not-yet-signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> ---
> 
> The formula used in this patch appears to return numbers that are too low.

I was able to improve that a little by taking the reserved saveable pages into
account and by adding reclaimable slab, mlocked pages and inactive file
pages to the "hard core working set".  Still, the resulting number is only about
right for x86_64.  On i386 there is still something we're not taking into
account, and it is something substantial (20000 pages seem to be "missing"
from the balance sheet).

Updated patch (on top of the corrected [5/6] I've just sent) is appended.

Thanks,
Rafael

---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: PM/Hibernate: Estimate hard core working set size (rev. 2)

We want to avoid attempting to free too much memory too hard, so
estimate the size of the hard core working set and use it as the
lower limit for preallocating memory.

[rev. 2: Take saveable reserved pages into account and add some more
 types of pages to the "hard core working set".]

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 kernel/power/snapshot.c |   64 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 64 insertions(+)

Index: linux-2.6/kernel/power/snapshot.c
===================================================================
--- linux-2.6.orig/kernel/power/snapshot.c
+++ linux-2.6/kernel/power/snapshot.c
@@ -1090,6 +1090,8 @@ void swsusp_free(void)
 /* Helper functions used for the shrinking of memory. */
 
 #define GFP_IMAGE	(GFP_KERNEL | __GFP_NOWARN | __GFP_NO_OOM_KILL)
+/* Typical desktop does not have more than 100MB of mapped pages. */
+#define MAX_MMAP_PAGES	(100 << (20 - PAGE_SHIFT))
 
 /**
  * preallocate_image_pages - Allocate a number of pages for hibernation image
@@ -1211,6 +1213,60 @@ static void free_unnecessary_pages(void)
 }
 
 /**
+ * minimum_image_size - Estimate the minimum acceptable size of an image
+ *
+ * We want to avoid attempting to free too much memory too hard, so estimate the
+ * minimum acceptable size of a hibernation image and use it as the lower limit
+ * for preallocating memory.
+ */
+static unsigned long minimum_image_size(void)
+{
+	struct zone *zone;
+	unsigned long size;
+
+	/*
+	 * Mapped pages are normally few and precious, but their number should
+	 * be bounded for safety.
+	 */
+	size = global_page_state(NR_FILE_MAPPED);
+	size = min_t(unsigned long, size, MAX_MMAP_PAGES);
+
+	/* mlocked pages cannot be swapped out. */
+	size += global_page_state(NR_MLOCK);
+
+	/* Hard (but normally small) memory requests. */
+	size += global_page_state(NR_SLAB_UNRECLAIMABLE);
+	size += global_page_state(NR_SLAB_RECLAIMABLE);
+	size += global_page_state(NR_UNEVICTABLE);
+	size += global_page_state(NR_PAGETABLE);
+
+	/* Saveable pages that are reserved cannot be freed. */
+	for_each_zone(zone) {
+		unsigned long pfn, max_zone_pfn;
+
+		if (is_highmem(zone))
+			continue;
+		mark_free_pages(zone);
+		max_zone_pfn = zone->zone_start_pfn + zone->spanned_pages;
+		for (pfn = zone->zone_start_pfn; pfn < max_zone_pfn; pfn++)
+			if (saveable_page(zone, pfn)
+			    && PageReserved(pfn_to_page(pfn)))
+				size++;
+	}
+
+	/*
+	 * Disk I/O can be much faster than swap I/O, so optimize for
+	 * performance.
+	 */
+	size += global_page_state(NR_ACTIVE_ANON);
+	size += global_page_state(NR_INACTIVE_ANON);
+	size += global_page_state(NR_ACTIVE_FILE);
+
+	return size;
+}
+
+
+/**
  * hibernate_preallocate_memory - Preallocate memory for hibernation image
  *
  * To create a hibernation image it is necessary to make a copy of every page
@@ -1298,6 +1354,14 @@ int hibernate_preallocate_memory(void)
 	shrink_all_memory(saveable - size);
 
 	/*
+	 * Estimate the size of the hard core working set and use it as the
+	 * minimum image size.
+	 */
+	pages = minimum_image_size();
+	if (size < pages)
+		size = pages;
+
+	/*
 	 * The number of saveable pages in memory was too high, so apply some
 	 * pressure to decrease it.  First, make room for the largest possible
 	 * image and fail if that doesn't work.  Next, try to decrease the size


^ permalink raw reply	[flat|nested] 205+ messages in thread

* Re: [RFC][PATCH 1/6] mm: Introduce __GFP_NO_OOM_KILL
  2009-05-10 13:50       ` Rafael J. Wysocki
@ 2009-05-11 20:12         ` David Rientjes
  -1 siblings, 0 replies; 205+ messages in thread
From: David Rientjes @ 2009-05-11 20:12 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Wu Fengguang, pm list, Andrew Morton, LKML, Pavel Machek,
	Nigel Cunningham, linux-mm

On Sun, 10 May 2009, Rafael J. Wysocki wrote:

> Index: linux-2.6/mm/page_alloc.c
> ===================================================================
> --- linux-2.6.orig/mm/page_alloc.c
> +++ linux-2.6/mm/page_alloc.c
> @@ -1619,8 +1619,12 @@ nofail_alloc:
>  			goto got_pg;
>  		}
>  
> -		/* The OOM killer will not help higher order allocs so fail */
> -		if (order > PAGE_ALLOC_COSTLY_ORDER) {
> +		/*
> +		 * The OOM killer will not help higher order allocs so fail.
> +		 * Also fail if the caller doesn't want the OOM killer to run.
> +		 */
> +		if (order > PAGE_ALLOC_COSTLY_ORDER
> +				|| (gfp_mask & __GFP_NO_OOM_KILL)) {
>  			clear_zonelist_oom(zonelist, gfp_mask);
>  			goto nopage;
>  		}
> Index: linux-2.6/include/linux/gfp.h
> ===================================================================
> --- linux-2.6.orig/include/linux/gfp.h
> +++ linux-2.6/include/linux/gfp.h
> @@ -51,8 +51,9 @@ struct vm_area_struct;
>  #define __GFP_THISNODE	((__force gfp_t)0x40000u)/* No fallback, no policies */
>  #define __GFP_RECLAIMABLE ((__force gfp_t)0x80000u) /* Page is reclaimable */
>  #define __GFP_MOVABLE	((__force gfp_t)0x100000u)  /* Page is movable */
> +#define __GFP_NO_OOM_KILL ((__force gfp_t)0x200000u)  /* Don't invoke out_of_memory() */
>  
> -#define __GFP_BITS_SHIFT 21	/* Room for 21 __GFP_FOO bits */
> +#define __GFP_BITS_SHIFT 22	/* Number of __GFP_FOO bits */
>  #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
>  
>  /* This equals 0, but use constants in case they ever change */
> 

Nack, unnecessary in mmotm and my patch series from 
http://lkml.org/lkml/2009/5/10/118.

^ permalink raw reply	[flat|nested] 205+ messages in thread

* Re: [RFC][PATCH 1/6] mm: Introduce __GFP_NO_OOM_KILL
  2009-05-10 13:50       ` Rafael J. Wysocki
  (?)
@ 2009-05-11 20:12       ` David Rientjes
  -1 siblings, 0 replies; 205+ messages in thread
From: David Rientjes @ 2009-05-11 20:12 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: LKML, linux-mm, pm list, Wu Fengguang, Andrew Morton

On Sun, 10 May 2009, Rafael J. Wysocki wrote:

> Index: linux-2.6/mm/page_alloc.c
> ===================================================================
> --- linux-2.6.orig/mm/page_alloc.c
> +++ linux-2.6/mm/page_alloc.c
> @@ -1619,8 +1619,12 @@ nofail_alloc:
>  			goto got_pg;
>  		}
>  
> -		/* The OOM killer will not help higher order allocs so fail */
> -		if (order > PAGE_ALLOC_COSTLY_ORDER) {
> +		/*
> +		 * The OOM killer will not help higher order allocs so fail.
> +		 * Also fail if the caller doesn't want the OOM killer to run.
> +		 */
> +		if (order > PAGE_ALLOC_COSTLY_ORDER
> +				|| (gfp_mask & __GFP_NO_OOM_KILL)) {
>  			clear_zonelist_oom(zonelist, gfp_mask);
>  			goto nopage;
>  		}
> Index: linux-2.6/include/linux/gfp.h
> ===================================================================
> --- linux-2.6.orig/include/linux/gfp.h
> +++ linux-2.6/include/linux/gfp.h
> @@ -51,8 +51,9 @@ struct vm_area_struct;
>  #define __GFP_THISNODE	((__force gfp_t)0x40000u)/* No fallback, no policies */
>  #define __GFP_RECLAIMABLE ((__force gfp_t)0x80000u) /* Page is reclaimable */
>  #define __GFP_MOVABLE	((__force gfp_t)0x100000u)  /* Page is movable */
> +#define __GFP_NO_OOM_KILL ((__force gfp_t)0x200000u)  /* Don't invoke out_of_memory() */
>  
> -#define __GFP_BITS_SHIFT 21	/* Room for 21 __GFP_FOO bits */
> +#define __GFP_BITS_SHIFT 22	/* Number of __GFP_FOO bits */
>  #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
>  
>  /* This equals 0, but use constants in case they ever change */
> 

Nack, unnecessary in mmotm and my patch series from 
http://lkml.org/lkml/2009/5/10/118.

^ permalink raw reply	[flat|nested] 205+ messages in thread

* Re: [RFC][PATCH 1/6] mm: Introduce __GFP_NO_OOM_KILL
@ 2009-05-11 20:12         ` David Rientjes
  0 siblings, 0 replies; 205+ messages in thread
From: David Rientjes @ 2009-05-11 20:12 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Wu Fengguang, pm list, Andrew Morton, LKML, Pavel Machek,
	Nigel Cunningham, linux-mm

On Sun, 10 May 2009, Rafael J. Wysocki wrote:

> Index: linux-2.6/mm/page_alloc.c
> ===================================================================
> --- linux-2.6.orig/mm/page_alloc.c
> +++ linux-2.6/mm/page_alloc.c
> @@ -1619,8 +1619,12 @@ nofail_alloc:
>  			goto got_pg;
>  		}
>  
> -		/* The OOM killer will not help higher order allocs so fail */
> -		if (order > PAGE_ALLOC_COSTLY_ORDER) {
> +		/*
> +		 * The OOM killer will not help higher order allocs so fail.
> +		 * Also fail if the caller doesn't want the OOM killer to run.
> +		 */
> +		if (order > PAGE_ALLOC_COSTLY_ORDER
> +				|| (gfp_mask & __GFP_NO_OOM_KILL)) {
>  			clear_zonelist_oom(zonelist, gfp_mask);
>  			goto nopage;
>  		}
> Index: linux-2.6/include/linux/gfp.h
> ===================================================================
> --- linux-2.6.orig/include/linux/gfp.h
> +++ linux-2.6/include/linux/gfp.h
> @@ -51,8 +51,9 @@ struct vm_area_struct;
>  #define __GFP_THISNODE	((__force gfp_t)0x40000u)/* No fallback, no policies */
>  #define __GFP_RECLAIMABLE ((__force gfp_t)0x80000u) /* Page is reclaimable */
>  #define __GFP_MOVABLE	((__force gfp_t)0x100000u)  /* Page is movable */
> +#define __GFP_NO_OOM_KILL ((__force gfp_t)0x200000u)  /* Don't invoke out_of_memory() */
>  
> -#define __GFP_BITS_SHIFT 21	/* Room for 21 __GFP_FOO bits */
> +#define __GFP_BITS_SHIFT 22	/* Number of __GFP_FOO bits */
>  #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
>  
>  /* This equals 0, but use constants in case they ever change */
> 

Nack, unnecessary in mmotm and my patch series from 
http://lkml.org/lkml/2009/5/10/118.


^ permalink raw reply	[flat|nested] 205+ messages in thread

* Re: [RFC][PATCH 1/6] mm: Introduce __GFP_NO_OOM_KILL
  2009-05-11 20:12         ` David Rientjes
@ 2009-05-11 22:14           ` Rafael J. Wysocki
  -1 siblings, 0 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-11 22:14 UTC (permalink / raw)
  To: David Rientjes, Andrew Morton
  Cc: Wu Fengguang, pm list, LKML, Pavel Machek, Nigel Cunningham, linux-mm

On Monday 11 May 2009, David Rientjes wrote:
> On Sun, 10 May 2009, Rafael J. Wysocki wrote:
> 
> > Index: linux-2.6/mm/page_alloc.c
> > ===================================================================
> > --- linux-2.6.orig/mm/page_alloc.c
> > +++ linux-2.6/mm/page_alloc.c
> > @@ -1619,8 +1619,12 @@ nofail_alloc:
> >  			goto got_pg;
> >  		}
> >  
> > -		/* The OOM killer will not help higher order allocs so fail */
> > -		if (order > PAGE_ALLOC_COSTLY_ORDER) {
> > +		/*
> > +		 * The OOM killer will not help higher order allocs so fail.
> > +		 * Also fail if the caller doesn't want the OOM killer to run.
> > +		 */
> > +		if (order > PAGE_ALLOC_COSTLY_ORDER
> > +				|| (gfp_mask & __GFP_NO_OOM_KILL)) {
> >  			clear_zonelist_oom(zonelist, gfp_mask);
> >  			goto nopage;
> >  		}
> > Index: linux-2.6/include/linux/gfp.h
> > ===================================================================
> > --- linux-2.6.orig/include/linux/gfp.h
> > +++ linux-2.6/include/linux/gfp.h
> > @@ -51,8 +51,9 @@ struct vm_area_struct;
> >  #define __GFP_THISNODE	((__force gfp_t)0x40000u)/* No fallback, no policies */
> >  #define __GFP_RECLAIMABLE ((__force gfp_t)0x80000u) /* Page is reclaimable */
> >  #define __GFP_MOVABLE	((__force gfp_t)0x100000u)  /* Page is movable */
> > +#define __GFP_NO_OOM_KILL ((__force gfp_t)0x200000u)  /* Don't invoke out_of_memory() */
> >  
> > -#define __GFP_BITS_SHIFT 21	/* Room for 21 __GFP_FOO bits */
> > +#define __GFP_BITS_SHIFT 22	/* Number of __GFP_FOO bits */
> >  #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
> >  
> >  /* This equals 0, but use constants in case they ever change */
> > 
> 
> Nack, unnecessary in mmotm and my patch series from 
> http://lkml.org/lkml/2009/5/10/118.

Andrew, what's your opinion, please?

I can wait with these patches until the dust settles in the mm land.

David, which patch in your series causes this to be unnecessary?

Best,
Rafael

^ permalink raw reply	[flat|nested] 205+ messages in thread

* Re: [RFC][PATCH 1/6] mm: Introduce __GFP_NO_OOM_KILL
  2009-05-11 22:14           ` Rafael J. Wysocki
@ 2009-05-11 22:33             ` Andrew Morton
  -1 siblings, 0 replies; 205+ messages in thread
From: Andrew Morton @ 2009-05-11 22:33 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: rientjes, fengguang.wu, linux-pm, linux-kernel, pavel, nigel, linux-mm

On Tue, 12 May 2009 00:14:23 +0200
"Rafael J. Wysocki" <rjw@sisk.pl> wrote:

> On Monday 11 May 2009, David Rientjes wrote:
> > On Sun, 10 May 2009, Rafael J. Wysocki wrote:
> > 
> > > Index: linux-2.6/mm/page_alloc.c
> > > ===================================================================
> > > --- linux-2.6.orig/mm/page_alloc.c
> > > +++ linux-2.6/mm/page_alloc.c
> > > @@ -1619,8 +1619,12 @@ nofail_alloc:
> > >  			goto got_pg;
> > >  		}
> > >  
> > > -		/* The OOM killer will not help higher order allocs so fail */
> > > -		if (order > PAGE_ALLOC_COSTLY_ORDER) {
> > > +		/*
> > > +		 * The OOM killer will not help higher order allocs so fail.
> > > +		 * Also fail if the caller doesn't want the OOM killer to run.
> > > +		 */
> > > +		if (order > PAGE_ALLOC_COSTLY_ORDER
> > > +				|| (gfp_mask & __GFP_NO_OOM_KILL)) {
> > >  			clear_zonelist_oom(zonelist, gfp_mask);
> > >  			goto nopage;
> > >  		}
> > > Index: linux-2.6/include/linux/gfp.h
> > > ===================================================================
> > > --- linux-2.6.orig/include/linux/gfp.h
> > > +++ linux-2.6/include/linux/gfp.h
> > > @@ -51,8 +51,9 @@ struct vm_area_struct;
> > >  #define __GFP_THISNODE	((__force gfp_t)0x40000u)/* No fallback, no policies */
> > >  #define __GFP_RECLAIMABLE ((__force gfp_t)0x80000u) /* Page is reclaimable */
> > >  #define __GFP_MOVABLE	((__force gfp_t)0x100000u)  /* Page is movable */
> > > +#define __GFP_NO_OOM_KILL ((__force gfp_t)0x200000u)  /* Don't invoke out_of_memory() */
> > >  
> > > -#define __GFP_BITS_SHIFT 21	/* Room for 21 __GFP_FOO bits */
> > > +#define __GFP_BITS_SHIFT 22	/* Number of __GFP_FOO bits */
> > >  #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
> > >  
> > >  /* This equals 0, but use constants in case they ever change */
> > > 
> > 
> > Nack, unnecessary in mmotm and my patch series from 
> > http://lkml.org/lkml/2009/5/10/118.
> 
> Andrew, what's your opinion, please?

I don't understand which part of David's patch series is supposed to
address your requirement.  If it's "don't kill tasks which are in D
state" then that's a problem because right now I think that patch is
wrong.  It's still being discussed.

> I can wait with these patches until the dust settles in the mm land.

Yes, it is pretty dusty at present.  I'd suggest that finding something
else to do for a few days would be a wise step ;)


^ permalink raw reply	[flat|nested] 205+ messages in thread

* Re: [RFC][PATCH 1/6] mm: Introduce __GFP_NO_OOM_KILL
  2009-05-11 22:33             ` Andrew Morton
@ 2009-05-11 23:04               ` Rafael J. Wysocki
  -1 siblings, 0 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-11 23:04 UTC (permalink / raw)
  To: Andrew Morton
  Cc: rientjes, fengguang.wu, linux-pm, linux-kernel, pavel, nigel, linux-mm

On Tuesday 12 May 2009, Andrew Morton wrote:
> On Tue, 12 May 2009 00:14:23 +0200
> "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> 
> > On Monday 11 May 2009, David Rientjes wrote:
> > > On Sun, 10 May 2009, Rafael J. Wysocki wrote:
> > > 
> > > > Index: linux-2.6/mm/page_alloc.c
> > > > ===================================================================
> > > > --- linux-2.6.orig/mm/page_alloc.c
> > > > +++ linux-2.6/mm/page_alloc.c
> > > > @@ -1619,8 +1619,12 @@ nofail_alloc:
> > > >  			goto got_pg;
> > > >  		}
> > > >  
> > > > -		/* The OOM killer will not help higher order allocs so fail */
> > > > -		if (order > PAGE_ALLOC_COSTLY_ORDER) {
> > > > +		/*
> > > > +		 * The OOM killer will not help higher order allocs so fail.
> > > > +		 * Also fail if the caller doesn't want the OOM killer to run.
> > > > +		 */
> > > > +		if (order > PAGE_ALLOC_COSTLY_ORDER
> > > > +				|| (gfp_mask & __GFP_NO_OOM_KILL)) {
> > > >  			clear_zonelist_oom(zonelist, gfp_mask);
> > > >  			goto nopage;
> > > >  		}
> > > > Index: linux-2.6/include/linux/gfp.h
> > > > ===================================================================
> > > > --- linux-2.6.orig/include/linux/gfp.h
> > > > +++ linux-2.6/include/linux/gfp.h
> > > > @@ -51,8 +51,9 @@ struct vm_area_struct;
> > > >  #define __GFP_THISNODE	((__force gfp_t)0x40000u)/* No fallback, no policies */
> > > >  #define __GFP_RECLAIMABLE ((__force gfp_t)0x80000u) /* Page is reclaimable */
> > > >  #define __GFP_MOVABLE	((__force gfp_t)0x100000u)  /* Page is movable */
> > > > +#define __GFP_NO_OOM_KILL ((__force gfp_t)0x200000u)  /* Don't invoke out_of_memory() */
> > > >  
> > > > -#define __GFP_BITS_SHIFT 21	/* Room for 21 __GFP_FOO bits */
> > > > +#define __GFP_BITS_SHIFT 22	/* Number of __GFP_FOO bits */
> > > >  #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
> > > >  
> > > >  /* This equals 0, but use constants in case they ever change */
> > > > 
> > > 
> > > Nack, unnecessary in mmotm and my patch series from 
> > > http://lkml.org/lkml/2009/5/10/118.
> > 
> > Andrew, what's your opinion, please?
> 
> I don't understand which part of David's patch series is supposed to
> address your requirement.  If it's "don't kill tasks which are in D
> state" then that's a problem because right now I think that patch is
> wrong.  It's still being discussed.

Yeah.

> > I can wait with these patches until the dust settles in the mm land.
> 
> Yes, it is pretty dusty at present.  I'd suggest that finding something
> else to do for a few days would be a wise step ;)

Well, in fact I have finished the other parts of my patchset and the only
missing piece is how to prevent the OOM killer from triggering while
hibernation memory is being allocated, but in principle that can be done in a
couple of different ways.

So, I think I'll post an update shortly and I'll wait for mm to settle down.

^ permalink raw reply	[flat|nested] 205+ messages in thread
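
One of the "couple of different ways" mentioned above could be a global switch
that is set once tasks have been frozen and tested by the page allocator just
before it would call out_of_memory().  A rough sketch (all names below are
invented for illustration and are not taken from the posted patches):

#include <linux/types.h>

/*
 * Illustrative sketch only: set after freezing tasks, cleared before
 * thawing them, and checked in the allocator's slow path right before
 * out_of_memory() would be invoked.
 */
static bool oom_killer_disabled;

void hibernate_oom_killer_disable(void)
{
	oom_killer_disabled = true;
}

void hibernate_oom_killer_enable(void)
{
	oom_killer_disabled = false;
}

bool hibernate_oom_killer_enabled(void)
{
	return !oom_killer_disabled;
}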

* [RFC][PATCH 0/6] PM/Hibernate: Rework memory shrinking (rev. 4)
  2009-05-10 13:48     ` Rafael J. Wysocki
@ 2009-05-13  8:32       ` Rafael J. Wysocki
  -1 siblings, 0 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-13  8:32 UTC (permalink / raw)
  To: pm list
  Cc: Wu Fengguang, Andrew Morton, LKML, Pavel Machek,
	Nigel Cunningham, David Rientjes, linux-mm

On Sunday 10 May 2009, Rafael J. Wysocki wrote:
> On Thursday 07 May 2009, Rafael J. Wysocki wrote:
> > On Thursday 07 May 2009, Rafael J. Wysocki wrote:
> > > Hi,
> > > 
> > > The following patchset is an attempt to rework the memory shrinking mechanism
> > > used during hibernation to make room for the image.  It is a work in progress
> > > and most likely it's going to be modified, but it has been discussed recently
> > > and I'd like to get comments on the current version.
> > > 
> > > [1/5] - disable the OOM killer after freezing tasks (this will be dropped if
> > >         it's verified that we can avoid the OOM killing by using
> > >         __GFP_FS|__GFP_WAIT|__GFP_NORETRY|__GFP_NOWARN
> > >         in the next patches).
> > > 
> > > [2/5] - drop memory shrinking from the suspend (to RAM) code path
> > > 
> > > [3/5] - move swsusp_shrink_memory() to snapshot.c
> > > 
> > > [4/5] - rework swsusp_shrink_memory() (to use memory allocations for applying
> > >         memory pressure)
> > > 
> > > [5/5] - allocate image pages along with the shrinking.
> > 
> > Updated patchset follows.
> 
> In the meantime I added a patch that attempts to compute the size of the hard
> core working set.  I also had to rework the patch reworking
> swsusp_shrink_memory() so that it takes highmem into account.
> 
> Currently, the patchset consists of the following patches:
> 
> [1/6] - disable the OOM killer after freezing tasks (this will be dropped if
>         it's verified that we can avoid the OOM killing by using
>         __GFP_FS|__GFP_WAIT|__GFP_NORETRY|__GFP_NOWARN
>         in the next patches).
> 
> [2/6] - drop memory shrinking from the suspend (to RAM) code path
> 
> [3/6] - move swsusp_shrink_memory() to snapshot.c
> 
> [4/6] - rework swsusp_shrink_memory() (to use memory allocations for applying
>         memory pressure)
> 
> [5/6] - allocate image pages along with the shrinking
> 
> [6/6] - estimate the size of the hard core working set and use it as the lower
>         limit of the image size.

This is the 4th (and hopefully final) version of the patchset reworking
hibernation memory shrinking.  The patches have been rearranged and some of
them were modified to fix bugs etc.

[1/6] - drop memory shrinking from the suspend (to RAM) code path

[2/6] - move swsusp_shrink_memory() to snapshot.c

[3/6] - disable the OOM killer after freezing tasks (now it is done in a
        simpler way)

[4/6] - rework swsusp_shrink_memory() to use memory allocations for applying
        memory pressure

[5/6] - do not release the preallocated image pages

[6/6] - estimate the minimum image size and use it to avoid attempting to free
        too much memory too hard

IMO patches [1/6] and [2/6] are ready to go, so I'm going to add them to the
linux-next branch of the suspend tree.

Patch [3/6] is in mm, but it would be easier to handle [4/6] - [6/6] having
this patch in the suspend tree.  Dunno.  Andrew, what would you prefer to do
with it?

Patches [4/6] - [6/6] have been tested on a couple of boxes in different
configurations and no major problems have been found.

Comments welcome.

Thanks,
Rafael


^ permalink raw reply	[flat|nested] 205+ messages in thread
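
The allocation-based shrinking described in [4/6] above amounts to applying
memory pressure by allocating pages and keeping them until the projected image
fits, instead of driving reclaim through an artificial shrinker.  A rough,
simplified sketch of such a loop (the names and the stopping condition are
placeholders for illustration, not the code from the patch):

#include <linux/errno.h>
#include <linux/gfp.h>
#include <linux/list.h>
#include <linux/mm.h>

/*
 * Illustrative only: allocate pages until the estimated number of saveable
 * pages no longer exceeds the target, letting the allocations themselves
 * force reclaim.  The pages are kept on a list so they can later be used
 * as image storage or freed again if something fails.
 */
static LIST_HEAD(preallocated_pages);

static int prealloc_image_pages(unsigned long target_pages,
				unsigned long (*saveable_pages)(void))
{
	while (saveable_pages() > target_pages) {
		struct page *page;

		page = alloc_page(GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY);
		if (!page)
			return -ENOMEM;
		list_add(&page->lru, &preallocated_pages);
	}
	return 0;
}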

* [PATCH 1/6] PM/Suspend: Do not shrink memory before suspend
  2009-05-13  8:32       ` Rafael J. Wysocki
@ 2009-05-13  8:34         ` Rafael J. Wysocki
  -1 siblings, 0 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-13  8:34 UTC (permalink / raw)
  To: pm list
  Cc: Wu Fengguang, Andrew Morton, LKML, Pavel Machek,
	Nigel Cunningham, David Rientjes, linux-mm

From: Rafael J. Wysocki <rjw@sisk.pl>

Remove the shrinking of memory from the suspend-to-RAM code, where
it is not really necessary.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Nigel Cunningham <nigel@tuxonice.net>
Acked-by: Wu Fengguang <fengguang.wu@intel.com>
---
 kernel/power/main.c |   20 +-------------------
 mm/vmscan.c         |    4 ++--
 2 files changed, 3 insertions(+), 21 deletions(-)

Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -188,9 +188,6 @@ static void suspend_test_finish(const ch
 
 #endif
 
-/* This is just an arbitrary number */
-#define FREE_PAGE_NUMBER (100)
-
 static struct platform_suspend_ops *suspend_ops;
 
 /**
@@ -226,7 +223,6 @@ int suspend_valid_only_mem(suspend_state
 static int suspend_prepare(void)
 {
 	int error;
-	unsigned int free_pages;
 
 	if (!suspend_ops || !suspend_ops->enter)
 		return -EPERM;
@@ -241,24 +237,10 @@ static int suspend_prepare(void)
 	if (error)
 		goto Finish;
 
-	if (suspend_freeze_processes()) {
-		error = -EAGAIN;
-		goto Thaw;
-	}
-
-	free_pages = global_page_state(NR_FREE_PAGES);
-	if (free_pages < FREE_PAGE_NUMBER) {
-		pr_debug("PM: free some memory\n");
-		shrink_all_memory(FREE_PAGE_NUMBER - free_pages);
-		if (nr_free_pages() < FREE_PAGE_NUMBER) {
-			error = -ENOMEM;
-			printk(KERN_ERR "PM: No enough memory\n");
-		}
-	}
+	error = suspend_freeze_processes();
 	if (!error)
 		return 0;
 
- Thaw:
 	suspend_thaw_processes();
 	usermodehelper_enable();
  Finish:
Index: linux-2.6/mm/vmscan.c
===================================================================
--- linux-2.6.orig/mm/vmscan.c
+++ linux-2.6/mm/vmscan.c
@@ -2054,7 +2054,7 @@ unsigned long global_lru_pages(void)
 		+ global_page_state(NR_INACTIVE_FILE);
 }
 
-#ifdef CONFIG_PM
+#ifdef CONFIG_HIBERNATION
 /*
  * Helper function for shrink_all_memory().  Tries to reclaim 'nr_pages' pages
  * from LRU lists system-wide, for given pass and priority.
@@ -2194,7 +2194,7 @@ out:
 
 	return sc.nr_reclaimed;
 }
-#endif
+#endif /* CONFIG_HIBERNATION */
 
 /* It's optimal to keep kswapds on the same CPUs as their memory, but
    not required for correctness.  So if the last cpu in a node goes


^ permalink raw reply	[flat|nested] 205+ messages in thread

* [PATCH 2/6] PM/Hibernate: Move memory shrinking to snapshot.c (rev. 2)
  2009-05-13  8:32       ` Rafael J. Wysocki
                         ` (2 preceding siblings ...)
@ 2009-05-13  8:35       ` Rafael J. Wysocki
  -1 siblings, 0 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-13  8:35 UTC (permalink / raw)
  To: pm list; +Cc: LKML, linux-mm, David Rientjes, Andrew Morton, Wu Fengguang

From: Rafael J. Wysocki <rjw@sisk.pl>

The next patch is going to modify the memory shrinking code so that
it will use memory allocations to free memory instead of using an
artificial memory shrinking mechanism for that.  For this purpose it
is convenient to move swsusp_shrink_memory() from
kernel/power/swsusp.c to kernel/power/snapshot.c, because the new
memory-shrinking code is going to use things that are local to
kernel/power/snapshot.c .

[rev. 2: Make some functions static and remove their headers from
 kernel/power/power.h]

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Wu Fengguang <fengguang.wu@intel.com>
---
 kernel/power/power.h    |    4 --
 kernel/power/snapshot.c |   80 ++++++++++++++++++++++++++++++++++++++++++++++--
 kernel/power/swsusp.c   |   76 ---------------------------------------------
 3 files changed, 79 insertions(+), 81 deletions(-)

Index: linux-2.6/kernel/power/snapshot.c
===================================================================
--- linux-2.6.orig/kernel/power/snapshot.c
+++ linux-2.6/kernel/power/snapshot.c
@@ -39,6 +39,14 @@ static int swsusp_page_is_free(struct pa
 static void swsusp_set_page_forbidden(struct page *);
 static void swsusp_unset_page_forbidden(struct page *);
 
+/*
+ * Preferred image size in bytes (tunable via /sys/power/image_size).
+ * When it is set to N, swsusp will do its best to ensure the image
+ * size will not exceed N bytes, but if that is impossible, it will
+ * try to create the smallest image possible.
+ */
+unsigned long image_size = 500 * 1024 * 1024;
+
 /* List of PBEs needed for restoring the pages that were allocated before
  * the suspend and included in the suspend image, but have also been
  * allocated by the "resume" kernel, so their contents cannot be written
@@ -840,7 +848,7 @@ static struct page *saveable_highmem_pag
  *	pages.
  */
 
-unsigned int count_highmem_pages(void)
+static unsigned int count_highmem_pages(void)
 {
 	struct zone *zone;
 	unsigned int n = 0;
@@ -902,7 +910,7 @@ static struct page *saveable_page(struct
  *	pages.
  */
 
-unsigned int count_data_pages(void)
+static unsigned int count_data_pages(void)
 {
 	struct zone *zone;
 	unsigned long pfn, max_zone_pfn;
@@ -1058,6 +1066,74 @@ void swsusp_free(void)
 	buffer = NULL;
 }
 
+/**
+ *	swsusp_shrink_memory -  Try to free as much memory as needed
+ *
+ *	... but do not OOM-kill anyone
+ *
+ *	Notice: all userland should be stopped before it is called, or
+ *	livelock is possible.
+ */
+
+#define SHRINK_BITE	10000
+static inline unsigned long __shrink_memory(long tmp)
+{
+	if (tmp > SHRINK_BITE)
+		tmp = SHRINK_BITE;
+	return shrink_all_memory(tmp);
+}
+
+int swsusp_shrink_memory(void)
+{
+	long tmp;
+	struct zone *zone;
+	unsigned long pages = 0;
+	unsigned int i = 0;
+	char *p = "-\\|/";
+	struct timeval start, stop;
+
+	printk(KERN_INFO "PM: Shrinking memory...  ");
+	do_gettimeofday(&start);
+	do {
+		long size, highmem_size;
+
+		highmem_size = count_highmem_pages();
+		size = count_data_pages() + PAGES_FOR_IO + SPARE_PAGES;
+		tmp = size;
+		size += highmem_size;
+		for_each_populated_zone(zone) {
+			tmp += snapshot_additional_pages(zone);
+			if (is_highmem(zone)) {
+				highmem_size -=
+					zone_page_state(zone, NR_FREE_PAGES);
+			} else {
+				tmp -= zone_page_state(zone, NR_FREE_PAGES);
+				tmp += zone->lowmem_reserve[ZONE_NORMAL];
+			}
+		}
+
+		if (highmem_size < 0)
+			highmem_size = 0;
+
+		tmp += highmem_size;
+		if (tmp > 0) {
+			tmp = __shrink_memory(tmp);
+			if (!tmp)
+				return -ENOMEM;
+			pages += tmp;
+		} else if (size > image_size / PAGE_SIZE) {
+			tmp = __shrink_memory(size - (image_size / PAGE_SIZE));
+			pages += tmp;
+		}
+		printk("\b%c", p[i++%4]);
+	} while (tmp > 0);
+	do_gettimeofday(&stop);
+	printk("\bdone (%lu pages freed)\n", pages);
+	swsusp_show_speed(&start, &stop, pages, "Freed");
+
+	return 0;
+}
+
 #ifdef CONFIG_HIGHMEM
 /**
   *	count_pages_for_highmem - compute the number of non-highmem pages
Index: linux-2.6/kernel/power/swsusp.c
===================================================================
--- linux-2.6.orig/kernel/power/swsusp.c
+++ linux-2.6/kernel/power/swsusp.c
@@ -55,14 +55,6 @@
 
 #include "power.h"
 
-/*
- * Preferred image size in bytes (tunable via /sys/power/image_size).
- * When it is set to N, swsusp will do its best to ensure the image
- * size will not exceed N bytes, but if that is impossible, it will
- * try to create the smallest image possible.
- */
-unsigned long image_size = 500 * 1024 * 1024;
-
 int in_suspend __nosavedata = 0;
 
 /**
@@ -195,74 +187,6 @@ void swsusp_show_speed(struct timeval *s
 			kps / 1000, (kps % 1000) / 10);
 }
 
-/**
- *	swsusp_shrink_memory -  Try to free as much memory as needed
- *
- *	... but do not OOM-kill anyone
- *
- *	Notice: all userland should be stopped before it is called, or
- *	livelock is possible.
- */
-
-#define SHRINK_BITE	10000
-static inline unsigned long __shrink_memory(long tmp)
-{
-	if (tmp > SHRINK_BITE)
-		tmp = SHRINK_BITE;
-	return shrink_all_memory(tmp);
-}
-
-int swsusp_shrink_memory(void)
-{
-	long tmp;
-	struct zone *zone;
-	unsigned long pages = 0;
-	unsigned int i = 0;
-	char *p = "-\\|/";
-	struct timeval start, stop;
-
-	printk(KERN_INFO "PM: Shrinking memory...  ");
-	do_gettimeofday(&start);
-	do {
-		long size, highmem_size;
-
-		highmem_size = count_highmem_pages();
-		size = count_data_pages() + PAGES_FOR_IO + SPARE_PAGES;
-		tmp = size;
-		size += highmem_size;
-		for_each_populated_zone(zone) {
-			tmp += snapshot_additional_pages(zone);
-			if (is_highmem(zone)) {
-				highmem_size -=
-					zone_page_state(zone, NR_FREE_PAGES);
-			} else {
-				tmp -= zone_page_state(zone, NR_FREE_PAGES);
-				tmp += zone->lowmem_reserve[ZONE_NORMAL];
-			}
-		}
-
-		if (highmem_size < 0)
-			highmem_size = 0;
-
-		tmp += highmem_size;
-		if (tmp > 0) {
-			tmp = __shrink_memory(tmp);
-			if (!tmp)
-				return -ENOMEM;
-			pages += tmp;
-		} else if (size > image_size / PAGE_SIZE) {
-			tmp = __shrink_memory(size - (image_size / PAGE_SIZE));
-			pages += tmp;
-		}
-		printk("\b%c", p[i++%4]);
-	} while (tmp > 0);
-	do_gettimeofday(&stop);
-	printk("\bdone (%lu pages freed)\n", pages);
-	swsusp_show_speed(&start, &stop, pages, "Freed");
-
-	return 0;
-}
-
 /*
  * Platforms, like ACPI, may want us to save some memory used by them during
  * hibernation and to restore the contents of this memory during the subsequent
Index: linux-2.6/kernel/power/power.h
===================================================================
--- linux-2.6.orig/kernel/power/power.h
+++ linux-2.6/kernel/power/power.h
@@ -74,7 +74,7 @@ extern asmlinkage int swsusp_arch_resume
 
 extern int create_basic_memory_bitmaps(void);
 extern void free_basic_memory_bitmaps(void);
-extern unsigned int count_data_pages(void);
+extern int swsusp_shrink_memory(void);
 
 /**
  *	Auxiliary structure used for reading the snapshot image data and
@@ -149,7 +149,6 @@ extern int swsusp_swap_in_use(void);
 
 /* kernel/power/disk.c */
 extern int swsusp_check(void);
-extern int swsusp_shrink_memory(void);
 extern void swsusp_free(void);
 extern int swsusp_read(unsigned int *flags_p);
 extern int swsusp_write(unsigned int flags);
@@ -176,7 +175,6 @@ extern int pm_notifier_call_chain(unsign
 #endif
 
 #ifdef CONFIG_HIGHMEM
-unsigned int count_highmem_pages(void);
 int restore_highmem(void);
 #else
 static inline unsigned int count_highmem_pages(void) { return 0; }
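
As a usage aside on the image_size knob moved here: it is exposed as
/sys/power/image_size and takes the preferred image size in bytes.  A
minimal userspace sketch for setting it (the 400 MB target is an
arbitrary example and root privileges are assumed):

#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/sys/power/image_size", "w");

	if (!f) {
		perror("/sys/power/image_size");
		return 1;
	}
	/* preferred image size in bytes; 400 MB is just an example */
	fprintf(f, "%lu\n", 400UL * 1024 * 1024);
	return fclose(f) ? 1 : 0;
}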

^ permalink raw reply	[flat|nested] 205+ messages in thread

* [PATCH 2/6] PM/Hibernate: Move memory shrinking to snapshot.c (rev. 2)
@ 2009-05-13  8:35         ` Rafael J. Wysocki
  0 siblings, 0 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-13  8:35 UTC (permalink / raw)
  To: pm list
  Cc: Wu Fengguang, Andrew Morton, LKML, Pavel Machek,
	Nigel Cunningham, David Rientjes, linux-mm

From: Rafael J. Wysocki <rjw@sisk.pl>

The next patch is going to modify the memory shrinking code so that
it uses memory allocations to free memory instead of an artificial
memory shrinking mechanism.  For this purpose it is convenient to
move swsusp_shrink_memory() from kernel/power/swsusp.c to
kernel/power/snapshot.c, because the new memory-shrinking code is
going to use things that are local to kernel/power/snapshot.c.

[rev. 2: Make some functions static and remove their headers from
 kernel/power/power.h]

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Wu Fengguang <fengguang.wu@intel.com>
---
 kernel/power/power.h    |    4 --
 kernel/power/snapshot.c |   80 ++++++++++++++++++++++++++++++++++++++++++++++--
 kernel/power/swsusp.c   |   76 ---------------------------------------------
 3 files changed, 79 insertions(+), 81 deletions(-)

Index: linux-2.6/kernel/power/snapshot.c
===================================================================
--- linux-2.6.orig/kernel/power/snapshot.c
+++ linux-2.6/kernel/power/snapshot.c
@@ -39,6 +39,14 @@ static int swsusp_page_is_free(struct pa
 static void swsusp_set_page_forbidden(struct page *);
 static void swsusp_unset_page_forbidden(struct page *);
 
+/*
+ * Preferred image size in bytes (tunable via /sys/power/image_size).
+ * When it is set to N, swsusp will do its best to ensure the image
+ * size will not exceed N bytes, but if that is impossible, it will
+ * try to create the smallest image possible.
+ */
+unsigned long image_size = 500 * 1024 * 1024;
+
 /* List of PBEs needed for restoring the pages that were allocated before
  * the suspend and included in the suspend image, but have also been
  * allocated by the "resume" kernel, so their contents cannot be written
@@ -840,7 +848,7 @@ static struct page *saveable_highmem_pag
  *	pages.
  */
 
-unsigned int count_highmem_pages(void)
+static unsigned int count_highmem_pages(void)
 {
 	struct zone *zone;
 	unsigned int n = 0;
@@ -902,7 +910,7 @@ static struct page *saveable_page(struct
  *	pages.
  */
 
-unsigned int count_data_pages(void)
+static unsigned int count_data_pages(void)
 {
 	struct zone *zone;
 	unsigned long pfn, max_zone_pfn;
@@ -1058,6 +1066,74 @@ void swsusp_free(void)
 	buffer = NULL;
 }
 
+/**
+ *	swsusp_shrink_memory -  Try to free as much memory as needed
+ *
+ *	... but do not OOM-kill anyone
+ *
+ *	Notice: all userland should be stopped before it is called, or
+ *	livelock is possible.
+ */
+
+#define SHRINK_BITE	10000
+static inline unsigned long __shrink_memory(long tmp)
+{
+	if (tmp > SHRINK_BITE)
+		tmp = SHRINK_BITE;
+	return shrink_all_memory(tmp);
+}
+
+int swsusp_shrink_memory(void)
+{
+	long tmp;
+	struct zone *zone;
+	unsigned long pages = 0;
+	unsigned int i = 0;
+	char *p = "-\\|/";
+	struct timeval start, stop;
+
+	printk(KERN_INFO "PM: Shrinking memory...  ");
+	do_gettimeofday(&start);
+	do {
+		long size, highmem_size;
+
+		highmem_size = count_highmem_pages();
+		size = count_data_pages() + PAGES_FOR_IO + SPARE_PAGES;
+		tmp = size;
+		size += highmem_size;
+		for_each_populated_zone(zone) {
+			tmp += snapshot_additional_pages(zone);
+			if (is_highmem(zone)) {
+				highmem_size -=
+					zone_page_state(zone, NR_FREE_PAGES);
+			} else {
+				tmp -= zone_page_state(zone, NR_FREE_PAGES);
+				tmp += zone->lowmem_reserve[ZONE_NORMAL];
+			}
+		}
+
+		if (highmem_size < 0)
+			highmem_size = 0;
+
+		tmp += highmem_size;
+		if (tmp > 0) {
+			tmp = __shrink_memory(tmp);
+			if (!tmp)
+				return -ENOMEM;
+			pages += tmp;
+		} else if (size > image_size / PAGE_SIZE) {
+			tmp = __shrink_memory(size - (image_size / PAGE_SIZE));
+			pages += tmp;
+		}
+		printk("\b%c", p[i++%4]);
+	} while (tmp > 0);
+	do_gettimeofday(&stop);
+	printk("\bdone (%lu pages freed)\n", pages);
+	swsusp_show_speed(&start, &stop, pages, "Freed");
+
+	return 0;
+}
+
 #ifdef CONFIG_HIGHMEM
 /**
   *	count_pages_for_highmem - compute the number of non-highmem pages
Index: linux-2.6/kernel/power/swsusp.c
===================================================================
--- linux-2.6.orig/kernel/power/swsusp.c
+++ linux-2.6/kernel/power/swsusp.c
@@ -55,14 +55,6 @@
 
 #include "power.h"
 
-/*
- * Preferred image size in bytes (tunable via /sys/power/image_size).
- * When it is set to N, swsusp will do its best to ensure the image
- * size will not exceed N bytes, but if that is impossible, it will
- * try to create the smallest image possible.
- */
-unsigned long image_size = 500 * 1024 * 1024;
-
 int in_suspend __nosavedata = 0;
 
 /**
@@ -195,74 +187,6 @@ void swsusp_show_speed(struct timeval *s
 			kps / 1000, (kps % 1000) / 10);
 }
 
-/**
- *	swsusp_shrink_memory -  Try to free as much memory as needed
- *
- *	... but do not OOM-kill anyone
- *
- *	Notice: all userland should be stopped before it is called, or
- *	livelock is possible.
- */
-
-#define SHRINK_BITE	10000
-static inline unsigned long __shrink_memory(long tmp)
-{
-	if (tmp > SHRINK_BITE)
-		tmp = SHRINK_BITE;
-	return shrink_all_memory(tmp);
-}
-
-int swsusp_shrink_memory(void)
-{
-	long tmp;
-	struct zone *zone;
-	unsigned long pages = 0;
-	unsigned int i = 0;
-	char *p = "-\\|/";
-	struct timeval start, stop;
-
-	printk(KERN_INFO "PM: Shrinking memory...  ");
-	do_gettimeofday(&start);
-	do {
-		long size, highmem_size;
-
-		highmem_size = count_highmem_pages();
-		size = count_data_pages() + PAGES_FOR_IO + SPARE_PAGES;
-		tmp = size;
-		size += highmem_size;
-		for_each_populated_zone(zone) {
-			tmp += snapshot_additional_pages(zone);
-			if (is_highmem(zone)) {
-				highmem_size -=
-					zone_page_state(zone, NR_FREE_PAGES);
-			} else {
-				tmp -= zone_page_state(zone, NR_FREE_PAGES);
-				tmp += zone->lowmem_reserve[ZONE_NORMAL];
-			}
-		}
-
-		if (highmem_size < 0)
-			highmem_size = 0;
-
-		tmp += highmem_size;
-		if (tmp > 0) {
-			tmp = __shrink_memory(tmp);
-			if (!tmp)
-				return -ENOMEM;
-			pages += tmp;
-		} else if (size > image_size / PAGE_SIZE) {
-			tmp = __shrink_memory(size - (image_size / PAGE_SIZE));
-			pages += tmp;
-		}
-		printk("\b%c", p[i++%4]);
-	} while (tmp > 0);
-	do_gettimeofday(&stop);
-	printk("\bdone (%lu pages freed)\n", pages);
-	swsusp_show_speed(&start, &stop, pages, "Freed");
-
-	return 0;
-}
-
 /*
  * Platforms, like ACPI, may want us to save some memory used by them during
  * hibernation and to restore the contents of this memory during the subsequent
Index: linux-2.6/kernel/power/power.h
===================================================================
--- linux-2.6.orig/kernel/power/power.h
+++ linux-2.6/kernel/power/power.h
@@ -74,7 +74,7 @@ extern asmlinkage int swsusp_arch_resume
 
 extern int create_basic_memory_bitmaps(void);
 extern void free_basic_memory_bitmaps(void);
-extern unsigned int count_data_pages(void);
+extern int swsusp_shrink_memory(void);
 
 /**
  *	Auxiliary structure used for reading the snapshot image data and
@@ -149,7 +149,6 @@ extern int swsusp_swap_in_use(void);
 
 /* kernel/power/disk.c */
 extern int swsusp_check(void);
-extern int swsusp_shrink_memory(void);
 extern void swsusp_free(void);
 extern int swsusp_read(unsigned int *flags_p);
 extern int swsusp_write(unsigned int flags);
@@ -176,7 +175,6 @@ extern int pm_notifier_call_chain(unsign
 #endif
 
 #ifdef CONFIG_HIGHMEM
-unsigned int count_highmem_pages(void);
 int restore_highmem(void);
 #else
 static inline unsigned int count_highmem_pages(void) { return 0; }


^ permalink raw reply	[flat|nested] 205+ messages in thread

* [PATCH 3/6] mm, PM/Freezer: Disable OOM killer when tasks are frozen
  2009-05-13  8:32       ` Rafael J. Wysocki
@ 2009-05-13  8:37         ` Rafael J. Wysocki
  -1 siblings, 0 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-13  8:37 UTC (permalink / raw)
  To: pm list
  Cc: Wu Fengguang, Andrew Morton, LKML, Pavel Machek,
	Nigel Cunningham, David Rientjes, linux-mm

From: Rafael J. Wysocki <rjw@sisk.pl>

Currently, the following scenario appears to be possible in theory:

* Tasks are frozen for hibernation or suspend.
* Free pages are almost exhausted.
* A certain piece of code in the suspend code path attempts to allocate
  some memory using GFP_KERNEL and an allocation order less than or
  equal to PAGE_ALLOC_COSTLY_ORDER.
* __alloc_pages_internal() cannot find a free page so it invokes the
  OOM killer.
* The OOM killer attempts to kill a task, but the task is frozen, so
  it doesn't die immediately.
* __alloc_pages_internal() jumps to 'restart', again fails to find
  a free page and invokes the OOM killer once more.
* No progress can be made.

Although this scenario is currently hard to trigger during
hibernation, due to the memory shrinking carried out by the
hibernation code, it may theoretically occur during suspend once the
memory shrinking has been removed from that code path.  Moreover,
since memory allocations are going to be used to shrink memory for
hibernation, it will become even more likely to happen during
hibernation.

To prevent this from happening, introduce the oom_killer_disabled
switch that makes __alloc_pages_internal() fail in the situations in
which the OOM killer would otherwise have been invoked, and have the
freezer set this switch after tasks have been successfully frozen.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 include/linux/gfp.h    |   12 ++++++++++++
 kernel/power/process.c |    5 +++++
 mm/page_alloc.c        |    5 +++++
 3 files changed, 22 insertions(+)

Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c
+++ linux-2.6/mm/page_alloc.c
@@ -175,6 +175,8 @@ static void set_pageblock_migratetype(st
 					PB_migrate, PB_migrate_end);
 }
 
+bool oom_killer_disabled __read_mostly;
+
 #ifdef CONFIG_DEBUG_VM
 static int page_outside_zone_boundaries(struct zone *zone, struct page *page)
 {
@@ -1600,6 +1602,9 @@ nofail_alloc:
 		if (page)
 			goto got_pg;
 	} else if ((gfp_mask & __GFP_FS) && !(gfp_mask & __GFP_NORETRY)) {
+		if (oom_killer_disabled)
+			goto nopage;
+
 		if (!try_set_zone_oom(zonelist, gfp_mask)) {
 			schedule_timeout_uninterruptible(1);
 			goto restart;
Index: linux-2.6/include/linux/gfp.h
===================================================================
--- linux-2.6.orig/include/linux/gfp.h
+++ linux-2.6/include/linux/gfp.h
@@ -245,4 +245,16 @@ void drain_zone_pages(struct zone *zone,
 void drain_all_pages(void);
 void drain_local_pages(void *dummy);
 
+extern bool oom_killer_disabled;
+
+static inline void oom_killer_disable(void)
+{
+	oom_killer_disabled = true;
+}
+
+static inline void oom_killer_enable(void)
+{
+	oom_killer_disabled = false;
+}
+
 #endif /* __LINUX_GFP_H */
Index: linux-2.6/kernel/power/process.c
===================================================================
--- linux-2.6.orig/kernel/power/process.c
+++ linux-2.6/kernel/power/process.c
@@ -117,9 +117,12 @@ int freeze_processes(void)
 	if (error)
 		goto Exit;
 	printk("done.");
+
+	oom_killer_disable();
  Exit:
 	BUG_ON(in_atomic());
 	printk("\n");
+
 	return error;
 }
 
@@ -145,6 +148,8 @@ static void thaw_tasks(bool nosig_only)
 
 void thaw_processes(void)
 {
+	oom_killer_enable();
+
 	printk("Restarting tasks ... ");
 	thaw_tasks(true);
 	thaw_tasks(false);
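
To illustrate the intended effect, here is a minimal, self-contained
userspace model of the allocator behaviour with the switch set;
oom_killer_disabled and the points where freeze_processes() and
thaw_processes() flip it mirror the patch above, while everything else
is invented for the sketch:

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

static bool oom_killer_disabled;	/* set by freeze_processes() above */

/* Stand-in for "find a free page"; assume memory is exhausted while
 * tasks are frozen. */
static void *find_free_page(void)
{
	return NULL;
}

/* Model of the relevant part of the __alloc_pages_internal() slow path. */
static void *alloc_page_model(void)
{
	void *page = find_free_page();

	if (page)
		return page;
	if (oom_killer_disabled)
		return NULL;	/* new behaviour: fail instead of OOM killing */
	/*
	 * Old behaviour (not modelled): kill a task and jump back to retry.
	 * With every task frozen the victim never exits, so the allocator
	 * would livelock.
	 */
	return NULL;
}

int main(void)
{
	oom_killer_disabled = true;	/* freeze_processes() */
	printf("allocation %s\n",
	       alloc_page_model() ? "succeeded" : "failed without OOM killing");
	oom_killer_disabled = false;	/* thaw_processes() */
	return 0;
}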


^ permalink raw reply	[flat|nested] 205+ messages in thread

* [PATCH 3/6] mm, PM/Freezer: Disable OOM killer when tasks are frozen
  2009-05-13  8:32       ` Rafael J. Wysocki
                         ` (5 preceding siblings ...)
  (?)
@ 2009-05-13  8:37       ` Rafael J. Wysocki
  -1 siblings, 0 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-13  8:37 UTC (permalink / raw)
  To: pm list; +Cc: LKML, linux-mm, David Rientjes, Andrew Morton, Wu Fengguang

From: Rafael J. Wysocki <rjw@sisk.pl>

Currently, the following scenario appears to be possible in theory:

* Tasks are frozen for hibernation or suspend.
* Free pages are almost exhausted.
* A certain piece of code in the suspend code path attempts to allocate
  some memory using GFP_KERNEL and an allocation order less than or
  equal to PAGE_ALLOC_COSTLY_ORDER.
* __alloc_pages_internal() cannot find a free page so it invokes the
  OOM killer.
* The OOM killer attempts to kill a task, but the task is frozen, so
  it doesn't die immediately.
* __alloc_pages_internal() jumps to 'restart', again fails to find
  a free page and invokes the OOM killer once more.
* No progress can be made.

Although this scenario is currently hard to trigger during
hibernation, due to the memory shrinking carried out by the
hibernation code, it may theoretically occur during suspend once the
memory shrinking has been removed from that code path.  Moreover,
since memory allocations are going to be used to shrink memory for
hibernation, it will become even more likely to happen during
hibernation.

To prevent this from happening, introduce the oom_killer_disabled
switch that makes __alloc_pages_internal() fail in the situations in
which the OOM killer would otherwise have been invoked, and have the
freezer set this switch after tasks have been successfully frozen.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 include/linux/gfp.h    |   12 ++++++++++++
 kernel/power/process.c |    5 +++++
 mm/page_alloc.c        |    5 +++++
 3 files changed, 22 insertions(+)

Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c
+++ linux-2.6/mm/page_alloc.c
@@ -175,6 +175,8 @@ static void set_pageblock_migratetype(st
 					PB_migrate, PB_migrate_end);
 }
 
+bool oom_killer_disabled __read_mostly;
+
 #ifdef CONFIG_DEBUG_VM
 static int page_outside_zone_boundaries(struct zone *zone, struct page *page)
 {
@@ -1600,6 +1602,9 @@ nofail_alloc:
 		if (page)
 			goto got_pg;
 	} else if ((gfp_mask & __GFP_FS) && !(gfp_mask & __GFP_NORETRY)) {
+		if (oom_killer_disabled)
+			goto nopage;
+
 		if (!try_set_zone_oom(zonelist, gfp_mask)) {
 			schedule_timeout_uninterruptible(1);
 			goto restart;
Index: linux-2.6/include/linux/gfp.h
===================================================================
--- linux-2.6.orig/include/linux/gfp.h
+++ linux-2.6/include/linux/gfp.h
@@ -245,4 +245,16 @@ void drain_zone_pages(struct zone *zone,
 void drain_all_pages(void);
 void drain_local_pages(void *dummy);
 
+extern bool oom_killer_disabled;
+
+static inline void oom_killer_disable(void)
+{
+	oom_killer_disabled = true;
+}
+
+static inline void oom_killer_enable(void)
+{
+	oom_killer_disabled = false;
+}
+
 #endif /* __LINUX_GFP_H */
Index: linux-2.6/kernel/power/process.c
===================================================================
--- linux-2.6.orig/kernel/power/process.c
+++ linux-2.6/kernel/power/process.c
@@ -117,9 +117,12 @@ int freeze_processes(void)
 	if (error)
 		goto Exit;
 	printk("done.");
+
+	oom_killer_disable();
  Exit:
 	BUG_ON(in_atomic());
 	printk("\n");
+
 	return error;
 }
 
@@ -145,6 +148,8 @@ static void thaw_tasks(bool nosig_only)
 
 void thaw_processes(void)
 {
+	oom_killer_enable();
+
 	printk("Restarting tasks ... ");
 	thaw_tasks(true);
 	thaw_tasks(false);

^ permalink raw reply	[flat|nested] 205+ messages in thread

* [PATCH 3/6] mm, PM/Freezer: Disable OOM killer when tasks are frozen
@ 2009-05-13  8:37         ` Rafael J. Wysocki
  0 siblings, 0 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-13  8:37 UTC (permalink / raw)
  To: pm list
  Cc: Wu Fengguang, Andrew Morton, LKML, Pavel Machek,
	Nigel Cunningham, David Rientjes, linux-mm

From: Rafael J. Wysocki <rjw@sisk.pl>

Currently, the following scenario appears to be possible in theory:

* Tasks are frozen for hibernation or suspend.
* Free pages are almost exhausted.
* A certain piece of code in the suspend code path attempts to allocate
  some memory using GFP_KERNEL and an allocation order less than or
  equal to PAGE_ALLOC_COSTLY_ORDER.
* __alloc_pages_internal() cannot find a free page so it invokes the
  OOM killer.
* The OOM killer attempts to kill a task, but the task is frozen, so
  it doesn't die immediately.
* __alloc_pages_internal() jumps to 'restart', again fails to find
  a free page and invokes the OOM killer once more.
* No progress can be made.

Although this scenario is currently hard to trigger during
hibernation, due to the memory shrinking carried out by the
hibernation code, it may theoretically occur during suspend once the
memory shrinking has been removed from that code path.  Moreover,
since memory allocations are going to be used to shrink memory for
hibernation, it will become even more likely to happen during
hibernation.

To prevent this from happening, introduce the oom_killer_disabled
switch that makes __alloc_pages_internal() fail in the situations in
which the OOM killer would otherwise have been invoked, and have the
freezer set this switch after tasks have been successfully frozen.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 include/linux/gfp.h    |   12 ++++++++++++
 kernel/power/process.c |    5 +++++
 mm/page_alloc.c        |    5 +++++
 3 files changed, 22 insertions(+)

Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c
+++ linux-2.6/mm/page_alloc.c
@@ -175,6 +175,8 @@ static void set_pageblock_migratetype(st
 					PB_migrate, PB_migrate_end);
 }
 
+bool oom_killer_disabled __read_mostly;
+
 #ifdef CONFIG_DEBUG_VM
 static int page_outside_zone_boundaries(struct zone *zone, struct page *page)
 {
@@ -1600,6 +1602,9 @@ nofail_alloc:
 		if (page)
 			goto got_pg;
 	} else if ((gfp_mask & __GFP_FS) && !(gfp_mask & __GFP_NORETRY)) {
+		if (oom_killer_disabled)
+			goto nopage;
+
 		if (!try_set_zone_oom(zonelist, gfp_mask)) {
 			schedule_timeout_uninterruptible(1);
 			goto restart;
Index: linux-2.6/include/linux/gfp.h
===================================================================
--- linux-2.6.orig/include/linux/gfp.h
+++ linux-2.6/include/linux/gfp.h
@@ -245,4 +245,16 @@ void drain_zone_pages(struct zone *zone,
 void drain_all_pages(void);
 void drain_local_pages(void *dummy);
 
+extern bool oom_killer_disabled;
+
+static inline void oom_killer_disable(void)
+{
+	oom_killer_disabled = true;
+}
+
+static inline void oom_killer_enable(void)
+{
+	oom_killer_disabled = false;
+}
+
 #endif /* __LINUX_GFP_H */
Index: linux-2.6/kernel/power/process.c
===================================================================
--- linux-2.6.orig/kernel/power/process.c
+++ linux-2.6/kernel/power/process.c
@@ -117,9 +117,12 @@ int freeze_processes(void)
 	if (error)
 		goto Exit;
 	printk("done.");
+
+	oom_killer_disable();
  Exit:
 	BUG_ON(in_atomic());
 	printk("\n");
+
 	return error;
 }
 
@@ -145,6 +148,8 @@ static void thaw_tasks(bool nosig_only)
 
 void thaw_processes(void)
 {
+	oom_killer_enable();
+
 	printk("Restarting tasks ... ");
 	thaw_tasks(true);
 	thaw_tasks(false);


^ permalink raw reply	[flat|nested] 205+ messages in thread

* [PATCH 4/6] PM/Hibernate: Rework shrinking of memory
  2009-05-13  8:32       ` Rafael J. Wysocki
@ 2009-05-13  8:39         ` Rafael J. Wysocki
  -1 siblings, 0 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-13  8:39 UTC (permalink / raw)
  To: pm list
  Cc: Wu Fengguang, Andrew Morton, LKML, Pavel Machek,
	Nigel Cunningham, David Rientjes, linux-mm

From: Rafael J. Wysocki <rjw@sisk.pl>

Rework swsusp_shrink_memory() so that it calls shrink_all_memory()
just once to make some room for the image and then allocates memory
to apply more pressure to the memory management subsystem, if
necessary.

Unfortunately, we don't seem to be able to drop shrink_all_memory()
entirely just yet, because that would lead to huge performance
regressions in some test cases.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 kernel/power/snapshot.c |  215 +++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 168 insertions(+), 47 deletions(-)

Index: linux-2.6/kernel/power/snapshot.c
===================================================================
--- linux-2.6.orig/kernel/power/snapshot.c
+++ linux-2.6/kernel/power/snapshot.c
@@ -1066,69 +1066,190 @@ void swsusp_free(void)
 	buffer = NULL;
 }
 
+/* Helper functions used for the shrinking of memory. */
+
+#define GFP_IMAGE	(GFP_KERNEL | __GFP_NOWARN)
+
 /**
- *	swsusp_shrink_memory -  Try to free as much memory as needed
+ * preallocate_image_pages - Allocate a number of pages for hibernation image
+ * @nr_pages: Number of page frames to allocate.
+ * @mask: GFP flags to use for the allocation.
  *
- *	... but do not OOM-kill anyone
+ * Return value: Number of page frames actually allocated
+ */
+static unsigned long preallocate_image_pages(unsigned long nr_pages, gfp_t mask)
+{
+	unsigned long nr_alloc = 0;
+
+	while (nr_pages > 0) {
+		if (!alloc_image_page(mask))
+			break;
+		nr_pages--;
+		nr_alloc++;
+	}
+
+	return nr_alloc;
+}
+
+static unsigned long preallocate_image_memory(unsigned long nr_pages)
+{
+	return preallocate_image_pages(nr_pages, GFP_IMAGE);
+}
+
+#ifdef CONFIG_HIGHMEM
+static unsigned long preallocate_image_highmem(unsigned long nr_pages)
+{
+	return preallocate_image_pages(nr_pages, GFP_IMAGE | __GFP_HIGHMEM);
+}
+
+#define FRACTION_SHIFT	8
+
+/**
+ * compute_fraction - Compute approximate fraction x * (a/b)
+ * @x: Number to multiply.
+ * @numerator: Numerator of the fraction (a).
+ * @denominator: Denominator of the fraction (b).
  *
- *	Notice: all userland should be stopped before it is called, or
- *	livelock is possible.
+ * Compute an approximate value of the expression x * (a/b), where a is less
+ * than b, all x, a, b are unsigned longs and x * a may be greater than the
+ * maximum unsigned long.
  */
+static unsigned long compute_fraction(
+	unsigned long x, unsigned long numerator, unsigned long denominator)
+{
+	unsigned long ratio = (numerator << FRACTION_SHIFT) / denominator;
 
-#define SHRINK_BITE	10000
-static inline unsigned long __shrink_memory(long tmp)
+	x *= ratio;
+	return x >> FRACTION_SHIFT;
+}
+
+static unsigned long highmem_size(
+	unsigned long size, unsigned long highmem, unsigned long count)
+{
+	return highmem > count / 2 ?
+			compute_fraction(size, highmem, count) :
+			size - compute_fraction(size, count - highmem, count);
+}
+#else
+static inline unsigned long preallocate_image_highmem(unsigned long nr_pages)
+{
+	return 0;
+}
+
+static inline unsigned long highmem_size(
+	unsigned long size, unsigned long highmem, unsigned long count)
 {
-	if (tmp > SHRINK_BITE)
-		tmp = SHRINK_BITE;
-	return shrink_all_memory(tmp);
+	return 0;
 }
+#endif /* CONFIG_HIGHMEM */
 
+/**
+ * swsusp_shrink_memory -  Make the kernel release as much memory as needed
+ *
+ * To create a hibernation image it is necessary to make a copy of every page
+ * frame in use.  We also need a number of page frames to be free during
+ * hibernation for allocations made while saving the image and for device
+ * drivers, in case they need to allocate memory from their hibernation
+ * callbacks (these two numbers are given by PAGES_FOR_IO and SPARE_PAGES,
+ * respectively, both of which are rough estimates).  To make this happen, we
+ * compute the total number of available page frames and allocate at least
+ *
+ * ([page frames total] + PAGES_FOR_IO + [metadata pages]) / 2 + 2 * SPARE_PAGES
+ *
+ * of them, which corresponds to the maximum size of a hibernation image.
+ *
+ * If image_size is set below the number following from the above formula,
+ * the preallocation of memory is continued until the total number of saveable
+ * pages in the system is below the requested image size or it is impossible to
+ * allocate more memory, whichever happens first.
+ */
 int swsusp_shrink_memory(void)
 {
-	long tmp;
 	struct zone *zone;
-	unsigned long pages = 0;
-	unsigned int i = 0;
-	char *p = "-\\|/";
+	unsigned long saveable, size, max_size, count, highmem, pages = 0;
+	unsigned long alloc, pages_highmem;
 	struct timeval start, stop;
+	int error = 0;
 
-	printk(KERN_INFO "PM: Shrinking memory...  ");
+	printk(KERN_INFO "PM: Shrinking memory... ");
 	do_gettimeofday(&start);
-	do {
-		long size, highmem_size;
 
-		highmem_size = count_highmem_pages();
-		size = count_data_pages() + PAGES_FOR_IO + SPARE_PAGES;
-		tmp = size;
-		size += highmem_size;
-		for_each_populated_zone(zone) {
-			tmp += snapshot_additional_pages(zone);
-			if (is_highmem(zone)) {
-				highmem_size -=
-					zone_page_state(zone, NR_FREE_PAGES);
-			} else {
-				tmp -= zone_page_state(zone, NR_FREE_PAGES);
-				tmp += zone->lowmem_reserve[ZONE_NORMAL];
-			}
-		}
-
-		if (highmem_size < 0)
-			highmem_size = 0;
-
-		tmp += highmem_size;
-		if (tmp > 0) {
-			tmp = __shrink_memory(tmp);
-			if (!tmp)
-				return -ENOMEM;
-			pages += tmp;
-		} else if (size > image_size / PAGE_SIZE) {
-			tmp = __shrink_memory(size - (image_size / PAGE_SIZE));
-			pages += tmp;
-		}
-		printk("\b%c", p[i++%4]);
-	} while (tmp > 0);
+	/* Count the number of saveable data pages. */
+	highmem = count_highmem_pages();
+	saveable = count_data_pages();
+
+	/*
+	 * Compute the total number of page frames we can use (count) and the
+	 * number of pages needed for image metadata (size).
+	 */
+	count = saveable;
+	saveable += highmem;
+	size = 0;
+	for_each_populated_zone(zone) {
+		size += snapshot_additional_pages(zone);
+		if (is_highmem(zone))
+			highmem += zone_page_state(zone, NR_FREE_PAGES);
+		else
+			count += zone_page_state(zone, NR_FREE_PAGES);
+	}
+	count += highmem;
+	count -= totalreserve_pages;
+
+	/* Compute the maximum number of saveable pages to leave in memory. */
+	max_size = (count - (size + PAGES_FOR_IO)) / 2 - 2 * SPARE_PAGES;
+	size = DIV_ROUND_UP(image_size, PAGE_SIZE);
+	if (size > max_size)
+		size = max_size;
+	/*
+	 * If the maximum is not less than the current number of saveable pages
+	 * in memory, we don't need to do anything more.
+	 */
+	if (size >= saveable)
+		goto out;
+
+	/*
+	 * Let the memory management subsystem know that we're going to need a
+	 * large number of page frames to allocate and make it free some memory.
+	 * NOTE: If this is not done, performance will be hurt badly in some
+	 * test cases.
+	 */
+	shrink_all_memory(saveable - size);
+
+	/*
+	 * The number of saveable pages in memory was too high, so apply some
+	 * pressure to decrease it.  First, make room for the largest possible
+	 * image and fail if that doesn't work.  Next, try to decrease the size
+	 * of the image as much as indicated by image_size using allocations
+	 * from highmem and non-highmem zones separately.
+	 */
+	pages_highmem = preallocate_image_highmem(highmem / 2);
+	max_size += pages_highmem;
+	alloc = count - max_size;
+	pages = preallocate_image_memory(alloc);
+	if (pages < alloc) {
+		error = -ENOMEM;
+		goto free_out;
+	}
+	size = max_size - size;
+	alloc = size;
+	size = preallocate_image_highmem(highmem_size(size, highmem, count));
+	pages_highmem += size;
+	alloc -= size;
+	pages += preallocate_image_memory(alloc);
+	pages += pages_highmem;
+
+ free_out:
+	/* Release all of the preallocated page frames. */
+	swsusp_free();
+
+	if (error) {
+		printk(KERN_CONT "\n");
+		return error;
+	}
+
+ out:
 	do_gettimeofday(&stop);
-	printk("\bdone (%lu pages freed)\n", pages);
+	printk(KERN_CONT "done (preallocated %lu free pages)\n", pages);
 	swsusp_show_speed(&start, &stop, pages, "Freed");
 
 	return 0;
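
To make the sizing above concrete, here is a standalone sketch that
redoes the max_size arithmetic with made-up numbers; the PAGES_FOR_IO
and SPARE_PAGES values are assumptions typical of 4 KB pages, not taken
from any particular configuration:

#include <stdio.h>

#define PAGE_SIZE	4096UL
#define PAGES_FOR_IO	1024UL			/* assumed */
#define SPARE_PAGES	256UL			/* assumed */

int main(void)
{
	unsigned long count = 250000;		/* usable page frames, ~1 GB */
	unsigned long meta = 1000;		/* snapshot_additional_pages() total */
	unsigned long saveable = 200000;	/* data pages currently in use */
	unsigned long image_size = 500UL * 1024 * 1024;	/* default tunable */

	/* Largest number of saveable pages that may stay in memory. */
	unsigned long max_size = (count - (meta + PAGES_FOR_IO)) / 2
					- 2 * SPARE_PAGES;
	/* Target image size in pages, capped by max_size. */
	unsigned long size = (image_size + PAGE_SIZE - 1) / PAGE_SIZE;

	if (size > max_size)
		size = max_size;

	printf("max_size = %lu pages, target image = %lu pages\n",
	       max_size, size);
	if (size < saveable)
		printf("roughly %lu page frames have to be preallocated\n",
		       count - max_size);
	return 0;
}

With these numbers the default 500 MB image_size corresponds to 128000
pages, which gets capped at max_size (123476 pages here), and since
about 200000 pages are saveable roughly 126524 page frames end up being
preallocated.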


^ permalink raw reply	[flat|nested] 205+ messages in thread

* [PATCH 4/6] PM/Hibernate: Rework shrinking of memory
  2009-05-13  8:32       ` Rafael J. Wysocki
                         ` (7 preceding siblings ...)
  (?)
@ 2009-05-13  8:39       ` Rafael J. Wysocki
  -1 siblings, 0 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-13  8:39 UTC (permalink / raw)
  To: pm list; +Cc: LKML, linux-mm, David Rientjes, Andrew Morton, Wu Fengguang

From: Rafael J. Wysocki <rjw@sisk.pl>

Rework swsusp_shrink_memory() so that it calls shrink_all_memory()
just once to make some room for the image and then allocates memory
to apply more pressure to the memory management subsystem, if
necessary.

Unfortunately, we don't seem to be able to drop shrink_all_memory()
entirely just yet, because that would lead to huge performance
regressions in some test cases.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 kernel/power/snapshot.c |  215 +++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 168 insertions(+), 47 deletions(-)

Index: linux-2.6/kernel/power/snapshot.c
===================================================================
--- linux-2.6.orig/kernel/power/snapshot.c
+++ linux-2.6/kernel/power/snapshot.c
@@ -1066,69 +1066,190 @@ void swsusp_free(void)
 	buffer = NULL;
 }
 
+/* Helper functions used for the shrinking of memory. */
+
+#define GFP_IMAGE	(GFP_KERNEL | __GFP_NOWARN)
+
 /**
- *	swsusp_shrink_memory -  Try to free as much memory as needed
+ * preallocate_image_pages - Allocate a number of pages for hibernation image
+ * @nr_pages: Number of page frames to allocate.
+ * @mask: GFP flags to use for the allocation.
  *
- *	... but do not OOM-kill anyone
+ * Return value: Number of page frames actually allocated
+ */
+static unsigned long preallocate_image_pages(unsigned long nr_pages, gfp_t mask)
+{
+	unsigned long nr_alloc = 0;
+
+	while (nr_pages > 0) {
+		if (!alloc_image_page(mask))
+			break;
+		nr_pages--;
+		nr_alloc++;
+	}
+
+	return nr_alloc;
+}
+
+static unsigned long preallocate_image_memory(unsigned long nr_pages)
+{
+	return preallocate_image_pages(nr_pages, GFP_IMAGE);
+}
+
+#ifdef CONFIG_HIGHMEM
+static unsigned long preallocate_image_highmem(unsigned long nr_pages)
+{
+	return preallocate_image_pages(nr_pages, GFP_IMAGE | __GFP_HIGHMEM);
+}
+
+#define FRACTION_SHIFT	8
+
+/**
+ * compute_fraction - Compute approximate fraction x * (a/b)
+ * @x: Number to multiply.
+ * @numerator: Numerator of the fraction (a).
+ * @denominator: Denominator of the fraction (b).
  *
- *	Notice: all userland should be stopped before it is called, or
- *	livelock is possible.
+ * Compute an approximate value of the expression x * (a/b), where a is less
+ * than b, all x, a, b are unsigned longs and x * a may be greater than the
+ * maximum unsigned long.
  */
+static unsigned long compute_fraction(
+	unsigned long x, unsigned long numerator, unsigned long denominator)
+{
+	unsigned long ratio = (numerator << FRACTION_SHIFT) / denominator;
 
-#define SHRINK_BITE	10000
-static inline unsigned long __shrink_memory(long tmp)
+	x *= ratio;
+	return x >> FRACTION_SHIFT;
+}
+
+static unsigned long highmem_size(
+	unsigned long size, unsigned long highmem, unsigned long count)
+{
+	return highmem > count / 2 ?
+			compute_fraction(size, highmem, count) :
+			size - compute_fraction(size, count - highmem, count);
+}
+#else
+static inline unsigned long preallocate_image_highmem(unsigned long nr_pages)
+{
+	return 0;
+}
+
+static inline unsigned long highmem_size(
+	unsigned long size, unsigned long highmem, unsigned long count)
 {
-	if (tmp > SHRINK_BITE)
-		tmp = SHRINK_BITE;
-	return shrink_all_memory(tmp);
+	return 0;
 }
+#endif /* CONFIG_HIGHMEM */
 
+/**
+ * swsusp_shrink_memory -  Make the kernel release as much memory as needed
+ *
+ * To create a hibernation image it is necessary to make a copy of every page
+ * frame in use.  We also need a number of page frames to be free during
+ * hibernation for allocations made while saving the image and for device
+ * drivers, in case they need to allocate memory from their hibernation
+ * callbacks (these two numbers are given by PAGES_FOR_IO and SPARE_PAGES,
+ * respectively, both of which are rough estimates).  To make this happen, we
+ * compute the total number of available page frames and allocate at least
+ *
+ * ([page frames total] + PAGES_FOR_IO + [metadata pages]) / 2 + 2 * SPARE_PAGES
+ *
+ * of them, which corresponds to the maximum size of a hibernation image.
+ *
+ * If image_size is set below the number following from the above formula,
+ * the preallocation of memory is continued until the total number of saveable
+ * pages in the system is below the requested image size or it is impossible to
+ * allocate more memory, whichever happens first.
+ */
 int swsusp_shrink_memory(void)
 {
-	long tmp;
 	struct zone *zone;
-	unsigned long pages = 0;
-	unsigned int i = 0;
-	char *p = "-\\|/";
+	unsigned long saveable, size, max_size, count, highmem, pages = 0;
+	unsigned long alloc, pages_highmem;
 	struct timeval start, stop;
+	int error = 0;
 
-	printk(KERN_INFO "PM: Shrinking memory...  ");
+	printk(KERN_INFO "PM: Shrinking memory... ");
 	do_gettimeofday(&start);
-	do {
-		long size, highmem_size;
 
-		highmem_size = count_highmem_pages();
-		size = count_data_pages() + PAGES_FOR_IO + SPARE_PAGES;
-		tmp = size;
-		size += highmem_size;
-		for_each_populated_zone(zone) {
-			tmp += snapshot_additional_pages(zone);
-			if (is_highmem(zone)) {
-				highmem_size -=
-					zone_page_state(zone, NR_FREE_PAGES);
-			} else {
-				tmp -= zone_page_state(zone, NR_FREE_PAGES);
-				tmp += zone->lowmem_reserve[ZONE_NORMAL];
-			}
-		}
-
-		if (highmem_size < 0)
-			highmem_size = 0;
-
-		tmp += highmem_size;
-		if (tmp > 0) {
-			tmp = __shrink_memory(tmp);
-			if (!tmp)
-				return -ENOMEM;
-			pages += tmp;
-		} else if (size > image_size / PAGE_SIZE) {
-			tmp = __shrink_memory(size - (image_size / PAGE_SIZE));
-			pages += tmp;
-		}
-		printk("\b%c", p[i++%4]);
-	} while (tmp > 0);
+	/* Count the number of saveable data pages. */
+	highmem = count_highmem_pages();
+	saveable = count_data_pages();
+
+	/*
+	 * Compute the total number of page frames we can use (count) and the
+	 * number of pages needed for image metadata (size).
+	 */
+	count = saveable;
+	saveable += highmem;
+	size = 0;
+	for_each_populated_zone(zone) {
+		size += snapshot_additional_pages(zone);
+		if (is_highmem(zone))
+			highmem += zone_page_state(zone, NR_FREE_PAGES);
+		else
+			count += zone_page_state(zone, NR_FREE_PAGES);
+	}
+	count += highmem;
+	count -= totalreserve_pages;
+
+	/* Compute the maximum number of saveable pages to leave in memory. */
+	max_size = (count - (size + PAGES_FOR_IO)) / 2 - 2 * SPARE_PAGES;
+	size = DIV_ROUND_UP(image_size, PAGE_SIZE);
+	if (size > max_size)
+		size = max_size;
+	/*
+	 * If the maximum is not less than the current number of saveable pages
+	 * in memory, we don't need to do anything more.
+	 */
+	if (size >= saveable)
+		goto out;
+
+	/*
+	 * Let the memory management subsystem know that we're going to need a
+	 * large number of page frames to allocate and make it free some memory.
+	 * NOTE: If this is not done, performance will be hurt badly in some
+	 * test cases.
+	 */
+	shrink_all_memory(saveable - size);
+
+	/*
+	 * The number of saveable pages in memory was too high, so apply some
+	 * pressure to decrease it.  First, make room for the largest possible
+	 * image and fail if that doesn't work.  Next, try to decrease the size
+	 * of the image as much as indicated by image_size using allocations
+	 * from highmem and non-highmem zones separately.
+	 */
+	pages_highmem = preallocate_image_highmem(highmem / 2);
+	max_size += pages_highmem;
+	alloc = count - max_size;
+	pages = preallocate_image_memory(alloc);
+	if (pages < alloc) {
+		error = -ENOMEM;
+		goto free_out;
+	}
+	size = max_size - size;
+	alloc = size;
+	size = preallocate_image_highmem(highmem_size(size, highmem, count));
+	pages_highmem += size;
+	alloc -= size;
+	pages += preallocate_image_memory(alloc);
+	pages += pages_highmem;
+
+ free_out:
+	/* Release all of the preallocated page frames. */
+	swsusp_free();
+
+	if (error) {
+		printk(KERN_CONT "\n");
+		return error;
+	}
+
+ out:
 	do_gettimeofday(&stop);
-	printk("\bdone (%lu pages freed)\n", pages);
+	printk(KERN_CONT "done (preallocated %lu free pages)\n", pages);
 	swsusp_show_speed(&start, &stop, pages, "Freed");
 
 	return 0;
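
For the compute_fraction()/highmem_size() helpers above, a small
self-contained example of the fixed-point split they produce, with
arbitrary page counts:

#include <stdio.h>

#define FRACTION_SHIFT	8

/* Standalone copy of the helper: approximate x * (num/den), num < den,
 * without forming the possibly-overflowing product x * num. */
static unsigned long compute_fraction(unsigned long x, unsigned long num,
				      unsigned long den)
{
	unsigned long ratio = (num << FRACTION_SHIFT) / den;

	return (x * ratio) >> FRACTION_SHIFT;
}

int main(void)
{
	/* 300000 usable pages, 100000 of them highmem: how much of a
	 * 50000-page request should come from highmem? */
	unsigned long size = 50000, highmem = 100000, count = 300000;
	unsigned long from_highmem;

	/* same selection as highmem_size() in the patch */
	from_highmem = highmem > count / 2 ?
			compute_fraction(size, highmem, count) :
			size - compute_fraction(size, count - highmem, count);

	printf("exact: %llu pages, fixed-point: %lu pages\n",
	       (unsigned long long)size * highmem / count, from_highmem);
	return 0;
}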

^ permalink raw reply	[flat|nested] 205+ messages in thread

* [PATCH 4/6] PM/Hibernate: Rework shrinking of memory
@ 2009-05-13  8:39         ` Rafael J. Wysocki
  0 siblings, 0 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-13  8:39 UTC (permalink / raw)
  To: pm list
  Cc: Wu Fengguang, Andrew Morton, LKML, Pavel Machek,
	Nigel Cunningham, David Rientjes, linux-mm

From: Rafael J. Wysocki <rjw@sisk.pl>

Rework swsusp_shrink_memory() so that it calls shrink_all_memory()
just once to make some room for the image and then allocates memory
to apply more pressure to the memory management subsystem, if
necessary.

Unfortunately, we don't seem to be able to drop shrink_all_memory()
entirely just yet, because that would lead to huge performance
regressions in some test cases.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 kernel/power/snapshot.c |  215 +++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 168 insertions(+), 47 deletions(-)

Index: linux-2.6/kernel/power/snapshot.c
===================================================================
--- linux-2.6.orig/kernel/power/snapshot.c
+++ linux-2.6/kernel/power/snapshot.c
@@ -1066,69 +1066,190 @@ void swsusp_free(void)
 	buffer = NULL;
 }
 
+/* Helper functions used for the shrinking of memory. */
+
+#define GFP_IMAGE	(GFP_KERNEL | __GFP_NOWARN)
+
 /**
- *	swsusp_shrink_memory -  Try to free as much memory as needed
+ * preallocate_image_pages - Allocate a number of pages for hibernation image
+ * @nr_pages: Number of page frames to allocate.
+ * @mask: GFP flags to use for the allocation.
  *
- *	... but do not OOM-kill anyone
+ * Return value: Number of page frames actually allocated
+ */
+static unsigned long preallocate_image_pages(unsigned long nr_pages, gfp_t mask)
+{
+	unsigned long nr_alloc = 0;
+
+	while (nr_pages > 0) {
+		if (!alloc_image_page(mask))
+			break;
+		nr_pages--;
+		nr_alloc++;
+	}
+
+	return nr_alloc;
+}
+
+static unsigned long preallocate_image_memory(unsigned long nr_pages)
+{
+	return preallocate_image_pages(nr_pages, GFP_IMAGE);
+}
+
+#ifdef CONFIG_HIGHMEM
+static unsigned long preallocate_image_highmem(unsigned long nr_pages)
+{
+	return preallocate_image_pages(nr_pages, GFP_IMAGE | __GFP_HIGHMEM);
+}
+
+#define FRACTION_SHIFT	8
+
+/**
+ * compute_fraction - Compute approximate fraction x * (a/b)
+ * @x: Number to multiply.
+ * @numerator: Numerator of the fraction (a).
+ * @denominator: Denominator of the fraction (b).
  *
- *	Notice: all userland should be stopped before it is called, or
- *	livelock is possible.
+ * Compute an approximate value of the expression x * (a/b), where a is less
+ * than b, all x, a, b are unsigned longs and x * a may be greater than the
+ * maximum unsigned long.
  */
+static unsigned long compute_fraction(
+	unsigned long x, unsigned long numerator, unsigned long denominator)
+{
+	unsigned long ratio = (numerator << FRACTION_SHIFT) / denominator;
 
-#define SHRINK_BITE	10000
-static inline unsigned long __shrink_memory(long tmp)
+	x *= ratio;
+	return x >> FRACTION_SHIFT;
+}
+
+static unsigned long highmem_size(
+	unsigned long size, unsigned long highmem, unsigned long count)
+{
+	return highmem > count / 2 ?
+			compute_fraction(size, highmem, count) :
+			size - compute_fraction(size, count - highmem, count);
+}
+#else
+static inline unsigned long preallocate_image_highmem(unsigned long nr_pages)
+{
+	return 0;
+}
+
+static inline unsigned long highmem_size(
+	unsigned long size, unsigned long highmem, unsigned long count)
 {
-	if (tmp > SHRINK_BITE)
-		tmp = SHRINK_BITE;
-	return shrink_all_memory(tmp);
+	return 0;
 }
+#endif /* CONFIG_HIGHMEM */
 
+/**
+ * swsusp_shrink_memory -  Make the kernel release as much memory as needed
+ *
+ * To create a hibernation image it is necessary to make a copy of every page
+ * frame in use.  We also need a number of page frames to be free during
+ * hibernation for allocations made while saving the image and for device
+ * drivers, in case they need to allocate memory from their hibernation
+ * callbacks (these two numbers are given by PAGES_FOR_IO and SPARE_PAGES,
+ * respectively, both of which are rough estimates).  To make this happen, we
+ * compute the total number of available page frames and allocate at least
+ *
+ * ([page frames total] + PAGES_FOR_IO + [metadata pages]) / 2 + 2 * SPARE_PAGES
+ *
+ * of them, which corresponds to the maximum size of a hibernation image.
+ *
+ * If image_size is set below the number following from the above formula,
+ * the preallocation of memory is continued until the total number of saveable
+ * pages in the system is below the requested image size or it is impossible to
+ * allocate more memory, whichever happens first.
+ */
 int swsusp_shrink_memory(void)
 {
-	long tmp;
 	struct zone *zone;
-	unsigned long pages = 0;
-	unsigned int i = 0;
-	char *p = "-\\|/";
+	unsigned long saveable, size, max_size, count, highmem, pages = 0;
+	unsigned long alloc, pages_highmem;
 	struct timeval start, stop;
+	int error = 0;
 
-	printk(KERN_INFO "PM: Shrinking memory...  ");
+	printk(KERN_INFO "PM: Shrinking memory... ");
 	do_gettimeofday(&start);
-	do {
-		long size, highmem_size;
 
-		highmem_size = count_highmem_pages();
-		size = count_data_pages() + PAGES_FOR_IO + SPARE_PAGES;
-		tmp = size;
-		size += highmem_size;
-		for_each_populated_zone(zone) {
-			tmp += snapshot_additional_pages(zone);
-			if (is_highmem(zone)) {
-				highmem_size -=
-					zone_page_state(zone, NR_FREE_PAGES);
-			} else {
-				tmp -= zone_page_state(zone, NR_FREE_PAGES);
-				tmp += zone->lowmem_reserve[ZONE_NORMAL];
-			}
-		}
-
-		if (highmem_size < 0)
-			highmem_size = 0;
-
-		tmp += highmem_size;
-		if (tmp > 0) {
-			tmp = __shrink_memory(tmp);
-			if (!tmp)
-				return -ENOMEM;
-			pages += tmp;
-		} else if (size > image_size / PAGE_SIZE) {
-			tmp = __shrink_memory(size - (image_size / PAGE_SIZE));
-			pages += tmp;
-		}
-		printk("\b%c", p[i++%4]);
-	} while (tmp > 0);
+	/* Count the number of saveable data pages. */
+	highmem = count_highmem_pages();
+	saveable = count_data_pages();
+
+	/*
+	 * Compute the total number of page frames we can use (count) and the
+	 * number of pages needed for image metadata (size).
+	 */
+	count = saveable;
+	saveable += highmem;
+	size = 0;
+	for_each_populated_zone(zone) {
+		size += snapshot_additional_pages(zone);
+		if (is_highmem(zone))
+			highmem += zone_page_state(zone, NR_FREE_PAGES);
+		else
+			count += zone_page_state(zone, NR_FREE_PAGES);
+	}
+	count += highmem;
+	count -= totalreserve_pages;
+
+	/* Compute the maximum number of saveable pages to leave in memory. */
+	max_size = (count - (size + PAGES_FOR_IO)) / 2 - 2 * SPARE_PAGES;
+	size = DIV_ROUND_UP(image_size, PAGE_SIZE);
+	if (size > max_size)
+		size = max_size;
+	/*
+	 * If the maximum is not less than the current number of saveable pages
+	 * in memory, we don't need to do anything more.
+	 */
+	if (size >= saveable)
+		goto out;
+
+	/*
+	 * Let the memory management subsystem know that we're going to need a
+	 * large number of page frames to allocate and make it free some memory.
+	 * NOTE: If this is not done, performance will be hurt badly in some
+	 * test cases.
+	 */
+	shrink_all_memory(saveable - size);
+
+	/*
+	 * The number of saveable pages in memory was too high, so apply some
+	 * pressure to decrease it.  First, make room for the largest possible
+	 * image and fail if that doesn't work.  Next, try to decrease the size
+	 * of the image as much as indicated by image_size using allocations
+	 * from highmem and non-highmem zones separately.
+	 */
+	pages_highmem = preallocate_image_highmem(highmem / 2);
+	max_size += pages_highmem;
+	alloc = count - max_size;
+	pages = preallocate_image_memory(alloc);
+	if (pages < alloc) {
+		error = -ENOMEM;
+		goto free_out;
+	}
+	size = max_size - size;
+	alloc = size;
+	size = preallocate_image_highmem(highmem_size(size, highmem, count));
+	pages_highmem += size;
+	alloc -= size;
+	pages += preallocate_image_memory(alloc);
+	pages += pages_highmem;
+
+ free_out:
+	/* Release all of the preallocated page frames. */
+	swsusp_free();
+
+	if (error) {
+		printk(KERN_CONT "\n");
+		return error;
+	}
+
+ out:
 	do_gettimeofday(&stop);
-	printk("\bdone (%lu pages freed)\n", pages);
+	printk(KERN_CONT "done (preallocated %lu free pages)\n", pages);
 	swsusp_show_speed(&start, &stop, pages, "Freed");
 
 	return 0;


^ permalink raw reply	[flat|nested] 205+ messages in thread

* [PATCH 5/6] PM/Hibernate: Do not release preallocated memory unnecessarily (rev. 2)
  2009-05-13  8:32       ` Rafael J. Wysocki
@ 2009-05-13  8:40         ` Rafael J. Wysocki
  -1 siblings, 0 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-13  8:40 UTC (permalink / raw)
  To: pm list
  Cc: Wu Fengguang, Andrew Morton, LKML, Pavel Machek,
	Nigel Cunningham, David Rientjes, linux-mm

From: Rafael J. Wysocki <rjw@sisk.pl>

Since the hibernation code is now going to allocate memory in order
to make enough room for the image, it can also use the page frames
allocated at this stage as image page frames.  The low-level
hibernation code needs to be rearranged for this purpose, but doing
so allows us to avoid freeing a large number of pages only to
allocate the same pages again later, so it is generally worth doing.

[rev. 2: Take highmem into account correctly.]
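
A toy standalone model of the idea (the fixed-size array below merely
stands in for the copy_bm bitmap and malloc() for alloc_image_page();
all numbers are arbitrary):

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

#define TRACKED	16

static void *tracked[TRACKED];		/* stands in for copy_bm */
static unsigned int nr_alloc;

/* Preallocate one frame and remember it, like preallocate_image_pages(). */
static bool preallocate_one(void)
{
	void *page;

	if (nr_alloc >= TRACKED)
		return false;
	page = malloc(4096);
	if (!page)
		return false;
	tracked[nr_alloc++] = page;
	return true;
}

/* Keep only what the image needs, like free_unnecessary_pages(). */
static void free_unnecessary(unsigned int needed)
{
	while (nr_alloc > needed)
		free(tracked[--nr_alloc]);
}

int main(void)
{
	unsigned int i, wanted = 12, needed = 8;

	for (i = 0; i < wanted; i++)
		if (!preallocate_one())
			break;
	printf("preallocated %u frames\n", nr_alloc);

	free_unnecessary(needed);
	printf("kept %u frames for the image\n", nr_alloc);
	return 0;
}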

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 kernel/power/disk.c     |   15 ++-
 kernel/power/power.h    |    2 
 kernel/power/snapshot.c |  206 +++++++++++++++++++++++++++++++-----------------
 3 files changed, 149 insertions(+), 74 deletions(-)

Index: linux-2.6/kernel/power/snapshot.c
===================================================================
--- linux-2.6.orig/kernel/power/snapshot.c
+++ linux-2.6/kernel/power/snapshot.c
@@ -1033,6 +1033,25 @@ copy_data_pages(struct memory_bitmap *co
 static unsigned int nr_copy_pages;
 /* Number of pages needed for saving the original pfns of the image pages */
 static unsigned int nr_meta_pages;
+/*
+ * Numbers of normal and highmem page frames allocated for hibernation image
+ * before suspending devices.
+ */
+unsigned int alloc_normal, alloc_highmem;
+/*
+ * Memory bitmap used for marking saveable pages (during hibernation) or
+ * hibernation image pages (during restore)
+ */
+static struct memory_bitmap orig_bm;
+/*
+ * Memory bitmap used during hibernation for marking allocated page frames that
+ * will contain copies of saveable pages.  During restore it is initially used
+ * for marking hibernation image pages, but then the set bits from it are
+ * duplicated in @orig_bm and it is released.  On highmem systems it is next
+ * used for marking "safe" highmem pages, but it has to be reinitialized for
+ * this purpose.
+ */
+static struct memory_bitmap copy_bm;
 
 /**
  *	swsusp_free - free pages allocated for the suspend.
@@ -1064,6 +1083,8 @@ void swsusp_free(void)
 	nr_meta_pages = 0;
 	restore_pblist = NULL;
 	buffer = NULL;
+	alloc_normal = 0;
+	alloc_highmem = 0;
 }
 
 /* Helper functions used for the shrinking of memory. */
@@ -1082,8 +1103,16 @@ static unsigned long preallocate_image_p
 	unsigned long nr_alloc = 0;
 
 	while (nr_pages > 0) {
-		if (!alloc_image_page(mask))
-			break;
+ 		struct page *page;
+
+		page = alloc_image_page(mask);
+ 		if (!page)
+ 			break;
+		memory_bm_set_bit(&copy_bm, page_to_pfn(page));
+		if (PageHighMem(page))
+			alloc_highmem++;
+		else
+			alloc_normal++;
 		nr_pages--;
 		nr_alloc++;
 	}
@@ -1144,7 +1173,47 @@ static inline unsigned long highmem_size
 #endif /* CONFIG_HIGHMEM */
 
 /**
- * swsusp_shrink_memory -  Make the kernel release as much memory as needed
+ * free_unnecessary_pages - Release preallocated pages not needed for the image
+ */
+static void free_unnecessary_pages(void)
+{
+	unsigned long save_highmem, to_free_normal, to_free_highmem;
+
+	to_free_normal = alloc_normal - count_data_pages();
+	save_highmem = count_highmem_pages();
+	if (alloc_highmem > save_highmem) {
+		to_free_highmem = alloc_highmem - save_highmem;
+	} else {
+		to_free_highmem = 0;
+		to_free_normal -= save_highmem - alloc_highmem;
+	}
+
+	memory_bm_position_reset(&copy_bm);
+
+	while (to_free_normal > 0 && to_free_highmem > 0) {
+		unsigned long pfn = memory_bm_next_pfn(&copy_bm);
+		struct page *page = pfn_to_page(pfn);
+
+		if (PageHighMem(page)) {
+			if (!to_free_highmem)
+				continue;
+			to_free_highmem--;
+			alloc_highmem--;
+		} else {
+			if (!to_free_normal)
+				continue;
+			to_free_normal--;
+			alloc_normal--;
+		}
+		memory_bm_clear_bit(&copy_bm, pfn);
+		swsusp_unset_page_forbidden(page);
+		swsusp_unset_page_free(page);
+		__free_page(page);
+	}
+}
+
+/**
+ * hibernate_preallocate_memory - Preallocate memory for hibernation image
  *
  * To create a hibernation image it is necessary to make a copy of every page
  * frame in use.  We also need a number of page frames to be free during
@@ -1163,19 +1232,30 @@ static inline unsigned long highmem_size
  * pages in the system is below the requested image size or it is impossible to
  * allocate more memory, whichever happens first.
  */
-int swsusp_shrink_memory(void)
+int hibernate_preallocate_memory(void)
 {
 	struct zone *zone;
 	unsigned long saveable, size, max_size, count, highmem, pages = 0;
-	unsigned long alloc, pages_highmem;
+	unsigned long alloc, save_highmem, pages_highmem;
 	struct timeval start, stop;
-	int error = 0;
+	int error;
 
-	printk(KERN_INFO "PM: Shrinking memory... ");
+	printk(KERN_INFO "PM: Preallocating image memory... ");
 	do_gettimeofday(&start);
 
+	error = memory_bm_create(&orig_bm, GFP_IMAGE, PG_ANY);
+	if (error)
+		goto err_out;
+
+	error = memory_bm_create(&copy_bm, GFP_IMAGE, PG_ANY);
+	if (error)
+		goto err_out;
+
+	alloc_normal = 0;
+	alloc_highmem = 0;
+
 	/* Count the number of saveable data pages. */
-	highmem = count_highmem_pages();
+	save_highmem = count_highmem_pages();
 	saveable = count_data_pages();
 
 	/*
@@ -1183,7 +1263,8 @@ int swsusp_shrink_memory(void)
 	 * number of pages needed for image metadata (size).
 	 */
 	count = saveable;
-	saveable += highmem;
+	saveable += save_highmem;
+	highmem = save_highmem;
 	size = 0;
 	for_each_populated_zone(zone) {
 		size += snapshot_additional_pages(zone);
@@ -1202,10 +1283,13 @@ int swsusp_shrink_memory(void)
 		size = max_size;
 	/*
 	 * If the maximum is not less than the current number of saveable pages
-	 * in memory, we don't need to do anything more.
+	 * in memory, allocate page frames for the image and we're done.
 	 */
-	if (size >= saveable)
+	if (size >= saveable) {
+		pages = preallocate_image_highmem(save_highmem);
+		pages += preallocate_image_memory(saveable - pages);
 		goto out;
+	}
 
 	/*
 	 * Let the memory management subsystem know that we're going to need a
@@ -1226,10 +1310,8 @@ int swsusp_shrink_memory(void)
 	max_size += pages_highmem;
 	alloc = count - max_size;
 	pages = preallocate_image_memory(alloc);
-	if (pages < alloc) {
-		error = -ENOMEM;
-		goto free_out;
-	}
+	if (pages < alloc)
+		goto err_out;
 	size = max_size - size;
 	alloc = size;
 	size = preallocate_image_highmem(highmem_size(size, highmem, count));
@@ -1238,21 +1320,24 @@ int swsusp_shrink_memory(void)
 	pages += preallocate_image_memory(alloc);
 	pages += pages_highmem;
 
- free_out:
-	/* Release all of the preallocated page frames. */
-	swsusp_free();
-
-	if (error) {
-		printk(KERN_CONT "\n");
-		return error;
-	}
+	/*
+	 * We only need as many page frames for the image as there are saveable
+	 * pages in memory, but we have allocated more.  Release the excessive
+	 * ones now.
+	 */
+	free_unnecessary_pages();
 
  out:
 	do_gettimeofday(&stop);
-	printk(KERN_CONT "done (preallocated %lu free pages)\n", pages);
-	swsusp_show_speed(&start, &stop, pages, "Freed");
+	printk(KERN_CONT "done (allocated %lu pages)\n", pages);
+	swsusp_show_speed(&start, &stop, pages, "Allocated");
 
 	return 0;
+
+ err_out:
+	printk(KERN_CONT "\n");
+	swsusp_free();
+	return -ENOMEM;
 }
 
 #ifdef CONFIG_HIGHMEM
@@ -1263,7 +1348,7 @@ int swsusp_shrink_memory(void)
 
 static unsigned int count_pages_for_highmem(unsigned int nr_highmem)
 {
-	unsigned int free_highmem = count_free_highmem_pages();
+	unsigned int free_highmem = count_free_highmem_pages() + alloc_highmem;
 
 	if (free_highmem >= nr_highmem)
 		nr_highmem = 0;
@@ -1285,19 +1370,17 @@ count_pages_for_highmem(unsigned int nr_
 static int enough_free_mem(unsigned int nr_pages, unsigned int nr_highmem)
 {
 	struct zone *zone;
-	unsigned int free = 0, meta = 0;
+	unsigned int free = alloc_normal;
 
-	for_each_zone(zone) {
-		meta += snapshot_additional_pages(zone);
+	for_each_zone(zone)
 		if (!is_highmem(zone))
 			free += zone_page_state(zone, NR_FREE_PAGES);
-	}
 
 	nr_pages += count_pages_for_highmem(nr_highmem);
-	pr_debug("PM: Normal pages needed: %u + %u + %u, available pages: %u\n",
-		nr_pages, PAGES_FOR_IO, meta, free);
+	pr_debug("PM: Normal pages needed: %u + %u, available pages: %u\n",
+		nr_pages, PAGES_FOR_IO, free);
 
-	return free > nr_pages + PAGES_FOR_IO + meta;
+	return free > nr_pages + PAGES_FOR_IO;
 }
 
 #ifdef CONFIG_HIGHMEM
@@ -1319,7 +1402,7 @@ static inline int get_highmem_buffer(int
  */
 
 static inline unsigned int
-alloc_highmem_image_pages(struct memory_bitmap *bm, unsigned int nr_highmem)
+alloc_highmem_pages(struct memory_bitmap *bm, unsigned int nr_highmem)
 {
 	unsigned int to_alloc = count_free_highmem_pages();
 
@@ -1339,7 +1422,7 @@ alloc_highmem_image_pages(struct memory_
 static inline int get_highmem_buffer(int safe_needed) { return 0; }
 
 static inline unsigned int
-alloc_highmem_image_pages(struct memory_bitmap *bm, unsigned int n) { return 0; }
+alloc_highmem_pages(struct memory_bitmap *bm, unsigned int n) { return 0; }
 #endif /* CONFIG_HIGHMEM */
 
 /**
@@ -1358,51 +1441,36 @@ static int
 swsusp_alloc(struct memory_bitmap *orig_bm, struct memory_bitmap *copy_bm,
 		unsigned int nr_pages, unsigned int nr_highmem)
 {
-	int error;
-
-	error = memory_bm_create(orig_bm, GFP_ATOMIC | __GFP_COLD, PG_ANY);
-	if (error)
-		goto Free;
-
-	error = memory_bm_create(copy_bm, GFP_ATOMIC | __GFP_COLD, PG_ANY);
-	if (error)
-		goto Free;
+	int error = 0;
 
 	if (nr_highmem > 0) {
 		error = get_highmem_buffer(PG_ANY);
 		if (error)
-			goto Free;
-
-		nr_pages += alloc_highmem_image_pages(copy_bm, nr_highmem);
+			goto err_out;
+		if (nr_highmem > alloc_highmem) {
+			nr_highmem -= alloc_highmem;
+			nr_pages += alloc_highmem_pages(copy_bm, nr_highmem);
+		}
 	}
-	while (nr_pages-- > 0) {
-		struct page *page = alloc_image_page(GFP_ATOMIC | __GFP_COLD);
-
-		if (!page)
-			goto Free;
-
-		memory_bm_set_bit(copy_bm, page_to_pfn(page));
+	if (nr_pages > alloc_normal) {
+		nr_pages -= alloc_normal;
+		while (nr_pages-- > 0) {
+			struct page *page;
+
+			page = alloc_image_page(GFP_ATOMIC | __GFP_COLD);
+			if (!page)
+				goto err_out;
+			memory_bm_set_bit(copy_bm, page_to_pfn(page));
+		}
 	}
+
 	return 0;
 
- Free:
+ err_out:
 	swsusp_free();
-	return -ENOMEM;
+	return error;
 }
 
-/* Memory bitmap used for marking saveable pages (during suspend) or the
- * suspend image pages (during resume)
- */
-static struct memory_bitmap orig_bm;
-/* Memory bitmap used on suspend for marking allocated pages that will contain
- * the copies of saveable pages.  During resume it is initially used for
- * marking the suspend image pages, but then its set bits are duplicated in
- * @orig_bm and it is released.  Next, on systems with high memory, it may be
- * used for marking "safe" highmem pages, but it has to be reinitialized for
- * this purpose.
- */
-static struct memory_bitmap copy_bm;
-
 asmlinkage int swsusp_save(void)
 {
 	unsigned int nr_pages, nr_highmem;
Index: linux-2.6/kernel/power/power.h
===================================================================
--- linux-2.6.orig/kernel/power/power.h
+++ linux-2.6/kernel/power/power.h
@@ -74,7 +74,7 @@ extern asmlinkage int swsusp_arch_resume
 
 extern int create_basic_memory_bitmaps(void);
 extern void free_basic_memory_bitmaps(void);
-extern int swsusp_shrink_memory(void);
+extern int hibernate_preallocate_memory(void);
 
 /**
  *	Auxiliary structure used for reading the snapshot image data and
Index: linux-2.6/kernel/power/disk.c
===================================================================
--- linux-2.6.orig/kernel/power/disk.c
+++ linux-2.6/kernel/power/disk.c
@@ -303,8 +303,8 @@ int hibernation_snapshot(int platform_mo
 	if (error)
 		return error;
 
-	/* Free memory before shutting down devices. */
-	error = swsusp_shrink_memory();
+	/* Preallocate image memory before shutting down devices. */
+	error = hibernate_preallocate_memory();
 	if (error)
 		goto Close;
 
@@ -320,6 +320,10 @@ int hibernation_snapshot(int platform_mo
 	/* Control returns here after successful restore */
 
  Resume_devices:
+	/* We may need to release the preallocated image pages here. */
+	if (error || !in_suspend)
+		swsusp_free();
+
 	device_resume(in_suspend ?
 		(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);
 	resume_console();
@@ -593,7 +597,10 @@ int hibernate(void)
 		goto Thaw;
 
 	error = hibernation_snapshot(hibernation_mode == HIBERNATION_PLATFORM);
-	if (in_suspend && !error) {
+	if (error)
+		goto Thaw;
+
+	if (in_suspend) {
 		unsigned int flags = 0;
 
 		if (hibernation_mode == HIBERNATION_PLATFORM)
@@ -605,8 +612,8 @@ int hibernate(void)
 			power_down();
 	} else {
 		pr_debug("PM: Image restored successfully.\n");
-		swsusp_free();
 	}
+
  Thaw:
 	thaw_processes();
  Finish:


^ permalink raw reply	[flat|nested] 205+ messages in thread

* [PATCH 5/6] PM/Hibernate: Do not release preallocated memory unnecessarily (rev. 2)
  2009-05-13  8:32       ` Rafael J. Wysocki
                         ` (8 preceding siblings ...)
  (?)
@ 2009-05-13  8:40       ` Rafael J. Wysocki
  -1 siblings, 0 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-13  8:40 UTC (permalink / raw)
  To: pm list; +Cc: LKML, linux-mm, David Rientjes, Andrew Morton, Wu Fengguang

From: Rafael J. Wysocki <rjw@sisk.pl>

Since the hibernation code is now going to use allocations of memory
to make enough room for the image, it can also use the page frames
allocated at this stage as image page frames.  The low-level
hibernation code needs to be rearranged for this purpose, but it
allows us to avoid freeing a great number of pages and allocating
these same pages once again later, so it generally is worth doing.

[rev. 2: Take highmem into account correctly.]
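
To illustrate the bookkeeping in the free_unnecessary_pages() helper
added below (purely hypothetical numbers): with alloc_normal = 50000,
alloc_highmem = 10000, count_data_pages() = 40000 and
count_highmem_pages() = 5000, we free 5000 highmem and 10000 normal
frames and keep exactly the 45000 frames needed, one per saveable
page.  If instead alloc_highmem = 2000, no highmem frames are freed
and the 3000-page shortfall is covered by keeping that many extra
normal frames, i.e. only 7000 normal frames are released.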

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 kernel/power/disk.c     |   15 ++-
 kernel/power/power.h    |    2 
 kernel/power/snapshot.c |  206 +++++++++++++++++++++++++++++++-----------------
 3 files changed, 149 insertions(+), 74 deletions(-)

Index: linux-2.6/kernel/power/snapshot.c
===================================================================
--- linux-2.6.orig/kernel/power/snapshot.c
+++ linux-2.6/kernel/power/snapshot.c
@@ -1033,6 +1033,25 @@ copy_data_pages(struct memory_bitmap *co
 static unsigned int nr_copy_pages;
 /* Number of pages needed for saving the original pfns of the image pages */
 static unsigned int nr_meta_pages;
+/*
+ * Numbers of normal and highmem page frames allocated for hibernation image
+ * before suspending devices.
+ */
+unsigned int alloc_normal, alloc_highmem;
+/*
+ * Memory bitmap used for marking saveable pages (during hibernation) or
+ * hibernation image pages (during restore)
+ */
+static struct memory_bitmap orig_bm;
+/*
+ * Memory bitmap used during hibernation for marking allocated page frames that
+ * will contain copies of saveable pages.  During restore it is initially used
+ * for marking hibernation image pages, but then the set bits from it are
+ * duplicated in @orig_bm and it is released.  On highmem systems it is next
+ * used for marking "safe" highmem pages, but it has to be reinitialized for
+ * this purpose.
+ */
+static struct memory_bitmap copy_bm;
 
 /**
  *	swsusp_free - free pages allocated for the suspend.
@@ -1064,6 +1083,8 @@ void swsusp_free(void)
 	nr_meta_pages = 0;
 	restore_pblist = NULL;
 	buffer = NULL;
+	alloc_normal = 0;
+	alloc_highmem = 0;
 }
 
 /* Helper functions used for the shrinking of memory. */
@@ -1082,8 +1103,16 @@ static unsigned long preallocate_image_p
 	unsigned long nr_alloc = 0;
 
 	while (nr_pages > 0) {
-		if (!alloc_image_page(mask))
-			break;
+ 		struct page *page;
+
+		page = alloc_image_page(mask);
+ 		if (!page)
+ 			break;
+		memory_bm_set_bit(&copy_bm, page_to_pfn(page));
+		if (PageHighMem(page))
+			alloc_highmem++;
+		else
+			alloc_normal++;
 		nr_pages--;
 		nr_alloc++;
 	}
@@ -1144,7 +1173,47 @@ static inline unsigned long highmem_size
 #endif /* CONFIG_HIGHMEM */
 
 /**
- * swsusp_shrink_memory -  Make the kernel release as much memory as needed
+ * free_unnecessary_pages - Release preallocated pages not needed for the image
+ */
+static void free_unnecessary_pages(void)
+{
+	unsigned long save_highmem, to_free_normal, to_free_highmem;
+
+	to_free_normal = alloc_normal - count_data_pages();
+	save_highmem = count_highmem_pages();
+	if (alloc_highmem > save_highmem) {
+		to_free_highmem = alloc_highmem - save_highmem;
+	} else {
+		to_free_highmem = 0;
+		to_free_normal -= save_highmem - alloc_highmem;
+	}
+
+	memory_bm_position_reset(&copy_bm);
+
+	while (to_free_normal > 0 || to_free_highmem > 0) {
+		unsigned long pfn = memory_bm_next_pfn(&copy_bm);
+		struct page *page = pfn_to_page(pfn);
+
+		if (PageHighMem(page)) {
+			if (!to_free_highmem)
+				continue;
+			to_free_highmem--;
+			alloc_highmem--;
+		} else {
+			if (!to_free_normal)
+				continue;
+			to_free_normal--;
+			alloc_normal--;
+		}
+		memory_bm_clear_bit(&copy_bm, pfn);
+		swsusp_unset_page_forbidden(page);
+		swsusp_unset_page_free(page);
+		__free_page(page);
+	}
+}
+
+/**
+ * hibernate_preallocate_memory - Preallocate memory for hibernation image
  *
  * To create a hibernation image it is necessary to make a copy of every page
  * frame in use.  We also need a number of page frames to be free during
@@ -1163,19 +1232,30 @@ static inline unsigned long highmem_size
  * pages in the system is below the requested image size or it is impossible to
  * allocate more memory, whichever happens first.
  */
-int swsusp_shrink_memory(void)
+int hibernate_preallocate_memory(void)
 {
 	struct zone *zone;
 	unsigned long saveable, size, max_size, count, highmem, pages = 0;
-	unsigned long alloc, pages_highmem;
+	unsigned long alloc, save_highmem, pages_highmem;
 	struct timeval start, stop;
-	int error = 0;
+	int error;
 
-	printk(KERN_INFO "PM: Shrinking memory... ");
+	printk(KERN_INFO "PM: Preallocating image memory... ");
 	do_gettimeofday(&start);
 
+	error = memory_bm_create(&orig_bm, GFP_IMAGE, PG_ANY);
+	if (error)
+		goto err_out;
+
+	error = memory_bm_create(&copy_bm, GFP_IMAGE, PG_ANY);
+	if (error)
+		goto err_out;
+
+	alloc_normal = 0;
+	alloc_highmem = 0;
+
 	/* Count the number of saveable data pages. */
-	highmem = count_highmem_pages();
+	save_highmem = count_highmem_pages();
 	saveable = count_data_pages();
 
 	/*
@@ -1183,7 +1263,8 @@ int swsusp_shrink_memory(void)
 	 * number of pages needed for image metadata (size).
 	 */
 	count = saveable;
-	saveable += highmem;
+	saveable += save_highmem;
+	highmem = save_highmem;
 	size = 0;
 	for_each_populated_zone(zone) {
 		size += snapshot_additional_pages(zone);
@@ -1202,10 +1283,13 @@ int swsusp_shrink_memory(void)
 		size = max_size;
 	/*
 	 * If the maximum is not less than the current number of saveable pages
-	 * in memory, we don't need to do anything more.
+	 * in memory, allocate page frames for the image and we're done.
 	 */
-	if (size >= saveable)
+	if (size >= saveable) {
+		pages = preallocate_image_highmem(save_highmem);
+		pages += preallocate_image_memory(saveable - pages);
 		goto out;
+	}
 
 	/*
 	 * Let the memory management subsystem know that we're going to need a
@@ -1226,10 +1310,8 @@ int swsusp_shrink_memory(void)
 	max_size += pages_highmem;
 	alloc = count - max_size;
 	pages = preallocate_image_memory(alloc);
-	if (pages < alloc) {
-		error = -ENOMEM;
-		goto free_out;
-	}
+	if (pages < alloc)
+		goto err_out;
 	size = max_size - size;
 	alloc = size;
 	size = preallocate_image_highmem(highmem_size(size, highmem, count));
@@ -1238,21 +1320,24 @@ int swsusp_shrink_memory(void)
 	pages += preallocate_image_memory(alloc);
 	pages += pages_highmem;
 
- free_out:
-	/* Release all of the preallocated page frames. */
-	swsusp_free();
-
-	if (error) {
-		printk(KERN_CONT "\n");
-		return error;
-	}
+	/*
+	 * We only need as many page frames for the image as there are saveable
+	 * pages in memory, but we have allocated more.  Release the excessive
+	 * ones now.
+	 */
+	free_unnecessary_pages();
 
  out:
 	do_gettimeofday(&stop);
-	printk(KERN_CONT "done (preallocated %lu free pages)\n", pages);
-	swsusp_show_speed(&start, &stop, pages, "Freed");
+	printk(KERN_CONT "done (allocated %lu pages)\n", pages);
+	swsusp_show_speed(&start, &stop, pages, "Allocated");
 
 	return 0;
+
+ err_out:
+	printk(KERN_CONT "\n");
+	swsusp_free();
+	return -ENOMEM;
 }
 
 #ifdef CONFIG_HIGHMEM
@@ -1263,7 +1348,7 @@ int swsusp_shrink_memory(void)
 
 static unsigned int count_pages_for_highmem(unsigned int nr_highmem)
 {
-	unsigned int free_highmem = count_free_highmem_pages();
+	unsigned int free_highmem = count_free_highmem_pages() + alloc_highmem;
 
 	if (free_highmem >= nr_highmem)
 		nr_highmem = 0;
@@ -1285,19 +1370,17 @@ count_pages_for_highmem(unsigned int nr_
 static int enough_free_mem(unsigned int nr_pages, unsigned int nr_highmem)
 {
 	struct zone *zone;
-	unsigned int free = 0, meta = 0;
+	unsigned int free = alloc_normal;
 
-	for_each_zone(zone) {
-		meta += snapshot_additional_pages(zone);
+	for_each_zone(zone)
 		if (!is_highmem(zone))
 			free += zone_page_state(zone, NR_FREE_PAGES);
-	}
 
 	nr_pages += count_pages_for_highmem(nr_highmem);
-	pr_debug("PM: Normal pages needed: %u + %u + %u, available pages: %u\n",
-		nr_pages, PAGES_FOR_IO, meta, free);
+	pr_debug("PM: Normal pages needed: %u + %u, available pages: %u\n",
+		nr_pages, PAGES_FOR_IO, free);
 
-	return free > nr_pages + PAGES_FOR_IO + meta;
+	return free > nr_pages + PAGES_FOR_IO;
 }
 
 #ifdef CONFIG_HIGHMEM
@@ -1319,7 +1402,7 @@ static inline int get_highmem_buffer(int
  */
 
 static inline unsigned int
-alloc_highmem_image_pages(struct memory_bitmap *bm, unsigned int nr_highmem)
+alloc_highmem_pages(struct memory_bitmap *bm, unsigned int nr_highmem)
 {
 	unsigned int to_alloc = count_free_highmem_pages();
 
@@ -1339,7 +1422,7 @@ alloc_highmem_image_pages(struct memory_
 static inline int get_highmem_buffer(int safe_needed) { return 0; }
 
 static inline unsigned int
-alloc_highmem_image_pages(struct memory_bitmap *bm, unsigned int n) { return 0; }
+alloc_highmem_pages(struct memory_bitmap *bm, unsigned int n) { return 0; }
 #endif /* CONFIG_HIGHMEM */
 
 /**
@@ -1358,51 +1441,36 @@ static int
 swsusp_alloc(struct memory_bitmap *orig_bm, struct memory_bitmap *copy_bm,
 		unsigned int nr_pages, unsigned int nr_highmem)
 {
-	int error;
-
-	error = memory_bm_create(orig_bm, GFP_ATOMIC | __GFP_COLD, PG_ANY);
-	if (error)
-		goto Free;
-
-	error = memory_bm_create(copy_bm, GFP_ATOMIC | __GFP_COLD, PG_ANY);
-	if (error)
-		goto Free;
+	int error = 0;
 
 	if (nr_highmem > 0) {
 		error = get_highmem_buffer(PG_ANY);
 		if (error)
-			goto Free;
-
-		nr_pages += alloc_highmem_image_pages(copy_bm, nr_highmem);
+			goto err_out;
+		if (nr_highmem > alloc_highmem) {
+			nr_highmem -= alloc_highmem;
+			nr_pages += alloc_highmem_pages(copy_bm, nr_highmem);
+		}
 	}
-	while (nr_pages-- > 0) {
-		struct page *page = alloc_image_page(GFP_ATOMIC | __GFP_COLD);
-
-		if (!page)
-			goto Free;
-
-		memory_bm_set_bit(copy_bm, page_to_pfn(page));
+	if (nr_pages > alloc_normal) {
+		nr_pages -= alloc_normal;
+		while (nr_pages-- > 0) {
+			struct page *page;
+
+			page = alloc_image_page(GFP_ATOMIC | __GFP_COLD);
+			if (!page)
+				goto err_out;
+			memory_bm_set_bit(copy_bm, page_to_pfn(page));
+		}
 	}
+
 	return 0;
 
- Free:
+ err_out:
 	swsusp_free();
-	return -ENOMEM;
+	return error;
 }
 
-/* Memory bitmap used for marking saveable pages (during suspend) or the
- * suspend image pages (during resume)
- */
-static struct memory_bitmap orig_bm;
-/* Memory bitmap used on suspend for marking allocated pages that will contain
- * the copies of saveable pages.  During resume it is initially used for
- * marking the suspend image pages, but then its set bits are duplicated in
- * @orig_bm and it is released.  Next, on systems with high memory, it may be
- * used for marking "safe" highmem pages, but it has to be reinitialized for
- * this purpose.
- */
-static struct memory_bitmap copy_bm;
-
 asmlinkage int swsusp_save(void)
 {
 	unsigned int nr_pages, nr_highmem;
Index: linux-2.6/kernel/power/power.h
===================================================================
--- linux-2.6.orig/kernel/power/power.h
+++ linux-2.6/kernel/power/power.h
@@ -74,7 +74,7 @@ extern asmlinkage int swsusp_arch_resume
 
 extern int create_basic_memory_bitmaps(void);
 extern void free_basic_memory_bitmaps(void);
-extern int swsusp_shrink_memory(void);
+extern int hibernate_preallocate_memory(void);
 
 /**
  *	Auxiliary structure used for reading the snapshot image data and
Index: linux-2.6/kernel/power/disk.c
===================================================================
--- linux-2.6.orig/kernel/power/disk.c
+++ linux-2.6/kernel/power/disk.c
@@ -303,8 +303,8 @@ int hibernation_snapshot(int platform_mo
 	if (error)
 		return error;
 
-	/* Free memory before shutting down devices. */
-	error = swsusp_shrink_memory();
+	/* Preallocate image memory before shutting down devices. */
+	error = hibernate_preallocate_memory();
 	if (error)
 		goto Close;
 
@@ -320,6 +320,10 @@ int hibernation_snapshot(int platform_mo
 	/* Control returns here after successful restore */
 
  Resume_devices:
+	/* We may need to release the preallocated image pages here. */
+	if (error || !in_suspend)
+		swsusp_free();
+
 	device_resume(in_suspend ?
 		(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);
 	resume_console();
@@ -593,7 +597,10 @@ int hibernate(void)
 		goto Thaw;
 
 	error = hibernation_snapshot(hibernation_mode == HIBERNATION_PLATFORM);
-	if (in_suspend && !error) {
+	if (error)
+		goto Thaw;
+
+	if (in_suspend) {
 		unsigned int flags = 0;
 
 		if (hibernation_mode == HIBERNATION_PLATFORM)
@@ -605,8 +612,8 @@ int hibernate(void)
 			power_down();
 	} else {
 		pr_debug("PM: Image restored successfully.\n");
-		swsusp_free();
 	}
+
  Thaw:
 	thaw_processes();
  Finish:

^ permalink raw reply	[flat|nested] 205+ messages in thread

* [RFC][PATCH 6/6] PM/Hibernate: Do not try to allocate too much memory too hard
  2009-05-13  8:32       ` Rafael J. Wysocki
@ 2009-05-13  8:42         ` Rafael J. Wysocki
  -1 siblings, 0 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-13  8:42 UTC (permalink / raw)
  To: pm list
  Cc: Wu Fengguang, Andrew Morton, LKML, Pavel Machek,
	Nigel Cunningham, David Rientjes, linux-mm

From: Rafael J. Wysocki <rjw@sisk.pl>

We want to avoid attempting to free too much memory too hard during
hibernation, so estimate the minimum size of the image to use as the
lower limit for preallocating memory.

The approach here is based on the (experimental) observation that we
can't free more page frames than the sum of:

* global_page_state(NR_SLAB_RECLAIMABLE)
* global_page_state(NR_ACTIVE_ANON)
* global_page_state(NR_INACTIVE_ANON)
* global_page_state(NR_ACTIVE_FILE)
* global_page_state(NR_INACTIVE_FILE)

and even that is usually impossible to free in practice, because some
of the pages reported as global_page_state(NR_SLAB_RECLAIMABLE) can't
in fact be freed.  It turns out, however, that if the sum of the
above numbers is subtracted from the number of saveable pages in the
system and the result is multiplied by 1.25, we get a suitable
estimate of the minimum size of the image.
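
As a purely hypothetical illustration: with 200000 saveable pages and
a total of 120000 pages reported by the five counters above, the
estimate is (200000 - 120000) * 1.25 = 100000 pages, so preallocation
will not try to push the image below 100000 pages even if image_size
asks for less.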

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 kernel/power/snapshot.c |   56 ++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 52 insertions(+), 4 deletions(-)

Index: linux-2.6/kernel/power/snapshot.c
===================================================================
--- linux-2.6.orig/kernel/power/snapshot.c
+++ linux-2.6/kernel/power/snapshot.c
@@ -1213,6 +1213,49 @@ static void free_unnecessary_pages(void)
 }
 
 /**
+ * minimum_image_size - Estimate the minimum acceptable size of an image
+ * @saveable: The total number of saveable pages in the system.
+ *
+ * We want to avoid attempting to free too much memory too hard, so estimate the
+ * minimum acceptable size of a hibernation image to use as the lower limit for
+ * preallocating memory.
+ *
+ * The minimum size of the image is computed as
+ *
+ * ([number of saveable pages] - [number of pages we can free]) * 1.25
+ *
+ * where the second term is the sum of reclaimable slab, anonymous pages and
+ * active/inactive file pages.
+ *
+ * NOTE: It usually turns out that we can't really free all pages reported as
+ * reclaimable slab, so the number resulting from the subtraction alone is too
+ * low.  Still, it seems reasonable to assume that this number is proportional
+ * to the total number of pages that cannot be freed, which leads to the
+ * formula above.  The coefficient of proportionality in this formula, 1.25, has
+ * been determined experimentally.
+ */
+static unsigned long minimum_image_size(unsigned long saveable)
+{
+	unsigned long size;
+
+	/* Compute the number of saveable pages we can free. */
+	size = global_page_state(NR_SLAB_RECLAIMABLE)
+		+ global_page_state(NR_ACTIVE_ANON)
+		+ global_page_state(NR_INACTIVE_ANON)
+		+ global_page_state(NR_ACTIVE_FILE)
+		+ global_page_state(NR_INACTIVE_FILE);
+
+	if (saveable <= size)
+		return saveable;
+
+	size = saveable - size;
+	size += (size >> 2);
+
+	return size;
+}
+
+
+/**
  * hibernate_preallocate_memory - Preallocate memory for hibernation image
  *
  * To create a hibernation image it is necessary to make a copy of every page
@@ -1229,8 +1272,8 @@ static void free_unnecessary_pages(void)
  *
  * If image_size is set below the number following from the above formula,
  * the preallocation of memory is continued until the total number of saveable
- * pages in the system is below the requested image size or it is impossible to
- * allocate more memory, whichever happens first.
+ * pages in the system is below the requested image size or the minimum
+ * acceptable image size returned by minimum_image_size(), whichever is greater.
  */
 int hibernate_preallocate_memory(void)
 {
@@ -1291,6 +1334,11 @@ int hibernate_preallocate_memory(void)
 		goto out;
 	}
 
+	/* Estimate the minimum size of the image. */
+	pages = minimum_image_size(saveable);
+	if (size < pages)
+		size = min_t(unsigned long, pages, max_size);
+
 	/*
 	 * Let the memory management subsystem know that we're going to need a
 	 * large number of page frames to allocate and make it free some memory.
@@ -1303,8 +1351,8 @@ int hibernate_preallocate_memory(void)
 	 * The number of saveable pages in memory was too high, so apply some
 	 * pressure to decrease it.  First, make room for the largest possible
 	 * image and fail if that doesn't work.  Next, try to decrease the size
-	 * of the image as much as indicated by image_size using allocations
-	 * from highmem and non-highmem zones separately.
+	 * of the image as much as indicated by 'size' using allocations from
+	 * highmem and non-highmem zones separately.
 	 */
 	pages_highmem = preallocate_image_highmem(highmem / 2);
 	max_size += pages_highmem;

^ permalink raw reply	[flat|nested] 205+ messages in thread

* Re: [PATCH 3/6] mm, PM/Freezer: Disable OOM killer when tasks are frozen
  2009-05-13  8:37         ` Rafael J. Wysocki
@ 2009-05-13  9:19           ` Pavel Machek
  -1 siblings, 0 replies; 205+ messages in thread
From: Pavel Machek @ 2009-05-13  9:19 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: pm list, Wu Fengguang, Andrew Morton, LKML, Nigel Cunningham,
	David Rientjes, linux-mm

On Wed 2009-05-13 10:37:49, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rjw@sisk.pl>
> 
> Currently, the following scenario appears to be possible in theory:
> 
> * Tasks are frozen for hibernation or suspend.
> * Free pages are almost exhausted.
> * Certain piece of code in the suspend code path attempts to allocate
>   some memory using GFP_KERNEL and allocation order less than or
>   equal to PAGE_ALLOC_COSTLY_ORDER.
> * __alloc_pages_internal() cannot find a free page so it invokes the
>   OOM killer.
> * The OOM killer attempts to kill a task, but the task is frozen, so
>   it doesn't die immediately.
> * __alloc_pages_internal() jumps to 'restart', unsuccessfully tries
>   to find a free page and invokes the OOM killer.
> * No progress can be made.
> 
> Although it is now hard to trigger during hibernation due to the
> memory shrinking carried out by the hibernation code, it is
> theoretically possible to trigger during suspend after the memory
> shrinking has been removed from that code path.  Moreover, since
> memory allocations are going to be used for the hibernation memory
> shrinking, it will be even more likely to happen during hibernation.
> 
> To prevent it from happening, introduce the oom_killer_disabled
> switch that will cause __alloc_pages_internal() to fail in the
> situations in which the OOM killer would have been called and make
> the freezer set this switch after tasks have been successfully
> frozen.
> 
> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
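
In other words (sketch only, restating the description above; the
actual hunk in __alloc_pages_internal() may differ in detail):

	if (oom_killer_disabled)	/* set once tasks are frozen */
		goto nopage;		/* fail instead of invoking the OOM killer */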

Acked-by: Pavel Machek <pavel@ucw.cz>

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 205+ messages in thread

* Re: [PATCH 4/6] PM/Hibernate: Rework shrinking of memory
  2009-05-13  8:39         ` Rafael J. Wysocki
@ 2009-05-13 19:34           ` Andrew Morton
  -1 siblings, 0 replies; 205+ messages in thread
From: Andrew Morton @ 2009-05-13 19:34 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-pm, fengguang.wu, linux-kernel, pavel, nigel, rientjes, linux-mm

On Wed, 13 May 2009 10:39:25 +0200
"Rafael J. Wysocki" <rjw@sisk.pl> wrote:

> From: Rafael J. Wysocki <rjw@sisk.pl>
> 
> Rework swsusp_shrink_memory() so that it calls shrink_all_memory()
> just once to make some room for the image and then allocates memory
> to apply more pressure to the memory management subsystem, if
> necessary.
> 
> Unfortunately, we don't seem to be able to drop shrink_all_memory()
> entirely just yet, because that would lead to huge performance
> regressions in some test cases.
> 

Isn't this a somewhat large problem?  The main point (I thought) was
to remove shrink_all_memory().  Instead, we're retaining it and adding
even more stuff?

> +/**
> + * compute_fraction - Compute approximate fraction x * (a/b)
> + * @x: Number to multiply.
> + * @numerator: Numerator of the fraction (a).
> + * @denominator: Denominator of the fraction (b).
>   *
> - *	Notice: all userland should be stopped before it is called, or
> - *	livelock is possible.
> + * Compute an approximate value of the expression x * (a/b), where a is less
> + * than b, all x, a, b are unsigned longs and x * a may be greater than the
> + * maximum unsigned long.
>   */
> +static unsigned long compute_fraction(
> +	unsigned long x, unsigned long numerator, unsigned long denominator)

I can't say I'm a great fan of the code layout here.

static unsigned long compute_fraction(unsigned long x, unsigned long numerator, unsigned long denominator)

or

static unsigned long compute_fraction(unsigned long x, unsigned long numerator,
					unsigned long denominator)

would be more typical.


> +{
> +	unsigned long ratio = (numerator << FRACTION_SHIFT) / denominator;
>  
> -#define SHRINK_BITE	10000
> -static inline unsigned long __shrink_memory(long tmp)
> +	x *= ratio;
> +	return x >> FRACTION_SHIFT;
> +}

Strange function.  Would it not be simpler/clearer to do it with 64-bit
scalars, multiplication and do_div()?
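
Something along these lines, perhaps (untested sketch of the suggestion
above; the names and placement are assumptions, not part of the patch):

	static unsigned long compute_fraction(unsigned long x,
					      unsigned long numerator,
					      unsigned long denominator)
	{
		u64 tmp = (u64)x * numerator;

		/* do_div() takes a 32-bit divisor; page counts fit easily */
		do_div(tmp, denominator);
		return (unsigned long)tmp;
	}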

> +static unsigned long highmem_size(
> +	unsigned long size, unsigned long highmem, unsigned long count)
> +{
> +	return highmem > count / 2 ?
> +			compute_fraction(size, highmem, count) :
> +			size - compute_fraction(size, count - highmem, count);
> +}

This would be considerably easier to follow if we know what the three
arguments represent.  Amount of memory?  In what units?  `count' of
what?

The `count/2' thing there is quite mysterious.

<does some reverse-engineering>

OK, `count' is "the number of pageframes we can use".  (I don't think I
helped myself a lot there).  But what's up with that divide-by-two?

<considers poking at callers to work out what `size' is>

<gives up>

Is this code as clear as we can possibly make it??
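
(For what it is worth, arithmetically the two branches seem to come down to
the same value, since

	size - size * (count - highmem) / count = size * highmem / count;

the count/2 test presumably only picks which of the two complementary
fractions gets handed to compute_fraction().)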

> +#else
> +static inline unsigned long preallocate_image_highmem(unsigned long nr_pages)
> +{
> +	return 0;
> +}
> +
> +static inline unsigned long highmem_size(
> +	unsigned long size, unsigned long highmem, unsigned long count)
>  {
> -	if (tmp > SHRINK_BITE)
> -		tmp = SHRINK_BITE;
> -	return shrink_all_memory(tmp);
> +	return 0;
>  }
> +#endif /* CONFIG_HIGHMEM */
>  
> +/**
> + * swsusp_shrink_memory -  Make the kernel release as much memory as needed
> + *
> + * To create a hibernation image it is necessary to make a copy of every page
> + * frame in use.  We also need a number of page frames to be free during
> + * hibernation for allocations made while saving the image and for device
> + * drivers, in case they need to allocate memory from their hibernation
> + * callbacks (these two numbers are given by PAGES_FOR_IO and SPARE_PAGES,
> + * respectively, both of which are rough estimates).  To make this happen, we
> + * compute the total number of available page frames and allocate at least
> + *
> + * ([page frames total] + PAGES_FOR_IO + [metadata pages]) / 2 + 2 * SPARE_PAGES
> + *
> + * of them, which corresponds to the maximum size of a hibernation image.
> + *
> + * If image_size is set below the number following from the above formula,
> + * the preallocation of memory is continued until the total number of saveable
> + * pages in the system is below the requested image size or it is impossible to
> + * allocate more memory, whichever happens first.
> + */

OK, that helps.
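
(To get a rough feel for the formula with purely hypothetical numbers, say
262144 total page frames, PAGES_FOR_IO = 1024, 512 metadata pages and
SPARE_PAGES = 1024: (262144 + 1024 + 512) / 2 + 2 * 1024 = 133888 page
frames, i.e. slightly more than half of the page frames get preallocated,
capping the image at roughly half of RAM.)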



^ permalink raw reply	[flat|nested] 205+ messages in thread

* Re: [PATCH 4/6] PM/Hibernate: Rework shrinking of memory
  2009-05-13 19:34           ` Andrew Morton
@ 2009-05-13 20:55             ` Rafael J. Wysocki
  -1 siblings, 0 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-13 20:55 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-pm, fengguang.wu, linux-kernel, pavel, nigel, rientjes, linux-mm

On Wednesday 13 May 2009, Andrew Morton wrote:
> On Wed, 13 May 2009 10:39:25 +0200
> "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> 
> > From: Rafael J. Wysocki <rjw@sisk.pl>
> > 
> > Rework swsusp_shrink_memory() so that it calls shrink_all_memory()
> > just once to make some room for the image and then allocates memory
> > to apply more pressure to the memory management subsystem, if
> > necessary.
> > 
> > Unfortunately, we don't seem to be able to drop shrink_all_memory()
> > entirely just yet, because that would lead to huge performance
> > regressions in some test cases.
> > 
> 
> Isn't this a somewhat large problem?

Yes, it is.  The thing is 8 times slower (15 s vs 2 s) without the
shrink_all_memory() in at least one test case.  100% reproducible.

> The main point (I thought) was to remove shrink_all_memory().  Instead,
> we're retaining it and adding even more stuff?

The idea is that afterwards we can drop shrink_all_memory() once the
performance problem has been resolved.  Also, we now allocate memory for the
image using GFP_KERNEL instead of doing it with GFP_ATOMIC after freezing
devices.  I'd think that's an improvement?

> > +/**
> > + * compute_fraction - Compute approximate fraction x * (a/b)
> > + * @x: Number to multiply.
> > + * @numerator: Numerator of the fraction (a).
> > + * @denominator: Denominator of the fraction (b).
> >   *
> > - *	Notice: all userland should be stopped before it is called, or
> > - *	livelock is possible.
> > + * Compute an approximate value of the expression x * (a/b), where a is less
> > + * than b, all x, a, b are unsigned longs and x * a may be greater than the
> > + * maximum unsigned long.
> >   */
> > +static unsigned long compute_fraction(
> > +	unsigned long x, unsigned long numerator, unsigned long denominator)
> 
> I can't say I'm a great fan of the code layout here.
> 
> static unsigned long compute_fraction(unsigned long x, unsigned long numerator, unsigned long denominator)
> 
> or
> 
> static unsigned long compute_fraction(unsigned long x, unsigned long numerator,
> 					unsigned long denominator)
> 
> would be more typical.

OK
 
> > +{
> > +	unsigned long ratio = (numerator << FRACTION_SHIFT) / denominator;
> >  
> > -#define SHRINK_BITE	10000
> > -static inline unsigned long __shrink_memory(long tmp)
> > +	x *= ratio;
> > +	return x >> FRACTION_SHIFT;
> > +}
> 
> Strange function.  Would it not be simpler/clearer to do it with 64-bit
> scalars, multiplication and do_div()?

Sure, I can do it this way too.  Is it fine to use u64 for this purpose?

> > +static unsigned long highmem_size(
> > +	unsigned long size, unsigned long highmem, unsigned long count)
> > +{
> > +	return highmem > count / 2 ?
> > +			compute_fraction(size, highmem, count) :
> > +			size - compute_fraction(size, count - highmem, count);
> > +}
> 
> This would be considerably easier to follow if we know what the three
> arguments represent.  Amount of memory?  In what units?  `count' of
> what?
> 
> The `count/2' thing there is quite mysterious.
> 
> <does some reverse-engineering>
> 
> OK, `count' is "the number of pageframes we can use".  (I don't think I
> helped myself a lot there).  But what's up with that divde-by-two?
> 
> <considers poking at callers to work out what `size' is>
> 
> <gives up>
> 
> Is this code as clear as we can possibly make it??

Heh

OK, I'll do my best to clean it up.

> > +#else
> > +static inline unsigned long preallocate_image_highmem(unsigned long nr_pages)
> > +{
> > +	return 0;
> > +}
> > +
> > +static inline unsigned long highmem_size(
> > +	unsigned long size, unsigned long highmem, unsigned long count)
> >  {
> > -	if (tmp > SHRINK_BITE)
> > -		tmp = SHRINK_BITE;
> > -	return shrink_all_memory(tmp);
> > +	return 0;
> >  }
> > +#endif /* CONFIG_HIGHMEM */
> >  
> > +/**
> > + * swsusp_shrink_memory -  Make the kernel release as much memory as needed
> > + *
> > + * To create a hibernation image it is necessary to make a copy of every page
> > + * frame in use.  We also need a number of page frames to be free during
> > + * hibernation for allocations made while saving the image and for device
> > + * drivers, in case they need to allocate memory from their hibernation
> > + * callbacks (these two numbers are given by PAGES_FOR_IO and SPARE_PAGES,
> > + * respectively, both of which are rough estimates).  To make this happen, we
> > + * compute the total number of available page frames and allocate at least
> > + *
> > + * ([page frames total] + PAGES_FOR_IO + [metadata pages]) / 2 + 2 * SPARE_PAGES
> > + *
> > + * of them, which corresponds to the maximum size of a hibernation image.
> > + *
> > + * If image_size is set below the number following from the above formula,
> > + * the preallocation of memory is continued until the total number of saveable
> > + * pages in the system is below the requested image size or it is impossible to
> > + * allocate more memory, whichever happens first.
> > + */
> 
> OK, that helps.

Great!

Thanks for the comments. :-)

^ permalink raw reply	[flat|nested] 205+ messages in thread

* Re: [PATCH 4/6] PM/Hibernate: Rework shrinking of memory
  2009-05-13 20:55             ` Rafael J. Wysocki
@ 2009-05-13 21:16               ` Andrew Morton
  -1 siblings, 0 replies; 205+ messages in thread
From: Andrew Morton @ 2009-05-13 21:16 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-pm, fengguang.wu, linux-kernel, pavel, nigel, rientjes, linux-mm

On Wed, 13 May 2009 22:55:03 +0200
"Rafael J. Wysocki" <rjw@sisk.pl> wrote:

> On Wednesday 13 May 2009, Andrew Morton wrote:
> > On Wed, 13 May 2009 10:39:25 +0200
> > "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> > 
> > > From: Rafael J. Wysocki <rjw@sisk.pl>
> > > 
> > > Rework swsusp_shrink_memory() so that it calls shrink_all_memory()
> > > just once to make some room for the image and then allocates memory
> > > to apply more pressure to the memory management subsystem, if
> > > necessary.
> > > 
> > > Unfortunately, we don't seem to be able to drop shrink_all_memory()
> > > entirely just yet, because that would lead to huge performance
> > > regressions in some test cases.
> > > 
> > 
> > Isn't this a somewhat large problem?
> 
> Yes, it is.  The thing is 8 times slower (15 s vs 2 s) without the
> shrink_all_memory() in at least one test case.  100% reproducible.

erk.  Any ideas why?  A quick peek at a kernel profile and perhaps
the before-and-after delta in the /proc/vmstat numbers would probably
guide us there.

> > The main point (I thought) was to remove shrink_all_memory().  Instead,
> > we're retaining it and adding even more stuff?
> 
> The idea is that afterwards we can drop shrink_all_memory() once the
> performance problem has been resolved.  Also, we now allocate memory for the
> image using GFP_KERNEL instead of doing it with GFP_ATOMIC after freezing
> devices.  I'd think that's an improvement?

Dunno.  GFP_KERNEL might attempt to do writeback/swapout/etc, which
could be embarrassing if the devices are frozen.  GFP_NOIO sounds
appropriate.  
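
(To make that concrete, a hedged sketch of a preallocation step along those
lines; the helper name is made up and this is not taken from the posted
patch:

#include <linux/gfp.h>

static struct page *preallocate_image_page(void)
{
	/*
	 * GFP_NOIO keeps reclaim from doing writeback or swapout;
	 * __GFP_NORETRY and __GFP_NOWARN make failures cheap and quiet.
	 */
	return alloc_page(GFP_NOIO | __GFP_NORETRY | __GFP_NOWARN);
}

so the allocator itself never starts I/O on its own.)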

> > > +{
> > > +	unsigned long ratio = (numerator << FRACTION_SHIFT) / denominator;
> > >  
> > > -#define SHRINK_BITE	10000
> > > -static inline unsigned long __shrink_memory(long tmp)
> > > +	x *= ratio;
> > > +	return x >> FRACTION_SHIFT;
> > > +}
> > 
> > Strange function.  Would it not be simpler/clearer to do it with 64-bit
> > scalars, multiplication and do_div()?
> 
> Sure, I can do it this way too.  Is it fine to use u64 for this purpose?

I suppose so.  All/most of the implementations of do_div() are done as
macros so it's pretty hard to work out what the types are.  But
do_div() does expect a u64 rather than `unsigned long long'.



^ permalink raw reply	[flat|nested] 205+ messages in thread

* Re: [PATCH 4/6] PM/Hibernate: Rework shrinking of memory
  2009-05-13 21:16               ` Andrew Morton
@ 2009-05-13 21:56                 ` Rafael J. Wysocki
  -1 siblings, 0 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-13 21:56 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-pm, fengguang.wu, linux-kernel, pavel, nigel, rientjes, linux-mm

On Wednesday 13 May 2009, Andrew Morton wrote:
> On Wed, 13 May 2009 22:55:03 +0200
> "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> 
> > On Wednesday 13 May 2009, Andrew Morton wrote:
> > > On Wed, 13 May 2009 10:39:25 +0200
> > > "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> > > 
> > > > From: Rafael J. Wysocki <rjw@sisk.pl>
> > > > 
> > > > Rework swsusp_shrink_memory() so that it calls shrink_all_memory()
> > > > just once to make some room for the image and then allocates memory
> > > > to apply more pressure to the memory management subsystem, if
> > > > necessary.
> > > > 
> > > > Unfortunately, we don't seem to be able to drop shrink_all_memory()
> > > > entirely just yet, because that would lead to huge performance
> > > > regressions in some test cases.
> > > > 
> > > 
> > > Isn't this a somewhat large problem?
> > 
> > Yes, it is.  The thing is 8 times slower (15 s vs 2 s) without the
> > shrink_all_memory() in at least one test case.  100% reproducible.
> 
> erk.  Any ideas why?

Swapping things out appears to be too slow.  Actually, no wonder, as it is
done one page at a time, whereas shrink_all_memory() appears to swap them
out in big chunks.

> A quick peek at a kernel profile and perhaps the before-and-after delta in
> the /proc/vmstat numbers would probably guide us there.

I'm planning to do some investigation on that later.

> > > The main point (I thought) was to remove shrink_all_memory().  Instead,
> > > we're retaining it and adding even more stuff?
> > 
> > The idea is that afterwards we can drop shrink_all_memory() once the
> > performance problem has been resolved.  Also, we now allocate memory for the
> > image using GFP_KERNEL instead of doing it with GFP_ATOMIC after freezing
> > devices.  I'd think that's an improvement?
> 
> Dunno.  GFP_KERNEL might attempt to do writeback/swapout/etc, which
> could be embarrassing if the devices are frozen.

They aren't, because the preallocation is done upfront, so once the OOM killer
has been taken care of, it's totally safe. :-)

^ permalink raw reply	[flat|nested] 205+ messages in thread

* Re: [PATCH 3/6] mm, PM/Freezer: Disable OOM killer when tasks are frozen
  2009-05-13  8:37         ` Rafael J. Wysocki
@ 2009-05-13 22:35           ` David Rientjes
  -1 siblings, 0 replies; 205+ messages in thread
From: David Rientjes @ 2009-05-13 22:35 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: pm list, Wu Fengguang, Andrew Morton, LKML, Pavel Machek,
	Nigel Cunningham, linux-mm

On Wed, 13 May 2009, Rafael J. Wysocki wrote:

> Index: linux-2.6/mm/page_alloc.c
> ===================================================================
> --- linux-2.6.orig/mm/page_alloc.c
> +++ linux-2.6/mm/page_alloc.c
> @@ -175,6 +175,8 @@ static void set_pageblock_migratetype(st
>  					PB_migrate, PB_migrate_end);
>  }
>  
> +bool oom_killer_disabled __read_mostly;
> +
>  #ifdef CONFIG_DEBUG_VM
>  static int page_outside_zone_boundaries(struct zone *zone, struct page *page)
>  {
> @@ -1600,6 +1602,9 @@ nofail_alloc:
>  		if (page)
>  			goto got_pg;
>  	} else if ((gfp_mask & __GFP_FS) && !(gfp_mask & __GFP_NORETRY)) {
> +		if (oom_killer_disabled)
> +			goto nopage;
> +
>  		if (!try_set_zone_oom(zonelist, gfp_mask)) {
>  			schedule_timeout_uninterruptible(1);
>  			goto restart;

This allows __GFP_NOFAIL allocations to fail.
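
(A sketch of one conceivable way to keep the __GFP_NOFAIL guarantee intact
while the OOM killer is disabled; this was not posted in the thread and
assumes the oom_killer_disabled flag from the patch:

static inline bool oom_killer_may_be_skipped(gfp_t gfp_mask)
{
	/* only give up when the caller can actually cope with failure */
	return oom_killer_disabled && !(gfp_mask & __GFP_NOFAIL);
}

with the new check in __alloc_pages_internal() testing this helper instead
of oom_killer_disabled alone.)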

^ permalink raw reply	[flat|nested] 205+ messages in thread

* Re: [PATCH 3/6] mm, PM/Freezer: Disable OOM killer when tasks are frozen
  2009-05-13 22:35           ` David Rientjes
@ 2009-05-13 22:47             ` Andrew Morton
  -1 siblings, 0 replies; 205+ messages in thread
From: Andrew Morton @ 2009-05-13 22:47 UTC (permalink / raw)
  To: David Rientjes
  Cc: rjw, linux-pm, fengguang.wu, linux-kernel, pavel, nigel, linux-mm

On Wed, 13 May 2009 15:35:32 -0700 (PDT)
David Rientjes <rientjes@google.com> wrote:

> On Wed, 13 May 2009, Rafael J. Wysocki wrote:
> 
> > Index: linux-2.6/mm/page_alloc.c
> > ===================================================================
> > --- linux-2.6.orig/mm/page_alloc.c
> > +++ linux-2.6/mm/page_alloc.c
> > @@ -175,6 +175,8 @@ static void set_pageblock_migratetype(st
> >  					PB_migrate, PB_migrate_end);
> >  }
> >  
> > +bool oom_killer_disabled __read_mostly;
> > +
> >  #ifdef CONFIG_DEBUG_VM
> >  static int page_outside_zone_boundaries(struct zone *zone, struct page *page)
> >  {
> > @@ -1600,6 +1602,9 @@ nofail_alloc:
> >  		if (page)
> >  			goto got_pg;
> >  	} else if ((gfp_mask & __GFP_FS) && !(gfp_mask & __GFP_NORETRY)) {
> > +		if (oom_killer_disabled)
> > +			goto nopage;
> > +
> >  		if (!try_set_zone_oom(zonelist, gfp_mask)) {
> >  			schedule_timeout_uninterruptible(1);
> >  			goto restart;
> 
> This allows __GFP_NOFAIL allocations to fail.

I think that's OK - oom_killer_disable() and __GFP_NOFAIL are
fundamentally incompatible, and __GFP_NOFAIL is a crock.


^ permalink raw reply	[flat|nested] 205+ messages in thread

* Re: [PATCH 3/6] mm, PM/Freezer: Disable OOM killer when tasks are frozen
  2009-05-13 22:47             ` Andrew Morton
@ 2009-05-13 23:01               ` David Rientjes
  -1 siblings, 0 replies; 205+ messages in thread
From: David Rientjes @ 2009-05-13 23:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: rjw, linux-pm, fengguang.wu, linux-kernel, pavel, nigel, linux-mm

On Wed, 13 May 2009, Andrew Morton wrote:

> > This allows __GFP_NOFAIL allocations to fail.
> 
> I think that's OK - oom_killer_disable() and __GFP_NOFAIL are
> fundamentally incompatible, and __GFP_NOFAIL is a crock.
> 

Ok, so we need some documentation of that or some notification that we're 
allowing an allocation to fail that has been specified to "retry 
infinitely [because] the caller cannot handle allocation failures."

^ permalink raw reply	[flat|nested] 205+ messages in thread

* Re: [PATCH 4/6] PM/Hibernate: Rework shrinking of memory
  2009-05-13 21:56                 ` Rafael J. Wysocki
@ 2009-05-14  9:40                   ` Pavel Machek
  -1 siblings, 0 replies; 205+ messages in thread
From: Pavel Machek @ 2009-05-14  9:40 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Andrew Morton, linux-pm, fengguang.wu, linux-kernel, nigel,
	rientjes, linux-mm


> > > > The main point (I thought) was to remove shrink_all_memory().  Instead,
> > > > we're retaining it and adding even more stuff?
> > > 
> > > The idea is that afterwards we can drop shrink_all_memory() once the
> > > performance problem has been resolved.  Also, we now allocate memory for the
> > > image using GFP_KERNEL instead of doing it with GFP_ATOMIC after freezing
> > > devices.  I'd think that's an improvement?
> > 
> > Dunno.  GFP_KERNEL might attempt to do writeback/swapout/etc, which
> > could be embarrassing if the devices are frozen.
> 
> They aren't, because the preallocation is done upfront, so once the OOM killer
> has been taken care of, it's totally safe. :-)

As is GFP_ATOMIC. Except that GFP_KERNEL will cause catastrophic
consequences when accounting goes wrong. (New kernel's idea of what is
on disk will differ from what is _really_ on disk.)

If accounting is right, GFP_ATOMIC and GFP_KERNEL are equivalent.

If accounting is wrong, GFP_ATOMIC will fail with NULL, while
GFP_KERNEL will do something bad.

I'd keep GFP_ATOMIC (or GFP_NOIO or similar). 

								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 205+ messages in thread

* Re: [PATCH 5/6] PM/Hibernate: Do not release preallocated memory unnecessarily (rev. 2)
  2009-05-13  8:40         ` Rafael J. Wysocki
@ 2009-05-14 11:09           ` Pavel Machek
  -1 siblings, 0 replies; 205+ messages in thread
From: Pavel Machek @ 2009-05-14 11:09 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: pm list, Wu Fengguang, Andrew Morton, LKML, Nigel Cunningham,
	David Rientjes, linux-mm

Hi!

> Since the hibernation code is now going to use allocations of memory
> to make enough room for the image, it can also use the page frames
> allocated at this stage as image page frames.  The low-level
> hibernation code needs to be rearranged for this purpose, but it
> allows us to avoid freeing a great number of pages and allocating
> these same pages once again later, so it generally is worth doing.
> 
> [rev. 2: Take highmem into account correctly.]

I don't get it. What is the advantage of this patch? It makes the code
more complex... Is it supposed to be faster?

								Pavel

> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> ---
>  kernel/power/disk.c     |   15 ++-
>  kernel/power/power.h    |    2 
>  kernel/power/snapshot.c |  206 +++++++++++++++++++++++++++++++-----------------
>  3 files changed, 149 insertions(+), 74 deletions(-)
> 
> Index: linux-2.6/kernel/power/snapshot.c
> ===================================================================
> --- linux-2.6.orig/kernel/power/snapshot.c
> +++ linux-2.6/kernel/power/snapshot.c
> @@ -1033,6 +1033,25 @@ copy_data_pages(struct memory_bitmap *co
>  static unsigned int nr_copy_pages;
>  /* Number of pages needed for saving the original pfns of the image pages */
>  static unsigned int nr_meta_pages;
> +/*
> + * Numbers of normal and highmem page frames allocated for hibernation image
> + * before suspending devices.
> + */
> +unsigned int alloc_normal, alloc_highmem;
> +/*
> + * Memory bitmap used for marking saveable pages (during hibernation) or
> + * hibernation image pages (during restore)
> + */
> +static struct memory_bitmap orig_bm;
> +/*
> + * Memory bitmap used during hibernation for marking allocated page frames that
> + * will contain copies of saveable pages.  During restore it is initially used
> + * for marking hibernation image pages, but then the set bits from it are
> + * duplicated in @orig_bm and it is released.  On highmem systems it is next
> + * used for marking "safe" highmem pages, but it has to be reinitialized for
> + * this purpose.
> + */
> +static struct memory_bitmap copy_bm;
>  
>  /**
>   *	swsusp_free - free pages allocated for the suspend.
> @@ -1064,6 +1083,8 @@ void swsusp_free(void)
>  	nr_meta_pages = 0;
>  	restore_pblist = NULL;
>  	buffer = NULL;
> +	alloc_normal = 0;
> +	alloc_highmem = 0;
>  }
>  
>  /* Helper functions used for the shrinking of memory. */
> @@ -1082,8 +1103,16 @@ static unsigned long preallocate_image_p
>  	unsigned long nr_alloc = 0;
>  
>  	while (nr_pages > 0) {
> -		if (!alloc_image_page(mask))
> -			break;
> + 		struct page *page;
> +
> +		page = alloc_image_page(mask);
> + 		if (!page)
> + 			break;
> +		memory_bm_set_bit(&copy_bm, page_to_pfn(page));
> +		if (PageHighMem(page))
> +			alloc_highmem++;
> +		else
> +			alloc_normal++;
>  		nr_pages--;
>  		nr_alloc++;
>  	}
> @@ -1144,7 +1173,47 @@ static inline unsigned long highmem_size
>  #endif /* CONFIG_HIGHMEM */
>  
>  /**
> - * swsusp_shrink_memory -  Make the kernel release as much memory as needed
> + * free_unnecessary_pages - Release preallocated pages not needed for the image
> + */
> +static void free_unnecessary_pages(void)
> +{
> +	unsigned long save_highmem, to_free_normal, to_free_highmem;
> +
> +	to_free_normal = alloc_normal - count_data_pages();
> +	save_highmem = count_highmem_pages();
> +	if (alloc_highmem > save_highmem) {
> +		to_free_highmem = alloc_highmem - save_highmem;
> +	} else {
> +		to_free_highmem = 0;
> +		to_free_normal -= save_highmem - alloc_highmem;
> +	}
> +
> +	memory_bm_position_reset(&copy_bm);
> +
> +	while (to_free_normal > 0 && to_free_highmem > 0) {
> +		unsigned long pfn = memory_bm_next_pfn(&copy_bm);
> +		struct page *page = pfn_to_page(pfn);
> +
> +		if (PageHighMem(page)) {
> +			if (!to_free_highmem)
> +				continue;
> +			to_free_highmem--;
> +			alloc_highmem--;
> +		} else {
> +			if (!to_free_normal)
> +				continue;
> +			to_free_normal--;
> +			alloc_normal--;
> +		}
> +		memory_bm_clear_bit(&copy_bm, pfn);
> +		swsusp_unset_page_forbidden(page);
> +		swsusp_unset_page_free(page);
> +		__free_page(page);
> +	}
> +}
> +
> +/**
> + * hibernate_preallocate_memory - Preallocate memory for hibernation image
>   *
>   * To create a hibernation image it is necessary to make a copy of every page
>   * frame in use.  We also need a number of page frames to be free during
> @@ -1163,19 +1232,30 @@ static inline unsigned long highmem_size
>   * pages in the system is below the requested image size or it is impossible to
>   * allocate more memory, whichever happens first.
>   */
> -int swsusp_shrink_memory(void)
> +int hibernate_preallocate_memory(void)
>  {
>  	struct zone *zone;
>  	unsigned long saveable, size, max_size, count, highmem, pages = 0;
> -	unsigned long alloc, pages_highmem;
> +	unsigned long alloc, save_highmem, pages_highmem;
>  	struct timeval start, stop;
> -	int error = 0;
> +	int error;
>  
> -	printk(KERN_INFO "PM: Shrinking memory... ");
> +	printk(KERN_INFO "PM: Preallocating image memory... ");
>  	do_gettimeofday(&start);
>  
> +	error = memory_bm_create(&orig_bm, GFP_IMAGE, PG_ANY);
> +	if (error)
> +		goto err_out;
> +
> +	error = memory_bm_create(&copy_bm, GFP_IMAGE, PG_ANY);
> +	if (error)
> +		goto err_out;
> +
> +	alloc_normal = 0;
> +	alloc_highmem = 0;
> +
>  	/* Count the number of saveable data pages. */
> -	highmem = count_highmem_pages();
> +	save_highmem = count_highmem_pages();
>  	saveable = count_data_pages();
>  
>  	/*
> @@ -1183,7 +1263,8 @@ int swsusp_shrink_memory(void)
>  	 * number of pages needed for image metadata (size).
>  	 */
>  	count = saveable;
> -	saveable += highmem;
> +	saveable += save_highmem;
> +	highmem = save_highmem;
>  	size = 0;
>  	for_each_populated_zone(zone) {
>  		size += snapshot_additional_pages(zone);
> @@ -1202,10 +1283,13 @@ int swsusp_shrink_memory(void)
>  		size = max_size;
>  	/*
>  	 * If the maximum is not less than the current number of saveable pages
> -	 * in memory, we don't need to do anything more.
> +	 * in memory, allocate page frames for the image and we're done.
>  	 */
> -	if (size >= saveable)
> +	if (size >= saveable) {
> +		pages = preallocate_image_highmem(save_highmem);
> +		pages += preallocate_image_memory(saveable - pages);
>  		goto out;
> +	}
>  
>  	/*
>  	 * Let the memory management subsystem know that we're going to need a
> @@ -1226,10 +1310,8 @@ int swsusp_shrink_memory(void)
>  	max_size += pages_highmem;
>  	alloc = count - max_size;
>  	pages = preallocate_image_memory(alloc);
> -	if (pages < alloc) {
> -		error = -ENOMEM;
> -		goto free_out;
> -	}
> +	if (pages < alloc)
> +		goto err_out;
>  	size = max_size - size;
>  	alloc = size;
>  	size = preallocate_image_highmem(highmem_size(size, highmem, count));
> @@ -1238,21 +1320,24 @@ int swsusp_shrink_memory(void)
>  	pages += preallocate_image_memory(alloc);
>  	pages += pages_highmem;
>  
> - free_out:
> -	/* Release all of the preallocated page frames. */
> -	swsusp_free();
> -
> -	if (error) {
> -		printk(KERN_CONT "\n");
> -		return error;
> -	}
> +	/*
> +	 * We only need as many page frames for the image as there are saveable
> +	 * pages in memory, but we have allocated more.  Release the excessive
> +	 * ones now.
> +	 */
> +	free_unnecessary_pages();
>  
>   out:
>  	do_gettimeofday(&stop);
> -	printk(KERN_CONT "done (preallocated %lu free pages)\n", pages);
> -	swsusp_show_speed(&start, &stop, pages, "Freed");
> +	printk(KERN_CONT "done (allocated %lu pages)\n", pages);
> +	swsusp_show_speed(&start, &stop, pages, "Allocated");
>  
>  	return 0;
> +
> + err_out:
> +	printk(KERN_CONT "\n");
> +	swsusp_free();
> +	return -ENOMEM;
>  }
>  
>  #ifdef CONFIG_HIGHMEM
> @@ -1263,7 +1348,7 @@ int swsusp_shrink_memory(void)
>  
>  static unsigned int count_pages_for_highmem(unsigned int nr_highmem)
>  {
> -	unsigned int free_highmem = count_free_highmem_pages();
> +	unsigned int free_highmem = count_free_highmem_pages() + alloc_highmem;
>  
>  	if (free_highmem >= nr_highmem)
>  		nr_highmem = 0;
> @@ -1285,19 +1370,17 @@ count_pages_for_highmem(unsigned int nr_
>  static int enough_free_mem(unsigned int nr_pages, unsigned int nr_highmem)
>  {
>  	struct zone *zone;
> -	unsigned int free = 0, meta = 0;
> +	unsigned int free = alloc_normal;
>  
> -	for_each_zone(zone) {
> -		meta += snapshot_additional_pages(zone);
> +	for_each_zone(zone)
>  		if (!is_highmem(zone))
>  			free += zone_page_state(zone, NR_FREE_PAGES);
> -	}
>  
>  	nr_pages += count_pages_for_highmem(nr_highmem);
> -	pr_debug("PM: Normal pages needed: %u + %u + %u, available pages: %u\n",
> -		nr_pages, PAGES_FOR_IO, meta, free);
> +	pr_debug("PM: Normal pages needed: %u + %u, available pages: %u\n",
> +		nr_pages, PAGES_FOR_IO, free);
>  
> -	return free > nr_pages + PAGES_FOR_IO + meta;
> +	return free > nr_pages + PAGES_FOR_IO;
>  }
>  
>  #ifdef CONFIG_HIGHMEM
> @@ -1319,7 +1402,7 @@ static inline int get_highmem_buffer(int
>   */
>  
>  static inline unsigned int
> -alloc_highmem_image_pages(struct memory_bitmap *bm, unsigned int nr_highmem)
> +alloc_highmem_pages(struct memory_bitmap *bm, unsigned int nr_highmem)
>  {
>  	unsigned int to_alloc = count_free_highmem_pages();
>  
> @@ -1339,7 +1422,7 @@ alloc_highmem_image_pages(struct memory_
>  static inline int get_highmem_buffer(int safe_needed) { return 0; }
>  
>  static inline unsigned int
> -alloc_highmem_image_pages(struct memory_bitmap *bm, unsigned int n) { return 0; }
> +alloc_highmem_pages(struct memory_bitmap *bm, unsigned int n) { return 0; }
>  #endif /* CONFIG_HIGHMEM */
>  
>  /**
> @@ -1358,51 +1441,36 @@ static int
>  swsusp_alloc(struct memory_bitmap *orig_bm, struct memory_bitmap *copy_bm,
>  		unsigned int nr_pages, unsigned int nr_highmem)
>  {
> -	int error;
> -
> -	error = memory_bm_create(orig_bm, GFP_ATOMIC | __GFP_COLD, PG_ANY);
> -	if (error)
> -		goto Free;
> -
> -	error = memory_bm_create(copy_bm, GFP_ATOMIC | __GFP_COLD, PG_ANY);
> -	if (error)
> -		goto Free;
> +	int error = 0;
>  
>  	if (nr_highmem > 0) {
>  		error = get_highmem_buffer(PG_ANY);
>  		if (error)
> -			goto Free;
> -
> -		nr_pages += alloc_highmem_image_pages(copy_bm, nr_highmem);
> +			goto err_out;
> +		if (nr_highmem > alloc_highmem) {
> +			nr_highmem -= alloc_highmem;
> +			nr_pages += alloc_highmem_pages(copy_bm, nr_highmem);
> +		}
>  	}
> -	while (nr_pages-- > 0) {
> -		struct page *page = alloc_image_page(GFP_ATOMIC | __GFP_COLD);
> -
> -		if (!page)
> -			goto Free;
> -
> -		memory_bm_set_bit(copy_bm, page_to_pfn(page));
> +	if (nr_pages > alloc_normal) {
> +		nr_pages -= alloc_normal;
> +		while (nr_pages-- > 0) {
> +			struct page *page;
> +
> +			page = alloc_image_page(GFP_ATOMIC | __GFP_COLD);
> +			if (!page)
> +				goto err_out;
> +			memory_bm_set_bit(copy_bm, page_to_pfn(page));
> +		}
>  	}
> +
>  	return 0;
>  
> - Free:
> + err_out:
>  	swsusp_free();
> -	return -ENOMEM;
> +	return error;
>  }
>  
> -/* Memory bitmap used for marking saveable pages (during suspend) or the
> - * suspend image pages (during resume)
> - */
> -static struct memory_bitmap orig_bm;
> -/* Memory bitmap used on suspend for marking allocated pages that will contain
> - * the copies of saveable pages.  During resume it is initially used for
> - * marking the suspend image pages, but then its set bits are duplicated in
> - * @orig_bm and it is released.  Next, on systems with high memory, it may be
> - * used for marking "safe" highmem pages, but it has to be reinitialized for
> - * this purpose.
> - */
> -static struct memory_bitmap copy_bm;
> -
>  asmlinkage int swsusp_save(void)
>  {
>  	unsigned int nr_pages, nr_highmem;
> Index: linux-2.6/kernel/power/power.h
> ===================================================================
> --- linux-2.6.orig/kernel/power/power.h
> +++ linux-2.6/kernel/power/power.h
> @@ -74,7 +74,7 @@ extern asmlinkage int swsusp_arch_resume
>  
>  extern int create_basic_memory_bitmaps(void);
>  extern void free_basic_memory_bitmaps(void);
> -extern int swsusp_shrink_memory(void);
> +extern int hibernate_preallocate_memory(void);
>  
>  /**
>   *	Auxiliary structure used for reading the snapshot image data and
> Index: linux-2.6/kernel/power/disk.c
> ===================================================================
> --- linux-2.6.orig/kernel/power/disk.c
> +++ linux-2.6/kernel/power/disk.c
> @@ -303,8 +303,8 @@ int hibernation_snapshot(int platform_mo
>  	if (error)
>  		return error;
>  
> -	/* Free memory before shutting down devices. */
> -	error = swsusp_shrink_memory();
> +	/* Preallocate image memory before shutting down devices. */
> +	error = hibernate_preallocate_memory();
>  	if (error)
>  		goto Close;
>  
> @@ -320,6 +320,10 @@ int hibernation_snapshot(int platform_mo
>  	/* Control returns here after successful restore */
>  
>   Resume_devices:
> +	/* We may need to release the preallocated image pages here. */
> +	if (error || !in_suspend)
> +		swsusp_free();
> +
>  	device_resume(in_suspend ?
>  		(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);
>  	resume_console();
> @@ -593,7 +597,10 @@ int hibernate(void)
>  		goto Thaw;
>  
>  	error = hibernation_snapshot(hibernation_mode == HIBERNATION_PLATFORM);
> -	if (in_suspend && !error) {
> +	if (error)
> +		goto Thaw;
> +
> +	if (in_suspend) {
>  		unsigned int flags = 0;
>  
>  		if (hibernation_mode == HIBERNATION_PLATFORM)
> @@ -605,8 +612,8 @@ int hibernate(void)
>  			power_down();
>  	} else {
>  		pr_debug("PM: Image restored successfully.\n");
> -		swsusp_free();
>  	}
> +
>   Thaw:
>  	thaw_processes();
>   Finish:

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 205+ messages in thread

* Re: [RFC][PATCH 6/6] PM/Hibernate: Do not try to allocate too much memory too hard
  2009-05-13  8:42         ` Rafael J. Wysocki
@ 2009-05-14 11:14           ` Pavel Machek
  -1 siblings, 0 replies; 205+ messages in thread
From: Pavel Machek @ 2009-05-14 11:14 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: pm list, Wu Fengguang, Andrew Morton, LKML, Nigel Cunningham,
	David Rientjes, linux-mm

Hi!

> We want to avoid attempting to free too much memory too hard during
> hibernation, so estimate the minimum size of the image to use as the
> lower limit for preallocating memory.

Why? Is freeing memory too slow?

It used to be that the user controlled the image size, so he was able to
balance "time to save image" vs. "responsiveness of the system after
resume".

Does this just override the user's preference when he chooses too small
an image size?

> The approach here is based on the (experimental) observation that we
> can't free more page frames than the sum of:
> 
> * global_page_state(NR_SLAB_RECLAIMABLE)
> * global_page_state(NR_ACTIVE_ANON)
> * global_page_state(NR_INACTIVE_ANON)
> * global_page_state(NR_ACTIVE_FILE)
> * global_page_state(NR_INACTIVE_FILE)
> 
> and even that is usually impossible to free in practice, because some
> of the pages reported as global_page_state(NR_SLAB_RECLAIMABLE) can't
> in fact be freed.  It turns out, however, that if the sum of the
> above numbers is subtracted from the number of saveable pages in the
> system and the result is multiplied by 1.25, we get a suitable
> estimate of the minimum size of the image.



> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> ---
>  kernel/power/snapshot.c |   56 ++++++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 52 insertions(+), 4 deletions(-)


>  /**
> + * minimum_image_size - Estimate the minimum acceptable size of an image
> + * @saveable: The total number of saveable pages in the system.
> + *
> + * We want to avoid attempting to free too much memory too hard, so estimate the
> + * minimum acceptable size of a hibernation image to use as the lower limit for
> + * preallocating memory.

I don't get it. If the user sets the image size to 0, we should free as much
memory as we can. I just don't see why "we want to avoid... it".
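
For reference, the estimate described in the changelog above amounts to
something like this (sketch reconstructed from the description, not
necessarily the code in the patch):

	static unsigned long minimum_image_size(unsigned long saveable)
	{
		unsigned long reclaimable;

		reclaimable = global_page_state(NR_SLAB_RECLAIMABLE)
				+ global_page_state(NR_ACTIVE_ANON)
				+ global_page_state(NR_INACTIVE_ANON)
				+ global_page_state(NR_ACTIVE_FILE)
				+ global_page_state(NR_INACTIVE_FILE);

		if (saveable <= reclaimable)
			return 0;

		/* pages we cannot expect to reclaim, plus the 25% margin */
		return (saveable - reclaimable) * 5 / 4;
	}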

									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 205+ messages in thread

* Re: [PATCH 4/6] PM/Hibernate: Rework shrinking of memory
  2009-05-14  9:40                   ` Pavel Machek
@ 2009-05-14 17:49                     ` Rafael J. Wysocki
  -1 siblings, 0 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-14 17:49 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Andrew Morton, linux-pm, fengguang.wu, linux-kernel, nigel,
	rientjes, linux-mm

On Thursday 14 May 2009, Pavel Machek wrote:
> 
> > > > > The main point (I thought) was to remove shrink_all_memory().  Instead,
> > > > > we're retaining it and adding even more stuff?
> > > > 
> > > > The idea is that afterwards we can drop shrink_all_memory() once the
> > > > performance problem has been resolved.  Also, we now allocate memory for the
> > > > image using GFP_KERNEL instead of doing it with GFP_ATOMIC after freezing
> > > > devices.  I'd think that's an improvement?
> > > 
> > > Dunno.  GFP_KERNEL might attempt to do writeback/swapout/etc, which
> > > could be embarrassing if the devices are frozen.
> > 
> > They aren't, because the preallocation is done upfront, so once the OOM killer
> > has been taken care of, it's totally safe. :-)
> 
> As is GFP_ATOMIC. Except that GFP_KERNEL will cause catastrophic
> consequences when accounting goes wrong. (New kernel's idea of what is
> on disk will differ from what is _really_ on disk.)
> 
> If accounting is right, GFP_ATOMIC and GFP_KERNEL are equivalent.
> 
> If accounting is wrong, GFP_ATOMIC will fail with NULL, while
> GFP_KERNEL will do something bad.
> 
> I'd keep GFP_ATOMIC (or GFP_NOIO or similar). 

Repeating myself: with this and the next patch applied, we preallocate memory
for the image _before_ freezing devices and therefore it is safe to use
GFP_KERNEL, because the OOM killer has been taken care of by [3/6].
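
In other words, the ordering after the series is roughly (simplified;
function names approximate, error handling and intermediate steps omitted):

	freeze_processes();			/* [3/6]: also disables the OOM killer */
	hibernate_preallocate_memory();		/* GFP_KERNEL is fine: tasks frozen, devices not */
	/* devices are frozen only after the preallocation ... */
	create_image(platform_mode);		/* the snapshot itself sticks to GFP_ATOMIC */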

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 205+ messages in thread

* Re: [PATCH 5/6] PM/Hibernate: Do not release preallocated memory unnecessarily (rev. 2)
  2009-05-14 11:09           ` Pavel Machek
@ 2009-05-14 17:52             ` Rafael J. Wysocki
  -1 siblings, 0 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-14 17:52 UTC (permalink / raw)
  To: Pavel Machek
  Cc: pm list, Wu Fengguang, Andrew Morton, LKML, Nigel Cunningham,
	David Rientjes, linux-mm

On Thursday 14 May 2009, Pavel Machek wrote:
> Hi!
> 
> > Since the hibernation code is now going to use allocations of memory
> > to make enough room for the image, it can also use the page frames
> > allocated at this stage as image page frames.  The low-level
> > hibernation code needs to be rearranged for this purpose, but it
> > allows us to avoid freeing a great number of pages and allocating
> > these same pages once again later, so it generally is worth doing.
> > 
> > [rev. 2: Take highmem into account correctly.]
> 
> I don't get it. What is the advantage of this patch? It makes the code
> more complex... Is it supposed to be faster?

Yes, in some test cases it is reported to be faster (along with [4/6],
actually).

Besides, we'd like to get rid of shrink_all_memory() eventually and it is a
step in this direction.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 205+ messages in thread

* Re: [RFC][PATCH 6/6] PM/Hibernate: Do not try to allocate too much memory too hard
  2009-05-14 11:14           ` Pavel Machek
@ 2009-05-14 17:59             ` Rafael J. Wysocki
  -1 siblings, 0 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-14 17:59 UTC (permalink / raw)
  To: Pavel Machek
  Cc: pm list, Wu Fengguang, Andrew Morton, LKML, Nigel Cunningham,
	David Rientjes, linux-mm

On Thursday 14 May 2009, Pavel Machek wrote:
> Hi!
> 
> > We want to avoid attempting to free too much memory too hard during
> > hibernation, so estimate the minimum size of the image to use as the
> > lower limit for preallocating memory.
> 
> Why? Is freeing memory too slow?
> 
> It used to be that user controlled image size, so he was able to
> balance "time to save image" vs. "responsiveness of system after
> resume".
> 
> Does this just override user's preference when he chooses too small
> image size?
> 
> > The approach here is based on the (experimental) observation that we
> > can't free more page frames than the sum of:
> > 
> > * global_page_state(NR_SLAB_RECLAIMABLE)
> > * global_page_state(NR_ACTIVE_ANON)
> > * global_page_state(NR_INACTIVE_ANON)
> > * global_page_state(NR_ACTIVE_FILE)
> > * global_page_state(NR_INACTIVE_FILE)
> > 
> > and even that is usually impossible to free in practice, because some
> > of the pages reported as global_page_state(NR_SLAB_RECLAIMABLE) can't
> > in fact be freed.  It turns out, however, that if the sum of the
> > above numbers is subtracted from the number of saveable pages in the
> > system and the result is multiplied by 1.25, we get a suitable
> > estimate of the minimum size of the image.
> 
> 
> 
> > Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> > ---
> >  kernel/power/snapshot.c |   56 ++++++++++++++++++++++++++++++++++++++++++++----
> >  1 file changed, 52 insertions(+), 4 deletions(-)
> 
> 
> >  /**
> > + * minimum_image_size - Estimate the minimum acceptable size of an image
> > + * @saveable: The total number of saveable pages in the system.
> > + *
> > + * We want to avoid attempting to free too much memory too hard, so estimate the
> > + * minimum acceptable size of a hibernation image to use as the lower limit for
> > + * preallocating memory.
> 
> I don't get it. If user sets image size as 0, we should free as much
> memory as we can. I just don't see why "we want to avoid... it".

The "as much memory as we can" is not well defined.

Patches [4/6] and [5/6] make hibernation use memory allocations to force some
memory to be freed.  However, it is not really reasonable to keep allocating
until an allocation fails, because that stresses the memory management
subsystem too much.  It is better to predict the point at which allocations
would start to fail and stop there, which is what the patch does.

The prediction is not very precise, but I think it need not be.  Even if it
leaves a few more pages in memory, that won't be a disaster.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 205+ messages in thread
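
A sketch of the estimate described in the [6/6] changelog quoted above (not the
patch itself; the 5/4 integer arithmetic stands in for the "multiplied by 1.25"
wording, and the counters are read with global_page_state()):

	static unsigned long minimum_image_size(unsigned long saveable)
	{
		unsigned long size;

		/* Pages the MM subsystem can, at best, reclaim for us. */
		size = global_page_state(NR_SLAB_RECLAIMABLE)
			+ global_page_state(NR_ACTIVE_ANON)
			+ global_page_state(NR_INACTIVE_ANON)
			+ global_page_state(NR_ACTIVE_FILE)
			+ global_page_state(NR_INACTIVE_FILE);

		if (size >= saveable)
			return 0;

		/* (saveable - reclaimable) * 1.25, rounded down. */
		return (saveable - size) * 5 / 4;
	}

The result serves as the lower bound for the preallocation loop, so a very low
image_size setting cannot push the allocations past the point where they are
predicted to start failing.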

* Re: [PATCH 4/6] PM/Hibernate: Rework shrinking of memory
  2009-05-13 20:55             ` Rafael J. Wysocki
@ 2009-05-14 18:26               ` Rafael J. Wysocki
  -1 siblings, 0 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-14 18:26 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-pm, fengguang.wu, linux-kernel, pavel, nigel, rientjes, linux-mm

On Wednesday 13 May 2009, Rafael J. Wysocki wrote:
> On Wednesday 13 May 2009, Andrew Morton wrote:
> > On Wed, 13 May 2009 10:39:25 +0200
> > "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> > 
> > > From: Rafael J. Wysocki <rjw@sisk.pl>
> > > 
> > > Rework swsusp_shrink_memory() so that it calls shrink_all_memory()
> > > just once to make some room for the image and then allocates memory
> > > to apply more pressure to the memory management subsystem, if
> > > necessary.
> > > 
> > > Unfortunately, we don't seem to be able to drop shrink_all_memory()
> > > entirely just yet, because that would lead to huge performance
> > > regressions in some test cases.
> > > 
> > 
> > Isn't this a somewhat large problem?
> 
> Yes, it is.  The thing is 8 times slower (15 s vs 2 s) without the
> shrink_all_memory() in at least one test case.  100% reproducible.
> 
> > The main point (I thought) was to remove shrink_all_memory().  Instead,
> > we're retaining it and adding even more stuff?
> 
> The idea is that afterwards we can drop shrink_all_memory() once the
> performance problem has been resolved.  Also, we now allocate memory for the
> image using GFP_KERNEL instead of doing it with GFP_ATOMIC after freezing
> devices.  I'd think that's an improvement?
> 
> > > +/**
> > > + * compute_fraction - Compute approximate fraction x * (a/b)
> > > + * @x: Number to multiply.
> > > + * @numerator: Numerator of the fraction (a).
> > > + * @denominator: Denominator of the fraction (b).
> > >   *
> > > - *	Notice: all userland should be stopped before it is called, or
> > > - *	livelock is possible.
> > > + * Compute an approximate value of the expression x * (a/b), where a is less
> > > + * than b, all x, a, b are unsigned longs and x * a may be greater than the
> > > + * maximum unsigned long.
> > >   */
> > > +static unsigned long compute_fraction(
> > > +	unsigned long x, unsigned long numerator, unsigned long denominator)
> > 
> > I can't say I'm a great fan of the code layout here.
> > 
> > static unsigned long compute_fraction(unsigned long x, unsigned long numerator, unsigned long denominator)
> > 
> > or
> > 
> > static unsigned long compute_fraction(unsigned long x, unsigned long numerator,
> > 					unsigned long denominator)
> > 
> > would be more typical.
> 
> OK
>  
> > > +{
> > > +	unsigned long ratio = (numerator << FRACTION_SHIFT) / denominator;
> > >  
> > > -#define SHRINK_BITE	10000
> > > -static inline unsigned long __shrink_memory(long tmp)
> > > +	x *= ratio;
> > > +	return x >> FRACTION_SHIFT;
> > > +}
> > 
> > Strange function.  Would it not be simpler/clearer to do it with 64-bit
> > scalars, multiplication and do_div()?
> 
> Sure, I can do it this way too.  Is it fine to use u64 for this purpose?
> 
> > > +static unsigned long highmem_size(
> > > +	unsigned long size, unsigned long highmem, unsigned long count)
> > > +{
> > > +	return highmem > count / 2 ?
> > > +			compute_fraction(size, highmem, count) :
> > > +			size - compute_fraction(size, count - highmem, count);
> > > +}
> > 
> > This would be considerably easier to follow if we know what the three
> > arguments represent.  Amount of memory?  In what units?  `count' of
> > what?
> > 
> > The `count/2' thing there is quite mysterious.
> > 
> > <does some reverse-engineering>
> > 
> > OK, `count' is "the number of pageframes we can use".  (I don't think I
> > helped myself a lot there).  But what's up with that divide-by-two?
> > 
> > <considers poking at callers to work out what `size' is>
> > 
> > <gives up>
> > 
> > Is this code as clear as we can possibly make it??
> 
> Heh
> 
> OK, I'll do my best to clean it up.

Updated patch is appended.

---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: PM/Hibernate: Rework shrinking of memory (rev. 2)

Rework swsusp_shrink_memory() so that it calls shrink_all_memory()
just once to make some room for the image and then allocates memory
to apply more pressure to the memory management subsystem, if
necessary.

Unfortunately, we don't seem to be able to drop shrink_all_memory()
entirely just yet, because that would lead to huge performance
regressions in some test cases.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 kernel/power/snapshot.c |  204 +++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 158 insertions(+), 46 deletions(-)

Index: linux-2.6/kernel/power/snapshot.c
===================================================================
--- linux-2.6.orig/kernel/power/snapshot.c
+++ linux-2.6/kernel/power/snapshot.c
@@ -1066,69 +1066,181 @@ void swsusp_free(void)
 	buffer = NULL;
 }
 
+/* Helper functions used for the shrinking of memory. */
+
+#define GFP_IMAGE	(GFP_KERNEL | __GFP_NOWARN)
+
 /**
- *	swsusp_shrink_memory -  Try to free as much memory as needed
+ * preallocate_image_pages - Allocate a number of pages for hibernation image
+ * @nr_pages: Number of page frames to allocate.
+ * @mask: GFP flags to use for the allocation.
  *
- *	... but do not OOM-kill anyone
- *
- *	Notice: all userland should be stopped before it is called, or
- *	livelock is possible.
+ * Return value: Number of page frames actually allocated
  */
+static unsigned long preallocate_image_pages(unsigned long nr_pages, gfp_t mask)
+{
+	unsigned long nr_alloc = 0;
+
+	while (nr_pages > 0) {
+		if (!alloc_image_page(mask))
+			break;
+		nr_pages--;
+		nr_alloc++;
+	}
+
+	return nr_alloc;
+}
+
+static unsigned long preallocate_image_memory(unsigned long nr_pages)
+{
+	return preallocate_image_pages(nr_pages, GFP_IMAGE);
+}
 
-#define SHRINK_BITE	10000
-static inline unsigned long __shrink_memory(long tmp)
+#ifdef CONFIG_HIGHMEM
+static unsigned long preallocate_image_highmem(unsigned long nr_pages)
 {
-	if (tmp > SHRINK_BITE)
-		tmp = SHRINK_BITE;
-	return shrink_all_memory(tmp);
+	return preallocate_image_pages(nr_pages, GFP_IMAGE | __GFP_HIGHMEM);
 }
 
+/**
+ *  __fraction - Compute (an approximation of) x * (multiplier / base)
+ */
+static unsigned long __fraction(u64 x, u64 multiplier, u64 base)
+{
+	x *= multiplier;
+	do_div(x, base);
+	return (unsigned long)x;
+}
+
+static unsigned long preallocate_highmem_fraction(unsigned long nr_pages,
+						unsigned long highmem,
+						unsigned long total)
+{
+	unsigned long alloc = __fraction(nr_pages, highmem, total);
+
+	return preallocate_image_pages(alloc, GFP_IMAGE | __GFP_HIGHMEM);
+}
+#else /* CONFIG_HIGHMEM */
+static inline unsigned long preallocate_image_highmem(unsigned long nr_pages)
+{
+	return 0;
+}
+
+static inline unsigned long preallocate_highmem_fraction(unsigned long nr_pages,
+						unsigned long highmem,
+						unsigned long total)
+{
+	return 0;
+}
+#endif /* CONFIG_HIGHMEM */
+
+/**
+ * swsusp_shrink_memory -  Make the kernel release as much memory as needed
+ *
+ * To create a hibernation image it is necessary to make a copy of every page
+ * frame in use.  We also need a number of page frames to be free during
+ * hibernation for allocations made while saving the image and for device
+ * drivers, in case they need to allocate memory from their hibernation
+ * callbacks (these two numbers are given by PAGES_FOR_IO and SPARE_PAGES,
+ * respectively, both of which are rough estimates).  To make this happen, we
+ * compute the total number of available page frames and allocate at least
+ *
+ * ([page frames total] + PAGES_FOR_IO + [metadata pages]) / 2 + 2 * SPARE_PAGES
+ *
+ * of them, which corresponds to the maximum size of a hibernation image.
+ *
+ * If image_size is set below the number following from the above formula,
+ * the preallocation of memory is continued until the total number of saveable
+ * pages in the system is below the requested image size or it is impossible to
+ * allocate more memory, whichever happens first.
+ */
 int swsusp_shrink_memory(void)
 {
-	long tmp;
 	struct zone *zone;
-	unsigned long pages = 0;
-	unsigned int i = 0;
-	char *p = "-\\|/";
+	unsigned long saveable, size, max_size, count, highmem, pages = 0;
+	unsigned long alloc, pages_highmem;
 	struct timeval start, stop;
+	int error = 0;
 
-	printk(KERN_INFO "PM: Shrinking memory...  ");
+	printk(KERN_INFO "PM: Shrinking memory... ");
 	do_gettimeofday(&start);
-	do {
-		long size, highmem_size;
 
-		highmem_size = count_highmem_pages();
-		size = count_data_pages() + PAGES_FOR_IO + SPARE_PAGES;
-		tmp = size;
-		size += highmem_size;
-		for_each_populated_zone(zone) {
-			tmp += snapshot_additional_pages(zone);
-			if (is_highmem(zone)) {
-				highmem_size -=
-					zone_page_state(zone, NR_FREE_PAGES);
-			} else {
-				tmp -= zone_page_state(zone, NR_FREE_PAGES);
-				tmp += zone->lowmem_reserve[ZONE_NORMAL];
-			}
-		}
+	/* Count the number of saveable data pages. */
+	highmem = count_highmem_pages();
+	saveable = count_data_pages();
+
+	/*
+	 * Compute the total number of page frames we can use (count) and the
+	 * number of pages needed for image metadata (size).
+	 */
+	count = saveable;
+	saveable += highmem;
+	size = 0;
+	for_each_populated_zone(zone) {
+		size += snapshot_additional_pages(zone);
+		if (is_highmem(zone))
+			highmem += zone_page_state(zone, NR_FREE_PAGES);
+		else
+			count += zone_page_state(zone, NR_FREE_PAGES);
+	}
+	count += highmem;
+	count -= totalreserve_pages;
+
+	/* Compute the maximum number of saveable pages to leave in memory. */
+	max_size = (count - (size + PAGES_FOR_IO)) / 2 - 2 * SPARE_PAGES;
+	size = DIV_ROUND_UP(image_size, PAGE_SIZE);
+	if (size > max_size)
+		size = max_size;
+	/*
+	 * If the maximum is not less than the current number of saveable pages
+	 * in memory, we don't need to do anything more.
+	 */
+	if (size >= saveable)
+		goto out;
 
-		if (highmem_size < 0)
-			highmem_size = 0;
+	/*
+	 * Let the memory management subsystem know that we're going to need a
+	 * large number of page frames to allocate and make it free some memory.
+	 * NOTE: If this is not done, performance will be hurt badly in some
+	 * test cases.
+	 */
+	shrink_all_memory(saveable - size);
 
-		tmp += highmem_size;
-		if (tmp > 0) {
-			tmp = __shrink_memory(tmp);
-			if (!tmp)
-				return -ENOMEM;
-			pages += tmp;
-		} else if (size > image_size / PAGE_SIZE) {
-			tmp = __shrink_memory(size - (image_size / PAGE_SIZE));
-			pages += tmp;
-		}
-		printk("\b%c", p[i++%4]);
-	} while (tmp > 0);
+	/*
+	 * The number of saveable pages in memory was too high, so apply some
+	 * pressure to decrease it.  First, make room for the largest possible
+	 * image and fail if that doesn't work.  Next, try to decrease the size
+	 * of the image as much as indicated by image_size using allocations
+	 * from highmem and non-highmem zones separately.
+	 */
+	pages_highmem = preallocate_image_highmem(highmem / 2);
+	max_size += pages_highmem;
+	alloc = count - max_size;
+	pages = preallocate_image_memory(alloc);
+	if (pages < alloc) {
+		error = -ENOMEM;
+		goto free_out;
+	}
+	size = max_size - size;
+	alloc = size;
+	size = preallocate_highmem_fraction(size, highmem, count);
+	pages_highmem += size;
+	alloc -= size;
+	pages += preallocate_image_memory(alloc);
+	pages += pages_highmem;
+
+ free_out:
+	/* Release all of the preallocated page frames. */
+	swsusp_free();
+
+	if (error) {
+		printk(KERN_CONT "\n");
+		return error;
+	}
+
+ out:
 	do_gettimeofday(&stop);
-	printk("\bdone (%lu pages freed)\n", pages);
+	printk(KERN_CONT "done (preallocated %lu free pages)\n", pages);
 	swsusp_show_speed(&start, &stop, pages, "Freed");
 
 	return 0;

^ permalink raw reply	[flat|nested] 205+ messages in thread
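
To sanity-check the do_div()-based helper in the updated patch, here is a
standalone user-space sketch of the same arithmetic (do_div() is kernel-only;
the numbers are made up):

	#include <stdint.h>
	#include <stdio.h>

	/* Same approximation as __fraction(): x * (multiplier / base). */
	static unsigned long fraction(uint64_t x, uint64_t multiplier, uint64_t base)
	{
		return (unsigned long)(x * multiplier / base);
	}

	int main(void)
	{
		unsigned long to_alloc = 10000;   /* pages still to preallocate */
		unsigned long highmem = 786432;   /* highmem page frames (3 GB) */
		unsigned long total = 1048576;    /* usable page frames (4 GB) */
		unsigned long hi = fraction(to_alloc, highmem, total);

		/* Prints: highmem: 7500, lowmem: 2500 */
		printf("highmem: %lu, lowmem: %lu\n", hi, to_alloc - hi);
		return 0;
	}

This mirrors how preallocate_highmem_fraction() splits the remaining
allocations between highmem and non-highmem zones in proportion to the highmem
share of all usable page frames.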

* Re: [PATCH 4/6] PM/Hibernate: Rework shrinking of memory
  2009-05-14 17:49                     ` Rafael J. Wysocki
@ 2009-05-15 13:09                       ` Pavel Machek
  -1 siblings, 0 replies; 205+ messages in thread
From: Pavel Machek @ 2009-05-15 13:09 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Andrew Morton, linux-pm, fengguang.wu, linux-kernel, nigel,
	rientjes, linux-mm

On Thu 2009-05-14 19:49:52, Rafael J. Wysocki wrote:
> On Thursday 14 May 2009, Pavel Machek wrote:
> > 
> > > > > > The main point (I thought) was to remove shrink_all_memory().  Instead,
> > > > > > we're retaining it and adding even more stuff?
> > > > > 
> > > > > The idea is that afterwards we can drop shrink_all_memory() once the
> > > > > performance problem has been resolved.  Also, we now allocate memory for the
> > > > > image using GFP_KERNEL instead of doing it with GFP_ATOMIC after freezing
> > > > > devices.  I'd think that's an improvement?
> > > > 
> > > > Dunno.  GFP_KERNEL might attempt to do writeback/swapout/etc, which
> > > > could be embarrassing if the devices are frozen.
> > > 
> > > They aren't, because the preallocation is done upfront, so once the OOM killer
> > > has been taken care of, it's totally safe. :-)
> > 
> > As is GFP_ATOMIC. Except that GFP_KERNEL will cause catastrophic
> > consequences when accounting goes wrong. (New kernel's idea of what is
> > on disk will differ from what is _really_ on disk.)
> > 
> > If accounting is right, GFP_ATOMIC and GFP_KERNEL are equivalent.
> > 
> > If accounting is wrong, GFP_ATOMIC will fail with NULL, while
> > GFP_KERNEL will do something bad.
> > 
> > I'd keep GFP_ATOMIC (or GFP_NOIO or similar). 
> 
> Repeating myself: with this and the next patch applied, we preallocate memory
> for the image _before_ freezing devices and therefore it is safe to use
> GFP_KERNEL, because the OOM killer has been taken care of by [3/6].

Aha, I misparsed the sentences above.

Acked-by: Pavel Machek <pavel@ucw.cz>
									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 205+ messages in thread

* Re: [PATCH 5/6] PM/Hibernate: Do not release preallocated memory unnecessarily (rev. 2)
  2009-05-14 17:52             ` Rafael J. Wysocki
@ 2009-05-15 13:11               ` Pavel Machek
  -1 siblings, 0 replies; 205+ messages in thread
From: Pavel Machek @ 2009-05-15 13:11 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: pm list, Wu Fengguang, Andrew Morton, LKML, Nigel Cunningham,
	David Rientjes, linux-mm

On Thu 2009-05-14 19:52:20, Rafael J. Wysocki wrote:
> On Thursday 14 May 2009, Pavel Machek wrote:
> > Hi!
> > 
> > > Since the hibernation code is now going to use allocations of memory
> > > to make enough room for the image, it can also use the page frames
> > > allocated at this stage as image page frames.  The low-level
> > > hibernation code needs to be rearranged for this purpose, but it
> > > allows us to avoid freeing a great number of pages and allocating
> > > these same pages once again later, so it generally is worth doing.
> > > 
> > > [rev. 2: Take highmem into account correctly.]
> > 
> > I don't get it. What is advantage of this patch? It makes the code
> > more complex... Is it supposed to be faster?
> 
> Yes, in some test cases it is reported to be faster (along with [4/6],
> actually).
> 
> Besides, we'd like to get rid of shrink_all_memory() eventually and it is a
> step in this direction.

Ok, but maybe we should wait to apply this until we have the patches
that actually get rid of shrink_all_memory? Maybe it will not be
feasible for speed reasons after all, or something...
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 205+ messages in thread

* Re: [RFC][PATCH 6/6] PM/Hibernate: Do not try to allocate too much memory too hard
  2009-05-14 17:59             ` Rafael J. Wysocki
@ 2009-05-15 13:14               ` Pavel Machek
  -1 siblings, 0 replies; 205+ messages in thread
From: Pavel Machek @ 2009-05-15 13:14 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: pm list, Wu Fengguang, Andrew Morton, LKML, Nigel Cunningham,
	David Rientjes, linux-mm

Hi!

> > > We want to avoid attempting to free too much memory too hard during
> > > hibernation, so estimate the minimum size of the image to use as the
> > > lower limit for preallocating memory.
> > 
> > Why? Is freeing memory too slow?
> > 
> > It used to be that user controlled image size, so he was able to
> > balance "time to save image" vs. "responsiveness of system after
> > resume".
> > 
> > Does this just override user's preference when he chooses too small
> > image size?
> > 
> > > The approach here is based on the (experimental) observation that we
> > > can't free more page frames than the sum of:
> > > 
> > > * global_page_state(NR_SLAB_RECLAIMABLE)
> > > * global_page_state(NR_ACTIVE_ANON)
> > > * global_page_state(NR_INACTIVE_ANON)
> > > * global_page_state(NR_ACTIVE_FILE)
> > > * global_page_state(NR_INACTIVE_FILE)
> > > 
> > > and even that is usually impossible to free in practice, because some
> > > of the pages reported as global_page_state(NR_SLAB_RECLAIMABLE) can't
> > > in fact be freed.  It turns out, however, that if the sum of the
> > > above numbers is subtracted from the number of saveable pages in the
> > > system and the result is multiplied by 1.25, we get a suitable
> > > estimate of the minimum size of the image.
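
As a worked example with assumed numbers: if there are 120000 saveable pages
and the five counters above sum to 80000 pages, the minimum image size comes
out as (120000 - 80000) * 1.25 = 50000 pages.
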
...
> > >  /**
> > > + * minimum_image_size - Estimate the minimum acceptable size of an image
> > > + * @saveable: The total number of saveable pages in the system.
> > > + *
> > > + * We want to avoid attempting to free too much memory too hard, so estimate the
> > > + * minimum acceptable size of a hibernation image to use as the lower limit for
> > > + * preallocating memory.
> > 
> > I don't get it. If user sets image size as 0, we should free as much
> > memory as we can. I just don't see why "we want to avoid... it".
> 
> The "as much memory as we can" is not well defined.

Well, while (1) kmalloc(1024, GFP_KERNEL | GFP_NO_OOMKILL); is
basically "as much memory as we can". I believe it is pretty well defined.

> Patches [4/6] and [5/6] make hibernation use memory allocations to force some
> memory to be freed.  However, it is not really reasonable to try to allocate
> until the allocation fails, because that stresses the memory management
> subsystem too much.  It is better to predict when it fails and stop allocating
> at that point, which is what the patch does.

Why is it wrong to stress memory management? It is a computer; it can
handle it. Does it take too long? Should the user just set image_size
higher in such case?

> The prediction is not very precise, but I think it need not be.  Even if it
> leaves a few pages more in memory, that won't be a disaster.

Well, on a 128MB machine, you'll fail suspend even if it would fit if
the code tried a little harder...?
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 205+ messages in thread

* Re: [RFC][PATCH 6/6] PM/Hibernate: Do not try to allocate too much memory too hard
  2009-05-15 13:14               ` Pavel Machek
@ 2009-05-15 14:40                 ` Rafael J. Wysocki
  -1 siblings, 0 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-15 14:40 UTC (permalink / raw)
  To: Pavel Machek
  Cc: pm list, Wu Fengguang, Andrew Morton, LKML, Nigel Cunningham,
	David Rientjes, linux-mm

On Friday 15 May 2009, Pavel Machek wrote:
> Hi!
> 
> > > > We want to avoid attempting to free too much memory too hard during
> > > > hibernation, so estimate the minimum size of the image to use as the
> > > > lower limit for preallocating memory.
> > > 
> > > Why? Is freeing memory too slow?
> > > 
> > > It used to be that user controlled image size, so he was able to
> > > balance "time to save image" vs. "responsiveness of system after
> > > resume".
> > > 
> > > Does this just override user's preference when he chooses too small
> > > image size?
> > > 
> > > > The approach here is based on the (experimental) observation that we
> > > > can't free more page frames than the sum of:
> > > > 
> > > > * global_page_state(NR_SLAB_RECLAIMABLE)
> > > > * global_page_state(NR_ACTIVE_ANON)
> > > > * global_page_state(NR_INACTIVE_ANON)
> > > > * global_page_state(NR_ACTIVE_FILE)
> > > > * global_page_state(NR_INACTIVE_FILE)
> > > > 
> > > > and even that is usually impossible to free in practice, because some
> > > > of the pages reported as global_page_state(NR_SLAB_RECLAIMABLE) can't
> > > > in fact be freed.  It turns out, however, that if the sum of the
> > > > above numbers is subtracted from the number of saveable pages in the
> > > > system and the result is multiplied by 1.25, we get a suitable
> > > > estimate of the minimum size of the image.
> ...
> > > >  /**
> > > > + * minimum_image_size - Estimate the minimum acceptable size of an image
> > > > + * @saveable: The total number of saveable pages in the system.
> > > > + *
> > > > + * We want to avoid attempting to free too much memory too hard, so estimate the
> > > > + * minimum acceptable size of a hibernation image to use as the lower limit for
> > > > + * preallocating memory.
> > > 
> > > I don't get it. If user sets image size as 0, we should free as much
> > > memory as we can. I just don't see why "we want to avoid... it".
> > 
> > The "as much memory as we can" is not well defined.
> 
> Well, while (1) kmalloc(1024, GFP_KERNEL | GFP_NO_OOMKILL); is
> basically "as much memory as we can". I believe it is pretty well defined.
> 
> > Patches [4/6] and [5/6] make hibernation use memory allocations to force some
> > memory to be freed.  However, it is not really reasonable to try to allocate
> > until the allocation fails, because that stresses the memory management
> > subsystem too much.  It is better to predict when it fails and stop allocating
> > at that point, which is what the patch does.
> 
> Why is it wrong to stress memory management? It is a computer; it can
> handle it. Does it take too long?

Yes.

> Should the user just set image_size higher in such case?

Yes, he should.

> > The prediction is not very precise, but I think it need not be.  Even if it
> > leaves a few pages more in memory, that won't be a disaster.
> 
> Well, on 128MB machine, you'll fail suspend even if it would fit if
> code tried little harder...?

No.  Did you notice the min_t(unsigned long, pages, max_size) in the patch?
It's there exactly for this purpose (although I don't think it's really going
to trigger in practice). :-)
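
For reference, the hunk in question (as it appears in the patch quoted later
in this thread) is:

	/* Estimate the minimum size of the image. */
	pages = minimum_image_size(saveable);
	if (size < pages)
		size = min_t(unsigned long, pages, max_size);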

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 205+ messages in thread

* Re: [PATCH 5/6] PM/Hibernate: Do not release preallocated memory unnecessarily (rev. 2)
  2009-05-15 13:11               ` Pavel Machek
@ 2009-05-15 14:52                 ` Rafael J. Wysocki
  -1 siblings, 0 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-15 14:52 UTC (permalink / raw)
  To: Pavel Machek
  Cc: pm list, Wu Fengguang, Andrew Morton, LKML, Nigel Cunningham,
	David Rientjes, linux-mm

On Friday 15 May 2009, Pavel Machek wrote:
> On Thu 2009-05-14 19:52:20, Rafael J. Wysocki wrote:
> > On Thursday 14 May 2009, Pavel Machek wrote:
> > > Hi!
> > > 
> > > > Since the hibernation code is now going to use allocations of memory
> > > > to make enough room for the image, it can also use the page frames
> > > > allocated at this stage as image page frames.  The low-level
> > > > hibernation code needs to be rearranged for this purpose, but it
> > > > allows us to avoid freeing a great number of pages and allocating
> > > > these same pages once again later, so it generally is worth doing.
> > > > 
> > > > [rev. 2: Take highmem into account correctly.]
> > > 
> > > I don't get it. What is advantage of this patch? It makes the code
> > > more complex... Is it supposed to be faster?
> > 
> > Yes, in some test cases it is reported to be faster (along with [4/6],
> > actually).
> > 
> > Besides, we'd like to get rid of shrink_all_memory() eventually and it is a
> > step in this direction.
> 
> Ok, but maybe we should wait with applying this until we have patches
> that actually get us rid of shrink_all_memory?

Well, the $subject patch is only an optimization on top of [4/6] that you've
just acked. ;-)

In fact, [4/6] changes the approach to memory shrinking, and the $subject
one only avoids freeing all of the memory we've allocated and then allocating
it once again later.

> Maybe it will not be feasible for speed reasons after all, or something...

At least it allows us to drop shrink_all_memory() easily for the sake of
experimentation (it's sufficient to comment out just one line of code for this
purpose).

Besides, after this patchset shrink_all_memory() is _only_ needed for
performance, so it should be possible to get rid of it relatively quickly.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 205+ messages in thread

* Re: [RFC][PATCH 6/6] PM/Hibernate: Do not try to allocate too much memory too hard
  2009-05-13  8:42         ` Rafael J. Wysocki
@ 2009-05-17 12:06           ` Wu Fengguang
  -1 siblings, 0 replies; 205+ messages in thread
From: Wu Fengguang @ 2009-05-17 12:06 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: pm list, Andrew Morton, LKML, Pavel Machek, Nigel Cunningham,
	David Rientjes, linux-mm

Hi Rafael,

Sorry for being late.

On Wed, May 13, 2009 at 04:42:17PM +0800, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rjw@sisk.pl>
> 
> We want to avoid attempting to free too much memory too hard during
> hibernation, so estimate the minimum size of the image to use as the
> lower limit for preallocating memory.
> 
> The approach here is based on the (experimental) observation that we
> can't free more page frames than the sum of:
> 
> * global_page_state(NR_SLAB_RECLAIMABLE)
> * global_page_state(NR_ACTIVE_ANON)
> * global_page_state(NR_INACTIVE_ANON)
> * global_page_state(NR_ACTIVE_FILE)
> * global_page_state(NR_INACTIVE_FILE)

It's a very good idea to count the numbers in a reverse way.

> and even that is usually impossible to free in practice, because some
> of the pages reported as global_page_state(NR_SLAB_RECLAIMABLE) can't
> in fact be freed.  It turns out, however, that if the sum of the
> above numbers is subtracted from the number of saveable pages in the
> system and the result is multiplied by 1.25, we get a suitable
> estimate of the minimum size of the image.

However, the "*1.25" looks like a hack. We should really apply more
constraints to the individual components.

> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> ---
>  kernel/power/snapshot.c |   56 ++++++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 52 insertions(+), 4 deletions(-)
> 
> Index: linux-2.6/kernel/power/snapshot.c
> ===================================================================
> --- linux-2.6.orig/kernel/power/snapshot.c
> +++ linux-2.6/kernel/power/snapshot.c
> @@ -1213,6 +1213,49 @@ static void free_unnecessary_pages(void)
>  }
>  
>  /**
> + * minimum_image_size - Estimate the minimum acceptable size of an image
> + * @saveable: The total number of saveable pages in the system.
> + *
> + * We want to avoid attempting to free too much memory too hard, so estimate the
> + * minimum acceptable size of a hibernation image to use as the lower limit for
> + * preallocating memory.
> + *
> + * The minimum size of the image is computed as
> + *
> + * ([number of saveable pages] - [number of pages we can free]) * 1.25
> + *
> + * where the second term is the sum of reclaimable slab, anonymous pages and
> + * active/inactive file pages.
> + *
> + * NOTE: It usually turns out that we can't really free all pages reported as
> + * reclaimable slab, so the number resulting from the subtraction alone is too
> + * low.  Still, it seems reasonable to assume that this number is proportional
> + * to the total number of pages that cannot be freed, which leads to the
> + * formula above.  The coefficient of proportionality in this formula, 1.25, has
> + * been determined experimentally.
> + */
> +static unsigned long minimum_image_size(unsigned long saveable)
> +{
> +	unsigned long size;
> +
> +	/* Compute the number of saveable pages we can free. */
> +	size = global_page_state(NR_SLAB_RECLAIMABLE)
> +		+ global_page_state(NR_ACTIVE_ANON)
> +		+ global_page_state(NR_INACTIVE_ANON)
> +		+ global_page_state(NR_ACTIVE_FILE)
> +		+ global_page_state(NR_INACTIVE_FILE);

For example, we could drop the 1.25 ratio and calculate the above
reclaimable size with more meaningful constraints:

        /* slabs are not easy to reclaim */
	size = global_page_state(NR_SLAB_RECLAIMABLE) / 2;

        /* keep NR_ACTIVE_ANON */
	size += global_page_state(NR_INACTIVE_ANON);

        /* keep mapped files */
	size += global_page_state(NR_ACTIVE_FILE);
	size += global_page_state(NR_INACTIVE_FILE);
        size -= global_page_state(NR_FILE_MAPPED);

That restores the hard core working set logic in the reverse way ;)
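
Assembled into a full replacement for minimum_image_size(), the suggestion
above would read roughly as follows (a sketch only; the NR_FILE_MAPPED
subtraction would also need a check so the sum cannot go negative):

	static unsigned long minimum_image_size(unsigned long saveable)
	{
		unsigned long size;

		/* slabs are not easy to reclaim, assume only half can go */
		size = global_page_state(NR_SLAB_RECLAIMABLE) / 2;

		/* keep NR_ACTIVE_ANON, count only inactive anon as freeable */
		size += global_page_state(NR_INACTIVE_ANON);

		/* count file pages, but keep mapped files in memory */
		size += global_page_state(NR_ACTIVE_FILE);
		size += global_page_state(NR_INACTIVE_FILE);
		size -= global_page_state(NR_FILE_MAPPED);

		return saveable <= size ? saveable : saveable - size;
	}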

Thanks,
Fengguang

> +	if (saveable <= size)
> +		return saveable;
> +
> +	size = saveable - size;
> +	size += (size >> 2);
> +
> +	return size;
> +}
> +
> +
> +/**
>   * hibernate_preallocate_memory - Preallocate memory for hibernation image
>   *
>   * To create a hibernation image it is necessary to make a copy of every page
> @@ -1229,8 +1272,8 @@ static void free_unnecessary_pages(void)
>   *
>   * If image_size is set below the number following from the above formula,
>   * the preallocation of memory is continued until the total number of saveable
> - * pages in the system is below the requested image size or it is impossible to
> - * allocate more memory, whichever happens first.
> + * pages in the system is below the requested image size or the minimum
> + * acceptable image size returned by minimum_image_size(), whichever is greater.
>   */
>  int hibernate_preallocate_memory(void)
>  {
> @@ -1291,6 +1334,11 @@ int hibernate_preallocate_memory(void)
>  		goto out;
>  	}
>  
> +	/* Estimate the minimum size of the image. */
> +	pages = minimum_image_size(saveable);
> +	if (size < pages)
> +		size = min_t(unsigned long, pages, max_size);
> +
>  	/*
>  	 * Let the memory management subsystem know that we're going to need a
>  	 * large number of page frames to allocate and make it free some memory.
> @@ -1303,8 +1351,8 @@ int hibernate_preallocate_memory(void)
>  	 * The number of saveable pages in memory was too high, so apply some
>  	 * pressure to decrease it.  First, make room for the largest possible
>  	 * image and fail if that doesn't work.  Next, try to decrease the size
> -	 * of the image as much as indicated by image_size using allocations
> -	 * from highmem and non-highmem zones separately.
> +	 * of the image as much as indicated by 'size' using allocations from
> +	 * highmem and non-highmem zones separately.
>  	 */
>  	pages_highmem = preallocate_image_highmem(highmem / 2);
>  	max_size += pages_highmem;

^ permalink raw reply	[flat|nested] 205+ messages in thread

* Re: [RFC][PATCH 6/6] PM/Hibernate: Do not try to allocate too much memory too hard
  2009-05-17 12:06           ` Wu Fengguang
@ 2009-05-17 12:55             ` Rafael J. Wysocki
  -1 siblings, 0 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-17 12:55 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: pm list, Andrew Morton, LKML, Pavel Machek, Nigel Cunningham,
	David Rientjes, linux-mm

On Sunday 17 May 2009, Wu Fengguang wrote:
> Hi Rafael,

Hi,

> Sorry for being late.

No big deal.

> On Wed, May 13, 2009 at 04:42:17PM +0800, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rjw@sisk.pl>
> > 
> > We want to avoid attempting to free too much memory too hard during
> > hibernation, so estimate the minimum size of the image to use as the
> > lower limit for preallocating memory.
> > 
> > The approach here is based on the (experimental) observation that we
> > can't free more page frames than the sum of:
> > 
> > * global_page_state(NR_SLAB_RECLAIMABLE)
> > * global_page_state(NR_ACTIVE_ANON)
> > * global_page_state(NR_INACTIVE_ANON)
> > * global_page_state(NR_ACTIVE_FILE)
> > * global_page_state(NR_INACTIVE_FILE)
> 
> It's a very good idea to count the numbers in a reverse way.
> 
> > and even that is usually impossible to free in practice, because some
> > of the pages reported as global_page_state(NR_SLAB_RECLAIMABLE) can't
> > in fact be freed.  It turns out, however, that if the sum of the
> > above numbers is subtracted from the number of saveable pages in the
> > system and the result is multiplied by 1.25, we get a suitable
> > estimate of the minimum size of the image.
> 
> However, the "*1.25" looks like a hack.

It's just an experimental value.

> We should really apply more constraints to the individual components.
> 
> > Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> > ---
> >  kernel/power/snapshot.c |   56 ++++++++++++++++++++++++++++++++++++++++++++----
> >  1 file changed, 52 insertions(+), 4 deletions(-)
> > 
> > Index: linux-2.6/kernel/power/snapshot.c
> > ===================================================================
> > --- linux-2.6.orig/kernel/power/snapshot.c
> > +++ linux-2.6/kernel/power/snapshot.c
> > @@ -1213,6 +1213,49 @@ static void free_unnecessary_pages(void)
> >  }
> >  
> >  /**
> > + * minimum_image_size - Estimate the minimum acceptable size of an image
> > + * @saveable: The total number of saveable pages in the system.
> > + *
> > + * We want to avoid attempting to free too much memory too hard, so estimate the
> > + * minimum acceptable size of a hibernation image to use as the lower limit for
> > + * preallocating memory.
> > + *
> > + * The minimum size of the image is computed as
> > + *
> > + * ([number of saveable pages] - [number of pages we can free]) * 1.25
> > + *
> > + * where the second term is the sum of reclaimable slab, anonymouns pages and
> > + * active/inactive file pages.
> > + *
> > + * NOTE: It usually turns out that we can't really free all pages reported as
> > + * reclaimable slab, so the number resulting from the subtraction alone is too
> > + * low.  Still, it seems reasonable to assume that this number is proportional
> > + * to the total number of pages that cannot be freed, which leads to the
> > + * formula above.  The coefficient of proportinality in this formula, 1.25, has
> > + * been determined experimentally.
> > + */
> > +static unsigned long minimum_image_size(unsigned long saveable)
> > +{
> > +	unsigned long size;
> > +
> > +	/* Compute the number of saveable pages we can free. */
> > +	size = global_page_state(NR_SLAB_RECLAIMABLE)
> > +		+ global_page_state(NR_ACTIVE_ANON)
> > +		+ global_page_state(NR_INACTIVE_ANON)
> > +		+ global_page_state(NR_ACTIVE_FILE)
> > +		+ global_page_state(NR_INACTIVE_FILE);
> 
> For example, we could drop the 1.25 ratio and calculate the above
> reclaimable size with more meaningful constraints:
> 
>         /* slabs are not easy to reclaim */
> 	size = global_page_state(NR_SLAB_RECLAIMABLE) / 2;

Why 1/2?
 
>         /* keep NR_ACTIVE_ANON */
> 	size += global_page_state(NR_INACTIVE_ANON);

Why exactly did you omit ACTIVE_ANON?
 	
>         /* keep mapped files */
> 	size += global_page_state(NR_ACTIVE_FILE);
> 	size += global_page_state(NR_INACTIVE_FILE);
>         size -= global_page_state(NR_FILE_MAPPED);
> 
> That restores the hard core working set logic in the reverse way ;)

I think the 1/2 factor for NR_SLAB_RECLAIMABLE may be too high in some cases,
but I'm going to check that.

Thanks,
Rafael
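
For reference, the tail of minimum_image_size(), where the 1.25 coefficient
is applied, is not quoted here.  As a sketch only (the *5/4 integer form
below is an assumption for illustration, not the actual patch text), the
whole function would look roughly like this:

	static unsigned long minimum_image_size(unsigned long saveable)
	{
		unsigned long size;

		/* Sum of the page counts that can in theory be freed. */
		size = global_page_state(NR_SLAB_RECLAIMABLE)
			+ global_page_state(NR_ACTIVE_ANON)
			+ global_page_state(NR_INACTIVE_ANON)
			+ global_page_state(NR_ACTIVE_FILE)
			+ global_page_state(NR_INACTIVE_FILE);

		if (saveable <= size)
			return 0;

		/* Apply the experimental 1.25 coefficient without floating point. */
		return (saveable - size) * 5 / 4;
	}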

^ permalink raw reply	[flat|nested] 205+ messages in thread

* Re: [RFC][PATCH 6/6] PM/Hibernate: Do not try to allocate too much memory too hard
  2009-05-17 12:55             ` Rafael J. Wysocki
@ 2009-05-17 14:07               ` Wu Fengguang
  -1 siblings, 0 replies; 205+ messages in thread
From: Wu Fengguang @ 2009-05-17 14:07 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: pm list, Andrew Morton, LKML, Pavel Machek, Nigel Cunningham,
	David Rientjes, linux-mm

On Sun, May 17, 2009 at 08:55:05PM +0800, Rafael J. Wysocki wrote:
> On Sunday 17 May 2009, Wu Fengguang wrote:

> > > +static unsigned long minimum_image_size(unsigned long saveable)
> > > +{
> > > +	unsigned long size;
> > > +
> > > +	/* Compute the number of saveable pages we can free. */
> > > +	size = global_page_state(NR_SLAB_RECLAIMABLE)
> > > +		+ global_page_state(NR_ACTIVE_ANON)
> > > +		+ global_page_state(NR_INACTIVE_ANON)
> > > +		+ global_page_state(NR_ACTIVE_FILE)
> > > +		+ global_page_state(NR_INACTIVE_FILE);
> > 
> > For example, we could drop the 1.25 ratio and calculate the above
> > reclaimable size with more meaningful constraints:
> > 
> >         /* slabs are not easy to reclaim */
> > 	size = global_page_state(NR_SLAB_RECLAIMABLE) / 2;
> 
> Why 1/2?

Also a very coarse value:
- we don't want to stress icache/dcache too much
  (unless they grow too large)
- my experience was that the icache/dcache are scanned at a slower
  pace than LRU pages.
- most importantly, inside the NR_SLAB_RECLAIMABLE pages, maybe half
  of the pages are actually *in use* and cannot be freed:
        % cat /proc/sys/fs/inode-nr     
        30450   16605
        % cat /proc/sys/fs/dentry-state 
        41598   35731   45      0       0       0
  See? More than half of the entries are in use. Sure, many of them will
  actually become unused when dentries are freed, but in the meantime the
  internal fragmentation in the slabs can go up.

> >         /* keep NR_ACTIVE_ANON */
> > 	size += global_page_state(NR_INACTIVE_ANON);
> 
> Why exactly did you omit ACTIVE_ANON?

To keep the "core working set" :)
  	
> >         /* keep mapped files */
> > 	size += global_page_state(NR_ACTIVE_FILE);
> > 	size += global_page_state(NR_INACTIVE_FILE);
> >         size -= global_page_state(NR_FILE_MAPPED);
> > 
> > That restores the hard core working set logic in the reverse way ;)
> 
> I think the 1/2 factor for NR_SLAB_RECLAIMABLE may be too high in some cases,
> but I'm going to check that.

Yes, after updatedb. In that case simple magic numbers may not help;
we should really first call shrink_slab() in a loop to cut down the
slab pages to a sane number.

Thanks,
Fengguang
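
Assembling the constraints above into the same function shape as the patch,
purely as a sketch of the proposal (the return convention at the end is
borrowed from the patch for illustration), it would read roughly:

	static unsigned long minimum_image_size(unsigned long saveable)
	{
		unsigned long size;

		/* slabs are not easy to reclaim: assume only half can be freed */
		size = global_page_state(NR_SLAB_RECLAIMABLE) / 2;
		/* keep NR_ACTIVE_ANON: count only inactive anon as freeable */
		size += global_page_state(NR_INACTIVE_ANON);
		/* keep mapped files */
		size += global_page_state(NR_ACTIVE_FILE);
		size += global_page_state(NR_INACTIVE_FILE);
		size -= global_page_state(NR_FILE_MAPPED);

		return saveable <= size ? 0 : saveable - size;
	}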

^ permalink raw reply	[flat|nested] 205+ messages in thread

* Re: [RFC][PATCH 6/6] PM/Hibernate: Do not try to allocate too much memory too hard
  2009-05-17 14:07               ` Wu Fengguang
@ 2009-05-17 16:53                 ` Rafael J. Wysocki
  -1 siblings, 0 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-17 16:53 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: pm list, Andrew Morton, LKML, Pavel Machek, Nigel Cunningham,
	David Rientjes, linux-mm

On Sunday 17 May 2009, Wu Fengguang wrote:
> On Sun, May 17, 2009 at 08:55:05PM +0800, Rafael J. Wysocki wrote:
> > On Sunday 17 May 2009, Wu Fengguang wrote:
> 
> > > > +static unsigned long minimum_image_size(unsigned long saveable)
> > > > +{
> > > > +	unsigned long size;
> > > > +
> > > > +	/* Compute the number of saveable pages we can free. */
> > > > +	size = global_page_state(NR_SLAB_RECLAIMABLE)
> > > > +		+ global_page_state(NR_ACTIVE_ANON)
> > > > +		+ global_page_state(NR_INACTIVE_ANON)
> > > > +		+ global_page_state(NR_ACTIVE_FILE)
> > > > +		+ global_page_state(NR_INACTIVE_FILE);
> > > 
> > > For example, we could drop the 1.25 ratio and calculate the above
> > > reclaimable size with more meaningful constraints:
> > > 
> > >         /* slabs are not easy to reclaim */
> > > 	size = global_page_state(NR_SLAB_RECLAIMABLE) / 2;
> > 
> > Why 1/2?
> 
> Also a very coarse value:
> - we don't want to stress icache/dcache too much
>   (unless they grow too large)
> - my experience was that the icache/dcache are scanned in a slower
>   pace than lru pages.

That doesn't really matter, we're talking about the minimum image size.

> - most importantly, inside the NR_SLAB_RECLAIMABLE pages, maybe half
>   of the pages are actually *in use* and cannot be freed:
>         % cat /proc/sys/fs/inode-nr     
>         30450   16605
>         % cat /proc/sys/fs/dentry-state 
>         41598   35731   45      0       0       0
>   See? More than half entries are in-use. Sure many of them will actually
>   become unused when dentries are freed, but in the mean time the internal
>   fragmentations in the slabs can go up.
> 
> > >         /* keep NR_ACTIVE_ANON */
> > > 	size += global_page_state(NR_INACTIVE_ANON);
> > 
> > Why exactly did you omit ACTIVE_ANON?
> 
> To keep the "core working set" :)
>   	
> > >         /* keep mapped files */
> > > 	size += global_page_state(NR_ACTIVE_FILE);
> > > 	size += global_page_state(NR_INACTIVE_FILE);
> > >         size -= global_page_state(NR_FILE_MAPPED);
> > > 
> > > That restores the hard core working set logic in the reverse way ;)
> > 
> > I think the 1/2 factor for NR_SLAB_RECLAIMABLE may be too high in some cases,
> > but I'm going to check that.
> 
> Yes, after updatedb. In that case simple magics numbers may not help.
> In that case we should really first call shrink_slab() in a loop to
> cut down the slab pages to a sane number.

Unfortunately, your formula above also doesn't work after running
shrink_all_memory(<all saveable pages>), because the number given by it is
still too high in that case.  The resulting minimum image size is then too low.

OTOH, the number computed in accordance with my original 1.25 * (<sum>) formula
is fine in all cases I have checked (it actually would be sufficient to take
1.2 * <sum>, but the difference is not really significant).

I don't think we can derive everything directly from the statistics collected
by the mm subsystem.

Thanks,
Rafael
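
For a rough sense of the scale involved (numbers invented for illustration):
if [number of saveable pages] - <sum> comes to 40000 pages, then
1.25 * 40000 = 50000 pages while 1.2 * 40000 = 48000 pages, a difference of
2000 pages, i.e. about 8 MB with 4 KB pages, which is indeed small compared
to a typical image.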


^ permalink raw reply	[flat|nested] 205+ messages in thread

* Re: [RFC][PATCH 6/6] PM/Hibernate: Do not try to allocate too much memory too hard
  2009-05-17 14:07               ` Wu Fengguang
@ 2009-05-17 21:14                 ` Rafael J. Wysocki
  -1 siblings, 0 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-17 21:14 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: pm list, Andrew Morton, LKML, Pavel Machek, Nigel Cunningham,
	David Rientjes, linux-mm

On Sunday 17 May 2009, Wu Fengguang wrote:
> On Sun, May 17, 2009 at 08:55:05PM +0800, Rafael J. Wysocki wrote:
> > On Sunday 17 May 2009, Wu Fengguang wrote:
> 
> > > > +static unsigned long minimum_image_size(unsigned long saveable)
> > > > +{
> > > > +	unsigned long size;
> > > > +
> > > > +	/* Compute the number of saveable pages we can free. */
> > > > +	size = global_page_state(NR_SLAB_RECLAIMABLE)
> > > > +		+ global_page_state(NR_ACTIVE_ANON)
> > > > +		+ global_page_state(NR_INACTIVE_ANON)
> > > > +		+ global_page_state(NR_ACTIVE_FILE)
> > > > +		+ global_page_state(NR_INACTIVE_FILE);
> > > 
> > > For example, we could drop the 1.25 ratio and calculate the above
> > > reclaimable size with more meaningful constraints:
> > > 
> > >         /* slabs are not easy to reclaim */
> > > 	size = global_page_state(NR_SLAB_RECLAIMABLE) / 2;
> > 
> > Why 1/2?
> 
> Also a very coarse value:
> - we don't want to stress icache/dcache too much
>   (unless they grow too large)
> - my experience was that the icache/dcache are scanned in a slower
>   pace than lru pages.
> - most importantly, inside the NR_SLAB_RECLAIMABLE pages, maybe half
>   of the pages are actually *in use* and cannot be freed:
>         % cat /proc/sys/fs/inode-nr     
>         30450   16605
>         % cat /proc/sys/fs/dentry-state 
>         41598   35731   45      0       0       0
>   See? More than half entries are in-use. Sure many of them will actually
>   become unused when dentries are freed, but in the mean time the internal
>   fragmentations in the slabs can go up.
> 
> > >         /* keep NR_ACTIVE_ANON */
> > > 	size += global_page_state(NR_INACTIVE_ANON);
> > 
> > Why exactly did you omit ACTIVE_ANON?
> 
> To keep the "core working set" :)
>   	
> > >         /* keep mapped files */
> > > 	size += global_page_state(NR_ACTIVE_FILE);
> > > 	size += global_page_state(NR_INACTIVE_FILE);
> > >         size -= global_page_state(NR_FILE_MAPPED);
> > > 
> > > That restores the hard core working set logic in the reverse way ;)
> > 
> > I think the 1/2 factor for NR_SLAB_RECLAIMABLE may be too high in some cases,
> > but I'm going to check that.
> 
> Yes, after updatedb. In that case simple magics numbers may not help.
> In that case we should really first call shrink_slab() in a loop to
> cut down the slab pages to a sane number.

I have verified that the appended patch works reasonably well.

The value returned as the minimum image size is usually too high, but not by
much (on x86_64 usually about 20%), there are no "magic" coefficients
involved any more, and the computation of the minimum image size is carried out
before calling shrink_all_memory() (so it will still be useful once we drop
shrink_all_memory() at some point).

Thanks,
Rafael

---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: PM/Hibernate: Do not try to allocate too much memory too hard (rev. 2)

We want to avoid attempting to free too much memory too hard during
hibernation, so estimate the minimum size of the image to use as the
lower limit for preallocating memory.

The approach here is based on the (experimental) observation that we
can't free more page frames than the sum of:

* global_page_state(NR_SLAB_RECLAIMABLE)
* global_page_state(NR_ACTIVE_ANON)
* global_page_state(NR_INACTIVE_ANON)
* global_page_state(NR_ACTIVE_FILE)
* global_page_state(NR_INACTIVE_FILE)

minus

* global_page_state(NR_FILE_MAPPED)

Namely, if this number is subtracted from the number of saveable
pages in the system, we get a good estimate of the minimum reasonable
size of a hibernation image.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 kernel/power/snapshot.c |   43 +++++++++++++++++++++++++++++++++++++++----
 1 file changed, 39 insertions(+), 4 deletions(-)

Index: linux-2.6/kernel/power/snapshot.c
===================================================================
--- linux-2.6.orig/kernel/power/snapshot.c
+++ linux-2.6/kernel/power/snapshot.c
@@ -1204,6 +1204,36 @@ static void free_unnecessary_pages(void)
 }
 
 /**
+ * minimum_image_size - Estimate the minimum acceptable size of an image
+ * @saveable: Number of saveable pages in the system.
+ *
+ * We want to avoid attempting to free too much memory too hard, so estimate the
+ * minimum acceptable size of a hibernation image to use as the lower limit for
+ * preallocating memory.
+ *
+ * We assume that the minimum image size should be proportional to
+ *
+ * [number of saveable pages] - [number of pages that can be freed in theory]
+ *
+ * where the second term is the sum of (1) reclaimable slab pages, (2) active
+ * and (3) inactive anonymous pages, (4) active and (5) inactive file pages,
+ * minus mapped file pages.
+ */
+static unsigned long minimum_image_size(unsigned long saveable)
+{
+	unsigned long size;
+
+	size = global_page_state(NR_SLAB_RECLAIMABLE)
+		+ global_page_state(NR_ACTIVE_ANON)
+		+ global_page_state(NR_INACTIVE_ANON)
+		+ global_page_state(NR_ACTIVE_FILE)
+		+ global_page_state(NR_INACTIVE_FILE)
+		- global_page_state(NR_FILE_MAPPED);
+
+	return saveable <= size ? 0 : saveable - size;
+}
+
+/**
  * hibernate_preallocate_memory - Preallocate memory for hibernation image
  *
  * To create a hibernation image it is necessary to make a copy of every page
@@ -1220,8 +1250,8 @@ static void free_unnecessary_pages(void)
  *
  * If image_size is set below the number following from the above formula,
  * the preallocation of memory is continued until the total number of saveable
- * pages in the system is below the requested image size or it is impossible to
- * allocate more memory, whichever happens first.
+ * pages in the system is below the requested image size or the minimum
+ * acceptable image size returned by minimum_image_size(), whichever is greater.
  */
 int hibernate_preallocate_memory(void)
 {
@@ -1282,6 +1312,11 @@ int hibernate_preallocate_memory(void)
 		goto out;
 	}
 
+	/* Estimate the minimum size of the image. */
+	pages = minimum_image_size(saveable);
+	if (size < pages)
+		size = min_t(unsigned long, pages, max_size);
+
 	/*
 	 * Let the memory management subsystem know that we're going to need a
 	 * large number of page frames to allocate and make it free some memory.
@@ -1294,8 +1329,8 @@ int hibernate_preallocate_memory(void)
 	 * The number of saveable pages in memory was too high, so apply some
 	 * pressure to decrease it.  First, make room for the largest possible
 	 * image and fail if that doesn't work.  Next, try to decrease the size
-	 * of the image as much as indicated by image_size using allocations
-	 * from highmem and non-highmem zones separately.
+	 * of the image as much as indicated by 'size' using allocations from
+	 * highmem and non-highmem zones separately.
 	 */
 	pages_highmem = preallocate_image_highmem(highmem / 2);
 	max_size += pages_highmem;
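
To illustrate how the estimate feeds into the clamp added above (all numbers
invented for illustration): with saveable = 200000 pages and a reclaimable
sum minus mapped file pages of 150000 pages, minimum_image_size() returns
50000; if the user-requested image_size corresponds to size = 30000 pages
and max_size = 80000, the clamp raises the target to min(50000, 80000) =
50000 pages, so preallocation will not try to squeeze the image below the
estimated minimum.  A tiny userspace model of just that clamp:

	#include <stdio.h>

	/* Userspace model of the clamp in hibernate_preallocate_memory()
	 * above; all values are page counts and purely illustrative. */
	static unsigned long preallocation_target(unsigned long size,
						  unsigned long min_image,
						  unsigned long max_size)
	{
		if (size < min_image)
			size = min_image < max_size ? min_image : max_size;
		return size;
	}

	int main(void)
	{
		printf("%lu pages\n", preallocation_target(30000, 50000, 80000));
		return 0;
	}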

^ permalink raw reply	[flat|nested] 205+ messages in thread

* Re: [RFC][PATCH 6/6] PM/Hibernate: Do not try to allocate too much memory too hard
  2009-05-17 21:14                 ` Rafael J. Wysocki
@ 2009-05-18  8:56                   ` Wu Fengguang
  -1 siblings, 0 replies; 205+ messages in thread
From: Wu Fengguang @ 2009-05-18  8:56 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: pm list, Andrew Morton, LKML, Pavel Machek, Nigel Cunningham,
	David Rientjes, linux-mm

On Mon, May 18, 2009 at 05:14:29AM +0800, Rafael J. Wysocki wrote:
> On Sunday 17 May 2009, Wu Fengguang wrote:
> > On Sun, May 17, 2009 at 08:55:05PM +0800, Rafael J. Wysocki wrote:
> > > On Sunday 17 May 2009, Wu Fengguang wrote:
> > 
> > > > > +static unsigned long minimum_image_size(unsigned long saveable)
> > > > > +{
> > > > > +	unsigned long size;
> > > > > +
> > > > > +	/* Compute the number of saveable pages we can free. */
> > > > > +	size = global_page_state(NR_SLAB_RECLAIMABLE)
> > > > > +		+ global_page_state(NR_ACTIVE_ANON)
> > > > > +		+ global_page_state(NR_INACTIVE_ANON)
> > > > > +		+ global_page_state(NR_ACTIVE_FILE)
> > > > > +		+ global_page_state(NR_INACTIVE_FILE);
> > > > 
> > > > For example, we could drop the 1.25 ratio and calculate the above
> > > > reclaimable size with more meaningful constraints:
> > > > 
> > > >         /* slabs are not easy to reclaim */
> > > > 	size = global_page_state(NR_SLAB_RECLAIMABLE) / 2;
> > > 
> > > Why 1/2?
> > 
> > Also a very coarse value:
> > - we don't want to stress icache/dcache too much
> >   (unless they grow too large)
> > - my experience was that the icache/dcache are scanned in a slower
> >   pace than lru pages.
> > - most importantly, inside the NR_SLAB_RECLAIMABLE pages, maybe half
> >   of the pages are actually *in use* and cannot be freed:
> >         % cat /proc/sys/fs/inode-nr     
> >         30450   16605
> >         % cat /proc/sys/fs/dentry-state 
> >         41598   35731   45      0       0       0
> >   See? More than half entries are in-use. Sure many of them will actually
> >   become unused when dentries are freed, but in the mean time the internal
> >   fragmentations in the slabs can go up.
> > 
> > > >         /* keep NR_ACTIVE_ANON */
> > > > 	size += global_page_state(NR_INACTIVE_ANON);
> > > 
> > > Why exactly did you omit ACTIVE_ANON?
> > 
> > To keep the "core working set" :)
> >   	
> > > >         /* keep mapped files */
> > > > 	size += global_page_state(NR_ACTIVE_FILE);
> > > > 	size += global_page_state(NR_INACTIVE_FILE);
> > > >         size -= global_page_state(NR_FILE_MAPPED);
> > > > 
> > > > That restores the hard core working set logic in the reverse way ;)
> > > 
> > > I think the 1/2 factor for NR_SLAB_RECLAIMABLE may be too high in some cases,
> > > but I'm going to check that.
> > 
> > Yes, after updatedb. In that case simple magics numbers may not help.
> > In that case we should really first call shrink_slab() in a loop to
> > cut down the slab pages to a sane number.
> 
> I have verified that the appended patch works reasonably well.

This is illogical: in a previous email you complained that the formula

        TOTAL - MAPPED - ACTIVE_ANON - SLAB/2

gives too high a number, while

        TOTAL - MAPPED

in this patch is OK.  (I'm not claiming that the first formula is fine.)
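
Written out with the actual counters, the two estimates being compared look
like this.  This is only an illustrative debug-style helper, not part of
either patch; the function and variable names are made up, and it assumes the
usual global_page_state() vmstat accessor:

static void compare_reclaimable_estimates(unsigned long saveable)
{
	unsigned long wu, patch;

	/* TOTAL - MAPPED - ACTIVE_ANON - SLAB/2: keep the active anon pages
	 * and half of the reclaimable slab out of the freeable estimate. */
	wu = global_page_state(NR_SLAB_RECLAIMABLE) / 2
		+ global_page_state(NR_INACTIVE_ANON)
		+ global_page_state(NR_ACTIVE_FILE)
		+ global_page_state(NR_INACTIVE_FILE)
		- global_page_state(NR_FILE_MAPPED);

	/* TOTAL - MAPPED: the rev. 2 patch counts everything except mapped
	 * file pages as freeable. */
	patch = global_page_state(NR_SLAB_RECLAIMABLE)
		+ global_page_state(NR_ACTIVE_ANON)
		+ global_page_state(NR_INACTIVE_ANON)
		+ global_page_state(NR_ACTIVE_FILE)
		+ global_page_state(NR_INACTIVE_FILE)
		- global_page_state(NR_FILE_MAPPED);

	/* The minimum image size is whatever cannot be freed. */
	printk(KERN_DEBUG "min image: %lu vs %lu pages\n",
	       saveable > wu ? saveable - wu : 0,
	       saveable > patch ? saveable - patch : 0);
}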

> The value returned as the minimum image size is usually too high, but not very
> much (on x86_64 usually about 20%) and there are no "magic" coefficients

It is _OK_ for the minimum image size to be higher; that margin serves
both as a safety margin and as the working set size we want to preserve.

> involved any more and the computation of the minimum image size is carried out
> before calling shrink_all_memory() (so it's still going to be useful after
> we've dropped shrink_all_memory() at one point).

That's OK, because shrink_all_memory() shrinks memory in a prioritized,
list-after-list order.

> ---
> From: Rafael J. Wysocki <rjw@sisk.pl>
> Subject: PM/Hibernate: Do not try to allocate too much memory too hard (rev. 2)
> 
> We want to avoid attempting to free too much memory too hard during
> hibernation, so estimate the minimum size of the image to use as the
> lower limit for preallocating memory.

I'd like to advocate adding "working set preservation" as another goal
of this function, and I can even make do with the formula in this patch :-)

That means that when more accurate working set estimation becomes possible
one day, we can extend this function to support that goal.

Thanks,
Fengguang

> The approach here is based on the (experimental) observation that we
> can't free more page frames than the sum of:
> 
> * global_page_state(NR_SLAB_RECLAIMABLE)
> * global_page_state(NR_ACTIVE_ANON)
> * global_page_state(NR_INACTIVE_ANON)
> * global_page_state(NR_ACTIVE_FILE)
> * global_page_state(NR_INACTIVE_FILE)
> 
> minus
> 
> * global_page_state(NR_FILE_MAPPED)
> 
> Namely, if this number is subtracted from the number of saveable
> pages in the system, we get a good estimate of the minimum reasonable
> size of a hibernation image.
> 
> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> ---
>  kernel/power/snapshot.c |   43 +++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 39 insertions(+), 4 deletions(-)
> 
> Index: linux-2.6/kernel/power/snapshot.c
> ===================================================================
> --- linux-2.6.orig/kernel/power/snapshot.c
> +++ linux-2.6/kernel/power/snapshot.c
> @@ -1204,6 +1204,36 @@ static void free_unnecessary_pages(void)
>  }
>  
>  /**
> + * minimum_image_size - Estimate the minimum acceptable size of an image
> + * @saveable: Number of saveable pages in the system.
> + *
> + * We want to avoid attempting to free too much memory too hard, so estimate the
> + * minimum acceptable size of a hibernation image to use as the lower limit for
> + * preallocating memory.
> + *
> + * We assume that the minimum image size should be proportional to
> + *
> + * [number of saveable pages] - [number of pages that can be freed in theory]
> + *
> + * where the second term is the sum of (1) reclaimable slab pages, (2) active
> + * and (3) inactive anonymouns pages, (4) active and (5) inactive file pages,
> + * minus mapped file pages.
> + */
> +static unsigned long minimum_image_size(unsigned long saveable)
> +{
> +	unsigned long size;
> +
> +	size = global_page_state(NR_SLAB_RECLAIMABLE)
> +		+ global_page_state(NR_ACTIVE_ANON)
> +		+ global_page_state(NR_INACTIVE_ANON)
> +		+ global_page_state(NR_ACTIVE_FILE)
> +		+ global_page_state(NR_INACTIVE_FILE)
> +		- global_page_state(NR_FILE_MAPPED);
> +
> +	return saveable <= size ? 0 : saveable - size;
> +}
> +
> +/**
>   * hibernate_preallocate_memory - Preallocate memory for hibernation image
>   *
>   * To create a hibernation image it is necessary to make a copy of every page
> @@ -1220,8 +1250,8 @@ static void free_unnecessary_pages(void)
>   *
>   * If image_size is set below the number following from the above formula,
>   * the preallocation of memory is continued until the total number of saveable
> - * pages in the system is below the requested image size or it is impossible to
> - * allocate more memory, whichever happens first.
> + * pages in the system is below the requested image size or the minimum
> + * acceptable image size returned by minimum_image_size(), whichever is greater.
>   */
>  int hibernate_preallocate_memory(void)
>  {
> @@ -1282,6 +1312,11 @@ int hibernate_preallocate_memory(void)
>  		goto out;
>  	}
>  
> +	/* Estimate the minimum size of the image. */
> +	pages = minimum_image_size(saveable);
> +	if (size < pages)
> +		size = min_t(unsigned long, pages, max_size);
> +
>  	/*
>  	 * Let the memory management subsystem know that we're going to need a
>  	 * large number of page frames to allocate and make it free some memory.
> @@ -1294,8 +1329,8 @@ int hibernate_preallocate_memory(void)
>  	 * The number of saveable pages in memory was too high, so apply some
>  	 * pressure to decrease it.  First, make room for the largest possible
>  	 * image and fail if that doesn't work.  Next, try to decrease the size
> -	 * of the image as much as indicated by image_size using allocations
> -	 * from highmem and non-highmem zones separately.
> +	 * of the image as much as indicated by 'size' using allocations from
> +	 * highmem and non-highmem zones separately.
>  	 */
>  	pages_highmem = preallocate_image_highmem(highmem / 2);
>  	max_size += pages_highmem;

^ permalink raw reply	[flat|nested] 205+ messages in thread

* Re: [RFC][PATCH 6/6] PM/Hibernate: Do not try to allocate too much memory too hard
  2009-05-18  8:56                   ` Wu Fengguang
@ 2009-05-18 17:07                     ` Rafael J. Wysocki
  -1 siblings, 0 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-18 17:07 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: pm list, Andrew Morton, LKML, Pavel Machek, Nigel Cunningham,
	David Rientjes, linux-mm

On Monday 18 May 2009, Wu Fengguang wrote:
> On Mon, May 18, 2009 at 05:14:29AM +0800, Rafael J. Wysocki wrote:
> > On Sunday 17 May 2009, Wu Fengguang wrote:
> > > On Sun, May 17, 2009 at 08:55:05PM +0800, Rafael J. Wysocki wrote:
> > > > On Sunday 17 May 2009, Wu Fengguang wrote:
> > > 
> > > > > > +static unsigned long minimum_image_size(unsigned long saveable)
> > > > > > +{
> > > > > > +	unsigned long size;
> > > > > > +
> > > > > > +	/* Compute the number of saveable pages we can free. */
> > > > > > +	size = global_page_state(NR_SLAB_RECLAIMABLE)
> > > > > > +		+ global_page_state(NR_ACTIVE_ANON)
> > > > > > +		+ global_page_state(NR_INACTIVE_ANON)
> > > > > > +		+ global_page_state(NR_ACTIVE_FILE)
> > > > > > +		+ global_page_state(NR_INACTIVE_FILE);
> > > > > 
> > > > > For example, we could drop the 1.25 ratio and calculate the above
> > > > > reclaimable size with more meaningful constraints:
> > > > > 
> > > > >         /* slabs are not easy to reclaim */
> > > > > 	size = global_page_state(NR_SLAB_RECLAIMABLE) / 2;
> > > > 
> > > > Why 1/2?
> > > 
> > > Also a very coarse value:
> > > - we don't want to stress icache/dcache too much
> > >   (unless they grow too large)
> > > - my experience was that the icache/dcache are scanned in a slower
> > >   pace than lru pages.
> > > - most importantly, inside the NR_SLAB_RECLAIMABLE pages, maybe half
> > >   of the pages are actually *in use* and cannot be freed:
> > >         % cat /proc/sys/fs/inode-nr     
> > >         30450   16605
> > >         % cat /proc/sys/fs/dentry-state 
> > >         41598   35731   45      0       0       0
> > >   See? More than half entries are in-use. Sure many of them will actually
> > >   become unused when dentries are freed, but in the mean time the internal
> > >   fragmentations in the slabs can go up.
> > > 
> > > > >         /* keep NR_ACTIVE_ANON */
> > > > > 	size += global_page_state(NR_INACTIVE_ANON);
> > > > 
> > > > Why exactly did you omit ACTIVE_ANON?
> > > 
> > > To keep the "core working set" :)
> > >   	
> > > > >         /* keep mapped files */
> > > > > 	size += global_page_state(NR_ACTIVE_FILE);
> > > > > 	size += global_page_state(NR_INACTIVE_FILE);
> > > > >         size -= global_page_state(NR_FILE_MAPPED);
> > > > > 
> > > > > That restores the hard core working set logic in the reverse way ;)
> > > > 
> > > > I think the 1/2 factor for NR_SLAB_RECLAIMABLE may be too high in some cases,
> > > > but I'm going to check that.
> > > 
> > > Yes, after updatedb. In that case simple magics numbers may not help.
> > > In that case we should really first call shrink_slab() in a loop to
> > > cut down the slab pages to a sane number.
> > 
> > I have verified that the appended patch works reasonably well.
> 
> This is illogical: in previous email you complained the formula
> 
>         TOTAL - MAPPED - ACTIVE_ANON - SLAB/2
> 
> gives too high number, while 
> 
>         TOTAL - MAPPED
> 
> in this patch is OK.  (I'm not claiming the first formula to be fine.)

I wasn't precise enough. :-)

The problem with the first formula is that it's not really useful when used
_before_ running shrink_all_memory(), because it may give an arbitrary result
in that case (everything depends on the preceding memory usage pattern).
However, if it is used _after_ running shrink_all_memory(<all saveable pages>),
the resulting minimum image size is usually (most often) below the real minimum
number of saveable pages that can stay in memory.

The second formula, OTOH, doesn't depend so much on the preceding memory usage
pattern and therefore it seems to be suitable for computing the estimate of the
minimum image size _before_ running shrink_all_memory().  Still, when used
_after_ running shrink_all_memory(<all saveable pages>), it will give a number
below the actual minimum number of saveable pages (i.e. not a really suitable
one).

Now, since we're going to get rid of shrink_all_memory() at some point, I think
we should be looking for a formula suitable for use before it's called.
Thus, IMO, the second one is just about right. :-)
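
To make that ordering concrete, here is a condensed sketch of the relevant
part of hibernate_preallocate_memory() with the rev. 2 patch applied.  The
first hunk is taken from the patch quoted below; the shrink_all_memory() call
and its argument are an assumption about the surrounding code, which is not
quoted in full in this thread:

	/* Estimate the minimum image size from the counters as they are
	 * now, i.e. before any artificial memory pressure is applied. */
	pages = minimum_image_size(saveable);
	if (size < pages)
		size = min_t(unsigned long, pages, max_size);

	/* Only then ask the mm subsystem to start freeing memory towards
	 * the (possibly raised) target. */
	shrink_all_memory(saveable - size);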

> > The value returned as the minimum image size is usually too high, but not very
> > much (on x86_64 usually about 20%) and there are no "magic" coefficients
> 
> It is _OK_ for the minimum image size to be higher, that margin serves
> as a safety margin as well as the working set size we want to preserve.

I didn't say it wasn't OK. :-)  It's totally fine by me.

> > involved any more and the computation of the minimum image size is carried out
> > before calling shrink_all_memory() (so it's still going to be useful after
> > we've dropped shrink_all_memory() at one point).
> 
> That's OK. Because shrink_all_memory() shrinks memory in a prioritized
> list-after-list order.
> 
> > ---
> > From: Rafael J. Wysocki <rjw@sisk.pl>
> > Subject: PM/Hibernate: Do not try to allocate too much memory too hard (rev. 2)
> > 
> > We want to avoid attempting to free too much memory too hard during
> > hibernation, so estimate the minimum size of the image to use as the
> > lower limit for preallocating memory.
> 
> I'd like to advocate to add "working set preservation" as another goal
> of this function, and I can even do with the formula in this patch :-)
>
> That means, when one day more accurate working set estimation is
> possible, we can extend this function to support that goal.

OK, so do you think it's fine to go with the patch below for now?

Thanks,
Rafael


> > The approach here is based on the (experimental) observation that we
> > can't free more page frames than the sum of:
> > 
> > * global_page_state(NR_SLAB_RECLAIMABLE)
> > * global_page_state(NR_ACTIVE_ANON)
> > * global_page_state(NR_INACTIVE_ANON)
> > * global_page_state(NR_ACTIVE_FILE)
> > * global_page_state(NR_INACTIVE_FILE)
> > 
> > minus
> > 
> > * global_page_state(NR_FILE_MAPPED)
> > 
> > Namely, if this number is subtracted from the number of saveable
> > pages in the system, we get a good estimate of the minimum reasonable
> > size of a hibernation image.
> > 
> > Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> > ---
> >  kernel/power/snapshot.c |   43 +++++++++++++++++++++++++++++++++++++++----
> >  1 file changed, 39 insertions(+), 4 deletions(-)
> > 
> > Index: linux-2.6/kernel/power/snapshot.c
> > ===================================================================
> > --- linux-2.6.orig/kernel/power/snapshot.c
> > +++ linux-2.6/kernel/power/snapshot.c
> > @@ -1204,6 +1204,36 @@ static void free_unnecessary_pages(void)
> >  }
> >  
> >  /**
> > + * minimum_image_size - Estimate the minimum acceptable size of an image
> > + * @saveable: Number of saveable pages in the system.
> > + *
> > + * We want to avoid attempting to free too much memory too hard, so estimate the
> > + * minimum acceptable size of a hibernation image to use as the lower limit for
> > + * preallocating memory.
> > + *
> > + * We assume that the minimum image size should be proportional to
> > + *
> > + * [number of saveable pages] - [number of pages that can be freed in theory]
> > + *
> > + * where the second term is the sum of (1) reclaimable slab pages, (2) active
> > + * and (3) inactive anonymouns pages, (4) active and (5) inactive file pages,
> > + * minus mapped file pages.
> > + */
> > +static unsigned long minimum_image_size(unsigned long saveable)
> > +{
> > +	unsigned long size;
> > +
> > +	size = global_page_state(NR_SLAB_RECLAIMABLE)
> > +		+ global_page_state(NR_ACTIVE_ANON)
> > +		+ global_page_state(NR_INACTIVE_ANON)
> > +		+ global_page_state(NR_ACTIVE_FILE)
> > +		+ global_page_state(NR_INACTIVE_FILE)
> > +		- global_page_state(NR_FILE_MAPPED);
> > +
> > +	return saveable <= size ? 0 : saveable - size;
> > +}
> > +
> > +/**
> >   * hibernate_preallocate_memory - Preallocate memory for hibernation image
> >   *
> >   * To create a hibernation image it is necessary to make a copy of every page
> > @@ -1220,8 +1250,8 @@ static void free_unnecessary_pages(void)
> >   *
> >   * If image_size is set below the number following from the above formula,
> >   * the preallocation of memory is continued until the total number of saveable
> > - * pages in the system is below the requested image size or it is impossible to
> > - * allocate more memory, whichever happens first.
> > + * pages in the system is below the requested image size or the minimum
> > + * acceptable image size returned by minimum_image_size(), whichever is greater.
> >   */
> >  int hibernate_preallocate_memory(void)
> >  {
> > @@ -1282,6 +1312,11 @@ int hibernate_preallocate_memory(void)
> >  		goto out;
> >  	}
> >  
> > +	/* Estimate the minimum size of the image. */
> > +	pages = minimum_image_size(saveable);
> > +	if (size < pages)
> > +		size = min_t(unsigned long, pages, max_size);
> > +
> >  	/*
> >  	 * Let the memory management subsystem know that we're going to need a
> >  	 * large number of page frames to allocate and make it free some memory.
> > @@ -1294,8 +1329,8 @@ int hibernate_preallocate_memory(void)
> >  	 * The number of saveable pages in memory was too high, so apply some
> >  	 * pressure to decrease it.  First, make room for the largest possible
> >  	 * image and fail if that doesn't work.  Next, try to decrease the size
> > -	 * of the image as much as indicated by image_size using allocations
> > -	 * from highmem and non-highmem zones separately.
> > +	 * of the image as much as indicated by 'size' using allocations from
> > +	 * highmem and non-highmem zones separately.
> >  	 */
> >  	pages_highmem = preallocate_image_highmem(highmem / 2);
> >  	max_size += pages_highmem;

^ permalink raw reply	[flat|nested] 205+ messages in thread

* Re: [RFC][PATCH 6/6] PM/Hibernate: Do not try to allocate too much memory too hard
@ 2009-05-18 17:07                     ` Rafael J. Wysocki
  0 siblings, 0 replies; 205+ messages in thread
From: Rafael J. Wysocki @ 2009-05-18 17:07 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: pm list, Andrew Morton, LKML, Pavel Machek, Nigel Cunningham,
	David Rientjes, linux-mm

On Monday 18 May 2009, Wu Fengguang wrote:
> On Mon, May 18, 2009 at 05:14:29AM +0800, Rafael J. Wysocki wrote:
> > On Sunday 17 May 2009, Wu Fengguang wrote:
> > > On Sun, May 17, 2009 at 08:55:05PM +0800, Rafael J. Wysocki wrote:
> > > > On Sunday 17 May 2009, Wu Fengguang wrote:
> > > 
> > > > > > +static unsigned long minimum_image_size(unsigned long saveable)
> > > > > > +{
> > > > > > +	unsigned long size;
> > > > > > +
> > > > > > +	/* Compute the number of saveable pages we can free. */
> > > > > > +	size = global_page_state(NR_SLAB_RECLAIMABLE)
> > > > > > +		+ global_page_state(NR_ACTIVE_ANON)
> > > > > > +		+ global_page_state(NR_INACTIVE_ANON)
> > > > > > +		+ global_page_state(NR_ACTIVE_FILE)
> > > > > > +		+ global_page_state(NR_INACTIVE_FILE);
> > > > > 
> > > > > For example, we could drop the 1.25 ratio and calculate the above
> > > > > reclaimable size with more meaningful constraints:
> > > > > 
> > > > >         /* slabs are not easy to reclaim */
> > > > > 	size = global_page_state(NR_SLAB_RECLAIMABLE) / 2;
> > > > 
> > > > Why 1/2?
> > > 
> > > Also a very coarse value:
> > > - we don't want to stress icache/dcache too much
> > >   (unless they grow too large)
> > > - my experience was that the icache/dcache are scanned in a slower
> > >   pace than lru pages.
> > > - most importantly, inside the NR_SLAB_RECLAIMABLE pages, maybe half
> > >   of the pages are actually *in use* and cannot be freed:
> > >         % cat /proc/sys/fs/inode-nr     
> > >         30450   16605
> > >         % cat /proc/sys/fs/dentry-state 
> > >         41598   35731   45      0       0       0
> > >   See? More than half entries are in-use. Sure many of them will actually
> > >   become unused when dentries are freed, but in the mean time the internal
> > >   fragmentations in the slabs can go up.
> > > 
> > > > >         /* keep NR_ACTIVE_ANON */
> > > > > 	size += global_page_state(NR_INACTIVE_ANON);
> > > > 
> > > > Why exactly did you omit ACTIVE_ANON?
> > > 
> > > To keep the "core working set" :)
> > >   	
> > > > >         /* keep mapped files */
> > > > > 	size += global_page_state(NR_ACTIVE_FILE);
> > > > > 	size += global_page_state(NR_INACTIVE_FILE);
> > > > >         size -= global_page_state(NR_FILE_MAPPED);
> > > > > 
> > > > > That restores the hard core working set logic in the reverse way ;)
> > > > 
> > > > I think the 1/2 factor for NR_SLAB_RECLAIMABLE may be too high in some cases,
> > > > but I'm going to check that.
> > > 
> > > Yes, after updatedb. In that case simple magics numbers may not help.
> > > In that case we should really first call shrink_slab() in a loop to
> > > cut down the slab pages to a sane number.
> > 
> > I have verified that the appended patch works reasonably well.
> 
> This is illogical: in previous email you complained the formula
> 
>         TOTAL - MAPPED - ACTIVE_ANON - SLAB/2
> 
> gives too high number, while 
> 
>         TOTAL - MAPPED
> 
> in this patch is OK.  (I'm not claiming the first formula to be fine.)

I wasn't precise enough. :-)

The problem with the first formula is that it's not really useful when used
_before_ running shrink_all_memory(), because it may give an arbitrary result
in that case (everything depends on the preceding memory usage pattern).
However, if it is used _after_ running shrink_all_memory(<all saveable pages>),
the resulting minimum image size is usually (most often) below the real minimum
number of saveable pages that can stay in memory.

The second formula, OTOH, doesn't depend so much on the preceding memory usage
pattern and therefore it seems to be suitable for computing the estimate of the
minimum image size _before_ running shrink_all_memory().  Still, when used
_after_ running shrink_all_memory(<all saveable pages>), it will give a number
below the actual minimum number of saveable pages (ie. not a really suitable
one).

Now, since we're going to get rid of shrink_all_memory() at one point, I think
we should be looking for a formula suitable for use before it's called.
Thus, IMO, the second one is just about right. :-)
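
To make the comparison concrete, here is a rough sketch (illustration only; the
helper names are made up here and this is not code from any of the patches) of
the two estimates written out with the vmstat counters used above:

	/*
	 * Illustrative sketch only.  estimate_reclaimable_before() is the
	 * second formula (TOTAL - MAPPED), meant to be used _before_
	 * shrink_all_memory().  estimate_reclaimable_after() is the first
	 * formula (TOTAL - MAPPED - ACTIVE_ANON - SLAB/2), which is only
	 * meaningful _after_ shrink_all_memory(<all saveable pages>).
	 */
	static unsigned long estimate_reclaimable_before(void)
	{
		return global_page_state(NR_SLAB_RECLAIMABLE)
			+ global_page_state(NR_ACTIVE_ANON)
			+ global_page_state(NR_INACTIVE_ANON)
			+ global_page_state(NR_ACTIVE_FILE)
			+ global_page_state(NR_INACTIVE_FILE)
			- global_page_state(NR_FILE_MAPPED);
	}

	static unsigned long estimate_reclaimable_after(void)
	{
		return global_page_state(NR_SLAB_RECLAIMABLE) / 2
			+ global_page_state(NR_INACTIVE_ANON)
			+ global_page_state(NR_ACTIVE_FILE)
			+ global_page_state(NR_INACTIVE_FILE)
			- global_page_state(NR_FILE_MAPPED);
	}

Either way the minimum image size would be the number of saveable pages minus
the estimate (clamped at zero), which is what minimum_image_size() in the patch
below does with the "before" variant.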

> > The value returned as the minimum image size is usually too high, but not very
> > much (on x86_64 usually about 20%) and there are no "magic" coefficients
> 
> It is _OK_ for the minimum image size to be higher, that margin serves
> as a safety margin as well as the working set size we want to preserve.

I didn't say it wasn't OK. :-)  It's totally fine by me.

> > involved any more and the computation of the minimum image size is carried out
> > before calling shrink_all_memory() (so it's still going to be useful after
> > we've dropped shrink_all_memory() at one point).
> 
> That's OK. Because shrink_all_memory() shrinks memory in a prioritized
> list-after-list order.
> 
> > ---
> > From: Rafael J. Wysocki <rjw@sisk.pl>
> > Subject: PM/Hibernate: Do not try to allocate too much memory too hard (rev. 2)
> > 
> > We want to avoid attempting to free too much memory too hard during
> > hibernation, so estimate the minimum size of the image to use as the
> > lower limit for preallocating memory.
> 
> I'd like to advocate to add "working set preservation" as another goal
> of this function, and I can even do with the formula in this patch :-)
>
> That means, when one day more accurate working set estimation is
> possible, we can extend this function to support that goal.

OK, so do you think it's fine to go with the patch below for now?

Thanks,
Rafael


> > The approach here is based on the (experimental) observation that we
> > can't free more page frames than the sum of:
> > 
> > * global_page_state(NR_SLAB_RECLAIMABLE)
> > * global_page_state(NR_ACTIVE_ANON)
> > * global_page_state(NR_INACTIVE_ANON)
> > * global_page_state(NR_ACTIVE_FILE)
> > * global_page_state(NR_INACTIVE_FILE)
> > 
> > minus
> > 
> > * global_page_state(NR_FILE_MAPPED)
> > 
> > Namely, if this number is subtracted from the number of saveable
> > pages in the system, we get a good estimate of the minimum reasonable
> > size of a hibernation image.
> > 
> > Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> > ---
> >  kernel/power/snapshot.c |   43 +++++++++++++++++++++++++++++++++++++++----
> >  1 file changed, 39 insertions(+), 4 deletions(-)
> > 
> > Index: linux-2.6/kernel/power/snapshot.c
> > ===================================================================
> > --- linux-2.6.orig/kernel/power/snapshot.c
> > +++ linux-2.6/kernel/power/snapshot.c
> > @@ -1204,6 +1204,36 @@ static void free_unnecessary_pages(void)
> >  }
> >  
> >  /**
> > + * minimum_image_size - Estimate the minimum acceptable size of an image
> > + * @saveable: Number of saveable pages in the system.
> > + *
> > + * We want to avoid attempting to free too much memory too hard, so estimate the
> > + * minimum acceptable size of a hibernation image to use as the lower limit for
> > + * preallocating memory.
> > + *
> > + * We assume that the minimum image size should be proportional to
> > + *
> > + * [number of saveable pages] - [number of pages that can be freed in theory]
> > + *
> > + * where the second term is the sum of (1) reclaimable slab pages, (2) active
> > + * and (3) inactive anonymous pages, (4) active and (5) inactive file pages,
> > + * minus mapped file pages.
> > + */
> > +static unsigned long minimum_image_size(unsigned long saveable)
> > +{
> > +	unsigned long size;
> > +
> > +	size = global_page_state(NR_SLAB_RECLAIMABLE)
> > +		+ global_page_state(NR_ACTIVE_ANON)
> > +		+ global_page_state(NR_INACTIVE_ANON)
> > +		+ global_page_state(NR_ACTIVE_FILE)
> > +		+ global_page_state(NR_INACTIVE_FILE)
> > +		- global_page_state(NR_FILE_MAPPED);
> > +
> > +	return saveable <= size ? 0 : saveable - size;
> > +}
> > +
> > +/**
> >   * hibernate_preallocate_memory - Preallocate memory for hibernation image
> >   *
> >   * To create a hibernation image it is necessary to make a copy of every page
> > @@ -1220,8 +1250,8 @@ static void free_unnecessary_pages(void)
> >   *
> >   * If image_size is set below the number following from the above formula,
> >   * the preallocation of memory is continued until the total number of saveable
> > - * pages in the system is below the requested image size or it is impossible to
> > - * allocate more memory, whichever happens first.
> > + * pages in the system is below the requested image size or the minimum
> > + * acceptable image size returned by minimum_image_size(), whichever is greater.
> >   */
> >  int hibernate_preallocate_memory(void)
> >  {
> > @@ -1282,6 +1312,11 @@ int hibernate_preallocate_memory(void)
> >  		goto out;
> >  	}
> >  
> > +	/* Estimate the minimum size of the image. */
> > +	pages = minimum_image_size(saveable);
> > +	if (size < pages)
> > +		size = min_t(unsigned long, pages, max_size);
> > +
> >  	/*
> >  	 * Let the memory management subsystem know that we're going to need a
> >  	 * large number of page frames to allocate and make it free some memory.
> > @@ -1294,8 +1329,8 @@ int hibernate_preallocate_memory(void)
> >  	 * The number of saveable pages in memory was too high, so apply some
> >  	 * pressure to decrease it.  First, make room for the largest possible
> >  	 * image and fail if that doesn't work.  Next, try to decrease the size
> > -	 * of the image as much as indicated by image_size using allocations
> > -	 * from highmem and non-highmem zones separately.
> > +	 * of the image as much as indicated by 'size' using allocations from
> > +	 * highmem and non-highmem zones separately.
> >  	 */
> >  	pages_highmem = preallocate_image_highmem(highmem / 2);
> >  	max_size += pages_highmem;

^ permalink raw reply	[flat|nested] 205+ messages in thread

* Re: [RFC][PATCH 6/6] PM/Hibernate: Do not try to allocate too much memory too hard
  2009-05-18 17:07                     ` Rafael J. Wysocki
@ 2009-05-19  0:47                       ` Wu Fengguang
  -1 siblings, 0 replies; 205+ messages in thread
From: Wu Fengguang @ 2009-05-19  0:47 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: pm list, Andrew Morton, LKML, Pavel Machek, Nigel Cunningham,
	David Rientjes, linux-mm

On Tue, May 19, 2009 at 01:07:41AM +0800, Rafael J. Wysocki wrote:
> On Monday 18 May 2009, Wu Fengguang wrote:
> > On Mon, May 18, 2009 at 05:14:29AM +0800, Rafael J. Wysocki wrote:
> > > On Sunday 17 May 2009, Wu Fengguang wrote:
> > > > On Sun, May 17, 2009 at 08:55:05PM +0800, Rafael J. Wysocki wrote:
> > > > > On Sunday 17 May 2009, Wu Fengguang wrote:
> > > > 
> > > > > > > +static unsigned long minimum_image_size(unsigned long saveable)
> > > > > > > +{
> > > > > > > +	unsigned long size;
> > > > > > > +
> > > > > > > +	/* Compute the number of saveable pages we can free. */
> > > > > > > +	size = global_page_state(NR_SLAB_RECLAIMABLE)
> > > > > > > +		+ global_page_state(NR_ACTIVE_ANON)
> > > > > > > +		+ global_page_state(NR_INACTIVE_ANON)
> > > > > > > +		+ global_page_state(NR_ACTIVE_FILE)
> > > > > > > +		+ global_page_state(NR_INACTIVE_FILE);
> > > > > > 
> > > > > > For example, we could drop the 1.25 ratio and calculate the above
> > > > > > reclaimable size with more meaningful constraints:
> > > > > > 
> > > > > >         /* slabs are not easy to reclaim */
> > > > > > 	size = global_page_state(NR_SLAB_RECLAIMABLE) / 2;
> > > > > 
> > > > > Why 1/2?
> > > > 
> > > > Also a very coarse value:
> > > > - we don't want to stress icache/dcache too much
> > > >   (unless they grow too large)
> > > > - my experience was that the icache/dcache are scanned in a slower
> > > >   pace than lru pages.
> > > > - most importantly, inside the NR_SLAB_RECLAIMABLE pages, maybe half
> > > >   of the pages are actually *in use* and cannot be freed:
> > > >         % cat /proc/sys/fs/inode-nr     
> > > >         30450   16605
> > > >         % cat /proc/sys/fs/dentry-state 
> > > >         41598   35731   45      0       0       0
> > > >   See? More than half entries are in-use. Sure many of them will actually
> > > >   become unused when dentries are freed, but in the mean time the internal
> > > >   fragmentations in the slabs can go up.
> > > > 
> > > > > >         /* keep NR_ACTIVE_ANON */
> > > > > > 	size += global_page_state(NR_INACTIVE_ANON);
> > > > > 
> > > > > Why exactly did you omit ACTIVE_ANON?
> > > > 
> > > > To keep the "core working set" :)
> > > >   	
> > > > > >         /* keep mapped files */
> > > > > > 	size += global_page_state(NR_ACTIVE_FILE);
> > > > > > 	size += global_page_state(NR_INACTIVE_FILE);
> > > > > >         size -= global_page_state(NR_FILE_MAPPED);
> > > > > > 
> > > > > > That restores the hard core working set logic in the reverse way ;)
> > > > > 
> > > > > I think the 1/2 factor for NR_SLAB_RECLAIMABLE may be too high in some cases,
> > > > > but I'm going to check that.
> > > > 
> > > > Yes, after updatedb. In that case simple magic numbers may not help.
> > > > In that case we should really first call shrink_slab() in a loop to
> > > > cut down the slab pages to a sane number.
> > > 
> > > I have verified that the appended patch works reasonably well.
> > 
> > This is illogical: in previous email you complained the formula
> > 
> >         TOTAL - MAPPED - ACTIVE_ANON - SLAB/2
> > 
> > gives too high number, while 
> > 
> >         TOTAL - MAPPED
> > 
> > in this patch is OK.  (I'm not claiming the first formula to be fine.)
> 
> I wasn't precise enough. :-)
> 
> The problem with the first formula is that it's not really useful when used
> _before_ running shrink_all_memory(), because it may give an arbitrary result
> in that case (everything depends on the preceding memory usage pattern).
> However, if it is used _after_ running shrink_all_memory(<all saveable pages>),
> the resulting minimum image size is usually (most often) below the real minimum
> number of saveable pages that can stay in memory.
> 
> The second formula, OTOH, doesn't depend so much on the preceding memory usage
> pattern and therefore it seems to be suitable for computing the estimate of the
> minimum image size _before_ running shrink_all_memory().  Still, when used
> _after_ running shrink_all_memory(<all saveable pages>), it will give a number
> below the actual minimum number of saveable pages (ie. not a really suitable
> one).
> 
> Now, since we're going to get rid of shrink_all_memory() at one point, I think
> we should be looking for a formula suitable for use before it's called.
> Thus, IMO, the second one is just about right. :-)

Ah OK, thanks for the explanation!

> > > The value returned as the minimum image size is usually too high, but not very
> > > much (on x86_64 usually about 20%) and there are no "magic" coefficients
> > 
> > It is _OK_ for the minimum image size to be higher, that margin serves
> > as a safety margin as well as the working set size we want to preserve.
> 
> I didn't say it wasn't OK. :-)  It's totally fine by me.

Great!

> > > involved any more and the computation of the minimum image size is carried out
> > > before calling shrink_all_memory() (so it's still going to be useful after
> > > we've dropped shrink_all_memory() at one point).
> > 
> > That's OK. Because shrink_all_memory() shrinks memory in a prioritized
> > list-after-list order.
> > 
> > > ---
> > > From: Rafael J. Wysocki <rjw@sisk.pl>
> > > Subject: PM/Hibernate: Do not try to allocate too much memory too hard (rev. 2)
> > > 
> > > We want to avoid attempting to free too much memory too hard during
> > > hibernation, so estimate the minimum size of the image to use as the
> > > lower limit for preallocating memory.
> > 
> > I'd like to advocate to add "working set preservation" as another goal
> > of this function, and I can even do with the formula in this patch :-)
> >
> > That means, when one day more accurate working set estimation is
> > possible, we can extend this function to support that goal.
> 
> OK, so do you think it's fine to go with the patch below for now?

Sure, I'm fine with it.

Acked-by: Wu Fengguang <fengguang.wu@intel.com> 

Thanks,
Fengguang

> > > The approach here is based on the (experimental) observation that we
> > > can't free more page frames than the sum of:
> > > 
> > > * global_page_state(NR_SLAB_RECLAIMABLE)
> > > * global_page_state(NR_ACTIVE_ANON)
> > > * global_page_state(NR_INACTIVE_ANON)
> > > * global_page_state(NR_ACTIVE_FILE)
> > > * global_page_state(NR_INACTIVE_FILE)
> > > 
> > > minus
> > > 
> > > * global_page_state(NR_FILE_MAPPED)
> > > 
> > > Namely, if this number is subtracted from the number of saveable
> > > pages in the system, we get a good estimate of the minimum reasonable
> > > size of a hibernation image.
> > > 
> > > Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> > > ---
> > >  kernel/power/snapshot.c |   43 +++++++++++++++++++++++++++++++++++++++----
> > >  1 file changed, 39 insertions(+), 4 deletions(-)
> > > 
> > > Index: linux-2.6/kernel/power/snapshot.c
> > > ===================================================================
> > > --- linux-2.6.orig/kernel/power/snapshot.c
> > > +++ linux-2.6/kernel/power/snapshot.c
> > > @@ -1204,6 +1204,36 @@ static void free_unnecessary_pages(void)
> > >  }
> > >  
> > >  /**
> > > + * minimum_image_size - Estimate the minimum acceptable size of an image
> > > + * @saveable: Number of saveable pages in the system.
> > > + *
> > > + * We want to avoid attempting to free too much memory too hard, so estimate the
> > > + * minimum acceptable size of a hibernation image to use as the lower limit for
> > > + * preallocating memory.
> > > + *
> > > + * We assume that the minimum image size should be proportional to
> > > + *
> > > + * [number of saveable pages] - [number of pages that can be freed in theory]
> > > + *
> > > + * where the second term is the sum of (1) reclaimable slab pages, (2) active
> > > + * and (3) inactive anonymous pages, (4) active and (5) inactive file pages,
> > > + * minus mapped file pages.
> > > + */
> > > +static unsigned long minimum_image_size(unsigned long saveable)
> > > +{
> > > +	unsigned long size;
> > > +
> > > +	size = global_page_state(NR_SLAB_RECLAIMABLE)
> > > +		+ global_page_state(NR_ACTIVE_ANON)
> > > +		+ global_page_state(NR_INACTIVE_ANON)
> > > +		+ global_page_state(NR_ACTIVE_FILE)
> > > +		+ global_page_state(NR_INACTIVE_FILE)
> > > +		- global_page_state(NR_FILE_MAPPED);
> > > +
> > > +	return saveable <= size ? 0 : saveable - size;
> > > +}
> > > +
> > > +/**
> > >   * hibernate_preallocate_memory - Preallocate memory for hibernation image
> > >   *
> > >   * To create a hibernation image it is necessary to make a copy of every page
> > > @@ -1220,8 +1250,8 @@ static void free_unnecessary_pages(void)
> > >   *
> > >   * If image_size is set below the number following from the above formula,
> > >   * the preallocation of memory is continued until the total number of saveable
> > > - * pages in the system is below the requested image size or it is impossible to
> > > - * allocate more memory, whichever happens first.
> > > + * pages in the system is below the requested image size or the minimum
> > > + * acceptable image size returned by minimum_image_size(), whichever is greater.
> > >   */
> > >  int hibernate_preallocate_memory(void)
> > >  {
> > > @@ -1282,6 +1312,11 @@ int hibernate_preallocate_memory(void)
> > >  		goto out;
> > >  	}
> > >  
> > > +	/* Estimate the minimum size of the image. */
> > > +	pages = minimum_image_size(saveable);
> > > +	if (size < pages)
> > > +		size = min_t(unsigned long, pages, max_size);
> > > +
> > >  	/*
> > >  	 * Let the memory management subsystem know that we're going to need a
> > >  	 * large number of page frames to allocate and make it free some memory.
> > > @@ -1294,8 +1329,8 @@ int hibernate_preallocate_memory(void)
> > >  	 * The number of saveable pages in memory was too high, so apply some
> > >  	 * pressure to decrease it.  First, make room for the largest possible
> > >  	 * image and fail if that doesn't work.  Next, try to decrease the size
> > > -	 * of the image as much as indicated by image_size using allocations
> > > -	 * from highmem and non-highmem zones separately.
> > > +	 * of the image as much as indicated by 'size' using allocations from
> > > +	 * highmem and non-highmem zones separately.
> > >  	 */
> > >  	pages_highmem = preallocate_image_highmem(highmem / 2);
> > >  	max_size += pages_highmem;

^ permalink raw reply	[flat|nested] 205+ messages in thread

end of thread, other threads:[~2009-05-19  0:47 UTC | newest]

Thread overview: 205+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-05-06 22:40 [RFC][PATCH 0/5] PM/Hibernate: Rework memory shrinking Rafael J. Wysocki
2009-05-06 22:41 ` [RFC][PATCH 1/5] PM/Freezer: Disable OOM killer when tasks are frozen Rafael J. Wysocki
2009-05-06 22:41 ` Rafael J. Wysocki
2009-05-06 23:00   ` Nigel Cunningham
2009-05-06 23:00   ` Nigel Cunningham
2009-05-07 12:10     ` Rafael J. Wysocki
2009-05-07 12:10     ` Rafael J. Wysocki
2009-05-07  0:36   ` [RFC][PATCH 1/5] PM/Freezer: Disable OOM killer whentasks " Matt Helsley
2009-05-07  0:36   ` [linux-pm] " Matt Helsley
2009-05-07 12:09     ` Rafael J. Wysocki
2009-05-07 12:09     ` Rafael J. Wysocki
2009-05-06 22:42 ` [RFC][PATCH 2/5] PM/Suspend: Do not shrink memory before suspend Rafael J. Wysocki
2009-05-06 23:01   ` Nigel Cunningham
2009-05-06 23:01   ` Nigel Cunningham
2009-05-06 22:42 ` Rafael J. Wysocki
2009-05-06 22:42 ` [RFC][PATCH 3/5] PM/Hibernate: Move memory shrinking to snapshot.c (rev. 2) Rafael J. Wysocki
2009-05-06 22:42 ` Rafael J. Wysocki
2009-05-06 22:44 ` [RFC][PATCH 4/5] PM/Hibernate: Rework shrinking of memory Rafael J. Wysocki
2009-05-06 23:27   ` Nigel Cunningham
2009-05-07 12:18     ` Rafael J. Wysocki
2009-05-07 12:18     ` Rafael J. Wysocki
2009-05-07 20:00       ` Rafael J. Wysocki
2009-05-07 20:00       ` Rafael J. Wysocki
2009-05-07 20:53         ` Nigel Cunningham
2009-05-07 20:53         ` Nigel Cunningham
2009-05-07 20:51       ` Nigel Cunningham
2009-05-07 20:51       ` Nigel Cunningham
2009-05-06 23:27   ` Nigel Cunningham
2009-05-06 22:44 ` Rafael J. Wysocki
2009-05-06 22:48 ` [RFC][PATCH 5/5] PM/Hibernate: Do not release preallocated memory unnecessarily Rafael J. Wysocki
2009-05-06 22:48 ` Rafael J. Wysocki
2009-05-07 21:48 ` [RFC][PATCH 0/5] PM/Hibernate: Rework memory shrinking (rev. 2) Rafael J. Wysocki
2009-05-07 21:48 ` Rafael J. Wysocki
2009-05-07 21:48   ` Rafael J. Wysocki
2009-05-07 21:50   ` [RFC][PATCH 1/5] mm: Introduce __GFP_NO_OOM_KILL Rafael J. Wysocki
2009-05-07 21:50   ` Rafael J. Wysocki
2009-05-07 21:50     ` Rafael J. Wysocki
2009-05-07 22:24     ` [RFC][PATCH] PM/Freezer: Disable OOM killer when tasks are frozen (was: Re: [RFC][PATCH 1/5] mm: Introduce __GFP_NO_OOM_KILL) Rafael J. Wysocki
2009-05-07 22:24       ` Rafael J. Wysocki
2009-05-07 22:24     ` Rafael J. Wysocki
2009-05-07 21:51   ` [RFC][PATCH 2/5] PM/Suspend: Do not shrink memory before suspend Rafael J. Wysocki
2009-05-07 21:51   ` Rafael J. Wysocki
2009-05-07 21:51     ` Rafael J. Wysocki
2009-05-08  8:52     ` Wu Fengguang
2009-05-08  8:52       ` Wu Fengguang
2009-05-08  8:52     ` Wu Fengguang
2009-05-07 21:51   ` [RFC][PATCH 3/5] PM/Hibernate: Move memory shrinking to snapshot.c (rev. 2) Rafael J. Wysocki
2009-05-07 21:51     ` Rafael J. Wysocki
2009-05-08  8:53     ` Wu Fengguang
2009-05-08  8:53       ` Wu Fengguang
2009-05-08  8:53     ` Wu Fengguang
2009-05-07 21:51   ` Rafael J. Wysocki
2009-05-07 21:53   ` [RFC][PATCH 4/5] PM/Hibernate: Rework shrinking of memory Rafael J. Wysocki
2009-05-07 21:53     ` Rafael J. Wysocki
2009-05-07 21:53   ` Rafael J. Wysocki
2009-05-07 21:55   ` [RFC][PATCH 5/5] PM/Hibernate: Do not release preallocated memory unnecessarily Rafael J. Wysocki
2009-05-07 21:55     ` Rafael J. Wysocki
2009-05-07 21:55   ` Rafael J. Wysocki
2009-05-10 13:48   ` [RFC][PATCH 0/6] PM/Hibernate: Rework memory shrinking (rev. 3) Rafael J. Wysocki
2009-05-10 13:48     ` Rafael J. Wysocki
2009-05-10 13:50     ` [RFC][PATCH 1/6] mm: Introduce __GFP_NO_OOM_KILL Rafael J. Wysocki
2009-05-10 13:50     ` Rafael J. Wysocki
2009-05-10 13:50       ` Rafael J. Wysocki
2009-05-11 20:12       ` David Rientjes
2009-05-11 20:12       ` David Rientjes
2009-05-11 20:12         ` David Rientjes
2009-05-11 22:14         ` Rafael J. Wysocki
2009-05-11 22:14         ` Rafael J. Wysocki
2009-05-11 22:14           ` Rafael J. Wysocki
2009-05-11 22:33           ` Andrew Morton
2009-05-11 22:33             ` Andrew Morton
2009-05-11 23:04             ` Rafael J. Wysocki
2009-05-11 23:04               ` Rafael J. Wysocki
2009-05-11 23:04             ` Rafael J. Wysocki
2009-05-11 22:33           ` Andrew Morton
2009-05-10 13:50     ` [RFC][PATCH 2/6] PM/Suspend: Do not shrink memory before suspend Rafael J. Wysocki
2009-05-10 13:50       ` Rafael J. Wysocki
2009-05-10 13:50       ` Rafael J. Wysocki
2009-05-10 13:51     ` [RFC][PATCH 3/6] PM/Hibernate: Move memory shrinking to snapshot.c (rev. 2) Rafael J. Wysocki
2009-05-10 13:51     ` Rafael J. Wysocki
2009-05-10 13:51       ` Rafael J. Wysocki
2009-05-10 13:53     ` [RFC][PATCH 4/6] PM/Hibernate: Rework shrinking of memory Rafael J. Wysocki
2009-05-10 13:53       ` Rafael J. Wysocki
2009-05-10 13:53     ` Rafael J. Wysocki
2009-05-10 13:57     ` [RFC][PATCH 5/6] PM/Hibernate: Do not release preallocated memory unnecessarily Rafael J. Wysocki
2009-05-10 13:57     ` Rafael J. Wysocki
2009-05-10 13:57       ` Rafael J. Wysocki
2009-05-10 19:49       ` Rafael J. Wysocki
2009-05-10 19:49         ` Rafael J. Wysocki
2009-05-10 19:49       ` Rafael J. Wysocki
2009-05-10 14:12     ` [RFC][PATCH 6/6] PM/Hibernate: Estimate hard core working set size Rafael J. Wysocki
2009-05-10 14:12       ` Rafael J. Wysocki
2009-05-10 19:53       ` Rafael J. Wysocki
2009-05-10 19:53       ` Rafael J. Wysocki
2009-05-10 19:53         ` Rafael J. Wysocki
2009-05-10 14:12     ` Rafael J. Wysocki
2009-05-13  8:32     ` [RFC][PATCH 0/6] PM/Hibernate: Rework memory shrinking (rev. 4) Rafael J. Wysocki
2009-05-13  8:32       ` Rafael J. Wysocki
2009-05-13  8:34       ` [PATCH 1/6] PM/Suspend: Do not shrink memory before suspend Rafael J. Wysocki
2009-05-13  8:34       ` Rafael J. Wysocki
2009-05-13  8:34         ` Rafael J. Wysocki
2009-05-13  8:35       ` [PATCH 2/6] PM/Hibernate: Move memory shrinking to snapshot.c (rev. 2) Rafael J. Wysocki
2009-05-13  8:35       ` Rafael J. Wysocki
2009-05-13  8:35         ` Rafael J. Wysocki
2009-05-13  8:37       ` [PATCH 3/6] mm, PM/Freezer: Disable OOM killer when tasks are frozen Rafael J. Wysocki
2009-05-13  8:37         ` Rafael J. Wysocki
2009-05-13  9:19         ` Pavel Machek
2009-05-13  9:19         ` Pavel Machek
2009-05-13  9:19           ` Pavel Machek
2009-05-13 22:35         ` David Rientjes
2009-05-13 22:35         ` David Rientjes
2009-05-13 22:35           ` David Rientjes
2009-05-13 22:47           ` Andrew Morton
2009-05-13 22:47             ` Andrew Morton
2009-05-13 23:01             ` David Rientjes
2009-05-13 23:01               ` David Rientjes
2009-05-13 23:01             ` David Rientjes
2009-05-13 22:47           ` Andrew Morton
2009-05-13  8:37       ` Rafael J. Wysocki
2009-05-13  8:39       ` [PATCH 4/6] PM/Hibernate: Rework shrinking of memory Rafael J. Wysocki
2009-05-13  8:39         ` Rafael J. Wysocki
2009-05-13 19:34         ` Andrew Morton
2009-05-13 19:34           ` Andrew Morton
2009-05-13 20:55           ` Rafael J. Wysocki
2009-05-13 20:55           ` Rafael J. Wysocki
2009-05-13 20:55             ` Rafael J. Wysocki
2009-05-13 21:16             ` Andrew Morton
2009-05-13 21:16               ` Andrew Morton
2009-05-13 21:56               ` Rafael J. Wysocki
2009-05-13 21:56               ` Rafael J. Wysocki
2009-05-13 21:56                 ` Rafael J. Wysocki
2009-05-14  9:40                 ` Pavel Machek
2009-05-14  9:40                   ` Pavel Machek
2009-05-14 17:49                   ` Rafael J. Wysocki
2009-05-14 17:49                     ` Rafael J. Wysocki
2009-05-15 13:09                     ` Pavel Machek
2009-05-15 13:09                       ` Pavel Machek
2009-05-15 13:09                     ` Pavel Machek
2009-05-14 17:49                   ` Rafael J. Wysocki
2009-05-14  9:40                 ` Pavel Machek
2009-05-13 21:16             ` Andrew Morton
2009-05-14 18:26             ` Rafael J. Wysocki
2009-05-14 18:26               ` Rafael J. Wysocki
2009-05-14 18:26             ` Rafael J. Wysocki
2009-05-13 19:34         ` Andrew Morton
2009-05-13  8:39       ` Rafael J. Wysocki
2009-05-13  8:40       ` [PATCH 5/6] PM/Hibernate: Do not release preallocated memory unnecessarily (rev. 2) Rafael J. Wysocki
2009-05-13  8:40       ` Rafael J. Wysocki
2009-05-13  8:40         ` Rafael J. Wysocki
2009-05-14 11:09         ` Pavel Machek
2009-05-14 11:09         ` Pavel Machek
2009-05-14 11:09           ` Pavel Machek
2009-05-14 17:52           ` Rafael J. Wysocki
2009-05-14 17:52             ` Rafael J. Wysocki
2009-05-15 13:11             ` Pavel Machek
2009-05-15 13:11             ` Pavel Machek
2009-05-15 13:11               ` Pavel Machek
2009-05-15 14:52               ` Rafael J. Wysocki
2009-05-15 14:52                 ` Rafael J. Wysocki
2009-05-15 14:52               ` Rafael J. Wysocki
2009-05-14 17:52           ` Rafael J. Wysocki
2009-05-13  8:42       ` [RFC][PATCH 6/6] PM/Hibernate: Do not try to allocate too much memory too hard Rafael J. Wysocki
2009-05-13  8:42         ` Rafael J. Wysocki
2009-05-14 11:14         ` Pavel Machek
2009-05-14 11:14           ` Pavel Machek
2009-05-14 17:59           ` Rafael J. Wysocki
2009-05-14 17:59             ` Rafael J. Wysocki
2009-05-15 13:14             ` Pavel Machek
2009-05-15 13:14             ` Pavel Machek
2009-05-15 13:14               ` Pavel Machek
2009-05-15 14:40               ` Rafael J. Wysocki
2009-05-15 14:40                 ` Rafael J. Wysocki
2009-05-15 14:40               ` Rafael J. Wysocki
2009-05-14 17:59           ` Rafael J. Wysocki
2009-05-14 11:14         ` Pavel Machek
2009-05-17 12:06         ` Wu Fengguang
2009-05-17 12:06           ` Wu Fengguang
2009-05-17 12:55           ` Rafael J. Wysocki
2009-05-17 12:55             ` Rafael J. Wysocki
2009-05-17 14:07             ` Wu Fengguang
2009-05-17 14:07             ` Wu Fengguang
2009-05-17 14:07               ` Wu Fengguang
2009-05-17 16:53               ` Rafael J. Wysocki
2009-05-17 16:53               ` Rafael J. Wysocki
2009-05-17 16:53                 ` Rafael J. Wysocki
2009-05-18  8:32                 ` Wu Fengguang
2009-05-18  8:32                 ` Wu Fengguang
2009-05-18  8:32                   ` Wu Fengguang
2009-05-17 21:14               ` Rafael J. Wysocki
2009-05-17 21:14               ` Rafael J. Wysocki
2009-05-17 21:14                 ` Rafael J. Wysocki
2009-05-18  8:56                 ` Wu Fengguang
2009-05-18  8:56                 ` Wu Fengguang
2009-05-18  8:56                   ` Wu Fengguang
2009-05-18 17:07                   ` Rafael J. Wysocki
2009-05-18 17:07                   ` Rafael J. Wysocki
2009-05-18 17:07                     ` Rafael J. Wysocki
2009-05-19  0:47                     ` Wu Fengguang
2009-05-19  0:47                     ` Wu Fengguang
2009-05-19  0:47                       ` Wu Fengguang
2009-05-17 12:55           ` Rafael J. Wysocki
2009-05-17 12:06         ` Wu Fengguang
2009-05-13  8:42       ` Rafael J. Wysocki
2009-05-13  8:32     ` [RFC][PATCH 0/6] PM/Hibernate: Rework memory shrinking (rev. 4) Rafael J. Wysocki
2009-05-10 13:48   ` [RFC][PATCH 0/6] PM/Hibernate: Rework memory shrinking (rev. 3) Rafael J. Wysocki
