All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2][RFC v4] PM / hibernate: Introduce snapshot test mode for hibernation
@ 2016-07-14 13:15 Chen Yu
  2016-07-14 13:15 ` [PATCH 1/2][RFC " Chen Yu
  2016-07-14 13:15 ` [PATCH 2/2][RFC v4] PM / Documentation: Add description for snapshot test mode Chen Yu
  0 siblings, 2 replies; 4+ messages in thread
From: Chen Yu @ 2016-07-14 13:15 UTC (permalink / raw)
  To: linux-pm
  Cc: Rafael J. Wysocki, Pavel Machek, Len Brown, linux-kernel, Chen Yu

Sometimes we need to debug during hibernation resume, thus it would
be convenient for the users to verify if the snapshot stored in the disk
can be successfully restored, without BIOSes or system re-initialization
involved in. This patch has introduced a snapshot pm_test mode for this purpose.

Chen Yu (2):
  PM / hibernate: Introduce snapshot test mode for hibernation
  PM / Documentation: Add description for snapshot test mode

 Documentation/power/basic-pm-debugging.txt | 15 +++++++--
 kernel/power/hibernate.c                   | 54 +++++++++++++++++++++++++++++-
 kernel/power/main.c                        |  3 ++
 kernel/power/power.h                       |  3 ++
 kernel/power/suspend.c                     |  8 +++++
 kernel/power/swap.c                        |  7 ++++
 6 files changed, 86 insertions(+), 4 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH 1/2][RFC v4] PM / hibernate: Introduce snapshot test mode for hibernation
  2016-07-14 13:15 [PATCH 0/2][RFC v4] PM / hibernate: Introduce snapshot test mode for hibernation Chen Yu
@ 2016-07-14 13:15 ` Chen Yu
  2016-07-15 22:05   ` Rafael J. Wysocki
  2016-07-14 13:15 ` [PATCH 2/2][RFC v4] PM / Documentation: Add description for snapshot test mode Chen Yu
  1 sibling, 1 reply; 4+ messages in thread
From: Chen Yu @ 2016-07-14 13:15 UTC (permalink / raw)
  To: linux-pm
  Cc: Rafael J. Wysocki, Pavel Machek, Len Brown, linux-kernel, Chen Yu

snapshot test mode is used to verify if the snapshot data
written to the swap device can be successfully restored to
the memory. It is useful to ease the debugging process on
hibernation, since this mode can not only bypass the
BIOSes/bootloader, but also the system re-initialization.

For example:

echo snapshot > /sys/power/pm_test
echo disk > /sys/power/state

[  267.959784] PM: Image saving progress:  80%
[  268.038669] PM: Image saving progress:  90%
[  268.111745] PM: Image saving progress: 100%
[  268.129269] PM: Image saving done.
[  268.133485] PM: Wrote 518612 kbytes in 0.75 seconds (691.48 MB/s)
[  268.140564] PM: S|
[  268.160067] hibernation debug: Waiting for 5 seconds.
...
[  273.508638] PM: Looking for hibernation image.
[  273.516583] PM: Image signature found, resuming
[  273.926547] PM: Loading and decompressing image data (129653 pages)...
[  274.122452] PM: Image loading progress:   0%
[  274.322127] PM: Image loading progress:  10%
...

Comments and suggestions would be appreciated.

Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
---
v4:
 - Fix some errors and modify the comment for software_resume_unthaw.
v3:
 - As Pavel mentioned, there was a potential risk in previous
   version that might break the filesystem. According to Rafael's suggestion,
   this version avoids that issue by restoring the pages with user/kernel
   threads kept in frozen. Also updated the document.
---
 kernel/power/hibernate.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++-
 kernel/power/main.c      |  3 +++
 kernel/power/power.h     |  3 +++
 kernel/power/suspend.c   |  8 +++++++
 kernel/power/swap.c      |  7 +++++++
 5 files changed, 74 insertions(+), 1 deletion(-)

diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
index 51441d8..57864fc 100644
--- a/kernel/power/hibernate.c
+++ b/kernel/power/hibernate.c
@@ -643,11 +643,59 @@ static void power_down(void)
 }
 
 /**
+ * software_resume_unthaw - Resume from a saved hibernation image with threads frozen.
+ *
+ * This routine is similar to software_resume, except that this one tries
+ * to resume from hibernation with user/kernel threads frozen. And it is mainly
+ * used for snapshot test mode because it is important to keep the threads frozen,
+ * otherwise the filesystem might be broken due to inconsistent disk-data/metadata
+ * across hibernation.
+ *
+ * software_resume_unthaw() does not invoke PM_RESTORE_PREPARE related callbacks,
+ * since it is unclear whether it is safe to 'jump' from PM_HIBERNATION_PREPARE
+ * to PM_RESTORE_PREPARE directly, and bypass this message should not impact much.
+ */
+static int software_resume_unthaw(void)
+{
+	int error;
+	unsigned int flags;
+
+	pr_debug("PM: Hibernation image partition %d:%d present\n",
+		MAJOR(swsusp_resume_device), MINOR(swsusp_resume_device));
+
+	pr_debug("PM: Looking for hibernation image.\n");
+	error = swsusp_check();
+	if (error)
+		return error;
+
+	pr_debug("PM: Loading hibernation image.\n");
+
+	lock_device_hotplug();
+	error = create_basic_memory_bitmaps();
+	if (error)
+		goto Unlock;
+
+	error = swsusp_read(&flags);
+	swsusp_close(FMODE_READ);
+	if (!error)
+		hibernation_restore(flags & SF_PLATFORM_MODE);
+
+	printk(KERN_ERR "PM: Failed to load hibernation image, recovering.\n");
+	swsusp_free();
+	free_basic_memory_bitmaps();
+ Unlock:
+	unlock_device_hotplug();
+
+	return error;
+}
+
+/**
  * hibernate - Carry out system hibernation, including saving the image.
  */
 int hibernate(void)
 {
 	int error, nr_calls = 0;
+	bool snapshot_test = false;
 
 	if (!hibernation_available()) {
 		pr_debug("PM: Hibernation not available.\n");
@@ -699,7 +747,9 @@ int hibernate(void)
 		pr_debug("PM: writing image.\n");
 		error = swsusp_write(flags);
 		swsusp_free();
-		if (!error)
+		if (hibernation_test(TEST_SNAPSHOT))
+			snapshot_test = true;
+		if (!error && !snapshot_test)
 			power_down();
 		in_suspend = 0;
 		pm_restore_gfp_mask();
@@ -711,6 +761,8 @@ int hibernate(void)
 	free_basic_memory_bitmaps();
  Thaw:
 	unlock_device_hotplug();
+	if (snapshot_test)
+		software_resume_unthaw();
 	thaw_processes();
 
 	/* Don't bother checking whether freezer_test_done is true */
diff --git a/kernel/power/main.c b/kernel/power/main.c
index 5ea50b1..80fe48e 100644
--- a/kernel/power/main.c
+++ b/kernel/power/main.c
@@ -83,6 +83,9 @@ int pm_test_level = TEST_NONE;
 
 static const char * const pm_tests[__TEST_AFTER_LAST] = {
 	[TEST_NONE] = "none",
+#ifdef CONFIG_HIBERNATION
+	[TEST_SNAPSHOT] = "snapshot",
+#endif
 	[TEST_CORE] = "core",
 	[TEST_CPUS] = "processors",
 	[TEST_PLATFORM] = "platform",
diff --git a/kernel/power/power.h b/kernel/power/power.h
index 064963e..101d636 100644
--- a/kernel/power/power.h
+++ b/kernel/power/power.h
@@ -225,6 +225,9 @@ static inline int restore_highmem(void) { return 0; }
 enum {
 	/* keep first */
 	TEST_NONE,
+#ifdef CONFIG_HIBERNATION
+	TEST_SNAPSHOT,
+#endif
 	TEST_CORE,
 	TEST_CPUS,
 	TEST_PLATFORM,
diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
index 0acab9d..0afa875 100644
--- a/kernel/power/suspend.c
+++ b/kernel/power/suspend.c
@@ -479,6 +479,14 @@ static int enter_state(suspend_state_t state)
 			return -EAGAIN;
 		}
 #endif
+	} else if (state == PM_SUSPEND_MEM) {
+#ifdef CONFIG_PM_DEBUG
+		if (pm_test_level != TEST_NONE && pm_test_level < TEST_CORE) {
+			pr_warn("PM: Unsupported test mode for suspend to ram, please choose"
+				" none/freezer/devices/platform/processors/core.\n");
+			return -EAGAIN;
+		}
+#endif
 	} else if (!valid_state(state)) {
 		return -EINVAL;
 	}
diff --git a/kernel/power/swap.c b/kernel/power/swap.c
index 160e100..facd71b 100644
--- a/kernel/power/swap.c
+++ b/kernel/power/swap.c
@@ -348,6 +348,13 @@ static int swsusp_swap_check(void)
 	if (res < 0)
 		blkdev_put(hib_resume_bdev, FMODE_WRITE);
 
+	/*
+	 * Update the resume device to the one actually used,
+	 * so software_resume() can use it in case it is invoked
+	 * from hibernate() to test the snapshot.
+	 */
+	swsusp_resume_device = hib_resume_bdev->bd_dev;
+
 	return res;
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH 2/2][RFC v4] PM / Documentation: Add description for snapshot test mode
  2016-07-14 13:15 [PATCH 0/2][RFC v4] PM / hibernate: Introduce snapshot test mode for hibernation Chen Yu
  2016-07-14 13:15 ` [PATCH 1/2][RFC " Chen Yu
@ 2016-07-14 13:15 ` Chen Yu
  1 sibling, 0 replies; 4+ messages in thread
From: Chen Yu @ 2016-07-14 13:15 UTC (permalink / raw)
  To: linux-pm
  Cc: Rafael J. Wysocki, Pavel Machek, Len Brown, linux-kernel, Chen Yu

Introduce snapshot test mode for hibernation debugging, and
update the document accordingly.

Signed-off-by: Chen Yu <yu.c.chen@intel.com>
---
 Documentation/power/basic-pm-debugging.txt | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/Documentation/power/basic-pm-debugging.txt b/Documentation/power/basic-pm-debugging.txt
index b96098c..5cc5b21 100644
--- a/Documentation/power/basic-pm-debugging.txt
+++ b/Documentation/power/basic-pm-debugging.txt
@@ -41,7 +41,7 @@ a) Test modes of hibernation
 To find out why hibernation fails on your system, you can use a special testing
 facility available if the kernel is compiled with CONFIG_PM_DEBUG set.  Then,
 there is the file /sys/power/pm_test that can be used to make the hibernation
-core run in a test mode.  There are 5 test modes available:
+core run in a test mode.  There are 6 test modes available:
 
 freezer
 - test the freezing of processes
@@ -62,6 +62,11 @@ core
   control methods(*), the disabling of nonboot CPUs and suspending of
   platform/system devices
 
+snapshot
+- test the freezing of processes, suspending of devices, platform global
+  control methods(*), the disabling of nonboot CPUs, suspending of
+  platform/system devices and the restoring of snapshot image
+
 (*) the platform global control methods are only available on ACPI systems
     and are only tested if the hibernation mode is set to "platform"
 
@@ -95,7 +100,7 @@ one and the "core" level tests the hardware and drivers as deeply as possible
 without creating a hibernation image.  Obviously, if the "devices" test fails,
 the "platform" test will fail as well and so on.  Thus, as a rule of thumb, you
 should try the test modes starting from "freezer", through "devices", "platform"
-and "processors" up to "core" (repeat the test on each level a couple of times
+, "processors" and "core" up to "snapshot" (repeat the test on each level a couple of times
 to make sure that any random factors are avoided).
 
 If the "freezer" test fails, there is a task that cannot be frozen (in that case
@@ -140,7 +145,11 @@ devices has failed (these devices are suspended on one CPU with interrupts off),
 the problem is most probably hardware-related and serious, so it should be
 reported.
 
-A failure of any of the "platform", "processors" or "core" tests may cause your
+If the "snapshot" test fails, which means that restoring of the pages from the
+swap device has failed, it might be caused by a broken swap device, or other memory
+management issues, so it should be reported.
+
+A failure of any of the "platform", "processors" "core" or "snapshot" tests may cause your
 system to hang or become unstable, so please beware.  Such a failure usually
 indicates a serious problem that very well may be related to the hardware, but
 please report it anyway.
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH 1/2][RFC v4] PM / hibernate: Introduce snapshot test mode for hibernation
  2016-07-14 13:15 ` [PATCH 1/2][RFC " Chen Yu
@ 2016-07-15 22:05   ` Rafael J. Wysocki
  0 siblings, 0 replies; 4+ messages in thread
From: Rafael J. Wysocki @ 2016-07-15 22:05 UTC (permalink / raw)
  To: Chen Yu; +Cc: linux-pm, Pavel Machek, Len Brown, linux-kernel

On Thursday, July 14, 2016 09:15:46 PM Chen Yu wrote:
> snapshot test mode is used to verify if the snapshot data
> written to the swap device can be successfully restored to
> the memory. It is useful to ease the debugging process on
> hibernation, since this mode can not only bypass the
> BIOSes/bootloader, but also the system re-initialization.
> 
> For example:
> 
> echo snapshot > /sys/power/pm_test
> echo disk > /sys/power/state
> 
> [  267.959784] PM: Image saving progress:  80%
> [  268.038669] PM: Image saving progress:  90%
> [  268.111745] PM: Image saving progress: 100%
> [  268.129269] PM: Image saving done.
> [  268.133485] PM: Wrote 518612 kbytes in 0.75 seconds (691.48 MB/s)
> [  268.140564] PM: S|
> [  268.160067] hibernation debug: Waiting for 5 seconds.
> ...
> [  273.508638] PM: Looking for hibernation image.
> [  273.516583] PM: Image signature found, resuming
> [  273.926547] PM: Loading and decompressing image data (129653 pages)...
> [  274.122452] PM: Image loading progress:   0%
> [  274.322127] PM: Image loading progress:  10%
> ...
> 
> Comments and suggestions would be appreciated.
> 
> Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Signed-off-by: Chen Yu <yu.c.chen@intel.com>
> ---
> v4:
>  - Fix some errors and modify the comment for software_resume_unthaw.
> v3:
>  - As Pavel mentioned, there was a potential risk in previous
>    version that might break the filesystem. According to Rafael's suggestion,
>    this version avoids that issue by restoring the pages with user/kernel
>    threads kept in frozen. Also updated the document.
> ---
>  kernel/power/hibernate.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++-
>  kernel/power/main.c      |  3 +++
>  kernel/power/power.h     |  3 +++
>  kernel/power/suspend.c   |  8 +++++++
>  kernel/power/swap.c      |  7 +++++++
>  5 files changed, 74 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
> index 51441d8..57864fc 100644
> --- a/kernel/power/hibernate.c
> +++ b/kernel/power/hibernate.c
> @@ -643,11 +643,59 @@ static void power_down(void)
>  }
>  
>  /**
> + * software_resume_unthaw - Resume from a saved hibernation image with threads frozen.
> + *
> + * This routine is similar to software_resume, except that this one tries
> + * to resume from hibernation with user/kernel threads frozen. And it is mainly
> + * used for snapshot test mode because it is important to keep the threads frozen,
> + * otherwise the filesystem might be broken due to inconsistent disk-data/metadata
> + * across hibernation.
> + *
> + * software_resume_unthaw() does not invoke PM_RESTORE_PREPARE related callbacks,
> + * since it is unclear whether it is safe to 'jump' from PM_HIBERNATION_PREPARE
> + * to PM_RESTORE_PREPARE directly, and bypass this message should not impact much.
> + */
> +static int software_resume_unthaw(void)

This name is really not the best one and quite confusing.

> +{
> +	int error;
> +	unsigned int flags;
> +
> +	pr_debug("PM: Hibernation image partition %d:%d present\n",
> +		MAJOR(swsusp_resume_device), MINOR(swsusp_resume_device));

I don't think the above is necessary.

The image partition is known to be present at this point.

> +
> +	pr_debug("PM: Looking for hibernation image.\n");
> +	error = swsusp_check();
> +	if (error)
> +		return error;

The code below can be shared with software_resume(), can't it?  And I'd move
the above check to hibernate().

So I'd move the part of software_resume() between the "PM: Loading hibernation
image.\n" message and the invocation of unlock_device_hotplug() inclusive to a
separate function and call that from software_resume() and from here.

That new function can be called load_image_and_restore() or similar.

> +
> +	pr_debug("PM: Loading hibernation image.\n");
> +
> +	lock_device_hotplug();
> +	error = create_basic_memory_bitmaps();
> +	if (error)
> +		goto Unlock;
> +
> +	error = swsusp_read(&flags);
> +	swsusp_close(FMODE_READ);
> +	if (!error)
> +		hibernation_restore(flags & SF_PLATFORM_MODE);
> +
> +	printk(KERN_ERR "PM: Failed to load hibernation image, recovering.\n");
> +	swsusp_free();
> +	free_basic_memory_bitmaps();
> + Unlock:
> +	unlock_device_hotplug();
> +
> +	return error;
> +}
> +
> +/**
>   * hibernate - Carry out system hibernation, including saving the image.
>   */
>  int hibernate(void)
>  {
>  	int error, nr_calls = 0;
> +	bool snapshot_test = false;
>  
>  	if (!hibernation_available()) {
>  		pr_debug("PM: Hibernation not available.\n");
> @@ -699,7 +747,9 @@ int hibernate(void)
>  		pr_debug("PM: writing image.\n");
>  		error = swsusp_write(flags);
>  		swsusp_free();
> -		if (!error)
> +		if (hibernation_test(TEST_SNAPSHOT))
> +			snapshot_test = true;
> +		if (!error && !snapshot_test)
>  			power_down();
>  		in_suspend = 0;
>  		pm_restore_gfp_mask();
> @@ -711,6 +761,8 @@ int hibernate(void)
>  	free_basic_memory_bitmaps();
>   Thaw:
>  	unlock_device_hotplug();
> +	if (snapshot_test)
> +		software_resume_unthaw();

What about:

	if (snapshot_test) {
		pr_debug("PM: Checking hibernation image\n");
		error = swsusp_check();
		if (!error)
			error = load_image_and_restore();
	}

>  	thaw_processes();
>  
>  	/* Don't bother checking whether freezer_test_done is true */
> diff --git a/kernel/power/main.c b/kernel/power/main.c
> index 5ea50b1..80fe48e 100644
> --- a/kernel/power/main.c
> +++ b/kernel/power/main.c
> @@ -83,6 +83,9 @@ int pm_test_level = TEST_NONE;
>  
>  static const char * const pm_tests[__TEST_AFTER_LAST] = {
>  	[TEST_NONE] = "none",
> +#ifdef CONFIG_HIBERNATION
> +	[TEST_SNAPSHOT] = "snapshot",
> +#endif
>  	[TEST_CORE] = "core",
>  	[TEST_CPUS] = "processors",
>  	[TEST_PLATFORM] = "platform",
> diff --git a/kernel/power/power.h b/kernel/power/power.h
> index 064963e..101d636 100644
> --- a/kernel/power/power.h
> +++ b/kernel/power/power.h
> @@ -225,6 +225,9 @@ static inline int restore_highmem(void) { return 0; }
>  enum {
>  	/* keep first */
>  	TEST_NONE,
> +#ifdef CONFIG_HIBERNATION
> +	TEST_SNAPSHOT,
> +#endif
>  	TEST_CORE,
>  	TEST_CPUS,
>  	TEST_PLATFORM,
> diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
> index 0acab9d..0afa875 100644
> --- a/kernel/power/suspend.c
> +++ b/kernel/power/suspend.c
> @@ -479,6 +479,14 @@ static int enter_state(suspend_state_t state)
>  			return -EAGAIN;
>  		}
>  #endif
> +	} else if (state == PM_SUSPEND_MEM) {
> +#ifdef CONFIG_PM_DEBUG
> +		if (pm_test_level != TEST_NONE && pm_test_level < TEST_CORE) {
> +			pr_warn("PM: Unsupported test mode for suspend to ram, please choose"
> +				" none/freezer/devices/platform/processors/core.\n");
> +			return -EAGAIN;

Well, this isn't nice.

Maybe go back to the idea with having an extra string in /sys/power/disk,
like "test_resume"?

And if someone does

# echo test_resume > /sys/power/disk
# echo disk > /sys/power/state

it will trigger the new feature?

> +		}
> +#endif
>  	} else if (!valid_state(state)) {
>  		return -EINVAL;
>  	}
> diff --git a/kernel/power/swap.c b/kernel/power/swap.c
> index 160e100..facd71b 100644
> --- a/kernel/power/swap.c
> +++ b/kernel/power/swap.c
> @@ -348,6 +348,13 @@ static int swsusp_swap_check(void)
>  	if (res < 0)
>  		blkdev_put(hib_resume_bdev, FMODE_WRITE);
>  
> +	/*
> +	 * Update the resume device to the one actually used,
> +	 * so software_resume() can use it in case it is invoked
> +	 * from hibernate() to test the snapshot.
> +	 */
> +	swsusp_resume_device = hib_resume_bdev->bd_dev;
> +
>  	return res;
>  }

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2016-07-15 22:00 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-14 13:15 [PATCH 0/2][RFC v4] PM / hibernate: Introduce snapshot test mode for hibernation Chen Yu
2016-07-14 13:15 ` [PATCH 1/2][RFC " Chen Yu
2016-07-15 22:05   ` Rafael J. Wysocki
2016-07-14 13:15 ` [PATCH 2/2][RFC v4] PM / Documentation: Add description for snapshot test mode Chen Yu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.