All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
@ 2011-02-07  9:07 SUZUKI, Kazuhiro
  2011-02-07  9:08 ` [PATCH 1/2] " SUZUKI, Kazuhiro
                   ` (5 more replies)
  0 siblings, 6 replies; 58+ messages in thread
From: SUZUKI, Kazuhiro @ 2011-02-07  9:07 UTC (permalink / raw)
  To: rjw, linux-pm; +Cc: xen-devel

Hi,

The following patch series fixes hangup after creating checkpoint on
Xen. The Linux Xen guest can be saved the state to restore later, and
also created snapshot like checkpoint via the hypervisor.
But, when the snapshot is created for the PV guest, it will hangup.

We added 'PMSG_CANCEL' message and 'cancel' handler in dev_pm_ops
struct in the pm-linux part.

In creating checkpoint mode, the resume handler of xenbus should not
be called. In this case, it is recognized that the suspend was canceled
in drivers/xen/manage.c and call dpm_resume_end() with PMSG_CANCEL.
If the 'cancel' handler is defined, it is called instead of resume().

[1/2] - Fix hangup after creating checkpoint on Xen -- pm-linux part.
[2/2] - Fix hangup after creating checkpoint on Xen -- Xen part.

Please review.

Thanks,
KAZ

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH 1/2] Fix hangup after creating checkpoint on Xen.
  2011-02-07  9:07 [PATCH 0/2] Fix hangup after creating checkpoint on Xen SUZUKI, Kazuhiro
  2011-02-07  9:08 ` [PATCH 1/2] " SUZUKI, Kazuhiro
@ 2011-02-07  9:08 ` SUZUKI, Kazuhiro
  2011-02-07  9:08 ` [PATCH 2/2] " SUZUKI, Kazuhiro
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 58+ messages in thread
From: SUZUKI, Kazuhiro @ 2011-02-07  9:08 UTC (permalink / raw)
  To: rjw, linux-pm; +Cc: xen-devel

Hi,

This is a pm-linux part patch.

Thanks,
KAZ

Signed-off-by: Kenji Wakamiya <wkenji@jp.fujitsu.com>
Signed-off-by: Kazuhiro Suzuki <kaz@jp.fujitsu.com>
---
 drivers/base/power/main.c |    9 +++++++++
 include/linux/pm.h        |    3 +++
 2 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index 8aa2443..e348b5d 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -181,6 +181,13 @@ static int pm_op(struct device *dev,
 			suspend_report_result(ops->suspend, error);
 		}
 		break;
+	case PM_EVENT_CANCEL:
+		if (ops->cancel) {
+			error = ops->cancel(dev);
+			suspend_report_result(ops->cancel, error);
+			break;
+		}
+		/* Fall through */
 	case PM_EVENT_RESUME:
 		if (ops->resume) {
 			error = ops->resume(dev);
@@ -293,6 +300,8 @@ static char *pm_verb(int event)
 		return "suspend";
 	case PM_EVENT_RESUME:
 		return "resume";
+	case PM_EVENT_CANCEL:
+		return "cancel";
 	case PM_EVENT_FREEZE:
 		return "freeze";
 	case PM_EVENT_QUIESCE:
diff --git a/include/linux/pm.h b/include/linux/pm.h
index 3b7e04b..d118781 100644
--- a/include/linux/pm.h
+++ b/include/linux/pm.h
@@ -198,6 +198,7 @@ struct dev_pm_ops {
 	void (*complete)(struct device *dev);
 	int (*suspend)(struct device *dev);
 	int (*resume)(struct device *dev);
+	int (*cancel)(struct device *dev);
 	int (*freeze)(struct device *dev);
 	int (*thaw)(struct device *dev);
 	int (*poweroff)(struct device *dev);
@@ -291,6 +292,7 @@ struct dev_pm_ops name = { \
 #define PM_EVENT_USER		0x0100
 #define PM_EVENT_REMOTE		0x0200
 #define PM_EVENT_AUTO		0x0400
+#define PM_EVENT_CANCEL		0x0800
 
 #define PM_EVENT_SLEEP		(PM_EVENT_SUSPEND | PM_EVENT_HIBERNATE)
 #define PM_EVENT_USER_SUSPEND	(PM_EVENT_USER | PM_EVENT_SUSPEND)
@@ -308,6 +310,7 @@ struct dev_pm_ops name = { \
 #define PMSG_THAW	((struct pm_message){ .event = PM_EVENT_THAW, })
 #define PMSG_RESTORE	((struct pm_message){ .event = PM_EVENT_RESTORE, })
 #define PMSG_RECOVER	((struct pm_message){ .event = PM_EVENT_RECOVER, })
+#define PMSG_CANCEL	((struct pm_message){ .event = PM_EVENT_CANCEL, })
 #define PMSG_USER_SUSPEND	((struct pm_message) \
 					{ .event = PM_EVENT_USER_SUSPEND, })
 #define PMSG_USER_RESUME	((struct pm_message) \
-- 
1.7.3.1

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 1/2] Fix hangup after creating checkpoint on Xen.
  2011-02-07  9:07 [PATCH 0/2] Fix hangup after creating checkpoint on Xen SUZUKI, Kazuhiro
@ 2011-02-07  9:08 ` SUZUKI, Kazuhiro
  2011-02-07  9:08 ` SUZUKI, Kazuhiro
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 58+ messages in thread
From: SUZUKI, Kazuhiro @ 2011-02-07  9:08 UTC (permalink / raw)
  To: rjw, linux-pm; +Cc: xen-devel

Hi,

This is a pm-linux part patch.

Thanks,
KAZ

Signed-off-by: Kenji Wakamiya <wkenji@jp.fujitsu.com>
Signed-off-by: Kazuhiro Suzuki <kaz@jp.fujitsu.com>
---
 drivers/base/power/main.c |    9 +++++++++
 include/linux/pm.h        |    3 +++
 2 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index 8aa2443..e348b5d 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -181,6 +181,13 @@ static int pm_op(struct device *dev,
 			suspend_report_result(ops->suspend, error);
 		}
 		break;
+	case PM_EVENT_CANCEL:
+		if (ops->cancel) {
+			error = ops->cancel(dev);
+			suspend_report_result(ops->cancel, error);
+			break;
+		}
+		/* Fall through */
 	case PM_EVENT_RESUME:
 		if (ops->resume) {
 			error = ops->resume(dev);
@@ -293,6 +300,8 @@ static char *pm_verb(int event)
 		return "suspend";
 	case PM_EVENT_RESUME:
 		return "resume";
+	case PM_EVENT_CANCEL:
+		return "cancel";
 	case PM_EVENT_FREEZE:
 		return "freeze";
 	case PM_EVENT_QUIESCE:
diff --git a/include/linux/pm.h b/include/linux/pm.h
index 3b7e04b..d118781 100644
--- a/include/linux/pm.h
+++ b/include/linux/pm.h
@@ -198,6 +198,7 @@ struct dev_pm_ops {
 	void (*complete)(struct device *dev);
 	int (*suspend)(struct device *dev);
 	int (*resume)(struct device *dev);
+	int (*cancel)(struct device *dev);
 	int (*freeze)(struct device *dev);
 	int (*thaw)(struct device *dev);
 	int (*poweroff)(struct device *dev);
@@ -291,6 +292,7 @@ struct dev_pm_ops name = { \
 #define PM_EVENT_USER		0x0100
 #define PM_EVENT_REMOTE		0x0200
 #define PM_EVENT_AUTO		0x0400
+#define PM_EVENT_CANCEL		0x0800
 
 #define PM_EVENT_SLEEP		(PM_EVENT_SUSPEND | PM_EVENT_HIBERNATE)
 #define PM_EVENT_USER_SUSPEND	(PM_EVENT_USER | PM_EVENT_SUSPEND)
@@ -308,6 +310,7 @@ struct dev_pm_ops name = { \
 #define PMSG_THAW	((struct pm_message){ .event = PM_EVENT_THAW, })
 #define PMSG_RESTORE	((struct pm_message){ .event = PM_EVENT_RESTORE, })
 #define PMSG_RECOVER	((struct pm_message){ .event = PM_EVENT_RECOVER, })
+#define PMSG_CANCEL	((struct pm_message){ .event = PM_EVENT_CANCEL, })
 #define PMSG_USER_SUSPEND	((struct pm_message) \
 					{ .event = PM_EVENT_USER_SUSPEND, })
 #define PMSG_USER_RESUME	((struct pm_message) \
-- 
1.7.3.1

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 2/2] Fix hangup after creating checkpoint on Xen.
  2011-02-07  9:07 [PATCH 0/2] Fix hangup after creating checkpoint on Xen SUZUKI, Kazuhiro
                   ` (2 preceding siblings ...)
  2011-02-07  9:08 ` [PATCH 2/2] " SUZUKI, Kazuhiro
@ 2011-02-07  9:08 ` SUZUKI, Kazuhiro
  2011-02-07  9:35 ` [PATCH 0/2] " Rafael J. Wysocki
  2011-02-07  9:35 ` Rafael J. Wysocki
  5 siblings, 0 replies; 58+ messages in thread
From: SUZUKI, Kazuhiro @ 2011-02-07  9:08 UTC (permalink / raw)
  To: rjw, linux-pm; +Cc: xen-devel

Hi,

This is a Xen part patch.

Thanks,
KAZ

Signed-off-by: Kenji Wakamiya <wkenji@jp.fujitsu.com>
Signed-off-by: Kazuhiro Suzuki <kaz@jp.fujitsu.com>
---
 drivers/net/xen-netfront.c                 |    2 +-
 drivers/xen/manage.c                       |    7 ++++---
 drivers/xen/xenbus/xenbus_probe.c          |   12 ++++++++++--
 drivers/xen/xenbus/xenbus_probe.h          |    3 ++-
 drivers/xen/xenbus/xenbus_probe_frontend.c |    8 ++++++--
 include/xen/xenbus.h                       |    2 +-
 6 files changed, 24 insertions(+), 10 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 3f71199..22c6288 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -1293,7 +1293,7 @@ static void xennet_disconnect_backend(struct netfront_info *info)
 	info->rx.sring = NULL;
 }
 
-static int netfront_suspend(struct xenbus_device *dev, pm_message_t state)
+static int netfront_suspend(struct xenbus_device *dev)
 {
 	struct netfront_info *info = dev_get_drvdata(&dev->dev);
 	struct hrtimer *timer = &info->smart_poll.timer;
diff --git a/drivers/xen/manage.c b/drivers/xen/manage.c
index 0b50906..845afb8 100644
--- a/drivers/xen/manage.c
+++ b/drivers/xen/manage.c
@@ -148,10 +148,11 @@ out_resume:
 	if (!cancelled) {
 		xen_arch_resume();
 		xs_resume();
-	} else
+		dpm_resume_end(PMSG_RESUME);
+	} else {
 		xs_suspend_cancel();
-
-	dpm_resume_end(PMSG_RESUME);
+		dpm_resume_end(PMSG_CANCEL);
+	}
 
 	/* Make sure timer events get retriggered on all CPUs */
 	clock_was_set();
diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
index 3a83ba2..02dbb8b 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -575,7 +575,7 @@ void xenbus_dev_changed(const char *node, struct xen_bus_type *bus)
 }
 EXPORT_SYMBOL_GPL(xenbus_dev_changed);
 
-int xenbus_dev_suspend(struct device *dev, pm_message_t state)
+int xenbus_dev_suspend(struct device *dev)
 {
 	int err = 0;
 	struct xenbus_driver *drv;
@@ -587,7 +587,7 @@ int xenbus_dev_suspend(struct device *dev, pm_message_t state)
 		return 0;
 	drv = to_xenbus_driver(dev->driver);
 	if (drv->suspend)
-		err = drv->suspend(xdev, state);
+		err = drv->suspend(xdev);
 	if (err)
 		printk(KERN_WARNING
 		       "xenbus: suspend %s failed: %i\n", dev_name(dev), err);
@@ -638,6 +638,14 @@ int xenbus_dev_resume(struct device *dev)
 }
 EXPORT_SYMBOL_GPL(xenbus_dev_resume);
 
+int xenbus_dev_cancel(struct device *dev)
+{
+	/* Do nothing */
+	DPRINTK("cancel");
+	return 0;
+}
+EXPORT_SYMBOL_GPL(xenbus_dev_cancel);
+
 /* A flag to determine if xenstored is 'ready' (i.e. has started) */
 int xenstored_ready = 0;
 
diff --git a/drivers/xen/xenbus/xenbus_probe.h b/drivers/xen/xenbus/xenbus_probe.h
index 0e5fc4c..4019187 100644
--- a/drivers/xen/xenbus/xenbus_probe.h
+++ b/drivers/xen/xenbus/xenbus_probe.h
@@ -62,8 +62,9 @@ extern void xenbus_dev_changed(const char *node, struct xen_bus_type *bus);
 
 extern void xenbus_dev_shutdown(struct device *_dev);
 
-extern int xenbus_dev_suspend(struct device *dev, pm_message_t state);
+extern int xenbus_dev_suspend(struct device *dev);
 extern int xenbus_dev_resume(struct device *dev);
+extern int xenbus_dev_cancel(struct device *dev);
 
 extern void xenbus_otherend_changed(struct xenbus_watch *watch,
 				    const char **vec, unsigned int len,
diff --git a/drivers/xen/xenbus/xenbus_probe_frontend.c b/drivers/xen/xenbus/xenbus_probe_frontend.c
index 5413248..6dfc588 100644
--- a/drivers/xen/xenbus/xenbus_probe_frontend.c
+++ b/drivers/xen/xenbus/xenbus_probe_frontend.c
@@ -82,6 +82,11 @@ static struct device_attribute xenbus_frontend_dev_attrs[] = {
 	__ATTR_NULL
 };
 
+static struct dev_pm_ops xenbus_pm_ops = {
+	.suspend = xenbus_dev_suspend,
+	.resume  = xenbus_dev_resume,
+	.cancel  = xenbus_dev_cancel,
+};
 
 static struct xen_bus_type xenbus_frontend = {
 	.root = "device",
@@ -98,8 +103,7 @@ static struct xen_bus_type xenbus_frontend = {
 		.shutdown = xenbus_dev_shutdown,
 		.dev_attrs= xenbus_frontend_dev_attrs,
 
-		.suspend  = xenbus_dev_suspend,
-		.resume   = xenbus_dev_resume,
+		.pm       = &xenbus_pm_ops,
 	},
 };
 
diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
index 542ca7c..23e7f25 100644
--- a/include/xen/xenbus.h
+++ b/include/xen/xenbus.h
@@ -91,7 +91,7 @@ struct xenbus_driver {
 	void (*otherend_changed)(struct xenbus_device *dev,
 				 enum xenbus_state backend_state);
 	int (*remove)(struct xenbus_device *dev);
-	int (*suspend)(struct xenbus_device *dev, pm_message_t state);
+	int (*suspend)(struct xenbus_device *dev);
 	int (*resume)(struct xenbus_device *dev);
 	int (*uevent)(struct xenbus_device *, struct kobj_uevent_env *);
 	struct device_driver driver;
-- 
1.7.3.1

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 2/2] Fix hangup after creating checkpoint on Xen.
  2011-02-07  9:07 [PATCH 0/2] Fix hangup after creating checkpoint on Xen SUZUKI, Kazuhiro
  2011-02-07  9:08 ` [PATCH 1/2] " SUZUKI, Kazuhiro
  2011-02-07  9:08 ` SUZUKI, Kazuhiro
@ 2011-02-07  9:08 ` SUZUKI, Kazuhiro
  2011-02-07  9:08 ` SUZUKI, Kazuhiro
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 58+ messages in thread
From: SUZUKI, Kazuhiro @ 2011-02-07  9:08 UTC (permalink / raw)
  To: rjw, linux-pm; +Cc: xen-devel

Hi,

This is a Xen part patch.

Thanks,
KAZ

Signed-off-by: Kenji Wakamiya <wkenji@jp.fujitsu.com>
Signed-off-by: Kazuhiro Suzuki <kaz@jp.fujitsu.com>
---
 drivers/net/xen-netfront.c                 |    2 +-
 drivers/xen/manage.c                       |    7 ++++---
 drivers/xen/xenbus/xenbus_probe.c          |   12 ++++++++++--
 drivers/xen/xenbus/xenbus_probe.h          |    3 ++-
 drivers/xen/xenbus/xenbus_probe_frontend.c |    8 ++++++--
 include/xen/xenbus.h                       |    2 +-
 6 files changed, 24 insertions(+), 10 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 3f71199..22c6288 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -1293,7 +1293,7 @@ static void xennet_disconnect_backend(struct netfront_info *info)
 	info->rx.sring = NULL;
 }
 
-static int netfront_suspend(struct xenbus_device *dev, pm_message_t state)
+static int netfront_suspend(struct xenbus_device *dev)
 {
 	struct netfront_info *info = dev_get_drvdata(&dev->dev);
 	struct hrtimer *timer = &info->smart_poll.timer;
diff --git a/drivers/xen/manage.c b/drivers/xen/manage.c
index 0b50906..845afb8 100644
--- a/drivers/xen/manage.c
+++ b/drivers/xen/manage.c
@@ -148,10 +148,11 @@ out_resume:
 	if (!cancelled) {
 		xen_arch_resume();
 		xs_resume();
-	} else
+		dpm_resume_end(PMSG_RESUME);
+	} else {
 		xs_suspend_cancel();
-
-	dpm_resume_end(PMSG_RESUME);
+		dpm_resume_end(PMSG_CANCEL);
+	}
 
 	/* Make sure timer events get retriggered on all CPUs */
 	clock_was_set();
diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
index 3a83ba2..02dbb8b 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -575,7 +575,7 @@ void xenbus_dev_changed(const char *node, struct xen_bus_type *bus)
 }
 EXPORT_SYMBOL_GPL(xenbus_dev_changed);
 
-int xenbus_dev_suspend(struct device *dev, pm_message_t state)
+int xenbus_dev_suspend(struct device *dev)
 {
 	int err = 0;
 	struct xenbus_driver *drv;
@@ -587,7 +587,7 @@ int xenbus_dev_suspend(struct device *dev, pm_message_t state)
 		return 0;
 	drv = to_xenbus_driver(dev->driver);
 	if (drv->suspend)
-		err = drv->suspend(xdev, state);
+		err = drv->suspend(xdev);
 	if (err)
 		printk(KERN_WARNING
 		       "xenbus: suspend %s failed: %i\n", dev_name(dev), err);
@@ -638,6 +638,14 @@ int xenbus_dev_resume(struct device *dev)
 }
 EXPORT_SYMBOL_GPL(xenbus_dev_resume);
 
+int xenbus_dev_cancel(struct device *dev)
+{
+	/* Do nothing */
+	DPRINTK("cancel");
+	return 0;
+}
+EXPORT_SYMBOL_GPL(xenbus_dev_cancel);
+
 /* A flag to determine if xenstored is 'ready' (i.e. has started) */
 int xenstored_ready = 0;
 
diff --git a/drivers/xen/xenbus/xenbus_probe.h b/drivers/xen/xenbus/xenbus_probe.h
index 0e5fc4c..4019187 100644
--- a/drivers/xen/xenbus/xenbus_probe.h
+++ b/drivers/xen/xenbus/xenbus_probe.h
@@ -62,8 +62,9 @@ extern void xenbus_dev_changed(const char *node, struct xen_bus_type *bus);
 
 extern void xenbus_dev_shutdown(struct device *_dev);
 
-extern int xenbus_dev_suspend(struct device *dev, pm_message_t state);
+extern int xenbus_dev_suspend(struct device *dev);
 extern int xenbus_dev_resume(struct device *dev);
+extern int xenbus_dev_cancel(struct device *dev);
 
 extern void xenbus_otherend_changed(struct xenbus_watch *watch,
 				    const char **vec, unsigned int len,
diff --git a/drivers/xen/xenbus/xenbus_probe_frontend.c b/drivers/xen/xenbus/xenbus_probe_frontend.c
index 5413248..6dfc588 100644
--- a/drivers/xen/xenbus/xenbus_probe_frontend.c
+++ b/drivers/xen/xenbus/xenbus_probe_frontend.c
@@ -82,6 +82,11 @@ static struct device_attribute xenbus_frontend_dev_attrs[] = {
 	__ATTR_NULL
 };
 
+static struct dev_pm_ops xenbus_pm_ops = {
+	.suspend = xenbus_dev_suspend,
+	.resume  = xenbus_dev_resume,
+	.cancel  = xenbus_dev_cancel,
+};
 
 static struct xen_bus_type xenbus_frontend = {
 	.root = "device",
@@ -98,8 +103,7 @@ static struct xen_bus_type xenbus_frontend = {
 		.shutdown = xenbus_dev_shutdown,
 		.dev_attrs= xenbus_frontend_dev_attrs,
 
-		.suspend  = xenbus_dev_suspend,
-		.resume   = xenbus_dev_resume,
+		.pm       = &xenbus_pm_ops,
 	},
 };
 
diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
index 542ca7c..23e7f25 100644
--- a/include/xen/xenbus.h
+++ b/include/xen/xenbus.h
@@ -91,7 +91,7 @@ struct xenbus_driver {
 	void (*otherend_changed)(struct xenbus_device *dev,
 				 enum xenbus_state backend_state);
 	int (*remove)(struct xenbus_device *dev);
-	int (*suspend)(struct xenbus_device *dev, pm_message_t state);
+	int (*suspend)(struct xenbus_device *dev);
 	int (*resume)(struct xenbus_device *dev);
 	int (*uevent)(struct xenbus_device *, struct kobj_uevent_env *);
 	struct device_driver driver;
-- 
1.7.3.1

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
  2011-02-07  9:07 [PATCH 0/2] Fix hangup after creating checkpoint on Xen SUZUKI, Kazuhiro
                   ` (3 preceding siblings ...)
  2011-02-07  9:08 ` SUZUKI, Kazuhiro
@ 2011-02-07  9:35 ` Rafael J. Wysocki
  2011-02-08 11:22     ` Ian Campbell
  2011-02-08 11:22   ` Ian Campbell
  2011-02-07  9:35 ` Rafael J. Wysocki
  5 siblings, 2 replies; 58+ messages in thread
From: Rafael J. Wysocki @ 2011-02-07  9:35 UTC (permalink / raw)
  To: SUZUKI, Kazuhiro; +Cc: linux-pm, xen-devel, Greg KH, LKML

On Monday, February 07, 2011, SUZUKI, Kazuhiro wrote:
> Hi,

Hi,

> The following patch series fixes hangup after creating checkpoint on
> Xen. The Linux Xen guest can be saved the state to restore later, and
> also created snapshot like checkpoint via the hypervisor.
> But, when the snapshot is created for the PV guest, it will hangup.
> 
> We added 'PMSG_CANCEL' message and 'cancel' handler in dev_pm_ops
> struct in the pm-linux part.

Please don't do that, unless you can convince me there's no other way to fix
the problem you're trying to address.

In my opinion it's highly unrealistic to assume that device drivers
(or even subsystems) will implement the ->cancel() callback just for the
benefit of Xen.

And if the only subsystem that needs to implement ->cancel() is Xen, then the
issue should be addressed without modifying the device core code, in a different
way.

> In creating checkpoint mode, the resume handler of xenbus should not
> be called. In this case, it is recognized that the suspend was canceled
> in drivers/xen/manage.c and call dpm_resume_end() with PMSG_CANCEL.
> If the 'cancel' handler is defined, it is called instead of resume().
> 
> [1/2] - Fix hangup after creating checkpoint on Xen -- pm-linux part.
> [2/2] - Fix hangup after creating checkpoint on Xen -- Xen part.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
  2011-02-07  9:07 [PATCH 0/2] Fix hangup after creating checkpoint on Xen SUZUKI, Kazuhiro
                   ` (4 preceding siblings ...)
  2011-02-07  9:35 ` [PATCH 0/2] " Rafael J. Wysocki
@ 2011-02-07  9:35 ` Rafael J. Wysocki
  5 siblings, 0 replies; 58+ messages in thread
From: Rafael J. Wysocki @ 2011-02-07  9:35 UTC (permalink / raw)
  To: SUZUKI, Kazuhiro; +Cc: linux-pm, xen-devel, LKML

On Monday, February 07, 2011, SUZUKI, Kazuhiro wrote:
> Hi,

Hi,

> The following patch series fixes hangup after creating checkpoint on
> Xen. The Linux Xen guest can be saved the state to restore later, and
> also created snapshot like checkpoint via the hypervisor.
> But, when the snapshot is created for the PV guest, it will hangup.
> 
> We added 'PMSG_CANCEL' message and 'cancel' handler in dev_pm_ops
> struct in the pm-linux part.

Please don't do that, unless you can convince me there's no other way to fix
the problem you're trying to address.

In my opinion it's highly unrealistic to assume that device drivers
(or even subsystems) will implement the ->cancel() callback just for the
benefit of Xen.

And if the only subsystem that needs to implement ->cancel() is Xen, then the
issue should be addressed without modifying the device core code, in a different
way.

> In creating checkpoint mode, the resume handler of xenbus should not
> be called. In this case, it is recognized that the suspend was canceled
> in drivers/xen/manage.c and call dpm_resume_end() with PMSG_CANCEL.
> If the 'cancel' handler is defined, it is called instead of resume().
> 
> [1/2] - Fix hangup after creating checkpoint on Xen -- pm-linux part.
> [2/2] - Fix hangup after creating checkpoint on Xen -- Xen part.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
  2011-02-07  9:35 ` [PATCH 0/2] " Rafael J. Wysocki
@ 2011-02-08 11:22     ` Ian Campbell
  2011-02-08 11:22   ` Ian Campbell
  1 sibling, 0 replies; 58+ messages in thread
From: Ian Campbell @ 2011-02-08 11:22 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: SUZUKI, Kazuhiro, linux-pm, xen-devel, Greg KH, LKML

On Mon, 2011-02-07 at 10:35 +0100, Rafael J. Wysocki wrote:
> On Monday, February 07, 2011, SUZUKI, Kazuhiro wrote:
> > Hi,
> 
> Hi,
> 
> > The following patch series fixes hangup after creating checkpoint on
> > Xen. The Linux Xen guest can be saved the state to restore later, and
> > also created snapshot like checkpoint via the hypervisor.
> > But, when the snapshot is created for the PV guest, it will hangup.
> > 
> > We added 'PMSG_CANCEL' message and 'cancel' handler in dev_pm_ops
> > struct in the pm-linux part.
> 
> Please don't do that, unless you can convince me there's no other way to fix
> the problem you're trying to address.

Sorry, it was my advise to Kazuhiro that solving the underlying issue by
extending the core would be preferable to making Xen specific hacks.

The problem is that currently we have:

        dpm_suspend_start(PMSG_SUSPEND);
        
                dpm_suspend_noirq(PMSG_SUSPEND);
                        
                        sysdev_suspend(PMSG_SUSPEND);
                        /* suspend hypercall */
                        sysdev_resume();
                
                dpm_resume_noirq(PMSG_RESUME);
        
        dpm_resume_end(PMSG_RESUME);

However the suspend hypercall can return a value indicating that the
suspend didn't actually happen (e.g. was cancelled). This is used e.g.
when checkpointing guests, because in that case you want the original
guest to continue. When the suspend didn't happen the drivers need to
recover differently from if it did.

The originally proposed solution was to only call dpm_resume_end if the
suspend was not cancelled. My concern with this was that unbalancing the
dpm_suspend_* and dpm_resume_* did not seem like a correct interaction
with the core. For example dpm_suspend_* adds stuff to
dpm_suspended_list and if dpm_resume_* is not called presumably this all
gets out of sync for next time. Hence my suggestion to add a cancel
message type.

> In my opinion it's highly unrealistic to assume that device drivers
> (or even subsystems) will implement the ->cancel() callback just for the
> benefit of Xen.

I did vague wonder if a similar message might be of interest to e.g.
cancelling hibernations or similar.

> And if the only subsystem that needs to implement ->cancel() is Xen, then the
> issue should be addressed without modifying the device core code, in a different
> way.

I thought it would be preferable to make use of/extend core
functionality where possible but if that's not the case we can find
another way.

Do you have any suggestions for how to correctly interact with the core
functions?

Is adding a suspend_cancel operation to just at the struct xenbus_driver
level and introducing a xen specific function to walk to the bus the
sort of thing you were thinking of? (it seems reasonable). Should we be
doing anything with dpm_*_list in that case?

(FWIW original thread is on xen-devel at
http://thread.gmane.org/gmane.comp.emulators.xen.devel/95265/)

Ian.

-- 
Ian Campbell
Current Noise: Crowbar - Dead Sun

Why on earth do people buy old bottles of wine when they can get a
fresh one for a quarter of the price?


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
  2011-02-07  9:35 ` [PATCH 0/2] " Rafael J. Wysocki
  2011-02-08 11:22     ` Ian Campbell
@ 2011-02-08 11:22   ` Ian Campbell
  1 sibling, 0 replies; 58+ messages in thread
From: Ian Campbell @ 2011-02-08 11:22 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: linux-pm, xen-devel, SUZUKI, Kazuhiro, LKML

On Mon, 2011-02-07 at 10:35 +0100, Rafael J. Wysocki wrote:
> On Monday, February 07, 2011, SUZUKI, Kazuhiro wrote:
> > Hi,
> 
> Hi,
> 
> > The following patch series fixes hangup after creating checkpoint on
> > Xen. The Linux Xen guest can be saved the state to restore later, and
> > also created snapshot like checkpoint via the hypervisor.
> > But, when the snapshot is created for the PV guest, it will hangup.
> > 
> > We added 'PMSG_CANCEL' message and 'cancel' handler in dev_pm_ops
> > struct in the pm-linux part.
> 
> Please don't do that, unless you can convince me there's no other way to fix
> the problem you're trying to address.

Sorry, it was my advise to Kazuhiro that solving the underlying issue by
extending the core would be preferable to making Xen specific hacks.

The problem is that currently we have:

        dpm_suspend_start(PMSG_SUSPEND);
        
                dpm_suspend_noirq(PMSG_SUSPEND);
                        
                        sysdev_suspend(PMSG_SUSPEND);
                        /* suspend hypercall */
                        sysdev_resume();
                
                dpm_resume_noirq(PMSG_RESUME);
        
        dpm_resume_end(PMSG_RESUME);

However the suspend hypercall can return a value indicating that the
suspend didn't actually happen (e.g. was cancelled). This is used e.g.
when checkpointing guests, because in that case you want the original
guest to continue. When the suspend didn't happen the drivers need to
recover differently from if it did.

The originally proposed solution was to only call dpm_resume_end if the
suspend was not cancelled. My concern with this was that unbalancing the
dpm_suspend_* and dpm_resume_* did not seem like a correct interaction
with the core. For example dpm_suspend_* adds stuff to
dpm_suspended_list and if dpm_resume_* is not called presumably this all
gets out of sync for next time. Hence my suggestion to add a cancel
message type.

> In my opinion it's highly unrealistic to assume that device drivers
> (or even subsystems) will implement the ->cancel() callback just for the
> benefit of Xen.

I did vague wonder if a similar message might be of interest to e.g.
cancelling hibernations or similar.

> And if the only subsystem that needs to implement ->cancel() is Xen, then the
> issue should be addressed without modifying the device core code, in a different
> way.

I thought it would be preferable to make use of/extend core
functionality where possible but if that's not the case we can find
another way.

Do you have any suggestions for how to correctly interact with the core
functions?

Is adding a suspend_cancel operation to just at the struct xenbus_driver
level and introducing a xen specific function to walk to the bus the
sort of thing you were thinking of? (it seems reasonable). Should we be
doing anything with dpm_*_list in that case?

(FWIW original thread is on xen-devel at
http://thread.gmane.org/gmane.comp.emulators.xen.devel/95265/)

Ian.

-- 
Ian Campbell
Current Noise: Crowbar - Dead Sun

Why on earth do people buy old bottles of wine when they can get a
fresh one for a quarter of the price?

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
@ 2011-02-08 11:22     ` Ian Campbell
  0 siblings, 0 replies; 58+ messages in thread
From: Ian Campbell @ 2011-02-08 11:22 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Greg KH, linux-pm, xen-devel, SUZUKI, Kazuhiro, LKML

On Mon, 2011-02-07 at 10:35 +0100, Rafael J. Wysocki wrote:
> On Monday, February 07, 2011, SUZUKI, Kazuhiro wrote:
> > Hi,
> 
> Hi,
> 
> > The following patch series fixes hangup after creating checkpoint on
> > Xen. The Linux Xen guest can be saved the state to restore later, and
> > also created snapshot like checkpoint via the hypervisor.
> > But, when the snapshot is created for the PV guest, it will hangup.
> > 
> > We added 'PMSG_CANCEL' message and 'cancel' handler in dev_pm_ops
> > struct in the pm-linux part.
> 
> Please don't do that, unless you can convince me there's no other way to fix
> the problem you're trying to address.

Sorry, it was my advise to Kazuhiro that solving the underlying issue by
extending the core would be preferable to making Xen specific hacks.

The problem is that currently we have:

        dpm_suspend_start(PMSG_SUSPEND);
        
                dpm_suspend_noirq(PMSG_SUSPEND);
                        
                        sysdev_suspend(PMSG_SUSPEND);
                        /* suspend hypercall */
                        sysdev_resume();
                
                dpm_resume_noirq(PMSG_RESUME);
        
        dpm_resume_end(PMSG_RESUME);

However the suspend hypercall can return a value indicating that the
suspend didn't actually happen (e.g. was cancelled). This is used e.g.
when checkpointing guests, because in that case you want the original
guest to continue. When the suspend didn't happen the drivers need to
recover differently from if it did.

The originally proposed solution was to only call dpm_resume_end if the
suspend was not cancelled. My concern with this was that unbalancing the
dpm_suspend_* and dpm_resume_* did not seem like a correct interaction
with the core. For example dpm_suspend_* adds stuff to
dpm_suspended_list and if dpm_resume_* is not called presumably this all
gets out of sync for next time. Hence my suggestion to add a cancel
message type.

> In my opinion it's highly unrealistic to assume that device drivers
> (or even subsystems) will implement the ->cancel() callback just for the
> benefit of Xen.

I did vague wonder if a similar message might be of interest to e.g.
cancelling hibernations or similar.

> And if the only subsystem that needs to implement ->cancel() is Xen, then the
> issue should be addressed without modifying the device core code, in a different
> way.

I thought it would be preferable to make use of/extend core
functionality where possible but if that's not the case we can find
another way.

Do you have any suggestions for how to correctly interact with the core
functions?

Is adding a suspend_cancel operation to just at the struct xenbus_driver
level and introducing a xen specific function to walk to the bus the
sort of thing you were thinking of? (it seems reasonable). Should we be
doing anything with dpm_*_list in that case?

(FWIW original thread is on xen-devel at
http://thread.gmane.org/gmane.comp.emulators.xen.devel/95265/)

Ian.

-- 
Ian Campbell
Current Noise: Crowbar - Dead Sun

Why on earth do people buy old bottles of wine when they can get a
fresh one for a quarter of the price?

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [linux-pm] [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
  2011-02-08 11:22     ` Ian Campbell
@ 2011-02-08 16:46       ` Alan Stern
  -1 siblings, 0 replies; 58+ messages in thread
From: Alan Stern @ 2011-02-08 16:46 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Rafael J. Wysocki, linux-pm, xen-devel, SUZUKI, Kazuhiro, LKML

On Tue, 8 Feb 2011, Ian Campbell wrote:

> The problem is that currently we have:
> 
>         dpm_suspend_start(PMSG_SUSPEND);
>         
>                 dpm_suspend_noirq(PMSG_SUSPEND);
>                         
>                         sysdev_suspend(PMSG_SUSPEND);
>                         /* suspend hypercall */
>                         sysdev_resume();
>                 
>                 dpm_resume_noirq(PMSG_RESUME);
>         
>         dpm_resume_end(PMSG_RESUME);
> 
> However the suspend hypercall can return a value indicating that the
> suspend didn't actually happen (e.g. was cancelled). This is used e.g.
> when checkpointing guests, because in that case you want the original
> guest to continue. When the suspend didn't happen the drivers need to
> recover differently from if it did.

That is odd, and it is quite different from the intended design of the 
PM core.  Drivers are supposed to put their devices into a known 
suspended state; then afterwards they put the devices back into an 
operational state.  What happens while the devices are in the suspended 
state isn't supposed to matter -- the system transition can fail, but 
devices get treated exactly the same way as if it succeeded.

Why do your drivers need to recover differently based on the success of 
the hypercall?

Alan Stern


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
  2011-02-08 11:22     ` Ian Campbell
  (?)
@ 2011-02-08 16:46     ` Alan Stern
  -1 siblings, 0 replies; 58+ messages in thread
From: Alan Stern @ 2011-02-08 16:46 UTC (permalink / raw)
  To: Ian Campbell; +Cc: linux-pm, xen-devel, SUZUKI, Kazuhiro, LKML

On Tue, 8 Feb 2011, Ian Campbell wrote:

> The problem is that currently we have:
> 
>         dpm_suspend_start(PMSG_SUSPEND);
>         
>                 dpm_suspend_noirq(PMSG_SUSPEND);
>                         
>                         sysdev_suspend(PMSG_SUSPEND);
>                         /* suspend hypercall */
>                         sysdev_resume();
>                 
>                 dpm_resume_noirq(PMSG_RESUME);
>         
>         dpm_resume_end(PMSG_RESUME);
> 
> However the suspend hypercall can return a value indicating that the
> suspend didn't actually happen (e.g. was cancelled). This is used e.g.
> when checkpointing guests, because in that case you want the original
> guest to continue. When the suspend didn't happen the drivers need to
> recover differently from if it did.

That is odd, and it is quite different from the intended design of the 
PM core.  Drivers are supposed to put their devices into a known 
suspended state; then afterwards they put the devices back into an 
operational state.  What happens while the devices are in the suspended 
state isn't supposed to matter -- the system transition can fail, but 
devices get treated exactly the same way as if it succeeded.

Why do your drivers need to recover differently based on the success of 
the hypercall?

Alan Stern

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [linux-pm] [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
@ 2011-02-08 16:46       ` Alan Stern
  0 siblings, 0 replies; 58+ messages in thread
From: Alan Stern @ 2011-02-08 16:46 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Rafael J. Wysocki, linux-pm, xen-devel, SUZUKI, Kazuhiro, LKML

On Tue, 8 Feb 2011, Ian Campbell wrote:

> The problem is that currently we have:
> 
>         dpm_suspend_start(PMSG_SUSPEND);
>         
>                 dpm_suspend_noirq(PMSG_SUSPEND);
>                         
>                         sysdev_suspend(PMSG_SUSPEND);
>                         /* suspend hypercall */
>                         sysdev_resume();
>                 
>                 dpm_resume_noirq(PMSG_RESUME);
>         
>         dpm_resume_end(PMSG_RESUME);
> 
> However the suspend hypercall can return a value indicating that the
> suspend didn't actually happen (e.g. was cancelled). This is used e.g.
> when checkpointing guests, because in that case you want the original
> guest to continue. When the suspend didn't happen the drivers need to
> recover differently from if it did.

That is odd, and it is quite different from the intended design of the 
PM core.  Drivers are supposed to put their devices into a known 
suspended state; then afterwards they put the devices back into an 
operational state.  What happens while the devices are in the suspended 
state isn't supposed to matter -- the system transition can fail, but 
devices get treated exactly the same way as if it succeeded.

Why do your drivers need to recover differently based on the success of 
the hypercall?

Alan Stern

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [linux-pm] [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
  2011-02-08 16:46       ` Alan Stern
@ 2011-02-08 17:35         ` Ian Campbell
  -1 siblings, 0 replies; 58+ messages in thread
From: Ian Campbell @ 2011-02-08 17:35 UTC (permalink / raw)
  To: Alan Stern
  Cc: Rafael J. Wysocki, linux-pm, xen-devel, SUZUKI, Kazuhiro, LKML,
	Brendan Cully

On Tue, 2011-02-08 at 11:46 -0500, Alan Stern wrote:
> On Tue, 8 Feb 2011, Ian Campbell wrote:
> 
> > The problem is that currently we have:
> > 
> >         dpm_suspend_start(PMSG_SUSPEND);
> >         
> >                 dpm_suspend_noirq(PMSG_SUSPEND);
> >                         
> >                         sysdev_suspend(PMSG_SUSPEND);
> >                         /* suspend hypercall */
> >                         sysdev_resume();
> >                 
> >                 dpm_resume_noirq(PMSG_RESUME);
> >         
> >         dpm_resume_end(PMSG_RESUME);
> > 
> > However the suspend hypercall can return a value indicating that the
> > suspend didn't actually happen (e.g. was cancelled). This is used e.g.
> > when checkpointing guests, because in that case you want the original
> > guest to continue. When the suspend didn't happen the drivers need to
> > recover differently from if it did.
> 
> That is odd, and it is quite different from the intended design of the 
> PM core.  Drivers are supposed to put their devices into a known 
> suspended state; then afterwards they put the devices back into an 
> operational state.  What happens while the devices are in the suspended 
> state isn't supposed to matter -- the system transition can fail, but 
> devices get treated exactly the same way as if it succeeded.
> 
> Why do your drivers need to recover differently based on the success of 
> the hypercall?

checkpointing isn't really my area but AIUI you don't want to do a full
device teardown and reconnect like you would with a proper suspend
because of the time that takes which prevents you from doing continuous
rolling checkpoints at granularity which people want to implement
various disaster recovery schemes.

Hopefully one of the Xen checkpointing folks will chime in and explain
why this is not possible to achieve at the individual driver level (or,
even better, with a patch which does it that way ;-)).

Ian.

-- 
Ian Campbell
Current Noise: Buckcherry - King Of Kings

Beauty?  What's that?
		-- Larry Wall in <199710221937.MAA25131@wall.org>


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
  2011-02-08 16:46       ` Alan Stern
  (?)
  (?)
@ 2011-02-08 17:35       ` Ian Campbell
  -1 siblings, 0 replies; 58+ messages in thread
From: Ian Campbell @ 2011-02-08 17:35 UTC (permalink / raw)
  To: Alan Stern; +Cc: xen-devel, LKML, Brendan Cully, linux-pm, SUZUKI, Kazuhiro

On Tue, 2011-02-08 at 11:46 -0500, Alan Stern wrote:
> On Tue, 8 Feb 2011, Ian Campbell wrote:
> 
> > The problem is that currently we have:
> > 
> >         dpm_suspend_start(PMSG_SUSPEND);
> >         
> >                 dpm_suspend_noirq(PMSG_SUSPEND);
> >                         
> >                         sysdev_suspend(PMSG_SUSPEND);
> >                         /* suspend hypercall */
> >                         sysdev_resume();
> >                 
> >                 dpm_resume_noirq(PMSG_RESUME);
> >         
> >         dpm_resume_end(PMSG_RESUME);
> > 
> > However the suspend hypercall can return a value indicating that the
> > suspend didn't actually happen (e.g. was cancelled). This is used e.g.
> > when checkpointing guests, because in that case you want the original
> > guest to continue. When the suspend didn't happen the drivers need to
> > recover differently from if it did.
> 
> That is odd, and it is quite different from the intended design of the 
> PM core.  Drivers are supposed to put their devices into a known 
> suspended state; then afterwards they put the devices back into an 
> operational state.  What happens while the devices are in the suspended 
> state isn't supposed to matter -- the system transition can fail, but 
> devices get treated exactly the same way as if it succeeded.
> 
> Why do your drivers need to recover differently based on the success of 
> the hypercall?

checkpointing isn't really my area but AIUI you don't want to do a full
device teardown and reconnect like you would with a proper suspend
because of the time that takes which prevents you from doing continuous
rolling checkpoints at granularity which people want to implement
various disaster recovery schemes.

Hopefully one of the Xen checkpointing folks will chime in and explain
why this is not possible to achieve at the individual driver level (or,
even better, with a patch which does it that way ;-)).

Ian.

-- 
Ian Campbell
Current Noise: Buckcherry - King Of Kings

Beauty?  What's that?
		-- Larry Wall in <199710221937.MAA25131@wall.org>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [linux-pm] [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
@ 2011-02-08 17:35         ` Ian Campbell
  0 siblings, 0 replies; 58+ messages in thread
From: Ian Campbell @ 2011-02-08 17:35 UTC (permalink / raw)
  To: Alan Stern
  Cc: xen-devel, LKML, Rafael J. Wysocki, Brendan Cully, linux-pm,
	SUZUKI, Kazuhiro

On Tue, 2011-02-08 at 11:46 -0500, Alan Stern wrote:
> On Tue, 8 Feb 2011, Ian Campbell wrote:
> 
> > The problem is that currently we have:
> > 
> >         dpm_suspend_start(PMSG_SUSPEND);
> >         
> >                 dpm_suspend_noirq(PMSG_SUSPEND);
> >                         
> >                         sysdev_suspend(PMSG_SUSPEND);
> >                         /* suspend hypercall */
> >                         sysdev_resume();
> >                 
> >                 dpm_resume_noirq(PMSG_RESUME);
> >         
> >         dpm_resume_end(PMSG_RESUME);
> > 
> > However the suspend hypercall can return a value indicating that the
> > suspend didn't actually happen (e.g. was cancelled). This is used e.g.
> > when checkpointing guests, because in that case you want the original
> > guest to continue. When the suspend didn't happen the drivers need to
> > recover differently from if it did.
> 
> That is odd, and it is quite different from the intended design of the 
> PM core.  Drivers are supposed to put their devices into a known 
> suspended state; then afterwards they put the devices back into an 
> operational state.  What happens while the devices are in the suspended 
> state isn't supposed to matter -- the system transition can fail, but 
> devices get treated exactly the same way as if it succeeded.
> 
> Why do your drivers need to recover differently based on the success of 
> the hypercall?

checkpointing isn't really my area but AIUI you don't want to do a full
device teardown and reconnect like you would with a proper suspend
because of the time that takes which prevents you from doing continuous
rolling checkpoints at granularity which people want to implement
various disaster recovery schemes.

Hopefully one of the Xen checkpointing folks will chime in and explain
why this is not possible to achieve at the individual driver level (or,
even better, with a patch which does it that way ;-)).

Ian.

-- 
Ian Campbell
Current Noise: Buckcherry - King Of Kings

Beauty?  What's that?
		-- Larry Wall in <199710221937.MAA25131@wall.org>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [linux-pm] [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
  2011-02-08 17:35         ` Ian Campbell
@ 2011-02-09 23:16           ` Brendan Cully
  -1 siblings, 0 replies; 58+ messages in thread
From: Brendan Cully @ 2011-02-09 23:16 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Alan Stern, Rafael J. Wysocki, linux-pm, xen-devel, SUZUKI,
	Kazuhiro, LKML

On Tuesday, 08 February 2011 at 17:35, Ian Campbell wrote:
> On Tue, 2011-02-08 at 11:46 -0500, Alan Stern wrote:
> > On Tue, 8 Feb 2011, Ian Campbell wrote:
> > 
> > > The problem is that currently we have:
> > > 
> > >         dpm_suspend_start(PMSG_SUSPEND);
> > >         
> > >                 dpm_suspend_noirq(PMSG_SUSPEND);
> > >                         
> > >                         sysdev_suspend(PMSG_SUSPEND);
> > >                         /* suspend hypercall */
> > >                         sysdev_resume();
> > >                 
> > >                 dpm_resume_noirq(PMSG_RESUME);
> > >         
> > >         dpm_resume_end(PMSG_RESUME);
> > > 
> > > However the suspend hypercall can return a value indicating that the
> > > suspend didn't actually happen (e.g. was cancelled). This is used e.g.
> > > when checkpointing guests, because in that case you want the original
> > > guest to continue. When the suspend didn't happen the drivers need to
> > > recover differently from if it did.
> > 
> > That is odd, and it is quite different from the intended design of the 
> > PM core.  Drivers are supposed to put their devices into a known 
> > suspended state; then afterwards they put the devices back into an 
> > operational state.  What happens while the devices are in the suspended 
> > state isn't supposed to matter -- the system transition can fail, but 
> > devices get treated exactly the same way as if it succeeded.
> > 
> > Why do your drivers need to recover differently based on the success of 
> > the hypercall?
> 
> checkpointing isn't really my area but AIUI you don't want to do a full
> device teardown and reconnect like you would with a proper suspend
> because of the time that takes which prevents you from doing continuous
> rolling checkpoints at granularity which people want to implement
> various disaster recovery schemes.
> 
> Hopefully one of the Xen checkpointing folks will chime in and explain
> why this is not possible to achieve at the individual driver level (or,
> even better, with a patch which does it that way ;-)).

As Ian says, Xen has suspend_cancel because while the normal full
suspend/resume path works, it is much slower, and the work done during
resume is redundant. I don't remember the numbers offhand, but when we
added suspend_cancel I think we could do full suspend/resume in under
100us, which was maybe a couple of orders of magnitude faster than
full resume (largely due to slow xenstore handshaking on resume,
IIRC). It made a big difference for our Remus HA project, which was
checkpointing tens of times per second.

I'd like to keep the fast resume option, and expect that it can be
contained entirely in Xen-specific code. I'll try to get someone to
look into it here.

I think fast resume is somewhat orthogonal to the problem of hanging
on resume, which just sounds like a xen-specific bug in the slow
path.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
  2011-02-08 17:35         ` Ian Campbell
  (?)
  (?)
@ 2011-02-09 23:16         ` Brendan Cully
  -1 siblings, 0 replies; 58+ messages in thread
From: Brendan Cully @ 2011-02-09 23:16 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel, LKML, linux-pm, SUZUKI, Kazuhiro

On Tuesday, 08 February 2011 at 17:35, Ian Campbell wrote:
> On Tue, 2011-02-08 at 11:46 -0500, Alan Stern wrote:
> > On Tue, 8 Feb 2011, Ian Campbell wrote:
> > 
> > > The problem is that currently we have:
> > > 
> > >         dpm_suspend_start(PMSG_SUSPEND);
> > >         
> > >                 dpm_suspend_noirq(PMSG_SUSPEND);
> > >                         
> > >                         sysdev_suspend(PMSG_SUSPEND);
> > >                         /* suspend hypercall */
> > >                         sysdev_resume();
> > >                 
> > >                 dpm_resume_noirq(PMSG_RESUME);
> > >         
> > >         dpm_resume_end(PMSG_RESUME);
> > > 
> > > However the suspend hypercall can return a value indicating that the
> > > suspend didn't actually happen (e.g. was cancelled). This is used e.g.
> > > when checkpointing guests, because in that case you want the original
> > > guest to continue. When the suspend didn't happen the drivers need to
> > > recover differently from if it did.
> > 
> > That is odd, and it is quite different from the intended design of the 
> > PM core.  Drivers are supposed to put their devices into a known 
> > suspended state; then afterwards they put the devices back into an 
> > operational state.  What happens while the devices are in the suspended 
> > state isn't supposed to matter -- the system transition can fail, but 
> > devices get treated exactly the same way as if it succeeded.
> > 
> > Why do your drivers need to recover differently based on the success of 
> > the hypercall?
> 
> checkpointing isn't really my area but AIUI you don't want to do a full
> device teardown and reconnect like you would with a proper suspend
> because of the time that takes which prevents you from doing continuous
> rolling checkpoints at granularity which people want to implement
> various disaster recovery schemes.
> 
> Hopefully one of the Xen checkpointing folks will chime in and explain
> why this is not possible to achieve at the individual driver level (or,
> even better, with a patch which does it that way ;-)).

As Ian says, Xen has suspend_cancel because while the normal full
suspend/resume path works, it is much slower, and the work done during
resume is redundant. I don't remember the numbers offhand, but when we
added suspend_cancel I think we could do full suspend/resume in under
100us, which was maybe a couple of orders of magnitude faster than
full resume (largely due to slow xenstore handshaking on resume,
IIRC). It made a big difference for our Remus HA project, which was
checkpointing tens of times per second.

I'd like to keep the fast resume option, and expect that it can be
contained entirely in Xen-specific code. I'll try to get someone to
look into it here.

I think fast resume is somewhat orthogonal to the problem of hanging
on resume, which just sounds like a xen-specific bug in the slow
path.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [linux-pm] [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
@ 2011-02-09 23:16           ` Brendan Cully
  0 siblings, 0 replies; 58+ messages in thread
From: Brendan Cully @ 2011-02-09 23:16 UTC (permalink / raw)
  To: Ian Campbell
  Cc: xen-devel, LKML, Rafael J. Wysocki, Alan Stern, linux-pm, SUZUKI,
	Kazuhiro

On Tuesday, 08 February 2011 at 17:35, Ian Campbell wrote:
> On Tue, 2011-02-08 at 11:46 -0500, Alan Stern wrote:
> > On Tue, 8 Feb 2011, Ian Campbell wrote:
> > 
> > > The problem is that currently we have:
> > > 
> > >         dpm_suspend_start(PMSG_SUSPEND);
> > >         
> > >                 dpm_suspend_noirq(PMSG_SUSPEND);
> > >                         
> > >                         sysdev_suspend(PMSG_SUSPEND);
> > >                         /* suspend hypercall */
> > >                         sysdev_resume();
> > >                 
> > >                 dpm_resume_noirq(PMSG_RESUME);
> > >         
> > >         dpm_resume_end(PMSG_RESUME);
> > > 
> > > However the suspend hypercall can return a value indicating that the
> > > suspend didn't actually happen (e.g. was cancelled). This is used e.g.
> > > when checkpointing guests, because in that case you want the original
> > > guest to continue. When the suspend didn't happen the drivers need to
> > > recover differently from if it did.
> > 
> > That is odd, and it is quite different from the intended design of the 
> > PM core.  Drivers are supposed to put their devices into a known 
> > suspended state; then afterwards they put the devices back into an 
> > operational state.  What happens while the devices are in the suspended 
> > state isn't supposed to matter -- the system transition can fail, but 
> > devices get treated exactly the same way as if it succeeded.
> > 
> > Why do your drivers need to recover differently based on the success of 
> > the hypercall?
> 
> checkpointing isn't really my area but AIUI you don't want to do a full
> device teardown and reconnect like you would with a proper suspend
> because of the time that takes which prevents you from doing continuous
> rolling checkpoints at granularity which people want to implement
> various disaster recovery schemes.
> 
> Hopefully one of the Xen checkpointing folks will chime in and explain
> why this is not possible to achieve at the individual driver level (or,
> even better, with a patch which does it that way ;-)).

As Ian says, Xen has suspend_cancel because while the normal full
suspend/resume path works, it is much slower, and the work done during
resume is redundant. I don't remember the numbers offhand, but when we
added suspend_cancel I think we could do full suspend/resume in under
100us, which was maybe a couple of orders of magnitude faster than
full resume (largely due to slow xenstore handshaking on resume,
IIRC). It made a big difference for our Remus HA project, which was
checkpointing tens of times per second.

I'd like to keep the fast resume option, and expect that it can be
contained entirely in Xen-specific code. I'll try to get someone to
look into it here.

I think fast resume is somewhat orthogonal to the problem of hanging
on resume, which just sounds like a xen-specific bug in the slow
path.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [linux-pm] [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
  2011-02-09 23:16           ` Brendan Cully
@ 2011-02-09 23:42             ` Alan Stern
  -1 siblings, 0 replies; 58+ messages in thread
From: Alan Stern @ 2011-02-09 23:42 UTC (permalink / raw)
  To: Brendan Cully
  Cc: Ian Campbell, Rafael J. Wysocki, linux-pm, xen-devel, SUZUKI,
	Kazuhiro, LKML

On Wed, 9 Feb 2011, Brendan Cully wrote:

> > > That is odd, and it is quite different from the intended design of the 
> > > PM core.  Drivers are supposed to put their devices into a known 
> > > suspended state; then afterwards they put the devices back into an 
> > > operational state.  What happens while the devices are in the suspended 
> > > state isn't supposed to matter -- the system transition can fail, but 
> > > devices get treated exactly the same way as if it succeeded.
> > > 
> > > Why do your drivers need to recover differently based on the success of 
> > > the hypercall?
> > 
> > checkpointing isn't really my area but AIUI you don't want to do a full
> > device teardown and reconnect like you would with a proper suspend
> > because of the time that takes which prevents you from doing continuous
> > rolling checkpoints at granularity which people want to implement
> > various disaster recovery schemes.
> > 
> > Hopefully one of the Xen checkpointing folks will chime in and explain
> > why this is not possible to achieve at the individual driver level (or,
> > even better, with a patch which does it that way ;-)).
> 
> As Ian says, Xen has suspend_cancel because while the normal full
> suspend/resume path works, it is much slower, and the work done during
> resume is redundant. I don't remember the numbers offhand, but when we
> added suspend_cancel I think we could do full suspend/resume in under
> 100us, which was maybe a couple of orders of magnitude faster than
> full resume (largely due to slow xenstore handshaking on resume,
> IIRC). It made a big difference for our Remus HA project, which was
> checkpointing tens of times per second.
> 
> I'd like to keep the fast resume option, and expect that it can be
> contained entirely in Xen-specific code. I'll try to get someone to
> look into it here.
> 
> I think fast resume is somewhat orthogonal to the problem of hanging
> on resume, which just sounds like a xen-specific bug in the slow
> path.

In fact there already is a "fast suspend & resume" path in the PM core.  
It's the freeze/thaw procedure used when starting to hibernate.  The
documentation specifically says that drivers' freeze methods are
supposed to quiesce their devices but not change power levels.  In
addition, the thaw method is invoked as part of recovery from a failed
hibernation attempt, so it already has the "cancel" semantics that xen 
seems to want.

Alan Stern


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
  2011-02-09 23:16           ` Brendan Cully
  (?)
  (?)
@ 2011-02-09 23:42           ` Alan Stern
  -1 siblings, 0 replies; 58+ messages in thread
From: Alan Stern @ 2011-02-09 23:42 UTC (permalink / raw)
  To: Brendan Cully; +Cc: xen-devel, LKML, Ian Campbell, linux-pm, SUZUKI, Kazuhiro

On Wed, 9 Feb 2011, Brendan Cully wrote:

> > > That is odd, and it is quite different from the intended design of the 
> > > PM core.  Drivers are supposed to put their devices into a known 
> > > suspended state; then afterwards they put the devices back into an 
> > > operational state.  What happens while the devices are in the suspended 
> > > state isn't supposed to matter -- the system transition can fail, but 
> > > devices get treated exactly the same way as if it succeeded.
> > > 
> > > Why do your drivers need to recover differently based on the success of 
> > > the hypercall?
> > 
> > checkpointing isn't really my area but AIUI you don't want to do a full
> > device teardown and reconnect like you would with a proper suspend
> > because of the time that takes which prevents you from doing continuous
> > rolling checkpoints at granularity which people want to implement
> > various disaster recovery schemes.
> > 
> > Hopefully one of the Xen checkpointing folks will chime in and explain
> > why this is not possible to achieve at the individual driver level (or,
> > even better, with a patch which does it that way ;-)).
> 
> As Ian says, Xen has suspend_cancel because while the normal full
> suspend/resume path works, it is much slower, and the work done during
> resume is redundant. I don't remember the numbers offhand, but when we
> added suspend_cancel I think we could do full suspend/resume in under
> 100us, which was maybe a couple of orders of magnitude faster than
> full resume (largely due to slow xenstore handshaking on resume,
> IIRC). It made a big difference for our Remus HA project, which was
> checkpointing tens of times per second.
> 
> I'd like to keep the fast resume option, and expect that it can be
> contained entirely in Xen-specific code. I'll try to get someone to
> look into it here.
> 
> I think fast resume is somewhat orthogonal to the problem of hanging
> on resume, which just sounds like a xen-specific bug in the slow
> path.

In fact there already is a "fast suspend & resume" path in the PM core.  
It's the freeze/thaw procedure used when starting to hibernate.  The
documentation specifically says that drivers' freeze methods are
supposed to quiesce their devices but not change power levels.  In
addition, the thaw method is invoked as part of recovery from a failed
hibernation attempt, so it already has the "cancel" semantics that xen 
seems to want.

Alan Stern

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [linux-pm] [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
@ 2011-02-09 23:42             ` Alan Stern
  0 siblings, 0 replies; 58+ messages in thread
From: Alan Stern @ 2011-02-09 23:42 UTC (permalink / raw)
  To: Brendan Cully
  Cc: Ian Campbell, Rafael J. Wysocki, linux-pm, xen-devel, SUZUKI,
	Kazuhiro, LKML

On Wed, 9 Feb 2011, Brendan Cully wrote:

> > > That is odd, and it is quite different from the intended design of the 
> > > PM core.  Drivers are supposed to put their devices into a known 
> > > suspended state; then afterwards they put the devices back into an 
> > > operational state.  What happens while the devices are in the suspended 
> > > state isn't supposed to matter -- the system transition can fail, but 
> > > devices get treated exactly the same way as if it succeeded.
> > > 
> > > Why do your drivers need to recover differently based on the success of 
> > > the hypercall?
> > 
> > checkpointing isn't really my area but AIUI you don't want to do a full
> > device teardown and reconnect like you would with a proper suspend
> > because of the time that takes which prevents you from doing continuous
> > rolling checkpoints at granularity which people want to implement
> > various disaster recovery schemes.
> > 
> > Hopefully one of the Xen checkpointing folks will chime in and explain
> > why this is not possible to achieve at the individual driver level (or,
> > even better, with a patch which does it that way ;-)).
> 
> As Ian says, Xen has suspend_cancel because while the normal full
> suspend/resume path works, it is much slower, and the work done during
> resume is redundant. I don't remember the numbers offhand, but when we
> added suspend_cancel I think we could do full suspend/resume in under
> 100us, which was maybe a couple of orders of magnitude faster than
> full resume (largely due to slow xenstore handshaking on resume,
> IIRC). It made a big difference for our Remus HA project, which was
> checkpointing tens of times per second.
> 
> I'd like to keep the fast resume option, and expect that it can be
> contained entirely in Xen-specific code. I'll try to get someone to
> look into it here.
> 
> I think fast resume is somewhat orthogonal to the problem of hanging
> on resume, which just sounds like a xen-specific bug in the slow
> path.

In fact there already is a "fast suspend & resume" path in the PM core.  
It's the freeze/thaw procedure used when starting to hibernate.  The
documentation specifically says that drivers' freeze methods are
supposed to quiesce their devices but not change power levels.  In
addition, the thaw method is invoked as part of recovery from a failed
hibernation attempt, so it already has the "cancel" semantics that xen 
seems to want.

Alan Stern

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Xen-devel] Re: [linux-pm] [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
  2011-02-09 23:16           ` Brendan Cully
@ 2011-02-10 11:31             ` Ian Campbell
  -1 siblings, 0 replies; 58+ messages in thread
From: Ian Campbell @ 2011-02-10 11:31 UTC (permalink / raw)
  To: Brendan Cully
  Cc: xen-devel, LKML, Rafael J. Wysocki, Alan Stern, linux-pm, SUZUKI,
	Kazuhiro

On Wed, 2011-02-09 at 23:16 +0000, Brendan Cully wrote:
> I'd like to keep the fast resume option, and expect that it can be
> contained entirely in Xen-specific code. I'll try to get someone to
> look into it here.

Thanks, please put them in contact with Kazuhiro who has already been
looking into this.

> I think fast resume is somewhat orthogonal to the problem of hanging
> on resume, which just sounds like a xen-specific bug in the slow
> path.

The bug on the Xen side is taking the slow path when the suspend was
actually cancelled, which is what the original patch tries to fix.

Or should/could the slow path be able cope either way?

Ian.

-- 
Ian Campbell
Current Noise: Neil Young - Love And War

For every complex problem, there is a solution that is simple, neat, and wrong.
		-- H. L. Mencken


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Xen-devel] Re: [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
  2011-02-09 23:16           ` Brendan Cully
                             ` (3 preceding siblings ...)
  (?)
@ 2011-02-10 11:31           ` Ian Campbell
  -1 siblings, 0 replies; 58+ messages in thread
From: Ian Campbell @ 2011-02-10 11:31 UTC (permalink / raw)
  To: Brendan Cully; +Cc: xen-devel, LKML, linux-pm, SUZUKI, Kazuhiro

On Wed, 2011-02-09 at 23:16 +0000, Brendan Cully wrote:
> I'd like to keep the fast resume option, and expect that it can be
> contained entirely in Xen-specific code. I'll try to get someone to
> look into it here.

Thanks, please put them in contact with Kazuhiro who has already been
looking into this.

> I think fast resume is somewhat orthogonal to the problem of hanging
> on resume, which just sounds like a xen-specific bug in the slow
> path.

The bug on the Xen side is taking the slow path when the suspend was
actually cancelled, which is what the original patch tries to fix.

Or should/could the slow path be able cope either way?

Ian.

-- 
Ian Campbell
Current Noise: Neil Young - Love And War

For every complex problem, there is a solution that is simple, neat, and wrong.
		-- H. L. Mencken

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Re: [linux-pm] [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
@ 2011-02-10 11:31             ` Ian Campbell
  0 siblings, 0 replies; 58+ messages in thread
From: Ian Campbell @ 2011-02-10 11:31 UTC (permalink / raw)
  To: Brendan Cully
  Cc: xen-devel, LKML, Alan, Rafael J. Wysocki, Stern, linux-pm,
	SUZUKI, Kazuhiro

On Wed, 2011-02-09 at 23:16 +0000, Brendan Cully wrote:
> I'd like to keep the fast resume option, and expect that it can be
> contained entirely in Xen-specific code. I'll try to get someone to
> look into it here.

Thanks, please put them in contact with Kazuhiro who has already been
looking into this.

> I think fast resume is somewhat orthogonal to the problem of hanging
> on resume, which just sounds like a xen-specific bug in the slow
> path.

The bug on the Xen side is taking the slow path when the suspend was
actually cancelled, which is what the original patch tries to fix.

Or should/could the slow path be able cope either way?

Ian.

-- 
Ian Campbell
Current Noise: Neil Young - Love And War

For every complex problem, there is a solution that is simple, neat, and wrong.
		-- H. L. Mencken

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Xen-devel] Re: [linux-pm] [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
  2011-02-09 23:42             ` Alan Stern
@ 2011-02-10 11:40               ` Ian Campbell
  -1 siblings, 0 replies; 58+ messages in thread
From: Ian Campbell @ 2011-02-10 11:40 UTC (permalink / raw)
  To: Alan Stern
  Cc: Brendan Cully, xen-devel, LKML, Rafael J. Wysocki, linux-pm,
	SUZUKI, Kazuhiro

On Wed, 2011-02-09 at 23:42 +0000, Alan Stern wrote:
> In fact there already is a "fast suspend & resume" path in the PM core.  
> It's the freeze/thaw procedure used when starting to hibernate.  The
> documentation specifically says that drivers' freeze methods are
> supposed to quiesce their devices but not change power levels.  In
> addition, the thaw method is invoked as part of recovery from a failed
> hibernation attempt, so it already has the "cancel" semantics that xen 
> seems to want.

Sounds like that would work and I would much prefer to simply make
correct use of the core functionality.

So PMSG_FREEZE is balanced by either PMSG_RECOVER or PMSG_THAW depending
on whether the suspend was cancelled or not? So the sequence of events
is something like:
	dpm_suspend_start(PMSG_FREEZE);
         
		dpm_suspend_noirq(PMSG_FREEZE);
                         
			sysdev_suspend(PMSG_QUIESCE);
			cancelled = suspend_hypercall()
			sysdev_resume();
                 
		dpm_resume_noirq(cancelled ? PMSG_RECOVER : PMSG_THAW);
         
	dpm_resume_end(cancelled ? PMSG_RECOVER : PMSG_THAW);
?

(For comparison we currently have:
> > >         dpm_suspend_start(PMSG_SUSPEND);
> > >         
> > >                 dpm_suspend_noirq(PMSG_SUSPEND);
> > >                         
> > >                         sysdev_suspend(PMSG_SUSPEND);
> > >                         /* suspend hypercall */
> > >                         sysdev_resume();
> > >                 
> > >                 dpm_resume_noirq(PMSG_RESUME);
> > >         
> > >         dpm_resume_end(PMSG_RESUME);
)

Ian.
-- 
Ian Campbell
Current Noise: Neil Young - Angry World

If only one could get that wonderful feeling of accomplishment without
having to accomplish anything.


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Xen-devel] Re: [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
  2011-02-09 23:42             ` Alan Stern
  (?)
@ 2011-02-10 11:40             ` Ian Campbell
  -1 siblings, 0 replies; 58+ messages in thread
From: Ian Campbell @ 2011-02-10 11:40 UTC (permalink / raw)
  To: Alan Stern; +Cc: xen-devel, LKML, Brendan Cully, linux-pm, SUZUKI, Kazuhiro

On Wed, 2011-02-09 at 23:42 +0000, Alan Stern wrote:
> In fact there already is a "fast suspend & resume" path in the PM core.  
> It's the freeze/thaw procedure used when starting to hibernate.  The
> documentation specifically says that drivers' freeze methods are
> supposed to quiesce their devices but not change power levels.  In
> addition, the thaw method is invoked as part of recovery from a failed
> hibernation attempt, so it already has the "cancel" semantics that xen 
> seems to want.

Sounds like that would work and I would much prefer to simply make
correct use of the core functionality.

So PMSG_FREEZE is balanced by either PMSG_RECOVER or PMSG_THAW depending
on whether the suspend was cancelled or not? So the sequence of events
is something like:
	dpm_suspend_start(PMSG_FREEZE);
         
		dpm_suspend_noirq(PMSG_FREEZE);
                         
			sysdev_suspend(PMSG_QUIESCE);
			cancelled = suspend_hypercall()
			sysdev_resume();
                 
		dpm_resume_noirq(cancelled ? PMSG_RECOVER : PMSG_THAW);
         
	dpm_resume_end(cancelled ? PMSG_RECOVER : PMSG_THAW);
?

(For comparison we currently have:
> > >         dpm_suspend_start(PMSG_SUSPEND);
> > >         
> > >                 dpm_suspend_noirq(PMSG_SUSPEND);
> > >                         
> > >                         sysdev_suspend(PMSG_SUSPEND);
> > >                         /* suspend hypercall */
> > >                         sysdev_resume();
> > >                 
> > >                 dpm_resume_noirq(PMSG_RESUME);
> > >         
> > >         dpm_resume_end(PMSG_RESUME);
)

Ian.
-- 
Ian Campbell
Current Noise: Neil Young - Angry World

If only one could get that wonderful feeling of accomplishment without
having to accomplish anything.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Re: [linux-pm] [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
@ 2011-02-10 11:40               ` Ian Campbell
  0 siblings, 0 replies; 58+ messages in thread
From: Ian Campbell @ 2011-02-10 11:40 UTC (permalink / raw)
  To: Alan Stern
  Cc: xen-devel, LKML, Rafael J. Wysocki, Brendan Cully, linux-pm,
	SUZUKI, Kazuhiro

On Wed, 2011-02-09 at 23:42 +0000, Alan Stern wrote:
> In fact there already is a "fast suspend & resume" path in the PM core.  
> It's the freeze/thaw procedure used when starting to hibernate.  The
> documentation specifically says that drivers' freeze methods are
> supposed to quiesce their devices but not change power levels.  In
> addition, the thaw method is invoked as part of recovery from a failed
> hibernation attempt, so it already has the "cancel" semantics that xen 
> seems to want.

Sounds like that would work and I would much prefer to simply make
correct use of the core functionality.

So PMSG_FREEZE is balanced by either PMSG_RECOVER or PMSG_THAW depending
on whether the suspend was cancelled or not? So the sequence of events
is something like:
	dpm_suspend_start(PMSG_FREEZE);
         
		dpm_suspend_noirq(PMSG_FREEZE);
                         
			sysdev_suspend(PMSG_QUIESCE);
			cancelled = suspend_hypercall()
			sysdev_resume();
                 
		dpm_resume_noirq(cancelled ? PMSG_RECOVER : PMSG_THAW);
         
	dpm_resume_end(cancelled ? PMSG_RECOVER : PMSG_THAW);
?

(For comparison we currently have:
> > >         dpm_suspend_start(PMSG_SUSPEND);
> > >         
> > >                 dpm_suspend_noirq(PMSG_SUSPEND);
> > >                         
> > >                         sysdev_suspend(PMSG_SUSPEND);
> > >                         /* suspend hypercall */
> > >                         sysdev_resume();
> > >                 
> > >                 dpm_resume_noirq(PMSG_RESUME);
> > >         
> > >         dpm_resume_end(PMSG_RESUME);
)

Ian.
-- 
Ian Campbell
Current Noise: Neil Young - Angry World

If only one could get that wonderful feeling of accomplishment without
having to accomplish anything.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Re: [linux-pm] [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
  2011-02-10 11:31             ` Ian Campbell
  (?)
@ 2011-02-10 12:40             ` Ian Campbell
  2011-02-10 19:31               ` Brendan Cully
  -1 siblings, 1 reply; 58+ messages in thread
From: Ian Campbell @ 2011-02-10 12:40 UTC (permalink / raw)
  To: Brendan Cully; +Cc: xen-devel

(cutting CC to just xen-devel)

On Thu, 2011-02-10 at 11:31 +0000, Ian Campbell wrote:
> On Wed, 2011-02-09 at 23:16 +0000, Brendan Cully wrote:
> > I'd like to keep the fast resume option, and expect that it can be
> > contained entirely in Xen-specific code. I'll try to get someone to
> > look into it here.

On that note: Is there someone around who is willing to more actively
and visibly maintain Remus? i.e. make it work with the new toolstack,
upstream it to the mainline kernel, work to get it into a state where it
can be tested as a matter of course, and generally be seen to be
improving it (and its integration/interaction with the rest of the
ecosystem) with time?

At the moment the impression is that Remus is mostly being left to rot
apart from when noise gets made about specific issues.

Ian.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Xen-devel] Re: [linux-pm] [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
  2011-02-10 11:40               ` Ian Campbell
@ 2011-02-10 16:00                 ` Alan Stern
  -1 siblings, 0 replies; 58+ messages in thread
From: Alan Stern @ 2011-02-10 16:00 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Brendan Cully, xen-devel, LKML, Rafael J. Wysocki, linux-pm,
	SUZUKI, Kazuhiro

On Thu, 10 Feb 2011, Ian Campbell wrote:

> On Wed, 2011-02-09 at 23:42 +0000, Alan Stern wrote:
> > In fact there already is a "fast suspend & resume" path in the PM core.  
> > It's the freeze/thaw procedure used when starting to hibernate.  The
> > documentation specifically says that drivers' freeze methods are
> > supposed to quiesce their devices but not change power levels.  In
> > addition, the thaw method is invoked as part of recovery from a failed
> > hibernation attempt, so it already has the "cancel" semantics that xen 
> > seems to want.
> 
> Sounds like that would work and I would much prefer to simply make
> correct use of the core functionality.

It seems like a reasonable approach.  Whether it will actually _work_ 
is a harder question...  :-)

> So PMSG_FREEZE is balanced by either PMSG_RECOVER or PMSG_THAW depending
> on whether the suspend was cancelled or not?

Basically yes.  It is also "balanced" by PMSG_RESTORE, which is used
after a memory image has been restored (although this isn't relevant to
your snapshotting).  See the comments in include/linux/pm.h.

>  So the sequence of events
> is something like:
> 	dpm_suspend_start(PMSG_FREEZE);
>          
> 		dpm_suspend_noirq(PMSG_FREEZE);
>                          
> 			sysdev_suspend(PMSG_QUIESCE);

This should say sysdev_suspend(PMSG_FREEZE).

> 			cancelled = suspend_hypercall()

At this point swsusp_arch_suspend() is called.  If that translates to 
suspend_hypercall() in your setting, then yes.

> 			sysdev_resume();
>                  
> 		dpm_resume_noirq(cancelled ? PMSG_RECOVER : PMSG_THAW);
>          
> 	dpm_resume_end(cancelled ? PMSG_RECOVER : PMSG_THAW);
> ?

Yes.

> (For comparison we currently have:
> > > >         dpm_suspend_start(PMSG_SUSPEND);
> > > >         
> > > >                 dpm_suspend_noirq(PMSG_SUSPEND);
> > > >                         
> > > >                         sysdev_suspend(PMSG_SUSPEND);
> > > >                         /* suspend hypercall */
> > > >                         sysdev_resume();
> > > >                 
> > > >                 dpm_resume_noirq(PMSG_RESUME);
> > > >         
> > > >         dpm_resume_end(PMSG_RESUME);
> )

Right.  The sequence of calls is the same, but the PMSG_ argument is 
different so drivers are expected to act differently in response.

Alan Stern


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Xen-devel] Re: [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
  2011-02-10 11:40               ` Ian Campbell
  (?)
@ 2011-02-10 16:00               ` Alan Stern
  -1 siblings, 0 replies; 58+ messages in thread
From: Alan Stern @ 2011-02-10 16:00 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel, LKML, Brendan Cully, linux-pm, SUZUKI, Kazuhiro

On Thu, 10 Feb 2011, Ian Campbell wrote:

> On Wed, 2011-02-09 at 23:42 +0000, Alan Stern wrote:
> > In fact there already is a "fast suspend & resume" path in the PM core.  
> > It's the freeze/thaw procedure used when starting to hibernate.  The
> > documentation specifically says that drivers' freeze methods are
> > supposed to quiesce their devices but not change power levels.  In
> > addition, the thaw method is invoked as part of recovery from a failed
> > hibernation attempt, so it already has the "cancel" semantics that xen 
> > seems to want.
> 
> Sounds like that would work and I would much prefer to simply make
> correct use of the core functionality.

It seems like a reasonable approach.  Whether it will actually _work_ 
is a harder question...  :-)

> So PMSG_FREEZE is balanced by either PMSG_RECOVER or PMSG_THAW depending
> on whether the suspend was cancelled or not?

Basically yes.  It is also "balanced" by PMSG_RESTORE, which is used
after a memory image has been restored (although this isn't relevant to
your snapshotting).  See the comments in include/linux/pm.h.

>  So the sequence of events
> is something like:
> 	dpm_suspend_start(PMSG_FREEZE);
>          
> 		dpm_suspend_noirq(PMSG_FREEZE);
>                          
> 			sysdev_suspend(PMSG_QUIESCE);

This should say sysdev_suspend(PMSG_FREEZE).

> 			cancelled = suspend_hypercall()

At this point swsusp_arch_suspend() is called.  If that translates to 
suspend_hypercall() in your setting, then yes.

> 			sysdev_resume();
>                  
> 		dpm_resume_noirq(cancelled ? PMSG_RECOVER : PMSG_THAW);
>          
> 	dpm_resume_end(cancelled ? PMSG_RECOVER : PMSG_THAW);
> ?

Yes.

> (For comparison we currently have:
> > > >         dpm_suspend_start(PMSG_SUSPEND);
> > > >         
> > > >                 dpm_suspend_noirq(PMSG_SUSPEND);
> > > >                         
> > > >                         sysdev_suspend(PMSG_SUSPEND);
> > > >                         /* suspend hypercall */
> > > >                         sysdev_resume();
> > > >                 
> > > >                 dpm_resume_noirq(PMSG_RESUME);
> > > >         
> > > >         dpm_resume_end(PMSG_RESUME);
> )

Right.  The sequence of calls is the same, but the PMSG_ argument is 
different so drivers are expected to act differently in response.

Alan Stern

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Re: [linux-pm] [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
@ 2011-02-10 16:00                 ` Alan Stern
  0 siblings, 0 replies; 58+ messages in thread
From: Alan Stern @ 2011-02-10 16:00 UTC (permalink / raw)
  To: Ian Campbell
  Cc: xen-devel, LKML, Rafael J. Wysocki, Brendan Cully, linux-pm,
	SUZUKI, Kazuhiro

On Thu, 10 Feb 2011, Ian Campbell wrote:

> On Wed, 2011-02-09 at 23:42 +0000, Alan Stern wrote:
> > In fact there already is a "fast suspend & resume" path in the PM core.  
> > It's the freeze/thaw procedure used when starting to hibernate.  The
> > documentation specifically says that drivers' freeze methods are
> > supposed to quiesce their devices but not change power levels.  In
> > addition, the thaw method is invoked as part of recovery from a failed
> > hibernation attempt, so it already has the "cancel" semantics that xen 
> > seems to want.
> 
> Sounds like that would work and I would much prefer to simply make
> correct use of the core functionality.

It seems like a reasonable approach.  Whether it will actually _work_ 
is a harder question...  :-)

> So PMSG_FREEZE is balanced by either PMSG_RECOVER or PMSG_THAW depending
> on whether the suspend was cancelled or not?

Basically yes.  It is also "balanced" by PMSG_RESTORE, which is used
after a memory image has been restored (although this isn't relevant to
your snapshotting).  See the comments in include/linux/pm.h.

>  So the sequence of events
> is something like:
> 	dpm_suspend_start(PMSG_FREEZE);
>          
> 		dpm_suspend_noirq(PMSG_FREEZE);
>                          
> 			sysdev_suspend(PMSG_QUIESCE);

This should say sysdev_suspend(PMSG_FREEZE).

> 			cancelled = suspend_hypercall()

At this point swsusp_arch_suspend() is called.  If that translates to 
suspend_hypercall() in your setting, then yes.

> 			sysdev_resume();
>                  
> 		dpm_resume_noirq(cancelled ? PMSG_RECOVER : PMSG_THAW);
>          
> 	dpm_resume_end(cancelled ? PMSG_RECOVER : PMSG_THAW);
> ?

Yes.

> (For comparison we currently have:
> > > >         dpm_suspend_start(PMSG_SUSPEND);
> > > >         
> > > >                 dpm_suspend_noirq(PMSG_SUSPEND);
> > > >                         
> > > >                         sysdev_suspend(PMSG_SUSPEND);
> > > >                         /* suspend hypercall */
> > > >                         sysdev_resume();
> > > >                 
> > > >                 dpm_resume_noirq(PMSG_RESUME);
> > > >         
> > > >         dpm_resume_end(PMSG_RESUME);
> )

Right.  The sequence of calls is the same, but the PMSG_ argument is 
different so drivers are expected to act differently in response.

Alan Stern

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Xen-devel] Re: [linux-pm] [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
  2011-02-10 16:00                 ` Alan Stern
@ 2011-02-10 16:26                   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 58+ messages in thread
From: Rafael J. Wysocki @ 2011-02-10 16:26 UTC (permalink / raw)
  To: Alan Stern
  Cc: Ian Campbell, Brendan Cully, xen-devel, LKML, linux-pm, SUZUKI, Kazuhiro

On Thursday, February 10, 2011, Alan Stern wrote:
> On Thu, 10 Feb 2011, Ian Campbell wrote:
> 
> > On Wed, 2011-02-09 at 23:42 +0000, Alan Stern wrote:
> > > In fact there already is a "fast suspend & resume" path in the PM core.  
> > > It's the freeze/thaw procedure used when starting to hibernate.  The
> > > documentation specifically says that drivers' freeze methods are
> > > supposed to quiesce their devices but not change power levels.  In
> > > addition, the thaw method is invoked as part of recovery from a failed
> > > hibernation attempt, so it already has the "cancel" semantics that xen 
> > > seems to want.
> > 
> > Sounds like that would work and I would much prefer to simply make
> > correct use of the core functionality.
> 
> It seems like a reasonable approach.  Whether it will actually _work_ 
> is a harder question...  :-)
> 
> > So PMSG_FREEZE is balanced by either PMSG_RECOVER or PMSG_THAW depending
> > on whether the suspend was cancelled or not?

That's correct, but from drivers' point of view PMSG_THAW is equivalent to
PMSG_RECOVER, because the both of them cause ->thaw() callbacks to be executed.

> Basically yes.  It is also "balanced" by PMSG_RESTORE, which is used
> after a memory image has been restored (although this isn't relevant to
> your snapshotting).  See the comments in include/linux/pm.h.

Yup.

> >  So the sequence of events
> > is something like:
> > 	dpm_suspend_start(PMSG_FREEZE);
> >          
> > 		dpm_suspend_noirq(PMSG_FREEZE);
> >                          
> > 			sysdev_suspend(PMSG_QUIESCE);
> 
> This should say sysdev_suspend(PMSG_FREEZE).

Yes, PMSG_QUIESCE is restore-specific.

> > 			cancelled = suspend_hypercall()
> 
> At this point swsusp_arch_suspend() is called.  If that translates to 
> suspend_hypercall() in your setting, then yes.
> 
> > 			sysdev_resume();
> >                  
> > 		dpm_resume_noirq(cancelled ? PMSG_RECOVER : PMSG_THAW);
> >          
> > 	dpm_resume_end(cancelled ? PMSG_RECOVER : PMSG_THAW);
> > ?
> 
> Yes.

Actually, I think PMSG_THAW can be used in both cases.  The resume-side
routines only use the 'state' argument for diagnostics.

> > (For comparison we currently have:
> > > > >         dpm_suspend_start(PMSG_SUSPEND);
> > > > >         
> > > > >                 dpm_suspend_noirq(PMSG_SUSPEND);
> > > > >                         
> > > > >                         sysdev_suspend(PMSG_SUSPEND);
> > > > >                         /* suspend hypercall */
> > > > >                         sysdev_resume();
> > > > >                 
> > > > >                 dpm_resume_noirq(PMSG_RESUME);
> > > > >         
> > > > >         dpm_resume_end(PMSG_RESUME);
> > )
> 
> Right.  The sequence of calls is the same, but the PMSG_ argument is 
> different so drivers are expected to act differently in response.

That's correct.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Xen-devel] Re: [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
  2011-02-10 16:00                 ` Alan Stern
  (?)
  (?)
@ 2011-02-10 16:26                 ` Rafael J. Wysocki
  -1 siblings, 0 replies; 58+ messages in thread
From: Rafael J. Wysocki @ 2011-02-10 16:26 UTC (permalink / raw)
  To: Alan Stern
  Cc: xen-devel, LKML, Brendan Cully, Ian Campbell, linux-pm, SUZUKI, Kazuhiro

On Thursday, February 10, 2011, Alan Stern wrote:
> On Thu, 10 Feb 2011, Ian Campbell wrote:
> 
> > On Wed, 2011-02-09 at 23:42 +0000, Alan Stern wrote:
> > > In fact there already is a "fast suspend & resume" path in the PM core.  
> > > It's the freeze/thaw procedure used when starting to hibernate.  The
> > > documentation specifically says that drivers' freeze methods are
> > > supposed to quiesce their devices but not change power levels.  In
> > > addition, the thaw method is invoked as part of recovery from a failed
> > > hibernation attempt, so it already has the "cancel" semantics that xen 
> > > seems to want.
> > 
> > Sounds like that would work and I would much prefer to simply make
> > correct use of the core functionality.
> 
> It seems like a reasonable approach.  Whether it will actually _work_ 
> is a harder question...  :-)
> 
> > So PMSG_FREEZE is balanced by either PMSG_RECOVER or PMSG_THAW depending
> > on whether the suspend was cancelled or not?

That's correct, but from drivers' point of view PMSG_THAW is equivalent to
PMSG_RECOVER, because the both of them cause ->thaw() callbacks to be executed.

> Basically yes.  It is also "balanced" by PMSG_RESTORE, which is used
> after a memory image has been restored (although this isn't relevant to
> your snapshotting).  See the comments in include/linux/pm.h.

Yup.

> >  So the sequence of events
> > is something like:
> > 	dpm_suspend_start(PMSG_FREEZE);
> >          
> > 		dpm_suspend_noirq(PMSG_FREEZE);
> >                          
> > 			sysdev_suspend(PMSG_QUIESCE);
> 
> This should say sysdev_suspend(PMSG_FREEZE).

Yes, PMSG_QUIESCE is restore-specific.

> > 			cancelled = suspend_hypercall()
> 
> At this point swsusp_arch_suspend() is called.  If that translates to 
> suspend_hypercall() in your setting, then yes.
> 
> > 			sysdev_resume();
> >                  
> > 		dpm_resume_noirq(cancelled ? PMSG_RECOVER : PMSG_THAW);
> >          
> > 	dpm_resume_end(cancelled ? PMSG_RECOVER : PMSG_THAW);
> > ?
> 
> Yes.

Actually, I think PMSG_THAW can be used in both cases.  The resume-side
routines only use the 'state' argument for diagnostics.

> > (For comparison we currently have:
> > > > >         dpm_suspend_start(PMSG_SUSPEND);
> > > > >         
> > > > >                 dpm_suspend_noirq(PMSG_SUSPEND);
> > > > >                         
> > > > >                         sysdev_suspend(PMSG_SUSPEND);
> > > > >                         /* suspend hypercall */
> > > > >                         sysdev_resume();
> > > > >                 
> > > > >                 dpm_resume_noirq(PMSG_RESUME);
> > > > >         
> > > > >         dpm_resume_end(PMSG_RESUME);
> > )
> 
> Right.  The sequence of calls is the same, but the PMSG_ argument is 
> different so drivers are expected to act differently in response.

That's correct.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Xen-devel] Re: [linux-pm] [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
@ 2011-02-10 16:26                   ` Rafael J. Wysocki
  0 siblings, 0 replies; 58+ messages in thread
From: Rafael J. Wysocki @ 2011-02-10 16:26 UTC (permalink / raw)
  To: Alan Stern
  Cc: Ian Campbell, Brendan Cully, xen-devel, LKML, linux-pm, SUZUKI, Kazuhiro

On Thursday, February 10, 2011, Alan Stern wrote:
> On Thu, 10 Feb 2011, Ian Campbell wrote:
> 
> > On Wed, 2011-02-09 at 23:42 +0000, Alan Stern wrote:
> > > In fact there already is a "fast suspend & resume" path in the PM core.  
> > > It's the freeze/thaw procedure used when starting to hibernate.  The
> > > documentation specifically says that drivers' freeze methods are
> > > supposed to quiesce their devices but not change power levels.  In
> > > addition, the thaw method is invoked as part of recovery from a failed
> > > hibernation attempt, so it already has the "cancel" semantics that xen 
> > > seems to want.
> > 
> > Sounds like that would work and I would much prefer to simply make
> > correct use of the core functionality.
> 
> It seems like a reasonable approach.  Whether it will actually _work_ 
> is a harder question...  :-)
> 
> > So PMSG_FREEZE is balanced by either PMSG_RECOVER or PMSG_THAW depending
> > on whether the suspend was cancelled or not?

That's correct, but from drivers' point of view PMSG_THAW is equivalent to
PMSG_RECOVER, because the both of them cause ->thaw() callbacks to be executed.

> Basically yes.  It is also "balanced" by PMSG_RESTORE, which is used
> after a memory image has been restored (although this isn't relevant to
> your snapshotting).  See the comments in include/linux/pm.h.

Yup.

> >  So the sequence of events
> > is something like:
> > 	dpm_suspend_start(PMSG_FREEZE);
> >          
> > 		dpm_suspend_noirq(PMSG_FREEZE);
> >                          
> > 			sysdev_suspend(PMSG_QUIESCE);
> 
> This should say sysdev_suspend(PMSG_FREEZE).

Yes, PMSG_QUIESCE is restore-specific.

> > 			cancelled = suspend_hypercall()
> 
> At this point swsusp_arch_suspend() is called.  If that translates to 
> suspend_hypercall() in your setting, then yes.
> 
> > 			sysdev_resume();
> >                  
> > 		dpm_resume_noirq(cancelled ? PMSG_RECOVER : PMSG_THAW);
> >          
> > 	dpm_resume_end(cancelled ? PMSG_RECOVER : PMSG_THAW);
> > ?
> 
> Yes.

Actually, I think PMSG_THAW can be used in both cases.  The resume-side
routines only use the 'state' argument for diagnostics.

> > (For comparison we currently have:
> > > > >         dpm_suspend_start(PMSG_SUSPEND);
> > > > >         
> > > > >                 dpm_suspend_noirq(PMSG_SUSPEND);
> > > > >                         
> > > > >                         sysdev_suspend(PMSG_SUSPEND);
> > > > >                         /* suspend hypercall */
> > > > >                         sysdev_resume();
> > > > >                 
> > > > >                 dpm_resume_noirq(PMSG_RESUME);
> > > > >         
> > > > >         dpm_resume_end(PMSG_RESUME);
> > )
> 
> Right.  The sequence of calls is the same, but the PMSG_ argument is 
> different so drivers are expected to act differently in response.

That's correct.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Xen-devel] Re: [linux-pm] [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
  2011-02-10 16:00                 ` Alan Stern
@ 2011-02-10 16:34                   ` Ian Campbell
  -1 siblings, 0 replies; 58+ messages in thread
From: Ian Campbell @ 2011-02-10 16:34 UTC (permalink / raw)
  To: Alan Stern
  Cc: Brendan Cully, xen-devel, LKML, Rafael J. Wysocki, linux-pm,
	SUZUKI, Kazuhiro

On Thu, 2011-02-10 at 11:00 -0500, Alan Stern wrote:
> On Thu, 10 Feb 2011, Ian Campbell wrote:
> 
> > On Wed, 2011-02-09 at 23:42 +0000, Alan Stern wrote:
> > > In fact there already is a "fast suspend & resume" path in the PM core.  
> > > It's the freeze/thaw procedure used when starting to hibernate.  The
> > > documentation specifically says that drivers' freeze methods are
> > > supposed to quiesce their devices but not change power levels.  In
> > > addition, the thaw method is invoked as part of recovery from a failed
> > > hibernation attempt, so it already has the "cancel" semantics that xen 
> > > seems to want.
> > 
> > Sounds like that would work and I would much prefer to simply make
> > correct use of the core functionality.
> 
> It seems like a reasonable approach.  Whether it will actually _work_ 
> is a harder question...  :-)

Heh.

> > So PMSG_FREEZE is balanced by either PMSG_RECOVER or PMSG_THAW depending
> > on whether the suspend was cancelled or not?
> 
> Basically yes.  It is also "balanced" by PMSG_RESTORE, which is used
> after a memory image has been restored (although this isn't relevant to
> your snapshotting).  See the comments in include/linux/pm.h.

The documentation of the individual events in pm.h is good. Is there a
reference for the sequence of events for the different types of
suspend/hibernate/etc?

> >  So the sequence of events
> > is something like:
> > 	dpm_suspend_start(PMSG_FREEZE);
> >          
> > 		dpm_suspend_noirq(PMSG_FREEZE);
> >                          
> > 			sysdev_suspend(PMSG_QUIESCE);
> 
> This should say sysdev_suspend(PMSG_FREEZE).
> 
> > 			cancelled = suspend_hypercall()
> 
> At this point swsusp_arch_suspend() is called.  If that translates to 
> suspend_hypercall() in your setting, then yes.
> 
> > 			sysdev_resume();
> >                  
> > 		dpm_resume_noirq(cancelled ? PMSG_RECOVER : PMSG_THAW);
> >          
> > 	dpm_resume_end(cancelled ? PMSG_RECOVER : PMSG_THAW);
> > ?
> 
> Yes.

Both of those call ->thaw ->complete. Did I mean "cancelled ?
PMSG_THAW : PMSG_RESTORE"? (or s/THAW/RECOVER?)

If the suspend was cancelled then we want the devices to simply pickup
where they were before the freeze, wereas if we really did suspend (or
migrate or whatever) then they need to do a more complete reset and
reconnect operation so we want some sort of indication to the driver
which happened.

> > (For comparison we currently have:
> > > > >         dpm_suspend_start(PMSG_SUSPEND);
> > > > >         
> > > > >                 dpm_suspend_noirq(PMSG_SUSPEND);
> > > > >                         
> > > > >                         sysdev_suspend(PMSG_SUSPEND);
> > > > >                         /* suspend hypercall */
> > > > >                         sysdev_resume();
> > > > >                 
> > > > >                 dpm_resume_noirq(PMSG_RESUME);
> > > > >         
> > > > >         dpm_resume_end(PMSG_RESUME);
> > )
> 
> Right.  The sequence of calls is the same, but the PMSG_ argument is 
> different so drivers are expected to act differently in response.

The drivers don't actually see the PMSG_* though right? They only see a
differing sequence of hooks from dev_pm_ops called.

Thanks,
Ian.

> 
> Alan Stern
> 
> 

-- 
Ian Campbell
Current Noise: Sworn Amongst - Useless

Accident, n.:
	A condition in which presence of mind is good, but absence of
	body is better.
		-- Foolish Dictionary


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Xen-devel] Re: [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
  2011-02-10 16:00                 ` Alan Stern
                                   ` (2 preceding siblings ...)
  (?)
@ 2011-02-10 16:34                 ` Ian Campbell
  -1 siblings, 0 replies; 58+ messages in thread
From: Ian Campbell @ 2011-02-10 16:34 UTC (permalink / raw)
  To: Alan Stern; +Cc: xen-devel, LKML, Brendan Cully, linux-pm, SUZUKI, Kazuhiro

On Thu, 2011-02-10 at 11:00 -0500, Alan Stern wrote:
> On Thu, 10 Feb 2011, Ian Campbell wrote:
> 
> > On Wed, 2011-02-09 at 23:42 +0000, Alan Stern wrote:
> > > In fact there already is a "fast suspend & resume" path in the PM core.  
> > > It's the freeze/thaw procedure used when starting to hibernate.  The
> > > documentation specifically says that drivers' freeze methods are
> > > supposed to quiesce their devices but not change power levels.  In
> > > addition, the thaw method is invoked as part of recovery from a failed
> > > hibernation attempt, so it already has the "cancel" semantics that xen 
> > > seems to want.
> > 
> > Sounds like that would work and I would much prefer to simply make
> > correct use of the core functionality.
> 
> It seems like a reasonable approach.  Whether it will actually _work_ 
> is a harder question...  :-)

Heh.

> > So PMSG_FREEZE is balanced by either PMSG_RECOVER or PMSG_THAW depending
> > on whether the suspend was cancelled or not?
> 
> Basically yes.  It is also "balanced" by PMSG_RESTORE, which is used
> after a memory image has been restored (although this isn't relevant to
> your snapshotting).  See the comments in include/linux/pm.h.

The documentation of the individual events in pm.h is good. Is there a
reference for the sequence of events for the different types of
suspend/hibernate/etc?

> >  So the sequence of events
> > is something like:
> > 	dpm_suspend_start(PMSG_FREEZE);
> >          
> > 		dpm_suspend_noirq(PMSG_FREEZE);
> >                          
> > 			sysdev_suspend(PMSG_QUIESCE);
> 
> This should say sysdev_suspend(PMSG_FREEZE).
> 
> > 			cancelled = suspend_hypercall()
> 
> At this point swsusp_arch_suspend() is called.  If that translates to 
> suspend_hypercall() in your setting, then yes.
> 
> > 			sysdev_resume();
> >                  
> > 		dpm_resume_noirq(cancelled ? PMSG_RECOVER : PMSG_THAW);
> >          
> > 	dpm_resume_end(cancelled ? PMSG_RECOVER : PMSG_THAW);
> > ?
> 
> Yes.

Both of those call ->thaw ->complete. Did I mean "cancelled ?
PMSG_THAW : PMSG_RESTORE"? (or s/THAW/RECOVER?)

If the suspend was cancelled then we want the devices to simply pickup
where they were before the freeze, wereas if we really did suspend (or
migrate or whatever) then they need to do a more complete reset and
reconnect operation so we want some sort of indication to the driver
which happened.

> > (For comparison we currently have:
> > > > >         dpm_suspend_start(PMSG_SUSPEND);
> > > > >         
> > > > >                 dpm_suspend_noirq(PMSG_SUSPEND);
> > > > >                         
> > > > >                         sysdev_suspend(PMSG_SUSPEND);
> > > > >                         /* suspend hypercall */
> > > > >                         sysdev_resume();
> > > > >                 
> > > > >                 dpm_resume_noirq(PMSG_RESUME);
> > > > >         
> > > > >         dpm_resume_end(PMSG_RESUME);
> > )
> 
> Right.  The sequence of calls is the same, but the PMSG_ argument is 
> different so drivers are expected to act differently in response.

The drivers don't actually see the PMSG_* though right? They only see a
differing sequence of hooks from dev_pm_ops called.

Thanks,
Ian.

> 
> Alan Stern
> 
> 

-- 
Ian Campbell
Current Noise: Sworn Amongst - Useless

Accident, n.:
	A condition in which presence of mind is good, but absence of
	body is better.
		-- Foolish Dictionary

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Re: [linux-pm] [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
@ 2011-02-10 16:34                   ` Ian Campbell
  0 siblings, 0 replies; 58+ messages in thread
From: Ian Campbell @ 2011-02-10 16:34 UTC (permalink / raw)
  To: Alan Stern
  Cc: xen-devel, LKML, Rafael J. Wysocki, Brendan Cully, linux-pm,
	SUZUKI, Kazuhiro

On Thu, 2011-02-10 at 11:00 -0500, Alan Stern wrote:
> On Thu, 10 Feb 2011, Ian Campbell wrote:
> 
> > On Wed, 2011-02-09 at 23:42 +0000, Alan Stern wrote:
> > > In fact there already is a "fast suspend & resume" path in the PM core.  
> > > It's the freeze/thaw procedure used when starting to hibernate.  The
> > > documentation specifically says that drivers' freeze methods are
> > > supposed to quiesce their devices but not change power levels.  In
> > > addition, the thaw method is invoked as part of recovery from a failed
> > > hibernation attempt, so it already has the "cancel" semantics that xen 
> > > seems to want.
> > 
> > Sounds like that would work and I would much prefer to simply make
> > correct use of the core functionality.
> 
> It seems like a reasonable approach.  Whether it will actually _work_ 
> is a harder question...  :-)

Heh.

> > So PMSG_FREEZE is balanced by either PMSG_RECOVER or PMSG_THAW depending
> > on whether the suspend was cancelled or not?
> 
> Basically yes.  It is also "balanced" by PMSG_RESTORE, which is used
> after a memory image has been restored (although this isn't relevant to
> your snapshotting).  See the comments in include/linux/pm.h.

The documentation of the individual events in pm.h is good. Is there a
reference for the sequence of events for the different types of
suspend/hibernate/etc?

> >  So the sequence of events
> > is something like:
> > 	dpm_suspend_start(PMSG_FREEZE);
> >          
> > 		dpm_suspend_noirq(PMSG_FREEZE);
> >                          
> > 			sysdev_suspend(PMSG_QUIESCE);
> 
> This should say sysdev_suspend(PMSG_FREEZE).
> 
> > 			cancelled = suspend_hypercall()
> 
> At this point swsusp_arch_suspend() is called.  If that translates to 
> suspend_hypercall() in your setting, then yes.
> 
> > 			sysdev_resume();
> >                  
> > 		dpm_resume_noirq(cancelled ? PMSG_RECOVER : PMSG_THAW);
> >          
> > 	dpm_resume_end(cancelled ? PMSG_RECOVER : PMSG_THAW);
> > ?
> 
> Yes.

Both of those call ->thaw ->complete. Did I mean "cancelled ?
PMSG_THAW : PMSG_RESTORE"? (or s/THAW/RECOVER?)

If the suspend was cancelled then we want the devices to simply pickup
where they were before the freeze, wereas if we really did suspend (or
migrate or whatever) then they need to do a more complete reset and
reconnect operation so we want some sort of indication to the driver
which happened.

> > (For comparison we currently have:
> > > > >         dpm_suspend_start(PMSG_SUSPEND);
> > > > >         
> > > > >                 dpm_suspend_noirq(PMSG_SUSPEND);
> > > > >                         
> > > > >                         sysdev_suspend(PMSG_SUSPEND);
> > > > >                         /* suspend hypercall */
> > > > >                         sysdev_resume();
> > > > >                 
> > > > >                 dpm_resume_noirq(PMSG_RESUME);
> > > > >         
> > > > >         dpm_resume_end(PMSG_RESUME);
> > )
> 
> Right.  The sequence of calls is the same, but the PMSG_ argument is 
> different so drivers are expected to act differently in response.

The drivers don't actually see the PMSG_* though right? They only see a
differing sequence of hooks from dev_pm_ops called.

Thanks,
Ian.

> 
> Alan Stern
> 
> 

-- 
Ian Campbell
Current Noise: Sworn Amongst - Useless

Accident, n.:
	A condition in which presence of mind is good, but absence of
	body is better.
		-- Foolish Dictionary

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Xen-devel] Re: [linux-pm] [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
  2011-02-10 16:34                   ` Ian Campbell
@ 2011-02-10 17:01                     ` Rafael J. Wysocki
  -1 siblings, 0 replies; 58+ messages in thread
From: Rafael J. Wysocki @ 2011-02-10 17:01 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Alan Stern, Brendan Cully, xen-devel, LKML, linux-pm, SUZUKI, Kazuhiro

On Thursday, February 10, 2011, Ian Campbell wrote:
> On Thu, 2011-02-10 at 11:00 -0500, Alan Stern wrote:
> > On Thu, 10 Feb 2011, Ian Campbell wrote:
> > 
> > > On Wed, 2011-02-09 at 23:42 +0000, Alan Stern wrote:
> > > > In fact there already is a "fast suspend & resume" path in the PM core.  
> > > > It's the freeze/thaw procedure used when starting to hibernate.  The
> > > > documentation specifically says that drivers' freeze methods are
> > > > supposed to quiesce their devices but not change power levels.  In
> > > > addition, the thaw method is invoked as part of recovery from a failed
> > > > hibernation attempt, so it already has the "cancel" semantics that xen 
> > > > seems to want.
> > > 
> > > Sounds like that would work and I would much prefer to simply make
> > > correct use of the core functionality.
> > 
> > It seems like a reasonable approach.  Whether it will actually _work_ 
> > is a harder question...  :-)
> 
> Heh.
> 
> > > So PMSG_FREEZE is balanced by either PMSG_RECOVER or PMSG_THAW depending
> > > on whether the suspend was cancelled or not?
> > 
> > Basically yes.  It is also "balanced" by PMSG_RESTORE, which is used
> > after a memory image has been restored (although this isn't relevant to
> > your snapshotting).  See the comments in include/linux/pm.h.
> 
> The documentation of the individual events in pm.h is good. Is there a
> reference for the sequence of events for the different types of
> suspend/hibernate/etc?
> 
> > >  So the sequence of events
> > > is something like:
> > > 	dpm_suspend_start(PMSG_FREEZE);
> > >          
> > > 		dpm_suspend_noirq(PMSG_FREEZE);
> > >                          
> > > 			sysdev_suspend(PMSG_QUIESCE);
> > 
> > This should say sysdev_suspend(PMSG_FREEZE).
> > 
> > > 			cancelled = suspend_hypercall()
> > 
> > At this point swsusp_arch_suspend() is called.  If that translates to 
> > suspend_hypercall() in your setting, then yes.
> > 
> > > 			sysdev_resume();
> > >                  
> > > 		dpm_resume_noirq(cancelled ? PMSG_RECOVER : PMSG_THAW);
> > >          
> > > 	dpm_resume_end(cancelled ? PMSG_RECOVER : PMSG_THAW);
> > > ?
> > 
> > Yes.
> 
> Both of those call ->thaw ->complete. Did I mean "cancelled ?
> PMSG_THAW : PMSG_RESTORE"? (or s/THAW/RECOVER?)
> 
> If the suspend was cancelled then we want the devices to simply pickup
> where they were before the freeze, wereas if we really did suspend (or
> migrate or whatever) then they need to do a more complete reset and
> reconnect operation so we want some sort of indication to the driver
> which happened.

In that case you should probably use PMSG_THAW (or PMSG_RECOVER) for the
"cancel" case and PMSG_RESTORE for the "success" case (pretty much what
hibernation does).

And please don't forget to update the comments in pm.h to cover your usage
case. :-)

> > > (For comparison we currently have:
> > > > > >         dpm_suspend_start(PMSG_SUSPEND);
> > > > > >         
> > > > > >                 dpm_suspend_noirq(PMSG_SUSPEND);
> > > > > >                         
> > > > > >                         sysdev_suspend(PMSG_SUSPEND);
> > > > > >                         /* suspend hypercall */
> > > > > >                         sysdev_resume();
> > > > > >                 
> > > > > >                 dpm_resume_noirq(PMSG_RESUME);
> > > > > >         
> > > > > >         dpm_resume_end(PMSG_RESUME);
> > > )
> > 
> > Right.  The sequence of calls is the same, but the PMSG_ argument is 
> > different so drivers are expected to act differently in response.
> 
> The drivers don't actually see the PMSG_* though right? They only see a
> differing sequence of hooks from dev_pm_ops called.

That's correct.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Xen-devel] Re: [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
  2011-02-10 16:34                   ` Ian Campbell
  (?)
  (?)
@ 2011-02-10 17:01                   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 58+ messages in thread
From: Rafael J. Wysocki @ 2011-02-10 17:01 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel, LKML, Brendan Cully, linux-pm, SUZUKI, Kazuhiro

On Thursday, February 10, 2011, Ian Campbell wrote:
> On Thu, 2011-02-10 at 11:00 -0500, Alan Stern wrote:
> > On Thu, 10 Feb 2011, Ian Campbell wrote:
> > 
> > > On Wed, 2011-02-09 at 23:42 +0000, Alan Stern wrote:
> > > > In fact there already is a "fast suspend & resume" path in the PM core.  
> > > > It's the freeze/thaw procedure used when starting to hibernate.  The
> > > > documentation specifically says that drivers' freeze methods are
> > > > supposed to quiesce their devices but not change power levels.  In
> > > > addition, the thaw method is invoked as part of recovery from a failed
> > > > hibernation attempt, so it already has the "cancel" semantics that xen 
> > > > seems to want.
> > > 
> > > Sounds like that would work and I would much prefer to simply make
> > > correct use of the core functionality.
> > 
> > It seems like a reasonable approach.  Whether it will actually _work_ 
> > is a harder question...  :-)
> 
> Heh.
> 
> > > So PMSG_FREEZE is balanced by either PMSG_RECOVER or PMSG_THAW depending
> > > on whether the suspend was cancelled or not?
> > 
> > Basically yes.  It is also "balanced" by PMSG_RESTORE, which is used
> > after a memory image has been restored (although this isn't relevant to
> > your snapshotting).  See the comments in include/linux/pm.h.
> 
> The documentation of the individual events in pm.h is good. Is there a
> reference for the sequence of events for the different types of
> suspend/hibernate/etc?
> 
> > >  So the sequence of events
> > > is something like:
> > > 	dpm_suspend_start(PMSG_FREEZE);
> > >          
> > > 		dpm_suspend_noirq(PMSG_FREEZE);
> > >                          
> > > 			sysdev_suspend(PMSG_QUIESCE);
> > 
> > This should say sysdev_suspend(PMSG_FREEZE).
> > 
> > > 			cancelled = suspend_hypercall()
> > 
> > At this point swsusp_arch_suspend() is called.  If that translates to 
> > suspend_hypercall() in your setting, then yes.
> > 
> > > 			sysdev_resume();
> > >                  
> > > 		dpm_resume_noirq(cancelled ? PMSG_RECOVER : PMSG_THAW);
> > >          
> > > 	dpm_resume_end(cancelled ? PMSG_RECOVER : PMSG_THAW);
> > > ?
> > 
> > Yes.
> 
> Both of those call ->thaw ->complete. Did I mean "cancelled ?
> PMSG_THAW : PMSG_RESTORE"? (or s/THAW/RECOVER?)
> 
> If the suspend was cancelled then we want the devices to simply pickup
> where they were before the freeze, wereas if we really did suspend (or
> migrate or whatever) then they need to do a more complete reset and
> reconnect operation so we want some sort of indication to the driver
> which happened.

In that case you should probably use PMSG_THAW (or PMSG_RECOVER) for the
"cancel" case and PMSG_RESTORE for the "success" case (pretty much what
hibernation does).

And please don't forget to update the comments in pm.h to cover your usage
case. :-)

> > > (For comparison we currently have:
> > > > > >         dpm_suspend_start(PMSG_SUSPEND);
> > > > > >         
> > > > > >                 dpm_suspend_noirq(PMSG_SUSPEND);
> > > > > >                         
> > > > > >                         sysdev_suspend(PMSG_SUSPEND);
> > > > > >                         /* suspend hypercall */
> > > > > >                         sysdev_resume();
> > > > > >                 
> > > > > >                 dpm_resume_noirq(PMSG_RESUME);
> > > > > >         
> > > > > >         dpm_resume_end(PMSG_RESUME);
> > > )
> > 
> > Right.  The sequence of calls is the same, but the PMSG_ argument is 
> > different so drivers are expected to act differently in response.
> 
> The drivers don't actually see the PMSG_* though right? They only see a
> differing sequence of hooks from dev_pm_ops called.

That's correct.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Re: [linux-pm] [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
@ 2011-02-10 17:01                     ` Rafael J. Wysocki
  0 siblings, 0 replies; 58+ messages in thread
From: Rafael J. Wysocki @ 2011-02-10 17:01 UTC (permalink / raw)
  To: Ian Campbell
  Cc: xen-devel, LKML, Brendan Cully, Alan Stern, linux-pm, SUZUKI, Kazuhiro

On Thursday, February 10, 2011, Ian Campbell wrote:
> On Thu, 2011-02-10 at 11:00 -0500, Alan Stern wrote:
> > On Thu, 10 Feb 2011, Ian Campbell wrote:
> > 
> > > On Wed, 2011-02-09 at 23:42 +0000, Alan Stern wrote:
> > > > In fact there already is a "fast suspend & resume" path in the PM core.  
> > > > It's the freeze/thaw procedure used when starting to hibernate.  The
> > > > documentation specifically says that drivers' freeze methods are
> > > > supposed to quiesce their devices but not change power levels.  In
> > > > addition, the thaw method is invoked as part of recovery from a failed
> > > > hibernation attempt, so it already has the "cancel" semantics that xen 
> > > > seems to want.
> > > 
> > > Sounds like that would work and I would much prefer to simply make
> > > correct use of the core functionality.
> > 
> > It seems like a reasonable approach.  Whether it will actually _work_ 
> > is a harder question...  :-)
> 
> Heh.
> 
> > > So PMSG_FREEZE is balanced by either PMSG_RECOVER or PMSG_THAW depending
> > > on whether the suspend was cancelled or not?
> > 
> > Basically yes.  It is also "balanced" by PMSG_RESTORE, which is used
> > after a memory image has been restored (although this isn't relevant to
> > your snapshotting).  See the comments in include/linux/pm.h.
> 
> The documentation of the individual events in pm.h is good. Is there a
> reference for the sequence of events for the different types of
> suspend/hibernate/etc?
> 
> > >  So the sequence of events
> > > is something like:
> > > 	dpm_suspend_start(PMSG_FREEZE);
> > >          
> > > 		dpm_suspend_noirq(PMSG_FREEZE);
> > >                          
> > > 			sysdev_suspend(PMSG_QUIESCE);
> > 
> > This should say sysdev_suspend(PMSG_FREEZE).
> > 
> > > 			cancelled = suspend_hypercall()
> > 
> > At this point swsusp_arch_suspend() is called.  If that translates to 
> > suspend_hypercall() in your setting, then yes.
> > 
> > > 			sysdev_resume();
> > >                  
> > > 		dpm_resume_noirq(cancelled ? PMSG_RECOVER : PMSG_THAW);
> > >          
> > > 	dpm_resume_end(cancelled ? PMSG_RECOVER : PMSG_THAW);
> > > ?
> > 
> > Yes.
> 
> Both of those call ->thaw ->complete. Did I mean "cancelled ?
> PMSG_THAW : PMSG_RESTORE"? (or s/THAW/RECOVER?)
> 
> If the suspend was cancelled then we want the devices to simply pickup
> where they were before the freeze, wereas if we really did suspend (or
> migrate or whatever) then they need to do a more complete reset and
> reconnect operation so we want some sort of indication to the driver
> which happened.

In that case you should probably use PMSG_THAW (or PMSG_RECOVER) for the
"cancel" case and PMSG_RESTORE for the "success" case (pretty much what
hibernation does).

And please don't forget to update the comments in pm.h to cover your usage
case. :-)

> > > (For comparison we currently have:
> > > > > >         dpm_suspend_start(PMSG_SUSPEND);
> > > > > >         
> > > > > >                 dpm_suspend_noirq(PMSG_SUSPEND);
> > > > > >                         
> > > > > >                         sysdev_suspend(PMSG_SUSPEND);
> > > > > >                         /* suspend hypercall */
> > > > > >                         sysdev_resume();
> > > > > >                 
> > > > > >                 dpm_resume_noirq(PMSG_RESUME);
> > > > > >         
> > > > > >         dpm_resume_end(PMSG_RESUME);
> > > )
> > 
> > Right.  The sequence of calls is the same, but the PMSG_ argument is 
> > different so drivers are expected to act differently in response.
> 
> The drivers don't actually see the PMSG_* though right? They only see a
> differing sequence of hooks from dev_pm_ops called.

That's correct.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Xen-devel] Re: [linux-pm] [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
  2011-02-10 11:31             ` Ian Campbell
@ 2011-02-10 17:53               ` Brendan Cully
  -1 siblings, 0 replies; 58+ messages in thread
From: Brendan Cully @ 2011-02-10 17:53 UTC (permalink / raw)
  To: Ian Campbell
  Cc: xen-devel, LKML, Alan, Rafael J. Wysocki, Stern, linux-pm,
	SUZUKI, Kazuhiro

On Thursday, 10 February 2011 at 11:31, Ian Campbell wrote:
> On Wed, 2011-02-09 at 23:16 +0000, Brendan Cully wrote:
> > I'd like to keep the fast resume option, and expect that it can be
> > contained entirely in Xen-specific code. I'll try to get someone to
> > look into it here.
> 
> Thanks, please put them in contact with Kazuhiro who has already been
> looking into this.
> 
> > I think fast resume is somewhat orthogonal to the problem of hanging
> > on resume, which just sounds like a xen-specific bug in the slow
> > path.
> 
> The bug on the Xen side is taking the slow path when the suspend was
> actually cancelled, which is what the original patch tries to fix.
> 
> Or should/could the slow path be able cope either way?

I haven't looked at what's actually happening yet, but what should be
happening is that the pvops kernel should not advertise SUSPEND_CANCEL
since it doesn't yet support it. So xm save -c should call
xc_domain_resume with the fast argument set to 0, to do slow path
resumption, and that slow path should be able to handle resuming on
the same host where it suspended. Slow path resumption from checkpoint
was supported in the pre-pvops kernel.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Xen-devel] Re: [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
  2011-02-10 11:31             ` Ian Campbell
  (?)
  (?)
@ 2011-02-10 17:53             ` Brendan Cully
  -1 siblings, 0 replies; 58+ messages in thread
From: Brendan Cully @ 2011-02-10 17:53 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel, Alan, LKML, linux-pm, SUZUKI, Kazuhiro

On Thursday, 10 February 2011 at 11:31, Ian Campbell wrote:
> On Wed, 2011-02-09 at 23:16 +0000, Brendan Cully wrote:
> > I'd like to keep the fast resume option, and expect that it can be
> > contained entirely in Xen-specific code. I'll try to get someone to
> > look into it here.
> 
> Thanks, please put them in contact with Kazuhiro who has already been
> looking into this.
> 
> > I think fast resume is somewhat orthogonal to the problem of hanging
> > on resume, which just sounds like a xen-specific bug in the slow
> > path.
> 
> The bug on the Xen side is taking the slow path when the suspend was
> actually cancelled, which is what the original patch tries to fix.
> 
> Or should/could the slow path be able cope either way?

I haven't looked at what's actually happening yet, but what should be
happening is that the pvops kernel should not advertise SUSPEND_CANCEL
since it doesn't yet support it. So xm save -c should call
xc_domain_resume with the fast argument set to 0, to do slow path
resumption, and that slow path should be able to handle resuming on
the same host where it suspended. Slow path resumption from checkpoint
was supported in the pre-pvops kernel.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Xen-devel] Re: [linux-pm] [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
@ 2011-02-10 17:53               ` Brendan Cully
  0 siblings, 0 replies; 58+ messages in thread
From: Brendan Cully @ 2011-02-10 17:53 UTC (permalink / raw)
  To: Ian Campbell
  Cc: xen-devel, LKML, Alan, Rafael J. Wysocki, Stern, linux-pm,
	SUZUKI, Kazuhiro

On Thursday, 10 February 2011 at 11:31, Ian Campbell wrote:
> On Wed, 2011-02-09 at 23:16 +0000, Brendan Cully wrote:
> > I'd like to keep the fast resume option, and expect that it can be
> > contained entirely in Xen-specific code. I'll try to get someone to
> > look into it here.
> 
> Thanks, please put them in contact with Kazuhiro who has already been
> looking into this.
> 
> > I think fast resume is somewhat orthogonal to the problem of hanging
> > on resume, which just sounds like a xen-specific bug in the slow
> > path.
> 
> The bug on the Xen side is taking the slow path when the suspend was
> actually cancelled, which is what the original patch tries to fix.
> 
> Or should/could the slow path be able cope either way?

I haven't looked at what's actually happening yet, but what should be
happening is that the pvops kernel should not advertise SUSPEND_CANCEL
since it doesn't yet support it. So xm save -c should call
xc_domain_resume with the fast argument set to 0, to do slow path
resumption, and that slow path should be able to handle resuming on
the same host where it suspended. Slow path resumption from checkpoint
was supported in the pre-pvops kernel.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Xen-devel] Re: [linux-pm] [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
  2011-02-10 16:34                   ` Ian Campbell
@ 2011-02-10 18:56                     ` Alan Stern
  -1 siblings, 0 replies; 58+ messages in thread
From: Alan Stern @ 2011-02-10 18:56 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Brendan Cully, xen-devel, LKML, Rafael J. Wysocki, linux-pm,
	SUZUKI, Kazuhiro

On Thu, 10 Feb 2011, Ian Campbell wrote:

> The documentation of the individual events in pm.h is good. Is there a
> reference for the sequence of events for the different types of
> suspend/hibernate/etc?

There's Documentation/power/devices.txt.

Alan Stern


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Xen-devel] Re: [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
  2011-02-10 16:34                   ` Ian Campbell
                                     ` (3 preceding siblings ...)
  (?)
@ 2011-02-10 18:56                   ` Alan Stern
  -1 siblings, 0 replies; 58+ messages in thread
From: Alan Stern @ 2011-02-10 18:56 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel, LKML, Brendan Cully, linux-pm, SUZUKI, Kazuhiro

On Thu, 10 Feb 2011, Ian Campbell wrote:

> The documentation of the individual events in pm.h is good. Is there a
> reference for the sequence of events for the different types of
> suspend/hibernate/etc?

There's Documentation/power/devices.txt.

Alan Stern

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Xen-devel] Re: [linux-pm] [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
@ 2011-02-10 18:56                     ` Alan Stern
  0 siblings, 0 replies; 58+ messages in thread
From: Alan Stern @ 2011-02-10 18:56 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Brendan Cully, xen-devel, LKML, Rafael J. Wysocki, linux-pm,
	SUZUKI, Kazuhiro

On Thu, 10 Feb 2011, Ian Campbell wrote:

> The documentation of the individual events in pm.h is good. Is there a
> reference for the sequence of events for the different types of
> suspend/hibernate/etc?

There's Documentation/power/devices.txt.

Alan Stern

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Re: [linux-pm] [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
  2011-02-10 12:40             ` Ian Campbell
@ 2011-02-10 19:31               ` Brendan Cully
  2011-02-11  9:14                 ` Ian Campbell
  2011-02-11  9:51                 ` Ian Campbell
  0 siblings, 2 replies; 58+ messages in thread
From: Brendan Cully @ 2011-02-10 19:31 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Shriram Rajagopalan, Andrew Warfield, xen-devel

On Thursday, 10 February 2011 at 12:40, Ian Campbell wrote:
> (cutting CC to just xen-devel)
> 
> On Thu, 2011-02-10 at 11:31 +0000, Ian Campbell wrote:
> > On Wed, 2011-02-09 at 23:16 +0000, Brendan Cully wrote:
> > > I'd like to keep the fast resume option, and expect that it can be
> > > contained entirely in Xen-specific code. I'll try to get someone to
> > > look into it here.
> 
> On that note: Is there someone around who is willing to more actively
> and visibly maintain Remus? i.e. make it work with the new toolstack,
> upstream it to the mainline kernel, work to get it into a state where it
> can be tested as a matter of course, and generally be seen to be
> improving it (and its integration/interaction with the rest of the
> ecosystem) with time?
> 
> At the moment the impression is that Remus is mostly being left to rot
> apart from when noise gets made about specific issues.

I'd like to introduce Shriram Rajagopalan, another PhD student here at
UBC who has been doing a lot of work with Remus internally. Up to this
point he's concentrated on new features that aren't quite ready for
upstream yet: checkpoint compression, DRBD integration, and failback
(live resync back to the primary after it has recovered from
failure).

He's also started doing maintenance work for his own purposes that
would be beneficial for Xen: building a regression test suite, working
on pvops support, and planning xl integration. He's excited to take on
a maintenance role, and ready to put together a plan for upstreaming
his work.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Re: [linux-pm] [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
  2011-02-10 19:31               ` Brendan Cully
@ 2011-02-11  9:14                 ` Ian Campbell
  2011-02-11  9:37                   ` Pasi Kärkkäinen
  2011-02-11  9:51                 ` Ian Campbell
  1 sibling, 1 reply; 58+ messages in thread
From: Ian Campbell @ 2011-02-11  9:14 UTC (permalink / raw)
  To: Brendan Cully; +Cc: Rajagopalan, Andrew Warfield, xen-devel, Shriram

On Thu, 2011-02-10 at 19:31 +0000, Brendan Cully wrote:
> On Thursday, 10 February 2011 at 12:40, Ian Campbell wrote:
> > (cutting CC to just xen-devel)
> > 
> > On Thu, 2011-02-10 at 11:31 +0000, Ian Campbell wrote:
> > > On Wed, 2011-02-09 at 23:16 +0000, Brendan Cully wrote:
> > > > I'd like to keep the fast resume option, and expect that it can be
> > > > contained entirely in Xen-specific code. I'll try to get someone to
> > > > look into it here.
> > 
> > On that note: Is there someone around who is willing to more actively
> > and visibly maintain Remus? i.e. make it work with the new toolstack,
> > upstream it to the mainline kernel, work to get it into a state where it
> > can be tested as a matter of course, and generally be seen to be
> > improving it (and its integration/interaction with the rest of the
> > ecosystem) with time?
> > 
> > At the moment the impression is that Remus is mostly being left to rot
> > apart from when noise gets made about specific issues.
> 
> I'd like to introduce Shriram Rajagopalan, another PhD student here at
> UBC who has been doing a lot of work with Remus internally. Up to this
> point he's concentrated on new features that aren't quite ready for
> upstream yet: checkpoint compression, DRBD integration, and failback
> (live resync back to the primary after it has recovered from
> failure).
> 
> He's also started doing maintenance work for his own purposes that
> would be beneficial for Xen: building a regression test suite, working
> on pvops support, and planning xl integration. He's excited to take on
> a maintenance role, and ready to put together a plan for upstreaming
> his work.

Awesome! Welcome Shriram!

Ian.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Re: [linux-pm] [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
  2011-02-11  9:14                 ` Ian Campbell
@ 2011-02-11  9:37                   ` Pasi Kärkkäinen
  0 siblings, 0 replies; 58+ messages in thread
From: Pasi Kärkkäinen @ 2011-02-11  9:37 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Rajagopalan, Brendan Cully, Shriram, xen-devel, Andrew Warfield

On Fri, Feb 11, 2011 at 09:14:09AM +0000, Ian Campbell wrote:
> On Thu, 2011-02-10 at 19:31 +0000, Brendan Cully wrote:
> > On Thursday, 10 February 2011 at 12:40, Ian Campbell wrote:
> > > (cutting CC to just xen-devel)
> > > 
> > > On Thu, 2011-02-10 at 11:31 +0000, Ian Campbell wrote:
> > > > On Wed, 2011-02-09 at 23:16 +0000, Brendan Cully wrote:
> > > > > I'd like to keep the fast resume option, and expect that it can be
> > > > > contained entirely in Xen-specific code. I'll try to get someone to
> > > > > look into it here.
> > > 
> > > On that note: Is there someone around who is willing to more actively
> > > and visibly maintain Remus? i.e. make it work with the new toolstack,
> > > upstream it to the mainline kernel, work to get it into a state where it
> > > can be tested as a matter of course, and generally be seen to be
> > > improving it (and its integration/interaction with the rest of the
> > > ecosystem) with time?
> > > 
> > > At the moment the impression is that Remus is mostly being left to rot
> > > apart from when noise gets made about specific issues.
> > 
> > I'd like to introduce Shriram Rajagopalan, another PhD student here at
> > UBC who has been doing a lot of work with Remus internally. Up to this
> > point he's concentrated on new features that aren't quite ready for
> > upstream yet: checkpoint compression, DRBD integration, and failback
> > (live resync back to the primary after it has recovered from
> > failure).
> > 
> > He's also started doing maintenance work for his own purposes that
> > would be beneficial for Xen: building a regression test suite, working
> > on pvops support, and planning xl integration. He's excited to take on
> > a maintenance role, and ready to put together a plan for upstreaming
> > his work.
> 
> Awesome! Welcome Shriram!
> 

Indeed, very much welcome! Remus is discussed almost every day
on ##xen on irc, many people seem to want to use it!

-- Pasi

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Re: [linux-pm] [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
  2011-02-10 19:31               ` Brendan Cully
  2011-02-11  9:14                 ` Ian Campbell
@ 2011-02-11  9:51                 ` Ian Campbell
  2011-02-11 18:13                   ` Shriram Rajagopalan
  1 sibling, 1 reply; 58+ messages in thread
From: Ian Campbell @ 2011-02-11  9:51 UTC (permalink / raw)
  To: Brendan Cully; +Cc: Rajagopalan, Andrew Warfield, xen-devel, Shriram

On Thu, 2011-02-10 at 19:31 +0000, Brendan Cully wrote:
> On Thursday, 10 February 2011 at 12:40, Ian Campbell wrote:
> > (cutting CC to just xen-devel)
> > 
> > On Thu, 2011-02-10 at 11:31 +0000, Ian Campbell wrote:
> > > On Wed, 2011-02-09 at 23:16 +0000, Brendan Cully wrote:
> > > > I'd like to keep the fast resume option, and expect that it can be
> > > > contained entirely in Xen-specific code. I'll try to get someone to
> > > > look into it here.
> > 
> > On that note: Is there someone around who is willing to more actively
> > and visibly maintain Remus? i.e. make it work with the new toolstack,
> > upstream it to the mainline kernel, work to get it into a state where it
> > can be tested as a matter of course, and generally be seen to be
> > improving it (and its integration/interaction with the rest of the
> > ecosystem) with time?
> > 
> > At the moment the impression is that Remus is mostly being left to rot
> > apart from when noise gets made about specific issues.
> 
> I'd like to introduce Shriram Rajagopalan,[...] He's excited to take on
> a maintenance role, and ready to put together a plan for upstreaming
> his work.

Shriram, are you happy to add your acked-by to the following?

I mostly guessed the most obvious path for the F: tag, feel free to
extend (I presume there are bits under tools/python too?). If you have a
Remus tree somewhere you can add a T: label too if you like. We can get
you a tree on xenbis.xen.org if you want.

Ian.

8<-------------------

MAINTAINERS: Add Remus maintainer.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>

diff -r 9e463cb15658 MAINTAINERS
--- a/MAINTAINERS	Mon Feb 07 17:02:46 2011 +0000
+++ b/MAINTAINERS	Fri Feb 11 09:46:38 2011 +0000
@@ -158,6 +158,11 @@ S:	Supported
 S:	Supported
 T:	git git://xenbits.xen.org/qemu-xen-*.git
 
+REMUS
+M:	Shriram Rajagopalan <rshriram@cs.ubc.ca>
+S:	Maintained
+F:	tools/remus/
+
 SCHEDULING
 M:	George Dunlap <george.dunlap@eu.citrix.com>
 S:	Supported

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Re: [linux-pm] [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
  2011-02-11  9:51                 ` Ian Campbell
@ 2011-02-11 18:13                   ` Shriram Rajagopalan
  2011-02-14  9:15                     ` Ian Campbell
  0 siblings, 1 reply; 58+ messages in thread
From: Shriram Rajagopalan @ 2011-02-11 18:13 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Andrew Warfield, Brendan Cully, xen-devel

On Fri, Feb 11, 2011 at 1:51 AM, Ian Campbell
<Ian.Campbell@eu.citrix.com> wrote:
> On Thu, 2011-02-10 at 19:31 +0000, Brendan Cully wrote:
>> On Thursday, 10 February 2011 at 12:40, Ian Campbell wrote:
>> > (cutting CC to just xen-devel)
>> >
>> > On Thu, 2011-02-10 at 11:31 +0000, Ian Campbell wrote:
>> > > On Wed, 2011-02-09 at 23:16 +0000, Brendan Cully wrote:
>> > > > I'd like to keep the fast resume option, and expect that it can be
>> > > > contained entirely in Xen-specific code. I'll try to get someone to
>> > > > look into it here.
>> >
>> > On that note: Is there someone around who is willing to more actively
>> > and visibly maintain Remus? i.e. make it work with the new toolstack,
>> > upstream it to the mainline kernel, work to get it into a state where it
>> > can be tested as a matter of course, and generally be seen to be
>> > improving it (and its integration/interaction with the rest of the
>> > ecosystem) with time?
>> >
>> > At the moment the impression is that Remus is mostly being left to rot
>> > apart from when noise gets made about specific issues.
>>
>> I'd like to introduce Shriram Rajagopalan,[...] He's excited to take on
>> a maintenance role, and ready to put together a plan for upstreaming
>> his work.
>
> Shriram, are you happy to add your acked-by to the following?
>
> I mostly guessed the most obvious path for the F: tag, feel free to
> extend (I presume there are bits under tools/python too?). If you have a
> Remus tree somewhere you can add a T: label too if you like. We can get
> you a tree on xenbis.xen.org if you want.
>
> Ian.
>
> 8<-------------------
>
> MAINTAINERS: Add Remus maintainer.
>
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
>
> diff -r 9e463cb15658 MAINTAINERS
> --- a/MAINTAINERS       Mon Feb 07 17:02:46 2011 +0000
> +++ b/MAINTAINERS       Fri Feb 11 09:46:38 2011 +0000
> @@ -158,6 +158,11 @@ S: Supported
>  S:     Supported
>  T:     git git://xenbits.xen.org/qemu-xen-*.git
>
> +REMUS
> +M:     Shriram Rajagopalan <rshriram@cs.ubc.ca>
> +S:     Maintained
> +F:     tools/remus/
> +
>  SCHEDULING
>  M:     George Dunlap <george.dunlap@eu.citrix.com>
>  S:     Supported
>
>
>

Yep. You can add the acked-by.
About the F: tag, the following paths contain remus code
  tools/remus/
  tools/python/xen/remus/
  tools/python/xen/lowlevel/checkpoint/
  tools/blktap2/drivers/

The blktap2/drivers path has only few files (block-remus.c, hashtable* files)

About the tree, it would be great if I could get a tree. :)

shriram

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Re: [linux-pm] [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
  2011-02-11 18:13                   ` Shriram Rajagopalan
@ 2011-02-14  9:15                     ` Ian Campbell
  2011-02-14  9:27                       ` Ian Campbell
  0 siblings, 1 reply; 58+ messages in thread
From: Ian Campbell @ 2011-02-14  9:15 UTC (permalink / raw)
  To: rshriram; +Cc: Andrew Warfield, Brendan Cully, xen-devel

On Fri, 2011-02-11 at 18:13 +0000, Shriram Rajagopalan wrote:
> On Fri, Feb 11, 2011 at 1:51 AM, Ian Campbell
> <Ian.Campbell@eu.citrix.com> wrote:
> > On Thu, 2011-02-10 at 19:31 +0000, Brendan Cully wrote:
> >> On Thursday, 10 February 2011 at 12:40, Ian Campbell wrote:
> >> > (cutting CC to just xen-devel)
> >> >
> >> > On Thu, 2011-02-10 at 11:31 +0000, Ian Campbell wrote:
> >> > > On Wed, 2011-02-09 at 23:16 +0000, Brendan Cully wrote:
> >> > > > I'd like to keep the fast resume option, and expect that it can be
> >> > > > contained entirely in Xen-specific code. I'll try to get someone to
> >> > > > look into it here.
> >> >
> >> > On that note: Is there someone around who is willing to more actively
> >> > and visibly maintain Remus? i.e. make it work with the new toolstack,
> >> > upstream it to the mainline kernel, work to get it into a state where it
> >> > can be tested as a matter of course, and generally be seen to be
> >> > improving it (and its integration/interaction with the rest of the
> >> > ecosystem) with time?
> >> >
> >> > At the moment the impression is that Remus is mostly being left to rot
> >> > apart from when noise gets made about specific issues.
> >>
> >> I'd like to introduce Shriram Rajagopalan,[...] He's excited to take on
> >> a maintenance role, and ready to put together a plan for upstreaming
> >> his work.
> >
> > Shriram, are you happy to add your acked-by to the following?
> >[...]
> Yep. You can add the acked-by.
> About the F: tag, the following paths contain remus code
>   tools/remus/
>   tools/python/xen/remus/
>   tools/python/xen/lowlevel/checkpoint/
>   tools/blktap2/drivers/
> 
> The blktap2/drivers path has only few files (block-remus.c, hashtable* files)

Thanks, I'll resend the patch with this update and your ack.

> About the tree, it would be great if I could get a tree. :)

I've dropped you and the xenbits admins an email offline about this.

Ian.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Re: [linux-pm] [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
  2011-02-14  9:15                     ` Ian Campbell
@ 2011-02-14  9:27                       ` Ian Campbell
  0 siblings, 0 replies; 58+ messages in thread
From: Ian Campbell @ 2011-02-14  9:27 UTC (permalink / raw)
  To: rshriram; +Cc: Andrew Warfield, Brendan Cully, xen-devel

On Mon, 2011-02-14 at 09:15 +0000, Ian Campbell wrote:
> On Fri, 2011-02-11 at 18:13 +0000, Shriram Rajagopalan wrote:
> > About the F: tag, the following paths contain remus code
> >   tools/remus/
> >   tools/python/xen/remus/
> >   tools/python/xen/lowlevel/checkpoint/
> >   tools/blktap2/drivers/
> > 
> > The blktap2/drivers path has only few files (block-remus.c, hashtable* files)
> 
> Thanks, I'll resend the patch with this update and your ack.

Keir beat me to the punch with the original patch. I've sent an update
with the extra paths.

Ian.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH] update comments in pm.h describing Xen Guest save/restore/checkpoint use case
  2011-02-10 17:01                     ` Rafael J. Wysocki
  (?)
@ 2011-02-17  7:56                     ` Shriram Rajagopalan
  -1 siblings, 0 replies; 58+ messages in thread
From: Shriram Rajagopalan @ 2011-02-17  7:56 UTC (permalink / raw)
  To: linux-pm; +Cc: xen-devel

Add documentation to pm.h on how xen uses PM events (freeze,
thaw, restore) to implement Guest VM save/checkpoint/restore
functionality.

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
---
 include/linux/pm.h |   19 +++++++++++++++++++
 1 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/include/linux/pm.h b/include/linux/pm.h
index dd9c7ab..2ddd9d3 100644
--- a/include/linux/pm.h
+++ b/include/linux/pm.h
@@ -516,6 +516,25 @@ extern void update_pm_runtime_accounting(struct device *dev);
  * well as during system sleep states like PM_SUSPEND_STANDBY.  They may
  * be able to use wakeup events to exit from runtime low-power states,
  * or from system low-power states such as standby or suspend-to-RAM.
+ *
+ * Xen Guest Kernels use PM_FREEZE, PM_RESTORE and PM_THAW to implement
+ * VM save/restore/checkpoint functionality. Save and Restore are somewhat
+ * similar to hibernate functionality. The sequence of events is shown below:
+ *        dpm_suspend_start(PMSG_FREEZE);
+ *
+ *              dpm_suspend_noirq(PMSG_FREEZE);
+ *
+ *                     sysdev_suspend(PMSG_FREEZE);
+ *                     cancelled = suspend_hypercall()
+ *                     sysdev_resume();
+ *
+ *             dpm_resume_noirq(cancelled ? PMSG_THAW : PMSG_RESTORE);
+ *
+ *     dpm_resume_end(cancelled ? PMSG_THAW : PMSG_RESTORE);
+ *
+ * If the syspend_hypercall returns 1, it means that the VM was merely
+ * checkpointed (akin to THAW). If it returns 0, it means the system has been
+ * fully restored from its on-disk snapshot (akin to RESTORE).
  */
 
 #ifdef CONFIG_PM_SLEEP
-- 
1.7.0.4

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH] update comments in pm.h describing Xen Guest save/restore/checkpoint use case
  2011-02-10 17:01                     ` Rafael J. Wysocki
  (?)
  (?)
@ 2011-02-17  7:56                     ` Shriram Rajagopalan
  2011-02-17 10:56                       ` [Xen-devel] " Ian Campbell
  2011-02-17 10:56                       ` Ian Campbell
  -1 siblings, 2 replies; 58+ messages in thread
From: Shriram Rajagopalan @ 2011-02-17  7:56 UTC (permalink / raw)
  To: linux-pm; +Cc: rjw, xen-devel

Add documentation to pm.h on how xen uses PM events (freeze,
thaw, restore) to implement Guest VM save/checkpoint/restore
functionality.

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
---
 include/linux/pm.h |   19 +++++++++++++++++++
 1 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/include/linux/pm.h b/include/linux/pm.h
index dd9c7ab..2ddd9d3 100644
--- a/include/linux/pm.h
+++ b/include/linux/pm.h
@@ -516,6 +516,25 @@ extern void update_pm_runtime_accounting(struct device *dev);
  * well as during system sleep states like PM_SUSPEND_STANDBY.  They may
  * be able to use wakeup events to exit from runtime low-power states,
  * or from system low-power states such as standby or suspend-to-RAM.
+ *
+ * Xen Guest Kernels use PM_FREEZE, PM_RESTORE and PM_THAW to implement
+ * VM save/restore/checkpoint functionality. Save and Restore are somewhat
+ * similar to hibernate functionality. The sequence of events is shown below:
+ *        dpm_suspend_start(PMSG_FREEZE);
+ *
+ *              dpm_suspend_noirq(PMSG_FREEZE);
+ *
+ *                     sysdev_suspend(PMSG_FREEZE);
+ *                     cancelled = suspend_hypercall()
+ *                     sysdev_resume();
+ *
+ *             dpm_resume_noirq(cancelled ? PMSG_THAW : PMSG_RESTORE);
+ *
+ *     dpm_resume_end(cancelled ? PMSG_THAW : PMSG_RESTORE);
+ *
+ * If the syspend_hypercall returns 1, it means that the VM was merely
+ * checkpointed (akin to THAW). If it returns 0, it means the system has been
+ * fully restored from its on-disk snapshot (akin to RESTORE).
  */
 
 #ifdef CONFIG_PM_SLEEP
-- 
1.7.0.4

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [Xen-devel] [PATCH] update comments in pm.h describing Xen Guest save/restore/checkpoint use case
  2011-02-17  7:56                     ` Shriram Rajagopalan
@ 2011-02-17 10:56                       ` Ian Campbell
  2011-02-17 10:56                       ` Ian Campbell
  1 sibling, 0 replies; 58+ messages in thread
From: Ian Campbell @ 2011-02-17 10:56 UTC (permalink / raw)
  To: Shriram Rajagopalan; +Cc: linux-pm, xen-devel

On Thu, 2011-02-17 at 07:56 +0000, Shriram Rajagopalan wrote:
> Add documentation to pm.h on how xen uses PM events (freeze,
> thaw, restore) to implement Guest VM save/checkpoint/restore
> functionality.

The change to freeze/thaw/restore instead of suspend/resume raises the
question of what the correct .config option for Xen to key off is.

Currently the Xen suspend functionality in drivers/xen/manage.c is keyed
off CONFIG_PM_SLEEP (which depends on SUSPEND || HIBERNATION ||
XEN_SAVE_RESTORE).

Since PMSG_{SUSPEND,RESUME} are covered by CONFIG_SUSPEND in pm_op()
this already seems incorrect before this change, since PM_SLEEP can be
enabled without SUSPEND, and is equally incorrect after changing to
PMSG_{FREEZE,THAW,RESUME}, which are covered by CONFIG_HIBERNATION.

Is the correct fix to update pm_op to include
	|| defined(CONFIG_XEN_SAVE_RESTORE)
where appropriate in pm_op()? 

Or should we be looking to adjust the Kconfig and/or code on the Xen
side? Having Xen depend on HIBERNATION ="Hibernation (aka 'suspend to
disk')" seems semantically incorrect.

Ian.

> 
> Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
> ---
>  include/linux/pm.h |   19 +++++++++++++++++++
>  1 files changed, 19 insertions(+), 0 deletions(-)
> 
> diff --git a/include/linux/pm.h b/include/linux/pm.h
> index dd9c7ab..2ddd9d3 100644
> --- a/include/linux/pm.h
> +++ b/include/linux/pm.h
> @@ -516,6 +516,25 @@ extern void update_pm_runtime_accounting(struct device *dev);
>   * well as during system sleep states like PM_SUSPEND_STANDBY.  They may
>   * be able to use wakeup events to exit from runtime low-power states,
>   * or from system low-power states such as standby or suspend-to-RAM.
> + *
> + * Xen Guest Kernels use PM_FREEZE, PM_RESTORE and PM_THAW to implement
> + * VM save/restore/checkpoint functionality. Save and Restore are somewhat
> + * similar to hibernate functionality. The sequence of events is shown below:
> + *        dpm_suspend_start(PMSG_FREEZE);
> + *
> + *              dpm_suspend_noirq(PMSG_FREEZE);
> + *
> + *                     sysdev_suspend(PMSG_FREEZE);
> + *                     cancelled = suspend_hypercall()
> + *                     sysdev_resume();
> + *
> + *             dpm_resume_noirq(cancelled ? PMSG_THAW : PMSG_RESTORE);
> + *
> + *     dpm_resume_end(cancelled ? PMSG_THAW : PMSG_RESTORE);
> + *
> + * If the syspend_hypercall returns 1, it means that the VM was merely
> + * checkpointed (akin to THAW). If it returns 0, it means the system has been
> + * fully restored from its on-disk snapshot (akin to RESTORE).
>   */
>  
>  #ifdef CONFIG_PM_SLEEP

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH] update comments in pm.h describing Xen Guest save/restore/checkpoint use case
  2011-02-17  7:56                     ` Shriram Rajagopalan
  2011-02-17 10:56                       ` [Xen-devel] " Ian Campbell
@ 2011-02-17 10:56                       ` Ian Campbell
  1 sibling, 0 replies; 58+ messages in thread
From: Ian Campbell @ 2011-02-17 10:56 UTC (permalink / raw)
  To: Shriram Rajagopalan; +Cc: rjw, linux-pm, xen-devel

On Thu, 2011-02-17 at 07:56 +0000, Shriram Rajagopalan wrote:
> Add documentation to pm.h on how xen uses PM events (freeze,
> thaw, restore) to implement Guest VM save/checkpoint/restore
> functionality.

The change to freeze/thaw/restore instead of suspend/resume raises the
question of what the correct .config option for Xen to key off is.

Currently the Xen suspend functionality in drivers/xen/manage.c is keyed
off CONFIG_PM_SLEEP (which depends on SUSPEND || HIBERNATION ||
XEN_SAVE_RESTORE).

Since PMSG_{SUSPEND,RESUME} are covered by CONFIG_SUSPEND in pm_op()
this already seems incorrect before this change, since PM_SLEEP can be
enabled without SUSPEND, and is equally incorrect after changing to
PMSG_{FREEZE,THAW,RESUME}, which are covered by CONFIG_HIBERNATION.

Is the correct fix to update pm_op to include
	|| defined(CONFIG_XEN_SAVE_RESTORE)
where appropriate in pm_op()? 

Or should we be looking to adjust the Kconfig and/or code on the Xen
side? Having Xen depend on HIBERNATION ="Hibernation (aka 'suspend to
disk')" seems semantically incorrect.

Ian.

> 
> Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
> ---
>  include/linux/pm.h |   19 +++++++++++++++++++
>  1 files changed, 19 insertions(+), 0 deletions(-)
> 
> diff --git a/include/linux/pm.h b/include/linux/pm.h
> index dd9c7ab..2ddd9d3 100644
> --- a/include/linux/pm.h
> +++ b/include/linux/pm.h
> @@ -516,6 +516,25 @@ extern void update_pm_runtime_accounting(struct device *dev);
>   * well as during system sleep states like PM_SUSPEND_STANDBY.  They may
>   * be able to use wakeup events to exit from runtime low-power states,
>   * or from system low-power states such as standby or suspend-to-RAM.
> + *
> + * Xen Guest Kernels use PM_FREEZE, PM_RESTORE and PM_THAW to implement
> + * VM save/restore/checkpoint functionality. Save and Restore are somewhat
> + * similar to hibernate functionality. The sequence of events is shown below:
> + *        dpm_suspend_start(PMSG_FREEZE);
> + *
> + *              dpm_suspend_noirq(PMSG_FREEZE);
> + *
> + *                     sysdev_suspend(PMSG_FREEZE);
> + *                     cancelled = suspend_hypercall()
> + *                     sysdev_resume();
> + *
> + *             dpm_resume_noirq(cancelled ? PMSG_THAW : PMSG_RESTORE);
> + *
> + *     dpm_resume_end(cancelled ? PMSG_THAW : PMSG_RESTORE);
> + *
> + * If the syspend_hypercall returns 1, it means that the VM was merely
> + * checkpointed (akin to THAW). If it returns 0, it means the system has been
> + * fully restored from its on-disk snapshot (akin to RESTORE).
>   */
>  
>  #ifdef CONFIG_PM_SLEEP

^ permalink raw reply	[flat|nested] 58+ messages in thread

end of thread, other threads:[~2011-02-17 10:56 UTC | newest]

Thread overview: 58+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-02-07  9:07 [PATCH 0/2] Fix hangup after creating checkpoint on Xen SUZUKI, Kazuhiro
2011-02-07  9:08 ` [PATCH 1/2] " SUZUKI, Kazuhiro
2011-02-07  9:08 ` SUZUKI, Kazuhiro
2011-02-07  9:08 ` [PATCH 2/2] " SUZUKI, Kazuhiro
2011-02-07  9:08 ` SUZUKI, Kazuhiro
2011-02-07  9:35 ` [PATCH 0/2] " Rafael J. Wysocki
2011-02-08 11:22   ` Ian Campbell
2011-02-08 11:22     ` Ian Campbell
2011-02-08 16:46     ` Alan Stern
2011-02-08 16:46     ` [linux-pm] " Alan Stern
2011-02-08 16:46       ` Alan Stern
2011-02-08 17:35       ` Ian Campbell
2011-02-08 17:35         ` Ian Campbell
2011-02-09 23:16         ` Brendan Cully
2011-02-09 23:16           ` Brendan Cully
2011-02-09 23:42           ` Alan Stern
2011-02-09 23:42             ` Alan Stern
2011-02-10 11:40             ` [Xen-devel] " Ian Campbell
2011-02-10 11:40             ` [Xen-devel] Re: [linux-pm] " Ian Campbell
2011-02-10 11:40               ` Ian Campbell
2011-02-10 16:00               ` [Xen-devel] " Alan Stern
2011-02-10 16:00               ` [Xen-devel] Re: [linux-pm] " Alan Stern
2011-02-10 16:00                 ` Alan Stern
2011-02-10 16:26                 ` [Xen-devel] " Rafael J. Wysocki
2011-02-10 16:26                   ` Rafael J. Wysocki
2011-02-10 16:26                 ` [Xen-devel] " Rafael J. Wysocki
2011-02-10 16:34                 ` Ian Campbell
2011-02-10 16:34                 ` [Xen-devel] Re: [linux-pm] " Ian Campbell
2011-02-10 16:34                   ` Ian Campbell
2011-02-10 17:01                   ` [Xen-devel] " Rafael J. Wysocki
2011-02-10 17:01                     ` Rafael J. Wysocki
2011-02-17  7:56                     ` [PATCH] update comments in pm.h describing Xen Guest save/restore/checkpoint use case Shriram Rajagopalan
2011-02-17  7:56                     ` Shriram Rajagopalan
2011-02-17 10:56                       ` [Xen-devel] " Ian Campbell
2011-02-17 10:56                       ` Ian Campbell
2011-02-10 17:01                   ` [Xen-devel] Re: [PATCH 0/2] Fix hangup after creating checkpoint on Xen Rafael J. Wysocki
2011-02-10 18:56                   ` [Xen-devel] Re: [linux-pm] " Alan Stern
2011-02-10 18:56                     ` Alan Stern
2011-02-10 18:56                   ` [Xen-devel] " Alan Stern
2011-02-09 23:42           ` Alan Stern
2011-02-10 11:31           ` [Xen-devel] Re: [linux-pm] " Ian Campbell
2011-02-10 11:31             ` Ian Campbell
2011-02-10 12:40             ` Ian Campbell
2011-02-10 19:31               ` Brendan Cully
2011-02-11  9:14                 ` Ian Campbell
2011-02-11  9:37                   ` Pasi Kärkkäinen
2011-02-11  9:51                 ` Ian Campbell
2011-02-11 18:13                   ` Shriram Rajagopalan
2011-02-14  9:15                     ` Ian Campbell
2011-02-14  9:27                       ` Ian Campbell
2011-02-10 17:53             ` [Xen-devel] " Brendan Cully
2011-02-10 17:53             ` [Xen-devel] Re: [linux-pm] " Brendan Cully
2011-02-10 17:53               ` Brendan Cully
2011-02-10 11:31           ` [Xen-devel] " Ian Campbell
2011-02-09 23:16         ` Brendan Cully
2011-02-08 17:35       ` Ian Campbell
2011-02-08 11:22   ` Ian Campbell
2011-02-07  9:35 ` Rafael J. Wysocki

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.