* [RFC][PATCH 0/2] Rework disabling of interrupts during suspend-resume
@ 2009-02-22 17:37 ` Rafael J. Wysocki
  0 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-02-22 17:37 UTC (permalink / raw)
  To: LKML
  Cc: Linus Torvalds, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner

Hi,

The following two patches modify the way in which we handle disabling
interrupts during suspend and enabling them during resume.  Currently,
interrupts are disabled on the boot CPU as soon as the nonboot CPUs have been
disabled, which doesn't allow device drivers' "late" suspend and "early" resume
callbacks to sleep.  Among other things this means they cannot execute ACPI
AML routines, which leads to problems with suspend-resume of PCI devices,
as recently discussed on this list.

1/2 is based on an earlier patch from Linus and it only splits up
sysdev_[suspend|resume] from the ["late" suspend|"early" resume] of devices.

2/2 actually modifies the [suspend|hibernation] and resume code, as well as the
other code using the device PM framework.

The patches have been initially tested and they don't appear to break suspend
on my boxes, but this is only a first approximation.  In particular, I'm not sure
if I did the Xen, kexec and APM parts right, so people with experience in these
areas are kindly requested to have a look and tell me if there's anything to
fix in there.

Moreover, the real purpose of these changes is to be able to execute the
"late" suspend and "early" resume device callbacks with timer interrupts
enabled, so that they can use mutexes etc.  However, x86 currently doesn't set
the IRQF_TIMER flag and I need to make it do so before going further in this
direction and changing the PCI PM framework to take advantage of the $subject
changes, for example.  So, I need to know how to modify x86 timer code so that
the IRQF_TIMER flag is set by it.
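
As a rough illustration only (not part of the patches), telling a timer line
apart from a device line comes down to walking the handlers registered on the
line and testing each one's flags with a bitwise AND; an OR with a non-zero
flag would be true for every handler.  The sketch below is plain userspace C,
and every sketch_* name and flag value in it is a hypothetical stand-in rather
than the kernel's own definition.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's IRQF_* flags; values are illustrative. */
#define SKETCH_IRQF_SHARED 0x00000080u
#define SKETCH_IRQF_TIMER  0x00000200u

/* Hypothetical, cut-down version of a per-handler descriptor. */
struct sketch_irqaction {
	unsigned int flags;
	struct sketch_irqaction *next;	/* next handler sharing the same line */
};

/* Return true if any handler on the line is marked as a timer handler. */
bool sketch_is_timer_irq(const struct sketch_irqaction *action)
{
	for (; action; action = action->next) {
		if (action->flags & SKETCH_IRQF_TIMER)
			return true;
	}
	return false;
}
```

A line is treated as a timer line as soon as one handler in the chain carries
the flag, which is why the arch timer code has to set it for this to work.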

Comments welcome.

Thanks,
Rafael


* [RFC][PATCH 1/2] PM: Split up sysdev_[suspend|resume] from device_power_[down|up]
  2009-02-22 17:37 ` Rafael J. Wysocki
  (?)
  (?)
@ 2009-02-22 17:38 ` Rafael J. Wysocki
  2009-02-22 20:56   ` Adrian Bunk
                     ` (3 more replies)
  -1 siblings, 4 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-02-22 17:38 UTC (permalink / raw)
  To: LKML
  Cc: Linus Torvalds, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner

From: Rafael J. Wysocki <rjw@sisk.pl>

Move the sysdev_suspend/resume from the callee to the callers, with
no real change in semantics, so that we can rework the disabling of
interrupts during suspend/hibernation.
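
For illustration only, the caller-side pattern this leads to can be sketched
in userspace C as below.  All sketch_* names are hypothetical stubs that
merely record the order in which they run; they are not the kernel functions.
The point is that once the caller owns the sysdev step, it also owns the
unwinding: a failed sysdev suspend must still be followed by powering the
devices back up.

```c
#include <stddef.h>

/* Hypothetical userspace stand-ins; each one only records that it ran. */
static char sketch_trace[32];
static size_t sketch_trace_len;
static int sketch_sysdev_should_fail;	/* non-zero simulates a sysdev failure */

static void sketch_note(char step)
{
	if (sketch_trace_len < sizeof(sketch_trace) - 1)
		sketch_trace[sketch_trace_len++] = step;
	sketch_trace[sketch_trace_len] = '\0';
}

void sketch_reset(int sysdev_should_fail)
{
	sketch_trace_len = 0;
	sketch_trace[0] = '\0';
	sketch_sysdev_should_fail = sysdev_should_fail;
}

int sketch_device_power_down(void) { sketch_note('d'); return 0; }
void sketch_device_power_up(void) { sketch_note('D'); }
void sketch_sysdev_resume(void) { sketch_note('S'); }
void sketch_enter_sleep_state(void) { sketch_note('E'); }

int sketch_sysdev_suspend(void)
{
	sketch_note('s');
	return sketch_sysdev_should_fail ? -1 : 0;
}

/* Caller now sequences the sysdev step itself instead of the callee. */
int sketch_suspend_enter(void)
{
	int error;

	error = sketch_device_power_down();
	if (error)
		return error;

	error = sketch_sysdev_suspend();
	if (!error) {
		sketch_enter_sleep_state();
		sketch_sysdev_resume();
	}

	/* Runs on both the success and the sysdev failure path. */
	sketch_device_power_up();
	return error;
}
```

With no failure the recorded order is d, s, E, S, D; with a simulated sysdev
failure it is d, s, D and the error is returned to the caller.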

This is based on an earlier patch from Linus.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 arch/x86/kernel/apm_32.c  |    4 ++++
 drivers/base/base.h       |    2 --
 drivers/base/power/main.c |    3 ---
 drivers/xen/manage.c      |    8 ++++++++
 include/linux/pm.h        |    2 ++
 kernel/kexec.c            |    7 +++++++
 kernel/power/disk.c       |   11 +++++++++++
 kernel/power/main.c       |    8 ++++++--
 8 files changed, 38 insertions(+), 7 deletions(-)

Index: linux-2.6/arch/x86/kernel/apm_32.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/apm_32.c
+++ linux-2.6/arch/x86/kernel/apm_32.c
@@ -1192,6 +1192,7 @@ static int suspend(int vetoable)
 	device_suspend(PMSG_SUSPEND);
 	local_irq_disable();
 	device_power_down(PMSG_SUSPEND);
+	sysdev_suspend(PMSG_SUSPEND);
 
 	local_irq_enable();
 
@@ -1208,6 +1209,7 @@ static int suspend(int vetoable)
 	if (err != APM_SUCCESS)
 		apm_error("suspend", err);
 	err = (err == APM_SUCCESS) ? 0 : -EIO;
+	sysdev_resume();
 	device_power_up(PMSG_RESUME);
 	local_irq_enable();
 	device_resume(PMSG_RESUME);
@@ -1228,6 +1230,7 @@ static void standby(void)
 
 	local_irq_disable();
 	device_power_down(PMSG_SUSPEND);
+	sysdev_suspend(PMSG_SUSPEND);
 	local_irq_enable();
 
 	err = set_system_power_state(APM_STATE_STANDBY);
@@ -1235,6 +1238,7 @@ static void standby(void)
 		apm_error("standby", err);
 
 	local_irq_disable();
+	sysdev_resume();
 	device_power_up(PMSG_RESUME);
 	local_irq_enable();
 }
Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -333,7 +333,6 @@ static void dpm_power_up(pm_message_t st
  */
 void device_power_up(pm_message_t state)
 {
-	sysdev_resume();
 	dpm_power_up(state);
 }
 EXPORT_SYMBOL_GPL(device_power_up);
@@ -577,8 +576,6 @@ int device_power_down(pm_message_t state
 		}
 		dev->power.status = DPM_OFF_IRQ;
 	}
-	if (!error)
-		error = sysdev_suspend(state);
 	if (error)
 		dpm_power_up(resume_event(state));
 	return error;
Index: linux-2.6/drivers/xen/manage.c
===================================================================
--- linux-2.6.orig/drivers/xen/manage.c
+++ linux-2.6/drivers/xen/manage.c
@@ -45,6 +45,13 @@ static int xen_suspend(void *data)
 		       err);
 		return err;
 	}
+	err = sysdev_suspend(PMSG_SUSPEND);
+	if (err) {
+		printk(KERN_ERR "xen_suspend: sysdev_suspend failed: %d\n",
+			err);
+		device_power_up(PMSG_RESUME);
+		return err;
+	}
 
 	xen_mm_pin_all();
 	gnttab_suspend();
@@ -61,6 +68,7 @@ static int xen_suspend(void *data)
 	gnttab_resume();
 	xen_mm_unpin_all();
 
+	sysdev_resume();
 	device_power_up(PMSG_RESUME);
 
 	if (!*cancelled) {
Index: linux-2.6/kernel/kexec.c
===================================================================
--- linux-2.6.orig/kernel/kexec.c
+++ linux-2.6/kernel/kexec.c
@@ -1465,6 +1465,11 @@ int kernel_kexec(void)
 		error = device_power_down(PMSG_FREEZE);
 		if (error)
 			goto Enable_irqs;
+
+		/* Suspend system devices */
+		error = sysdev_suspend(PMSG_FREEZE);
+		if (error)
+			goto Power_up_devices;
 	} else
 #endif
 	{
@@ -1477,6 +1482,8 @@ int kernel_kexec(void)
 
 #ifdef CONFIG_KEXEC_JUMP
 	if (kexec_image->preserve_context) {
+		sysdev_resume();
+ Power_up_devices:
 		device_power_up(PMSG_RESTORE);
  Enable_irqs:
 		local_irq_enable();
Index: linux-2.6/kernel/power/disk.c
===================================================================
--- linux-2.6.orig/kernel/power/disk.c
+++ linux-2.6/kernel/power/disk.c
@@ -227,6 +227,12 @@ static int create_image(int platform_mod
 			"aborting hibernation\n");
 		goto Enable_irqs;
 	}
+	error = sysdev_suspend(PMSG_FREEZE);
+	if (error) {
+		printk(KERN_ERR "PM: Some devices failed to power down, "
+			"aborting hibernation\n");
+		goto Power_up_devices;
+	}
 
 	if (hibernation_test(TEST_CORE))
 		goto Power_up;
@@ -242,9 +248,11 @@ static int create_image(int platform_mod
 	if (!in_suspend)
 		platform_leave(platform_mode);
  Power_up:
+	sysdev_resume();
 	/* NOTE:  device_power_up() is just a resume() for devices
 	 * that suspended with irqs off ... no overall powerup.
 	 */
+ Power_up_devices:
 	device_power_up(in_suspend ?
 		(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);
  Enable_irqs:
@@ -335,6 +343,7 @@ static int resume_target_kernel(void)
 			"aborting resume\n");
 		goto Enable_irqs;
 	}
+	sysdev_suspend(PMSG_QUIESCE);
 	/* We'll ignore saved state, but this gets preempt count (etc) right */
 	save_processor_state();
 	error = restore_highmem();
@@ -357,6 +366,7 @@ static int resume_target_kernel(void)
 	swsusp_free();
 	restore_processor_state();
 	touch_softlockup_watchdog();
+	sysdev_resume();
 	device_power_up(PMSG_RECOVER);
  Enable_irqs:
 	local_irq_enable();
@@ -440,6 +450,7 @@ int hibernation_platform_enter(void)
 	local_irq_disable();
 	error = device_power_down(PMSG_HIBERNATE);
 	if (!error) {
+		sysdev_suspend(PMSG_HIBERNATE);
 		hibernation_ops->enter();
 		/* We should never get here */
 		while (1);
Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -298,8 +298,12 @@ static int suspend_enter(suspend_state_t
 		goto Done;
 	}
 
-	if (!suspend_test(TEST_CORE))
-		error = suspend_ops->enter(state);
+	error = sysdev_suspend(PMSG_SUSPEND);
+	if (!error) {
+		if (!suspend_test(TEST_CORE))
+			error = suspend_ops->enter(state);
+		sysdev_resume();
+	}
 
 	device_power_up(PMSG_RESUME);
  Done:
Index: linux-2.6/drivers/base/base.h
===================================================================
--- linux-2.6.orig/drivers/base/base.h
+++ linux-2.6/drivers/base/base.h
@@ -88,8 +88,6 @@ extern void driver_detach(struct device_
 extern int driver_probe_device(struct device_driver *drv, struct device *dev);
 
 extern void sysdev_shutdown(void);
-extern int sysdev_suspend(pm_message_t state);
-extern int sysdev_resume(void);
 
 extern char *make_class_name(const char *name, struct kobject *kobj);
 
Index: linux-2.6/include/linux/pm.h
===================================================================
--- linux-2.6.orig/include/linux/pm.h
+++ linux-2.6/include/linux/pm.h
@@ -381,10 +381,12 @@ struct dev_pm_info {
 
 #ifdef CONFIG_PM_SLEEP
 extern void device_pm_lock(void);
+extern int sysdev_resume(void);
 extern void device_power_up(pm_message_t state);
 extern void device_resume(pm_message_t state);
 
 extern void device_pm_unlock(void);
+extern int sysdev_suspend(pm_message_t state);
 extern int device_power_down(pm_message_t state);
 extern int device_suspend(pm_message_t state);
 extern int device_prepare_suspend(pm_message_t state);


* [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-22 17:37 ` Rafael J. Wysocki
                   ` (2 preceding siblings ...)
  (?)
@ 2009-02-22 17:39 ` Rafael J. Wysocki
  2009-02-22 18:01   ` Linus Torvalds
                     ` (3 more replies)
  -1 siblings, 4 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-02-22 17:39 UTC (permalink / raw)
  To: LKML
  Cc: Linus Torvalds, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner

From: Rafael J. Wysocki <rjw@sisk.pl>

Introduce two helper functions allowing us to disable device
interrupts (at the IO-APIC level) during suspend or hibernation
and enable them during the subsequent resume, respectively, so that
the timer interrupts are enabled while "late" suspend callbacks and
"early" resume callbacks provided by device drivers are being
executed.
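
A minimal userspace sketch of the bookkeeping behind such a pair of helpers
is given below, for illustration only.  The sketch_* types and the fixed-size
descriptor table are hypothetical simplifications: there is no locking, no
interrupt controller access and no synchronization with running handlers.
It only shows that lines which are timer lines, or which were already
disabled by someone else, are skipped and therefore not touched again on
resume.

```c
#include <stdbool.h>
#include <stdlib.h>

#define SKETCH_NR_IRQS 4

/* Hypothetical, cut-down per-line state. */
struct sketch_irq_desc {
	unsigned int depth;	/* 0 means the line is currently enabled */
	bool is_timer;		/* line carries a timer handler */
};

/* One entry per line that the suspend path itself disabled. */
struct sketch_disabled_irq {
	struct sketch_disabled_irq *next;
	int irq;
};

static struct sketch_irq_desc sketch_descs[SKETCH_NR_IRQS];
static struct sketch_disabled_irq *sketch_resume_list;

/* Re-enable exactly the lines recorded by sketch_disable_device_irqs(). */
void sketch_enable_device_irqs(void)
{
	while (sketch_resume_list) {
		struct sketch_disabled_irq *entry = sketch_resume_list;

		sketch_resume_list = entry->next;
		sketch_descs[entry->irq].depth--;
		free(entry);
	}
}

/* Disable every enabled non-timer line and remember which ones they were. */
int sketch_disable_device_irqs(void)
{
	for (int irq = 0; irq < SKETCH_NR_IRQS; irq++) {
		struct sketch_irq_desc *desc = &sketch_descs[irq];
		struct sketch_disabled_irq *entry;

		if (desc->is_timer || desc->depth)
			continue;

		entry = malloc(sizeof(*entry));
		if (!entry) {
			/* Undo what was done so far before reporting failure. */
			sketch_enable_device_irqs();
			return -1;
		}

		desc->depth++;
		entry->irq = irq;
		entry->next = sketch_resume_list;
		sketch_resume_list = entry;
	}

	return 0;
}
```

Unlike the patch, this sketch allocates a list entry only for lines it is
actually going to disable; that is a simplification chosen here, not a claim
about how the kernel code should be structured.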

Use these functions to rework the handling of interrupts during
suspend (hibernation) and resume.  Namely, interrupts will only be
disabled on the CPU right before suspending sysdevs, while device
interrupts will be disabled (at the IO-APIC level), with the help of
the new helper function, before calling "late" suspend callbacks
provided by device drivers and analogously during resume.
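
The resulting order of operations can be sketched as a small userspace
simulation, shown below for illustration only.  Every seq_* name is a
hypothetical stub that records a one-letter step code; none of them is a
kernel function.  The sketch also records whether the "late" suspend step
ran while CPU interrupts were still on, which is the property the rework is
after.

```c
#include <stdbool.h>
#include <stddef.h>

enum seq_fail { SEQ_FAIL_NONE, SEQ_FAIL_IRQS, SEQ_FAIL_POWER_DOWN };

static char seq_trace[32];
static size_t seq_trace_len;
static enum seq_fail seq_fail_at;
static bool seq_cpu_irqs_off;
static bool seq_late_ran_with_cpu_irqs_on;

static void seq_note(char step)
{
	if (seq_trace_len < sizeof(seq_trace) - 1)
		seq_trace[seq_trace_len++] = step;
	seq_trace[seq_trace_len] = '\0';
}

void seq_reset(enum seq_fail fail_at)
{
	seq_trace_len = 0;
	seq_trace[0] = '\0';
	seq_fail_at = fail_at;
	seq_cpu_irqs_off = false;
	seq_late_ran_with_cpu_irqs_on = false;
}

int seq_disable_device_irqs(void)
{
	seq_note('i');
	return seq_fail_at == SEQ_FAIL_IRQS ? -1 : 0;
}

void seq_enable_device_irqs(void) { seq_note('I'); }

/* Stands in for the "late" suspend callbacks of device drivers. */
int seq_device_power_down(void)
{
	seq_note('d');
	seq_late_ran_with_cpu_irqs_on = !seq_cpu_irqs_off;
	return seq_fail_at == SEQ_FAIL_POWER_DOWN ? -1 : 0;
}

void seq_device_power_up(void) { seq_note('D'); }

void seq_set_cpu_irqs_off(bool off)
{
	seq_cpu_irqs_off = off;
	seq_note(off ? 'c' : 'C');
}

/* Sysdev suspend, sleep state entry and sysdev resume, collapsed into one. */
void seq_sysdevs_and_sleep(void)
{
	seq_note('s');
	seq_note('E');
	seq_note('S');
}

int seq_suspend_enter(void)
{
	int error;

	error = seq_disable_device_irqs();
	if (error)
		goto unlock;

	error = seq_device_power_down();
	if (error)
		goto done;

	/* CPU interrupts go off only around the sysdev and sleep steps. */
	seq_set_cpu_irqs_off(true);
	seq_sysdevs_and_sleep();
	seq_set_cpu_irqs_off(false);

	seq_device_power_up();
done:
	seq_enable_device_irqs();
unlock:
	return error;
}
```

Each failure point unwinds only the steps that were completed before it, so
a failed "late" suspend re-enables device interrupts but never touches the
CPU interrupt state.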

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 arch/x86/kernel/apm_32.c  |   20 ++++++++--
 drivers/xen/manage.c      |   37 ++++++++++++--------
 include/linux/interrupt.h |    3 +
 kernel/irq/manage.c       |   85 ++++++++++++++++++++++++++++++++++++++++++++++
 kernel/kexec.c            |   11 ++++-
 kernel/power/disk.c       |   56 +++++++++++++++++++++++++++---
 kernel/power/main.c       |   27 +++++++++++---
 7 files changed, 208 insertions(+), 31 deletions(-)

Index: linux-2.6/kernel/irq/manage.c
===================================================================
--- linux-2.6.orig/kernel/irq/manage.c
+++ linux-2.6/kernel/irq/manage.c
@@ -746,3 +746,88 @@ int request_irq(unsigned int irq, irq_ha
 	return retval;
 }
 EXPORT_SYMBOL(request_irq);
+
+#ifdef CONFIG_PM_SLEEP
+struct disabled_irq {
+	struct list_head list;
+	int irq;
+};
+
+static LIST_HEAD(resume_irqs_list);
+
+/**
+ *	enable_device_irqs - enable interrupts disabled by disable_device_irqs()
+ *
+ *	Enable all interrupt lines previously disabled by disable_device_irqs()
+ *	that are on resume_irqs_list.
+ */
+void enable_device_irqs(void)
+{
+	struct disabled_irq *resume_irq, *tmp;
+
+	list_for_each_entry_safe(resume_irq, tmp, &resume_irqs_list, list) {
+		enable_irq(resume_irq->irq);
+		list_del(&resume_irq->list);
+		kfree(resume_irq);
+	}
+}
+
+/**
+ *	disable_device_irqs - disable all enabled interrupt lines
+ *
+ *	During system-wide suspend or hibernation device interrupts need to be
+ *	disabled at the chip level and this function is provided for this
+ *	purpose.  It disables all interrupt lines that are enabled at the
+ *	moment and saves their numbers for enable_device_irqs().
+ */
+int disable_device_irqs(void)
+{
+	struct irq_desc *desc;
+	int irq;
+
+	for_each_irq_desc(irq, desc) {
+		unsigned long flags;
+		struct disabled_irq *resume_irq;
+		struct irqaction *action;
+		bool is_timer_irq;
+
+		resume_irq = kzalloc(sizeof(*resume_irq), GFP_NOIO);
+		if (!resume_irq) {
+			enable_device_irqs();
+			return -ENOMEM;
+		}
+
+		spin_lock_irqsave(&desc->lock, flags);
+
+		is_timer_irq = false;
+		action = desc->action;
+		while (action) {
+			if (action->flags & IRQF_TIMER) {
+				is_timer_irq = true;
+				break;
+			}
+			action = action->next;
+		}
+
+		if (!is_timer_irq && !desc->depth) {
+			desc->depth++;
+			desc->status |= IRQ_DISABLED;
+			desc->chip->disable(irq);
+		} else {
+			spin_unlock_irqrestore(&desc->lock, flags);
+			kfree(resume_irq);
+			continue;
+		}
+
+		spin_unlock_irqrestore(&desc->lock, flags);
+
+		if (desc->action)
+			synchronize_irq(irq);
+
+		resume_irq->irq = irq;
+		list_add(&resume_irq->list, &resume_irqs_list);
+	}
+
+	return 0;
+}
+#endif /* CONFIG_PM_SLEEP */
Index: linux-2.6/include/linux/interrupt.h
===================================================================
--- linux-2.6.orig/include/linux/interrupt.h
+++ linux-2.6/include/linux/interrupt.h
@@ -470,4 +470,7 @@ extern int early_irq_init(void);
 extern int arch_early_irq_init(void);
 extern int arch_init_chip_data(struct irq_desc *desc, int cpu);
 
+extern int disable_device_irqs(void);
+extern void enable_device_irqs(void);
+
 #endif
Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -22,6 +22,7 @@
 #include <linux/freezer.h>
 #include <linux/vmstat.h>
 #include <linux/syscalls.h>
+#include <linux/interrupt.h>
 
 #include "power.h"
 
@@ -287,17 +288,25 @@ void __attribute__ ((weak)) arch_suspend
  */
 static int suspend_enter(suspend_state_t state)
 {
-	int error = 0;
+	int error;
 
 	device_pm_lock();
-	arch_suspend_disable_irqs();
-	BUG_ON(!irqs_disabled());
 
-	if ((error = device_power_down(PMSG_SUSPEND))) {
+	error = disable_device_irqs();
+	if (error) {
+		printk(KERN_ERR "PM: Failed to disable device interrupts\n");
+		goto Unlock;
+	}
+
+	error = device_power_down(PMSG_SUSPEND);
+	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down\n");
 		goto Done;
 	}
 
+	arch_suspend_disable_irqs();
+	BUG_ON(!irqs_disabled());
+
 	error = sysdev_suspend(PMSG_SUSPEND);
 	if (!error) {
 		if (!suspend_test(TEST_CORE))
@@ -305,11 +314,17 @@ static int suspend_enter(suspend_state_t
 		sysdev_resume();
 	}
 
-	device_power_up(PMSG_RESUME);
- Done:
 	arch_suspend_enable_irqs();
 	BUG_ON(irqs_disabled());
+
+	device_power_up(PMSG_RESUME);
+
+ Done:
+	enable_device_irqs();
+
+ Unlock:
 	device_pm_unlock();
+
 	return error;
 }
 
Index: linux-2.6/kernel/power/disk.c
===================================================================
--- linux-2.6.orig/kernel/power/disk.c
+++ linux-2.6/kernel/power/disk.c
@@ -22,6 +22,7 @@
 #include <linux/console.h>
 #include <linux/cpu.h>
 #include <linux/freezer.h>
+#include <linux/interrupt.h>
 
 #include "power.h"
 
@@ -214,7 +215,13 @@ static int create_image(int platform_mod
 		return error;
 
 	device_pm_lock();
-	local_irq_disable();
+
+	error = disable_device_irqs();
+	if (error) {
+		printk(KERN_ERR "PM: Failed to disable device interrupts\n");
+		goto Unlock;
+	}
+
 	/* At this point, device_suspend() has been called, but *not*
 	 * device_power_down(). We *must* call device_power_down() now.
 	 * Otherwise, drivers for some devices (e.g. interrupt controllers)
@@ -227,6 +234,9 @@ static int create_image(int platform_mod
 			"aborting hibernation\n");
 		goto Enable_irqs;
 	}
+
+	local_irq_disable();
+
 	error = sysdev_suspend(PMSG_FREEZE);
 	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down, "
@@ -252,11 +262,17 @@ static int create_image(int platform_mod
 	/* NOTE:  device_power_up() is just a resume() for devices
 	 * that suspended with irqs off ... no overall powerup.
 	 */
+
  Power_up_devices:
+	local_irq_enable();
+
 	device_power_up(in_suspend ?
 		(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);
+
  Enable_irqs:
-	local_irq_enable();
+	enable_device_irqs();
+
+ Unlock:
 	device_pm_unlock();
 	return error;
 }
@@ -336,13 +352,22 @@ static int resume_target_kernel(void)
 	int error;
 
 	device_pm_lock();
-	local_irq_disable();
+
+	error = disable_device_irqs();
+	if (error) {
+		printk(KERN_ERR "PM: Failed to disable device interrupts\n");
+		goto Unlock;
+	}
+
 	error = device_power_down(PMSG_QUIESCE);
 	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down, "
 			"aborting resume\n");
 		goto Enable_irqs;
 	}
+
+	local_irq_disable();
+
 	sysdev_suspend(PMSG_QUIESCE);
 	/* We'll ignore saved state, but this gets preempt count (etc) right */
 	save_processor_state();
@@ -366,11 +391,19 @@ static int resume_target_kernel(void)
 	swsusp_free();
 	restore_processor_state();
 	touch_softlockup_watchdog();
+
 	sysdev_resume();
+
+	local_irq_enable();
+
 	device_power_up(PMSG_RECOVER);
+
  Enable_irqs:
-	local_irq_enable();
+	enable_device_irqs();
+
+ Unlock:
 	device_pm_unlock();
+
 	return error;
 }
 
@@ -447,15 +480,23 @@ int hibernation_platform_enter(void)
 		goto Finish;
 
 	device_pm_lock();
-	local_irq_disable();
+
+	error = disable_device_irqs();
+	if (error)
+		goto Unlock;
+
 	error = device_power_down(PMSG_HIBERNATE);
 	if (!error) {
+		local_irq_disable();
 		sysdev_suspend(PMSG_HIBERNATE);
 		hibernation_ops->enter();
 		/* We should never get here */
 		while (1);
 	}
-	local_irq_enable();
+
+	enable_device_irqs();
+
+ Unlock:
 	device_pm_unlock();
 
 	/*
@@ -464,12 +505,15 @@ int hibernation_platform_enter(void)
 	 */
  Finish:
 	hibernation_ops->finish();
+
  Resume_devices:
 	entering_platform_hibernation = false;
 	device_resume(PMSG_RESTORE);
 	resume_console();
+
  Close:
 	hibernation_ops->end();
+
 	return error;
 }
 
Index: linux-2.6/arch/x86/kernel/apm_32.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/apm_32.c
+++ linux-2.6/arch/x86/kernel/apm_32.c
@@ -228,6 +228,7 @@
 #include <linux/suspend.h>
 #include <linux/kthread.h>
 #include <linux/jiffies.h>
+#include <linux/interrupt.h>
 
 #include <asm/system.h>
 #include <asm/uaccess.h>
@@ -1190,8 +1191,11 @@ static int suspend(int vetoable)
 	struct apm_user	*as;
 
 	device_suspend(PMSG_SUSPEND);
-	local_irq_disable();
+
+	disable_device_irqs();
 	device_power_down(PMSG_SUSPEND);
+
+	local_irq_disable();
 	sysdev_suspend(PMSG_SUSPEND);
 
 	local_irq_enable();
@@ -1209,9 +1213,13 @@ static int suspend(int vetoable)
 	if (err != APM_SUCCESS)
 		apm_error("suspend", err);
 	err = (err == APM_SUCCESS) ? 0 : -EIO;
+
 	sysdev_resume();
-	device_power_up(PMSG_RESUME);
 	local_irq_enable();
+
+	device_power_up(PMSG_RESUME);
+	enable_device_irqs();
+
 	device_resume(PMSG_RESUME);
 	queue_event(APM_NORMAL_RESUME, NULL);
 	spin_lock(&user_list_lock);
@@ -1228,8 +1236,10 @@ static void standby(void)
 {
 	int err;
 
-	local_irq_disable();
+	disable_device_irqs();
 	device_power_down(PMSG_SUSPEND);
+
+	local_irq_disable();
 	sysdev_suspend(PMSG_SUSPEND);
 	local_irq_enable();
 
@@ -1239,8 +1249,10 @@ static void standby(void)
 
 	local_irq_disable();
 	sysdev_resume();
-	device_power_up(PMSG_RESUME);
 	local_irq_enable();
+
+	device_power_up(PMSG_RESUME);
+	enable_device_irqs();
 }
 
 static apm_event_t get_event(void)
Index: linux-2.6/drivers/xen/manage.c
===================================================================
--- linux-2.6.orig/drivers/xen/manage.c
+++ linux-2.6/drivers/xen/manage.c
@@ -39,12 +39,6 @@ static int xen_suspend(void *data)
 
 	BUG_ON(!irqs_disabled());
 
-	err = device_power_down(PMSG_SUSPEND);
-	if (err) {
-		printk(KERN_ERR "xen_suspend: device_power_down failed: %d\n",
-		       err);
-		return err;
-	}
 	err = sysdev_suspend(PMSG_SUSPEND);
 	if (err) {
 		printk(KERN_ERR "xen_suspend: sysdev_suspend failed: %d\n",
@@ -69,13 +63,6 @@ static int xen_suspend(void *data)
 	xen_mm_unpin_all();
 
 	sysdev_resume();
-	device_power_up(PMSG_RESUME);
-
-	if (!*cancelled) {
-		xen_irq_resume();
-		xen_console_resume();
-		xen_timer_resume();
-	}
 
 	return 0;
 }
@@ -108,6 +95,18 @@ static void do_suspend(void)
 	/* XXX use normal device tree? */
 	xenbus_suspend();
 
+	err = disable_device_irqs();
+	if (err) {
+		printk(KERN_ERR "disable_device_irqs failed: %d\n", err);
+		goto resume_devices;
+	}
+
+	err = device_power_down(PMSG_SUSPEND);
+	if (err) {
+		printk(KERN_ERR "device_power_down failed: %d\n", err);
+		goto enable_irqs;
+	}
+
 	err = stop_machine(xen_suspend, &cancelled, &cpumask_of_cpu(0));
 	if (err) {
 		printk(KERN_ERR "failed to start xen_suspend: %d\n", err);
@@ -120,6 +119,18 @@ static void do_suspend(void)
 	} else
 		xenbus_suspend_cancel();
 
+	device_power_up(PMSG_RESUME);
+
+	if (!cancelled) {
+		xen_irq_resume();
+		xen_console_resume();
+		xen_timer_resume();
+	}
+
+enable_irqs:
+	enable_device_irqs();
+
+resume_devices:
 	device_resume(PMSG_RESUME);
 
 	/* Make sure timer events get retriggered on all CPUs */
Index: linux-2.6/kernel/kexec.c
===================================================================
--- linux-2.6.orig/kernel/kexec.c
+++ linux-2.6/kernel/kexec.c
@@ -1454,7 +1454,11 @@ int kernel_kexec(void)
 		if (error)
 			goto Resume_devices;
 		device_pm_lock();
-		local_irq_disable();
+
+		error = disable_device_irqs();
+		if (error)
+			goto Unlock_pm;
+
 		/* At this point, device_suspend() has been called,
 		 * but *not* device_power_down(). We *must*
 		 * device_power_down() now.  Otherwise, drivers for
@@ -1466,6 +1470,7 @@ int kernel_kexec(void)
 		if (error)
 			goto Enable_irqs;
 
+		local_irq_disable();
 		/* Suspend system devices */
 		error = sysdev_suspend(PMSG_FREEZE);
 		if (error)
@@ -1484,9 +1489,11 @@ int kernel_kexec(void)
 	if (kexec_image->preserve_context) {
 		sysdev_resume();
  Power_up_devices:
+		local_irq_enable();
 		device_power_up(PMSG_RESTORE);
  Enable_irqs:
-		local_irq_enable();
+		enable_device_irqs();
+ Unlock_pm:
 		device_pm_unlock();
 		enable_nonboot_cpus();
  Resume_devices:
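
The ordering these hunks aim for can be modelled in userspace as a rough sketch. Everything here is hypothetical scaffolding (the `step()` stub, the `fail_at` parameter and the event log are invented for illustration; none of it is kernel code): device interrupts are masked first, the "late" callbacks run while the CPU can still take interrupts, local interrupts go off only around the sysdev step, and errors unwind in reverse order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical event log standing in for the real kernel side effects. */
#define MAX_EVENTS 16
static const char *events[MAX_EVENTS];
static size_t nr_events;

static void log_event(const char *name)
{
	assert(nr_events < MAX_EVENTS);
	events[nr_events++] = name;
}

/* Stub step: records its name and fails if it is the step named by fail_at. */
static int step(const char *name, const char *fail_at)
{
	log_event(name);
	return (fail_at && strcmp(name, fail_at) == 0) ? -1 : 0;
}

/*
 * Loosely follows the label structure of the patched kexec path:
 * each failure jumps to the label that undoes only what was already done.
 */
static int suspend_sequence(const char *fail_at)
{
	int error;

	nr_events = 0;

	error = step("disable_device_irqs", fail_at);
	if (error)
		goto unlock_pm;

	error = step("device_power_down", fail_at);
	if (error)
		goto enable_irqs;

	log_event("local_irq_disable");
	error = step("sysdev_suspend", fail_at);
	if (error)
		goto power_up_devices;

	/* The actual suspend or kexec work would happen here. */
	log_event("sysdev_resume");
power_up_devices:
	log_event("local_irq_enable");
	log_event("device_power_up");
enable_irqs:
	log_event("enable_device_irqs");
unlock_pm:
	log_event("device_pm_unlock");
	return error;
}
```

Calling `suspend_sequence(NULL)` records the full nine-step round trip, while `suspend_sequence("device_power_down")` stops after four events and never touches the CPU's interrupt state.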

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-22 17:39 ` [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume Rafael J. Wysocki
@ 2009-02-22 18:01   ` Linus Torvalds
  2009-02-22 22:42     ` Rafael J. Wysocki
  2009-02-22 22:42     ` Rafael J. Wysocki
  2009-02-22 18:01   ` Linus Torvalds
                     ` (2 subsequent siblings)
  3 siblings, 2 replies; 373+ messages in thread
From: Linus Torvalds @ 2009-02-22 18:01 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: LKML, Ingo Molnar, Eric W. Biederman, Benjamin Herrenschmidt,
	Jeremy Fitzhardinge, pm list, Len Brown, Jesse Barnes,
	Thomas Gleixner



On Sun, 22 Feb 2009, Rafael J. Wysocki wrote:
> 
> Use these functions to rework the handling of interrupts during
> suspend (hibernation) and resume.  Namely, interrupts will only be
> disabled on the CPU right before suspending sysdevs, while device
> interrupts will be disabled (at the IO-APIC level), with the help of
> the new helper function, before calling "late" suspend callbacks
> provided by device drivers and analogously during resume.

I think this patch is actually a bit too complicated.

> +struct disabled_irq {
> +	struct list_head list;
> +	int irq;
> +};
> +
> +static LIST_HEAD(resume_irqs_list);
> +
> +/**
> + *	enable_device_irqs - enable interrupts disabled by disable_device_irqs()
> + *
> + *	Enable all interrupt lines previously disabled by disable_device_irqs()
> + *	that are on resume_irqs_list.
> + */
> +void enable_device_irqs(void)
> +{
> +	struct disabled_irq *resume_irq, *tmp;
> +
> +	list_for_each_entry_safe(resume_irq, tmp, &resume_irqs_list, list) {
> +		enable_irq(resume_irq->irq);
> +		list_del(&resume_irq->list);
> +		kfree(resume_irq);
> +	}
> +}

Don't do this whole separate list. Instead, just add a per-irq-descriptor 
flag to the desc->status field that says "suspended". IOW, just do 
something like

	diff --git a/include/linux/irq.h b/include/linux/irq.h
	index f899b50..7bc2a31 100644
	--- a/include/linux/irq.h
	+++ b/include/linux/irq.h
	@@ -65,6 +65,7 @@ typedef	void (*irq_flow_handler_t)(unsigned int irq,
	 #define IRQ_SPURIOUS_DISABLED	0x00800000	/* IRQ was disabled by the spurious trap */
	 #define IRQ_MOVE_PCNTXT		0x01000000	/* IRQ migration from process context */
	 #define IRQ_AFFINITY_SET	0x02000000	/* IRQ affinity was set from userspace*/
	+#define IRQ_SUSPENDED		0x04000000	/* IRQ has gone through suspend sequence */
	 
	 #ifdef CONFIG_IRQ_PER_CPU
	 # define CHECK_IRQ_PER_CPU(var) ((var) & IRQ_PER_CPU)

and then just make the suspend sequence do

	for_each_irq_desc(irq, desc) {
		.. check desc if we should disable it ..
		disable_irq(irq);
		desc->status |= IRQ_SUSPENDED;
	}

and the resume sequence do

	for_each_irq_desc(irq, desc) {
		if (!(desc->status & IRQ_SUSPENDED))
			continue;
		desc->status &= ~IRQ_SUSPENDED;
		enable_irq(irq);
	}
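
A small userspace model of this flag-based bookkeeping, offered as a sketch only: `struct fake_desc`, `FAKE_IRQ_SUSPENDED`, `FAKE_IRQF_TIMER` and both helpers are stand-ins invented for illustration, not the kernel's irq_desc API. The `depth` counter plays the role of the disable_irq()/enable_irq() nesting count.

```c
#include <assert.h>
#include <stddef.h>

#define FAKE_IRQ_SUSPENDED 0x1u	/* stand-in for the proposed IRQ_SUSPENDED bit */
#define FAKE_IRQF_TIMER    0x2u	/* stand-in for IRQF_TIMER on the first action */

struct fake_desc {
	unsigned int status;	/* per-line status bits */
	unsigned int flags;	/* flags of the first registered handler */
	int has_action;		/* nonzero if a handler is registered */
	int depth;		/* disable nesting count; 0 means enabled */
};

/* Mask every line that has a non-timer handler and remember that we did. */
static void fake_suspend_irqs(struct fake_desc *descs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		struct fake_desc *d = &descs[i];

		if (!d->has_action || (d->flags & FAKE_IRQF_TIMER))
			continue;
		d->depth++;			/* models disable_irq() */
		d->status |= FAKE_IRQ_SUSPENDED;
	}
}

/* Undo exactly what fake_suspend_irqs() did; no side list is needed. */
static void fake_resume_irqs(struct fake_desc *descs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		struct fake_desc *d = &descs[i];

		if (!(d->status & FAKE_IRQ_SUSPENDED))
			continue;
		d->status &= ~FAKE_IRQ_SUSPENDED;
		assert(d->depth > 0);
		d->depth--;			/* models enable_irq() */
	}
}
```

In this model a line that its driver had already disabled (depth 1) goes to depth 2 during suspend and back to 1 on resume, so the driver's own state survives the round trip, and nothing needs to be allocated.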

And that simplification then gets rid of

> +/**
> + *	disable_device_irqs - disable all enabled interrupt lines
> + *
> + *	During system-wide suspend or hibernation device interrupts need to be
> + *	disabled at the chip level and this function is provided for this
> + *	purpose.  It disables all interrupt lines that are enabled at the
> + *	moment and saves their numbers for enable_device_irqs().
> + */
> +int disable_device_irqs(void)
> +{
> +	struct irq_desc *desc;
> +	int irq;
> +
> +	for_each_irq_desc(irq, desc) {
> +		unsigned long flags;
> +		struct disabled_irq *resume_irq;
> +		struct irqaction *action;
> +		bool is_timer_irq;
> +
> +		resume_irq = kzalloc(sizeof(*resume_irq), GFP_NOIO);
> +		if (!resume_irq) {
> +			enable_device_irqs();
> +			return -ENOMEM;
> +		}

this just goes away.

> +		is_timer_irq = false;
> +		action = desc->action;
> +		while (action) {
> +			if (action->flags | IRQF_TIMER) {
> +				is_timer_irq = true;
> +				break;
> +			}
> +			action = action->next;
> +		}

This is also pointless and wrong (and buggy). You should use '&' to 
test that flag, not '|', but more importantly, if you share interrupts 
with a timer irq, there's nothing sane the irq layer can do ANYWAY, so 
just ignore the whole problem. Just look at the first one, don't try to be 
clever, because your clever code doesn't buy anything at all. 
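
To make the '&' versus '|' point concrete, here is a minimal standalone sketch. `DEMO_IRQF_TIMER` and the helper names are made up for the example, and the bit value is only illustrative.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_IRQF_TIMER 0x00000200u	/* illustrative bit value, an assumption */

/* Buggy form: OR-ing in a nonzero constant always yields nonzero. */
static bool is_timer_buggy(unsigned long flags)
{
	return (flags | DEMO_IRQF_TIMER) != 0;
}

/* Intended form: AND isolates the bit, so the result depends on flags. */
static bool is_timer_fixed(unsigned long flags)
{
	return (flags & DEMO_IRQF_TIMER) != 0;
}

/* Self-check showing that the buggy test treats every line as a timer. */
static void demo_flag_checks(void)
{
	assert(is_timer_buggy(0));
	assert(is_timer_buggy(0x80));
	assert(!is_timer_fixed(0));
	assert(!is_timer_fixed(0x80));
	assert(is_timer_fixed(0x80 | DEMO_IRQF_TIMER));
}
```

With the '|' form every descriptor that has at least one action would be classified as a timer line, so in the quoted loop no interrupt line would ever be disabled.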

So get rid of the loop, and just do

	if (desc->action && !(desc->action->flags & IRQF_TIMER)) {
		desc->depth++;
		desc->status |= IRQ_DISABLED | IRQ_SUSPENDED;
		desc->chip->disable(irq);
	}
	spin_unlock_irqrestore(&desc->lock, flags);

and you're done.

Also, I'd actually suggest that the whole "synchronize_irq()" be handled 
in a separate loop after the main one, so make that one just be

	for_each_irq_desc(irq, desc) {
		if (desc->status & IRQ_SUSPENDED)
			synchronize_irq(irq);
	}

at the end. No need for desc->lock, since the IRQ_SUSPENDED bit is stable.	
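
The two-loop shape can be sketched in userspace as follows. The `sk_` names and the integer event log are hypothetical; the log only exists to show that every line is masked before any waiting starts, and the second loop merely models where synchronize_irq() would be called.

```c
#include <assert.h>
#include <stddef.h>

#define SK_SUSPENDED 0x1u	/* stand-in for IRQ_SUSPENDED */
#define SK_MAX_LOG   32

struct sk_line {
	unsigned int status;
	int is_timer;		/* nonzero for the timer line */
};

/* Event log: positive = line masked, negative = line synchronized. */
static int sk_log[SK_MAX_LOG];
static size_t sk_nr;

static void sk_record(int ev)
{
	assert(sk_nr < SK_MAX_LOG);
	sk_log[sk_nr++] = ev;
}

static void sk_suspend_two_pass(struct sk_line *lines, size_t n)
{
	/* Pass 1: mask and mark; in the kernel this part would hold desc->lock. */
	for (size_t i = 0; i < n; i++) {
		if (lines[i].is_timer)
			continue;
		lines[i].status |= SK_SUSPENDED;
		sk_record((int)i + 1);
	}

	/* Pass 2: wait for handlers; the mark no longer changes, so no lock. */
	for (size_t i = 0; i < n; i++) {
		if (lines[i].status & SK_SUSPENDED)
			sk_record(-((int)i + 1));	/* models synchronize_irq() */
	}
}
```

For three lines with the middle one being the timer, the log reads 1, 3, -1, -3: both device lines are masked before either is waited on, and the timer line is never touched.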

Finally:

> +extern int disable_device_irqs(void);
> +extern void enable_device_irqs(void);

I think the naming is not great. It's not about disable/enable, it's very 
much about suspend/resume. In your version, it had that global 
"disabled_irq" list, and in mine it has that IRQ_SUSPENDED bit - and in 
both cases you can't nest things, and you can't consider them in any way 
"generic" enable/disable things, they are very specialized "shut up 
everything but the timer irq".

I also don't think there is any reasonable error case, so just make the 
"suspend" thing return 'void', and don't complicate the caller. We don't 
error out on the simple "disable_irq()" either. It's an imperative 
statement, not a "please can you try to do that" thing.

			Linus

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 0/2] Rework disabling of interrupts during suspend-resume
  2009-02-22 17:37 ` Rafael J. Wysocki
@ 2009-02-22 18:13   ` Linus Torvalds
  -1 siblings, 0 replies; 373+ messages in thread
From: Linus Torvalds @ 2009-02-22 18:13 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: LKML, Ingo Molnar, Eric W. Biederman, Benjamin Herrenschmidt,
	Jeremy Fitzhardinge, pm list, Len Brown, Jesse Barnes,
	Thomas Gleixner



On Sun, 22 Feb 2009, Rafael J. Wysocki wrote:
> 
> However, x86 currently doesn't set the IRQF_TIMER flag and I need to 
> make it do so before going further in this direction and changing the 
> PCI PM framework to take advantage of the $subject changes, for example. 

Actually, you don't.

The modern form of timer interrupt on x86 is the local apic timer, and it 
doesn't go through the io-apic at all, and is not even visible to the irq 
subsystem. So it stays enabled through this all.

But for old-style timer interrupts, something like the appended should 
do it.

Untested, of course, but it looks obvious enough.

		Linus

---
 arch/x86/kernel/time_64.c     |    2 +-
 arch/x86/kernel/vmiclock_32.c |    3 ++-
 arch/x86/mach-default/setup.c |    2 +-
 arch/x86/mach-voyager/setup.c |    2 +-
 4 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/time_64.c b/arch/x86/kernel/time_64.c
index e6e695a..241ec39 100644
--- a/arch/x86/kernel/time_64.c
+++ b/arch/x86/kernel/time_64.c
@@ -115,7 +115,7 @@ unsigned long __init calibrate_cpu(void)
 
 static struct irqaction irq0 = {
 	.handler	= timer_interrupt,
-	.flags		= IRQF_DISABLED | IRQF_IRQPOLL | IRQF_NOBALANCING,
+	.flags		= IRQF_DISABLED | IRQF_IRQPOLL | IRQF_NOBALANCING | IRQF_TIMER,
 	.mask		= CPU_MASK_NONE,
 	.name		= "timer"
 };
diff --git a/arch/x86/kernel/vmiclock_32.c b/arch/x86/kernel/vmiclock_32.c
index bde106c..7a29d5c 100644
--- a/arch/x86/kernel/vmiclock_32.c
+++ b/arch/x86/kernel/vmiclock_32.c
@@ -1,3 +1,4 @@
+
 /*
  * VMI paravirtual timer support routines.
  *
@@ -202,7 +203,7 @@ static irqreturn_t vmi_timer_interrupt(int irq, void *dev_id)
 static struct irqaction vmi_clock_action  = {
 	.name 		= "vmi-timer",
 	.handler 	= vmi_timer_interrupt,
-	.flags 		= IRQF_DISABLED | IRQF_NOBALANCING,
+	.flags 		= IRQF_DISABLED | IRQF_NOBALANCING, IRQF_TIMER,
 	.mask 		= CPU_MASK_ALL,
 };
 
diff --git a/arch/x86/mach-default/setup.c b/arch/x86/mach-default/setup.c
index a265a7c..d737542 100644
--- a/arch/x86/mach-default/setup.c
+++ b/arch/x86/mach-default/setup.c
@@ -96,7 +96,7 @@ void __init trap_init_hook(void)
 
 static struct irqaction irq0  = {
 	.handler = timer_interrupt,
-	.flags = IRQF_DISABLED | IRQF_NOBALANCING | IRQF_IRQPOLL,
+	.flags = IRQF_DISABLED | IRQF_NOBALANCING | IRQF_IRQPOLL, IRQF_TIMER,
 	.mask = CPU_MASK_NONE,
 	.name = "timer"
 };
diff --git a/arch/x86/mach-voyager/setup.c b/arch/x86/mach-voyager/setup.c
index d914a79..4de9e08 100644
--- a/arch/x86/mach-voyager/setup.c
+++ b/arch/x86/mach-voyager/setup.c
@@ -56,7 +56,7 @@ void __init trap_init_hook(void)
 
 static struct irqaction irq0 = {
 	.handler = timer_interrupt,
-	.flags = IRQF_DISABLED | IRQF_NOBALANCING | IRQF_IRQPOLL,
+	.flags = IRQF_DISABLED | IRQF_NOBALANCING | IRQF_IRQPOLL, IRQF_TIMER,
 	.mask = CPU_MASK_NONE,
 	.name = "timer"
 };


^ permalink raw reply related	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 0/2] Rework disabling of interrupts during suspend-resume
  2009-02-22 18:13   ` Linus Torvalds
  (?)
@ 2009-02-22 18:18   ` Ingo Molnar
  2009-02-22 18:25       ` Linus Torvalds
  -1 siblings, 1 reply; 373+ messages in thread
From: Ingo Molnar @ 2009-02-22 18:18 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, LKML, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner


> +	.flags 		= IRQF_DISABLED | IRQF_NOBALANCING, IRQF_TIMER,
> +	.flags = IRQF_DISABLED | IRQF_NOBALANCING | IRQF_IRQPOLL, IRQF_TIMER,
> +	.flags = IRQF_DISABLED | IRQF_NOBALANCING | IRQF_IRQPOLL, IRQF_TIMER,

s/, IRQF_TIMER/ | IRQF_TIMER

i guess.

	Ingo
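
For reference, a minimal userspace sketch of why the comma form compiles but drops the flag. The struct layout, the flag values and every demo_* name are hypothetical and are not the kernel's definitions. In C, the initializer for .flags ends at the comma, and the value that follows is taken as a positional initializer for the next member.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical flag values and struct layout, for illustration only;
 * these are not the kernel's definitions.
 */
#define DEMO_DISABLED		0x01u
#define DEMO_NOBALANCING	0x02u
#define DEMO_TIMER		0x04u

struct demo_action {
	unsigned int flags;
	unsigned int next_member;	/* whatever member follows .flags */
	const char *name;
};

/*
 * Comma form: the initializer for .flags ends at the comma, and
 * DEMO_TIMER becomes a positional initializer for next_member.
 */
const struct demo_action demo_with_comma = {
	.flags = DEMO_DISABLED | DEMO_NOBALANCING, DEMO_TIMER,
	.name = "demo-timer",
};

/* OR form: all three bits end up in .flags. */
const struct demo_action demo_with_or = {
	.flags = DEMO_DISABLED | DEMO_NOBALANCING | DEMO_TIMER,
	.name = "demo-timer",
};

/* Returns true if the timer bit is set in the action's flags. */
bool demo_is_timer(const struct demo_action *a)
{
	return (a->flags & DEMO_TIMER) != 0;
}
```

With these definitions, demo_is_timer() is false for demo_with_comma and true for demo_with_or. In the quoted hunks a designated .mask initializer follows the stray value, so the misplaced flag would presumably just be overwritten there, which would explain why nothing obviously broke at build time.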

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 0/2] Rework disabling of interrupts during suspend-resume
  2009-02-22 18:18   ` Ingo Molnar
@ 2009-02-22 18:25       ` Linus Torvalds
  0 siblings, 0 replies; 373+ messages in thread
From: Linus Torvalds @ 2009-02-22 18:25 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rafael J. Wysocki, LKML, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner



On Sun, 22 Feb 2009, Ingo Molnar wrote:

> 
> > +	.flags 		= IRQF_DISABLED | IRQF_NOBALANCING, IRQF_TIMER,
> > +	.flags = IRQF_DISABLED | IRQF_NOBALANCING | IRQF_IRQPOLL, IRQF_TIMER,
> > +	.flags = IRQF_DISABLED | IRQF_NOBALANCING | IRQF_IRQPOLL, IRQF_TIMER,
> 
> s/, IRQF_TIMER/ | IRQF_TIMER
> 
> i guess.

Oops yes. I got one of them right. I guess that's the same one I happened 
to compile in my config. Duh.

		Linus

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 0/2] Rework disabling of interrupts during suspend-resume
  2009-02-22 18:25       ` Linus Torvalds
@ 2009-02-22 18:35         ` Linus Torvalds
  -1 siblings, 0 replies; 373+ messages in thread
From: Linus Torvalds @ 2009-02-22 18:35 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rafael J. Wysocki, LKML, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner



On Sun, 22 Feb 2009, Linus Torvalds wrote:
> 
> Oops yes. I got one of them right. I guess that's the same one I happened 
> to compile in my config. Duh.

I committed the trivially fixed version. 

I also committed Rafael's patch 1/2 (the one that doesn't actually change 
anything). Even if we don't do this in 2.6.29, I want to make it easy to 
test, and get the infrastructure unified.

		Linus

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 1/2] PM: Split up sysdev_[suspend|resume] from device_power_[down|up]
  2009-02-22 17:38 ` Rafael J. Wysocki
@ 2009-02-22 20:56   ` Adrian Bunk
  2009-02-22 21:07       ` Linus Torvalds
  2009-02-22 20:56   ` Adrian Bunk
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 373+ messages in thread
From: Adrian Bunk @ 2009-02-22 20:56 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: LKML, Linus Torvalds, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner

On Sun, Feb 22, 2009 at 06:38:50PM +0100, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rjw@sisk.pl>
> 
> Move the sysdev_suspend/resume from the callee to the callers, with
> no real change in semantics, so that we can rework the disabling of
> interrupts during suspend/hibernation.
> 
> This is based on an earlier patch from Linus.
> 
> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
>...
> --- linux-2.6.orig/arch/x86/kernel/apm_32.c
> +++ linux-2.6/arch/x86/kernel/apm_32.c
> @@ -1192,6 +1192,7 @@ static int suspend(int vetoable)
>  	device_suspend(PMSG_SUSPEND);
>  	local_irq_disable();
>  	device_power_down(PMSG_SUSPEND);
> +	sysdev_suspend(PMSG_SUSPEND);
>  
>  	local_irq_enable();
>  
> @@ -1208,6 +1209,7 @@ static int suspend(int vetoable)
>  	if (err != APM_SUCCESS)
>  		apm_error("suspend", err);
>  	err = (err == APM_SUCCESS) ? 0 : -EIO;
> +	sysdev_resume();
>  	device_power_up(PMSG_RESUME);
>  	local_irq_enable();
>  	device_resume(PMSG_RESUME);
> @@ -1228,6 +1230,7 @@ static void standby(void)
>  
>  	local_irq_disable();
>  	device_power_down(PMSG_SUSPEND);
> +	sysdev_suspend(PMSG_SUSPEND);
>  	local_irq_enable();
>  
>  	err = set_system_power_state(APM_STATE_STANDBY);
> @@ -1235,6 +1238,7 @@ static void standby(void)
>  		apm_error("standby", err);
>  
>  	local_irq_disable();
> +	sysdev_resume();
>  	device_power_up(PMSG_RESUME);
>  	local_irq_enable();
>  }
>...

This causes the following build error with CONFIG_APM=m:

<--  snip  -->

...
  MODPOST 2586 modules
ERROR: "sysdev_resume" [arch/x86/kernel/apm.ko] undefined!
ERROR: "sysdev_suspend" [arch/x86/kernel/apm.ko] undefined!
make[2]: *** [__modpost] Error 1

<--  snip  -->

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 1/2] PM: Split up sysdev_[suspend|resume] from device_power_[down|up]
  2009-02-22 20:56   ` Adrian Bunk
@ 2009-02-22 21:07       ` Linus Torvalds
  0 siblings, 0 replies; 373+ messages in thread
From: Linus Torvalds @ 2009-02-22 21:07 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Rafael J. Wysocki, LKML, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner



On Sun, 22 Feb 2009, Adrian Bunk wrote:
> ...
>   MODPOST 2586 modules
> ERROR: "sysdev_resume" [arch/x86/kernel/apm.ko] undefined!
> ERROR: "sysdev_suspend" [arch/x86/kernel/apm.ko] undefined!
> make[2]: *** [__modpost] Error 1

Ahh. device_power_[down|up] were EXPORT_SYMBOL_GPL, so now that we've 
split them, so must sysdev_[suspend|resume] be.

Does this fix it?

		Linus
---
 drivers/base/sys.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/base/sys.c b/drivers/base/sys.c
index c98c31e..ef2055e 100644
--- a/drivers/base/sys.c
+++ b/drivers/base/sys.c
@@ -432,6 +432,7 @@ aux_driver:
 	}
 	return ret;
 }
+EXPORT_SYMBOL_GPL(sysdev_suspend);
 
 
 /**
@@ -463,6 +464,7 @@ int sysdev_resume(void)
 	}
 	return 0;
 }
+EXPORT_SYMBOL_GPL(sysdev_resume);
 
 
 int __init system_bus_init(void)

^ permalink raw reply related	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 1/2] PM: Split up sysdev_[suspend|resume] from device_power_[down|up]
  2009-02-22 21:07       ` Linus Torvalds
@ 2009-02-22 21:12         ` Ingo Molnar
  -1 siblings, 0 replies; 373+ messages in thread
From: Ingo Molnar @ 2009-02-22 21:12 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Adrian Bunk, Rafael J. Wysocki, LKML, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> 
> 
> On Sun, 22 Feb 2009, Adrian Bunk wrote:
> > ...
> >   MODPOST 2586 modules
> > ERROR: "sysdev_resume" [arch/x86/kernel/apm.ko] undefined!
> > ERROR: "sysdev_suspend" [arch/x86/kernel/apm.ko] undefined!
> > make[2]: *** [__modpost] Error 1
> 
> Ahh. device_power_[down|up] were EXPORT_SYMBOL_GPL, so now that we've 
> split them, so must sysdev_[suspend|resume] be.
> 
> Does this fix it?

I just hit the same issue in -tip testing and did the same fix:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git core/urgent

	Ingo

------------------>
Ingo Molnar (1):
      PM: Split up sysdev_[suspend|resume] from device_power_[down|up], fix


 drivers/base/sys.c |    7 ++-----
 1 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/base/sys.c b/drivers/base/sys.c
index c98c31e..b428c8c 100644
--- a/drivers/base/sys.c
+++ b/drivers/base/sys.c
@@ -303,7 +303,6 @@ void sysdev_unregister(struct sys_device * sysdev)
  *	is guaranteed by virtue of the fact that child devices are registered
  *	after their parents.
  */
-
 void sysdev_shutdown(void)
 {
 	struct sysdev_class * cls;
@@ -363,7 +362,6 @@ static void __sysdev_resume(struct sys_device *dev)
  *	This is only called by the device PM core, so we let them handle
  *	all synchronization.
  */
-
 int sysdev_suspend(pm_message_t state)
 {
 	struct sysdev_class * cls;
@@ -432,7 +430,7 @@ aux_driver:
 	}
 	return ret;
 }
-
+EXPORT_SYMBOL_GPL(sysdev_suspend);
 
 /**
  *	sysdev_resume - Bring system devices back to life.
@@ -442,7 +440,6 @@ aux_driver:
  *
  *	Note: Interrupts are disabled when called.
  */
-
 int sysdev_resume(void)
 {
 	struct sysdev_class * cls;
@@ -463,7 +460,7 @@ int sysdev_resume(void)
 	}
 	return 0;
 }
-
+EXPORT_SYMBOL_GPL(sysdev_resume);
 
 int __init system_bus_init(void)
 {

^ permalink raw reply related	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 0/2] Rework disabling of interrupts during suspend-resume
  2009-02-22 17:37 ` Rafael J. Wysocki
                   ` (6 preceding siblings ...)
  (?)
@ 2009-02-22 22:37 ` Eric W. Biederman
  2009-02-22 22:56   ` Benjamin Herrenschmidt
                     ` (2 more replies)
  -1 siblings, 3 replies; 373+ messages in thread
From: Eric W. Biederman @ 2009-02-22 22:37 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: LKML, Linus Torvalds, Ingo Molnar, Benjamin Herrenschmidt,
	Jeremy Fitzhardinge, pm list, Len Brown, Jesse Barnes,
	Thomas Gleixner

"Rafael J. Wysocki" <rjw@sisk.pl> writes:

> Moreover, the real purpose of these changes is to be able to execute the
> "late" suspend and "early" resume device callbacks with timer interrupts
> enabled, so that they can use mutexes etc.  However, x86 currently doesn't set
> the IRQF_TIMER flag and I need to make it do so before going further in this
> direction and changing the PCI PM framework to take advantage of the $subject
> changes, for example.  So, I need to know how to modify x86 timer code so that
> the IRQF_TIMER flag is set by it.

How does this sync with the ACPI requirement that its late suspend MUST
happen with irqs disabled?

Eric

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-22 18:01   ` Linus Torvalds
@ 2009-02-22 22:42     ` Rafael J. Wysocki
  2009-02-22 23:48       ` Rafael J. Wysocki
  2009-02-22 23:48       ` Rafael J. Wysocki
  2009-02-22 22:42     ` Rafael J. Wysocki
  1 sibling, 2 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-02-22 22:42 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: LKML, Ingo Molnar, Eric W. Biederman, Benjamin Herrenschmidt,
	Jeremy Fitzhardinge, pm list, Len Brown, Jesse Barnes,
	Thomas Gleixner

On Sunday 22 February 2009, Linus Torvalds wrote:
> 
> On Sun, 22 Feb 2009, Rafael J. Wysocki wrote:
> > 
> > Use these functions to rework the handling of interrupts during
> > suspend (hibernation) and resume.  Namely, interrupts will only be
> > disabled on the CPU right before suspending sysdevs, while device
> > interrupts will be disabled (at the IO-APIC level), with the help of
> > the new helper function, before calling "late" suspend callbacks
> > provided by device drivers and analogously during resume.
> 
> I think this patch is actually a bit too complicated.
> 
> > +struct disabled_irq {
> > +	struct list_head list;
> > +	int irq;
> > +};
> > +
> > +static LIST_HEAD(resume_irqs_list);
> > +
> > +/**
> > + *	enable_device_irqs - enable interrupts disabled by disable_device_irqs()
> > + *
> > + *	Enable all interrupt lines previously disabled by disable_device_irqs()
> > + *	that are on resume_irqs_list.
> > + */
> > +void enable_device_irqs(void)
> > +{
> > +	struct disabled_irq *resume_irq, *tmp;
> > +
> > +	list_for_each_entry_safe(resume_irq, tmp, &resume_irqs_list, list) {
> > +		enable_irq(resume_irq->irq);
> > +		list_del(&resume_irq->list);
> > +		kfree(resume_irq);
> > +	}
> > +}
> 
> Don't do this whole separate list. Instead, just add a per-irq-descriptor 
> flag to the desc->status field that says "suspended". IOW, just do 
> something like

OK

> 	diff --git a/include/linux/irq.h b/include/linux/irq.h
> 	index f899b50..7bc2a31 100644
> 	--- a/include/linux/irq.h
> 	+++ b/include/linux/irq.h
> 	@@ -65,6 +65,7 @@ typedef	void (*irq_flow_handler_t)(unsigned int irq,
> 	 #define IRQ_SPURIOUS_DISABLED	0x00800000	/* IRQ was disabled by the spurious trap */
> 	 #define IRQ_MOVE_PCNTXT		0x01000000	/* IRQ migration from process context */
> 	 #define IRQ_AFFINITY_SET	0x02000000	/* IRQ affinity was set from userspace*/
> 	+#define IRQ_SUSPENDED		0x04000000	/* IRQ has gone through suspend sequence */
> 	 
> 	 #ifdef CONFIG_IRQ_PER_CPU
> 	 # define CHECK_IRQ_PER_CPU(var) ((var) & IRQ_PER_CPU)
> 
> and then just make the suspend sequence do
> 
> 	for_each_irq_desc(irq, desc) {
> 		.. check desc if we should disable it ..
> 		disable_irq(irq);
> 		desc->status |= IRQ_SUSPENDED;
> 	}
> 
> and the resume sequence do
> 
> 	for_each_irq_desc(irq, desc) {
> 		if (!(desc->status & IRQ_SUSPENDED))
> 			continue;
> 		desc->status &= ~IRQ_SUSPENDED;
> 		enable_irq(irq);
> 	}
> 
> And that simplification then gets rid of
> 
> > +/**
> > + *	disable_device_irqs - disable all enabled interrupt lines
> > + *
> > + *	During system-wide suspend or hibernation device interrupts need to be
> > + *	disabled at the chip level and this function is provided for this
> > + *	purpose.  It disables all interrupt lines that are enabled at the
> > + *	moment and saves their numbers for enable_device_irqs().
> > + */
> > +int disable_device_irqs(void)
> > +{
> > +	struct irq_desc *desc;
> > +	int irq;
> > +
> > +	for_each_irq_desc(irq, desc) {
> > +		unsigned long flags;
> > +		struct disabled_irq *resume_irq;
> > +		struct irqaction *action;
> > +		bool is_timer_irq;
> > +
> > +		resume_irq = kzalloc(sizeof(*resume_irq), GFP_NOIO);
> > +		if (!resume_irq) {
> > +			enable_device_irqs();
> > +			return -ENOMEM;
> > +		}
> 
> this just goes away.
> 
> > +		is_timer_irq = false;
> > +		action = desc->action;
> > +		while (action) {
> > +			if (action->flags | IRQF_TIMER) {
> > +				is_timer_irq = true;
> > +				break;
> > +			}
> > +			action = action->next;
> > +		}
> 
> This is also pointless and wrong (and buggy). You should use '&' to 
> test that flag, not '|',

Ouch, sorry.

> but more importantly, if you share interrupts with a timer irq, there's
> nothing sane the irq layer can do ANYWAY, so just ignore the whole problem.
> Just look at the first one, don't try to be clever, because your clever code
> doesn't buy anything at all. 
> 
> So get rid of the loop, and just do
> 
> 	if (desc->action && !(desc->action->flags & IRQF_TIMER)) {
> 		desc->depth++;
> 		desc->status |= IRQ_DISABLED | IRQ_SUSPENDED;
> 		desc->chip->disable(irq);
> 	}
> 	spin_unlock_irqrestore(&desc->lock, flags);
> 
> and you're done.

OK
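
The difference between the two flag tests discussed above can be sketched in a few lines of userspace C. DEMO_IRQF_TIMER and its value are made up for the illustration and are not the kernel's definition; the point is only that OR-ing in a non-zero constant can never yield zero, so the '|' test treats every action as a timer action.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag value, for illustration only. */
#define DEMO_IRQF_TIMER	0x200u

/*
 * Buggy test: OR-ing in a non-zero constant always gives a non-zero
 * result, so this returns true for every possible flags value.
 */
bool demo_is_timer_buggy(unsigned int flags)
{
	return (flags | DEMO_IRQF_TIMER) != 0;
}

/* Correct test: true only if the timer bit is actually set. */
bool demo_is_timer_fixed(unsigned int flags)
{
	return (flags & DEMO_IRQF_TIMER) != 0;
}
```

With the buggy variant every interrupt line would have been classified as a timer interrupt and left enabled, which defeats the purpose of the loop.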

> Also, I'd actually suggest that the whole "synchronize_irq()" be handled 
> in a separate loop after the main one, so make that one just be
> 
> 	for_each_irq_desc(irq, desc) {
> 		if (desc->status & IRQ_SUSPENDED)
> 			synchronize_irq(irq);
> 	}
> 
> at the end. No need for desc->lock, since the IRQ_SUSPENDED bit is stable.

OK

> Finally:
> 
> > +extern int disable_device_irqs(void);
> > +extern void enable_device_irqs(void);
> 
> I think the naming is not great. It's not about disable/enable, it's very 
> much about suspend/resume. In your version, it had that global 
> "disabled_irq" list, and in mine it has that IRQ_SUSPENDED bit - and in 
> both cases you can't nest things, and you can't consider them in any way 
> "generic" enable/disable things, they are very specialized "shut up 
> everything but the timer irq".

OK, would 

extern void suspend_device_irqs(void);
extern void resume_device_irqs(void);

be better?

> I also don't think there is any reasonable error case, so just make the 
> "suspend" thing return 'void', and don't complicate the caller. We don't 
> error out on the simple "disable_irq()" either. It's an imperative 
> statement, not a "please can you try to do that" thing.

The error is there just because the memory allocation can fail.  With the
IRQ_SUSPENDED flag as per your suggestion it won't be necessary any more.

Thanks a lot for your comments, I'll send an updated patch shortly.

Rafael
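
A rough userspace model of the flag-based scheme agreed on above, as a sketch only. The demo_* names, the struct layout and the bit values are assumptions made for the illustration; locking, the chip-level disable call and the synchronize_irq() pass are left out. It shows why no allocation (and hence no error return) is needed: the "suspended" mark lives in the descriptor itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit values; none of this is kernel code. */
#define DEMO_IRQF_TIMER		0x1u	/* action flag */
#define DEMO_IRQ_DISABLED	0x1u	/* status bit */
#define DEMO_IRQ_SUSPENDED	0x2u	/* status bit */

struct demo_irq_desc {
	unsigned int action_flags;	/* flags of the first action only */
	int has_action;			/* non-zero if a handler is installed */
	unsigned int status;
	unsigned int depth;		/* disable nesting depth */
};

/* Disable and mark every line that has a handler and is not a timer. */
void demo_suspend_device_irqs(struct demo_irq_desc *descs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		struct demo_irq_desc *d = &descs[i];

		if (!d->has_action || (d->action_flags & DEMO_IRQF_TIMER))
			continue;
		d->depth++;
		d->status |= DEMO_IRQ_DISABLED | DEMO_IRQ_SUSPENDED;
	}
}

/* Undo exactly what the suspend pass did, and nothing else. */
void demo_resume_device_irqs(struct demo_irq_desc *descs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		struct demo_irq_desc *d = &descs[i];

		if (!(d->status & DEMO_IRQ_SUSPENDED))
			continue;
		d->status &= ~DEMO_IRQ_SUSPENDED;
		assert(d->depth > 0);
		if (--d->depth == 0)
			d->status &= ~DEMO_IRQ_DISABLED;
	}
}
```

A line that a driver had already disabled before suspend keeps its own disable count across the pair of calls, and the timer line is never touched.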

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 1/2] PM: Split up sysdev_[suspend|resume] from device_power_[down|up]
  2009-02-22 21:07       ` Linus Torvalds
  (?)
  (?)
@ 2009-02-22 22:42       ` Adrian Bunk
  -1 siblings, 0 replies; 373+ messages in thread
From: Adrian Bunk @ 2009-02-22 22:42 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, LKML, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner

On Sun, Feb 22, 2009 at 01:07:28PM -0800, Linus Torvalds wrote:
> 
> 
> On Sun, 22 Feb 2009, Adrian Bunk wrote:
> > ...
> >   MODPOST 2586 modules
> > ERROR: "sysdev_resume" [arch/x86/kernel/apm.ko] undefined!
> > ERROR: "sysdev_suspend" [arch/x86/kernel/apm.ko] undefined!
> > make[2]: *** [__modpost] Error 1
> 
> Ahh. device_power_[down|up] were EXPORT_SYMBOL_GPL, so now that we've 
> split them, so must sysdev_[suspend|resume] be.
> 
> Does this fix it?


Thanks, works fine.


> 		Linus
> ---
>  drivers/base/sys.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/base/sys.c b/drivers/base/sys.c
> index c98c31e..ef2055e 100644
> --- a/drivers/base/sys.c
> +++ b/drivers/base/sys.c
> @@ -432,6 +432,7 @@ aux_driver:
>  	}
>  	return ret;
>  }
> +EXPORT_SYMBOL_GPL(sysdev_suspend);
>  
>  
>  /**
> @@ -463,6 +464,7 @@ int sysdev_resume(void)
>  	}
>  	return 0;
>  }
> +EXPORT_SYMBOL_GPL(sysdev_resume);
>  
>  
>  int __init system_bus_init(void)

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 0/2] Rework disabling of interrupts during suspend-resume
  2009-02-22 22:37 ` Eric W. Biederman
  2009-02-22 22:56   ` Benjamin Herrenschmidt
@ 2009-02-22 22:56   ` Benjamin Herrenschmidt
  2009-02-22 23:02     ` Linus Torvalds
  2 siblings, 0 replies; 373+ messages in thread
From: Benjamin Herrenschmidt @ 2009-02-22 22:56 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Rafael J. Wysocki, LKML, Linus Torvalds, Ingo Molnar,
	Jeremy Fitzhardinge, pm list, Len Brown, Jesse Barnes,
	Thomas Gleixner

On Sun, 2009-02-22 at 14:37 -0800, Eric W. Biederman wrote:
> "Rafael J. Wysocki" <rjw@sisk.pl> writes:
> 
> > Moreover, the real purpose of these changes is to be able to execute the
> > "late" suspend and "early" resume device callbacks with timer interrupts
> > enabled, so that they can use mutexes etc.  However, x86 currently doesn't set
> > the IRQF_TIMER flag and I need to make it do so before going further in this
> > direction and changing the PCI PM framework to take advantage of the $subject
> > changes, for example.  So, I need to know how to modify x86 timer code so that
> > the IRQF_TIMER flag is set by it.
> 
> How does this sync with the ACPI requirement that its late suspend MUST
> happen with irqs disabled?

If I understand properly what the intention here is, the sysdev suspend
and later still happens with hard irqs off.

This is purely the layer between suspend and suspend_late at the driver
level that uses the above instead of hard IRQs off in order to be able
to properly order the ACPI calls vs. the driver calls.

Ben.



^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 0/2] Rework disabling of interrupts during suspend-resume
  2009-02-22 22:37 ` Eric W. Biederman
@ 2009-02-22 23:02     ` Linus Torvalds
  2009-02-22 22:56   ` Benjamin Herrenschmidt
  2009-02-22 23:02     ` Linus Torvalds
  2 siblings, 0 replies; 373+ messages in thread
From: Linus Torvalds @ 2009-02-22 23:02 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Rafael J. Wysocki, LKML, Ingo Molnar, Benjamin Herrenschmidt,
	Jeremy Fitzhardinge, pm list, Len Brown, Jesse Barnes,
	Thomas Gleixner



On Sun, 22 Feb 2009, Eric W. Biederman wrote:
> 
> How does this sync with the ACPI requirement that its late suspend MUST
> happen with irqs disabled?

All the system device suspend and the actual CPU power-off still happens 
with CPU interrupts disabled.

It's just that the regular two-phase device suspend code now runs first 
with interrupts enabled (the regular "->suspend()" callback), and then the 
second phase runs with the CPU still having interrupts on (and taking 
timer interrupts), but with the actual device interrupts disabled.

			Linus
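
The ordering described above can be sketched as a small user-space model.
This is an illustration under stated assumptions, not the kernel's
suspend_enter(): struct model_state, model_phase() and
model_suspend_enter() are hypothetical names, and each phase merely
records the interrupt state it ran under.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of the phase ordering; not kernel code. */
struct model_state {
	bool device_irqs_on;	/* device interrupt lines enabled at the chip */
	bool cpu_irqs_on;	/* CPU accepts interrupts (timer still ticks) */
	char trace[256];	/* phases executed, in order */
};

/* Append the phase name and the interrupt state it ran under. */
static void model_phase(struct model_state *s, const char *name)
{
	size_t used = strlen(s->trace);

	assert(used < sizeof(s->trace));
	snprintf(s->trace + used, sizeof(s->trace) - used,
		 "%s(dev=%d,cpu=%d) ", name, s->device_irqs_on, s->cpu_irqs_on);
}

/* late_error simulates a failing "late" suspend callback. */
int model_suspend_enter(struct model_state *s, int late_error)
{
	s->device_irqs_on = false;		/* like suspend_device_irqs() */
	model_phase(s, "late_suspend");		/* like device_power_down() */
	if (late_error)
		goto done;

	s->cpu_irqs_on = false;			/* like arch_suspend_disable_irqs() */
	model_phase(s, "sysdev_suspend");
	model_phase(s, "sleep");
	model_phase(s, "sysdev_resume");
	s->cpu_irqs_on = true;			/* like arch_suspend_enable_irqs() */

	model_phase(s, "early_resume");		/* like device_power_up() */
done:
	s->device_irqs_on = true;		/* like resume_device_irqs() */
	return late_error;
}
```

In the model, the "late" and "early" callbacks run with cpu=1, so a timer
could still fire and the callbacks could sleep, while only the sysdev
phases and the sleep itself run with cpu=0.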

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-22 22:42     ` Rafael J. Wysocki
@ 2009-02-22 23:48       ` Rafael J. Wysocki
  2009-02-23  0:05         ` Linus Torvalds
                           ` (5 more replies)
  2009-02-22 23:48       ` Rafael J. Wysocki
  1 sibling, 6 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-02-22 23:48 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: LKML, Ingo Molnar, Eric W. Biederman, Benjamin Herrenschmidt,
	Jeremy Fitzhardinge, pm list, Len Brown, Jesse Barnes,
	Thomas Gleixner

On Sunday 22 February 2009, Rafael J. Wysocki wrote:
> On Sunday 22 February 2009, Linus Torvalds wrote:
> > 
> > On Sun, 22 Feb 2009, Rafael J. Wysocki wrote:
[--snip--]
> 
> Thanks a lot for your comments, I'll send an updated patch shortly.

The updated patch is appended.

It has been initially tested, but requires more testing, especially with APM,
XEN, kexec jump etc.

Thanks,
Rafael

---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: PM: Rework handling of interrupts during suspend-resume (rev. 2)

Introduce two helper functions allowing us to disable device
interrupts (at the IO-APIC level) during suspend or hibernation
and enable them during the subsequent resume, respectively, so that
the timer interrupts are enabled while "late" suspend callbacks and
"early" resume callbacks provided by device drivers are being
executed.

Use these functions to rework the handling of interrupts during
suspend (hibernation) and resume.  Namely, interrupts will only be
disabled on the CPU right before suspending sysdevs, while device
interrupts will be disabled (at the IO-APIC level), with the help of
the new helper function, before calling "late" suspend callbacks
provided by device drivers and analogously during resume.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 arch/x86/kernel/apm_32.c  |   20 ++++++++++++----
 drivers/xen/manage.c      |   32 +++++++++++++++----------
 include/linux/interrupt.h |    3 ++
 include/linux/irq.h       |    1 
 kernel/irq/manage.c       |   57 ++++++++++++++++++++++++++++++++++++++++++++++
 kernel/kexec.c            |   10 ++++----
 kernel/power/disk.c       |   46 +++++++++++++++++++++++++++++--------
 kernel/power/main.c       |   20 +++++++++++-----
 8 files changed, 152 insertions(+), 37 deletions(-)

Index: linux-2.6/kernel/irq/manage.c
===================================================================
--- linux-2.6.orig/kernel/irq/manage.c
+++ linux-2.6/kernel/irq/manage.c
@@ -746,3 +746,60 @@ int request_irq(unsigned int irq, irq_ha
 	return retval;
 }
 EXPORT_SYMBOL(request_irq);
+
+#ifdef CONFIG_PM_SLEEP
+/**
+ *	suspend_device_irqs - disable all currently enabled interrupt lines
+ *
+ *	During system-wide suspend or hibernation device interrupts need to be
+ *	disabled at the chip level and this function is provided for this
+ *	purpose.  It disables all interrupt lines that are enabled at the
+ *	moment and sets the IRQ_SUSPENDED flag for them.
+ */
+void suspend_device_irqs(void)
+{
+	struct irq_desc *desc;
+	int irq;
+
+	for_each_irq_desc(irq, desc) {
+		unsigned long flags;
+
+		spin_lock_irqsave(&desc->lock, flags);
+
+		if (!desc->depth && desc->action
+		    && !(desc->action->flags & IRQF_TIMER)) {
+			desc->depth++;
+			desc->status |= IRQ_DISABLED | IRQ_SUSPENDED;
+			desc->chip->disable(irq);
+		}
+
+		spin_unlock_irqrestore(&desc->lock, flags);
+	}
+
+	for_each_irq_desc(irq, desc) {
+		if (desc->status & IRQ_SUSPENDED)
+			synchronize_irq(irq);
+	}
+}
+EXPORT_SYMBOL_GPL(suspend_device_irqs);
+
+/**
+ *	resume_device_irqs - enable interrupts disabled by suspend_device_irqs()
+ *
+ *	Enable all interrupt lines previously disabled by suspend_device_irqs()
+ *	that have the IRQ_SUSPENDED flag set.
+ */
+void resume_device_irqs(void)
+{
+	struct irq_desc *desc;
+	int irq;
+
+	for_each_irq_desc(irq, desc) {
+		if (!(desc->status & IRQ_SUSPENDED))
+			continue;
+		desc->status &= ~IRQ_SUSPENDED;
+		enable_irq(irq);
+	}
+}
+EXPORT_SYMBOL_GPL(resume_device_irqs);
+#endif /* CONFIG_PM_SLEEP */
Index: linux-2.6/include/linux/interrupt.h
===================================================================
--- linux-2.6.orig/include/linux/interrupt.h
+++ linux-2.6/include/linux/interrupt.h
@@ -470,4 +470,7 @@ extern int early_irq_init(void);
 extern int arch_early_irq_init(void);
 extern int arch_init_chip_data(struct irq_desc *desc, int cpu);
 
+extern void suspend_device_irqs(void);
+extern void resume_device_irqs(void);
+
 #endif
Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -22,6 +22,7 @@
 #include <linux/freezer.h>
 #include <linux/vmstat.h>
 #include <linux/syscalls.h>
+#include <linux/interrupt.h>
 
 #include "power.h"
 
@@ -287,17 +288,20 @@ void __attribute__ ((weak)) arch_suspend
  */
 static int suspend_enter(suspend_state_t state)
 {
-	int error = 0;
+	int error;
 
 	device_pm_lock();
-	arch_suspend_disable_irqs();
-	BUG_ON(!irqs_disabled());
+	suspend_device_irqs();
 
-	if ((error = device_power_down(PMSG_SUSPEND))) {
+	error = device_power_down(PMSG_SUSPEND);
+	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down\n");
 		goto Done;
 	}
 
+	arch_suspend_disable_irqs();
+	BUG_ON(!irqs_disabled());
+
 	error = sysdev_suspend(PMSG_SUSPEND);
 	if (!error) {
 		if (!suspend_test(TEST_CORE))
@@ -305,11 +309,15 @@ static int suspend_enter(suspend_state_t
 		sysdev_resume();
 	}
 
-	device_power_up(PMSG_RESUME);
- Done:
 	arch_suspend_enable_irqs();
 	BUG_ON(irqs_disabled());
+
+	device_power_up(PMSG_RESUME);
+
+ Done:
+	resume_device_irqs();
 	device_pm_unlock();
+
 	return error;
 }
 
Index: linux-2.6/kernel/power/disk.c
===================================================================
--- linux-2.6.orig/kernel/power/disk.c
+++ linux-2.6/kernel/power/disk.c
@@ -22,6 +22,7 @@
 #include <linux/console.h>
 #include <linux/cpu.h>
 #include <linux/freezer.h>
+#include <linux/interrupt.h>
 
 #include "power.h"
 
@@ -214,7 +215,8 @@ static int create_image(int platform_mod
 		return error;
 
 	device_pm_lock();
-	local_irq_disable();
+	suspend_device_irqs();
+
 	/* At this point, device_suspend() has been called, but *not*
 	 * device_power_down(). We *must* call device_power_down() now.
 	 * Otherwise, drivers for some devices (e.g. interrupt controllers)
@@ -225,8 +227,11 @@ static int create_image(int platform_mod
 	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down, "
 			"aborting hibernation\n");
-		goto Enable_irqs;
+		goto Unlock;
 	}
+
+	local_irq_disable();
+
 	sysdev_suspend(PMSG_FREEZE);
 	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down, "
@@ -252,12 +257,17 @@ static int create_image(int platform_mod
 	/* NOTE:  device_power_up() is just a resume() for devices
 	 * that suspended with irqs off ... no overall powerup.
 	 */
+
  Power_up_devices:
+	local_irq_enable();
+
 	device_power_up(in_suspend ?
 		(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);
- Enable_irqs:
-	local_irq_enable();
+
+ Unlock:
+	resume_device_irqs();
 	device_pm_unlock();
+
 	return error;
 }
 
@@ -336,13 +346,17 @@ static int resume_target_kernel(void)
 	int error;
 
 	device_pm_lock();
-	local_irq_disable();
+	suspend_device_irqs();
+
 	error = device_power_down(PMSG_QUIESCE);
 	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down, "
 			"aborting resume\n");
-		goto Enable_irqs;
+		goto Unlock;
 	}
+
+	local_irq_disable();
+
 	sysdev_suspend(PMSG_QUIESCE);
 	/* We'll ignore saved state, but this gets preempt count (etc) right */
 	save_processor_state();
@@ -366,11 +380,17 @@ static int resume_target_kernel(void)
 	swsusp_free();
 	restore_processor_state();
 	touch_softlockup_watchdog();
+
 	sysdev_resume();
-	device_power_up(PMSG_RECOVER);
- Enable_irqs:
+
 	local_irq_enable();
+
+	device_power_up(PMSG_RECOVER);
+
+ Unlock:
+	resume_device_irqs();
 	device_pm_unlock();
+
 	return error;
 }
 
@@ -447,15 +467,18 @@ int hibernation_platform_enter(void)
 		goto Finish;
 
 	device_pm_lock();
-	local_irq_disable();
+	suspend_device_irqs();
+
 	error = device_power_down(PMSG_HIBERNATE);
 	if (!error) {
+		local_irq_disable();
 		sysdev_suspend(PMSG_HIBERNATE);
 		hibernation_ops->enter();
 		/* We should never get here */
 		while (1);
 	}
-	local_irq_enable();
+
+	resume_device_irqs();
 	device_pm_unlock();
 
 	/*
@@ -464,12 +487,15 @@ int hibernation_platform_enter(void)
 	 */
  Finish:
 	hibernation_ops->finish();
+
  Resume_devices:
 	entering_platform_hibernation = false;
 	device_resume(PMSG_RESTORE);
 	resume_console();
+
  Close:
 	hibernation_ops->end();
+
 	return error;
 }
 
Index: linux-2.6/arch/x86/kernel/apm_32.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/apm_32.c
+++ linux-2.6/arch/x86/kernel/apm_32.c
@@ -228,6 +228,7 @@
 #include <linux/suspend.h>
 #include <linux/kthread.h>
 #include <linux/jiffies.h>
+#include <linux/interrupt.h>
 
 #include <asm/system.h>
 #include <asm/uaccess.h>
@@ -1190,8 +1191,11 @@ static int suspend(int vetoable)
 	struct apm_user	*as;
 
 	device_suspend(PMSG_SUSPEND);
-	local_irq_disable();
+
+	suspend_device_irqs();
 	device_power_down(PMSG_SUSPEND);
+
+	local_irq_disable();
 	sysdev_suspend(PMSG_SUSPEND);
 
 	local_irq_enable();
@@ -1209,9 +1213,13 @@ static int suspend(int vetoable)
 	if (err != APM_SUCCESS)
 		apm_error("suspend", err);
 	err = (err == APM_SUCCESS) ? 0 : -EIO;
+
 	sysdev_resume();
-	device_power_up(PMSG_RESUME);
 	local_irq_enable();
+
+	device_power_up(PMSG_RESUME);
+	resume_device_irqs();
+
 	device_resume(PMSG_RESUME);
 	queue_event(APM_NORMAL_RESUME, NULL);
 	spin_lock(&user_list_lock);
@@ -1228,8 +1236,10 @@ static void standby(void)
 {
 	int err;
 
-	local_irq_disable();
+	suspend_device_irqs();
 	device_power_down(PMSG_SUSPEND);
+
+	local_irq_disable();
 	sysdev_suspend(PMSG_SUSPEND);
 	local_irq_enable();
 
@@ -1239,8 +1249,10 @@ static void standby(void)
 
 	local_irq_disable();
 	sysdev_resume();
-	device_power_up(PMSG_RESUME);
 	local_irq_enable();
+
+	device_power_up(PMSG_RESUME);
+	resume_device_irqs();
 }
 
 static apm_event_t get_event(void)
Index: linux-2.6/drivers/xen/manage.c
===================================================================
--- linux-2.6.orig/drivers/xen/manage.c
+++ linux-2.6/drivers/xen/manage.c
@@ -39,12 +39,6 @@ static int xen_suspend(void *data)
 
 	BUG_ON(!irqs_disabled());
 
-	err = device_power_down(PMSG_SUSPEND);
-	if (err) {
-		printk(KERN_ERR "xen_suspend: device_power_down failed: %d\n",
-		       err);
-		return err;
-	}
 	err = sysdev_suspend(PMSG_SUSPEND);
 	if (err) {
 		printk(KERN_ERR "xen_suspend: sysdev_suspend failed: %d\n",
@@ -69,13 +63,6 @@ static int xen_suspend(void *data)
 	xen_mm_unpin_all();
 
 	sysdev_resume();
-	device_power_up(PMSG_RESUME);
-
-	if (!*cancelled) {
-		xen_irq_resume();
-		xen_console_resume();
-		xen_timer_resume();
-	}
 
 	return 0;
 }
@@ -108,6 +95,14 @@ static void do_suspend(void)
 	/* XXX use normal device tree? */
 	xenbus_suspend();
 
+	suspend_device_irqs();
+
+	err = device_power_down(PMSG_SUSPEND);
+	if (err) {
+		printk(KERN_ERR "device_power_down failed: %d\n", err);
+		goto resume_devices;
+	}
+
 	err = stop_machine(xen_suspend, &cancelled, &cpumask_of_cpu(0));
 	if (err) {
 		printk(KERN_ERR "failed to start xen_suspend: %d\n", err);
@@ -120,6 +115,17 @@ static void do_suspend(void)
 	} else
 		xenbus_suspend_cancel();
 
+	device_power_up(PMSG_RESUME);
+
+	if (!cancelled) {
+		xen_irq_resume();
+		xen_console_resume();
+		xen_timer_resume();
+	}
+
+resume_devices:
+	resume_device_irqs();
+
 	device_resume(PMSG_RESUME);
 
 	/* Make sure timer events get retriggered on all CPUs */
Index: linux-2.6/kernel/kexec.c
===================================================================
--- linux-2.6.orig/kernel/kexec.c
+++ linux-2.6/kernel/kexec.c
@@ -1454,7 +1454,7 @@ int kernel_kexec(void)
 		if (error)
 			goto Resume_devices;
 		device_pm_lock();
-		local_irq_disable();
+		suspend_device_irqs();
 		/* At this point, device_suspend() has been called,
 		 * but *not* device_power_down(). We *must*
 		 * device_power_down() now.  Otherwise, drivers for
@@ -1464,8 +1464,9 @@ int kernel_kexec(void)
 		 */
 		error = device_power_down(PMSG_FREEZE);
 		if (error)
-			goto Enable_irqs;
+			goto Resume_irqs;
 
+		local_irq_disable();
 		/* Suspend system devices */
 		error = sysdev_suspend(PMSG_FREEZE);
 		if (error)
@@ -1484,9 +1485,10 @@ int kernel_kexec(void)
 	if (kexec_image->preserve_context) {
 		sysdev_resume();
  Power_up_devices:
-		device_power_up(PMSG_RESTORE);
- Enable_irqs:
 		local_irq_enable();
+		device_power_up(PMSG_RESTORE);
+ Resume_irqs:
+		resume_device_irqs();
 		device_pm_unlock();
 		enable_nonboot_cpus();
  Resume_devices:
Index: linux-2.6/include/linux/irq.h
===================================================================
--- linux-2.6.orig/include/linux/irq.h
+++ linux-2.6/include/linux/irq.h
@@ -65,6 +65,7 @@ typedef	void (*irq_flow_handler_t)(unsig
 #define IRQ_SPURIOUS_DISABLED	0x00800000	/* IRQ was disabled by the spurious trap */
 #define IRQ_MOVE_PCNTXT		0x01000000	/* IRQ migration from process context */
 #define IRQ_AFFINITY_SET	0x02000000	/* IRQ affinity was set from userspace*/
+#define IRQ_SUSPENDED		0x04000000	/* IRQ has gone through suspend sequence */
 
 #ifdef CONFIG_IRQ_PER_CPU
 # define CHECK_IRQ_PER_CPU(var) ((var) & IRQ_PER_CPU)

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-22 22:42     ` Rafael J. Wysocki
  2009-02-22 23:48       ` Rafael J. Wysocki
@ 2009-02-22 23:48       ` Rafael J. Wysocki
  1 sibling, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-02-22 23:48 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jeremy Fitzhardinge, LKML, Jesse Barnes, Thomas Gleixner,
	Eric W. Biederman, Ingo Molnar, pm list

On Sunday 22 February 2009, Rafael J. Wysocki wrote:
> On Sunday 22 February 2009, Linus Torvalds wrote:
> > 
> > On Sun, 22 Feb 2009, Rafael J. Wysocki wrote:
[--snip--]
> 
> Thanks a lot for your comments, I'll send an updated patch shortly.

The updated patch is appended.

It has been initially tested, but requires more testing, especially with APM,
XEN, kexec jump etc.

Thanks,
Rafael

---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: PM: Rework handling of interrupts during suspend-resume (rev. 2)

Introduce two helper functions allowing us to disable device
interrupts (at the IO-APIC level) during suspend or hibernation
and enable them during the subsequent resume, respectively, so that
the timer interrupts are enabled while "late" suspend callbacks and
"early" resume callbacks provided by device drivers are being
executed.

Use these functions to rework the handling of interrupts during
suspend (hibernation) and resume.  Namely, interrupts will only be
disabled on the CPU right before suspending sysdevs, while device
interrupts will be disabled (at the IO-APIC level), with the help of
the new helper function, before calling "late" suspend callbacks
provided by device drivers and analogously during resume.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 arch/x86/kernel/apm_32.c  |   20 ++++++++++++----
 drivers/xen/manage.c      |   32 +++++++++++++++----------
 include/linux/interrupt.h |    3 ++
 include/linux/irq.h       |    1 
 kernel/irq/manage.c       |   57 ++++++++++++++++++++++++++++++++++++++++++++++
 kernel/kexec.c            |   10 ++++----
 kernel/power/disk.c       |   46 +++++++++++++++++++++++++++++--------
 kernel/power/main.c       |   20 +++++++++++-----
 8 files changed, 152 insertions(+), 37 deletions(-)

Index: linux-2.6/kernel/irq/manage.c
===================================================================
--- linux-2.6.orig/kernel/irq/manage.c
+++ linux-2.6/kernel/irq/manage.c
@@ -746,3 +746,60 @@ int request_irq(unsigned int irq, irq_ha
 	return retval;
 }
 EXPORT_SYMBOL(request_irq);
+
+#ifdef CONFIG_PM_SLEEP
+/**
+ *	suspend_device_irqs - disable all currently enabled interrupt lines
+ *
+ *	During system-wide suspend or hibernation device interrupts need to be
+ *	disabled at the chip level and this function is provided for this
+ *	purpose.  It disables all interrupt lines that are enabled at the
+ *	moment and sets the IRQ_SUSPENDED flag for them.
+ */
+void suspend_device_irqs(void)
+{
+	struct irq_desc *desc;
+	int irq;
+
+	for_each_irq_desc(irq, desc) {
+		unsigned long flags;
+
+		spin_lock_irqsave(&desc->lock, flags);
+
+		if (!desc->depth && desc->action
+		    && !(desc->action->flags & IRQF_TIMER)) {
+			desc->depth++;
+			desc->status |= IRQ_DISABLED | IRQ_SUSPENDED;
+			desc->chip->disable(irq);
+		}
+
+		spin_unlock_irqrestore(&desc->lock, flags);
+	}
+
+	for_each_irq_desc(irq, desc) {
+		if (desc->status & IRQ_SUSPENDED)
+			synchronize_irq(irq);
+	}
+}
+EXPORT_SYMBOL_GPL(suspend_device_irqs);
+
+/**
+ *	resume_device_irqs - enable interrupts disabled by suspend_device_irqs()
+ *
+ *	Enable all interrupt lines previously disabled by suspend_device_irqs()
+ *	that have the IRQ_SUSPENDED flag set.
+ */
+void resume_device_irqs(void)
+{
+	struct irq_desc *desc;
+	int irq;
+
+	for_each_irq_desc(irq, desc) {
+		if (!(desc->status & IRQ_SUSPENDED))
+			continue;
+		desc->status &= ~IRQ_SUSPENDED;
+		enable_irq(irq);
+	}
+}
+EXPORT_SYMBOL_GPL(resume_device_irqs);
+#endif /* CONFIG_PM_SLEEP */
Index: linux-2.6/include/linux/interrupt.h
===================================================================
--- linux-2.6.orig/include/linux/interrupt.h
+++ linux-2.6/include/linux/interrupt.h
@@ -470,4 +470,7 @@ extern int early_irq_init(void);
 extern int arch_early_irq_init(void);
 extern int arch_init_chip_data(struct irq_desc *desc, int cpu);
 
+extern void suspend_device_irqs(void);
+extern void resume_device_irqs(void);
+
 #endif
Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -22,6 +22,7 @@
 #include <linux/freezer.h>
 #include <linux/vmstat.h>
 #include <linux/syscalls.h>
+#include <linux/interrupt.h>
 
 #include "power.h"
 
@@ -287,17 +288,20 @@ void __attribute__ ((weak)) arch_suspend
  */
 static int suspend_enter(suspend_state_t state)
 {
-	int error = 0;
+	int error;
 
 	device_pm_lock();
-	arch_suspend_disable_irqs();
-	BUG_ON(!irqs_disabled());
+	suspend_device_irqs();
 
-	if ((error = device_power_down(PMSG_SUSPEND))) {
+	error = device_power_down(PMSG_SUSPEND);
+	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down\n");
 		goto Done;
 	}
 
+	arch_suspend_disable_irqs();
+	BUG_ON(!irqs_disabled());
+
 	error = sysdev_suspend(PMSG_SUSPEND);
 	if (!error) {
 		if (!suspend_test(TEST_CORE))
@@ -305,11 +309,15 @@ static int suspend_enter(suspend_state_t
 		sysdev_resume();
 	}
 
-	device_power_up(PMSG_RESUME);
- Done:
 	arch_suspend_enable_irqs();
 	BUG_ON(irqs_disabled());
+
+	device_power_up(PMSG_RESUME);
+
+ Done:
+	resume_device_irqs();
 	device_pm_unlock();
+
 	return error;
 }
 
Index: linux-2.6/kernel/power/disk.c
===================================================================
--- linux-2.6.orig/kernel/power/disk.c
+++ linux-2.6/kernel/power/disk.c
@@ -22,6 +22,7 @@
 #include <linux/console.h>
 #include <linux/cpu.h>
 #include <linux/freezer.h>
+#include <linux/interrupt.h>
 
 #include "power.h"
 
@@ -214,7 +215,8 @@ static int create_image(int platform_mod
 		return error;
 
 	device_pm_lock();
-	local_irq_disable();
+	suspend_device_irqs();
+
 	/* At this point, device_suspend() has been called, but *not*
 	 * device_power_down(). We *must* call device_power_down() now.
 	 * Otherwise, drivers for some devices (e.g. interrupt controllers)
@@ -225,8 +227,11 @@ static int create_image(int platform_mod
 	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down, "
 			"aborting hibernation\n");
-		goto Enable_irqs;
+		goto Unlock;
 	}
+
+	local_irq_disable();
+
 	sysdev_suspend(PMSG_FREEZE);
 	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down, "
@@ -252,12 +257,17 @@ static int create_image(int platform_mod
 	/* NOTE:  device_power_up() is just a resume() for devices
 	 * that suspended with irqs off ... no overall powerup.
 	 */
+
  Power_up_devices:
+	local_irq_enable();
+
 	device_power_up(in_suspend ?
 		(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);
- Enable_irqs:
-	local_irq_enable();
+
+ Unlock:
+	resume_device_irqs();
 	device_pm_unlock();
+
 	return error;
 }
 
@@ -336,13 +346,17 @@ static int resume_target_kernel(void)
 	int error;
 
 	device_pm_lock();
-	local_irq_disable();
+	suspend_device_irqs();
+
 	error = device_power_down(PMSG_QUIESCE);
 	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down, "
 			"aborting resume\n");
-		goto Enable_irqs;
+		goto Unlock;
 	}
+
+	local_irq_disable();
+
 	sysdev_suspend(PMSG_QUIESCE);
 	/* We'll ignore saved state, but this gets preempt count (etc) right */
 	save_processor_state();
@@ -366,11 +380,17 @@ static int resume_target_kernel(void)
 	swsusp_free();
 	restore_processor_state();
 	touch_softlockup_watchdog();
+
 	sysdev_resume();
-	device_power_up(PMSG_RECOVER);
- Enable_irqs:
+
 	local_irq_enable();
+
+	device_power_up(PMSG_RECOVER);
+
+ Unlock:
+	resume_device_irqs();
 	device_pm_unlock();
+
 	return error;
 }
 
@@ -447,15 +467,18 @@ int hibernation_platform_enter(void)
 		goto Finish;
 
 	device_pm_lock();
-	local_irq_disable();
+	suspend_device_irqs();
+
 	error = device_power_down(PMSG_HIBERNATE);
 	if (!error) {
+		local_irq_disable();
 		sysdev_suspend(PMSG_HIBERNATE);
 		hibernation_ops->enter();
 		/* We should never get here */
 		while (1);
 	}
-	local_irq_enable();
+
+	resume_device_irqs();
 	device_pm_unlock();
 
 	/*
@@ -464,12 +487,15 @@ int hibernation_platform_enter(void)
 	 */
  Finish:
 	hibernation_ops->finish();
+
  Resume_devices:
 	entering_platform_hibernation = false;
 	device_resume(PMSG_RESTORE);
 	resume_console();
+
  Close:
 	hibernation_ops->end();
+
 	return error;
 }
 
Index: linux-2.6/arch/x86/kernel/apm_32.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/apm_32.c
+++ linux-2.6/arch/x86/kernel/apm_32.c
@@ -228,6 +228,7 @@
 #include <linux/suspend.h>
 #include <linux/kthread.h>
 #include <linux/jiffies.h>
+#include <linux/interrupt.h>
 
 #include <asm/system.h>
 #include <asm/uaccess.h>
@@ -1190,8 +1191,11 @@ static int suspend(int vetoable)
 	struct apm_user	*as;
 
 	device_suspend(PMSG_SUSPEND);
-	local_irq_disable();
+
+	suspend_device_irqs();
 	device_power_down(PMSG_SUSPEND);
+
+	local_irq_disable();
 	sysdev_suspend(PMSG_SUSPEND);
 
 	local_irq_enable();
@@ -1209,9 +1213,13 @@ static int suspend(int vetoable)
 	if (err != APM_SUCCESS)
 		apm_error("suspend", err);
 	err = (err == APM_SUCCESS) ? 0 : -EIO;
+
 	sysdev_resume();
-	device_power_up(PMSG_RESUME);
 	local_irq_enable();
+
+	device_power_up(PMSG_RESUME);
+	resume_device_irqs();
+
 	device_resume(PMSG_RESUME);
 	queue_event(APM_NORMAL_RESUME, NULL);
 	spin_lock(&user_list_lock);
@@ -1228,8 +1236,10 @@ static void standby(void)
 {
 	int err;
 
-	local_irq_disable();
+	suspend_device_irqs();
 	device_power_down(PMSG_SUSPEND);
+
+	local_irq_disable();
 	sysdev_suspend(PMSG_SUSPEND);
 	local_irq_enable();
 
@@ -1239,8 +1249,10 @@ static void standby(void)
 
 	local_irq_disable();
 	sysdev_resume();
-	device_power_up(PMSG_RESUME);
 	local_irq_enable();
+
+	device_power_up(PMSG_RESUME);
+	resume_device_irqs();
 }
 
 static apm_event_t get_event(void)
Index: linux-2.6/drivers/xen/manage.c
===================================================================
--- linux-2.6.orig/drivers/xen/manage.c
+++ linux-2.6/drivers/xen/manage.c
@@ -39,12 +39,6 @@ static int xen_suspend(void *data)
 
 	BUG_ON(!irqs_disabled());
 
-	err = device_power_down(PMSG_SUSPEND);
-	if (err) {
-		printk(KERN_ERR "xen_suspend: device_power_down failed: %d\n",
-		       err);
-		return err;
-	}
 	err = sysdev_suspend(PMSG_SUSPEND);
 	if (err) {
 		printk(KERN_ERR "xen_suspend: sysdev_suspend failed: %d\n",
@@ -69,13 +63,6 @@ static int xen_suspend(void *data)
 	xen_mm_unpin_all();
 
 	sysdev_resume();
-	device_power_up(PMSG_RESUME);
-
-	if (!*cancelled) {
-		xen_irq_resume();
-		xen_console_resume();
-		xen_timer_resume();
-	}
 
 	return 0;
 }
@@ -108,6 +95,14 @@ static void do_suspend(void)
 	/* XXX use normal device tree? */
 	xenbus_suspend();
 
+	suspend_device_irqs();
+
+	err = device_power_down(PMSG_SUSPEND);
+	if (err) {
+		printk(KERN_ERR "device_power_down failed: %d\n", err);
+		goto resume_devices;
+	}
+
 	err = stop_machine(xen_suspend, &cancelled, &cpumask_of_cpu(0));
 	if (err) {
 		printk(KERN_ERR "failed to start xen_suspend: %d\n", err);
@@ -120,6 +115,17 @@ static void do_suspend(void)
 	} else
 		xenbus_suspend_cancel();
 
+	device_power_up(PMSG_RESUME);
+
+	if (!cancelled) {
+		xen_irq_resume();
+		xen_console_resume();
+		xen_timer_resume();
+	}
+
+resume_devices:
+	resume_device_irqs();
+
 	device_resume(PMSG_RESUME);
 
 	/* Make sure timer events get retriggered on all CPUs */
Index: linux-2.6/kernel/kexec.c
===================================================================
--- linux-2.6.orig/kernel/kexec.c
+++ linux-2.6/kernel/kexec.c
@@ -1454,7 +1454,7 @@ int kernel_kexec(void)
 		if (error)
 			goto Resume_devices;
 		device_pm_lock();
-		local_irq_disable();
+		suspend_device_irqs();
 		/* At this point, device_suspend() has been called,
 		 * but *not* device_power_down(). We *must*
 		 * device_power_down() now.  Otherwise, drivers for
@@ -1464,8 +1464,9 @@ int kernel_kexec(void)
 		 */
 		error = device_power_down(PMSG_FREEZE);
 		if (error)
-			goto Enable_irqs;
+			goto Resume_irqs;
 
+		local_irq_disable();
 		/* Suspend system devices */
 		error = sysdev_suspend(PMSG_FREEZE);
 		if (error)
@@ -1484,9 +1485,10 @@ int kernel_kexec(void)
 	if (kexec_image->preserve_context) {
 		sysdev_resume();
  Power_up_devices:
-		device_power_up(PMSG_RESTORE);
- Enable_irqs:
 		local_irq_enable();
+		device_power_up(PMSG_RESTORE);
+ Resume_irqs:
+		resume_device_irqs();
 		device_pm_unlock();
 		enable_nonboot_cpus();
  Resume_devices:
Index: linux-2.6/include/linux/irq.h
===================================================================
--- linux-2.6.orig/include/linux/irq.h
+++ linux-2.6/include/linux/irq.h
@@ -65,6 +65,7 @@ typedef	void (*irq_flow_handler_t)(unsig
 #define IRQ_SPURIOUS_DISABLED	0x00800000	/* IRQ was disabled by the spurious trap */
 #define IRQ_MOVE_PCNTXT		0x01000000	/* IRQ migration from process context */
 #define IRQ_AFFINITY_SET	0x02000000	/* IRQ affinity was set from userspace*/
+#define IRQ_SUSPENDED		0x04000000	/* IRQ has gone through suspend sequence */
 
 #ifdef CONFIG_IRQ_PER_CPU
 # define CHECK_IRQ_PER_CPU(var) ((var) & IRQ_PER_CPU)

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-22 23:48       ` Rafael J. Wysocki
  2009-02-23  0:05         ` Linus Torvalds
@ 2009-02-23  0:05         ` Linus Torvalds
  2009-02-23  1:23             ` Linus Torvalds
  2009-02-23  3:04         ` Eric W. Biederman
                           ` (3 subsequent siblings)
  5 siblings, 1 reply; 373+ messages in thread
From: Linus Torvalds @ 2009-02-23  0:05 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: LKML, Ingo Molnar, Eric W. Biederman, Benjamin Herrenschmidt,
	Jeremy Fitzhardinge, pm list, Len Brown, Jesse Barnes,
	Thomas Gleixner



On Mon, 23 Feb 2009, Rafael J. Wysocki wrote:
> 
> The updated patch is appended.

Ok, looks sane to me. I'll try it on my poor eeepc, although right now 
Fedora-11 rawhide (that poor laptop gets _all_ the crazy stuff thrown at 
it, and runs btrfs to boot) has broken something in X by enabling DRI2, so 
I may not get around to it today.

		Linus

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-23  0:05         ` Linus Torvalds
@ 2009-02-23  1:23             ` Linus Torvalds
  0 siblings, 0 replies; 373+ messages in thread
From: Linus Torvalds @ 2009-02-23  1:23 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: LKML, Ingo Molnar, Eric W. Biederman, Benjamin Herrenschmidt,
	Jeremy Fitzhardinge, pm list, Len Brown, Jesse Barnes,
	Thomas Gleixner



On Sun, 22 Feb 2009, Linus Torvalds wrote:
> 
> Ok, looks sane to me. I'll try it on my poor eeepc, although right now 
> Fedora-11 rawhide (that poor laptop gets _all_ the crazy stuff thrown at 
> it, and runs btrfs to boot) has broken something in X by enabling DRI2, so 
> I may not get around to it today.

Well, the suspend/resume part seems to work for me. My X issues keep me 
from testing it with compiz, but here's an ack for v2 of the 2/2 patch at 
least on my EeePC.

		Linus

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-22 23:48       ` Rafael J. Wysocki
  2009-02-23  0:05         ` Linus Torvalds
  2009-02-23  0:05         ` Linus Torvalds
@ 2009-02-23  3:04         ` Eric W. Biederman
  2009-02-23  8:44           ` Ingo Molnar
  2009-02-23  8:44           ` Ingo Molnar
  2009-02-23  3:04         ` Eric W. Biederman
                           ` (2 subsequent siblings)
  5 siblings, 2 replies; 373+ messages in thread
From: Eric W. Biederman @ 2009-02-23  3:04 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linus Torvalds, LKML, Ingo Molnar, Benjamin Herrenschmidt,
	Jeremy Fitzhardinge, pm list, Len Brown, Jesse Barnes,
	Thomas Gleixner

"Rafael J. Wysocki" <rjw@sisk.pl> writes:

> On Sunday 22 February 2009, Rafael J. Wysocki wrote:
>> On Sunday 22 February 2009, Linus Torvalds wrote:
>> > 
>> > On Sun, 22 Feb 2009, Rafael J. Wysocki wrote:
> [--snip--]
>> 
>> Thanks a lot for your comments, I'll send an updated patch shortly.
>
> The updated patch is appended.
>
> It has been initially tested, but requires more testing, especially with APM,
> XEN, kexec jump etc.
>
> Thanks,
> Rafael
>
> ---
> From: Rafael J. Wysocki <rjw@sisk.pl>
> Subject: PM: Rework handling of interrupts during suspend-resume (rev. 2)
>
> Introduce two helper functions allowing us to disable device
> interrupts (at the IO-APIC level) during suspend or hibernation
> and enable them during the subsequent resume, respectively, so that
> the timer interrupts are enabled while "late" suspend callbacks and
> "early" resume callbacks provided by device drivers are being
> executed.
>
> Use these functions to rework the handling of interrupts during
> suspend (hibernation) and resume.  Namely, interrupts will only be
> disabled on the CPU right before suspending sysdevs, while device
> interrupts will be disabled (at the IO-APIC level), with the help of
> the new helper function, before calling "late" suspend callbacks
> provided by device drivers and analogously during resume.

I don't have an issue with the code, but I do have an issue with
this description of it.

Calling disable especially for ioapics does nothing directly.
It simply arranges for the irq to be marked pending and for the
irq to be masked if the irq happens.

So what you are doing is arranging so that no interrupts will be
delivered to drivers.  Not really disabling interrupts at the IO-APIC
level.

In addition not all interrupts (even on x86) go through an IO-APIC anymore
so describing the patch in terms of an IO-APIC makes it a bit hard to
understand what your intent actually is.
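
To make the lazy-disable point concrete, here is a minimal user-space
model of it. This is only a sketch under stated assumptions, not kernel
code: struct model_irq and all model_* functions are hypothetical names,
and the replay on enable is a simplification of what the core does.

```c
#include <stdbool.h>

/* Hypothetical user-space model of one interrupt line; not kernel code. */
struct model_irq {
	bool disabled;   /* software "disabled" flag */
	bool masked;     /* whether the line is masked in "hardware" */
	bool pending;    /* an interrupt arrived while disabled */
	int  delivered;  /* number of times the driver handler ran */
};

/* Lazy disable: only record the request, do not touch the hardware. */
static void model_disable(struct model_irq *irq)
{
	irq->disabled = true;
}

/* Called when the "hardware" raises the interrupt. */
static void model_raise(struct model_irq *irq)
{
	if (irq->masked)
		return;              /* masked lines never reach the CPU */
	if (irq->disabled) {
		irq->pending = true; /* remember it for later */
		irq->masked = true;  /* mask only now, on first arrival */
		return;
	}
	irq->delivered++;            /* normal case: run the handler */
}

/* Enable: unmask and replay anything that arrived in the meantime. */
static void model_enable(struct model_irq *irq)
{
	irq->disabled = false;
	irq->masked = false;
	if (irq->pending) {
		irq->pending = false;
		irq->delivered++;
	}
}
```

In this model the driver handler never runs between model_disable() and
model_enable(), yet nothing is masked until an interrupt actually shows
up, which is the distinction drawn above.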


Eric

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-22 23:48       ` Rafael J. Wysocki
                           ` (4 preceding siblings ...)
  2009-02-23  8:36         ` Ingo Molnar
@ 2009-02-23  8:36         ` Ingo Molnar
  2009-02-23 11:29           ` Rafael J. Wysocki
  2009-02-23 11:29           ` Rafael J. Wysocki
  5 siblings, 2 replies; 373+ messages in thread
From: Ingo Molnar @ 2009-02-23  8:36 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linus Torvalds, LKML, Eric W. Biederman, Benjamin Herrenschmidt,
	Jeremy Fitzhardinge, pm list, Len Brown, Jesse Barnes,
	Thomas Gleixner


* Rafael J. Wysocki <rjw@sisk.pl> wrote:

> On Sunday 22 February 2009, Rafael J. Wysocki wrote:
> > On Sunday 22 February 2009, Linus Torvalds wrote:
> > > 
> > > On Sun, 22 Feb 2009, Rafael J. Wysocki wrote:
> [--snip--]
> > 
> > Thanks a lot for your comments, I'll send an updated patch shortly.
> 
> The updated patch is appended.
> 
> It has been initially tested, but requires more testing, 
> especially with APM, XEN, kexec jump etc.

>  arch/x86/kernel/apm_32.c  |   20 ++++++++++++----
>  drivers/xen/manage.c      |   32 +++++++++++++++----------
>  include/linux/interrupt.h |    3 ++
>  include/linux/irq.h       |    1 
>  kernel/irq/manage.c       |   57 ++++++++++++++++++++++++++++++++++++++++++++++
>  kernel/kexec.c            |   10 ++++----
>  kernel/power/disk.c       |   46 +++++++++++++++++++++++++++++--------
>  kernel/power/main.c       |   20 +++++++++++-----
>  8 files changed, 152 insertions(+), 37 deletions(-)
> 
> Index: linux-2.6/kernel/irq/manage.c
> ===================================================================
> --- linux-2.6.orig/kernel/irq/manage.c
> +++ linux-2.6/kernel/irq/manage.c
> @@ -746,3 +746,60 @@ int request_irq(unsigned int irq, irq_ha
>  	return retval;
>  }
>  EXPORT_SYMBOL(request_irq);
> +
> +#ifdef CONFIG_PM_SLEEP
> +/**
> + *	suspend_device_irqs - disable all currently enabled interrupt lines

Code placement nit: please dont put new #ifdef blocks into the 
core IRQ code, add a kernel/irq/power.c file instead and make 
the kbuild rule depend on PM_SLEEP.

The new suspend_device_irqs() and resume_device_irqs() do not 
use any manage.c internals so this should work straight away.

> + *
> + *	During system-wide suspend or hibernation device interrupts need to be
> + *	disabled at the chip level and this function is provided for this
> + *	purpose.  It disables all interrupt lines that are enabled at the
> + *	moment and sets the IRQ_SUSPENDED flag for them.
> + */
> +void suspend_device_irqs(void)
> +{
> +	struct irq_desc *desc;
> +	int irq;
> +
> +	for_each_irq_desc(irq, desc) {
> +		unsigned long flags;
> +
> +		spin_lock_irqsave(&desc->lock, flags);
> +
> +		if (!desc->depth && desc->action
> +		    && !(desc->action->flags & IRQF_TIMER)) {
> +			desc->depth++;
> +			desc->status |= IRQ_DISABLED | IRQ_SUSPENDED;
> +			desc->chip->disable(irq);
> +		}
> +
> +		spin_unlock_irqrestore(&desc->lock, flags);
> +	}
> +
> +	for_each_irq_desc(irq, desc) {
> +		if (desc->status & IRQ_SUSPENDED)
> +			synchronize_irq(irq);
> +	}

Optimization/code-flow nit: a possibility might be to do a 
single loop, i.e. i think it's safe to couple the disable+sync 
bits [as in 99.99% of the cases there will be no in-execution 
irq handlers when we execute this.]

Something like:

		int do_sync = 0;

		spin_lock_irqsave(&desc->lock, flags);

		if (!desc->depth && desc->action
		    && !(desc->action->flags & IRQF_TIMER)) {

			desc->depth++;
			desc->status |= IRQ_DISABLED | IRQ_SUSPENDED;
			desc->chip->disable(irq);
			do_sync = 1;
		}

		spin_unlock_irqrestore(&desc->lock, flags);

		if (do_sync)
			synchronize_irq(irq);

In fact i'd suggest to factor out this logic into separate 
__suspend_irq(irq) / __resume_irq(irq) inline helper functions. 
(They should be inline for the time being as they are not 
shared-irq-safe so they shouldnt really be exposed to drivers in 
such a singular capacity.)

Doing so will also fix the line-break ugliness of the first 
branch - as in a standalone function the condition fits into a 
single line.

There's a performance reason as well: especially when we have a 
lot of IRQ descriptors, a single pass will be about twice as fast. 
(with a large iteration scope this function is cachemiss-limited 
and doing two passes doubles the cachemiss rate.)
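
The single-pass structure with a per-line helper could look roughly like
the following user-space sketch. It is an illustration under assumptions,
not the kernel implementation: struct model_desc, the MODEL_* flags and
the model_* functions are hypothetical stand-ins, locking is only marked
by comments, and model_synchronize() merely counts calls instead of
waiting for running handlers.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_IRQF_TIMER     0x1u
#define MODEL_IRQ_DISABLED   0x2u
#define MODEL_IRQ_SUSPENDED  0x4u

/* Hypothetical stand-in for an IRQ descriptor. */
struct model_desc {
	unsigned int depth;        /* nesting count of disables */
	bool         has_action;   /* a handler is registered */
	unsigned int action_flags; /* flags of that handler */
	unsigned int status;
	unsigned int synced;       /* times we "waited" for handlers */
};

/* Stand-in for synchronize_irq(): just count the calls. */
static void model_synchronize(struct model_desc *desc)
{
	desc->synced++;
}

/* Disable one line and wait for it, unless it is nested, idle or a timer. */
static void model_suspend_irq(struct model_desc *desc)
{
	bool do_sync = false;

	/* the real code would take desc->lock here */
	if (!desc->depth && desc->has_action && !(desc->action_flags & MODEL_IRQF_TIMER)) {
		desc->depth++;
		desc->status |= MODEL_IRQ_DISABLED | MODEL_IRQ_SUSPENDED;
		do_sync = true;
	}
	/* ... and drop it here, before waiting */

	if (do_sync)
		model_synchronize(desc);
}

/* One pass over the table: each descriptor is visited exactly once. */
static size_t model_suspend_all(struct model_desc *descs, size_t n)
{
	size_t suspended = 0;

	for (size_t i = 0; i < n; i++) {
		model_suspend_irq(&descs[i]);
		if (descs[i].status & MODEL_IRQ_SUSPENDED)
			suspended++;
	}
	return suspended;
}
```

Lines that were already disabled (depth != 0) and timer lines are left
alone, so a later resume pass only has to look at the suspended flag.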

> +}
> +EXPORT_SYMBOL_GPL(suspend_device_irqs);
> +
> +/**
> + *	resume_device_irqs - enable interrupts disabled by suspend_device_irqs()
> + *
> + *	Enable all interrupt lines previously disabled by suspend_device_irqs()
> + *	that have the IRQ_SUSPENDED flag set.
> + */
> +void resume_device_irqs(void)
> +{
> +	struct irq_desc *desc;
> +	int irq;
> +
> +	for_each_irq_desc(irq, desc) {
> +		if (!(desc->status & IRQ_SUSPENDED))
> +			continue;
> +		desc->status &= ~IRQ_SUSPENDED;
> +		enable_irq(irq);
> +	}

Robustness+optimization nit: this will work but could be done in 
a nicer way: enable_irq() should auto-clear IRQ_SUSPENDED. (We 
already clear flags there so it's even a tiny bit faster this 
way.)

We definitely dont want IRQ_SUSPENDED to 'leak' out into an 
enabled line, should something call enable_irq() on a suspended 
line. So either make it auto-unsuspend in enable_irq(), or add 
an extra WARN_ON() to enable_irq(), to make sure IRQ_SUSPENDED 
is always off by that time.
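
A rough user-space sketch of the auto-unsuspend variant follows. It is
an assumption-laden illustration only: struct model_desc, the MODEL_*
flags and the model_* functions are hypothetical, and the real
enable_irq() does considerably more than this.

```c
#define MODEL_IRQ_DISABLED   0x1u
#define MODEL_IRQ_SUSPENDED  0x2u

/* Hypothetical stand-in for an IRQ descriptor. */
struct model_desc {
	unsigned int depth;   /* nesting count of disables */
	unsigned int status;
};

/* Enable one line; the suspended flag is dropped together with disabled. */
static void model_enable_irq(struct model_desc *desc)
{
	if (desc->depth == 0)
		return;       /* unbalanced enable: nothing to undo */
	if (--desc->depth == 0)
		desc->status &= ~(MODEL_IRQ_DISABLED | MODEL_IRQ_SUSPENDED);
}

/* Resume pass: no separate flag clearing is needed in the loop. */
static unsigned int model_resume_all(struct model_desc *descs, unsigned int n)
{
	unsigned int resumed = 0;

	for (unsigned int i = 0; i < n; i++) {
		if (!(descs[i].status & MODEL_IRQ_SUSPENDED))
			continue;
		model_enable_irq(&descs[i]);
		resumed++;
	}
	return resumed;
}
```

With this shape a stray model_enable_irq() on a suspended line cannot
leave the suspended flag set on an enabled line.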

> +     arch_suspend_disable_irqs();
> +     BUG_ON(!irqs_disabled());

Please. We just disabled all devices - a BUG_ON() is a very 
counter-productive thing to do here - chances are the user will 
never see anything but a hang. So please turn this into a nice 
WARN_ONCE().
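
The difference in behaviour can be imitated in user space as below. This
is only a sketch of warn-once semantics under assumptions, not the kernel
macro: model_warn_once(), model_irqs_off_check() and model_warnings are
hypothetical names.

```c
#include <stdbool.h>
#include <stdio.h>

static unsigned int model_warnings;  /* how many warnings were printed */

/* Print msg the first time cond is true; always report cond back. */
static bool model_warn_once(bool cond, bool *warned, const char *msg)
{
	if (cond && !*warned) {
		*warned = true;
		model_warnings++;
		fprintf(stderr, "WARNING: %s\n", msg);
	}
	return cond;
}

/* Check made after the arch hook was supposed to turn interrupts off. */
static bool model_irqs_off_check(bool irqs_disabled)
{
	static bool warned;

	/* Unlike a BUG_ON(), execution continues and the caller can back out. */
	return !model_warn_once(!irqs_disabled, &warned, "interrupts still enabled");
}
```

The caller gets a false return on every failed check but the log is
written only once, so the machine keeps running and the message has a
chance of being seen after resume.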

> --- linux-2.6.orig/include/linux/interrupt.h
> +++ linux-2.6/include/linux/interrupt.h
> @@ -470,4 +470,7 @@ extern int early_irq_init(void);
>  extern int arch_early_irq_init(void);
>  extern int arch_init_chip_data(struct irq_desc *desc, int cpu);
>  
> +extern void suspend_device_irqs(void);
> +extern void resume_device_irqs(void);

Header cleanliness nit: please dont just throw new prototypes to 
the tail of headers, but think about where they fit in best, 
logically.

These two new prototypes should go straight after the normal irq 
line state management functions:

  extern void disable_irq_nosync(unsigned int irq);
  extern void disable_irq(unsigned int irq);
  extern void enable_irq(unsigned int irq);

Perhaps also with a comment like this:

/*
 * Note: dont use these functions in driver code - they are for 
 * core kernel use only.
 */

> +++ linux-2.6/kernel/power/main.c
[...]
> +
> + Unlock:
> +	resume_device_irqs();

Small drive-by style nit: while at it could you please fix the 
capitalization and the naming of the label (and all labels in 
this file)? The standard label is "out_unlock". [and 
"err_unlock" for failure cases - but this isnt a failure case.]

There are 43 such bad label names in kernel/power/*.c, see the 
output of:

  git grep '^ [A-Z][a-z].*:$' kernel/power/

> Index: linux-2.6/arch/x86/kernel/apm_32.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/kernel/apm_32.c
> +++ linux-2.6/arch/x86/kernel/apm_32.c

> +
> +	suspend_device_irqs();
>  	device_power_down(PMSG_SUSPEND);
> +
> +	local_irq_disable();

hm, this is a very repetitive pattern, all around the various 
suspend/resume variants. Might make sense to make:

  	device_power_down(PMSG_SUSPEND);

do the irq line disabling plus the local irq disabling 
automatically. That also means it cannot be forgotten. The 
symmetric action should happen for PMSG_RESUME.

Is there ever a case where we want a different pattern?
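
As an illustration of folding the pattern into one place, the ordering
could be modelled in user space like this. It is a sketch under
assumptions, not a proposed implementation: the model_* functions, the
STEP_* names and struct model_log are hypothetical, the sysdev steps are
left out, and the rollback of already powered-down devices on error is
not modelled.

```c
#include <stddef.h>

enum model_step {
	STEP_SUSPEND_DEVICE_IRQS,
	STEP_LATE_SUSPEND,
	STEP_LOCAL_IRQ_DISABLE,
	STEP_LOCAL_IRQ_ENABLE,
	STEP_EARLY_RESUME,
	STEP_RESUME_DEVICE_IRQS
};

/* Records the order in which the steps were taken. */
struct model_log {
	enum model_step steps[8];
	size_t n;
};

static void model_record(struct model_log *log, enum model_step step)
{
	if (log->n < sizeof(log->steps) / sizeof(log->steps[0]))
		log->steps[log->n++] = step;
}

/* Power-down wrapper: device IRQs off, late callbacks, then CPU IRQs off. */
static int model_power_down(struct model_log *log, int late_suspend_error)
{
	model_record(log, STEP_SUSPEND_DEVICE_IRQS);
	model_record(log, STEP_LATE_SUSPEND);
	if (late_suspend_error) {
		/* back out so the caller never sees half-disabled state */
		model_record(log, STEP_RESUME_DEVICE_IRQS);
		return late_suspend_error;
	}
	model_record(log, STEP_LOCAL_IRQ_DISABLE);
	return 0;
}

/* Power-up wrapper: the mirror image of the sequence above. */
static void model_power_up(struct model_log *log)
{
	model_record(log, STEP_LOCAL_IRQ_ENABLE);
	model_record(log, STEP_EARLY_RESUME);
	model_record(log, STEP_RESUME_DEVICE_IRQS);
}
```

Callers would then only pair model_power_down() with model_power_up(),
and the three-step ordering could not be forgotten or reordered at the
individual call sites.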

> Index: linux-2.6/drivers/xen/manage.c
> ===================================================================
> --- linux-2.6.orig/drivers/xen/manage.c
> +++ linux-2.6/drivers/xen/manage.c
> @@ -39,12 +39,6 @@ static int xen_suspend(void *data)

> -	if (!*cancelled) {
> -		xen_irq_resume();
> -		xen_console_resume();
> -		xen_timer_resume();

This change needs a second look. xen_suspend() is a 
stop_machine() handler and as such executes on specific CPUs, 
and your change modifies this. OTOH, i had a look at these 
handlers and it all looks safe. Jeremy?

> +resume_devices:
> +	resume_device_irqs();

Small style nit: labels should start with a space character. 
I.e. it should be:

> + resume_devices:
> +	resume_device_irqs();

> +++ linux-2.6/kernel/kexec.c
> @@ -1454,7 +1454,7 @@ int kernel_kexec(void)
>  		if (error)
>  			goto Resume_devices;
>  		device_pm_lock();
> -		local_irq_disable();
> +		suspend_device_irqs();
>  		/* At this point, device_suspend() has been called,
>  		 * but *not* device_power_down(). We *must*
>  		 * device_power_down() now.  Otherwise, drivers for
> @@ -1464,8 +1464,9 @@ int kernel_kexec(void)
>  		 */
>  		error = device_power_down(PMSG_FREEZE);
>  		if (error)
> -			goto Enable_irqs;
> +			goto Resume_irqs;
>  
> +		local_irq_disable();
>  		/* Suspend system devices */
>  		error = sysdev_suspend(PMSG_FREEZE);
>  		if (error)
> @@ -1484,9 +1485,10 @@ int kernel_kexec(void)
>  	if (kexec_image->preserve_context) {
>  		sysdev_resume();
>   Power_up_devices:
> -		device_power_up(PMSG_RESTORE);
> - Enable_irqs:
>  		local_irq_enable();
> +		device_power_up(PMSG_RESTORE);
> + Resume_irqs:
> +		resume_device_irqs();
>  		device_pm_unlock();
>  		enable_nonboot_cpus();
>   Resume_devices:

(same comment about label style applies here too.)

> Index: linux-2.6/include/linux/irq.h
> ===================================================================
> --- linux-2.6.orig/include/linux/irq.h
> +++ linux-2.6/include/linux/irq.h
> @@ -65,6 +65,7 @@ typedef	void (*irq_flow_handler_t)(unsig
>  #define IRQ_SPURIOUS_DISABLED	0x00800000	/* IRQ was disabled by the spurious trap */
>  #define IRQ_MOVE_PCNTXT		0x01000000	/* IRQ migration from process context */
>  #define IRQ_AFFINITY_SET	0x02000000	/* IRQ affinity was set from userspace*/
> +#define IRQ_SUSPENDED		0x04000000	/* IRQ has gone through suspend sequence */
>  
>  #ifdef CONFIG_IRQ_PER_CPU
>  # define CHECK_IRQ_PER_CPU(var) ((var) & IRQ_PER_CPU)

Note, you should probably make PM_SLEEP depend on 
GENERIC_HARDIRQS - as this change will break the build on all 
non-genirq architectures. (sparc, alpha, etc.)

	Ingo

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-23  3:04         ` Eric W. Biederman
  2009-02-23  8:44           ` Ingo Molnar
@ 2009-02-23  8:44           ` Ingo Molnar
  2009-02-23  9:22             ` Eric W. Biederman
  2009-02-23  9:22             ` Eric W. Biederman
  1 sibling, 2 replies; 373+ messages in thread
From: Ingo Molnar @ 2009-02-23  8:44 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Rafael J. Wysocki, Linus Torvalds, LKML, Benjamin Herrenschmidt,
	Jeremy Fitzhardinge, pm list, Len Brown, Jesse Barnes,
	Thomas Gleixner


* Eric W. Biederman <ebiederm@xmission.com> wrote:

> "Rafael J. Wysocki" <rjw@sisk.pl> writes:
> 
> > On Sunday 22 February 2009, Rafael J. Wysocki wrote:
> >> On Sunday 22 February 2009, Linus Torvalds wrote:
> >> > 
> >> > On Sun, 22 Feb 2009, Rafael J. Wysocki wrote:
> > [--snip--]
> >> 
> >> Thanks a lot for your comments, I'll send an updated patch shortly.
> >
> > The updated patch is appended.
> >
> > It has been initially tested, but requires more testing, especially with APM,
> > XEN, kexec jump etc.
> >
> > Thanks,
> > Rafael
> >
> > ---
> > From: Rafael J. Wysocki <rjw@sisk.pl>
> > Subject: PM: Rework handling of interrupts during suspend-resume (rev. 2)
> >
> > Introduce two helper functions allowing us to disable device 
> > interrupts (at the IO-APIC level) during suspend or 
> > hibernation and enable them during the subsequent resume, 
> > respectively, so that the timer interrupts are enabled while 
> > "late" suspend callbacks and "early" resume callbacks 
> > provided by device drivers are being executed.
> >
> > Use these functions to rework the handling of interrupts 
> > during suspend (hibernation) and resume.  Namely, interrupts 
> > will only be disabled on the CPU right before suspending 
> > sysdevs, while device interrupts will be disabled (at the 
> > IO-APIC level), with the help of the new helper function, 
> > before calling "late" suspend callbacks provided by device 
> > drivers and analogously during resume.
> 
> I don't have an issue with the code, but I do have an issue 
> with this description of it.
> 
> Calling disable especially for ioapics does nothing directly. 
> It simply arranges for the irq to be marked pending and for 
> the irq to be masked if the irq happens.
> 
> So what you are doing is arranging so that no interrupts will 
> be delivered to drivers.  Not really disabling interrupts at 
> the IO-APIC level.
> 
> In addition not all interrupts (even on x86) go through an 
> IO-APIC anymore so describing the patch in terms of an IO-APIC 
> makes it a bit hard to understand what your intent actually 
> is.

I think this aspect has been well-understood during the 
discussion of this topic and it's just a slightly misleading 
changelog.

The new suspend code does not rely on truly disabling IRQs on 
the low level. The purpose is to not get IRQs to drivers - which 
might crash/hang/race/misbehave.

Still, it might make sense to not just use the ->disable 
sequence but primarily the ->shutdown irqchip method (when it's 
available in the irqchip).

While we obviously cannot turn off the PIC that delivers timer 
IRQs at this stage - there's no theoretical reason why the 
suspend sequence couldn't power down some secondary PICs as well 
- in some arch code, or maybe even in the generic driver suspend 
sequence if the device tree is structured carefully enough so 
that the PIC gets turned off last.

So turning off all device IRQs in the most low-level way possible 
would be prudent. I.e. the suspend stage should do:

                if (desc->chip->shutdown)
                        desc->chip->shutdown(irq);
                else
                        desc->chip->disable(irq);

(there's no change needed for the resume stage)
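
The fallback can be tried out in userspace with a simplified stand-in
for the irqchip. This is only a sketch: struct mock_chip, the counting
callbacks and mock_suspend_irq() are hypothetical names, not kernel API,
and the only assumption carried over is that ->shutdown may be NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical stand-in for an irqchip. */
struct mock_chip {
	void (*shutdown)(unsigned int irq);	/* optional, may be NULL */
	void (*disable)(unsigned int irq);
};

static unsigned int shutdown_calls, disable_calls;

/* Callbacks that only count how often they were invoked. */
static void count_shutdown(unsigned int irq) { (void)irq; shutdown_calls++; }
static void count_disable(unsigned int irq)  { (void)irq; disable_calls++; }

/* Prefer the stronger ->shutdown method, fall back to ->disable. */
static void mock_suspend_irq(const struct mock_chip *chip, unsigned int irq)
{
	if (chip->shutdown)
		chip->shutdown(irq);
	else
		chip->disable(irq);
}
```

A chip that provides ->shutdown gets exactly one shutdown call and no
disable call; a chip without it gets the disable call instead.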

	Ingo

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-23  8:44           ` Ingo Molnar
  2009-02-23  9:22             ` Eric W. Biederman
@ 2009-02-23  9:22             ` Eric W. Biederman
  2009-02-23  9:44               ` Ingo Molnar
                                 ` (3 more replies)
  1 sibling, 4 replies; 373+ messages in thread
From: Eric W. Biederman @ 2009-02-23  9:22 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rafael J. Wysocki, Linus Torvalds, LKML, Benjamin Herrenschmidt,
	Jeremy Fitzhardinge, pm list, Len Brown, Jesse Barnes,
	Thomas Gleixner

Ingo Molnar <mingo@elte.hu> writes:


> I think this aspect has been well-understood during the 
> discussion of this topic and it's just a slightly misleading 
> changelog.

As I was a member of that discussion I did not see that.

It took me several passes through the patches to realize
the goal is to allow drivers to be able to sleep while they
are in their late pm shutdown routines.

Why we want this I don't know.  But it seems simple enough
to implement, and it makes it harder to get the late pm
suspend routines wrong, which is always good.

> The new suspend code does not rely on truly disabling IRQs on 
> the low level. The purpose is to not get IRQs to drivers - which 
> might crash/hang/race/misbehave.

Reasonable.  I expect one of the problems with drivers getting it
wrong is that the interface is too complex for mortal humans to
understand.

> Still, it might make sense to not just use the ->disable 
> sequence but primarily the ->shutdown irqchip method (when it's 
> available in the irqchip).

Disable seems fine to me.  This is interesting in the context
of all of the irqs that will, when masked, show up somewhere
else (think boot interrupts).

> While we obviously cannot turn off the PIC that delivers timer 
> IRQs at this stage - there's no theoretical reason why the 
> suspend sequence couldnt power down some secondary PICs as well 
> - in some arch code, or maybe even in the generic driver suspend 
> sequence if the device tree is structured carefully enough so 
> that the PIC gets turned off last.

If the point is simply to prevent delivery of irqs to the drivers
I don't see the point of anything more than what the patch does
now.

Eric

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-23  9:22             ` Eric W. Biederman
@ 2009-02-23  9:44               ` Ingo Molnar
  2009-02-23 10:42                 ` Eric W. Biederman
  2009-02-23 10:42                 ` Eric W. Biederman
  2009-02-23  9:44               ` Ingo Molnar
                                 ` (2 subsequent siblings)
  3 siblings, 2 replies; 373+ messages in thread
From: Ingo Molnar @ 2009-02-23  9:44 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Rafael J. Wysocki, Linus Torvalds, LKML, Benjamin Herrenschmidt,
	Jeremy Fitzhardinge, pm list, Len Brown, Jesse Barnes,
	Thomas Gleixner


* Eric W. Biederman <ebiederm@xmission.com> wrote:

> Ingo Molnar <mingo@elte.hu> writes:
> 
> > I think this aspect has been well-understood during the 
> > discussion of this topic and it's just a slightly misleading 
> > changelog.
> 
> As I was a member of that discussion I did not see that.
> 
> It took me several passes through the patches to realize the 
> goal is to allow drivers to be able to sleep while they are in 
> their late pm shutdown routines.
> 
> Why we want this I don't know.  But it seems simple enough to 
> implement, and it makes it harder to get the late pm suspend 
> routines wrong, which is always good.

That's not the only goal. The other goal is to further shrink a 
particular window of suspend fragility: the irqs-disabled stage 
of the suspend/resume sequence.

Since suspend/resume is a mini-reboot sequence, there's a large 
amount of code executed - and the variety of code is large as 
well. We had repeated cases of random drivers re-enabling 
interrupts and thus breaking other drivers - and these are nasty 
to debug.

So this patchset disables device IRQs centrally and serializes 
with pending work - so there are no races with pending IRQs 
anymore.

There are two reasons why we keep the timer irq running. Firstly, 
the timer code is special and not really part of the regular 
suspend/resume sequence.

Secondly, drivers want to take timestamps, sometimes they even 
want to do a small usleep(), etc. Ideally the suspend/resume code 
is pretty much _the same_ as the regular bootup (and shutdown) 
code - so we want to provide an environment similar to the one 
in which drivers initialize and deinitialize, and we want to 
enable them to share code between bootup/shutdown and 
suspend/resume aggressively.

So the more generic kernel environment we give these fragile 
handlers, the better off we are in the end. Since we already had 
IRQF_TIMER, that was just the natural thing to do.
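
As a rough userspace illustration of "disable device IRQs centrally but
leave the timer alone": the sketch below walks a table of lines, skips
the timer-flagged one, and marks every line it disables so that resume
re-enables only those. All MOCK_* flags, struct mock_irq and both
functions are hypothetical; in particular, the handling of lines that a
driver had already disabled is an assumption made for this sketch, not
something taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_IRQF_TIMER		0x1u	/* hypothetical: line drives the timer */
#define MOCK_IRQ_DISABLED	0x2u	/* hypothetical: line is disabled */
#define MOCK_IRQ_SUSPENDED	0x4u	/* hypothetical: disabled by suspend pass */

struct mock_irq {
	unsigned int flags;
};

/* Disable every line except timer lines; remember which ones we touched. */
static void mock_suspend_device_irqs(struct mock_irq *irqs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (irqs[i].flags & MOCK_IRQF_TIMER)
			continue;	/* timer keeps running */
		if (irqs[i].flags & MOCK_IRQ_DISABLED)
			continue;	/* assumption: leave driver-disabled lines alone */
		irqs[i].flags |= MOCK_IRQ_DISABLED | MOCK_IRQ_SUSPENDED;
	}
}

/* Re-enable only the lines that the suspend pass disabled. */
static void mock_resume_device_irqs(struct mock_irq *irqs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!(irqs[i].flags & MOCK_IRQ_SUSPENDED))
			continue;
		irqs[i].flags &= ~(MOCK_IRQ_DISABLED | MOCK_IRQ_SUSPENDED);
	}
}
```

The per-line "suspended" marker plays the role that the new
IRQ_SUSPENDED status bit has in the patch: it records that the line went
through the suspend sequence, so resume does not enable anything else.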

> > The new suspend code does not rely on truly disabling IRQs 
> > on the low level. The purpose is to not get IRQs to drivers 
> > - which might crash/hang/race/misbehave.
> 
> Reasonable.  I expect one of the problems with drivers getting 
> it wrong is that the interface is too complex for mortal 
> humans to understand.

The suspend/resume state machine certainly used to be a piece of 
code that made a seasoned kernel developer weep in fear.

That has changed drastically in the past few months. The 
suspend+hibernation logic got unified (at least as far as driver 
methods go), and all the flow and ordering has been cleaned up 
and has been made more robust.

What makes s2ram fragile is not human failure but the 
combination of a handful of physical properties:

1) Psychology: shutting the lid or pushing the suspend button is 
   a deceptively 'simple' action to the user. But under the 
   hood, a ton of stuff happens: we deinitialize a lot of 
   things, we go through _all hardware state_, and we do so in a 
   serial fashion. If just one piece fails to do the right 
   thing, the box might not resume. Still, the user expects this 
   'simple' thing to just work, all the time. No excuses 
   accepted.

2) Length of code: To get a successful s2ram sequence the kernel
   runs through tens of thousands of lines of code. Code which
   never gets executed on a normal box - only if we s2ram. If 
   just one step fails, we get a hung box.

3) Debuggability: a lot of s2ram code runs with the console off, 
   making any bugs hard to debug. Furthermore we have no 
   meaningful persistent storage either for kernel bug messages. 
   The RTC trick of PM_DEBUG works but is a very narrow channel 
   of information and it takes a lot of time to debug a bug via 
   that method.

The combination of these factors really makes for a perfect 
storm in terms of kernel technology: we have this 
very-deceptively-simple-looking but complex-and-rarely-executed 
piece of code, which is very hard to debug.

Even just one of these factors would be enough to make an 
otherwise healthy subsystem fragile - no wonder s2ram has been a 
problem ever since it existed in the upstream kernel.

So now we need just one thing: patience and more of the same 
good stuff that happened lately.

> > Still, it might make sense to not just use the ->disable 
> > sequence but primarily the ->shutdown irqchip method (when 
> > it's available in the irqchip).
> 
> Disable seems fine to me.  This is interesting in the context 
> of all of the irqs that will when masked show up somewhere 
> else (think boot interrupts).
> 
> > While we obviously cannot turn off the PIC that delivers 
> > timer IRQs at this stage - there's no theoretical reason why 
> > the suspend sequence couldnt power down some secondary PICs 
> > as well - in some arch code, or maybe even in the generic 
> > driver suspend sequence if the device tree is structured 
> > carefully enough so that the PIC gets turned off last.
> 
> If the point is simply to prevent deliver of irqs to the 
> drivers I don't see the point of anything more than what the 
> patch does now.

... except for the use case i described above. Say some PIC sits 
on a piece of silicon which gets turned off. I'm not talking 
about x86 but some custom device. We really don't want that IRQ 
line to send half of an IRQ message (un-ACK-ed) when it gets 
turned off. So physically 'suspending' all IRQ lines does make a 
certain level of long-term sense.

Especially if it's just 3 extra lines of code to the existing 
patch.

There _might_ be one downside: overhead of ->shutdown() methods. 
With a typical IRQ count on the typical netbook i doubt it's 
more than ~50 usecs combined.

	Ingo

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-23  9:22             ` Eric W. Biederman
                                 ` (2 preceding siblings ...)
  2009-02-23 10:13               ` Benjamin Herrenschmidt
@ 2009-02-23 10:13               ` Benjamin Herrenschmidt
  3 siblings, 0 replies; 373+ messages in thread
From: Benjamin Herrenschmidt @ 2009-02-23 10:13 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Ingo Molnar, Rafael J. Wysocki, Linus Torvalds, LKML,
	Jeremy Fitzhardinge, pm list, Len Brown, Jesse Barnes,
	Thomas Gleixner

On Mon, 2009-02-23 at 01:22 -0800, Eric W. Biederman wrote:
> Ingo Molnar <mingo@elte.hu> writes:
> 
> 
> > I think this aspect has been well-understood during the 
> > discussion of this topic and it's just a slightly misleading 
> > changelog.
> 
> As I was a member of that discussion I did not see that.
> 
> It took me several passes through the patches to realize
> the goal is to allow drivers to be able to sleep while they
> are in their late pm shutdown routines.
> 
> Why we want this I don't know.  But it seems simple enough
> to implement, and it makes it harder to get the late pm
> suspend routines wrong, which is always good.

To simplify (it's really all in the discussion we had the last few
weeks) It boils down to being able to do the proper ACPI calls (which
require core interrupts to be on, ie, ACPI uses mutexes, sleeps, etc...)
after we have saved and before we restore the PCI config space, in the
late suspend or early resume stages of devices.

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-23  9:44               ` Ingo Molnar
@ 2009-02-23 10:42                 ` Eric W. Biederman
  2009-02-23 11:03                   ` Rafael J. Wysocki
                                     ` (3 more replies)
  2009-02-23 10:42                 ` Eric W. Biederman
  1 sibling, 4 replies; 373+ messages in thread
From: Eric W. Biederman @ 2009-02-23 10:42 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rafael J. Wysocki, Linus Torvalds, LKML, Benjamin Herrenschmidt,
	Jeremy Fitzhardinge, pm list, Len Brown, Jesse Barnes,
	Thomas Gleixner

Ingo Molnar <mingo@elte.hu> writes:

> * Eric W. Biederman <ebiederm@xmission.com> wrote:
>
>> Ingo Molnar <mingo@elte.hu> writes:
>> 
>> > I think this aspect has been well-understood during the 
>> > discussion of this topic and it's just a slightly misleading 
>> > changelog.
>> 
>> As I was a member of that discussion I did not see that.
>> 
>> It took me several passes through the patches to realize the 
>> goal is to allow drivers to be able to sleep while they are in 
>> their late pm shutdown routines.
>> 
>> Why we want this I don't know.  But it seems simple enough to 
>> implement, and it makes it harder to get the late pm suspend 
>> routines wrong, which is always good.
>
> That's not the only goal. The other goal is to further shrink a 
> particular window of suspend fragility: the irqs-disabled stage 
> of the suspend/resume sequence.
>
> Since suspend/resume is a mini-reboot sequence, there's a large 
> amount of code executed - and the variety of code is large as 
> well. We had repeat cases of random drivers re-enabling 
> interrupts and thus breaking other drivers - and these are nasty 
> to debug.
>
> So this patchset disables device IRQs centrally and serializes 
> with pending work - so there's no races with pending IRQs 
> anymore.
>
> The fact that we keep the timer irq running is two-fold: firstly 
> the timer code is special and not really part of the regular 
> suspend/resume sequence.
>
> Drivers want to take timestamps, sometimes they even want to do 
> a small usleep(), etc. Ideally the suspend/resume code is pretty 
> much _the same_ as a regular bootup (and shutdown) code - so we 
> want to provide a similar environment to how drivers initialize 
> and deinitialize, and we want to enable them to share code 
> between bootup/shutdown and suspend/resume aggressively.
>
> So the more generic kernel environment we give these fragile 
> handlers, the better we are off in the end. Since we already had 
> IRQF_TIMER, that was just the natural thing to do.
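
For readers following along, the central disabling described in the quoted
text can be modeled in userspace roughly as below.  This is only a sketch
under assumptions: struct model_irq, MODEL_IRQF_TIMER and both functions are
hypothetical names invented for illustration, not the code from the patch.
The idea shown is that the core masks every requested line except those
marked as timer interrupts, and on resume unmasks only what it masked itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model flag; stands in for the kernel's IRQF_TIMER. */
#define MODEL_IRQF_TIMER 0x1u

struct model_irq {
	unsigned int irq;
	unsigned int flags;	/* MODEL_IRQF_* bits */
	bool has_handler;	/* a driver has requested this line */
	bool suspended;		/* masked by the suspend core, not by a driver */
};

/* Mask every requested line except timer lines; return how many were masked. */
int model_suspend_device_irqs(struct model_irq *irqs, size_t n)
{
	int masked = 0;

	assert(irqs != NULL);
	for (size_t i = 0; i < n; i++) {
		if (!irqs[i].has_handler || (irqs[i].flags & MODEL_IRQF_TIMER))
			continue;
		irqs[i].suspended = true;
		masked++;
	}
	/* Real code would also have to wait here for handlers still running. */
	return masked;
}

/* Unmask only the lines the suspend core masked itself. */
int model_resume_device_irqs(struct model_irq *irqs, size_t n)
{
	int unmasked = 0;

	assert(irqs != NULL);
	for (size_t i = 0; i < n; i++) {
		if (!irqs[i].suspended)
			continue;
		irqs[i].suspended = false;
		unmasked++;
	}
	return unmasked;
}
```

With a table holding one timer line, two requested device lines and one
unused line, the suspend call would mask only the two device lines and leave
the timer running, which is what lets late suspend callbacks sleep.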

I am all for sharing code, especially if we can factor out the
common pieces that do the same thing.

I don't know how many times I have found drivers doing something
weird in their shutdown routines that they don't know how
to get the device out of.  The e1000 driver has shown up several
times because it likes to suspend the device on shutdown.

The fact that the methods exposed to drivers were only defined
to be usable on the s2ram/hibernate path is something I have
brought up on more than one occasion as a bad choice.

I'm really not convinced that the rationale for separating
out the shutdown methods from the remove methods has
been very good.  That rationale being: we don't need to clean up
the in-kernel data structures on reboot, so why do something extra
that can introduce instability?

I have been watching a smaller form of this drama on the
reboot path for several years, and that is with a device method
that has fixed semantics, not the DWIM semantics of the historical
suspend routine.  So I expect there is still a ways to go before
it is simple and easy for drivers to figure out what they need
to implement out of the confusing variety of possible device
methods.

>> > The new suspend code does not rely on truly disabling IRQs 
>> > on the low level. The purpose is to not get IRQs to drivers 
>> > - which might crash/hang/race/misbehave.
>> 
>> Reasonable.  I expect one of the problems with drivers getting 
>> it wrong is that the interface is too complex for mortal 
>> humans to understand.
>
> The suspend/resume state machine certainly used to be a piece of 
> code that makes a seasoned kernel developer weep in fear.
>
> That has changed drastically in the past few months. The 
> suspend+hibernation logic got unified (at least as far as driver 
> methods go), and all the flow and ordering has been cleaned up 
> and has been made more robust.

I will have to look again.  My impression is that overloading
a single method is part of what got us into this mess in the
first place.

Not that I don't see things getting better.

> What makes s2ram fragile is not human failure but the 
> combination of a handful of physical properties:
>
> 1) Psychology: shutting the lid or pushing the suspend button is 
>    a deceivingly 'simple' action to the user. But under the 
>    hood, a ton of stuff happens: we deinitialize a lot of 
>    things, we go through _all hardware state_, and we do so in a 
>    serial fashion. If just one piece fails to do the right 
>    thing, the box might not resume. Still, the user expects this 
>    'simple' thing to just work, all the time. No excuses 
>    accepted.
>
> 2) Length of code: To get a successful s2ram sequence the kernel
>    runs through tens of thousands of lines of code. Code which
>    never gets executed on a normal box - only if we s2ram. If 
>    just one step fails, we get a hung box.
>
> 3) Debuggability: a lot of s2ram code runs with the console off, 
>    making any bugs hard to debug. Furthermore we have no 
>    meaningful persistent storage either for kernel bug messages. 
>    The RTC trick of PM_DEBUG works but is a very narrow channel 
>    of information and it takes a lot of time to debug a bug via 
>    that method.

Yep that is an issue.

> The combination of these factors really makes up for a perfect 
> storm in terms of kernel technology: we have this 
> very-deceivingly-simple-looking but complex-and-rarely-executed 
> piece of code, which is very hard to debug.

And much of this, as you are finding with this piece of code,
comes from how the software was designed rather than how the
software needed to be.

> Even just one of these factors would be enough to make an 
> otherwise healthy subsystem fragile - no wonder s2ram has been a 
> problem ever since it existed in the upstream kernel.
>
> So now we need just one thing: patience and more of the same 
> good stuff that happened lately.

I think there has been some good progress, and so I am happy
to be patient.  I will still mention on occasion what it
seems we are doing wrong.  Unfortunately I don't have time
to do a lot more than that.

>> > Still, it might make sense to not just use the ->disable 
>> > sequence but primarily the ->shutdown irqchip method (when 
>> > it's available in the irqchip).
>> 
>> Disable seems fine to me.  This is interesting in the context 
>> of all of the irqs that will when masked show up somewhere 
>> else (think boot interrupts).
>> 
>> > While we obviously cannot turn off the PIC that delivers 
>> > timer IRQs at this stage - there's no theoretical reason why 
>> > the suspend sequence couldnt power down some secondary PICs 
>> > as well - in some arch code, or maybe even in the generic 
>> > driver suspend sequence if the device tree is structured 
>> > carefully enough so that the PIC gets turned off last.
>> 
>> If the point is simply to prevent delivery of irqs to the 
>> drivers I don't see the point of anything more than what the 
>> patch does now.
>
> ... except for the usecase i described above. Say some PIC sits 
> on a piece of silicon which gets turned off. I'm not talking 
> about x86 but some custom device. We really dont want that IRQ 
> line to send half of an IRQ message (un-ACK-ed) when it gets 
> turned off. So physically 'suspending' all IRQ lines does make a 
> certain level of long-term sense.

Good point.  We will lose both level and edge triggered events
that occur between suspending the irqs and restoring them but
that is inevitable.  So we might as well call shutdown and totally
turn off the irqs if we can.
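
To make the "call shutdown if we can" idea concrete, here is a small
userspace sketch.  Everything in it is an assumption made for illustration:
struct model_irq_chip only mirrors the two irqchip methods named in this
thread, and model_quiesce_irq() and the counting callbacks are hypothetical
helpers, not kernel functions.  It prefers ->shutdown when the chip provides
one and falls back to ->disable otherwise.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an irqchip; only the two methods discussed. */
struct model_irq_chip {
	void (*disable)(unsigned int irq);
	void (*shutdown)(unsigned int irq);	/* optional, may be NULL */
};

enum model_quiesce { MODEL_USED_SHUTDOWN, MODEL_USED_DISABLE };

/* Counters and callbacks that only record which method was invoked. */
int model_disable_calls;
int model_shutdown_calls;

void model_count_disable(unsigned int irq)
{
	(void)irq;
	model_disable_calls++;
}

void model_count_shutdown(unsigned int irq)
{
	(void)irq;
	model_shutdown_calls++;
}

/* Turn the line fully off when the chip can, otherwise just disable it. */
enum model_quiesce model_quiesce_irq(const struct model_irq_chip *chip,
				     unsigned int irq)
{
	assert(chip != NULL && chip->disable != NULL);
	if (chip->shutdown) {
		chip->shutdown(irq);
		return MODEL_USED_SHUTDOWN;
	}
	chip->disable(irq);
	return MODEL_USED_DISABLE;
}
```

A chip that fills in both pointers gets its line shut down; one that leaves
shutdown NULL is only disabled, which matches what the patch does today.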

I don't know where in the state machine this is getting called, but
I would suggest doing this before we shut down the cpus.  We are quickly
reaching the point where laptops will exceed the 8-core limit of
lowest priority delivery mode.  And only in lowest priority delivery
mode is it possible to migrate irqs outside of the interrupt handlers.

On top of that, if we suspend the irqs before shutting down the cpus,
we can safely support more vectors than a single cpu can catch.
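
The arithmetic behind that can be sketched as follows.  The only hardware
facts assumed are that an x86 cpu has 256 interrupt vectors and that the
first 32 are reserved for exceptions; the kernel keeps further vectors for
its own use, so the real per-cpu number is lower than this upper bound.  The
function names are hypothetical and exist only for this illustration.

```c
#include <assert.h>

enum {
	MODEL_IDT_ENTRIES = 256,	/* vectors per x86 cpu */
	MODEL_EXCEPTION_VECTORS = 32,	/* vectors 0-31 are reserved */
};

/* Upper bound on vectors usable for device irqs with `cpus` cpus online. */
unsigned int model_vector_capacity(unsigned int cpus)
{
	return cpus * (MODEL_IDT_ENTRIES - MODEL_EXCEPTION_VECTORS);
}

/* Non-zero if `active_irqs` lines, each needing its own vector, still fit. */
int model_irqs_fit(unsigned int active_irqs, unsigned int cpus)
{
	return active_irqs <= model_vector_capacity(cpus);
}
```

Under these assumptions, 300 active irqs fit on two cpus but not on the one
cpu left after the others go offline.  If the irqs are suspended first, no
active vectors are left to squeeze onto the boot cpu.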

I was a little worried about the shutdown code path because in the
worst case it requires acking a level triggered irq while we have
it disabled, but looking at ack_apic_level that appears to be a well
tested code path.  We just can't reprogram the vector.

> There _might_ be one downside: overhead of ->shutdown() methods. 
> With a typical IRQ count on the typical netbook i doubt it's 
> more than ~50 usecs combined.

Eric

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-23  1:23             ` Linus Torvalds
  (?)
@ 2009-02-23 10:52             ` Rafael J. Wysocki
  -1 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-02-23 10:52 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: LKML, Ingo Molnar, Eric W. Biederman, Benjamin Herrenschmidt,
	Jeremy Fitzhardinge, pm list, Len Brown, Jesse Barnes,
	Thomas Gleixner

On Monday 23 February 2009, Linus Torvalds wrote:
> 
> On Sun, 22 Feb 2009, Linus Torvalds wrote:
> > 
> > Ok, looks sane to me. I'll try it on my poor eeepc, although right now 
> > Fedora-11 rawhide (that poor laptop gets _all_ the crazy stuff thrown at 
> > it, and runs btrfs to boot) has broken something in X by enabling DRI2, so 
> > I may not get around to it today.
> 
> Well, the suspend/resume part seems to work for me.

Great, thanks for testing.

Rafael

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-23 10:42                 ` Eric W. Biederman
@ 2009-02-23 11:03                   ` Rafael J. Wysocki
  2009-02-23 15:28                     ` Eric W. Biederman
  2009-02-23 15:28                     ` Eric W. Biederman
  2009-02-23 11:03                   ` Rafael J. Wysocki
                                     ` (2 subsequent siblings)
  3 siblings, 2 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-02-23 11:03 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Ingo Molnar, Linus Torvalds, LKML, Benjamin Herrenschmidt,
	Jeremy Fitzhardinge, pm list, Len Brown, Jesse Barnes,
	Thomas Gleixner

On Monday 23 February 2009, Eric W. Biederman wrote:
> Ingo Molnar <mingo@elte.hu> writes:
> 
> > * Eric W. Biederman <ebiederm@xmission.com> wrote:
> >
> >> Ingo Molnar <mingo@elte.hu> writes:
> >> 
> >> > I think this aspect has been well-understood during the 
> >> > discussion of this topic and it's just a slightly misleading 
> >> > changelog.
> >> 
> >> As I was a member of that discussion I did not see that.
> >> 
> >> It took me several passes through the patches to realize the 
> >> goal is to allow drivers to be able to sleep while they are in 
> >> their late pm shutdown routines.
> >> 
> >> Why we want this I don't know.  But it seems simple enough to 
> >> implement, and it makes it harder to get the late pm suspend 
> >> routines wrong, which is always good.
> >
> > That's not the only goal. The other goal is to further shrink a 
> > particular window of suspend fragility: the irqs-disabled stage 
> > of the suspend/resume sequence.
> >
> > Since suspend/resume is a mini-reboot sequence, there's a large 
> > amount of code executed - and the variety of code is large as 
> > well. We had repeat cases of random drivers re-enabling 
> > interrupts and thus breaking other drivers - and these are nasty 
> > to debug.
> >
> > So this patchset disables device IRQs centrally and serializes 
> > with pending work - so there's no races with pending IRQs 
> > anymore.
> >
> > The fact that we keep the timer irq running is two-fold: firstly 
> > the timer code is special and not really part of the regular 
> > suspend/resume sequence.
> >
> > Drivers want to take timestamps, sometimes they even want to do 
> > a small usleep(), etc. Ideally the suspend/resume code is pretty 
> > much _the same_ as a regular bootup (and shutdown) code - so we 
> > want to provide a similar environment to how drivers initialize 
> > and deinitialize, and we want to enable them to share code 
> > between bootup/shutdown and suspend/resume aggressively.
> >
> > So the more generic kernel environment we give these fragile 
> > handlers, the better we are off in the end. Since we already had 
> > IRQF_TIMER, that was just the natural thing to do.
> 
> I am all for sharing code, especially if we can factor out the
> common pieces that do the same thing.
> 
> I don't know how many times I have found drivers doing something
> weird in their shutdown routines that they don't know how
> to get the device out of.  The e1000 driver has shown up several
> times because it likes to suspend the device on shutdown.
> 
> The fact that the methods exposed to drivers were only defined
> to be usable on the s2ram/hibernate path is something I have
> brought up on more than one occasion as a bad choice.
> 
> I'm really not convinced that the rationale for separating
> out the shutdown methods from the remove methods has
> been very good.  That rationale being: we don't need to clean up
> the in-kernel data structures on reboot, so why do something extra
> that can introduce instability?
> 
> I have been watching a smaller form of this drama on the
> reboot path for several years, and that is with a device method
> that has fixed semantics, not the DWIM semantics of the historical
> suspend routine.  So I expect there is still a ways to go before
> it is simple and easy for drivers to figure out what they need
> to implement out of the confusing variety of possible device
> methods.
> 
> >> > The new suspend code does not rely on truly disabling IRQs 
> >> > on the low level. The purpose is to not get IRQs to drivers 
> >> > - which might crash/hang/race/misbehave.
> >> 
> >> Reasonable.  I expect one of the problems with drivers getting 
> >> it wrong is that the interface is too complex for mortal 
> >> humans to understand.
> >
> > The suspend/resume state machine certainly used to be a piece of 
> > code that makes a seasoned kernel developer weep in fear.
> >
> > That has changed drastically in the past few months. The 
> > suspend+hibernation logic got unified (at least as far as driver 
> > methods go), and all the flow and ordering has been cleaned up 
> > and has been made more robust.
> 
> I will have to look again.  My impression is that overloading
> a single method is part of what got us into this mess in the
> first place.
> 
> Not that I don't see things getting better.
> 
> > What makes s2ram fragile is not human failure but the 
> > combination of a handful of physical properties:
> >
> > 1) Psychology: shutting the lid or pushing the suspend button is 
> >    a deceivingly 'simple' action to the user. But under the 
> >    hood, a ton of stuff happens: we deinitialize a lot of 
> >    things, we go through _all hardware state_, and we do so in a 
> >    serial fashion. If just one piece fails to do the right 
> >    thing, the box might not resume. Still, the user expects this 
> >    'simple' thing to just work, all the time. No excuses 
> >    accepted.
> >
> > 2) Length of code: To get a successful s2ram sequence the kernel
> >    runs through tens of thousands of lines of code. Code which
> >    never gets executed on a normal box - only if we s2ram. If 
> >    just one step fails, we get a hung box.
> >
> > 3) Debuggability: a lot of s2ram code runs with the console off, 
> >    making any bugs hard to debug. Furthermore we have no 
> >    meaningful persistent storage either for kernel bug messages. 
> >    The RTC trick of PM_DEBUG works but is a very narrow channel 
> >    of information and it takes a lot of time to debug a bug via 
> >    that method.
> 
> Yep that is an issue.
> 
> > The combination of these factors really makes up for a perfect 
> > storm in terms of kernel technology: we have this 
> > very-deceivingly-simple-looking but complex-and-rarely-executed 
> > piece of code, which is very hard to debug.
> 
> And much of this, as you are finding with this piece of code,
> comes from how the software was designed rather than how the
> software needed to be.
> 
> > Even just one of these factors would be enough to make an 
> > otherwise healthy subsystem fragile - no wonder s2ram has been a 
> > problem ever since it existed in the upstream kernel.
> >
> > So now we need just one thing: patience and more of the same 
> > good stuff that happened lately.
> 
> I think there has been some good progress, and so I am happy
> to be patient.  I will still mention on occasion what it
> seems we are doing wrong.  Unfortunately I don't have time
> to do a lot more than that.
> 
> >> > Still, it might make sense to not just use the ->disable 
> >> > sequence but primarily the ->shutdown irqchip method (when 
> >> > it's available in the irqchip).
> >> 
> >> Disable seems fine to me.  This is interesting in the context 
> >> of all of the irqs that will when masked show up somewhere 
> >> else (think boot interrupts).
> >> 
> >> > While we obviously cannot turn off the PIC that delivers 
> >> > timer IRQs at this stage - there's no theoretical reason why 
> >> > the suspend sequence couldnt power down some secondary PICs 
> >> > as well - in some arch code, or maybe even in the generic 
> >> > driver suspend sequence if the device tree is structured 
> >> > carefully enough so that the PIC gets turned off last.
> >> 
> >> If the point is simply to prevent delivery of irqs to the 
> >> drivers I don't see the point of anything more than what the 
> >> patch does now.
> >
> > ... except for the usecase i described above. Say some PIC sits 
> > on a piece of silicon which gets turned off. I'm not talking 
> > about x86 but some custom device. We really dont want that IRQ 
> > line to send half of an IRQ message (un-ACK-ed) when it gets 
> > turned off. So physically 'suspending' all IRQ lines does make a 
> > certain level of long-term sense.
> 
> Good point.  We will lose both level and edge triggered events
> that occur between suspending the irqs and restoring them but
> that is inevitable.  So we might as well call shutdown and totally
> turn off the irqs if we can.
> 
> I don't know where in the state machine this is getting called, but
> I would suggest doing this before we shut down the cpus.

This is the plan.  In fact, I'm going to do this in the next patch after the
$subject one has been tested and found acceptable.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-23 10:42                 ` Eric W. Biederman
  2009-02-23 11:03                   ` Rafael J. Wysocki
@ 2009-02-23 11:03                   ` Rafael J. Wysocki
  2009-02-23 11:04                   ` Ingo Molnar
  2009-02-23 11:04                   ` Ingo Molnar
  3 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-02-23 11:03 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Jeremy Fitzhardinge, LKML, Jesse Barnes, Thomas Gleixner,
	Ingo Molnar, Linus Torvalds, pm list

On Monday 23 February 2009, Eric W. Biederman wrote:
> Ingo Molnar <mingo@elte.hu> writes:
> 
> > * Eric W. Biederman <ebiederm@xmission.com> wrote:
> >
> >> Ingo Molnar <mingo@elte.hu> writes:
> >> 
> >> > I think this aspect has been well-understood during the 
> >> > discussion of this topic and it's just a slightly misleading 
> >> > changelog.
> >> 
> >> As I was a member of that discussion I did not see that.
> >> 
> >> It took me several passes through the patches to realize the 
> >> goal is to allow drivers to be able to sleep while they are in 
> >> their late pm shutdown routines.
> >> 
> >> Why we want this I don't know.  But it seems simple enough to 
> >> implement, and it makes it harder to get the late pm suspend 
> >> routines wrong, which is always good.
> >
> > That's not the only goal. The other goal is to further shrink a 
> > particular window of suspend fragility: the irqs-disabled stage 
> > of the suspend/resume sequence.
> >
> > Since suspend/resume is a mini-reboot sequence, there's a large 
> > amount of code executed - and the variety of code is large as 
> > well. We had repeat cases of random drivers re-enabling 
> > interrupts and thus breaking other drivers - and these are nasty 
> > to debug.
> >
> > So this patchset disables device IRQs centrally and serializes 
> > with pending work - so there's no races with pending IRQs 
> > anymore.
> >
> > The fact that we keep the timer irq running is two-fold: firstly 
> > the timer code is special and not really part of the regular 
> > suspend/resume sequence.
> >
> > Drivers want to take timestamps, sometimes they even want to do 
> > a small usleep(), etc. Ideally the suspend/resume code is pretty 
> > much _the same_ as a regular bootup (and shutdown) code - so we 
> > want to provide a similar environment to how drivers initialize 
> > and deinitialize, and we want to enable them to share code 
> > between bootup/shutdown and suspend/resume aggressively.
> >
> > So the more generic kernel environment we give these fragile 
> > handlers, the better we are off in the end. Since we already had 
> > IRQF_TIMER, that was just the natural thing to do.
> 
> I am all for sharing code, especially if we can factor out the
> common pieces that do the same thing.
> 
> I don't know how many times I have found drivers doing something
> weird in their shutdown routines that they don't know how
> to get the device out of.  The e1000 driver has shown up several
> times because it likes to suspend the device on shutdown.
> 
> The fact that the methods exposed to drivers were only defined
> to be usable on the s2ram/hibernate path is something I have
> brought up on more than one occasion as a bad choice.
> 
> I'm really not convinced that the rationale for separating
> out the shutdown methods from the remove methods has
> been very good.  That rationale being: we don't need to clean up
> the in-kernel data structures on reboot, so why do something extra
> that can introduce instability?
> 
> I have been watching a smaller form of this drama on the
> reboot path for several years, and that is with a device method
> that has fixed semantics, not the DWIM semantics of the historical
> suspend routine.  So I expect there is still a ways to go before
> it is simple and easy for drivers to figure out what they need
> to implement out of the confusing variety of possible device
> methods.
> 
> >> > The new suspend code does not rely on truly disabling IRQs 
> >> > on the low level. The purpose is to not get IRQs to drivers 
> >> > - which might crash/hang/race/misbehave.
> >> 
> >> Reasonable.  I expect one of the problems with drivers getting 
> >> it wrong is that the interface is too complex for mortal 
> >> humans to understand.
> >
> > The suspend/resume state machine certainly used to be a piece of 
> > code that makes a seasoned kernel developer weep in fear.
> >
> > That has changed drastically in the past few months. The 
> > suspend+hibernation logic got unified (at least as far as driver 
> > methods go), and all the flow and ordering has been cleaned up 
> > and has been made more robust.
> 
> I will have to look again.  My impression is that overloading
> a single method is part of what got us into this mess in the
> first place.
> 
> Not that I don't see things getting better.
> 
> > What makes s2ram fragile is not human failure but the 
> > combination of a handful of physical properties:
> >
> > 1) Psychology: shutting the lid or pushing the suspend button is 
> >    a deceivingly 'simple' action to the user. But under the 
> >    hood, a ton of stuff happens: we deinitialize a lot of 
> >    things, we go through _all hardware state_, and we do so in a 
> >    serial fashion. If just one piece fails to do the right 
> >    thing, the box might not resume. Still, the user expects this 
> >    'simple' thing to just work, all the time. No excuses 
> >    accepted.
> >
> > 2) Length of code: To get a successful s2ram sequence the kernel
> >    runs through tens of thousands of lines of code. Code which
> >    never gets executed on a normal box - only if we s2ram. If 
> >    just one step fails, we get a hung box.
> >
> > 3) Debuggability: a lot of s2ram code runs with the console off, 
> >    making any bugs hard to debug. Furthermore we have no 
> >    meaningful persistent storage either for kernel bug messages. 
> >    The RTC trick of PM_DEBUG works but is a very narrow channel 
> >    of information and it takes a lot of time to debug a bug via 
> >    that method.
> 
> Yep that is an issue.
> 
> > The combination of these factors really makes up for a perfect 
> > storm in terms of kernel technology: we have this 
> > very-deceivingly-simple-looking but complex-and-rarely-executed 
> > piece of code, which is very hard to debug.
> 
> And much of this, as you are finding with this piece of code,
> is how the software was designed rather than how the software
> needed to be.
> 
> > Even just one of these factors would be enough to make an 
> > otherwise healthy subsystem fragile - no wonder s2ram has been a 
> > problem ever since it existed in the upstream kernel.
> >
> > So now we need just one thing: patience and more of the same 
> > good stuff that happened lately.
> 
> I think there has been some good progress, and so I am happy
> to be patient.  I will still mention on occasion what it
> seems we are doing wrong.  Unfortunately I don't have time
> to do a lot more than that.
> 
> >> > Still, it might make sense to not just use the ->disable 
> >> > sequence but primarily the ->shutdown irqchip method (when 
> >> > it's available in the irqchip).
> >> 
> >> Disable seems fine to me.  This is interesting in the context 
> >> of all of the irqs that will when masked show up somewhere 
> >> else (think boot interrupts).
> >> 
> >> > While we obviously cannot turn off the PIC that delivers 
> >> > timer IRQs at this stage - there's no theoretical reason why 
> >> > the suspend sequence couldn't power down some secondary PICs 
> >> > as well - in some arch code, or maybe even in the generic 
> >> > driver suspend sequence if the device tree is structured 
> >> > carefully enough so that the PIC gets turned off last.
> >> 
> >> If the point is simply to prevent delivery of irqs to the 
> >> drivers I don't see the point of anything more than what the 
> >> patch does now.
> >
> > ... except for the use case I described above. Say some PIC sits 
> > on a piece of silicon which gets turned off. I'm not talking 
> > about x86 but some custom device. We really don't want that IRQ 
> > line to send half of an IRQ message (un-ACK-ed) when it gets 
> > turned off. So physically 'suspending' all IRQ lines does make a 
> > certain level of long-term sense.
> 
> Good point.  We will lose both level and edge triggered events
> that occur between suspending the irqs and restoring them but
> that is inevitable.  So we might as well call shutdown and totally
> turn off the irqs if we can.
> 
> I don't know where in the state machine this is getting called but
> I would suggest doing this before we shut down cpus.

This is the plan.  In fact, I'm going to do this in the next patch after the
$subject one has been tested and found acceptable.
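
For illustration only, the "use ->shutdown when the irqchip provides it, otherwise
fall back to ->disable" idea quoted above could be modeled in user space roughly as
follows.  This is a sketch under stated assumptions, not the patch: struct
demo_irq_chip, demo_quiesce_irq() and the call counters are hypothetical stand-ins,
and the real struct irq_chip has more callbacks and different locking rules.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for an irqchip: only two callbacks. */
struct demo_irq_chip {
	void (*disable)(unsigned int irq);
	void (*shutdown)(unsigned int irq);	/* optional, may be NULL */
};

/* Call counters so the effect of the sketch can be observed. */
static unsigned int demo_disable_calls;
static unsigned int demo_shutdown_calls;

static void demo_disable(unsigned int irq)
{
	(void)irq;
	demo_disable_calls++;
}

static void demo_shutdown(unsigned int irq)
{
	(void)irq;
	demo_shutdown_calls++;
}

/*
 * Quiesce one line for suspend: prefer the stronger ->shutdown method
 * when the chip has one, otherwise just mask the line via ->disable.
 */
static void demo_quiesce_irq(const struct demo_irq_chip *chip, unsigned int irq)
{
	if (chip->shutdown)
		chip->shutdown(irq);
	else
		chip->disable(irq);
}
```

Whether a line that was shut down can simply be re-enabled on resume depends on
the chip, so the resume side would need a matching startup step that this model
leaves out.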

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-23 10:42                 ` Eric W. Biederman
  2009-02-23 11:03                   ` Rafael J. Wysocki
  2009-02-23 11:03                   ` Rafael J. Wysocki
@ 2009-02-23 11:04                   ` Ingo Molnar
  2009-02-23 14:45                     ` Rafael J. Wysocki
  2009-02-23 14:45                     ` Rafael J. Wysocki
  2009-02-23 11:04                   ` Ingo Molnar
  3 siblings, 2 replies; 373+ messages in thread
From: Ingo Molnar @ 2009-02-23 11:04 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Rafael J. Wysocki, Linus Torvalds, LKML, Benjamin Herrenschmidt,
	Jeremy Fitzhardinge, pm list, Len Brown, Jesse Barnes,
	Thomas Gleixner


* Eric W. Biederman <ebiederm@xmission.com> wrote:

> > What makes s2ram fragile is not human failure but the 
> > combination of a handful of physical properties:
> >
> > 1) Psychology: shutting the lid or pushing the suspend button is 
> >    a deceivingly 'simple' action to the user. But under the 
> >    hood, a ton of stuff happens: we deinitialize a lot of 
> >    things, we go through _all hardware state_, and we do so in a 
> >    serial fashion. If just one piece fails to do the right 
> >    thing, the box might not resume. Still, the user expects this 
> >    'simple' thing to just work, all the time. No excuses 
> >    accepted.
> >
> > 2) Length of code: To get a successful s2ram sequence the kernel
> >    runs through tens of thousands of lines of code. Code which
> >    never gets executed on a normal box - only if we s2ram. If 
> >    just one step fails, we get a hung box.
> >
> > 3) Debuggability: a lot of s2ram code runs with the console off, 
> >    making any bugs hard to debug. Furthermore we have no 
> >    meaningful persistent storage either for kernel bug messages. 
> >    The RTC trick of PM_DEBUG works but is a very narrow channel 
> >    of information and it takes a lot of time to debug a bug via 
> >    that method.
> 
> Yep that is an issue.

I'd also like to add #4:

     4) One more thing that makes s2ram special is that the 
        resume path often finds hardware in an even more 
        deinitialized form than during normal bootup. During
        normal bootup the BIOS/firmware has at least done some
        minimal bootstrap (to get the kernel loaded), which
        makes life easier for the kernel.

        At s2ram stage we've got a completely pure hardware
        init state, with very minimal firmware activation. So 
        many of the init and deinit problems and bugs we only 
        hit in the s2ram path - a dynamic that is again not 
        helpful.

> > The combination of these factors really makes up for a 
> > perfect storm in terms of kernel technology: we have this 
> > very-deceivingly-simple-looking but 
> > complex-and-rarely-executed piece of code, which is very 
> > hard to debug.
> 
> And much of this, as you are finding with this piece of code, is 
> how the software was designed rather than how the software 
> needed to be.

Well most of the 4 problems above are externalities and cannot 
go away just by fixing the kernel.

 #1 will always be with us.
 #3 needs the hardware to change. It's happening, but slowly.
 #4 will be with us as long as there are non-Linux BIOSes

#2 is the only thing where we can make a realistic difference,
but there's only so much we can do there.

And that still leaves the other three items: each of which is 
a powerful enough force to give a bad name to any normal 
subsystem.

	Ingo

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-23  8:36         ` Ingo Molnar
@ 2009-02-23 11:29           ` Rafael J. Wysocki
  2009-02-23 12:28             ` Ingo Molnar
                               ` (7 more replies)
  2009-02-23 11:29           ` Rafael J. Wysocki
  1 sibling, 8 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-02-23 11:29 UTC (permalink / raw)
  To: Ingo Molnar, Johannes Berg
  Cc: Linus Torvalds, LKML, Eric W. Biederman, Benjamin Herrenschmidt,
	Jeremy Fitzhardinge, pm list, Len Brown, Jesse Barnes,
	Thomas Gleixner

On Monday 23 February 2009, Ingo Molnar wrote:
> 
> * Rafael J. Wysocki <rjw@sisk.pl> wrote:
> 
> > On Sunday 22 February 2009, Rafael J. Wysocki wrote:
> > > On Sunday 22 February 2009, Linus Torvalds wrote:
> > > > 
> > > > On Sun, 22 Feb 2009, Rafael J. Wysocki wrote:
> > [--snip--]
> > > 
> > > Thanks a lot for your comments, I'll send an updated patch shortly.
> > 
> > The updated patch is appended.
> > 
> > It has been initially tested, but requires more testing, 
> > especially with APM, XEN, kexec jump etc.
> 
> >  arch/x86/kernel/apm_32.c  |   20 ++++++++++++----
> >  drivers/xen/manage.c      |   32 +++++++++++++++----------
> >  include/linux/interrupt.h |    3 ++
> >  include/linux/irq.h       |    1 
> >  kernel/irq/manage.c       |   57 ++++++++++++++++++++++++++++++++++++++++++++++
> >  kernel/kexec.c            |   10 ++++----
> >  kernel/power/disk.c       |   46 +++++++++++++++++++++++++++++--------
> >  kernel/power/main.c       |   20 +++++++++++-----
> >  8 files changed, 152 insertions(+), 37 deletions(-)
> > 
> > Index: linux-2.6/kernel/irq/manage.c
> > ===================================================================
> > --- linux-2.6.orig/kernel/irq/manage.c
> > +++ linux-2.6/kernel/irq/manage.c
> > @@ -746,3 +746,60 @@ int request_irq(unsigned int irq, irq_ha
> >  	return retval;
> >  }
> >  EXPORT_SYMBOL(request_irq);
> > +
> > +#ifdef CONFIG_PM_SLEEP
> > +/**
> > + *	suspend_device_irqs - disable all currently enabled interrupt lines
> 
> Code placement nit: please don't put new #ifdef blocks into the 
> core IRQ code, add a kernel/irq/power.c file instead and make 
> the kbuild rule depend on PM_SLEEP.
> 
> The new suspend_device_irqs() and resume_device_irqs() don't 
> use any manage.c internals so this should work straight away.

OK, I'll do that.

> > + *
> > + *	During system-wide suspend or hibernation device interrupts need to be
> > + *	disabled at the chip level and this function is provided for this
> > + *	purpose.  It disables all interrupt lines that are enabled at the
> > + *	moment and sets the IRQ_SUSPENDED flag for them.
> > + */
> > +void suspend_device_irqs(void)
> > +{
> > +	struct irq_desc *desc;
> > +	int irq;
> > +
> > +	for_each_irq_desc(irq, desc) {
> > +		unsigned long flags;
> > +
> > +		spin_lock_irqsave(&desc->lock, flags);
> > +
> > +		if (!desc->depth && desc->action
> > +		    && !(desc->action->flags & IRQF_TIMER)) {
> > +			desc->depth++;
> > +			desc->status |= IRQ_DISABLED | IRQ_SUSPENDED;
> > +			desc->chip->disable(irq);
> > +		}
> > +
> > +		spin_unlock_irqrestore(&desc->lock, flags);
> > +	}
> > +
> > +	for_each_irq_desc(irq, desc) {
> > +		if (desc->status & IRQ_SUSPENDED)
> > +			synchronize_irq(irq);
> > +	}
> 
> Optimization/code-flow nit: a possibility might be to do a 
> single loop, i.e. i think it's safe to couple the disable+sync 
> bits [as in 99.99% of the cases there will be no in-execution 
> irq handlers when we execute this.]

Well, Linus suggested doing it in a separate loop.  I'm fine either way.

> Something like:
> 
> 		int do_sync = 0;
> 
> 		spin_lock_irqsave(&desc->lock, flags);
> 
> 		if (!desc->depth && desc->action
> 		    && !(desc->action->flags & IRQF_TIMER)) {
> 
> 			desc->depth++;
> 			desc->status |= IRQ_DISABLED | IRQ_SUSPENDED;
> 			desc->chip->disable(irq);
> 			do_sync = 1;
> 		}
> 
> 		spin_unlock_irqrestore(&desc->lock, flags);
> 
> 		if (do_sync)
> 			synchronize_irq(irq);
>
> In fact i'd suggest factoring out this logic into separate 
> __suspend_irq(irq) / __resume_irq(irq) inline helper functions. 
> (They should be inline for the time being as they are not 
> shared-irq-safe so they shouldn't really be exposed to drivers in 
> such a singular capacity.)

Good idea, I'll do it.
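
To make the shape of that helper concrete, here is a user-space model of the
single-loop variant suggested above.  It is only a sketch under stated
assumptions: struct mock_irq_desc, the MOCK_* flag values and
mock_synchronize_irq() are hypothetical stand-ins for struct irq_desc, the real
flag bits and synchronize_irq(), and the desc->lock handling and the
chip->disable() call are left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel flag bits; values are illustrative. */
#define MOCK_IRQF_TIMER		0x1u
#define MOCK_IRQ_DISABLED	0x2u
#define MOCK_IRQ_SUSPENDED	0x4u

/* Hypothetical, reduced model of struct irq_desc. */
struct mock_irq_desc {
	unsigned int depth;		/* disable nesting depth, 0 == enabled */
	bool has_action;		/* a handler is installed on this line */
	unsigned int action_flags;	/* flags of the first handler */
	unsigned int status;
	unsigned int synced;		/* times we waited for running handlers */
};

/* Stand-in for synchronize_irq(): only records that we waited. */
static void mock_synchronize_irq(struct mock_irq_desc *desc)
{
	desc->synced++;
}

/* Per-line helper: disable and then sync, skipping timer and idle lines. */
static inline void mock_suspend_irq(struct mock_irq_desc *desc)
{
	bool do_sync = false;

	/* The real code would take desc->lock around this block. */
	if (!desc->depth && desc->has_action &&
	    !(desc->action_flags & MOCK_IRQF_TIMER)) {
		desc->depth++;
		desc->status |= MOCK_IRQ_DISABLED | MOCK_IRQ_SUSPENDED;
		do_sync = true;
	}

	/* Wait outside the (omitted) lock, and only if we disabled the line. */
	if (do_sync)
		mock_synchronize_irq(desc);
}

/* Single pass over all descriptors instead of a disable pass plus a sync pass. */
static void mock_suspend_device_irqs(struct mock_irq_desc *descs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		mock_suspend_irq(&descs[i]);
}
```

With the condition living in its own function, the single loop touches each
descriptor once, which is the point of the cachemiss argument quoted below.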

> Doing so will also fix the line-break ugliness of the first 
> branch - as in a standalone function the condition fits into a 
> single line.
> 
> There's a performance reason as well: especially when we have a 
> lot of IRQ descriptors, a single pass will be about twice as fast. 
> (With a large iteration scope this function is cachemiss-limited 
> and doing two passes doubles the cachemiss rate.)
> 
> > +}
> > +EXPORT_SYMBOL_GPL(suspend_device_irqs);
> > +
> > +/**
> > + *	resume_device_irqs - enable interrupts disabled by suspend_device_irqs()
> > + *
> > + *	Enable all interrupt lines previously disabled by suspend_device_irqs()
> > + *	that have the IRQ_SUSPENDED flag set.
> > + */
> > +void resume_device_irqs(void)
> > +{
> > +	struct irq_desc *desc;
> > +	int irq;
> > +
> > +	for_each_irq_desc(irq, desc) {
> > +		if (!(desc->status & IRQ_SUSPENDED))
> > +			continue;
> > +		desc->status &= ~IRQ_SUSPENDED;
> > +		enable_irq(irq);
> > +	}
> 
> Robustness+optimization nit: this will work but could be done in 
> a nicer way: enable_irq() should auto-clear IRQ_SUSPENDED. (We 
> already clear flags there so it's even a tiny bit faster this 
> way.)

OK
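
As a rough user-space model of that change (a sketch only: struct sk_irq_line,
the SK_* flag values and both functions are hypothetical, and clearing
IRQ_SUSPENDED together with IRQ_DISABLED on the final enable is an assumption
about how the auto-clear would be done):

```c
#include <stddef.h>

/* Hypothetical flag values; the real ones live in include/linux/irq.h. */
#define SK_IRQ_DISABLED		0x1u
#define SK_IRQ_SUSPENDED	0x2u

/* Hypothetical, reduced model of one interrupt line. */
struct sk_irq_line {
	unsigned int depth;	/* disable nesting depth, 0 == enabled */
	unsigned int status;
};

/*
 * Model of enable_irq(): when the last disable is undone, clear
 * IRQ_SUSPENDED along with IRQ_DISABLED so the flag can never
 * survive on an enabled line.
 */
static void sk_enable_irq(struct sk_irq_line *line)
{
	if (line->depth == 0)
		return;		/* unbalanced enable, nothing to undo */
	if (--line->depth == 0)
		line->status &= ~(SK_IRQ_DISABLED | SK_IRQ_SUSPENDED);
}

/* The resume loop no longer has to clear the flag by hand. */
static void sk_resume_device_irqs(struct sk_irq_line *lines, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (lines[i].status & SK_IRQ_SUSPENDED)
			sk_enable_irq(&lines[i]);
}
```

A stray enable on a suspended line then leaves it in a consistent "enabled, not
suspended" state, which is the leak the quoted comment below is worried about.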

> We definitely don't want IRQ_SUSPENDED to 'leak' out into an 
> enabled line, should something call enable_irq() on a suspended 
> line. So either make it auto-unsuspend in enable_irq(), or add 
> an extra WARN_ON() to enable_irq(), to make sure IRQ_SUSPENDED 
> is always off by that time.
> 
> > +     arch_suspend_disable_irqs();
> > +     BUG_ON(!irqs_disabled());
> 
> Please. We just disabled all devices - a BUG_ON() is a very 
> counter-productive thing to do here - chances are the user will 
> never see anything but a hang. So please turn this into a nice 
> WARN_ONCE().

This is just moving code.  Also, the BUG_ON() can only affect powerpc and it's
there on purpose AFAICS (Johannes?).  Anyway, changing that would be a separate
patch.

> > --- linux-2.6.orig/include/linux/interrupt.h
> > +++ linux-2.6/include/linux/interrupt.h
> > @@ -470,4 +470,7 @@ extern int early_irq_init(void);
> >  extern int arch_early_irq_init(void);
> >  extern int arch_init_chip_data(struct irq_desc *desc, int cpu);
> >  
> > +extern void suspend_device_irqs(void);
> > +extern void resume_device_irqs(void);
> 
> Header cleanliness nit: please don't just throw new prototypes to 
> the tail of headers, but think about where they fit in best, 
> logically.
> 
> These two new prototypes should go straight after the normal irq 
> line state management functions:
> 
>   extern void disable_irq_nosync(unsigned int irq);
>   extern void disable_irq(unsigned int irq);
>   extern void enable_irq(unsigned int irq);
> 
> Perhaps also with a comment like this:
> 
> /*
>  * Note: don't use these functions in driver code - they are for 
>  * core kernel use only.
>  */

OK, I'll put them in there.

> > +++ linux-2.6/kernel/power/main.c
> [...]
> > +
> > + Unlock:
> > +	resume_device_irqs();
> 
> Small drive-by style nit: while at it could you please fix the 
> capitalization and the naming of the label (and all labels in 
> this file)?

I don't think they are wrong.  They are uniform across the file and it's
clear what they mean.

> The standard label is "out_unlock". [and "err_unlock" for failure cases
> - but this isn't a failure case.]

Where exactly is this standard defined?
 
> There are 43 such bad label names in kernel/power/*.c, see the 
> output of:
> 
>   git grep '^ [A-Z][a-z].*:$' kernel/power/

If you think they are bad, please send a patch to change them.

> > Index: linux-2.6/arch/x86/kernel/apm_32.c
> > ===================================================================
> > --- linux-2.6.orig/arch/x86/kernel/apm_32.c
> > +++ linux-2.6/arch/x86/kernel/apm_32.c
> 
> > +
> > +	suspend_device_irqs();
> >  	device_power_down(PMSG_SUSPEND);
> > +
> > +	local_irq_disable();
> 
> hm, this is a very repetitive pattern, all around the various 
> suspend/resume variants. Might make sense to make:
> 
>   	device_power_down(PMSG_SUSPEND);
> 
> do the irq line disabling plus the local irq disabling 
> automatically. That also means it cannot be forgotten. The 
> symmetric action should happen for PMSG_RESUME.
> 
> Is there ever a case where we want a different pattern?

Even if there's no such case, I prefer to call local_irq_disable() explicitly
in here, so that it's clearly known where it happens to anyone reading this
code.

Doing the "late" suspend of devices and disabling interrupts on the CPU
are separate logical steps.
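
Spelled out as a user-space trace, the ordering used in the kexec hunk of the
patch looks like the sketch below.  The step names mirror the kernel calls from
that hunk; the enum, struct seq_log and the seq_*() functions are hypothetical
scaffolding that only records the order and calls nothing real.

```c
#include <stddef.h>

/* One entry per kernel call in the kernel_kexec() hunk of the patch. */
enum seq_step {
	SEQ_SUSPEND_DEVICE_IRQS,	/* suspend_device_irqs() */
	SEQ_DEVICE_POWER_DOWN,		/* device_power_down(), "late" suspend */
	SEQ_LOCAL_IRQ_DISABLE,		/* local_irq_disable() */
	SEQ_SYSDEV_SUSPEND,		/* sysdev_suspend() */
	SEQ_SYSDEV_RESUME,		/* sysdev_resume() */
	SEQ_LOCAL_IRQ_ENABLE,		/* local_irq_enable() */
	SEQ_DEVICE_POWER_UP,		/* device_power_up(), "early" resume */
	SEQ_RESUME_DEVICE_IRQS,		/* resume_device_irqs() */
};

#define SEQ_MAX 8

/* Hypothetical trace buffer recording the order of the steps. */
struct seq_log {
	enum seq_step steps[SEQ_MAX];
	size_t len;
};

static void seq_note(struct seq_log *trace, enum seq_step step)
{
	if (trace->len < SEQ_MAX)
		trace->steps[trace->len++] = step;
}

/* Device lines go quiet first; CPU interrupts are disabled as a separate step. */
static void seq_enter_sleep(struct seq_log *trace)
{
	seq_note(trace, SEQ_SUSPEND_DEVICE_IRQS);
	seq_note(trace, SEQ_DEVICE_POWER_DOWN);
	seq_note(trace, SEQ_LOCAL_IRQ_DISABLE);
	seq_note(trace, SEQ_SYSDEV_SUSPEND);
}

/* The way back mirrors the way down, step for step. */
static void seq_leave_sleep(struct seq_log *trace)
{
	seq_note(trace, SEQ_SYSDEV_RESUME);
	seq_note(trace, SEQ_LOCAL_IRQ_ENABLE);
	seq_note(trace, SEQ_DEVICE_POWER_UP);
	seq_note(trace, SEQ_RESUME_DEVICE_IRQS);
}
```

Keeping local_irq_disable() as its own visible step is what lets the "late"
callbacks run with the timer interrupt still enabled.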

> > Index: linux-2.6/drivers/xen/manage.c
> > ===================================================================
> > --- linux-2.6.orig/drivers/xen/manage.c
> > +++ linux-2.6/drivers/xen/manage.c
> > @@ -39,12 +39,6 @@ static int xen_suspend(void *data)
> 
> > -	if (!*cancelled) {
> > -		xen_irq_resume();
> > -		xen_console_resume();
> > -		xen_timer_resume();
> 
> This change needs a second look. xen_suspend() is a 
> stop_machine() handler and as such executes on specific CPUs, 
> and your change modifies this. OTOH, i had a look at these 
> handlers and it all looks safe. Jeremy?
> 
> > +resume_devices:
> > +	resume_device_irqs();
> 
> Small style nit: labels should start with a space character. 
> I.e. it should be:

I know, but the second label in there starts without a space character and
IMO keeping a uniform coding style in a single file is more important than
trying to adjust it to a broader set of rules FWIW.  I also think that coding
style changes shouldn't be mixed with functional changes as far as reasonably
possible.

> > + resume_devices:
> > +	resume_device_irqs();
> 
> > +++ linux-2.6/kernel/kexec.c
> > @@ -1454,7 +1454,7 @@ int kernel_kexec(void)
> >  		if (error)
> >  			goto Resume_devices;
> >  		device_pm_lock();
> > -		local_irq_disable();
> > +		suspend_device_irqs();
> >  		/* At this point, device_suspend() has been called,
> >  		 * but *not* device_power_down(). We *must*
> >  		 * device_power_down() now.  Otherwise, drivers for
> > @@ -1464,8 +1464,9 @@ int kernel_kexec(void)
> >  		 */
> >  		error = device_power_down(PMSG_FREEZE);
> >  		if (error)
> > -			goto Enable_irqs;
> > +			goto Resume_irqs;
> >  
> > +		local_irq_disable();
> >  		/* Suspend system devices */
> >  		error = sysdev_suspend(PMSG_FREEZE);
> >  		if (error)
> > @@ -1484,9 +1485,10 @@ int kernel_kexec(void)
> >  	if (kexec_image->preserve_context) {
> >  		sysdev_resume();
> >   Power_up_devices:
> > -		device_power_up(PMSG_RESTORE);
> > - Enable_irqs:
> >  		local_irq_enable();
> > +		device_power_up(PMSG_RESTORE);
> > + Resume_irqs:
> > +		resume_device_irqs();
> >  		device_pm_unlock();
> >  		enable_nonboot_cpus();
> >   Resume_devices:
> 
> (same comment about label style applies here too.)
> 
> > Index: linux-2.6/include/linux/irq.h
> > ===================================================================
> > --- linux-2.6.orig/include/linux/irq.h
> > +++ linux-2.6/include/linux/irq.h
> > @@ -65,6 +65,7 @@ typedef	void (*irq_flow_handler_t)(unsig
> >  #define IRQ_SPURIOUS_DISABLED	0x00800000	/* IRQ was disabled by the spurious trap */
> >  #define IRQ_MOVE_PCNTXT		0x01000000	/* IRQ migration from process context */
> >  #define IRQ_AFFINITY_SET	0x02000000	/* IRQ affinity was set from userspace*/
> > +#define IRQ_SUSPENDED		0x04000000	/* IRQ has gone through suspend sequence */
> >  
> >  #ifdef CONFIG_IRQ_PER_CPU
> >  # define CHECK_IRQ_PER_CPU(var) ((var) & IRQ_PER_CPU)
> 
> Note, you should probably make PM_SLEEP depend on 
> GENERIC_HARDIRQS - as this change will break the build on all 
> non-genirq architectures. (sparc, alpha, etc.)

PM_SLEEP depends on ARCH_SUSPEND_POSSIBLE || ARCH_HIBERNATION_POSSIBLE, which
I don't think is set on these architectures.

Thanks a lot for your comments.

Rafael

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-23  8:36         ` Ingo Molnar
  2009-02-23 11:29           ` Rafael J. Wysocki
@ 2009-02-23 11:29           ` Rafael J. Wysocki
  1 sibling, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-02-23 11:29 UTC (permalink / raw)
  To: Ingo Molnar, Johannes Berg
  Cc: Jeremy Fitzhardinge, LKML, Jesse Barnes, Eric W. Biederman,
	pm list, Linus Torvalds, Thomas Gleixner

On Monday 23 February 2009, Ingo Molnar wrote:
> 
> * Rafael J. Wysocki <rjw@sisk.pl> wrote:
> 
> > On Sunday 22 February 2009, Rafael J. Wysocki wrote:
> > > On Sunday 22 February 2009, Linus Torvalds wrote:
> > > > 
> > > > On Sun, 22 Feb 2009, Rafael J. Wysocki wrote:
> > [--snip--]
> > > 
> > > Thanks a lot for your comments, I'll send an updated patch shortly.
> > 
> > The updated patch is appended.
> > 
> > It has been initially tested, but requires more testing, 
> > especially with APM, XEN, kexec jump etc.
> 
> >  arch/x86/kernel/apm_32.c  |   20 ++++++++++++----
> >  drivers/xen/manage.c      |   32 +++++++++++++++----------
> >  include/linux/interrupt.h |    3 ++
> >  include/linux/irq.h       |    1 
> >  kernel/irq/manage.c       |   57 ++++++++++++++++++++++++++++++++++++++++++++++
> >  kernel/kexec.c            |   10 ++++----
> >  kernel/power/disk.c       |   46 +++++++++++++++++++++++++++++--------
> >  kernel/power/main.c       |   20 +++++++++++-----
> >  8 files changed, 152 insertions(+), 37 deletions(-)
> > 
> > Index: linux-2.6/kernel/irq/manage.c
> > ===================================================================
> > --- linux-2.6.orig/kernel/irq/manage.c
> > +++ linux-2.6/kernel/irq/manage.c
> > @@ -746,3 +746,60 @@ int request_irq(unsigned int irq, irq_ha
> >  	return retval;
> >  }
> >  EXPORT_SYMBOL(request_irq);
> > +
> > +#ifdef CONFIG_PM_SLEEP
> > +/**
> > + *	suspend_device_irqs - disable all currently enabled interrupt lines
> 
> Code placement nit: please dont put new #ifdef blocks into the 
> core IRQ code, add a kernel/irq/power.c file instead and make 
> the kbuild rule depend on PM_SLEEP.
> 
> The new suspend_device_irqs() and resume_device_irqs() doesnt 
> use any manage.c internals so this should work straight away.

OK, I'll do that.

> > + *
> > + *	During system-wide suspend or hibernation device interrupts need to be
> > + *	disabled at the chip level and this function is provided for this
> > + *	purpose.  It disables all interrupt lines that are enabled at the
> > + *	moment and sets the IRQ_SUSPENDED flag for them.
> > + */
> > +void suspend_device_irqs(void)
> > +{
> > +	struct irq_desc *desc;
> > +	int irq;
> > +
> > +	for_each_irq_desc(irq, desc) {
> > +		unsigned long flags;
> > +
> > +		spin_lock_irqsave(&desc->lock, flags);
> > +
> > +		if (!desc->depth && desc->action
> > +		    && !(desc->action->flags & IRQF_TIMER)) {
> > +			desc->depth++;
> > +			desc->status |= IRQ_DISABLED | IRQ_SUSPENDED;
> > +			desc->chip->disable(irq);
> > +		}
> > +
> > +		spin_unlock_irqrestore(&desc->lock, flags);
> > +	}
> > +
> > +	for_each_irq_desc(irq, desc) {
> > +		if (desc->status & IRQ_SUSPENDED)
> > +			synchronize_irq(irq);
> > +	}
> 
> Optimization/code-flow nit: a possibility might be to do a 
> single loop, i.e. i think it's safe to couple the disable+sync 
> bits [as in 99.99% of the cases there will be no in-execution 
> irq handlers when we execute this.]

Well, Linus suggested to do it in a separate loop.  I'm fine with both ways.

> Something like:
> 
> 		int do_sync = 0;
> 
> 		spin_lock_irqsave(&desc->lock, flags);
> 
> 		if (!desc->depth && desc->action
> 		    && !(desc->action->flags & IRQF_TIMER)) {
> 
> 			desc->depth++;
> 			desc->status |= IRQ_DISABLED | IRQ_SUSPENDED;
> 			desc->chip->disable(irq);
> 			do_sync = 1;
> 		}
> 
> 		spin_unlock_irqrestore(&desc->lock, flags);
> 
> 		if (do_sync)
> 			synchronize_irq(irq);
>
> In fact i'd suggest to factor out this logic into a separate 
> __suspend_irq(irq) / __resume_irq(irq) inline helper functions. 
> (They should be inline for the time being as they are not 
> shared-irq-safe so they shouldnt really be exposed to drivers in 
> such a singular capacity.)

Good idea, I'll do it.

> Doing so will also fix the line-break ugliness of the first 
> branch - as in a standalone function the condition fits into a 
> single line.
> 
> There's a performance reason as well: especially when we have a 
> lot of IRQ descriptors that will be about twice as fast. (with a 
> large iteration scope this function is cachemiss-limited and 
> doing this passes doubles the cachemiss rate.)
> 
> > +}
> > +EXPORT_SYMBOL_GPL(suspend_device_irqs);
> > +
> > +/**
> > + *	resume_device_irqs - enable interrupts disabled by suspend_device_irqs()
> > + *
> > + *	Enable all interrupt lines previously disabled by suspend_device_irqs()
> > + *	that have the IRQ_SUSPENDED flag set.
> > + */
> > +void resume_device_irqs(void)
> > +{
> > +	struct irq_desc *desc;
> > +	int irq;
> > +
> > +	for_each_irq_desc(irq, desc) {
> > +		if (!(desc->status & IRQ_SUSPENDED))
> > +			continue;
> > +		desc->status &= ~IRQ_SUSPENDED;
> > +		enable_irq(irq);
> > +	}
> 
> Robustness+optimization nit: this will work but could be done in 
> a nicer way: enable_irq() should auto-clear IRQ_SUSPENDED. (We 
> already clear flags there so it's even a tiny bit faster this 
> way.)

OK

> We definitely dont want IRQ_SUSPENDED to 'leak' out into an 
> enabled line, should something call enable_irq() on a suspended 
> line. So either make it auto-unsuspend in enable_irq(), or add 
> an extra WARN_ON() to enable_irq(), to make sure IRQ_SUSPENDED 
> is always off by that time.
> 
> > +     arch_suspend_disable_irqs();
> > +     BUG_ON(!irqs_disabled());
> 
> Please. We just disabled all devices - a BUG_ON() is a very 
> counter-productive thing to do here - chances are the user will 
> never see anything but a hang. So please turn this into a nice 
> WARN_ONCE().

This is just moving code.  Also, the BUG_ON() can only affect powerpc and it's
there on purpose AFAICS (Johannes?).  Anyway, changing that would be a separate
patch.

> > --- linux-2.6.orig/include/linux/interrupt.h
> > +++ linux-2.6/include/linux/interrupt.h
> > @@ -470,4 +470,7 @@ extern int early_irq_init(void);
> >  extern int arch_early_irq_init(void);
> >  extern int arch_init_chip_data(struct irq_desc *desc, int cpu);
> >  
> > +extern void suspend_device_irqs(void);
> > +extern void resume_device_irqs(void);
> 
> Header cleanliness nit: please dont just throw new prototypes to 
> the tail of headers, but think about where they fit in best, 
> logically.
> 
> These two new prototypes should go straight after the normal irq 
> line state management functions:
> 
>   extern void disable_irq_nosync(unsigned int irq);
>   extern void disable_irq(unsigned int irq);
>   extern void enable_irq(unsigned int irq);
> 
> Perhaps also with a comment like this:
> 
> /*
>  * Note: dont use these functions in driver code - they are for 
>  * core kernel use only.
>  */

OK, I'll put them in there.

> > +++ linux-2.6/kernel/power/main.c
> [...]
> > +
> > + Unlock:
> > +	resume_device_irqs();
> 
> Small drive-by style nit: while at it could you please fix the 
> capitalization and the naming of the label (and all labels in 
> this file)?

I don't think they are wrong.  They are uniform accross the file and it's
clear what they mean.

> The standard label is "out_unlock". [and "err_unlock" for failure cases
> - but this isnt a failure case.]

Where exactly is this standard defined?
 
> There's 43 such bad label names in kernel/power/*.c, see the 
> output of:
> 
>   git grep '^ [A-Z][a-z].*:$' kernel/power/

If you think they are bad, please send a patch to change them.

> > Index: linux-2.6/arch/x86/kernel/apm_32.c
> > ===================================================================
> > --- linux-2.6.orig/arch/x86/kernel/apm_32.c
> > +++ linux-2.6/arch/x86/kernel/apm_32.c
> 
> > +
> > +	suspend_device_irqs();
> >  	device_power_down(PMSG_SUSPEND);
> > +
> > +	local_irq_disable();
> 
> hm, this is a very repetitive pattern, all around the various 
> suspend/resume variants. Might make sense to make:
> 
>   	device_power_down(PMSG_SUSPEND);
> 
> do the irq line disabling plus the local irq disabling 
> automatically. That also means it cannot be forgotten. The 
> symmetric action should happen for PMSG_RESUME.
> 
> Is there ever a case where we want a different pattern?

Even if there's no such case, I prefer to call local_irq_disable() explicitly
in here, so that it's clearly known where it happens to anyone reading this
code.

Doing the "late" suspend of devices and disabling interrupts on the CPU
are separate logical steps.

> > Index: linux-2.6/drivers/xen/manage.c
> > ===================================================================
> > --- linux-2.6.orig/drivers/xen/manage.c
> > +++ linux-2.6/drivers/xen/manage.c
> > @@ -39,12 +39,6 @@ static int xen_suspend(void *data)
> 
> > -	if (!*cancelled) {
> > -		xen_irq_resume();
> > -		xen_console_resume();
> > -		xen_timer_resume();
> 
> This change needs a second look. xen_suspend() is a 
> stop_machine() handler and as such executes on specific CPUs, 
> and your change modifies this. OTOH, i had a look at these 
> handlers and it all looks safe. Jeremy?
> 
> > +resume_devices:
> > +	resume_device_irqs();
> 
> Small style nit: labels should start with a space character. 
> I.e. it should be:

I know, but the second label in there starts without a space character and
IMO keeping a uniform coding style i a single file is more important than
trying to adjust it to a broader set of rules FWIW.  I also think that coding
style changes shouldn't be mixed with functional changes as far as reasonably
possible.

> > + resume_devices:
> > +	resume_device_irqs();
> 
> > +++ linux-2.6/kernel/kexec.c
> > @@ -1454,7 +1454,7 @@ int kernel_kexec(void)
> >  		if (error)
> >  			goto Resume_devices;
> >  		device_pm_lock();
> > -		local_irq_disable();
> > +		suspend_device_irqs();
> >  		/* At this point, device_suspend() has been called,
> >  		 * but *not* device_power_down(). We *must*
> >  		 * device_power_down() now.  Otherwise, drivers for
> > @@ -1464,8 +1464,9 @@ int kernel_kexec(void)
> >  		 */
> >  		error = device_power_down(PMSG_FREEZE);
> >  		if (error)
> > -			goto Enable_irqs;
> > +			goto Resume_irqs;
> >  
> > +		local_irq_disable();
> >  		/* Suspend system devices */
> >  		error = sysdev_suspend(PMSG_FREEZE);
> >  		if (error)
> > @@ -1484,9 +1485,10 @@ int kernel_kexec(void)
> >  	if (kexec_image->preserve_context) {
> >  		sysdev_resume();
> >   Power_up_devices:
> > -		device_power_up(PMSG_RESTORE);
> > - Enable_irqs:
> >  		local_irq_enable();
> > +		device_power_up(PMSG_RESTORE);
> > + Resume_irqs:
> > +		resume_device_irqs();
> >  		device_pm_unlock();
> >  		enable_nonboot_cpus();
> >   Resume_devices:
> 
> (same comment about label style applies here too.)
> 
> > Index: linux-2.6/include/linux/irq.h
> > ===================================================================
> > --- linux-2.6.orig/include/linux/irq.h
> > +++ linux-2.6/include/linux/irq.h
> > @@ -65,6 +65,7 @@ typedef	void (*irq_flow_handler_t)(unsig
> >  #define IRQ_SPURIOUS_DISABLED	0x00800000	/* IRQ was disabled by the spurious trap */
> >  #define IRQ_MOVE_PCNTXT		0x01000000	/* IRQ migration from process context */
> >  #define IRQ_AFFINITY_SET	0x02000000	/* IRQ affinity was set from userspace*/
> > +#define IRQ_SUSPENDED		0x04000000	/* IRQ has gone through suspend sequence */
> >  
> >  #ifdef CONFIG_IRQ_PER_CPU
> >  # define CHECK_IRQ_PER_CPU(var) ((var) & IRQ_PER_CPU)
> 
> Note, you should probably make PM_SLEEP depend on 
> GENERIC_HARDIRQS - as this change will break the build on all 
> non-genirq architectures. (sparc, alpha, etc.)

PM_SLEEP depends on ARCH_SUSPEND_POSSIBLE || ARCH_HIBERNATION_POSSIBLE, which
I don't think is set on these architectures.

Thanks a lot for your comments.

Rafael

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-23 11:29           ` Rafael J. Wysocki
@ 2009-02-23 12:28             ` Ingo Molnar
  2009-02-23 14:48                 ` Rafael J. Wysocki
                                 ` (2 more replies)
  2009-02-23 12:28             ` Ingo Molnar
                               ` (6 subsequent siblings)
  7 siblings, 3 replies; 373+ messages in thread
From: Ingo Molnar @ 2009-02-23 12:28 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Johannes Berg, Linus Torvalds, LKML, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner


* Rafael J. Wysocki <rjw@sisk.pl> wrote:

> > > Index: linux-2.6/arch/x86/kernel/apm_32.c
> > > ===================================================================
> > > --- linux-2.6.orig/arch/x86/kernel/apm_32.c
> > > +++ linux-2.6/arch/x86/kernel/apm_32.c
> > 
> > > +
> > > +	suspend_device_irqs();
> > >  	device_power_down(PMSG_SUSPEND);
> > > +
> > > +	local_irq_disable();
> > 
> > hm, this is a very repetitive pattern, all around the various 
> > suspend/resume variants. Might make sense to make:
> > 
> >   	device_power_down(PMSG_SUSPEND);
> > 
> > do the irq line disabling plus the local irq disabling 
> > automatically. That also means it cannot be forgotten. The 
> > symmetric action should happen for PMSG_RESUME.
> > 
> > Is there ever a case where we want a different pattern?
> 
> Even if there's no such case, I prefer to call 
> local_irq_disable() explicitly in here, so that it's clearly 
> known where it happens to anyone reading this code.

That property can be implied in the function name:

   	device_power_down_irq_disable(PMSG_SUSPEND);

Open-coding it, if it looks the same in all the cases just 
increases the chances that someone somewhere copies them 
incorrectly.
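
A rough userspace sketch of what such a combined helper might look like. 
The stubs below only record call order and are not the kernel 
implementations; the int 'state' argument and the PMSG_SUSPEND value are 
placeholders for pm_message_t, and the error unwind mirrors the 
Resume_irqs path in the kexec hunk:

```c
#include <assert.h>
#include <string.h>

/* Userspace stand-ins for the kernel calls named in this thread.
 * They only log the order in which they are invoked. */
#define TRACE_MAX 8
static const char *trace[TRACE_MAX];
static int trace_len;
static int fail_power_down;	/* set to make the mock "late" suspend fail */

static void record(const char *step)
{
	assert(trace_len < TRACE_MAX);
	trace[trace_len++] = step;
}

static void suspend_device_irqs(void) { record("suspend_device_irqs"); }
static void resume_device_irqs(void)  { record("resume_device_irqs"); }
static void local_irq_disable(void)   { record("local_irq_disable"); }

static int device_power_down(int state)
{
	(void)state;
	record("device_power_down");
	return fail_power_down ? -1 : 0;
}

#define PMSG_SUSPEND 2	/* placeholder; the kernel uses a pm_message_t */

/* The proposed helper: one call that cannot forget either step. */
static int device_power_down_irq_disable(int state)
{
	int error;

	suspend_device_irqs();
	error = device_power_down(state);
	if (error) {
		/* Undo the irq line disabling, leave CPU irqs enabled. */
		resume_device_irqs();
		return error;
	}
	local_irq_disable();
	return 0;
}
```

On success the caller returns with local interrupts off, which is the 
property the function name is meant to advertise.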

	Ingo

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-23 11:29           ` Rafael J. Wysocki
  2009-02-23 12:28             ` Ingo Molnar
  2009-02-23 12:28             ` Ingo Molnar
@ 2009-02-23 12:45             ` Ingo Molnar
  2009-02-23 15:07               ` Rafael J. Wysocki
  2009-02-23 15:07               ` Rafael J. Wysocki
  2009-02-23 12:45             ` Ingo Molnar
                               ` (4 subsequent siblings)
  7 siblings, 2 replies; 373+ messages in thread
From: Ingo Molnar @ 2009-02-23 12:45 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Johannes Berg, Linus Torvalds, LKML, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner


* Rafael J. Wysocki <rjw@sisk.pl> wrote:

> > > +resume_devices:
> > > +	resume_device_irqs();
> > 
> > Small style nit: labels should start with a space character. 
> > I.e. it should be:
> 
> I know, but the second label in there starts without a space 
> character and IMO keeping a uniform coding style in a single 
> file is more important than trying to adjust it to a broader 
> set of rules FWIW. [...]

Even though it's just a very small and insignificant detail 
(nowhere described in the CodingStyle), barely worth the mention 
(and i already regret having brought it up at all), what you say 
is wrong on a conceptual level and that alarms me a bit ;-)

It is exactly this kind of "my code, my style!" world view 
that results in a crappy overall kernel style.

For a single file to look consistent is just the first (and 
required) step, what matters even more is for files to have 
similar coding patterns, to make the style as helpful to the 
general kernel developer/reviewer/bug-fixer/maintainer as 
possible.

"code with a helpful style" here means two things:

1) it should understand and adhere to basic style principles. 
   This is just an (often arbitrary) subset of the infinite set 
   of reasonable style guides. The most common-sense ones are
   written down in Documentation/CodingStyle. There's a lot of 
   leeway, as long as the basic principle of "be helpful" is
   understood and followed.

2) it should carry meta information outside of the language 
   syntax and it should build expectations about a code's 
   purpose and general structure.

   That is essential so that we can find bugs during review.

   If each file has a slightly different style to express labels 
   then we insert extra entropy, which degrades and obfuscates 
   the true meat of the code and hurts its overall reviewability.

In practical terms: i noticed that weird label - otherwise i 
would not have commented on it. I noticed it because it had the 
pattern of a comment block (most comment blocks start with 
capital letters, and for good reason).

It was completely unnecessary for me to notice that label - it 
carries no information about the patch itself. Ergo, it would be 
better in the long run if code does not raise unnecessary mental 
exceptions. We have a limited set of exceptions we are able to 
handle during review, let's make sure we use them sparingly.

Sure, there will always be borderline cases where we'll have to 
agree to disagree, even if we agree about the general principle.

But this is not one of those cases - having a "Suspend:" 
capitalized label is not something you added to enhance the 
basic coding style - it is something very uncommon and 
self-serving which you added in _spite_ of the general 
principles i believe. It has no other message beyond "I do this 
because i can!".

I.e. it is not helpful at all. When it comes to coding style the 
kernel is not a democracy at all.

> [...]  I also think that coding style changes shouldn't be 
> mixed with functional changes as far as reasonably possible.

Sure, you got that drive-by review for free, by virtue of 
context diffs ;-)

	Ingo

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-23 11:04                   ` Ingo Molnar
@ 2009-02-23 14:45                     ` Rafael J. Wysocki
  2009-02-23 15:06                         ` Ingo Molnar
  2009-02-23 14:45                     ` Rafael J. Wysocki
  1 sibling, 1 reply; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-02-23 14:45 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Eric W. Biederman, Linus Torvalds, LKML, Benjamin Herrenschmidt,
	Jeremy Fitzhardinge, pm list, Len Brown, Jesse Barnes,
	Thomas Gleixner

On Monday 23 February 2009, Ingo Molnar wrote:
> 
> * Eric W. Biederman <ebiederm@xmission.com> wrote:
> 
> > > What makes s2ram fragile is not human failure but the 
> > > combination of a handful of physical properties:
> > >
> > > 1) Psychology: shutting the lid or pushing the suspend button is 
> > >    a deceivingly 'simple' action to the user. But under the 
> > >    hood, a ton of stuff happens: we deinitialize a lot of 
> > >    things, we go through _all hardware state_, and we do so in a 
> > >    serial fashion. If just one piece fails to do the right 
> > >    thing, the box might not resume. Still, the user expects this 
> > >    'simple' thing to just work, all the time. No excuses 
> > >    accepted.
> > >
> > > 2) Length of code: To get a successful s2ram sequence the kernel
> > >    runs through tens of thousands of lines of code. Code which
> > >    never gets executed on a normal box - only if we s2ram. If 
> > >    just one step fails, we get a hung box.
> > >
> > > 3) Debuggability: a lot of s2ram code runs with the console off, 
> > >    making any bugs hard to debug. Furthermore we have no 
> > >    meaningful persistent storage either for kernel bug messages. 
> > >    The RTC trick of PM_DEBUG works but is a very narrow channel 
> > >    of information and it takes a lot of time to debug a bug via 
> > >    that method.
> > 
> > Yep that is an issue.
> 
> I'd also like to add #4:
> 
>      4) One more thing that makes s2ram special is that when the 
>         resume path finds hardware often in an even more 
>         deinitialized form than during normal bootup. During
>         normal bootup the BIOS/firmware has at least done some
>         minimal bootstrap (to get the kernel loaded), which
>         makes life easier for the kernel.
> 
>         At s2ram stage we've got a completely pure hardware
>         init state, with very minimal firmware activation.

This is very true and at least in some cases done on purpose, AFAICS, due to
some timing constraints forced on HW vendors by M$, for example.

>         So many of the init and deinit problems and bugs we only 
>         hit in the s2ram path - which dynamics is again not 
>         helpful.

Plus ACPI requires us to do additional things during suspend-resume that
are not done on boot-shutdown and which have their own ordering requirements
(not necessarily stated directly, but such that we have to discover them
experimentally).  Those also change from one BIOS to another.

> > > The combination of these factors really makes up for a 
> > > perfect storm in terms of kernel technology: we have this 
> > > very-deceivingly-simple-looking but 
> > > complex-and-rarely-executed piece of code, which is very 
> > > hard to debug.
> > 
> > And much of this as you are finding with this piece of code is 
> > how the software was designed rather than how the software 
> > needed to be.
> 
> Well most of the 4 problems above are externalities and cannot 
> go away just by fixing the kernel.
> 
>  #1 will always be with us.
>  #3 needs the hardware to change. It's happening, but slowly.
>  #4 will be with us as long as there's non-Linux BIOSes
> 
> #2 is the only thing where we can make a realistic difference,
> but there's just so much we can do there.
> 
> And that still leaves the other three items: each of which is 
> powerful enough of a force to give a bad name to any normal 
> subsystem.

Agreed.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-23 12:28             ` Ingo Molnar
@ 2009-02-23 14:48                 ` Rafael J. Wysocki
  2009-02-23 20:49               ` Benjamin Herrenschmidt
  2009-02-23 20:49               ` Benjamin Herrenschmidt
  2 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-02-23 14:48 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Johannes Berg, Linus Torvalds, LKML, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner

On Monday 23 February 2009, Ingo Molnar wrote:
> 
> * Rafael J. Wysocki <rjw@sisk.pl> wrote:
> 
> > > > Index: linux-2.6/arch/x86/kernel/apm_32.c
> > > > ===================================================================
> > > > --- linux-2.6.orig/arch/x86/kernel/apm_32.c
> > > > +++ linux-2.6/arch/x86/kernel/apm_32.c
> > > 
> > > > +
> > > > +	suspend_device_irqs();
> > > >  	device_power_down(PMSG_SUSPEND);
> > > > +
> > > > +	local_irq_disable();
> > > 
> > > hm, this is a very repetitive pattern, all around the various 
> > > suspend/resume variants. Might make sense to make:
> > > 
> > >   	device_power_down(PMSG_SUSPEND);
> > > 
> > > do the irq line disabling plus the local irq disabling 
> > > automatically. That also means it cannot be forgotten. The 
> > > symmetric action should happen for PMSG_RESUME.
> > > 
> > > Is there ever a case where we want a different pattern?
> > 
> > Even if there's no such case, I prefer to call 
> > local_irq_disable() explicitly in here, so that it's clearly 
> > known where it happens to anyone reading this code.
> 
> That property can be implied in the function name:
> 
>    	device_power_down_irq_disable(PMSG_SUSPEND);
> 
> Open-coding it, if it looks the same in all the cases just 
> increases the chances that someone somewhere copies them 
> incorrectly.

Well, I see your point, but in that case I'd rather couple the disabling of
local interrupts on the CPU with sysdev_suspend and the disabling (or whatever
Eric would like to call that) of device interrupts with device_power_down().
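
To illustrate the split I have in mind, here is a userspace sketch only.  The
stubs just log call order, the int 'state' argument stands in for
pm_message_t, and the two demo_*() helper names are hypothetical, not
existing kernel functions:

```c
#include <assert.h>
#include <string.h>

/* Userspace stand-ins for the kernel calls; they only log call order. */
#define TRACE_MAX 16
static const char *trace[TRACE_MAX];
static int trace_len;
static int fail_sysdev;		/* set to make the mock sysdev_suspend() fail */

static void record(const char *step)
{
	assert(trace_len < TRACE_MAX);
	trace[trace_len++] = step;
}

static void suspend_device_irqs(void) { record("suspend_device_irqs"); }
static void resume_device_irqs(void)  { record("resume_device_irqs"); }
static void local_irq_disable(void)   { record("local_irq_disable"); }
static void local_irq_enable(void)    { record("local_irq_enable"); }

static int device_power_down(int state)
{
	(void)state;
	record("device_power_down");
	return 0;
}

static int sysdev_suspend(int state)
{
	(void)state;
	record("sysdev_suspend");
	return fail_sysdev ? -1 : 0;
}

/* Hypothetical: device interrupts travel with the "late" suspend. */
static int demo_power_down_devices(int state)
{
	int error;

	suspend_device_irqs();
	error = device_power_down(state);
	if (error)
		resume_device_irqs();
	return error;
}

/* Hypothetical: CPU-local interrupts travel with sysdev suspend. */
static int demo_suspend_sysdevs(int state)
{
	int error;

	local_irq_disable();
	error = sysdev_suspend(state);
	if (error)
		local_irq_enable();
	return error;
}
```

That keeps the two logical steps separate while still making each of them
hard to get wrong.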

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-23 14:45                     ` Rafael J. Wysocki
@ 2009-02-23 15:06                         ` Ingo Molnar
  0 siblings, 0 replies; 373+ messages in thread
From: Ingo Molnar @ 2009-02-23 15:06 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Eric W. Biederman, Linus Torvalds, LKML, Benjamin Herrenschmidt,
	Jeremy Fitzhardinge, pm list, Len Brown, Jesse Barnes,
	Thomas Gleixner


* Rafael J. Wysocki <rjw@sisk.pl> wrote:

> On Monday 23 February 2009, Ingo Molnar wrote:
> > 
> > * Eric W. Biederman <ebiederm@xmission.com> wrote:
> > 
> > > > What makes s2ram fragile is not human failure but the 
> > > > combination of a handful of physical properties:
> > > >
> > > > 1) Psychology: shutting the lid or pushing the suspend button is 
> > > >    a deceivingly 'simple' action to the user. But under the 
> > > >    hood, a ton of stuff happens: we deinitialize a lot of 
> > > >    things, we go through _all hardware state_, and we do so in a 
> > > >    serial fashion. If just one piece fails to do the right 
> > > >    thing, the box might not resume. Still, the user expects this 
> > > >    'simple' thing to just work, all the time. No excuses 
> > > >    accepted.
> > > >
> > > > 2) Length of code: To get a successful s2ram sequence the kernel
> > > >    runs through tens of thousands of lines of code. Code which
> > > >    never gets executed on a normal box - only if we s2ram. If 
> > > >    just one step fails, we get a hung box.
> > > >
> > > > 3) Debuggability: a lot of s2ram code runs with the console off, 
> > > >    making any bugs hard to debug. Furthermore we have no 
> > > >    meaningful persistent storage either for kernel bug messages. 
> > > >    The RTC trick of PM_DEBUG works but is a very narrow channel 
> > > >    of information and it takes a lot of time to debug a bug via 
> > > >    that method.
> > > 
> > > Yep that is an issue.
> > 
> > I'd also like to add #4:
> > 
> >      4) One more thing that makes s2ram special is that when the 
> >         resume path finds hardware often in an even more 
> >         deinitialized form than during normal bootup. During
> >         normal bootup the BIOS/firmware has at least done some
> >         minimal bootstrap (to get the kernel loaded), which
> >         makes life easier for the kernel.
> > 
> >         At s2ram stage we've got a completely pure hardware
> >         init state, with very minimal firmware activation.
> 
> This is very true and at least in some cases done on purpose, 
> AFAICS, due to some timing constraints forced on HW vendors by 
> M$, for example.

IMHO it's the technically sane thing to do. Personally i 
trust the quirks of bare metal much more than the combined 
quirks of firmware _and_ bare metal.

> >         So many of the init and deinit problems and bugs we 
> >         only hit in the s2ram path - which dynamics is again 
> >         not helpful.
> 
> Plus ACPI requires us to do additional things during 
> suspend-resume that are not done on boot-shutdown and which 
> have their own ordering requirements (not necessarily stated 
> directly, but such that we have to discover them experimentally).  
> Those also change from one BIOS to another.

We could perhaps do a few things here to trigger bugs sooner.

For example at driver init, instead of executing just 
->driver_open(), we could execute:

   ->driver_open()
   ->driver_suspend()
   ->driver_resume()

I.e. we'd simulate a suspend+resume mini-step. This makes sure 
that the basic driver callbacks are sane. It is also supposed to 
work because the driver is just being initialized.

This way certain types of bugs would not show up as difficult to 
debug s2ram regressions - but would show up as 'boot hang' or 
'boot crash' bugs.

This does not simulate the "big picture" resume machinery (the 
dependencies, etc.), nor does it trigger any of the "hardware 
got really turned off" effects that true resume will trigger - 
but at least it offloads a portion of the testing space from 
's2ram' to 'bootup' testing.
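
A minimal userspace sketch of such an init-time mini-step, under stated 
assumptions: the ops structure, the demo device and the 
driver_init_selftest() name are all hypothetical, and the callback names 
simply follow the list above rather than any real driver-model API:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-driver callbacks, named after the list above. */
struct demo_driver_ops {
	int (*driver_open)(void *dev);
	int (*driver_suspend)(void *dev);
	int (*driver_resume)(void *dev);
};

/* Run open, then one suspend+resume round trip; stop at the first error. */
static int driver_init_selftest(const struct demo_driver_ops *ops, void *dev)
{
	int error;

	assert(ops != NULL && ops->driver_open != NULL);

	error = ops->driver_open(dev);
	if (error)
		return error;

	/* Drivers without PM callbacks have nothing to exercise. */
	if (!ops->driver_suspend || !ops->driver_resume)
		return 0;

	error = ops->driver_suspend(dev);
	if (error)
		return error;

	return ops->driver_resume(dev);
}

/* A toy device plus callbacks, used only to exercise the helper. */
struct demo_dev {
	int open;
	int suspended;
};

static int demo_open(void *p)
{
	((struct demo_dev *)p)->open = 1;
	return 0;
}

static int demo_suspend(void *p)
{
	((struct demo_dev *)p)->suspended = 1;
	return 0;
}

static int demo_resume(void *p)
{
	((struct demo_dev *)p)->suspended = 0;
	return 0;
}

/* Simulates a resume callback that fails, leaving the device suspended. */
static int demo_broken_resume(void *p)
{
	(void)p;
	return -5;
}
```

A driver with a broken resume callback would then fail at init time, 
with the console still up, instead of on the first real s2ram cycle.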

What's your feeling - what percentage of all s2ram regressions 
in the last year or so could have been triggered this way? Let's 
assume we had 100 regressions in that timeframe - would it be in 
the 10 bugs range? Or much lower or much higher?

	Ingo

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
@ 2009-02-23 15:06                         ` Ingo Molnar
  0 siblings, 0 replies; 373+ messages in thread
From: Ingo Molnar @ 2009-02-23 15:06 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Jeremy Fitzhardinge, LKML, Jesse Barnes, Eric W. Biederman,
	pm list, Linus Torvalds, Thomas Gleixner


* Rafael J. Wysocki <rjw@sisk.pl> wrote:

> On Monday 23 February 2009, Ingo Molnar wrote:
> > 
> > * Eric W. Biederman <ebiederm@xmission.com> wrote:
> > 
> > > > What makes s2ram fragile is not human failure but the 
> > > > combination of a handful of physical properties:
> > > >
> > > > 1) Psychology: shutting the lid or pushing the suspend button is 
> > > >    a deceivingly 'simple' action to the user. But under the 
> > > >    hood, a ton of stuff happens: we deinitialize a lot of 
> > > >    things, we go through _all hardware state_, and we do so in a 
> > > >    serial fashion. If just one piece fails to do the right 
> > > >    thing, the box might not resume. Still, the user expects this 
> > > >    'simple' thing to just work, all the time. No excuses 
> > > >    accepted.
> > > >
> > > > 2) Length of code: To get a successful s2ram sequence the kernel
> > > >    runs through tens of thousands of lines of code. Code which
> > > >    never gets executed on a normal box - only if we s2ram. If 
> > > >    just one step fails, we get a hung box.
> > > >
> > > > 3) Debuggability: a lot of s2ram code runs with the console off, 
> > > >    making any bugs hard to debug. Furthermore we have no 
> > > >    meaningful persistent storage either for kernel bug messages. 
> > > >    The RTC trick of PM_DEBUG works but is a very narrow channel 
> > > >    of information and it takes a lot of time to debug a bug via 
> > > >    that method.
> > > 
> > > Yep that is an issue.
> > 
> > I'd also like to add #4:
> > 
> >      4) One more thing that makes s2ram special is that the
> >         resume path often finds hardware in an even more
> >         deinitialized form than during normal bootup. During
> >         normal bootup the BIOS/firmware has at least done some
> >         minimal bootstrap (to get the kernel loaded), which
> >         makes life easier for the kernel.
> > 
> >         At s2ram stage we've got a completely pure hardware
> >         init state, with very minimal firmware activation.
> 
> This is very true and at least in some cases done on purpose, 
> AFAICS, due to some timing constraints forced on HW vendors by 
> M$, for example.

IMHO it's the technically sane thing to do. Personally i 
trust the quirks of bare metal much more than the combined 
quirks of firmware _and_ bare metal.

> >         So we hit many of the init and deinit problems and bugs
> >         only in the s2ram path - a dynamic which is again
> >         not helpful.
> 
> Plus ACPI requires us to do additional things during 
> suspend-resume that are not done on boot-shutdown and which 
> have their own ordering requirements (not necessarily stated 
> directly, but such that we have to discover experimentally).
> These also change from one BIOS to another.

We could perhaps do a few things here to trigger bugs sooner.

For example at driver init, instead of executing just 
->driver_open(), we could execute:

   ->driver_open()
   ->driver_suspend()
   ->driver_resume()

I.e. we'd simulate a suspend+resume mini-step. This makes
sure that the basic driver callbacks are sane. It is also 
supposed to work because the driver is just being initialized. 

This way certain types of bugs would not show up as difficult to 
debug s2ram regressions - but would show up as 'boot hang' or 
'boot crash' bugs.

This does not simulate the "big picture" resume machinery (the 
dependencies, etc.), nor does it trigger any of the "hardware 
got really turned off" effects that true resume will trigger - 
but at least it offloads a portion of the testing space from 
's2ram' to 'bootup' testing.
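
A minimal userspace sketch of that init-time mini-step, under stated
assumptions: struct demo_driver_ops, struct demo_dev and all demo_*
functions are hypothetical stand-ins for the ->driver_open(),
->driver_suspend() and ->driver_resume() placeholders, not an existing
kernel interface. It only shows the control flow: a broken callback is
reported as an init failure instead of surfacing later during s2ram.

```c
#include <stddef.h>

/*
 * Hypothetical callback table. The members stand in for the ->driver_open(),
 * ->driver_suspend() and ->driver_resume() placeholders; this is not a real
 * kernel interface.
 */
struct demo_driver_ops {
	int (*open)(void *dev);
	int (*suspend)(void *dev);
	int (*resume)(void *dev);
};

/* Toy device state, used only to observe which callbacks ran. */
struct demo_dev {
	int powered;	/* 1 while the device is "on" */
	int calls;	/* number of callbacks invoked so far */
};

static int demo_open(void *p)
{
	struct demo_dev *d = p;

	d->powered = 1;
	d->calls++;
	return 0;
}

static int demo_suspend(void *p)
{
	struct demo_dev *d = p;

	d->powered = 0;
	d->calls++;
	return 0;
}

static int demo_resume(void *p)
{
	struct demo_dev *d = p;

	d->powered = 1;
	d->calls++;
	return 0;
}

/* A deliberately broken resume callback: fails and leaves the device off. */
static int demo_broken_resume(void *p)
{
	struct demo_dev *d = p;

	d->calls++;
	return -5;	/* arbitrary error code for this sketch */
}

/*
 * Initialize a device and immediately run one suspend+resume mini-step.
 * Returns 0 on success or the first non-zero error code.
 */
static int demo_init_with_pm_selftest(const struct demo_driver_ops *ops,
				      void *dev)
{
	int err;

	err = ops->open(dev);
	if (err)
		return err;

	/* Drivers without PM callbacks have nothing to exercise. */
	if (!ops->suspend || !ops->resume)
		return 0;

	err = ops->suspend(dev);
	if (err)
		return err;

	return ops->resume(dev);
}
```

With { demo_open, demo_suspend, demo_resume } the helper returns 0 and
the toy device ends up powered; swapping in demo_broken_resume makes the
init call itself fail. As noted above, this exercises only the callbacks,
not the dependencies or the real power-off effects.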

What's your feeling - what percentage of all s2ram regressions 
in the last year or so could have been triggered this way? Let's 
assume we had 100 regressions in that timeframe - would it be in 
the 10 bugs range? Or much lower or much higher?

	Ingo


* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-23 12:45             ` Ingo Molnar
  2009-02-23 15:07               ` Rafael J. Wysocki
@ 2009-02-23 15:07               ` Rafael J. Wysocki
  1 sibling, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-02-23 15:07 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Johannes Berg, Linus Torvalds, LKML, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner

On Monday 23 February 2009, Ingo Molnar wrote:
> 
> * Rafael J. Wysocki <rjw@sisk.pl> wrote:
> 
> > > > +resume_devices:
> > > > +	resume_device_irqs();
> > > 
> > > Small style nit: labels should start with a space character. 
> > > I.e. it should be:
> > 
> > I know, but the second label in there starts without a space 
> > > character and IMO keeping a uniform coding style in a single 
> > file is more important than trying to adjust it to a broader 
> > set of rules FWIW. [...]
> 
> Even though it's just a very small and insignificant detail 
> (nowhere described in the CodingStyle), barely worth the mention 
> (and i already regret having brought it up at all), what you say 
> is wrong on a conceptual level and that alarms me a bit ;-)
> 
> It is exactly this kind of "my code, my style!" world view 
> that results in a crappy overall kernel style.
> 
> For a single file to look consistent is just the first (and 
> required) step, what matters even more is for files to have 
> similar coding patterns, to make the style as helpful to the 
> general kernel developer/reviewer/bug-fixer/maintainer as 
> possible.
> 
> "code with a helpful style" here means two things:
> 
> 1) it should understand and adhere to basic style principles. 
>    This is just an (often arbitrary) subset of the infinite set 
>    of reasonable style guides. The most common-sense ones are
>    written down in Documentation/CodingStyle. There's a lot of 
>    leeway, as long as the basic principle of "be helpful" is
>    understood and followed.
> 
> 2) it should carry meta information outside of the language 
>    syntax and it should build expectations about a code's 
>    purpose and general structure.
> 
>    That is essential so that we can find bugs during review.
> 
>    If each file has a slightly different style to express labels 
>    then that means we insert extra entropy that degrades and 
>    obfuscates the true meat of the code and hurts the overall 
>    reviewability of the code.
> 
> In practical terms: i noticed that weird label - otherwise i 
> would not have commented on it. I noticed it because it had the 
> pattern of a comment block (most comment blocks start with 
> capital letters, and for that good reason).
> 
> It was completely unnecessary for me to notice that label - it 
> carries no information about the patch itself. Ergo, it would be 
> better in the long run if code does not raise unnecessary mental 
> exceptions. We have a limited set of exceptions we are able to 
> handle during review, let's make sure we use them sparingly.

Just to clarify, I have nothing against labels that are not capitalized etc.,
actually I can live with whatever style of labels is considered as appropriate
and/or helpful.

However, if a specific style of labels was chosen for a given file in the past and
it is consistent over the entire file, I don't think it should be changed in a
patch that does a different thing, regardless of who's maintaining the file
or who's written the code in question.  It should be changed in a separate
patch with a changelog describing why this change is being made.  I don't
really have the time to write such a patch at the moment and I don't really
think it's _that_ important.  YMMV.

> Sure, there will always be borderline cases where we'll have to 
> agree to disagree, even if we agree about the general principle.
> 
> But this is not one of those cases - having a "Suspend:" 
> capitalized label is not something you added to enhance the 
> basic coding style - it is something very uncommon and 
> self-serving which you added in _spite_ of the general 
> principles i believe. It has no other message beyond "I do this 
> because i can!".
> 
> I.e. it is not helpful at all. When it comes to coding style the 
> kernel is not a democracy at all.
> 
> > [...]  I also think that coding style changes shouldn't be 
> > mixed with functional changes as far as reasonably possible.
> 
> Sure, you got that drive-by review for free, by virtue of 
> context diffs ;-)

Well, OK. :-)

Still, IMHO it's more helpful if the comments related to the changes that
belong to the patch in question are not mixed with comments related to the
coding style of the files being modified.  Perhaps I'm picky ...

Thanks,
Rafael


* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-23 11:03                   ` Rafael J. Wysocki
  2009-02-23 15:28                     ` Eric W. Biederman
@ 2009-02-23 15:28                     ` Eric W. Biederman
  2009-02-23 21:39                       ` Rafael J. Wysocki
  2009-02-23 21:39                       ` Rafael J. Wysocki
  1 sibling, 2 replies; 373+ messages in thread
From: Eric W. Biederman @ 2009-02-23 15:28 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Ingo Molnar, Linus Torvalds, LKML, Benjamin Herrenschmidt,
	Jeremy Fitzhardinge, pm list, Len Brown, Jesse Barnes,
	Thomas Gleixner

"Rafael J. Wysocki" <rjw@sisk.pl> writes:

>> I don't know where in the state machine this is getting called but
>> I would suggest doing this before we shutdown cpus.
>
> This is the plan.  In fact, I'm going to do this in the next patch after the
> $subject one has been tested and found acceptable.

Good to hear.  Then let's please get a version of the irq disable that calls
shutdown, so we can be certain we don't have hardware irqs in flight.

For the drivers it should not matter; for clean cpu shutdown it will.

Eric


* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-23 11:29           ` Rafael J. Wysocki
                               ` (4 preceding siblings ...)
  2009-02-23 15:52             ` Johannes Berg
@ 2009-02-23 15:52             ` Johannes Berg
  2009-02-23 17:16             ` Ingo Molnar
  2009-02-23 17:16             ` Ingo Molnar
  7 siblings, 0 replies; 373+ messages in thread
From: Johannes Berg @ 2009-02-23 15:52 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Ingo Molnar, Linus Torvalds, LKML, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner

On Mon, 2009-02-23 at 12:29 +0100, Rafael J. Wysocki wrote:

> > > +     arch_suspend_disable_irqs();
> > > +     BUG_ON(!irqs_disabled());
> > 
> > Please. We just disabled all devices - a BUG_ON() is a very 
> > counter-productive thing to do here - chances are the user will 
> > never see anything but a hang. So please turn this into a nice 
> > WARN_ONCE().
> 
> This is just moving code.  Also, the BUG_ON() can only affect powerpc and it's
> there on purpose AFAICS (Johannes?).  Anyway, changing that would be a separate
> patch.

It can affect any platform that overrides the weak symbol
arch_suspend_disable_irqs(), and I think that if you're writing this
low-level code you better have a way to debug. As such, I don't think it
needs changing, because you can only ever see that while implementing
arch_suspend_disable_irqs(). OTOH, since it can only trigger then, a
WARN_ON is probably fine as well since you'll be getting your machine
into inconsistent states all the time while implementing this ;)
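
A minimal userspace sketch of the weak-symbol override plus a WARN-style
check, under stated assumptions: every demo_* name is hypothetical and
only mimics the shape of arch_suspend_disable_irqs() and WARN_ON(), and
__attribute__((weak)) is the GCC/Clang extension. The real kernel macros
behave differently in detail (stack dump, taint, and so on).

```c
#include <stdio.h>

/* Stand-in for the real interrupt state; purely illustrative. */
static int demo_irqs_off;

static int demo_irqs_disabled(void)
{
	return demo_irqs_off;
}

/*
 * Weak default, analogous in spirit to arch_suspend_disable_irqs(): a
 * platform may link in a strong definition that replaces this one.
 */
__attribute__((weak)) void demo_arch_suspend_disable_irqs(void)
{
	demo_irqs_off = 1;
}

/* WARN-style check: report and carry on instead of halting like BUG_ON(). */
static int demo_warn_on(int cond, const char *what)
{
	if (cond)
		fprintf(stderr, "WARNING: %s\n", what);
	return cond;
}

/* Returns 0 if interrupts are really off after the arch hook ran. */
static int demo_suspend_enter(void)
{
	demo_arch_suspend_disable_irqs();
	if (demo_warn_on(!demo_irqs_disabled(), "irqs still enabled"))
		return -1;	/* caller can back out of the suspend attempt */
	return 0;
}
```

With the default hook demo_suspend_enter() returns 0. A platform override
that forgets to disable interrupts would produce the warning and an error
return rather than a silent hang.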

johannes


* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-23 11:29           ` Rafael J. Wysocki
                               ` (6 preceding siblings ...)
  2009-02-23 17:16             ` Ingo Molnar
@ 2009-02-23 17:16             ` Ingo Molnar
  2009-02-23 17:28                 ` Linus Torvalds
  7 siblings, 1 reply; 373+ messages in thread
From: Ingo Molnar @ 2009-02-23 17:16 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Johannes Berg, Linus Torvalds, LKML, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner


* Rafael J. Wysocki <rjw@sisk.pl> wrote:

> > > +void suspend_device_irqs(void)
> > > +{
> > > +	struct irq_desc *desc;
> > > +	int irq;
> > > +
> > > +	for_each_irq_desc(irq, desc) {
> > > +		unsigned long flags;
> > > +
> > > +		spin_lock_irqsave(&desc->lock, flags);
> > > +
> > > +		if (!desc->depth && desc->action
> > > +		    && !(desc->action->flags & IRQF_TIMER)) {
> > > +			desc->depth++;
> > > +			desc->status |= IRQ_DISABLED | IRQ_SUSPENDED;
> > > +			desc->chip->disable(irq);
> > > +		}
> > > +
> > > +		spin_unlock_irqrestore(&desc->lock, flags);
> > > +	}
> > > +
> > > +	for_each_irq_desc(irq, desc) {
> > > +		if (desc->status & IRQ_SUSPENDED)
> > > +			synchronize_irq(irq);
> > > +	}
> > 
> > Optimization/code-flow nit: a possibility might be to do a 
> > single loop, i.e. i think it's safe to couple the 
> > disable+sync bits [as in 99.99% of the cases there will be 
> > no in-execution irq handlers when we execute this.]
> 
> Well, Linus suggested to do it in a separate loop.  I'm fine 
> with both ways.

Linus, do you have a strong opinion about which variant we 
should use?

The two approaches are not completely equivalent, the variant 
suggested by Linus is a bit more 'atomic' - in that it first 
turns off everything, then looks for everything that needs to be 
synchronized.

OTOH, it _shouldn't_ make much of a difference on a correctly 
working system - we ought to be able to disable the irqs one by 
one and wait on any pending ones on the spot. Maybe if there was 
some implicit dependency between irq sources it would be more 
robust to do Linus's version.

Dunno ... no strong feelings either way.
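
A minimal userspace sketch of the two orderings, under stated
assumptions: struct demo_irq is a simplified, hypothetical stand-in for
struct irq_desc, locking is left out, and demo_sync() merely records
where synchronize_irq() would be called. Each variant logs 'D' for a
disable and 'S' for a sync so the difference in ordering is visible.

```c
#include <stddef.h>

/* Simplified stand-in for struct irq_desc; the fields are illustrative. */
struct demo_irq {
	int depth;	/* disable nesting count, 0 means enabled */
	int has_action;	/* a handler is installed */
	int is_timer;	/* analogous to IRQF_TIMER */
	int suspended;	/* analogous to IRQ_SUSPENDED */
};

/* Records the order of operations: 'D' = disable, 'S' = sync. */
struct demo_log {
	char ops[32];
	size_t len;
};

static void demo_log_op(struct demo_log *log, char op)
{
	if (log->len < sizeof(log->ops) - 1) {
		log->ops[log->len++] = op;
		log->ops[log->len] = '\0';
	}
}

/* Disable one enabled, non-timer IRQ with a handler; returns 1 if it did. */
static int demo_disable(struct demo_irq *d, struct demo_log *log)
{
	if (d->depth || !d->has_action || d->is_timer)
		return 0;
	d->depth++;
	d->suspended = 1;
	demo_log_op(log, 'D');
	return 1;
}

/* Placeholder for waiting on in-flight handlers of this IRQ. */
static void demo_sync(struct demo_irq *d, struct demo_log *log)
{
	(void)d;
	demo_log_op(log, 'S');
}

/* Variant suggested by Linus: turn everything off first, then wait. */
static void demo_suspend_two_loops(struct demo_irq *irqs, size_t n,
				   struct demo_log *log)
{
	size_t i;

	for (i = 0; i < n; i++)
		demo_disable(&irqs[i], log);

	for (i = 0; i < n; i++)
		if (irqs[i].suspended)
			demo_sync(&irqs[i], log);
}

/* Single-loop variant: disable and wait for each IRQ on the spot. */
static void demo_suspend_one_loop(struct demo_irq *irqs, size_t n,
				  struct demo_log *log)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (demo_disable(&irqs[i], log))
			demo_sync(&irqs[i], log);
}
```

Both variants leave the descriptors in the same final state; only the
interleaving of disables and waits differs.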

	Ingo


* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-23 17:16             ` Ingo Molnar
@ 2009-02-23 17:28                 ` Linus Torvalds
  0 siblings, 0 replies; 373+ messages in thread
From: Linus Torvalds @ 2009-02-23 17:28 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rafael J. Wysocki, Johannes Berg, LKML, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner



On Mon, 23 Feb 2009, Ingo Molnar wrote:
> 
> Linus, do you have a strong opinion about which variant we 
> should use?

Strong? No. I think mine is better just because _if_ another CPU is busy 
handling an interrupt that we're just now disabling, we'll just go on to 
the next interrupt. Waiting for them all at the end is always more 
efficient.

But does it really matter? No. In this case I think we've shut down all 
other CPUs anyway, so the whole "synchronize_irq()" should probably not 
even be needed. 

		Linus


* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-23 12:28             ` Ingo Molnar
  2009-02-23 14:48                 ` Rafael J. Wysocki
  2009-02-23 20:49               ` Benjamin Herrenschmidt
@ 2009-02-23 20:49               ` Benjamin Herrenschmidt
  2 siblings, 0 replies; 373+ messages in thread
From: Benjamin Herrenschmidt @ 2009-02-23 20:49 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rafael J. Wysocki, Johannes Berg, Linus Torvalds, LKML,
	Eric W. Biederman, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner


> That property can be implied in the function name:
> 
>    	device_power_down_irq_disable(PMSG_SUSPEND);
> 
> Open-coding it, if it looks the same in all the cases just 
> increases the chances that someone somewhere copies them 
> incorrectly.

No. Some archs need to do "special" things at the irq disable point,
leave it open coded in the caller please.

Cheers,
Ben.




* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-23 15:28                     ` Eric W. Biederman
  2009-02-23 21:39                       ` Rafael J. Wysocki
@ 2009-02-23 21:39                       ` Rafael J. Wysocki
  2009-02-24  3:30                         ` Eric W. Biederman
  2009-02-24  3:30                         ` Eric W. Biederman
  1 sibling, 2 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-02-23 21:39 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Ingo Molnar, Linus Torvalds, LKML, Benjamin Herrenschmidt,
	Jeremy Fitzhardinge, pm list, Len Brown, Jesse Barnes,
	Thomas Gleixner

On Monday 23 February 2009, Eric W. Biederman wrote:
> "Rafael J. Wysocki" <rjw@sisk.pl> writes:
> 
> >> I don't know where in the state machine this is getting called but
> >> I would suggest doing this before we shutdown cpus.
> >
> > This is the plan.  In fact, I'm going to do this in the next patch after the
> > $subject one has been tested and found acceptable.
> 
> Good to hear.  Then let's please get a version of the irq disable that calls
> shutdown, so we can be certain we don't have hardware irqs in flight.
> 
> For the drivers it should not matter; for clean cpu shutdown it will.

OK, I will.

Thanks,
Rafael


* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-23 15:06                         ` Ingo Molnar
@ 2009-02-23 21:59                           ` Rafael J. Wysocki
  -1 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-02-23 21:59 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Eric W. Biederman, Linus Torvalds, LKML, Benjamin Herrenschmidt,
	Jeremy Fitzhardinge, pm list, Len Brown, Jesse Barnes,
	Thomas Gleixner

On Monday 23 February 2009, Ingo Molnar wrote:
> 
> * Rafael J. Wysocki <rjw@sisk.pl> wrote:
> 
> > On Monday 23 February 2009, Ingo Molnar wrote:
> > > 
> > > * Eric W. Biederman <ebiederm@xmission.com> wrote:
> > > 
> > > > > What makes s2ram fragile is not human failure but the 
> > > > > combination of a handful of physical properties:
> > > > >
> > > > > 1) Psychology: shutting the lid or pushing the suspend button is 
> > > > >    a deceivingly 'simple' action to the user. But under the 
> > > > >    hood, a ton of stuff happens: we deinitialize a lot of 
> > > > >    things, we go through _all hardware state_, and we do so in a 
> > > > >    serial fashion. If just one piece fails to do the right 
> > > > >    thing, the box might not resume. Still, the user expects this 
> > > > >    'simple' thing to just work, all the time. No excuses 
> > > > >    accepted.
> > > > >
> > > > > 2) Length of code: To get a successful s2ram sequence the kernel
> > > > >    runs through tens of thousands of lines of code. Code which
> > > > >    never gets executed on a normal box - only if we s2ram. If 
> > > > >    just one step fails, we get a hung box.
> > > > >
> > > > > 3) Debuggability: a lot of s2ram code runs with the console off, 
> > > > >    making any bugs hard to debug. Furthermore we have no 
> > > > >    meaningful persistent storage either for kernel bug messages. 
> > > > >    The RTC trick of PM_DEBUG works but is a very narrow channel 
> > > > >    of information and it takes a lot of time to debug a bug via 
> > > > >    that method.
> > > > 
> > > > Yep that is an issue.
> > > 
> > > I'd also like to add #4:
> > > 
> > >      4) One more thing that makes s2ram special is that the
> > >         resume path often finds hardware in an even more
> > >         deinitialized form than during normal bootup. During
> > >         normal bootup the BIOS/firmware has at least done some
> > >         minimal bootstrap (to get the kernel loaded), which
> > >         makes life easier for the kernel.
> > > 
> > >         At s2ram stage we've got a completely pure hardware
> > >         init state, with very minimal firmware activation.
> > 
> > This is very true and at least in some cases done on purpose, 
> > AFAICS, due to some timing constraints forced on HW vendors by 
> > M$, for example.
> 
> IMHO i think it's the technically sane thing to do. Personally i 
> trust the quirks of bare metal much more than the combined 
> quirks of firmware _and_ bare metal.
> 
> > >         So many of the init and deinit problems and bugs we 
> > >         only hit in the s2ram path - which dynamics is again 
> > >         not helpful.
> > 
> > Plus ACPI requires us to do additional things during 
> > suspend-resume that are not done on boot-shutdown and which 
> > have their own ordering requirements (not necessarily stated 
> > > directly, but such that we have to discover experimentally).
> > > Those also change from one BIOS to another.
> 
> We could perhaps do a few things here to trigger bugs sooner.
> 
> For example at driver init, instead of executing just 
> ->driver_open(), we could execute:
> 
>    ->driver_open()
>    ->driver_suspend()
>    ->driver_resume()

I'm not sure.  On PCI we run some code apart from the driver's suspend and
resume callbacks, especially in the new framework, and the bus type executes
the driver callbacks.

> I.e. we'd simulate a suspend+resume mini-step. This makes it 
> sure that the basic driver callbacks are sane. It is also 
> supposed to work because the driver is just being initialized. 
>
> This way certain types of bugs would not show up as difficult to 
> debug s2ram regressions - but would show up as 'boot hang' or 
> 'boot crash' bugs.

There is a testing facility exactly for this (/sys/power/pm_test) that allows
you to simulate the entire suspend sequence without suspending as well as some
separate pieces of it.  Still, it doesn't work very well, because the
conditions in which the resume callbacks are being run differ substantially
from the conditions right after we get control from the BIOS.

For one example, if ->suspend() puts the device into D3, then your simulated
->resume() will get the device in D3, while the BIOS would probably put it into
D0 (at least as far as PCI devices are concerned).
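
The D3/D0 difference can be illustrated with a toy model.  This is only a
sketch: the toy_* names are made up for illustration and are not kernel
interfaces, and the "BIOS puts the device into D0" step is the assumption
stated above, not something every platform guarantees.

```c
#include <assert.h>

/* Hypothetical power states; only D0 and D3 matter for this sketch. */
enum toy_pci_state { TOY_D0 = 0, TOY_D3 = 3 };

struct toy_dev {
	enum toy_pci_state state;
	int restored;
};

/* ->suspend() analogue: puts the device into D3. */
static void toy_suspend(struct toy_dev *dev)
{
	dev->state = TOY_D3;
	dev->restored = 0;
}

/* What the firmware is assumed to do on a real wake-up. */
static void toy_bios_wakeup(struct toy_dev *dev)
{
	dev->state = TOY_D0;
}

/* ->resume() analogue: returns the state it found the device in. */
static enum toy_pci_state toy_resume(struct toy_dev *dev)
{
	enum toy_pci_state found = dev->state;

	if (found != TOY_D0)
		dev->state = TOY_D0;	/* only reached in the simulated case */
	dev->restored = 1;
	return found;
}
```

Calling toy_suspend() followed directly by toy_resume() makes the callback
see D3, whereas inserting toy_bios_wakeup() in between makes it see D0, so
the two runs exercise different branches of the same callback.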

> This does not simulate the "big picture" resume machinery (the 
> dependencies, etc.), nor does it trigger any of the "hardware 
> got really turned off" effects that true resume will trigger - 
> but at least it offloads a portion of the testing space from 
> 's2ram' to 'bootup' testing.
> 
> What's your feeling - what percentage of all s2ram regressions 
> in the last year or so could have been triggered this way? Lets 
> assume we had 100 regressions in that timeframe - would it be in 
> the 10 bugs range? Or much lower or much higher?

Very small number of actual bugs with rather a lot of false positives.

IMO there are three basic sources of recent suspend regressions:
1) Arch-dependent changes (x86 mostly) and low-level changes affecting suspend
   (like PCI bus enumeration, IOMMU etc.), where people didn't realize their
   modifications would have a broader effect.
2) PM core changes where we weren't sure what was the best way to go (probably
   I'm to blame for the majority of these).
3) Changes related to graphics (this has always been difficult, but is getting
   much better now).

Driver regressions, other than the graphics-related, are really a very small
fraction.

Well, there still are some known problems unsolved, but that's a different
matter.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-22 17:39 ` [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume Rafael J. Wysocki
                     ` (2 preceding siblings ...)
  2009-02-23 22:11   ` Arve Hjønnevåg
@ 2009-02-23 22:11   ` Arve Hjønnevåg
  2009-02-23 22:23       ` Rafael J. Wysocki
  3 siblings, 1 reply; 373+ messages in thread
From: Arve Hjønnevåg @ 2009-02-23 22:11 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: LKML, Linus Torvalds, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner

On Sun, Feb 22, 2009 at 9:39 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> From: Rafael J. Wysocki <rjw@sisk.pl>
>
> Introduce two helper functions allowing us to disable device
> interrupts (at the IO-APIC level) during suspend or hibernation
> and enable them during the subsequent resume, respectively, so that
> the timer interrupts are enabled while "late" suspend callbacks and
> "early" resume callbacks provided by device drivers are being
> executed.
>
> Use these functions to rework the handling of interrupts during
> suspend (hibernation) and resume.  Namely, interrupts will only be
> disabled on the CPU right before suspending sysdevs, while device
> interrupts will be disabled (at the IO-APIC level), with the help of
> the new helper function, before calling "late" suspend callbacks
> provided by device drivers and analogously during resume.
>

What impact does this have on wakeup interrupts? Unless you add a
check, after masking all interrupts at the CPU, to abort suspend if any
wakeup interrupt has IRQ_PENDING set, I think you will lose wakeup
interrupts (at least for irqs that use default_disable).

-- 
Arve Hjønnevåg

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-23 17:28                 ` Linus Torvalds
  (?)
@ 2009-02-23 22:11                 ` Rafael J. Wysocki
  -1 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-02-23 22:11 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Johannes Berg, LKML, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner

On Monday 23 February 2009, Linus Torvalds wrote:
> 
> On Mon, 23 Feb 2009, Ingo Molnar wrote:
> > 
> > Linus, do you have a strong opinion about which variant we 
> > should use?
> 
> Strong? No. I think mine is better just because _if_ another CPU is busy 
> handling an interrupt that we're just now disabling, we'll just go on to 
> the next interrupt. Waiting for them all at the end is always more 
> efficient.
> 
> But does it really matter? No. In this case I think we've shut down all 
> other CPU's anyway, so the whole "serialize_irq()" should probably not 
> even be needed. 

But we're going to move the shutting down of the other CPUs after this point.

In the end, the sequence is going to be:
- "normal" suspend of devices
- disable device interrupts
- "late" suspend of devices
- _PTS
- disable nonboot CPUs
- local_irq_disable
- sysdev_suspend
[This is because ACPI wants us to put devices into low power states before
doing the _PTS, which in turn is supposed to be done before the disabling of
nonboot CPUs, and we want to put devices into low power states during "late"
suspend.  Of course, analogously for the resume part.]
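
The ordering above can be written down as a small table and checked
mechanically.  This is a sketch only: the strings are labels taken from the
list, not kernel function names, and the helpers are hypothetical.  Resume is
assumed to mirror suspend in reverse, as stated above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Step labels copied from the list above; they are not kernel APIs. */
static const char *const suspend_steps[] = {
	"normal suspend of devices",
	"disable device interrupts",
	"late suspend of devices",
	"_PTS",
	"disable nonboot CPUs",
	"local_irq_disable",
	"sysdev_suspend",
};

#define NR_STEPS (sizeof(suspend_steps) / sizeof(suspend_steps[0]))

/* Position of a step in the suspend order, or -1 if the label is unknown. */
static int suspend_step_index(const char *name)
{
	for (size_t i = 0; i < NR_STEPS; i++)
		if (strcmp(suspend_steps[i], name) == 0)
			return (int)i;
	return -1;
}

/* The n-th resume step undoes the (NR_STEPS - 1 - n)-th suspend step. */
static const char *resume_undoes(size_t n)
{
	assert(n < NR_STEPS);
	return suspend_steps[NR_STEPS - 1 - n];
}

/*
 * ACPI constraints from the note above: devices go to low power states
 * (late suspend) before _PTS, and _PTS runs before nonboot CPUs go away.
 * Returns 1 if the table satisfies both.
 */
static int acpi_ordering_ok(void)
{
	int late = suspend_step_index("late suspend of devices");
	int pts = suspend_step_index("_PTS");
	int cpus = suspend_step_index("disable nonboot CPUs");

	return late >= 0 && late < pts && pts < cpus;
}
```

With this table, resume_undoes(0) names sysdev_suspend, i.e. the first thing
resume has to undo is the last thing suspend did.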

So, I think your version is _really_ better. :-)

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-23 22:11   ` Arve Hjønnevåg
@ 2009-02-23 22:23       ` Rafael J. Wysocki
  0 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-02-23 22:23 UTC (permalink / raw)
  To: Arve Hjønnevåg
  Cc: LKML, Linus Torvalds, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner

On Monday 23 February 2009, Arve Hjønnevåg wrote:
> On Sun, Feb 22, 2009 at 9:39 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > From: Rafael J. Wysocki <rjw@sisk.pl>
> >
> > Introduce two helper functions allowing us to disable device
> > interrupts (at the IO-APIC level) during suspend or hibernation
> > and enable them during the subsequent resume, respectively, so that
> > the timer interrupts are enabled while "late" suspend callbacks and
> > "early" resume callbacks provided by device drivers are being
> > executed.
> >
> > Use these functions to rework the handling of interrupts during
> > suspend (hibernation) and resume.  Namely, interrupts will only be
> > disabled on the CPU right before suspending sysdevs, while device
> > interrupts will be disabled (at the IO-APIC level), with the help of
> > the new helper function, before calling "late" suspend callbacks
> > provided by device drivers and analogously during resume.
> >
> 
> What impact does this have on wakeup interrupts? Unless you add a
> check, after masking all interrupts at the CPU, to abort suspend if any
> wakeup interrupt has IRQ_PENDING set, I think you will lose wakeup
> interrupts (at least for irqs that use default_disable).

I _think_ they would have to be reenabled after we've called
local_irq_disable().

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-23 22:23       ` Rafael J. Wysocki
  (?)
@ 2009-02-23 22:44       ` Arve Hjønnevåg
  -1 siblings, 0 replies; 373+ messages in thread
From: Arve Hjønnevåg @ 2009-02-23 22:44 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: LKML, Linus Torvalds, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner

On Mon, Feb 23, 2009 at 2:23 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> On Monday 23 February 2009, Arve Hjønnevåg wrote:
>> On Sun, Feb 22, 2009 at 9:39 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
>> > From: Rafael J. Wysocki <rjw@sisk.pl>
>> >
>> > Introduce two helper functions allowing us to disable device
>> > interrupts (at the IO-APIC level) during suspend or hibernation
>> > and enable them during the subsequent resume, respectively, so that
>> > the timer interrupts are enabled while "late" suspend callbacks and
>> > "early" resume callbacks provided by device drivers are being
>> > executed.
>> >
>> > Use these functions to rework the handling of interrupts during
>> > suspend (hibernation) and resume.  Namely, interrupts will only be
>> > disabled on the CPU right before suspending sysdevs, while device
>> > interrupts will be disabled (at the IO-APIC level), with the help of
>> > the new helper function, before calling "late" suspend callbacks
>> > provided by device drivers and analogously during resume.
>> >
>>
>> What impact does this have on wakeup interrupts? Unless you add a
>> check, after masking all interrupts at the CPU, to abort suspend if any
>> wakeup interrupt has IRQ_PENDING set, I think you will lose wakeup
>> interrupts (at least for irqs that use default_disable).
>
> I _think_ they would have to be reenabled after we've called
> local_irq_disable().

Are you talking about the irq_chip switching from enabled interrupts
to wake interrupts? It is not enough for the irq_chip to reenable the
hardware interrupt. If the interrupt is edge triggered and occurred
after you disabled it, but before local_irq_disable, the only record
of it is the IRQ_PENDING flag.
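
A toy model of that race, as a sketch only: the struct and the TOY_* flag
values are hypothetical and are not the kernel's struct irq_desc or its real
flag values.  It shows an edge arriving on a software-disabled line leaving
nothing behind but a pending flag, and a check that could abort suspend when
that line is a wakeup source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values, named after the flags discussed above. */
#define TOY_IRQ_DISABLED 0x1u
#define TOY_IRQ_PENDING  0x2u
#define TOY_IRQ_WAKEUP   0x4u

struct toy_irq {
	unsigned int status;
	unsigned int handled;	/* how many times the handler ran */
};

/* Software disable, done before the "late" suspend callbacks. */
static void toy_disable_irq(struct toy_irq *irq)
{
	irq->status |= TOY_IRQ_DISABLED;
}

/* An edge arrives; on a disabled line only the pending flag records it. */
static void toy_edge_arrives(struct toy_irq *irq)
{
	if (irq->status & TOY_IRQ_DISABLED)
		irq->status |= TOY_IRQ_PENDING;
	else
		irq->handled++;
}

/*
 * Meant to run once interrupts are off on the CPU: returns true if a
 * wakeup source fired after it was disabled, i.e. suspend should abort.
 */
static bool toy_wakeup_pending(const struct toy_irq *irqs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		unsigned int s = irqs[i].status;

		if ((s & TOY_IRQ_WAKEUP) && (s & TOY_IRQ_PENDING))
			return true;
	}
	return false;
}
```

Re-enabling the line in hardware afterwards would not replay the edge in this
model, which is why the check looks at the flag rather than at the line.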

-- 
Arve Hjønnevåg

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-23 21:39                       ` Rafael J. Wysocki
@ 2009-02-24  3:30                         ` Eric W. Biederman
  2009-02-24 22:42                           ` Rafael J. Wysocki
  2009-02-24 22:42                           ` Rafael J. Wysocki
  2009-02-24  3:30                         ` Eric W. Biederman
  1 sibling, 2 replies; 373+ messages in thread
From: Eric W. Biederman @ 2009-02-24  3:30 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Ingo Molnar, Linus Torvalds, LKML, Benjamin Herrenschmidt,
	Jeremy Fitzhardinge, pm list, Len Brown, Jesse Barnes,
	Thomas Gleixner

"Rafael J. Wysocki" <rjw@sisk.pl> writes:

> On Monday 23 February 2009, Eric W. Biederman wrote:
>> "Rafael J. Wysocki" <rjw@sisk.pl> writes:
>> 
>> >> I don't know where in the state machine this is getting called but
>> >> I would suggest doing this before we shutdown cpus.
>> >
>> > This is the plan.  In fact, I'm going to do this in the next patch after the
>> > $subject one has been tested and found acceptable.
>> 
>> Good to hear.  Then let's please get a version of the irq disable that calls
>> shutdown, so we can be certain we don't have hardware irqs in flight.
>> 
>> For the drivers it should not matter; for clean cpu shutdown it will.
>
> OK, I will.

My apologies, I was wrong.  Calling shutdown is not safe.

I just remembered that masking an ioapic from anywhere besides the
irq handler can lock the ioapic state machine, and lead to non-recoverable
interrupts.  It is rare but I have seen it happen.  I wanted to figure out
how to migrate interrupts outside of interrupt context and this was what
prevented me.  A suspend/resume cycle might be enough of a reset to
get the ioapic out of that state but I don't know.

The only safe way on x86 to shut down a level-triggered ioapic irq
outside of irq context is for the driver to program the hardware to
not generate an irq.

Therefore doing anything with the irqs at the point where we are
suspending them is a formality, and perhaps simply code that ensures
in-flight irqs don't make it past a certain point.

I believe we just need to call disable() and print a big nasty warning
if any irq comes in after the suspend stage.
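
A minimal sketch of that "disable and warn" idea, under the assumption that
the core only flips a software flag and never touches the ioapic mask.  All
toy_* names are hypothetical, not existing kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct toy_irq_line {
	bool disabled;		/* software-only disable, no ioapic access */
	unsigned int late_irqs;	/* irqs seen after the suspend stage */
};

/* Set once the suspend stage has finished for all devices. */
static bool toy_suspend_stage_done;

static void toy_suspend_irq(struct toy_irq_line *line)
{
	line->disabled = true;
}

/*
 * Entry point for an incoming irq.  Returns true if the irq may be passed
 * on to the driver's handler.  After the suspend stage it is only counted
 * and reported, since the driver should already have quiesced the device.
 */
static bool toy_irq_entry(struct toy_irq_line *line, int irq)
{
	if (toy_suspend_stage_done) {
		line->late_irqs++;
		fprintf(stderr,
			"WARNING: irq %d arrived after the suspend stage\n",
			irq);
		return false;
	}
	return !line->disabled;
}
```

The warning does not fix anything by itself; it only points at the driver
that failed to stop its hardware from raising the irq.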

Eric

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-24  3:30                         ` Eric W. Biederman
@ 2009-02-24 22:42                           ` Rafael J. Wysocki
  2009-02-24 22:51                             ` Linus Torvalds
                                               ` (3 more replies)
  2009-02-24 22:42                           ` Rafael J. Wysocki
  1 sibling, 4 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-02-24 22:42 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Ingo Molnar, Linus Torvalds, LKML, Benjamin Herrenschmidt,
	Jeremy Fitzhardinge, pm list, Len Brown, Jesse Barnes,
	Thomas Gleixner

On Tuesday 24 February 2009, Eric W. Biederman wrote:
> "Rafael J. Wysocki" <rjw@sisk.pl> writes:
> 
> > On Monday 23 February 2009, Eric W. Biederman wrote:
> >> "Rafael J. Wysocki" <rjw@sisk.pl> writes:
> >> 
> >> >> I don't know where in the state machine this is getting called but
> >> >> I would suggest doing this before we shutdown cpus.
> >> >
> >> > This is the plan.  In fact, I'm going to do this in the next patch after the
> >> > $subject one has been tested and found acceptable.
> >> 
> >> Good to hear.  Then let's please get a version of the irq disable that calls
> >> shutdown, so we can be certain we don't have hardware irqs in flight.
> >> 
> >> For the drivers it should not matter; for clean cpu shutdown it will.
> >
> > OK, I will.
> 
> My apologies, I was wrong.  Calling shutdown is not safe.
> 
> I just remembered that masking an ioapic from anywhere besides the
> irq handler can lock the ioapic state machine, and lead to non-recoverable
> interrupts.  It is rare but I have seen it happen.  I wanted to figure out
> how to migrate interrupts outside of interrupt context and this was what
> prevented me.  A suspend/resume cycle might be enough of a reset to
> get the ioapic out of that state but I don't know.
> 
> The only safe way on x86 to shut down a level-triggered ioapic irq
> outside of irq context is for the driver to program the hardware to
> not generate an irq.

Well, that changes things quite a bit, because it means we can't change the
suspend-resume sequence in a way we thought we could without fixing all
drivers first, but this is exactly what we'd like to avoid by changing the
core.

I think the most important sources of level-triggered interrupts are PCI
devices, so perhaps we can make the PCI PM core use bit 10 of the PCI Command
register to prevent devices from generating INTx after the drivers'
suspend routines have been executed?
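
As a rough illustration of the idea above: bit 10 is the INTx-disable bit
that the kernel headers name PCI_COMMAND_INTX_DISABLE (0x400, in the PCI
Command register).  The sketch below only manipulates a plain 16-bit value in
user space.  The helper name cmd_set_intx() is made up for this example, and a
real implementation would go through the PCI config-space accessors instead.

```c
#include <assert.h>
#include <stdint.h>

/* Same value as PCI_COMMAND_INTX_DISABLE in the kernel's pci_regs.h. */
#define CMD_INTX_DISABLE 0x0400u	/* bit 10 of the Command register */

/*
 * Hypothetical helper (not a kernel API): return the command word with
 * INTx assertion disabled (disable != 0) or re-enabled (disable == 0).
 */
static uint16_t cmd_set_intx(uint16_t cmd, int disable)
{
	uint16_t new_cmd = disable ? (uint16_t)(cmd | CMD_INTX_DISABLE)
				   : (uint16_t)(cmd & ~CMD_INTX_DISABLE);

	/* Only bit 10 may differ from the value we started with. */
	assert(((cmd ^ new_cmd) & ~CMD_INTX_DISABLE) == 0);
	return new_cmd;
}
```

Applying cmd_set_intx(cmd, 1) after the drivers' suspend routines and
cmd_set_intx(cmd, 0) during resume would leave every other command bit
untouched.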

> Therefore doing anything with the irqs at the point where we are
> suspending them is a formality, and perhaps simply code that ensures
> in-flight irqs don't make it past a certain point.
> 
> I believe we just need to call disable() and print a big nasty warning
> if any irq comes in after the suspend stage.

At the moment we're safe, since PCI devices are put into low power states
in the suspend stage.  However, we'd like to make that happen in the "late
suspend" stage to avoid a problem with a shared interrupt occurring after one
of the devices using it has been suspended and its driver's irq handler can't
cope with that.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-24 22:42                           ` Rafael J. Wysocki
  2009-02-24 22:51                             ` Linus Torvalds
@ 2009-02-24 22:51                             ` Linus Torvalds
  2009-02-24 23:07                               ` Rafael J. Wysocki
                                                 ` (3 more replies)
  2009-02-25 15:32                             ` Alan Stern
  2009-02-25 15:32                             ` [linux-pm] " Alan Stern
  3 siblings, 4 replies; 373+ messages in thread
From: Linus Torvalds @ 2009-02-24 22:51 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Eric W. Biederman, Ingo Molnar, LKML, Benjamin Herrenschmidt,
	Jeremy Fitzhardinge, pm list, Len Brown, Jesse Barnes,
	Thomas Gleixner



On Tue, 24 Feb 2009, Rafael J. Wysocki wrote:
>
> > The only safe way on x86 to shutdown a level triggered ioapic irq
> > outside of irq context is for the driver to program the hardware to
> > not generate an irq.
> 
> Well, that changes things quite a bit, because it means we can't change the
> suspend-resume sequence in a way we thought we could without fixing all
> drivers first, but this is exactly what we'd like to avoid by changing the
> core.

Calling "disable_irq()" is perfectly fine.

What is not possible on that broken IO-APIC (among other things) is to 
actually turn the interrupts off at the apic (ie the whole ->shutdown() 
thing). But that's not what we even want to do. What we care about is 
just disabling the interrupt from a driver perspective.

IOW, the patches I have seen are fine, and all the comments from Eric are 
just confusion about what we want done.

WE DO NOT WANT TO TURN OFF THE IO-APIC. That may or may not happen later, but 
that's totally unrelated to this whole "suspend_device_irqs()" thing.
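
To make the distinction concrete, here is a toy user-space model of the
nested bookkeeping behind a disable/enable pair.  It is not the kernel's
implementation: struct toy_irq and both functions are invented for
illustration, and the model deliberately has no counterpart of ->shutdown(),
since that is the operation being ruled out.

```c
#include <assert.h>

/* Toy stand-in for an interrupt descriptor; fields are illustrative only. */
struct toy_irq {
	unsigned int depth;	/* how many disable calls are outstanding */
	int handler_blocked;	/* 1 while the driver must not see the irq */
};

/* Logical disable: nests, and blocks the handler on the 0 -> 1 step. */
static void toy_disable_irq(struct toy_irq *d)
{
	if (d->depth++ == 0)
		d->handler_blocked = 1;
}

/* Logical enable: unblocks the handler only when the last disable is undone. */
static void toy_enable_irq(struct toy_irq *d)
{
	assert(d->depth > 0);	/* an unbalanced enable is a caller bug */
	if (--d->depth == 0)
		d->handler_blocked = 0;
}
```

In this model a second toy_disable_irq() call changes nothing visible, and
the handler stays blocked until the matching number of toy_enable_irq() calls
has been made.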

		Linus

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-24 22:51                             ` Linus Torvalds
@ 2009-02-24 23:07                               ` Rafael J. Wysocki
  2009-02-24 23:09                                 ` Ingo Molnar
  2009-02-24 23:09                                 ` Ingo Molnar
  2009-02-24 23:07                               ` Rafael J. Wysocki
                                                 ` (2 subsequent siblings)
  3 siblings, 2 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-02-24 23:07 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric W. Biederman, Ingo Molnar, LKML, Benjamin Herrenschmidt,
	Jeremy Fitzhardinge, pm list, Len Brown, Jesse Barnes,
	Thomas Gleixner

On Tuesday 24 February 2009, Linus Torvalds wrote:
> 
> On Tue, 24 Feb 2009, Rafael J. Wysocki wrote:
> >
> > > The only safe way on x86 to shutdown a level triggered ioapic irq
> > > outside of irq context is for the driver to program the hardware to
> > > not generate an irq.
> > 
> > Well, that changes things quite a bit, because it means we can't change the
> > suspend-resume sequence in a way we thought we could without fixing all
> > drivers first, but this is exactly what we'd like to avoid by changing the
> > core.
> 
> Calling "disable_irq()" is perfectly fine.
> 
> What is not possible on that broken IO-APIC (among other things) is to 
> actually turn the interrupts off at the apic (ie the whole ->shutdown() 
> thing). But that's not what we even want to do. What we care about is 
> > just disabling the interrupt from a driver perspective.
> 
> IOW, the patches I have seen are fine, and all the comments from Eric are 
> just confusion about what we want done.

Ah, OK.  Thanks for the explanation, I got confused too.

> WE DO NOT WANT TO TURN OFF THE IO-APIC. That may or may not happen later, but 
> that's totally unrelated to this whole "suspend_device_irqs()" thing.

Yeah.

Rafael

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-24 23:07                               ` Rafael J. Wysocki
@ 2009-02-24 23:09                                 ` Ingo Molnar
  2009-02-24 23:29                                   ` Rafael J. Wysocki
  2009-02-24 23:29                                   ` Rafael J. Wysocki
  2009-02-24 23:09                                 ` Ingo Molnar
  1 sibling, 2 replies; 373+ messages in thread
From: Ingo Molnar @ 2009-02-24 23:09 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linus Torvalds, Eric W. Biederman, LKML, Benjamin Herrenschmidt,
	Jeremy Fitzhardinge, pm list, Len Brown, Jesse Barnes,
	Thomas Gleixner


* Rafael J. Wysocki <rjw@sisk.pl> wrote:

> On Tuesday 24 February 2009, Linus Torvalds wrote:
> > 
> > On Tue, 24 Feb 2009, Rafael J. Wysocki wrote:
> > >
> > > > The only safe way on x86 to shutdown a level triggered ioapic irq
> > > > outside of irq context is for the driver to program the hardware to
> > > > not generate an irq.
> > > 
> > > Well, that changes things quite a bit, because it means we can't change the
> > > suspend-resume sequence in a way we thought we could without fixing all
> > > drivers first, but this is exactly what we'd like to avoid by changing the
> > > core.
> > 
> > Calling "disable_irq()" is perfectly fine.
> > 
> > What is not possible on that broken IO-APIC (among other 
> > things) is to actually turn the interrupts off at the apic 
> > (ie the whole ->shutdown() thing). But that's not what we 
> > even want to do. What we care about is just disabling the 
> > > interrupt from a driver perspective.
> > 
> > IOW, the patches I have seen are fine, and all the comments 
> > from Eric are just confusion about what we want done.
> 
> Ah, OK.  Thanks for the explanation, I got confused too.
> 
> > > WE DO NOT WANT TO TURN OFF THE IO-APIC. That may or may not 
> > > happen later, but that's totally unrelated to this whole 
> > > "suspend_device_irqs()" thing.
> 
> Yeah.

We definitely don't want to turn off x86 IO-APICs - the timer IRQ 
goes via one of them.

	Ingo

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-24 23:09                                 ` Ingo Molnar
  2009-02-24 23:29                                   ` Rafael J. Wysocki
@ 2009-02-24 23:29                                   ` Rafael J. Wysocki
  2009-02-25 13:23                                     ` Ingo Molnar
                                                       ` (3 more replies)
  1 sibling, 4 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-02-24 23:29 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Eric W. Biederman, LKML, Benjamin Herrenschmidt,
	Jeremy Fitzhardinge, pm list, Len Brown, Jesse Barnes,
	Thomas Gleixner

On Wednesday 25 February 2009, Ingo Molnar wrote:
> 
> * Rafael J. Wysocki <rjw@sisk.pl> wrote:
> 
> > On Tuesday 24 February 2009, Linus Torvalds wrote:
> > > 
> > > On Tue, 24 Feb 2009, Rafael J. Wysocki wrote:
> > > >
> > > > > The only safe way on x86 to shutdown a level triggered ioapic irq
> > > > > outside of irq context is for the driver to program the hardware to
> > > > > not generate an irq.
> > > > 
> > > > Well, that changes things quite a bit, because it means we can't change the
> > > > suspend-resume sequence in a way we thought we could without fixing all
> > > > drivers first, but this is exactly what we'd like to avoid by changing the
> > > > core.
> > > 
> > > Calling "disable_irq()" is perfectly fine.
> > > 
> > > What is not possible on that broken IO-APIC (among other 
> > > things) is to actually turn the interrupts off at the apic 
> > > (ie the whole ->shutdown() thing). But that's not what we 
> > > even want to do. What we care about is just disabling the 
> > > > interrupt from a driver perspective.
> > > 
> > > IOW, the patches I have seen are fine, and all the comments 
> > > from Eric are just confusion about what we want done.
> > 
> > Ah, OK.  Thanks for the explanation, I got confused too.
> > 
> > > > WE DO NOT WANT TO TURN OFF THE IO-APIC. That may or may not 
> > > > happen later, but that's totally unrelated to this whole 
> > > > "suspend_device_irqs()" thing.
> > 
> > Yeah.
> 
> We definitely don't want to turn off x86 IO-APICs - the timer IRQ 
> goes via one of them.

No, we don't.  At least not at this point.

BTW, appended is the current (3rd) version of the $subject patch with some
of your comments taken into account.  In particular, I did the following:
- moved [suspend|resume]_device_irqs() to a separate file (pm.c)
- fixed interrupt.h so that their declarations are in a better place
- made enable_irq() clear IRQ_SUSPENDED
- made device_power_down() and device_power_up() call
  suspend_device_irqs() and resume_device_irqs(), respectively, which
  simplified the callers quite a bit (this changes the ordering in the Xen
  code, but I _think_ it should still work).
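
For readers who want the gist before reading the diff, the following is a
simplified user-space model of how the two helpers in the appended patch
treat each interrupt line.  It is only a sketch: the toy_* names and flag
values are invented here, and locking, the chip callback and
synchronize_irq() are left out on purpose.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag values; the kernel's IRQ_* constants differ. */
#define TOY_DISABLED	0x1u
#define TOY_SUSPENDED	0x2u
#define TOY_WAKEUP	0x4u

struct toy_desc {
	unsigned int depth;	/* disable nesting count */
	unsigned int status;	/* TOY_* flags */
	int has_action;		/* a handler is registered on this line */
	int is_timer;		/* the handler was registered as a timer irq */
};

/* Disable every line that is enabled, in use, not wakeup and not a timer. */
static void toy_suspend_device_irqs(struct toy_desc *d, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (d[i].depth == 0 && !(d[i].status & TOY_WAKEUP) &&
		    d[i].has_action && !d[i].is_timer) {
			d[i].depth++;
			d[i].status |= TOY_DISABLED | TOY_SUSPENDED;
		}
	}
}

/* Re-enable only the lines that the suspend helper marked. */
static void toy_resume_device_irqs(struct toy_desc *d, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!(d[i].status & TOY_SUSPENDED))
			continue;
		assert(d[i].depth > 0);
		/* Mirrors enable_irq() clearing both flags on the last enable. */
		if (--d[i].depth == 0)
			d[i].status &= ~(TOY_DISABLED | TOY_SUSPENDED);
	}
}
```

A line that a driver had already disabled before suspend is never marked, so
the resume helper leaves it disabled, and timer lines keep running throughout.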

Please have a look.

Thanks,
Rafael

---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: PM: Rework handling of interrupts during suspend-resume (rev. 3)

Introduce two helper functions allowing us to prevent device drivers
from getting any interrupts (without disabling interrupts on the CPU)
during suspend (or hibernation) and to make them start to receive
interrupts again during the subsequent resume, respectively.  These
functions make it possible to keep timer interrupts enabled while the
"late" suspend and "early" resume callbacks provided by device
drivers are being executed.

Use these functions to rework the handling of interrupts during
suspend (hibernation) and resume.  Namely, interrupts will only be
disabled on the CPU right before suspending sysdevs, while device
drivers will be prevented from receiving interrupts, with the help of
the new helper function, before their "late" suspend callbacks run
(and analogously during resume).

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 arch/x86/kernel/apm_32.c  |   15 ++++++++--
 drivers/base/power/main.c |   20 ++++++++------
 drivers/xen/manage.c      |   16 ++++++-----
 include/linux/interrupt.h |    4 ++
 include/linux/irq.h       |    1 
 kernel/irq/Makefile       |    1 
 kernel/irq/manage.c       |    3 +-
 kernel/irq/pm.c           |   63 ++++++++++++++++++++++++++++++++++++++++++++++
 kernel/kexec.c            |    8 ++---
 kernel/power/disk.c       |   39 +++++++++++++++++++++-------
 kernel/power/main.c       |   17 ++++++++----
 11 files changed, 146 insertions(+), 41 deletions(-)

Index: linux-2.6/include/linux/interrupt.h
===================================================================
--- linux-2.6.orig/include/linux/interrupt.h
+++ linux-2.6/include/linux/interrupt.h
@@ -106,6 +106,10 @@ extern void disable_irq_nosync(unsigned 
 extern void disable_irq(unsigned int irq);
 extern void enable_irq(unsigned int irq);
 
+/* The following two functions are for the core kernel use only. */
+extern void suspend_device_irqs(void);
+extern void resume_device_irqs(void);
+
 #if defined(CONFIG_SMP) && defined(CONFIG_GENERIC_HARDIRQS)
 
 extern cpumask_var_t irq_default_affinity;
Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -287,17 +287,19 @@ void __attribute__ ((weak)) arch_suspend
  */
 static int suspend_enter(suspend_state_t state)
 {
-	int error = 0;
+	int error;
 
 	device_pm_lock();
-	arch_suspend_disable_irqs();
-	BUG_ON(!irqs_disabled());
 
-	if ((error = device_power_down(PMSG_SUSPEND))) {
+	error = device_power_down(PMSG_SUSPEND);
+	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down\n");
 		goto Done;
 	}
 
+	arch_suspend_disable_irqs();
+	BUG_ON(!irqs_disabled());
+
 	error = sysdev_suspend(PMSG_SUSPEND);
 	if (!error) {
 		if (!suspend_test(TEST_CORE))
@@ -305,11 +307,14 @@ static int suspend_enter(suspend_state_t
 		sysdev_resume();
 	}
 
-	device_power_up(PMSG_RESUME);
- Done:
 	arch_suspend_enable_irqs();
 	BUG_ON(irqs_disabled());
+
+	device_power_up(PMSG_RESUME);
+
+ Done:
 	device_pm_unlock();
+
 	return error;
 }
 
Index: linux-2.6/kernel/power/disk.c
===================================================================
--- linux-2.6.orig/kernel/power/disk.c
+++ linux-2.6/kernel/power/disk.c
@@ -214,7 +214,7 @@ static int create_image(int platform_mod
 		return error;
 
 	device_pm_lock();
-	local_irq_disable();
+
 	/* At this point, device_suspend() has been called, but *not*
 	 * device_power_down(). We *must* call device_power_down() now.
 	 * Otherwise, drivers for some devices (e.g. interrupt controllers)
@@ -225,8 +225,11 @@ static int create_image(int platform_mod
 	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down, "
 			"aborting hibernation\n");
-		goto Enable_irqs;
+		goto Unlock;
 	}
+
+	local_irq_disable();
+
 	sysdev_suspend(PMSG_FREEZE);
 	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down, "
@@ -252,12 +255,16 @@ static int create_image(int platform_mod
 	/* NOTE:  device_power_up() is just a resume() for devices
 	 * that suspended with irqs off ... no overall powerup.
 	 */
+
  Power_up_devices:
+	local_irq_enable();
+
 	device_power_up(in_suspend ?
 		(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);
- Enable_irqs:
-	local_irq_enable();
+
+ Unlock:
 	device_pm_unlock();
+
 	return error;
 }
 
@@ -336,13 +343,16 @@ static int resume_target_kernel(void)
 	int error;
 
 	device_pm_lock();
-	local_irq_disable();
+
 	error = device_power_down(PMSG_QUIESCE);
 	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down, "
 			"aborting resume\n");
-		goto Enable_irqs;
+		goto Unlock;
 	}
+
+	local_irq_disable();
+
 	sysdev_suspend(PMSG_QUIESCE);
 	/* We'll ignore saved state, but this gets preempt count (etc) right */
 	save_processor_state();
@@ -366,11 +376,16 @@ static int resume_target_kernel(void)
 	swsusp_free();
 	restore_processor_state();
 	touch_softlockup_watchdog();
+
 	sysdev_resume();
-	device_power_up(PMSG_RECOVER);
- Enable_irqs:
+
 	local_irq_enable();
+
+	device_power_up(PMSG_RECOVER);
+
+ Unlock:
 	device_pm_unlock();
+
 	return error;
 }
 
@@ -447,15 +462,16 @@ int hibernation_platform_enter(void)
 		goto Finish;
 
 	device_pm_lock();
-	local_irq_disable();
+
 	error = device_power_down(PMSG_HIBERNATE);
 	if (!error) {
+		local_irq_disable();
 		sysdev_suspend(PMSG_HIBERNATE);
 		hibernation_ops->enter();
 		/* We should never get here */
 		while (1);
 	}
-	local_irq_enable();
+
 	device_pm_unlock();
 
 	/*
@@ -464,12 +480,15 @@ int hibernation_platform_enter(void)
 	 */
  Finish:
 	hibernation_ops->finish();
+
  Resume_devices:
 	entering_platform_hibernation = false;
 	device_resume(PMSG_RESTORE);
 	resume_console();
+
  Close:
 	hibernation_ops->end();
+
 	return error;
 }
 
Index: linux-2.6/arch/x86/kernel/apm_32.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/apm_32.c
+++ linux-2.6/arch/x86/kernel/apm_32.c
@@ -1190,8 +1190,10 @@ static int suspend(int vetoable)
 	struct apm_user	*as;
 
 	device_suspend(PMSG_SUSPEND);
-	local_irq_disable();
+
 	device_power_down(PMSG_SUSPEND);
+
+	local_irq_disable();
 	sysdev_suspend(PMSG_SUSPEND);
 
 	local_irq_enable();
@@ -1209,9 +1211,12 @@ static int suspend(int vetoable)
 	if (err != APM_SUCCESS)
 		apm_error("suspend", err);
 	err = (err == APM_SUCCESS) ? 0 : -EIO;
+
 	sysdev_resume();
-	device_power_up(PMSG_RESUME);
 	local_irq_enable();
+
+	device_power_up(PMSG_RESUME);
+
 	device_resume(PMSG_RESUME);
 	queue_event(APM_NORMAL_RESUME, NULL);
 	spin_lock(&user_list_lock);
@@ -1228,8 +1233,9 @@ static void standby(void)
 {
 	int err;
 
-	local_irq_disable();
 	device_power_down(PMSG_SUSPEND);
+
+	local_irq_disable();
 	sysdev_suspend(PMSG_SUSPEND);
 	local_irq_enable();
 
@@ -1239,8 +1245,9 @@ static void standby(void)
 
 	local_irq_disable();
 	sysdev_resume();
-	device_power_up(PMSG_RESUME);
 	local_irq_enable();
+
+	device_power_up(PMSG_RESUME);
 }
 
 static apm_event_t get_event(void)
Index: linux-2.6/drivers/xen/manage.c
===================================================================
--- linux-2.6.orig/drivers/xen/manage.c
+++ linux-2.6/drivers/xen/manage.c
@@ -39,12 +39,6 @@ static int xen_suspend(void *data)
 
 	BUG_ON(!irqs_disabled());
 
-	err = device_power_down(PMSG_SUSPEND);
-	if (err) {
-		printk(KERN_ERR "xen_suspend: device_power_down failed: %d\n",
-		       err);
-		return err;
-	}
 	err = sysdev_suspend(PMSG_SUSPEND);
 	if (err) {
 		printk(KERN_ERR "xen_suspend: sysdev_suspend failed: %d\n",
@@ -69,7 +63,6 @@ static int xen_suspend(void *data)
 	xen_mm_unpin_all();
 
 	sysdev_resume();
-	device_power_up(PMSG_RESUME);
 
 	if (!*cancelled) {
 		xen_irq_resume();
@@ -108,6 +101,12 @@ static void do_suspend(void)
 	/* XXX use normal device tree? */
 	xenbus_suspend();
 
+	err = device_power_down(PMSG_SUSPEND);
+	if (err) {
+		printk(KERN_ERR "device_power_down failed: %d\n", err);
+		goto resume_devices;
+	}
+
 	err = stop_machine(xen_suspend, &cancelled, &cpumask_of_cpu(0));
 	if (err) {
 		printk(KERN_ERR "failed to start xen_suspend: %d\n", err);
@@ -120,6 +119,9 @@ static void do_suspend(void)
 	} else
 		xenbus_suspend_cancel();
 
+	device_power_up(PMSG_RESUME);
+
+resume_devices:
 	device_resume(PMSG_RESUME);
 
 	/* Make sure timer events get retriggered on all CPUs */
Index: linux-2.6/kernel/kexec.c
===================================================================
--- linux-2.6.orig/kernel/kexec.c
+++ linux-2.6/kernel/kexec.c
@@ -1454,7 +1454,6 @@ int kernel_kexec(void)
 		if (error)
 			goto Resume_devices;
 		device_pm_lock();
-		local_irq_disable();
 		/* At this point, device_suspend() has been called,
 		 * but *not* device_power_down(). We *must*
 		 * device_power_down() now.  Otherwise, drivers for
@@ -1464,8 +1463,9 @@ int kernel_kexec(void)
 		 */
 		error = device_power_down(PMSG_FREEZE);
 		if (error)
-			goto Enable_irqs;
+			goto Unlock_pm;
 
+		local_irq_disable();
 		/* Suspend system devices */
 		error = sysdev_suspend(PMSG_FREEZE);
 		if (error)
@@ -1484,9 +1484,9 @@ int kernel_kexec(void)
 	if (kexec_image->preserve_context) {
 		sysdev_resume();
  Power_up_devices:
-		device_power_up(PMSG_RESTORE);
- Enable_irqs:
 		local_irq_enable();
+		device_power_up(PMSG_RESTORE);
+ Unlock_pm:
 		device_pm_unlock();
 		enable_nonboot_cpus();
  Resume_devices:
Index: linux-2.6/include/linux/irq.h
===================================================================
--- linux-2.6.orig/include/linux/irq.h
+++ linux-2.6/include/linux/irq.h
@@ -65,6 +65,7 @@ typedef	void (*irq_flow_handler_t)(unsig
 #define IRQ_SPURIOUS_DISABLED	0x00800000	/* IRQ was disabled by the spurious trap */
 #define IRQ_MOVE_PCNTXT		0x01000000	/* IRQ migration from process context */
 #define IRQ_AFFINITY_SET	0x02000000	/* IRQ affinity was set from userspace*/
+#define IRQ_SUSPENDED		0x04000000	/* IRQ has gone through suspend sequence */
 
 #ifdef CONFIG_IRQ_PER_CPU
 # define CHECK_IRQ_PER_CPU(var) ((var) & IRQ_PER_CPU)
Index: linux-2.6/kernel/irq/pm.c
===================================================================
--- /dev/null
+++ linux-2.6/kernel/irq/pm.c
@@ -0,0 +1,63 @@
+/*
+ * linux/kernel/irq/pm.c
+ *
+ * Copyright (C) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
+ *
+ * This file contains power management functions related to interrupts.
+ */
+
+#include <linux/irq.h>
+#include <linux/module.h>
+#include <linux/interrupt.h>
+
+/**
+ *	suspend_device_irqs - disable all currently enabled interrupt lines
+ *
+ *	During system-wide suspend or hibernation device interrupts need to be
+ *	disabled at the chip level and this function is provided for this
+ *	purpose.  It disables all interrupt lines that are enabled at the
+ *	moment and sets the IRQ_SUSPENDED flag for them.
+ */
+void suspend_device_irqs(void)
+{
+	struct irq_desc *desc;
+	int irq;
+
+	for_each_irq_desc(irq, desc) {
+		unsigned long flags;
+
+		spin_lock_irqsave(&desc->lock, flags);
+
+		if (!desc->depth && !(desc->status & IRQ_WAKEUP)
+		    && desc->action && !(desc->action->flags & IRQF_TIMER)) {
+			desc->depth++;
+			desc->status |= IRQ_DISABLED | IRQ_SUSPENDED;
+			desc->chip->disable(irq);
+		}
+
+		spin_unlock_irqrestore(&desc->lock, flags);
+	}
+
+	for_each_irq_desc(irq, desc) {
+		if (desc->status & IRQ_SUSPENDED)
+			synchronize_irq(irq);
+	}
+}
+EXPORT_SYMBOL_GPL(suspend_device_irqs);
+
+/**
+ *	resume_device_irqs - enable interrupts disabled by suspend_device_irqs()
+ *
+ *	Enable all interrupt lines previously disabled by suspend_device_irqs()
+ *	that have the IRQ_SUSPENDED flag set.
+ */
+void resume_device_irqs(void)
+{
+	struct irq_desc *desc;
+	int irq;
+
+	for_each_irq_desc(irq, desc)
+		if (desc->status & IRQ_SUSPENDED)
+			enable_irq(irq);
+}
+EXPORT_SYMBOL_GPL(resume_device_irqs);
Index: linux-2.6/kernel/irq/Makefile
===================================================================
--- linux-2.6.orig/kernel/irq/Makefile
+++ linux-2.6/kernel/irq/Makefile
@@ -4,3 +4,4 @@ obj-$(CONFIG_GENERIC_IRQ_PROBE) += autop
 obj-$(CONFIG_PROC_FS) += proc.o
 obj-$(CONFIG_GENERIC_PENDING_IRQ) += migration.o
 obj-$(CONFIG_NUMA_MIGRATE_IRQ_DESC) += numa_migrate.o
+obj-$(CONFIG_PM_SLEEP) += pm.o
Index: linux-2.6/kernel/irq/manage.c
===================================================================
--- linux-2.6.orig/kernel/irq/manage.c
+++ linux-2.6/kernel/irq/manage.c
@@ -222,8 +222,9 @@ static void __enable_irq(struct irq_desc
 		WARN(1, KERN_WARNING "Unbalanced enable for IRQ %d\n", irq);
 		break;
 	case 1: {
-		unsigned int status = desc->status & ~IRQ_DISABLED;
+		unsigned int status;
 
+		status = desc->status & ~(IRQ_DISABLED | IRQ_SUSPENDED);
 		/* Prevent probing on this irq: */
 		desc->status = status | IRQ_NOPROBE;
 		check_irq_resend(desc, irq);
Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -23,6 +23,7 @@
 #include <linux/pm.h>
 #include <linux/resume-trace.h>
 #include <linux/rwsem.h>
+#include <linux/interrupt.h>
 
 #include "../base.h"
 #include "power.h"
@@ -305,7 +306,8 @@ static int resume_device_noirq(struct de
  *	Execute the appropriate "noirq resume" callback for all devices marked
  *	as DPM_OFF_IRQ.
  *
- *	Must be called with interrupts disabled and only one CPU running.
+ *	Must be called under dpm_list_mtx.  Device drivers should not receive
+ *	interrupts while it's being executed.
  */
 static void dpm_power_up(pm_message_t state)
 {
@@ -326,14 +328,13 @@ static void dpm_power_up(pm_message_t st
  *	device_power_up - Turn on all devices that need special attention.
  *	@state: PM transition of the system being carried out.
  *
- *	Power on system devices, then devices that required we shut them down
- *	with interrupts disabled.
- *
- *	Must be called with interrupts disabled.
+ *	Call the "early" resume handlers and enable device drivers to receive
+ *	interrupts.
  */
 void device_power_up(pm_message_t state)
 {
 	dpm_power_up(state);
+	resume_device_irqs();
 }
 EXPORT_SYMBOL_GPL(device_power_up);
 
@@ -558,16 +559,17 @@ static int suspend_device_noirq(struct d
  *	device_power_down - Shut down special devices.
  *	@state: PM transition of the system being carried out.
  *
- *	Power down devices that require interrupts to be disabled.
- *	Then power down system devices.
+ *	Prevent device drivers from receiving interrupts and call the "late"
+ *	suspend handlers.
  *
- *	Must be called with interrupts disabled and only one CPU running.
+ *	Must be called under dpm_list_mtx.
  */
 int device_power_down(pm_message_t state)
 {
 	struct device *dev;
 	int error = 0;
 
+	suspend_device_irqs();
 	list_for_each_entry_reverse(dev, &dpm_list, power.entry) {
 		error = suspend_device_noirq(dev, state);
 		if (error) {
@@ -577,7 +579,7 @@ int device_power_down(pm_message_t state
 		dev->power.status = DPM_OFF_IRQ;
 	}
 	if (error)
-		dpm_power_up(resume_event(state));
+		device_power_up(resume_event(state));
 	return error;
 }
 EXPORT_SYMBOL_GPL(device_power_down);
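
For readers who want to check the flag and depth bookkeeping without
booting a kernel, here is a rough userspace model of what
suspend_device_irqs() and resume_device_irqs() do to each descriptor.
It is a sketch only: struct toy_irq_desc, the TOY_* constants and the
toy_* functions are made-up stand-ins, not kernel symbols.  Locking,
chip->disable() and synchronize_irq() are left out, and so are the
IRQ_NOPROBE and resend handling done by the real __enable_irq().

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel's status and action flags. */
#define TOY_IRQ_DISABLED	0x1u
#define TOY_IRQ_WAKEUP		0x2u
#define TOY_IRQ_SUSPENDED	0x4u
#define TOY_IRQF_TIMER		0x1u

struct toy_irq_desc {
	unsigned int status;		/* TOY_IRQ_* bits */
	unsigned int depth;		/* nested disable count, 0 == enabled */
	int has_action;			/* non-zero if a handler is installed */
	unsigned int action_flags;	/* TOY_IRQF_* bits of that handler */
};

/*
 * Same condition as in suspend_device_irqs(): only touch lines that are
 * currently enabled, have a handler, and are neither wakeup sources nor
 * timer interrupts.
 */
void toy_suspend_device_irqs(struct toy_irq_desc *desc, int nr)
{
	int irq;

	for (irq = 0; irq < nr; irq++) {
		struct toy_irq_desc *d = &desc[irq];

		if (!d->depth && !(d->status & TOY_IRQ_WAKEUP)
		    && d->has_action && !(d->action_flags & TOY_IRQF_TIMER)) {
			d->depth++;
			d->status |= TOY_IRQ_DISABLED | TOY_IRQ_SUSPENDED;
		}
	}
}

/*
 * Same idea as resume_device_irqs() calling enable_irq(): only lines
 * marked as suspended are re-enabled, and the last enable clears both
 * the disabled and the suspended bits.
 */
void toy_resume_device_irqs(struct toy_irq_desc *desc, int nr)
{
	int irq;

	for (irq = 0; irq < nr; irq++) {
		struct toy_irq_desc *d = &desc[irq];

		if (!(d->status & TOY_IRQ_SUSPENDED))
			continue;

		switch (d->depth) {
		case 0:
			/* Unbalanced enable, the kernel would WARN here. */
			break;
		case 1:
			d->status &= ~(TOY_IRQ_DISABLED | TOY_IRQ_SUSPENDED);
			/* fall through */
		default:
			d->depth--;
		}
	}
}
```

In this model a line that a driver had already disabled before suspend
(depth 1, no suspended bit) is skipped by both functions, so it stays
disabled across the cycle, which is the point of tracking the lines
with a separate flag.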

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-24 22:51                             ` Linus Torvalds
  2009-02-24 23:07                               ` Rafael J. Wysocki
  2009-02-24 23:07                               ` Rafael J. Wysocki
@ 2009-02-25  4:16                               ` Eric W. Biederman
  2009-02-25  4:26                                   ` Linus Torvalds
  2009-02-25  4:16                               ` Eric W. Biederman
  3 siblings, 1 reply; 373+ messages in thread
From: Eric W. Biederman @ 2009-02-25  4:16 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, Ingo Molnar, LKML, Benjamin Herrenschmidt,
	Jeremy Fitzhardinge, pm list, Len Brown, Jesse Barnes,
	Thomas Gleixner

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Tue, 24 Feb 2009, Rafael J. Wysocki wrote:
>>
>> > The only safe way on x86 to shutdown a level triggered ioapic irq
>> > outside of irq context is for the driver to program the hardware to
>> > not generate an irq.
>> 
>> Well, that changes things quite a bit, because it means we can't change the
>> suspend-resume sequence in a way we thought we could without fixing all
>> drivers first, but this is exactly what we'd like to avoid by changing the
>> core.
>
> Calling "disable_irq()" is perfectly fine.

Agreed, I did not mean to indicate otherwise.

> What is not possible on that broken IO-APIC (among other things) is to 
> actually turn the interrupts off at the apic (ie the whole ->shutdown() 
> thing). But that's not what we even want to do. What we care about is 
> just disabling the interrupt from a driver perspective.
>
> IOW, the patches I have seen are fine, and all the comments from Eric are 
> just confusion about what we want done.

Largely yes.

> WE DO NOT WANT TO TURN OFF THE IO-APIC. That may or may not happen later, but
> that's totally unrelated to this whole "suspend_device_irq()" thing.

Right.

The question I was asking is:
Can we get the broken cpu hotunplug code out of the suspend path?

If we can get the devices into a low power state and not generating
interrupts by the time we disable cpus then we do not need to migrate
irqs from process context and risk hitting the ioapic bugs.

While related, safely suspending cpus is a different problem and a
different patch.

Eric

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-25  4:16                               ` Eric W. Biederman
@ 2009-02-25  4:26                                   ` Linus Torvalds
  0 siblings, 0 replies; 373+ messages in thread
From: Linus Torvalds @ 2009-02-25  4:26 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Rafael J. Wysocki, Ingo Molnar, LKML, Benjamin Herrenschmidt,
	Jeremy Fitzhardinge, pm list, Len Brown, Jesse Barnes,
	Thomas Gleixner



On Tue, 24 Feb 2009, Eric W. Biederman wrote:
> The question I was asking is:
> Can we get the broken cpu hotunplug code out of the suspend path?

I think we can move it around. I don't think we can get rid of it.

> If we can get the devices into a low power state and not generating
> interrupts by the time we disable cpus then we do not need to migrate
> irqs from process context and risk hitting the ioapic bugs.

At least one issue is that the actual final "go to sleep" is something 
that has to happen on just one CPU. And I'm pretty sure the others have to 
have gone through the shutdown sequence before that.

And knowing ACPI, the ordering requirements will boil down to something 
insane, like "you have to turn off the other CPU's _before_ you turn off 
some of the core devices, because turning off the other CPU's may involve
them". 

So if what you would _want_ to do is to move the "turn off CPU's" into the 
very innermost layer, so that different architectures can then decide 
whether they even need to go through that whole thing or not (because 
turning off one core will automatically turn off all the others, simply 
because the power was turned off), I suspect the answer is "no".

So you were probably hoping to never have to have that whole horrible 
issue with moving interrupts around. I'm afraid I'm not seeing it happen. 
But maybe we can have it happen after we've disabled all the non-system 
devices, so that in practice there simply won't be any new interrupts 
coming in any more.

			Linus

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-25  4:26                                   ` Linus Torvalds
  (?)
@ 2009-02-25  4:59                                   ` Eric W. Biederman
  -1 siblings, 0 replies; 373+ messages in thread
From: Eric W. Biederman @ 2009-02-25  4:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, Ingo Molnar, LKML, Benjamin Herrenschmidt,
	Jeremy Fitzhardinge, pm list, Len Brown, Jesse Barnes,
	Thomas Gleixner

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Tue, 24 Feb 2009, Eric W. Biederman wrote:
>> The question I was asking is:
>> Can we get the broken cpu hotunplug code out of the suspend path?
>
> I think we can move it around. I don't think we can get rid of it.
>
>> If we can get the devices into a low power state and not generating
>> interrupts by the time we disable cpus then we do not need to migrate
>> irqs from process context and risk hitting the ioapic bugs.
>
> At least one issue is that the actual final "go to sleep" is something 
> that has to happen on just one CPU. And I'm pretty sure the others have to 
> have gone through the shutdown sequence before that.
>
> And knowing ACPI, the ordering requirements will boil down to something 
> insane, like "you have to turn off the other CPU's _before_ you turn off 
> some of the core devices, because turning off the other CPU's may involve
> them". 
>
> So if what you would _want_ to do is to move the "turn off CPU's" into the 
> very innermost layer, so that different architectures can then decide 
> whether they even need to go through that whole thing or not (because 
> turning off one core will automatically turn off all the others, simply 
> because the power was turned off), I suspect the answer is "no".
>
> So you were probably hoping to never have to have that whole horrible 
> issue with moving interrupts around. I'm afraid I'm not seeing it happen. 
> But maybe we can have it happen after we've disabled all the non-system 
> devices, so that in practice there simply won't be any new interrupts 
> coming in any more.

Right.  That is what I am hoping for.  No device interrupts coming into the
cpus at the time we turn them off.

We can disable the devices and thus disable the interrupts the devices
are sending before we disable the cpus.  That should make cpu disable
on suspend much easier to get solid than general x86 cpu hot-unplug.

Eric
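
The ordering argument above can be checked on a toy model.  This is a
userspace sketch, not kernel code: struct toy_irq, toy_offline_cpu()
and toy_quiesce_devices() are hypothetical names, and the model simply
assumes that re-pointing a line is only risky while its device can
still raise it.

```c
#include <assert.h>

struct toy_irq {
	int live;	/* non-zero while the device may still raise it */
	int cpu;	/* CPU the line is currently routed to */
};

/*
 * Take @cpu offline.  Every line routed to it is re-pointed at
 * @fallback, but only lines that are still live count as risky
 * migrations (in this model a quiet line cannot race with an
 * in-flight interrupt).  Returns the number of risky migrations.
 */
int toy_offline_cpu(struct toy_irq *irqs, int nr, int cpu, int fallback)
{
	int risky = 0;
	int i;

	for (i = 0; i < nr; i++) {
		if (irqs[i].cpu != cpu)
			continue;
		if (irqs[i].live)
			risky++;
		irqs[i].cpu = fallback;
	}
	return risky;
}

/* Quiesce every device line before any CPU goes away. */
void toy_quiesce_devices(struct toy_irq *irqs, int nr)
{
	int i;

	for (i = 0; i < nr; i++)
		irqs[i].live = 0;
}
```

With two live lines routed to CPU 1, toy_offline_cpu() for CPU 1
reports two risky migrations; calling toy_quiesce_devices() first
brings that down to zero, which is all the model is meant to show.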

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-25  4:26                                   ` Linus Torvalds
  (?)
  (?)
@ 2009-02-25  4:59                                   ` Eric W. Biederman
  -1 siblings, 0 replies; 373+ messages in thread
From: Eric W. Biederman @ 2009-02-25  4:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jeremy Fitzhardinge, LKML, Jesse Barnes, Thomas Gleixner,
	Ingo Molnar, pm list

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Tue, 24 Feb 2009, Eric W. Biederman wrote:
>> The question I was asking is:
>> Can we get the broken cpu hotunplug code out of the suspend path?
>
> I think we can move it around. I don't think we can get rid of it.
>
>> If we can get the devices into a low power state and not generating
>> interrupts by the time we disable cpus then we do not need to migrate
>> irqs from process context and risk hitting the ioapic bugs.
>
> At least one issue is that the actual final "go to sleep" is something 
> that has to happen on just one CPU. And I'm pretty sure the others have to 
> have gone through the shutdown sequence before that.
>
> And knowing ACPI, the ordering requirements will boil down to something 
> insane, like "you have to turn off the other CPU's _before_ you turn off 
> some of the core devices, because turning off the other CPU's may involve 
> them". 
>
> So if what you would _want_ to do is to move the "turn off CPU's" into the 
> very innermost layer, so that different architectures can then decide 
> whether they even need to go through that whole thing or not (because 
> turning off one core will automatically turn off all the others, simply 
> because the power was turned off), I suspect the answer is "no".
>
> So you were probably hoping to never have to have that whole horrible 
> issue with moving interrupts around. I'm afraid I'm not seeing it happen. 
> But maybe we can have it happen after we've disabled all the non-system 
> devices, so that in practice there simply won't be any new interrupts 
> coming in any more.

Right.  That is what I am hoping for.  No device interrupts coming into the
cpus at the time we turn them off.

We can disable the devices and thus disable the interrupts the devices
are sending before we disable the cpus.  That should make cpu disable
on suspend much easier to get solid than general x86 cpu hot-unplug.

Eric

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-24 23:29                                   ` Rafael J. Wysocki
  2009-02-25 13:23                                     ` Ingo Molnar
@ 2009-02-25 13:23                                     ` Ingo Molnar
  2009-02-26  1:17                                     ` Arve Hjønnevåg
  2009-02-26  1:17                                     ` Arve Hjønnevåg
  3 siblings, 0 replies; 373+ messages in thread
From: Ingo Molnar @ 2009-02-25 13:23 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linus Torvalds, Eric W. Biederman, LKML, Benjamin Herrenschmidt,
	Jeremy Fitzhardinge, pm list, Len Brown, Jesse Barnes,
	Thomas Gleixner


* Rafael J. Wysocki <rjw@sisk.pl> wrote:

> On Wednesday 25 February 2009, Ingo Molnar wrote:
> > 
> > * Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > 
> > > On Tuesday 24 February 2009, Linus Torvalds wrote:
> > > > 
> > > > On Tue, 24 Feb 2009, Rafael J. Wysocki wrote:
> > > > >
> > > > > > The only safe way on x86 to shutdown a level triggered ioapic irq
> > > > > > outside of irq context is for the driver to program the hardware to
> > > > > > not generate an irq.
> > > > > 
> > > > > Well, that changes things quite a bit, because it means we can't change the
> > > > > suspend-resume sequence in a way we thought we could without fixing all
> > > > > drivers first, but this is exactly what we'd like to avoid by changing the
> > > > > core.
> > > > 
> > > > Calling "disable_irq()" is perfectly fine.
> > > > 
> > > > What is not possible on that broken IO-APIC (among other 
> > > > things) is to actually turn the interrupts off at the apic 
> > > > (ie the whole ->shutdown() thing). But that's not what we 
> > > > even want to do. What we care about is just disabling the 
> > > > interrupt from a driver perspective.
> > > > 
> > > > IOW, the patches I have seen are fine, and all the comments 
> > > > from Eric are just confusion about what we want done.
> > > 
> > > Ah, OK.  Thanks for the explanation, I got confused too.
> > > 
> > > > WE DO NOT WANT TO TURN OFF THE IO-APIC. That may or may not 
> > > > happen later, but that's totally unrelated to this whole 
> > > > "suspend_device_irq()" thing.
> > > 
> > > Yeah.
> > 
> > We definitely don't want to turn off x86 IO-APICs - the timer IRQ 
> > goes via one of them.
> 
> No, we don't.  At least not at this point.
> 
> BTW, appended is the current (3rd) version of the $subject patch with some
> of your comments taken into account.  In particular, I did the following:
> - moved [suspend|resume]_device_irqs() to a separate file (pm.c)
> - fixed interrupt.h so that their headers are at a better place
> - made enable_irq() clear IRQ_SUSPENDED
> - made device_power_down() and device_power_up() call
>   suspend_device_irqs() and resume_device_irqs(), respectively, which
>   simplified the callers quite a bit (it changed the Xen code ordering, though,
>   but I _think_ it still should work).
> 
> Please have a look.

Looks good, thanks Rafael!

Acked-by: Ingo Molnar <mingo@elte.hu>

	Ingo

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-24 23:29                                   ` Rafael J. Wysocki
@ 2009-02-25 13:23                                     ` Ingo Molnar
  2009-02-25 13:23                                     ` Ingo Molnar
                                                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 373+ messages in thread
From: Ingo Molnar @ 2009-02-25 13:23 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Jeremy Fitzhardinge, LKML, Jesse Barnes, Eric W. Biederman,
	pm list, Linus Torvalds, Thomas Gleixner


* Rafael J. Wysocki <rjw@sisk.pl> wrote:

> On Wednesday 25 February 2009, Ingo Molnar wrote:
> > 
> > * Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > 
> > > On Tuesday 24 February 2009, Linus Torvalds wrote:
> > > > 
> > > > On Tue, 24 Feb 2009, Rafael J. Wysocki wrote:
> > > > >
> > > > > > The only safe way on x86 to shutdown a level triggered ioapic irq
> > > > > > outside of irq context is for the driver to program the hardware to
> > > > > > not generate an irq.
> > > > > 
> > > > > Well, that changes things quite a bit, because it means we can't change the
> > > > > suspend-resume sequence in a way we thought we could without fixing all
> > > > > drivers first, but this is exactly what we'd like to avoid by changing the
> > > > > core.
> > > > 
> > > > Calling "disable_irq()" is perfectly fine.
> > > > 
> > > > What is not possible on that broken IO-APIC (among other 
> > > > things) is to actually turn the interrupts off at the apic 
> > > > (ie the whole ->shutdown() thing). But that's not what we 
> > > > even want to do. What we care about is just disabling the 
> > > > interrupt from a driver perspective.
> > > > 
> > > > IOW, the patches I have seen are fine, and all the comments 
> > > > from Eric are just confusion about what we want done.
> > > 
> > > Ah, OK.  Thanks for the explanation, I got confused too.
> > > 
> > > > WE DO NOT WANT TO TURN OFF THE IO-APIC. That may or may not 
> > > > happen later, but that's totally unrelated to this whole 
> > > > "suspend_device_irq()" thing.
> > > 
> > > Yeah.
> > 
> > We definitely don't want to turn off x86 IO-APICs - the timer IRQ 
> > goes via one of them.
> 
> No, we don't.  At least not at this point.
> 
> BTW, appended is the current (3rd) version of the $subject patch with some
> of your comments taken into account.  In particular, I did the following:
> - moved [suspend|resume]_device_irqs() to a separate file (pm.c)
> - fixed interrupt.h so that their headers are at a better place
> - made enable_irq() clear IRQ_SUSPENDED
> - made device_power_down() and device_power_up() call
>   suspend_device_irqs() and resume_device_irqs(), respectively, which
>   simplified the callers quite a bit (it changed the Xen code ordering, though,
>   but I _think_ it still should work).
> 
> Please have a look.

Looks good, thanks Rafael!

Acked-by: Ingo Molnar <mingo@elte.hu>

	Ingo

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [linux-pm] [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-24 22:42                           ` Rafael J. Wysocki
                                               ` (2 preceding siblings ...)
  2009-02-25 15:32                             ` Alan Stern
@ 2009-02-25 15:32                             ` Alan Stern
  2009-02-25 16:19                                 ` Linus Torvalds
  3 siblings, 1 reply; 373+ messages in thread
From: Alan Stern @ 2009-02-25 15:32 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Eric W. Biederman, Jeremy Fitzhardinge, LKML, Jesse Barnes,
	Thomas Gleixner, Ingo Molnar, Linus Torvalds, pm list

On Tue, 24 Feb 2009, Rafael J. Wysocki wrote:

> I think the most important source of level triggered interrupts are PCI
> devices, so perhaps we can make the PCI PM core use bit 10 of the PCI Device
> Control register to prevent devices from generating INTx after the drivers'
> suspend routines have been executed?

I wish that were true.  As I recall, the original PCI specification did
not define this bit, and older PCI devices don't support it.  So you
can't count on being able to suppress interrupt generation this way.
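
A minimal user-space sketch of why the bit cannot be relied on, assuming the
bit in question is Interrupt Disable (bit 10 of the PCI Command register,
defined from PCI 2.3 onward).  The struct and helpers below are hypothetical
models, not kernel APIs: a device that predates the bit simply ignores the
write, which a read-back probe can detect.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 10 of the PCI Command register: Interrupt Disable (PCI 2.3 and later). */
#define CMD_INTX_DISABLE 0x0400u

/* Hypothetical device model: only bits set in writable_mask can change. */
struct fake_pci_dev {
	uint16_t command;
	uint16_t writable_mask;
};

static uint16_t cfg_read_command(const struct fake_pci_dev *dev)
{
	return dev->command;
}

static void cfg_write_command(struct fake_pci_dev *dev, uint16_t val)
{
	/* Read-only bits keep their old value, writable bits take the new one. */
	dev->command = (uint16_t)((dev->command & ~dev->writable_mask) |
				  (val & dev->writable_mask));
}

/* Flip the bit, read it back, then restore the original register value. */
static bool probe_intx_disable(struct fake_pci_dev *dev)
{
	uint16_t orig, flipped;

	assert(dev != NULL);
	orig = cfg_read_command(dev);
	cfg_write_command(dev, orig ^ CMD_INTX_DISABLE);
	flipped = cfg_read_command(dev);
	cfg_write_command(dev, orig);

	return ((orig ^ flipped) & CMD_INTX_DISABLE) != 0;
}
```

If the probe returns false, the only way to quiesce INTx on that device is
through the device's own registers, i.e. in the driver.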

Alan Stern


^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-24 22:42                           ` Rafael J. Wysocki
  2009-02-24 22:51                             ` Linus Torvalds
  2009-02-24 22:51                             ` Linus Torvalds
@ 2009-02-25 15:32                             ` Alan Stern
  2009-02-25 15:32                             ` [linux-pm] " Alan Stern
  3 siblings, 0 replies; 373+ messages in thread
From: Alan Stern @ 2009-02-25 15:32 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Jeremy Fitzhardinge, LKML, Jesse Barnes, Eric W. Biederman,
	pm list, Ingo Molnar, Linus Torvalds, Thomas Gleixner

On Tue, 24 Feb 2009, Rafael J. Wysocki wrote:

> I think the most important source of level triggered interrupts are PCI
> devices, so perhaps we can make the PCI PM core use bit 10 of the PCI Device
> Control register to prevent devices from generating INTx after the drivers'
> suspend routines have been executed?

I wish that were true.  As I recall, the original PCI specification did
not define this bit, and older PCI devices don't support it.  So you
can't count on being able to suppress interrupt generation this way.

Alan Stern

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [linux-pm] [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-25 15:32                             ` [linux-pm] " Alan Stern
@ 2009-02-25 16:19                                 ` Linus Torvalds
  0 siblings, 0 replies; 373+ messages in thread
From: Linus Torvalds @ 2009-02-25 16:19 UTC (permalink / raw)
  To: Alan Stern
  Cc: Rafael J. Wysocki, Eric W. Biederman, Jeremy Fitzhardinge, LKML,
	Jesse Barnes, Thomas Gleixner, Ingo Molnar, pm list



On Wed, 25 Feb 2009, Alan Stern wrote:

> On Tue, 24 Feb 2009, Rafael J. Wysocki wrote:
> 
> > I think the most important source of level triggered interrupts are PCI
> > devices, so perhaps we can make the PCI PM core use bit 10 of the PCI Device
> > Control register to prevent devices from generating INTx after the drivers'
> > suspend routines have been executed?
> 
> I wish that were true.  As I recall, the original PCI specification did
> not define this bit, and older PCI devices don't support it.  So you
> can't count on being able to suppress interrupt generation this way.

It's definitely a new feature. In fact, I think even the current one makes 
it optional, so even for "new" devices it's very unclear how many of them 
actually support that bit.

			Linus

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
@ 2009-02-25 16:19                                 ` Linus Torvalds
  0 siblings, 0 replies; 373+ messages in thread
From: Linus Torvalds @ 2009-02-25 16:19 UTC (permalink / raw)
  To: Alan Stern
  Cc: Jeremy Fitzhardinge, LKML, Jesse Barnes, Eric W. Biederman,
	pm list, Thomas Gleixner, Ingo Molnar



On Wed, 25 Feb 2009, Alan Stern wrote:

> On Tue, 24 Feb 2009, Rafael J. Wysocki wrote:
> 
> > I think the most important source of level triggered interrupts are PCI
> > devices, so perhaps we can make the PCI PM core use bit 10 of the PCI Device
> > Control register to prevent devices from generating INTx after the drivers'
> > suspend routines have been executed?
> 
> I wish that were true.  As I recall, the original PCI specification did
> not define this bit, and older PCI devices don't support it.  So you
> can't count on being able to suppress interrupt generation this way.

It's definitely a new feature. In fact, I think even the current one makes 
it optional, so even for "new" devices it's very unclear how many of them 
actually support that bit.

			Linus

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during  suspend-resume
  2009-02-24 23:29                                   ` Rafael J. Wysocki
  2009-02-25 13:23                                     ` Ingo Molnar
  2009-02-25 13:23                                     ` Ingo Molnar
@ 2009-02-26  1:17                                     ` Arve Hjønnevåg
  2009-02-26  1:27                                       ` Linus Torvalds
                                                         ` (3 more replies)
  2009-02-26  1:17                                     ` Arve Hjønnevåg
  3 siblings, 4 replies; 373+ messages in thread
From: Arve Hjønnevåg @ 2009-02-26  1:17 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Ingo Molnar, Linus Torvalds, Eric W. Biederman, LKML,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner

On Tue, Feb 24, 2009 at 3:29 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> BTW, appended is the current (3rd) version of the $subject patch with some
> of your comments taken into account.  In particular, I did the following:
> - moved [suspend|resume]_device_irqs() to a separate file (pm.c)
> - fixed interrupt.h so that their headers are at a better place
> - made enable_irq() clear IRQ_SUSPENDED
> - made device_power_down() and device_power_up() call
>  suspend_device_irqs() and resume_device_irqs(), respectively, which
>  simplified the callers quite a bit (it changed the Xen code ordering, though,
>  but I _think_ it still should work).

Do you plan to fix edge triggered wakeup interrupts? It still looks
like edge triggered wakeup interrupts that occur between
suspend_device_irqs and local_irq_disable will not cause a wakeup.
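
The window can be modelled with a small user-space sketch.  All names below
are hypothetical and this is not the genirq code; the assumption is that an
edge arriving on a disabled line is only latched as pending and the handler
is not run, so the driver never learns about the event.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one edge-triggered interrupt line. */
struct fake_edge_line {
	bool disabled;	/* set by the suspend_device_irqs() step */
	bool pending;	/* edge latched while the line was disabled */
	int handled;	/* number of times the handler actually ran */
};

/* Models what suspend_device_irqs() does to a single line. */
static void fake_suspend_device_irq(struct fake_edge_line *line)
{
	assert(line != NULL);
	line->disabled = true;
}

/* Models an edge arriving from the device. */
static void fake_raise_edge(struct fake_edge_line *line)
{
	assert(line != NULL);
	if (line->disabled) {
		/* Handler is skipped; only the pending mark survives. */
		line->pending = true;
		return;
	}
	line->handled++;
}
```

An edge raised after fake_suspend_device_irq() leaves handled unchanged and
pending set, which is the lost-wakeup case unless something checks pending
before entering the sleep state.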

-- 
Arve Hjønnevåg

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-24 23:29                                   ` Rafael J. Wysocki
                                                       ` (2 preceding siblings ...)
  2009-02-26  1:17                                     ` Arve Hjønnevåg
@ 2009-02-26  1:17                                     ` Arve Hjønnevåg
  3 siblings, 0 replies; 373+ messages in thread
From: Arve Hjønnevåg @ 2009-02-26  1:17 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Jeremy Fitzhardinge, LKML, Jesse Barnes, Thomas Gleixner,
	Eric W. Biederman, Ingo Molnar, Linus Torvalds, pm list

On Tue, Feb 24, 2009 at 3:29 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> BTW, appended is the current (3rd) version of the $subject patch with some
> of your comments taken into account.  In particular, I did the following:
> - moved [suspend|resume]_device_irqs() to a separate file (pm.c)
> - fixed interrupt.h so that their headers are at a better place
> - made enable_irq() clear IRQ_SUSPENDED
> - made device_power_down() and device_power_up() call
>  suspend_device_irqs() and resume_device_irqs(), respectively, which
>  simplified the callers quite a bit (it changed the Xen code ordering, though,
>  but I _think_ it still should work).

Do you plan to fix edge triggered wakeup interrupts? It still looks
like edge triggered wakeup interrupts that occur between
suspend_device_irqs and local_irq_disable will not cause a wakeup.

-- 
Arve Hjønnevåg

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during  suspend-resume
  2009-02-26  1:17                                     ` Arve Hjønnevåg
  2009-02-26  1:27                                       ` Linus Torvalds
@ 2009-02-26  1:27                                       ` Linus Torvalds
  2009-02-26  2:13                                         ` Arve Hjønnevåg
  2009-02-26  2:13                                         ` Arve Hjønnevåg
  2009-02-26  9:50                                       ` Rafael J. Wysocki
  2009-02-26  9:50                                       ` Rafael J. Wysocki
  3 siblings, 2 replies; 373+ messages in thread
From: Linus Torvalds @ 2009-02-26  1:27 UTC (permalink / raw)
  To: Arve Hjønnevåg
  Cc: Rafael J. Wysocki, Ingo Molnar, Eric W. Biederman, LKML,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner



On Wed, 25 Feb 2009, Arve Hjønnevåg wrote:
>
> Do you plan to fix edge triggered wakeup interrupts? It still looks
> like edge triggered wakeup interrupts that occur between
> suspend_device_irqs and local_irq_disable will not cause a wakeup.

IF we ever see this as a real issue, we can either see it in the 
IRQ_PENDING flag, or we can mark such interrupts specially. So it would be 
solvable. That said, I haven't actually heard any real usage cases. Normal 
wakeup events are _not_ interrupts in the regular "device interrupt 
controller" sense.

So can you actually point to an explicit example of something where this 
is a real issue?

			Linus

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-26  1:17                                     ` Arve Hjønnevåg
@ 2009-02-26  1:27                                       ` Linus Torvalds
  2009-02-26  1:27                                       ` Linus Torvalds
                                                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 373+ messages in thread
From: Linus Torvalds @ 2009-02-26  1:27 UTC (permalink / raw)
  To: Arve Hjønnevåg
  Cc: Jeremy Fitzhardinge, LKML, Jesse Barnes, Thomas Gleixner,
	Eric W. Biederman, Ingo Molnar, pm list



On Wed, 25 Feb 2009, Arve Hjønnevåg wrote:
>
> Do you plan to fix edge triggered wakeup interrupts? It still looks
> like edge triggered wakeup interrupts that occur between
> suspend_device_irqs and local_irq_disable will not cause a wakeup.

IF we ever see this as a real issue, we can either see it in the 
IRQ_PENDING flag, or we can mark such interrupts specially. So it would be 
solvable. That said, I haven't actually heard any real usage cases. Normal 
wakeup events are _not_ interrupts in the regular "device interrupt 
controller" sense.

So can you actually point to an explicit example of something where this 
is a real issue?

			Linus

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during  suspend-resume
  2009-02-26  1:27                                       ` Linus Torvalds
@ 2009-02-26  2:13                                         ` Arve Hjønnevåg
  2009-02-26  2:51                                           ` Linus Torvalds
  2009-02-26  2:51                                           ` Linus Torvalds
  2009-02-26  2:13                                         ` Arve Hjønnevåg
  1 sibling, 2 replies; 373+ messages in thread
From: Arve Hjønnevåg @ 2009-02-26  2:13 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, Ingo Molnar, Eric W. Biederman, LKML,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner

On Wed, Feb 25, 2009 at 5:27 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
>
> On Wed, 25 Feb 2009, Arve Hjønnevåg wrote:
>>
>> Do you plan to fix edge triggered wakeup interrupts? It still looks
>> like edge triggered wakeup interrupts that occur between
>> suspend_device_irqs and local_irq_disable will not cause a wakeup.
>
> IF we ever see this as a real issue, we can either see it in the
> IRQ_PENDING flag, or we can mark such interrupts specially. So it would be
> solvable. That said, I haven't actually heard any real usage cases. Normal
> wakeup events are _not_ interrupts in the regular "device interrupt
> controller" sense.
>
> So can you actually point to an explicit example of something where this
> is a real issue?

On the msm platform the keyboard driver currently leaves the interrupts
enabled when suspended. If the interrupt handler is called, we use a
wakelock to abort suspend (without wakelocks you would need to set a
flag and abort in suspend_late instead). If the interrupt occurs after
local_irq_disable, it will still be pending when we get to the suspend
enter hook and suspend will be aborted there.

As far as I can tell, this change breaks this. If you press a key at
the right time, it will be ignored.
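
A sketch of the flag-based variant (the one without wakelocks), in plain
user-space C.  The struct and function names are hypothetical and do not come
from the msm driver; the point is only that the scheme depends on the handler
still being called after the driver's suspend hook has run.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical driver state. */
struct fake_kbd {
	bool suspended;			/* suspend hook has run */
	bool key_during_suspend;	/* handler saw a key after that */
};

/* Regular suspend hook: the interrupt is deliberately left enabled. */
static void fake_kbd_suspend(struct fake_kbd *kbd)
{
	assert(kbd != NULL);
	kbd->suspended = true;
	kbd->key_during_suspend = false;
}

/* Interrupt handler: remember that a key arrived during suspend. */
static void fake_kbd_irq(struct fake_kbd *kbd)
{
	assert(kbd != NULL);
	if (kbd->suspended)
		kbd->key_during_suspend = true;
}

/* Late suspend hook: a non-zero return aborts the suspend. */
static int fake_kbd_suspend_late(const struct fake_kbd *kbd)
{
	assert(kbd != NULL);
	return kbd->key_during_suspend ? -EBUSY : 0;
}
```

If the line is disabled before fake_kbd_irq() can run, the flag is never set
and fake_kbd_suspend_late() returns 0, so the key press is dropped.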

-- 
Arve Hjønnevåg

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-26  1:27                                       ` Linus Torvalds
  2009-02-26  2:13                                         ` Arve Hjønnevåg
@ 2009-02-26  2:13                                         ` Arve Hjønnevåg
  1 sibling, 0 replies; 373+ messages in thread
From: Arve Hjønnevåg @ 2009-02-26  2:13 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jeremy Fitzhardinge, LKML, Jesse Barnes, Thomas Gleixner,
	Eric W. Biederman, Ingo Molnar, pm list

On Wed, Feb 25, 2009 at 5:27 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
>
> On Wed, 25 Feb 2009, Arve Hjønnevåg wrote:
>>
>> Do you plan to fix edge triggered wakeup interrupts? It still looks
>> like edge triggered wakeup interrupts that occur between
>> suspend_device_irqs and local_irq_disable will not cause a wakeup.
>
> IF we ever see this as a real issue, we can either see it in the
> IRQ_PENDING flag, or we can mark such interrupts specially. So it would be
> solvable. That said, I haven't actually heard any real usage cases. Normal
> wakeup events are _not_ interrupts in the regular "device interrupt
> controller" sense.
>
> So can you actually point to an explicit example of something where this
> is a real issue?

On the msm platform the keyboard driver currently leaves the interrupts
enabled when suspended. If the interrupt handler is called, we use a
wakelock to abort suspend (without wakelocks you would need to set a
flag and abort in suspend_late instead). If the interrupt occurs after
local_irq_disable, it will still be pending when we get to the suspend
enter hook and suspend will be aborted there.

As far as I can tell, this change breaks this. If you press a key at
the right time, it will be ignored.

-- 
Arve Hjønnevåg

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during  suspend-resume
  2009-02-26  2:13                                         ` Arve Hjønnevåg
@ 2009-02-26  2:51                                           ` Linus Torvalds
  2009-02-26  3:00                                             ` Ingo Molnar
  2009-02-26  3:00                                             ` Ingo Molnar
  2009-02-26  2:51                                           ` Linus Torvalds
  1 sibling, 2 replies; 373+ messages in thread
From: Linus Torvalds @ 2009-02-26  2:51 UTC (permalink / raw)
  To: Arve Hjønnevåg
  Cc: Rafael J. Wysocki, Ingo Molnar, Eric W. Biederman, LKML,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner



On Wed, 25 Feb 2009, Arve Hjønnevåg wrote:
>
> On the msm platform the keyboard driver currently leaves the interrupts
> enabled when suspended. If the interrupt handler is called, we use a
> wakelock to abort suspend (without wakelocks you would need to set a
> flag and abort in suspend_late instead). If the interrupt occurs after
> local_irq_disable, it will still be pending when we get to the suspend
> enter hook and suspend will be aborted there.
> 
> As far as I can tell, this change breaks this. If you press a key at
> the right time, it will be ignored.

Is the irq on a private non-shared interrupt line? If so, you could just 
mark it as IRQF_TIMER, and the irq disable logic won't touch it.

What keyboard driver does this msm thing, btw?

			Linus

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-26  2:13                                         ` Arve Hjønnevåg
  2009-02-26  2:51                                           ` Linus Torvalds
@ 2009-02-26  2:51                                           ` Linus Torvalds
  1 sibling, 0 replies; 373+ messages in thread
From: Linus Torvalds @ 2009-02-26  2:51 UTC (permalink / raw)
  To: Arve Hjønnevåg
  Cc: Jeremy Fitzhardinge, LKML, Jesse Barnes, Thomas Gleixner,
	Eric W. Biederman, Ingo Molnar, pm list



On Wed, 25 Feb 2009, Arve Hjønnevåg wrote:
>
> On the msm platform the keyboard driver currently leaves the interrupts
> enabled when suspended. If the interrupt handler is called, we use a
> wakelock to abort suspend (without wakelocks you would need to set a
> flag and abort in suspend_late instead). If the interrupt occurs after
> local_irq_disable, it will still be pending when we get to the suspend
> enter hook and suspend will be aborted there.
> 
> As far as I can tell, this change breaks this. If you press a key at
> the right time, it will be ignored.

Is the irq on a private non-shared interrupt line? If so, you could just 
mark it as IRQF_TIMER, and the irq disable logic won't touch it.

What keyboard driver does this msm thing, btw?

			Linus

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-26  2:51                                           ` Linus Torvalds
  2009-02-26  3:00                                             ` Ingo Molnar
@ 2009-02-26  3:00                                             ` Ingo Molnar
  2009-02-26  3:31                                               ` Arve Hjønnevåg
  2009-02-26  3:31                                               ` Arve Hjønnevåg
  1 sibling, 2 replies; 373+ messages in thread
From: Ingo Molnar @ 2009-02-26  3:00 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Arve Hjønnevåg, Rafael J. Wysocki, Eric W. Biederman,
	LKML, Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list,
	Len Brown, Jesse Barnes, Thomas Gleixner


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> 
> 
> On Wed, 25 Feb 2009, Arve Hjønnevåg wrote:
> >
> > On the msm platform the keyboard driver currently leaves the interrupts
> > enabled when suspended. If the interrupt handler is called, we use a
> > wakelock to abort suspend (without wakelocks you would need to set a
> > flag and abort in suspend_late instead). If the interrupt occurs after
> > local_irq_disable, it will still be pending when we get to the suspend
> > enter hook and suspend will be aborted there.
> > 
> > As far as I can tell, this change breaks this. If you press a key at
> > the right time, it will be ignored.
> 
> Is the irq on a private non-shared interrupt line? If so, you 
> could just mark it as IRQF_TIMER, and the irq disable logic 
> won't touch it.

Hm, if that solves the problem then it would be nice to have a 
new IRQF_NO_SUSPEND flag for it, in addition to IRQF_TIMER:

./interrupt.h: * IRQF_TIMER - Flag to mark this interrupt as timer interrupt
./interrupt.h:#define IRQF_TIMER		0x00000200

to express such quirks cleanly.

and the suspend code can check the (IRQF_TIMER | 
IRQF_NO_SUSPEND) mask - so no extra cost.

Right now we have a clean enumeration of timer interrupts, would 
be nice to keep that.
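
Roughly, as a user-space sketch: IRQF_TIMER uses the value quoted above, while
the IRQF_NO_SUSPEND value and all the struct and function names are
hypothetical, chosen only for illustration.  The suspend loop tests both flags
with a single mask, so the extra flag adds no per-interrupt cost.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define IRQF_TIMER	0x00000200u	/* value from interrupt.h, quoted above */
#define IRQF_NO_SUSPEND	0x00004000u	/* hypothetical value for the new flag */

/* Hypothetical stand-in for an interrupt with one registered action. */
struct fake_irq {
	unsigned int flags;
	bool disabled;
};

/* Disable every interrupt not exempted by the mask; return how many. */
static size_t fake_suspend_device_irqs(struct fake_irq *irqs, size_t n)
{
	const unsigned int keep = IRQF_TIMER | IRQF_NO_SUSPEND;
	size_t i, count = 0;

	assert(irqs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (irqs[i].flags & keep)
			continue;	/* one test covers both flags */
		irqs[i].disabled = true;
		count++;
	}
	return count;
}
```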

	Ingo

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-26  2:51                                           ` Linus Torvalds
@ 2009-02-26  3:00                                             ` Ingo Molnar
  2009-02-26  3:00                                             ` Ingo Molnar
  1 sibling, 0 replies; 373+ messages in thread
From: Ingo Molnar @ 2009-02-26  3:00 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jeremy Fitzhardinge, LKML, Jesse Barnes, Eric W. Biederman,
	pm list, Thomas Gleixner


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> 
> 
> On Wed, 25 Feb 2009, Arve Hjønnevåg wrote:
> >
> > On the msm platform the keyboard driver currently leaves the interrupts
> > enabled when suspended. If the interrupt handler is called, we use a
> > wakelock to abort suspend (without wakelocks you would need to set a
> > flag and abort in suspend_late instead). If the interrupt occurs after
> > local_irq_disable, it will still be pending when we get to the suspend
> > enter hook and suspend will be aborted there.
> > 
> > As far as I can tell, this change breaks this. If you press a key at
> > the right time, it will be ignored.
> 
> Is the irq on a private non-shared interrupt line? If so, you 
> could just mark it as IRQF_TIMER, and the irq disable logic 
> won't touch it.

Hm, if that solves the problem then it would be nice to have a 
new IRQF_NO_SUSPEND flag for it, in addition to IRQF_TIMER:

./interrupt.h: * IRQF_TIMER - Flag to mark this interrupt as timer interrupt
./interrupt.h:#define IRQF_TIMER		0x00000200

to express such quirks cleanly.

and the suspend code can check the (IRQF_TIMER | 
IRQF_NO_SUSPEND) mask - so no extra cost.

Right now we have a clean enumeration of timer interrupts, would 
be nice to keep that.

	Ingo

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during  suspend-resume
  2009-02-26  3:00                                             ` Ingo Molnar
  2009-02-26  3:31                                               ` Arve Hjønnevåg
@ 2009-02-26  3:31                                               ` Arve Hjønnevåg
  2009-02-26  3:37                                                   ` Linus Torvalds
  1 sibling, 1 reply; 373+ messages in thread
From: Arve Hjønnevåg @ 2009-02-26  3:31 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Rafael J. Wysocki, Eric W. Biederman, LKML,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner

On Wed, Feb 25, 2009 at 7:00 PM, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
>>
>>
>> On Wed, 25 Feb 2009, Arve Hjønnevåg wrote:
>> >
>> > On the msm platform the keyboard driver currently leaves the interrupts
>> > enabled when suspended. If the interrupt handler is called, we use a
>> > wakelock to abort suspend (without wakelocks you would need to set a
>> > flag and abort in suspend_late instead). If the interrupt occurs after
>> > local_irq_disable, it will still be pending when we get to the suspend
>> > enter hook and suspend will be aborted there.
>> >
>> > As far as I can tell, this change breaks this. If you press a key at
>> > the right time, it will be ignored.
>>
>> Is the irq on a private non-shared interrupt line? If so, you
>> could just mark it as IRQF_TIMER, and the irq disable logic
>> won't touch it.

That would not work without wakelock support, since the interrupt
could occur after suspend_late which is the last chance for the driver
to abort sleep. (The patch also breaks my current wakelock
implementation since I use a suspend_late hook to abort sleep, but
this should be easy to fix)

> Hm, if that solves the problem then it would be nice to have a
> new IRQF_NO_SUSPEND flag for it, in addition to IRQF_TIMER:

I think the right fix is for any interrupt that has IRQ_WAKEUP set to
abort suspend if it is pending. I don't know if anyone relies on these
interrupts being dropped now though.
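
To make the suggestion concrete, here is a rough userspace sketch of such
a check. It is only a model: struct irq_model, wakeup_irq_pending() and
the flag values are made up for illustration and are not taken from the
patch or from the genirq code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Illustrative flag values; the real bit assignments are not given here. */
#define IRQ_PENDING 0x1u  /* interrupt arrived while the line was disabled */
#define IRQ_WAKEUP  0x2u  /* line is configured as a wakeup source */

/* Hypothetical, cut-down stand-in for an interrupt descriptor. */
struct irq_model {
	unsigned int status;
};

/*
 * Hypothetical helper: returns -EBUSY if any wakeup-capable interrupt is
 * pending, so the caller can abort suspend instead of dropping the event.
 * Returns 0 when nothing relevant is pending.
 */
static int wakeup_irq_pending(const struct irq_model *descs, size_t n)
{
	size_t i;

	assert(descs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		unsigned int s = descs[i].status;

		/* Both bits must be set: a wakeup source that has fired. */
		if ((s & IRQ_WAKEUP) && (s & IRQ_PENDING))
			return -EBUSY;
	}
	return 0;
}
```

The idea would be to run a check like this as late as possible before
entering the sleep state and to back out of suspend on a non-zero return.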

-- 
Arve Hjønnevåg

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-26  3:00                                             ` Ingo Molnar
@ 2009-02-26  3:31                                               ` Arve Hjønnevåg
  2009-02-26  3:31                                               ` Arve Hjønnevåg
  1 sibling, 0 replies; 373+ messages in thread
From: Arve Hjønnevåg @ 2009-02-26  3:31 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jeremy Fitzhardinge, LKML, Jesse Barnes, Eric W. Biederman,
	pm list, Linus Torvalds, Thomas Gleixner

On Wed, Feb 25, 2009 at 7:00 PM, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
>>
>>
>> On Wed, 25 Feb 2009, Arve Hjønnevåg wrote:
>> >
>> > On the msm platform the keyboard driver currently leaves the interrupts
>> > enabled when suspended. If the interrupt handler is called, we use a
>> > wakelock to abort suspend (without wakelocks you would need to set a
>> > flag and abort in suspend_late instead). If the interrupt occurs after
>> > local_irq_disable, it will still be pending when we get to the suspend
>> > enter hook and suspend will be aborted there.
>> >
>> > As far as I can tell, this change breaks this. If you press a key at
>> > the right time, it will be ignored.
>>
>> Is the irq on a private non-shared interrupt line? If so, you
>> could just mark it as IRQF_TIMER, and the irq disable logic
>> won't touch it.

That would not work without wakelocks support, since the interrupt
could occur after suspend_late which is the last chance for the driver
to abort sleep. (The patch also breaks my current wakelock
implementation since I use a suspend_late hook to abort sleep, but
this should be easy to fix)

> Hm, if that solves the problem then it would be nice to have a
> new IRQF_NO_SUSPEND flag for it, in addition to IRQF_TIMER:

I think the right fix is for any interrupt that has IRQ_WAKEUP set to
abort suspend if it is pending. I don't know if anyone relies on these
interrupts being dropped now though.

-- 
Arve Hjønnevåg

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during  suspend-resume
  2009-02-26  3:31                                               ` Arve Hjønnevåg
@ 2009-02-26  3:37                                                   ` Linus Torvalds
  0 siblings, 0 replies; 373+ messages in thread
From: Linus Torvalds @ 2009-02-26  3:37 UTC (permalink / raw)
  To: Arve Hjønnevåg
  Cc: Ingo Molnar, Rafael J. Wysocki, Eric W. Biederman, LKML,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner



On Wed, 25 Feb 2009, Arve Hjønnevåg wrote:
> 
> That would not work without wakelocks support, since the interrupt
> could occur after suspend_late which is the last chance for the driver
> to abort sleep. (The patch also breaks my current wakelock
> implementation since I use a suspend_late hook to abort sleep, but
> this should be easy to fix)

Since this must be some very deep arch-specific thing anyway, just make 
the dang thing be a "sysdev". At that point, its "suspend" function gets 
called way later (at which point CPU interrupts are off).

> > Hm, if that solves the problem then it would be nice to have a
> > new IRQF_NO_SUSPEND flag for it, in addition to IRQF_TIMER:
> 
> I think the right fix is for any interrupt that has IRQ_WAKEUP set to
> abort suspend if it is pending. I don't know if anyone relies on these
> interrupts being dropped now though.

We could add something like that, but quite frankly, I'd hate to unless 
there is some seriously common case. If it's just an oddball hacky special 
case, it's easier to just say "hey, you have that crazy system device, you 
handle it yourself".

		Linus

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
@ 2009-02-26  3:37                                                   ` Linus Torvalds
  0 siblings, 0 replies; 373+ messages in thread
From: Linus Torvalds @ 2009-02-26  3:37 UTC (permalink / raw)
  To: Arve Hjønnevåg
  Cc: Jeremy Fitzhardinge, LKML, Jesse Barnes, Thomas Gleixner,
	Eric W. Biederman, Ingo Molnar, pm list



On Wed, 25 Feb 2009, Arve Hjønnevåg wrote:
> 
> That would not work without wakelocks support, since the interrupt
> could occur after suspend_late which is the last chance for the driver
> to abort sleep. (The patch also breaks my current wakelock
> implementation since I use a suspend_late hook to abort sleep, but
> this should be easy to fix)

Since this must be some very deep arch-specific thing anyway, just make 
the dang thing be a "sysdev". At that point, its "suspend" function gets 
called way later (at which point CPU interrupts are off).

> > Hm, if that solves the problem then it would be nice to have a
> > new IRQF_NO_SUSPEND flag for it, in addition to IRQF_TIMER:
> 
> I think the right fix is for any interrupt that has IRQ_WAKEUP set to
> abort suspend if it is pending. I don't know if anyone relies on these
> interrupts being dropped now though.

We could add something like that, but quite frankly, I'd hate to unless 
there is some seriously common case. If it's just an oddball hacky special 
case, it's easier to just say "hey, you have that crazy system device, you 
handle it yourself".

		Linus

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during  suspend-resume
  2009-02-26  3:37                                                   ` Linus Torvalds
  (?)
  (?)
@ 2009-02-26  3:50                                                   ` Arve Hjønnevåg
  2009-02-26  3:57                                                     ` Linus Torvalds
  2009-02-26  3:57                                                     ` Linus Torvalds
  -1 siblings, 2 replies; 373+ messages in thread
From: Arve Hjønnevåg @ 2009-02-26  3:50 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Rafael J. Wysocki, Eric W. Biederman, LKML,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner

On Wed, Feb 25, 2009 at 7:37 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
>
> On Wed, 25 Feb 2009, Arve Hjønnevåg wrote:
>>
>> That would not work without wakelocks support, since the interrupt
>> could occur after suspend_late which is the last chance for the driver
>> to abort sleep. (The patch also breaks my current wakelock
>> implementation since I use a suspend_late hook to abort sleep, but
>> this should be easy to fix)
>
> Since this must be some very deep arch-specific thing anyway, just make
> the dang thing be a "sysdev". At that point, its "suspend" function gets
> called way later (at which point CPU interrupts are off).

Wakelocks can use a sysdev, but I don't think a keyboard driver should
be a sysdev.

>
>> > Hm, if that solves the problem then it would be nice to have a
>> > new IRQF_NO_SUSPEND flag for it, in addition to IRQF_TIMER:
>>
>> I think the right fix is for any interrupt that has IRQ_WAKEUP set to
>> abort suspend if it is pending. I don't know if anyone relies on these
>> interrupts being dropped now though.
>
> We could add something like that, but quite frankly, I'd hate to unless
> there is some seriously common case. If it's just an oddball hacky special
> case, it's easier to just say "hey, you have that crazy system device, you
> handle it yourself".

I don't think this is an oddball case. It is very common to connect
keys or keypads to gpios. If these keys are wakeup keys, it is not OK
to lose interrupts during the suspend phase.

-- 
Arve Hjønnevåg

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-26  3:37                                                   ` Linus Torvalds
  (?)
@ 2009-02-26  3:50                                                   ` Arve Hjønnevåg
  -1 siblings, 0 replies; 373+ messages in thread
From: Arve Hjønnevåg @ 2009-02-26  3:50 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jeremy Fitzhardinge, LKML, Jesse Barnes, Thomas Gleixner,
	Eric W. Biederman, Ingo Molnar, pm list

On Wed, Feb 25, 2009 at 7:37 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
>
> On Wed, 25 Feb 2009, Arve Hjønnevåg wrote:
>>
>> That would not work without wakelocks support, since the interrupt
>> could occur after suspend_late which is the last chance for the driver
>> to abort sleep. (The patch also breaks my current wakelock
>> implementation since I use a suspend_late hook to abort sleep, but
>> this should be easy to fix)
>
> Since this must be some very deep arch-specific thing anyway, just make
> the dang thing be a "sysdev". At that point, its "suspend" function gets
> called way later (at which point CPU interrupts are off).

Wakelocks can use a sysdev, but I don't think a keyboard driver should
be a sysdev.

>
>> > Hm, if that solves the problem then it would be nice to have a
>> > new IRQF_NO_SUSPEND flag for it, in addition to IRQF_TIMER:
>>
>> I think the right fix is for any interrupt that has IRQ_WAKEUP set to
>> abort suspend if it is pending. I don't know if anyone relies on these
>> interrupts being dropped now though.
>
> We could add something like that, but quite frankly, I'd hate to unless
> there is some seriously common case. If it's just an oddball hacky special
> case, it's easier to just say "hey, you have that crazy system device, you
> handle it yourself".

I don't think this is an oddball case. It is very common to connect
keys or keypads to gpios. If these keys are wakeup keys, it is not OK
to lose interrupts during the suspend phase.

-- 
Arve Hjønnevåg

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during  suspend-resume
  2009-02-26  3:50                                                   ` Arve Hjønnevåg
  2009-02-26  3:57                                                     ` Linus Torvalds
@ 2009-02-26  3:57                                                     ` Linus Torvalds
  2009-02-26  4:13                                                       ` Arve Hjønnevåg
  2009-02-26  4:13                                                       ` Arve Hjønnevåg
  1 sibling, 2 replies; 373+ messages in thread
From: Linus Torvalds @ 2009-02-26  3:57 UTC (permalink / raw)
  To: Arve Hjønnevåg
  Cc: Ingo Molnar, Rafael J. Wysocki, Eric W. Biederman, LKML,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner



On Wed, 25 Feb 2009, Arve Hjønnevåg wrote:
> 
> I don't think this is an oddball case. It is very common to connect
> keys or keypads to gpios. If these keys are wakeup keys, it is not OK
> to lose interrupts during the suspend phase.

.. and how many drivers is that? Is it one or two "gpio input drivers" or 
is it a hundred?

The "common" is not so much about "how many machines", but "in how many 
drivers would you actually do this".

			Linus

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-26  3:50                                                   ` Arve Hjønnevåg
@ 2009-02-26  3:57                                                     ` Linus Torvalds
  2009-02-26  3:57                                                     ` Linus Torvalds
  1 sibling, 0 replies; 373+ messages in thread
From: Linus Torvalds @ 2009-02-26  3:57 UTC (permalink / raw)
  To: Arve Hjønnevåg
  Cc: Jeremy Fitzhardinge, LKML, Jesse Barnes, Thomas Gleixner,
	Eric W. Biederman, Ingo Molnar, pm list



On Wed, 25 Feb 2009, Arve Hjønnevåg wrote:
> 
> I don't think this is an oddball case. It is very common to connect
> keys or keypads to gpios. If these keys are wakeup keys, it is not OK
> to lose interrupts during the suspend phase.

.. and how many drivers is that? Is it one or two "gpio input drivers" or 
is it a hundred?

The "common" is not so much about "how many machines", but "in how many 
drivers would you actually do this".

			Linus

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during  suspend-resume
  2009-02-26  3:57                                                     ` Linus Torvalds
  2009-02-26  4:13                                                       ` Arve Hjønnevåg
@ 2009-02-26  4:13                                                       ` Arve Hjønnevåg
  2009-02-26  4:20                                                         ` Eric W. Biederman
  2009-02-26  4:20                                                         ` Eric W. Biederman
  1 sibling, 2 replies; 373+ messages in thread
From: Arve Hjønnevåg @ 2009-02-26  4:13 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Rafael J. Wysocki, Eric W. Biederman, LKML,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner

On Wed, Feb 25, 2009 at 7:57 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
>
> On Wed, 25 Feb 2009, Arve Hjønnevåg wrote:
>>
>> I don't think this is an oddball case. It is very common to connect
>> keys or keypads to gpios. If these keys are wakeup keys, it is not OK
>> to lose interrupts during the suspend phase.
>
> .. and how many drivers is that? Is it one or two "gpio input drivers" or
> is it a hundred?
>
> The "common" is not so much about "how many machines", but "in how many
> drivers would you actually do this".

We only have one gpio input driver, but I don't think it is good to lose
any wakeup interrupts. Any driver that needs an edge triggered wakeup
interrupt will have problems if the hardware does not regenerate the
interrupt when the host does not respond.

It is not hard to work around this problem in the platform specific
interrupt code, but I think it is a generic problem worth fixing for
every platform.

-- 
Arve Hjønnevåg

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-26  3:57                                                     ` Linus Torvalds
@ 2009-02-26  4:13                                                       ` Arve Hjønnevåg
  2009-02-26  4:13                                                       ` Arve Hjønnevåg
  1 sibling, 0 replies; 373+ messages in thread
From: Arve Hjønnevåg @ 2009-02-26  4:13 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jeremy Fitzhardinge, LKML, Jesse Barnes, Thomas Gleixner,
	Eric W. Biederman, Ingo Molnar, pm list

On Wed, Feb 25, 2009 at 7:57 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
>
> On Wed, 25 Feb 2009, Arve Hjønnevåg wrote:
>>
>> I don't think this is an oddball case. It is very common to connect
>> keys or keypads to gpios. If these keys are wakeup keys, it is not OK
>> to lose interrupts during the suspend phase.
>
> .. and how many drivers is that? Is it one or two "gpio input drivers" or
> is it a hundred?
>
> The "common" is not so much about "how many machines", but "in how many
> drivers would you actually do this".

We only have one gpio input driver, but I don't think it is good to lose
any wakeup interrupts. Any driver that needs an edge triggered wakeup
interrupt will have problems if the hardware does not regenerate the
interrupt when the host does not respond.

It is not hard to work around this problem in the platform specific
interrupt code, but I think it is a generic problem worth fixing for
every platform.

-- 
Arve Hjønnevåg

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during  suspend-resume
  2009-02-26  4:13                                                       ` Arve Hjønnevåg
  2009-02-26  4:20                                                         ` Eric W. Biederman
@ 2009-02-26  4:20                                                         ` Eric W. Biederman
  2009-02-26  4:24                                                           ` Arve Hjønnevåg
  2009-02-26  4:24                                                           ` Arve Hjønnevåg
  1 sibling, 2 replies; 373+ messages in thread
From: Eric W. Biederman @ 2009-02-26  4:20 UTC (permalink / raw)
  To: Arve Hjønnevåg
  Cc: Linus Torvalds, Ingo Molnar, Rafael J. Wysocki, LKML,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner

Arve Hjønnevåg <arve@android.com> writes:

> We only have one gpio input driver, but I don't think it is good to lose
> any wakeup interrupts. Any driver that needs an edge triggered wakeup
> interrupt will have problems if the hardware does not regenerate the
> interrupt when the host does not respond.

We are not losing interrupts.  The normal implementation of disable
is a software disable and sets IRQ_PENDING to ensure we don't lose
interrupts when the interrupt is disabled.
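
As a toy model of that behaviour (not the real genirq code; the struct,
the function names and the flag values below are invented for
illustration):

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag values only. */
#define IRQ_DISABLED 0x1u  /* line is disabled in software */
#define IRQ_PENDING  0x2u  /* an interrupt arrived while disabled */

/* Hypothetical, cut-down stand-in for an interrupt descriptor. */
struct irq_model {
	unsigned int status;
	unsigned int handled;  /* how many times the handler has run */
};

/* Software disable: only mark the descriptor, do not touch hardware. */
static void model_disable_irq(struct irq_model *d)
{
	assert(d != NULL);
	d->status |= IRQ_DISABLED;
}

/* The line fires: run the handler, or remember the event if disabled. */
static void model_irq_arrives(struct irq_model *d)
{
	assert(d != NULL);
	if (d->status & IRQ_DISABLED) {
		d->status |= IRQ_PENDING;
		return;
	}
	d->handled++;
}

/* Re-enable the line and replay an interrupt that was recorded meanwhile. */
static void model_enable_irq(struct irq_model *d)
{
	assert(d != NULL);
	d->status &= ~IRQ_DISABLED;
	if (d->status & IRQ_PENDING) {
		d->status &= ~IRQ_PENDING;
		d->handled++;
	}
}
```

In this model an interrupt that arrives between model_disable_irq() and
model_enable_irq() is delayed, not dropped.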

Eric

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-26  4:13                                                       ` Arve Hjønnevåg
@ 2009-02-26  4:20                                                         ` Eric W. Biederman
  2009-02-26  4:20                                                         ` Eric W. Biederman
  1 sibling, 0 replies; 373+ messages in thread
From: Eric W. Biederman @ 2009-02-26  4:20 UTC (permalink / raw)
  To: Arve Hjønnevåg
  Cc: Jeremy Fitzhardinge, LKML, Jesse Barnes, Thomas Gleixner,
	Ingo Molnar, Linus Torvalds, pm list

Arve Hjønnevåg <arve@android.com> writes:

> We only have one gpio input driver, but I don't think it is good to lose
> any wakeup interrupts. Any driver that needs an edge triggered wakeup
> interrupt will have problems if the hardware does not regenerate the
> interrupt when the host does not respond.

We are not losing interrupts.  The normal implementation of disable
is a software disable and sets IRQ_PENDING to ensure we don't lose
interrupts when the interrupt is disabled.

Eric

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during  suspend-resume
  2009-02-26  4:20                                                         ` Eric W. Biederman
  2009-02-26  4:24                                                           ` Arve Hjønnevåg
@ 2009-02-26  4:24                                                           ` Arve Hjønnevåg
  1 sibling, 0 replies; 373+ messages in thread
From: Arve Hjønnevåg @ 2009-02-26  4:24 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linus Torvalds, Ingo Molnar, Rafael J. Wysocki, LKML,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner

On Wed, Feb 25, 2009 at 8:20 PM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
> Arve Hjønnevåg <arve@android.com> writes:
>
>> We only have one gpio input driver, but I don't think it is good to lose
>> any wakeup interrupts. Any driver that needs an edge triggered wakeup
>> interrupt will have problems if the hardware does not regenerate the
>> interrupt when the host does not respond.
>
> We are not losing interrupts.  The normal implementation of disable
> is a software disable and sets IRQ_PENDING to ensure we don't lose
> interrupts when the interrupt is disabled.

We lose the wakeup, but yes, the interrupt will be delivered if the
system wakes up for any other reason.

-- 
Arve Hjønnevåg

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-26  4:20                                                         ` Eric W. Biederman
@ 2009-02-26  4:24                                                           ` Arve Hjønnevåg
  2009-02-26  4:24                                                           ` Arve Hjønnevåg
  1 sibling, 0 replies; 373+ messages in thread
From: Arve Hjønnevåg @ 2009-02-26  4:24 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Jeremy Fitzhardinge, LKML, Jesse Barnes, Thomas Gleixner,
	Ingo Molnar, Linus Torvalds, pm list

On Wed, Feb 25, 2009 at 8:20 PM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
> Arve Hjønnevåg <arve@android.com> writes:
>
>> We only have one gpio input driver, but I don't think it is good to lose
>> any wakeup interrupts. Any driver that needs an edge triggered wakeup
>> interrupt will have problems if the hardware does not regenerate the
>> interrupt when the host does not respond.
>
> We are not losing interrupts.  The normal implementation of disable
> is a software disable and sets IRQ_PENDING to ensure we don't lose
> interrupts when the interrupt is disabled.

We lose the wakeup, but yes, the interrupt will be delivered if the
system wakes up for any other reason.

-- 
Arve Hjønnevåg

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-26  1:17                                     ` Arve Hjønnevåg
  2009-02-26  1:27                                       ` Linus Torvalds
  2009-02-26  1:27                                       ` Linus Torvalds
@ 2009-02-26  9:50                                       ` Rafael J. Wysocki
  2009-02-26 20:34                                         ` Arve Hjønnevåg
  2009-02-26 20:34                                         ` Arve Hjønnevåg
  2009-02-26  9:50                                       ` Rafael J. Wysocki
  3 siblings, 2 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-02-26  9:50 UTC (permalink / raw)
  To: Arve Hjønnevåg
  Cc: Ingo Molnar, Linus Torvalds, Eric W. Biederman, LKML,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner

On Thursday 26 February 2009, Arve Hjønnevåg wrote:
> On Tue, Feb 24, 2009 at 3:29 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > BTW, appended is the current (3rd) version of the $subject patch with some
> > of your comments taken into account.  In particular, I did the following:
> > - moved [suspend|resume]_device_irqs() to a separate file (pm.c)
> > - fixed interrupt.h so that their headers are at a better place
> > - made enable_irq() clear IRQ_SUSPENDED
> > - made device_power_down() and device_power_up() call
> >  suspend_device_irqs() and resume_device_irqs(), respectively, which
> >  simplified the callers quite a bit (it changed the Xen code ordering, though,
> >  but I _think_ it still should work).
> 
> Do you plan to fix edge triggered wakeup interrupts? It still looks
> like edge triggered wakeup interrupts that occur between
> suspend_device_irqs and local_irq_disable will not cause a wakeup.

In the current version of the patch the interrupts that have IRQ_WAKEUP set
in status are not disabled.  Is this not enough?
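
Roughly, the loop behaves like the simplified model below. This is a
sketch, not the patch itself: struct irq_model, the model_* names and the
flag values are assumptions, and leaving lines that were already disabled
untouched is a simplification made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag values only. */
#define IRQ_DISABLED  0x1u  /* line is disabled */
#define IRQ_WAKEUP    0x2u  /* line is a wakeup source */
#define IRQ_SUSPENDED 0x4u  /* line was disabled by the suspend code */

/* Hypothetical, cut-down stand-in for an interrupt descriptor. */
struct irq_model {
	unsigned int status;
};

/* Disable every line except wakeup sources and lines already disabled. */
static void model_suspend_device_irqs(struct irq_model *descs, size_t n)
{
	size_t i;

	assert(descs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (descs[i].status & (IRQ_WAKEUP | IRQ_DISABLED))
			continue;
		descs[i].status |= IRQ_DISABLED | IRQ_SUSPENDED;
	}
}

/* Re-enable only the lines that the suspend code itself disabled. */
static void model_resume_device_irqs(struct irq_model *descs, size_t n)
{
	size_t i;

	assert(descs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (!(descs[i].status & IRQ_SUSPENDED))
			continue;
		descs[i].status &= ~(IRQ_DISABLED | IRQ_SUSPENDED);
	}
}
```

The IRQ_SUSPENDED marker is what lets the resume side tell its own
disables apart from lines a driver had turned off before suspend.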

Rafael

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-26  1:17                                     ` Arve Hjønnevåg
                                                         ` (2 preceding siblings ...)
  2009-02-26  9:50                                       ` Rafael J. Wysocki
@ 2009-02-26  9:50                                       ` Rafael J. Wysocki
  3 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-02-26  9:50 UTC (permalink / raw)
  To: Arve Hjønnevåg
  Cc: Jeremy Fitzhardinge, LKML, Jesse Barnes, Thomas Gleixner,
	Eric W. Biederman, Ingo Molnar, Linus Torvalds, pm list

On Thursday 26 February 2009, Arve Hjønnevåg wrote:
> On Tue, Feb 24, 2009 at 3:29 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > BTW, appended is the current (3rd) version of the $subject patch with some
> > of your comments taken into account.  In particular, I did the following:
> > - moved [suspend|resume]_device_irqs() to a separate file (pm.c)
> > - fixed interrupt.h so that their headers are at a better place
> > - made enable_irq() clear IRQ_SUSPENDED
> > - made device_power_down() and device_power_up() call
> >  suspend_device_irqs() and resume_device_irqs(), respectively, which
> >  simplified the callers quite a bit (it changed the Xen code ordering, though,
> >  but I _think_ it still should work).
> 
> Do you plan to fix edge triggered wakeup interrupts? It still looks
> like edge triggered wakeup interrupts that occur between
> suspend_device_irqs and local_irq_disable will not cause a wakeup.

In the current version of the patch the interrupts that have IRQ_WAKEUP set
in status are not disabled.  Is this not enough?

Rafael

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during  suspend-resume
  2009-02-26  9:50                                       ` Rafael J. Wysocki
@ 2009-02-26 20:34                                         ` Arve Hjønnevåg
  2009-02-26 20:57                                             ` Benjamin Herrenschmidt
                                                             ` (2 more replies)
  2009-02-26 20:34                                         ` Arve Hjønnevåg
  1 sibling, 3 replies; 373+ messages in thread
From: Arve Hjønnevåg @ 2009-02-26 20:34 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Ingo Molnar, Linus Torvalds, Eric W. Biederman, LKML,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner

On Thu, Feb 26, 2009 at 1:50 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> On Thursday 26 February 2009, Arve Hjønnevåg wrote:
>> On Tue, Feb 24, 2009 at 3:29 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
>> > BTW, appended is the current (3rd) version of the $subject patch with some
>> > of your comments taken into account.  In particular, I did the following:
>> > - moved [suspend|resume]_device_irqs() to a separate file (pm.c)
>> > - fixed interrupt.h so that their headers are at a better place
>> > - made enable_irq() clear IRQ_SUSPENDED
>> > - made device_power_down() and device_power_up() call
>> >  suspend_device_irqs() and resume_device_irqs(), respectively, which
>> >  simplified the callers quite a bit (it changed the Xen code ordering, though,
>> >  but I _think_ it still should work).
>>
>> Do you plan to fix edge triggered wakeup interrupts? It still looks
>> like edge triggered wakeup interrupts that occur between
>> suspend_device_irqs and local_irq_disable will not cause a wakeup.
>
> In the current version of the patch the interrupts that have IRQ_WAKEUP set
> in status are not disabled.  Is this not enough?

That is enough for drivers that use wakelocks to abort suspend (if I
fix the wakelock code to not use a platform device as its last abort
point). It is not enough if you don't have wakelocks, since the
interrupt can occur after suspend_late has been called and the driver
has no way to abort suspend.

-- 
Arve Hjønnevåg

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-26  9:50                                       ` Rafael J. Wysocki
  2009-02-26 20:34                                         ` Arve Hjønnevåg
@ 2009-02-26 20:34                                         ` Arve Hjønnevåg
  1 sibling, 0 replies; 373+ messages in thread
From: Arve Hjønnevåg @ 2009-02-26 20:34 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Jeremy Fitzhardinge, LKML, Jesse Barnes, Thomas Gleixner,
	Eric W. Biederman, Ingo Molnar, Linus Torvalds, pm list

On Thu, Feb 26, 2009 at 1:50 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> On Thursday 26 February 2009, Arve Hjønnevåg wrote:
>> On Tue, Feb 24, 2009 at 3:29 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
>> > BTW, appended is the current (3rd) version of the $subject patch with some
>> > of your comments taken into account.  In particular, I did the following:
>> > - moved [suspend|resume]_device_irqs() to a separate file (pm.c)
>> > - fixed interrupt.h so that their headers are at a better place
>> > - made enable_irq() clear IRQ_SUSPENDED
>> > - made device_power_down() and device_power_up() call
>> >  suspend_device_irqs() and resume_device_irqs(), respectively, which
>> >  simplified the callers quite a bit (it changed the Xen code ordering, though,
>> >  but I _think_ it still should work).
>>
>> Do you plan to fix edge triggered wakeup interrupts? It still looks
>> like edge triggered wakeup interrupts that occur between
>> suspend_device_irqs and local_irq_disable will not cause a wakeup.
>
> In the current version of the patch the interrupts that have IRQ_WAKEUP set
> in status are not disabled.  Is this not enough?

That is enough for drivers that use wakelocks to abort suspend (if I
fix the wakelock code to not use a platform device as its last abort
point). It is not enough if you don't have wakelocks, since the
interrupt can occur after suspend_late has been called and the driver
has no way to abort suspend.

-- 
Arve Hjønnevåg

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during  suspend-resume
  2009-02-26 20:34                                         ` Arve Hjønnevåg
@ 2009-02-26 20:57                                             ` Benjamin Herrenschmidt
  2009-02-26 21:58                                           ` Rafael J. Wysocki
  2009-02-26 21:58                                           ` Rafael J. Wysocki
  2 siblings, 0 replies; 373+ messages in thread
From: Benjamin Herrenschmidt @ 2009-02-26 20:57 UTC (permalink / raw)
  To: Arve Hjønnevåg
  Cc: Rafael J. Wysocki, Ingo Molnar, Linus Torvalds,
	Eric W. Biederman, LKML, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner

On Thu, 2009-02-26 at 12:34 -0800, Arve Hjønnevåg wrote:
> That is enough for drivers that use wakelocks to abort suspend (if I
> fix the wakelock code to not use a platform device as its last abort
> point). It is not enough if you don't have wakelocks, since the
> interrupt can occur after suspend_late has been called and the driver
> has no way to abort suspend.
> 
I still don't quite see how you deal with the race anyway. I.e., even
without Rafael's patch, what if the interrupt occurs after your sysdev
suspend?

In general, unless they are level sensitive, wakeup interrupts tend to
always be somewhat racy.

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
@ 2009-02-26 20:57                                             ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 373+ messages in thread
From: Benjamin Herrenschmidt @ 2009-02-26 20:57 UTC (permalink / raw)
  To: Arve Hjønnevåg
  Cc: Jeremy Fitzhardinge, LKML, Jesse Barnes, Thomas Gleixner,
	Eric W. Biederman, Ingo Molnar, Linus Torvalds, pm list

On Thu, 2009-02-26 at 12:34 -0800, Arve Hjønnevåg wrote:
> That is enough for drivers that use wakelocks to abort suspend (if I
> fix the wakelock code to not use a platform device as its last abort
> point). It is not enough if you don't have wakelocks, since the
> interrupt can occur after suspend_late has been called and the driver
> has no way to abort suspend.
> 
I still don't quite see how you deal with the race anyway. I.e., even
without Rafael's patch, what if the interrupt occurs after your sysdev
suspend?

In general, unless they are level sensitive, wakeup interrupts tend to
always be somewhat racy.

Cheers,
Ben.


_______________________________________________
linux-pm mailing list
linux-pm@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/linux-pm

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during  suspend-resume
  2009-02-26 20:57                                             ` Benjamin Herrenschmidt
  (?)
@ 2009-02-26 21:20                                             ` Arve Hjønnevåg
  2009-02-26 21:49                                               ` Benjamin Herrenschmidt
  2009-02-26 21:49                                               ` Benjamin Herrenschmidt
  -1 siblings, 2 replies; 373+ messages in thread
From: Arve Hjønnevåg @ 2009-02-26 21:20 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Rafael J. Wysocki, Ingo Molnar, Linus Torvalds,
	Eric W. Biederman, LKML, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner

On Thu, Feb 26, 2009 at 12:57 PM, Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
> On Thu, 2009-02-26 at 12:34 -0800, Arve Hjønnevåg wrote:
>> That is enough for drivers that use wakelocks to abort suspend (if I
>> fix the wakelock code to not use a platform device as its last abort
>> point). It is not enough if you don't have wakelocks, since the
>> interrupt can occur after suspend_late has been called and the driver
>> has no way to abort suspend.
>>
> I still don't quite see how you deal with the race anyway. Ie. Even
> without Rafael patch, what if the interrupt occurs after your sysdev
> suspend ?

After local_irq_disable has been called, the interrupt will no longer
be cleared by Linux when it occurs. This means that it is still pending
when you get to the low level suspend code, which will prevent suspend.

> In general, unless they are level sensitive, wakeup interrupts tend to
> always be somewhat racy.

They don't have to be. If you have a separate hardware component that
tracks wakeup interrupts, you need to start this before you stop the
main interrupt controller. If any interrupts are pending at this time
you abort suspend. After a wakeup you do the reverse.
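
The ordering described above can be sketched in plain C. This is only a
toy model under stated assumptions: struct mock_hw, mock_edge(),
mock_enter_suspend() and mock_leave_suspend() are hypothetical names made
up for illustration, not kernel or platform APIs, and real hardware will
differ in the details.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the hardware: one bit per interrupt line. */
struct mock_hw {
	uint32_t pic_pending;	/* edges latched by the main interrupt controller */
	uint32_t wake_latched;	/* edges latched by the separate wake tracker */
	bool pic_running;
	bool wake_running;
};

/* An edge arrives on 'line'; every block that is running latches it. */
static void mock_edge(struct mock_hw *hw, unsigned int line)
{
	assert(line < 32);
	if (hw->pic_running)
		hw->pic_pending |= UINT32_C(1) << line;
	if (hw->wake_running)
		hw->wake_latched |= UINT32_C(1) << line;
}

/*
 * Suspend side: start the wake tracker before stopping the main
 * controller, so there is no window in which neither block is watching.
 * If a wakeup interrupt is already pending at that point, abort.
 */
static int mock_enter_suspend(struct mock_hw *hw, uint32_t wake_mask)
{
	hw->wake_running = true;	/* 1. tracker starts watching */
	hw->pic_running = false;	/* 2. main controller stops */

	/* 3. anything pending on a wakeup line aborts the suspend */
	if ((hw->pic_pending | hw->wake_latched) & wake_mask) {
		/* undo in reverse order, as on a normal wakeup */
		hw->pic_running = true;
		hw->wake_running = false;
		return -EBUSY;
	}
	return 0;
}

/* Wakeup side: the reverse order, controller first, tracker last. */
static void mock_leave_suspend(struct mock_hw *hw)
{
	hw->pic_running = true;
	hw->wake_running = false;
}
```

The point of the model is only the overlap: between steps 1 and 2 both
blocks are running, so an edge cannot fall into a gap.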

-- 
Arve Hjønnevåg

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-26 20:57                                             ` Benjamin Herrenschmidt
  (?)
  (?)
@ 2009-02-26 21:20                                             ` Arve Hjønnevåg
  -1 siblings, 0 replies; 373+ messages in thread
From: Arve Hjønnevåg @ 2009-02-26 21:20 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Jeremy Fitzhardinge, LKML, Jesse Barnes, Thomas Gleixner,
	Eric W. Biederman, Ingo Molnar, Linus Torvalds, pm list

On Thu, Feb 26, 2009 at 12:57 PM, Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
> On Thu, 2009-02-26 at 12:34 -0800, Arve Hjønnevåg wrote:
>> That is enough for drivers that use wakelocks to abort suspend (if I
>> fix the wakelock code to not use a platform device as its last abort
>> point). It is not enough if you don't have wakelocks, since the
>> interrupt can occur after suspend_late has been called and the driver
>> has no way to abort suspend.
>>
> I still don't quite see how you deal with the race anyway. Ie. Even
> without Rafael patch, what if the interrupt occurs after your sysdev
> suspend ?

After local_irq_disable has been called, the interrupt will no longer
be cleared by Linux when it occurs. This means that it is still pending
when you get to the low level suspend code, which will prevent suspend.

> In general, unless they are level sensitive, wakeup interrupts tend to
> always be somewhat racy.

They don't have to be. If you have a separate hardware component that
tracks wakeup interrupts, you need to start this before you stop the
main interrupt controller. If any interrupts are pending at this time
you abort suspend. After a wakeup you do the reverse.

-- 
Arve Hjønnevåg

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during  suspend-resume
  2009-02-26 21:20                                             ` Arve Hjønnevåg
  2009-02-26 21:49                                               ` Benjamin Herrenschmidt
@ 2009-02-26 21:49                                               ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 373+ messages in thread
From: Benjamin Herrenschmidt @ 2009-02-26 21:49 UTC (permalink / raw)
  To: Arve Hjønnevåg
  Cc: Rafael J. Wysocki, Ingo Molnar, Linus Torvalds,
	Eric W. Biederman, LKML, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner

On Thu, 2009-02-26 at 13:20 -0800, Arve Hjønnevåg wrote:
> On Thu, Feb 26, 2009 at 12:57 PM, Benjamin Herrenschmidt
> <benh@kernel.crashing.org> wrote:
> > On Thu, 2009-02-26 at 12:34 -0800, Arve Hjønnevåg wrote:
> >> That is enough for drivers that use wakelocks to abort suspend (if I
> >> fix the wakelock code to not use a platform device as its last abort
> >> point). It is not enough if you don't have wakelocks, since the
> >> interrupt can occur after suspend_late has been called and the driver
> >> has no way to abort suspend.
> >>
> > I still don't quite see how you deal with the race anyway. Ie. Even
> > without Rafael patch, what if the interrupt occurs after your sysdev
> > suspend ?
> 
> After local_irq_disable has been called, the interrupt will no longer
> be cleared by Linux when it occurs. This means that is still pending
> when you get to the low level suspend code which will prevent suspend.

Ok so you want this interrupt to stay pending at the PIC level ? So just
marking it so the kernel doesn't disable it should do the trick.

> > In general, unless they are level sensitive, wakeup interrupts tend to
> > always be somewhat racy.
> 
> They don't have to be. If you have a separate hardware component that
> tracks wakeup interrupts, you need to start this before you stop the
> main interrupt controller. If any interrupts are pending at this time
> you abort suspend. After a wakeup you do the reverse.

Right but then you can start this earlier and there is no problem. But
if you do want the interrupt to remain pending in the PIC, then you
probably need to set that magic flag so we don't disable it, that should
do the trick just fine, no?

It's hard to tell without more detailed HW specs of course...

Ben.



^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-26 21:20                                             ` Arve Hjønnevåg
@ 2009-02-26 21:49                                               ` Benjamin Herrenschmidt
  2009-02-26 21:49                                               ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 373+ messages in thread
From: Benjamin Herrenschmidt @ 2009-02-26 21:49 UTC (permalink / raw)
  To: Arve Hjønnevåg
  Cc: Jeremy Fitzhardinge, LKML, Jesse Barnes, Thomas Gleixner,
	Eric W. Biederman, Ingo Molnar, Linus Torvalds, pm list

On Thu, 2009-02-26 at 13:20 -0800, Arve Hjønnevåg wrote:
> On Thu, Feb 26, 2009 at 12:57 PM, Benjamin Herrenschmidt
> <benh@kernel.crashing.org> wrote:
> > On Thu, 2009-02-26 at 12:34 -0800, Arve Hjønnevåg wrote:
> >> That is enough for drivers that use wakelocks to abort suspend (if I
> >> fix the wakelock code to not use a platform device as its last abort
> >> point). It is not enough if you don't have wakelocks, since the
> >> interrupt can occur after suspend_late has been called and the driver
> >> has no way to abort suspend.
> >>
> > I still don't quite see how you deal with the race anyway. Ie. Even
> > without Rafael patch, what if the interrupt occurs after your sysdev
> > suspend ?
> 
> After local_irq_disable has been called, the interrupt will no longer
> be cleared by Linux when it occurs. This means that is still pending
> when you get to the low level suspend code which will prevent suspend.

Ok so you want this interrupt to stay pending at the PIC level ? So just
marking it so the kernel doesn't disable it should do the trick.

> > In general, unless they are level sensitive, wakeup interrupts tend to
> > always be somewhat racy.
> 
> They don't have to be. If you have a separate hardware component that
> tracks wakeup interrupts, you need to start this before you stop the
> main interrupt controller. If any interrupts are pending at this time
> you abort suspend. After a wakeup you do the reverse.

Right but then you can start this earlier and there is no problem. But
if you do want the interrupt to remain pending in the PIC, then you
probably need to set that magic flag so we don't disable it, that should
do the trick just fine, no?

It's hard to tell without more detailed HW specs of course...

Ben.



^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-26 20:34                                         ` Arve Hjønnevåg
  2009-02-26 20:57                                             ` Benjamin Herrenschmidt
@ 2009-02-26 21:58                                           ` Rafael J. Wysocki
  2009-02-26 22:10                                             ` Linus Torvalds
  2009-02-26 22:10                                             ` Linus Torvalds
  2009-02-26 21:58                                           ` Rafael J. Wysocki
  2 siblings, 2 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-02-26 21:58 UTC (permalink / raw)
  To: Arve Hjønnevåg
  Cc: Ingo Molnar, Linus Torvalds, Eric W. Biederman, LKML,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner

On Thursday 26 February 2009, Arve Hjønnevåg wrote:
> On Thu, Feb 26, 2009 at 1:50 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > On Thursday 26 February 2009, Arve Hjønnevåg wrote:
> >> On Tue, Feb 24, 2009 at 3:29 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> >> > BTW, appended is the current (3rd) version of the $subject patch with some
> >> > of your comments taken into account.  In particular, I did the following:
> >> > - moved [suspend|resume]_device_irqs() to a separate file (pm.c)
> >> > - fixed interrupt.h so that their headers are at a better place
> >> > - made enable_irq() clear IRQ_SUSPENDED
> >> > - made device_power_down() and device_power_up() call
> >> >  suspend_device_irqs() and resume_device_irqs(), respectively, which
> >> >  simplified the callers quite a bit (it changed the Xen code ordering, though,
> >> >  but I _think_ it still should work).
> >>
> >> Do you plan to fix edge triggered wakeup interrupts? It still looks
> >> like edge triggered wakeup interrupts that occur between
> >> suspend_device_irqs and local_irq_disable will not cause a wakeup.
> >
> > In the current version of the patch the interrupts that have IRQ_WAKEUP set
> > in status are not disabled.  Is this not enough?
> 
> That is enough for drivers that use wakelocks to abort suspend (if I
> fix the wakelock code to not use a platform device as its last abort
> point). It is not enough if you don't have wakelocks, since the
> interrupt can occur after suspend_late has been called and the driver
> has no way to abort suspend.

Well, how exactly does the $subject patch cause this problem to happen?

Rafael

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-26 20:34                                         ` Arve Hjønnevåg
  2009-02-26 20:57                                             ` Benjamin Herrenschmidt
  2009-02-26 21:58                                           ` Rafael J. Wysocki
@ 2009-02-26 21:58                                           ` Rafael J. Wysocki
  2 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-02-26 21:58 UTC (permalink / raw)
  To: Arve Hjønnevåg
  Cc: Jeremy Fitzhardinge, LKML, Jesse Barnes, Thomas Gleixner,
	Eric W. Biederman, Ingo Molnar, Linus Torvalds, pm list

On Thursday 26 February 2009, Arve Hjønnevåg wrote:
> On Thu, Feb 26, 2009 at 1:50 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > On Thursday 26 February 2009, Arve Hjønnevåg wrote:
> >> On Tue, Feb 24, 2009 at 3:29 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> >> > BTW, appended is the current (3rd) version of the $subject patch with some
> >> > of your comments taken into account.  In particular, I did the following:
> >> > - moved [suspend|resume]_device_irqs() to a separate file (pm.c)
> >> > - fixed interrupt.h so that their headers are at a better place
> >> > - made enable_irq() clear IRQ_SUSPENDED
> >> > - made device_power_down() and device_power_up() call
> >> >  suspend_device_irqs() and resume_device_irqs(), respectively, which
> >> >  simplified the callers quite a bit (it changed the Xen code ordering, though,
> >> >  but I _think_ it still should work).
> >>
> >> Do you plan to fix edge triggered wakeup interrupts? It still looks
> >> like edge triggered wakeup interrupts that occur between
> >> suspend_device_irqs and local_irq_disable will not cause a wakeup.
> >
> > In the current version of the patch the interrupts that have IRQ_WAKEUP set
> > in status are not disabled.  Is this not enough?
> 
> That is enough for drivers that use wakelocks to abort suspend (if I
> fix the wakelock code to not use a platform device as its last abort
> point). It is not enough if you don't have wakelocks, since the
> interrupt can occur after suspend_late has been called and the driver
> has no way to abort suspend.

Well, how exactly does the $subject patch cause this problem to happen?

Rafael

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-26 21:58                                           ` Rafael J. Wysocki
  2009-02-26 22:10                                             ` Linus Torvalds
@ 2009-02-26 22:10                                             ` Linus Torvalds
  2009-02-26 22:30                                               ` Arve Hjønnevåg
                                                                 ` (3 more replies)
  1 sibling, 4 replies; 373+ messages in thread
From: Linus Torvalds @ 2009-02-26 22:10 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Arve Hjønnevåg, Ingo Molnar, Eric W. Biederman, LKML,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner



On Thu, 26 Feb 2009, Rafael J. Wysocki wrote:
> 
> Well, how exactly the $subject patch does cause this problem to happen?

Rafael, the problem is that if an interrupt happens while it's disabled - 
but before the CPU has actually turned all interrupts off - the CPU will 
ACK the interrupt (but just set a flag for it being PENDING), so now the 
chipset logic around it will not see it as pending any more, so now the 
chipset won't auto-wake the CPU immediately (or more likely, it won't 
even suspend it).
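
To make the sequence concrete, here is a toy model of a single
edge-triggered line. It is a sketch only: struct mock_line and the
mock_*() helpers are hypothetical names invented for illustration, and
the "disabled" field merely stands in for a line that has been disabled
at the software level while CPU interrupts are still on.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one edge-triggered interrupt line. */
struct mock_line {
	bool hw_latched;	/* edge latched in the interrupt controller */
	bool sw_pending;	/* software flag, in the spirit of IRQ_PENDING */
	bool disabled;		/* line disabled in software, CPU irqs still on */
	unsigned int handled;	/* how many times the driver's handler ran */
};

/* The device raises an edge: the controller latches it. */
static void mock_raise_edge(struct mock_line *l)
{
	assert(l);
	l->hw_latched = true;
}

/*
 * The CPU takes the interrupt. The edge is ACKed in the controller either
 * way; if the line is disabled, only the software flag remembers it.
 */
static void mock_cpu_take_irq(struct mock_line *l)
{
	assert(l);
	if (!l->hw_latched)
		return;
	l->hw_latched = false;		/* ACK: hardware forgets the edge */
	if (l->disabled)
		l->sw_pending = true;	/* deferred, handler not run */
	else
		l->handled++;
}

/* What the chipset can see when the low level suspend code runs. */
static bool mock_chipset_sees_wakeup(const struct mock_line *l)
{
	assert(l);
	return l->hw_latched;
}
```

In this model an edge that arrives while the line is disabled ends up
only in sw_pending, so mock_chipset_sees_wakeup() returns false. An edge
that arrives once the CPU no longer takes interrupts is never ACKed and
stays visible to the chipset.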

It's trivial to fix multiple ways, so I wouldn't worry. The most trivial 
way is to just have some sysdev driver code simply do something like

  static int sysdev_suspend()
  {
	for_each_irq(irq,desc) {
		if (!(desc->flags & IRQF_WAKE))
			continue;
		if (desc->flags & IRQ_PENDING)
			return -EBUSY;
	}
	return 0;
  }

and that should automatically mean that if any irq is pending, the suspend 
will fail and we'll immediately wake up again.
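
For readers who want to try the idea outside the kernel, the same check
can be written as a self-contained C sketch. Everything here is an
assumption made for illustration: struct mock_irq_desc, the MOCK_IRQ_*
bits and mock_check_wakeup_irqs() are hypothetical stand-ins, and the
single "status" word follows the wording used earlier in the thread
("IRQ_WAKEUP set in status") rather than any particular kernel version.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical status bits, standing in for the kernel's own flags. */
#define MOCK_IRQ_WAKEUP		(1u << 0)	/* line may wake the system */
#define MOCK_IRQ_PENDING	(1u << 1)	/* arrived while disabled */

/* Hypothetical, stripped-down interrupt descriptor. */
struct mock_irq_desc {
	unsigned int status;
};

/*
 * Walk all descriptors and refuse to suspend if any wakeup-capable
 * interrupt is marked pending. Returns 0 if suspend may proceed,
 * -EBUSY otherwise.
 */
static int mock_check_wakeup_irqs(const struct mock_irq_desc *descs, size_t n)
{
	size_t irq;

	assert(descs != NULL || n == 0);

	for (irq = 0; irq < n; irq++) {
		unsigned int status = descs[irq].status;

		if (!(status & MOCK_IRQ_WAKEUP))
			continue;	/* not a wakeup source, ignore */
		if (status & MOCK_IRQ_PENDING)
			return -EBUSY;	/* a wakeup event was deferred */
	}
	return 0;
}
```

A pending interrupt on a line that is not a wakeup source does not block
suspend in this sketch; only the combination of both bits does.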

It looks trivial, and I don't understand why Arve can't just do the sysdev 
thing.

		Linus

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-26 21:58                                           ` Rafael J. Wysocki
@ 2009-02-26 22:10                                             ` Linus Torvalds
  2009-02-26 22:10                                             ` Linus Torvalds
  1 sibling, 0 replies; 373+ messages in thread
From: Linus Torvalds @ 2009-02-26 22:10 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Jeremy Fitzhardinge, LKML, Jesse Barnes, Thomas Gleixner,
	Eric W. Biederman, Ingo Molnar, pm list



On Thu, 26 Feb 2009, Rafael J. Wysocki wrote:
> 
> Well, how exactly the $subject patch does cause this problem to happen?

Rafael, the problem is that if an interrupt happens while it's disabled - 
but before the CPU has actually turned all interrupts off - the CPU will 
ACK the interrupt (but just set a flag for it being PENDING), so now the 
chipset logic around it will not see it as pending any more, so now the 
chipset won't auto-wake the CPU immediately (or more likely, it won't 
even suspend it).

It's trivial to fix multiple ways, so I wouldn't worry. The most trivial 
way is to just have some sysdev driver code simply do something like

  static int sysdev_suspend()
  {
	for_each_irq(irq,desc) {
		if (!(desc->flags & IRQF_WAKE))
			continue;
		if (desc->flags & IRQ_PENDING)
			return -EBUSY;
	}
	return 0;
  }

and that should automatically mean that if any irq is pending, the suspend 
will fail and we'll immediately wake up again.

It looks trivial, and I don't understand why Arve can't just do the sysdev 
thing.

		Linus

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during  suspend-resume
  2009-02-26 22:10                                             ` Linus Torvalds
  2009-02-26 22:30                                               ` Arve Hjønnevåg
@ 2009-02-26 22:30                                               ` Arve Hjønnevåg
  2009-02-26 23:10                                                 ` Rafael J. Wysocki
  2009-02-26 23:10                                                 ` Rafael J. Wysocki
  2009-02-26 22:30                                               ` Rafael J. Wysocki
  2009-02-26 22:30                                               ` Rafael J. Wysocki
  3 siblings, 2 replies; 373+ messages in thread
From: Arve Hjønnevåg @ 2009-02-26 22:30 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, Ingo Molnar, Eric W. Biederman, LKML,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner

On Thu, Feb 26, 2009 at 2:10 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
>
> On Thu, 26 Feb 2009, Rafael J. Wysocki wrote:
>>
>> Well, how exactly the $subject patch does cause this problem to happen?
>
> Rafael, the problem is that if an interrupt happens while it's disabled -
> but before the CPU has actually turned all interrupts off - the CPU will
> ACK the interrupt (but just set a flag for it being PENDING), so now the
> chipset logic around it will not see it as pending any more, so now the
> chipset won't auto-wake the CPU immediately (or more likely, it won't
> even suspend it).
>
> It's trivial to fix multiple ways, so I wouldn't worry. The most trivial
> way is to just have some sysdev drievr code simply do something like
>
>  static int sysdev_suspend()
>  {
>        for_each_irq(irq,desc) {
>                if (!(desc->flags & IRQF_WAKE))
>                        continue;
>                if (desc->flags & IRQ_PENDING)
>                        return -EBUSY;
>        }
>        return 0;
>  }
>
> and that should automatically mean that if any irq is pending, the suspend
> will fail and we'll immediately wake up again.
>
> It looks trivial, and I don't understand why Arve can't just do the sysdev
> thing.

I can. My point is that the patch breaks our existing code. If anyone
else uses edge triggered wakeup interrupts it may break for them as
well. The main question is whether this should be fixed separately for
every platform that needs it, or whether pending wakeup interrupts
should always abort sleep.

-- 
Arve Hjønnevåg

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-26 22:10                                             ` Linus Torvalds
@ 2009-02-26 22:30                                               ` Arve Hjønnevåg
  2009-02-26 22:30                                               ` Arve Hjønnevåg
                                                                 ` (2 subsequent siblings)
  3 siblings, 0 replies; 373+ messages in thread
From: Arve Hjønnevåg @ 2009-02-26 22:30 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jeremy Fitzhardinge, LKML, Jesse Barnes, Thomas Gleixner,
	Eric W. Biederman, Ingo Molnar, pm list

On Thu, Feb 26, 2009 at 2:10 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
>
> On Thu, 26 Feb 2009, Rafael J. Wysocki wrote:
>>
>> Well, how exactly the $subject patch does cause this problem to happen?
>
> Rafael, the problem is that if an interrupt happens while it's disabled -
> but before the CPU has actually turned all interrupts off - the CPU will
> ACK the interrupt (but just set a flag for it being PENDING), so now the
> chipset logic around it will not see it as pending any more, so now the
> chipset won't auto-wake the CPU immediately (or more likely, it won't
> even suspend it).
>
> It's trivial to fix multiple ways, so I wouldn't worry. The most trivial
> way is to just have some sysdev drievr code simply do something like
>
>  static int sysdev_suspend()
>  {
>        for_each_irq(irq,desc) {
>                if (!(desc->flags & IRQF_WAKE))
>                        continue;
>                if (desc->flags & IRQ_PENDING)
>                        return -EBUSY;
>        }
>        return 0;
>  }
>
> and that should automatically mean that if any irq is pending, the suspend
> will fail and we'll immediately wake up again.
>
> It looks trivial, and I don't understand why Arve can't just do the sysdev
> thing.

I can. My point is that the patch breaks our existing code. If anyone
else uses edge triggered wakeup interrupts it may break for them as
well. The main question is whether this should be fixed separately for
every platform that needs it, or whether pending wakeup interrupts
should always abort sleep.

-- 
Arve Hjønnevåg

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-26 22:10                                             ` Linus Torvalds
  2009-02-26 22:30                                               ` Arve Hjønnevåg
  2009-02-26 22:30                                               ` Arve Hjønnevåg
@ 2009-02-26 22:30                                               ` Rafael J. Wysocki
  2009-02-26 22:30                                               ` Rafael J. Wysocki
  3 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-02-26 22:30 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Arve Hjønnevåg, Ingo Molnar, Eric W. Biederman, LKML,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner

On Thursday 26 February 2009, Linus Torvalds wrote:
> 
> On Thu, 26 Feb 2009, Rafael J. Wysocki wrote:
> > 
> > Well, how exactly the $subject patch does cause this problem to happen?
> 
> Rafael, the problem is that if an interrupt happens while it's disabled - 
> but before the CPU has actually turned all interrupts off - the CPU will 
> ACK the interrupt (but just set a flag for it being PENDING), so now the 
> chipset logic around it will not see it as pending any more, so now the 
> chipset won't auto-wake the CPU immediately (or more likely, it won't 
> even suspend it).

Ah, I see now, thanks.

> It's trivial to fix multiple ways, so I wouldn't worry. The most trivial 
> way is to just have some sysdev drievr code simply do something like
> 
>   static int sysdev_suspend()
>   {
> 	for_each_irq(irq,desc) {
> 		if (!(desc->flags & IRQF_WAKE))
> 			continue;
> 		if (desc->flags & IRQ_PENDING)
> 			return -EBUSY;
> 	}
> 	return 0;
>   }
> 
> and that should automatically mean that if any irq is pending, the suspend 
> will fail and we'll immediately wake up again.

Yeah.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-26 22:10                                             ` Linus Torvalds
                                                                 ` (2 preceding siblings ...)
  2009-02-26 22:30                                               ` Rafael J. Wysocki
@ 2009-02-26 22:30                                               ` Rafael J. Wysocki
  3 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-02-26 22:30 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jeremy Fitzhardinge, LKML, Jesse Barnes, Thomas Gleixner,
	Eric W. Biederman, Ingo Molnar, pm list

On Thursday 26 February 2009, Linus Torvalds wrote:
> 
> On Thu, 26 Feb 2009, Rafael J. Wysocki wrote:
> > 
> > Well, how exactly the $subject patch does cause this problem to happen?
> 
> Rafael, the problem is that if an interrupt happens while it's disabled - 
> but before the CPU has actually turned all interrupts off - the CPU will 
> ACK the interrupt (but just set a flag for it being PENDING), so now the 
> chipset logic around it will not see it as pending any more, so now the 
> chipset won't auto-wake the CPU immediately (or more likely, it won't 
> even suspend it).

Ah, I see now, thanks.

> It's trivial to fix multiple ways, so I wouldn't worry. The most trivial 
> way is to just have some sysdev drievr code simply do something like
> 
>   static int sysdev_suspend()
>   {
> 	for_each_irq(irq,desc) {
> 		if (!(desc->flags & IRQF_WAKE))
> 			continue;
> 		if (desc->flags & IRQ_PENDING)
> 			return -EBUSY;
> 	}
> 	return 0;
>   }
> 
> and that should automatically mean that if any irq is pending, the suspend 
> will fail and we'll immediately wake up again.

Yeah.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-26 22:30                                               ` Arve Hjønnevåg
  2009-02-26 23:10                                                 ` Rafael J. Wysocki
@ 2009-02-26 23:10                                                 ` Rafael J. Wysocki
  2009-02-27  0:00                                                   ` Arve Hjønnevåg
  2009-02-27  0:00                                                   ` Arve Hjønnevåg
  1 sibling, 2 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-02-26 23:10 UTC (permalink / raw)
  To: Arve Hjønnevåg
  Cc: Linus Torvalds, Ingo Molnar, Eric W. Biederman, LKML,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner

On Thursday 26 February 2009, Arve Hjønnevåg wrote:
> On Thu, Feb 26, 2009 at 2:10 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> >
> > On Thu, 26 Feb 2009, Rafael J. Wysocki wrote:
> >>
> >> Well, how exactly the $subject patch does cause this problem to happen?
> >
> > Rafael, the problem is that if an interrupt happens while it's disabled -
> > but before the CPU has actually turned all interrupts off - the CPU will
> > ACK the interrupt (but just set a flag for it being PENDING), so now the
> > chipset logic around it will not see it as pending any more, so now the
> > chipset won't auto-wake the CPU immediately (or more likely, it won't
> > even suspend it).
> >
> > It's trivial to fix multiple ways, so I wouldn't worry. The most trivial
> > way is to just have some sysdev drievr code simply do something like
> >
> >  static int sysdev_suspend()
> >  {
> >        for_each_irq(irq,desc) {
> >                if (!(desc->flags & IRQF_WAKE))
> >                        continue;
> >                if (desc->flags & IRQ_PENDING)
> >                        return -EBUSY;
> >        }
> >        return 0;
> >  }
> >
> > and that should automatically mean that if any irq is pending, the suspend
> > will fail and we'll immediately wake up again.
> >
> > It looks trivial, and I don't understand why Arve can't just do the sysdev
> > thing.
> 
> I can. My point is that the patch breaks our existing code.

Is that mainline kernel code?

> If anyone else uses edge triggered wakeup interrupt it may break from them as
> well. The main question if this should be fixed separately for every
> platform that needs it, or if pending wakeup interrupts should always
> abort sleep.

Well, I'm not really sure if this is the problem.  In fact the problem is that
you have a regular device whose interrupt can be a wake-up one.  I think
the problem wouldn't have existed at all if it had been a sysdev.  Is that
correct?

Rafael

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-26 22:30                                               ` Arve Hjønnevåg
@ 2009-02-26 23:10                                                 ` Rafael J. Wysocki
  2009-02-26 23:10                                                 ` Rafael J. Wysocki
  1 sibling, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-02-26 23:10 UTC (permalink / raw)
  To: Arve Hjønnevåg
  Cc: Jeremy Fitzhardinge, LKML, Jesse Barnes, Thomas Gleixner,
	Eric W. Biederman, Ingo Molnar, Linus Torvalds, pm list

On Thursday 26 February 2009, Arve Hjønnevåg wrote:
> On Thu, Feb 26, 2009 at 2:10 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> >
> > On Thu, 26 Feb 2009, Rafael J. Wysocki wrote:
> >>
> >> Well, how exactly the $subject patch does cause this problem to happen?
> >
> > Rafael, the problem is that if an interrupt happens while it's disabled -
> > but before the CPU has actually turned all interrupts off - the CPU will
> > ACK the interrupt (but just set a flag for it being PENDING), so now the
> > chipset logic around it will not see it as pending any more, so now the
> > chipset won't auto-wake the CPU immediately (or more likely, it won't
> > even suspend it).
> >
> > It's trivial to fix multiple ways, so I wouldn't worry. The most trivial
> > way is to just have some sysdev drievr code simply do something like
> >
> >  static int sysdev_suspend()
> >  {
> >        for_each_irq(irq,desc) {
> >                if (!(desc->flags & IRQF_WAKE))
> >                        continue;
> >                if (desc->flags & IRQ_PENDING)
> >                        return -EBUSY;
> >        }
> >        return 0;
> >  }
> >
> > and that should automatically mean that if any irq is pending, the suspend
> > will fail and we'll immediately wake up again.
> >
> > It looks trivial, and I don't understand why Arve can't just do the sysdev
> > thing.
> 
> I can. My point is that the patch breaks our existing code.

Is that mainline kernel code?

> If anyone else uses edge triggered wakeup interrupt it may break from them as
> well. The main question if this should be fixed separately for every
> platform that needs it, or if pending wakeup interrupts should always
> abort sleep.

Well, I'm not really sure if this is the problem.  In fact the problem is that
you have a regular device whose interrupt can be a wake-up one.  I think
the problem wouldn't have existed at all if it had been a sysdev.  Is that
correct?

Rafael

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during  suspend-resume
  2009-02-26 23:10                                                 ` Rafael J. Wysocki
@ 2009-02-27  0:00                                                   ` Arve Hjønnevåg
  2009-02-27  0:27                                                     ` Linus Torvalds
  2009-02-27  0:27                                                     ` Linus Torvalds
  2009-02-27  0:00                                                   ` Arve Hjønnevåg
  1 sibling, 2 replies; 373+ messages in thread
From: Arve Hjønnevåg @ 2009-02-27  0:00 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linus Torvalds, Ingo Molnar, Eric W. Biederman, LKML,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner

On Thu, Feb 26, 2009 at 3:10 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> On Thursday 26 February 2009, Arve Hjønnevåg wrote:
>> On Thu, Feb 26, 2009 at 2:10 PM, Linus Torvalds
>> <torvalds@linux-foundation.org> wrote:
>> >
>> >
>> > On Thu, 26 Feb 2009, Rafael J. Wysocki wrote:
>> >>
>> >> Well, how exactly does the $subject patch cause this problem to happen?
>> >
>> > Rafael, the problem is that if an interrupt happens while it's disabled -
>> > but before the CPU has actually turned all interrupts off - the CPU will
>> > ACK the interrupt (but just set a flag for it being PENDING), so now the
>> > chipset logic around it will not see it as pending any more, so now the
>> > chipset won't auto-wake the CPU immediately (or more likely, it won't
>> > even suspend it).
>> >
>> > It's trivial to fix multiple ways, so I wouldn't worry. The most trivial
>> > way is to just have some sysdev driver code simply do something like
>> >
>> >  static int sysdev_suspend()
>> >  {
>> >        for_each_irq(irq,desc) {
>> >                if (!(desc->flags & IRQF_WAKE))
>> >                        continue;
>> >                if (desc->flags & IRQ_PENDING)
>> >                        return -EBUSY;
>> >        }
>> >        return 0;
>> >  }
>> >
>> > and that should automatically mean that if any irq is pending, the suspend
>> > will fail and we'll immediately wake up again.
>> >
>> > It looks trivial, and I don't understand why Arve can't just do the sysdev
>> > thing.
>>
>> I can. My point is that the patch breaks our existing code.
>
> Is that mainline kernel code?

No, the msm suspend support has not been merged.

>
>> If anyone else uses an edge-triggered wakeup interrupt, it may break for them as
>> well. The main question is whether this should be fixed separately for every
>> platform that needs it, or if pending wakeup interrupts should always
>> abort sleep.
>
> Well, I'm not really sure if this is the problem.  In fact the problem is that
> you have a regular device the interrupt of which can be a wake-up one.  I think

Is that not a common case and what enable_irq_wake is for?

> the problem wouldn't have existed at all if it had been a sysdev.  Is that
> correct?

How many sysdevs use interrupts?

I found many drivers in the mainline kernel that use enable_irq_wake,
but I did not see any that handle this race condition.

-- 
Arve Hjønnevåg

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during  suspend-resume
  2009-02-27  0:00                                                   ` Arve Hjønnevåg
@ 2009-02-27  0:27                                                     ` Linus Torvalds
  2009-02-27  3:20                                                       ` [linux-pm] " Alan Stern
  2009-02-27  3:20                                                       ` Alan Stern
  2009-02-27  0:27                                                     ` Linus Torvalds
  1 sibling, 2 replies; 373+ messages in thread
From: Linus Torvalds @ 2009-02-27  0:27 UTC (permalink / raw)
  To: Arve Hjønnevåg
  Cc: Rafael J. Wysocki, Ingo Molnar, Eric W. Biederman, LKML,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner



On Thu, 26 Feb 2009, Arve Hjønnevåg wrote:
> 
> How many sysdevs use interrupts?
> 
> I found many drivers in the mainline kernel that use enable_irq_wake,
> but I did not see any that handle this race condition.

The _only_ driver that does enable_irq_wake() on x86 is the cmos timer 
driver, and even there it actually doesn't use irq_wake, but ACPI. Why? 
Because I don't think irq wakeup even _works_ on x86.

So the whole enable_irq_wake is largely some embedded ARM platform issue, 
and a very special case, and doesn't exist anywhere else.

Maybe I'm missing something, but it's definitely not the normal case.

		Linus

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [linux-pm] [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-27  0:27                                                     ` Linus Torvalds
@ 2009-02-27  3:20                                                       ` Alan Stern
  2009-02-27  4:43                                                           ` Linus Torvalds
  2009-02-27  3:20                                                       ` Alan Stern
  1 sibling, 1 reply; 373+ messages in thread
From: Alan Stern @ 2009-02-27  3:20 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Arve Hjønnevåg, Jeremy Fitzhardinge, LKML,
	Jesse Barnes, Thomas Gleixner, Eric W. Biederman, Ingo Molnar,
	pm list

On Thu, 26 Feb 2009, Linus Torvalds wrote:

> The _only_ driver that does enable_irq_wake() on x86 is the cmos timer 
> driver, and even there it actually doesn't use irq_wake, but ACPI. Why? 
> Because I don't think irq wakeup even _works_ on x86.
> 
> So the whole enable_irq_wake is largely some embedded ARM platform issue, 
> and a very special case, and doesn't exist anywhere else.
> 
> Maybe I'm missing something, but it's definitely not the normal case.

What you're missing is that the embedded world is quite a large one.  
As any member of CELF will tell you, there are lots more embedded
systems around than there are desktop/laptop computers.  (I admit, I
don't know what the ratio is if you restrict your attention to systems
running Linux.)  We can't afford to regard them as second-class
citizens.

Plenty of embedded systems use normal interrupts from GPIO lines as 
wakeup sources.  Don't discount the need for this just because desktop 
systems don't use them that way.  It may not be "normal" in the 
circles you're accustomed to, but it _is_ normal elsewhere.

Alan Stern


^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [linux-pm] [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-27  3:20                                                       ` [linux-pm] " Alan Stern
@ 2009-02-27  4:43                                                           ` Linus Torvalds
  0 siblings, 0 replies; 373+ messages in thread
From: Linus Torvalds @ 2009-02-27  4:43 UTC (permalink / raw)
  To: Alan Stern
  Cc: Arve Hjønnevåg, Jeremy Fitzhardinge, LKML,
	Jesse Barnes, Thomas Gleixner, Eric W. Biederman, Ingo Molnar,
	pm list



On Thu, 26 Feb 2009, Alan Stern wrote:
> 
> What you're missing is that the embedded world is quite a large one.  

I'm going to give you one more clue, and if you don't stop sending out 
these IDIOTIC emails, I'm going to put you into my killfile.

Got it?

So listen up:
 - the number of ARM chips sold doesn't matter one F*CKING WHIT.
 - You need to add ONE SINGLE "sysdev" entry for ARM to take care of this 
   FOR EVERY DAMN SINGLE ONE.
 - Your inane whining about this AFTER I HAVE TOLD YOU MULTIPLE TIMES HOW 
   TO DO IT, AND AFTER I HAVE TOLD YOU THAT IT'S A SPECIAL CASE, IS 
   F*CKING IRRITATING.

Got it?

I _grepped_ for that enable_irq_wake() use. It looks like it's only used 
on ARM and maybe BF. Add the five lines of code (just cut and paste them 
from my earlier email) to your architecture already, AND STOP WHINING.

It's not a generic case. It's not a problem. You can damn well fix it in 
the ONE SINGLE ARCHITECTURE (or maybe two) that cares. I've told you how.

Why is it so damn hard for you to just accept? 

			Linus

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [linux-pm] [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-27  4:43                                                           ` Linus Torvalds
  (?)
@ 2009-02-27 14:59                                                           ` Alan Stern
  2009-02-27 20:30                                                             ` Linus Torvalds
  2009-02-27 20:30                                                             ` [linux-pm] " Linus Torvalds
  -1 siblings, 2 replies; 373+ messages in thread
From: Alan Stern @ 2009-02-27 14:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Arve Hjønnevåg, Jeremy Fitzhardinge, LKML,
	Jesse Barnes, Thomas Gleixner, Eric W. Biederman, Ingo Molnar,
	pm list

On Thu, 26 Feb 2009, Linus Torvalds wrote:

> On Thu, 26 Feb 2009, Alan Stern wrote:
> > 
> > What you're missing is that the embedded world is quite a large one.  
> 
> I'm going to give you one more clue, and if you don't stop sending out 
> these IDIOTIC emails, I'm going to put you into my killfile.
> 
> Got it?

Whoa!!  Hold on there!  You got too angry too quickly.  I'm Alan Stern, 
not Arve Hjønnevåg; that was the first email I've sent on this topic.

And while perhaps it was idiotic, you shouldn't put the blame for it on 
Arve.

> So listen up:
>  - the number of ARM chips sold doesn't matter one F*CKING WHIT.
>  - You need to add ONE SINGLE "sysdev" entry for ARM to take care of this 
>    FOR EVERY DAMN SINGLE ONE.
>  - Your inane whining about this AFTER I HAVE TOLD YOU MULTIPLE TIMES HOW 
>    TO DO IT, AND AFTER I HAVE TOLD YOU THAT IT'S A SPECIAL CASE, IS 
>    F*CKING IRRITATING.
> 
> Got it?
> 
> I _grepped_ for that enable_irq_wake() use. It looks like it's only used 
> on ARM and maybe BF. Add the five lines of code (just cut and paste them 
> from my earlier email) to your architecture already, AND STOP WHINING.

Really?  Let's see (this is using Greg KH's development tree):

$ find . -name '*.[ch]' | xargs grep enable_irq_wake
./drivers/serial/serial_core.c:         enable_irq_wake(port->irq);
./drivers/usb/gadget/at91_udc.c:                enable_irq_wake(udc->udp_irq);
./drivers/usb/gadget/at91_udc.c:                enable_irq_wake(udc->board.vbus_pin);
./drivers/usb/musb/musb_core.c: if (enable_irq_wake(nIrq) == 0) {
./drivers/usb/host/ohci-at91.c:         enable_irq_wake(hcd->irq);
./drivers/input/serio/sa1111ps2.c:      enable_irq_wake(ps2if->dev->irq[0]);
./drivers/input/keyboard/gpio_keys.c:                           enable_irq_wake(irq);
./drivers/input/keyboard/pxa27x_keypad.c:               enable_irq_wake(keypad->irq);
./drivers/input/keyboard/bf54x-keys.c:          enable_irq_wake(bf54x_kpad->irq);
./drivers/pcmcia/at91_cf.c:             enable_irq_wake(board->det_pin);
./drivers/pcmcia/at91_cf.c:                     enable_irq_wake(board->irq_pin);
./drivers/mmc/host/at91_mci.c:          enable_irq_wake(host->board->det_pin);
./drivers/mfd/htc-egpio.c:              enable_irq_wake(ei->chained_irq);
./drivers/mfd/pcf50633-core.c:  if (enable_irq_wake(client->irq) < 0)
./drivers/rtc/rtc-sa1100.c:             enable_irq_wake(IRQ_RTCAlrm);
./drivers/rtc/rtc-omap.c:               enable_irq_wake(omap_rtc_alarm);
./drivers/rtc/rtc-s3c.c:                enable_irq_wake(s3c_rtc_alarmno);
./drivers/rtc/rtc-at91rm9200.c:                 enable_irq_wake(AT91_ID_SYS);
./drivers/rtc/rtc-cmos.c:                       enable_irq_wake(cmos->irq);
./drivers/rtc/rtc-bfin.c:               enable_irq_wake(IRQ_RTC);
./drivers/rtc/rtc-ds1374.c:             enable_irq_wake(client->irq);
./drivers/rtc/rtc-at91sam9.c:                   enable_irq_wake(AT91_ID_SYS);
./drivers/rtc/rtc-pxa.c:                enable_irq_wake(pxa_rtc->irq_Alrm);
./drivers/power/pda_power.c:                    ac_wakeup_enabled = !enable_irq_wake(ac_irq->start);
./drivers/power/pda_power.c:                    usb_wakeup_enabled = !enable_irq_wake(usb_irq->start);
./arch/arm/mach-sa1100/neponset.c:      enable_irq_wake(IRQ_GPIO25);
./arch/arm/mach-s3c2410/mach-amlm5900.c:                enable_irq_wake(IRQ_EINT9);
./arch/arm/mach-omap1/board-osk.c:                      enable_irq_wake(irq);
./arch/arm/mach-omap1/serial.c: enable_irq_wake(gpio_to_irq(gpio_nr));
./arch/arm/plat-omap/gpio.c:                    enable_irq_wake(bank->irq);
./arch/arm/plat-omap/gpio.c:                    enable_irq_wake(bank->irq);
./arch/arm/plat-omap/gpio.c:/* Use disable_irq_wake() and enable_irq_wake() functions from drivers */
./include/linux/interrupt.h:static inline int enable_irq_wake(unsigned int irq)
./include/linux/interrupt.h:static inline int enable_irq_wake(unsigned int irq)

Perhaps these aren't all the sort of usage you're talking about, but I
bet most of them are.  It certainly looks like more than just ARM.  
Maybe not all that much more, but definitely more.  And the number will
only grow in the future.

> It's not a generic case. It's not a problem. You can damn well fix it in 
> the ONE SINGLE ARCHITECTURE (or maybe two) that cares. I've told you how.

I'm not arguing with your suggestion; I'm merely disagreeing with your 
statement that wakeup interrupts are "definitely not the normal case".

Alan Stern


^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [linux-pm] [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-27 14:59                                                           ` [linux-pm] " Alan Stern
  2009-02-27 20:30                                                             ` Linus Torvalds
@ 2009-02-27 20:30                                                             ` Linus Torvalds
  2009-02-28  3:54                                                               ` Arve Hjønnevåg
  2009-02-28  3:54                                                               ` [linux-pm] " Arve Hjønnevåg
  1 sibling, 2 replies; 373+ messages in thread
From: Linus Torvalds @ 2009-02-27 20:30 UTC (permalink / raw)
  To: Alan Stern
  Cc: Arve Hjønnevåg, Jeremy Fitzhardinge, LKML,
	Jesse Barnes, Thomas Gleixner, Eric W. Biederman, Ingo Molnar,
	pm list



On Fri, 27 Feb 2009, Alan Stern wrote:
> 
> Perhaps these aren't all the sort of usage you're talking about, but I
> bet most of them are.  It certainly looks like more than just ARM.  
> Maybe not all that much more, but definitely more.  And the number will
> only grow in the future.

Are you really sure? Because it can't be x86. I'm pretty sure that that is 
simply not how x86 wake events _work_ - they're not interrupts.

And that's the big point that people seem to be missing here: the whole 
"wake up interrupt" thing is not some generic model in the first place. I 
strongly suspect that it literally only works on certain architectures.

In other words, I'm getting damn tired of people who CLEARLY DON'T EVEN 
KNOW HOW THE HARDWARE WORKS arguing over this.

Here's another hint: that whole "enable_irq_wake()" - have you possibly 
spent even five seconds to look at what it actually does? I bet you 
haven't. Because what it does is to call the irq controller "set_wake" 
function.
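
A minimal userspace sketch of that dispatch, for illustration only. The
demo_* names, the DEMO_IRQ_WAKEUP bit and the -ENXIO return for a
controller without a set_wake hook are assumptions of this sketch, not
the kernel's actual definitions:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical status bit: the controller accepted the wake request. */
#define DEMO_IRQ_WAKEUP 0x2u

/* Hypothetical model of an irq controller with an optional wake hook. */
struct demo_irq_chip {
	/* NULL when the controller has no wake support at all. */
	int (*set_wake)(unsigned int irq, unsigned int on);
};

struct demo_irq {
	const struct demo_irq_chip *chip;
	unsigned int status;
};

/*
 * Model of an enable_irq_wake()-style helper: it only forwards the
 * request to the controller's set_wake hook. Without such a hook the
 * call fails and the wakeup bit is never set.
 */
static int demo_enable_irq_wake(struct demo_irq *d, unsigned int irq)
{
	int ret;

	assert(d != NULL);

	if (!d->chip || !d->chip->set_wake)
		return -ENXIO;

	ret = d->chip->set_wake(irq, 1);
	if (ret == 0)
		d->status |= DEMO_IRQ_WAKEUP;
	return ret;
}

/* Hypothetical controller hook that always accepts the request. */
static int demo_set_wake_ok(unsigned int irq, unsigned int on)
{
	(void)irq;
	(void)on;
	return 0;
}
```

In this model a driver calling the helper on a controller without
set_wake simply gets an error back, which is the sense in which the
whole mechanism is system dependent.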

Now, grep for that. What do you find? Like maybe ARM and BlackFin? Oh, and 
I note one MIPS platform.

The point is, that whole "irq wake" really is system dependent. We can do 
helper functions for it, but anybody who thinks it's anything "generic" is 
totally mistaken.

In other words, making it a sysdev thing is the CORRECT thing to do. It 
really is not just "here's how you work around something". It really is 
"this is how the hardware FUNDAMENTALLY WORKS".

Please, stop arguing. At least argue only after you understand what the 
physical hardware actually does. 

			Linus

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [linux-pm] [RFC][PATCH 2/2] PM: Rework handling of interrupts  during suspend-resume
  2009-02-27 20:30                                                             ` [linux-pm] " Linus Torvalds
  2009-02-28  3:54                                                               ` Arve Hjønnevåg
@ 2009-02-28  3:54                                                               ` Arve Hjønnevåg
  2009-02-28 10:06                                                                 ` Rafael J. Wysocki
  2009-02-28 10:06                                                                 ` [linux-pm] " Rafael J. Wysocki
  1 sibling, 2 replies; 373+ messages in thread
From: Arve Hjønnevåg @ 2009-02-28  3:54 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan Stern, Jeremy Fitzhardinge, LKML, Jesse Barnes,
	Thomas Gleixner, Eric W. Biederman, Ingo Molnar, pm list

On Fri, Feb 27, 2009 at 12:30 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
>
> On Fri, 27 Feb 2009, Alan Stern wrote:
>>
>> Perhaps these aren't all the sort of usage you're talking about, but I
>> bet most of them are.  It certainly looks like more than just ARM.
>> Maybe not all that much more, but definitely more.  And the number will
>> only grow in the future.
>
> Are you really sure? Because it can't be x86. I'm pretty sure that that is
> simply not how x86 wake events _work_ - they're not interrupts.

They are not interrupts on every arm platform that implements set_wake
either, but it is useful to pretend that they are. If the platform
code reads the wakeup status and marks the corresponding interrupt
pending, the driver does not need to know if the event occurred before
or after the system entered the low power state. I don't know if this
can be implemented on x86, but it might be worth looking into.
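
A rough user-space sketch of the "pretend they are interrupts" idea, not
actual platform code: on resume, the platform code reads a wakeup status
word and marks the matching interrupts pending. NR_IRQS, the flag values,
struct irq_state and replay_wakeup_status() are all made up for
illustration.

```c
#include <assert.h>
#include <stdint.h>

#define NR_IRQS     8       /* hypothetical number of interrupt lines */
#define IRQ_PENDING 0x1u    /* toy flag values, not the kernel's */
#define IRQ_WAKEUP  0x2u

struct irq_state {
	unsigned int status;
};

/*
 * Turn wake events latched by the (hypothetical) PM controller into
 * pending interrupts.  Only lines configured as wakeup sources are
 * replayed.  Returns the number of interrupts marked pending.
 */
static int replay_wakeup_status(struct irq_state *irqs, uint32_t wake_status)
{
	int replayed = 0;

	assert(irqs);
	for (int irq = 0; irq < NR_IRQS; irq++) {
		if (!(wake_status & (1u << irq)))
			continue;	/* no wake event on this line */
		if (!(irqs[irq].status & IRQ_WAKEUP))
			continue;	/* not armed as a wakeup source */
		irqs[irq].status |= IRQ_PENDING;
		replayed++;
	}
	return replayed;
}
```

With this, a driver's handler runs the same way whether the event arrived
just before or just after the system entered the low power state.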

>
> And that's the big point that people seem to be missing here: the whole
> "wake up interrupt" thing is not some generic model in the first place. I
> strongly suspect that it literally only works on certain architectures.

My point was that it was not specific to our platform. I don't have a
problem fixing our platform if this patch is merged, but this is a case
where a change to the generic code breaks some platforms. I don't
think there is a good reason to make the fix arm specific, trivial or
not, since any platform implementing set_wake may run into the race
condition that this patch introduced. If the platform does not
implement set_wake, IRQ_WAKEUP never gets set, and the fix should not
have any effect.

-- 
Arve Hjønnevåg

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [linux-pm] [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-28  3:54                                                               ` [linux-pm] " Arve Hjønnevåg
  2009-02-28 10:06                                                                 ` Rafael J. Wysocki
@ 2009-02-28 10:06                                                                 ` Rafael J. Wysocki
  2009-02-28 17:03                                                                     ` Linus Torvalds
                                                                                     ` (2 more replies)
  1 sibling, 3 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-02-28 10:06 UTC (permalink / raw)
  To: Arve Hjønnevåg, Linus Torvalds
  Cc: Alan Stern, Jeremy Fitzhardinge, LKML, Jesse Barnes,
	Thomas Gleixner, Eric W. Biederman, Ingo Molnar, pm list

On Saturday 28 February 2009, Arve Hjønnevåg wrote:
> On Fri, Feb 27, 2009 at 12:30 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> >
> > On Fri, 27 Feb 2009, Alan Stern wrote:
> >>
> >> Perhaps these aren't all the sort of usage you're talking about, but I
> >> bet most of them are.  It certainly looks like more than just ARM.
> >> Maybe not all that much more, but definitely more.  And the number will
> >> only grow in the future.
> >
> > Are you really sure? Because it can't be x86. I'm pretty sure that that is
> > simply not how x86 wake events _work_ - they're not interrupts.
> 
> They are not interrupts on every arm platform that implements set_wake
> either, but it is useful to pretend that they are. If the platform
> code reads the wakeup status and marks the corresponding interrupt
> pending, the driver does not need to know if the event occurred before
> or after the system entered the low power state. I don't know if this
> can be implemented on x86, but it might be worth looking into.

That would have been a new feature, no?  And I don't think anyone except for
you does it.  So, what you're saying boils down to "please don't break my new
feature that hasn't been merged yet".

> > And that's the big point that people seem to be missing here: the whole
> > "wake up interrupt" thing is not some generic model in the first place. I
> > strongly suspect that it literally only works on certain architectures.
> 
> My point was that it was not specific to our platform. I don't have a
> problem fixing our platform if this patch is merged, but this is a case
> where a change to the generic code breaks some platforms.

Quite frankly, I don't really think it will break anything other than your
platform.

Still, if Linus agrees, I can put the loop suggested by him directly into
sysdev_suspend().  Linus?

> I don't think there is a good reason to make the fix arm specific, trivial or
> not, since any platform implementing set_wake may run into the race
> condition that this patch introduced. If the platform does not
> implement set_wake, IRQ_WAKEUP never gets set, and the fix should not
> have any effect.

The point is, if we put anything like this into the generic code, platforms
start to rely on this and it will become more and more difficult to change at
the generic level if need be.

The fact that your platform relies on the generic code to disable IRQs on
the CPU at a particular point shows the mechanism very well. :-)

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [linux-pm] [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-28 10:06                                                                 ` [linux-pm] " Rafael J. Wysocki
@ 2009-02-28 17:03                                                                     ` Linus Torvalds
  2009-02-28 22:15                                                                   ` [linux-pm] " Arve Hjønnevåg
  2009-02-28 22:15                                                                   ` Arve Hjønnevåg
  2 siblings, 0 replies; 373+ messages in thread
From: Linus Torvalds @ 2009-02-28 17:03 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Arve Hjønnevåg, Alan Stern, Jeremy Fitzhardinge, LKML,
	Jesse Barnes, Thomas Gleixner, Eric W. Biederman, Ingo Molnar,
	pm list



On Sat, 28 Feb 2009, Rafael J. Wysocki wrote:
> 
> Still, if Linus agrees, I can put the loop suggested by him directly into
> sysdev_suspend().  Linus?

I don't much care - it's going to be a no-op on architectures that don't 
have that kind of "turn an interrupt into a wakeup event" capability. So 
it's not going to break for things like x86, and it's not like going over 
the irq list one more time is going to be so expensive as to be 
noticeable, even if that architecture doesn't ever get any advantage of 
it.

However - my main worry is that we will notice that different 
architectures (and possibly even different platforms _within_ the same 
architecture - depending on which kind of interrupt/pm controller they 
have) will want to do different things, and actually do something to the 
interrupt controller itself too at that point.

But we can certainly try starting out with just the generic "if a wakeup 
interrupt is pending, sysdev_suspend() returns an error immediately".
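
A minimal sketch of that generic check, written as a user-space toy model
rather than the real sysdev code. The flag values, struct irq_state,
wakeup_irq_pending() and model_sysdev_suspend() are illustrative names
only; returning -EBUSY is an assumption about which error would be used.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define IRQ_PENDING 0x1u    /* toy flag values, not the kernel's */
#define IRQ_WAKEUP  0x2u

struct irq_state {
	unsigned int status;
};

/* Go over the irq list once; fail if a wakeup interrupt is pending. */
static int wakeup_irq_pending(const struct irq_state *irqs, size_t n)
{
	assert(irqs || n == 0);
	for (size_t i = 0; i < n; i++) {
		unsigned int s = irqs[i].status;

		if ((s & IRQ_WAKEUP) && (s & IRQ_PENDING))
			return -EBUSY;
	}
	return 0;
}

/* Stand-in for sysdev_suspend(): bail out before touching any sysdev. */
static int model_sysdev_suspend(const struct irq_state *irqs, size_t n)
{
	int error = wakeup_irq_pending(irqs, n);

	if (error)
		return error;
	/* ... the per-sysdev suspend callbacks would run here ... */
	return 0;
}
```

If nothing ever sets the wakeup flag, the loop never matches and the check
is a no-op.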

			Linus

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [linux-pm] [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
  2009-02-28 10:06                                                                 ` [linux-pm] " Rafael J. Wysocki
  2009-02-28 17:03                                                                     ` Linus Torvalds
@ 2009-02-28 22:15                                                                   ` Arve Hjønnevåg
  2009-02-28 22:15                                                                   ` Arve Hjønnevåg
  2 siblings, 0 replies; 373+ messages in thread
From: Arve Hjønnevåg @ 2009-02-28 22:15 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linus Torvalds, Alan Stern, Jeremy Fitzhardinge, LKML,
	Jesse Barnes, Thomas Gleixner, Eric W. Biederman, Ingo Molnar,
	pm list

On Sat, Feb 28, 2009 at 2:06 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> On Saturday 28 February 2009, Arve Hjønnevåg wrote:
>> On Fri, Feb 27, 2009 at 12:30 PM, Linus Torvalds
>> <torvalds@linux-foundation.org> wrote:
>> >
>> >
>> > On Fri, 27 Feb 2009, Alan Stern wrote:
>> >>
>> >> Perhaps these aren't all the sort of usage you're talking about, but I
>> >> bet most of them are.  It certainly looks like more than just ARM.
>> >> Maybe not all that much more, but definitely more.  And the number will
>> >> only grow in the future.
>> >
>> > Are you really sure? Because it can't be x86. I'm pretty sure that that is
>> > simply not how x86 wake events _work_ - they're not interrupts.
>>
>> They are not interrupts on every arm platform that implements set_wake
>> either, but it is useful to pretend that they are. If the platform
>> code reads the wakeup status and marks the corresponding interrupt
>> pending, the driver does not need to know if the event occurred before
>> or after the system entered the low power state. I don't know if this
>> can be implemented on x86, but it might be worth looking into.
>
> That would have been a new feature, no?  And I don't think anyone except for
> you does it.  So, what you're saying boils down to "please don't break my new
> feature that hasn't been merged yet".

I was not referring to our platform, so no, this is not a new feature.

>> > And that's the big point that people seem to be missing here: the whole
>> > "wake up interrupt" thing is not some generic model in the first place. I
>> > strongly suspect that it literally only works on certain architectures.
>>
>> My point was that it was not specific to our platform. I don't have a
>> problem fixing our platform if this patch is merged, but this is a case
>> where a change to the generic code breaks some platforms.
>
> Quite frankly, I don't really think it will break anything other than your
> platform.

That is quite possible, but do other platforms not break because they
are already broken? I saw no attempt to avoid race conditions on
suspend in the drivers I looked at.

> Still, if Linus agrees, I can put the loop suggested by him directly into
> sysdev_suspend().  Linus?

I vote for this.

>
>> I don't think there is a good reason to make the fix arm specific, trivial or
>> not, since any platform implementing set_wake may run into the race
>> condition that this patch introduced. If the platform does not
>> implement set_wake, IRQ_WAKEUP never gets set, and the fix should not
>> have any effect.
>
> The point is, if we put anything like this into the generic code, platforms
> start to rely on this and it will become more and more difficult to change at
> the generic level if need be.
>
> The fact that your platform relies on the generic code to disable IRQs on
> the CPU at a particular point shows the mechanism very well. :-)

If the generic code did not clear the interrupt I think this would be
a stronger point. Since our hardware does not have a mask register,
only enable, I find this feature (leave the interrupt enabled, and
mask it only if it triggers) of the generic interrupt code quite
useful. If the generic code did not do this, the platform code would
have to.
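
To illustrate the behaviour described above, here is a toy user-space
model, not the genirq implementation. struct lazy_irq and the three helpers
are invented names. Disabling only sets a software flag; the line is
switched off in hardware only if the interrupt actually fires while
disabled, and the event is replayed on enable.

```c
#include <assert.h>
#include <stdbool.h>

struct lazy_irq {
	bool hw_enabled;	/* state of the hardware enable bit */
	bool sw_disabled;	/* driver asked for the irq to be disabled */
	bool pending;		/* fired while disabled, replay on enable */
	int handled;		/* number of times the handler ran */
};

/* Lazy disable: record the request, leave the hardware alone. */
static void lazy_disable(struct lazy_irq *d)
{
	assert(d);
	d->sw_disabled = true;
}

/* Called when the device raises its interrupt line. */
static void irq_fires(struct lazy_irq *d)
{
	assert(d);
	if (!d->hw_enabled)
		return;			/* line is off, nothing delivered */
	if (d->sw_disabled) {
		d->pending = true;	/* remember the event ... */
		d->hw_enabled = false;	/* ... and only now turn it off */
		return;
	}
	d->handled++;
}

/* Enable: turn the line back on and replay a remembered event. */
static void lazy_enable(struct lazy_irq *d)
{
	assert(d);
	d->sw_disabled = false;
	d->hw_enabled = true;
	if (d->pending) {
		d->pending = false;
		d->handled++;
	}
}
```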

-- 
Arve Hjønnevåg

^ permalink raw reply	[flat|nested] 373+ messages in thread

* [RFC][PATCH 0/4] Rework disabling of interrupts during suspend-resume
  2009-02-22 17:37 ` Rafael J. Wysocki
                   ` (8 preceding siblings ...)
  (?)
@ 2009-03-01 22:21 ` Rafael J. Wysocki
  2009-03-01 22:24   ` [RFC][PATCH 1/4] PM: Rework handling of interrupts during suspend-resume (rev. 4) Rafael J. Wysocki
                     ` (8 more replies)
  -1 siblings, 9 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-01 22:21 UTC (permalink / raw)
  To: LKML
  Cc: Linus Torvalds, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner, Arve Hjønnevåg,
	Alan Stern, Johannes Berg

Hi,

The following patches modify the way in which we handle disabling interrupts
during suspend and enabling them during resume.  They also change the ordering
of the core suspend and hibernation code.

Namely, interrupts are currently disabled on the boot CPU as soon as the
nonboot CPUs have been disabled, which doesn't allow device drivers' "late"
suspend and "early" resume callbacks to sleep.  Among other things this means
they cannot execute ACPI AML routines, which leads to problems with
suspend-resume of PCI devices, as recently discussed.

1/4 modifies the [suspend|hibernation] and resume code, as well as the other
code using the device PM framework, so that device drivers will not receive
interrupts during the "late" suspend phase, although interrupts will only be
disabled on the CPU right before calling sysdev_suspend() (and analogously
during resume).  [Ingo, I didn't add your ACK to the patch, because it's
changed since you saw it last time.]

2/4 - 4/4 modify the suspend, hibernation and kexec jump code, respectively,
so that the "late" phase of suspending devices will happen before the platform
"prepare" callback and the disabling of nonboot CPUs (and analogously during
resume).
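
The resulting suspend-side ordering can be sketched roughly as below. This
is a user-space model that only records the order of the steps; the enum
names and helpers are invented for illustration, and the relative order of
the platform "prepare" callback and the disabling of nonboot CPUs is an
assumption based on the order in which they are listed above.

```c
#include <assert.h>
#include <stddef.h>

enum step {
	STEP_DEVICE_SUSPEND,		/* regular suspend callbacks */
	STEP_STOP_DEVICE_IRQS,		/* drivers stop receiving interrupts */
	STEP_LATE_SUSPEND,		/* "late" suspend callbacks, may sleep */
	STEP_PLATFORM_PREPARE,		/* platform "prepare" callback */
	STEP_DISABLE_NONBOOT_CPUS,
	STEP_DISABLE_CPU_IRQS,		/* interrupts off on the boot CPU */
	STEP_SYSDEV_SUSPEND,
	STEP_COUNT
};

struct trace {
	enum step steps[STEP_COUNT];
	size_t len;
};

static void record(struct trace *t, enum step s)
{
	assert(t->len < STEP_COUNT);
	t->steps[t->len++] = s;
}

/* Record the suspend steps in the order the series aims for. */
static void model_suspend_sequence(struct trace *t)
{
	record(t, STEP_DEVICE_SUSPEND);
	record(t, STEP_STOP_DEVICE_IRQS);
	record(t, STEP_LATE_SUSPEND);
	record(t, STEP_PLATFORM_PREPARE);
	record(t, STEP_DISABLE_NONBOOT_CPUS);
	record(t, STEP_DISABLE_CPU_IRQS);
	record(t, STEP_SYSDEV_SUSPEND);
}

/* Return 1 if both steps were recorded and a came before b. */
static int ran_before(const struct trace *t, enum step a, enum step b)
{
	size_t ia = t->len, ib = t->len;

	for (size_t i = 0; i < t->len; i++) {
		if (t->steps[i] == a && ia == t->len)
			ia = i;
		if (t->steps[i] == b && ib == t->len)
			ib = i;
	}
	return ia < t->len && ib < t->len && ia < ib;
}
```

Resume would walk the same steps in reverse.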

Comments welcome.

Thanks,
Rafael


^ permalink raw reply	[flat|nested] 373+ messages in thread

* [RFC][PATCH 1/4] PM: Rework handling of interrupts during suspend-resume (rev. 4)
  2009-03-01 22:21 ` Rafael J. Wysocki
@ 2009-03-01 22:24   ` Rafael J. Wysocki
  2009-03-02 23:01     ` Arve Hjønnevåg
  2009-03-02 23:01     ` Arve Hjønnevåg
  2009-03-01 22:24   ` Rafael J. Wysocki
                     ` (7 subsequent siblings)
  8 siblings, 2 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-01 22:24 UTC (permalink / raw)
  To: LKML
  Cc: Linus Torvalds, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner, Arve Hjønnevåg,
	Alan Stern, Johannes Berg

From: Rafael J. Wysocki <rjw@sisk.pl>

Introduce two helper functions allowing us to prevent device drivers
from getting any interrupts (without disabling interrupts on the CPU)
during suspend (or hibernation) and to make them start to receive
interrupts again during the subsequent resume, respectively.  These
functions make it possible to keep timer interrupts enabled while the
"late" suspend and "early" resume callbacks provided by device
drivers are being executed.

Use these functions to rework the handling of interrupts during
suspend (hibernation) and resume.  Namely, interrupts will only be
disabled on the CPU right before suspending sysdevs, while device
drivers will be prevented from receiving interrupts, with the help of
the new helper function, before their "late" suspend callbacks run
(and analogously during resume).

In addition, since the device interrupts are now disabled before the
CPU has turned all interrupts off and the CPU will ACK the interrupts,
setting the IRQ_PENDING bit for them, check in sysdev_suspend() if
any wake-up interrupts are pending and abort suspend if that's the
case.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 arch/x86/kernel/apm_32.c  |   15 ++++++--
 drivers/base/power/main.c |   20 ++++++-----
 drivers/base/sys.c        |    8 ++++
 drivers/xen/manage.c      |   16 +++++----
 include/linux/interrupt.h |    5 ++
 include/linux/irq.h       |    1 
 kernel/irq/Makefile       |    1 
 kernel/irq/manage.c       |    3 +
 kernel/irq/pm.c           |   78 ++++++++++++++++++++++++++++++++++++++++++++++
 kernel/kexec.c            |    8 ++--
 kernel/power/disk.c       |   39 +++++++++++++++++------
 kernel/power/main.c       |   17 ++++++----
 12 files changed, 170 insertions(+), 41 deletions(-)

Index: linux-2.6/include/linux/interrupt.h
===================================================================
--- linux-2.6.orig/include/linux/interrupt.h
+++ linux-2.6/include/linux/interrupt.h
@@ -106,6 +106,11 @@ extern void disable_irq_nosync(unsigned 
 extern void disable_irq(unsigned int irq);
 extern void enable_irq(unsigned int irq);
 
+/* The following three functions are for the core kernel use only. */
+extern void suspend_device_irqs(void);
+extern void resume_device_irqs(void);
+extern int check_wakeup_irqs(void);
+
 #if defined(CONFIG_SMP) && defined(CONFIG_GENERIC_HARDIRQS)
 
 extern cpumask_var_t irq_default_affinity;
Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -287,17 +287,19 @@ void __attribute__ ((weak)) arch_suspend
  */
 static int suspend_enter(suspend_state_t state)
 {
-	int error = 0;
+	int error;
 
 	device_pm_lock();
-	arch_suspend_disable_irqs();
-	BUG_ON(!irqs_disabled());
 
-	if ((error = device_power_down(PMSG_SUSPEND))) {
+	error = device_power_down(PMSG_SUSPEND);
+	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down\n");
 		goto Done;
 	}
 
+	arch_suspend_disable_irqs();
+	BUG_ON(!irqs_disabled());
+
 	error = sysdev_suspend(PMSG_SUSPEND);
 	if (!error) {
 		if (!suspend_test(TEST_CORE))
@@ -305,11 +307,14 @@ static int suspend_enter(suspend_state_t
 		sysdev_resume();
 	}
 
-	device_power_up(PMSG_RESUME);
- Done:
 	arch_suspend_enable_irqs();
 	BUG_ON(irqs_disabled());
+
+	device_power_up(PMSG_RESUME);
+
+ Done:
 	device_pm_unlock();
+
 	return error;
 }
 
Index: linux-2.6/kernel/power/disk.c
===================================================================
--- linux-2.6.orig/kernel/power/disk.c
+++ linux-2.6/kernel/power/disk.c
@@ -214,7 +214,7 @@ static int create_image(int platform_mod
 		return error;
 
 	device_pm_lock();
-	local_irq_disable();
+
 	/* At this point, device_suspend() has been called, but *not*
 	 * device_power_down(). We *must* call device_power_down() now.
 	 * Otherwise, drivers for some devices (e.g. interrupt controllers)
@@ -225,8 +225,11 @@ static int create_image(int platform_mod
 	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down, "
 			"aborting hibernation\n");
-		goto Enable_irqs;
+		goto Unlock;
 	}
+
+	local_irq_disable();
+
 	sysdev_suspend(PMSG_FREEZE);
 	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down, "
@@ -252,12 +255,16 @@ static int create_image(int platform_mod
 	/* NOTE:  device_power_up() is just a resume() for devices
 	 * that suspended with irqs off ... no overall powerup.
 	 */
+
  Power_up_devices:
+	local_irq_enable();
+
 	device_power_up(in_suspend ?
 		(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);
- Enable_irqs:
-	local_irq_enable();
+
+ Unlock:
 	device_pm_unlock();
+
 	return error;
 }
 
@@ -336,13 +343,16 @@ static int resume_target_kernel(void)
 	int error;
 
 	device_pm_lock();
-	local_irq_disable();
+
 	error = device_power_down(PMSG_QUIESCE);
 	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down, "
 			"aborting resume\n");
-		goto Enable_irqs;
+		goto Unlock;
 	}
+
+	local_irq_disable();
+
 	sysdev_suspend(PMSG_QUIESCE);
 	/* We'll ignore saved state, but this gets preempt count (etc) right */
 	save_processor_state();
@@ -366,11 +376,16 @@ static int resume_target_kernel(void)
 	swsusp_free();
 	restore_processor_state();
 	touch_softlockup_watchdog();
+
 	sysdev_resume();
-	device_power_up(PMSG_RECOVER);
- Enable_irqs:
+
 	local_irq_enable();
+
+	device_power_up(PMSG_RECOVER);
+
+ Unlock:
 	device_pm_unlock();
+
 	return error;
 }
 
@@ -447,15 +462,16 @@ int hibernation_platform_enter(void)
 		goto Finish;
 
 	device_pm_lock();
-	local_irq_disable();
+
 	error = device_power_down(PMSG_HIBERNATE);
 	if (!error) {
+		local_irq_disable();
 		sysdev_suspend(PMSG_HIBERNATE);
 		hibernation_ops->enter();
 		/* We should never get here */
 		while (1);
 	}
-	local_irq_enable();
+
 	device_pm_unlock();
 
 	/*
@@ -464,12 +480,15 @@ int hibernation_platform_enter(void)
 	 */
  Finish:
 	hibernation_ops->finish();
+
  Resume_devices:
 	entering_platform_hibernation = false;
 	device_resume(PMSG_RESTORE);
 	resume_console();
+
  Close:
 	hibernation_ops->end();
+
 	return error;
 }
 
Index: linux-2.6/arch/x86/kernel/apm_32.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/apm_32.c
+++ linux-2.6/arch/x86/kernel/apm_32.c
@@ -1190,8 +1190,10 @@ static int suspend(int vetoable)
 	struct apm_user	*as;
 
 	device_suspend(PMSG_SUSPEND);
-	local_irq_disable();
+
 	device_power_down(PMSG_SUSPEND);
+
+	local_irq_disable();
 	sysdev_suspend(PMSG_SUSPEND);
 
 	local_irq_enable();
@@ -1209,9 +1211,12 @@ static int suspend(int vetoable)
 	if (err != APM_SUCCESS)
 		apm_error("suspend", err);
 	err = (err == APM_SUCCESS) ? 0 : -EIO;
+
 	sysdev_resume();
-	device_power_up(PMSG_RESUME);
 	local_irq_enable();
+
+	device_power_up(PMSG_RESUME);
+
 	device_resume(PMSG_RESUME);
 	queue_event(APM_NORMAL_RESUME, NULL);
 	spin_lock(&user_list_lock);
@@ -1228,8 +1233,9 @@ static void standby(void)
 {
 	int err;
 
-	local_irq_disable();
 	device_power_down(PMSG_SUSPEND);
+
+	local_irq_disable();
 	sysdev_suspend(PMSG_SUSPEND);
 	local_irq_enable();
 
@@ -1239,8 +1245,9 @@ static void standby(void)
 
 	local_irq_disable();
 	sysdev_resume();
-	device_power_up(PMSG_RESUME);
 	local_irq_enable();
+
+	device_power_up(PMSG_RESUME);
 }
 
 static apm_event_t get_event(void)
Index: linux-2.6/drivers/xen/manage.c
===================================================================
--- linux-2.6.orig/drivers/xen/manage.c
+++ linux-2.6/drivers/xen/manage.c
@@ -39,12 +39,6 @@ static int xen_suspend(void *data)
 
 	BUG_ON(!irqs_disabled());
 
-	err = device_power_down(PMSG_SUSPEND);
-	if (err) {
-		printk(KERN_ERR "xen_suspend: device_power_down failed: %d\n",
-		       err);
-		return err;
-	}
 	err = sysdev_suspend(PMSG_SUSPEND);
 	if (err) {
 		printk(KERN_ERR "xen_suspend: sysdev_suspend failed: %d\n",
@@ -69,7 +63,6 @@ static int xen_suspend(void *data)
 	xen_mm_unpin_all();
 
 	sysdev_resume();
-	device_power_up(PMSG_RESUME);
 
 	if (!*cancelled) {
 		xen_irq_resume();
@@ -108,6 +101,12 @@ static void do_suspend(void)
 	/* XXX use normal device tree? */
 	xenbus_suspend();
 
+	err = device_power_down(PMSG_SUSPEND);
+	if (err) {
+		printk(KERN_ERR "device_power_down failed: %d\n", err);
+		goto resume_devices;
+	}
+
 	err = stop_machine(xen_suspend, &cancelled, &cpumask_of_cpu(0));
 	if (err) {
 		printk(KERN_ERR "failed to start xen_suspend: %d\n", err);
@@ -120,6 +119,9 @@ static void do_suspend(void)
 	} else
 		xenbus_suspend_cancel();
 
+	device_power_up(PMSG_RESUME);
+
+resume_devices:
 	device_resume(PMSG_RESUME);
 
 	/* Make sure timer events get retriggered on all CPUs */
Index: linux-2.6/kernel/kexec.c
===================================================================
--- linux-2.6.orig/kernel/kexec.c
+++ linux-2.6/kernel/kexec.c
@@ -1454,7 +1454,6 @@ int kernel_kexec(void)
 		if (error)
 			goto Resume_devices;
 		device_pm_lock();
-		local_irq_disable();
 		/* At this point, device_suspend() has been called,
 		 * but *not* device_power_down(). We *must*
 		 * device_power_down() now.  Otherwise, drivers for
@@ -1464,8 +1463,9 @@ int kernel_kexec(void)
 		 */
 		error = device_power_down(PMSG_FREEZE);
 		if (error)
-			goto Enable_irqs;
+			goto Unlock_pm;
 
+		local_irq_disable();
 		/* Suspend system devices */
 		error = sysdev_suspend(PMSG_FREEZE);
 		if (error)
@@ -1484,9 +1484,9 @@ int kernel_kexec(void)
 	if (kexec_image->preserve_context) {
 		sysdev_resume();
  Power_up_devices:
-		device_power_up(PMSG_RESTORE);
- Enable_irqs:
 		local_irq_enable();
+		device_power_up(PMSG_RESTORE);
+ Unlock_pm:
 		device_pm_unlock();
 		enable_nonboot_cpus();
  Resume_devices:
Index: linux-2.6/include/linux/irq.h
===================================================================
--- linux-2.6.orig/include/linux/irq.h
+++ linux-2.6/include/linux/irq.h
@@ -65,6 +65,7 @@ typedef	void (*irq_flow_handler_t)(unsig
 #define IRQ_SPURIOUS_DISABLED	0x00800000	/* IRQ was disabled by the spurious trap */
 #define IRQ_MOVE_PCNTXT		0x01000000	/* IRQ migration from process context */
 #define IRQ_AFFINITY_SET	0x02000000	/* IRQ affinity was set from userspace*/
+#define IRQ_SUSPENDED		0x04000000	/* IRQ has gone through suspend sequence */
 
 #ifdef CONFIG_IRQ_PER_CPU
 # define CHECK_IRQ_PER_CPU(var) ((var) & IRQ_PER_CPU)
Index: linux-2.6/kernel/irq/pm.c
===================================================================
--- /dev/null
+++ linux-2.6/kernel/irq/pm.c
@@ -0,0 +1,78 @@
+/*
+ * linux/kernel/irq/pm.c
+ *
+ * Copyright (C) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
+ *
+ * This file contains power management functions related to interrupts.
+ */
+
+#include <linux/irq.h>
+#include <linux/module.h>
+#include <linux/interrupt.h>
+
+/**
+ * suspend_device_irqs - disable all currently enabled interrupt lines
+ *
+ * During system-wide suspend or hibernation device interrupts need to be
+ * disabled at the chip level and this function is provided for this purpose.
+ * It disables all interrupt lines that are enabled at the moment and sets the
+ * IRQ_SUSPENDED flag for them.
+ */
+void suspend_device_irqs(void)
+{
+	struct irq_desc *desc;
+	int irq;
+
+	for_each_irq_desc(irq, desc) {
+		unsigned long flags;
+
+		spin_lock_irqsave(&desc->lock, flags);
+
+		if (!desc->depth && desc->action
+		    && !(desc->action->flags & IRQF_TIMER)) {
+			desc->depth++;
+			desc->status |= IRQ_DISABLED | IRQ_SUSPENDED;
+			desc->chip->disable(irq);
+		}
+
+		spin_unlock_irqrestore(&desc->lock, flags);
+	}
+
+	for_each_irq_desc(irq, desc) {
+		if (desc->status & IRQ_SUSPENDED)
+			synchronize_irq(irq);
+	}
+}
+EXPORT_SYMBOL_GPL(suspend_device_irqs);
+
+/**
+ * resume_device_irqs - enable interrupt lines disabled by suspend_device_irqs()
+ *
+ * Enable all interrupt lines previously disabled by suspend_device_irqs() that
+ * have the IRQ_SUSPENDED flag set.
+ */
+void resume_device_irqs(void)
+{
+	struct irq_desc *desc;
+	int irq;
+
+	for_each_irq_desc(irq, desc)
+		if (desc->status & IRQ_SUSPENDED)
+			enable_irq(irq);
+}
+EXPORT_SYMBOL_GPL(resume_device_irqs);
+
+/**
+ * check_wakeup_irqs - check if any wake-up interrupts are pending
+ */
+int check_wakeup_irqs(void)
+{
+	struct irq_desc *desc;
+	int irq;
+
+	for_each_irq_desc(irq, desc)
+		if ((desc->status & IRQ_WAKEUP) && (desc->status & IRQ_PENDING))
+			return -EBUSY;
+
+	return 0;
+}
Index: linux-2.6/kernel/irq/Makefile
===================================================================
--- linux-2.6.orig/kernel/irq/Makefile
+++ linux-2.6/kernel/irq/Makefile
@@ -4,3 +4,4 @@ obj-$(CONFIG_GENERIC_IRQ_PROBE) += autop
 obj-$(CONFIG_PROC_FS) += proc.o
 obj-$(CONFIG_GENERIC_PENDING_IRQ) += migration.o
 obj-$(CONFIG_NUMA_MIGRATE_IRQ_DESC) += numa_migrate.o
+obj-$(CONFIG_PM_SLEEP) += pm.o
Index: linux-2.6/kernel/irq/manage.c
===================================================================
--- linux-2.6.orig/kernel/irq/manage.c
+++ linux-2.6/kernel/irq/manage.c
@@ -222,8 +222,9 @@ static void __enable_irq(struct irq_desc
 		WARN(1, KERN_WARNING "Unbalanced enable for IRQ %d\n", irq);
 		break;
 	case 1: {
-		unsigned int status = desc->status & ~IRQ_DISABLED;
+		unsigned int status;
 
+		status = desc->status & ~(IRQ_DISABLED | IRQ_SUSPENDED);
 		/* Prevent probing on this irq: */
 		desc->status = status | IRQ_NOPROBE;
 		check_irq_resend(desc, irq);
Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -23,6 +23,7 @@
 #include <linux/pm.h>
 #include <linux/resume-trace.h>
 #include <linux/rwsem.h>
+#include <linux/interrupt.h>
 
 #include "../base.h"
 #include "power.h"
@@ -305,7 +306,8 @@ static int resume_device_noirq(struct de
  *	Execute the appropriate "noirq resume" callback for all devices marked
  *	as DPM_OFF_IRQ.
  *
- *	Must be called with interrupts disabled and only one CPU running.
+ *	Must be called under dpm_list_mtx.  Device drivers should not receive
+ *	interrupts while it's being executed.
  */
 static void dpm_power_up(pm_message_t state)
 {
@@ -326,14 +328,13 @@ static void dpm_power_up(pm_message_t st
  *	device_power_up - Turn on all devices that need special attention.
  *	@state: PM transition of the system being carried out.
  *
- *	Power on system devices, then devices that required we shut them down
- *	with interrupts disabled.
- *
- *	Must be called with interrupts disabled.
+ *	Call the "early" resume handlers and enable device drivers to receive
+ *	interrupts.
  */
 void device_power_up(pm_message_t state)
 {
 	dpm_power_up(state);
+	resume_device_irqs();
 }
 EXPORT_SYMBOL_GPL(device_power_up);
 
@@ -558,16 +559,17 @@ static int suspend_device_noirq(struct d
  *	device_power_down - Shut down special devices.
  *	@state: PM transition of the system being carried out.
  *
- *	Power down devices that require interrupts to be disabled.
- *	Then power down system devices.
+ *	Prevent device drivers from receiving interrupts and call the "late"
+ *	suspend handlers.
  *
- *	Must be called with interrupts disabled and only one CPU running.
+ *	Must be called under dpm_list_mtx.
  */
 int device_power_down(pm_message_t state)
 {
 	struct device *dev;
 	int error = 0;
 
+	suspend_device_irqs();
 	list_for_each_entry_reverse(dev, &dpm_list, power.entry) {
 		error = suspend_device_noirq(dev, state);
 		if (error) {
@@ -577,7 +579,7 @@ int device_power_down(pm_message_t state
 		dev->power.status = DPM_OFF_IRQ;
 	}
 	if (error)
-		dpm_power_up(resume_event(state));
+		device_power_up(resume_event(state));
 	return error;
 }
 EXPORT_SYMBOL_GPL(device_power_down);
Index: linux-2.6/drivers/base/sys.c
===================================================================
--- linux-2.6.orig/drivers/base/sys.c
+++ linux-2.6/drivers/base/sys.c
@@ -22,6 +22,7 @@
 #include <linux/pm.h>
 #include <linux/device.h>
 #include <linux/mutex.h>
+#include <linux/interrupt.h>
 
 #include "base.h"
 
@@ -369,6 +370,13 @@ int sysdev_suspend(pm_message_t state)
 	struct sysdev_driver *drv, *err_drv;
 	int ret;
 
+	pr_debug("Checking wake-up interrupts\n");
+
+	/* Return error code if there are any wake-up interrupts pending */
+	ret = check_wakeup_irqs();
+	if (ret)
+		return ret;
+
 	pr_debug("Suspending System Devices\n");
 
 	list_for_each_entry_reverse(cls, &system_kset->list, kset.kobj.entry) {


^ permalink raw reply	[flat|nested] 373+ messages in thread

* [RFC][PATCH 1/4] PM: Rework handling of interrupts during suspend-resume (rev. 4)
  2009-03-01 22:21 ` Rafael J. Wysocki
  2009-03-01 22:24   ` [RFC][PATCH 1/4] PM: Rework handling of interrupts during suspend-resume (rev. 4) Rafael J. Wysocki
@ 2009-03-01 22:24   ` Rafael J. Wysocki
  2009-03-01 22:25   ` [RFC][PATCH 2/4] PM: Change suspend code ordering Rafael J. Wysocki
                     ` (6 subsequent siblings)
  8 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-01 22:24 UTC (permalink / raw)
  To: LKML
  Cc: Arve, Jeremy Fitzhardinge, Jesse Barnes, Johannes Berg,
	Thomas Gleixner, Eric W. Biederman, Ingo Molnar, Linus Torvalds,
	pm list

From: Rafael J. Wysocki <rjw@sisk.pl>

Introduce two helper functions allowing us to prevent device drivers
from getting any interrupts (without disabling interrupts on the CPU)
during suspend (or hibernation) and to make them start to receive
interrupts again during the subsequent resume, respectively.  These
functions make it possible to keep timer interrupts enabled while the
"late" suspend and "early" resume callbacks provided by device
drivers are being executed.

Use these functions to rework the handling of interrupts during
suspend (hibernation) and resume.  Namely, interrupts will only be
disabled on the CPU right before suspending sysdevs, while device
drivers will be prevented from receiving interrupts, with the help of
the new helper function, before their "late" suspend callbacks run
(and analogously during resume).

In addition, since the device interrupts are now disabled before the
CPU has turned all interrupts off, and the CPU will ACK such interrupts
and set the IRQ_PENDING bit for them, check in sysdev_suspend() whether
any wake-up interrupts are pending and abort the suspend if so.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
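
For readers who want to see the bookkeeping in isolation, below is a
minimal user-space sketch of what the three helpers described above do
with the per-line depth counter and the IRQ_SUSPENDED flag.  It is only
a model, not the kernel code: struct toy_irq_desc, the TOY_IRQ_* values
and all toy_* functions are made up for illustration, and locking, the
chip-level disable() call and synchronize_irq() are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values for this model only; not the kernel's. */
enum {
	TOY_IRQ_DISABLED  = 1u << 0,
	TOY_IRQ_PENDING   = 1u << 1,
	TOY_IRQ_WAKEUP    = 1u << 2,
	TOY_IRQ_SUSPENDED = 1u << 3,
};

struct toy_irq_desc {
	unsigned int depth;	/* nesting count of disable calls */
	unsigned int status;	/* TOY_IRQ_* bits */
	bool has_action;	/* a handler is installed */
	bool is_timer;		/* stands in for IRQF_TIMER */
};

/* Disable every enabled, non-timer line that has a handler. */
static void toy_suspend_device_irqs(struct toy_irq_desc *d, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (d[i].depth == 0 && d[i].has_action && !d[i].is_timer) {
			d[i].depth++;
			d[i].status |= TOY_IRQ_DISABLED | TOY_IRQ_SUSPENDED;
		}
	}
}

/* Re-enable only the lines that the suspend pass marked. */
static void toy_resume_device_irqs(struct toy_irq_desc *d, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!(d[i].status & TOY_IRQ_SUSPENDED))
			continue;
		assert(d[i].depth > 0);
		/* Both flags are dropped when the count returns to zero. */
		if (--d[i].depth == 0)
			d[i].status &= ~(TOY_IRQ_DISABLED | TOY_IRQ_SUSPENDED);
	}
}

/* Report -EBUSY if a wake-up line has an interrupt pending. */
static int toy_check_wakeup_irqs(const struct toy_irq_desc *d, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if ((d[i].status & TOY_IRQ_WAKEUP) &&
		    (d[i].status & TOY_IRQ_PENDING))
			return -EBUSY;
	}
	return 0;
}
```

The point of the flag is that a line a driver had already disabled on
its own (depth > 0 on entry) is skipped by the suspend pass and so is
not touched by the resume pass either, while timer lines stay enabled
throughout.
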
 arch/x86/kernel/apm_32.c  |   15 ++++++--
 drivers/base/power/main.c |   20 ++++++-----
 drivers/base/sys.c        |    8 ++++
 drivers/xen/manage.c      |   16 +++++----
 include/linux/interrupt.h |    5 ++
 include/linux/irq.h       |    1 
 kernel/irq/Makefile       |    1 
 kernel/irq/manage.c       |    3 +
 kernel/irq/pm.c           |   78 ++++++++++++++++++++++++++++++++++++++++++++++
 kernel/kexec.c            |    8 ++--
 kernel/power/disk.c       |   39 +++++++++++++++++------
 kernel/power/main.c       |   17 ++++++----
 12 files changed, 170 insertions(+), 41 deletions(-)

Index: linux-2.6/include/linux/interrupt.h
===================================================================
--- linux-2.6.orig/include/linux/interrupt.h
+++ linux-2.6/include/linux/interrupt.h
@@ -106,6 +106,11 @@ extern void disable_irq_nosync(unsigned 
 extern void disable_irq(unsigned int irq);
 extern void enable_irq(unsigned int irq);
 
+/* The following three functions are for the core kernel use only. */
+extern void suspend_device_irqs(void);
+extern void resume_device_irqs(void);
+extern int check_wakeup_irqs(void);
+
 #if defined(CONFIG_SMP) && defined(CONFIG_GENERIC_HARDIRQS)
 
 extern cpumask_var_t irq_default_affinity;
Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -287,17 +287,19 @@ void __attribute__ ((weak)) arch_suspend
  */
 static int suspend_enter(suspend_state_t state)
 {
-	int error = 0;
+	int error;
 
 	device_pm_lock();
-	arch_suspend_disable_irqs();
-	BUG_ON(!irqs_disabled());
 
-	if ((error = device_power_down(PMSG_SUSPEND))) {
+	error = device_power_down(PMSG_SUSPEND);
+	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down\n");
 		goto Done;
 	}
 
+	arch_suspend_disable_irqs();
+	BUG_ON(!irqs_disabled());
+
 	error = sysdev_suspend(PMSG_SUSPEND);
 	if (!error) {
 		if (!suspend_test(TEST_CORE))
@@ -305,11 +307,14 @@ static int suspend_enter(suspend_state_t
 		sysdev_resume();
 	}
 
-	device_power_up(PMSG_RESUME);
- Done:
 	arch_suspend_enable_irqs();
 	BUG_ON(irqs_disabled());
+
+	device_power_up(PMSG_RESUME);
+
+ Done:
 	device_pm_unlock();
+
 	return error;
 }
 
Index: linux-2.6/kernel/power/disk.c
===================================================================
--- linux-2.6.orig/kernel/power/disk.c
+++ linux-2.6/kernel/power/disk.c
@@ -214,7 +214,7 @@ static int create_image(int platform_mod
 		return error;
 
 	device_pm_lock();
-	local_irq_disable();
+
 	/* At this point, device_suspend() has been called, but *not*
 	 * device_power_down(). We *must* call device_power_down() now.
 	 * Otherwise, drivers for some devices (e.g. interrupt controllers)
@@ -225,8 +225,11 @@ static int create_image(int platform_mod
 	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down, "
 			"aborting hibernation\n");
-		goto Enable_irqs;
+		goto Unlock;
 	}
+
+	local_irq_disable();
+
 	sysdev_suspend(PMSG_FREEZE);
 	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down, "
@@ -252,12 +255,16 @@ static int create_image(int platform_mod
 	/* NOTE:  device_power_up() is just a resume() for devices
 	 * that suspended with irqs off ... no overall powerup.
 	 */
+
  Power_up_devices:
+	local_irq_enable();
+
 	device_power_up(in_suspend ?
 		(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);
- Enable_irqs:
-	local_irq_enable();
+
+ Unlock:
 	device_pm_unlock();
+
 	return error;
 }
 
@@ -336,13 +343,16 @@ static int resume_target_kernel(void)
 	int error;
 
 	device_pm_lock();
-	local_irq_disable();
+
 	error = device_power_down(PMSG_QUIESCE);
 	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down, "
 			"aborting resume\n");
-		goto Enable_irqs;
+		goto Unlock;
 	}
+
+	local_irq_disable();
+
 	sysdev_suspend(PMSG_QUIESCE);
 	/* We'll ignore saved state, but this gets preempt count (etc) right */
 	save_processor_state();
@@ -366,11 +376,16 @@ static int resume_target_kernel(void)
 	swsusp_free();
 	restore_processor_state();
 	touch_softlockup_watchdog();
+
 	sysdev_resume();
-	device_power_up(PMSG_RECOVER);
- Enable_irqs:
+
 	local_irq_enable();
+
+	device_power_up(PMSG_RECOVER);
+
+ Unlock:
 	device_pm_unlock();
+
 	return error;
 }
 
@@ -447,15 +462,16 @@ int hibernation_platform_enter(void)
 		goto Finish;
 
 	device_pm_lock();
-	local_irq_disable();
+
 	error = device_power_down(PMSG_HIBERNATE);
 	if (!error) {
+		local_irq_disable();
 		sysdev_suspend(PMSG_HIBERNATE);
 		hibernation_ops->enter();
 		/* We should never get here */
 		while (1);
 	}
-	local_irq_enable();
+
 	device_pm_unlock();
 
 	/*
@@ -464,12 +480,15 @@ int hibernation_platform_enter(void)
 	 */
  Finish:
 	hibernation_ops->finish();
+
  Resume_devices:
 	entering_platform_hibernation = false;
 	device_resume(PMSG_RESTORE);
 	resume_console();
+
  Close:
 	hibernation_ops->end();
+
 	return error;
 }
 
Index: linux-2.6/arch/x86/kernel/apm_32.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/apm_32.c
+++ linux-2.6/arch/x86/kernel/apm_32.c
@@ -1190,8 +1190,10 @@ static int suspend(int vetoable)
 	struct apm_user	*as;
 
 	device_suspend(PMSG_SUSPEND);
-	local_irq_disable();
+
 	device_power_down(PMSG_SUSPEND);
+
+	local_irq_disable();
 	sysdev_suspend(PMSG_SUSPEND);
 
 	local_irq_enable();
@@ -1209,9 +1211,12 @@ static int suspend(int vetoable)
 	if (err != APM_SUCCESS)
 		apm_error("suspend", err);
 	err = (err == APM_SUCCESS) ? 0 : -EIO;
+
 	sysdev_resume();
-	device_power_up(PMSG_RESUME);
 	local_irq_enable();
+
+	device_power_up(PMSG_RESUME);
+
 	device_resume(PMSG_RESUME);
 	queue_event(APM_NORMAL_RESUME, NULL);
 	spin_lock(&user_list_lock);
@@ -1228,8 +1233,9 @@ static void standby(void)
 {
 	int err;
 
-	local_irq_disable();
 	device_power_down(PMSG_SUSPEND);
+
+	local_irq_disable();
 	sysdev_suspend(PMSG_SUSPEND);
 	local_irq_enable();
 
@@ -1239,8 +1245,9 @@ static void standby(void)
 
 	local_irq_disable();
 	sysdev_resume();
-	device_power_up(PMSG_RESUME);
 	local_irq_enable();
+
+	device_power_up(PMSG_RESUME);
 }
 
 static apm_event_t get_event(void)
Index: linux-2.6/drivers/xen/manage.c
===================================================================
--- linux-2.6.orig/drivers/xen/manage.c
+++ linux-2.6/drivers/xen/manage.c
@@ -39,12 +39,6 @@ static int xen_suspend(void *data)
 
 	BUG_ON(!irqs_disabled());
 
-	err = device_power_down(PMSG_SUSPEND);
-	if (err) {
-		printk(KERN_ERR "xen_suspend: device_power_down failed: %d\n",
-		       err);
-		return err;
-	}
 	err = sysdev_suspend(PMSG_SUSPEND);
 	if (err) {
 		printk(KERN_ERR "xen_suspend: sysdev_suspend failed: %d\n",
@@ -69,7 +63,6 @@ static int xen_suspend(void *data)
 	xen_mm_unpin_all();
 
 	sysdev_resume();
-	device_power_up(PMSG_RESUME);
 
 	if (!*cancelled) {
 		xen_irq_resume();
@@ -108,6 +101,12 @@ static void do_suspend(void)
 	/* XXX use normal device tree? */
 	xenbus_suspend();
 
+	err = device_power_down(PMSG_SUSPEND);
+	if (err) {
+		printk(KERN_ERR "device_power_down failed: %d\n", err);
+		goto resume_devices;
+	}
+
 	err = stop_machine(xen_suspend, &cancelled, &cpumask_of_cpu(0));
 	if (err) {
 		printk(KERN_ERR "failed to start xen_suspend: %d\n", err);
@@ -120,6 +119,9 @@ static void do_suspend(void)
 	} else
 		xenbus_suspend_cancel();
 
+	device_power_up(PMSG_RESUME);
+
+resume_devices:
 	device_resume(PMSG_RESUME);
 
 	/* Make sure timer events get retriggered on all CPUs */
Index: linux-2.6/kernel/kexec.c
===================================================================
--- linux-2.6.orig/kernel/kexec.c
+++ linux-2.6/kernel/kexec.c
@@ -1454,7 +1454,6 @@ int kernel_kexec(void)
 		if (error)
 			goto Resume_devices;
 		device_pm_lock();
-		local_irq_disable();
 		/* At this point, device_suspend() has been called,
 		 * but *not* device_power_down(). We *must*
 		 * device_power_down() now.  Otherwise, drivers for
@@ -1464,8 +1463,9 @@ int kernel_kexec(void)
 		 */
 		error = device_power_down(PMSG_FREEZE);
 		if (error)
-			goto Enable_irqs;
+			goto Unlock_pm;
 
+		local_irq_disable();
 		/* Suspend system devices */
 		error = sysdev_suspend(PMSG_FREEZE);
 		if (error)
@@ -1484,9 +1484,9 @@ int kernel_kexec(void)
 	if (kexec_image->preserve_context) {
 		sysdev_resume();
  Power_up_devices:
-		device_power_up(PMSG_RESTORE);
- Enable_irqs:
 		local_irq_enable();
+		device_power_up(PMSG_RESTORE);
+ Unlock_pm:
 		device_pm_unlock();
 		enable_nonboot_cpus();
  Resume_devices:
Index: linux-2.6/include/linux/irq.h
===================================================================
--- linux-2.6.orig/include/linux/irq.h
+++ linux-2.6/include/linux/irq.h
@@ -65,6 +65,7 @@ typedef	void (*irq_flow_handler_t)(unsig
 #define IRQ_SPURIOUS_DISABLED	0x00800000	/* IRQ was disabled by the spurious trap */
 #define IRQ_MOVE_PCNTXT		0x01000000	/* IRQ migration from process context */
 #define IRQ_AFFINITY_SET	0x02000000	/* IRQ affinity was set from userspace*/
+#define IRQ_SUSPENDED		0x04000000	/* IRQ has gone through suspend sequence */
 
 #ifdef CONFIG_IRQ_PER_CPU
 # define CHECK_IRQ_PER_CPU(var) ((var) & IRQ_PER_CPU)
Index: linux-2.6/kernel/irq/pm.c
===================================================================
--- /dev/null
+++ linux-2.6/kernel/irq/pm.c
@@ -0,0 +1,78 @@
+/*
+ * linux/kernel/irq/pm.c
+ *
+ * Copyright (C) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
+ *
+ * This file contains power management functions related to interrupts.
+ */
+
+#include <linux/irq.h>
+#include <linux/module.h>
+#include <linux/interrupt.h>
+
+/**
+ * suspend_device_irqs - disable all currently enabled interrupt lines
+ *
+ * During system-wide suspend or hibernation device interrupts need to be
+ * disabled at the chip level and this function is provided for this purpose.
+ * It disables all interrupt lines that are enabled at the moment and sets the
+ * IRQ_SUSPENDED flag for them.
+ */
+void suspend_device_irqs(void)
+{
+	struct irq_desc *desc;
+	int irq;
+
+	for_each_irq_desc(irq, desc) {
+		unsigned long flags;
+
+		spin_lock_irqsave(&desc->lock, flags);
+
+		if (!desc->depth && desc->action
+		    && !(desc->action->flags & IRQF_TIMER)) {
+			desc->depth++;
+			desc->status |= IRQ_DISABLED | IRQ_SUSPENDED;
+			desc->chip->disable(irq);
+		}
+
+		spin_unlock_irqrestore(&desc->lock, flags);
+	}
+
+	for_each_irq_desc(irq, desc) {
+		if (desc->status & IRQ_SUSPENDED)
+			synchronize_irq(irq);
+	}
+}
+EXPORT_SYMBOL_GPL(suspend_device_irqs);
+
+/**
+ * resume_device_irqs - enable interrupt lines disabled by suspend_device_irqs()
+ *
+ * Enable all interrupt lines previously disabled by suspend_device_irqs() that
+ * have the IRQ_SUSPENDED flag set.
+ */
+void resume_device_irqs(void)
+{
+	struct irq_desc *desc;
+	int irq;
+
+	for_each_irq_desc(irq, desc)
+		if (desc->status & IRQ_SUSPENDED)
+			enable_irq(irq);
+}
+EXPORT_SYMBOL_GPL(resume_device_irqs);
+
+/**
+ * check_wakeup_irqs - check if any wake-up interrupts are pending
+ */
+int check_wakeup_irqs(void)
+{
+	struct irq_desc *desc;
+	int irq;
+
+	for_each_irq_desc(irq, desc)
+		if ((desc->status & IRQ_WAKEUP) && (desc->status & IRQ_PENDING))
+			return -EBUSY;
+
+	return 0;
+}
Index: linux-2.6/kernel/irq/Makefile
===================================================================
--- linux-2.6.orig/kernel/irq/Makefile
+++ linux-2.6/kernel/irq/Makefile
@@ -4,3 +4,4 @@ obj-$(CONFIG_GENERIC_IRQ_PROBE) += autop
 obj-$(CONFIG_PROC_FS) += proc.o
 obj-$(CONFIG_GENERIC_PENDING_IRQ) += migration.o
 obj-$(CONFIG_NUMA_MIGRATE_IRQ_DESC) += numa_migrate.o
+obj-$(CONFIG_PM_SLEEP) += pm.o
Index: linux-2.6/kernel/irq/manage.c
===================================================================
--- linux-2.6.orig/kernel/irq/manage.c
+++ linux-2.6/kernel/irq/manage.c
@@ -222,8 +222,9 @@ static void __enable_irq(struct irq_desc
 		WARN(1, KERN_WARNING "Unbalanced enable for IRQ %d\n", irq);
 		break;
 	case 1: {
-		unsigned int status = desc->status & ~IRQ_DISABLED;
+		unsigned int status;
 
+		status = desc->status & ~(IRQ_DISABLED | IRQ_SUSPENDED);
 		/* Prevent probing on this irq: */
 		desc->status = status | IRQ_NOPROBE;
 		check_irq_resend(desc, irq);
Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -23,6 +23,7 @@
 #include <linux/pm.h>
 #include <linux/resume-trace.h>
 #include <linux/rwsem.h>
+#include <linux/interrupt.h>
 
 #include "../base.h"
 #include "power.h"
@@ -305,7 +306,8 @@ static int resume_device_noirq(struct de
  *	Execute the appropriate "noirq resume" callback for all devices marked
  *	as DPM_OFF_IRQ.
  *
- *	Must be called with interrupts disabled and only one CPU running.
+ *	Must be called under dpm_list_mtx.  Device drivers should not receive
+ *	interrupts while it's being executed.
  */
 static void dpm_power_up(pm_message_t state)
 {
@@ -326,14 +328,13 @@ static void dpm_power_up(pm_message_t st
  *	device_power_up - Turn on all devices that need special attention.
  *	@state: PM transition of the system being carried out.
  *
- *	Power on system devices, then devices that required we shut them down
- *	with interrupts disabled.
- *
- *	Must be called with interrupts disabled.
+ *	Call the "early" resume handlers and enable device drivers to receive
+ *	interrupts.
  */
 void device_power_up(pm_message_t state)
 {
 	dpm_power_up(state);
+	resume_device_irqs();
 }
 EXPORT_SYMBOL_GPL(device_power_up);
 
@@ -558,16 +559,17 @@ static int suspend_device_noirq(struct d
  *	device_power_down - Shut down special devices.
  *	@state: PM transition of the system being carried out.
  *
- *	Power down devices that require interrupts to be disabled.
- *	Then power down system devices.
+ *	Prevent device drivers from receiving interrupts and call the "late"
+ *	suspend handlers.
  *
- *	Must be called with interrupts disabled and only one CPU running.
+ *	Must be called under dpm_list_mtx.
  */
 int device_power_down(pm_message_t state)
 {
 	struct device *dev;
 	int error = 0;
 
+	suspend_device_irqs();
 	list_for_each_entry_reverse(dev, &dpm_list, power.entry) {
 		error = suspend_device_noirq(dev, state);
 		if (error) {
@@ -577,7 +579,7 @@ int device_power_down(pm_message_t state
 		dev->power.status = DPM_OFF_IRQ;
 	}
 	if (error)
-		dpm_power_up(resume_event(state));
+		device_power_up(resume_event(state));
 	return error;
 }
 EXPORT_SYMBOL_GPL(device_power_down);
Index: linux-2.6/drivers/base/sys.c
===================================================================
--- linux-2.6.orig/drivers/base/sys.c
+++ linux-2.6/drivers/base/sys.c
@@ -22,6 +22,7 @@
 #include <linux/pm.h>
 #include <linux/device.h>
 #include <linux/mutex.h>
+#include <linux/interrupt.h>
 
 #include "base.h"
 
@@ -369,6 +370,13 @@ int sysdev_suspend(pm_message_t state)
 	struct sysdev_driver *drv, *err_drv;
 	int ret;
 
+	pr_debug("Checking wake-up interrupts\n");
+
+	/* Return error code if there are any wake-up interrupts pending */
+	ret = check_wakeup_irqs();
+	if (ret)
+		return ret;
+
 	pr_debug("Suspending System Devices\n");
 
 	list_for_each_entry_reverse(cls, &system_kset->list, kset.kobj.entry) {

^ permalink raw reply	[flat|nested] 373+ messages in thread

* [RFC][PATCH 2/4] PM: Change suspend code ordering
  2009-03-01 22:21 ` Rafael J. Wysocki
                     ` (2 preceding siblings ...)
  2009-03-01 22:25   ` [RFC][PATCH 2/4] PM: Change suspend code ordering Rafael J. Wysocki
@ 2009-03-01 22:25   ` Rafael J. Wysocki
  2009-03-02 20:48       ` Linus Torvalds
  2009-03-01 22:26   ` [RFC][PATCH 3/4] PM: Change hibernation " Rafael J. Wysocki
                     ` (4 subsequent siblings)
  8 siblings, 1 reply; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-01 22:25 UTC (permalink / raw)
  To: LKML
  Cc: Linus Torvalds, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner, Arve Hjønnevåg,
	Alan Stern, Johannes Berg

From: Rafael J. Wysocki <rjw@sisk.pl>

Change the ordering of the suspend core code so that the platform
"prepare" callback is executed and the nonboot CPUs are disabled
after calling device drivers' "late suspend" methods.

This change will allow us to rework the PCI PM core so that the power
state of devices is changed in the "late" phase of suspend (and
analogously in the "early" phase of resume), which in turn will allow
us to avoid the race condition where a device using shared interrupts
is put into a low power state with interrupts enabled and then an
interrupt (for another device) comes in and confuses its driver.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
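
To make the new ordering easier to follow, here is a small user-space
sketch of the sequence that suspend_enter() ends up with after this
patch, including the goto-based unwinding on failure.  It is a model
under stated assumptions, not the kernel function: the toy_* names and
the step strings are invented for illustration, each step only records
its name, and the suspend_test() shortcuts are not modelled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical identifiers for the steps that may fail in this model. */
enum toy_step { TOY_NONE, TOY_POWER_DOWN, TOY_PREPARE, TOY_CPUS_OFF, TOY_SYSDEV };

static char toy_log[256];

/* Append a step name to the log, separated by single spaces. */
static void toy_record(const char *name)
{
	size_t used = strlen(toy_log);

	assert(used < sizeof(toy_log));
	snprintf(toy_log + used, sizeof(toy_log) - used, "%s%s",
		 used ? " " : "", name);
}

/* Record a step and fail it if it is the one selected by fail_at. */
static int toy_run(const char *name, enum toy_step me, enum toy_step fail_at)
{
	toy_record(name);
	return (me != TOY_NONE && me == fail_at) ? -1 : 0;
}

static int toy_suspend_enter(enum toy_step fail_at)
{
	int error;

	toy_log[0] = '\0';

	/* "Late" suspend callbacks run first, with CPU interrupts on. */
	error = toy_run("power_down", TOY_POWER_DOWN, fail_at);
	if (error)
		goto done;

	error = toy_run("prepare", TOY_PREPARE, fail_at);
	if (error)
		goto power_up_devices;

	error = toy_run("cpus_off", TOY_CPUS_OFF, fail_at);
	if (error)
		goto enable_cpus;

	toy_record("irqs_off");
	error = toy_run("sysdev_suspend", TOY_SYSDEV, fail_at);
	if (!error) {
		toy_record("enter");
		toy_record("sysdev_resume");
	}
	toy_record("irqs_on");

enable_cpus:
	toy_record("cpus_on");
	toy_record("finish");
power_up_devices:
	toy_record("power_up");
done:
	return error;
}
```

Calling toy_suspend_enter(TOY_PREPARE), for example, leaves
"power_down prepare power_up" in toy_log, which mirrors the jump to
Power_up_devices in the patch when the platform "prepare" callback
fails.
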
 kernel/power/main.c |   38 ++++++++++++++++++++++----------------
 1 file changed, 22 insertions(+), 16 deletions(-)

Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -297,6 +297,19 @@ static int suspend_enter(suspend_state_t
 		goto Done;
 	}
 
+	if (suspend_ops->prepare) {
+		error = suspend_ops->prepare();
+		if (error)
+			goto Power_up_devices;
+	}
+
+	if (suspend_test(TEST_PLATFORM))
+		goto Platform_finish;
+
+	error = disable_nonboot_cpus();
+	if (error || suspend_test(TEST_CPUS))
+		goto Enable_cpus;
+
 	arch_suspend_disable_irqs();
 	BUG_ON(!irqs_disabled());
 
@@ -310,6 +323,14 @@ static int suspend_enter(suspend_state_t
 	arch_suspend_enable_irqs();
 	BUG_ON(irqs_disabled());
 
+ Enable_cpus:
+	enable_nonboot_cpus();
+
+ Platform_finish:
+	if (suspend_ops->finish)
+		suspend_ops->finish();
+
+ Power_up_devices:
 	device_power_up(PMSG_RESUME);
 
  Done:
@@ -346,23 +367,8 @@ int suspend_devices_and_enter(suspend_st
 	if (suspend_test(TEST_DEVICES))
 		goto Recover_platform;
 
-	if (suspend_ops->prepare) {
-		error = suspend_ops->prepare();
-		if (error)
-			goto Resume_devices;
-	}
-
-	if (suspend_test(TEST_PLATFORM))
-		goto Finish;
+	suspend_enter(state);
 
-	error = disable_nonboot_cpus();
-	if (!error && !suspend_test(TEST_CPUS))
-		suspend_enter(state);
-
-	enable_nonboot_cpus();
- Finish:
-	if (suspend_ops->finish)
-		suspend_ops->finish();
  Resume_devices:
 	suspend_test_start();
 	device_resume(PMSG_RESUME);

^ permalink raw reply	[flat|nested] 373+ messages in thread

* [RFC][PATCH 3/4] PM: Change hibernation code ordering
  2009-03-01 22:21 ` Rafael J. Wysocki
                     ` (3 preceding siblings ...)
  2009-03-01 22:25   ` Rafael J. Wysocki
@ 2009-03-01 22:26   ` Rafael J. Wysocki
  2009-03-01 22:26   ` Rafael J. Wysocki
                     ` (3 subsequent siblings)
  8 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-01 22:26 UTC (permalink / raw)
  To: LKML
  Cc: Linus Torvalds, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner, Arve Hjønnevåg,
	Alan Stern, Johannes Berg

From: Rafael J. Wysocki <rjw@sisk.pl>

Change the ordering of the hibernation core code so that the platform
"prepare" callbacks are executed and the nonboot CPUs are disabled
after calling device drivers' "late suspend" methods.

This change (along with the previous analogous change of the suspend
core code) will allow us to rework the PCI PM core so that the power
state of devices is changed in the "late" phase of suspend (and
analogously in the "early" phase of resume), which in turn will allow
us to avoid the race condition where a device using shared interrupts
is put into a low power state with interrupts enabled and then an
interrupt (for another device) comes in and confuses its driver.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 kernel/power/disk.c |  109 +++++++++++++++++++++++++++++-----------------------
 1 file changed, 61 insertions(+), 48 deletions(-)

Index: linux-2.6/kernel/power/disk.c
===================================================================
--- linux-2.6.orig/kernel/power/disk.c
+++ linux-2.6/kernel/power/disk.c
@@ -228,13 +228,22 @@ static int create_image(int platform_mod
 		goto Unlock;
 	}
 
+	error = platform_pre_snapshot(platform_mode);
+	if (error || hibernation_test(TEST_PLATFORM))
+		goto Platform_finish;
+
+	error = disable_nonboot_cpus();
+	if (error || hibernation_test(TEST_CPUS)
+	    || hibernation_testmode(HIBERNATION_TEST))
+		goto Enable_cpus;
+
 	local_irq_disable();
 
 	sysdev_suspend(PMSG_FREEZE);
 	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down, "
 			"aborting hibernation\n");
-		goto Power_up_devices;
+		goto Enable_irqs;
 	}
 
 	if (hibernation_test(TEST_CORE))
@@ -250,15 +259,22 @@ static int create_image(int platform_mod
 	restore_processor_state();
 	if (!in_suspend)
 		platform_leave(platform_mode);
+
  Power_up:
 	sysdev_resume();
 	/* NOTE:  device_power_up() is just a resume() for devices
 	 * that suspended with irqs off ... no overall powerup.
 	 */
 
- Power_up_devices:
+ Enable_irqs:
 	local_irq_enable();
 
+ Enable_cpus:
+	enable_nonboot_cpus();
+
+ Platform_finish:
+	platform_finish(platform_mode);
+
 	device_power_up(in_suspend ?
 		(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);
 
@@ -298,25 +314,9 @@ int hibernation_snapshot(int platform_mo
 	if (hibernation_test(TEST_DEVICES))
 		goto Recover_platform;
 
-	error = platform_pre_snapshot(platform_mode);
-	if (error || hibernation_test(TEST_PLATFORM))
-		goto Finish;
-
-	error = disable_nonboot_cpus();
-	if (!error) {
-		if (hibernation_test(TEST_CPUS))
-			goto Enable_cpus;
-
-		if (hibernation_testmode(HIBERNATION_TEST))
-			goto Enable_cpus;
+	error = create_image(platform_mode);
+	/* Control returns here after successful restore */
 
-		error = create_image(platform_mode);
-		/* Control returns here after successful restore */
-	}
- Enable_cpus:
-	enable_nonboot_cpus();
- Finish:
-	platform_finish(platform_mode);
  Resume_devices:
 	device_resume(in_suspend ?
 		(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);
@@ -338,7 +338,7 @@ int hibernation_snapshot(int platform_mo
  *	kernel.
  */
 
-static int resume_target_kernel(void)
+static int resume_target_kernel(bool platform_mode)
 {
 	int error;
 
@@ -351,9 +351,20 @@ static int resume_target_kernel(void)
 		goto Unlock;
 	}
 
+	error = platform_pre_restore(platform_mode);
+	if (error)
+		goto Cleanup;
+
+	error = disable_nonboot_cpus();
+	if (error)
+		goto Enable_cpus;
+
 	local_irq_disable();
 
-	sysdev_suspend(PMSG_QUIESCE);
+	error = sysdev_suspend(PMSG_QUIESCE);
+	if (error)
+		goto Enable_irqs;
+
 	/* We'll ignore saved state, but this gets preempt count (etc) right */
 	save_processor_state();
 	error = restore_highmem();
@@ -379,8 +390,15 @@ static int resume_target_kernel(void)
 
 	sysdev_resume();
 
+ Enable_irqs:
 	local_irq_enable();
 
+ Enable_cpus:
+	enable_nonboot_cpus();
+
+ Cleanup:
+	platform_restore_cleanup(platform_mode);
+
 	device_power_up(PMSG_RECOVER);
 
  Unlock:
@@ -405,19 +423,10 @@ int hibernation_restore(int platform_mod
 	pm_prepare_console();
 	suspend_console();
 	error = device_suspend(PMSG_QUIESCE);
-	if (error)
-		goto Finish;
-
-	error = platform_pre_restore(platform_mode);
 	if (!error) {
-		error = disable_nonboot_cpus();
-		if (!error)
-			error = resume_target_kernel();
-		enable_nonboot_cpus();
+		error = resume_target_kernel(platform_mode);
+		device_resume(PMSG_RECOVER);
 	}
-	platform_restore_cleanup(platform_mode);
-	device_resume(PMSG_RECOVER);
- Finish:
 	resume_console();
 	pm_restore_console();
 	return error;
@@ -453,34 +462,38 @@ int hibernation_platform_enter(void)
 		goto Resume_devices;
 	}
 
+	device_pm_lock();
+
+	error = device_power_down(PMSG_HIBERNATE);
+	if (error)
+		goto Unlock;
+
 	error = hibernation_ops->prepare();
 	if (error)
-		goto Resume_devices;
+		goto Platofrm_finish;
 
 	error = disable_nonboot_cpus();
 	if (error)
-		goto Finish;
-
-	device_pm_lock();
-
-	error = device_power_down(PMSG_HIBERNATE);
-	if (!error) {
-		local_irq_disable();
-		sysdev_suspend(PMSG_HIBERNATE);
-		hibernation_ops->enter();
-		/* We should never get here */
-		while (1);
-	}
+		goto Platofrm_finish;
 
-	device_pm_unlock();
+	local_irq_disable();
+	sysdev_suspend(PMSG_HIBERNATE);
+	hibernation_ops->enter();
+	/* We should never get here */
+	while (1);
 
 	/*
 	 * We don't need to reenable the nonboot CPUs or resume consoles, since
 	 * the system is going to be halted anyway.
 	 */
- Finish:
+ Platofrm_finish:
 	hibernation_ops->finish();
 
+	device_power_up(PMSG_RESTORE);
+
+ Unlock:
+	device_pm_unlock();
+
  Resume_devices:
 	entering_platform_hibernation = false;
 	device_resume(PMSG_RESTORE);
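
One detail in the create_image() hunk above may be worth a second look: the
context line calls sysdev_suspend(PMSG_FREEZE) without storing its result, so
the "if (error)" that follows appears to test the value left by
disable_nonboot_cpus(), which is 0 on that path.  The resume_target_kernel()
hunk of the same patch does assign the result.  A minimal sketch of the
difference, using a hypothetical stub in place of sysdev_suspend():

```c
/* Hypothetical stand-in for sysdev_suspend(); it always fails here. */
int sysdev_suspend_stub(void)
{
	return -1;
}

/* Shape of the create_image() context lines: the result is dropped. */
int check_unassigned(void)
{
	int error = 0;	/* left at 0 by the earlier, successful steps */

	sysdev_suspend_stub();
	if (error)
		return 1;	/* not reached: the failure goes unnoticed */
	return 0;
}

/* Shape of the resume_target_kernel() hunk: the result is assigned. */
int check_assigned(void)
{
	int error;

	error = sysdev_suspend_stub();
	if (error)
		return 1;	/* the failure is seen and can be unwound */
	return 0;
}
```

Here check_unassigned() returns 0 even though the stub failed, while
check_assigned() returns 1.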

^ permalink raw reply	[flat|nested] 373+ messages in thread

* [RFC][PATCH 4/4] kexec: Change kexec jump code ordering
  2009-03-01 22:21 ` Rafael J. Wysocki
                     ` (6 preceding siblings ...)
  2009-03-01 22:27   ` [RFC][PATCH 4/4] kexec: Change kexec jump " Rafael J. Wysocki
@ 2009-03-01 22:27   ` Rafael J. Wysocki
  2009-03-05 23:44     ` Linus Torvalds
  8 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-01 22:27 UTC (permalink / raw)
  To: LKML
  Cc: Linus Torvalds, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner, Arve Hjønnevåg,
	Alan Stern, Johannes Berg

From: Rafael J. Wysocki <rjw@sisk.pl>

Change the ordering of the kexec jump code so that the nonboot CPUs
are disabled after calling device drivers' "late suspend" methods.

This change reflects the recent modifications of the power management
code that is also used by kexec jump.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 kernel/kexec.c |   19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

Index: linux-2.6/kernel/kexec.c
===================================================================
--- linux-2.6.orig/kernel/kexec.c
+++ linux-2.6/kernel/kexec.c
@@ -1450,9 +1450,6 @@ int kernel_kexec(void)
 		error = device_suspend(PMSG_FREEZE);
 		if (error)
 			goto Resume_console;
-		error = disable_nonboot_cpus();
-		if (error)
-			goto Resume_devices;
 		device_pm_lock();
 		/* At this point, device_suspend() has been called,
 		 * but *not* device_power_down(). We *must*
@@ -1463,13 +1460,15 @@ int kernel_kexec(void)
 		 */
 		error = device_power_down(PMSG_FREEZE);
 		if (error)
-			goto Unlock_pm;
-
+			goto Resume_devices;
+		error = disable_nonboot_cpus();
+		if (error)
+			goto Enable_cpus;
 		local_irq_disable();
 		/* Suspend system devices */
 		error = sysdev_suspend(PMSG_FREEZE);
 		if (error)
-			goto Power_up_devices;
+			goto Enable_irqs;
 	} else
 #endif
 	{
@@ -1483,13 +1482,13 @@ int kernel_kexec(void)
 #ifdef CONFIG_KEXEC_JUMP
 	if (kexec_image->preserve_context) {
 		sysdev_resume();
- Power_up_devices:
+ Enable_irqs:
 		local_irq_enable();
-		device_power_up(PMSG_RESTORE);
- Unlock_pm:
-		device_pm_unlock();
+ Enable_cpus:
 		enable_nonboot_cpus();
+		device_power_up(PMSG_RESTORE);
  Resume_devices:
+		device_pm_unlock();
 		device_resume(PMSG_RESTORE);
  Resume_console:
 		resume_console();

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/4] PM: Change suspend code ordering
  2009-03-01 22:25   ` Rafael J. Wysocki
@ 2009-03-02 20:48       ` Linus Torvalds
  0 siblings, 0 replies; 373+ messages in thread
From: Linus Torvalds @ 2009-03-02 20:48 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: LKML, Ingo Molnar, Eric W. Biederman, Benjamin Herrenschmidt,
	Jeremy Fitzhardinge, pm list, Len Brown, Jesse Barnes,
	Thomas Gleixner, Arve Hjønnevåg, Alan Stern,
	Johannes Berg



On Sun, 1 Mar 2009, Rafael J. Wysocki wrote:
>
> From: Rafael J. Wysocki <rjw@sisk.pl>
> 
> Change the ordering of the suspend core code so that the platform
> "prepare" callback is executed and the nonboot CPUs are disabled
> after calling device drivers' "late suspend" methods.

Ok, ack on this whole series, looks fine.

I'd like to see a 5/4 though:

> This change will allow us to rework the PCI PM core so that the power
> state of devices is changed in the "late" phase of suspend (and
> analogously in the "early" phase of resume)

.. doing this. Right now we have that hacky "avoid ACPI by doing a special
limited form of pci_set_power_state() and pci_enable() in the
early_resume". I'd love to see the actual PCI code cleanup too.

			Linus

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 2/4] PM: Change suspend code ordering
  2009-03-02 20:48       ` Linus Torvalds
  (?)
  (?)
@ 2009-03-02 22:02       ` Rafael J. Wysocki
  -1 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-02 22:02 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: LKML, Ingo Molnar, Eric W. Biederman, Benjamin Herrenschmidt,
	Jeremy Fitzhardinge, pm list, Len Brown, Jesse Barnes,
	Thomas Gleixner, Arve Hjønnevåg, Alan Stern,
	Johannes Berg

On Monday 02 March 2009, Linus Torvalds wrote:
> 
> On Sun, 1 Mar 2009, Rafael J. Wysocki wrote:
> >
> > From: Rafael J. Wysocki <rjw@sisk.pl>
> > 
> > Change the ordering of the suspend core code so that the platform
> > "prepare" callback is executed and the nonboot CPUs are disabled
> > after calling device drivers' "late suspend" methods.
> 
> Ok, ack on this whole series, looks fine.

Thanks!
 
> I'd like to see a 5/4 though:
> 
> > This change will allow us to rework the PCI PM core so that the power
> > state of devices is changed in the "late" phase of suspend (and
> > analogously in the "early" phase of resume)
> 
> .. doing this. Right now we have that hacky "avoid ACPI by doing a special
> limited form of pci_set_power_state() and pci_enable() in the
> early_resume". I'd love to see the actual PCI code cleanup too.

Sure, that's the next step, but I wanted to get the ack on the preliminary
patches first. :-)

Rafael

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 1/4] PM: Rework handling of interrupts during suspend-resume (rev. 4)
  2009-03-01 22:24   ` [RFC][PATCH 1/4] PM: Rework handling of interrupts during suspend-resume (rev. 4) Rafael J. Wysocki
  2009-03-02 23:01     ` Arve Hjønnevåg
@ 2009-03-02 23:01     ` Arve Hjønnevåg
  2009-03-02 23:13       ` Rafael J. Wysocki
  2009-03-02 23:13       ` Rafael J. Wysocki
  1 sibling, 2 replies; 373+ messages in thread
From: Arve Hjønnevåg @ 2009-03-02 23:01 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: LKML, Linus Torvalds, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner, Alan Stern, Johannes Berg

On Sun, Mar 1, 2009 at 2:24 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> From: Rafael J. Wysocki <rjw@sisk.pl>
>
> Introduce two helper functions allowing us to prevent device drivers
> from getting any interrupts (without disabling interrupts on the CPU)
> during suspend (or hibernation) and to make them start to receive
> interrupts again during the subsequent resume, respectively.  These
> functions make it possible to keep timer interrupts enabled while the
> "late" suspend and "early" resume callbacks provided by device
> drivers are being executed.
>
> Use these functions to rework the handling of interrupts during
> suspend (hibernation) and resume.  Namely, interrupts will only be
> disabled on the CPU right before suspending sysdevs, while device
> drivers will be prevented from receiving interrupts, with the help of
> the new helper function, before their "late" suspend callbacks run
> (and analogously during resume).
>
> In addition, since the device interrupts are now disabled before the
> CPU has turned all interrupts off and the CPU will ACK the interrupts
> setting the IRQ_PENDING bit for them, check in sysdev_suspend() if
> any wake-up interrupts are pending and abort suspend if that's the
> case.
>
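
The wake-up check described in the last quoted paragraph could look roughly
like the sketch below.  This is an illustration only: the function name, the
flat status array, the flag values, the IRQ_WAKEUP marker and the -EBUSY
return code are assumptions, not taken from the patch.

```c
#include <errno.h>
#include <stddef.h>

/* Illustrative flag values; the kernel's own definitions differ. */
#define IRQ_PENDING	0x1u	/* interrupt arrived while it was disabled */
#define IRQ_WAKEUP	0x2u	/* assumed marker for a wake-up source */

/*
 * Return -EBUSY if any wake-up interrupt is pending, 0 otherwise.
 * A caller in the suspend path would abort suspend on a nonzero result.
 */
int check_wakeup_pending(const unsigned int *status, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if ((status[i] & IRQ_WAKEUP) && (status[i] & IRQ_PENDING))
			return -EBUSY;

	return 0;
}
```

A pending interrupt that is not a wake-up source, or a wake-up source with
nothing pending, does not abort suspend in this sketch; only the combination
of both flags does.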


> +void resume_device_irqs(void)
> +{
> +       struct irq_desc *desc;
> +       int irq;
> +
> +       for_each_irq_desc(irq, desc)
> +               if (desc->status & IRQ_SUSPENDED)
> +                       enable_irq(irq);
> +}

I think you need to clear IRQ_SUSPENDED here, not in enable_irq.

> @@ -222,8 +222,9 @@ static void __enable_irq(struct irq_desc
>                WARN(1, KERN_WARNING "Unbalanced enable for IRQ %d\n", irq);
>                break;
>        case 1: {
> -               unsigned int status = desc->status & ~IRQ_DISABLED;
> +               unsigned int status;
>
> +               status = desc->status & ~(IRQ_DISABLED | IRQ_SUSPENDED);
>                /* Prevent probing on this irq: */
>                desc->status = status | IRQ_NOPROBE;
>                check_irq_resend(desc, irq);

This only clears IRQ_SUSPENDED if the interrupt was not disabled
elsewhere. If a driver calls disable_irq() in suspend_late, but
calls enable_irq() lazily, resume_device_irqs() will reenable the
interrupt even though the driver still holds a disable reference.

The rest of the patch looks good.
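
The IRQ_SUSPENDED concern above can be sketched with a toy model of the
disable depth counter.  Everything here is an assumption made for
illustration: the struct, the flag values, the single-IRQ helpers and in
particular suspend_device_irq_sketch(), which only guesses at what the
suspend-side helper does.  The second resume pass is likewise hypothetical;
it stands for any later code that trusts a stale IRQ_SUSPENDED flag.

```c
/* Illustrative flag values; the kernel's own definitions differ. */
#define IRQ_DISABLED	0x1u
#define IRQ_SUSPENDED	0x2u

struct irq_desc_sketch {
	unsigned int status;
	unsigned int depth;	/* nesting count of disable calls */
};

void disable_irq_sketch(struct irq_desc_sketch *desc)
{
	if (!desc->depth++)
		desc->status |= IRQ_DISABLED;
}

/*
 * Follows the quoted __enable_irq() hunk: the flags are cleared only
 * when the last disable reference is dropped (depth 1 -> 0).
 */
void enable_irq_sketch(struct irq_desc_sketch *desc)
{
	switch (desc->depth) {
	case 0:
		break;		/* unbalanced enable; the kernel WARNs here */
	case 1:
		desc->status &= ~(IRQ_DISABLED | IRQ_SUSPENDED);
		/* fall through */
	default:
		desc->depth--;
	}
}

/* Assumed suspend-side helper, reduced to a single IRQ. */
void suspend_device_irq_sketch(struct irq_desc_sketch *desc)
{
	desc->status |= IRQ_SUSPENDED;
	disable_irq_sketch(desc);
}

/* The quoted resume_device_irqs(), reduced to a single IRQ. */
void resume_device_irq_sketch(struct irq_desc_sketch *desc)
{
	if (desc->status & IRQ_SUSPENDED)
		enable_irq_sketch(desc);
}

/* Variant that clears the flag in the resume helper itself. */
void resume_device_irq_clearing(struct irq_desc_sketch *desc)
{
	if (desc->status & IRQ_SUSPENDED) {
		desc->status &= ~IRQ_SUSPENDED;
		enable_irq_sketch(desc);
	}
}
```

With one extra disable_irq_sketch() from a driver between suspend and
resume, resume_device_irq_sketch() drops the depth from 2 to 1 but leaves
IRQ_SUSPENDED set, so a second pass would drop the driver's own reference.
The clearing variant leaves the depth at 1 no matter how often it runs.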

-- 
Arve Hjønnevåg

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 1/4] PM: Rework handling of interrupts during suspend-resume (rev. 4)
  2009-03-02 23:01     ` Arve Hjønnevåg
@ 2009-03-02 23:13       ` Rafael J. Wysocki
  2009-03-02 23:18         ` Arve Hjønnevåg
  2009-03-02 23:18         ` Arve Hjønnevåg
  2009-03-02 23:13       ` Rafael J. Wysocki
  1 sibling, 2 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-02 23:13 UTC (permalink / raw)
  To: Arve Hjønnevåg
  Cc: LKML, Linus Torvalds, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner, Alan Stern, Johannes Berg

On Tuesday 03 March 2009, Arve Hjønnevåg wrote:
> On Sun, Mar 1, 2009 at 2:24 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > From: Rafael J. Wysocki <rjw@sisk.pl>
> >
> > Introduce two helper functions allowing us to prevent device drivers
> > from getting any interrupts (without disabling interrupts on the CPU)
> > during suspend (or hibernation) and to make them start to receive
> > interrupts again during the subsequent resume, respectively.  These
> > functions make it possible to keep timer interrupts enabled while the
> > "late" suspend and "early" resume callbacks provided by device
> > drivers are being executed.
> >
> > Use these functions to rework the handling of interrupts during
> > suspend (hibernation) and resume.  Namely, interrupts will only be
> > disabled on the CPU right before suspending sysdevs, while device
> > drivers will be prevented from receiving interrupts, with the help of
> > the new helper function, before their "late" suspend callbacks run
> > (and analogously during resume).
> >
> > In addition, since the device interrupts are now disabled before the
> > CPU has turned all interrupts off and the CPU will ACK the interrupts
> > setting the IRQ_PENDING bit for them, check in sysdev_suspend() if
> > any wake-up interrupts are pending and abort suspend if that's the
> > case.
> >
> 
> 
> > +void resume_device_irqs(void)
> > +{
> > +       struct irq_desc *desc;
> > +       int irq;
> > +
> > +       for_each_irq_desc(irq, desc)
> > +               if (desc->status & IRQ_SUSPENDED)
> > +                       enable_irq(irq);
> > +}
> 
> I think you need to clear IRQ_SUSPENDED here, not in enable_irq.

enable_irq() clears IRQ_SUSPENDED.  This has already been discussed btw.

> > @@ -222,8 +222,9 @@ static void __enable_irq(struct irq_desc
> >                WARN(1, KERN_WARNING "Unbalanced enable for IRQ %d\n", irq);
> >                break;
> >        case 1: {
> > -               unsigned int status = desc->status & ~IRQ_DISABLED;
> > +               unsigned int status;
> >
> > +               status = desc->status & ~(IRQ_DISABLED | IRQ_SUSPENDED);
> >                /* Prevent probing on this irq: */
> >                desc->status = status | IRQ_NOPROBE;
> >                check_irq_resend(desc, irq);
> 
> This only clears IRQ_SUSPENDED if the interrupt was not disabled
> elsewhere. If a driver calls disable_irq in suspend_late, but
> calls enable_irq lazily, resume_device_irqs will reenable the
> interrupt even though the driver has a disable reference.

Then I'd regard the driver as buggy.
 
> The rest of the patch looks good.

I'm glad you like it.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 1/4] PM: Rework handling of interrupts during  suspend-resume (rev. 4)
  2009-03-02 23:13       ` Rafael J. Wysocki
  2009-03-02 23:18         ` Arve Hjønnevåg
@ 2009-03-02 23:18         ` Arve Hjønnevåg
  2009-03-02 23:27           ` Rafael J. Wysocki
                             ` (2 more replies)
  1 sibling, 3 replies; 373+ messages in thread
From: Arve Hjønnevåg @ 2009-03-02 23:18 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: LKML, Linus Torvalds, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner, Alan Stern, Johannes Berg

On Mon, Mar 2, 2009 at 3:13 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> On Tuesday 03 March 2009, Arve Hjønnevåg wrote:
>> On Sun, Mar 1, 2009 at 2:24 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
>> > From: Rafael J. Wysocki <rjw@sisk.pl>
>> >
>> > Introduce two helper functions allowing us to prevent device drivers
>> > from getting any interrupts (without disabling interrupts on the CPU)
>> > during suspend (or hibernation) and to make them start to receive
>> > interrupts again during the subsequent resume, respectively.  These
>> > functions make it possible to keep timer interrupts enabled while the
>> > "late" suspend and "early" resume callbacks provided by device
>> > drivers are being executed.
>> >
>> > Use these functions to rework the handling of interrupts during
>> > suspend (hibernation) and resume.  Namely, interrupts will only be
>> > disabled on the CPU right before suspending sysdevs, while device
>> > drivers will be prevented from receiving interrupts, with the help of
>> > the new helper function, before their "late" suspend callbacks run
>> > (and analogously during resume).
>> >
>> > In addition, since the device interrupts are now disabled before the
>> > CPU has turned all interrupts off and the CPU will ACK the interrupts
>> > setting the IRQ_PENDING bit for them, check in sysdev_suspend() if
>> > any wake-up interrupts are pending and abort suspend if that's the
>> > case.
>> >
>>
>>
>> > +void resume_device_irqs(void)
>> > +{
>> > +       struct irq_desc *desc;
>> > +       int irq;
>> > +
>> > +       for_each_irq_desc(irq, desc)
>> > +               if (desc->status & IRQ_SUSPENDED)
>> > +                       enable_irq(irq);
>> > +}
>>
>> I think you need to clear IRQ_SUSPENDED here, not in enable_irq.
>
> enable_irq() clears IRQ_SUSPENDED.  This has already been discussed btw.
>

I'm sorry if I missed that discussion, but enable_irq cannot know who is
calling it and therefore cannot know if IRQ_SUSPENDED should be
cleared.

>> > @@ -222,8 +222,9 @@ static void __enable_irq(struct irq_desc
>> >                WARN(1, KERN_WARNING "Unbalanced enable for IRQ %d\n", irq);
>> >                break;
>> >        case 1: {
>> > -               unsigned int status = desc->status & ~IRQ_DISABLED;
>> > +               unsigned int status;
>> >
>> > +               status = desc->status & ~(IRQ_DISABLED | IRQ_SUSPENDED);
>> >                /* Prevent probing on this irq: */
>> >                desc->status = status | IRQ_NOPROBE;
>> >                check_irq_resend(desc, irq);
>>
>> This only clears IRQ_SUSPENDED if the interrupt was not disabled
>> elsewhere. If a driver calls disable_irq in suspend_late, but
>> calls enable_irq lazily, resume_device_irqs will reenable the
>> interrupt even though the driver has a disable reference.
>
> Then I'd regard the driver as buggy.

The bug is not in the driver. The driver called disable_irq once. You
called disable_irq once, but enable_irq twice.
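
A minimal user-space sketch of one possible reading of this objection.
It is not kernel code: struct irq_model, the model_* helpers and the flag
values are hypothetical, and model_suspend_device_irqs() assumes that the
suspend helper only touches lines whose depth is zero. model_enable_irq()
follows the quoted __enable_irq() hunk, clearing the flags only when depth
is 1.

```c
#include <assert.h>

/* Illustrative flag values, not the kernel's. */
enum { IRQ_DISABLED = 1u << 0, IRQ_SUSPENDED = 1u << 1 };

struct irq_model {
	unsigned int depth;	/* nested disable count */
	unsigned int status;	/* IRQ_* flag bits */
};

static void model_disable_irq(struct irq_model *d)
{
	if (d->depth++ == 0)
		d->status |= IRQ_DISABLED;
}

/* Mirrors the quoted hunk: both flags are cleared only in case 1. */
static void model_enable_irq(struct irq_model *d)
{
	switch (d->depth) {
	case 0:
		break;		/* unbalanced enable; the kernel WARNs here */
	case 1:
		d->status &= ~(unsigned int)(IRQ_DISABLED | IRQ_SUSPENDED);
		/* fall through */
	default:
		d->depth--;
	}
}

/* Assumption: lines that are already disabled are left alone. */
static void model_suspend_device_irqs(struct irq_model *d)
{
	if (d->depth == 0) {
		d->depth = 1;
		d->status |= IRQ_DISABLED | IRQ_SUSPENDED;
	}
}

static void model_resume_device_irqs(struct irq_model *d)
{
	if (d->status & IRQ_SUSPENDED)
		model_enable_irq(d);
}

/* Driver disables in suspend_late and has not re-enabled by the next cycle. */
static struct irq_model stale_flag_scenario(void)
{
	struct irq_model d = { 0, 0 };

	model_suspend_device_irqs(&d);	/* PM core takes its reference */
	model_disable_irq(&d);		/* driver takes its own reference */
	model_resume_device_irqs(&d);	/* depth was 2, so the flag is not cleared */
	assert(d.status & IRQ_SUSPENDED);

	model_suspend_device_irqs(&d);	/* skipped: depth is nonzero */
	model_resume_device_irqs(&d);	/* stale flag triggers another enable */
	return d;
}
```

Under these assumptions the flag survives the first resume, and a second
resume pass then drops the reference the driver still believes it holds.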

-- 
Arve Hjønnevåg

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 1/4] PM: Rework handling of interrupts during suspend-resume (rev. 4)
  2009-03-02 23:18         ` Arve Hjønnevåg
  2009-03-02 23:27           ` Rafael J. Wysocki
@ 2009-03-02 23:27           ` Rafael J. Wysocki
  2009-03-03 22:56             ` Arve Hjønnevåg
  2009-03-03 22:56             ` Arve Hjønnevåg
  2009-03-02 23:32             ` Linus Torvalds
  2 siblings, 2 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-02 23:27 UTC (permalink / raw)
  To: Arve Hjønnevåg, Ingo Molnar
  Cc: LKML, Linus Torvalds, Eric W. Biederman, Benjamin Herrenschmidt,
	Jeremy Fitzhardinge, pm list, Len Brown, Jesse Barnes,
	Thomas Gleixner, Alan Stern, Johannes Berg

On Tuesday 03 March 2009, Arve Hjønnevåg wrote:
> On Mon, Mar 2, 2009 at 3:13 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > On Tuesday 03 March 2009, Arve Hjønnevåg wrote:
> >> On Sun, Mar 1, 2009 at 2:24 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> >> > From: Rafael J. Wysocki <rjw@sisk.pl>
> >> >
> >> > Introduce two helper functions allowing us to prevent device drivers
> >> > from getting any interrupts (without disabling interrupts on the CPU)
> >> > during suspend (or hibernation) and to make them start to receive
> >> > interrupts again during the subsequent resume, respectively.  These
> >> > functions make it possible to keep timer interrupts enabled while the
> >> > "late" suspend and "early" resume callbacks provided by device
> >> > drivers are being executed.
> >> >
> >> > Use these functions to rework the handling of interrupts during
> >> > suspend (hibernation) and resume.  Namely, interrupts will only be
> >> > disabled on the CPU right before suspending sysdevs, while device
> >> > drivers will be prevented from receiving interrupts, with the help of
> >> > the new helper function, before their "late" suspend callbacks run
> >> > (and analogously during resume).
> >> >
> >> > In addition, since the device interrupts are now disabled before the
> >> > CPU has turned all interrupts off and the CPU will ACK the interrupts
> >> > setting the IRQ_PENDING bit for them, check in sysdev_suspend() if
> >> > any wake-up interrupts are pending and abort suspend if that's the
> >> > case.
> >> >
> >>
> >>
> >> > +void resume_device_irqs(void)
> >> > +{
> >> > +       struct irq_desc *desc;
> >> > +       int irq;
> >> > +
> >> > +       for_each_irq_desc(irq, desc)
> >> > +               if (desc->status & IRQ_SUSPENDED)
> >> > +                       enable_irq(irq);
> >> > +}
> >>
> >> I think you need to clear IRQ_SUSPENDED here, not in enable_irq.
> >
> > enable_irq() clears IRQ_SUSPENDED.  This has already been discussed btw.
> >
> 
> I'm sorry if I missed that discussion, but enable_irq cannot know who is
> calling it and therefore cannot know if IRQ_SUSPENDED should be
> cleared.

This change has been requested by Ingo and for a reason.

Ingo, what's your opinion?

> >> > @@ -222,8 +222,9 @@ static void __enable_irq(struct irq_desc
> >> >                WARN(1, KERN_WARNING "Unbalanced enable for IRQ %d\n", irq);
> >> >                break;
> >> >        case 1: {
> >> > -               unsigned int status = desc->status & ~IRQ_DISABLED;
> >> > +               unsigned int status;
> >> >
> >> > +               status = desc->status & ~(IRQ_DISABLED | IRQ_SUSPENDED);
> >> >                /* Prevent probing on this irq: */
> >> >                desc->status = status | IRQ_NOPROBE;
> >> >                check_irq_resend(desc, irq);
> >>
> >> This only clears IRQ_SUSPENDED if the interrupt was not disabled
> >> elsewhere. If a driver calls disable_irq in suspend_late, but
> >> calls enable_irq lazily, resume_device_irqs will reenable the
> >> interrupt even though the driver has a disable reference.
> >
> > Then I'd regard the driver as buggy.
> 
> The bug is not in the driver. The driver called disable_irq once. You
> called disable_irq once, but enable_irq twice.

Please.

Can you show me a _single_ _driver_ currently in the tree doing something
like you describe in suspend_late and resume_early?  If you can't, then please
give up.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 1/4] PM: Rework handling of interrupts during  suspend-resume (rev. 4)
  2009-03-02 23:18         ` Arve Hjønnevåg
@ 2009-03-02 23:32             ` Linus Torvalds
  2009-03-02 23:27           ` Rafael J. Wysocki
  2009-03-02 23:32             ` Linus Torvalds
  2 siblings, 0 replies; 373+ messages in thread
From: Linus Torvalds @ 2009-03-02 23:32 UTC (permalink / raw)
  To: Arve Hjønnevåg
  Cc: Rafael J. Wysocki, LKML, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner, Alan Stern, Johannes Berg



On Mon, 2 Mar 2009, Arve Hjønnevåg wrote:
>
> > enable_irq() clears IRQ_SUSPENDED.  This has already been discussed btw.
> 
> I'm sorry if I missed that discussion, but enable_irq cannot know who is
> calling it and therefore cannot know if IRQ_SUSPENDED should be
> cleared.

Sure it can. 

If IRQ_SUSPENDED is not set, then clearing it is a no-op, so that's fine.

If IRQ_SUSPENDED _is_ set, then that means that we're after the 
suspend_late() sequence and before the resume_early() sequence, and no 
device driver is possibly called in between, so they'd sure better not be 
doing anything that does an enable_irq().

IOW, we know who the caller is, simply because there can be no other valid 
caller!
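
The no-op half of this argument can be sketched in a few lines of
user-space C. The flag values and the IRQ_OTHER bit are hypothetical; only
the mask itself is taken from the quoted __enable_irq() hunk.

```c
#include <assert.h>

/* Illustrative flag values, not the kernel's; IRQ_OTHER stands for any unrelated bit. */
enum { IRQ_DISABLED = 1u << 0, IRQ_SUSPENDED = 1u << 1, IRQ_OTHER = 1u << 2 };

/* The mask applied by the patched __enable_irq() when depth is 1. */
static unsigned int clear_on_enable(unsigned int status)
{
	return status & ~(unsigned int)(IRQ_DISABLED | IRQ_SUSPENDED);
}

/* Both an ordinary disable and a suspend-time disable end in the same state. */
static void check_clear_is_harmless(void)
{
	unsigned int plain = IRQ_DISABLED | IRQ_OTHER;	/* driver's disable_irq() */
	unsigned int susp = plain | IRQ_SUSPENDED;	/* disabled by the PM core */

	assert(clear_on_enable(plain) == IRQ_OTHER);
	assert(clear_on_enable(susp) == IRQ_OTHER);
}
```

This covers only the bit arithmetic; whether a driver may legitimately call
enable_irq() while the flag is set is the separate question argued here.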

			Linus

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 1/4] PM: Rework handling of interrupts during  suspend-resume (rev. 4)
  2009-03-02 23:32             ` Linus Torvalds
@ 2009-03-02 23:35               ` Linus Torvalds
  -1 siblings, 0 replies; 373+ messages in thread
From: Linus Torvalds @ 2009-03-02 23:35 UTC (permalink / raw)
  To: Arve Hjønnevåg
  Cc: Rafael J. Wysocki, LKML, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner, Alan Stern, Johannes Berg



On Mon, 2 Mar 2009, Linus Torvalds wrote:
> 
> If IRQ_SUSPENDED _is_ set, then that means that we're after the 
> suspend_late() sequence and before the resume_early() sequence

Sorry, after the suspend, and before the resume.

We could be _in_ the suspend_late/resume_early sequence, but a driver that 
were to try to play with interrupts at that stage would be broken. It 
can't very well do an enable_irq(), because that would be a MAJOR BUG - it
would make the whole irq suspend thing pointless, since now interrupts 
would start to happen exactly where they must not happen!

			Linus

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 1/4] PM: Rework handling of interrupts during  suspend-resume (rev. 4)
  2009-03-02 23:35               ` Linus Torvalds
  (?)
  (?)
@ 2009-03-03  0:08               ` Arve Hjønnevåg
  2009-03-03  8:41                 ` Arve Hjønnevåg
  2009-03-03  8:41                 ` Arve Hjønnevåg
  -1 siblings, 2 replies; 373+ messages in thread
From: Arve Hjønnevåg @ 2009-03-03  0:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, LKML, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner, Alan Stern, Johannes Berg

On Mon, Mar 2, 2009 at 3:35 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
>
> On Mon, 2 Mar 2009, Linus Torvalds wrote:
>>
>> If IRQ_SUSPENDED _is_ set, then that means that we're after the
>> suspend_late() sequence and before the resume_early() sequence
>
> Sorry, after the suspend, and before the resume.
>
> We could be _in_ the suspend_late/resume_early sequence, but a driver that
> were to try to play with interrupts at that stage would be broken. It
> can't very well do an enable_irq(), because that would be a MAJOR BUG - it
> would make the whole irq suspend thing pointless, since now interrupts
> would start to happen exactly where they must not happen!

It may be pointless for a driver to call disable_irq and enable_irq
from suspend_late or resume_early (instead of suspend and resume), but
I would not call it a bug. Since disable_irq and enable_irq are
reference counted all this is doing is indicating that this driver can
or cannot accept interrupts. If you want to make an additional
restriction that drivers are not allowed to call disable_irq or
enable_irq from suspend_late and resume_early, then yes you can tell
that enable_irq was called from resume_device_irqs.

I don't know of any drivers that do this, I was just pointing out the
danger of releasing a reference without knowing if you acquired that
reference.
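
A minimal user-space sketch of the alternative, where the resume helper
clears its own marker instead of relying on enable_irq() to do it. Nothing
here is kernel code: struct irq_model, the model_* helpers and the flag
values are hypothetical, and the suspend helper is assumed to skip lines
whose depth is already nonzero.

```c
#include <assert.h>

/* Illustrative flag values, not the kernel's. */
enum { IRQ_DISABLED = 1u << 0, IRQ_SUSPENDED = 1u << 1 };

struct irq_model {
	unsigned int depth;	/* nested disable count */
	unsigned int status;	/* IRQ_* flag bits */
};

static void model_disable_irq(struct irq_model *d)
{
	if (d->depth++ == 0)
		d->status |= IRQ_DISABLED;
}

/* Enable path that knows nothing about IRQ_SUSPENDED. */
static void model_enable_irq(struct irq_model *d)
{
	switch (d->depth) {
	case 0:
		break;		/* unbalanced enable */
	case 1:
		d->status &= ~(unsigned int)IRQ_DISABLED;
		/* fall through */
	default:
		d->depth--;
	}
}

/* Assumption: lines that are already disabled are left alone. */
static void model_suspend_device_irqs(struct irq_model *d)
{
	if (d->depth == 0) {
		d->depth = 1;
		d->status |= IRQ_DISABLED | IRQ_SUSPENDED;
	}
}

/* The marker is dropped by the code that set it, exactly once. */
static void model_resume_device_irqs(struct irq_model *d)
{
	if (!(d->status & IRQ_SUSPENDED))
		return;
	d->status &= ~(unsigned int)IRQ_SUSPENDED;
	model_enable_irq(d);
}

/* Driver disables in suspend_late and has not re-enabled by the next cycle. */
static struct irq_model owned_flag_scenario(void)
{
	struct irq_model d = { 0, 0 };

	model_suspend_device_irqs(&d);	/* PM core takes its reference */
	model_disable_irq(&d);		/* driver takes its own reference */
	model_resume_device_irqs(&d);	/* PM core releases only its own */
	assert(!(d.status & IRQ_SUSPENDED));

	model_suspend_device_irqs(&d);	/* skipped: depth is nonzero */
	model_resume_device_irqs(&d);	/* nothing to undo */
	return d;
}
```

In this model the driver's disable reference is still in place after both
cycles, at the cost of keeping the flag handling out of the enable path.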

-- 
Arve Hjønnevåg

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 1/4] PM: Rework handling of interrupts during  suspend-resume (rev. 4)
  2009-03-03  0:08               ` Arve Hjønnevåg
  2009-03-03  8:41                 ` Arve Hjønnevåg
@ 2009-03-03  8:41                 ` Arve Hjønnevåg
  1 sibling, 0 replies; 373+ messages in thread
From: Arve Hjønnevåg @ 2009-03-03  8:41 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, LKML, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner, Alan Stern, Johannes Berg

On Mon, Mar 2, 2009 at 4:08 PM, Arve Hjønnevåg <arve@android.com> wrote:
> On Mon, Mar 2, 2009 at 3:35 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>>
>> On Mon, 2 Mar 2009, Linus Torvalds wrote:
>>>
>>> If IRQ_SUSPENDED _is_ set, then that means that we're after the
>>> suspend_late() sequence and before the resume_early() sequence
>>
>> Sorry, after the suspend, and before the resume.
>>
>> We could be _in_ the suspend_late/resume_early sequence, but a driver that
>> were to try to play with interrupts at that stage would be broken. It
>> can't very well do an enable_irq(), because that would be a MAJOR BUG - it
>> would make the whole irq suspend thing pointless, since now interrupts
>> would start to happen exactly where they must not happen!
>
> It may be pointless for a driver to call disable_irq and enable_irq
> from suspend_late or resume_early (instead of suspend and resume), but
> I would not call it a bug. Since disable_irq and enable_irq are
> reference counted all this is doing is indicating that this driver can
> or cannot accept interrupts. If you want to make an additional
> restriction that drivers are not allowed to call disable_irq or
> enable_irq from suspend_late and resume_early, then yes you can tell
> that enable_irq was called from resume_device_irqs.
>
> I don't know of any drivers that do this, I was just pointing out the
> danger of releasing a reference without knowing if you acquired that
> reference.

I did think of a driver that can call enable_irq during the
suspend_late phase with this patch. This will not cause an extra
enable_irq, but it will enable the interrupt since suspend_device_irqs
never incremented depth. Our keypad driver disables its interrupt(s)
as soon as you press a key and starts a timer to scan the keypad. When
the timer detects that no keys are pressed, it re-enables the
interrupt. Since timers now run during suspend_late, this enable_irq
call can happen after suspend_device_irqs. If suspend_device_irqs
increments depth even if it is not zero, this can be avoided.

-- 
Arve Hjønnevåg

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 1/4] PM: Rework handling of interrupts during  suspend-resume (rev. 4)
  2009-03-02 23:27           ` Rafael J. Wysocki
@ 2009-03-03 22:56             ` Arve Hjønnevåg
  2009-03-04 22:03               ` [Update, rev. 5] " Rafael J. Wysocki
  2009-03-04 22:03               ` Rafael J. Wysocki
  2009-03-03 22:56             ` Arve Hjønnevåg
  1 sibling, 2 replies; 373+ messages in thread
From: Arve Hjønnevåg @ 2009-03-03 22:56 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Ingo Molnar, LKML, Linus Torvalds, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner, Alan Stern, Johannes Berg

On Mon, Mar 2, 2009 at 3:27 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> On Tuesday 03 March 2009, Arve Hjønnevåg wrote:
>> On Mon, Mar 2, 2009 at 3:13 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
>> > On Tuesday 03 March 2009, Arve Hjønnevåg wrote:
>> >> On Sun, Mar 1, 2009 at 2:24 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
>> >> > From: Rafael J. Wysocki <rjw@sisk.pl>
>> >> >
>> >> > Introduce two helper functions allowing us to prevent device drivers
>> >> > from getting any interrupts (without disabling interrupts on the CPU)
>> >> > during suspend (or hibernation) and to make them start to receive
>> >> > interrupts again during the subsequent resume, respectively.  These
>> >> > functions make it possible to keep timer interrupts enabled while the
>> >> > "late" suspend and "early" resume callbacks provided by device
>> >> > drivers are being executed.
>> >> >
>> >> > Use these functions to rework the handling of interrupts during
>> >> > suspend (hibernation) and resume.  Namely, interrupts will only be
>> >> > disabled on the CPU right before suspending sysdevs, while device
>> >> > drivers will be prevented from receiving interrupts, with the help of
>> >> > the new helper function, before their "late" suspend callbacks run
>> >> > (and analogously during resume).
>> >> >
>> >> > In addition, since the device interrupts are now disabled before the
>> >> > CPU has turned all interrupts off and the CPU will ACK the interrupts
>> >> > setting the IRQ_PENDING bit for them, check in sysdev_suspend() if
>> >> > any wake-up interrupts are pending and abort suspend if that's the
>> >> > case.
>> >> >
>> >>
>> >>
>> >> > +void resume_device_irqs(void)
>> >> > +{
>> >> > +       struct irq_desc *desc;
>> >> > +       int irq;
>> >> > +
>> >> > +       for_each_irq_desc(irq, desc)
>> >> > +               if (desc->status & IRQ_SUSPENDED)
>> >> > +                       enable_irq(irq);
>> >> > +}
>> >>
>> >> I think you need to clear IRQ_SUSPENDED here, not in enable_irq.
>> >
>> > enable_irq() clears IRQ_SUSPENDED.  This has already been discussed btw.
>> >
>>
>> I'm sorry if I missed that discussion, but enable_irq cannot know who is
>> calling it and therefore cannot know if IRQ_SUSPENDED should be
>> cleared.
>
> This change has been requested by Ingo and for a reason.
>
> Ingo, what's your opinion?
>
>> >> > @@ -222,8 +222,9 @@ static void __enable_irq(struct irq_desc
>> >> >                WARN(1, KERN_WARNING "Unbalanced enable for IRQ %d\n", irq);
>> >> >                break;
>> >> >        case 1: {
>> >> > -               unsigned int status = desc->status & ~IRQ_DISABLED;
>> >> > +               unsigned int status;
>> >> >
>> >> > +               status = desc->status & ~(IRQ_DISABLED | IRQ_SUSPENDED);
>> >> >                /* Prevent probing on this irq: */
>> >> >                desc->status = status | IRQ_NOPROBE;
>> >> >                check_irq_resend(desc, irq);
>> >>
>> >> This only clears IRQ_SUSPENDED if the interrupt was not disabled
>> >> elsewhere. If a driver calls disable_irq in suspend_late, but
>> >> calls enable_irq lazily, resume_device_irqs will reenable the
>> >> interrupt even though the driver has a disable reference.
>> >
>> > Then I'd regard the driver as buggy.
>>
>> The bug is not in the driver. The driver called disable_irq once. You
>> called disable_irq once, but enable_irq twice.
>
> Please.
>
> Can you show me a _single_ _driver_ currently in the tree doing something
> like you describe in suspend_late and resume_early?  If you can't, then please
> give up.

I don't know if any drivers call disable_irq or enable_irq in their
suspend hooks, but your change also allows timers, and I assume kernel
threads, to run during this phase.

There are several drivers (keypad drivers in particular), in tree and
out of tree, that call enable_irq from timers, and disable_irq from
their interrupt handler. If you also apply your later change to
disable nonboot CPUs after suspend_device_irqs, then on SMP systems
the interrupt handler may run at the same time as suspend_device_irqs.
If suspend_device_irqs gets the spinlock first, then IRQ_SUSPENDED
gets set. If another suspend/resume cycle happens before the timer
runs, you will incorrectly enable the interrupt.

-- 
Arve Hjønnevåg

^ permalink raw reply	[flat|nested] 373+ messages in thread

* [Update, rev. 5] Re: [RFC][PATCH 1/4] PM: Rework handling of interrupts during suspend-resume (rev. 4)
  2009-03-03 22:56             ` Arve Hjønnevåg
@ 2009-03-04 22:03               ` Rafael J. Wysocki
  2009-03-05 10:35                 ` Ingo Molnar
  2009-03-05 10:35                 ` Ingo Molnar
  2009-03-04 22:03               ` Rafael J. Wysocki
  1 sibling, 2 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-04 22:03 UTC (permalink / raw)
  To: Arve Hjønnevåg, Ingo Molnar
  Cc: LKML, Linus Torvalds, Eric W. Biederman, Benjamin Herrenschmidt,
	Jeremy Fitzhardinge, pm list, Len Brown, Jesse Barnes,
	Thomas Gleixner, Alan Stern, Johannes Berg

On Tuesday 03 March 2009, Arve Hjønnevåg wrote:
> On Mon, Mar 2, 2009 at 3:27 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > On Tuesday 03 March 2009, Arve Hjønnevåg wrote:
> >> On Mon, Mar 2, 2009 at 3:13 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> >> > On Tuesday 03 March 2009, Arve Hjønnevåg wrote:
> >> >> On Sun, Mar 1, 2009 at 2:24 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
[--snip--]
> > Can you show me a _single_ _driver_ currently in the tree doing something
> > like you describe in suspend_late and resume_early?  If you can't, then please
> > give up.
> 
> I don't know if any drivers call disable_irq or enable_irq in their
> suspend hooks, but your change also allows timers, and I assume kernel
> threads, to run during this phase.
> 
> There are several drivers (keypad drivers in particular), in tree and
> out of tree, that call enable_irq from timers, and disable_irq from
> their interrupt handler. If you also apply your later change to
> disable nonboot CPUs after suspend_device_irqs, then on SMP systems
> the interrupt handler may run at the same time as suspend_device_irqs.
> If suspend_device_irqs gets the spinlock first, then IRQ_SUSPENDED
> gets set. If another suspend/resume cycle happens before the timer
> runs, you will incorrectly enable the interrupt.

Well, unfortunately this is a valid point IMO.  I've been thinking for quite a
while how to fix it nicely, but I'm not sure if there is a nice fix.

Below is an updated patch; hopefully everyone will be fine with it.

Ingo, is making __enable_irq() an extern function acceptable?

Rafael

---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: PM: Rework handling of interrupts during suspend-resume (rev. 5)

Introduce two helper functions allowing us to prevent device drivers
from getting any interrupts (without disabling interrupts on the CPU)
during suspend (or hibernation) and to make them start to receive
interrupts again during the subsequent resume, respectively.  These
functions make it possible to keep timer interrupts enabled while the
"late" suspend and "early" resume callbacks provided by device
drivers are being executed.

Use these functions to rework the handling of interrupts during
suspend (hibernation) and resume.  Namely, interrupts will only be
disabled on the CPU right before suspending sysdevs, while device
drivers will be prevented from receiving interrupts, with the help of
the new helper function, before their "late" suspend callbacks run
(and analogously during resume).

In addition, since the device interrupts are now disabled before the
CPU has turned all interrupts off and the CPU will ACK the interrupts
setting the IRQ_PENDING bit for them, check in sysdev_suspend() if
any wake-up interrupts are pending and abort suspend if that's the
case.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 arch/x86/kernel/apm_32.c  |   15 +++++--
 drivers/base/power/main.c |   20 +++++-----
 drivers/base/sys.c        |    8 ++++
 drivers/xen/manage.c      |   16 ++++----
 include/linux/interrupt.h |    5 ++
 include/linux/irq.h       |    1 
 kernel/irq/Makefile       |    1 
 kernel/irq/internals.h    |    1 
 kernel/irq/manage.c       |    2 -
 kernel/irq/pm.c           |   89 ++++++++++++++++++++++++++++++++++++++++++++++
 kernel/kexec.c            |    8 ++--
 kernel/power/disk.c       |   39 ++++++++++++++------
 kernel/power/main.c       |   17 +++++---
 13 files changed, 181 insertions(+), 41 deletions(-)

Index: linux-2.6/include/linux/interrupt.h
===================================================================
--- linux-2.6.orig/include/linux/interrupt.h
+++ linux-2.6/include/linux/interrupt.h
@@ -106,6 +106,11 @@ extern void disable_irq_nosync(unsigned 
 extern void disable_irq(unsigned int irq);
 extern void enable_irq(unsigned int irq);
 
+/* The following three functions are for the core kernel use only. */
+extern void suspend_device_irqs(void);
+extern void resume_device_irqs(void);
+extern int check_wakeup_irqs(void);
+
 #if defined(CONFIG_SMP) && defined(CONFIG_GENERIC_HARDIRQS)
 
 extern cpumask_var_t irq_default_affinity;
Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -287,17 +287,19 @@ void __attribute__ ((weak)) arch_suspend
  */
 static int suspend_enter(suspend_state_t state)
 {
-	int error = 0;
+	int error;
 
 	device_pm_lock();
-	arch_suspend_disable_irqs();
-	BUG_ON(!irqs_disabled());
 
-	if ((error = device_power_down(PMSG_SUSPEND))) {
+	error = device_power_down(PMSG_SUSPEND);
+	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down\n");
 		goto Done;
 	}
 
+	arch_suspend_disable_irqs();
+	BUG_ON(!irqs_disabled());
+
 	error = sysdev_suspend(PMSG_SUSPEND);
 	if (!error) {
 		if (!suspend_test(TEST_CORE))
@@ -305,11 +307,14 @@ static int suspend_enter(suspend_state_t
 		sysdev_resume();
 	}
 
-	device_power_up(PMSG_RESUME);
- Done:
 	arch_suspend_enable_irqs();
 	BUG_ON(irqs_disabled());
+
+	device_power_up(PMSG_RESUME);
+
+ Done:
 	device_pm_unlock();
+
 	return error;
 }
 
Index: linux-2.6/kernel/power/disk.c
===================================================================
--- linux-2.6.orig/kernel/power/disk.c
+++ linux-2.6/kernel/power/disk.c
@@ -214,7 +214,7 @@ static int create_image(int platform_mod
 		return error;
 
 	device_pm_lock();
-	local_irq_disable();
+
 	/* At this point, device_suspend() has been called, but *not*
 	 * device_power_down(). We *must* call device_power_down() now.
 	 * Otherwise, drivers for some devices (e.g. interrupt controllers)
@@ -225,8 +225,11 @@ static int create_image(int platform_mod
 	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down, "
 			"aborting hibernation\n");
-		goto Enable_irqs;
+		goto Unlock;
 	}
+
+	local_irq_disable();
+
 	sysdev_suspend(PMSG_FREEZE);
 	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down, "
@@ -252,12 +255,16 @@ static int create_image(int platform_mod
 	/* NOTE:  device_power_up() is just a resume() for devices
 	 * that suspended with irqs off ... no overall powerup.
 	 */
+
  Power_up_devices:
+	local_irq_enable();
+
 	device_power_up(in_suspend ?
 		(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);
- Enable_irqs:
-	local_irq_enable();
+
+ Unlock:
 	device_pm_unlock();
+
 	return error;
 }
 
@@ -336,13 +343,16 @@ static int resume_target_kernel(void)
 	int error;
 
 	device_pm_lock();
-	local_irq_disable();
+
 	error = device_power_down(PMSG_QUIESCE);
 	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down, "
 			"aborting resume\n");
-		goto Enable_irqs;
+		goto Unlock;
 	}
+
+	local_irq_disable();
+
 	sysdev_suspend(PMSG_QUIESCE);
 	/* We'll ignore saved state, but this gets preempt count (etc) right */
 	save_processor_state();
@@ -366,11 +376,16 @@ static int resume_target_kernel(void)
 	swsusp_free();
 	restore_processor_state();
 	touch_softlockup_watchdog();
+
 	sysdev_resume();
-	device_power_up(PMSG_RECOVER);
- Enable_irqs:
+
 	local_irq_enable();
+
+	device_power_up(PMSG_RECOVER);
+
+ Unlock:
 	device_pm_unlock();
+
 	return error;
 }
 
@@ -447,15 +462,16 @@ int hibernation_platform_enter(void)
 		goto Finish;
 
 	device_pm_lock();
-	local_irq_disable();
+
 	error = device_power_down(PMSG_HIBERNATE);
 	if (!error) {
+		local_irq_disable();
 		sysdev_suspend(PMSG_HIBERNATE);
 		hibernation_ops->enter();
 		/* We should never get here */
 		while (1);
 	}
-	local_irq_enable();
+
 	device_pm_unlock();
 
 	/*
@@ -464,12 +480,15 @@ int hibernation_platform_enter(void)
 	 */
  Finish:
 	hibernation_ops->finish();
+
  Resume_devices:
 	entering_platform_hibernation = false;
 	device_resume(PMSG_RESTORE);
 	resume_console();
+
  Close:
 	hibernation_ops->end();
+
 	return error;
 }
 
Index: linux-2.6/arch/x86/kernel/apm_32.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/apm_32.c
+++ linux-2.6/arch/x86/kernel/apm_32.c
@@ -1190,8 +1190,10 @@ static int suspend(int vetoable)
 	struct apm_user	*as;
 
 	device_suspend(PMSG_SUSPEND);
-	local_irq_disable();
+
 	device_power_down(PMSG_SUSPEND);
+
+	local_irq_disable();
 	sysdev_suspend(PMSG_SUSPEND);
 
 	local_irq_enable();
@@ -1209,9 +1211,12 @@ static int suspend(int vetoable)
 	if (err != APM_SUCCESS)
 		apm_error("suspend", err);
 	err = (err == APM_SUCCESS) ? 0 : -EIO;
+
 	sysdev_resume();
-	device_power_up(PMSG_RESUME);
 	local_irq_enable();
+
+	device_power_up(PMSG_RESUME);
+
 	device_resume(PMSG_RESUME);
 	queue_event(APM_NORMAL_RESUME, NULL);
 	spin_lock(&user_list_lock);
@@ -1228,8 +1233,9 @@ static void standby(void)
 {
 	int err;
 
-	local_irq_disable();
 	device_power_down(PMSG_SUSPEND);
+
+	local_irq_disable();
 	sysdev_suspend(PMSG_SUSPEND);
 	local_irq_enable();
 
@@ -1239,8 +1245,9 @@ static void standby(void)
 
 	local_irq_disable();
 	sysdev_resume();
-	device_power_up(PMSG_RESUME);
 	local_irq_enable();
+
+	device_power_up(PMSG_RESUME);
 }
 
 static apm_event_t get_event(void)
Index: linux-2.6/drivers/xen/manage.c
===================================================================
--- linux-2.6.orig/drivers/xen/manage.c
+++ linux-2.6/drivers/xen/manage.c
@@ -39,12 +39,6 @@ static int xen_suspend(void *data)
 
 	BUG_ON(!irqs_disabled());
 
-	err = device_power_down(PMSG_SUSPEND);
-	if (err) {
-		printk(KERN_ERR "xen_suspend: device_power_down failed: %d\n",
-		       err);
-		return err;
-	}
 	err = sysdev_suspend(PMSG_SUSPEND);
 	if (err) {
 		printk(KERN_ERR "xen_suspend: sysdev_suspend failed: %d\n",
@@ -69,7 +63,6 @@ static int xen_suspend(void *data)
 	xen_mm_unpin_all();
 
 	sysdev_resume();
-	device_power_up(PMSG_RESUME);
 
 	if (!*cancelled) {
 		xen_irq_resume();
@@ -108,6 +101,12 @@ static void do_suspend(void)
 	/* XXX use normal device tree? */
 	xenbus_suspend();
 
+	err = device_power_down(PMSG_SUSPEND);
+	if (err) {
+		printk(KERN_ERR "device_power_down failed: %d\n", err);
+		goto resume_devices;
+	}
+
 	err = stop_machine(xen_suspend, &cancelled, &cpumask_of_cpu(0));
 	if (err) {
 		printk(KERN_ERR "failed to start xen_suspend: %d\n", err);
@@ -120,6 +119,9 @@ static void do_suspend(void)
 	} else
 		xenbus_suspend_cancel();
 
+	device_power_up(PMSG_RESUME);
+
+resume_devices:
 	device_resume(PMSG_RESUME);
 
 	/* Make sure timer events get retriggered on all CPUs */
Index: linux-2.6/kernel/kexec.c
===================================================================
--- linux-2.6.orig/kernel/kexec.c
+++ linux-2.6/kernel/kexec.c
@@ -1454,7 +1454,6 @@ int kernel_kexec(void)
 		if (error)
 			goto Resume_devices;
 		device_pm_lock();
-		local_irq_disable();
 		/* At this point, device_suspend() has been called,
 		 * but *not* device_power_down(). We *must*
 		 * device_power_down() now.  Otherwise, drivers for
@@ -1464,8 +1463,9 @@ int kernel_kexec(void)
 		 */
 		error = device_power_down(PMSG_FREEZE);
 		if (error)
-			goto Enable_irqs;
+			goto Unlock_pm;
 
+		local_irq_disable();
 		/* Suspend system devices */
 		error = sysdev_suspend(PMSG_FREEZE);
 		if (error)
@@ -1484,9 +1484,9 @@ int kernel_kexec(void)
 	if (kexec_image->preserve_context) {
 		sysdev_resume();
  Power_up_devices:
-		device_power_up(PMSG_RESTORE);
- Enable_irqs:
 		local_irq_enable();
+		device_power_up(PMSG_RESTORE);
+ Unlock_pm:
 		device_pm_unlock();
 		enable_nonboot_cpus();
  Resume_devices:
Index: linux-2.6/include/linux/irq.h
===================================================================
--- linux-2.6.orig/include/linux/irq.h
+++ linux-2.6/include/linux/irq.h
@@ -65,6 +65,7 @@ typedef	void (*irq_flow_handler_t)(unsig
 #define IRQ_SPURIOUS_DISABLED	0x00800000	/* IRQ was disabled by the spurious trap */
 #define IRQ_MOVE_PCNTXT		0x01000000	/* IRQ migration from process context */
 #define IRQ_AFFINITY_SET	0x02000000	/* IRQ affinity was set from userspace*/
+#define IRQ_SUSPENDED		0x04000000	/* IRQ has gone through suspend sequence */
 
 #ifdef CONFIG_IRQ_PER_CPU
 # define CHECK_IRQ_PER_CPU(var) ((var) & IRQ_PER_CPU)
Index: linux-2.6/kernel/irq/pm.c
===================================================================
--- /dev/null
+++ linux-2.6/kernel/irq/pm.c
@@ -0,0 +1,89 @@
+/*
+ * linux/kernel/irq/pm.c
+ *
+ * Copyright (C) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
+ *
+ * This file contains power management functions related to interrupts.
+ */
+
+#include <linux/irq.h>
+#include <linux/module.h>
+#include <linux/interrupt.h>
+
+#include "internals.h"
+
+/**
+ * suspend_device_irqs - disable all currently enabled interrupt lines
+ *
+ * During system-wide suspend or hibernation device interrupts need to be
+ * disabled at the chip level and this function is provided for this purpose.
+ * It disables all interrupt lines that are enabled at the moment and sets the
+ * IRQ_SUSPENDED flag for them.
+ */
+void suspend_device_irqs(void)
+{
+	struct irq_desc *desc;
+	int irq;
+
+	for_each_irq_desc(irq, desc) {
+		unsigned long flags;
+		bool sync = false;
+
+		spin_lock_irqsave(&desc->lock, flags);
+
+		if (desc->action && !(desc->action->flags & IRQF_TIMER)) {
+			if (!desc->depth++) {
+				desc->status |= IRQ_DISABLED;
+				desc->chip->disable(irq);
+				sync = true;
+			}
+			desc->status |= IRQ_SUSPENDED;
+		}
+
+		spin_unlock_irqrestore(&desc->lock, flags);
+
+		if (sync)
+			synchronize_irq(irq);
+	}
+}
+EXPORT_SYMBOL_GPL(suspend_device_irqs);
+
+/**
+ * resume_device_irqs - enable interrupt lines disabled by suspend_device_irqs()
+ *
+ * Enable all interrupt lines previously disabled by suspend_device_irqs() that
+ * have the IRQ_SUSPENDED flag set.
+ */
+void resume_device_irqs(void)
+{
+	struct irq_desc *desc;
+	int irq;
+
+	for_each_irq_desc(irq, desc) {
+		unsigned long flags;
+
+		if (!(desc->status & IRQ_SUSPENDED))
+			continue;
+
+		spin_lock_irqsave(&desc->lock, flags);
+		desc->status &= ~IRQ_SUSPENDED;
+		__enable_irq(desc, irq);
+		spin_unlock_irqrestore(&desc->lock, flags);
+	}
+}
+EXPORT_SYMBOL_GPL(resume_device_irqs);
+
+/**
+ * check_wakeup_irqs - check if any wake-up interrupts are pending
+ */
+int check_wakeup_irqs(void)
+{
+	struct irq_desc *desc;
+	int irq;
+
+	for_each_irq_desc(irq, desc)
+		if ((desc->status & IRQ_WAKEUP) && (desc->status & IRQ_PENDING))
+			return -EBUSY;
+
+	return 0;
+}
Index: linux-2.6/kernel/irq/Makefile
===================================================================
--- linux-2.6.orig/kernel/irq/Makefile
+++ linux-2.6/kernel/irq/Makefile
@@ -4,3 +4,4 @@ obj-$(CONFIG_GENERIC_IRQ_PROBE) += autop
 obj-$(CONFIG_PROC_FS) += proc.o
 obj-$(CONFIG_GENERIC_PENDING_IRQ) += migration.o
 obj-$(CONFIG_NUMA_MIGRATE_IRQ_DESC) += numa_migrate.o
+obj-$(CONFIG_PM_SLEEP) += pm.o
Index: linux-2.6/kernel/irq/manage.c
===================================================================
--- linux-2.6.orig/kernel/irq/manage.c
+++ linux-2.6/kernel/irq/manage.c
@@ -215,7 +215,7 @@ void disable_irq(unsigned int irq)
 }
 EXPORT_SYMBOL(disable_irq);
 
-static void __enable_irq(struct irq_desc *desc, unsigned int irq)
+void __enable_irq(struct irq_desc *desc, unsigned int irq)
 {
 	switch (desc->depth) {
 	case 0:
Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -23,6 +23,7 @@
 #include <linux/pm.h>
 #include <linux/resume-trace.h>
 #include <linux/rwsem.h>
+#include <linux/interrupt.h>
 
 #include "../base.h"
 #include "power.h"
@@ -305,7 +306,8 @@ static int resume_device_noirq(struct de
  *	Execute the appropriate "noirq resume" callback for all devices marked
  *	as DPM_OFF_IRQ.
  *
- *	Must be called with interrupts disabled and only one CPU running.
+ *	Must be called under dpm_list_mtx.  Device drivers should not receive
+ *	interrupts while it's being executed.
  */
 static void dpm_power_up(pm_message_t state)
 {
@@ -326,14 +328,13 @@ static void dpm_power_up(pm_message_t st
  *	device_power_up - Turn on all devices that need special attention.
  *	@state: PM transition of the system being carried out.
  *
- *	Power on system devices, then devices that required we shut them down
- *	with interrupts disabled.
- *
- *	Must be called with interrupts disabled.
+ *	Call the "early" resume handlers and enable device drivers to receive
+ *	interrupts.
  */
 void device_power_up(pm_message_t state)
 {
 	dpm_power_up(state);
+	resume_device_irqs();
 }
 EXPORT_SYMBOL_GPL(device_power_up);
 
@@ -558,16 +559,17 @@ static int suspend_device_noirq(struct d
  *	device_power_down - Shut down special devices.
  *	@state: PM transition of the system being carried out.
  *
- *	Power down devices that require interrupts to be disabled.
- *	Then power down system devices.
+ *	Prevent device drivers from receiving interrupts and call the "late"
+ *	suspend handlers.
  *
- *	Must be called with interrupts disabled and only one CPU running.
+ *	Must be called under dpm_list_mtx.
  */
 int device_power_down(pm_message_t state)
 {
 	struct device *dev;
 	int error = 0;
 
+	suspend_device_irqs();
 	list_for_each_entry_reverse(dev, &dpm_list, power.entry) {
 		error = suspend_device_noirq(dev, state);
 		if (error) {
@@ -577,7 +579,7 @@ int device_power_down(pm_message_t state
 		dev->power.status = DPM_OFF_IRQ;
 	}
 	if (error)
-		dpm_power_up(resume_event(state));
+		device_power_up(resume_event(state));
 	return error;
 }
 EXPORT_SYMBOL_GPL(device_power_down);
Index: linux-2.6/drivers/base/sys.c
===================================================================
--- linux-2.6.orig/drivers/base/sys.c
+++ linux-2.6/drivers/base/sys.c
@@ -22,6 +22,7 @@
 #include <linux/pm.h>
 #include <linux/device.h>
 #include <linux/mutex.h>
+#include <linux/interrupt.h>
 
 #include "base.h"
 
@@ -369,6 +370,13 @@ int sysdev_suspend(pm_message_t state)
 	struct sysdev_driver *drv, *err_drv;
 	int ret;
 
+	pr_debug("Checking wake-up interrupts\n");
+
+	/* Return error code if there are any wake-up interrupts pending */
+	ret = check_wakeup_irqs();
+	if (ret)
+		return ret;
+
 	pr_debug("Suspending System Devices\n");
 
 	list_for_each_entry_reverse(cls, &system_kset->list, kset.kobj.entry) {
Index: linux-2.6/kernel/irq/internals.h
===================================================================
--- linux-2.6.orig/kernel/irq/internals.h
+++ linux-2.6/kernel/irq/internals.h
@@ -12,6 +12,7 @@ extern void compat_irq_chip_set_default_
 
 extern int __irq_set_trigger(struct irq_desc *desc, unsigned int irq,
 		unsigned long flags);
+extern void __enable_irq(struct irq_desc *desc, unsigned int irq);
 
 extern struct lock_class_key irq_desc_lock_class;
 extern void init_kstat_irqs(struct irq_desc *desc, int cpu, int nr);

^ permalink raw reply	[flat|nested] 373+ messages in thread

* [Update, rev. 5] Re: [RFC][PATCH 1/4] PM: Rework handling of interrupts during suspend-resume (rev. 4)
  2009-03-03 22:56             ` Arve Hjønnevåg
  2009-03-04 22:03               ` [Update, rev. 5] " Rafael J. Wysocki
@ 2009-03-04 22:03               ` Rafael J. Wysocki
  1 sibling, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-04 22:03 UTC (permalink / raw)
  To: Arve Hjønnevåg, Ingo Molnar
  Cc: Jeremy Fitzhardinge, LKML, Jesse Barnes, Johannes Berg,
	Eric W. Biederman, pm list, Linus Torvalds, Thomas Gleixner

On Tuesday 03 March 2009, Arve Hjønnevåg wrote:
> On Mon, Mar 2, 2009 at 3:27 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > On Tuesday 03 March 2009, Arve Hjønnevåg wrote:
> >> On Mon, Mar 2, 2009 at 3:13 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> >> > On Tuesday 03 March 2009, Arve Hjønnevåg wrote:
> >> >> On Sun, Mar 1, 2009 at 2:24 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
[--snip--]
> > Can you show me a _single_ _driver_ currently in the tree doing something
> > like you describe in suspend_late and resume_early?  If you can't, then please
> > give up.
> 
> I don't know if any drivers call disable_irq or enable_irq in their
> suspend hooks, but your change also allows timers, and I assume kernel
> threads, to run during this phase.
> 
> There are several drivers (keypad drivers in particular), in tree and
> out of tree, that call enable_irq from timers, and disable_irq from
> their interrupt handler. If you also apply your later change to
> disable non boot cpus after suspend_device_irqs, then on smp systems
> the interrupt handler may run at the same time as suspend_device_irqs.
> If suspend_device_irqs gets the spinlock first, then IRQ_SUSPENDED
> gets set. If another suspend/resume cycle happens before the timer
> runs, you will incorrectly enable the interrupt.

Well, unfortunately this is a valid point IMO.  I've been thinking for quite a
while about how to fix it nicely, but I'm not sure there is a nice fix.

Below is an updated patch, hopefully everyone will be fine with it.

Ingo, is making __enable_irq() an extern function acceptable?

Rafael

---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: PM: Rework handling of interrupts during suspend-resume (rev. 5)

Introduce two helper functions allowing us to prevent device drivers
from getting any interrupts (without disabling interrupts on the CPU)
during suspend (or hibernation) and to make them start to receive
interrupts again during the subsequent resume, respectively.  These
functions make it possible to keep timer interrupts enabled while the
"late" suspend and "early" resume callbacks provided by device
drivers are being executed.

Use these functions to rework the handling of interrupts during
suspend (hibernation) and resume.  Namely, interrupts will only be
disabled on the CPU right before suspending sysdevs, while device
drivers will be prevented from receiving interrupts, with the help of
the new helper function, before their "late" suspend callbacks run
(and analogously during resume).

In addition, since the device interrupts are now disabled before the
CPU has turned all interrupts off and the CPU will ACK the interrupts
setting the IRQ_PENDING bit for them, check in sysdev_suspend() if
any wake-up interrupts are pending and abort suspend if that's the
case.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 arch/x86/kernel/apm_32.c  |   15 +++++--
 drivers/base/power/main.c |   20 +++++-----
 drivers/base/sys.c        |    8 ++++
 drivers/xen/manage.c      |   16 ++++----
 include/linux/interrupt.h |    5 ++
 include/linux/irq.h       |    1 
 kernel/irq/Makefile       |    1 
 kernel/irq/internals.h    |    1 
 kernel/irq/manage.c       |    2 -
 kernel/irq/pm.c           |   89 ++++++++++++++++++++++++++++++++++++++++++++++
 kernel/kexec.c            |    8 ++--
 kernel/power/disk.c       |   39 ++++++++++++++------
 kernel/power/main.c       |   17 +++++---
 13 files changed, 181 insertions(+), 41 deletions(-)
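
A rough user-space sketch of the call ordering that suspend_enter() ends up
with after this change may help when reviewing the hunks below.  It only
records step names.  The step labels, struct model_trace, trace_step() and
model_suspend_enter() are made up for illustration and are not kernel
interfaces, and -EIO merely stands in for whatever error a "late" suspend
callback might return.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define TRACE_MAX 256

/* Hypothetical trace buffer: a space-separated list of step names. */
struct model_trace {
	char buf[TRACE_MAX];
};

/* Append one step name, silently dropping it if the buffer is full. */
static void trace_step(struct model_trace *t, const char *step)
{
	size_t used = strlen(t->buf);

	if (used + strlen(step) + 2 < sizeof(t->buf)) {
		if (used)
			strcat(t->buf, " ");
		strcat(t->buf, step);
	}
}

/* Models the ordering of suspend_enter() as reworked by the patch. */
static int model_suspend_enter(struct model_trace *t, bool late_suspend_fails,
			       bool wakeup_pending)
{
	int error = 0;

	t->buf[0] = '\0';
	trace_step(t, "pm_lock");

	/* device_power_down(): mask device IRQs, then "late" suspend. */
	trace_step(t, "suspend_device_irqs");
	trace_step(t, "late_suspend");
	if (late_suspend_fails) {
		/* device_power_down() rolls back via device_power_up(). */
		trace_step(t, "early_resume");
		trace_step(t, "resume_device_irqs");
		error = -EIO;	/* placeholder error code */
		goto done;
	}

	/* Only now are interrupts turned off on the CPU itself. */
	trace_step(t, "cpu_irqs_off");

	/* sysdev_suspend() begins with check_wakeup_irqs(). */
	if (wakeup_pending) {
		error = -EBUSY;
	} else {
		trace_step(t, "sysdev_suspend");
		trace_step(t, "enter_sleep");
		trace_step(t, "sysdev_resume");
	}

	trace_step(t, "cpu_irqs_on");

	/* device_power_up(): "early" resume, then unmask device IRQs. */
	trace_step(t, "early_resume");
	trace_step(t, "resume_device_irqs");
done:
	trace_step(t, "pm_unlock");
	return error;
}
```

In this model a pending wake-up interrupt returns -EBUSY without recording
sysdev_suspend, enter_sleep or sysdev_resume, while the "early" resume and
resume_device_irqs steps still run with CPU interrupts enabled again.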

Index: linux-2.6/include/linux/interrupt.h
===================================================================
--- linux-2.6.orig/include/linux/interrupt.h
+++ linux-2.6/include/linux/interrupt.h
@@ -106,6 +106,11 @@ extern void disable_irq_nosync(unsigned 
 extern void disable_irq(unsigned int irq);
 extern void enable_irq(unsigned int irq);
 
+/* The following three functions are for the core kernel use only. */
+extern void suspend_device_irqs(void);
+extern void resume_device_irqs(void);
+extern int check_wakeup_irqs(void);
+
 #if defined(CONFIG_SMP) && defined(CONFIG_GENERIC_HARDIRQS)
 
 extern cpumask_var_t irq_default_affinity;
Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -287,17 +287,19 @@ void __attribute__ ((weak)) arch_suspend
  */
 static int suspend_enter(suspend_state_t state)
 {
-	int error = 0;
+	int error;
 
 	device_pm_lock();
-	arch_suspend_disable_irqs();
-	BUG_ON(!irqs_disabled());
 
-	if ((error = device_power_down(PMSG_SUSPEND))) {
+	error = device_power_down(PMSG_SUSPEND);
+	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down\n");
 		goto Done;
 	}
 
+	arch_suspend_disable_irqs();
+	BUG_ON(!irqs_disabled());
+
 	error = sysdev_suspend(PMSG_SUSPEND);
 	if (!error) {
 		if (!suspend_test(TEST_CORE))
@@ -305,11 +307,14 @@ static int suspend_enter(suspend_state_t
 		sysdev_resume();
 	}
 
-	device_power_up(PMSG_RESUME);
- Done:
 	arch_suspend_enable_irqs();
 	BUG_ON(irqs_disabled());
+
+	device_power_up(PMSG_RESUME);
+
+ Done:
 	device_pm_unlock();
+
 	return error;
 }
 
Index: linux-2.6/kernel/power/disk.c
===================================================================
--- linux-2.6.orig/kernel/power/disk.c
+++ linux-2.6/kernel/power/disk.c
@@ -214,7 +214,7 @@ static int create_image(int platform_mod
 		return error;
 
 	device_pm_lock();
-	local_irq_disable();
+
 	/* At this point, device_suspend() has been called, but *not*
 	 * device_power_down(). We *must* call device_power_down() now.
 	 * Otherwise, drivers for some devices (e.g. interrupt controllers)
@@ -225,8 +225,11 @@ static int create_image(int platform_mod
 	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down, "
 			"aborting hibernation\n");
-		goto Enable_irqs;
+		goto Unlock;
 	}
+
+	local_irq_disable();
+
 	sysdev_suspend(PMSG_FREEZE);
 	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down, "
@@ -252,12 +255,16 @@ static int create_image(int platform_mod
 	/* NOTE:  device_power_up() is just a resume() for devices
 	 * that suspended with irqs off ... no overall powerup.
 	 */
+
  Power_up_devices:
+	local_irq_enable();
+
 	device_power_up(in_suspend ?
 		(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);
- Enable_irqs:
-	local_irq_enable();
+
+ Unlock:
 	device_pm_unlock();
+
 	return error;
 }
 
@@ -336,13 +343,16 @@ static int resume_target_kernel(void)
 	int error;
 
 	device_pm_lock();
-	local_irq_disable();
+
 	error = device_power_down(PMSG_QUIESCE);
 	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down, "
 			"aborting resume\n");
-		goto Enable_irqs;
+		goto Unlock;
 	}
+
+	local_irq_disable();
+
 	sysdev_suspend(PMSG_QUIESCE);
 	/* We'll ignore saved state, but this gets preempt count (etc) right */
 	save_processor_state();
@@ -366,11 +376,16 @@ static int resume_target_kernel(void)
 	swsusp_free();
 	restore_processor_state();
 	touch_softlockup_watchdog();
+
 	sysdev_resume();
-	device_power_up(PMSG_RECOVER);
- Enable_irqs:
+
 	local_irq_enable();
+
+	device_power_up(PMSG_RECOVER);
+
+ Unlock:
 	device_pm_unlock();
+
 	return error;
 }
 
@@ -447,15 +462,16 @@ int hibernation_platform_enter(void)
 		goto Finish;
 
 	device_pm_lock();
-	local_irq_disable();
+
 	error = device_power_down(PMSG_HIBERNATE);
 	if (!error) {
+		local_irq_disable();
 		sysdev_suspend(PMSG_HIBERNATE);
 		hibernation_ops->enter();
 		/* We should never get here */
 		while (1);
 	}
-	local_irq_enable();
+
 	device_pm_unlock();
 
 	/*
@@ -464,12 +480,15 @@ int hibernation_platform_enter(void)
 	 */
  Finish:
 	hibernation_ops->finish();
+
  Resume_devices:
 	entering_platform_hibernation = false;
 	device_resume(PMSG_RESTORE);
 	resume_console();
+
  Close:
 	hibernation_ops->end();
+
 	return error;
 }
 
Index: linux-2.6/arch/x86/kernel/apm_32.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/apm_32.c
+++ linux-2.6/arch/x86/kernel/apm_32.c
@@ -1190,8 +1190,10 @@ static int suspend(int vetoable)
 	struct apm_user	*as;
 
 	device_suspend(PMSG_SUSPEND);
-	local_irq_disable();
+
 	device_power_down(PMSG_SUSPEND);
+
+	local_irq_disable();
 	sysdev_suspend(PMSG_SUSPEND);
 
 	local_irq_enable();
@@ -1209,9 +1211,12 @@ static int suspend(int vetoable)
 	if (err != APM_SUCCESS)
 		apm_error("suspend", err);
 	err = (err == APM_SUCCESS) ? 0 : -EIO;
+
 	sysdev_resume();
-	device_power_up(PMSG_RESUME);
 	local_irq_enable();
+
+	device_power_up(PMSG_RESUME);
+
 	device_resume(PMSG_RESUME);
 	queue_event(APM_NORMAL_RESUME, NULL);
 	spin_lock(&user_list_lock);
@@ -1228,8 +1233,9 @@ static void standby(void)
 {
 	int err;
 
-	local_irq_disable();
 	device_power_down(PMSG_SUSPEND);
+
+	local_irq_disable();
 	sysdev_suspend(PMSG_SUSPEND);
 	local_irq_enable();
 
@@ -1239,8 +1245,9 @@ static void standby(void)
 
 	local_irq_disable();
 	sysdev_resume();
-	device_power_up(PMSG_RESUME);
 	local_irq_enable();
+
+	device_power_up(PMSG_RESUME);
 }
 
 static apm_event_t get_event(void)
Index: linux-2.6/drivers/xen/manage.c
===================================================================
--- linux-2.6.orig/drivers/xen/manage.c
+++ linux-2.6/drivers/xen/manage.c
@@ -39,12 +39,6 @@ static int xen_suspend(void *data)
 
 	BUG_ON(!irqs_disabled());
 
-	err = device_power_down(PMSG_SUSPEND);
-	if (err) {
-		printk(KERN_ERR "xen_suspend: device_power_down failed: %d\n",
-		       err);
-		return err;
-	}
 	err = sysdev_suspend(PMSG_SUSPEND);
 	if (err) {
 		printk(KERN_ERR "xen_suspend: sysdev_suspend failed: %d\n",
@@ -69,7 +63,6 @@ static int xen_suspend(void *data)
 	xen_mm_unpin_all();
 
 	sysdev_resume();
-	device_power_up(PMSG_RESUME);
 
 	if (!*cancelled) {
 		xen_irq_resume();
@@ -108,6 +101,12 @@ static void do_suspend(void)
 	/* XXX use normal device tree? */
 	xenbus_suspend();
 
+	err = device_power_down(PMSG_SUSPEND);
+	if (err) {
+		printk(KERN_ERR "device_power_down failed: %d\n", err);
+		goto resume_devices;
+	}
+
 	err = stop_machine(xen_suspend, &cancelled, &cpumask_of_cpu(0));
 	if (err) {
 		printk(KERN_ERR "failed to start xen_suspend: %d\n", err);
@@ -120,6 +119,9 @@ static void do_suspend(void)
 	} else
 		xenbus_suspend_cancel();
 
+	device_power_up(PMSG_RESUME);
+
+resume_devices:
 	device_resume(PMSG_RESUME);
 
 	/* Make sure timer events get retriggered on all CPUs */
Index: linux-2.6/kernel/kexec.c
===================================================================
--- linux-2.6.orig/kernel/kexec.c
+++ linux-2.6/kernel/kexec.c
@@ -1454,7 +1454,6 @@ int kernel_kexec(void)
 		if (error)
 			goto Resume_devices;
 		device_pm_lock();
-		local_irq_disable();
 		/* At this point, device_suspend() has been called,
 		 * but *not* device_power_down(). We *must*
 		 * device_power_down() now.  Otherwise, drivers for
@@ -1464,8 +1463,9 @@ int kernel_kexec(void)
 		 */
 		error = device_power_down(PMSG_FREEZE);
 		if (error)
-			goto Enable_irqs;
+			goto Unlock_pm;
 
+		local_irq_disable();
 		/* Suspend system devices */
 		error = sysdev_suspend(PMSG_FREEZE);
 		if (error)
@@ -1484,9 +1484,9 @@ int kernel_kexec(void)
 	if (kexec_image->preserve_context) {
 		sysdev_resume();
  Power_up_devices:
-		device_power_up(PMSG_RESTORE);
- Enable_irqs:
 		local_irq_enable();
+		device_power_up(PMSG_RESTORE);
+ Unlock_pm:
 		device_pm_unlock();
 		enable_nonboot_cpus();
  Resume_devices:
Index: linux-2.6/include/linux/irq.h
===================================================================
--- linux-2.6.orig/include/linux/irq.h
+++ linux-2.6/include/linux/irq.h
@@ -65,6 +65,7 @@ typedef	void (*irq_flow_handler_t)(unsig
 #define IRQ_SPURIOUS_DISABLED	0x00800000	/* IRQ was disabled by the spurious trap */
 #define IRQ_MOVE_PCNTXT		0x01000000	/* IRQ migration from process context */
 #define IRQ_AFFINITY_SET	0x02000000	/* IRQ affinity was set from userspace*/
+#define IRQ_SUSPENDED		0x04000000	/* IRQ has gone through suspend sequence */
 
 #ifdef CONFIG_IRQ_PER_CPU
 # define CHECK_IRQ_PER_CPU(var) ((var) & IRQ_PER_CPU)
Index: linux-2.6/kernel/irq/pm.c
===================================================================
--- /dev/null
+++ linux-2.6/kernel/irq/pm.c
@@ -0,0 +1,89 @@
+/*
+ * linux/kernel/irq/pm.c
+ *
+ * Copyright (C) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
+ *
+ * This file contains power management functions related to interrupts.
+ */
+
+#include <linux/irq.h>
+#include <linux/module.h>
+#include <linux/interrupt.h>
+
+#include "internals.h"
+
+/**
+ * suspend_device_irqs - disable all currently enabled interrupt lines
+ *
+ * During system-wide suspend or hibernation device interrupts need to be
+ * disabled at the chip level and this function is provided for this purpose.
+ * It disables all interrupt lines that are enabled at the moment and sets the
+ * IRQ_SUSPENDED flag for them.
+ */
+void suspend_device_irqs(void)
+{
+	struct irq_desc *desc;
+	int irq;
+
+	for_each_irq_desc(irq, desc) {
+		unsigned long flags;
+		bool sync = false;
+
+		spin_lock_irqsave(&desc->lock, flags);
+
+		if (desc->action && !(desc->action->flags & IRQF_TIMER)) {
+			if (!desc->depth++) {
+				desc->status |= IRQ_DISABLED;
+				desc->chip->disable(irq);
+				sync = true;
+			}
+			desc->status |= IRQ_SUSPENDED;
+		}
+
+		spin_unlock_irqrestore(&desc->lock, flags);
+
+		if (sync)
+			synchronize_irq(irq);
+	}
+}
+EXPORT_SYMBOL_GPL(suspend_device_irqs);
+
+/**
+ * resume_device_irqs - enable interrupt lines disabled by suspend_device_irqs()
+ *
+ * Enable all interrupt lines previously disabled by suspend_device_irqs() that
+ * have the IRQ_SUSPENDED flag set.
+ */
+void resume_device_irqs(void)
+{
+	struct irq_desc *desc;
+	int irq;
+
+	for_each_irq_desc(irq, desc) {
+		unsigned long flags;
+
+		if (!(desc->status & IRQ_SUSPENDED))
+			continue;
+
+		spin_lock_irqsave(&desc->lock, flags);
+		desc->status &= ~IRQ_SUSPENDED;
+		__enable_irq(desc, irq);
+		spin_unlock_irqrestore(&desc->lock, flags);
+	}
+}
+EXPORT_SYMBOL_GPL(resume_device_irqs);
+
+/**
+ * check_wakeup_irqs - check if any wake-up interrupts are pending
+ */
+int check_wakeup_irqs(void)
+{
+	struct irq_desc *desc;
+	int irq;
+
+	for_each_irq_desc(irq, desc)
+		if ((desc->status & IRQ_WAKEUP) && (desc->status & IRQ_PENDING))
+			return -EBUSY;
+
+	return 0;
+}
Index: linux-2.6/kernel/irq/Makefile
===================================================================
--- linux-2.6.orig/kernel/irq/Makefile
+++ linux-2.6/kernel/irq/Makefile
@@ -4,3 +4,4 @@ obj-$(CONFIG_GENERIC_IRQ_PROBE) += autop
 obj-$(CONFIG_PROC_FS) += proc.o
 obj-$(CONFIG_GENERIC_PENDING_IRQ) += migration.o
 obj-$(CONFIG_NUMA_MIGRATE_IRQ_DESC) += numa_migrate.o
+obj-$(CONFIG_PM_SLEEP) += pm.o
Index: linux-2.6/kernel/irq/manage.c
===================================================================
--- linux-2.6.orig/kernel/irq/manage.c
+++ linux-2.6/kernel/irq/manage.c
@@ -215,7 +215,7 @@ void disable_irq(unsigned int irq)
 }
 EXPORT_SYMBOL(disable_irq);
 
-static void __enable_irq(struct irq_desc *desc, unsigned int irq)
+void __enable_irq(struct irq_desc *desc, unsigned int irq)
 {
 	switch (desc->depth) {
 	case 0:
Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -23,6 +23,7 @@
 #include <linux/pm.h>
 #include <linux/resume-trace.h>
 #include <linux/rwsem.h>
+#include <linux/interrupt.h>
 
 #include "../base.h"
 #include "power.h"
@@ -305,7 +306,8 @@ static int resume_device_noirq(struct de
  *	Execute the appropriate "noirq resume" callback for all devices marked
  *	as DPM_OFF_IRQ.
  *
- *	Must be called with interrupts disabled and only one CPU running.
+ *	Must be called under dpm_list_mtx.  Device drivers should not receive
+ *	interrupts while it's being executed.
  */
 static void dpm_power_up(pm_message_t state)
 {
@@ -326,14 +328,13 @@ static void dpm_power_up(pm_message_t st
  *	device_power_up - Turn on all devices that need special attention.
  *	@state: PM transition of the system being carried out.
  *
- *	Power on system devices, then devices that required we shut them down
- *	with interrupts disabled.
- *
- *	Must be called with interrupts disabled.
+ *	Call the "early" resume handlers and enable device drivers to receive
+ *	interrupts.
  */
 void device_power_up(pm_message_t state)
 {
 	dpm_power_up(state);
+	resume_device_irqs();
 }
 EXPORT_SYMBOL_GPL(device_power_up);
 
@@ -558,16 +559,17 @@ static int suspend_device_noirq(struct d
  *	device_power_down - Shut down special devices.
  *	@state: PM transition of the system being carried out.
  *
- *	Power down devices that require interrupts to be disabled.
- *	Then power down system devices.
+ *	Prevent device drivers from receiving interrupts and call the "late"
+ *	suspend handlers.
  *
- *	Must be called with interrupts disabled and only one CPU running.
+ *	Must be called under dpm_list_mtx.
  */
 int device_power_down(pm_message_t state)
 {
 	struct device *dev;
 	int error = 0;
 
+	suspend_device_irqs();
 	list_for_each_entry_reverse(dev, &dpm_list, power.entry) {
 		error = suspend_device_noirq(dev, state);
 		if (error) {
@@ -577,7 +579,7 @@ int device_power_down(pm_message_t state
 		dev->power.status = DPM_OFF_IRQ;
 	}
 	if (error)
-		dpm_power_up(resume_event(state));
+		device_power_up(resume_event(state));
 	return error;
 }
 EXPORT_SYMBOL_GPL(device_power_down);
Index: linux-2.6/drivers/base/sys.c
===================================================================
--- linux-2.6.orig/drivers/base/sys.c
+++ linux-2.6/drivers/base/sys.c
@@ -22,6 +22,7 @@
 #include <linux/pm.h>
 #include <linux/device.h>
 #include <linux/mutex.h>
+#include <linux/interrupt.h>
 
 #include "base.h"
 
@@ -369,6 +370,13 @@ int sysdev_suspend(pm_message_t state)
 	struct sysdev_driver *drv, *err_drv;
 	int ret;
 
+	pr_debug("Checking wake-up interrupts\n");
+
+	/* Return error code if there are any wake-up interrupts pending */
+	ret = check_wakeup_irqs();
+	if (ret)
+		return ret;
+
 	pr_debug("Suspending System Devices\n");
 
 	list_for_each_entry_reverse(cls, &system_kset->list, kset.kobj.entry) {
Index: linux-2.6/kernel/irq/internals.h
===================================================================
--- linux-2.6.orig/kernel/irq/internals.h
+++ linux-2.6/kernel/irq/internals.h
@@ -12,6 +12,7 @@ extern void compat_irq_chip_set_default_
 
 extern int __irq_set_trigger(struct irq_desc *desc, unsigned int irq,
 		unsigned long flags);
+extern void __enable_irq(struct irq_desc *desc, unsigned int irq);
 
 extern struct lock_class_key irq_desc_lock_class;
 extern void init_kstat_irqs(struct irq_desc *desc, int cpu, int nr);
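
A minimal user-space model of the disable-depth accounting done by
suspend_device_irqs() and resume_device_irqs() above, assuming that
__enable_irq() only unmasks a line when the depth drops to zero (the hunk
shows just the start of that function).  struct model_irq and the model_*()
helpers are hypothetical stand-ins, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for struct irq_desc. */
struct model_irq {
	unsigned int depth;	/* nested disable count, 0 means enabled */
	bool has_action;	/* a handler is registered on the line */
	bool is_timer;		/* handler was registered with IRQF_TIMER */
	bool suspended;		/* models the IRQ_SUSPENDED status bit */
	bool masked;		/* models the line being disabled at the chip */
};

/* Models disable_irq(): only the first disable masks the line. */
static void model_disable_irq(struct model_irq *d)
{
	if (!d->depth++)
		d->masked = true;
}

/* Models __enable_irq() (assumed): only the last enable unmasks. */
static void model_enable_irq(struct model_irq *d)
{
	if (d->depth == 0)
		return;		/* unbalanced enable, ignored in this model */
	if (--d->depth == 0)
		d->masked = false;
}

/* Mirrors the loop body of suspend_device_irqs() from the patch. */
static void model_suspend_irq(struct model_irq *d)
{
	if (d->has_action && !d->is_timer) {
		model_disable_irq(d);
		d->suspended = true;
	}
}

/* Mirrors the loop body of resume_device_irqs() from the patch. */
static void model_resume_irq(struct model_irq *d)
{
	if (!d->suspended)
		return;
	d->suspended = false;
	model_enable_irq(d);
}
```

Starting from a line its driver has already disabled (depth 1), one
suspend/resume cycle takes the depth to 2 and back to 1 in this model, so the
line stays masked until the driver's own enable call, and a line registered
with IRQF_TIMER is never touched.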

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [Update, rev. 5] Re: [RFC][PATCH 1/4] PM: Rework handling of interrupts during suspend-resume (rev. 4)
  2009-03-04 22:03               ` [Update, rev. 5] " Rafael J. Wysocki
  2009-03-05 10:35                 ` Ingo Molnar
@ 2009-03-05 10:35                 ` Ingo Molnar
  1 sibling, 0 replies; 373+ messages in thread
From: Ingo Molnar @ 2009-03-05 10:35 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Arve Hjønnevåg, LKML, Linus Torvalds,
	Eric W. Biederman, Benjamin Herrenschmidt, Jeremy Fitzhardinge,
	pm list, Len Brown, Jesse Barnes, Thomas Gleixner, Alan Stern,
	Johannes Berg


* Rafael J. Wysocki <rjw@sisk.pl> wrote:

> On Tuesday 03 March 2009, Arve Hjønnevåg wrote:
> > On Mon, Mar 2, 2009 at 3:27 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > > On Tuesday 03 March 2009, Arve Hjønnevåg wrote:
> > >> On Mon, Mar 2, 2009 at 3:13 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > >> > On Tuesday 03 March 2009, Arve Hjønnevåg wrote:
> > >> >> On Sun, Mar 1, 2009 at 2:24 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> [--snip--]
> > > Can you show me a _single_ _driver_ currently in the tree doing something
> > > like you describe in suspend_late and resume_early?  If you can't, then please
> > > give up.
> > 
> > I don't know if any drivers call disable_irq or enable_irq in their
> > suspend hooks, but your change also allows timers, and I assume kernel
> > threads, to run during this phase.
> > 
> > There are several drivers (keypad drivers in particular), in tree and
> > out of tree, that call enable_irq from timers, and disable_irq from
> > their interrupt handler. If you also apply your later change to
> > disable non boot cpus after suspend_device_irqs, then on smp systems
> > the interrupt handler may run at the same time as suspend_device_irqs.
> > If suspend_device_irqs gets the spinlock first, then IRQ_SUSPENDED
> > gets set. If another suspend/resume cycle happens before the timer
> > runs, you will incorrectly enable the interrupt.
> 
> Well, unfortunately this is a valid point IMO.  I've been thinking for quite a
> while how to fix it nicely, but I'm not sure if there is a nice fix.
> 
> Below is an updated patch, hopefully everyone will be fine with it.
> 
> Ingo, is making __enable_irq() an extern function acceptable?

Sure, that's fine - it's a genirq internal function still 
between kernel/irq/manage.c and kernel/irq/pm.c.

	Ingo

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 1/2] PM: Split up sysdev_[suspend|resume] from device_power_[down|up]
  2009-02-22 17:38 ` Rafael J. Wysocki
                     ` (2 preceding siblings ...)
  2009-03-05 16:54   ` Pavel Machek
@ 2009-03-05 16:54   ` Pavel Machek
  3 siblings, 0 replies; 373+ messages in thread
From: Pavel Machek @ 2009-03-05 16:54 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: LKML, Linus Torvalds, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner

Hi!

> Move the sysdev_suspend/resume from the callee to the callers, with
> no real change in semantics, so that we can rework the disabling of
> interrupts during suspend/hibernation.
> 
> This is based on an earlier patch from Linus.

> Index: linux-2.6/drivers/base/power/main.c
> ===================================================================
> --- linux-2.6.orig/drivers/base/power/main.c
> +++ linux-2.6/drivers/base/power/main.c
> @@ -333,7 +333,6 @@ static void dpm_power_up(pm_message_t st
>   */
>  void device_power_up(pm_message_t state)
>  {
> -	sysdev_resume();
>  	dpm_power_up(state);
>  }

And at this point we can rename dpm_power_up -> device_power_up. Good.

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 0/4] Rework disabling of interrupts during suspend-resume
  2009-03-01 22:21 ` Rafael J. Wysocki
@ 2009-03-05 23:44     ` Linus Torvalds
  2009-03-01 22:24   ` Rafael J. Wysocki
                       ` (7 subsequent siblings)
  8 siblings, 0 replies; 373+ messages in thread
From: Linus Torvalds @ 2009-03-05 23:44 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: LKML, Ingo Molnar, Eric W. Biederman, Benjamin Herrenschmidt,
	Jeremy Fitzhardinge, pm list, Len Brown, Jesse Barnes,
	Thomas Gleixner, Arve  Hjønnevåg, Alan Stern,
	Johannes Berg



On Sun, 1 Mar 2009, Rafael J. Wysocki wrote:
> 
> The following patches modify the way in which we handle disabling interrupts
> during suspend and enabling them during resume.  They also change the ordering
> of the core suspend and hibernation code.

Side note - I've tested them on the EeePC that had trouble resuming due to 
interrupt timings, and it suspends and resumes fine with these patches 
(modulo some new X problems, but that's what I get for living with 
Fedora-11 testing).

Of course, it also suspends and resumes without them, since the CPU "cli" 
was sufficient for that machine, and it doesn't have any ACPI issues. But 
it's still an ack that at least nothing breaks that I can tell.

		Linus

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 0/4] Rework disabling of interrupts during suspend-resume
  2009-03-05 23:44     ` Linus Torvalds
  (?)
@ 2009-03-06  6:47     ` Sitsofe Wheeler
  -1 siblings, 0 replies; 373+ messages in thread
From: Sitsofe Wheeler @ 2009-03-06  6:47 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, LKML, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner, Arve Hjønnevåg,
	Alan Stern, Johannes Berg

On Thu, Mar 05, 2009 at 03:44:22PM -0800, Linus Torvalds wrote:
> 
> Side note - I've tested them on the EeePC that had trouble resuming due to
> interrupt timings, and it suspends and resumes fine with these patches 
> (modulo some new X problems, but that's what I get for living with 
> Fedora-11 testing).

Since you have an EeePC I'm guessing your graphics card is an i9xx of
some variety. The i9xx kernel driver seems to have been recently
reworked so even those people who aren't using GEM yet are seeing issues
here and there (some with VT switching, some with suspend to ram and
others with suspend to disk). I actually don't know if the stuff popping
up is any more than usual but if so I'm hoping most new issues go away
once the dust settles...

-- 
Sitsofe | http://sucs.org/~sits/

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH 0/4] Rework disabling of interrupts during suspend-resume
  2009-03-05 23:44     ` Linus Torvalds
                       ` (2 preceding siblings ...)
  (?)
@ 2009-03-06 10:19     ` Rafael J. Wysocki
  -1 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-06 10:19 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: LKML, Ingo Molnar, Eric W. Biederman, Benjamin Herrenschmidt,
	Jeremy Fitzhardinge, pm list, Len Brown, Jesse Barnes,
	Thomas Gleixner, Arve Hjønnevåg, Alan Stern,
	Johannes Berg

On Friday 06 March 2009, Linus Torvalds wrote:
> 
> On Sun, 1 Mar 2009, Rafael J. Wysocki wrote:
> > 
> > The following patches modify the way in which we handle disabling interrupts
> > during suspend and enabling them during resume.  They also change the ordering
> > of the core suspend and hibernation code.
> 
> Side note - I've tested them on the EeePC that had trouble resuming due to 
> interrupt timings, and it suspends and resumes fine with these patches 
> (modulo some new X problems, but that's what I get for living with 
> Fedora-11 testing).
> 
> Of course, it also suspends and resumes without them, since the CPU "cli" 
> was sufficient for that machine, and it doesn't have any ACPI issues. But 
> it's still an ack that at least nothing breaks that I can tell.

OK, thanks for testing!  The next-step patches (i.e. the PCI suspend-resume
rework) are in the works.

Rafael

^ permalink raw reply	[flat|nested] 373+ messages in thread

* [RFC][PATCH][0/8] PM: Rework suspend-resume ordering to avoid problems with shared interrupts
  2009-02-22 17:37 ` Rafael J. Wysocki
                   ` (10 preceding siblings ...)
  (?)
@ 2009-03-07 10:19 ` Rafael J. Wysocki
  2009-03-07 10:20     ` Rafael J. Wysocki
                     ` (16 more replies)
  -1 siblings, 17 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-07 10:19 UTC (permalink / raw)
  To: LKML
  Cc: Linus Torvalds, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner, Arve Hjønnevåg

Hi,

The following patches modify the way in which we handle disabling interrupts
during suspend and enabling them during resume.  They also change the ordering
of the core suspend and hibernation code to take advantage of the new approach
to the interrupts and modify the PCI PM core to avoid a few problems.

Namely, interrupts are currently disabled on the boot CPU as soon as the
nonboot CPUs have been disabled, which doesn't allow device drivers' "late"
suspend and "early" resume callbacks to sleep.  Among other things this means
they cannot execute ACPI AML routines, which leads to problems with
suspend-resume of PCI devices, as recently discussed.

1/8 modifies the [suspend|hibernation] and resume code, as well as the other
code using the device PM framework, so that device drivers will not receive
interrupts during the "late" suspend phase, although interrupts will only be
disabled on the CPU right before calling sysdev_suspend() (and analogously
during resume).
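
As a rough illustration of the ordering described for 1/8, here is a minimal
user-space sketch.  It is not kernel code: the function names mirror the ones
used in the patch, but every body is a stub that only records the call order,
and record(), cpu_irqs_off(), cpu_irqs_on() and model_suspend_enter() are
hypothetical names introduced for this example.

```c
#include <assert.h>
#include <string.h>

/* Call trace filled in by the stubs below; purely illustrative. */
static char trace[512];

static void record(const char *step)
{
	/* Step names are short, so this only guards against misuse. */
	assert(strlen(trace) + strlen(step) + 2 < sizeof(trace));
	strcat(trace, step);
	strcat(trace, ";");
}

/* Hypothetical stand-ins for the kernel functions named in the patch. */
static void suspend_device_irqs(void) { record("suspend_device_irqs"); }
static void resume_device_irqs(void)  { record("resume_device_irqs"); }
static void cpu_irqs_off(void)        { record("cpu_irqs_off"); }
static void cpu_irqs_on(void)         { record("cpu_irqs_on"); }

static void device_power_up(void)
{
	record("early_resume");		/* "early" resume callbacks */
	resume_device_irqs();		/* drivers may get interrupts again */
}

static int device_power_down(int late_error)
{
	suspend_device_irqs();		/* drivers stop getting interrupts */
	record("late_suspend");		/* "late" suspend callbacks */
	if (late_error)
		device_power_up();	/* roll back, including the IRQ lines */
	return late_error;
}

/*
 * Models the reordered suspend_enter().  A nonzero sysdev_error mimics
 * sysdev_suspend() failing, e.g. because a wake-up interrupt is pending.
 */
static int model_suspend_enter(int late_error, int sysdev_error)
{
	int error;

	trace[0] = '\0';

	error = device_power_down(late_error);
	if (error)
		return error;

	cpu_irqs_off();			/* only now are CPU interrupts disabled */
	record("sysdev_suspend");
	error = sysdev_error;
	if (!error) {
		record("enter");
		record("sysdev_resume");
	}
	cpu_irqs_on();

	device_power_up();
	return error;
}
```

Calling model_suspend_enter(0, 0) and inspecting trace shows the "late"
callbacks running between suspend_device_irqs() and the point where CPU
interrupts are turned off, which is the window in which they can sleep.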

2/8 - 4/8 modify the suspend, hibernation and kexec jump code, respectively,
so that the "late" phase of suspending devices will happen before executing the
platform "prepare" callback and disabling nonboot CPUs (and analogously during
resume).

5/8 is a patch that's already in the PCI linux-next tree and I included it in
the series, because the next patches depend on it.

6/8 makes the PCI PM core use pci_set_power_state() to put devices into
D0 during early resume, which allows the platform-specific operations to be
carried out at that time, if necessary.

7/8 uses the opportunity to move pci_restore_standard_config() to pci-driver.c,
where it belongs IMO.

8/8 makes the PCI PM core put devices into low power states during the "late"
phase of suspend, which allows us to avoid a long-standing race related to
shared interrupts and, at the same time, to appropriately handle devices that
require some platform-specific operations to be put into low power states.

Comments welcome.

Thanks,
Rafael


^ permalink raw reply	[flat|nested] 373+ messages in thread

* [RFC][PATCH][1/8] PM: Rework handling of interrupts during suspend-resume (rev. 5)
  2009-03-07 10:19 ` Rafael J. Wysocki
@ 2009-03-07 10:20     ` Rafael J. Wysocki
  2009-03-07 10:21   ` [RFC][PATCH][2/8] PM: Change suspend code ordering Rafael J. Wysocki
                       ` (15 subsequent siblings)
  16 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-07 10:20 UTC (permalink / raw)
  To: LKML
  Cc: Linus Torvalds, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner, Arve Hjønnevåg

From: Rafael J. Wysocki <rjw@sisk.pl>

Introduce two helper functions allowing us to prevent device drivers
from getting any interrupts (without disabling interrupts on the CPU)
during suspend (or hibernation) and to make them start to receive
interrupts again during the subsequent resume, respectively.  These
functions make it possible to keep timer interrupts enabled while the
"late" suspend and "early" resume callbacks provided by device
drivers are being executed.

Use these functions to rework the handling of interrupts during
suspend (hibernation) and resume.  Namely, interrupts will only be
disabled on the CPU right before suspending sysdevs, while device
drivers will be prevented from receiving interrupts, with the help of
the new helper function, before their "late" suspend callbacks run
(and analogously during resume).

In addition, since the device interrupts are now disabled before the
CPU has turned all interrupts off and the CPU will ACK the interrupts,
setting the IRQ_PENDING bit for them, check in sysdev_suspend() if
any wake-up interrupts are pending and abort suspend if that's the
case.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 arch/x86/kernel/apm_32.c  |   15 +++++--
 drivers/base/power/main.c |   20 +++++-----
 drivers/base/sys.c        |    8 ++++
 drivers/xen/manage.c      |   16 ++++----
 include/linux/interrupt.h |    5 ++
 include/linux/irq.h       |    1 
 kernel/irq/Makefile       |    1 
 kernel/irq/internals.h    |    1 
 kernel/irq/manage.c       |    2 -
 kernel/irq/pm.c           |   89 ++++++++++++++++++++++++++++++++++++++++++++++
 kernel/kexec.c            |    8 ++--
 kernel/power/disk.c       |   39 ++++++++++++++------
 kernel/power/main.c       |   17 +++++---
 13 files changed, 181 insertions(+), 41 deletions(-)

Index: linux-2.6/include/linux/interrupt.h
===================================================================
--- linux-2.6.orig/include/linux/interrupt.h
+++ linux-2.6/include/linux/interrupt.h
@@ -106,6 +106,11 @@ extern void disable_irq_nosync(unsigned 
 extern void disable_irq(unsigned int irq);
 extern void enable_irq(unsigned int irq);
 
+/* The following three functions are for the core kernel use only. */
+extern void suspend_device_irqs(void);
+extern void resume_device_irqs(void);
+extern int check_wakeup_irqs(void);
+
 #if defined(CONFIG_SMP) && defined(CONFIG_GENERIC_HARDIRQS)
 
 extern cpumask_var_t irq_default_affinity;
Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -287,17 +287,19 @@ void __attribute__ ((weak)) arch_suspend
  */
 static int suspend_enter(suspend_state_t state)
 {
-	int error = 0;
+	int error;
 
 	device_pm_lock();
-	arch_suspend_disable_irqs();
-	BUG_ON(!irqs_disabled());
 
-	if ((error = device_power_down(PMSG_SUSPEND))) {
+	error = device_power_down(PMSG_SUSPEND);
+	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down\n");
 		goto Done;
 	}
 
+	arch_suspend_disable_irqs();
+	BUG_ON(!irqs_disabled());
+
 	error = sysdev_suspend(PMSG_SUSPEND);
 	if (!error) {
 		if (!suspend_test(TEST_CORE))
@@ -305,11 +307,14 @@ static int suspend_enter(suspend_state_t
 		sysdev_resume();
 	}
 
-	device_power_up(PMSG_RESUME);
- Done:
 	arch_suspend_enable_irqs();
 	BUG_ON(irqs_disabled());
+
+	device_power_up(PMSG_RESUME);
+
+ Done:
 	device_pm_unlock();
+
 	return error;
 }
 
Index: linux-2.6/kernel/power/disk.c
===================================================================
--- linux-2.6.orig/kernel/power/disk.c
+++ linux-2.6/kernel/power/disk.c
@@ -214,7 +214,7 @@ static int create_image(int platform_mod
 		return error;
 
 	device_pm_lock();
-	local_irq_disable();
+
 	/* At this point, device_suspend() has been called, but *not*
 	 * device_power_down(). We *must* call device_power_down() now.
 	 * Otherwise, drivers for some devices (e.g. interrupt controllers)
@@ -225,8 +225,11 @@ static int create_image(int platform_mod
 	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down, "
 			"aborting hibernation\n");
-		goto Enable_irqs;
+		goto Unlock;
 	}
+
+	local_irq_disable();
+
 	sysdev_suspend(PMSG_FREEZE);
 	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down, "
@@ -252,12 +255,16 @@ static int create_image(int platform_mod
 	/* NOTE:  device_power_up() is just a resume() for devices
 	 * that suspended with irqs off ... no overall powerup.
 	 */
+
  Power_up_devices:
+	local_irq_enable();
+
 	device_power_up(in_suspend ?
 		(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);
- Enable_irqs:
-	local_irq_enable();
+
+ Unlock:
 	device_pm_unlock();
+
 	return error;
 }
 
@@ -336,13 +343,16 @@ static int resume_target_kernel(void)
 	int error;
 
 	device_pm_lock();
-	local_irq_disable();
+
 	error = device_power_down(PMSG_QUIESCE);
 	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down, "
 			"aborting resume\n");
-		goto Enable_irqs;
+		goto Unlock;
 	}
+
+	local_irq_disable();
+
 	sysdev_suspend(PMSG_QUIESCE);
 	/* We'll ignore saved state, but this gets preempt count (etc) right */
 	save_processor_state();
@@ -366,11 +376,16 @@ static int resume_target_kernel(void)
 	swsusp_free();
 	restore_processor_state();
 	touch_softlockup_watchdog();
+
 	sysdev_resume();
-	device_power_up(PMSG_RECOVER);
- Enable_irqs:
+
 	local_irq_enable();
+
+	device_power_up(PMSG_RECOVER);
+
+ Unlock:
 	device_pm_unlock();
+
 	return error;
 }
 
@@ -447,15 +462,16 @@ int hibernation_platform_enter(void)
 		goto Finish;
 
 	device_pm_lock();
-	local_irq_disable();
+
 	error = device_power_down(PMSG_HIBERNATE);
 	if (!error) {
+		local_irq_disable();
 		sysdev_suspend(PMSG_HIBERNATE);
 		hibernation_ops->enter();
 		/* We should never get here */
 		while (1);
 	}
-	local_irq_enable();
+
 	device_pm_unlock();
 
 	/*
@@ -464,12 +480,15 @@ int hibernation_platform_enter(void)
 	 */
  Finish:
 	hibernation_ops->finish();
+
  Resume_devices:
 	entering_platform_hibernation = false;
 	device_resume(PMSG_RESTORE);
 	resume_console();
+
  Close:
 	hibernation_ops->end();
+
 	return error;
 }
 
Index: linux-2.6/arch/x86/kernel/apm_32.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/apm_32.c
+++ linux-2.6/arch/x86/kernel/apm_32.c
@@ -1190,8 +1190,10 @@ static int suspend(int vetoable)
 	struct apm_user	*as;
 
 	device_suspend(PMSG_SUSPEND);
-	local_irq_disable();
+
 	device_power_down(PMSG_SUSPEND);
+
+	local_irq_disable();
 	sysdev_suspend(PMSG_SUSPEND);
 
 	local_irq_enable();
@@ -1209,9 +1211,12 @@ static int suspend(int vetoable)
 	if (err != APM_SUCCESS)
 		apm_error("suspend", err);
 	err = (err == APM_SUCCESS) ? 0 : -EIO;
+
 	sysdev_resume();
-	device_power_up(PMSG_RESUME);
 	local_irq_enable();
+
+	device_power_up(PMSG_RESUME);
+
 	device_resume(PMSG_RESUME);
 	queue_event(APM_NORMAL_RESUME, NULL);
 	spin_lock(&user_list_lock);
@@ -1228,8 +1233,9 @@ static void standby(void)
 {
 	int err;
 
-	local_irq_disable();
 	device_power_down(PMSG_SUSPEND);
+
+	local_irq_disable();
 	sysdev_suspend(PMSG_SUSPEND);
 	local_irq_enable();
 
@@ -1239,8 +1245,9 @@ static void standby(void)
 
 	local_irq_disable();
 	sysdev_resume();
-	device_power_up(PMSG_RESUME);
 	local_irq_enable();
+
+	device_power_up(PMSG_RESUME);
 }
 
 static apm_event_t get_event(void)
Index: linux-2.6/drivers/xen/manage.c
===================================================================
--- linux-2.6.orig/drivers/xen/manage.c
+++ linux-2.6/drivers/xen/manage.c
@@ -39,12 +39,6 @@ static int xen_suspend(void *data)
 
 	BUG_ON(!irqs_disabled());
 
-	err = device_power_down(PMSG_SUSPEND);
-	if (err) {
-		printk(KERN_ERR "xen_suspend: device_power_down failed: %d\n",
-		       err);
-		return err;
-	}
 	err = sysdev_suspend(PMSG_SUSPEND);
 	if (err) {
 		printk(KERN_ERR "xen_suspend: sysdev_suspend failed: %d\n",
@@ -69,7 +63,6 @@ static int xen_suspend(void *data)
 	xen_mm_unpin_all();
 
 	sysdev_resume();
-	device_power_up(PMSG_RESUME);
 
 	if (!*cancelled) {
 		xen_irq_resume();
@@ -108,6 +101,12 @@ static void do_suspend(void)
 	/* XXX use normal device tree? */
 	xenbus_suspend();
 
+	err = device_power_down(PMSG_SUSPEND);
+	if (err) {
+		printk(KERN_ERR "device_power_down failed: %d\n", err);
+		goto resume_devices;
+	}
+
 	err = stop_machine(xen_suspend, &cancelled, &cpumask_of_cpu(0));
 	if (err) {
 		printk(KERN_ERR "failed to start xen_suspend: %d\n", err);
@@ -120,6 +119,9 @@ static void do_suspend(void)
 	} else
 		xenbus_suspend_cancel();
 
+	device_power_up(PMSG_RESUME);
+
+resume_devices:
 	device_resume(PMSG_RESUME);
 
 	/* Make sure timer events get retriggered on all CPUs */
Index: linux-2.6/kernel/kexec.c
===================================================================
--- linux-2.6.orig/kernel/kexec.c
+++ linux-2.6/kernel/kexec.c
@@ -1454,7 +1454,6 @@ int kernel_kexec(void)
 		if (error)
 			goto Resume_devices;
 		device_pm_lock();
-		local_irq_disable();
 		/* At this point, device_suspend() has been called,
 		 * but *not* device_power_down(). We *must*
 		 * device_power_down() now.  Otherwise, drivers for
@@ -1464,8 +1463,9 @@ int kernel_kexec(void)
 		 */
 		error = device_power_down(PMSG_FREEZE);
 		if (error)
-			goto Enable_irqs;
+			goto Unlock_pm;
 
+		local_irq_disable();
 		/* Suspend system devices */
 		error = sysdev_suspend(PMSG_FREEZE);
 		if (error)
@@ -1484,9 +1484,9 @@ int kernel_kexec(void)
 	if (kexec_image->preserve_context) {
 		sysdev_resume();
  Power_up_devices:
-		device_power_up(PMSG_RESTORE);
- Enable_irqs:
 		local_irq_enable();
+		device_power_up(PMSG_RESTORE);
+ Unlock_pm:
 		device_pm_unlock();
 		enable_nonboot_cpus();
  Resume_devices:
Index: linux-2.6/include/linux/irq.h
===================================================================
--- linux-2.6.orig/include/linux/irq.h
+++ linux-2.6/include/linux/irq.h
@@ -65,6 +65,7 @@ typedef	void (*irq_flow_handler_t)(unsig
 #define IRQ_SPURIOUS_DISABLED	0x00800000	/* IRQ was disabled by the spurious trap */
 #define IRQ_MOVE_PCNTXT		0x01000000	/* IRQ migration from process context */
 #define IRQ_AFFINITY_SET	0x02000000	/* IRQ affinity was set from userspace*/
+#define IRQ_SUSPENDED		0x04000000	/* IRQ has gone through suspend sequence */
 
 #ifdef CONFIG_IRQ_PER_CPU
 # define CHECK_IRQ_PER_CPU(var) ((var) & IRQ_PER_CPU)
Index: linux-2.6/kernel/irq/pm.c
===================================================================
--- /dev/null
+++ linux-2.6/kernel/irq/pm.c
@@ -0,0 +1,89 @@
+/*
+ * linux/kernel/irq/pm.c
+ *
+ * Copyright (C) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
+ *
+ * This file contains power management functions related to interrupts.
+ */
+
+#include <linux/irq.h>
+#include <linux/module.h>
+#include <linux/interrupt.h>
+
+#include "internals.h"
+
+/**
+ * suspend_device_irqs - disable all currently enabled interrupt lines
+ *
+ * During system-wide suspend or hibernation device interrupts need to be
+ * disabled at the chip level and this function is provided for this purpose.
+ * It disables all interrupt lines that are enabled at the moment and sets the
+ * IRQ_SUSPENDED flag for them.
+ */
+void suspend_device_irqs(void)
+{
+	struct irq_desc *desc;
+	int irq;
+
+	for_each_irq_desc(irq, desc) {
+		unsigned long flags;
+		bool sync = false;
+
+		spin_lock_irqsave(&desc->lock, flags);
+
+		if (desc->action && !(desc->action->flags & IRQF_TIMER)) {
+			if (!desc->depth++) {
+				desc->status |= IRQ_DISABLED;
+				desc->chip->disable(irq);
+				sync = true;
+			}
+			desc->status |= IRQ_SUSPENDED;
+		}
+
+		spin_unlock_irqrestore(&desc->lock, flags);
+
+		if (sync)
+			synchronize_irq(irq);
+	}
+}
+EXPORT_SYMBOL_GPL(suspend_device_irqs);
+
+/**
+ * resume_device_irqs - enable interrupt lines disabled by suspend_device_irqs()
+ *
+ * Enable all interrupt lines previously disabled by suspend_device_irqs() that
+ * have the IRQ_SUSPENDED flag set.
+ */
+void resume_device_irqs(void)
+{
+	struct irq_desc *desc;
+	int irq;
+
+	for_each_irq_desc(irq, desc) {
+		unsigned long flags;
+
+		if (!(desc->status & IRQ_SUSPENDED))
+			continue;
+
+		spin_lock_irqsave(&desc->lock, flags);
+		desc->status &= ~IRQ_SUSPENDED;
+		__enable_irq(desc, irq);
+		spin_unlock_irqrestore(&desc->lock, flags);
+	}
+}
+EXPORT_SYMBOL_GPL(resume_device_irqs);
+
+/**
+ * check_wakeup_irqs - check if any wake-up interrupts are pending
+ */
+int check_wakeup_irqs(void)
+{
+	struct irq_desc *desc;
+	int irq;
+
+	for_each_irq_desc(irq, desc)
+		if ((desc->status & IRQ_WAKEUP) && (desc->status & IRQ_PENDING))
+			return -EBUSY;
+
+	return 0;
+}
Index: linux-2.6/kernel/irq/Makefile
===================================================================
--- linux-2.6.orig/kernel/irq/Makefile
+++ linux-2.6/kernel/irq/Makefile
@@ -4,3 +4,4 @@ obj-$(CONFIG_GENERIC_IRQ_PROBE) += autop
 obj-$(CONFIG_PROC_FS) += proc.o
 obj-$(CONFIG_GENERIC_PENDING_IRQ) += migration.o
 obj-$(CONFIG_NUMA_MIGRATE_IRQ_DESC) += numa_migrate.o
+obj-$(CONFIG_PM_SLEEP) += pm.o
Index: linux-2.6/kernel/irq/manage.c
===================================================================
--- linux-2.6.orig/kernel/irq/manage.c
+++ linux-2.6/kernel/irq/manage.c
@@ -215,7 +215,7 @@ void disable_irq(unsigned int irq)
 }
 EXPORT_SYMBOL(disable_irq);
 
-static void __enable_irq(struct irq_desc *desc, unsigned int irq)
+void __enable_irq(struct irq_desc *desc, unsigned int irq)
 {
 	switch (desc->depth) {
 	case 0:
Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -23,6 +23,7 @@
 #include <linux/pm.h>
 #include <linux/resume-trace.h>
 #include <linux/rwsem.h>
+#include <linux/interrupt.h>
 
 #include "../base.h"
 #include "power.h"
@@ -305,7 +306,8 @@ static int resume_device_noirq(struct de
  *	Execute the appropriate "noirq resume" callback for all devices marked
  *	as DPM_OFF_IRQ.
  *
- *	Must be called with interrupts disabled and only one CPU running.
+ *	Must be called under dpm_list_mtx.  Device drivers should not receive
+ *	interrupts while it's being executed.
  */
 static void dpm_power_up(pm_message_t state)
 {
@@ -326,14 +328,13 @@ static void dpm_power_up(pm_message_t st
  *	device_power_up - Turn on all devices that need special attention.
  *	@state: PM transition of the system being carried out.
  *
- *	Power on system devices, then devices that required we shut them down
- *	with interrupts disabled.
- *
- *	Must be called with interrupts disabled.
+ *	Call the "early" resume handlers and enable device drivers to receive
+ *	interrupts.
  */
 void device_power_up(pm_message_t state)
 {
 	dpm_power_up(state);
+	resume_device_irqs();
 }
 EXPORT_SYMBOL_GPL(device_power_up);
 
@@ -558,16 +559,17 @@ static int suspend_device_noirq(struct d
  *	device_power_down - Shut down special devices.
  *	@state: PM transition of the system being carried out.
  *
- *	Power down devices that require interrupts to be disabled.
- *	Then power down system devices.
+ *	Prevent device drivers from receiving interrupts and call the "late"
+ *	suspend handlers.
  *
- *	Must be called with interrupts disabled and only one CPU running.
+ *	Must be called under dpm_list_mtx.
  */
 int device_power_down(pm_message_t state)
 {
 	struct device *dev;
 	int error = 0;
 
+	suspend_device_irqs();
 	list_for_each_entry_reverse(dev, &dpm_list, power.entry) {
 		error = suspend_device_noirq(dev, state);
 		if (error) {
@@ -577,7 +579,7 @@ int device_power_down(pm_message_t state
 		dev->power.status = DPM_OFF_IRQ;
 	}
 	if (error)
-		dpm_power_up(resume_event(state));
+		device_power_up(resume_event(state));
 	return error;
 }
 EXPORT_SYMBOL_GPL(device_power_down);
Index: linux-2.6/drivers/base/sys.c
===================================================================
--- linux-2.6.orig/drivers/base/sys.c
+++ linux-2.6/drivers/base/sys.c
@@ -22,6 +22,7 @@
 #include <linux/pm.h>
 #include <linux/device.h>
 #include <linux/mutex.h>
+#include <linux/interrupt.h>
 
 #include "base.h"
 
@@ -369,6 +370,13 @@ int sysdev_suspend(pm_message_t state)
 	struct sysdev_driver *drv, *err_drv;
 	int ret;
 
+	pr_debug("Checking wake-up interrupts\n");
+
+	/* Return error code if there are any wake-up interrupts pending */
+	ret = check_wakeup_irqs();
+	if (ret)
+		return ret;
+
 	pr_debug("Suspending System Devices\n");
 
 	list_for_each_entry_reverse(cls, &system_kset->list, kset.kobj.entry) {
Index: linux-2.6/kernel/irq/internals.h
===================================================================
--- linux-2.6.orig/kernel/irq/internals.h
+++ linux-2.6/kernel/irq/internals.h
@@ -12,6 +12,7 @@ extern void compat_irq_chip_set_default_
 
 extern int __irq_set_trigger(struct irq_desc *desc, unsigned int irq,
 		unsigned long flags);
+extern void __enable_irq(struct irq_desc *desc, unsigned int irq);
 
 extern struct lock_class_key irq_desc_lock_class;
 extern void init_kstat_irqs(struct irq_desc *desc, int cpu, int nr);
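
The bookkeeping done by the helpers in kernel/irq/pm.c above can be tried out
in user space with the simplified model below.  It is only a sketch under
stated assumptions: struct model_irq, the M_IRQ_* flag values and the model_*
functions are hypothetical names for this example, locking and
synchronize_irq() are left out, and the chip-level disable is reduced to
setting a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified status flags; the values are arbitrary in this model. */
#define M_IRQ_DISABLED   0x01u
#define M_IRQ_PENDING    0x02u
#define M_IRQ_WAKEUP     0x04u
#define M_IRQ_SUSPENDED  0x08u

struct model_irq {
	bool has_action;	/* a handler is registered for the line */
	bool is_timer;		/* handler was registered with IRQF_TIMER */
	unsigned int depth;	/* nested disable count */
	unsigned int status;	/* M_IRQ_* flags */
};

/* Disable every non-timer line that has a handler and mark it suspended. */
static void model_suspend_device_irqs(struct model_irq *irqs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		struct model_irq *d = &irqs[i];

		if (!d->has_action || d->is_timer)
			continue;
		if (d->depth++ == 0)
			d->status |= M_IRQ_DISABLED;
		d->status |= M_IRQ_SUSPENDED;
	}
}

/* Undo one level of disabling for every line marked as suspended. */
static void model_resume_device_irqs(struct model_irq *irqs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		struct model_irq *d = &irqs[i];

		if (!(d->status & M_IRQ_SUSPENDED))
			continue;
		d->status &= ~M_IRQ_SUSPENDED;
		assert(d->depth > 0);
		if (--d->depth == 0)
			d->status &= ~M_IRQ_DISABLED;
	}
}

/* Nonzero if a wake-up line has an interrupt pending (the patch uses -EBUSY). */
static int model_check_wakeup_irqs(const struct model_irq *irqs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if ((irqs[i].status & M_IRQ_WAKEUP) &&
		    (irqs[i].status & M_IRQ_PENDING))
			return -1;
	return 0;
}
```

Because the suspend side only increments the depth, a line that a driver had
already disabled stays disabled after the resume side runs; only the extra
level added during suspend is dropped.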

^ permalink raw reply	[flat|nested] 373+ messages in thread

* [RFC][PATCH][1/8] PM: Rework handling of interrupts during suspend-resume (rev. 5)
@ 2009-03-07 10:20     ` Rafael J. Wysocki
  0 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-07 10:20 UTC (permalink / raw)
  To: LKML
  Cc: Arve, Jeremy Fitzhardinge, Jesse Barnes, Thomas Gleixner,
	Eric W. Biederman, Ingo Molnar, Linus Torvalds, pm list

From: Rafael J. Wysocki <rjw@sisk.pl>

Introduce two helper functions allowing us to prevent device drivers
from getting any interrupts (without disabling interrupts on the CPU)
during suspend (or hibernation) and to make them start to receive
interrupts again during the subsequent resume, respectively.  These
functions make it possible to keep timer interrupts enabled while the
"late" suspend and "early" resume callbacks provided by device
drivers are being executed.

Use these functions to rework the handling of interrupts during
suspend (hibernation) and resume.  Namely, interrupts will only be
disabled on the CPU right before suspending sysdevs, while device
drivers will be prevented from receiving interrupts, with the help of
the new helper function, before their "late" suspend callbacks run
(and analogously during resume).

In addition, since the device interrupts are now disabled before the
CPU has turned all interrupts off and the CPU will ACK the interrupts,
setting the IRQ_PENDING bit for them, check in sysdev_suspend() if
any wake-up interrupts are pending and abort suspend if that's the
case.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 arch/x86/kernel/apm_32.c  |   15 +++++--
 drivers/base/power/main.c |   20 +++++-----
 drivers/base/sys.c        |    8 ++++
 drivers/xen/manage.c      |   16 ++++----
 include/linux/interrupt.h |    5 ++
 include/linux/irq.h       |    1 
 kernel/irq/Makefile       |    1 
 kernel/irq/internals.h    |    1 
 kernel/irq/manage.c       |    2 -
 kernel/irq/pm.c           |   89 ++++++++++++++++++++++++++++++++++++++++++++++
 kernel/kexec.c            |    8 ++--
 kernel/power/disk.c       |   39 ++++++++++++++------
 kernel/power/main.c       |   17 +++++---
 13 files changed, 181 insertions(+), 41 deletions(-)

Index: linux-2.6/include/linux/interrupt.h
===================================================================
--- linux-2.6.orig/include/linux/interrupt.h
+++ linux-2.6/include/linux/interrupt.h
@@ -106,6 +106,11 @@ extern void disable_irq_nosync(unsigned 
 extern void disable_irq(unsigned int irq);
 extern void enable_irq(unsigned int irq);
 
+/* The following three functions are for the core kernel use only. */
+extern void suspend_device_irqs(void);
+extern void resume_device_irqs(void);
+extern int check_wakeup_irqs(void);
+
 #if defined(CONFIG_SMP) && defined(CONFIG_GENERIC_HARDIRQS)
 
 extern cpumask_var_t irq_default_affinity;
Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -287,17 +287,19 @@ void __attribute__ ((weak)) arch_suspend
  */
 static int suspend_enter(suspend_state_t state)
 {
-	int error = 0;
+	int error;
 
 	device_pm_lock();
-	arch_suspend_disable_irqs();
-	BUG_ON(!irqs_disabled());
 
-	if ((error = device_power_down(PMSG_SUSPEND))) {
+	error = device_power_down(PMSG_SUSPEND);
+	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down\n");
 		goto Done;
 	}
 
+	arch_suspend_disable_irqs();
+	BUG_ON(!irqs_disabled());
+
 	error = sysdev_suspend(PMSG_SUSPEND);
 	if (!error) {
 		if (!suspend_test(TEST_CORE))
@@ -305,11 +307,14 @@ static int suspend_enter(suspend_state_t
 		sysdev_resume();
 	}
 
-	device_power_up(PMSG_RESUME);
- Done:
 	arch_suspend_enable_irqs();
 	BUG_ON(irqs_disabled());
+
+	device_power_up(PMSG_RESUME);
+
+ Done:
 	device_pm_unlock();
+
 	return error;
 }
 
Index: linux-2.6/kernel/power/disk.c
===================================================================
--- linux-2.6.orig/kernel/power/disk.c
+++ linux-2.6/kernel/power/disk.c
@@ -214,7 +214,7 @@ static int create_image(int platform_mod
 		return error;
 
 	device_pm_lock();
-	local_irq_disable();
+
 	/* At this point, device_suspend() has been called, but *not*
 	 * device_power_down(). We *must* call device_power_down() now.
 	 * Otherwise, drivers for some devices (e.g. interrupt controllers)
@@ -225,8 +225,11 @@ static int create_image(int platform_mod
 	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down, "
 			"aborting hibernation\n");
-		goto Enable_irqs;
+		goto Unlock;
 	}
+
+	local_irq_disable();
+
 	sysdev_suspend(PMSG_FREEZE);
 	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down, "
@@ -252,12 +255,16 @@ static int create_image(int platform_mod
 	/* NOTE:  device_power_up() is just a resume() for devices
 	 * that suspended with irqs off ... no overall powerup.
 	 */
+
  Power_up_devices:
+	local_irq_enable();
+
 	device_power_up(in_suspend ?
 		(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);
- Enable_irqs:
-	local_irq_enable();
+
+ Unlock:
 	device_pm_unlock();
+
 	return error;
 }
 
@@ -336,13 +343,16 @@ static int resume_target_kernel(void)
 	int error;
 
 	device_pm_lock();
-	local_irq_disable();
+
 	error = device_power_down(PMSG_QUIESCE);
 	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down, "
 			"aborting resume\n");
-		goto Enable_irqs;
+		goto Unlock;
 	}
+
+	local_irq_disable();
+
 	sysdev_suspend(PMSG_QUIESCE);
 	/* We'll ignore saved state, but this gets preempt count (etc) right */
 	save_processor_state();
@@ -366,11 +376,16 @@ static int resume_target_kernel(void)
 	swsusp_free();
 	restore_processor_state();
 	touch_softlockup_watchdog();
+
 	sysdev_resume();
-	device_power_up(PMSG_RECOVER);
- Enable_irqs:
+
 	local_irq_enable();
+
+	device_power_up(PMSG_RECOVER);
+
+ Unlock:
 	device_pm_unlock();
+
 	return error;
 }
 
@@ -447,15 +462,16 @@ int hibernation_platform_enter(void)
 		goto Finish;
 
 	device_pm_lock();
-	local_irq_disable();
+
 	error = device_power_down(PMSG_HIBERNATE);
 	if (!error) {
+		local_irq_disable();
 		sysdev_suspend(PMSG_HIBERNATE);
 		hibernation_ops->enter();
 		/* We should never get here */
 		while (1);
 	}
-	local_irq_enable();
+
 	device_pm_unlock();
 
 	/*
@@ -464,12 +480,15 @@ int hibernation_platform_enter(void)
 	 */
  Finish:
 	hibernation_ops->finish();
+
  Resume_devices:
 	entering_platform_hibernation = false;
 	device_resume(PMSG_RESTORE);
 	resume_console();
+
  Close:
 	hibernation_ops->end();
+
 	return error;
 }
 
Index: linux-2.6/arch/x86/kernel/apm_32.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/apm_32.c
+++ linux-2.6/arch/x86/kernel/apm_32.c
@@ -1190,8 +1190,10 @@ static int suspend(int vetoable)
 	struct apm_user	*as;
 
 	device_suspend(PMSG_SUSPEND);
-	local_irq_disable();
+
 	device_power_down(PMSG_SUSPEND);
+
+	local_irq_disable();
 	sysdev_suspend(PMSG_SUSPEND);
 
 	local_irq_enable();
@@ -1209,9 +1211,12 @@ static int suspend(int vetoable)
 	if (err != APM_SUCCESS)
 		apm_error("suspend", err);
 	err = (err == APM_SUCCESS) ? 0 : -EIO;
+
 	sysdev_resume();
-	device_power_up(PMSG_RESUME);
 	local_irq_enable();
+
+	device_power_up(PMSG_RESUME);
+
 	device_resume(PMSG_RESUME);
 	queue_event(APM_NORMAL_RESUME, NULL);
 	spin_lock(&user_list_lock);
@@ -1228,8 +1233,9 @@ static void standby(void)
 {
 	int err;
 
-	local_irq_disable();
 	device_power_down(PMSG_SUSPEND);
+
+	local_irq_disable();
 	sysdev_suspend(PMSG_SUSPEND);
 	local_irq_enable();
 
@@ -1239,8 +1245,9 @@ static void standby(void)
 
 	local_irq_disable();
 	sysdev_resume();
-	device_power_up(PMSG_RESUME);
 	local_irq_enable();
+
+	device_power_up(PMSG_RESUME);
 }
 
 static apm_event_t get_event(void)
Index: linux-2.6/drivers/xen/manage.c
===================================================================
--- linux-2.6.orig/drivers/xen/manage.c
+++ linux-2.6/drivers/xen/manage.c
@@ -39,12 +39,6 @@ static int xen_suspend(void *data)
 
 	BUG_ON(!irqs_disabled());
 
-	err = device_power_down(PMSG_SUSPEND);
-	if (err) {
-		printk(KERN_ERR "xen_suspend: device_power_down failed: %d\n",
-		       err);
-		return err;
-	}
 	err = sysdev_suspend(PMSG_SUSPEND);
 	if (err) {
 		printk(KERN_ERR "xen_suspend: sysdev_suspend failed: %d\n",
@@ -69,7 +63,6 @@ static int xen_suspend(void *data)
 	xen_mm_unpin_all();
 
 	sysdev_resume();
-	device_power_up(PMSG_RESUME);
 
 	if (!*cancelled) {
 		xen_irq_resume();
@@ -108,6 +101,12 @@ static void do_suspend(void)
 	/* XXX use normal device tree? */
 	xenbus_suspend();
 
+	err = device_power_down(PMSG_SUSPEND);
+	if (err) {
+		printk(KERN_ERR "device_power_down failed: %d\n", err);
+		goto resume_devices;
+	}
+
 	err = stop_machine(xen_suspend, &cancelled, &cpumask_of_cpu(0));
 	if (err) {
 		printk(KERN_ERR "failed to start xen_suspend: %d\n", err);
@@ -120,6 +119,9 @@ static void do_suspend(void)
 	} else
 		xenbus_suspend_cancel();
 
+	device_power_up(PMSG_RESUME);
+
+resume_devices:
 	device_resume(PMSG_RESUME);
 
 	/* Make sure timer events get retriggered on all CPUs */
Index: linux-2.6/kernel/kexec.c
===================================================================
--- linux-2.6.orig/kernel/kexec.c
+++ linux-2.6/kernel/kexec.c
@@ -1454,7 +1454,6 @@ int kernel_kexec(void)
 		if (error)
 			goto Resume_devices;
 		device_pm_lock();
-		local_irq_disable();
 		/* At this point, device_suspend() has been called,
 		 * but *not* device_power_down(). We *must*
 		 * device_power_down() now.  Otherwise, drivers for
@@ -1464,8 +1463,9 @@ int kernel_kexec(void)
 		 */
 		error = device_power_down(PMSG_FREEZE);
 		if (error)
-			goto Enable_irqs;
+			goto Unlock_pm;
 
+		local_irq_disable();
 		/* Suspend system devices */
 		error = sysdev_suspend(PMSG_FREEZE);
 		if (error)
@@ -1484,9 +1484,9 @@ int kernel_kexec(void)
 	if (kexec_image->preserve_context) {
 		sysdev_resume();
  Power_up_devices:
-		device_power_up(PMSG_RESTORE);
- Enable_irqs:
 		local_irq_enable();
+		device_power_up(PMSG_RESTORE);
+ Unlock_pm:
 		device_pm_unlock();
 		enable_nonboot_cpus();
  Resume_devices:
Index: linux-2.6/include/linux/irq.h
===================================================================
--- linux-2.6.orig/include/linux/irq.h
+++ linux-2.6/include/linux/irq.h
@@ -65,6 +65,7 @@ typedef	void (*irq_flow_handler_t)(unsig
 #define IRQ_SPURIOUS_DISABLED	0x00800000	/* IRQ was disabled by the spurious trap */
 #define IRQ_MOVE_PCNTXT		0x01000000	/* IRQ migration from process context */
 #define IRQ_AFFINITY_SET	0x02000000	/* IRQ affinity was set from userspace*/
+#define IRQ_SUSPENDED		0x04000000	/* IRQ has gone through suspend sequence */
 
 #ifdef CONFIG_IRQ_PER_CPU
 # define CHECK_IRQ_PER_CPU(var) ((var) & IRQ_PER_CPU)
Index: linux-2.6/kernel/irq/pm.c
===================================================================
--- /dev/null
+++ linux-2.6/kernel/irq/pm.c
@@ -0,0 +1,89 @@
+/*
+ * linux/kernel/irq/pm.c
+ *
+ * Copyright (C) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
+ *
+ * This file contains power management functions related to interrupts.
+ */
+
+#include <linux/irq.h>
+#include <linux/module.h>
+#include <linux/interrupt.h>
+
+#include "internals.h"
+
+/**
+ * suspend_device_irqs - disable all currently enabled interrupt lines
+ *
+ * During system-wide suspend or hibernation device interrupts need to be
+ * disabled at the chip level and this function is provided for this purpose.
+ * It disables all interrupt lines that are enabled at the moment and sets the
+ * IRQ_SUSPENDED flag for them.
+ */
+void suspend_device_irqs(void)
+{
+	struct irq_desc *desc;
+	int irq;
+
+	for_each_irq_desc(irq, desc) {
+		unsigned long flags;
+		bool sync = false;
+
+		spin_lock_irqsave(&desc->lock, flags);
+
+		if (desc->action && !(desc->action->flags & IRQF_TIMER)) {
+			if (!desc->depth++) {
+				desc->status |= IRQ_DISABLED;
+				desc->chip->disable(irq);
+				sync = true;
+			}
+			desc->status |= IRQ_SUSPENDED;
+		}
+
+		spin_unlock_irqrestore(&desc->lock, flags);
+
+		if (sync)
+			synchronize_irq(irq);
+	}
+}
+EXPORT_SYMBOL_GPL(suspend_device_irqs);
+
+/**
+ * resume_device_irqs - enable interrupt lines disabled by suspend_device_irqs()
+ *
+ * Enable all interrupt lines previously disabled by suspend_device_irqs() that
+ * have the IRQ_SUSPENDED flag set.
+ */
+void resume_device_irqs(void)
+{
+	struct irq_desc *desc;
+	int irq;
+
+	for_each_irq_desc(irq, desc) {
+		unsigned long flags;
+
+		if (!(desc->status & IRQ_SUSPENDED))
+			continue;
+
+		spin_lock_irqsave(&desc->lock, flags);
+		desc->status &= ~IRQ_SUSPENDED;
+		__enable_irq(desc, irq);
+		spin_unlock_irqrestore(&desc->lock, flags);
+	}
+}
+EXPORT_SYMBOL_GPL(resume_device_irqs);
+
+/**
+ * check_wakeup_irqs - check if any wake-up interrupts are pending
+ */
+int check_wakeup_irqs(void)
+{
+	struct irq_desc *desc;
+	int irq;
+
+	for_each_irq_desc(irq, desc)
+		if ((desc->status & IRQ_WAKEUP) && (desc->status & IRQ_PENDING))
+			return -EBUSY;
+
+	return 0;
+}
Index: linux-2.6/kernel/irq/Makefile
===================================================================
--- linux-2.6.orig/kernel/irq/Makefile
+++ linux-2.6/kernel/irq/Makefile
@@ -4,3 +4,4 @@ obj-$(CONFIG_GENERIC_IRQ_PROBE) += autop
 obj-$(CONFIG_PROC_FS) += proc.o
 obj-$(CONFIG_GENERIC_PENDING_IRQ) += migration.o
 obj-$(CONFIG_NUMA_MIGRATE_IRQ_DESC) += numa_migrate.o
+obj-$(CONFIG_PM_SLEEP) += pm.o
Index: linux-2.6/kernel/irq/manage.c
===================================================================
--- linux-2.6.orig/kernel/irq/manage.c
+++ linux-2.6/kernel/irq/manage.c
@@ -215,7 +215,7 @@ void disable_irq(unsigned int irq)
 }
 EXPORT_SYMBOL(disable_irq);
 
-static void __enable_irq(struct irq_desc *desc, unsigned int irq)
+void __enable_irq(struct irq_desc *desc, unsigned int irq)
 {
 	switch (desc->depth) {
 	case 0:
Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -23,6 +23,7 @@
 #include <linux/pm.h>
 #include <linux/resume-trace.h>
 #include <linux/rwsem.h>
+#include <linux/interrupt.h>
 
 #include "../base.h"
 #include "power.h"
@@ -305,7 +306,8 @@ static int resume_device_noirq(struct de
  *	Execute the appropriate "noirq resume" callback for all devices marked
  *	as DPM_OFF_IRQ.
  *
- *	Must be called with interrupts disabled and only one CPU running.
+ *	Must be called under dpm_list_mtx.  Device drivers should not receive
+ *	interrupts while it's being executed.
  */
 static void dpm_power_up(pm_message_t state)
 {
@@ -326,14 +328,13 @@ static void dpm_power_up(pm_message_t st
  *	device_power_up - Turn on all devices that need special attention.
  *	@state: PM transition of the system being carried out.
  *
- *	Power on system devices, then devices that required we shut them down
- *	with interrupts disabled.
- *
- *	Must be called with interrupts disabled.
+ *	Call the "early" resume handlers and enable device drivers to receive
+ *	interrupts.
  */
 void device_power_up(pm_message_t state)
 {
 	dpm_power_up(state);
+	resume_device_irqs();
 }
 EXPORT_SYMBOL_GPL(device_power_up);
 
@@ -558,16 +559,17 @@ static int suspend_device_noirq(struct d
  *	device_power_down - Shut down special devices.
  *	@state: PM transition of the system being carried out.
  *
- *	Power down devices that require interrupts to be disabled.
- *	Then power down system devices.
+ *	Prevent device drivers from receiving interrupts and call the "late"
+ *	suspend handlers.
  *
- *	Must be called with interrupts disabled and only one CPU running.
+ *	Must be called under dpm_list_mtx.
  */
 int device_power_down(pm_message_t state)
 {
 	struct device *dev;
 	int error = 0;
 
+	suspend_device_irqs();
 	list_for_each_entry_reverse(dev, &dpm_list, power.entry) {
 		error = suspend_device_noirq(dev, state);
 		if (error) {
@@ -577,7 +579,7 @@ int device_power_down(pm_message_t state
 		dev->power.status = DPM_OFF_IRQ;
 	}
 	if (error)
-		dpm_power_up(resume_event(state));
+		device_power_up(resume_event(state));
 	return error;
 }
 EXPORT_SYMBOL_GPL(device_power_down);
Index: linux-2.6/drivers/base/sys.c
===================================================================
--- linux-2.6.orig/drivers/base/sys.c
+++ linux-2.6/drivers/base/sys.c
@@ -22,6 +22,7 @@
 #include <linux/pm.h>
 #include <linux/device.h>
 #include <linux/mutex.h>
+#include <linux/interrupt.h>
 
 #include "base.h"
 
@@ -369,6 +370,13 @@ int sysdev_suspend(pm_message_t state)
 	struct sysdev_driver *drv, *err_drv;
 	int ret;
 
+	pr_debug("Checking wake-up interrupts\n");
+
+	/* Return error code if there are any wake-up interrupts pending */
+	ret = check_wakeup_irqs();
+	if (ret)
+		return ret;
+
 	pr_debug("Suspending System Devices\n");
 
 	list_for_each_entry_reverse(cls, &system_kset->list, kset.kobj.entry) {
Index: linux-2.6/kernel/irq/internals.h
===================================================================
--- linux-2.6.orig/kernel/irq/internals.h
+++ linux-2.6/kernel/irq/internals.h
@@ -12,6 +12,7 @@ extern void compat_irq_chip_set_default_
 
 extern int __irq_set_trigger(struct irq_desc *desc, unsigned int irq,
 		unsigned long flags);
+extern void __enable_irq(struct irq_desc *desc, unsigned int irq);
 
 extern struct lock_class_key irq_desc_lock_class;
 extern void init_kstat_irqs(struct irq_desc *desc, int cpu, int nr);
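
Note for readers of the kernel/irq/pm.c hunk above: the sketch below is a
userspace model of the depth counting that suspend_device_irqs() and
resume_device_irqs() rely on. It is not kernel code. Every model_* and
MODEL_* name is hypothetical, and the resume side only approximates what
__enable_irq() does (it leaves out the warning for an unbalanced enable).
It is meant to show why a line that a driver had already disabled stays
disabled after resume, and why timer lines are left alone.

```c
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
#define MODEL_IRQ_DISABLED  0x1u
#define MODEL_IRQ_SUSPENDED 0x2u

struct model_irq {
	unsigned int depth;   /* nesting count of disable requests */
	unsigned int status;  /* MODEL_IRQ_* flags */
	bool has_action;      /* a handler is installed on this line */
	bool is_timer;        /* the handler was registered as a timer */
};

/* Models the per-descriptor body of suspend_device_irqs(). */
static void model_suspend_irq(struct model_irq *d)
{
	/* Lines without a handler and timer lines are skipped. */
	if (!d->has_action || d->is_timer)
		return;

	/* Only the first disable request masks the line. */
	if (d->depth++ == 0)
		d->status |= MODEL_IRQ_DISABLED;

	d->status |= MODEL_IRQ_SUSPENDED;
}

/* Models resume_device_irqs(): undo only what the suspend side did. */
static void model_resume_irq(struct model_irq *d)
{
	if (!(d->status & MODEL_IRQ_SUSPENDED))
		return;

	d->status &= ~MODEL_IRQ_SUSPENDED;

	/* Drop one nesting level; unmask only when the count reaches zero. */
	if (d->depth > 0 && --d->depth == 0)
		d->status &= ~MODEL_IRQ_DISABLED;
}
```

A line whose driver called the equivalent of disable_irq() before suspend
starts with depth 1, goes to 2 during suspend and returns to 1 on resume, so
it stays masked until the driver itself enables it.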

^ permalink raw reply	[flat|nested] 373+ messages in thread

* [RFC][PATCH][2/8] PM: Change suspend code ordering
  2009-03-07 10:19 ` Rafael J. Wysocki
  2009-03-07 10:20     ` Rafael J. Wysocki
@ 2009-03-07 10:21   ` Rafael J. Wysocki
  2009-03-07 10:21   ` Rafael J. Wysocki
                     ` (14 subsequent siblings)
  16 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-07 10:21 UTC (permalink / raw)
  To: LKML
  Cc: Linus Torvalds, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner, Arve Hjønnevåg

From: Rafael J. Wysocki <rjw@sisk.pl>

Change the ordering of the suspend core code so that the platform
"prepare" callback is executed and the nonboot CPUs are disabled
after calling device drivers' "late suspend" methods.

This change will allow us to rework the PCI PM core so that the power
state of devices is changed in the "late" phase of suspend (and
analogously in the "early" phase of resume), which in turn will allow
us to avoid the race condition where a device using shared interrupts
is put into a low power state with interrupts enabled and then an
interrupt (for another device) comes in and confuses its driver.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 kernel/power/main.c |   38 ++++++++++++++++++++++----------------
 1 file changed, 22 insertions(+), 16 deletions(-)
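
Not part of the patch: a userspace sketch of the call order this change
gives suspend_enter(), with the same goto-based unwinding. The model_* names
are hypothetical and each step only records its name. The steps between
disabling and re-enabling interrupts are collapsed into a single "enter"
step, and the suspend_test() short-circuits are left out.

```c
#include <string.h>

/* Hypothetical trace buffer; each modelled step appends its name here. */
static char model_trace[256];

static void model_step(const char *name)
{
	strcat(model_trace, name);
	strcat(model_trace, " ");
}

/*
 * Models the reordered suspend_enter().  fail_prepare and fail_cpus
 * simulate errors from the platform "prepare" callback and from
 * disable_nonboot_cpus() respectively.
 */
static int model_suspend_enter(int fail_prepare, int fail_cpus)
{
	int error = 0;

	model_trace[0] = '\0';

	/* "Late" suspend callbacks now run first. */
	model_step("device_power_down");

	model_step("prepare");
	if (fail_prepare) {
		error = -1;
		goto power_up_devices;
	}

	model_step("disable_nonboot_cpus");
	if (fail_cpus) {
		error = -1;
		goto enable_cpus;
	}

	model_step("irqs_off");
	model_step("enter");
	model_step("irqs_on");

enable_cpus:
	model_step("enable_nonboot_cpus");
	model_step("finish");

power_up_devices:
	model_step("device_power_up");
	return error;
}
```

After a call, model_trace holds the steps in the order they ran, which makes
it easy to see that a failing "prepare" skips both the CPU and the "finish"
steps, while a failing disable_nonboot_cpus() still runs "finish".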

Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -297,6 +297,19 @@ static int suspend_enter(suspend_state_t
 		goto Done;
 	}
 
+	if (suspend_ops->prepare) {
+		error = suspend_ops->prepare();
+		if (error)
+			goto Power_up_devices;
+	}
+
+	if (suspend_test(TEST_PLATFORM))
+		goto Platform_finish;
+
+	error = disable_nonboot_cpus();
+	if (error || suspend_test(TEST_CPUS))
+		goto Enable_cpus;
+
 	arch_suspend_disable_irqs();
 	BUG_ON(!irqs_disabled());
 
@@ -310,6 +323,14 @@ static int suspend_enter(suspend_state_t
 	arch_suspend_enable_irqs();
 	BUG_ON(irqs_disabled());
 
+ Enable_cpus:
+	enable_nonboot_cpus();
+
+ Platform_finish:
+	if (suspend_ops->finish)
+		suspend_ops->finish();
+
+ Power_up_devices:
 	device_power_up(PMSG_RESUME);
 
  Done:
@@ -346,23 +367,8 @@ int suspend_devices_and_enter(suspend_st
 	if (suspend_test(TEST_DEVICES))
 		goto Recover_platform;
 
-	if (suspend_ops->prepare) {
-		error = suspend_ops->prepare();
-		if (error)
-			goto Resume_devices;
-	}
-
-	if (suspend_test(TEST_PLATFORM))
-		goto Finish;
+	suspend_enter(state);
 
-	error = disable_nonboot_cpus();
-	if (!error && !suspend_test(TEST_CPUS))
-		suspend_enter(state);
-
-	enable_nonboot_cpus();
- Finish:
-	if (suspend_ops->finish)
-		suspend_ops->finish();
  Resume_devices:
 	suspend_test_start();
 	device_resume(PMSG_RESUME);

^ permalink raw reply	[flat|nested] 373+ messages in thread

* [RFC][PATCH][3/8] PM: Change hibernation code ordering
  2009-03-07 10:19 ` Rafael J. Wysocki
                     ` (3 preceding siblings ...)
  2009-03-07 10:22   ` [RFC][PATCH][3/8] PM: Change hibernation " Rafael J. Wysocki
@ 2009-03-07 10:22   ` Rafael J. Wysocki
  2009-03-07 10:23   ` [RFC][PATCH][4/8] kexec: Change kexec jump " Rafael J. Wysocki
                     ` (11 subsequent siblings)
  16 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-07 10:22 UTC (permalink / raw)
  To: LKML
  Cc: Linus Torvalds, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner, Arve Hjønnevåg

From: Rafael J. Wysocki <rjw@sisk.pl>

Change the ordering of the hibernation core code so that the platform
"prepare" callbacks are executed and the nonboot CPUs are disabled
after calling device drivers' "late suspend" methods.

This change (along with the previous analogous change of the suspend
core code) will allow us to rework the PCI PM core so that the power
state of devices is changed in the "late" phase of suspend (and
analogously in the "early" phase of resume), which in turn will allow
us to avoid the race condition where a device using shared interrupts
is put into a low power state with interrupts enabled and then an
interrupt (for another device) comes in and confuses its driver.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 kernel/power/disk.c |  109 +++++++++++++++++++++++++++++-----------------------
 1 file changed, 61 insertions(+), 48 deletions(-)
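
Not part of the patch: a userspace sketch of the shared-interrupt race
described in the changelog above. All model_* names are hypothetical. Two
devices share one line; if the line is still enabled when one of them is
already in a low power state, an interrupt meant for the other one also runs
the first device's handler. With the line masked, as it would be after
suspend_device_irqs(), the interrupt only stays pending.

```c
#include <stdbool.h>

/* Hypothetical model of a device sitting on a shared interrupt line. */
struct model_dev {
	bool powered;   /* false once the device is in a low power state */
	int handled;    /* interrupts serviced while the device was powered */
	int confused;   /* handler invocations while the device was powered down */
};

struct model_line {
	bool masked;    /* line disabled at the chip level */
	bool pending;   /* an interrupt arrived while the line was masked */
	struct model_dev *sharers[2];
};

/* A handler that cannot cope with its device being powered down. */
static void model_handler(struct model_dev *dev)
{
	if (dev->powered)
		dev->handled++;
	else
		dev->confused++;
}

/* Raise the shared line: every sharer's handler runs unless it is masked. */
static void model_raise(struct model_line *line)
{
	int i;

	if (line->masked) {
		line->pending = true;
		return;
	}

	for (i = 0; i < 2; i++)
		model_handler(line->sharers[i]);
}
```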

Index: linux-2.6/kernel/power/disk.c
===================================================================
--- linux-2.6.orig/kernel/power/disk.c
+++ linux-2.6/kernel/power/disk.c
@@ -228,13 +228,22 @@ static int create_image(int platform_mod
 		goto Unlock;
 	}
 
+	error = platform_pre_snapshot(platform_mode);
+	if (error || hibernation_test(TEST_PLATFORM))
+		goto Platform_finish;
+
+	error = disable_nonboot_cpus();
+	if (error || hibernation_test(TEST_CPUS)
+	    || hibernation_testmode(HIBERNATION_TEST))
+		goto Enable_cpus;
+
 	local_irq_disable();
 
 	sysdev_suspend(PMSG_FREEZE);
 	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down, "
 			"aborting hibernation\n");
-		goto Power_up_devices;
+		goto Enable_irqs;
 	}
 
 	if (hibernation_test(TEST_CORE))
@@ -250,15 +259,22 @@ static int create_image(int platform_mod
 	restore_processor_state();
 	if (!in_suspend)
 		platform_leave(platform_mode);
+
  Power_up:
 	sysdev_resume();
 	/* NOTE:  device_power_up() is just a resume() for devices
 	 * that suspended with irqs off ... no overall powerup.
 	 */
 
- Power_up_devices:
+ Enable_irqs:
 	local_irq_enable();
 
+ Enable_cpus:
+	enable_nonboot_cpus();
+
+ Platform_finish:
+	platform_finish(platform_mode);
+
 	device_power_up(in_suspend ?
 		(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);
 
@@ -298,25 +314,9 @@ int hibernation_snapshot(int platform_mo
 	if (hibernation_test(TEST_DEVICES))
 		goto Recover_platform;
 
-	error = platform_pre_snapshot(platform_mode);
-	if (error || hibernation_test(TEST_PLATFORM))
-		goto Finish;
-
-	error = disable_nonboot_cpus();
-	if (!error) {
-		if (hibernation_test(TEST_CPUS))
-			goto Enable_cpus;
-
-		if (hibernation_testmode(HIBERNATION_TEST))
-			goto Enable_cpus;
+	error = create_image(platform_mode);
+	/* Control returns here after successful restore */
 
-		error = create_image(platform_mode);
-		/* Control returns here after successful restore */
-	}
- Enable_cpus:
-	enable_nonboot_cpus();
- Finish:
-	platform_finish(platform_mode);
  Resume_devices:
 	device_resume(in_suspend ?
 		(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);
@@ -338,7 +338,7 @@ int hibernation_snapshot(int platform_mo
  *	kernel.
  */
 
-static int resume_target_kernel(void)
+static int resume_target_kernel(bool platform_mode)
 {
 	int error;
 
@@ -351,9 +351,20 @@ static int resume_target_kernel(void)
 		goto Unlock;
 	}
 
+	error = platform_pre_restore(platform_mode);
+	if (error)
+		goto Cleanup;
+
+	error = disable_nonboot_cpus();
+	if (error)
+		goto Enable_cpus;
+
 	local_irq_disable();
 
-	sysdev_suspend(PMSG_QUIESCE);
+	error = sysdev_suspend(PMSG_QUIESCE);
+	if (error)
+		goto Enable_irqs;
+
 	/* We'll ignore saved state, but this gets preempt count (etc) right */
 	save_processor_state();
 	error = restore_highmem();
@@ -379,8 +390,15 @@ static int resume_target_kernel(void)
 
 	sysdev_resume();
 
+ Enable_irqs:
 	local_irq_enable();
 
+ Enable_cpus:
+	enable_nonboot_cpus();
+
+ Cleanup:
+	platform_restore_cleanup(platform_mode);
+
 	device_power_up(PMSG_RECOVER);
 
  Unlock:
@@ -405,19 +423,10 @@ int hibernation_restore(int platform_mod
 	pm_prepare_console();
 	suspend_console();
 	error = device_suspend(PMSG_QUIESCE);
-	if (error)
-		goto Finish;
-
-	error = platform_pre_restore(platform_mode);
 	if (!error) {
-		error = disable_nonboot_cpus();
-		if (!error)
-			error = resume_target_kernel();
-		enable_nonboot_cpus();
+		error = resume_target_kernel(platform_mode);
+		device_resume(PMSG_RECOVER);
 	}
-	platform_restore_cleanup(platform_mode);
-	device_resume(PMSG_RECOVER);
- Finish:
 	resume_console();
 	pm_restore_console();
 	return error;
@@ -453,34 +462,38 @@ int hibernation_platform_enter(void)
 		goto Resume_devices;
 	}
 
+	device_pm_lock();
+
+	error = device_power_down(PMSG_HIBERNATE);
+	if (error)
+		goto Unlock;
+
 	error = hibernation_ops->prepare();
 	if (error)
-		goto Resume_devices;
+		goto Platform_finish;
 
 	error = disable_nonboot_cpus();
 	if (error)
-		goto Finish;
-
-	device_pm_lock();
-
-	error = device_power_down(PMSG_HIBERNATE);
-	if (!error) {
-		local_irq_disable();
-		sysdev_suspend(PMSG_HIBERNATE);
-		hibernation_ops->enter();
-		/* We should never get here */
-		while (1);
-	}
+		goto Platform_finish;
 
-	device_pm_unlock();
+	local_irq_disable();
+	sysdev_suspend(PMSG_HIBERNATE);
+	hibernation_ops->enter();
+	/* We should never get here */
+	while (1);
 
 	/*
 	 * We don't need to reenable the nonboot CPUs or resume consoles, since
 	 * the system is going to be halted anyway.
 	 */
- Finish:
+ Platform_finish:
 	hibernation_ops->finish();
 
+	device_power_up(PMSG_RESTORE);
+
+ Unlock:
+	device_pm_unlock();
+
  Resume_devices:
 	entering_platform_hibernation = false;
 	device_resume(PMSG_RESTORE);

^ permalink raw reply	[flat|nested] 373+ messages in thread

* [RFC][PATCH][4/8] kexec: Change kexec jump code ordering
  2009-03-07 10:19 ` Rafael J. Wysocki
                     ` (5 preceding siblings ...)
  2009-03-07 10:23   ` [RFC][PATCH][4/8] kexec: Change kexec jump " Rafael J. Wysocki
@ 2009-03-07 10:23   ` Rafael J. Wysocki
  2009-03-07 10:24   ` [RFC][PATCH][5/8] PCI PM: Consistently use variable name "error" for pm call return values Rafael J. Wysocki
                     ` (9 subsequent siblings)
  16 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-07 10:23 UTC (permalink / raw)
  To: LKML
  Cc: Linus Torvalds, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner, Arve Hjønnevåg

From: Rafael J. Wysocki <rjw@sisk.pl>

Change the ordering of the kexec jump code so that the nonboot CPUs
are disabled after calling device drivers' "late suspend" methods.

This change reflects the recent modifications of the power management
code that is also used by kexec jump.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 kernel/kexec.c |   19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

Index: linux-2.6/kernel/kexec.c
===================================================================
--- linux-2.6.orig/kernel/kexec.c
+++ linux-2.6/kernel/kexec.c
@@ -1450,9 +1450,6 @@ int kernel_kexec(void)
 		error = device_suspend(PMSG_FREEZE);
 		if (error)
 			goto Resume_console;
-		error = disable_nonboot_cpus();
-		if (error)
-			goto Resume_devices;
 		device_pm_lock();
 		/* At this point, device_suspend() has been called,
 		 * but *not* device_power_down(). We *must*
@@ -1463,13 +1460,15 @@ int kernel_kexec(void)
 		 */
 		error = device_power_down(PMSG_FREEZE);
 		if (error)
-			goto Unlock_pm;
-
+			goto Resume_devices;
+		error = disable_nonboot_cpus();
+		if (error)
+			goto Enable_cpus;
 		local_irq_disable();
 		/* Suspend system devices */
 		error = sysdev_suspend(PMSG_FREEZE);
 		if (error)
-			goto Power_up_devices;
+			goto Enable_irqs;
 	} else
 #endif
 	{
@@ -1483,13 +1482,13 @@ int kernel_kexec(void)
 #ifdef CONFIG_KEXEC_JUMP
 	if (kexec_image->preserve_context) {
 		sysdev_resume();
- Power_up_devices:
+ Enable_irqs:
 		local_irq_enable();
-		device_power_up(PMSG_RESTORE);
- Unlock_pm:
-		device_pm_unlock();
+ Enable_cpus:
 		enable_nonboot_cpus();
+		device_power_up(PMSG_RESTORE);
  Resume_devices:
+		device_pm_unlock();
 		device_resume(PMSG_RESTORE);
  Resume_console:
 		resume_console();

^ permalink raw reply	[flat|nested] 373+ messages in thread

* [RFC][PATCH][5/8] PCI PM: Consistently use variable name "error" for pm call return values
  2009-03-07 10:19 ` Rafael J. Wysocki
                     ` (6 preceding siblings ...)
  2009-03-07 10:23   ` Rafael J. Wysocki
@ 2009-03-07 10:24   ` Rafael J. Wysocki
  2009-03-07 10:24   ` Rafael J. Wysocki
                     ` (8 subsequent siblings)
  16 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-07 10:24 UTC (permalink / raw)
  To: LKML
  Cc: Linus Torvalds, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner, Arve Hjønnevåg,
	Frans Pop

From: Frans Pop <elendil@planet.nl>

I noticed two functions use a variable "i" to store the return value of PM
function calls while the rest of the file uses "error". As "i" normally
indicates a counter of some sort it seems better to keep this consistent.

Signed-off-by: Frans Pop <elendil@planet.nl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 drivers/pci/pci-driver.c |   20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

Index: linux-2.6/drivers/pci/pci-driver.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci-driver.c
+++ linux-2.6/drivers/pci/pci-driver.c
@@ -352,17 +352,17 @@ static int pci_legacy_suspend(struct dev
 {
 	struct pci_dev * pci_dev = to_pci_dev(dev);
 	struct pci_driver * drv = pci_dev->driver;
-	int i = 0;
+	int error = 0;
 
 	if (drv && drv->suspend) {
 		pci_power_t prev = pci_dev->current_state;
 
 		pci_dev->state_saved = false;
 
-		i = drv->suspend(pci_dev, state);
-		suspend_report_result(drv->suspend, i);
-		if (i)
-			return i;
+		error = drv->suspend(pci_dev, state);
+		suspend_report_result(drv->suspend, error);
+		if (error)
+			return error;
 
 		if (pci_dev->state_saved)
 			goto Fixup;
@@ -385,20 +385,20 @@ static int pci_legacy_suspend(struct dev
  Fixup:
 	pci_fixup_device(pci_fixup_suspend, pci_dev);
 
-	return i;
+	return error;
 }
 
 static int pci_legacy_suspend_late(struct device *dev, pm_message_t state)
 {
 	struct pci_dev * pci_dev = to_pci_dev(dev);
 	struct pci_driver * drv = pci_dev->driver;
-	int i = 0;
+	int error = 0;
 
 	if (drv && drv->suspend_late) {
-		i = drv->suspend_late(pci_dev, state);
-		suspend_report_result(drv->suspend_late, i);
+		error = drv->suspend_late(pci_dev, state);
+		suspend_report_result(drv->suspend_late, error);
 	}
-	return i;
+	return error;
 }
 
 static int pci_legacy_resume_early(struct device *dev)

^ permalink raw reply	[flat|nested] 373+ messages in thread

* [RFC][PATCH][6/8] PCI PM: Use pci_set_power_state during early resume
  2009-03-07 10:19 ` Rafael J. Wysocki
                     ` (8 preceding siblings ...)
  2009-03-07 10:24   ` Rafael J. Wysocki
@ 2009-03-07 10:25   ` Rafael J. Wysocki
  2009-03-07 10:25   ` Rafael J. Wysocki
                     ` (6 subsequent siblings)
  16 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-07 10:25 UTC (permalink / raw)
  To: LKML
  Cc: Linus Torvalds, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner, Arve Hjønnevåg

From: Rafael J. Wysocki <rjw@sisk.pl>

Now that timer interrupts may be enabled during the early phase of
resuming devices, we can use the generic pci_set_power_state() to put
PCI devices into D0 at that time.  The platform-specific PM code then
gets a chance to handle devices that don't implement native PCI PM or
that require additional, platform-specific operations to power them
up.  Doing so also simplifies the code quite a bit.
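
As a rough user-space illustration of the simplified flow (a sketch
only): struct fake_pci_dev and the fake_*() helpers below are
hypothetical stand-ins, and the 10 ms and 200 us delays are assumed
default values for the D3hot and D2 transitions, not taken from this
patch.  The point is that the restore path no longer open-codes the
delays but lets the generic state-setting routine apply them.

```c
#include <assert.h>
#include <stdbool.h>

enum pm_state { PM_D0, PM_D1, PM_D2, PM_D3HOT, PM_UNKNOWN };

/* Hypothetical stand-in for struct pci_dev; only what the sketch needs. */
struct fake_pci_dev {
	enum pm_state hw_state;		/* state the "hardware" is really in */
	enum pm_state current_state;	/* state the core believes it is in */
	bool state_saved;
	unsigned int waited_us;		/* total transition delay applied */
	int restores;			/* times the saved config was written back */
};

/* Delay after a transition; 10 ms and 200 us are assumed default values. */
static unsigned int transition_delay_us(enum pm_state from, enum pm_state to)
{
	if (from == PM_D3HOT || to == PM_D3HOT)
		return 10000;
	if (from == PM_D2 || to == PM_D2)
		return 200;
	return 0;
}

/* Models the generic path: change the state, then always wait as required. */
static int fake_set_power_state(struct fake_pci_dev *dev, enum pm_state state)
{
	assert(state != PM_UNKNOWN);
	dev->waited_us += transition_delay_us(dev->current_state, state);
	dev->hw_state = state;
	dev->current_state = state;
	return 0;
}

/* Same shape as the simplified pci_restore_standard_config() in this patch. */
int fake_restore_standard_config(struct fake_pci_dev *dev)
{
	dev->current_state = dev->hw_state;	/* re-read the real state */

	if (dev->current_state != PM_D0) {
		int error = fake_set_power_state(dev, PM_D0);
		if (error)
			return error;
	}

	if (dev->state_saved)
		dev->restores++;
	return 0;
}
```

A device found in D3hot is brought to D0 with the full delay and then has
its saved state restored; a device already in D0 skips the transition
entirely.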

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 drivers/pci/pci.c |   48 +++++++++---------------------------------------
 1 file changed, 9 insertions(+), 39 deletions(-)

Index: linux-2.6/drivers/pci/pci.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -426,7 +426,6 @@ static inline int platform_pci_sleep_wak
  *                           given PCI device
  * @dev: PCI device to handle.
  * @state: PCI power state (D0, D1, D2, D3hot) to put the device into.
- * @wait: If 'true', wait for the device to change its power state
  *
  * RETURN VALUE:
  * -EINVAL if the requested state is invalid.
@@ -435,8 +434,7 @@ static inline int platform_pci_sleep_wak
  * 0 if device already is in the requested state.
  * 0 if device's power state has been successfully changed.
  */
-static int
-pci_raw_set_power_state(struct pci_dev *dev, pci_power_t state, bool wait)
+static int pci_raw_set_power_state(struct pci_dev *dev, pci_power_t state)
 {
 	u16 pmcsr;
 	bool need_restore = false;
@@ -481,10 +479,8 @@ pci_raw_set_power_state(struct pci_dev *
 		break;
 	case PCI_UNKNOWN: /* Boot-up */
 		if ((pmcsr & PCI_PM_CTRL_STATE_MASK) == PCI_D3hot
-		 && !(pmcsr & PCI_PM_CTRL_NO_SOFT_RESET)) {
+		 && !(pmcsr & PCI_PM_CTRL_NO_SOFT_RESET))
 			need_restore = true;
-			wait = true;
-		}
 		/* Fall-through: force to D0 */
 	default:
 		pmcsr = 0;
@@ -494,9 +490,6 @@ pci_raw_set_power_state(struct pci_dev *
 	/* enter specified state */
 	pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr);
 
-	if (!wait)
-		return 0;
-
 	/* Mandatory power management transition delays */
 	/* see PCI PM 1.1 5.6.1 table 18 */
 	if (state == PCI_D3hot || dev->current_state == PCI_D3hot)
@@ -521,7 +514,7 @@ pci_raw_set_power_state(struct pci_dev *
 	if (need_restore)
 		pci_restore_bars(dev);
 
-	if (wait && dev->bus->self)
+	if (dev->bus->self)
 		pcie_aspm_pm_state_change(dev->bus->self);
 
 	return 0;
@@ -591,7 +584,7 @@ int pci_set_power_state(struct pci_dev *
 	if (state == PCI_D3hot && (dev->dev_flags & PCI_DEV_FLAGS_NO_D3))
 		return 0;
 
-	error = pci_raw_set_power_state(dev, state, true);
+	error = pci_raw_set_power_state(dev, state);
 
 	if (state > PCI_D0 && platform_pci_power_manageable(dev)) {
 		/* Allow the platform to finalize the transition */
@@ -1390,37 +1383,14 @@ void pci_allocate_cap_save_buffers(struc
  */
 int pci_restore_standard_config(struct pci_dev *dev)
 {
-	pci_power_t prev_state;
-	int error;
-
-	pci_update_current_state(dev, PCI_D0);
-
-	prev_state = dev->current_state;
-	if (prev_state == PCI_D0)
-		goto Restore;
-
-	error = pci_raw_set_power_state(dev, PCI_D0, false);
-	if (error)
-		return error;
+	pci_update_current_state(dev, PCI_UNKNOWN);
 
-	/*
-	 * This assumes that we won't get a bus in B2 or B3 from the BIOS, but
-	 * we've made this assumption forever and it appears to be universally
-	 * satisfied.
-	 */
-	switch(prev_state) {
-	case PCI_D3cold:
-	case PCI_D3hot:
-		mdelay(pci_pm_d3_delay);
-		break;
-	case PCI_D2:
-		udelay(PCI_PM_D2_DELAY);
-		break;
+	if (dev->current_state != PCI_D0) {
+		int error = pci_set_power_state(dev, PCI_D0);
+		if (error)
+			return error;
 	}
 
-	pci_update_current_state(dev, PCI_D0);
-
- Restore:
 	return dev->state_saved ? pci_restore_state(dev) : 0;
 }
 

^ permalink raw reply	[flat|nested] 373+ messages in thread

* [RFC][PATCH][7/8] PCI PM: Move pci_restore_standard_config to pci-driver.c
  2009-03-07 10:19 ` Rafael J. Wysocki
                     ` (10 preceding siblings ...)
  2009-03-07 10:25   ` Rafael J. Wysocki
@ 2009-03-07 10:26   ` Rafael J. Wysocki
  2009-03-07 10:26   ` Rafael J. Wysocki
                     ` (4 subsequent siblings)
  16 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-07 10:26 UTC (permalink / raw)
  To: LKML
  Cc: Linus Torvalds, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner, Arve Hjønnevåg

From: Rafael J. Wysocki <rjw@sisk.pl>

Move pci_restore_standard_config() from pci.c to pci-driver.c and
make it static.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 drivers/pci/pci-driver.c |   17 +++++++++++++++++
 drivers/pci/pci.c        |   21 ---------------------
 drivers/pci/pci.h        |    1 -
 3 files changed, 17 insertions(+), 22 deletions(-)

Index: linux-2.6/drivers/pci/pci-driver.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci-driver.c
+++ linux-2.6/drivers/pci/pci-driver.c
@@ -423,6 +423,23 @@ static int pci_legacy_resume(struct devi
 
 /* Auxiliary functions used by the new power management framework */
 
+/**
+ * pci_restore_standard_config - restore standard config registers of PCI device
+ * @pci_dev: PCI device to handle
+ */
+static int pci_restore_standard_config(struct pci_dev *pci_dev)
+{
+	pci_update_current_state(pci_dev, PCI_UNKNOWN);
+
+	if (pci_dev->current_state != PCI_D0) {
+		int error = pci_set_power_state(pci_dev, PCI_D0);
+		if (error)
+			return error;
+	}
+
+	return pci_dev->state_saved ? pci_restore_state(pci_dev) : 0;
+}
+
 static void pci_pm_default_resume_noirq(struct pci_dev *pci_dev)
 {
 	pci_restore_standard_config(pci_dev);
Index: linux-2.6/drivers/pci/pci.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -1374,27 +1374,6 @@ void pci_allocate_cap_save_buffers(struc
 }
 
 /**
- * pci_restore_standard_config - restore standard config registers of PCI device
- * @dev: PCI device to handle
- *
- * This function assumes that the device's configuration space is accessible.
- * If the device needs to be powered up, the function will wait for it to
- * change the state.
- */
-int pci_restore_standard_config(struct pci_dev *dev)
-{
-	pci_update_current_state(dev, PCI_UNKNOWN);
-
-	if (dev->current_state != PCI_D0) {
-		int error = pci_set_power_state(dev, PCI_D0);
-		if (error)
-			return error;
-	}
-
-	return dev->state_saved ? pci_restore_state(dev) : 0;
-}
-
-/**
  * pci_enable_ari - enable ARI forwarding if hardware support it
  * @dev: the PCI device
  */
Index: linux-2.6/drivers/pci/pci.h
===================================================================
--- linux-2.6.orig/drivers/pci/pci.h
+++ linux-2.6/drivers/pci/pci.h
@@ -49,7 +49,6 @@ extern void pci_disable_enabled_device(s
 extern void pci_pm_init(struct pci_dev *dev);
 extern void platform_pci_wakeup_init(struct pci_dev *dev);
 extern void pci_allocate_cap_save_buffers(struct pci_dev *dev);
-extern int pci_restore_standard_config(struct pci_dev *dev);
 
 static inline bool pci_is_bridge(struct pci_dev *pci_dev)
 {

^ permalink raw reply	[flat|nested] 373+ messages in thread

* [RFC][PATCH][8/8] PCI PM: Put devices into low power states during late suspend
  2009-03-07 10:19 ` Rafael J. Wysocki
                     ` (13 preceding siblings ...)
  2009-03-07 10:27   ` [RFC][PATCH][8/8] PCI PM: Put devices into low power states during late suspend Rafael J. Wysocki
@ 2009-03-07 10:27   ` Rafael J. Wysocki
  2009-03-08 19:28   ` [RFC][PATCH][0/8] PM: Rework suspend-resume ordering to avoid problems with shared interrupts Frans Pop
  2009-03-08 19:28   ` Frans Pop
  16 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-07 10:27 UTC (permalink / raw)
  To: LKML
  Cc: Linus Torvalds, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, pm list, Len Brown,
	Jesse Barnes, Thomas Gleixner, Arve Hjønnevåg

From: Rafael J. Wysocki <rjw@sisk.pl>

Now that timer interrupts may be enabled during the late phase of
suspending devices, we can use the generic pci_set_power_state() to
put PCI devices into low power states at that time.  We can also use
some related platform callbacks, such as the ones preparing devices
for wake-up, during late suspend.

Doing this will allow us to avoid the race condition where a device
using shared interrupts is put into a low power state with interrupts
enabled and then an interrupt (for another device) comes in and
confuses its driver.  At the same time, devices that don't support
the native PCI PM or that require some additional, platform-specific
operations to be carried out to put them into low power states will
be handled as appropriate.
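
The race can be pictured with a small user-space model (a sketch only,
with hypothetical names throughout).  It assumes that register reads
from a powered-down device return all ones and that device interrupt
handlers are not invoked during the late suspend phase; neither
assumption is spelled out in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device sharing an interrupt line with another device. */
struct fake_dev {
	bool powered;		/* false once the device is in a low power state */
	uint32_t status;	/* pending-event bits while powered */
	int bogus_events;	/* events "handled" while powered down */
};

/* Assumption: register reads from a powered-down device return all ones. */
static uint32_t read_status(const struct fake_dev *dev)
{
	return dev->powered ? dev->status : UINT32_MAX;
}

/* A naive handler: any non-zero status is taken as "my device interrupted". */
static bool naive_handler(struct fake_dev *dev)
{
	if (read_status(dev) == 0)
		return false;
	if (!dev->powered)
		dev->bogus_events++;
	return true;
}

/* Deliver one interrupt on the shared line unless device IRQs are disabled. */
void shared_irq_fires(struct fake_dev *dev, bool device_irqs_enabled)
{
	assert(dev != NULL);
	if (device_irqs_enabled)
		naive_handler(dev);
}
```

With the device powered down while its handler can still run, an
interrupt raised by the other device on the line is misread as a real
event.  Powering it down only in the late phase means the handler is
never called in that state.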

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 drivers/pci/pci-driver.c |  129 ++++++++++++++++++++++++++++-------------------
 1 file changed, 77 insertions(+), 52 deletions(-)

Index: linux-2.6/drivers/pci/pci-driver.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci-driver.c
+++ linux-2.6/drivers/pci/pci-driver.c
@@ -352,53 +352,60 @@ static int pci_legacy_suspend(struct dev
 {
 	struct pci_dev * pci_dev = to_pci_dev(dev);
 	struct pci_driver * drv = pci_dev->driver;
-	int error = 0;
+
+	pci_dev->state_saved = false;
 
 	if (drv && drv->suspend) {
 		pci_power_t prev = pci_dev->current_state;
-
-		pci_dev->state_saved = false;
+		int error;
 
 		error = drv->suspend(pci_dev, state);
 		suspend_report_result(drv->suspend, error);
 		if (error)
 			return error;
 
-		if (pci_dev->state_saved)
-			goto Fixup;
-
-		if (pci_dev->current_state != PCI_D0
+		if (!pci_dev->state_saved && pci_dev->current_state != PCI_D0
 		    && pci_dev->current_state != PCI_UNKNOWN) {
 			WARN_ONCE(pci_dev->current_state != prev,
 				"PCI PM: Device state not saved by %pF\n",
 				drv->suspend);
-			goto Fixup;
 		}
 	}
 
-	pci_save_state(pci_dev);
-	/*
-	 * This is for compatibility with existing code with legacy PM support.
-	 */
-	pci_pm_set_unknown_state(pci_dev);
-
- Fixup:
 	pci_fixup_device(pci_fixup_suspend, pci_dev);
 
-	return error;
+	return 0;
 }
 
 static int pci_legacy_suspend_late(struct device *dev, pm_message_t state)
 {
 	struct pci_dev * pci_dev = to_pci_dev(dev);
 	struct pci_driver * drv = pci_dev->driver;
-	int error = 0;
 
 	if (drv && drv->suspend_late) {
+		pci_power_t prev = pci_dev->current_state;
+		int error;
+
 		error = drv->suspend_late(pci_dev, state);
 		suspend_report_result(drv->suspend_late, error);
+		if (error)
+			return error;
+
+		if (!pci_dev->state_saved && pci_dev->current_state != PCI_D0
+		    && pci_dev->current_state != PCI_UNKNOWN) {
+			WARN_ONCE(pci_dev->current_state != prev,
+				"PCI PM: Device state not saved by %pF\n",
+				drv->suspend_late);
+			return 0;
+		}
 	}
-	return error;
+
+	if (!pci_dev->state_saved)
+		pci_save_state(pci_dev);
+
+	pci_pm_set_unknown_state(pci_dev);
+
+	return 0;
 }
 
 static int pci_legacy_resume_early(struct device *dev)
@@ -460,7 +467,6 @@ static void pci_pm_default_suspend(struc
 	/* Disable non-bridge devices without PM support */
 	if (!pci_is_bridge(pci_dev))
 		pci_disable_enabled_device(pci_dev);
-	pci_save_state(pci_dev);
 }
 
 static bool pci_has_legacy_pm_support(struct pci_dev *pci_dev)
@@ -526,24 +532,14 @@ static int pci_pm_suspend(struct device 
 		if (error)
 			return error;
 
-		if (pci_dev->state_saved)
-			goto Fixup;
-
-		if (pci_dev->current_state != PCI_D0
+		if (!pci_dev->state_saved && pci_dev->current_state != PCI_D0
 		    && pci_dev->current_state != PCI_UNKNOWN) {
 			WARN_ONCE(pci_dev->current_state != prev,
 				"PCI PM: State of device not saved by %pF\n",
 				pm->suspend);
-			goto Fixup;
 		}
 	}
 
-	if (!pci_dev->state_saved) {
-		pci_save_state(pci_dev);
-		if (!pci_is_bridge(pci_dev))
-			pci_prepare_to_sleep(pci_dev);
-	}
-
  Fixup:
 	pci_fixup_device(pci_fixup_suspend, pci_dev);
 
@@ -553,21 +549,41 @@ static int pci_pm_suspend(struct device 
 static int pci_pm_suspend_noirq(struct device *dev)
 {
 	struct pci_dev *pci_dev = to_pci_dev(dev);
-	struct device_driver *drv = dev->driver;
-	int error = 0;
+	struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
 
 	if (pci_has_legacy_pm_support(pci_dev))
 		return pci_legacy_suspend_late(dev, PMSG_SUSPEND);
 
-	if (drv && drv->pm && drv->pm->suspend_noirq) {
-		error = drv->pm->suspend_noirq(dev);
-		suspend_report_result(drv->pm->suspend_noirq, error);
+	if (!pm)
+		return 0;
+
+	if (pm->suspend_noirq) {
+		pci_power_t prev = pci_dev->current_state;
+		int error;
+
+		error = pm->suspend_noirq(dev);
+		suspend_report_result(pm->suspend_noirq, error);
+		if (error)
+			return error;
+
+		if (!pci_dev->state_saved && pci_dev->current_state != PCI_D0
+		    && pci_dev->current_state != PCI_UNKNOWN) {
+			WARN_ONCE(pci_dev->current_state != prev,
+				"PCI PM: State of device not saved by %pF\n",
+				pm->suspend_noirq);
+			return 0;
+		}
 	}
 
-	if (!error)
-		pci_pm_set_unknown_state(pci_dev);
+	if (!pci_dev->state_saved) {
+		pci_save_state(pci_dev);
+		if (!pci_is_bridge(pci_dev))
+			pci_prepare_to_sleep(pci_dev);
+	}
 
-	return error;
+	pci_pm_set_unknown_state(pci_dev);
+
+	return 0;
 }
 
 static int pci_pm_resume_noirq(struct device *dev)
@@ -650,9 +666,6 @@ static int pci_pm_freeze(struct device *
 			return error;
 	}
 
-	if (!pci_dev->state_saved)
-		pci_save_state(pci_dev);
-
 	return 0;
 }
 
@@ -660,20 +673,25 @@ static int pci_pm_freeze_noirq(struct de
 {
 	struct pci_dev *pci_dev = to_pci_dev(dev);
 	struct device_driver *drv = dev->driver;
-	int error = 0;
 
 	if (pci_has_legacy_pm_support(pci_dev))
 		return pci_legacy_suspend_late(dev, PMSG_FREEZE);
 
 	if (drv && drv->pm && drv->pm->freeze_noirq) {
+		int error;
+
 		error = drv->pm->freeze_noirq(dev);
 		suspend_report_result(drv->pm->freeze_noirq, error);
+		if (error)
+			return error;
 	}
 
-	if (!error)
-		pci_pm_set_unknown_state(pci_dev);
+	if (!pci_dev->state_saved)
+		pci_save_state(pci_dev);
 
-	return error;
+	pci_pm_set_unknown_state(pci_dev);
+
+	return 0;
 }
 
 static int pci_pm_thaw_noirq(struct device *dev)
@@ -716,7 +734,6 @@ static int pci_pm_poweroff(struct device
 {
 	struct pci_dev *pci_dev = to_pci_dev(dev);
 	struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
-	int error = 0;
 
 	if (pci_has_legacy_pm_support(pci_dev))
 		return pci_legacy_suspend(dev, PMSG_HIBERNATE);
@@ -729,33 +746,41 @@ static int pci_pm_poweroff(struct device
 	pci_dev->state_saved = false;
 
 	if (pm->poweroff) {
+		int error;
+
 		error = pm->poweroff(dev);
 		suspend_report_result(pm->poweroff, error);
+		if (error)
+			return error;
 	}
 
-	if (!pci_dev->state_saved && !pci_is_bridge(pci_dev))
-		pci_prepare_to_sleep(pci_dev);
-
  Fixup:
 	pci_fixup_device(pci_fixup_suspend, pci_dev);
 
-	return error;
+	return 0;
 }
 
 static int pci_pm_poweroff_noirq(struct device *dev)
 {
+	struct pci_dev *pci_dev = to_pci_dev(dev);
 	struct device_driver *drv = dev->driver;
-	int error = 0;
 
 	if (pci_has_legacy_pm_support(to_pci_dev(dev)))
 		return pci_legacy_suspend_late(dev, PMSG_HIBERNATE);
 
 	if (drv && drv->pm && drv->pm->poweroff_noirq) {
+		int error;
+
 		error = drv->pm->poweroff_noirq(dev);
 		suspend_report_result(drv->pm->poweroff_noirq, error);
+		if (error)
+			return error;
 	}
 
-	return error;
+	if (!pci_dev->state_saved && !pci_is_bridge(pci_dev))
+		pci_prepare_to_sleep(pci_dev);
+
+	return 0;
 }
 
 static int pci_pm_restore_noirq(struct device *dev)

^ permalink raw reply	[flat|nested] 373+ messages in thread

* [RFC][PATCH][8/8] PCI PM: Put devices into low power states during late suspend
  2009-03-07 10:19 ` Rafael J. Wysocki
                     ` (12 preceding siblings ...)
  2009-03-07 10:26   ` Rafael J. Wysocki
@ 2009-03-07 10:27   ` Rafael J. Wysocki
  2009-03-07 10:27   ` Rafael J. Wysocki
                     ` (2 subsequent siblings)
  16 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-07 10:27 UTC (permalink / raw)
  To: LKML
  Cc: Arve, Jeremy Fitzhardinge, Jesse Barnes, Thomas Gleixner,
	Eric W. Biederman, Ingo Molnar, Linus Torvalds, pm list

From: Rafael J. Wysocki <rjw@sisk.pl>

Now that timer interrupts are allowed to be enabled during the late
phase of suspending devices, we can use the generic
pci_set_power_state() to put PCI devices into low power states at
that time.  We can also use some related platform callbacks, like the
ones preparing devices for wake-up, during the late suspend.

Doing this will allow us to avoid the race condition where a device
using shared interrupts is put into a low power state with interrupts
enabled and then an interrupt (for another device) comes in and
confuses its driver.  At the same time, devices that don't support
the native PCI PM or that require some additional, platform-specific
operations to be carried out to put them into low power states will
be handled as appropriate.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 drivers/pci/pci-driver.c |  129 ++++++++++++++++++++++++++++-------------------
 1 file changed, 77 insertions(+), 52 deletions(-)

Index: linux-2.6/drivers/pci/pci-driver.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci-driver.c
+++ linux-2.6/drivers/pci/pci-driver.c
@@ -352,53 +352,60 @@ static int pci_legacy_suspend(struct dev
 {
 	struct pci_dev * pci_dev = to_pci_dev(dev);
 	struct pci_driver * drv = pci_dev->driver;
-	int error = 0;
+
+	pci_dev->state_saved = false;
 
 	if (drv && drv->suspend) {
 		pci_power_t prev = pci_dev->current_state;
-
-		pci_dev->state_saved = false;
+		int error;
 
 		error = drv->suspend(pci_dev, state);
 		suspend_report_result(drv->suspend, error);
 		if (error)
 			return error;
 
-		if (pci_dev->state_saved)
-			goto Fixup;
-
-		if (pci_dev->current_state != PCI_D0
+		if (!pci_dev->state_saved && pci_dev->current_state != PCI_D0
 		    && pci_dev->current_state != PCI_UNKNOWN) {
 			WARN_ONCE(pci_dev->current_state != prev,
 				"PCI PM: Device state not saved by %pF\n",
 				drv->suspend);
-			goto Fixup;
 		}
 	}
 
-	pci_save_state(pci_dev);
-	/*
-	 * This is for compatibility with existing code with legacy PM support.
-	 */
-	pci_pm_set_unknown_state(pci_dev);
-
- Fixup:
 	pci_fixup_device(pci_fixup_suspend, pci_dev);
 
-	return error;
+	return 0;
 }
 
 static int pci_legacy_suspend_late(struct device *dev, pm_message_t state)
 {
 	struct pci_dev * pci_dev = to_pci_dev(dev);
 	struct pci_driver * drv = pci_dev->driver;
-	int error = 0;
 
 	if (drv && drv->suspend_late) {
+		pci_power_t prev = pci_dev->current_state;
+		int error;
+
 		error = drv->suspend_late(pci_dev, state);
 		suspend_report_result(drv->suspend_late, error);
+		if (error)
+			return error;
+
+		if (!pci_dev->state_saved && pci_dev->current_state != PCI_D0
+		    && pci_dev->current_state != PCI_UNKNOWN) {
+			WARN_ONCE(pci_dev->current_state != prev,
+				"PCI PM: Device state not saved by %pF\n",
+				drv->suspend_late);
+			return 0;
+		}
 	}
-	return error;
+
+	if (!pci_dev->state_saved)
+		pci_save_state(pci_dev);
+
+	pci_pm_set_unknown_state(pci_dev);
+
+	return 0;
 }
 
 static int pci_legacy_resume_early(struct device *dev)
@@ -460,7 +467,6 @@ static void pci_pm_default_suspend(struc
 	/* Disable non-bridge devices without PM support */
 	if (!pci_is_bridge(pci_dev))
 		pci_disable_enabled_device(pci_dev);
-	pci_save_state(pci_dev);
 }
 
 static bool pci_has_legacy_pm_support(struct pci_dev *pci_dev)
@@ -526,24 +532,14 @@ static int pci_pm_suspend(struct device 
 		if (error)
 			return error;
 
-		if (pci_dev->state_saved)
-			goto Fixup;
-
-		if (pci_dev->current_state != PCI_D0
+		if (!pci_dev->state_saved && pci_dev->current_state != PCI_D0
 		    && pci_dev->current_state != PCI_UNKNOWN) {
 			WARN_ONCE(pci_dev->current_state != prev,
 				"PCI PM: State of device not saved by %pF\n",
 				pm->suspend);
-			goto Fixup;
 		}
 	}
 
-	if (!pci_dev->state_saved) {
-		pci_save_state(pci_dev);
-		if (!pci_is_bridge(pci_dev))
-			pci_prepare_to_sleep(pci_dev);
-	}
-
  Fixup:
 	pci_fixup_device(pci_fixup_suspend, pci_dev);
 
@@ -553,21 +549,41 @@ static int pci_pm_suspend(struct device 
 static int pci_pm_suspend_noirq(struct device *dev)
 {
 	struct pci_dev *pci_dev = to_pci_dev(dev);
-	struct device_driver *drv = dev->driver;
-	int error = 0;
+	struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
 
 	if (pci_has_legacy_pm_support(pci_dev))
 		return pci_legacy_suspend_late(dev, PMSG_SUSPEND);
 
-	if (drv && drv->pm && drv->pm->suspend_noirq) {
-		error = drv->pm->suspend_noirq(dev);
-		suspend_report_result(drv->pm->suspend_noirq, error);
+	if (!pm)
+		return 0;
+
+	if (pm->suspend_noirq) {
+		pci_power_t prev = pci_dev->current_state;
+		int error;
+
+		error = pm->suspend_noirq(dev);
+		suspend_report_result(pm->suspend_noirq, error);
+		if (error)
+			return error;
+
+		if (!pci_dev->state_saved && pci_dev->current_state != PCI_D0
+		    && pci_dev->current_state != PCI_UNKNOWN) {
+			WARN_ONCE(pci_dev->current_state != prev,
+				"PCI PM: State of device not saved by %pF\n",
+				pm->suspend_noirq);
+			return 0;
+		}
 	}
 
-	if (!error)
-		pci_pm_set_unknown_state(pci_dev);
+	if (!pci_dev->state_saved) {
+		pci_save_state(pci_dev);
+		if (!pci_is_bridge(pci_dev))
+			pci_prepare_to_sleep(pci_dev);
+	}
 
-	return error;
+	pci_pm_set_unknown_state(pci_dev);
+
+	return 0;
 }
 
 static int pci_pm_resume_noirq(struct device *dev)
@@ -650,9 +666,6 @@ static int pci_pm_freeze(struct device *
 			return error;
 	}
 
-	if (!pci_dev->state_saved)
-		pci_save_state(pci_dev);
-
 	return 0;
 }
 
@@ -660,20 +673,25 @@ static int pci_pm_freeze_noirq(struct de
 {
 	struct pci_dev *pci_dev = to_pci_dev(dev);
 	struct device_driver *drv = dev->driver;
-	int error = 0;
 
 	if (pci_has_legacy_pm_support(pci_dev))
 		return pci_legacy_suspend_late(dev, PMSG_FREEZE);
 
 	if (drv && drv->pm && drv->pm->freeze_noirq) {
+		int error;
+
 		error = drv->pm->freeze_noirq(dev);
 		suspend_report_result(drv->pm->freeze_noirq, error);
+		if (error)
+			return error;
 	}
 
-	if (!error)
-		pci_pm_set_unknown_state(pci_dev);
+	if (!pci_dev->state_saved)
+		pci_save_state(pci_dev);
 
-	return error;
+	pci_pm_set_unknown_state(pci_dev);
+
+	return 0;
 }
 
 static int pci_pm_thaw_noirq(struct device *dev)
@@ -716,7 +734,6 @@ static int pci_pm_poweroff(struct device
 {
 	struct pci_dev *pci_dev = to_pci_dev(dev);
 	struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
-	int error = 0;
 
 	if (pci_has_legacy_pm_support(pci_dev))
 		return pci_legacy_suspend(dev, PMSG_HIBERNATE);
@@ -729,33 +746,41 @@ static int pci_pm_poweroff(struct device
 	pci_dev->state_saved = false;
 
 	if (pm->poweroff) {
+		int error;
+
 		error = pm->poweroff(dev);
 		suspend_report_result(pm->poweroff, error);
+		if (error)
+			return error;
 	}
 
-	if (!pci_dev->state_saved && !pci_is_bridge(pci_dev))
-		pci_prepare_to_sleep(pci_dev);
-
  Fixup:
 	pci_fixup_device(pci_fixup_suspend, pci_dev);
 
-	return error;
+	return 0;
 }
 
 static int pci_pm_poweroff_noirq(struct device *dev)
 {
+	struct pci_dev *pci_dev = to_pci_dev(dev);
 	struct device_driver *drv = dev->driver;
-	int error = 0;
 
 	if (pci_has_legacy_pm_support(to_pci_dev(dev)))
 		return pci_legacy_suspend_late(dev, PMSG_HIBERNATE);
 
 	if (drv && drv->pm && drv->pm->poweroff_noirq) {
+		int error;
+
 		error = drv->pm->poweroff_noirq(dev);
 		suspend_report_result(drv->pm->poweroff_noirq, error);
+		if (error)
+			return error;
 	}
 
-	return error;
+	if (!pci_dev->state_saved && !pci_is_bridge(pci_dev))
+		pci_prepare_to_sleep(pci_dev);
+
+	return 0;
 }
 
 static int pci_pm_restore_noirq(struct device *dev)

^ permalink raw reply	[flat|nested] 373+ messages in thread
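
The control flow that patch 8/8 above gives pci_pm_suspend_noirq() can be
sketched as a small userspace model.  This is only an illustration, not the
kernel code: the model_* names, the enum values and the stub helpers are all
hypothetical stand-ins, the model assumes the driver provides dev_pm_ops, and
the final pci_pm_set_unknown_state() call is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-ins for pci_power_t values; not the kernel's definitions. */
enum model_state { MODEL_D0, MODEL_D3HOT, MODEL_UNKNOWN };

struct model_pci_dev {
	bool state_saved;               /* mirrors pci_dev->state_saved */
	bool is_bridge;                 /* stands in for pci_is_bridge() */
	enum model_state current_state; /* mirrors pci_dev->current_state */
	int (*suspend_noirq)(struct model_pci_dev *dev); /* optional driver callback */
};

/* Stand-in for pci_save_state(): only records that the state was saved. */
static void model_save_state(struct model_pci_dev *dev)
{
	dev->state_saved = true;
}

/* Stand-in for pci_prepare_to_sleep(): pretend the device enters D3hot. */
static void model_prepare_to_sleep(struct model_pci_dev *dev)
{
	dev->current_state = MODEL_D3HOT;
}

/* Models the late-suspend decision sequence from the patch. */
int model_suspend_noirq(struct model_pci_dev *dev)
{
	assert(dev != NULL);

	if (dev->suspend_noirq) {
		int error = dev->suspend_noirq(dev);

		if (error)
			return error;

		/*
		 * The driver changed the power state itself without saving
		 * the config space: the patch warns and leaves the device alone.
		 */
		if (!dev->state_saved && dev->current_state != MODEL_D0 &&
		    dev->current_state != MODEL_UNKNOWN)
			return 0;
	}

	/* Otherwise the PCI core saves the state and powers down non-bridges. */
	if (!dev->state_saved) {
		model_save_state(dev);
		if (!dev->is_bridge)
			model_prepare_to_sleep(dev);
	}

	return 0;
}
```

In this model a plain device without a callback ends up with its state saved
and in the low power state, while a bridge has its state saved but stays in D0.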

* Re: [linux-pm] [RFC][PATCH][1/8] PM: Rework handling of interrupts during suspend-resume (rev. 5)
  2009-03-07 10:20     ` Rafael J. Wysocki
  (?)
@ 2009-03-07 16:51     ` Alan Stern
  2009-03-07 17:56       ` Rafael J. Wysocki
  2009-03-07 17:56       ` [linux-pm] " Rafael J. Wysocki
  -1 siblings, 2 replies; 373+ messages in thread
From: Alan Stern @ 2009-03-07 16:51 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: LKML, Arve, Jeremy Fitzhardinge, Jesse Barnes, Thomas Gleixner,
	Eric W. Biederman, Ingo Molnar, Linus Torvalds, pm list

On Sat, 7 Mar 2009, Rafael J. Wysocki wrote:

> From: Rafael J. Wysocki <rjw@sisk.pl>
> 
> Introduce two helper functions allowing us to prevent device drivers
> from getting any interrupts (without disabling interrupts on the CPU)
> during suspend (or hibernation) and to make them start to receive
> interrupts again during the subsequent resume, respectively.  These
> functions make it possible to keep timer interrupts enabled while the
> "late" suspend and "early" resume callbacks provided by device
> drivers are being executed.
> 
> Use these functions to rework the handling of interrupts during
> suspend (hibernation) and resume.  Namely, interrupts will only be
> disabled on the CPU right before suspending sysdevs, while device
> drivers will be prevented from receiving interrupts, with the help of
> the new helper function, before their "late" suspend callbacks run
> (and analogously during resume).
> 
> In addition, since the device interrupts are now disabled before the
> CPU has turned all interrupts off and the CPU will ACK the interrupts
> setting the IRQ_PENDING bit for them, check in sysdev_suspend() if
> any wake-up interrupts are pending and abort suspend if that's the
> case.

One thing about this isn't clear: the distinction between "wake-up" 
interrupts and other interrupts.

In an ideal world, the only pending interrupts during sysdev_suspend
would be wake-up interrupts, because drivers would have prevented their
devices from generating any other kind of IRQ and would have done all
the necessary synchronization as part of their suspend (_not_
suspend_late) methods.  Thus there would be no need to distinguish
between wake-up and non-wake-up interrupts.

So perhaps you're worried about drivers that aren't sufficiently
clever.  Or is something deeper going on?

Alan Stern


^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [linux-pm] [RFC][PATCH][1/8] PM: Rework handling of interrupts during suspend-resume (rev. 5)
  2009-03-07 16:51     ` [linux-pm] " Alan Stern
  2009-03-07 17:56       ` Rafael J. Wysocki
@ 2009-03-07 17:56       ` Rafael J. Wysocki
  2009-03-08  3:53         ` Alan Stern
  2009-03-08  3:53         ` [linux-pm] " Alan Stern
  1 sibling, 2 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-07 17:56 UTC (permalink / raw)
  To: Alan Stern, Jeremy Fitzhardinge
  Cc: LKML, Jesse Barnes, Thomas Gleixner, Eric W. Biederman,
	Ingo Molnar, Linus Torvalds, pm list, Arve Hjønnevåg

On Saturday 07 March 2009, Alan Stern wrote:
> On Sat, 7 Mar 2009, Rafael J. Wysocki wrote:
> 
> > From: Rafael J. Wysocki <rjw@sisk.pl>
> > 
> > Introduce two helper functions allowing us to prevent device drivers
> > from getting any interrupts (without disabling interrupts on the CPU)
> > during suspend (or hibernation) and to make them start to receive
> > interrupts again during the subsequent resume, respectively.  These
> > functions make it possible to keep timer interrupts enabled while the
> > "late" suspend and "early" resume callbacks provided by device
> > drivers are being executed.
> > 
> > Use these functions to rework the handling of interrupts during
> > suspend (hibernation) and resume.  Namely, interrupts will only be
> > disabled on the CPU right before suspending sysdevs, while device
> > drivers will be prevented from receiving interrupts, with the help of
> > the new helper function, before their "late" suspend callbacks run
> > (and analogously during resume).
> > 
> > In addition, since the device interrupts are now disabled before the
> > CPU has turned all interrupts off and the CPU will ACK the interrupts
> > setting the IRQ_PENDING bit for them, check in sysdev_suspend() if
> > any wake-up interrupts are pending and abort suspend if that's the
> > case.
> 
> One thing about this isn't clear: the distinction between "wake-up" 
> interrupts and other interrupts.
> 
> In an ideal world, the only pending interrupts during sysdev_suspend
> would be wake-up interrupts, because drivers would have prevented their
> devices from generating any other kind of IRQ and would have done all
> the necessary synchronization as part of their suspend (_not_
> suspend_late) methods.  Thus there would be no need to distinguish
> between wake-up and non-wake-up interrupts.
> 
> So perhaps you're worried about drivers that aren't sufficiently
> clever.  Or is something deeper going on?

Some drivers leave interrupts enabled during suspend on purpose and mark
them as "wake-up interrupts" so that the platform can abort suspend if any
of them is pending at the time the "enter suspend" hook is called (this doesn't
happen on x86 AFAICS).

However, after the $subject patch the CPU will ACK those interrupts if they
happen between suspend_device_irqs() and local_irq_disable(), so the platform
won't see them as pending.  Instead, they will have IRQ_PENDING set in
desc->status, so we check if this is the case.
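
To make the window described above concrete, here is a minimal userspace
sketch.  It is not the kernel implementation: the model_* names and the flag
values are made up for illustration, and timer interrupts (which stay enabled)
are left out of the model.  It only shows that an interrupt arriving on an
already disabled line is recorded as pending instead of reaching the driver,
and that the later check looks at wake-up lines only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Illustrative flag values; the kernel's IRQ_* constants differ. */
#define MODEL_IRQ_DISABLED 0x1u
#define MODEL_IRQ_PENDING  0x2u
#define MODEL_IRQ_WAKEUP   0x4u

struct model_irq_desc {
	unsigned int status;  /* plays the role of desc->status */
	unsigned int handled; /* how many times the driver's handler ran */
};

/* Models suspend_device_irqs(): drivers stop receiving interrupts. */
void model_suspend_device_irqs(struct model_irq_desc *descs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		descs[i].status |= MODEL_IRQ_DISABLED;
}

/* Models an interrupt arriving: a disabled line is only marked pending. */
void model_raise_irq(struct model_irq_desc *desc)
{
	assert(desc != NULL);

	if (desc->status & MODEL_IRQ_DISABLED)
		desc->status |= MODEL_IRQ_PENDING;
	else
		desc->handled++;
}

/* Models the check done at sysdev suspend time: wake-up lines only. */
int model_check_wakeup_irqs(const struct model_irq_desc *descs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		unsigned int s = descs[i].status;

		if ((s & MODEL_IRQ_WAKEUP) && (s & MODEL_IRQ_PENDING))
			return -EBUSY; /* abort the suspend */
	}

	return 0;
}
```

With one wake-up line and one ordinary line, raising the ordinary line after
model_suspend_device_irqs() leaves the check at 0, while raising the wake-up
line makes it return -EBUSY without the handler ever running.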

Thanks,
Rafael



^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [linux-pm] [RFC][PATCH][1/8] PM: Rework handling of interrupts during suspend-resume (rev. 5)
  2009-03-07 17:56       ` [linux-pm] " Rafael J. Wysocki
  2009-03-08  3:53         ` Alan Stern
@ 2009-03-08  3:53         ` Alan Stern
  2009-03-08 10:00           ` Rafael J. Wysocki
                             ` (3 more replies)
  1 sibling, 4 replies; 373+ messages in thread
From: Alan Stern @ 2009-03-08  3:53 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Jeremy Fitzhardinge, LKML, Jesse Barnes, Thomas Gleixner,
	Eric W. Biederman, Ingo Molnar, Linus Torvalds, pm list,
	Arve Hjønnevåg

On Sat, 7 Mar 2009, Rafael J. Wysocki wrote:

> > One thing about this isn't clear: the distinction between "wake-up" 
> > interrupts and other interrupts.
> > 
> > In an ideal world, the only pending interrupts during sysdev_suspend
> > would be wake-up interrupts, because drivers would have prevented their
> > devices from generating any other kind of IRQ and would have done all
> > the necessary synchronization as part of their suspend (_not_
> > suspend_late) methods.  Thus there would be no need to distinguish
> > between wake-up and non-wake-up interrupts.
> > 
> > So perhaps you're worried about drivers that aren't sufficiently
> > clever.  Or is something deeper going on?
> 
> Some drivers leave interrupts enabled during suspend on purpose and mark
> them as "wake-up interrupts" so that the platform can abort suspend if any
> of them is pending at the time the "enter suspend" hook is called (this doesn't
> happen on x86 AFAICS).
> 
> However, after the $subject patch the CPU will ACK those interrupts if they
> happen between suspend_device_irqs() and local_irq_disable(), so the platform
> won't see them as pending.  Instead, they will have IRQ_PENDING set in
> desc->status, so we check if this is the case.

You didn't answer my question.  Why bother to distinguish between 
"wake-up" interrupts and non-"wake-up" interrupts?

In other words, why not simply abort the suspend if IRQ_PENDING is set
for _any_ interrupt during sysdev_suspend()?

Alan Stern


^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [linux-pm] [RFC][PATCH][1/8] PM: Rework handling of interrupts during suspend-resume (rev. 5)
  2009-03-08  3:53         ` [linux-pm] " Alan Stern
  2009-03-08 10:00           ` Rafael J. Wysocki
@ 2009-03-08 10:00           ` Rafael J. Wysocki
  2009-03-08 12:37             ` Alan Stern
  2009-03-08 12:37             ` [linux-pm] " Alan Stern
  2009-03-08 17:20           ` Linus Torvalds
  2009-03-08 17:20           ` Linus Torvalds
  3 siblings, 2 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-08 10:00 UTC (permalink / raw)
  To: Alan Stern
  Cc: Jeremy Fitzhardinge, LKML, Jesse Barnes, Thomas Gleixner,
	Eric W. Biederman, Ingo Molnar, Linus Torvalds, pm list,
	Arve Hjønnevåg

On Sunday 08 March 2009, Alan Stern wrote:
> On Sat, 7 Mar 2009, Rafael J. Wysocki wrote:
> 
> > > One thing about this isn't clear: the distinction between "wake-up" 
> > > interrupts and other interrupts.
> > > 
> > > In an ideal world, the only pending interrupts during sysdev_suspend
> > > would be wake-up interrupts, because drivers would have prevented their
> > > devices from generating any other kind of IRQ and would have done all
> > > the necessary synchronization as part of their suspend (_not_
> > > suspend_late) methods.  Thus there would be no need to distinguish
> > > between wake-up and non-wake-up interrupts.
> > > 
> > > So perhaps you're worried about drivers that aren't sufficiently
> > > clever.  Or is something deeper going on?
> > 
> > Some drivers leave interrupts enabled during suspend on purpose and mark
> > them as "wake-up interrupts" so that the platform can abort suspend if any
> > of them is pending at the time the "enter suspend" hook is called (this doesn't
> > happen on x86 AFAICS).
> > 
> > However, after the $subject patch the CPU will ACK those interrupts if they
> > happen between suspend_device_irqs() and local_irq_disable(), so the platform
> > won't see them as pending.  Instead, they will have IRQ_PENDING set in
> > desc->status, so we check if this is the case.
> 
> You didn't answer my question.  Why bother to distinguish between 
> "wake-up" interrupts and non-"wake-up" interrupts?

Sorry, I thought it followed from what I wrote.

> In other words, why not simply abort the suspend if IRQ_PENDING is set
> for _any_ interrupt during sysdev_suspend()?

The "wake-up" ones are _intentionally_ left enabled, while the other ones may
be left enabled by mistake.  The check is intended to prevent the current
behavior from changing (i.e. suspend is aborted if any "wake-up" interrupts
are pending) and since the platforms only check for the "wake-up" interrupts,
it doesn't go any further.  Moreover, I think it might introduce a regression
if it did.
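
The difference between the two policies discussed here can be illustrated
with a short sketch.  The types and names below are hypothetical and exist
only for this example; it contrasts aborting on pending wake-up interrupts
only (what the patch does) with aborting on any pending interrupt (the
alternative raised in this thread).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_irq {
	bool wakeup;  /* line deliberately left enabled as a wake-up source */
	bool pending; /* interrupt arrived after device interrupts were disabled */
};

enum model_policy {
	MODEL_ABORT_ON_WAKEUP_ONLY, /* behavior of the patch */
	MODEL_ABORT_ON_ANY,         /* alternative: any pending interrupt aborts */
};

/* Returns true if suspend would be aborted under the given policy. */
bool model_should_abort(const struct model_irq *irqs, size_t n,
			enum model_policy policy)
{
	assert(irqs != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (!irqs[i].pending)
			continue;
		if (policy == MODEL_ABORT_ON_ANY || irqs[i].wakeup)
			return true;
	}

	return false;
}
```

If a non-wake-up line was left enabled by mistake and fires late, only the
"any" policy aborts the suspend, which is the kind of behavior change the
reply above is wary of.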

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [linux-pm] [RFC][PATCH][1/8] PM: Rework handling of interrupts during suspend-resume (rev. 5)
  2009-03-08 10:00           ` [linux-pm] " Rafael J. Wysocki
  2009-03-08 12:37             ` Alan Stern
@ 2009-03-08 12:37             ` Alan Stern
  1 sibling, 0 replies; 373+ messages in thread
From: Alan Stern @ 2009-03-08 12:37 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Jeremy Fitzhardinge, LKML, Jesse Barnes, Thomas Gleixner,
	Eric W. Biederman, Ingo Molnar, Linus Torvalds, pm list,
	Arve Hjønnevåg

On Sun, 8 Mar 2009, Rafael J. Wysocki wrote:

> > > > So perhaps you're worried about drivers that aren't sufficiently
> > > > clever.  Or is something deeper going on?

> > In other words, why not simply abort the suspend if IRQ_PENDING is set
> > for _any_ interrupt during sysdev_suspend()?
> 
> The "wake-up" ones are _intentionally_ left enabled, while the other ones may
> be left enabled by mistake.  The check is intended to prevent the current
> behavior from changing (i.e. suspend is aborted if any "wake-up" interrupts
> are pending) and since the platforms only check for the "wake-up" interrupts,
> it doesn't go any further.  Moreover, I think it might introduce a regression
> if it did.

So it _is_ because you are worried about drivers that aren't
sufficiently clever.  If the drivers did their job correctly then there
wouldn't be any pending non-"wake-up" interrupts to confuse matters.

Alan Stern


^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [linux-pm] [RFC][PATCH][1/8] PM: Rework handling of interrupts during suspend-resume (rev. 5)
  2009-03-08  3:53         ` [linux-pm] " Alan Stern
  2009-03-08 10:00           ` Rafael J. Wysocki
  2009-03-08 10:00           ` [linux-pm] " Rafael J. Wysocki
@ 2009-03-08 17:20           ` Linus Torvalds
  2009-03-08 20:40             ` Alan Stern
  2009-03-08 20:40             ` [linux-pm] " Alan Stern
  2009-03-08 17:20           ` Linus Torvalds
  3 siblings, 2 replies; 373+ messages in thread
From: Linus Torvalds @ 2009-03-08 17:20 UTC (permalink / raw)
  To: Alan Stern
  Cc: Rafael J. Wysocki, Jeremy Fitzhardinge, LKML, Jesse Barnes,
	Thomas Gleixner, Eric W. Biederman, Ingo Molnar, pm list,
	Arve Hjønnevåg



On Sat, 7 Mar 2009, Alan Stern wrote:
> 
> You didn't answer my question.  Why bother to distinguish between 
> "wake-up" interrupts and non-"wake-up" interrupts?
> 
> In other words, why not simply abort the suspend if IRQ_PENDING is set
> for _any_ interrupt during sysdev_suspend()?

.. because some drivers might not actually shut down the hardware until 
they get to "suspend_late"? If even then, for that matter - a driver may 
simply not care, knowing that the hardware will be powered off, and will 
be re-initialized at resume.

The thinking that you have to shut your hardware down at "->suspend()" 
time is a _disease_. There are literally classes of hardware out there 
where that would be an outright _bug_, like for a PCI bridge device. For 
many devices, "suspend()" has to be the phase where you shut down the 
_external_ stuff (e.g. for a disk controller, it's when you'd flush and stop 
your disks), but the controller itself may well be alive until later.
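
As a rough sketch of that split, the following user-space C model separates
the two phases for a disk controller: the first phase quiesces what hangs off
the controller, the second (optionally) takes down the controller itself.
Every name here (`model_ctrl`, `model_suspend`, `model_suspend_late`) is
hypothetical and chosen for illustration only; real drivers implement the
corresponding callbacks against the kernel's device PM interfaces.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical disk-controller state; not a real kernel structure. */
struct model_ctrl {
    bool disks_running; /* external devices behind the controller */
    bool ctrl_powered;  /* the controller itself */
    bool irq_may_fire;  /* controller can still raise interrupts */
};

/* Phase 1 ("suspend"): flush and stop the disks, nothing more. */
void model_suspend(struct model_ctrl *c)
{
    c->disks_running = false;
    /* The controller is deliberately left alive and may still interrupt. */
}

/* Phase 2 ("suspend_late"): only now shut the controller down, if at all. */
void model_suspend_late(struct model_ctrl *c)
{
    assert(!c->disks_running); /* phase 1 must already have run */
    c->ctrl_powered = false;
    c->irq_may_fire = false;
}
```

Between the two calls `irq_may_fire` is still true, which is why a pending
non-"wake-up" interrupt at that point is not necessarily a driver bug.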

			Linus

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH][0/8] PM: Rework suspend-resume ordering to avoid problems with shared interrupts
  2009-03-07 10:19 ` Rafael J. Wysocki
                     ` (14 preceding siblings ...)
  2009-03-07 10:27   ` Rafael J. Wysocki
@ 2009-03-08 19:28   ` Frans Pop
  2009-03-08 20:50     ` Rafael J. Wysocki
  2009-03-08 20:50     ` Rafael J. Wysocki
  2009-03-08 19:28   ` Frans Pop
  16 siblings, 2 replies; 373+ messages in thread
From: Frans Pop @ 2009-03-08 19:28 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: linux-kernel, torvalds, linux-pm

(Most CCs dropped.)

Hi Rafael,

Rafael J. Wysocki wrote:
> The following patches modify the way in which we handle disabling
> interrupts during suspend and enabling them during resume.  They also
> change the ordering of the core suspend and hibernation code to take
> advantage of the new approach to the interrupts and modify the PCI PM
> core to avoid a few problems.

I've given this series a try on my HP 2510p. I've seen no regressions
with suspend to RAM.

Below is a diff between suspend/resume dmesg from before (based on rc5)
and after (rc7 + series) the patch, with some comments.
Nothing looks really wrong, but there are some surprising changes.

Essentially JFYI though.

Cheers,
FJP

    PM: Syncing filesystems ... done.
    Freezing user space processes ... (elapsed 0.00 seconds) done.
    Freezing remaining freezable tasks ... (elapsed 0.00 seconds) done.
    Suspending console(s) (use no_console_suspend to debug)
    sd 0:0:0:0: [sda] Synchronizing SCSI cache
    sd 0:0:0:0: [sda] Stopping disk
    ACPI handle has no context!
    ACPI handle has no context!
    sdhci-pci 0000:02:06.2: PME# disabled
    sdhci-pci 0000:02:06.2: PCI INT C disabled
    ACPI handle has no context!
    ACPI handle has no context!
# Bogus: result of using wireless instead of wired networking.
   +iwlagn 0000:10:00.0: PCI INT A disabled
    ata2: port disabled. ignoring.
    ata_piix 0000:00:1f.1: PCI INT A disabled
    ehci_hcd 0000:00:1d.7: PCI INT A disabled
    ehci_hcd 0000:00:1d.7: PME# disabled
    uhci_hcd 0000:00:1d.2: PCI INT C disabled
    uhci_hcd 0000:00:1d.1: PCI INT B disabled
    uhci_hcd 0000:00:1d.0: PCI INT A disabled
    HDA Intel 0000:00:1b.0: PCI INT A disabled
    HDA Intel 0000:00:1b.0: power state changed by ACPI to D3
    ehci_hcd 0000:00:1a.7: PCI INT C disabled
    ehci_hcd 0000:00:1a.7: PME# disabled
    uhci_hcd 0000:00:1a.1: PCI INT B disabled
    uhci_hcd 0000:00:1a.0: PCI INT A disabled
    e1000e 0000:00:19.0: PME# enabled
    e1000e 0000:00:19.0: wake-up capability enabled by ACPI
    e1000e 0000:00:19.0: PME# enabled
    e1000e 0000:00:19.0: wake-up capability enabled by ACPI
    e1000e 0000:00:19.0: PCI INT A disabled
    ACPI handle has no context!
# This has moved up a bit. Looks more logical.
   +ricoh-mmc: Suspending.
   +ricoh-mmc: Controller is now re-enabled.
    ACPI: Preparing to enter system sleep state S3
    Disabling non-boot CPUs ...
    CPU 1 is now offline
    SMP alternatives: switching to UP code
    CPU0 attaching NULL sched-domain.
    CPU1 attaching NULL sched-domain.
    CPU0 attaching NULL sched-domain.
    CPU1 is down
   -ricoh-mmc: Suspending.
   -ricoh-mmc: Controller is now re-enabled.
    Extended CMOS year: 2000

    Back to C!
   +CPU0: Thermal monitoring enabled (TM2)
    Extended CMOS year: 2000
# This whole block has moved up before early config space restores.
# No changes in the block itself.
   +Enabling non-boot CPUs ...
   +SMP alternatives: switching to SMP code
   +Booting processor 1 APIC 0x1 ip 0x6000
   +Initializing CPU#1
   +Calibrating delay using timer specific routine.. 2660.04 BogoMIPS (lpj=5320097)
   +CPU: L1 I cache: 32K, L1 D cache: 32K
   +CPU: L2 cache: 2048K
   +[ds] using Core 2/Atom configuration
   +CPU: Physical Processor ID: 0
   +CPU: Processor Core ID: 1
   +CPU1: Thermal monitoring enabled (TM2)
   +CPU1: Intel(R) Core(TM)2 Duo CPU     U7700  @ 1.33GHz stepping 0d
   +CPU0 attaching NULL sched-domain.
   +Switched to high resolution mode on CPU 1
   +CPU0 attaching sched-domain:
   + domain 0: span 0-1 level MC
   +  groups: 0 1
   +CPU1 attaching sched-domain:
   + domain 0: span 0-1 level MC
   +  groups: 1 0
   +CPU1 is up
   +ACPI: Waking up from system sleep state S3
    pci 0000:00:02.0: restoring config space at offset 0x8 (was 0x1, writing 0x2001)
# These don't need restoring anymore?
   -pci 0000:00:02.1: restoring config space at offset 0x4 (was 0x4, writing 0xe0500004)
   -pci 0000:00:02.1: restoring config space at offset 0x1 (was 0x900000, writing 0x900007)
   -pci 0000:00:03.0: restoring config space at offset 0xf (was 0x100, writing 0x1ff)
   -pci 0000:00:03.0: restoring config space at offset 0x4 (was 0xfed12004, writing 0xe0600004)
   -pci 0000:00:03.2: restoring config space at offset 0xf (was 0x300, writing 0x30b)
   -pci 0000:00:03.2: restoring config space at offset 0x8 (was 0x1, writing 0x2031)
   -pci 0000:00:03.2: restoring config space at offset 0x7 (was 0x1, writing 0x2021)
   -pci 0000:00:03.2: restoring config space at offset 0x6 (was 0x1, writing 0x2019)
   -pci 0000:00:03.2: restoring config space at offset 0x5 (was 0x1, writing 0x2011)
   -pci 0000:00:03.2: restoring config space at offset 0x4 (was 0x1, writing 0x2009)
   -pci 0000:00:03.2: restoring config space at offset 0x1 (was 0xb00000, writing 0xb00001)
    serial 0000:00:03.3: restoring config space at offset 0xf (was 0x200, writing 0x20a)
    serial 0000:00:03.3: restoring config space at offset 0x5 (was 0x0, writing 0xe0601000)
    serial 0000:00:03.3: restoring config space at offset 0x4 (was 0x1, writing 0x2041)
    serial 0000:00:03.3: restoring config space at offset 0x1 (was 0xb00000, writing 0xb00007)
    e1000e 0000:00:19.0: restoring config space at offset 0xf (was 0x100, writing 0x10b)
    e1000e 0000:00:19.0: restoring config space at offset 0x6 (was 0x1, writing 0x2061)
    e1000e 0000:00:19.0: restoring config space at offset 0x5 (was 0x0, writing 0xe0640000)
    e1000e 0000:00:19.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100007)
# These have moved down to late resume.
   -uhci_hcd 0000:00:1a.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
   -uhci_hcd 0000:00:1a.0: restoring config space at offset 0x8 (was 0x1, writing 0x2081)
   -uhci_hcd 0000:00:1a.0: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
   -uhci_hcd 0000:00:1a.1: restoring config space at offset 0xf (was 0x200, writing 0x20a)
   -uhci_hcd 0000:00:1a.1: restoring config space at offset 0x8 (was 0x1, writing 0x20a1)
   -uhci_hcd 0000:00:1a.1: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
    ehci_hcd 0000:00:1a.7: restoring config space at offset 0xf (was 0x300, writing 0x30b)
    ehci_hcd 0000:00:1a.7: restoring config space at offset 0x4 (was 0x0, writing 0xe0641000)
    ehci_hcd 0000:00:1a.7: restoring config space at offset 0x1 (was 0x2900000, writing 0x2900002)
    HDA Intel 0000:00:1b.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
    HDA Intel 0000:00:1b.0: restoring config space at offset 0x3 (was 0x0, writing 0x10)
    HDA Intel 0000:00:1b.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100002)
    pcieport-driver 0000:00:1c.0: restoring config space at offset 0xf (was 0x100, writing 0x4010a)
    pcieport-driver 0000:00:1c.0: restoring config space at offset 0x9 (was 0x10001, writing 0x1fff1)
    pcieport-driver 0000:00:1c.0: restoring config space at offset 0x8 (was 0x0, writing 0xfff0)
    pcieport-driver 0000:00:1c.0: restoring config space at offset 0x7 (was 0x0, writing 0x200000f0)
    pcieport-driver 0000:00:1c.0: restoring config space at offset 0x6 (was 0x0, writing 0x80800)
    pcieport-driver 0000:00:1c.0: restoring config space at offset 0x3 (was 0x810000, writing 0x810010)
    pcieport-driver 0000:00:1c.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100407)
    pcieport-driver 0000:00:1c.1: restoring config space at offset 0xf (was 0x200, writing 0x4020a)
    pcieport-driver 0000:00:1c.1: restoring config space at offset 0x9 (was 0x10001, writing 0x1fff1)
    pcieport-driver 0000:00:1c.1: restoring config space at offset 0x8 (was 0x0, writing 0xe000e000)
    pcieport-driver 0000:00:1c.1: restoring config space at offset 0x7 (was 0x0, writing 0xf0)
    pcieport-driver 0000:00:1c.1: restoring config space at offset 0x3 (was 0x810000, writing 0x810010)
    pcieport-driver 0000:00:1c.1: restoring config space at offset 0x1 (was 0x100000, writing 0x100407)
# These have moved down to late resume.
   -uhci_hcd 0000:00:1d.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
   -uhci_hcd 0000:00:1d.0: restoring config space at offset 0x8 (was 0x1, writing 0x20c1)
   -uhci_hcd 0000:00:1d.0: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
   -uhci_hcd 0000:00:1d.1: restoring config space at offset 0xf (was 0x200, writing 0x20b)
   -uhci_hcd 0000:00:1d.1: restoring config space at offset 0x8 (was 0x1, writing 0x20e1)
   -uhci_hcd 0000:00:1d.1: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
   -uhci_hcd 0000:00:1d.2: restoring config space at offset 0xf (was 0x300, writing 0x30b)
   -uhci_hcd 0000:00:1d.2: restoring config space at offset 0x8 (was 0x1, writing 0x2101)
   -uhci_hcd 0000:00:1d.2: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
    ehci_hcd 0000:00:1d.7: restoring config space at offset 0xf (was 0x100, writing 0x10a)
    ehci_hcd 0000:00:1d.7: restoring config space at offset 0x4 (was 0x0, writing 0xe0648000)
    ehci_hcd 0000:00:1d.7: restoring config space at offset 0x1 (was 0x2900000, writing 0x2900002)
# These have disappeared.
   -pci 0000:00:1e.0: restoring config space at offset 0x9 (was 0x10001, writing 0x83f18001)
   -pci 0000:00:1e.0: restoring config space at offset 0x8 (was 0x0, writing 0xe030e010)
   -pci 0000:00:1e.0: restoring config space at offset 0x7 (was 0x228000f0, writing 0x22803030)
   -pci 0000:00:1e.0: restoring config space at offset 0x1 (was 0x100007, writing 0x100107)
# First two moved to late resume.
# The third already happened during late resume (duplicated).
   -ata_piix 0000:00:1f.1: restoring config space at offset 0xf (was 0x100, writing 0x10a)
   -ata_piix 0000:00:1f.1: restoring config space at offset 0x8 (was 0xc01, writing 0x2121)
   -ata_piix 0000:00:1f.1: restoring config space at offset 0x1 (was 0x2800005, writing 0x2880005)
    iwlagn 0000:10:00.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
    iwlagn 0000:10:00.0: restoring config space at offset 0x4 (was 0x4, writing 0xe0000004)
    iwlagn 0000:10:00.0: restoring config space at offset 0x3 (was 0x0, writing 0x10)
    iwlagn 0000:10:00.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100006)
    yenta_cardbus 0000:02:06.0: restoring config space at offset 0xf (was 0x3000100, writing 0x580010b)
    yenta_cardbus 0000:02:06.0: restoring config space at offset 0xe (was 0x0, writing 0x34fc)
    yenta_cardbus 0000:02:06.0: restoring config space at offset 0xd (was 0x0, writing 0x3400)
    yenta_cardbus 0000:02:06.0: restoring config space at offset 0xc (was 0x0, writing 0x30fc)
    yenta_cardbus 0000:02:06.0: restoring config space at offset 0xb (was 0x0, writing 0x3000)
    yenta_cardbus 0000:02:06.0: restoring config space at offset 0xa (was 0x0, writing 0x87fff000)
    yenta_cardbus 0000:02:06.0: restoring config space at offset 0x9 (was 0x0, writing 0x84000000)
    yenta_cardbus 0000:02:06.0: restoring config space at offset 0x8 (was 0x0, writing 0x83fff000)
    yenta_cardbus 0000:02:06.0: restoring config space at offset 0x7 (was 0x0, writing 0x80000000)
    yenta_cardbus 0000:02:06.0: restoring config space at offset 0x6 (was 0x0, writing 0xb0060302)
    yenta_cardbus 0000:02:06.0: restoring config space at offset 0x4 (was 0x0, writing 0xe0100000)
    yenta_cardbus 0000:02:06.0: restoring config space at offset 0x3 (was 0x820000, writing 0x82a800)
    yenta_cardbus 0000:02:06.0: restoring config space at offset 0x1 (was 0x2100000, writing 0x2100007)
    ohci1394 0000:02:06.1: restoring config space at offset 0xf (was 0x4020200, writing 0x4020205)
    ohci1394 0000:02:06.1: restoring config space at offset 0x4 (was 0x0, writing 0xe0101000)
    ohci1394 0000:02:06.1: restoring config space at offset 0x3 (was 0x800000, writing 0x804010)
    ohci1394 0000:02:06.1: restoring config space at offset 0x1 (was 0x2100000, writing 0x2100006)
    sdhci-pci 0000:02:06.2: restoring config space at offset 0xf (was 0x300, writing 0x30a)
    sdhci-pci 0000:02:06.2: restoring config space at offset 0x4 (was 0x0, writing 0xe0102000)
    sdhci-pci 0000:02:06.2: restoring config space at offset 0x3 (was 0x800000, writing 0x804010)
    sdhci-pci 0000:02:06.2: restoring config space at offset 0x1 (was 0x2100000, writing 0x2100006)
# Some changes; a lot just got dropped.
   -ricoh-mmc 0000:02:06.3: restoring config space at offset 0xf (was 0x300, writing 0xffffffff)
   +ricoh-mmc 0000:02:06.3: restoring config space at offset 0xf (was 0x300, writing 0x30a)
   -ricoh-mmc 0000:02:06.3: restoring config space at offset 0xe (was 0x0, writing 0xffffffff)
   -ricoh-mmc 0000:02:06.3: restoring config space at offset 0xd (was 0x80, writing 0xffffffff)
   -ricoh-mmc 0000:02:06.3: restoring config space at offset 0xc (was 0x0, writing 0xffffffff)
   -ricoh-mmc 0000:02:06.3: restoring config space at offset 0xb (was 0x30c9103c, writing 0xffffffff)
   -ricoh-mmc 0000:02:06.3: restoring config space at offset 0xa (was 0x0, writing 0xffffffff)
   -ricoh-mmc 0000:02:06.3: restoring config space at offset 0x9 (was 0x0, writing 0xffffffff)
   -ricoh-mmc 0000:02:06.3: restoring config space at offset 0x8 (was 0x0, writing 0xffffffff)
   -ricoh-mmc 0000:02:06.3: restoring config space at offset 0x7 (was 0x0, writing 0xffffffff)
   -ricoh-mmc 0000:02:06.3: restoring config space at offset 0x6 (was 0x0, writing 0xffffffff)
   -ricoh-mmc 0000:02:06.3: restoring config space at offset 0x5 (was 0x0, writing 0xffffffff)
   -ricoh-mmc 0000:02:06.3: restoring config space at offset 0x4 (was 0x0, writing 0xffffffff)
   +ricoh-mmc 0000:02:06.3: restoring config space at offset 0x4 (was 0x0, writing 0xe0103000)
   -ricoh-mmc 0000:02:06.3: restoring config space at offset 0x3 (was 0x800000, writing 0xffffffff)
   +ricoh-mmc 0000:02:06.3: restoring config space at offset 0x3 (was 0x800000, writing 0x804010)
   -ricoh-mmc 0000:02:06.3: restoring config space at offset 0x2 (was 0x8800011, writing 0xffffffff)
   -ricoh-mmc 0000:02:06.3: restoring config space at offset 0x1 (was 0x2100000, writing 0xffffffff)
   +ricoh-mmc 0000:02:06.3: restoring config space at offset 0x1 (was 0x2100000, writing 0x2100006)
   -ricoh-mmc 0000:02:06.3: restoring config space at offset 0x0 (was 0x8431180, writing 0xffffffff)
    ricoh-mmc: Resuming.
    ricoh-mmc: Controller is now disabled.
   -Enabling non-boot CPUs ...
   -SMP alternatives: switching to SMP code
   -Booting processor 1 APIC 0x1 ip 0x6000
   -Initializing CPU#1
   -Calibrating delay using timer specific routine.. 2660.07 BogoMIPS (lpj=5320158)
   -CPU: L1 I cache: 32K, L1 D cache: 32K
   -CPU: L2 cache: 2048K
   -[ds] using Core 2/Atom configuration
   -CPU: Physical Processor ID: 0
   -CPU: Processor Core ID: 1
   -CPU1: Thermal monitoring enabled (TM2)
   -x86 PAT enabled: cpu 1, old 0x7040600070406, new 0x7010600070106
   -CPU1: Intel(R) Core(TM)2 Duo CPU     U7700  @ 1.33GHz stepping 0d
   -CPU0 attaching NULL sched-domain.
   -Switched to high resolution mode on CPU 1
   -CPU0 attaching sched-domain:
   - domain 0: span 0-1 level MC
   -  groups: 0 1
   -CPU1 attaching sched-domain:
   - domain 0: span 0-1 level MC
   -  groups: 1 0
   -CPU1 is up
   -ACPI: Waking up from system sleep state S3
    ACPI: EC: non-query interrupt received, switching to interrupt mode
    pci 0000:00:02.0: restoring config space at offset 0x1 (was 0x900403, writing 0x900003)
    pci 0000:00:02.0: PME# disabled
    pci 0000:00:02.1: PME# disabled
    pci 0000:00:03.0: PME# disabled
    pci 0000:00:03.2: PME# disabled
    e1000e 0000:00:19.0: PCI INT A -> GSI 22 (level, low) -> IRQ 22
    e1000e 0000:00:19.0: setting latency timer to 64
    e1000e 0000:00:19.0: wake-up capability disabled by ACPI
    e1000e 0000:00:19.0: PME# disabled
    e1000e 0000:00:19.0: wake-up capability disabled by ACPI
    e1000e 0000:00:19.0: PME# disabled
    e1000e 0000:00:19.0: irq 26 for MSI/MSI-X
   +uhci_hcd 0000:00:1a.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
   +uhci_hcd 0000:00:1a.0: restoring config space at offset 0x8 (was 0x1, writing 0x2081)
   +uhci_hcd 0000:00:1a.0: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
    uhci_hcd 0000:00:1a.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
    uhci_hcd 0000:00:1a.0: setting latency timer to 64
    usb usb1: root hub lost power or was reset
   +uhci_hcd 0000:00:1a.1: restoring config space at offset 0xf (was 0x200, writing 0x20a)
   +uhci_hcd 0000:00:1a.1: restoring config space at offset 0x8 (was 0x1, writing 0x20a1)
   +uhci_hcd 0000:00:1a.1: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
    uhci_hcd 0000:00:1a.1: PCI INT B -> GSI 17 (level, low) -> IRQ 17
    uhci_hcd 0000:00:1a.1: setting latency timer to 64
    usb usb3: root hub lost power or was reset
    ehci_hcd 0000:00:1a.7: PME# disabled
    ehci_hcd 0000:00:1a.7: PCI INT C -> GSI 18 (level, low) -> IRQ 18
    ehci_hcd 0000:00:1a.7: setting latency timer to 64
    ehci_hcd 0000:00:1a.7: PME# disabled
# Called twice now?
    HDA Intel 0000:00:1b.0: power state changed by ACPI to D0
   +HDA Intel 0000:00:1b.0: power state changed by ACPI to D0
    HDA Intel 0000:00:1b.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
    HDA Intel 0000:00:1b.0: setting latency timer to 64
    pcieport-driver 0000:00:1c.0: setting latency timer to 64
    pcieport-driver 0000:00:1c.1: setting latency timer to 64
   +uhci_hcd 0000:00:1d.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
   +uhci_hcd 0000:00:1d.0: restoring config space at offset 0x8 (was 0x1, writing 0x20c1)
   +uhci_hcd 0000:00:1d.0: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
    uhci_hcd 0000:00:1d.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
    uhci_hcd 0000:00:1d.0: setting latency timer to 64
    usb usb5: root hub lost power or was reset
   +uhci_hcd 0000:00:1d.1: restoring config space at offset 0xf (was 0x200, writing 0x20b)
   +uhci_hcd 0000:00:1d.1: restoring config space at offset 0x8 (was 0x1, writing 0x20e1)
   +uhci_hcd 0000:00:1d.1: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
    uhci_hcd 0000:00:1d.1: PCI INT B -> GSI 22 (level, low) -> IRQ 22
    uhci_hcd 0000:00:1d.1: setting latency timer to 64
    usb usb6: root hub lost power or was reset
   +uhci_hcd 0000:00:1d.2: restoring config space at offset 0xf (was 0x300, writing 0x30b)
   +uhci_hcd 0000:00:1d.2: restoring config space at offset 0x8 (was 0x1, writing 0x2101)
   +uhci_hcd 0000:00:1d.2: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
    uhci_hcd 0000:00:1d.2: PCI INT C -> GSI 18 (level, low) -> IRQ 18
    uhci_hcd 0000:00:1d.2: setting latency timer to 64
    usb usb7: root hub lost power or was reset
    ehci_hcd 0000:00:1d.7: PME# disabled
    ehci_hcd 0000:00:1d.7: PCI INT A -> GSI 20 (level, low) -> IRQ 20
    ehci_hcd 0000:00:1d.7: setting latency timer to 64
    ehci_hcd 0000:00:1d.7: PME# disabled
    pci 0000:00:1e.0: setting latency timer to 64
   +ata_piix 0000:00:1f.1: restoring config space at offset 0xf (was 0x100, writing 0x10a)
   +ata_piix 0000:00:1f.1: restoring config space at offset 0x8 (was 0xc01, writing 0x2121)
    ata_piix 0000:00:1f.1: restoring config space at offset 0x1 (was 0x2800005, writing 0x2880005)
    ata_piix 0000:00:1f.1: PCI INT A -> GSI 16 (level, low) -> IRQ 16
    ata_piix 0000:00:1f.1: setting latency timer to 64
    ata2: port disabled. ignoring.
    ACPI Exception (exoparg2-0445): AE_AML_PACKAGE_LIMIT, Index (000000005) is beyond end of object [20081204]
    ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.C2C3] (Node ffff88007e01dea0), AE_AML_PACKAGE_LIMIT
    ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.C003.C0F6.C3F3._STM] (Node ffff88007e043de0), AE_AML_PACKAGE_LIMIT
    ata1: ACPI set timing mode failed (status=0x300b)
# Remaining differences are bogus: result of using wireless instead of wired networking.
   +iwlagn 0000:10:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
   +iwlagn 0000:10:00.0: irq 27 for MSI/MSI-X
    ohci1394: fw-host0: OHCI-1394 1.1 (PCI): IRQ=[19]  MMIO=[e0101000-e01017ff]  Max Packet=[2048]  IR/IT contexts=[4/4]
    sdhci-pci 0000:02:06.2: PCI INT C -> GSI 20 (level, low) -> IRQ 20
   +Registered led device: iwl-phy0:radio
   +Registered led device: iwl-phy0:assoc
   +Registered led device: iwl-phy0:RX
   +Registered led device: iwl-phy0:TX
    sd 0:0:0:0: [sda] Starting disk
    ata1.01: ACPI cmd ef/03:0c:00:00:00:b0 filtered out
    ata1.01: ACPI cmd ef/03:40:00:00:00:b0 filtered out
    ata1.00: ACPI cmd ef/03:01:00:00:00:a0 filtered out
    ata1.00: ACPI cmd ef/03:45:00:00:00:a0 filtered out
    ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out
    ata1.00: ACPI cmd b1/c1:00:00:00:00:a0 filtered out
    ata1.00: ACPI cmd c6/00:10:00:00:00:a0 succeeded
   -e1000e: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
   -0000:00:19.0: eth0: 10/100 speed: disabling TSO
    ata1.00: configured for UDMA/100
    ata1.01: configured for MWDMA2
    ata1.00: configured for UDMA/100
    ata1.01: configured for MWDMA2
    ata1: EH complete
    sd 0:0:0:0: [sda] 234441648 512-byte hardware sectors: (120 GB/111 GiB)
    sd 0:0:0:0: [sda] Write Protect is off
    sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
    sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
    sd 0:0:0:0: [sda] 234441648 512-byte hardware sectors: (120 GB/111 GiB)
    sd 0:0:0:0: [sda] Write Protect is off
    sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
    sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
    usb 1-1: reset full speed USB device using uhci_hcd and address 2
    usb 5-2: reset full speed USB device using uhci_hcd and address 2
    pci 0000:00:02.0: restoring config space at offset 0x1 (was 0x900403, writing 0x900003)
    pci 0000:00:02.0: setting latency timer to 64
    Restarting tasks ... done.
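
For anyone reading the "restoring config space" lines above, here is a small
C sketch that decodes two of the registers involved.  It assumes the "offset"
in those messages is a dword index into the standard 64-byte PCI configuration
header (the log only ever shows 0x0 to 0xf, which fits), so index 0x1 is the
Command/Status dword at byte 0x04 and index 0xf is the dword at byte 0x3c that
holds Interrupt Line and Interrupt Pin.  The helper and macro names are made
up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: "offset" in the log is a dword index into the 64-byte header. */
static inline unsigned int cfg_byte_offset(unsigned int dword_index)
{
    assert(dword_index < 16); /* the standard header is 16 dwords long */
    return dword_index * 4;
}

/* Dword 0x1 (byte 0x04): Command in the low half, Status in the high half. */
static inline uint16_t cfg_command(uint32_t dword1)
{
    return (uint16_t)(dword1 & 0xffff);
}

static inline uint16_t cfg_status(uint32_t dword1)
{
    return (uint16_t)(dword1 >> 16);
}

/* Low Command register bits as defined by the PCI specification. */
#define CMD_IO     0x1 /* respond to I/O space accesses */
#define CMD_MEMORY 0x2 /* respond to memory space accesses */
#define CMD_MASTER 0x4 /* device may act as bus master */

/* Dword 0xf (byte 0x3c): Interrupt Line in bits 7:0, Interrupt Pin in 15:8. */
static inline uint8_t cfg_irq_line(uint32_t dwordf)
{
    return (uint8_t)(dwordf & 0xff);
}

static inline uint8_t cfg_irq_pin(uint32_t dwordf)
{
    return (uint8_t)((dwordf >> 8) & 0xff);
}
```

Applied to the e1000e lines in the log: "offset 0x1 (was 0x100000, writing
0x100007)" restores a Command value of 0x0007 (I/O, memory and bus mastering
enabled) with Status unchanged at 0x0010, and "offset 0xf (was 0x100, writing
0x10b)" keeps Interrupt Pin at 1 while restoring Interrupt Line from 0 to
0x0b.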

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH][0/8] PM: Rework suspend-resume ordering to avoid problems with shared interrupts
  2009-03-07 10:19 ` Rafael J. Wysocki
                     ` (15 preceding siblings ...)
  2009-03-08 19:28   ` [RFC][PATCH][0/8] PM: Rework suspend-resume ordering to avoid problems with shared interrupts Frans Pop
@ 2009-03-08 19:28   ` Frans Pop
  16 siblings, 0 replies; 373+ messages in thread
From: Frans Pop @ 2009-03-08 19:28 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: linux-pm, torvalds, linux-kernel

(Most CCs dropped.)

Hi Rafael,

Rafael J. Wysocki wrote:
> The following patches modify the way in which we handle disabling
> interrupts during suspend and enabling them during resume.  They also
> change the ordering of the core suspend and hibernation code to take
> advantage of the new approach to the interrupts and modify the PCI PM
> core to avoid a few problems.

I've given this series a try on my HP 2510p. I've seen no regressions
with suspend to RAM.

Below is a diff between suspend/resume dmesg from before (based on rc5)
and after (rc7 + series) the patch, with some comments.
Nothing looks really wrong, but there are some surprising changes.

Essentially JFYI though.

Cheers,
FJP

    PM: Syncing filesystems ... done.
    Freezing user space processes ... (elapsed 0.00 seconds) done.
    Freezing remaining freezable tasks ... (elapsed 0.00 seconds) done.
    Suspending console(s) (use no_console_suspend to debug)
    sd 0:0:0:0: [sda] Synchronizing SCSI cache
    sd 0:0:0:0: [sda] Stopping disk
    ACPI handle has no context!
    ACPI handle has no context!
    sdhci-pci 0000:02:06.2: PME# disabled
    sdhci-pci 0000:02:06.2: PCI INT C disabled
    ACPI handle has no context!
    ACPI handle has no context!
# Bogus: result of using wireless instead of wired networking.
   +iwlagn 0000:10:00.0: PCI INT A disabled
    ata2: port disabled. ignoring.
    ata_piix 0000:00:1f.1: PCI INT A disabled
    ehci_hcd 0000:00:1d.7: PCI INT A disabled
    ehci_hcd 0000:00:1d.7: PME# disabled
    uhci_hcd 0000:00:1d.2: PCI INT C disabled
    uhci_hcd 0000:00:1d.1: PCI INT B disabled
    uhci_hcd 0000:00:1d.0: PCI INT A disabled
    HDA Intel 0000:00:1b.0: PCI INT A disabled
    HDA Intel 0000:00:1b.0: power state changed by ACPI to D3
    ehci_hcd 0000:00:1a.7: PCI INT C disabled
    ehci_hcd 0000:00:1a.7: PME# disabled
    uhci_hcd 0000:00:1a.1: PCI INT B disabled
    uhci_hcd 0000:00:1a.0: PCI INT A disabled
    e1000e 0000:00:19.0: PME# enabled
    e1000e 0000:00:19.0: wake-up capability enabled by ACPI
    e1000e 0000:00:19.0: PME# enabled
    e1000e 0000:00:19.0: wake-up capability enabled by ACPI
    e1000e 0000:00:19.0: PCI INT A disabled
    ACPI handle has no context!
# This has moved up a bit. Looks more logical.
   +ricoh-mmc: Suspending.
   +ricoh-mmc: Controller is now re-enabled.
    ACPI: Preparing to enter system sleep state S3
    Disabling non-boot CPUs ...
    CPU 1 is now offline
    SMP alternatives: switching to UP code
    CPU0 attaching NULL sched-domain.
    CPU1 attaching NULL sched-domain.
    CPU0 attaching NULL sched-domain.
    CPU1 is down
   -ricoh-mmc: Suspending.
   -ricoh-mmc: Controller is now re-enabled.
    Extended CMOS year: 2000

    Back to C!
   +CPU0: Thermal monitoring enabled (TM2)
    Extended CMOS year: 2000
# This whole block has moved up before early config space restores.
# No changes in the block itself.
   +Enabling non-boot CPUs ...
   +SMP alternatives: switching to SMP code
   +Booting processor 1 APIC 0x1 ip 0x6000
   +Initializing CPU#1
   +Calibrating delay using timer specific routine.. 2660.04 BogoMIPS (lpj=5320097)
   +CPU: L1 I cache: 32K, L1 D cache: 32K
   +CPU: L2 cache: 2048K
   +[ds] using Core 2/Atom configuration
   +CPU: Physical Processor ID: 0
   +CPU: Processor Core ID: 1
   +CPU1: Thermal monitoring enabled (TM2)
   +CPU1: Intel(R) Core(TM)2 Duo CPU     U7700  @ 1.33GHz stepping 0d
   +CPU0 attaching NULL sched-domain.
   +Switched to high resolution mode on CPU 1
   +CPU0 attaching sched-domain:
   + domain 0: span 0-1 level MC
   +  groups: 0 1
   +CPU1 attaching sched-domain:
   + domain 0: span 0-1 level MC
   +  groups: 1 0
   +CPU1 is up
   +ACPI: Waking up from system sleep state S3
    pci 0000:00:02.0: restoring config space at offset 0x8 (was 0x1, writing 0x2001)
# These don't need restoring anymore?
   -pci 0000:00:02.1: restoring config space at offset 0x4 (was 0x4, writing 0xe0500004)
   -pci 0000:00:02.1: restoring config space at offset 0x1 (was 0x900000, writing 0x900007)
   -pci 0000:00:03.0: restoring config space at offset 0xf (was 0x100, writing 0x1ff)
   -pci 0000:00:03.0: restoring config space at offset 0x4 (was 0xfed12004, writing 0xe0600004)
   -pci 0000:00:03.2: restoring config space at offset 0xf (was 0x300, writing 0x30b)
   -pci 0000:00:03.2: restoring config space at offset 0x8 (was 0x1, writing 0x2031)
   -pci 0000:00:03.2: restoring config space at offset 0x7 (was 0x1, writing 0x2021)
   -pci 0000:00:03.2: restoring config space at offset 0x6 (was 0x1, writing 0x2019)
   -pci 0000:00:03.2: restoring config space at offset 0x5 (was 0x1, writing 0x2011)
   -pci 0000:00:03.2: restoring config space at offset 0x4 (was 0x1, writing 0x2009)
   -pci 0000:00:03.2: restoring config space at offset 0x1 (was 0xb00000, writing 0xb00001)
    serial 0000:00:03.3: restoring config space at offset 0xf (was 0x200, writing 0x20a)
    serial 0000:00:03.3: restoring config space at offset 0x5 (was 0x0, writing 0xe0601000)
    serial 0000:00:03.3: restoring config space at offset 0x4 (was 0x1, writing 0x2041)
    serial 0000:00:03.3: restoring config space at offset 0x1 (was 0xb00000, writing 0xb00007)
    e1000e 0000:00:19.0: restoring config space at offset 0xf (was 0x100, writing 0x10b)
    e1000e 0000:00:19.0: restoring config space at offset 0x6 (was 0x1, writing 0x2061)
    e1000e 0000:00:19.0: restoring config space at offset 0x5 (was 0x0, writing 0xe0640000)
    e1000e 0000:00:19.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100007)
# These have moved down to late resume.
   -uhci_hcd 0000:00:1a.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
   -uhci_hcd 0000:00:1a.0: restoring config space at offset 0x8 (was 0x1, writing 0x2081)
   -uhci_hcd 0000:00:1a.0: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
   -uhci_hcd 0000:00:1a.1: restoring config space at offset 0xf (was 0x200, writing 0x20a)
   -uhci_hcd 0000:00:1a.1: restoring config space at offset 0x8 (was 0x1, writing 0x20a1)
   -uhci_hcd 0000:00:1a.1: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
    ehci_hcd 0000:00:1a.7: restoring config space at offset 0xf (was 0x300, writing 0x30b)
    ehci_hcd 0000:00:1a.7: restoring config space at offset 0x4 (was 0x0, writing 0xe0641000)
    ehci_hcd 0000:00:1a.7: restoring config space at offset 0x1 (was 0x2900000, writing 0x2900002)
    HDA Intel 0000:00:1b.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
    HDA Intel 0000:00:1b.0: restoring config space at offset 0x3 (was 0x0, writing 0x10)
    HDA Intel 0000:00:1b.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100002)
    pcieport-driver 0000:00:1c.0: restoring config space at offset 0xf (was 0x100, writing 0x4010a)
    pcieport-driver 0000:00:1c.0: restoring config space at offset 0x9 (was 0x10001, writing 0x1fff1)
    pcieport-driver 0000:00:1c.0: restoring config space at offset 0x8 (was 0x0, writing 0xfff0)
    pcieport-driver 0000:00:1c.0: restoring config space at offset 0x7 (was 0x0, writing 0x200000f0)
    pcieport-driver 0000:00:1c.0: restoring config space at offset 0x6 (was 0x0, writing 0x80800)
    pcieport-driver 0000:00:1c.0: restoring config space at offset 0x3 (was 0x810000, writing 0x810010)
    pcieport-driver 0000:00:1c.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100407)
    pcieport-driver 0000:00:1c.1: restoring config space at offset 0xf (was 0x200, writing 0x4020a)
    pcieport-driver 0000:00:1c.1: restoring config space at offset 0x9 (was 0x10001, writing 0x1fff1)
    pcieport-driver 0000:00:1c.1: restoring config space at offset 0x8 (was 0x0, writing 0xe000e000)
    pcieport-driver 0000:00:1c.1: restoring config space at offset 0x7 (was 0x0, writing 0xf0)
    pcieport-driver 0000:00:1c.1: restoring config space at offset 0x3 (was 0x810000, writing 0x810010)
    pcieport-driver 0000:00:1c.1: restoring config space at offset 0x1 (was 0x100000, writing 0x100407)
# These have moved down to late resume.
   -uhci_hcd 0000:00:1d.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
   -uhci_hcd 0000:00:1d.0: restoring config space at offset 0x8 (was 0x1, writing 0x20c1)
   -uhci_hcd 0000:00:1d.0: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
   -uhci_hcd 0000:00:1d.1: restoring config space at offset 0xf (was 0x200, writing 0x20b)
   -uhci_hcd 0000:00:1d.1: restoring config space at offset 0x8 (was 0x1, writing 0x20e1)
   -uhci_hcd 0000:00:1d.1: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
   -uhci_hcd 0000:00:1d.2: restoring config space at offset 0xf (was 0x300, writing 0x30b)
   -uhci_hcd 0000:00:1d.2: restoring config space at offset 0x8 (was 0x1, writing 0x2101)
   -uhci_hcd 0000:00:1d.2: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
    ehci_hcd 0000:00:1d.7: restoring config space at offset 0xf (was 0x100, writing 0x10a)
    ehci_hcd 0000:00:1d.7: restoring config space at offset 0x4 (was 0x0, writing 0xe0648000)
    ehci_hcd 0000:00:1d.7: restoring config space at offset 0x1 (was 0x2900000, writing 0x2900002)
# These have disappeared.
   -pci 0000:00:1e.0: restoring config space at offset 0x9 (was 0x10001, writing 0x83f18001)
   -pci 0000:00:1e.0: restoring config space at offset 0x8 (was 0x0, writing 0xe030e010)
   -pci 0000:00:1e.0: restoring config space at offset 0x7 (was 0x228000f0, writing 0x22803030)
   -pci 0000:00:1e.0: restoring config space at offset 0x1 (was 0x100007, writing 0x100107)
# First two moved to late resume.
# The third already happened during late resume (duplicated).
   -ata_piix 0000:00:1f.1: restoring config space at offset 0xf (was 0x100, writing 0x10a)
   -ata_piix 0000:00:1f.1: restoring config space at offset 0x8 (was 0xc01, writing 0x2121)
   -ata_piix 0000:00:1f.1: restoring config space at offset 0x1 (was 0x2800005, writing 0x2880005)
    iwlagn 0000:10:00.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
    iwlagn 0000:10:00.0: restoring config space at offset 0x4 (was 0x4, writing 0xe0000004)
    iwlagn 0000:10:00.0: restoring config space at offset 0x3 (was 0x0, writing 0x10)
    iwlagn 0000:10:00.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100006)
    yenta_cardbus 0000:02:06.0: restoring config space at offset 0xf (was 0x3000100, writing 0x580010b)
    yenta_cardbus 0000:02:06.0: restoring config space at offset 0xe (was 0x0, writing 0x34fc)
    yenta_cardbus 0000:02:06.0: restoring config space at offset 0xd (was 0x0, writing 0x3400)
    yenta_cardbus 0000:02:06.0: restoring config space at offset 0xc (was 0x0, writing 0x30fc)
    yenta_cardbus 0000:02:06.0: restoring config space at offset 0xb (was 0x0, writing 0x3000)
    yenta_cardbus 0000:02:06.0: restoring config space at offset 0xa (was 0x0, writing 0x87fff000)
    yenta_cardbus 0000:02:06.0: restoring config space at offset 0x9 (was 0x0, writing 0x84000000)
    yenta_cardbus 0000:02:06.0: restoring config space at offset 0x8 (was 0x0, writing 0x83fff000)
    yenta_cardbus 0000:02:06.0: restoring config space at offset 0x7 (was 0x0, writing 0x80000000)
    yenta_cardbus 0000:02:06.0: restoring config space at offset 0x6 (was 0x0, writing 0xb0060302)
    yenta_cardbus 0000:02:06.0: restoring config space at offset 0x4 (was 0x0, writing 0xe0100000)
    yenta_cardbus 0000:02:06.0: restoring config space at offset 0x3 (was 0x820000, writing 0x82a800)
    yenta_cardbus 0000:02:06.0: restoring config space at offset 0x1 (was 0x2100000, writing 0x2100007)
    ohci1394 0000:02:06.1: restoring config space at offset 0xf (was 0x4020200, writing 0x4020205)
    ohci1394 0000:02:06.1: restoring config space at offset 0x4 (was 0x0, writing 0xe0101000)
    ohci1394 0000:02:06.1: restoring config space at offset 0x3 (was 0x800000, writing 0x804010)
    ohci1394 0000:02:06.1: restoring config space at offset 0x1 (was 0x2100000, writing 0x2100006)
    sdhci-pci 0000:02:06.2: restoring config space at offset 0xf (was 0x300, writing 0x30a)
    sdhci-pci 0000:02:06.2: restoring config space at offset 0x4 (was 0x0, writing 0xe0102000)
    sdhci-pci 0000:02:06.2: restoring config space at offset 0x3 (was 0x800000, writing 0x804010)
    sdhci-pci 0000:02:06.2: restoring config space at offset 0x1 (was 0x2100000, writing 0x2100006)
# Some changes; a lot just got dropped.
   -ricoh-mmc 0000:02:06.3: restoring config space at offset 0xf (was 0x300, writing 0xffffffff)
   +ricoh-mmc 0000:02:06.3: restoring config space at offset 0xf (was 0x300, writing 0x30a)
   -ricoh-mmc 0000:02:06.3: restoring config space at offset 0xe (was 0x0, writing 0xffffffff)
   -ricoh-mmc 0000:02:06.3: restoring config space at offset 0xd (was 0x80, writing 0xffffffff)
   -ricoh-mmc 0000:02:06.3: restoring config space at offset 0xc (was 0x0, writing 0xffffffff)
   -ricoh-mmc 0000:02:06.3: restoring config space at offset 0xb (was 0x30c9103c, writing 0xffffffff)
   -ricoh-mmc 0000:02:06.3: restoring config space at offset 0xa (was 0x0, writing 0xffffffff)
   -ricoh-mmc 0000:02:06.3: restoring config space at offset 0x9 (was 0x0, writing 0xffffffff)
   -ricoh-mmc 0000:02:06.3: restoring config space at offset 0x8 (was 0x0, writing 0xffffffff)
   -ricoh-mmc 0000:02:06.3: restoring config space at offset 0x7 (was 0x0, writing 0xffffffff)
   -ricoh-mmc 0000:02:06.3: restoring config space at offset 0x6 (was 0x0, writing 0xffffffff)
   -ricoh-mmc 0000:02:06.3: restoring config space at offset 0x5 (was 0x0, writing 0xffffffff)
   -ricoh-mmc 0000:02:06.3: restoring config space at offset 0x4 (was 0x0, writing 0xffffffff)
   +ricoh-mmc 0000:02:06.3: restoring config space at offset 0x4 (was 0x0, writing 0xe0103000)
   -ricoh-mmc 0000:02:06.3: restoring config space at offset 0x3 (was 0x800000, writing 0xffffffff)
   +ricoh-mmc 0000:02:06.3: restoring config space at offset 0x3 (was 0x800000, writing 0x804010)
   -ricoh-mmc 0000:02:06.3: restoring config space at offset 0x2 (was 0x8800011, writing 0xffffffff)
   -ricoh-mmc 0000:02:06.3: restoring config space at offset 0x1 (was 0x2100000, writing 0xffffffff)
   +ricoh-mmc 0000:02:06.3: restoring config space at offset 0x1 (was 0x2100000, writing 0x2100006)
   -ricoh-mmc 0000:02:06.3: restoring config space at offset 0x0 (was 0x8431180, writing 0xffffffff)
    ricoh-mmc: Resuming.
    ricoh-mmc: Controller is now disabled.
   -Enabling non-boot CPUs ...
   -SMP alternatives: switching to SMP code
   -Booting processor 1 APIC 0x1 ip 0x6000
   -Initializing CPU#1
   -Calibrating delay using timer specific routine.. 2660.07 BogoMIPS (lpj=5320158)
   -CPU: L1 I cache: 32K, L1 D cache: 32K
   -CPU: L2 cache: 2048K
   -[ds] using Core 2/Atom configuration
   -CPU: Physical Processor ID: 0
   -CPU: Processor Core ID: 1
   -CPU1: Thermal monitoring enabled (TM2)
   -x86 PAT enabled: cpu 1, old 0x7040600070406, new 0x7010600070106
   -CPU1: Intel(R) Core(TM)2 Duo CPU     U7700  @ 1.33GHz stepping 0d
   -CPU0 attaching NULL sched-domain.
   -Switched to high resolution mode on CPU 1
   -CPU0 attaching sched-domain:
   - domain 0: span 0-1 level MC
   -  groups: 0 1
   -CPU1 attaching sched-domain:
   - domain 0: span 0-1 level MC
   -  groups: 1 0
   -CPU1 is up
   -ACPI: Waking up from system sleep state S3
    ACPI: EC: non-query interrupt received, switching to interrupt mode
    pci 0000:00:02.0: restoring config space at offset 0x1 (was 0x900403, writing 0x900003)
    pci 0000:00:02.0: PME# disabled
    pci 0000:00:02.1: PME# disabled
    pci 0000:00:03.0: PME# disabled
    pci 0000:00:03.2: PME# disabled
    e1000e 0000:00:19.0: PCI INT A -> GSI 22 (level, low) -> IRQ 22
    e1000e 0000:00:19.0: setting latency timer to 64
    e1000e 0000:00:19.0: wake-up capability disabled by ACPI
    e1000e 0000:00:19.0: PME# disabled
    e1000e 0000:00:19.0: wake-up capability disabled by ACPI
    e1000e 0000:00:19.0: PME# disabled
    e1000e 0000:00:19.0: irq 26 for MSI/MSI-X
   +uhci_hcd 0000:00:1a.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
   +uhci_hcd 0000:00:1a.0: restoring config space at offset 0x8 (was 0x1, writing 0x2081)
   +uhci_hcd 0000:00:1a.0: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
    uhci_hcd 0000:00:1a.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
    uhci_hcd 0000:00:1a.0: setting latency timer to 64
    usb usb1: root hub lost power or was reset
   +uhci_hcd 0000:00:1a.1: restoring config space at offset 0xf (was 0x200, writing 0x20a)
   +uhci_hcd 0000:00:1a.1: restoring config space at offset 0x8 (was 0x1, writing 0x20a1)
   +uhci_hcd 0000:00:1a.1: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
    uhci_hcd 0000:00:1a.1: PCI INT B -> GSI 17 (level, low) -> IRQ 17
    uhci_hcd 0000:00:1a.1: setting latency timer to 64
    usb usb3: root hub lost power or was reset
    ehci_hcd 0000:00:1a.7: PME# disabled
    ehci_hcd 0000:00:1a.7: PCI INT C -> GSI 18 (level, low) -> IRQ 18
    ehci_hcd 0000:00:1a.7: setting latency timer to 64
    ehci_hcd 0000:00:1a.7: PME# disabled
# Called twice now?
    HDA Intel 0000:00:1b.0: power state changed by ACPI to D0
   +HDA Intel 0000:00:1b.0: power state changed by ACPI to D0
    HDA Intel 0000:00:1b.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
    HDA Intel 0000:00:1b.0: setting latency timer to 64
    pcieport-driver 0000:00:1c.0: setting latency timer to 64
    pcieport-driver 0000:00:1c.1: setting latency timer to 64
   +uhci_hcd 0000:00:1d.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
   +uhci_hcd 0000:00:1d.0: restoring config space at offset 0x8 (was 0x1, writing 0x20c1)
   +uhci_hcd 0000:00:1d.0: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
    uhci_hcd 0000:00:1d.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
    uhci_hcd 0000:00:1d.0: setting latency timer to 64
    usb usb5: root hub lost power or was reset
   +uhci_hcd 0000:00:1d.1: restoring config space at offset 0xf (was 0x200, writing 0x20b)
   +uhci_hcd 0000:00:1d.1: restoring config space at offset 0x8 (was 0x1, writing 0x20e1)
   +uhci_hcd 0000:00:1d.1: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
    uhci_hcd 0000:00:1d.1: PCI INT B -> GSI 22 (level, low) -> IRQ 22
    uhci_hcd 0000:00:1d.1: setting latency timer to 64
    usb usb6: root hub lost power or was reset
   +uhci_hcd 0000:00:1d.2: restoring config space at offset 0xf (was 0x300, writing 0x30b)
   +uhci_hcd 0000:00:1d.2: restoring config space at offset 0x8 (was 0x1, writing 0x2101)
   +uhci_hcd 0000:00:1d.2: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
    uhci_hcd 0000:00:1d.2: PCI INT C -> GSI 18 (level, low) -> IRQ 18
    uhci_hcd 0000:00:1d.2: setting latency timer to 64
    usb usb7: root hub lost power or was reset
    ehci_hcd 0000:00:1d.7: PME# disabled
    ehci_hcd 0000:00:1d.7: PCI INT A -> GSI 20 (level, low) -> IRQ 20
    ehci_hcd 0000:00:1d.7: setting latency timer to 64
    ehci_hcd 0000:00:1d.7: PME# disabled
    pci 0000:00:1e.0: setting latency timer to 64
   +ata_piix 0000:00:1f.1: restoring config space at offset 0xf (was 0x100, writing 0x10a)
   +ata_piix 0000:00:1f.1: restoring config space at offset 0x8 (was 0xc01, writing 0x2121)
    ata_piix 0000:00:1f.1: restoring config space at offset 0x1 (was 0x2800005, writing 0x2880005)
    ata_piix 0000:00:1f.1: PCI INT A -> GSI 16 (level, low) -> IRQ 16
    ata_piix 0000:00:1f.1: setting latency timer to 64
    ata2: port disabled. ignoring.
    ACPI Exception (exoparg2-0445): AE_AML_PACKAGE_LIMIT, Index (000000005) is beyond end of object [20081204]
    ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.C2C3] (Node ffff88007e01dea0), AE_AML_PACKAGE_LIMIT
    ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.C003.C0F6.C3F3._STM] (Node ffff88007e043de0), AE_AML_PACKAGE_LIMIT
    ata1: ACPI set timing mode failed (status=0x300b)
# Remaining differences are bogus: result of using wireless instead of wired networking.
   +iwlagn 0000:10:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
   +iwlagn 0000:10:00.0: irq 27 for MSI/MSI-X
    ohci1394: fw-host0: OHCI-1394 1.1 (PCI): IRQ=[19]  MMIO=[e0101000-e01017ff]  Max Packet=[2048]  IR/IT contexts=[4/4]
    sdhci-pci 0000:02:06.2: PCI INT C -> GSI 20 (level, low) -> IRQ 20
   +Registered led device: iwl-phy0:radio
   +Registered led device: iwl-phy0:assoc
   +Registered led device: iwl-phy0:RX
   +Registered led device: iwl-phy0:TX
    sd 0:0:0:0: [sda] Starting disk
    ata1.01: ACPI cmd ef/03:0c:00:00:00:b0 filtered out
    ata1.01: ACPI cmd ef/03:40:00:00:00:b0 filtered out
    ata1.00: ACPI cmd ef/03:01:00:00:00:a0 filtered out
    ata1.00: ACPI cmd ef/03:45:00:00:00:a0 filtered out
    ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out
    ata1.00: ACPI cmd b1/c1:00:00:00:00:a0 filtered out
    ata1.00: ACPI cmd c6/00:10:00:00:00:a0 succeeded
   -e1000e: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
   -0000:00:19.0: eth0: 10/100 speed: disabling TSO
    ata1.00: configured for UDMA/100
    ata1.01: configured for MWDMA2
    ata1.00: configured for UDMA/100
    ata1.01: configured for MWDMA2
    ata1: EH complete
    sd 0:0:0:0: [sda] 234441648 512-byte hardware sectors: (120 GB/111 GiB)
    sd 0:0:0:0: [sda] Write Protect is off
    sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
    sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
    sd 0:0:0:0: [sda] 234441648 512-byte hardware sectors: (120 GB/111 GiB)
    sd 0:0:0:0: [sda] Write Protect is off
    sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
    sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
    usb 1-1: reset full speed USB device using uhci_hcd and address 2
    usb 5-2: reset full speed USB device using uhci_hcd and address 2
    pci 0000:00:02.0: restoring config space at offset 0x1 (was 0x900403, writing 0x900003)
    pci 0000:00:02.0: setting latency timer to 64
    Restarting tasks ... done.

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [linux-pm] [RFC][PATCH][1/8] PM: Rework handling of interrupts during suspend-resume (rev. 5)
  2009-03-08 17:20           ` Linus Torvalds
  2009-03-08 20:40             ` Alan Stern
@ 2009-03-08 20:40             ` Alan Stern
  2009-03-08 21:37               ` Rafael J. Wysocki
                                 ` (3 more replies)
  1 sibling, 4 replies; 373+ messages in thread
From: Alan Stern @ 2009-03-08 20:40 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, Jeremy Fitzhardinge, LKML, Jesse Barnes,
	Thomas Gleixner, Eric W. Biederman, Ingo Molnar, pm list,
	Arve  Hjønnevåg

On Sun, 8 Mar 2009, Linus Torvalds wrote:

> On Sat, 7 Mar 2009, Alan Stern wrote:
> > 
> > You didn't answer my question.  Why bother to distinguish between 
> > "wake-up" interrupts and non-"wake-up" interrupts?
> > 
> > In other words, why not simply abort the suspend if IRQ_PENDING is set
> > for _any_ interrupt during sysdev_suspend()?
> 
> .. because some drivers might not actually shut down the hardware until 
> they get to "suspend_late"? If even then, for that matter - a driver may 
> simply not care, knowing that the hardware will be powered off, and will 
> be re-initialized at resume.
> 
> The thinking that you have to shut your hardware down at "->suspend()" 
> time is a _disease_. There are literally classes of hardware out there 
> where that would be an outright _bug_, like for a PCI bridge device. For 
> many devices, "suspend()" has to be the phase where you shut down the 
> _external_ stuff (eg for a disk controller, it's when you'd flush and stop 
> your disks), but the controller itself may well be alive until later.

Yes, certainly.  I agree completely.

But there is a difference between shutting down the hardware and merely
preventing it from generating interrupt requests.  If a device remains
capable of generating IRQs after its driver's suspend method has run,
the driver runs the risk of having its handler called at a time when it
isn't prepared to cope correctly.  Of course, this will depend on the
details of how the driver is written.

There have been examples in the past of devices that, for one reason or
another, _did_ generate IRQs at inconvenient times.  The hardware or
the BIOS may have done improper initialization, for example.  On a
shared IRQ this led to interrupt storms.  IIRC, the solution was to add
a PCI quirk routine to disable IRQ generation at an early stage.  
Didn't e100 have this problem?

Alan Stern


^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH][0/8] PM: Rework suspend-resume ordering to avoid problems with shared interrupts
  2009-03-08 19:28   ` [RFC][PATCH][0/8] PM: Rework suspend-resume ordering to avoid problems with shared interrupts Frans Pop
  2009-03-08 20:50     ` Rafael J. Wysocki
@ 2009-03-08 20:50     ` Rafael J. Wysocki
  2009-03-14  8:44       ` Frans Pop
  2009-03-14  8:44       ` Frans Pop
  1 sibling, 2 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-08 20:50 UTC (permalink / raw)
  To: Frans Pop; +Cc: linux-kernel, torvalds, linux-pm

On Sunday 08 March 2009, Frans Pop wrote:
> (Most CCs dropped.)
> 
> Hi Rafael,

Hi Frans,

> Rafael J. Wysocki wrote:
> > The following patches modify the way in which we handle disabling
> > interrupts during suspend and enabling them during resume.  They also
> > change the ordering of the core suspend and hibernation code to take
> > advantage of the new approach to the interrupts and modify the PCI PM
> > core to avoid a few problems.
> 
> I've given this series a try on my HP 2510p. I've seen no regressions
> with suspend to RAM.

Great, thanks for testing!

> Below is a diff between suspend/resume dmesg from before (based on rc5)
> and after (rc7 + series) the patch, with some comments.
> Nothing looks really wrong, but there are some surprising changes.
> 
> Essentially JFYI though.
> 
> Cheers,
> FJP
> 
>     PM: Syncing filesystems ... done.
>     Freezing user space processes ... (elapsed 0.00 seconds) done.
>     Freezing remaining freezable tasks ... (elapsed 0.00 seconds) done.
>     Suspending console(s) (use no_console_suspend to debug)
>     sd 0:0:0:0: [sda] Synchronizing SCSI cache
>     sd 0:0:0:0: [sda] Stopping disk
>     ACPI handle has no context!
>     ACPI handle has no context!
>     sdhci-pci 0000:02:06.2: PME# disabled
>     sdhci-pci 0000:02:06.2: PCI INT C disabled
>     ACPI handle has no context!
>     ACPI handle has no context!
> # Bogus: result of using wireless instead of wired networking.
>    +iwlagn 0000:10:00.0: PCI INT A disabled
>     ata2: port disabled. ignoring.
>     ata_piix 0000:00:1f.1: PCI INT A disabled
>     ehci_hcd 0000:00:1d.7: PCI INT A disabled
>     ehci_hcd 0000:00:1d.7: PME# disabled
>     uhci_hcd 0000:00:1d.2: PCI INT C disabled
>     uhci_hcd 0000:00:1d.1: PCI INT B disabled
>     uhci_hcd 0000:00:1d.0: PCI INT A disabled
>     HDA Intel 0000:00:1b.0: PCI INT A disabled
>     HDA Intel 0000:00:1b.0: power state changed by ACPI to D3
>     ehci_hcd 0000:00:1a.7: PCI INT C disabled
>     ehci_hcd 0000:00:1a.7: PME# disabled
>     uhci_hcd 0000:00:1a.1: PCI INT B disabled
>     uhci_hcd 0000:00:1a.0: PCI INT A disabled
>     e1000e 0000:00:19.0: PME# enabled
>     e1000e 0000:00:19.0: wake-up capability enabled by ACPI
>     e1000e 0000:00:19.0: PME# enabled
>     e1000e 0000:00:19.0: wake-up capability enabled by ACPI
>     e1000e 0000:00:19.0: PCI INT A disabled
>     ACPI handle has no context!
> # This has moved up a bit. Looks more logical.

This is an intentional result of patch 2/8.

>    +ricoh-mmc: Suspending.
>    +ricoh-mmc: Controller is now re-enabled.
>     ACPI: Preparing to enter system sleep state S3
>     Disabling non-boot CPUs ...
>     CPU 1 is now offline
>     SMP alternatives: switching to UP code
>     CPU0 attaching NULL sched-domain.
>     CPU1 attaching NULL sched-domain.
>     CPU0 attaching NULL sched-domain.
>     CPU1 is down
>    -ricoh-mmc: Suspending.
>    -ricoh-mmc: Controller is now re-enabled.
>     Extended CMOS year: 2000
> 
>     Back to C!
>    +CPU0: Thermal monitoring enabled (TM2)
>     Extended CMOS year: 2000
> # This whole block has moved up before early config space restores.
> # No changes in the block itself.

Yes, this is also an intentional result of patch 2/8.

>    +Enabling non-boot CPUs ...
>    +SMP alternatives: switching to SMP code
>    +Booting processor 1 APIC 0x1 ip 0x6000
>    +Initializing CPU#1
>    +Calibrating delay using timer specific routine.. 2660.04 BogoMIPS (lpj=5320097)
>    +CPU: L1 I cache: 32K, L1 D cache: 32K
>    +CPU: L2 cache: 2048K
>    +[ds] using Core 2/Atom configuration
>    +CPU: Physical Processor ID: 0
>    +CPU: Processor Core ID: 1
>    +CPU1: Thermal monitoring enabled (TM2)
>    +CPU1: Intel(R) Core(TM)2 Duo CPU     U7700  @ 1.33GHz stepping 0d
>    +CPU0 attaching NULL sched-domain.
>    +Switched to high resolution mode on CPU 1
>    +CPU0 attaching sched-domain:
>    + domain 0: span 0-1 level MC
>    +  groups: 0 1
>    +CPU1 attaching sched-domain:
>    + domain 0: span 0-1 level MC
>    +  groups: 1 0
>    +CPU1 is up
>    +ACPI: Waking up from system sleep state S3
>     pci 0000:00:02.0: restoring config space at offset 0x8 (was 0x1, writing 0x2001)
> # These don't need restoring anymore?

I think they generally do, but the restored values may be (and often are)
identical to the current ones, in which case no write is logged.

>    -pci 0000:00:02.1: restoring config space at offset 0x4 (was 0x4, writing 0xe0500004)
>    -pci 0000:00:02.1: restoring config space at offset 0x1 (was 0x900000, writing 0x900007)
>    -pci 0000:00:03.0: restoring config space at offset 0xf (was 0x100, writing 0x1ff)
>    -pci 0000:00:03.0: restoring config space at offset 0x4 (was 0xfed12004, writing 0xe0600004)
>    -pci 0000:00:03.2: restoring config space at offset 0xf (was 0x300, writing 0x30b)
>    -pci 0000:00:03.2: restoring config space at offset 0x8 (was 0x1, writing 0x2031)
>    -pci 0000:00:03.2: restoring config space at offset 0x7 (was 0x1, writing 0x2021)
>    -pci 0000:00:03.2: restoring config space at offset 0x6 (was 0x1, writing 0x2019)
>    -pci 0000:00:03.2: restoring config space at offset 0x5 (was 0x1, writing 0x2011)
>    -pci 0000:00:03.2: restoring config space at offset 0x4 (was 0x1, writing 0x2009)
>    -pci 0000:00:03.2: restoring config space at offset 0x1 (was 0xb00000, writing 0xb00001)
>     serial 0000:00:03.3: restoring config space at offset 0xf (was 0x200, writing 0x20a)
>     serial 0000:00:03.3: restoring config space at offset 0x5 (was 0x0, writing 0xe0601000)
>     serial 0000:00:03.3: restoring config space at offset 0x4 (was 0x1, writing 0x2041)
>     serial 0000:00:03.3: restoring config space at offset 0x1 (was 0xb00000, writing 0xb00007)
>     e1000e 0000:00:19.0: restoring config space at offset 0xf (was 0x100, writing 0x10b)
>     e1000e 0000:00:19.0: restoring config space at offset 0x6 (was 0x1, writing 0x2061)
>     e1000e 0000:00:19.0: restoring config space at offset 0x5 (was 0x0, writing 0xe0640000)
>     e1000e 0000:00:19.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100007)
> # These have moved down to late resume.

That's a bit strange.  It looks like the registers changed after we had
restored them during "early" resume.  So either we hadn't actually restored
them (it would be interesting to find out why), or they really changed (again,
it would be interesting to see why).

>    -uhci_hcd 0000:00:1a.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
>    -uhci_hcd 0000:00:1a.0: restoring config space at offset 0x8 (was 0x1, writing 0x2081)
>    -uhci_hcd 0000:00:1a.0: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
>    -uhci_hcd 0000:00:1a.1: restoring config space at offset 0xf (was 0x200, writing 0x20a)
>    -uhci_hcd 0000:00:1a.1: restoring config space at offset 0x8 (was 0x1, writing 0x20a1)
>    -uhci_hcd 0000:00:1a.1: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
>     ehci_hcd 0000:00:1a.7: restoring config space at offset 0xf (was 0x300, writing 0x30b)
>     ehci_hcd 0000:00:1a.7: restoring config space at offset 0x4 (was 0x0, writing 0xe0641000)
>     ehci_hcd 0000:00:1a.7: restoring config space at offset 0x1 (was 0x2900000, writing 0x2900002)
>     HDA Intel 0000:00:1b.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
>     HDA Intel 0000:00:1b.0: restoring config space at offset 0x3 (was 0x0, writing 0x10)
>     HDA Intel 0000:00:1b.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100002)
>     pcieport-driver 0000:00:1c.0: restoring config space at offset 0xf (was 0x100, writing 0x4010a)
>     pcieport-driver 0000:00:1c.0: restoring config space at offset 0x9 (was 0x10001, writing 0x1fff1)
>     pcieport-driver 0000:00:1c.0: restoring config space at offset 0x8 (was 0x0, writing 0xfff0)
>     pcieport-driver 0000:00:1c.0: restoring config space at offset 0x7 (was 0x0, writing 0x200000f0)
>     pcieport-driver 0000:00:1c.0: restoring config space at offset 0x6 (was 0x0, writing 0x80800)
>     pcieport-driver 0000:00:1c.0: restoring config space at offset 0x3 (was 0x810000, writing 0x810010)
>     pcieport-driver 0000:00:1c.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100407)
>     pcieport-driver 0000:00:1c.1: restoring config space at offset 0xf (was 0x200, writing 0x4020a)
>     pcieport-driver 0000:00:1c.1: restoring config space at offset 0x9 (was 0x10001, writing 0x1fff1)
>     pcieport-driver 0000:00:1c.1: restoring config space at offset 0x8 (was 0x0, writing 0xe000e000)
>     pcieport-driver 0000:00:1c.1: restoring config space at offset 0x7 (was 0x0, writing 0xf0)
>     pcieport-driver 0000:00:1c.1: restoring config space at offset 0x3 (was 0x810000, writing 0x810010)
>     pcieport-driver 0000:00:1c.1: restoring config space at offset 0x1 (was 0x100000, writing 0x100407)
> # These have moved down to late resume.

The last comment applies here too.

>    -uhci_hcd 0000:00:1d.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
>    -uhci_hcd 0000:00:1d.0: restoring config space at offset 0x8 (was 0x1, writing 0x20c1)
>    -uhci_hcd 0000:00:1d.0: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
>    -uhci_hcd 0000:00:1d.1: restoring config space at offset 0xf (was 0x200, writing 0x20b)
>    -uhci_hcd 0000:00:1d.1: restoring config space at offset 0x8 (was 0x1, writing 0x20e1)
>    -uhci_hcd 0000:00:1d.1: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
>    -uhci_hcd 0000:00:1d.2: restoring config space at offset 0xf (was 0x300, writing 0x30b)
>    -uhci_hcd 0000:00:1d.2: restoring config space at offset 0x8 (was 0x1, writing 0x2101)
>    -uhci_hcd 0000:00:1d.2: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
>     ehci_hcd 0000:00:1d.7: restoring config space at offset 0xf (was 0x100, writing 0x10a)
>     ehci_hcd 0000:00:1d.7: restoring config space at offset 0x4 (was 0x0, writing 0xe0648000)
>     ehci_hcd 0000:00:1d.7: restoring config space at offset 0x1 (was 0x2900000, writing 0x2900002)
> # These have disappeared.

Good.

>    -pci 0000:00:1e.0: restoring config space at offset 0x9 (was 0x10001, writing 0x83f18001)
>    -pci 0000:00:1e.0: restoring config space at offset 0x8 (was 0x0, writing 0xe030e010)
>    -pci 0000:00:1e.0: restoring config space at offset 0x7 (was 0x228000f0, writing 0x22803030)
>    -pci 0000:00:1e.0: restoring config space at offset 0x1 (was 0x100007, writing 0x100107)
> # First two moved to late resume.

Again, a bit strange.

> # The third already happened during late resume (duplicated).
>    -ata_piix 0000:00:1f.1: restoring config space at offset 0xf (was 0x100, writing 0x10a)
>    -ata_piix 0000:00:1f.1: restoring config space at offset 0x8 (was 0xc01, writing 0x2121)
>    -ata_piix 0000:00:1f.1: restoring config space at offset 0x1 (was 0x2800005, writing 0x2880005)
>     iwlagn 0000:10:00.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
>     iwlagn 0000:10:00.0: restoring config space at offset 0x4 (was 0x4, writing 0xe0000004)
>     iwlagn 0000:10:00.0: restoring config space at offset 0x3 (was 0x0, writing 0x10)
>     iwlagn 0000:10:00.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100006)
>     yenta_cardbus 0000:02:06.0: restoring config space at offset 0xf (was 0x3000100, writing 0x580010b)
>     yenta_cardbus 0000:02:06.0: restoring config space at offset 0xe (was 0x0, writing 0x34fc)
>     yenta_cardbus 0000:02:06.0: restoring config space at offset 0xd (was 0x0, writing 0x3400)
>     yenta_cardbus 0000:02:06.0: restoring config space at offset 0xc (was 0x0, writing 0x30fc)
>     yenta_cardbus 0000:02:06.0: restoring config space at offset 0xb (was 0x0, writing 0x3000)
>     yenta_cardbus 0000:02:06.0: restoring config space at offset 0xa (was 0x0, writing 0x87fff000)
>     yenta_cardbus 0000:02:06.0: restoring config space at offset 0x9 (was 0x0, writing 0x84000000)
>     yenta_cardbus 0000:02:06.0: restoring config space at offset 0x8 (was 0x0, writing 0x83fff000)
>     yenta_cardbus 0000:02:06.0: restoring config space at offset 0x7 (was 0x0, writing 0x80000000)
>     yenta_cardbus 0000:02:06.0: restoring config space at offset 0x6 (was 0x0, writing 0xb0060302)
>     yenta_cardbus 0000:02:06.0: restoring config space at offset 0x4 (was 0x0, writing 0xe0100000)
>     yenta_cardbus 0000:02:06.0: restoring config space at offset 0x3 (was 0x820000, writing 0x82a800)
>     yenta_cardbus 0000:02:06.0: restoring config space at offset 0x1 (was 0x2100000, writing 0x2100007)
>     ohci1394 0000:02:06.1: restoring config space at offset 0xf (was 0x4020200, writing 0x4020205)
>     ohci1394 0000:02:06.1: restoring config space at offset 0x4 (was 0x0, writing 0xe0101000)
>     ohci1394 0000:02:06.1: restoring config space at offset 0x3 (was 0x800000, writing 0x804010)
>     ohci1394 0000:02:06.1: restoring config space at offset 0x1 (was 0x2100000, writing 0x2100006)
>     sdhci-pci 0000:02:06.2: restoring config space at offset 0xf (was 0x300, writing 0x30a)
>     sdhci-pci 0000:02:06.2: restoring config space at offset 0x4 (was 0x0, writing 0xe0102000)
>     sdhci-pci 0000:02:06.2: restoring config space at offset 0x3 (was 0x800000, writing 0x804010)
>     sdhci-pci 0000:02:06.2: restoring config space at offset 0x1 (was 0x2100000, writing 0x2100006)
> # Some changes; a lot just got dropped.
>    -ricoh-mmc 0000:02:06.3: restoring config space at offset 0xf (was 0x300, writing 0xffffffff)
>    +ricoh-mmc 0000:02:06.3: restoring config space at offset 0xf (was 0x300, writing 0x30a)
>    -ricoh-mmc 0000:02:06.3: restoring config space at offset 0xe (was 0x0, writing 0xffffffff)
>    -ricoh-mmc 0000:02:06.3: restoring config space at offset 0xd (was 0x80, writing 0xffffffff)
>    -ricoh-mmc 0000:02:06.3: restoring config space at offset 0xc (was 0x0, writing 0xffffffff)
>    -ricoh-mmc 0000:02:06.3: restoring config space at offset 0xb (was 0x30c9103c, writing 0xffffffff)
>    -ricoh-mmc 0000:02:06.3: restoring config space at offset 0xa (was 0x0, writing 0xffffffff)
>    -ricoh-mmc 0000:02:06.3: restoring config space at offset 0x9 (was 0x0, writing 0xffffffff)
>    -ricoh-mmc 0000:02:06.3: restoring config space at offset 0x8 (was 0x0, writing 0xffffffff)
>    -ricoh-mmc 0000:02:06.3: restoring config space at offset 0x7 (was 0x0, writing 0xffffffff)
>    -ricoh-mmc 0000:02:06.3: restoring config space at offset 0x6 (was 0x0, writing 0xffffffff)
>    -ricoh-mmc 0000:02:06.3: restoring config space at offset 0x5 (was 0x0, writing 0xffffffff)
>    -ricoh-mmc 0000:02:06.3: restoring config space at offset 0x4 (was 0x0, writing 0xffffffff)
>    +ricoh-mmc 0000:02:06.3: restoring config space at offset 0x4 (was 0x0, writing 0xe0103000)
>    -ricoh-mmc 0000:02:06.3: restoring config space at offset 0x3 (was 0x800000, writing 0xffffffff)
>    +ricoh-mmc 0000:02:06.3: restoring config space at offset 0x3 (was 0x800000, writing 0x804010)
>    -ricoh-mmc 0000:02:06.3: restoring config space at offset 0x2 (was 0x8800011, writing 0xffffffff)
>    -ricoh-mmc 0000:02:06.3: restoring config space at offset 0x1 (was 0x2100000, writing 0xffffffff)
>    +ricoh-mmc 0000:02:06.3: restoring config space at offset 0x1 (was 0x2100000, writing 0x2100006)
>    -ricoh-mmc 0000:02:06.3: restoring config space at offset 0x0 (was 0x8431180, writing 0xffffffff)
>     ricoh-mmc: Resuming.
>     ricoh-mmc: Controller is now disabled.
>    -Enabling non-boot CPUs ...
>    -SMP alternatives: switching to SMP code
>    -Booting processor 1 APIC 0x1 ip 0x6000
>    -Initializing CPU#1
>    -Calibrating delay using timer specific routine.. 2660.07 BogoMIPS (lpj=5320158)
>    -CPU: L1 I cache: 32K, L1 D cache: 32K
>    -CPU: L2 cache: 2048K
>    -[ds] using Core 2/Atom configuration
>    -CPU: Physical Processor ID: 0
>    -CPU: Processor Core ID: 1
>    -CPU1: Thermal monitoring enabled (TM2)
>    -x86 PAT enabled: cpu 1, old 0x7040600070406, new 0x7010600070106
>    -CPU1: Intel(R) Core(TM)2 Duo CPU     U7700  @ 1.33GHz stepping 0d
>    -CPU0 attaching NULL sched-domain.
>    -Switched to high resolution mode on CPU 1
>    -CPU0 attaching sched-domain:
>    - domain 0: span 0-1 level MC
>    -  groups: 0 1
>    -CPU1 attaching sched-domain:
>    - domain 0: span 0-1 level MC
>    -  groups: 1 0
>    -CPU1 is up
>    -ACPI: Waking up from system sleep state S3
>     ACPI: EC: non-query interrupt received, switching to interrupt mode
>     pci 0000:00:02.0: restoring config space at offset 0x1 (was 0x900403, writing 0x900003)
>     pci 0000:00:02.0: PME# disabled
>     pci 0000:00:02.1: PME# disabled
>     pci 0000:00:03.0: PME# disabled
>     pci 0000:00:03.2: PME# disabled
>     e1000e 0000:00:19.0: PCI INT A -> GSI 22 (level, low) -> IRQ 22
>     e1000e 0000:00:19.0: setting latency timer to 64
>     e1000e 0000:00:19.0: wake-up capability disabled by ACPI
>     e1000e 0000:00:19.0: PME# disabled
>     e1000e 0000:00:19.0: wake-up capability disabled by ACPI
>     e1000e 0000:00:19.0: PME# disabled
>     e1000e 0000:00:19.0: irq 26 for MSI/MSI-X
>    +uhci_hcd 0000:00:1a.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
>    +uhci_hcd 0000:00:1a.0: restoring config space at offset 0x8 (was 0x1, writing 0x2081)
>    +uhci_hcd 0000:00:1a.0: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
>     uhci_hcd 0000:00:1a.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
>     uhci_hcd 0000:00:1a.0: setting latency timer to 64
>     usb usb1: root hub lost power or was reset
>    +uhci_hcd 0000:00:1a.1: restoring config space at offset 0xf (was 0x200, writing 0x20a)
>    +uhci_hcd 0000:00:1a.1: restoring config space at offset 0x8 (was 0x1, writing 0x20a1)
>    +uhci_hcd 0000:00:1a.1: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
>     uhci_hcd 0000:00:1a.1: PCI INT B -> GSI 17 (level, low) -> IRQ 17
>     uhci_hcd 0000:00:1a.1: setting latency timer to 64
>     usb usb3: root hub lost power or was reset
>     ehci_hcd 0000:00:1a.7: PME# disabled
>     ehci_hcd 0000:00:1a.7: PCI INT C -> GSI 18 (level, low) -> IRQ 18
>     ehci_hcd 0000:00:1a.7: setting latency timer to 64
>     ehci_hcd 0000:00:1a.7: PME# disabled
> # Called twice now?
>     HDA Intel 0000:00:1b.0: power state changed by ACPI to D0
>    +HDA Intel 0000:00:1b.0: power state changed by ACPI to D0

Yeah, it's not nice.  The problem is that pci_set_power_state() doesn't
check if the power state is already correct before calling the platform to
change it.  The platform should cope with that, but it shouldn't be called
a second time at all.

In fact I have a patch to change this behavior, but I consider it a separate
issue.
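
For illustration only, the kind of check described above could look like the
userspace sketch below.  It is not the patch mentioned here and not kernel
code: struct fake_dev, platform_set_state() and set_power_state_guarded() are
hypothetical names, and the platform hook just counts calls and prints a line
shaped like the one in the log.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical power states; the names loosely mirror PCI D-states. */
enum fake_power_state { FAKE_D0 = 0, FAKE_D3 = 3 };

/* Hypothetical device model, not a kernel structure. */
struct fake_dev {
    const char *name;
    enum fake_power_state current_state;
    int platform_calls;   /* how many times the platform was invoked */
};

/* Stand-in for the platform (ACPI) hook that changes the power state. */
static void platform_set_state(struct fake_dev *dev, enum fake_power_state state)
{
    dev->platform_calls++;
    dev->current_state = state;
    printf("%s: power state changed by ACPI to D%d\n", dev->name, (int)state);
}

/* Skip the platform call when the device is already in the target state. */
static void set_power_state_guarded(struct fake_dev *dev, enum fake_power_state state)
{
    assert(dev != NULL);
    if (dev->current_state == state)
        return;           /* nothing to do, so the platform is not called */
    platform_set_state(dev, state);
}
```

With such a guard, asking twice for D0 on a device that starts in D3 reaches
the platform only once, so the duplicated "power state changed by ACPI to D0"
line would not appear.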

>     HDA Intel 0000:00:1b.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
>     HDA Intel 0000:00:1b.0: setting latency timer to 64
>     pcieport-driver 0000:00:1c.0: setting latency timer to 64
>     pcieport-driver 0000:00:1c.1: setting latency timer to 64
>    +uhci_hcd 0000:00:1d.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
>    +uhci_hcd 0000:00:1d.0: restoring config space at offset 0x8 (was 0x1, writing 0x20c1)
>    +uhci_hcd 0000:00:1d.0: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
>     uhci_hcd 0000:00:1d.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
>     uhci_hcd 0000:00:1d.0: setting latency timer to 64
>     usb usb5: root hub lost power or was reset
>    +uhci_hcd 0000:00:1d.1: restoring config space at offset 0xf (was 0x200, writing 0x20b)
>    +uhci_hcd 0000:00:1d.1: restoring config space at offset 0x8 (was 0x1, writing 0x20e1)
>    +uhci_hcd 0000:00:1d.1: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
>     uhci_hcd 0000:00:1d.1: PCI INT B -> GSI 22 (level, low) -> IRQ 22
>     uhci_hcd 0000:00:1d.1: setting latency timer to 64
>     usb usb6: root hub lost power or was reset
>    +uhci_hcd 0000:00:1d.2: restoring config space at offset 0xf (was 0x300, writing 0x30b)
>    +uhci_hcd 0000:00:1d.2: restoring config space at offset 0x8 (was 0x1, writing 0x2101)
>    +uhci_hcd 0000:00:1d.2: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
>     uhci_hcd 0000:00:1d.2: PCI INT C -> GSI 18 (level, low) -> IRQ 18
>     uhci_hcd 0000:00:1d.2: setting latency timer to 64
>     usb usb7: root hub lost power or was reset
>     ehci_hcd 0000:00:1d.7: PME# disabled
>     ehci_hcd 0000:00:1d.7: PCI INT A -> GSI 20 (level, low) -> IRQ 20
>     ehci_hcd 0000:00:1d.7: setting latency timer to 64
>     ehci_hcd 0000:00:1d.7: PME# disabled
>     pci 0000:00:1e.0: setting latency timer to 64
>    +ata_piix 0000:00:1f.1: restoring config space at offset 0xf (was 0x100, writing 0x10a)
>    +ata_piix 0000:00:1f.1: restoring config space at offset 0x8 (was 0xc01, writing 0x2121)
>     ata_piix 0000:00:1f.1: restoring config space at offset 0x1 (was 0x2800005, writing 0x2880005)
>     ata_piix 0000:00:1f.1: PCI INT A -> GSI 16 (level, low) -> IRQ 16
>     ata_piix 0000:00:1f.1: setting latency timer to 64
>     ata2: port disabled. ignoring.
>     ACPI Exception (exoparg2-0445): AE_AML_PACKAGE_LIMIT, Index (000000005) is beyond end of object [20081204]
>     ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.C2C3] (Node ffff88007e01dea0), AE_AML_PACKAGE_LIMIT
>     ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.C003.C0F6.C3F3._STM] (Node ffff88007e043de0), AE_AML_PACKAGE_LIMIT
>     ata1: ACPI set timing mode failed (status=0x300b)
> # Remaining differences are bogus: result of using wireless instead of wired networking.

OK

Thanks for the debugging work.

Best,
Rafael

* Re: [RFC][PATCH][0/8] PM: Rework suspend-resume ordering to avoid problems with shared interrupts
  2009-03-08 19:28   ` [RFC][PATCH][0/8] PM: Rework suspend-resume ordering to avoid problems with shared interrupts Frans Pop
@ 2009-03-08 20:50     ` Rafael J. Wysocki
  2009-03-08 20:50     ` Rafael J. Wysocki
  1 sibling, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-08 20:50 UTC (permalink / raw)
  To: Frans Pop; +Cc: linux-pm, torvalds, linux-kernel

On Sunday 08 March 2009, Frans Pop wrote:
> (Most CCs dropped.)
> 
> Hi Rafael,

Hi Frans,

> Rafael J. Wysocki wrote:
> > The following patches modify the way in which we handle disabling
> > interrupts during suspend and enabling them during resume.  They also
> > change the ordering of the core suspend and hibernation code to take
> > advantage of the new approach to the interrupts and modify the PCI PM
> > core to avoid a few problems.
> 
> I've given this series a try on my HP 2510p. I've seen no regressions
> with suspend to RAM.

Great, thanks for testing!

> Below is a diff between suspend/resume dmesg from before (based on rc5)
> and after (rc7 + series) the patch, with some comments.
> Nothing looks really wrong, but there are some surprising changes.
> 
> Essentially JFYI though.
> 
> Cheers,
> FJP
> 
>     PM: Syncing filesystems ... done.
>     Freezing user space processes ... (elapsed 0.00 seconds) done.
>     Freezing remaining freezable tasks ... (elapsed 0.00 seconds) done.
>     Suspending console(s) (use no_console_suspend to debug)
>     sd 0:0:0:0: [sda] Synchronizing SCSI cache
>     sd 0:0:0:0: [sda] Stopping disk
>     ACPI handle has no context!
>     ACPI handle has no context!
>     sdhci-pci 0000:02:06.2: PME# disabled
>     sdhci-pci 0000:02:06.2: PCI INT C disabled
>     ACPI handle has no context!
>     ACPI handle has no context!
> # Bogus: result of using wireless instead of wired networking.
>    +iwlagn 0000:10:00.0: PCI INT A disabled
>     ata2: port disabled. ignoring.
>     ata_piix 0000:00:1f.1: PCI INT A disabled
>     ehci_hcd 0000:00:1d.7: PCI INT A disabled
>     ehci_hcd 0000:00:1d.7: PME# disabled
>     uhci_hcd 0000:00:1d.2: PCI INT C disabled
>     uhci_hcd 0000:00:1d.1: PCI INT B disabled
>     uhci_hcd 0000:00:1d.0: PCI INT A disabled
>     HDA Intel 0000:00:1b.0: PCI INT A disabled
>     HDA Intel 0000:00:1b.0: power state changed by ACPI to D3
>     ehci_hcd 0000:00:1a.7: PCI INT C disabled
>     ehci_hcd 0000:00:1a.7: PME# disabled
>     uhci_hcd 0000:00:1a.1: PCI INT B disabled
>     uhci_hcd 0000:00:1a.0: PCI INT A disabled
>     e1000e 0000:00:19.0: PME# enabled
>     e1000e 0000:00:19.0: wake-up capability enabled by ACPI
>     e1000e 0000:00:19.0: PME# enabled
>     e1000e 0000:00:19.0: wake-up capability enabled by ACPI
>     e1000e 0000:00:19.0: PCI INT A disabled
>     ACPI handle has no context!
> # This has moved up a bit. Looks more logical.

This is an intentional result of patch 2/8.

>    +ricoh-mmc: Suspending.
>    +ricoh-mmc: Controller is now re-enabled.
>     ACPI: Preparing to enter system sleep state S3
>     Disabling non-boot CPUs ...
>     CPU 1 is now offline
>     SMP alternatives: switching to UP code
>     CPU0 attaching NULL sched-domain.
>     CPU1 attaching NULL sched-domain.
>     CPU0 attaching NULL sched-domain.
>     CPU1 is down
>    -ricoh-mmc: Suspending.
>    -ricoh-mmc: Controller is now re-enabled.
>     Extended CMOS year: 2000
> 
>     Back to C!
>    +CPU0: Thermal monitoring enabled (TM2)
>     Extended CMOS year: 2000
> # This whole block has moved up before early config space restores.
> # No changes in the block itself.

Yes, this is also an intentional result of patch 2/8.

>    +Enabling non-boot CPUs ...
>    +SMP alternatives: switching to SMP code
>    +Booting processor 1 APIC 0x1 ip 0x6000
>    +Initializing CPU#1
>    +Calibrating delay using timer specific routine.. 2660.04 BogoMIPS (lpj=5320097)
>    +CPU: L1 I cache: 32K, L1 D cache: 32K
>    +CPU: L2 cache: 2048K
>    +[ds] using Core 2/Atom configuration
>    +CPU: Physical Processor ID: 0
>    +CPU: Processor Core ID: 1
>    +CPU1: Thermal monitoring enabled (TM2)
>    +CPU1: Intel(R) Core(TM)2 Duo CPU     U7700  @ 1.33GHz stepping 0d
>    +CPU0 attaching NULL sched-domain.
>    +Switched to high resolution mode on CPU 1
>    +CPU0 attaching sched-domain:
>    + domain 0: span 0-1 level MC
>    +  groups: 0 1
>    +CPU1 attaching sched-domain:
>    + domain 0: span 0-1 level MC
>    +  groups: 1 0
>    +CPU1 is up
>    +ACPI: Waking up from system sleep state S3
>     pci 0000:00:02.0: restoring config space at offset 0x8 (was 0x1, writing 0x2001)
> # These don't need restoring anymore?

I think they generally do, but the restored values may be (and often are)
identical to the current ones.
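
To make that concrete, here is a small userspace sketch of a restore loop
that only writes (and only logs) the dwords that differ from the saved copy.
It is a model written for this explanation, not the PCI core code:
struct fake_pci_dev and restore_config_space() are hypothetical names, and
the "hardware" is just an array.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define CFG_DWORDS 16  /* first 64 bytes of config space, as 32-bit words */

/* Hypothetical stand-in for a PCI device; not a kernel structure. */
struct fake_pci_dev {
    const char *name;
    uint32_t hw[CFG_DWORDS];     /* what the "hardware" currently holds */
    uint32_t saved[CFG_DWORDS];  /* copy taken at suspend time */
};

/*
 * Write back only the dwords that differ from the saved copy, walking
 * from offset 0xf down to 0x0 as the log lines do.  Returns the number
 * of dwords actually written.
 */
static int restore_config_space(struct fake_pci_dev *dev)
{
    int written = 0;

    assert(dev != NULL);
    for (int i = CFG_DWORDS - 1; i >= 0; i--) {
        uint32_t cur = dev->hw[i];

        if (cur == dev->saved[i])
            continue;            /* already correct: no write, no log line */
        printf("%s: restoring config space at offset 0x%x (was 0x%x, writing 0x%x)\n",
               dev->name, (unsigned int)i, (unsigned int)cur,
               (unsigned int)dev->saved[i]);
        dev->hw[i] = dev->saved[i];
        written++;
    }
    return written;
}
```

In this model a second call on the same device writes nothing and prints
nothing, which is one way a "restoring config space" line can vanish from the
log without the restore step itself being skipped.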

>    -pci 0000:00:02.1: restoring config space at offset 0x4 (was 0x4, writing 0xe0500004)
>    -pci 0000:00:02.1: restoring config space at offset 0x1 (was 0x900000, writing 0x900007)
>    -pci 0000:00:03.0: restoring config space at offset 0xf (was 0x100, writing 0x1ff)
>    -pci 0000:00:03.0: restoring config space at offset 0x4 (was 0xfed12004, writing 0xe0600004)
>    -pci 0000:00:03.2: restoring config space at offset 0xf (was 0x300, writing 0x30b)
>    -pci 0000:00:03.2: restoring config space at offset 0x8 (was 0x1, writing 0x2031)
>    -pci 0000:00:03.2: restoring config space at offset 0x7 (was 0x1, writing 0x2021)
>    -pci 0000:00:03.2: restoring config space at offset 0x6 (was 0x1, writing 0x2019)
>    -pci 0000:00:03.2: restoring config space at offset 0x5 (was 0x1, writing 0x2011)
>    -pci 0000:00:03.2: restoring config space at offset 0x4 (was 0x1, writing 0x2009)
>    -pci 0000:00:03.2: restoring config space at offset 0x1 (was 0xb00000, writing 0xb00001)
>     serial 0000:00:03.3: restoring config space at offset 0xf (was 0x200, writing 0x20a)
>     serial 0000:00:03.3: restoring config space at offset 0x5 (was 0x0, writing 0xe0601000)
>     serial 0000:00:03.3: restoring config space at offset 0x4 (was 0x1, writing 0x2041)
>     serial 0000:00:03.3: restoring config space at offset 0x1 (was 0xb00000, writing 0xb00007)
>     e1000e 0000:00:19.0: restoring config space at offset 0xf (was 0x100, writing 0x10b)
>     e1000e 0000:00:19.0: restoring config space at offset 0x6 (was 0x1, writing 0x2061)
>     e1000e 0000:00:19.0: restoring config space at offset 0x5 (was 0x0, writing 0xe0640000)
>     e1000e 0000:00:19.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100007)
> # These have moved down to late resume.

That's a bit strange.  It looks like the registers changed after we had
restored them during "early" resume.  So either we hadn't actually restored
them (it would be interesting to find out why), or they really changed (again,
it would be interesting to see why).

>    -uhci_hcd 0000:00:1a.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
>    -uhci_hcd 0000:00:1a.0: restoring config space at offset 0x8 (was 0x1, writing 0x2081)
>    -uhci_hcd 0000:00:1a.0: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
>    -uhci_hcd 0000:00:1a.1: restoring config space at offset 0xf (was 0x200, writing 0x20a)
>    -uhci_hcd 0000:00:1a.1: restoring config space at offset 0x8 (was 0x1, writing 0x20a1)
>    -uhci_hcd 0000:00:1a.1: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
>     ehci_hcd 0000:00:1a.7: restoring config space at offset 0xf (was 0x300, writing 0x30b)
>     ehci_hcd 0000:00:1a.7: restoring config space at offset 0x4 (was 0x0, writing 0xe0641000)
>     ehci_hcd 0000:00:1a.7: restoring config space at offset 0x1 (was 0x2900000, writing 0x2900002)
>     HDA Intel 0000:00:1b.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
>     HDA Intel 0000:00:1b.0: restoring config space at offset 0x3 (was 0x0, writing 0x10)
>     HDA Intel 0000:00:1b.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100002)
>     pcieport-driver 0000:00:1c.0: restoring config space at offset 0xf (was 0x100, writing 0x4010a)
>     pcieport-driver 0000:00:1c.0: restoring config space at offset 0x9 (was 0x10001, writing 0x1fff1)
>     pcieport-driver 0000:00:1c.0: restoring config space at offset 0x8 (was 0x0, writing 0xfff0)
>     pcieport-driver 0000:00:1c.0: restoring config space at offset 0x7 (was 0x0, writing 0x200000f0)
>     pcieport-driver 0000:00:1c.0: restoring config space at offset 0x6 (was 0x0, writing 0x80800)
>     pcieport-driver 0000:00:1c.0: restoring config space at offset 0x3 (was 0x810000, writing 0x810010)
>     pcieport-driver 0000:00:1c.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100407)
>     pcieport-driver 0000:00:1c.1: restoring config space at offset 0xf (was 0x200, writing 0x4020a)
>     pcieport-driver 0000:00:1c.1: restoring config space at offset 0x9 (was 0x10001, writing 0x1fff1)
>     pcieport-driver 0000:00:1c.1: restoring config space at offset 0x8 (was 0x0, writing 0xe000e000)
>     pcieport-driver 0000:00:1c.1: restoring config space at offset 0x7 (was 0x0, writing 0xf0)
>     pcieport-driver 0000:00:1c.1: restoring config space at offset 0x3 (was 0x810000, writing 0x810010)
>     pcieport-driver 0000:00:1c.1: restoring config space at offset 0x1 (was 0x100000, writing 0x100407)
> # These have moved down to late resume.

The last comment applies here too.

>    -uhci_hcd 0000:00:1d.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
>    -uhci_hcd 0000:00:1d.0: restoring config space at offset 0x8 (was 0x1, writing 0x20c1)
>    -uhci_hcd 0000:00:1d.0: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
>    -uhci_hcd 0000:00:1d.1: restoring config space at offset 0xf (was 0x200, writing 0x20b)
>    -uhci_hcd 0000:00:1d.1: restoring config space at offset 0x8 (was 0x1, writing 0x20e1)
>    -uhci_hcd 0000:00:1d.1: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
>    -uhci_hcd 0000:00:1d.2: restoring config space at offset 0xf (was 0x300, writing 0x30b)
>    -uhci_hcd 0000:00:1d.2: restoring config space at offset 0x8 (was 0x1, writing 0x2101)
>    -uhci_hcd 0000:00:1d.2: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
>     ehci_hcd 0000:00:1d.7: restoring config space at offset 0xf (was 0x100, writing 0x10a)
>     ehci_hcd 0000:00:1d.7: restoring config space at offset 0x4 (was 0x0, writing 0xe0648000)
>     ehci_hcd 0000:00:1d.7: restoring config space at offset 0x1 (was 0x2900000, writing 0x2900002)
> # These have disappeared.

Good.

>    -pci 0000:00:1e.0: restoring config space at offset 0x9 (was 0x10001, writing 0x83f18001)
>    -pci 0000:00:1e.0: restoring config space at offset 0x8 (was 0x0, writing 0xe030e010)
>    -pci 0000:00:1e.0: restoring config space at offset 0x7 (was 0x228000f0, writing 0x22803030)
>    -pci 0000:00:1e.0: restoring config space at offset 0x1 (was 0x100007, writing 0x100107)
> # First two moved to late resume.

Again, a bit strange.

> # The third already happened during late resume (duplicated).
>    -ata_piix 0000:00:1f.1: restoring config space at offset 0xf (was 0x100, writing 0x10a)
>    -ata_piix 0000:00:1f.1: restoring config space at offset 0x8 (was 0xc01, writing 0x2121)
>    -ata_piix 0000:00:1f.1: restoring config space at offset 0x1 (was 0x2800005, writing 0x2880005)
>     iwlagn 0000:10:00.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
>     iwlagn 0000:10:00.0: restoring config space at offset 0x4 (was 0x4, writing 0xe0000004)
>     iwlagn 0000:10:00.0: restoring config space at offset 0x3 (was 0x0, writing 0x10)
>     iwlagn 0000:10:00.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100006)
>     yenta_cardbus 0000:02:06.0: restoring config space at offset 0xf (was 0x3000100, writing 0x580010b)
>     yenta_cardbus 0000:02:06.0: restoring config space at offset 0xe (was 0x0, writing 0x34fc)
>     yenta_cardbus 0000:02:06.0: restoring config space at offset 0xd (was 0x0, writing 0x3400)
>     yenta_cardbus 0000:02:06.0: restoring config space at offset 0xc (was 0x0, writing 0x30fc)
>     yenta_cardbus 0000:02:06.0: restoring config space at offset 0xb (was 0x0, writing 0x3000)
>     yenta_cardbus 0000:02:06.0: restoring config space at offset 0xa (was 0x0, writing 0x87fff000)
>     yenta_cardbus 0000:02:06.0: restoring config space at offset 0x9 (was 0x0, writing 0x84000000)
>     yenta_cardbus 0000:02:06.0: restoring config space at offset 0x8 (was 0x0, writing 0x83fff000)
>     yenta_cardbus 0000:02:06.0: restoring config space at offset 0x7 (was 0x0, writing 0x80000000)
>     yenta_cardbus 0000:02:06.0: restoring config space at offset 0x6 (was 0x0, writing 0xb0060302)
>     yenta_cardbus 0000:02:06.0: restoring config space at offset 0x4 (was 0x0, writing 0xe0100000)
>     yenta_cardbus 0000:02:06.0: restoring config space at offset 0x3 (was 0x820000, writing 0x82a800)
>     yenta_cardbus 0000:02:06.0: restoring config space at offset 0x1 (was 0x2100000, writing 0x2100007)
>     ohci1394 0000:02:06.1: restoring config space at offset 0xf (was 0x4020200, writing 0x4020205)
>     ohci1394 0000:02:06.1: restoring config space at offset 0x4 (was 0x0, writing 0xe0101000)
>     ohci1394 0000:02:06.1: restoring config space at offset 0x3 (was 0x800000, writing 0x804010)
>     ohci1394 0000:02:06.1: restoring config space at offset 0x1 (was 0x2100000, writing 0x2100006)
>     sdhci-pci 0000:02:06.2: restoring config space at offset 0xf (was 0x300, writing 0x30a)
>     sdhci-pci 0000:02:06.2: restoring config space at offset 0x4 (was 0x0, writing 0xe0102000)
>     sdhci-pci 0000:02:06.2: restoring config space at offset 0x3 (was 0x800000, writing 0x804010)
>     sdhci-pci 0000:02:06.2: restoring config space at offset 0x1 (was 0x2100000, writing 0x2100006)
> # Some changes; a lot just got dropped.
>    -ricoh-mmc 0000:02:06.3: restoring config space at offset 0xf (was 0x300, writing 0xffffffff)
>    +ricoh-mmc 0000:02:06.3: restoring config space at offset 0xf (was 0x300, writing 0x30a)
>    -ricoh-mmc 0000:02:06.3: restoring config space at offset 0xe (was 0x0, writing 0xffffffff)
>    -ricoh-mmc 0000:02:06.3: restoring config space at offset 0xd (was 0x80, writing 0xffffffff)
>    -ricoh-mmc 0000:02:06.3: restoring config space at offset 0xc (was 0x0, writing 0xffffffff)
>    -ricoh-mmc 0000:02:06.3: restoring config space at offset 0xb (was 0x30c9103c, writing 0xffffffff)
>    -ricoh-mmc 0000:02:06.3: restoring config space at offset 0xa (was 0x0, writing 0xffffffff)
>    -ricoh-mmc 0000:02:06.3: restoring config space at offset 0x9 (was 0x0, writing 0xffffffff)
>    -ricoh-mmc 0000:02:06.3: restoring config space at offset 0x8 (was 0x0, writing 0xffffffff)
>    -ricoh-mmc 0000:02:06.3: restoring config space at offset 0x7 (was 0x0, writing 0xffffffff)
>    -ricoh-mmc 0000:02:06.3: restoring config space at offset 0x6 (was 0x0, writing 0xffffffff)
>    -ricoh-mmc 0000:02:06.3: restoring config space at offset 0x5 (was 0x0, writing 0xffffffff)
>    -ricoh-mmc 0000:02:06.3: restoring config space at offset 0x4 (was 0x0, writing 0xffffffff)
>    +ricoh-mmc 0000:02:06.3: restoring config space at offset 0x4 (was 0x0, writing 0xe0103000)
>    -ricoh-mmc 0000:02:06.3: restoring config space at offset 0x3 (was 0x800000, writing 0xffffffff)
>    +ricoh-mmc 0000:02:06.3: restoring config space at offset 0x3 (was 0x800000, writing 0x804010)
>    -ricoh-mmc 0000:02:06.3: restoring config space at offset 0x2 (was 0x8800011, writing 0xffffffff)
>    -ricoh-mmc 0000:02:06.3: restoring config space at offset 0x1 (was 0x2100000, writing 0xffffffff)
>    +ricoh-mmc 0000:02:06.3: restoring config space at offset 0x1 (was 0x2100000, writing 0x2100006)
>    -ricoh-mmc 0000:02:06.3: restoring config space at offset 0x0 (was 0x8431180, writing 0xffffffff)
>     ricoh-mmc: Resuming.
>     ricoh-mmc: Controller is now disabled.
>    -Enabling non-boot CPUs ...
>    -SMP alternatives: switching to SMP code
>    -Booting processor 1 APIC 0x1 ip 0x6000
>    -Initializing CPU#1
>    -Calibrating delay using timer specific routine.. 2660.07 BogoMIPS (lpj=5320158)
>    -CPU: L1 I cache: 32K, L1 D cache: 32K
>    -CPU: L2 cache: 2048K
>    -[ds] using Core 2/Atom configuration
>    -CPU: Physical Processor ID: 0
>    -CPU: Processor Core ID: 1
>    -CPU1: Thermal monitoring enabled (TM2)
>    -x86 PAT enabled: cpu 1, old 0x7040600070406, new 0x7010600070106
>    -CPU1: Intel(R) Core(TM)2 Duo CPU     U7700  @ 1.33GHz stepping 0d
>    -CPU0 attaching NULL sched-domain.
>    -Switched to high resolution mode on CPU 1
>    -CPU0 attaching sched-domain:
>    - domain 0: span 0-1 level MC
>    -  groups: 0 1
>    -CPU1 attaching sched-domain:
>    - domain 0: span 0-1 level MC
>    -  groups: 1 0
>    -CPU1 is up
>    -ACPI: Waking up from system sleep state S3
>     ACPI: EC: non-query interrupt received, switching to interrupt mode
>     pci 0000:00:02.0: restoring config space at offset 0x1 (was 0x900403, writing 0x900003)
>     pci 0000:00:02.0: PME# disabled
>     pci 0000:00:02.1: PME# disabled
>     pci 0000:00:03.0: PME# disabled
>     pci 0000:00:03.2: PME# disabled
>     e1000e 0000:00:19.0: PCI INT A -> GSI 22 (level, low) -> IRQ 22
>     e1000e 0000:00:19.0: setting latency timer to 64
>     e1000e 0000:00:19.0: wake-up capability disabled by ACPI
>     e1000e 0000:00:19.0: PME# disabled
>     e1000e 0000:00:19.0: wake-up capability disabled by ACPI
>     e1000e 0000:00:19.0: PME# disabled
>     e1000e 0000:00:19.0: irq 26 for MSI/MSI-X
>    +uhci_hcd 0000:00:1a.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
>    +uhci_hcd 0000:00:1a.0: restoring config space at offset 0x8 (was 0x1, writing 0x2081)
>    +uhci_hcd 0000:00:1a.0: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
>     uhci_hcd 0000:00:1a.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
>     uhci_hcd 0000:00:1a.0: setting latency timer to 64
>     usb usb1: root hub lost power or was reset
>    +uhci_hcd 0000:00:1a.1: restoring config space at offset 0xf (was 0x200, writing 0x20a)
>    +uhci_hcd 0000:00:1a.1: restoring config space at offset 0x8 (was 0x1, writing 0x20a1)
>    +uhci_hcd 0000:00:1a.1: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
>     uhci_hcd 0000:00:1a.1: PCI INT B -> GSI 17 (level, low) -> IRQ 17
>     uhci_hcd 0000:00:1a.1: setting latency timer to 64
>     usb usb3: root hub lost power or was reset
>     ehci_hcd 0000:00:1a.7: PME# disabled
>     ehci_hcd 0000:00:1a.7: PCI INT C -> GSI 18 (level, low) -> IRQ 18
>     ehci_hcd 0000:00:1a.7: setting latency timer to 64
>     ehci_hcd 0000:00:1a.7: PME# disabled
> # Called twice now?
>     HDA Intel 0000:00:1b.0: power state changed by ACPI to D0
>    +HDA Intel 0000:00:1b.0: power state changed by ACPI to D0

Yeah, it's not nice.  The problem is that pci_set_power_state() doesn't
check if the power state is already correct before calling the platform to
change it.  The platform should cope with that, but it shouldn't be called
a second time at all.

In fact I have a patch to change this behavior, but I consider it a separate
issue.

>     HDA Intel 0000:00:1b.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
>     HDA Intel 0000:00:1b.0: setting latency timer to 64
>     pcieport-driver 0000:00:1c.0: setting latency timer to 64
>     pcieport-driver 0000:00:1c.1: setting latency timer to 64
>    +uhci_hcd 0000:00:1d.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
>    +uhci_hcd 0000:00:1d.0: restoring config space at offset 0x8 (was 0x1, writing 0x20c1)
>    +uhci_hcd 0000:00:1d.0: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
>     uhci_hcd 0000:00:1d.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
>     uhci_hcd 0000:00:1d.0: setting latency timer to 64
>     usb usb5: root hub lost power or was reset
>    +uhci_hcd 0000:00:1d.1: restoring config space at offset 0xf (was 0x200, writing 0x20b)
>    +uhci_hcd 0000:00:1d.1: restoring config space at offset 0x8 (was 0x1, writing 0x20e1)
>    +uhci_hcd 0000:00:1d.1: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
>     uhci_hcd 0000:00:1d.1: PCI INT B -> GSI 22 (level, low) -> IRQ 22
>     uhci_hcd 0000:00:1d.1: setting latency timer to 64
>     usb usb6: root hub lost power or was reset
>    +uhci_hcd 0000:00:1d.2: restoring config space at offset 0xf (was 0x300, writing 0x30b)
>    +uhci_hcd 0000:00:1d.2: restoring config space at offset 0x8 (was 0x1, writing 0x2101)
>    +uhci_hcd 0000:00:1d.2: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
>     uhci_hcd 0000:00:1d.2: PCI INT C -> GSI 18 (level, low) -> IRQ 18
>     uhci_hcd 0000:00:1d.2: setting latency timer to 64
>     usb usb7: root hub lost power or was reset
>     ehci_hcd 0000:00:1d.7: PME# disabled
>     ehci_hcd 0000:00:1d.7: PCI INT A -> GSI 20 (level, low) -> IRQ 20
>     ehci_hcd 0000:00:1d.7: setting latency timer to 64
>     ehci_hcd 0000:00:1d.7: PME# disabled
>     pci 0000:00:1e.0: setting latency timer to 64
>    +ata_piix 0000:00:1f.1: restoring config space at offset 0xf (was 0x100, writing 0x10a)
>    +ata_piix 0000:00:1f.1: restoring config space at offset 0x8 (was 0xc01, writing 0x2121)
>     ata_piix 0000:00:1f.1: restoring config space at offset 0x1 (was 0x2800005, writing 0x2880005)
>     ata_piix 0000:00:1f.1: PCI INT A -> GSI 16 (level, low) -> IRQ 16
>     ata_piix 0000:00:1f.1: setting latency timer to 64
>     ata2: port disabled. ignoring.
>     ACPI Exception (exoparg2-0445): AE_AML_PACKAGE_LIMIT, Index (000000005) is beyond end of object [20081204]
>     ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.C2C3] (Node ffff88007e01dea0), AE_AML_PACKAGE_LIMIT
>     ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.C003.C0F6.C3F3._STM] (Node ffff88007e043de0), AE_AML_PACKAGE_LIMIT
>     ata1: ACPI set timing mode failed (status=0x300b)
> # Remaining differences are bogus: result of using wireless instead of wired networking.

OK

Thanks for the debugging work.

Best,
Rafael

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [linux-pm] [RFC][PATCH][1/8] PM: Rework handling of interrupts during suspend-resume (rev. 5)
  2009-03-08 20:40             ` [linux-pm] " Alan Stern
@ 2009-03-08 21:37               ` Rafael J. Wysocki
  2009-03-08 21:37               ` Rafael J. Wysocki
                                 ` (2 subsequent siblings)
  3 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-08 21:37 UTC (permalink / raw)
  To: Alan Stern
  Cc: Linus Torvalds, Jeremy Fitzhardinge, LKML, Jesse Barnes,
	Thomas Gleixner, Eric W. Biederman, Ingo Molnar, pm list,
	Arve Hjønnevåg

On Sunday 08 March 2009, Alan Stern wrote:
> On Sun, 8 Mar 2009, Linus Torvalds wrote:
> 
> > On Sat, 7 Mar 2009, Alan Stern wrote:
> > > 
> > > You didn't answer my question.  Why bother to distinguish between 
> > > "wake-up" interrupts and non-"wake-up" interrupts?
> > > 
> > > In other words, why not simply abort the suspend if IRQ_PENDING is set
> > > for _any_ interrupt during sysdev_suspend()?
> > 
> > .. because some drivers might not actually shut down the hardware until 
> > they get to "suspend_late"? If even then, for that matter - a driver may 
> > simply not care, knowing that the hardware will be powered off, and will 
> > be re-initialized at resume.
> > 
> > The thinking that you have to shut your hardware down at "->suspend()" 
> > time is a _disease_. There are literally classes of hardware out there 
> > where that would be an outright _bug_, like for a PCI bridge device. For 
> > many devices, "suspend()" has to be the phase where you shut down the 
> > _external_ stuff (eg for a disk controller, it's when you'd flush and stop 
> > your disks), but the controller itself may well be alive until later.
> 
> Yes, certainly.  I agree completely.
> 
> But there is a difference between shutting down the hardware and merely
> preventing it from generating interrupt requests.  If a device remains
> capable of generating IRQs after its driver's suspend method has run,
> the driver runs the risk of having its handler called at a time when it
> isn't prepared to cope correctly.  Of course, this will depend on the
> details of how the driver is written.
> 
> There have been examples in the past of devices that, for one reason or
> another, _did_ generate IRQs at inconvenient times.  The hardware or
> the BIOS may have done improper initialization, for example.  On a
> shared IRQ this led to interrupt storms.

Well, we're now trying to fix exactly this problem. :-)

>  IIRC, the solution was to add a PCI quirk routine to disable IRQ generation
>  at an early stage.   Didn't e100 have this problem?

I don't remember, sorry.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH][1/8] PM: Rework handling of interrupts during suspend-resume (rev. 5)
  2009-03-08 20:40             ` [linux-pm] " Alan Stern
  2009-03-08 21:37               ` Rafael J. Wysocki
@ 2009-03-08 21:37               ` Rafael J. Wysocki
  2009-03-09 14:59               ` Linus Torvalds
  2009-03-09 14:59               ` [linux-pm] " Linus Torvalds
  3 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-08 21:37 UTC (permalink / raw)
  To: Alan Stern
  Cc: Arve, Jeremy Fitzhardinge, LKML, Jesse Barnes, Eric W. Biederman,
	pm list, Thomas Gleixner, Linus Torvalds, Ingo Molnar

On Sunday 08 March 2009, Alan Stern wrote:
> On Sun, 8 Mar 2009, Linus Torvalds wrote:
> 
> > On Sat, 7 Mar 2009, Alan Stern wrote:
> > > 
> > > You didn't answer my question.  Why bother to distinguish between 
> > > "wake-up" interrupts and non-"wake-up" interrupts?
> > > 
> > > In other words, why not simply abort the suspend if IRQ_PENDING is set
> > > for _any_ interrupt during sysdev_suspend()?
> > 
> > .. because some drivers might not actually shut down the hardware until 
> > they get to "suspend_late"? If even then, for that matter - a driver may 
> > simply not care, knowing that the hardware will be powered off, and will 
> > be re-initialized at resume.
> > 
> > The thinking that you have to shut your hardware down at "->suspend()" 
> > time is a _disease_. There are literally classes of hardware out there 
> > where that would be an outright _bug_, like for a PCI bridge device. For 
> > many devices, "suspend()" has to be the phase where you shut down the 
> > _external_ stuff (eg for a disk controller, it's when you'd flush and stop 
> > your disks), but the controller itself may well be alive until later.
> 
> Yes, certainly.  I agree completely.
> 
> But there is a difference between shutting down the hardware and merely
> preventing it from generating interrupt requests.  If a device remains
> capable of generating IRQs after its driver's suspend method has run,
> the driver runs the risk of having its handler called at a time when it
> isn't prepared to cope correctly.  Of course, this will depend on the
> details of how the driver is written.
> 
> There have been examples in the past of devices that, for one reason or
> another, _did_ generate IRQs at inconvenient times.  The hardware or
> the BIOS may have done improper initialization, for example.  On a
> shared IRQ this led to interrupt storms.

Well, we're now trying to fix exactly this problem. :-)

>  IIRC, the solution was to add a PCI quirk routine to disable IRQ generation
>  at an early stage.   Didn't e100 have this problem?

I don't remember, sorry.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [linux-pm] [RFC][PATCH][1/8] PM: Rework handling of interrupts during suspend-resume (rev. 5)
  2009-03-08 20:40             ` [linux-pm] " Alan Stern
                                 ` (2 preceding siblings ...)
  2009-03-09 14:59               ` Linus Torvalds
@ 2009-03-09 14:59               ` Linus Torvalds
  2009-03-09 15:13                 ` Alan Stern
  2009-03-09 15:13                 ` Alan Stern
  3 siblings, 2 replies; 373+ messages in thread
From: Linus Torvalds @ 2009-03-09 14:59 UTC (permalink / raw)
  To: Alan Stern
  Cc: Rafael J. Wysocki, Jeremy Fitzhardinge, LKML, Jesse Barnes,
	Thomas Gleixner, Eric W. Biederman, Ingo Molnar, pm list,
	Arve  Hjønnevåg



On Sun, 8 Mar 2009, Alan Stern wrote:
> 
> There have been examples in the past of devices that, for one reason or
> another, _did_ generate IRQs at inconvenient times.  The hardware or
> the BIOS may have done improper initialization, for example.  On a
> shared IRQ this led to interrupt storms.  IIRC, the solution was to add
> a PCI quirk routine to disable IRQ generation at an early stage.  
> Didn't e100 have this problem?

.. and this is exactly the reason why we've done all these changes.

There are tons of drivers that are unable to cope with interrupts that 
happen after they've done their "pci_set_power_state(PCI_D3hot)".

With shared interrupts (and _another_ device still live), they do stupid 
things like read the interrupt status register, getting all-ones (because 
the device is dead), and then deciding that that means that they need to 
handle the interrupt. And that goes on in a loop. Forever.
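
A minimal userspace sketch of that failure mode, assuming a hypothetical
device whose status register reads back as all-ones once it has been put
into D3.  The names (fake_dev, read_status, naive_handler, careful_handler)
are invented for illustration and are not taken from any real driver:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device model: none of these names come from a real driver. */
struct fake_dev {
	bool powered;        /* false once the device has been put into D3 */
	uint32_t irq_status; /* pending-interrupt bits while powered */
};

/* Assumption: reads from a powered-down device return all-ones. */
static uint32_t read_status(const struct fake_dev *dev)
{
	return dev->powered ? dev->irq_status : 0xffffffffu;
}

/* Naive handler: any non-zero status is treated as "my interrupt". */
static bool naive_handler(const struct fake_dev *dev)
{
	return read_status(dev) != 0;
}

/* More careful handler: all-ones means the device is not answering, so
 * the interrupt on the shared line must belong to somebody else. */
static bool careful_handler(const struct fake_dev *dev)
{
	uint32_t status = read_status(dev);

	if (status == 0xffffffffu)
		return false;   /* would be IRQ_NONE in a real handler */
	return status != 0;     /* would be IRQ_HANDLED */
}
```

With the device powered down, naive_handler() claims every interrupt that
arrives on the shared line, while careful_handler() declines it.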

Or they do _that_ part right, but their suspend also free'd some data 
structure, so now the interrupt handler will follow a NULL pointer and/or 
scribble to freed memory. The source of bugs is infinite, and not fixable 
(because, quite frankly, most device driver writers are very focused on 
the hardware, and have a hard time thinking about it as part of the bigger 
system - and even if they happen to test suspend/resume, they probably won't 
be testing it with shared interrupts, so it will work _for_them_ even if 
it's totally broken).

So what all the PCI changes try to do is to basically not have the driver 
do the "pci_set_power_state(PCI_D3)" at _all_, and do it in the PCI layer. 
But more importantly, it needs to be done _after_ interrupts have been 
disabled for this all to work. And, for exactly the same reason, the PCI 
layer needs to wake the device up and restore its config space _before_ 
enabling interrupts again, and _before_ doing any ->resume calls.

And that, in turn, means that since we have all these ACPI ordering 
things, and many cases want to use ACPI to wake things up, and/or have 
delays etc, we end up actually wanting things like timer interrupts 
working at that time - but not normal "device" interrupts. Because many 
delays do need them, even as simple delays as the (fairly short, but not 
"busy loop" short) one for turning the device back into PCI_D0 again.

So this literally explains all the re-ordering, and all the interrupt 
games we now play in Rafael's patch-set. The _whole_ (and only) point is 
to make it easier for device drivers, while also changing the environment 
so that we can call ACPI and we can sleep even before the devices have 
really resumed (or even early_resume'd).

			Linus

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH][1/8] PM: Rework handling of interrupts during suspend-resume (rev. 5)
  2009-03-08 20:40             ` [linux-pm] " Alan Stern
  2009-03-08 21:37               ` Rafael J. Wysocki
  2009-03-08 21:37               ` Rafael J. Wysocki
@ 2009-03-09 14:59               ` Linus Torvalds
  2009-03-09 14:59               ` [linux-pm] " Linus Torvalds
  3 siblings, 0 replies; 373+ messages in thread
From: Linus Torvalds @ 2009-03-09 14:59 UTC (permalink / raw)
  To: Alan Stern
  Cc: Jeremy Fitzhardinge, LKML, Jesse Barnes, Eric W. Biederman,
	pm list, Thomas Gleixner, Ingo Molnar



On Sun, 8 Mar 2009, Alan Stern wrote:
> 
> There have been examples in the past of devices that, for one reason or
> another, _did_ generate IRQs at inconvenient times.  The hardware or
> the BIOS may have done improper initialization, for example.  On a
> shared IRQ this led to interrupt storms.  IIRC, the solution was to add
> a PCI quirk routine to disable IRQ generation at an early stage.  
> Didn't e100 have this problem?

.. and this is exactly the reason why we've done all these changes.

There are tons of drivers that are unable to cope with interrupts that 
happen after they've done their "pci_set_power_state(PCI_D3hot)".

With shared interrupts (and _another_ device still live), they do stupid 
things like read the interrupt status register, getting all-ones (because 
the device is dead), and then deciding that that means that they need to 
handle the interrupt. And that goes on in a loop. Forever.

Or they do _that_ part right, but their suspend also free'd some data 
structure, so now the interrupt handler will follow a NULL pointer and/or 
scribble to freed memory. The source of bugs is infinite, and not fixable 
(because, quite frankly, most device driver writers are very focused on 
the hardware, and have a hard time thinking about it as part of the bigger 
system - and even if they happen to test suspend/resume, they probably won't 
be testing it with shared interrupts, so it will work _for_them_ even if 
it's totally broken).

So what all the PCI changes try to do is to basically not have the driver 
do the "pci_set_power_state(PCI_D3)" at _all_, and do it in the PCI layer. 
But more importantly, it needs to be done _after_ interrupts have been 
disabled for this all to work. And, for exactly the same reason, the PCI 
layer needs to wake the device up and restore its config space _before_ 
enabling interrupts again, and _before_ doing any ->resume calls.

And that, in turn, means that since we have all these ACPI ordering 
things, and many cases want to use ACPI to wake things up, and/or have 
delays etc, we end up actually wanting things like timer interrupts 
working at that time - but not normal "device" interrupts. Because many 
delays do need them, even as simple delays as the (fairly short, but not 
"busy loop" short) one for turning the device back into PCI_D0 again.

So this literally explains all the re-ordering, and all the interrupt 
games we now play in Rafael's patch-set. The _whole_ (and only) point is 
to make it easier for device drivers, while also changing the environment 
so that we can call ACPI and we can sleep even before the devices have 
really resumed (or even early_resume'd).

			Linus

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [linux-pm] [RFC][PATCH][1/8] PM: Rework handling of interrupts during suspend-resume (rev. 5)
  2009-03-09 14:59               ` [linux-pm] " Linus Torvalds
@ 2009-03-09 15:13                 ` Alan Stern
  2009-03-09 15:40                   ` Linus Torvalds
  2009-03-09 15:40                   ` [linux-pm] " Linus Torvalds
  2009-03-09 15:13                 ` Alan Stern
  1 sibling, 2 replies; 373+ messages in thread
From: Alan Stern @ 2009-03-09 15:13 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, Jeremy Fitzhardinge, LKML, Jesse Barnes,
	Thomas Gleixner, Eric W. Biederman, Ingo Molnar, pm list,
	Arve  Hjønnevåg

On Mon, 9 Mar 2009, Linus Torvalds wrote:

> On Sun, 8 Mar 2009, Alan Stern wrote:
> > 
> > There have been examples in the past of devices that, for one reason or
> > another, _did_ generate IRQs at inconvenient times.  The hardware or
> > the BIOS may have done improper initialization, for example.  On a
> > shared IRQ this led to interrupt storms.  IIRC, the solution was to add
> > a PCI quirk routine to disable IRQ generation at an early stage.  
> > Didn't e100 have this problem?
> 
> .. and this is exactly the reason why we've done all these changes.
> 
> There are tons of drivers that are unable to cope with interrupts that 
> happen after they've done their "pci_set_power_state(PCI_D3hot)".
> 
> With shared interrupts (and _another_ device still live), they do stupid 
> things like read the interrupt status register, getting all-ones (because 
> the device is dead), and then deciding that that means that they need to 
> handle the interrupt. And that goes on in a loop. Forever.
> 
> Or they do _that_ part right, but their suspend also free'd some data 
> structure, so now the interrupt handler will follow a NULL pointer and/or 
> scribble to freed memory. The source of bugs is infinite, and not fixable 
> (because, quite frankly, most device driver writers are very focused on 
> the hardware, and have a hard time thinking about it as part of the bigger 
> system - and even if they happen to test suspend/resume, they probably won't 
> be testing it with shared interrupts, so it will work _for_them_ even if 
> it's totally broken).
> 
> So what all the PCI changes try to do is to basically not have the driver 
> do the "pci_set_power_state(PCI_D3)" at _all_, and do it in the PCI layer. 
> But more importantly, it needs to be done _after_ interrupts have been 
> disabled for this all to work. And, for exactly the same reason, the PCI 
> layer needs to wake the device up and restore its config space _before_ 
> enabling interrupts again, and _before_ doing any ->resume calls.
> 
> And that, in turn, means that since we have all these ACPI ordering 
> things, and many cases want to use ACPI to wake things up, and/or have 
> delays etc, we end up actually wanting things like timer interrupts 
> working at that time - but not normal "device" interrupts. Because many 
> delays do need them, even as simple delays as the (fairly short, but not 
> "busy loop" short) one for turning the device back into PCI_D0 again.
> 
> So this literally explains all the re-ordering, and all the interrupt 
> games we now play in Rafael's patch-set. The _whole_ (and only) point is 
> to make it easier for device drivers, while also changing the environment 
> so that we can call ACPI and we can sleep even before the devices have 
> really resumed (or even early_resume'd).

I see.  The unstated key point is this:

	Unsophisticated drivers can still be expected to work if they
	get an interrupt after their suspend method has run, _provided_
	the device is still in D0.  Likewise, unsophisticated drivers 
	can be expected to fail if they get an interrupt after the 
	device has been put in D3.

Hence you don't require drivers to disable interrupt generation in 
their suspend methods, and you do prevent interrupts from being 
delivered to drivers before changing device power states.

And hence you also go to some trouble to distinguish between IRQs
which might be received merely because the driver didn't bother to
suppress them vs. IRQs which indicate a genuine wakeup request.
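
A rough userspace sketch of that distinction, assuming each interrupt line
carries a "wake-up" mark and a "pending" mark.  The structure and the
function name (fake_irq, check_pending_wakeups) are hypothetical and only
model the idea; they are not the code from the patch:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-IRQ bookkeeping, loosely modelled on the discussion. */
struct fake_irq {
	bool wakeup;  /* line was marked as a wake-up source */
	bool pending; /* an interrupt arrived while delivery was held off */
};

/* Returns -EBUSY if a wake-up interrupt is pending, 0 otherwise.
 * Pending interrupts on ordinary lines are deliberately ignored, since
 * they may only mean that a driver did not bother to suppress them. */
static int check_pending_wakeups(const struct fake_irq *irqs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (irqs[i].wakeup && irqs[i].pending)
			return -EBUSY;
	}
	return 0;
}
```

A non-zero result would be the cue to abort the suspend; a stray pending
interrupt on a non-wake-up line would not.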

Got it.

Alan Stern


^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH][1/8] PM: Rework handling of interrupts during suspend-resume (rev. 5)
  2009-03-09 14:59               ` [linux-pm] " Linus Torvalds
  2009-03-09 15:13                 ` Alan Stern
@ 2009-03-09 15:13                 ` Alan Stern
  1 sibling, 0 replies; 373+ messages in thread
From: Alan Stern @ 2009-03-09 15:13 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jeremy Fitzhardinge, LKML, Jesse Barnes, Eric W. Biederman,
	pm list, Thomas Gleixner, Ingo Molnar

On Mon, 9 Mar 2009, Linus Torvalds wrote:

> On Sun, 8 Mar 2009, Alan Stern wrote:
> > 
> > There have been examples in the past of devices that, for one reason or
> > another, _did_ generate IRQs at inconvenient times.  The hardware or
> > the BIOS may have done improper initialization, for example.  On a
> > shared IRQ this led to interrupt storms.  IIRC, the solution was to add
> > a PCI quirk routine to disable IRQ generation at an early stage.  
> > Didn't e100 have this problem?
> 
> .. and this is exactly the reason why we've done all these changes.
> 
> There are tons of drivers that are unable to cope with interrupts that 
> happen after they've done their "pci_set_power_state(PCI_D3hot)".
> 
> With shared interrupts (and _another_ device still live), they do stupid 
> things like read the interrupt status register, getting all-ones (because 
> the device is dead), and then deciding that that means that they need to 
> handle the interrupt. And that goes on in a loop. Forever.
> 
> Or they do _that_ part right, but their suspend also free'd some data 
> structure, so now the interrupt handler will follow a NULL pointer and/or 
> scribble to freed memory. The source of bugs is infinite, and not fixable 
> (because, quite frankly, most device driver writers are very focused on 
> the hardware, and have a hard time thinking about it as part of the bigger 
> system - and even if they happen to test suspend/resume, they probably won't 
> be testing it with shared interrupts, so it will work _for_them_ even if 
> it's totally broken).
> 
> So what all the PCI changes try to do is to basically not have the driver 
> do the "pci_set_power_state(PCI_D3)" at _all_, and do it in the PCI layer. 
> But more importantly, it needs to be done _after_ interrupts have been 
> disabled for this all to work. And, for exactly the same reason, the PCI 
> layer needs to wake the device up and restore its config space _before_ 
> enabling interrupts again, and _before_ doing any ->resume calls.
> 
> And that, in turn, means that since we have all these ACPI ordering 
> things, and many cases want to use ACPI to wake things up, and/or have 
> delays etc, we end up actually wanting things like timer interrupts 
> working at that time - but not normal "device" interrupts. Because many 
> delays do need them, even as simple delays as the (fairly short, but not 
> "busy loop" short) one for turning the device back into PCI_D0 again.
> 
> So this literally explains all the re-ordering, and all the interrupt 
> games we now play in Rafael's patch-set. The _whole_ (and only) point is 
> to make it easier for device drivers, while also changing the environment 
> so that we can call ACPI and we can sleep even before the devices have 
> really resumed (or even early_resume'd).

I see.  The unstated key point is this:

	Unsophisticated drivers can still be expected to work if they
	get an interrupt after their suspend method has run, _provided_
	the device is still in D0.  Likewise, unsophisticated drivers 
	can be expected to fail if they get an interrupt after the 
	device has been put in D3.

Hence you don't require drivers to disable interrupt generation in 
their suspend methods, and you do prevent interrupts from being 
delivered to drivers before changing device power states.

And hence you also go to some trouble to distinguish between IRQs
which might be received merely because the driver didn't bother to
suppress them vs. IRQs which indicate a genuine wakeup request.

Got it.

Alan Stern

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [linux-pm] [RFC][PATCH][1/8] PM: Rework handling of interrupts during suspend-resume (rev. 5)
  2009-03-09 15:13                 ` Alan Stern
  2009-03-09 15:40                   ` Linus Torvalds
@ 2009-03-09 15:40                   ` Linus Torvalds
  1 sibling, 0 replies; 373+ messages in thread
From: Linus Torvalds @ 2009-03-09 15:40 UTC (permalink / raw)
  To: Alan Stern
  Cc: Rafael J. Wysocki, Jeremy Fitzhardinge, LKML, Jesse Barnes,
	Thomas Gleixner, Eric W. Biederman, Ingo Molnar, pm list,
	Arve  Hjønnevåg



On Mon, 9 Mar 2009, Alan Stern wrote:
> 
> I see.  The unstated key point is this:
> 
> 	Unsophisticated drivers [...]

Another key point is:

 - _un_sophisticated is the norm, and anybody who expects otherwise is 
   living in some odd la-la-land together with his or her pink unicorn and 
   endless supplies of quaaludes.

The thing is, we have about a metric sh*tload of drivers, and many of them 
are effectively written by people who don't really do kernel work, and are
basically unmaintained in the long run (ie they may be maintained while 
written, but two years down the line they have a couple of hundred users 
and nobody who really cares about it, because the original author long 
since moved on to fancier hardware).

			Linus

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH][1/8] PM: Rework handling of interrupts during suspend-resume (rev. 5)
  2009-03-09 15:13                 ` Alan Stern
@ 2009-03-09 15:40                   ` Linus Torvalds
  2009-03-09 15:40                   ` [linux-pm] " Linus Torvalds
  1 sibling, 0 replies; 373+ messages in thread
From: Linus Torvalds @ 2009-03-09 15:40 UTC (permalink / raw)
  To: Alan Stern
  Cc: Jeremy Fitzhardinge, LKML, Jesse Barnes, Eric W. Biederman,
	pm list, Thomas Gleixner, Ingo Molnar



On Mon, 9 Mar 2009, Alan Stern wrote:
> 
> I see.  The unstated key point is this:
> 
> 	Unsophisticated drivers [...]

Another key point is:

 - _un_sophisticated is the norm, and anybody who expects otherwise is 
   living in some odd la-la-land together with his or her pink unicorn and 
   endless supplies of quaaludes.

The thing is, we have about a metric sh*tload of drivers, and many of them 
are effectively written by people who don't really do kernel work, and are
basically unmaintained in the long run (ie they may be maintained while 
written, but two years down the line they have a couple of hundred users 
and nobody who really cares about it, because the original author long 
since moved on to fancier hardware).

			Linus

^ permalink raw reply	[flat|nested] 373+ messages in thread

* [PATCH 0/10] PM: Rework suspend-resume ordering to avoid problems with shared interrupts (updated)
  2009-02-22 17:37 ` Rafael J. Wysocki
                   ` (12 preceding siblings ...)
  (?)
@ 2009-03-11  9:30 ` Rafael J. Wysocki
  2009-03-11  9:36   ` [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5) Rafael J. Wysocki
                     ` (18 more replies)
  -1 siblings, 19 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-11  9:30 UTC (permalink / raw)
  To: pm list
  Cc: LKML, Linus Torvalds, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, Len Brown,
	Jesse Barnes, Thomas Gleixner, Frans Pop,
	Arve Hjønnevåg

Hi,

The last iteration of this series of patches didn't draw comments except for
the discussion about the "wake-up" interrupts, so here's an update that I'd
like to consider as more-or-less final.  I would also like to use a separate
'suspend' git tree for merging these patches, if you don't mind.

The following patches modify the way in which we handle disabling interrupts
during suspend and enabling them during resume.  They also change the ordering
of the core suspend and hibernation code to take advantage of the new approach
to the interrupts and modify the PCI PM core to avoid a few problems.

Namely, interrupts are currently disabled on the boot CPU as soon as the
nonboot CPUs have been disabled, which doesn't allow device drivers' "late"
suspend and "early" resume callbacks to sleep.  Among other things this means
they cannot execute ACPI AML routines, which leads to problems with
suspend-resume of PCI devices, as recently discussed.

1/10 modifies the [suspend|hibernation] and resume code, as well as the other
code using the device PM framework, so that device drivers will not receive
interrupts during the "late" suspend phase, although interrupts will only be
disabled on the CPU right before calling sysdev_suspend() (and analogously
during resume).

2/10 - 4/10 modify the suspend, hibernation and kexec jump code, respectively,
so that the "late" phase of suspending devices will happen before executing the
platform "prepare" callback and disabling nonboot CPUs (and analogously during
resume).
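
For orientation, the suspend-side ordering that 1/10 - 4/10 aim at can be
written down as a small table.  This is only a sketch: the step labels are
descriptive strings rather than kernel function names (sysdev_suspend being
the exception), the relative order of the platform "prepare" callback and
the disabling of nonboot CPUs is assumed from the wording above, and the
resume side is assumed to be an exact mirror image:

```c
#include <stddef.h>
#include <string.h>

/* Suspend-side steps in the order described for patches 1/10 - 4/10.
 * The labels are illustrative, not kernel function names. */
static const char *const suspend_order[] = {
	"device suspend callbacks",
	"stop delivering device interrupts to drivers",
	"device late suspend callbacks",
	"platform prepare callback",
	"disable nonboot CPUs",
	"disable interrupts on the boot CPU",
	"sysdev_suspend",
};

#define N_STEPS (sizeof(suspend_order) / sizeof(suspend_order[0]))

/* Returns the position of a step in the sequence, or -1 if unknown. */
static int step_index(const char *name)
{
	for (size_t i = 0; i < N_STEPS; i++) {
		if (strcmp(suspend_order[i], name) == 0)
			return (int)i;
	}
	return -1;
}

/* Assumption: resume mirrors suspend, so the step at position i on the
 * way down is undone at position N_STEPS - 1 - i on the way up. */
static int resume_index(const char *name)
{
	int i = step_index(name);

	return i < 0 ? -1 : (int)(N_STEPS - 1) - i;
}
```

The point of the table is that the "late" callbacks now run with device
interrupts held off but with the CPU's interrupts (and hence timers) still
enabled.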

5/10 is a patch that's already in the PCI linux-next tree and I included it in
the series, because the next patches depend on it.

6/10 makes the PCI PM core use pci_set_power_state() to put devices into
D0 during early resume, which allows the platform-specific operations to be
carried out at that time, if necessary.

7/10 uses the opportunity to move pci_restore_standard_config() to pci-driver.c,
where it belongs IMO.

8/10 makes the PCI PM core code put devices into low power states during the
"late" phase of suspend which allows us to avoid a long-standing race related
to shared interrupts and to handle devices that require some platform-specific
operations to be put into low power states appropriately at the same time.
[The second rev of the patch retains the current behavior during the
"power-off" phase of hibernation, which is that the devices without drivers or
without PM support in the drivers are not power managed by the core.]

9/10 fixes pci_set_power_state() so that it doesn't return an error code when
attempting to put a PCI device without PM support (either native or through the
platform) into D0 (such devices are always in D0).

10/10 makes the PCI PM core save and restore the configuration spaces of
devices that have no drivers or no PM support in the drivers during suspend and
resume, respectively.

Thanks,
Rafael


^ permalink raw reply	[flat|nested] 373+ messages in thread

* [PATCH 0/10] PM: Rework suspend-resume ordering to avoid problems with shared interrupts (updated)
  2009-02-22 17:37 ` Rafael J. Wysocki
                   ` (11 preceding siblings ...)
  (?)
@ 2009-03-11  9:30 ` Rafael J. Wysocki
  -1 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-11  9:30 UTC (permalink / raw)
  To: pm list
  Cc: Arve, Jeremy Fitzhardinge, Frans Pop, LKML, Jesse Barnes,
	Eric W. Biederman, Ingo Molnar, Linus Torvalds, Thomas Gleixner

Hi,

The last iteration of this series of patches didn't draw comments except for
the discussion about the "wake-up" interrupts, so here's an update that I'd
like to consider as more-or-less final.  I would also like to use a separate
'suspend' git tree for merging these patches, if you don't mind.

The following patches modify the way in which we handle disabling interrupts
during suspend and enabling them during resume.  They also change the ordering
of the core suspend and hibernation code to take advantage of the new approach
to the interrupts and modify the PCI PM core to avoid a few problems.

Namely, interrupts are currently disabled on the boot CPU as soon as the
nonboot CPUs have been disabled, which doesn't allow device drivers' "late"
suspend and "early" resume callbacks to sleep.  Among other things this means
they cannot execute ACPI AML routines, which leads to problems with
suspend-resume of PCI devices, as recently discussed.

1/10 modifies the [suspend|hibernation] and resume code, as well as the other
code using the device PM framework, so that device drivers will not receive
interrupts during the "late" suspend phase, although interrupts will only be
disabled on the CPU right before calling sysdev_suspend() (and analogously
during resume).

2/10 - 4/10 modify the suspend, hibernation and kexec jump code, respectively,
so that the "late" phase of suspending devices will happen before executing the
platform "prepare" callback and disabling nonboot CPUs (and analogously during
resume).

5/10 is a patch that's already in the PCI linux-next tree and I included it in
the series, because the next patches depend on it.

6/10 makes the PCI PM core use pci_set_power_state() to put devices into
D0 during early resume, which allows the platform-specific operations to be
carried out at that time, if necessary.

7/10 uses the opportunity to move pci_restore_standard_config() to pci-driver.c,
where it belongs IMO.

8/10 makes the PCI PM core code put devices into low power states during the
"late" phase of suspend, which allows us to avoid a long-standing race related
to shared interrupts and, at the same time, to appropriately handle devices
that require some platform-specific operations to be put into low power states.
[The second rev of the patch retains the current behavior during the
"power-off" phase of hibernation, which is that the devices without drivers or
without PM support in the drivers are not power managed by the core.]

9/10 fixes pci_set_power_state() so that it doesn't return an error code when
attempting to put a PCI device without PM support (either native or through the
platform) into D0 (such devices are always in D0).

10/10 makes the PCI PM core save and restore the configuration spaces of
devices that have no drivers or no PM support in the drivers during suspend and
resume, respectively.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 373+ messages in thread

* [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5)
  2009-03-11  9:30 ` Rafael J. Wysocki
  2009-03-11  9:36   ` [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5) Rafael J. Wysocki
@ 2009-03-11  9:36   ` Rafael J. Wysocki
  2009-03-11 10:33     ` Thomas Gleixner
  2009-03-11 10:33     ` Thomas Gleixner
  2009-03-11  9:37   ` [PATCH 2/10] PM: Change suspend code ordering Rafael J. Wysocki
                     ` (16 subsequent siblings)
  18 siblings, 2 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-11  9:36 UTC (permalink / raw)
  To: pm list
  Cc: LKML, Linus Torvalds, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, Len Brown,
	Jesse Barnes, Thomas Gleixner, Frans Pop,
	Arve Hjønnevåg

From: Rafael J. Wysocki <rjw@sisk.pl>

Introduce two helper functions allowing us to prevent device drivers
from getting any interrupts (without disabling interrupts on the CPU)
during suspend (or hibernation) and to make them start to receive
interrupts again during the subsequent resume, respectively.  These
functions make it possible to keep timer interrupts enabled while the
"late" suspend and "early" resume callbacks provided by device
drivers are being executed.

Use these functions to rework the handling of interrupts during
suspend (hibernation) and resume.  Namely, interrupts will only be
disabled on the CPU right before suspending sysdevs, while device
drivers will be prevented from receiving interrupts, with the help of
the new helper function, before their "late" suspend callbacks run
(and analogously during resume).

In addition, since the device interrupts are now disabled before the
CPU has turned all interrupts off and the CPU will ACK the interrupts
setting the IRQ_PENDING bit for them, check in sysdev_suspend() if
any wake-up interrupts are pending and abort suspend if that's the
case.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
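Note: the depth and flag bookkeeping done by the new helpers can be tried
out in user space with a small model like the one below.  It is only a sketch
under simplifying assumptions: struct model_irq, the M_IRQ_* bits and the
model_*() functions are hypothetical stand-ins for struct irq_desc, the
IRQ_* status bits and the functions added in kernel/irq/pm.c, and locking,
chip-level disabling and synchronize_irq() are left out entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical status bits mirroring the IRQ_* flags used by the patch. */
#define M_IRQ_DISABLED	0x1u
#define M_IRQ_PENDING	0x2u
#define M_IRQ_WAKEUP	0x4u
#define M_IRQ_SUSPENDED	0x8u

/* Hypothetical, heavily reduced stand-in for struct irq_desc. */
struct model_irq {
	bool has_action;	/* a handler is installed (desc->action) */
	bool is_timer;		/* the handler was registered with IRQF_TIMER */
	unsigned int depth;	/* nested disable count */
	unsigned int status;	/* M_IRQ_* bits */
};

/* Disable every line that has a non-timer handler and mark it suspended. */
static void model_suspend_device_irqs(struct model_irq *irqs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		struct model_irq *d = &irqs[i];

		if (!d->has_action || d->is_timer)
			continue;

		/* Only the first disable actually turns the line off. */
		if (d->depth++ == 0)
			d->status |= M_IRQ_DISABLED;
		d->status |= M_IRQ_SUSPENDED;
	}
}

/* Undo model_suspend_device_irqs() for the lines marked as suspended. */
static void model_resume_device_irqs(struct model_irq *irqs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		struct model_irq *d = &irqs[i];

		if (!(d->status & M_IRQ_SUSPENDED))
			continue;

		d->status &= ~M_IRQ_SUSPENDED;
		assert(d->depth > 0);
		/* The line is enabled again only when depth drops to 0. */
		if (--d->depth == 0)
			d->status &= ~M_IRQ_DISABLED;
	}
}

/* Return -EBUSY if any wake-up line has an interrupt pending. */
static int model_check_wakeup_irqs(const struct model_irq *irqs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if ((irqs[i].status & M_IRQ_WAKEUP) &&
		    (irqs[i].status & M_IRQ_PENDING))
			return -EBUSY;

	return 0;
}
```

In this model a line that a driver had already disabled before suspend
(depth 1) goes to depth 2 and back to 1, so it stays disabled after resume,
while timer lines are never touched.
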
 arch/x86/kernel/apm_32.c  |   15 +++++--
 drivers/base/power/main.c |   20 +++++-----
 drivers/base/sys.c        |    8 ++++
 drivers/xen/manage.c      |   16 ++++----
 include/linux/interrupt.h |    5 ++
 include/linux/irq.h       |    1 
 kernel/irq/Makefile       |    1 
 kernel/irq/internals.h    |    1 
 kernel/irq/manage.c       |    2 -
 kernel/irq/pm.c           |   89 ++++++++++++++++++++++++++++++++++++++++++++++
 kernel/kexec.c            |    8 ++--
 kernel/power/disk.c       |   39 ++++++++++++++------
 kernel/power/main.c       |   17 +++++---
 13 files changed, 181 insertions(+), 41 deletions(-)

Index: linux-2.6/include/linux/interrupt.h
===================================================================
--- linux-2.6.orig/include/linux/interrupt.h
+++ linux-2.6/include/linux/interrupt.h
@@ -106,6 +106,11 @@ extern void disable_irq_nosync(unsigned 
 extern void disable_irq(unsigned int irq);
 extern void enable_irq(unsigned int irq);
 
+/* The following three functions are for the core kernel use only. */
+extern void suspend_device_irqs(void);
+extern void resume_device_irqs(void);
+extern int check_wakeup_irqs(void);
+
 #if defined(CONFIG_SMP) && defined(CONFIG_GENERIC_HARDIRQS)
 
 extern cpumask_var_t irq_default_affinity;
Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -287,17 +287,19 @@ void __attribute__ ((weak)) arch_suspend
  */
 static int suspend_enter(suspend_state_t state)
 {
-	int error = 0;
+	int error;
 
 	device_pm_lock();
-	arch_suspend_disable_irqs();
-	BUG_ON(!irqs_disabled());
 
-	if ((error = device_power_down(PMSG_SUSPEND))) {
+	error = device_power_down(PMSG_SUSPEND);
+	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down\n");
 		goto Done;
 	}
 
+	arch_suspend_disable_irqs();
+	BUG_ON(!irqs_disabled());
+
 	error = sysdev_suspend(PMSG_SUSPEND);
 	if (!error) {
 		if (!suspend_test(TEST_CORE))
@@ -305,11 +307,14 @@ static int suspend_enter(suspend_state_t
 		sysdev_resume();
 	}
 
-	device_power_up(PMSG_RESUME);
- Done:
 	arch_suspend_enable_irqs();
 	BUG_ON(irqs_disabled());
+
+	device_power_up(PMSG_RESUME);
+
+ Done:
 	device_pm_unlock();
+
 	return error;
 }
 
Index: linux-2.6/kernel/power/disk.c
===================================================================
--- linux-2.6.orig/kernel/power/disk.c
+++ linux-2.6/kernel/power/disk.c
@@ -214,7 +214,7 @@ static int create_image(int platform_mod
 		return error;
 
 	device_pm_lock();
-	local_irq_disable();
+
 	/* At this point, device_suspend() has been called, but *not*
 	 * device_power_down(). We *must* call device_power_down() now.
 	 * Otherwise, drivers for some devices (e.g. interrupt controllers)
@@ -225,8 +225,11 @@ static int create_image(int platform_mod
 	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down, "
 			"aborting hibernation\n");
-		goto Enable_irqs;
+		goto Unlock;
 	}
+
+	local_irq_disable();
+
 	sysdev_suspend(PMSG_FREEZE);
 	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down, "
@@ -252,12 +255,16 @@ static int create_image(int platform_mod
 	/* NOTE:  device_power_up() is just a resume() for devices
 	 * that suspended with irqs off ... no overall powerup.
 	 */
+
  Power_up_devices:
+	local_irq_enable();
+
 	device_power_up(in_suspend ?
 		(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);
- Enable_irqs:
-	local_irq_enable();
+
+ Unlock:
 	device_pm_unlock();
+
 	return error;
 }
 
@@ -336,13 +343,16 @@ static int resume_target_kernel(void)
 	int error;
 
 	device_pm_lock();
-	local_irq_disable();
+
 	error = device_power_down(PMSG_QUIESCE);
 	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down, "
 			"aborting resume\n");
-		goto Enable_irqs;
+		goto Unlock;
 	}
+
+	local_irq_disable();
+
 	sysdev_suspend(PMSG_QUIESCE);
 	/* We'll ignore saved state, but this gets preempt count (etc) right */
 	save_processor_state();
@@ -366,11 +376,16 @@ static int resume_target_kernel(void)
 	swsusp_free();
 	restore_processor_state();
 	touch_softlockup_watchdog();
+
 	sysdev_resume();
-	device_power_up(PMSG_RECOVER);
- Enable_irqs:
+
 	local_irq_enable();
+
+	device_power_up(PMSG_RECOVER);
+
+ Unlock:
 	device_pm_unlock();
+
 	return error;
 }
 
@@ -447,15 +462,16 @@ int hibernation_platform_enter(void)
 		goto Finish;
 
 	device_pm_lock();
-	local_irq_disable();
+
 	error = device_power_down(PMSG_HIBERNATE);
 	if (!error) {
+		local_irq_disable();
 		sysdev_suspend(PMSG_HIBERNATE);
 		hibernation_ops->enter();
 		/* We should never get here */
 		while (1);
 	}
-	local_irq_enable();
+
 	device_pm_unlock();
 
 	/*
@@ -464,12 +480,15 @@ int hibernation_platform_enter(void)
 	 */
  Finish:
 	hibernation_ops->finish();
+
  Resume_devices:
 	entering_platform_hibernation = false;
 	device_resume(PMSG_RESTORE);
 	resume_console();
+
  Close:
 	hibernation_ops->end();
+
 	return error;
 }
 
Index: linux-2.6/arch/x86/kernel/apm_32.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/apm_32.c
+++ linux-2.6/arch/x86/kernel/apm_32.c
@@ -1190,8 +1190,10 @@ static int suspend(int vetoable)
 	struct apm_user	*as;
 
 	device_suspend(PMSG_SUSPEND);
-	local_irq_disable();
+
 	device_power_down(PMSG_SUSPEND);
+
+	local_irq_disable();
 	sysdev_suspend(PMSG_SUSPEND);
 
 	local_irq_enable();
@@ -1209,9 +1211,12 @@ static int suspend(int vetoable)
 	if (err != APM_SUCCESS)
 		apm_error("suspend", err);
 	err = (err == APM_SUCCESS) ? 0 : -EIO;
+
 	sysdev_resume();
-	device_power_up(PMSG_RESUME);
 	local_irq_enable();
+
+	device_power_up(PMSG_RESUME);
+
 	device_resume(PMSG_RESUME);
 	queue_event(APM_NORMAL_RESUME, NULL);
 	spin_lock(&user_list_lock);
@@ -1228,8 +1233,9 @@ static void standby(void)
 {
 	int err;
 
-	local_irq_disable();
 	device_power_down(PMSG_SUSPEND);
+
+	local_irq_disable();
 	sysdev_suspend(PMSG_SUSPEND);
 	local_irq_enable();
 
@@ -1239,8 +1245,9 @@ static void standby(void)
 
 	local_irq_disable();
 	sysdev_resume();
-	device_power_up(PMSG_RESUME);
 	local_irq_enable();
+
+	device_power_up(PMSG_RESUME);
 }
 
 static apm_event_t get_event(void)
Index: linux-2.6/drivers/xen/manage.c
===================================================================
--- linux-2.6.orig/drivers/xen/manage.c
+++ linux-2.6/drivers/xen/manage.c
@@ -39,12 +39,6 @@ static int xen_suspend(void *data)
 
 	BUG_ON(!irqs_disabled());
 
-	err = device_power_down(PMSG_SUSPEND);
-	if (err) {
-		printk(KERN_ERR "xen_suspend: device_power_down failed: %d\n",
-		       err);
-		return err;
-	}
 	err = sysdev_suspend(PMSG_SUSPEND);
 	if (err) {
 		printk(KERN_ERR "xen_suspend: sysdev_suspend failed: %d\n",
@@ -69,7 +63,6 @@ static int xen_suspend(void *data)
 	xen_mm_unpin_all();
 
 	sysdev_resume();
-	device_power_up(PMSG_RESUME);
 
 	if (!*cancelled) {
 		xen_irq_resume();
@@ -108,6 +101,12 @@ static void do_suspend(void)
 	/* XXX use normal device tree? */
 	xenbus_suspend();
 
+	err = device_power_down(PMSG_SUSPEND);
+	if (err) {
+		printk(KERN_ERR "device_power_down failed: %d\n", err);
+		goto resume_devices;
+	}
+
 	err = stop_machine(xen_suspend, &cancelled, &cpumask_of_cpu(0));
 	if (err) {
 		printk(KERN_ERR "failed to start xen_suspend: %d\n", err);
@@ -120,6 +119,9 @@ static void do_suspend(void)
 	} else
 		xenbus_suspend_cancel();
 
+	device_power_up(PMSG_RESUME);
+
+resume_devices:
 	device_resume(PMSG_RESUME);
 
 	/* Make sure timer events get retriggered on all CPUs */
Index: linux-2.6/kernel/kexec.c
===================================================================
--- linux-2.6.orig/kernel/kexec.c
+++ linux-2.6/kernel/kexec.c
@@ -1454,7 +1454,6 @@ int kernel_kexec(void)
 		if (error)
 			goto Resume_devices;
 		device_pm_lock();
-		local_irq_disable();
 		/* At this point, device_suspend() has been called,
 		 * but *not* device_power_down(). We *must*
 		 * device_power_down() now.  Otherwise, drivers for
@@ -1464,8 +1463,9 @@ int kernel_kexec(void)
 		 */
 		error = device_power_down(PMSG_FREEZE);
 		if (error)
-			goto Enable_irqs;
+			goto Unlock_pm;
 
+		local_irq_disable();
 		/* Suspend system devices */
 		error = sysdev_suspend(PMSG_FREEZE);
 		if (error)
@@ -1484,9 +1484,9 @@ int kernel_kexec(void)
 	if (kexec_image->preserve_context) {
 		sysdev_resume();
  Power_up_devices:
-		device_power_up(PMSG_RESTORE);
- Enable_irqs:
 		local_irq_enable();
+		device_power_up(PMSG_RESTORE);
+ Unlock_pm:
 		device_pm_unlock();
 		enable_nonboot_cpus();
  Resume_devices:
Index: linux-2.6/include/linux/irq.h
===================================================================
--- linux-2.6.orig/include/linux/irq.h
+++ linux-2.6/include/linux/irq.h
@@ -65,6 +65,7 @@ typedef	void (*irq_flow_handler_t)(unsig
 #define IRQ_SPURIOUS_DISABLED	0x00800000	/* IRQ was disabled by the spurious trap */
 #define IRQ_MOVE_PCNTXT		0x01000000	/* IRQ migration from process context */
 #define IRQ_AFFINITY_SET	0x02000000	/* IRQ affinity was set from userspace*/
+#define IRQ_SUSPENDED		0x04000000	/* IRQ has gone through suspend sequence */
 
 #ifdef CONFIG_IRQ_PER_CPU
 # define CHECK_IRQ_PER_CPU(var) ((var) & IRQ_PER_CPU)
Index: linux-2.6/kernel/irq/pm.c
===================================================================
--- /dev/null
+++ linux-2.6/kernel/irq/pm.c
@@ -0,0 +1,89 @@
+/*
+ * linux/kernel/irq/pm.c
+ *
+ * Copyright (C) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
+ *
+ * This file contains power management functions related to interrupts.
+ */
+
+#include <linux/irq.h>
+#include <linux/module.h>
+#include <linux/interrupt.h>
+
+#include "internals.h"
+
+/**
+ * suspend_device_irqs - disable all currently enabled interrupt lines
+ *
+ * During system-wide suspend or hibernation device interrupts need to be
+ * disabled at the chip level and this function is provided for this purpose.
+ * It disables all interrupt lines that are enabled at the moment and sets the
+ * IRQ_SUSPENDED flag for them.
+ */
+void suspend_device_irqs(void)
+{
+	struct irq_desc *desc;
+	int irq;
+
+	for_each_irq_desc(irq, desc) {
+		unsigned long flags;
+		bool sync = false;
+
+		spin_lock_irqsave(&desc->lock, flags);
+
+		if (desc->action && !(desc->action->flags & IRQF_TIMER)) {
+			if (!desc->depth++) {
+				desc->status |= IRQ_DISABLED;
+				desc->chip->disable(irq);
+				sync = true;
+			}
+			desc->status |= IRQ_SUSPENDED;
+		}
+
+		spin_unlock_irqrestore(&desc->lock, flags);
+
+		if (sync)
+			synchronize_irq(irq);
+	}
+}
+EXPORT_SYMBOL_GPL(suspend_device_irqs);
+
+/**
+ * resume_device_irqs - enable interrupt lines disabled by suspend_device_irqs()
+ *
+ * Enable all interrupt lines previously disabled by suspend_device_irqs() that
+ * have the IRQ_SUSPENDED flag set.
+ */
+void resume_device_irqs(void)
+{
+	struct irq_desc *desc;
+	int irq;
+
+	for_each_irq_desc(irq, desc) {
+		unsigned long flags;
+
+		if (!(desc->status & IRQ_SUSPENDED))
+			continue;
+
+		spin_lock_irqsave(&desc->lock, flags);
+		desc->status &= ~IRQ_SUSPENDED;
+		__enable_irq(desc, irq);
+		spin_unlock_irqrestore(&desc->lock, flags);
+	}
+}
+EXPORT_SYMBOL_GPL(resume_device_irqs);
+
+/**
+ * check_wakeup_irqs - check if any wake-up interrupts are pending
+ */
+int check_wakeup_irqs(void)
+{
+	struct irq_desc *desc;
+	int irq;
+
+	for_each_irq_desc(irq, desc)
+		if ((desc->status & IRQ_WAKEUP) && (desc->status & IRQ_PENDING))
+			return -EBUSY;
+
+	return 0;
+}
Index: linux-2.6/kernel/irq/Makefile
===================================================================
--- linux-2.6.orig/kernel/irq/Makefile
+++ linux-2.6/kernel/irq/Makefile
@@ -4,3 +4,4 @@ obj-$(CONFIG_GENERIC_IRQ_PROBE) += autop
 obj-$(CONFIG_PROC_FS) += proc.o
 obj-$(CONFIG_GENERIC_PENDING_IRQ) += migration.o
 obj-$(CONFIG_NUMA_MIGRATE_IRQ_DESC) += numa_migrate.o
+obj-$(CONFIG_PM_SLEEP) += pm.o
Index: linux-2.6/kernel/irq/manage.c
===================================================================
--- linux-2.6.orig/kernel/irq/manage.c
+++ linux-2.6/kernel/irq/manage.c
@@ -215,7 +215,7 @@ void disable_irq(unsigned int irq)
 }
 EXPORT_SYMBOL(disable_irq);
 
-static void __enable_irq(struct irq_desc *desc, unsigned int irq)
+void __enable_irq(struct irq_desc *desc, unsigned int irq)
 {
 	switch (desc->depth) {
 	case 0:
Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -23,6 +23,7 @@
 #include <linux/pm.h>
 #include <linux/resume-trace.h>
 #include <linux/rwsem.h>
+#include <linux/interrupt.h>
 
 #include "../base.h"
 #include "power.h"
@@ -305,7 +306,8 @@ static int resume_device_noirq(struct de
  *	Execute the appropriate "noirq resume" callback for all devices marked
  *	as DPM_OFF_IRQ.
  *
- *	Must be called with interrupts disabled and only one CPU running.
+ *	Must be called under dpm_list_mtx.  Device drivers should not receive
+ *	interrupts while it's being executed.
  */
 static void dpm_power_up(pm_message_t state)
 {
@@ -326,14 +328,13 @@ static void dpm_power_up(pm_message_t st
  *	device_power_up - Turn on all devices that need special attention.
  *	@state: PM transition of the system being carried out.
  *
- *	Power on system devices, then devices that required we shut them down
- *	with interrupts disabled.
- *
- *	Must be called with interrupts disabled.
+ *	Call the "early" resume handlers and enable device drivers to receive
+ *	interrupts.
  */
 void device_power_up(pm_message_t state)
 {
 	dpm_power_up(state);
+	resume_device_irqs();
 }
 EXPORT_SYMBOL_GPL(device_power_up);
 
@@ -558,16 +559,17 @@ static int suspend_device_noirq(struct d
  *	device_power_down - Shut down special devices.
  *	@state: PM transition of the system being carried out.
  *
- *	Power down devices that require interrupts to be disabled.
- *	Then power down system devices.
+ *	Prevent device drivers from receiving interrupts and call the "late"
+ *	suspend handlers.
  *
- *	Must be called with interrupts disabled and only one CPU running.
+ *	Must be called under dpm_list_mtx.
  */
 int device_power_down(pm_message_t state)
 {
 	struct device *dev;
 	int error = 0;
 
+	suspend_device_irqs();
 	list_for_each_entry_reverse(dev, &dpm_list, power.entry) {
 		error = suspend_device_noirq(dev, state);
 		if (error) {
@@ -577,7 +579,7 @@ int device_power_down(pm_message_t state
 		dev->power.status = DPM_OFF_IRQ;
 	}
 	if (error)
-		dpm_power_up(resume_event(state));
+		device_power_up(resume_event(state));
 	return error;
 }
 EXPORT_SYMBOL_GPL(device_power_down);
Index: linux-2.6/drivers/base/sys.c
===================================================================
--- linux-2.6.orig/drivers/base/sys.c
+++ linux-2.6/drivers/base/sys.c
@@ -22,6 +22,7 @@
 #include <linux/pm.h>
 #include <linux/device.h>
 #include <linux/mutex.h>
+#include <linux/interrupt.h>
 
 #include "base.h"
 
@@ -369,6 +370,13 @@ int sysdev_suspend(pm_message_t state)
 	struct sysdev_driver *drv, *err_drv;
 	int ret;
 
+	pr_debug("Checking wake-up interrupts\n");
+
+	/* Return error code if there are any wake-up interrupts pending */
+	ret = check_wakeup_irqs();
+	if (ret)
+		return ret;
+
 	pr_debug("Suspending System Devices\n");
 
 	list_for_each_entry_reverse(cls, &system_kset->list, kset.kobj.entry) {
Index: linux-2.6/kernel/irq/internals.h
===================================================================
--- linux-2.6.orig/kernel/irq/internals.h
+++ linux-2.6/kernel/irq/internals.h
@@ -12,6 +12,7 @@ extern void compat_irq_chip_set_default_
 
 extern int __irq_set_trigger(struct irq_desc *desc, unsigned int irq,
 		unsigned long flags);
+extern void __enable_irq(struct irq_desc *desc, unsigned int irq);
 
 extern struct lock_class_key irq_desc_lock_class;
 extern void init_kstat_irqs(struct irq_desc *desc, int cpu, int nr);

^ permalink raw reply	[flat|nested] 373+ messages in thread

* [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5)
  2009-03-11  9:30 ` Rafael J. Wysocki
@ 2009-03-11  9:36   ` Rafael J. Wysocki
  2009-03-11  9:36   ` Rafael J. Wysocki
                     ` (17 subsequent siblings)
  18 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-11  9:36 UTC (permalink / raw)
  To: pm list
  Cc: Arve, Jeremy Fitzhardinge, Frans Pop, LKML, Jesse Barnes,
	Eric W. Biederman, Ingo Molnar, Linus Torvalds, Thomas Gleixner

From: Rafael J. Wysocki <rjw@sisk.pl>

Introduce two helper functions allowing us to prevent device drivers
from getting any interrupts (without disabling interrupts on the CPU)
during suspend (or hibernation) and to make them start to receive
interrupts again during the subsequent resume, respectively.  These
functions make it possible to keep timer interrupts enabled while the
"late" suspend and "early" resume callbacks provided by device
drivers are being executed.

Use these functions to rework the handling of interrupts during
suspend (hibernation) and resume.  Namely, interrupts will only be
disabled on the CPU right before suspending sysdevs, while device
drivers will be prevented from receiving interrupts, with the help of
the new helper function, before their "late" suspend callbacks run
(and analogously during resume).

In addition, since the device interrupts are now disabled before the
CPU has turned all interrupts off and the CPU will ACK the interrupts
setting the IRQ_PENDING bit for them, check in sysdev_suspend() if
any wake-up interrupts are pending and abort suspend if that's the
case.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 arch/x86/kernel/apm_32.c  |   15 +++++--
 drivers/base/power/main.c |   20 +++++-----
 drivers/base/sys.c        |    8 ++++
 drivers/xen/manage.c      |   16 ++++----
 include/linux/interrupt.h |    5 ++
 include/linux/irq.h       |    1 
 kernel/irq/Makefile       |    1 
 kernel/irq/internals.h    |    1 
 kernel/irq/manage.c       |    2 -
 kernel/irq/pm.c           |   89 ++++++++++++++++++++++++++++++++++++++++++++++
 kernel/kexec.c            |    8 ++--
 kernel/power/disk.c       |   39 ++++++++++++++------
 kernel/power/main.c       |   17 +++++---
 13 files changed, 181 insertions(+), 41 deletions(-)

Index: linux-2.6/include/linux/interrupt.h
===================================================================
--- linux-2.6.orig/include/linux/interrupt.h
+++ linux-2.6/include/linux/interrupt.h
@@ -106,6 +106,11 @@ extern void disable_irq_nosync(unsigned 
 extern void disable_irq(unsigned int irq);
 extern void enable_irq(unsigned int irq);
 
+/* The following three functions are for the core kernel use only. */
+extern void suspend_device_irqs(void);
+extern void resume_device_irqs(void);
+extern int check_wakeup_irqs(void);
+
 #if defined(CONFIG_SMP) && defined(CONFIG_GENERIC_HARDIRQS)
 
 extern cpumask_var_t irq_default_affinity;
Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -287,17 +287,19 @@ void __attribute__ ((weak)) arch_suspend
  */
 static int suspend_enter(suspend_state_t state)
 {
-	int error = 0;
+	int error;
 
 	device_pm_lock();
-	arch_suspend_disable_irqs();
-	BUG_ON(!irqs_disabled());
 
-	if ((error = device_power_down(PMSG_SUSPEND))) {
+	error = device_power_down(PMSG_SUSPEND);
+	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down\n");
 		goto Done;
 	}
 
+	arch_suspend_disable_irqs();
+	BUG_ON(!irqs_disabled());
+
 	error = sysdev_suspend(PMSG_SUSPEND);
 	if (!error) {
 		if (!suspend_test(TEST_CORE))
@@ -305,11 +307,14 @@ static int suspend_enter(suspend_state_t
 		sysdev_resume();
 	}
 
-	device_power_up(PMSG_RESUME);
- Done:
 	arch_suspend_enable_irqs();
 	BUG_ON(irqs_disabled());
+
+	device_power_up(PMSG_RESUME);
+
+ Done:
 	device_pm_unlock();
+
 	return error;
 }
 
Index: linux-2.6/kernel/power/disk.c
===================================================================
--- linux-2.6.orig/kernel/power/disk.c
+++ linux-2.6/kernel/power/disk.c
@@ -214,7 +214,7 @@ static int create_image(int platform_mod
 		return error;
 
 	device_pm_lock();
-	local_irq_disable();
+
 	/* At this point, device_suspend() has been called, but *not*
 	 * device_power_down(). We *must* call device_power_down() now.
 	 * Otherwise, drivers for some devices (e.g. interrupt controllers)
@@ -225,8 +225,11 @@ static int create_image(int platform_mod
 	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down, "
 			"aborting hibernation\n");
-		goto Enable_irqs;
+		goto Unlock;
 	}
+
+	local_irq_disable();
+
 	sysdev_suspend(PMSG_FREEZE);
 	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down, "
@@ -252,12 +255,16 @@ static int create_image(int platform_mod
 	/* NOTE:  device_power_up() is just a resume() for devices
 	 * that suspended with irqs off ... no overall powerup.
 	 */
+
  Power_up_devices:
+	local_irq_enable();
+
 	device_power_up(in_suspend ?
 		(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);
- Enable_irqs:
-	local_irq_enable();
+
+ Unlock:
 	device_pm_unlock();
+
 	return error;
 }
 
@@ -336,13 +343,16 @@ static int resume_target_kernel(void)
 	int error;
 
 	device_pm_lock();
-	local_irq_disable();
+
 	error = device_power_down(PMSG_QUIESCE);
 	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down, "
 			"aborting resume\n");
-		goto Enable_irqs;
+		goto Unlock;
 	}
+
+	local_irq_disable();
+
 	sysdev_suspend(PMSG_QUIESCE);
 	/* We'll ignore saved state, but this gets preempt count (etc) right */
 	save_processor_state();
@@ -366,11 +376,16 @@ static int resume_target_kernel(void)
 	swsusp_free();
 	restore_processor_state();
 	touch_softlockup_watchdog();
+
 	sysdev_resume();
-	device_power_up(PMSG_RECOVER);
- Enable_irqs:
+
 	local_irq_enable();
+
+	device_power_up(PMSG_RECOVER);
+
+ Unlock:
 	device_pm_unlock();
+
 	return error;
 }
 
@@ -447,15 +462,16 @@ int hibernation_platform_enter(void)
 		goto Finish;
 
 	device_pm_lock();
-	local_irq_disable();
+
 	error = device_power_down(PMSG_HIBERNATE);
 	if (!error) {
+		local_irq_disable();
 		sysdev_suspend(PMSG_HIBERNATE);
 		hibernation_ops->enter();
 		/* We should never get here */
 		while (1);
 	}
-	local_irq_enable();
+
 	device_pm_unlock();
 
 	/*
@@ -464,12 +480,15 @@ int hibernation_platform_enter(void)
 	 */
  Finish:
 	hibernation_ops->finish();
+
  Resume_devices:
 	entering_platform_hibernation = false;
 	device_resume(PMSG_RESTORE);
 	resume_console();
+
  Close:
 	hibernation_ops->end();
+
 	return error;
 }
 
Index: linux-2.6/arch/x86/kernel/apm_32.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/apm_32.c
+++ linux-2.6/arch/x86/kernel/apm_32.c
@@ -1190,8 +1190,10 @@ static int suspend(int vetoable)
 	struct apm_user	*as;
 
 	device_suspend(PMSG_SUSPEND);
-	local_irq_disable();
+
 	device_power_down(PMSG_SUSPEND);
+
+	local_irq_disable();
 	sysdev_suspend(PMSG_SUSPEND);
 
 	local_irq_enable();
@@ -1209,9 +1211,12 @@ static int suspend(int vetoable)
 	if (err != APM_SUCCESS)
 		apm_error("suspend", err);
 	err = (err == APM_SUCCESS) ? 0 : -EIO;
+
 	sysdev_resume();
-	device_power_up(PMSG_RESUME);
 	local_irq_enable();
+
+	device_power_up(PMSG_RESUME);
+
 	device_resume(PMSG_RESUME);
 	queue_event(APM_NORMAL_RESUME, NULL);
 	spin_lock(&user_list_lock);
@@ -1228,8 +1233,9 @@ static void standby(void)
 {
 	int err;
 
-	local_irq_disable();
 	device_power_down(PMSG_SUSPEND);
+
+	local_irq_disable();
 	sysdev_suspend(PMSG_SUSPEND);
 	local_irq_enable();
 
@@ -1239,8 +1245,9 @@ static void standby(void)
 
 	local_irq_disable();
 	sysdev_resume();
-	device_power_up(PMSG_RESUME);
 	local_irq_enable();
+
+	device_power_up(PMSG_RESUME);
 }
 
 static apm_event_t get_event(void)
Index: linux-2.6/drivers/xen/manage.c
===================================================================
--- linux-2.6.orig/drivers/xen/manage.c
+++ linux-2.6/drivers/xen/manage.c
@@ -39,12 +39,6 @@ static int xen_suspend(void *data)
 
 	BUG_ON(!irqs_disabled());
 
-	err = device_power_down(PMSG_SUSPEND);
-	if (err) {
-		printk(KERN_ERR "xen_suspend: device_power_down failed: %d\n",
-		       err);
-		return err;
-	}
 	err = sysdev_suspend(PMSG_SUSPEND);
 	if (err) {
 		printk(KERN_ERR "xen_suspend: sysdev_suspend failed: %d\n",
@@ -69,7 +63,6 @@ static int xen_suspend(void *data)
 	xen_mm_unpin_all();
 
 	sysdev_resume();
-	device_power_up(PMSG_RESUME);
 
 	if (!*cancelled) {
 		xen_irq_resume();
@@ -108,6 +101,12 @@ static void do_suspend(void)
 	/* XXX use normal device tree? */
 	xenbus_suspend();
 
+	err = device_power_down(PMSG_SUSPEND);
+	if (err) {
+		printk(KERN_ERR "device_power_down failed: %d\n", err);
+		goto resume_devices;
+	}
+
 	err = stop_machine(xen_suspend, &cancelled, &cpumask_of_cpu(0));
 	if (err) {
 		printk(KERN_ERR "failed to start xen_suspend: %d\n", err);
@@ -120,6 +119,9 @@ static void do_suspend(void)
 	} else
 		xenbus_suspend_cancel();
 
+	device_power_up(PMSG_RESUME);
+
+resume_devices:
 	device_resume(PMSG_RESUME);
 
 	/* Make sure timer events get retriggered on all CPUs */
Index: linux-2.6/kernel/kexec.c
===================================================================
--- linux-2.6.orig/kernel/kexec.c
+++ linux-2.6/kernel/kexec.c
@@ -1454,7 +1454,6 @@ int kernel_kexec(void)
 		if (error)
 			goto Resume_devices;
 		device_pm_lock();
-		local_irq_disable();
 		/* At this point, device_suspend() has been called,
 		 * but *not* device_power_down(). We *must*
 		 * device_power_down() now.  Otherwise, drivers for
@@ -1464,8 +1463,9 @@ int kernel_kexec(void)
 		 */
 		error = device_power_down(PMSG_FREEZE);
 		if (error)
-			goto Enable_irqs;
+			goto Unlock_pm;
 
+		local_irq_disable();
 		/* Suspend system devices */
 		error = sysdev_suspend(PMSG_FREEZE);
 		if (error)
@@ -1484,9 +1484,9 @@ int kernel_kexec(void)
 	if (kexec_image->preserve_context) {
 		sysdev_resume();
  Power_up_devices:
-		device_power_up(PMSG_RESTORE);
- Enable_irqs:
 		local_irq_enable();
+		device_power_up(PMSG_RESTORE);
+ Unlock_pm:
 		device_pm_unlock();
 		enable_nonboot_cpus();
  Resume_devices:
Index: linux-2.6/include/linux/irq.h
===================================================================
--- linux-2.6.orig/include/linux/irq.h
+++ linux-2.6/include/linux/irq.h
@@ -65,6 +65,7 @@ typedef	void (*irq_flow_handler_t)(unsig
 #define IRQ_SPURIOUS_DISABLED	0x00800000	/* IRQ was disabled by the spurious trap */
 #define IRQ_MOVE_PCNTXT		0x01000000	/* IRQ migration from process context */
 #define IRQ_AFFINITY_SET	0x02000000	/* IRQ affinity was set from userspace*/
+#define IRQ_SUSPENDED		0x04000000	/* IRQ has gone through suspend sequence */
 
 #ifdef CONFIG_IRQ_PER_CPU
 # define CHECK_IRQ_PER_CPU(var) ((var) & IRQ_PER_CPU)
Index: linux-2.6/kernel/irq/pm.c
===================================================================
--- /dev/null
+++ linux-2.6/kernel/irq/pm.c
@@ -0,0 +1,89 @@
+/*
+ * linux/kernel/irq/pm.c
+ *
+ * Copyright (C) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
+ *
+ * This file contains power management functions related to interrupts.
+ */
+
+#include <linux/irq.h>
+#include <linux/module.h>
+#include <linux/interrupt.h>
+
+#include "internals.h"
+
+/**
+ * suspend_device_irqs - disable all currently enabled interrupt lines
+ *
+ * During system-wide suspend or hibernation device interrupts need to be
+ * disabled at the chip level and this function is provided for this purpose.
+ * It disables all interrupt lines that are enabled at the moment and sets the
+ * IRQ_SUSPENDED flag for them.
+ */
+void suspend_device_irqs(void)
+{
+	struct irq_desc *desc;
+	int irq;
+
+	for_each_irq_desc(irq, desc) {
+		unsigned long flags;
+		bool sync = false;
+
+		spin_lock_irqsave(&desc->lock, flags);
+
+		if (desc->action && !(desc->action->flags & IRQF_TIMER)) {
+			if (!desc->depth++) {
+				desc->status |= IRQ_DISABLED;
+				desc->chip->disable(irq);
+				sync = true;
+			}
+			desc->status |= IRQ_SUSPENDED;
+		}
+
+		spin_unlock_irqrestore(&desc->lock, flags);
+
+		if (sync)
+			synchronize_irq(irq);
+	}
+}
+EXPORT_SYMBOL_GPL(suspend_device_irqs);
+
+/**
+ * resume_device_irqs - enable interrupt lines disabled by suspend_device_irqs()
+ *
+ * Enable all interrupt lines previously disabled by suspend_device_irqs() that
+ * have the IRQ_SUSPENDED flag set.
+ */
+void resume_device_irqs(void)
+{
+	struct irq_desc *desc;
+	int irq;
+
+	for_each_irq_desc(irq, desc) {
+		unsigned long flags;
+
+		if (!(desc->status & IRQ_SUSPENDED))
+			continue;
+
+		spin_lock_irqsave(&desc->lock, flags);
+		desc->status &= ~IRQ_SUSPENDED;
+		__enable_irq(desc, irq);
+		spin_unlock_irqrestore(&desc->lock, flags);
+	}
+}
+EXPORT_SYMBOL_GPL(resume_device_irqs);
+
+/**
+ * check_wakeup_irqs - check if any wake-up interrupts are pending
+ */
+int check_wakeup_irqs(void)
+{
+	struct irq_desc *desc;
+	int irq;
+
+	for_each_irq_desc(irq, desc)
+		if ((desc->status & IRQ_WAKEUP) && (desc->status & IRQ_PENDING))
+			return -EBUSY;
+
+	return 0;
+}
Index: linux-2.6/kernel/irq/Makefile
===================================================================
--- linux-2.6.orig/kernel/irq/Makefile
+++ linux-2.6/kernel/irq/Makefile
@@ -4,3 +4,4 @@ obj-$(CONFIG_GENERIC_IRQ_PROBE) += autop
 obj-$(CONFIG_PROC_FS) += proc.o
 obj-$(CONFIG_GENERIC_PENDING_IRQ) += migration.o
 obj-$(CONFIG_NUMA_MIGRATE_IRQ_DESC) += numa_migrate.o
+obj-$(CONFIG_PM_SLEEP) += pm.o
Index: linux-2.6/kernel/irq/manage.c
===================================================================
--- linux-2.6.orig/kernel/irq/manage.c
+++ linux-2.6/kernel/irq/manage.c
@@ -215,7 +215,7 @@ void disable_irq(unsigned int irq)
 }
 EXPORT_SYMBOL(disable_irq);
 
-static void __enable_irq(struct irq_desc *desc, unsigned int irq)
+void __enable_irq(struct irq_desc *desc, unsigned int irq)
 {
 	switch (desc->depth) {
 	case 0:
Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -23,6 +23,7 @@
 #include <linux/pm.h>
 #include <linux/resume-trace.h>
 #include <linux/rwsem.h>
+#include <linux/interrupt.h>
 
 #include "../base.h"
 #include "power.h"
@@ -305,7 +306,8 @@ static int resume_device_noirq(struct de
  *	Execute the appropriate "noirq resume" callback for all devices marked
  *	as DPM_OFF_IRQ.
  *
- *	Must be called with interrupts disabled and only one CPU running.
+ *	Must be called under dpm_list_mtx.  Device drivers should not receive
+ *	interrupts while it's being executed.
  */
 static void dpm_power_up(pm_message_t state)
 {
@@ -326,14 +328,13 @@ static void dpm_power_up(pm_message_t st
  *	device_power_up - Turn on all devices that need special attention.
  *	@state: PM transition of the system being carried out.
  *
- *	Power on system devices, then devices that required we shut them down
- *	with interrupts disabled.
- *
- *	Must be called with interrupts disabled.
+ *	Call the "early" resume handlers and enable device drivers to receive
+ *	interrupts.
  */
 void device_power_up(pm_message_t state)
 {
 	dpm_power_up(state);
+	resume_device_irqs();
 }
 EXPORT_SYMBOL_GPL(device_power_up);
 
@@ -558,16 +559,17 @@ static int suspend_device_noirq(struct d
  *	device_power_down - Shut down special devices.
  *	@state: PM transition of the system being carried out.
  *
- *	Power down devices that require interrupts to be disabled.
- *	Then power down system devices.
+ *	Prevent device drivers from receiving interrupts and call the "late"
+ *	suspend handlers.
  *
- *	Must be called with interrupts disabled and only one CPU running.
+ *	Must be called under dpm_list_mtx.
  */
 int device_power_down(pm_message_t state)
 {
 	struct device *dev;
 	int error = 0;
 
+	suspend_device_irqs();
 	list_for_each_entry_reverse(dev, &dpm_list, power.entry) {
 		error = suspend_device_noirq(dev, state);
 		if (error) {
@@ -577,7 +579,7 @@ int device_power_down(pm_message_t state
 		dev->power.status = DPM_OFF_IRQ;
 	}
 	if (error)
-		dpm_power_up(resume_event(state));
+		device_power_up(resume_event(state));
 	return error;
 }
 EXPORT_SYMBOL_GPL(device_power_down);
Index: linux-2.6/drivers/base/sys.c
===================================================================
--- linux-2.6.orig/drivers/base/sys.c
+++ linux-2.6/drivers/base/sys.c
@@ -22,6 +22,7 @@
 #include <linux/pm.h>
 #include <linux/device.h>
 #include <linux/mutex.h>
+#include <linux/interrupt.h>
 
 #include "base.h"
 
@@ -369,6 +370,13 @@ int sysdev_suspend(pm_message_t state)
 	struct sysdev_driver *drv, *err_drv;
 	int ret;
 
+	pr_debug("Checking wake-up interrupts\n");
+
+	/* Return error code if there are any wake-up interrupts pending */
+	ret = check_wakeup_irqs();
+	if (ret)
+		return ret;
+
 	pr_debug("Suspending System Devices\n");
 
 	list_for_each_entry_reverse(cls, &system_kset->list, kset.kobj.entry) {
Index: linux-2.6/kernel/irq/internals.h
===================================================================
--- linux-2.6.orig/kernel/irq/internals.h
+++ linux-2.6/kernel/irq/internals.h
@@ -12,6 +12,7 @@ extern void compat_irq_chip_set_default_
 
 extern int __irq_set_trigger(struct irq_desc *desc, unsigned int irq,
 		unsigned long flags);
+extern void __enable_irq(struct irq_desc *desc, unsigned int irq);
 
 extern struct lock_class_key irq_desc_lock_class;
 extern void init_kstat_irqs(struct irq_desc *desc, int cpu, int nr);

^ permalink raw reply	[flat|nested] 373+ messages in thread

* [PATCH 2/10] PM: Change suspend code ordering
  2009-03-11  9:30 ` Rafael J. Wysocki
  2009-03-11  9:36   ` [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5) Rafael J. Wysocki
  2009-03-11  9:36   ` Rafael J. Wysocki
@ 2009-03-11  9:37   ` Rafael J. Wysocki
  2009-03-11  9:37   ` Rafael J. Wysocki
                     ` (15 subsequent siblings)
  18 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-11  9:37 UTC (permalink / raw)
  To: pm list
  Cc: LKML, Linus Torvalds, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, Len Brown,
	Jesse Barnes, Thomas Gleixner, Frans Pop,
	Arve Hjønnevåg

From: Rafael J. Wysocki <rjw@sisk.pl>

Change the ordering of the suspend core code so that the platform
"prepare" callback is executed and the nonboot CPUs are disabled
after calling device drivers' "late suspend" methods.

This change will allow us to rework the PCI PM core so that the power
state of devices is changed in the "late" phase of suspend (and
analogously in the "early" phase of resume), which in turn will allow
us to avoid the race condition where a device using shared interrupts
is put into a low power state with interrupts enabled and then an
interrupt (for another device) comes in and confuses its driver.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
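A rough userspace sketch of the ordering described above, for readers
following the goto labels in the hunks below.  It is not kernel code: the
step names, the fail_point enum and model_suspend_enter() are all made up
for illustration, and the suspend_test() shortcuts and sysdev handling are
left out.  It only records which steps run, and in which order, when one
of the early steps fails.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of the reordered suspend_enter(); not kernel code. */
enum fail_point { FAIL_NONE, FAIL_POWER_DOWN, FAIL_PREPARE, FAIL_CPUS };

/* Comma-separated list of the steps executed by the last call. */
char suspend_trace[256];

static void step(const char *name)
{
	/* Keep the illustration honest about its fixed-size buffer. */
	assert(strlen(suspend_trace) + strlen(name) + 2 <= sizeof(suspend_trace));
	if (suspend_trace[0])
		strcat(suspend_trace, ",");
	strcat(suspend_trace, name);
}

int model_suspend_enter(enum fail_point fail)
{
	int error = 0;

	suspend_trace[0] = '\0';

	/* "Late" suspend callbacks run first, with all CPUs still online. */
	step("power_down");
	if (fail == FAIL_POWER_DOWN) {
		error = -1;
		goto Done;
	}

	/* The platform "prepare" callback now comes after them. */
	step("prepare");
	if (fail == FAIL_PREPARE) {
		error = -1;
		goto Power_up_devices;
	}

	/* The nonboot CPUs are taken offline last. */
	step("disable_cpus");
	if (fail == FAIL_CPUS) {
		error = -1;
		goto Enable_cpus;
	}

	step("irqs_off");
	step("enter");
	step("irqs_on");

 Enable_cpus:
	step("enable_cpus");
	step("finish");
 Power_up_devices:
	step("power_up");
 Done:
	return error;
}
```

Calling model_suspend_enter(FAIL_PREPARE) leaves "power_down,prepare,power_up"
in suspend_trace, i.e. a failing "prepare" unwinds straight to powering the
devices back up without touching the CPUs or the platform "finish" step.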
 kernel/power/main.c |   38 ++++++++++++++++++++++----------------
 1 file changed, 22 insertions(+), 16 deletions(-)

Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -297,6 +297,19 @@ static int suspend_enter(suspend_state_t
 		goto Done;
 	}
 
+	if (suspend_ops->prepare) {
+		error = suspend_ops->prepare();
+		if (error)
+			goto Power_up_devices;
+	}
+
+	if (suspend_test(TEST_PLATFORM))
+		goto Platform_finish;
+
+	error = disable_nonboot_cpus();
+	if (error || suspend_test(TEST_CPUS))
+		goto Enable_cpus;
+
 	arch_suspend_disable_irqs();
 	BUG_ON(!irqs_disabled());
 
@@ -310,6 +323,14 @@ static int suspend_enter(suspend_state_t
 	arch_suspend_enable_irqs();
 	BUG_ON(irqs_disabled());
 
+ Enable_cpus:
+	enable_nonboot_cpus();
+
+ Platform_finish:
+	if (suspend_ops->finish)
+		suspend_ops->finish();
+
+ Power_up_devices:
 	device_power_up(PMSG_RESUME);
 
  Done:
@@ -346,23 +367,8 @@ int suspend_devices_and_enter(suspend_st
 	if (suspend_test(TEST_DEVICES))
 		goto Recover_platform;
 
-	if (suspend_ops->prepare) {
-		error = suspend_ops->prepare();
-		if (error)
-			goto Resume_devices;
-	}
-
-	if (suspend_test(TEST_PLATFORM))
-		goto Finish;
+	suspend_enter(state);
 
-	error = disable_nonboot_cpus();
-	if (!error && !suspend_test(TEST_CPUS))
-		suspend_enter(state);
-
-	enable_nonboot_cpus();
- Finish:
-	if (suspend_ops->finish)
-		suspend_ops->finish();
  Resume_devices:
 	suspend_test_start();
 	device_resume(PMSG_RESUME);

^ permalink raw reply	[flat|nested] 373+ messages in thread

* [PATCH 3/10] PM: Change hibernation code ordering
  2009-03-11  9:30 ` Rafael J. Wysocki
                     ` (3 preceding siblings ...)
  2009-03-11  9:37   ` Rafael J. Wysocki
@ 2009-03-11  9:38   ` Rafael J. Wysocki
  2009-03-11  9:38   ` Rafael J. Wysocki
                     ` (13 subsequent siblings)
  18 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-11  9:38 UTC (permalink / raw)
  To: pm list
  Cc: LKML, Linus Torvalds, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, Len Brown,
	Jesse Barnes, Thomas Gleixner, Frans Pop,
	Arve Hjønnevåg

From: Rafael J. Wysocki <rjw@sisk.pl>

Change the ordering of the hibernation core code so that the platform
"prepare" callbacks are executed and the nonboot CPUs are disabled
after calling device drivers' "late suspend" methods.

This change (along with the previous analogous change of the suspend
core code) will allow us to rework the PCI PM core so that the power
state of devices is changed in the "late" phase of suspend (and
analogously in the "early" phase of resume), which in turn will allow
us to avoid the race condition where a device using shared interrupts
is put into a low power state with interrupts enabled and then an
interrupt (for another device) comes in and confuses its driver.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
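The shared-interrupt race mentioned above can be pictured with a small
userspace model of a single interrupt line.  This is only a sketch under
stated assumptions: struct model_irq, the M_* flags and the model_*()
helpers are hypothetical names that loosely mirror the depth counting and
the IRQ_SUSPENDED flag used by suspend_device_irqs() and
resume_device_irqs() in patch 1/10.  They are not the kernel's data
structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical status bits, standing in for IRQ_DISABLED / IRQ_SUSPENDED. */
#define M_DISABLED	0x1u
#define M_SUSPENDED	0x2u

/* Hypothetical model of one (possibly shared) interrupt line. */
struct model_irq {
	unsigned int depth;	/* nested disable count, 0 means enabled */
	unsigned int status;	/* M_* bits */
	bool has_action;	/* at least one handler is registered */
	bool is_timer;		/* stands in for IRQF_TIMER */
	unsigned int delivered;	/* how many times handlers were invoked */
};

/* Disable the line unless it is unused or a timer, and mark it suspended. */
void model_suspend(struct model_irq *d)
{
	if (d->has_action && !d->is_timer) {
		if (!d->depth++)
			d->status |= M_DISABLED;
		d->status |= M_SUSPENDED;
	}
}

/* Undo exactly one level of disabling for lines marked as suspended. */
void model_resume(struct model_irq *d)
{
	if (!(d->status & M_SUSPENDED))
		return;

	d->status &= ~M_SUSPENDED;
	assert(d->depth > 0);
	if (--d->depth == 0)
		d->status &= ~M_DISABLED;
}

/* An interrupt arrives; handlers run only if the line is not disabled. */
void model_raise(struct model_irq *d)
{
	if (d->has_action && !(d->status & M_DISABLED))
		d->delivered++;
}
```

In this model, an interrupt raised between model_suspend() and
model_resume() never reaches the handlers of a non-timer line, so a driver
whose device is already in a low power state is not called on behalf of
another device sharing the line.  A line that a driver had disabled itself
before suspend (depth 1) goes to depth 2 and back to 1, and so stays
disabled after resume.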
 kernel/power/disk.c |  109 +++++++++++++++++++++++++++++-----------------------
 1 file changed, 61 insertions(+), 48 deletions(-)

Index: linux-2.6/kernel/power/disk.c
===================================================================
--- linux-2.6.orig/kernel/power/disk.c
+++ linux-2.6/kernel/power/disk.c
@@ -228,13 +228,22 @@ static int create_image(int platform_mod
 		goto Unlock;
 	}
 
+	error = platform_pre_snapshot(platform_mode);
+	if (error || hibernation_test(TEST_PLATFORM))
+		goto Platform_finish;
+
+	error = disable_nonboot_cpus();
+	if (error || hibernation_test(TEST_CPUS)
+	    || hibernation_testmode(HIBERNATION_TEST))
+		goto Enable_cpus;
+
 	local_irq_disable();
 
 	sysdev_suspend(PMSG_FREEZE);
 	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down, "
 			"aborting hibernation\n");
-		goto Power_up_devices;
+		goto Enable_irqs;
 	}
 
 	if (hibernation_test(TEST_CORE))
@@ -250,15 +259,22 @@ static int create_image(int platform_mod
 	restore_processor_state();
 	if (!in_suspend)
 		platform_leave(platform_mode);
+
  Power_up:
 	sysdev_resume();
 	/* NOTE:  device_power_up() is just a resume() for devices
 	 * that suspended with irqs off ... no overall powerup.
 	 */
 
- Power_up_devices:
+ Enable_irqs:
 	local_irq_enable();
 
+ Enable_cpus:
+	enable_nonboot_cpus();
+
+ Platform_finish:
+	platform_finish(platform_mode);
+
 	device_power_up(in_suspend ?
 		(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);
 
@@ -298,25 +314,9 @@ int hibernation_snapshot(int platform_mo
 	if (hibernation_test(TEST_DEVICES))
 		goto Recover_platform;
 
-	error = platform_pre_snapshot(platform_mode);
-	if (error || hibernation_test(TEST_PLATFORM))
-		goto Finish;
-
-	error = disable_nonboot_cpus();
-	if (!error) {
-		if (hibernation_test(TEST_CPUS))
-			goto Enable_cpus;
-
-		if (hibernation_testmode(HIBERNATION_TEST))
-			goto Enable_cpus;
+	error = create_image(platform_mode);
+	/* Control returns here after successful restore */
 
-		error = create_image(platform_mode);
-		/* Control returns here after successful restore */
-	}
- Enable_cpus:
-	enable_nonboot_cpus();
- Finish:
-	platform_finish(platform_mode);
  Resume_devices:
 	device_resume(in_suspend ?
 		(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);
@@ -338,7 +338,7 @@ int hibernation_snapshot(int platform_mo
  *	kernel.
  */
 
-static int resume_target_kernel(void)
+static int resume_target_kernel(bool platform_mode)
 {
 	int error;
 
@@ -351,9 +351,20 @@ static int resume_target_kernel(void)
 		goto Unlock;
 	}
 
+	error = platform_pre_restore(platform_mode);
+	if (error)
+		goto Cleanup;
+
+	error = disable_nonboot_cpus();
+	if (error)
+		goto Enable_cpus;
+
 	local_irq_disable();
 
-	sysdev_suspend(PMSG_QUIESCE);
+	error = sysdev_suspend(PMSG_QUIESCE);
+	if (error)
+		goto Enable_irqs;
+
 	/* We'll ignore saved state, but this gets preempt count (etc) right */
 	save_processor_state();
 	error = restore_highmem();
@@ -379,8 +390,15 @@ static int resume_target_kernel(void)
 
 	sysdev_resume();
 
+ Enable_irqs:
 	local_irq_enable();
 
+ Enable_cpus:
+	enable_nonboot_cpus();
+
+ Cleanup:
+	platform_restore_cleanup(platform_mode);
+
 	device_power_up(PMSG_RECOVER);
 
  Unlock:
@@ -405,19 +423,10 @@ int hibernation_restore(int platform_mod
 	pm_prepare_console();
 	suspend_console();
 	error = device_suspend(PMSG_QUIESCE);
-	if (error)
-		goto Finish;
-
-	error = platform_pre_restore(platform_mode);
 	if (!error) {
-		error = disable_nonboot_cpus();
-		if (!error)
-			error = resume_target_kernel();
-		enable_nonboot_cpus();
+		error = resume_target_kernel(platform_mode);
+		device_resume(PMSG_RECOVER);
 	}
-	platform_restore_cleanup(platform_mode);
-	device_resume(PMSG_RECOVER);
- Finish:
 	resume_console();
 	pm_restore_console();
 	return error;
@@ -453,34 +462,38 @@ int hibernation_platform_enter(void)
 		goto Resume_devices;
 	}
 
+	device_pm_lock();
+
+	error = device_power_down(PMSG_HIBERNATE);
+	if (error)
+		goto Unlock;
+
 	error = hibernation_ops->prepare();
 	if (error)
-		goto Resume_devices;
+		goto Platform_finish;
 
 	error = disable_nonboot_cpus();
 	if (error)
-		goto Finish;
-
-	device_pm_lock();
-
-	error = device_power_down(PMSG_HIBERNATE);
-	if (!error) {
-		local_irq_disable();
-		sysdev_suspend(PMSG_HIBERNATE);
-		hibernation_ops->enter();
-		/* We should never get here */
-		while (1);
-	}
+		goto Platform_finish;
 
-	device_pm_unlock();
+	local_irq_disable();
+	sysdev_suspend(PMSG_HIBERNATE);
+	hibernation_ops->enter();
+	/* We should never get here */
+	while (1);
 
 	/*
 	 * We don't need to reenable the nonboot CPUs or resume consoles, since
 	 * the system is going to be halted anyway.
 	 */
- Finish:
+ Platform_finish:
 	hibernation_ops->finish();
 
+	device_power_up(PMSG_RESTORE);
+
+ Unlock:
+	device_pm_unlock();
+
  Resume_devices:
 	entering_platform_hibernation = false;
 	device_resume(PMSG_RESTORE);

^ permalink raw reply	[flat|nested] 373+ messages in thread

* [PATCH 4/10] kexec: Change kexec jump code ordering
  2009-03-11  9:30 ` Rafael J. Wysocki
@ 2009-03-11  9:39     ` Rafael J. Wysocki
  2009-03-11  9:36   ` Rafael J. Wysocki
                       ` (17 subsequent siblings)
  18 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-11  9:39 UTC (permalink / raw)
  To: pm list
  Cc: LKML, Linus Torvalds, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, Len Brown,
	Jesse Barnes, Thomas Gleixner, Frans Pop,
	Arve Hjønnevåg

From: Rafael J. Wysocki <rjw@sisk.pl>

Change the ordering of the kexec jump code so that the nonboot CPUs
are disabled after calling device drivers' "late suspend" methods.

This change reflects the recent modifications of the power management
code that is also used by kexec jump.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 kernel/kexec.c |   19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

Index: linux-2.6/kernel/kexec.c
===================================================================
--- linux-2.6.orig/kernel/kexec.c
+++ linux-2.6/kernel/kexec.c
@@ -1450,9 +1450,6 @@ int kernel_kexec(void)
 		error = device_suspend(PMSG_FREEZE);
 		if (error)
 			goto Resume_console;
-		error = disable_nonboot_cpus();
-		if (error)
-			goto Resume_devices;
 		device_pm_lock();
 		/* At this point, device_suspend() has been called,
 		 * but *not* device_power_down(). We *must*
@@ -1463,13 +1460,15 @@ int kernel_kexec(void)
 		 */
 		error = device_power_down(PMSG_FREEZE);
 		if (error)
-			goto Unlock_pm;
-
+			goto Resume_devices;
+		error = disable_nonboot_cpus();
+		if (error)
+			goto Enable_cpus;
 		local_irq_disable();
 		/* Suspend system devices */
 		error = sysdev_suspend(PMSG_FREEZE);
 		if (error)
-			goto Power_up_devices;
+			goto Enable_irqs;
 	} else
 #endif
 	{
@@ -1483,13 +1482,13 @@ int kernel_kexec(void)
 #ifdef CONFIG_KEXEC_JUMP
 	if (kexec_image->preserve_context) {
 		sysdev_resume();
- Power_up_devices:
+ Enable_irqs:
 		local_irq_enable();
-		device_power_up(PMSG_RESTORE);
- Unlock_pm:
-		device_pm_unlock();
+ Enable_cpus:
 		enable_nonboot_cpus();
+		device_power_up(PMSG_RESTORE);
  Resume_devices:
+		device_pm_unlock();
 		device_resume(PMSG_RESTORE);
  Resume_console:
 		resume_console();

^ permalink raw reply	[flat|nested] 373+ messages in thread

* [PATCH 5/10] PCI PM: Consistently use variable name "error" for pm call return values
  2009-03-11  9:30 ` Rafael J. Wysocki
                     ` (7 preceding siblings ...)
  2009-03-11  9:41   ` [PATCH 5/10] PCI PM: Consistently use variable name "error" for pm call return values Rafael J. Wysocki
@ 2009-03-11  9:41   ` Rafael J. Wysocki
  2009-03-11  9:42   ` [PATCH 6/10] PCI PM: Use pci_set_power_state during early resume Rafael J. Wysocki
                     ` (9 subsequent siblings)
  18 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-11  9:41 UTC (permalink / raw)
  To: pm list
  Cc: LKML, Linus Torvalds, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, Len Brown,
	Jesse Barnes, Thomas Gleixner, Frans Pop,
	Arve Hjønnevåg

From: Frans Pop <elendil@planet.nl>

I noticed two functions use a variable "i" to store the return value of PM
function calls while the rest of the file uses "error". As "i" normally
indicates a counter of some sort it seems better to keep this consistent.

Signed-off-by: Frans Pop <elendil@planet.nl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 drivers/pci/pci-driver.c |   20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

Index: linux-2.6/drivers/pci/pci-driver.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci-driver.c
+++ linux-2.6/drivers/pci/pci-driver.c
@@ -352,17 +352,17 @@ static int pci_legacy_suspend(struct dev
 {
 	struct pci_dev * pci_dev = to_pci_dev(dev);
 	struct pci_driver * drv = pci_dev->driver;
-	int i = 0;
+	int error = 0;
 
 	if (drv && drv->suspend) {
 		pci_power_t prev = pci_dev->current_state;
 
 		pci_dev->state_saved = false;
 
-		i = drv->suspend(pci_dev, state);
-		suspend_report_result(drv->suspend, i);
-		if (i)
-			return i;
+		error = drv->suspend(pci_dev, state);
+		suspend_report_result(drv->suspend, error);
+		if (error)
+			return error;
 
 		if (pci_dev->state_saved)
 			goto Fixup;
@@ -385,20 +385,20 @@ static int pci_legacy_suspend(struct dev
  Fixup:
 	pci_fixup_device(pci_fixup_suspend, pci_dev);
 
-	return i;
+	return error;
 }
 
 static int pci_legacy_suspend_late(struct device *dev, pm_message_t state)
 {
 	struct pci_dev * pci_dev = to_pci_dev(dev);
 	struct pci_driver * drv = pci_dev->driver;
-	int i = 0;
+	int error = 0;
 
 	if (drv && drv->suspend_late) {
-		i = drv->suspend_late(pci_dev, state);
-		suspend_report_result(drv->suspend_late, i);
+		error = drv->suspend_late(pci_dev, state);
+		suspend_report_result(drv->suspend_late, error);
 	}
-	return i;
+	return error;
 }
 
 static int pci_legacy_resume_early(struct device *dev)

^ permalink raw reply	[flat|nested] 373+ messages in thread

* [PATCH 6/10] PCI PM: Use pci_set_power_state during early resume
  2009-03-11  9:30 ` Rafael J. Wysocki
                     ` (8 preceding siblings ...)
  2009-03-11  9:41   ` Rafael J. Wysocki
@ 2009-03-11  9:42   ` Rafael J. Wysocki
  2009-03-11  9:42   ` Rafael J. Wysocki
                     ` (8 subsequent siblings)
  18 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-11  9:42 UTC (permalink / raw)
  To: pm list
  Cc: LKML, Linus Torvalds, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, Len Brown,
	Jesse Barnes, Thomas Gleixner, Frans Pop,
	Arve Hjønnevåg

From: Rafael J. Wysocki <rjw@sisk.pl>

Now that timer interrupts can be enabled during the early phase of
resuming devices, we can use the generic pci_set_power_state() to put
PCI devices into D0 at that time.  The platform-specific PM code then
gets a chance to handle devices that don't implement native PCI PM or
that need additional, platform-specific operations to power them up.
This also simplifies the code quite a bit.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 drivers/pci/pci.c |   48 +++++++++---------------------------------------
 1 file changed, 9 insertions(+), 39 deletions(-)
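
For illustration only (not part of the patch): a minimal user-space
sketch of the early-resume flow described above.  It assumes a toy
device model; every model_* name is a hypothetical stand-in rather
than a kernel API, and the ordering of the platform hook relative to
the native transition is deliberately not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for pci_power_t values; not the kernel's enum. */
enum model_power_state {
	MODEL_D0, MODEL_D1, MODEL_D2, MODEL_D3HOT, MODEL_UNKNOWN
};

struct model_dev {
	enum model_power_state hw_state;      /* what the "hardware" reports */
	enum model_power_state current_state; /* what the core believes */
	bool state_saved;       /* config space was saved during suspend */
	bool config_restored;   /* set when the saved config is written back */
	int platform_calls;     /* times the platform PM code was consulted */
	int delays_taken;       /* mandatory transition delays waited for */
};

/* Loosely models pci_update_current_state(): re-read the device state. */
void model_update_current_state(struct model_dev *dev)
{
	dev->current_state = dev->hw_state;
}

/* Loosely models the generic setter: it always waits and always lets the
 * platform take part, because the optional 'wait' argument is gone. */
int model_set_power_state(struct model_dev *dev, enum model_power_state state)
{
	assert(state != MODEL_UNKNOWN);

	if (dev->current_state == state)
		return 0;

	dev->hw_state = state;
	dev->delays_taken++;
	dev->platform_calls++;
	dev->current_state = state;
	return 0;
}

/* Same shape as the simplified pci_restore_standard_config(). */
int model_restore_standard_config(struct model_dev *dev)
{
	model_update_current_state(dev);

	if (dev->current_state != MODEL_D0) {
		int error = model_set_power_state(dev, MODEL_D0);
		if (error)
			return error;
	}

	if (dev->state_saved)
		dev->config_restored = true;
	return 0;
}
```

A device found in D3hot takes one delay and one platform call before
its saved configuration is restored; a device already in D0 takes
neither.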

Index: linux-2.6/drivers/pci/pci.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -426,7 +426,6 @@ static inline int platform_pci_sleep_wak
  *                           given PCI device
  * @dev: PCI device to handle.
  * @state: PCI power state (D0, D1, D2, D3hot) to put the device into.
- * @wait: If 'true', wait for the device to change its power state
  *
  * RETURN VALUE:
  * -EINVAL if the requested state is invalid.
@@ -435,8 +434,7 @@ static inline int platform_pci_sleep_wak
  * 0 if device already is in the requested state.
  * 0 if device's power state has been successfully changed.
  */
-static int
-pci_raw_set_power_state(struct pci_dev *dev, pci_power_t state, bool wait)
+static int pci_raw_set_power_state(struct pci_dev *dev, pci_power_t state)
 {
 	u16 pmcsr;
 	bool need_restore = false;
@@ -481,10 +479,8 @@ pci_raw_set_power_state(struct pci_dev *
 		break;
 	case PCI_UNKNOWN: /* Boot-up */
 		if ((pmcsr & PCI_PM_CTRL_STATE_MASK) == PCI_D3hot
-		 && !(pmcsr & PCI_PM_CTRL_NO_SOFT_RESET)) {
+		 && !(pmcsr & PCI_PM_CTRL_NO_SOFT_RESET))
 			need_restore = true;
-			wait = true;
-		}
 		/* Fall-through: force to D0 */
 	default:
 		pmcsr = 0;
@@ -494,9 +490,6 @@ pci_raw_set_power_state(struct pci_dev *
 	/* enter specified state */
 	pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr);
 
-	if (!wait)
-		return 0;
-
 	/* Mandatory power management transition delays */
 	/* see PCI PM 1.1 5.6.1 table 18 */
 	if (state == PCI_D3hot || dev->current_state == PCI_D3hot)
@@ -521,7 +514,7 @@ pci_raw_set_power_state(struct pci_dev *
 	if (need_restore)
 		pci_restore_bars(dev);
 
-	if (wait && dev->bus->self)
+	if (dev->bus->self)
 		pcie_aspm_pm_state_change(dev->bus->self);
 
 	return 0;
@@ -591,7 +584,7 @@ int pci_set_power_state(struct pci_dev *
 	if (state == PCI_D3hot && (dev->dev_flags & PCI_DEV_FLAGS_NO_D3))
 		return 0;
 
-	error = pci_raw_set_power_state(dev, state, true);
+	error = pci_raw_set_power_state(dev, state);
 
 	if (state > PCI_D0 && platform_pci_power_manageable(dev)) {
 		/* Allow the platform to finalize the transition */
@@ -1390,37 +1383,14 @@ void pci_allocate_cap_save_buffers(struc
  */
 int pci_restore_standard_config(struct pci_dev *dev)
 {
-	pci_power_t prev_state;
-	int error;
-
-	pci_update_current_state(dev, PCI_D0);
-
-	prev_state = dev->current_state;
-	if (prev_state == PCI_D0)
-		goto Restore;
-
-	error = pci_raw_set_power_state(dev, PCI_D0, false);
-	if (error)
-		return error;
+	pci_update_current_state(dev, PCI_UNKNOWN);
 
-	/*
-	 * This assumes that we won't get a bus in B2 or B3 from the BIOS, but
-	 * we've made this assumption forever and it appears to be universally
-	 * satisfied.
-	 */
-	switch(prev_state) {
-	case PCI_D3cold:
-	case PCI_D3hot:
-		mdelay(pci_pm_d3_delay);
-		break;
-	case PCI_D2:
-		udelay(PCI_PM_D2_DELAY);
-		break;
+	if (dev->current_state != PCI_D0) {
+		int error = pci_set_power_state(dev, PCI_D0);
+		if (error)
+			return error;
 	}
 
-	pci_update_current_state(dev, PCI_D0);
-
- Restore:
 	return dev->state_saved ? pci_restore_state(dev) : 0;
 }
 

^ permalink raw reply	[flat|nested] 373+ messages in thread

* [PATCH 7/10] PCI PM: Move pci_restore_standard_config to pci-driver.c
  2009-03-11  9:30 ` Rafael J. Wysocki
                     ` (11 preceding siblings ...)
  2009-03-11  9:47   ` [PATCH 7/10] PCI PM: Move pci_restore_standard_config to pci-driver.c Rafael J. Wysocki
@ 2009-03-11  9:47   ` Rafael J. Wysocki
  2009-03-11  9:48   ` [PATCH 8/10] PCI PM: Put devices into low power states during late suspend (rev. 2) Rafael J. Wysocki
                     ` (5 subsequent siblings)
  18 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-11  9:47 UTC (permalink / raw)
  To: pm list
  Cc: LKML, Linus Torvalds, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, Len Brown,
	Jesse Barnes, Thomas Gleixner, Frans Pop,
	Arve Hjønnevåg

From: Rafael J. Wysocki <rjw@sisk.pl>

Move pci_restore_standard_config() from pci.c to pci-driver.c and
make it static.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 drivers/pci/pci-driver.c |   17 +++++++++++++++++
 drivers/pci/pci.c        |   21 ---------------------
 drivers/pci/pci.h        |    1 -
 3 files changed, 17 insertions(+), 22 deletions(-)

Index: linux-2.6/drivers/pci/pci-driver.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci-driver.c
+++ linux-2.6/drivers/pci/pci-driver.c
@@ -423,6 +423,23 @@ static int pci_legacy_resume(struct devi
 
 /* Auxiliary functions used by the new power management framework */
 
+/**
+ * pci_restore_standard_config - restore standard config registers of PCI device
+ * @pci_dev: PCI device to handle
+ */
+static int pci_restore_standard_config(struct pci_dev *pci_dev)
+{
+	pci_update_current_state(pci_dev, PCI_UNKNOWN);
+
+	if (pci_dev->current_state != PCI_D0) {
+		int error = pci_set_power_state(pci_dev, PCI_D0);
+		if (error)
+			return error;
+	}
+
+	return pci_dev->state_saved ? pci_restore_state(pci_dev) : 0;
+}
+
 static void pci_pm_default_resume_noirq(struct pci_dev *pci_dev)
 {
 	pci_restore_standard_config(pci_dev);
Index: linux-2.6/drivers/pci/pci.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -1374,27 +1374,6 @@ void pci_allocate_cap_save_buffers(struc
 }
 
 /**
- * pci_restore_standard_config - restore standard config registers of PCI device
- * @dev: PCI device to handle
- *
- * This function assumes that the device's configuration space is accessible.
- * If the device needs to be powered up, the function will wait for it to
- * change the state.
- */
-int pci_restore_standard_config(struct pci_dev *dev)
-{
-	pci_update_current_state(dev, PCI_UNKNOWN);
-
-	if (dev->current_state != PCI_D0) {
-		int error = pci_set_power_state(dev, PCI_D0);
-		if (error)
-			return error;
-	}
-
-	return dev->state_saved ? pci_restore_state(dev) : 0;
-}
-
-/**
  * pci_enable_ari - enable ARI forwarding if hardware support it
  * @dev: the PCI device
  */
Index: linux-2.6/drivers/pci/pci.h
===================================================================
--- linux-2.6.orig/drivers/pci/pci.h
+++ linux-2.6/drivers/pci/pci.h
@@ -49,7 +49,6 @@ extern void pci_disable_enabled_device(s
 extern void pci_pm_init(struct pci_dev *dev);
 extern void platform_pci_wakeup_init(struct pci_dev *dev);
 extern void pci_allocate_cap_save_buffers(struct pci_dev *dev);
-extern int pci_restore_standard_config(struct pci_dev *dev);
 
 static inline bool pci_is_bridge(struct pci_dev *pci_dev)
 {

^ permalink raw reply	[flat|nested] 373+ messages in thread

* [PATCH 8/10] PCI PM: Put devices into low power states during late suspend (rev. 2)
  2009-03-11  9:30 ` Rafael J. Wysocki
                     ` (13 preceding siblings ...)
  2009-03-11  9:48   ` [PATCH 8/10] PCI PM: Put devices into low power states during late suspend (rev. 2) Rafael J. Wysocki
@ 2009-03-11  9:48   ` Rafael J. Wysocki
  2009-03-11  9:55   ` [PATCH 9/10] PCI PM: Make pci_set_power_state() handle devices with no PM support Rafael J. Wysocki
                     ` (3 subsequent siblings)
  18 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-11  9:48 UTC (permalink / raw)
  To: pm list
  Cc: LKML, Linus Torvalds, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, Len Brown,
	Jesse Barnes, Thomas Gleixner, Frans Pop,
	Arve Hjønnevåg

From: Rafael J. Wysocki <rjw@sisk.pl>

Now that timer interrupts can be enabled during the late phase of
suspending devices, we can use the generic pci_set_power_state() to
put PCI devices into low power states at that time.  We can also use
some related platform callbacks, such as the ones preparing devices
for wake-up, during the late suspend.

Doing this will allow us to avoid the race condition where a device
using shared interrupts is put into a low power state with interrupts
enabled and then an interrupt (for another device) comes in and
confuses its driver.  At the same time, devices that don't support
the native PCI PM or that require some additional, platform-specific
operations to be carried out to put them into low power states will
be handled as appropriate.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 drivers/pci/pci-driver.c |  134 ++++++++++++++++++++++++++++-------------------
 1 file changed, 81 insertions(+), 53 deletions(-)
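
For illustration only (not part of the patch): a minimal user-space
sketch of how the work is split between the two suspend phases after
this change.  It assumes a toy device model; the model_* names and
the two example callbacks are hypothetical, and the WARN_ONCE check
and the "no PM ops" shortcut are left out for brevity.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct model_dev;
typedef int (*model_cb)(struct model_dev *dev);

struct model_dev {
	bool state_saved;       /* config space has been saved */
	bool is_bridge;
	bool in_low_power;      /* device was put into a low power state */
	model_cb suspend;       /* driver callback, interrupts enabled */
	model_cb suspend_noirq; /* driver callback, late phase */
};

/* Example driver callback that saves the device state on its own. */
int model_cb_saves_state(struct model_dev *dev)
{
	dev->state_saved = true;
	return 0;
}

/* Example driver callback that refuses to suspend. */
int model_cb_fails(struct model_dev *dev)
{
	(void)dev;
	return -EBUSY;
}

/* First phase: only run the driver callback.  The core no longer saves
 * state or powers the device down here. */
int model_suspend(struct model_dev *dev)
{
	dev->state_saved = false;

	if (dev->suspend) {
		int error = dev->suspend(dev);
		if (error)
			return error;
	}
	return 0;
}

/* Late phase: an error from the driver is returned immediately; otherwise
 * the core saves state and powers the device down unless the driver has
 * already taken care of it. */
int model_suspend_noirq(struct model_dev *dev)
{
	if (dev->suspend_noirq) {
		int error = dev->suspend_noirq(dev);
		if (error)
			return error;
	}

	if (!dev->state_saved) {
		dev->state_saved = true;          /* stands in for pci_save_state() */
		if (!dev->is_bridge)
			dev->in_low_power = true; /* stands in for pci_prepare_to_sleep() */
	}
	return 0;
}
```

In this model a device whose driver does nothing special is saved and
powered down only in the late phase, a driver that saved the state
itself is left alone, and bridges are saved but not powered down.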

Index: linux-2.6/drivers/pci/pci-driver.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci-driver.c
+++ linux-2.6/drivers/pci/pci-driver.c
@@ -352,53 +352,60 @@ static int pci_legacy_suspend(struct dev
 {
 	struct pci_dev * pci_dev = to_pci_dev(dev);
 	struct pci_driver * drv = pci_dev->driver;
-	int error = 0;
+
+	pci_dev->state_saved = false;
 
 	if (drv && drv->suspend) {
 		pci_power_t prev = pci_dev->current_state;
-
-		pci_dev->state_saved = false;
+		int error;
 
 		error = drv->suspend(pci_dev, state);
 		suspend_report_result(drv->suspend, error);
 		if (error)
 			return error;
 
-		if (pci_dev->state_saved)
-			goto Fixup;
-
-		if (pci_dev->current_state != PCI_D0
+		if (!pci_dev->state_saved && pci_dev->current_state != PCI_D0
 		    && pci_dev->current_state != PCI_UNKNOWN) {
 			WARN_ONCE(pci_dev->current_state != prev,
 				"PCI PM: Device state not saved by %pF\n",
 				drv->suspend);
-			goto Fixup;
 		}
 	}
 
-	pci_save_state(pci_dev);
-	/*
-	 * This is for compatibility with existing code with legacy PM support.
-	 */
-	pci_pm_set_unknown_state(pci_dev);
-
- Fixup:
 	pci_fixup_device(pci_fixup_suspend, pci_dev);
 
-	return error;
+	return 0;
 }
 
 static int pci_legacy_suspend_late(struct device *dev, pm_message_t state)
 {
 	struct pci_dev * pci_dev = to_pci_dev(dev);
 	struct pci_driver * drv = pci_dev->driver;
-	int error = 0;
 
 	if (drv && drv->suspend_late) {
+		pci_power_t prev = pci_dev->current_state;
+		int error;
+
 		error = drv->suspend_late(pci_dev, state);
 		suspend_report_result(drv->suspend_late, error);
+		if (error)
+			return error;
+
+		if (!pci_dev->state_saved && pci_dev->current_state != PCI_D0
+		    && pci_dev->current_state != PCI_UNKNOWN) {
+			WARN_ONCE(pci_dev->current_state != prev,
+				"PCI PM: Device state not saved by %pF\n",
+				drv->suspend_late);
+			return 0;
+		}
 	}
-	return error;
+
+	if (!pci_dev->state_saved)
+		pci_save_state(pci_dev);
+
+	pci_pm_set_unknown_state(pci_dev);
+
+	return 0;
 }
 
 static int pci_legacy_resume_early(struct device *dev)
@@ -460,7 +467,6 @@ static void pci_pm_default_suspend(struc
 	/* Disable non-bridge devices without PM support */
 	if (!pci_is_bridge(pci_dev))
 		pci_disable_enabled_device(pci_dev);
-	pci_save_state(pci_dev);
 }
 
 static bool pci_has_legacy_pm_support(struct pci_dev *pci_dev)
@@ -526,24 +532,14 @@ static int pci_pm_suspend(struct device 
 		if (error)
 			return error;
 
-		if (pci_dev->state_saved)
-			goto Fixup;
-
-		if (pci_dev->current_state != PCI_D0
+		if (!pci_dev->state_saved && pci_dev->current_state != PCI_D0
 		    && pci_dev->current_state != PCI_UNKNOWN) {
 			WARN_ONCE(pci_dev->current_state != prev,
 				"PCI PM: State of device not saved by %pF\n",
 				pm->suspend);
-			goto Fixup;
 		}
 	}
 
-	if (!pci_dev->state_saved) {
-		pci_save_state(pci_dev);
-		if (!pci_is_bridge(pci_dev))
-			pci_prepare_to_sleep(pci_dev);
-	}
-
  Fixup:
 	pci_fixup_device(pci_fixup_suspend, pci_dev);
 
@@ -553,21 +549,41 @@ static int pci_pm_suspend(struct device 
 static int pci_pm_suspend_noirq(struct device *dev)
 {
 	struct pci_dev *pci_dev = to_pci_dev(dev);
-	struct device_driver *drv = dev->driver;
-	int error = 0;
+	struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
 
 	if (pci_has_legacy_pm_support(pci_dev))
 		return pci_legacy_suspend_late(dev, PMSG_SUSPEND);
 
-	if (drv && drv->pm && drv->pm->suspend_noirq) {
-		error = drv->pm->suspend_noirq(dev);
-		suspend_report_result(drv->pm->suspend_noirq, error);
+	if (!pm)
+		return 0;
+
+	if (pm->suspend_noirq) {
+		pci_power_t prev = pci_dev->current_state;
+		int error;
+
+		error = pm->suspend_noirq(dev);
+		suspend_report_result(pm->suspend_noirq, error);
+		if (error)
+			return error;
+
+		if (!pci_dev->state_saved && pci_dev->current_state != PCI_D0
+		    && pci_dev->current_state != PCI_UNKNOWN) {
+			WARN_ONCE(pci_dev->current_state != prev,
+				"PCI PM: State of device not saved by %pF\n",
+				pm->suspend_noirq);
+			return 0;
+		}
 	}
 
-	if (!error)
-		pci_pm_set_unknown_state(pci_dev);
+	if (!pci_dev->state_saved) {
+		pci_save_state(pci_dev);
+		if (!pci_is_bridge(pci_dev))
+			pci_prepare_to_sleep(pci_dev);
+	}
 
-	return error;
+	pci_pm_set_unknown_state(pci_dev);
+
+	return 0;
 }
 
 static int pci_pm_resume_noirq(struct device *dev)
@@ -650,9 +666,6 @@ static int pci_pm_freeze(struct device *
 			return error;
 	}
 
-	if (!pci_dev->state_saved)
-		pci_save_state(pci_dev);
-
 	return 0;
 }
 
@@ -660,20 +673,25 @@ static int pci_pm_freeze_noirq(struct de
 {
 	struct pci_dev *pci_dev = to_pci_dev(dev);
 	struct device_driver *drv = dev->driver;
-	int error = 0;
 
 	if (pci_has_legacy_pm_support(pci_dev))
 		return pci_legacy_suspend_late(dev, PMSG_FREEZE);
 
 	if (drv && drv->pm && drv->pm->freeze_noirq) {
+		int error;
+
 		error = drv->pm->freeze_noirq(dev);
 		suspend_report_result(drv->pm->freeze_noirq, error);
+		if (error)
+			return error;
 	}
 
-	if (!error)
-		pci_pm_set_unknown_state(pci_dev);
+	if (!pci_dev->state_saved)
+		pci_save_state(pci_dev);
 
-	return error;
+	pci_pm_set_unknown_state(pci_dev);
+
+	return 0;
 }
 
 static int pci_pm_thaw_noirq(struct device *dev)
@@ -716,7 +734,6 @@ static int pci_pm_poweroff(struct device
 {
 	struct pci_dev *pci_dev = to_pci_dev(dev);
 	struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
-	int error = 0;
 
 	if (pci_has_legacy_pm_support(pci_dev))
 		return pci_legacy_suspend(dev, PMSG_HIBERNATE);
@@ -729,33 +746,44 @@ static int pci_pm_poweroff(struct device
 	pci_dev->state_saved = false;
 
 	if (pm->poweroff) {
+		int error;
+
 		error = pm->poweroff(dev);
 		suspend_report_result(pm->poweroff, error);
+		if (error)
+			return error;
 	}
 
-	if (!pci_dev->state_saved && !pci_is_bridge(pci_dev))
-		pci_prepare_to_sleep(pci_dev);
-
  Fixup:
 	pci_fixup_device(pci_fixup_suspend, pci_dev);
 
-	return error;
+	return 0;
 }
 
 static int pci_pm_poweroff_noirq(struct device *dev)
 {
+	struct pci_dev *pci_dev = to_pci_dev(dev);
 	struct device_driver *drv = dev->driver;
-	int error = 0;
 
 	if (pci_has_legacy_pm_support(to_pci_dev(dev)))
 		return pci_legacy_suspend_late(dev, PMSG_HIBERNATE);
 
-	if (drv && drv->pm && drv->pm->poweroff_noirq) {
+	if (!drv || !drv->pm)
+		return 0;
+
+	if (drv->pm->poweroff_noirq) {
+		int error;
+
 		error = drv->pm->poweroff_noirq(dev);
 		suspend_report_result(drv->pm->poweroff_noirq, error);
+		if (error)
+			return error;
 	}
 
-	return error;
+	if (!pci_dev->state_saved && !pci_is_bridge(pci_dev))
+		pci_prepare_to_sleep(pci_dev);
+
+	return 0;
 }
 
 static int pci_pm_restore_noirq(struct device *dev)

^ permalink raw reply	[flat|nested] 373+ messages in thread

* [PATCH 8/10] PCI PM: Put devices into low power states during late suspend (rev. 2)
  2009-03-11  9:30 ` Rafael J. Wysocki
                     ` (12 preceding siblings ...)
  2009-03-11  9:47   ` Rafael J. Wysocki
@ 2009-03-11  9:48   ` Rafael J. Wysocki
  2009-03-11  9:48   ` Rafael J. Wysocki
                     ` (4 subsequent siblings)
  18 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-11  9:48 UTC (permalink / raw)
  To: pm list
  Cc: Arve, Jeremy Fitzhardinge, Frans Pop, LKML, Jesse Barnes,
	Eric W. Biederman, Ingo Molnar, Linus Torvalds, Thomas Gleixner

From: Rafael J. Wysocki <rjw@sisk.pl>

Once we have allowed timer interrupts to be enabled during the late
phase of suspending devices, we are now able to use the generic
pci_set_power_state() to put PCI devices into low power states at
that time.  We can also use some related platform callbacks, like the
ones preparing devices for wake-up, during the late suspend.

Doing this will allow us to avoid the race condition where a device
using shared interrupts is put into a low power state with interrupts
enabled and then an interrupt (for another device) comes in and
confuses its driver.  At the same time, devices that don't support
the native PCI PM or that require some additional, platform-specific
operations to be carried out to put them into low power states will
be handled as appropriate.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 drivers/pci/pci-driver.c |  134 ++++++++++++++++++++++++++++-------------------
 1 file changed, 81 insertions(+), 53 deletions(-)

Index: linux-2.6/drivers/pci/pci-driver.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci-driver.c
+++ linux-2.6/drivers/pci/pci-driver.c
@@ -352,53 +352,60 @@ static int pci_legacy_suspend(struct dev
 {
 	struct pci_dev * pci_dev = to_pci_dev(dev);
 	struct pci_driver * drv = pci_dev->driver;
-	int error = 0;
+
+	pci_dev->state_saved = false;
 
 	if (drv && drv->suspend) {
 		pci_power_t prev = pci_dev->current_state;
-
-		pci_dev->state_saved = false;
+		int error;
 
 		error = drv->suspend(pci_dev, state);
 		suspend_report_result(drv->suspend, error);
 		if (error)
 			return error;
 
-		if (pci_dev->state_saved)
-			goto Fixup;
-
-		if (pci_dev->current_state != PCI_D0
+		if (!pci_dev->state_saved && pci_dev->current_state != PCI_D0
 		    && pci_dev->current_state != PCI_UNKNOWN) {
 			WARN_ONCE(pci_dev->current_state != prev,
 				"PCI PM: Device state not saved by %pF\n",
 				drv->suspend);
-			goto Fixup;
 		}
 	}
 
-	pci_save_state(pci_dev);
-	/*
-	 * This is for compatibility with existing code with legacy PM support.
-	 */
-	pci_pm_set_unknown_state(pci_dev);
-
- Fixup:
 	pci_fixup_device(pci_fixup_suspend, pci_dev);
 
-	return error;
+	return 0;
 }
 
 static int pci_legacy_suspend_late(struct device *dev, pm_message_t state)
 {
 	struct pci_dev * pci_dev = to_pci_dev(dev);
 	struct pci_driver * drv = pci_dev->driver;
-	int error = 0;
 
 	if (drv && drv->suspend_late) {
+		pci_power_t prev = pci_dev->current_state;
+		int error;
+
 		error = drv->suspend_late(pci_dev, state);
 		suspend_report_result(drv->suspend_late, error);
+		if (error)
+			return error;
+
+		if (!pci_dev->state_saved && pci_dev->current_state != PCI_D0
+		    && pci_dev->current_state != PCI_UNKNOWN) {
+			WARN_ONCE(pci_dev->current_state != prev,
+				"PCI PM: Device state not saved by %pF\n",
+				drv->suspend_late);
+			return 0;
+		}
 	}
-	return error;
+
+	if (!pci_dev->state_saved)
+		pci_save_state(pci_dev);
+
+	pci_pm_set_unknown_state(pci_dev);
+
+	return 0;
 }
 
 static int pci_legacy_resume_early(struct device *dev)
@@ -460,7 +467,6 @@ static void pci_pm_default_suspend(struc
 	/* Disable non-bridge devices without PM support */
 	if (!pci_is_bridge(pci_dev))
 		pci_disable_enabled_device(pci_dev);
-	pci_save_state(pci_dev);
 }
 
 static bool pci_has_legacy_pm_support(struct pci_dev *pci_dev)
@@ -526,24 +532,14 @@ static int pci_pm_suspend(struct device 
 		if (error)
 			return error;
 
-		if (pci_dev->state_saved)
-			goto Fixup;
-
-		if (pci_dev->current_state != PCI_D0
+		if (!pci_dev->state_saved && pci_dev->current_state != PCI_D0
 		    && pci_dev->current_state != PCI_UNKNOWN) {
 			WARN_ONCE(pci_dev->current_state != prev,
 				"PCI PM: State of device not saved by %pF\n",
 				pm->suspend);
-			goto Fixup;
 		}
 	}
 
-	if (!pci_dev->state_saved) {
-		pci_save_state(pci_dev);
-		if (!pci_is_bridge(pci_dev))
-			pci_prepare_to_sleep(pci_dev);
-	}
-
  Fixup:
 	pci_fixup_device(pci_fixup_suspend, pci_dev);
 
@@ -553,21 +549,41 @@ static int pci_pm_suspend(struct device 
 static int pci_pm_suspend_noirq(struct device *dev)
 {
 	struct pci_dev *pci_dev = to_pci_dev(dev);
-	struct device_driver *drv = dev->driver;
-	int error = 0;
+	struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
 
 	if (pci_has_legacy_pm_support(pci_dev))
 		return pci_legacy_suspend_late(dev, PMSG_SUSPEND);
 
-	if (drv && drv->pm && drv->pm->suspend_noirq) {
-		error = drv->pm->suspend_noirq(dev);
-		suspend_report_result(drv->pm->suspend_noirq, error);
+	if (!pm)
+		return 0;
+
+	if (pm->suspend_noirq) {
+		pci_power_t prev = pci_dev->current_state;
+		int error;
+
+		error = pm->suspend_noirq(dev);
+		suspend_report_result(pm->suspend_noirq, error);
+		if (error)
+			return error;
+
+		if (!pci_dev->state_saved && pci_dev->current_state != PCI_D0
+		    && pci_dev->current_state != PCI_UNKNOWN) {
+			WARN_ONCE(pci_dev->current_state != prev,
+				"PCI PM: State of device not saved by %pF\n",
+				pm->suspend_noirq);
+			return 0;
+		}
 	}
 
-	if (!error)
-		pci_pm_set_unknown_state(pci_dev);
+	if (!pci_dev->state_saved) {
+		pci_save_state(pci_dev);
+		if (!pci_is_bridge(pci_dev))
+			pci_prepare_to_sleep(pci_dev);
+	}
 
-	return error;
+	pci_pm_set_unknown_state(pci_dev);
+
+	return 0;
 }
 
 static int pci_pm_resume_noirq(struct device *dev)
@@ -650,9 +666,6 @@ static int pci_pm_freeze(struct device *
 			return error;
 	}
 
-	if (!pci_dev->state_saved)
-		pci_save_state(pci_dev);
-
 	return 0;
 }
 
@@ -660,20 +673,25 @@ static int pci_pm_freeze_noirq(struct de
 {
 	struct pci_dev *pci_dev = to_pci_dev(dev);
 	struct device_driver *drv = dev->driver;
-	int error = 0;
 
 	if (pci_has_legacy_pm_support(pci_dev))
 		return pci_legacy_suspend_late(dev, PMSG_FREEZE);
 
 	if (drv && drv->pm && drv->pm->freeze_noirq) {
+		int error;
+
 		error = drv->pm->freeze_noirq(dev);
 		suspend_report_result(drv->pm->freeze_noirq, error);
+		if (error)
+			return error;
 	}
 
-	if (!error)
-		pci_pm_set_unknown_state(pci_dev);
+	if (!pci_dev->state_saved)
+		pci_save_state(pci_dev);
 
-	return error;
+	pci_pm_set_unknown_state(pci_dev);
+
+	return 0;
 }
 
 static int pci_pm_thaw_noirq(struct device *dev)
@@ -716,7 +734,6 @@ static int pci_pm_poweroff(struct device
 {
 	struct pci_dev *pci_dev = to_pci_dev(dev);
 	struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
-	int error = 0;
 
 	if (pci_has_legacy_pm_support(pci_dev))
 		return pci_legacy_suspend(dev, PMSG_HIBERNATE);
@@ -729,33 +746,44 @@ static int pci_pm_poweroff(struct device
 	pci_dev->state_saved = false;
 
 	if (pm->poweroff) {
+		int error;
+
 		error = pm->poweroff(dev);
 		suspend_report_result(pm->poweroff, error);
+		if (error)
+			return error;
 	}
 
-	if (!pci_dev->state_saved && !pci_is_bridge(pci_dev))
-		pci_prepare_to_sleep(pci_dev);
-
  Fixup:
 	pci_fixup_device(pci_fixup_suspend, pci_dev);
 
-	return error;
+	return 0;
 }
 
 static int pci_pm_poweroff_noirq(struct device *dev)
 {
+	struct pci_dev *pci_dev = to_pci_dev(dev);
 	struct device_driver *drv = dev->driver;
-	int error = 0;
 
 	if (pci_has_legacy_pm_support(to_pci_dev(dev)))
 		return pci_legacy_suspend_late(dev, PMSG_HIBERNATE);
 
-	if (drv && drv->pm && drv->pm->poweroff_noirq) {
+	if (!drv || !drv->pm)
+		return 0;
+
+	if (drv->pm->poweroff_noirq) {
+		int error;
+
 		error = drv->pm->poweroff_noirq(dev);
 		suspend_report_result(drv->pm->poweroff_noirq, error);
+		if (error)
+			return error;
 	}
 
-	return error;
+	if (!pci_dev->state_saved && !pci_is_bridge(pci_dev))
+		pci_prepare_to_sleep(pci_dev);
+
+	return 0;
 }
 
 static int pci_pm_restore_noirq(struct device *dev)

^ permalink raw reply	[flat|nested] 373+ messages in thread

* [PATCH 9/10] PCI PM: Make pci_set_power_state() handle devices with no PM support
  2009-03-11  9:30 ` Rafael J. Wysocki
                     ` (15 preceding siblings ...)
  2009-03-11  9:55   ` [PATCH 9/10] PCI PM: Make pci_set_power_state() handle devices with no PM support Rafael J. Wysocki
@ 2009-03-11  9:55   ` Rafael J. Wysocki
  2009-03-11  9:56   ` [PATCH 10/10] PCI PM: Restore config spaces of all devices during early resume Rafael J. Wysocki
  2009-03-11  9:56   ` Rafael J. Wysocki
  18 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-11  9:55 UTC (permalink / raw)
  To: pm list
  Cc: LKML, Linus Torvalds, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, Len Brown,
	Jesse Barnes, Thomas Gleixner, Frans Pop,
	Arve Hjønnevåg

From: Rafael J. Wysocki <rjw@sisk.pl>

The problem with PCI devices without any PM support (either native or
through the platform) is that pci_set_power_state() always returns an error
code for them, even if they are being put into D0.  However, such devices are
always in D0, so pci_set_power_state() should return success when attempting
to put such a device into D0.  It should also update the current_state field
for these devices as appropriate.  This modification is necessary so that the
standard configuration registers of these devices are successfully restored by
pci_restore_standard_config() during the "early" phase of resume.

In addition, pci_set_power_state() should check the value of current_state
before calling the platform to change the power state of the device to avoid
doing that unnecessarily.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 drivers/pci/pci.c |   18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

Index: linux-2.6/drivers/pci/pci.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -439,6 +439,10 @@ static int pci_raw_set_power_state(struc
 	u16 pmcsr;
 	bool need_restore = false;
 
+	/* Check if we're already there */
+	if (dev->current_state == state)
+		return 0;
+
 	if (!dev->pm_cap)
 		return -EIO;
 
@@ -449,10 +453,7 @@ static int pci_raw_set_power_state(struc
 	 * Can enter D0 from any state, but if we can only go deeper 
 	 * to sleep if we're already in a low power state
 	 */
-	if (dev->current_state == state) {
-		/* we're already there */
-		return 0;
-	} else if (state != PCI_D0 && dev->current_state <= PCI_D3cold
+	if (state != PCI_D0 && dev->current_state <= PCI_D3cold
 	    && dev->current_state > state) {
 		dev_err(&dev->dev, "invalid power transition "
 			"(from state %d to %d)\n", dev->current_state, state);
@@ -570,12 +571,17 @@ int pci_set_power_state(struct pci_dev *
 		 */
 		return 0;
 
-	if (state == PCI_D0 && platform_pci_power_manageable(dev)) {
+	/* Check if we're already there */
+	if (dev->current_state == state)
+		return 0;
+
+	if (state == PCI_D0) {
 		/*
 		 * Allow the platform to change the state, for example via ACPI
 		 * _PR0, _PS0 and some such, but do not trust it.
 		 */
-		int ret = platform_pci_set_power_state(dev, PCI_D0);
+		int ret = platform_pci_power_manageable(dev) ?
+			platform_pci_set_power_state(dev, PCI_D0) : 0;
 		if (!ret)
 			pci_update_current_state(dev, PCI_D0);
 	}
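
Editor's sketch (not part of the patch): the control flow described in the
changelog above can be modelled in user space roughly as follows.  All
model_* names, the MODEL_EIO constant and the struct layout are assumptions
made for illustration; they are simplified stand-ins, not the kernel's real
definitions, and the handling of non-D0 platform transitions is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel's PCI power states (assumption). */
enum model_pci_power {
	MODEL_D0 = 0, MODEL_D1, MODEL_D2, MODEL_D3hot, MODEL_D3cold,
	MODEL_UNKNOWN
};

#define MODEL_EIO 5

struct model_pci_dev {
	enum model_pci_power current_state;
	bool has_pm_cap;           /* models dev->pm_cap != 0 */
	bool platform_manageable;  /* models platform_pci_power_manageable() */
	int platform_calls;        /* counts calls made to the platform */
};

static int model_platform_set_power_state(struct model_pci_dev *dev,
					  enum model_pci_power state)
{
	(void)state;
	dev->platform_calls++;
	return 0;
}

static int model_raw_set_power_state(struct model_pci_dev *dev,
				     enum model_pci_power state)
{
	/* Check if we're already there, before looking at the PM capability */
	if (dev->current_state == state)
		return 0;

	if (!dev->has_pm_cap)
		return -MODEL_EIO;

	dev->current_state = state;
	return 0;
}

int model_set_power_state(struct model_pci_dev *dev,
			  enum model_pci_power state)
{
	assert(dev != NULL);

	/* Check if we're already there, so the platform is not bothered */
	if (dev->current_state == state)
		return 0;

	if (state == MODEL_D0) {
		/* Only ask the platform if it can manage this device at all */
		int ret = dev->platform_manageable ?
			model_platform_set_power_state(dev, MODEL_D0) : 0;
		if (!ret)
			dev->current_state = MODEL_D0;
	}

	return model_raw_set_power_state(dev, state);
}
```

With this model, a device that has neither a PM capability nor platform
support reports success for D0 and ends up with current_state set to D0,
while a request for any other state still fails.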

^ permalink raw reply	[flat|nested] 373+ messages in thread

* [PATCH 10/10] PCI PM: Restore config spaces of all devices during early resume
  2009-03-11  9:30 ` Rafael J. Wysocki
                     ` (17 preceding siblings ...)
  2009-03-11  9:56   ` [PATCH 10/10] PCI PM: Restore config spaces of all devices during early resume Rafael J. Wysocki
@ 2009-03-11  9:56   ` Rafael J. Wysocki
  18 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-11  9:56 UTC (permalink / raw)
  To: pm list
  Cc: LKML, Linus Torvalds, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, Len Brown,
	Jesse Barnes, Thomas Gleixner, Frans Pop,
	Arve Hjønnevåg

From: Rafael J. Wysocki <rjw@sisk.pl>

At present the configuration spaces of PCI devices that have no drivers
or no PM support in the drivers (either legacy or through a pm object) are not
saved during suspend and, consequently, they are not restored during resume.
This generally may lead to the state of the system being slightly inconsistent
after the resume, so it's better to save and restore the configuration spaces
of these devices as well.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 drivers/pci/pci-driver.c |   16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

Index: linux-2.6/drivers/pci/pci-driver.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci-driver.c
+++ linux-2.6/drivers/pci/pci-driver.c
@@ -516,13 +516,13 @@ static int pci_pm_suspend(struct device 
 	if (pci_has_legacy_pm_support(pci_dev))
 		return pci_legacy_suspend(dev, PMSG_SUSPEND);
 
+	pci_dev->state_saved = false;
+
 	if (!pm) {
 		pci_pm_default_suspend(pci_dev);
 		goto Fixup;
 	}
 
-	pci_dev->state_saved = false;
-
 	if (pm->suspend) {
 		pci_power_t prev = pci_dev->current_state;
 		int error;
@@ -554,8 +554,10 @@ static int pci_pm_suspend_noirq(struct d
 	if (pci_has_legacy_pm_support(pci_dev))
 		return pci_legacy_suspend_late(dev, PMSG_SUSPEND);
 
-	if (!pm)
+	if (!pm) {
+		pci_save_state(pci_dev);
 		return 0;
+	}
 
 	if (pm->suspend_noirq) {
 		pci_power_t prev = pci_dev->current_state;
@@ -650,13 +652,13 @@ static int pci_pm_freeze(struct device *
 	if (pci_has_legacy_pm_support(pci_dev))
 		return pci_legacy_suspend(dev, PMSG_FREEZE);
 
+	pci_dev->state_saved = false;
+
 	if (!pm) {
 		pci_pm_default_suspend(pci_dev);
 		return 0;
 	}
 
-	pci_dev->state_saved = false;
-
 	if (pm->freeze) {
 		int error;
 
@@ -738,13 +740,13 @@ static int pci_pm_poweroff(struct device
 	if (pci_has_legacy_pm_support(pci_dev))
 		return pci_legacy_suspend(dev, PMSG_HIBERNATE);
 
+	pci_dev->state_saved = false;
+
 	if (!pm) {
 		pci_pm_default_suspend(pci_dev);
 		goto Fixup;
 	}
 
-	pci_dev->state_saved = false;
-
 	if (pm->poweroff) {
 		int error;
 
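
Editor's sketch (not part of the patch): a user-space model of the two
changes made to the suspend path above, namely clearing state_saved before
the "no pm object" check and saving the configuration space of such devices
in the noirq phase.  The model_* names and the single-word "config space" are
assumptions for illustration only; pci_pm_default_suspend() and the driver
callbacks are reduced to comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_pci_dev {
	bool has_pm_ops;            /* models dev->driver->pm != NULL */
	bool state_saved;
	unsigned int config;        /* stand-in for the config space */
	unsigned int saved_config;  /* stand-in for the saved copy */
};

/* Models pci_save_state(): copy the config space and set the flag. */
static void model_save_state(struct model_pci_dev *dev)
{
	dev->saved_config = dev->config;
	dev->state_saved = true;
}

int model_pm_suspend(struct model_pci_dev *dev)
{
	assert(dev != NULL);

	/* Cleared first, so the flag is reset for driverless devices too */
	dev->state_saved = false;

	if (!dev->has_pm_ops)
		return 0;	/* pci_pm_default_suspend() would run here */

	/* The driver's ->suspend() would run here and may save the state */
	return 0;
}

int model_pm_suspend_noirq(struct model_pci_dev *dev)
{
	assert(dev != NULL);

	if (!dev->has_pm_ops) {
		/* No pm object: save the config space on the device's behalf */
		model_save_state(dev);
		return 0;
	}

	/* The driver's ->suspend_noirq() would run here */
	if (!dev->state_saved)
		model_save_state(dev);

	return 0;
}
```

In this model a device without a pm object always leaves the noirq phase
with a fresh copy of its configuration, even if state_saved was left set by
an earlier suspend cycle.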

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5)
  2009-03-11  9:36   ` Rafael J. Wysocki
  2009-03-11 10:33     ` Thomas Gleixner
@ 2009-03-11 10:33     ` Thomas Gleixner
  2009-03-11 20:59       ` Rafael J. Wysocki
                         ` (3 more replies)
  1 sibling, 4 replies; 373+ messages in thread
From: Thomas Gleixner @ 2009-03-11 10:33 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: pm list, LKML, Linus Torvalds, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, Len Brown,
	Jesse Barnes, Frans Pop, Arve Hjønnevåg

Rafael,

On Wed, 11 Mar 2009, Rafael J. Wysocki wrote:

> Index: linux-2.6/kernel/irq/pm.c
> ===================================================================
> --- /dev/null
> +++ linux-2.6/kernel/irq/pm.c
> @@ -0,0 +1,89 @@
> +/*
> + * linux/kernel/irq/pm.c
> + *
> + * Copyright (C) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
> + *
> + * This file contains power management functions related to interrupts.
> + */
> +
> +#include <linux/irq.h>
> +#include <linux/module.h>
> +#include <linux/interrupt.h>
> +
> +#include "internals.h"
> +
> +/**
> + * suspend_device_irqs - disable all currently enabled interrupt lines
> + *
> + * During system-wide suspend or hibernation device interrupts need to be
> + * disabled at the chip level and this function is provided for this purpose.
> + * It disables all interrupt lines that are enabled at the moment and sets the
> + * IRQ_SUSPENDED flag for them.
> + */
> +void suspend_device_irqs(void)
> +{
> +	struct irq_desc *desc;
> +	int irq;
> +
> +	for_each_irq_desc(irq, desc) {
> +		unsigned long flags;
> +		bool sync = false;
> +
> +		spin_lock_irqsave(&desc->lock, flags);
> +
> +		if (desc->action && !(desc->action->flags & IRQF_TIMER)) {
> +			if (!desc->depth++) {
> +				desc->status |= IRQ_DISABLED;
> +				desc->chip->disable(irq);
> +				sync = true;
> +			}
> +			desc->status |= IRQ_SUSPENDED;

  This flag needs to be checked in __enable_irq().

> +		}
> +
> +		spin_unlock_irqrestore(&desc->lock, flags);
> +
> +		if (sync)
> +			synchronize_irq(irq);
> +	}
> +}
> +EXPORT_SYMBOL_GPL(suspend_device_irqs);

  I'm not too enthusiastic about this open coded implementation of
  disable_irq() with slightly different semantics.

  Can we please move the fiddling with desc->* into
  kernel/irq/manage.c and share the code there ?

> +/**
> + * resume_device_irqs - enable interrupt lines disabled by suspend_device_irqs()
> + *
> + * Enable all interrupt lines previously disabled by suspend_device_irqs() that
> + * have the IRQ_SUSPENDED flag set.
> + */
> +void resume_device_irqs(void)
> +{
> +	struct irq_desc *desc;
> +	int irq;
> +
> +	for_each_irq_desc(irq, desc) {
> +		unsigned long flags;
> +
> +		if (!(desc->status & IRQ_SUSPENDED))
> +			continue;
> +
> +		spin_lock_irqsave(&desc->lock, flags);
> +		desc->status &= ~IRQ_SUSPENDED;
> +		__enable_irq(desc, irq);
> +		spin_unlock_irqrestore(&desc->lock, flags);
> +	}
> +}
> +EXPORT_SYMBOL_GPL(resume_device_irqs);

  Ditto.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5)
  2009-03-11 10:33     ` Thomas Gleixner
@ 2009-03-11 20:59       ` Rafael J. Wysocki
  2009-03-11 21:42         ` Thomas Gleixner
  2009-03-11 21:42         ` Thomas Gleixner
  2009-03-11 20:59       ` Rafael J. Wysocki
                         ` (2 subsequent siblings)
  3 siblings, 2 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-11 20:59 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar
  Cc: pm list, LKML, Linus Torvalds, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, Len Brown,
	Jesse Barnes, Frans Pop, Arve Hjønnevåg

On Wednesday 11 March 2009, Thomas Gleixner wrote:
> Rafael,
> 
> On Wed, 11 Mar 2009, Rafael J. Wysocki wrote:
> 
> > Index: linux-2.6/kernel/irq/pm.c
> > ===================================================================
> > --- /dev/null
> > +++ linux-2.6/kernel/irq/pm.c
> > @@ -0,0 +1,89 @@
> > +/*
> > + * linux/kernel/irq/pm.c
> > + *
> > + * Copyright (C) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
> > + *
> > + * This file contains power management functions related to interrupts.
> > + */
> > +
> > +#include <linux/irq.h>
> > +#include <linux/module.h>
> > +#include <linux/interrupt.h>
> > +
> > +#include "internals.h"
> > +
> > +/**
> > + * suspend_device_irqs - disable all currently enabled interrupt lines
> > + *
> > + * During system-wide suspend or hibernation device interrupts need to be
> > + * disabled at the chip level and this function is provided for this purpose.
> > + * It disables all interrupt lines that are enabled at the moment and sets the
> > + * IRQ_SUSPENDED flag for them.
> > + */
> > +void suspend_device_irqs(void)
> > +{
> > +	struct irq_desc *desc;
> > +	int irq;
> > +
> > +	for_each_irq_desc(irq, desc) {
> > +		unsigned long flags;
> > +		bool sync = false;
> > +
> > +		spin_lock_irqsave(&desc->lock, flags);
> > +
> > +		if (desc->action && !(desc->action->flags & IRQF_TIMER)) {
> > +			if (!desc->depth++) {
> > +				desc->status |= IRQ_DISABLED;
> > +				desc->chip->disable(irq);
> > +				sync = true;
> > +			}
> > +			desc->status |= IRQ_SUSPENDED;
> 
>   This flag needs to be checked in __enable_irq().
> 
> > +		}
> > +
> > +		spin_unlock_irqrestore(&desc->lock, flags);
> > +
> > +		if (sync)
> > +			synchronize_irq(irq);
> > +	}
> > +}
> > +EXPORT_SYMBOL_GPL(suspend_device_irqs);
> 
>   I'm not too enthusiastic about this open coded implementation of
>   disable_irq() with slightly different semantics.

The difference in semantics is important IMO, otherwise I wouldn't have
done that.  In particular, the condition should be checked under the spinlock
and I'd rather not synchronize all interrupts we don't really disable here.

>   Can we please move the fiddling with desc->* into
>   kernel/irq/manage.c and share the code there ?

Can you please discuss that with Ingo?  I moved that from manage.c at his
request.

Thanks,
Rafael
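
Editor's sketch (not from the thread): the two points made above, checking
the condition with the descriptor lock held and synchronizing only the lines
that were actually disabled by this pass, can be modelled in user space as
follows.  The model_* names are hypothetical, the "lock" is just a flag with
assertions, and the synchronize step merely counts calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_irq_line {
	bool has_action;   /* models desc->action != NULL */
	bool is_timer;     /* models IRQF_TIMER in the action flags */
	unsigned int depth;
	bool suspended;    /* models IRQ_SUSPENDED */
	bool locked;       /* models desc->lock being held */
};

static void model_lock(struct model_irq_line *l)
{
	assert(!l->locked);
	l->locked = true;
}

static void model_unlock(struct model_irq_line *l)
{
	assert(l->locked);
	l->locked = false;
}

/* Returns the number of lines that had to be synchronized. */
size_t model_suspend_lines(struct model_irq_line *lines, size_t n)
{
	size_t synced = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		struct model_irq_line *l = &lines[i];
		bool sync = false;

		model_lock(l);
		/* The decision is made while the lock is held */
		if (l->has_action && !l->is_timer) {
			if (!l->depth++)
				sync = true;	/* disabled by this pass */
			l->suspended = true;
		}
		model_unlock(l);

		/* Wait only for handlers of lines we really disabled */
		if (sync)
			synced++;
	}

	return synced;
}
```

A line that was already disabled (depth greater than zero) still gets marked
as suspended and has its depth bumped, but it is not synchronized again.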

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5)
  2009-03-11 10:33     ` Thomas Gleixner
                         ` (2 preceding siblings ...)
  2009-03-11 21:15       ` Rafael J. Wysocki
@ 2009-03-11 21:15       ` Rafael J. Wysocki
  2009-03-11 21:35         ` Thomas Gleixner
  2009-03-11 21:35         ` Thomas Gleixner
  3 siblings, 2 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-11 21:15 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: pm list, LKML, Linus Torvalds, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, Len Brown,
	Jesse Barnes, Frans Pop, Arve Hjønnevåg

On Wednesday 11 March 2009, Thomas Gleixner wrote:
> Rafael,
> 
> On Wed, 11 Mar 2009, Rafael J. Wysocki wrote:
> 
> > Index: linux-2.6/kernel/irq/pm.c
> > ===================================================================
> > --- /dev/null
> > +++ linux-2.6/kernel/irq/pm.c
> > @@ -0,0 +1,89 @@
> > +/*
> > + * linux/kernel/irq/pm.c
> > + *
> > + * Copyright (C) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
> > + *
> > + * This file contains power management functions related to interrupts.
> > + */
> > +
> > +#include <linux/irq.h>
> > +#include <linux/module.h>
> > +#include <linux/interrupt.h>
> > +
> > +#include "internals.h"
> > +
> > +/**
> > + * suspend_device_irqs - disable all currently enabled interrupt lines
> > + *
> > + * During system-wide suspend or hibernation device interrupts need to be
> > + * disabled at the chip level and this function is provided for this purpose.
> > + * It disables all interrupt lines that are enabled at the moment and sets the
> > + * IRQ_SUSPENDED flag for them.
> > + */
> > +void suspend_device_irqs(void)
> > +{
> > +	struct irq_desc *desc;
> > +	int irq;
> > +
> > +	for_each_irq_desc(irq, desc) {
> > +		unsigned long flags;
> > +		bool sync = false;
> > +
> > +		spin_lock_irqsave(&desc->lock, flags);
> > +
> > +		if (desc->action && !(desc->action->flags & IRQF_TIMER)) {
> > +			if (!desc->depth++) {
> > +				desc->status |= IRQ_DISABLED;
> > +				desc->chip->disable(irq);
> > +				sync = true;
> > +			}
> > +			desc->status |= IRQ_SUSPENDED;
> 
>   This flag needs to be checked in __enable_irq().

[I overlooked this comment, sorry.]

Why does it?

Rafael

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5)
  2009-03-11 21:15       ` Rafael J. Wysocki
  2009-03-11 21:35         ` Thomas Gleixner
@ 2009-03-11 21:35         ` Thomas Gleixner
  2009-03-11 21:50           ` Rafael J. Wysocki
  2009-03-11 21:50           ` Rafael J. Wysocki
  1 sibling, 2 replies; 373+ messages in thread
From: Thomas Gleixner @ 2009-03-11 21:35 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: pm list, LKML, Linus Torvalds, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, Len Brown,
	Jesse Barnes, Frans Pop, Arve Hjønnevåg

On Wed, 11 Mar 2009, Rafael J. Wysocki wrote:
> On Wednesday 11 March 2009, Thomas Gleixner wrote:
> > > +			desc->status |= IRQ_SUSPENDED;
> > 
> >   This flag needs to be checked in __enable_irq().
> 
> [I overlooked this comment, sorry.]
> 
> Why does it?

To catch abuse and callers of enable_irq() when this flag is set.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5)
  2009-03-11 20:59       ` Rafael J. Wysocki
  2009-03-11 21:42         ` Thomas Gleixner
@ 2009-03-11 21:42         ` Thomas Gleixner
  2009-03-11 22:01             ` Rafael J. Wysocki
                             ` (2 more replies)
  1 sibling, 3 replies; 373+ messages in thread
From: Thomas Gleixner @ 2009-03-11 21:42 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Ingo Molnar, pm list, LKML, Linus Torvalds, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, Len Brown,
	Jesse Barnes, Frans Pop, Arve Hjønnevåg

On Wed, 11 Mar 2009, Rafael J. Wysocki wrote:
> On Wednesday 11 March 2009, Thomas Gleixner wrote:
> > > +EXPORT_SYMBOL_GPL(suspend_device_irqs);
> > 
> >   I'm not too enthusiastic about this open coded implementation of
> >   disable_irq() with slightly different semantics.
> 
> The difference in semantics is important IMO, otherwise I wouldn't have
> done that.  In particular, the condition should be under the spinlock
> and I'd rather not synchronize all interrupts we don't really disable here.

I don't say that the difference is not relevant. But the code is
almost the same and disable_irq() could have the sync_irq optimization
as well.
 
> >   Can we please move the fiddling with desc->* into
> >   kernel/irq/manage.c and share the code there ?
> 
> Can you please discuss that with Ingo?  I moved that from manage.c at his
> request.

Hmrpf. Will do. I just want to avoid that we have scattered functions
which deal with the guts of the irq code all over the place. I'm fine
with your loop in irq/pm.c, but the actual handling of the irq
internals should remain in manage.c.

I'll have a closer look at how to solve this.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5)
  2009-03-11 21:35         ` Thomas Gleixner
  2009-03-11 21:50           ` Rafael J. Wysocki
@ 2009-03-11 21:50           ` Rafael J. Wysocki
  2009-03-11 21:53             ` Thomas Gleixner
  2009-03-11 21:53             ` Thomas Gleixner
  1 sibling, 2 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-11 21:50 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: pm list, LKML, Linus Torvalds, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, Len Brown,
	Jesse Barnes, Frans Pop, Arve Hjønnevåg

On Wednesday 11 March 2009, Thomas Gleixner wrote:
> On Wed, 11 Mar 2009, Rafael J. Wysocki wrote:
> > On Wednesday 11 March 2009, Thomas Gleixner wrote:
> > > > +			desc->status |= IRQ_SUSPENDED;
> > > 
> > >   This flag needs to be checked in __enable_irq().
> > 
> > [I overlooked this comment, sorry.]
> > 
> > Why does it?
> 
> To catch abuse and callers of enable_irq() when this flag is set.

Hmm.  This means you'd like to make enable_irq() fail if called with
IRQ_SUSPENDED set, correct?

What if someone calls disable_irq() and then enable_irq() between
suspend_device_irqs() and resume_device_irqs()?  That would be pointless, but
surely not a bug?  Should disable_irq() also fail if IRQ_SUSPENDED is set?

Or should __enable_irq() only fail with IRQ_SUSPENDED set for desc->depth == 1?

Rafael

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5)
  2009-03-11 21:50           ` Rafael J. Wysocki
  2009-03-11 21:53             ` Thomas Gleixner
@ 2009-03-11 21:53             ` Thomas Gleixner
  2009-03-11 22:01                 ` Linus Torvalds
                                 ` (2 more replies)
  1 sibling, 3 replies; 373+ messages in thread
From: Thomas Gleixner @ 2009-03-11 21:53 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: pm list, LKML, Linus Torvalds, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, Len Brown,
	Jesse Barnes, Frans Pop, Arve Hjønnevåg

On Wed, 11 Mar 2009, Rafael J. Wysocki wrote:

> On Wednesday 11 March 2009, Thomas Gleixner wrote:
> > On Wed, 11 Mar 2009, Rafael J. Wysocki wrote:
> > > On Wednesday 11 March 2009, Thomas Gleixner wrote:
> > > > > +			desc->status |= IRQ_SUSPENDED;
> > > > 
> > > >   This flag needs to be checked in __enable_irq().
> > > 
> > > [I overlooked this comment, sorry.]
> > > 
> > > Why does it?
> > 
> > To catch abuse and callers of enable_irq() when this flag is set.
> 
> Hmm.  This means you'd like to make enable_irq() fail if called with
> IRQ_SUSPENDED set, correct?
> 
> What if someone calls disable_irq() and then enable_irq() between
> suspend_device_irqs() and resume_device_irqs()?  That would be pointless, but
> surely not a bug?  Should disable_irq() also fail if IRQ_SUSPENDED is set?

I'm not worried about nested ones.

> Or should __enable_irq() only fail with IRQ_SUSPENDED set for desc->depth == 1?

At least it needs a WARN_ON() in that case. A very prominent one.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5)
  2009-03-11 21:53             ` Thomas Gleixner
@ 2009-03-11 22:01                 ` Linus Torvalds
  2009-03-11 22:07               ` Rafael J. Wysocki
  2009-03-11 22:07               ` Rafael J. Wysocki
  2 siblings, 0 replies; 373+ messages in thread
From: Linus Torvalds @ 2009-03-11 22:01 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Rafael J. Wysocki, pm list, LKML, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, Len Brown,
	Jesse Barnes, Frans Pop, Arve Hjønnevåg



On Wed, 11 Mar 2009, Thomas Gleixner wrote:
> 
> I'm not worried about nested ones.

Then you shouldn't be worried about IRQ_SUSPENDED at all, since that one 
increments the disabled depth count.

So _all_ disable/enable_irq calls will by definition be nested inside 
IRQ_SUSPENDED. 

		Linus

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5)
  2009-03-11 21:42         ` Thomas Gleixner
@ 2009-03-11 22:01             ` Rafael J. Wysocki
  2009-03-11 22:45           ` Thomas Gleixner
  2009-03-11 22:45           ` Thomas Gleixner
  2 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-11 22:01 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, pm list, LKML, Linus Torvalds, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, Len Brown,
	Jesse Barnes, Frans Pop, Arve Hjønnevåg

On Wednesday 11 March 2009, Thomas Gleixner wrote:
> On Wed, 11 Mar 2009, Rafael J. Wysocki wrote:
> > On Wednesday 11 March 2009, Thomas Gleixner wrote:
> > > > +EXPORT_SYMBOL_GPL(suspend_device_irqs);
> > > 
> > >   I'm not too enthusiastic about this open coded implementation of
> > >   disable_irq() with slightly different semantics.
> > 
> > > The difference in semantics is important IMO, otherwise I wouldn't have
> > > done that.  In particular, the condition should be under the spinlock
> > > and I'd rather not synchronize all interrupts we don't really disable here.
> 
> I don't say that the difference is not relevant. But the code is
> almost the same and disable_irq() could have the sync_irq optimization
> as well.

Agreed.

> > >   Can we please move the fiddling with desc->* into
> > >   kernel/irq/manage.c and share the code there ?
> > 
> > Can you please discuss that with Ingo?  I moved that from manage.c at his
> > request.
> 
> Hmrpf. Will do. I just want to avoid that we have scattered functions
> which deal with the guts of the irq code all over the place.

I understand your concern, I'd prefer to avoid that too.

> I'm fine with your loop in irq/pm.c, but the actual handling of the irq
> internals should remain in manage.c.

Well, perhaps we can add a parameter to disable_irq_nosync() telling it not
to disable the interrupt if it's a timer one?  Something like

void disable_irq_nosync(unsigned int irq, bool skip_timer) etc.?

Also, it could return a value meaning whether or not the interrupt has been
actually disabled.
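
A rough user-space model of that suggestion might look as follows.  It is
only a sketch: struct irq_model, the MODEL_* constants and
model_disable_irq_nosync() are hypothetical stand-ins invented for the
illustration, not kernel code, and the real disable_irq_nosync() takes
only the irq number.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values are local to this model; they are not the kernel's. */
#define MODEL_IRQF_TIMER	0x1u
#define MODEL_IRQ_DISABLED	0x1u

struct irq_model {
	bool has_action;		/* a handler is installed */
	unsigned int action_flags;	/* flags of that handler */
	unsigned int status;		/* MODEL_IRQ_* flags */
	unsigned int depth;		/* number of outstanding disables */
};

/*
 * Returns true only if this call moved the line from enabled to
 * disabled.  Timer interrupts are left alone when skip_timer is set.
 */
static bool model_disable_irq_nosync(struct irq_model *d, bool skip_timer)
{
	if (skip_timer && d->has_action &&
	    (d->action_flags & MODEL_IRQF_TIMER))
		return false;

	if (d->depth++)
		return false;	/* already disabled, only nest deeper */

	d->status |= MODEL_IRQ_DISABLED;
	return true;
}

/* Exercises the three cases: timer, first disable, nested disable. */
static void model_demo(void)
{
	struct irq_model timer = {
		.has_action = true,
		.action_flags = MODEL_IRQF_TIMER,
	};
	struct irq_model dev = { .has_action = true };
	bool disabled;

	disabled = model_disable_irq_nosync(&timer, true);
	assert(!disabled && timer.depth == 0);

	disabled = model_disable_irq_nosync(&dev, true);
	assert(disabled && dev.depth == 1);

	disabled = model_disable_irq_nosync(&dev, true);
	assert(!disabled && dev.depth == 2);
}
```

With a return value like this, the caller could restrict any later
synchronization to the lines it has really disabled.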

> I'll have a closer look how to solve this.

Thanks!

Best,
Rafael

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5)
  2009-03-11 21:53             ` Thomas Gleixner
  2009-03-11 22:01                 ` Linus Torvalds
@ 2009-03-11 22:07               ` Rafael J. Wysocki
  2009-03-11 22:07               ` Rafael J. Wysocki
  2 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-11 22:07 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: pm list, LKML, Linus Torvalds, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, Len Brown,
	Jesse Barnes, Frans Pop, Arve Hjønnevåg

On Wednesday 11 March 2009, Thomas Gleixner wrote:
> On Wed, 11 Mar 2009, Rafael J. Wysocki wrote:
> 
> > On Wednesday 11 March 2009, Thomas Gleixner wrote:
> > > On Wed, 11 Mar 2009, Rafael J. Wysocki wrote:
> > > > On Wednesday 11 March 2009, Thomas Gleixner wrote:
> > > > > > +			desc->status |= IRQ_SUSPENDED;
> > > > > 
> > > > >   This flag needs to be checked in __enable_irq().
> > > > 
> > > > [I overlooked this comment, sorry.]
> > > > 
> > > > Why does it?
> > > 
> > > To catch abuse and callers of enable_irq() when this flag is set.
> > 
> > Hmm.  This means you'd like to make enable_irq() fail if called with
> > IRQ_SUSPENDED set, correct?
> > 
> > > What if someone calls disable_irq() and then enable_irq() between
> > > suspend_device_irqs() and resume_device_irqs()?  That would be pointless, but
> > > surely not a bug?  Should disable_irq() also fail if IRQ_SUSPENDED is set?
> 
> I'm not worried about nested ones.
> 
> > Or should __enable_irq() only fail with IRQ_SUSPENDED set for desc->depth == 1?
> 
> At least it needs a WARN_ON() in that case. A very prominent one.

I'm going to make it fail and print a warning for desc->depth == 1 if
IRQ_SUSPENDED is set.
Hope that's fine with everyone.
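
To illustrate the intended behaviour, here is a small user-space model of
a depth-counted enable that refuses to undo the disable done by suspend.
It is a sketch under assumptions, not the kernel implementation: struct
irq_model, the MODEL_* flags and both model_*() helpers are hypothetical
names, and the warning is modelled with a plain fprintf().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Flag values are local to this model; they are not the kernel's. */
#define MODEL_IRQ_DISABLED	0x1u
#define MODEL_IRQ_SUSPENDED	0x2u

struct irq_model {
	unsigned int status;	/* MODEL_IRQ_* flags */
	unsigned int depth;	/* number of outstanding disables */
	bool line_enabled;	/* stands in for the chip-level state */
};

/* Nested disable; 'suspend' marks the call as coming from suspend. */
static void model_disable_irq(struct irq_model *d, bool suspend)
{
	if (suspend)
		d->status |= MODEL_IRQ_SUSPENDED;
	if (!d->depth++) {
		d->status |= MODEL_IRQ_DISABLED;
		d->line_enabled = false;
	}
}

/*
 * Nested enable.  Returns false and leaves depth alone when the call is
 * unbalanced or would re-enable a line that suspend has disabled.
 */
static bool model_enable_irq(struct irq_model *d, bool resume)
{
	if (resume)
		d->status &= ~MODEL_IRQ_SUSPENDED;

	if (d->depth == 0) {
		fprintf(stderr, "unbalanced enable\n");
		return false;
	}
	if (d->depth == 1) {
		if (d->status & MODEL_IRQ_SUSPENDED) {
			fprintf(stderr, "enable of suspended irq refused\n");
			return false;
		}
		d->status &= ~MODEL_IRQ_DISABLED;
		d->line_enabled = true;
	}
	d->depth--;
	return true;
}

/* Walks through the sequences discussed in this thread. */
static void model_demo(void)
{
	struct irq_model d = { .line_enabled = true };
	bool ok;

	model_disable_irq(&d, true);	/* suspend: depth 1 */
	model_disable_irq(&d, false);	/* driver disable: depth 2 */
	ok = model_enable_irq(&d, false);	/* nested enable: depth 1 */
	assert(ok && d.depth == 1);
	ok = model_enable_irq(&d, false);	/* stray enable: refused */
	assert(!ok && d.depth == 1 && !d.line_enabled);
	ok = model_enable_irq(&d, true);	/* resume: depth 0 */
	assert(ok && d.depth == 0 && d.line_enabled);
}
```

Nested disable/enable pairs pass through untouched; only the call that
would take the depth from 1 to 0 while the suspended flag is set is
rejected.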

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5)
  2009-03-11 22:01                 ` Linus Torvalds
  (?)
@ 2009-03-11 22:13                 ` Rafael J. Wysocki
  -1 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-11 22:13 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, pm list, LKML, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, Len Brown,
	Jesse Barnes, Frans Pop, Arve Hjønnevåg

On Wednesday 11 March 2009, Linus Torvalds wrote:
> 
> On Wed, 11 Mar 2009, Thomas Gleixner wrote:
> > 
> > I'm not worried about nested ones.
> 
> Then you shouldn't be worried about IRQ_SUSPENDED at all, since that one 
> increments the disabled depth count.
> 
> So _all_ disable/enable_irq calls will by definition be nested inside 
> IRQ_SUSPENDED. 

Still, if there's an unbalanced enable_irq() between suspend_device_irqs()
and resume_device_irqs(), we won't detect it immediately, but only in
resume_device_irqs().  It would be better if the unbalanced call failed in that
case IMHO.

Rafael

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5)
  2009-03-11 22:01                 ` Linus Torvalds
                                   ` (2 preceding siblings ...)
  (?)
@ 2009-03-11 22:25                 ` Thomas Gleixner
  -1 siblings, 0 replies; 373+ messages in thread
From: Thomas Gleixner @ 2009-03-11 22:25 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, pm list, LKML, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, Len Brown,
	Jesse Barnes, Frans Pop, Arve Hjønnevåg

On Wed, 11 Mar 2009, Linus Torvalds wrote:
> On Wed, 11 Mar 2009, Thomas Gleixner wrote:
> > 
> > I'm not worried about nested ones.
> 
> Then you shouldn't be worried about IRQ_SUSPENDED at all, since that one 
> increments the disabled depth count.
> 
> So _all_ disable/enable_irq calls will by definition be nested inside 
> IRQ_SUSPENDED. 

Right, if they are in disable -> enable order.

But the stupid stray enable will be visible either by wrecking the
suspend with hard-to-debug failures or by triggering the depth check in
the resume code. I've been burned enough by the timer failures which pop
up long after the real bug happened.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5)
  2009-03-11 21:42         ` Thomas Gleixner
  2009-03-11 22:01             ` Rafael J. Wysocki
@ 2009-03-11 22:45           ` Thomas Gleixner
  2009-03-12 13:36             ` Rafael J. Wysocki
  2009-03-12 13:36             ` Rafael J. Wysocki
  2009-03-11 22:45           ` Thomas Gleixner
  2 siblings, 2 replies; 373+ messages in thread
From: Thomas Gleixner @ 2009-03-11 22:45 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Ingo Molnar, pm list, LKML, Linus Torvalds, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, Len Brown,
	Jesse Barnes, Frans Pop, Arve Hjønnevåg

On Wed, 11 Mar 2009, Thomas Gleixner wrote:
> On Wed, 11 Mar 2009, Rafael J. Wysocki wrote:
> > On Wednesday 11 March 2009, Thomas Gleixner wrote:
> > > > +EXPORT_SYMBOL_GPL(suspend_device_irqs);
> > > 
> > >   I'm not too enthusiastic about this open coded implementation of
> > >   disable_irq() with slightly different semantics.
> > 
> > > The difference in semantics is important IMO, otherwise I wouldn't have
> > > done that.  In particular, the condition should be under the spinlock
> > > and I'd rather not synchronize all interrupts we don't really disable here.
> 
> I don't say that the difference is not relevant. But the code is
> almost the same and disable_irq() could have the sync_irq optimization
> as well.

Thought more about that. Avoiding the sync_irq() for irqs which have
no action associated is fine, but you need to catch the following case
as well:

   driver code calls disable_irq_nosync() from the handler (which is
   still running)

   suspend code skips the sync due to depth > 0

The sync operation is not that expensive.
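
The case can be sketched with a small user-space model.  Everything in it
is an assumption made for illustration (struct irq_model, the
handler_running field and the suspend_wants_sync_*() helpers do not exist
in the kernel); it only contrasts "sync when we did the 0 -> 1
transition" with "sync every line marked IRQ_SUSPENDED".

```c
#include <assert.h>
#include <stdbool.h>

/* Flag value is local to this model; it is not the kernel's. */
#define MODEL_IRQ_SUSPENDED	0x2u

struct irq_model {
	unsigned int status;	/* MODEL_IRQ_* flags */
	unsigned int depth;	/* number of outstanding disables */
	bool handler_running;	/* a handler is still executing */
};

/* Requests a sync only if this call did the 0 -> 1 depth transition. */
static bool suspend_wants_sync_on_transition(struct irq_model *d)
{
	bool sync = false;

	if (!d->depth++)
		sync = true;
	d->status |= MODEL_IRQ_SUSPENDED;
	return sync;
}

/* Requests a sync for every line that gets marked as suspended. */
static bool suspend_wants_sync_on_flag(struct irq_model *d)
{
	d->depth++;
	d->status |= MODEL_IRQ_SUSPENDED;
	return (d->status & MODEL_IRQ_SUSPENDED) != 0;
}

/* True when suspend would carry on while a handler is still running. */
static bool handler_missed(const struct irq_model *d, bool synced)
{
	return d->handler_running && !synced;
}

/*
 * The handler has already disabled its own line without syncing
 * (depth 1) and is still running when suspend comes along.
 */
static void model_demo(void)
{
	struct irq_model a = { .depth = 1, .handler_running = true };
	struct irq_model b = a;
	bool synced;

	synced = suspend_wants_sync_on_transition(&a);
	assert(handler_missed(&a, synced));	/* sync skipped: race */

	synced = suspend_wants_sync_on_flag(&b);
	assert(!handler_missed(&b, synced));	/* sync done: no race */
}
```

In the first variant the suspend path sees depth > 0, concludes that
nothing was disabled by it and never waits for the running handler.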

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5)
  2009-03-11 22:45           ` Thomas Gleixner
@ 2009-03-12 13:36             ` Rafael J. Wysocki
  2009-03-12 21:43               ` [update, rev. 6] " Rafael J. Wysocki
  2009-03-12 21:43               ` Rafael J. Wysocki
  2009-03-12 13:36             ` Rafael J. Wysocki
  1 sibling, 2 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-12 13:36 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, pm list, LKML, Linus Torvalds, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, Len Brown,
	Jesse Barnes, Frans Pop, Arve Hjønnevåg

On Wednesday 11 March 2009, Thomas Gleixner wrote:
> On Wed, 11 Mar 2009, Thomas Gleixner wrote:
> > On Wed, 11 Mar 2009, Rafael J. Wysocki wrote:
> > > On Wednesday 11 March 2009, Thomas Gleixner wrote:
> > > > > +EXPORT_SYMBOL_GPL(suspend_device_irqs);
> > > > 
> > > >   I'm not too enthusiastic about this open coded implementation of
> > > >   disable_irq() with slightly different semantics.
> > > 
> > > > The difference in semantics is important IMO, otherwise I wouldn't have
> > > > done that.  In particular, the condition should be under the spinlock
> > > > and I'd rather not synchronize all interrupts we don't really disable here.
> > 
> > I don't say that the difference is not relevant. But the code is
> > almost the same and disable_irq() could have the sync_irq optimization
> > as well.
> 
> Thought more about that. Avoiding the sync_irq() for irqs which have
> no action associated is fine, but you need to catch the following case
> as well:
> 
>    driver code calls disable_irq_nosync() from the handler (which is
>    still running)
> 
>    suspend code skips the sync due to depth > 0
> 
> The sync operation is not that expensive.

OK, what about this (untested, irrelevant parts skipped)?

Index: linux-2.6/kernel/irq/pm.c
===================================================================
--- /dev/null
+++ linux-2.6/kernel/irq/pm.c
@@ -0,0 +1,79 @@
+/*
+ * linux/kernel/irq/pm.c
+ *
+ * Copyright (C) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
+ *
+ * This file contains power management functions related to interrupts.
+ */
+
+#include <linux/irq.h>
+#include <linux/module.h>
+#include <linux/interrupt.h>
+
+#include "internals.h"
+
+/**
+ * suspend_device_irqs - disable all currently enabled interrupt lines
+ *
+ * During system-wide suspend or hibernation device interrupts need to be
+ * disabled at the chip level and this function is provided for this purpose.
+ * It disables all interrupt lines that are enabled at the moment and sets the
+ * IRQ_SUSPENDED flag for them.
+ */
+void suspend_device_irqs(void)
+{
+	struct irq_desc *desc;
+	int irq;
+
+	for_each_irq_desc(irq, desc) {
+		unsigned long flags;
+
+		spin_lock_irqsave(&desc->lock, flags);
+		__disable_irq(desc, irq, true);
+		spin_unlock_irqrestore(&desc->lock, flags);
+	}
+
+	for_each_irq_desc(irq, desc)
+		if (desc->status & IRQ_SUSPENDED)
+			synchronize_irq(irq);
+}
+EXPORT_SYMBOL_GPL(suspend_device_irqs);
+
+/**
+ * resume_device_irqs - enable interrupt lines disabled by suspend_device_irqs()
+ *
+ * Enable all interrupt lines previously disabled by suspend_device_irqs() that
+ * have the IRQ_SUSPENDED flag set.
+ */
+void resume_device_irqs(void)
+{
+	struct irq_desc *desc;
+	int irq;
+
+	for_each_irq_desc(irq, desc) {
+		unsigned long flags;
+
+		if (!(desc->status & IRQ_SUSPENDED))
+			continue;
+
+		spin_lock_irqsave(&desc->lock, flags);
+		__enable_irq(desc, irq, true);
+		spin_unlock_irqrestore(&desc->lock, flags);
+	}
+}
+EXPORT_SYMBOL_GPL(resume_device_irqs);
+
+/**
+ * check_wakeup_irqs - check if any wake-up interrupts are pending
+ */
+int check_wakeup_irqs(void)
+{
+	struct irq_desc *desc;
+	int irq;
+
+	for_each_irq_desc(irq, desc)
+		if ((desc->status & IRQ_WAKEUP) && (desc->status & IRQ_PENDING))
+			return -EBUSY;
+
+	return 0;
+}
Index: linux-2.6/kernel/irq/Makefile
===================================================================
--- linux-2.6.orig/kernel/irq/Makefile
+++ linux-2.6/kernel/irq/Makefile
@@ -4,3 +4,4 @@ obj-$(CONFIG_GENERIC_IRQ_PROBE) += autop
 obj-$(CONFIG_PROC_FS) += proc.o
 obj-$(CONFIG_GENERIC_PENDING_IRQ) += migration.o
 obj-$(CONFIG_NUMA_MIGRATE_IRQ_DESC) += numa_migrate.o
+obj-$(CONFIG_PM_SLEEP) += pm.o
Index: linux-2.6/kernel/irq/manage.c
===================================================================
--- linux-2.6.orig/kernel/irq/manage.c
+++ linux-2.6/kernel/irq/manage.c
@@ -162,6 +162,20 @@ static inline int do_irq_select_affinity
 }
 #endif
 
+void __disable_irq(struct irq_desc *desc, unsigned int irq, bool suspend)
+{
+	if (suspend) {
+		if (desc->action && (desc->action->flags & IRQF_TIMER))
+			return;
+		desc->status |= IRQ_SUSPENDED;
+	}
+
+	if (!desc->depth++) {
+		desc->status |= IRQ_DISABLED;
+		desc->chip->disable(irq);
+	}
+}
+
 /**
  *	disable_irq_nosync - disable an irq without waiting
  *	@irq: Interrupt to disable
@@ -182,10 +196,7 @@ void disable_irq_nosync(unsigned int irq
 		return;
 
 	spin_lock_irqsave(&desc->lock, flags);
-	if (!desc->depth++) {
-		desc->status |= IRQ_DISABLED;
-		desc->chip->disable(irq);
-	}
+	__disable_irq(desc, irq, false);
 	spin_unlock_irqrestore(&desc->lock, flags);
 }
 EXPORT_SYMBOL(disable_irq_nosync);
@@ -215,15 +226,19 @@ void disable_irq(unsigned int irq)
 }
 EXPORT_SYMBOL(disable_irq);
 
-static void __enable_irq(struct irq_desc *desc, unsigned int irq)
+void __enable_irq(struct irq_desc *desc, unsigned int irq, bool resume)
 {
+	if (resume)
+		desc->status &= ~IRQ_SUSPENDED;
+
 	switch (desc->depth) {
 	case 0:
-		WARN(1, KERN_WARNING "Unbalanced enable for IRQ %d\n", irq);
-		break;
+		goto err_out;
 	case 1: {
 		unsigned int status = desc->status & ~IRQ_DISABLED;
 
+		if (desc->status & IRQ_SUSPENDED)
+			goto err_out;
 		/* Prevent probing on this irq: */
 		desc->status = status | IRQ_NOPROBE;
 		check_irq_resend(desc, irq);
@@ -232,6 +247,11 @@ static void __enable_irq(struct irq_desc
 	default:
 		desc->depth--;
 	}
+
+	return;
+
+ err_out:
+	WARN(true, KERN_WARNING "Unbalanced enable for IRQ %d\n", irq);
 }
 
 /**
@@ -253,7 +273,7 @@ void enable_irq(unsigned int irq)
 		return;
 
 	spin_lock_irqsave(&desc->lock, flags);
-	__enable_irq(desc, irq);
+	__enable_irq(desc, irq, false);
 	spin_unlock_irqrestore(&desc->lock, flags);
 }
 EXPORT_SYMBOL(enable_irq);
@@ -511,7 +531,7 @@ __setup_irq(unsigned int irq, struct irq
 	 */
 	if (shared && (desc->status & IRQ_SPURIOUS_DISABLED)) {
 		desc->status &= ~IRQ_SPURIOUS_DISABLED;
-		__enable_irq(desc, irq);
+		__enable_irq(desc, irq, false);
 	}
 
 	spin_unlock_irqrestore(&desc->lock, flags);
Index: linux-2.6/kernel/irq/internals.h
===================================================================
--- linux-2.6.orig/kernel/irq/internals.h
+++ linux-2.6/kernel/irq/internals.h
@@ -12,6 +12,8 @@ extern void compat_irq_chip_set_default_
 
 extern int __irq_set_trigger(struct irq_desc *desc, unsigned int irq,
 		unsigned long flags);
+extern void __disable_irq(struct irq_desc *desc, unsigned int irq, bool susp);
+extern void __enable_irq(struct irq_desc *desc, unsigned int irq, bool resume);
 
 extern struct lock_class_key irq_desc_lock_class;
 extern void init_kstat_irqs(struct irq_desc *desc, int cpu, int nr);

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 373+ messages in thread

* [update, rev. 6] Re: [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5)
  2009-03-12 13:36             ` Rafael J. Wysocki
  2009-03-12 21:43               ` [update, rev. 6] " Rafael J. Wysocki
@ 2009-03-12 21:43               ` Rafael J. Wysocki
  2009-03-13  0:39                 ` Ingo Molnar
                                   ` (4 more replies)
  1 sibling, 5 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-12 21:43 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar
  Cc: pm list, LKML, Linus Torvalds, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, Len Brown,
	Jesse Barnes, Frans Pop, Arve Hjønnevåg

On Thursday 12 March 2009, Rafael J. Wysocki wrote:
> On Wednesday 11 March 2009, Thomas Gleixner wrote:
> > On Wed, 11 Mar 2009, Thomas Gleixner wrote:
> > > On Wed, 11 Mar 2009, Rafael J. Wysocki wrote:
> > > > On Wednesday 11 March 2009, Thomas Gleixner wrote:
> > > > > > +EXPORT_SYMBOL_GPL(suspend_device_irqs);
> > > > > 
> > > > >   I'm not too enthusiastic about this open coded implementation of
> > > > >   disable_irq() with slightly different semantics.
> > > > 
> > > > The difference in semantics is important IMO, otherwise I wouldn't have
> > > > done that.  In particular, the condition should be under the spinlock IMO
> > > > and I'd rather not synchronize all interrupts we don't really disable here.
> > > 
> > > I don't say that the difference is not relevant. But the code is
> > > almost the same and disable_irq() could have the sync_irq optimization
> > > as well.
> > 
> > Thought more about that. Avoiding the sync_irq() for irqs which have
> > no action associated is fine, but you need to catch the following case
> > as well:
> > 
> >    driver code calls disable_irq_nosync() from the handler (which is
> >    still running)
> > 
> >    suspend code skips the sync due to depth > 0
> > 
> > The sync operation is not that expensive.
> 
> OK, what about this (untested, irrelevant parts skipped)?

Well, I guess I need to assume that no reaction means it's fine. ;-)

Below is the complete patch.  Thomas, Ingo, please let me know if it is fine
with you.
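
For reference, the case Thomas described (a handler that has already called
disable_irq_nosync() when suspend comes along) can be modelled in user space
roughly as follows.  This is only a sketch, not kernel code: the model_* names
and MODEL_* bits are made up for illustration, and the "timer" field merely
stands in for an action with IRQF_TIMER set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Made-up stand-ins for the kernel's desc->status bits. */
#define MODEL_IRQ_DISABLED	0x1u
#define MODEL_IRQ_SUSPENDED	0x2u

/* Minimal model of the irq_desc fields the patch touches. */
struct model_desc {
	unsigned int status;
	unsigned int depth;	/* nested disable count, as desc->depth */
	bool timer;		/* models an action with IRQF_TIMER set */
};

/* Mirrors __disable_irq(): timer lines are skipped on suspend. */
static void model_disable_irq(struct model_desc *d, bool suspend)
{
	assert(d != NULL);

	if (suspend) {
		if (d->timer)
			return;
		d->status |= MODEL_IRQ_SUSPENDED;
	}

	if (!d->depth++)
		d->status |= MODEL_IRQ_DISABLED;
}

/* Sync rule used by suspend_device_irqs(): look at the flag only. */
static bool model_needs_sync(const struct model_desc *d)
{
	return (d->status & MODEL_IRQ_SUSPENDED) != 0;
}

/* The rule Thomas warned about: sync only if we did the 0 -> 1 step. */
static bool model_needs_sync_by_depth(unsigned int depth_before)
{
	return depth_before == 0;
}
```

With a line that a driver has already disabled once, the depth-based rule
would skip the sync, while the flag-based rule still requests it.  A line
marked as a timer is left alone by both.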

Thanks,
Rafael

---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: PM: Rework handling of interrupts during suspend-resume (rev. 6)

Introduce two helper functions allowing us to prevent device drivers
from getting any interrupts (without disabling interrupts on the CPU)
during suspend (or hibernation) and to make them start to receive
interrupts again during the subsequent resume, respectively.  These
functions make it possible to keep timer interrupts enabled while the
"late" suspend and "early" resume callbacks provided by device
drivers are being executed.

Use these functions to rework the handling of interrupts during
suspend (hibernation) and resume.  Namely, interrupts will only be
disabled on the CPU right before suspending sysdevs, while device
drivers will be prevented from receiving interrupts, with the help of
the new helper function, before their "late" suspend callbacks run
(and analogously during resume).
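
The resulting order of operations can be sketched with a small user-space
model.  It is an illustration only: the enum labels and model_suspend_enter()
are made-up names, and the model assumes that sysdev_suspend() succeeds.

```c
#include <assert.h>
#include <stddef.h>

/* Made-up labels for the steps of one suspend/resume cycle. */
enum model_step {
	DEV_IRQS_OFF,	/* suspend_device_irqs() */
	LATE_SUSPEND,	/* drivers' "late" suspend callbacks */
	CPU_IRQS_OFF,	/* arch_suspend_disable_irqs() */
	SYSDEV_SUSPEND,
	SYSDEV_RESUME,
	CPU_IRQS_ON,	/* arch_suspend_enable_irqs() */
	EARLY_RESUME,	/* drivers' "early" resume callbacks */
	DEV_IRQS_ON	/* resume_device_irqs() */
};

#define MODEL_MAX_STEPS 8

/*
 * Record the order of steps taken by the reworked suspend_enter().
 * late_error != 0 models a failing "late" suspend callback.
 * Returns the number of steps written to log[].
 */
static size_t model_suspend_enter(enum model_step log[MODEL_MAX_STEPS],
				  int late_error)
{
	size_t n = 0;

	assert(log != NULL);

	/* device_power_down() */
	log[n++] = DEV_IRQS_OFF;
	log[n++] = LATE_SUSPEND;
	if (late_error) {
		/* device_power_down() undoes its own work on failure */
		log[n++] = EARLY_RESUME;
		log[n++] = DEV_IRQS_ON;
		return n;
	}

	log[n++] = CPU_IRQS_OFF;
	log[n++] = SYSDEV_SUSPEND;
	/* the system would sleep here */
	log[n++] = SYSDEV_RESUME;
	log[n++] = CPU_IRQS_ON;

	/* device_power_up() */
	log[n++] = EARLY_RESUME;
	log[n++] = DEV_IRQS_ON;
	return n;
}
```

In the model the "late" and "early" callbacks always run with CPU interrupts
on and device interrupts off, which is what allows them to sleep.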

In addition, since the device interrupts are now disabled before the
CPU has turned all interrupts off and the CPU will ACK the interrupts
setting the IRQ_PENDING bit for them, check in sysdev_suspend() if
any wake-up interrupts are pending and abort suspend if that's the
case.
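
The wake-up check itself boils down to the loop below, shown here as a
user-space sketch over a plain array of status words.  The MODEL_* bits and
model_check_wakeup_irqs() are made-up names for illustration; the real
function walks the IRQ descriptors instead.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Made-up stand-ins for IRQ_PENDING and IRQ_WAKEUP. */
#define MODEL_IRQ_PENDING	0x1u
#define MODEL_IRQ_WAKEUP	0x2u

/*
 * Return -EBUSY if any line is both a wake-up source and pending,
 * 0 otherwise.  status[] holds one status word per modelled line.
 */
static int model_check_wakeup_irqs(const unsigned int *status, size_t n)
{
	size_t i;

	assert(status != NULL || n == 0);

	for (i = 0; i < n; i++)
		if ((status[i] & MODEL_IRQ_WAKEUP) &&
		    (status[i] & MODEL_IRQ_PENDING))
			return -EBUSY;

	return 0;
}
```

A pending interrupt on a line that is not a wake-up source does not abort
the suspend in this model, and neither does an idle wake-up line.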

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 arch/x86/kernel/apm_32.c  |   15 ++++++--
 drivers/base/power/main.c |   20 ++++++-----
 drivers/base/sys.c        |    8 ++++
 drivers/xen/manage.c      |   16 +++++----
 include/linux/interrupt.h |    5 ++
 include/linux/irq.h       |    1 
 kernel/irq/Makefile       |    1 
 kernel/irq/internals.h    |    2 +
 kernel/irq/manage.c       |   31 +++++++++++++-----
 kernel/irq/pm.c           |   79 ++++++++++++++++++++++++++++++++++++++++++++++
 kernel/kexec.c            |    8 ++--
 kernel/power/disk.c       |   39 ++++++++++++++++------
 kernel/power/main.c       |   17 ++++++---
 13 files changed, 195 insertions(+), 47 deletions(-)

Index: linux-2.6/include/linux/interrupt.h
===================================================================
--- linux-2.6.orig/include/linux/interrupt.h
+++ linux-2.6/include/linux/interrupt.h
@@ -106,6 +106,11 @@ extern void disable_irq_nosync(unsigned 
 extern void disable_irq(unsigned int irq);
 extern void enable_irq(unsigned int irq);
 
+/* The following three functions are for the core kernel use only. */
+extern void suspend_device_irqs(void);
+extern void resume_device_irqs(void);
+extern int check_wakeup_irqs(void);
+
 #if defined(CONFIG_SMP) && defined(CONFIG_GENERIC_HARDIRQS)
 
 extern cpumask_var_t irq_default_affinity;
Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -287,17 +287,19 @@ void __attribute__ ((weak)) arch_suspend
  */
 static int suspend_enter(suspend_state_t state)
 {
-	int error = 0;
+	int error;
 
 	device_pm_lock();
-	arch_suspend_disable_irqs();
-	BUG_ON(!irqs_disabled());
 
-	if ((error = device_power_down(PMSG_SUSPEND))) {
+	error = device_power_down(PMSG_SUSPEND);
+	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down\n");
 		goto Done;
 	}
 
+	arch_suspend_disable_irqs();
+	BUG_ON(!irqs_disabled());
+
 	error = sysdev_suspend(PMSG_SUSPEND);
 	if (!error) {
 		if (!suspend_test(TEST_CORE))
@@ -305,11 +307,14 @@ static int suspend_enter(suspend_state_t
 		sysdev_resume();
 	}
 
-	device_power_up(PMSG_RESUME);
- Done:
 	arch_suspend_enable_irqs();
 	BUG_ON(irqs_disabled());
+
+	device_power_up(PMSG_RESUME);
+
+ Done:
 	device_pm_unlock();
+
 	return error;
 }
 
Index: linux-2.6/kernel/power/disk.c
===================================================================
--- linux-2.6.orig/kernel/power/disk.c
+++ linux-2.6/kernel/power/disk.c
@@ -214,7 +214,7 @@ static int create_image(int platform_mod
 		return error;
 
 	device_pm_lock();
-	local_irq_disable();
+
 	/* At this point, device_suspend() has been called, but *not*
 	 * device_power_down(). We *must* call device_power_down() now.
 	 * Otherwise, drivers for some devices (e.g. interrupt controllers)
@@ -225,8 +225,11 @@ static int create_image(int platform_mod
 	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down, "
 			"aborting hibernation\n");
-		goto Enable_irqs;
+		goto Unlock;
 	}
+
+	local_irq_disable();
+
 	sysdev_suspend(PMSG_FREEZE);
 	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down, "
@@ -252,12 +255,16 @@ static int create_image(int platform_mod
 	/* NOTE:  device_power_up() is just a resume() for devices
 	 * that suspended with irqs off ... no overall powerup.
 	 */
+
  Power_up_devices:
+	local_irq_enable();
+
 	device_power_up(in_suspend ?
 		(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);
- Enable_irqs:
-	local_irq_enable();
+
+ Unlock:
 	device_pm_unlock();
+
 	return error;
 }
 
@@ -336,13 +343,16 @@ static int resume_target_kernel(void)
 	int error;
 
 	device_pm_lock();
-	local_irq_disable();
+
 	error = device_power_down(PMSG_QUIESCE);
 	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down, "
 			"aborting resume\n");
-		goto Enable_irqs;
+		goto Unlock;
 	}
+
+	local_irq_disable();
+
 	sysdev_suspend(PMSG_QUIESCE);
 	/* We'll ignore saved state, but this gets preempt count (etc) right */
 	save_processor_state();
@@ -366,11 +376,16 @@ static int resume_target_kernel(void)
 	swsusp_free();
 	restore_processor_state();
 	touch_softlockup_watchdog();
+
 	sysdev_resume();
-	device_power_up(PMSG_RECOVER);
- Enable_irqs:
+
 	local_irq_enable();
+
+	device_power_up(PMSG_RECOVER);
+
+ Unlock:
 	device_pm_unlock();
+
 	return error;
 }
 
@@ -447,15 +462,16 @@ int hibernation_platform_enter(void)
 		goto Finish;
 
 	device_pm_lock();
-	local_irq_disable();
+
 	error = device_power_down(PMSG_HIBERNATE);
 	if (!error) {
+		local_irq_disable();
 		sysdev_suspend(PMSG_HIBERNATE);
 		hibernation_ops->enter();
 		/* We should never get here */
 		while (1);
 	}
-	local_irq_enable();
+
 	device_pm_unlock();
 
 	/*
@@ -464,12 +480,15 @@ int hibernation_platform_enter(void)
 	 */
  Finish:
 	hibernation_ops->finish();
+
  Resume_devices:
 	entering_platform_hibernation = false;
 	device_resume(PMSG_RESTORE);
 	resume_console();
+
  Close:
 	hibernation_ops->end();
+
 	return error;
 }
 
Index: linux-2.6/arch/x86/kernel/apm_32.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/apm_32.c
+++ linux-2.6/arch/x86/kernel/apm_32.c
@@ -1190,8 +1190,10 @@ static int suspend(int vetoable)
 	struct apm_user	*as;
 
 	device_suspend(PMSG_SUSPEND);
-	local_irq_disable();
+
 	device_power_down(PMSG_SUSPEND);
+
+	local_irq_disable();
 	sysdev_suspend(PMSG_SUSPEND);
 
 	local_irq_enable();
@@ -1209,9 +1211,12 @@ static int suspend(int vetoable)
 	if (err != APM_SUCCESS)
 		apm_error("suspend", err);
 	err = (err == APM_SUCCESS) ? 0 : -EIO;
+
 	sysdev_resume();
-	device_power_up(PMSG_RESUME);
 	local_irq_enable();
+
+	device_power_up(PMSG_RESUME);
+
 	device_resume(PMSG_RESUME);
 	queue_event(APM_NORMAL_RESUME, NULL);
 	spin_lock(&user_list_lock);
@@ -1228,8 +1233,9 @@ static void standby(void)
 {
 	int err;
 
-	local_irq_disable();
 	device_power_down(PMSG_SUSPEND);
+
+	local_irq_disable();
 	sysdev_suspend(PMSG_SUSPEND);
 	local_irq_enable();
 
@@ -1239,8 +1245,9 @@ static void standby(void)
 
 	local_irq_disable();
 	sysdev_resume();
-	device_power_up(PMSG_RESUME);
 	local_irq_enable();
+
+	device_power_up(PMSG_RESUME);
 }
 
 static apm_event_t get_event(void)
Index: linux-2.6/drivers/xen/manage.c
===================================================================
--- linux-2.6.orig/drivers/xen/manage.c
+++ linux-2.6/drivers/xen/manage.c
@@ -39,12 +39,6 @@ static int xen_suspend(void *data)
 
 	BUG_ON(!irqs_disabled());
 
-	err = device_power_down(PMSG_SUSPEND);
-	if (err) {
-		printk(KERN_ERR "xen_suspend: device_power_down failed: %d\n",
-		       err);
-		return err;
-	}
 	err = sysdev_suspend(PMSG_SUSPEND);
 	if (err) {
 		printk(KERN_ERR "xen_suspend: sysdev_suspend failed: %d\n",
@@ -69,7 +63,6 @@ static int xen_suspend(void *data)
 	xen_mm_unpin_all();
 
 	sysdev_resume();
-	device_power_up(PMSG_RESUME);
 
 	if (!*cancelled) {
 		xen_irq_resume();
@@ -108,6 +101,12 @@ static void do_suspend(void)
 	/* XXX use normal device tree? */
 	xenbus_suspend();
 
+	err = device_power_down(PMSG_SUSPEND);
+	if (err) {
+		printk(KERN_ERR "device_power_down failed: %d\n", err);
+		goto resume_devices;
+	}
+
 	err = stop_machine(xen_suspend, &cancelled, &cpumask_of_cpu(0));
 	if (err) {
 		printk(KERN_ERR "failed to start xen_suspend: %d\n", err);
@@ -120,6 +119,9 @@ static void do_suspend(void)
 	} else
 		xenbus_suspend_cancel();
 
+	device_power_up(PMSG_RESUME);
+
+resume_devices:
 	device_resume(PMSG_RESUME);
 
 	/* Make sure timer events get retriggered on all CPUs */
Index: linux-2.6/kernel/kexec.c
===================================================================
--- linux-2.6.orig/kernel/kexec.c
+++ linux-2.6/kernel/kexec.c
@@ -1454,7 +1454,6 @@ int kernel_kexec(void)
 		if (error)
 			goto Resume_devices;
 		device_pm_lock();
-		local_irq_disable();
 		/* At this point, device_suspend() has been called,
 		 * but *not* device_power_down(). We *must*
 		 * device_power_down() now.  Otherwise, drivers for
@@ -1464,8 +1463,9 @@ int kernel_kexec(void)
 		 */
 		error = device_power_down(PMSG_FREEZE);
 		if (error)
-			goto Enable_irqs;
+			goto Unlock_pm;
 
+		local_irq_disable();
 		/* Suspend system devices */
 		error = sysdev_suspend(PMSG_FREEZE);
 		if (error)
@@ -1484,9 +1484,9 @@ int kernel_kexec(void)
 	if (kexec_image->preserve_context) {
 		sysdev_resume();
  Power_up_devices:
-		device_power_up(PMSG_RESTORE);
- Enable_irqs:
 		local_irq_enable();
+		device_power_up(PMSG_RESTORE);
+ Unlock_pm:
 		device_pm_unlock();
 		enable_nonboot_cpus();
  Resume_devices:
Index: linux-2.6/include/linux/irq.h
===================================================================
--- linux-2.6.orig/include/linux/irq.h
+++ linux-2.6/include/linux/irq.h
@@ -65,6 +65,7 @@ typedef	void (*irq_flow_handler_t)(unsig
 #define IRQ_SPURIOUS_DISABLED	0x00800000	/* IRQ was disabled by the spurious trap */
 #define IRQ_MOVE_PCNTXT		0x01000000	/* IRQ migration from process context */
 #define IRQ_AFFINITY_SET	0x02000000	/* IRQ affinity was set from userspace*/
+#define IRQ_SUSPENDED		0x04000000	/* IRQ has gone through suspend sequence */
 
 #ifdef CONFIG_IRQ_PER_CPU
 # define CHECK_IRQ_PER_CPU(var) ((var) & IRQ_PER_CPU)
Index: linux-2.6/kernel/irq/pm.c
===================================================================
--- /dev/null
+++ linux-2.6/kernel/irq/pm.c
@@ -0,0 +1,79 @@
+/*
+ * linux/kernel/irq/pm.c
+ *
+ * Copyright (C) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
+ *
+ * This file contains power management functions related to interrupts.
+ */
+
+#include <linux/irq.h>
+#include <linux/module.h>
+#include <linux/interrupt.h>
+
+#include "internals.h"
+
+/**
+ * suspend_device_irqs - disable all currently enabled interrupt lines
+ *
+ * During system-wide suspend or hibernation device interrupts need to be
+ * disabled at the chip level and this function is provided for this purpose.
+ * It disables all interrupt lines that are enabled at the moment and sets the
+ * IRQ_SUSPENDED flag for them.
+ */
+void suspend_device_irqs(void)
+{
+	struct irq_desc *desc;
+	int irq;
+
+	for_each_irq_desc(irq, desc) {
+		unsigned long flags;
+
+		spin_lock_irqsave(&desc->lock, flags);
+		__disable_irq(desc, irq, true);
+		spin_unlock_irqrestore(&desc->lock, flags);
+	}
+
+	for_each_irq_desc(irq, desc)
+		if (desc->status & IRQ_SUSPENDED)
+			synchronize_irq(irq);
+}
+EXPORT_SYMBOL_GPL(suspend_device_irqs);
+
+/**
+ * resume_device_irqs - enable interrupt lines disabled by suspend_device_irqs()
+ *
+ * Enable all interrupt lines previously disabled by suspend_device_irqs() that
+ * have the IRQ_SUSPENDED flag set.
+ */
+void resume_device_irqs(void)
+{
+	struct irq_desc *desc;
+	int irq;
+
+	for_each_irq_desc(irq, desc) {
+		unsigned long flags;
+
+		if (!(desc->status & IRQ_SUSPENDED))
+			continue;
+
+		spin_lock_irqsave(&desc->lock, flags);
+		__enable_irq(desc, irq, true);
+		spin_unlock_irqrestore(&desc->lock, flags);
+	}
+}
+EXPORT_SYMBOL_GPL(resume_device_irqs);
+
+/**
+ * check_wakeup_irqs - check if any wake-up interrupts are pending
+ */
+int check_wakeup_irqs(void)
+{
+	struct irq_desc *desc;
+	int irq;
+
+	for_each_irq_desc(irq, desc)
+		if ((desc->status & IRQ_WAKEUP) && (desc->status & IRQ_PENDING))
+			return -EBUSY;
+
+	return 0;
+}
Index: linux-2.6/kernel/irq/Makefile
===================================================================
--- linux-2.6.orig/kernel/irq/Makefile
+++ linux-2.6/kernel/irq/Makefile
@@ -4,3 +4,4 @@ obj-$(CONFIG_GENERIC_IRQ_PROBE) += autop
 obj-$(CONFIG_PROC_FS) += proc.o
 obj-$(CONFIG_GENERIC_PENDING_IRQ) += migration.o
 obj-$(CONFIG_NUMA_MIGRATE_IRQ_DESC) += numa_migrate.o
+obj-$(CONFIG_PM_SLEEP) += pm.o
Index: linux-2.6/kernel/irq/manage.c
===================================================================
--- linux-2.6.orig/kernel/irq/manage.c
+++ linux-2.6/kernel/irq/manage.c
@@ -162,6 +162,20 @@ static inline int do_irq_select_affinity
 }
 #endif
 
+void __disable_irq(struct irq_desc *desc, unsigned int irq, bool suspend)
+{
+	if (suspend) {
+		if (desc->action && (desc->action->flags & IRQF_TIMER))
+			return;
+		desc->status |= IRQ_SUSPENDED;
+	}
+
+	if (!desc->depth++) {
+		desc->status |= IRQ_DISABLED;
+		desc->chip->disable(irq);
+	}
+}
+
 /**
  *	disable_irq_nosync - disable an irq without waiting
  *	@irq: Interrupt to disable
@@ -182,10 +196,7 @@ void disable_irq_nosync(unsigned int irq
 		return;
 
 	spin_lock_irqsave(&desc->lock, flags);
-	if (!desc->depth++) {
-		desc->status |= IRQ_DISABLED;
-		desc->chip->disable(irq);
-	}
+	__disable_irq(desc, irq, false);
 	spin_unlock_irqrestore(&desc->lock, flags);
 }
 EXPORT_SYMBOL(disable_irq_nosync);
@@ -215,15 +226,21 @@ void disable_irq(unsigned int irq)
 }
 EXPORT_SYMBOL(disable_irq);
 
-static void __enable_irq(struct irq_desc *desc, unsigned int irq)
+void __enable_irq(struct irq_desc *desc, unsigned int irq, bool resume)
 {
+	if (resume)
+		desc->status &= ~IRQ_SUSPENDED;
+
 	switch (desc->depth) {
 	case 0:
+ err_out:
 		WARN(1, KERN_WARNING "Unbalanced enable for IRQ %d\n", irq);
 		break;
 	case 1: {
 		unsigned int status = desc->status & ~IRQ_DISABLED;
 
+		if (desc->status & IRQ_SUSPENDED)
+			goto err_out;
 		/* Prevent probing on this irq: */
 		desc->status = status | IRQ_NOPROBE;
 		check_irq_resend(desc, irq);
@@ -253,7 +270,7 @@ void enable_irq(unsigned int irq)
 		return;
 
 	spin_lock_irqsave(&desc->lock, flags);
-	__enable_irq(desc, irq);
+	__enable_irq(desc, irq, false);
 	spin_unlock_irqrestore(&desc->lock, flags);
 }
 EXPORT_SYMBOL(enable_irq);
@@ -511,7 +528,7 @@ __setup_irq(unsigned int irq, struct irq
 	 */
 	if (shared && (desc->status & IRQ_SPURIOUS_DISABLED)) {
 		desc->status &= ~IRQ_SPURIOUS_DISABLED;
-		__enable_irq(desc, irq);
+		__enable_irq(desc, irq, false);
 	}
 
 	spin_unlock_irqrestore(&desc->lock, flags);
Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -23,6 +23,7 @@
 #include <linux/pm.h>
 #include <linux/resume-trace.h>
 #include <linux/rwsem.h>
+#include <linux/interrupt.h>
 
 #include "../base.h"
 #include "power.h"
@@ -305,7 +306,8 @@ static int resume_device_noirq(struct de
  *	Execute the appropriate "noirq resume" callback for all devices marked
  *	as DPM_OFF_IRQ.
  *
- *	Must be called with interrupts disabled and only one CPU running.
+ *	Must be called under dpm_list_mtx.  Device drivers should not receive
+ *	interrupts while it's being executed.
  */
 static void dpm_power_up(pm_message_t state)
 {
@@ -326,14 +328,13 @@ static void dpm_power_up(pm_message_t st
  *	device_power_up - Turn on all devices that need special attention.
  *	@state: PM transition of the system being carried out.
  *
- *	Power on system devices, then devices that required we shut them down
- *	with interrupts disabled.
- *
- *	Must be called with interrupts disabled.
+ *	Call the "early" resume handlers and enable device drivers to receive
+ *	interrupts.
  */
 void device_power_up(pm_message_t state)
 {
 	dpm_power_up(state);
+	resume_device_irqs();
 }
 EXPORT_SYMBOL_GPL(device_power_up);
 
@@ -558,16 +559,17 @@ static int suspend_device_noirq(struct d
  *	device_power_down - Shut down special devices.
  *	@state: PM transition of the system being carried out.
  *
- *	Power down devices that require interrupts to be disabled.
- *	Then power down system devices.
+ *	Prevent device drivers from receiving interrupts and call the "late"
+ *	suspend handlers.
  *
- *	Must be called with interrupts disabled and only one CPU running.
+ *	Must be called under dpm_list_mtx.
  */
 int device_power_down(pm_message_t state)
 {
 	struct device *dev;
 	int error = 0;
 
+	suspend_device_irqs();
 	list_for_each_entry_reverse(dev, &dpm_list, power.entry) {
 		error = suspend_device_noirq(dev, state);
 		if (error) {
@@ -577,7 +579,7 @@ int device_power_down(pm_message_t state
 		dev->power.status = DPM_OFF_IRQ;
 	}
 	if (error)
-		dpm_power_up(resume_event(state));
+		device_power_up(resume_event(state));
 	return error;
 }
 EXPORT_SYMBOL_GPL(device_power_down);
Index: linux-2.6/drivers/base/sys.c
===================================================================
--- linux-2.6.orig/drivers/base/sys.c
+++ linux-2.6/drivers/base/sys.c
@@ -22,6 +22,7 @@
 #include <linux/pm.h>
 #include <linux/device.h>
 #include <linux/mutex.h>
+#include <linux/interrupt.h>
 
 #include "base.h"
 
@@ -369,6 +370,13 @@ int sysdev_suspend(pm_message_t state)
 	struct sysdev_driver *drv, *err_drv;
 	int ret;
 
+	pr_debug("Checking wake-up interrupts\n");
+
+	/* Return error code if there are any wake-up interrupts pending */
+	ret = check_wakeup_irqs();
+	if (ret)
+		return ret;
+
 	pr_debug("Suspending System Devices\n");
 
 	list_for_each_entry_reverse(cls, &system_kset->list, kset.kobj.entry) {
Index: linux-2.6/kernel/irq/internals.h
===================================================================
--- linux-2.6.orig/kernel/irq/internals.h
+++ linux-2.6/kernel/irq/internals.h
@@ -12,6 +12,8 @@ extern void compat_irq_chip_set_default_
 
 extern int __irq_set_trigger(struct irq_desc *desc, unsigned int irq,
 		unsigned long flags);
+extern void __disable_irq(struct irq_desc *desc, unsigned int irq, bool susp);
+extern void __enable_irq(struct irq_desc *desc, unsigned int irq, bool resume);
 
 extern struct lock_class_key irq_desc_lock_class;
 extern void init_kstat_irqs(struct irq_desc *desc, int cpu, int nr);
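
[Editorial sketch, not part of the patch.]  The sysdev_suspend() hunk above
aborts the transition when a wake-up interrupt has been latched as pending
while device interrupts were masked.  Below is a minimal user-space model of
that scan, assuming a plain array of status words in place of the irq_desc
table.  The MODEL_* flag values and model_check_wakeup_irqs() are made up for
illustration; they are not the kernel's definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Illustrative flag values only; the real ones are in include/linux/irq.h. */
#define MODEL_IRQ_PENDING 0x1u
#define MODEL_IRQ_WAKEUP  0x2u

/*
 * Same test as check_wakeup_irqs() in the patch: fail with -EBUSY as soon
 * as one line is both a wake-up source and marked pending.
 */
static int model_check_wakeup_irqs(const unsigned int *status, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if ((status[i] & MODEL_IRQ_WAKEUP) &&
		    (status[i] & MODEL_IRQ_PENDING))
			return -EBUSY;

	return 0;
}
```

A pending interrupt on a line that is not a wake-up source does not abort the
suspend in this model; only the combination of both flags does.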

^ permalink raw reply	[flat|nested] 373+ messages in thread

* [update, rev. 6] Re: [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5)
  2009-03-12 13:36             ` Rafael J. Wysocki
@ 2009-03-12 21:43               ` Rafael J. Wysocki
  2009-03-12 21:43               ` Rafael J. Wysocki
  1 sibling, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-12 21:43 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar
  Cc: Arve, Jeremy Fitzhardinge, Frans Pop, LKML, Jesse Barnes,
	Eric W. Biederman, pm list, Linus Torvalds

On Thursday 12 March 2009, Rafael J. Wysocki wrote:
> On Wednesday 11 March 2009, Thomas Gleixner wrote:
> > On Wed, 11 Mar 2009, Thomas Gleixner wrote:
> > > On Wed, 11 Mar 2009, Rafael J. Wysocki wrote:
> > > > On Wednesday 11 March 2009, Thomas Gleixner wrote:
> > > > > > +EXPORT_SYMBOL_GPL(suspend_device_irqs);
> > > > > 
> > > > >   I'm not too enthusiastic about this open coded implementation of
> > > > >   disable_irq() with slightly different semantics.
> > > > 
> > > > The difference in semantics is important IMO, otherwise I wouldn't have
> > > > done that.  In particular, the condition should be under the spinlock
> > > > and I'd rather not synchronize all interrupts we don't really disable here.
> > > 
> > > I don't say that the difference is not relevant. But the code is
> > > almost the same and disable_irq() could have the sync_irq optimization
> > > as well.
> > 
> > Thought more about that. Avoiding the sync_irq() for irqs which have
> > no action associated is fine, but you need to catch the following case
> > as well:
> > 
> >    driver code calls disable_irq_nosync() from the handler (which is
> >    still running)
> > 
> >    suspend code skips the sync due to depth > 0
> > 
> > The sync operation is not that expensive.
> 
> OK, what about this (untested, irrelevant parts skipped)?

Well, I guess I need to assume that no reaction means it's fine. ;-)

Below is the complete patch.  Thomas, Ingo, please let me know if it is fine
with you.

Thanks,
Rafael
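
[Editorial sketch, not part of the original message.]  The case quoted above
is a handler that calls disable_irq_nosync() on its own line and is still
running when suspend starts.  A minimal user-space model, assuming a
hypothetical struct model_line in place of struct irq_desc, contrasts a
depth-based decision to wait with the flag-based one that rev. 6 uses:

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_IRQ_SUSPENDED 0x1u	/* illustrative value only */

struct model_line {
	unsigned int status;
	unsigned int depth;	/* disable nesting count, 0 == enabled */
};

/* Variant that skips the wait when the line was already disabled. */
static bool suspend_line_depth_based(struct model_line *l)
{
	bool was_enabled = (l->depth++ == 0);

	return was_enabled;	/* true: caller would synchronize_irq() */
}

/* Shape of rev. 6: flag the line, then wait for every flagged line. */
static bool suspend_line_flag_based(struct model_line *l)
{
	l->status |= MODEL_IRQ_SUSPENDED;
	l->depth++;

	return (l->status & MODEL_IRQ_SUSPENDED) != 0;
}
```

With depth already at 1 (the handler disabled its own line), the depth-based
variant does not wait for the handler to finish, while the flag-based variant
does.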

---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: PM: Rework handling of interrupts during suspend-resume (rev. 6)

Introduce two helper functions allowing us to prevent device drivers
from getting any interrupts (without disabling interrupts on the CPU)
during suspend (or hibernation) and to make them start to receive
interrupts again during the subsequent resume, respectively.  These
functions make it possible to keep timer interrupts enabled while the
"late" suspend and "early" resume callbacks provided by device
drivers are being executed.

Use these functions to rework the handling of interrupts during
suspend (hibernation) and resume.  Namely, interrupts will only be
disabled on the CPU right before suspending sysdevs, while device
drivers will be prevented from receiving interrupts, with the help of
the new helper function, before their "late" suspend callbacks run
(and analogously during resume).

In addition, since the device interrupts are now disabled before the
CPU has turned all interrupts off and the CPU will ACK the interrupts
setting the IRQ_PENDING bit for them, check in sysdev_suspend() if
any wake-up interrupts are pending and abort suspend if that's the
case.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 arch/x86/kernel/apm_32.c  |   15 ++++++--
 drivers/base/power/main.c |   20 ++++++-----
 drivers/base/sys.c        |    8 ++++
 drivers/xen/manage.c      |   16 +++++----
 include/linux/interrupt.h |    5 ++
 include/linux/irq.h       |    1 
 kernel/irq/Makefile       |    1 
 kernel/irq/internals.h    |    2 +
 kernel/irq/manage.c       |   31 +++++++++++++-----
 kernel/irq/pm.c           |   79 ++++++++++++++++++++++++++++++++++++++++++++++
 kernel/kexec.c            |    8 ++--
 kernel/power/disk.c       |   39 ++++++++++++++++------
 kernel/power/main.c       |   17 ++++++---
 13 files changed, 195 insertions(+), 47 deletions(-)

Index: linux-2.6/include/linux/interrupt.h
===================================================================
--- linux-2.6.orig/include/linux/interrupt.h
+++ linux-2.6/include/linux/interrupt.h
@@ -106,6 +106,11 @@ extern void disable_irq_nosync(unsigned 
 extern void disable_irq(unsigned int irq);
 extern void enable_irq(unsigned int irq);
 
+/* The following three functions are for the core kernel use only. */
+extern void suspend_device_irqs(void);
+extern void resume_device_irqs(void);
+extern int check_wakeup_irqs(void);
+
 #if defined(CONFIG_SMP) && defined(CONFIG_GENERIC_HARDIRQS)
 
 extern cpumask_var_t irq_default_affinity;
Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -287,17 +287,19 @@ void __attribute__ ((weak)) arch_suspend
  */
 static int suspend_enter(suspend_state_t state)
 {
-	int error = 0;
+	int error;
 
 	device_pm_lock();
-	arch_suspend_disable_irqs();
-	BUG_ON(!irqs_disabled());
 
-	if ((error = device_power_down(PMSG_SUSPEND))) {
+	error = device_power_down(PMSG_SUSPEND);
+	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down\n");
 		goto Done;
 	}
 
+	arch_suspend_disable_irqs();
+	BUG_ON(!irqs_disabled());
+
 	error = sysdev_suspend(PMSG_SUSPEND);
 	if (!error) {
 		if (!suspend_test(TEST_CORE))
@@ -305,11 +307,14 @@ static int suspend_enter(suspend_state_t
 		sysdev_resume();
 	}
 
-	device_power_up(PMSG_RESUME);
- Done:
 	arch_suspend_enable_irqs();
 	BUG_ON(irqs_disabled());
+
+	device_power_up(PMSG_RESUME);
+
+ Done:
 	device_pm_unlock();
+
 	return error;
 }
 
Index: linux-2.6/kernel/power/disk.c
===================================================================
--- linux-2.6.orig/kernel/power/disk.c
+++ linux-2.6/kernel/power/disk.c
@@ -214,7 +214,7 @@ static int create_image(int platform_mod
 		return error;
 
 	device_pm_lock();
-	local_irq_disable();
+
 	/* At this point, device_suspend() has been called, but *not*
 	 * device_power_down(). We *must* call device_power_down() now.
 	 * Otherwise, drivers for some devices (e.g. interrupt controllers)
@@ -225,8 +225,11 @@ static int create_image(int platform_mod
 	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down, "
 			"aborting hibernation\n");
-		goto Enable_irqs;
+		goto Unlock;
 	}
+
+	local_irq_disable();
+
 	sysdev_suspend(PMSG_FREEZE);
 	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down, "
@@ -252,12 +255,16 @@ static int create_image(int platform_mod
 	/* NOTE:  device_power_up() is just a resume() for devices
 	 * that suspended with irqs off ... no overall powerup.
 	 */
+
  Power_up_devices:
+	local_irq_enable();
+
 	device_power_up(in_suspend ?
 		(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);
- Enable_irqs:
-	local_irq_enable();
+
+ Unlock:
 	device_pm_unlock();
+
 	return error;
 }
 
@@ -336,13 +343,16 @@ static int resume_target_kernel(void)
 	int error;
 
 	device_pm_lock();
-	local_irq_disable();
+
 	error = device_power_down(PMSG_QUIESCE);
 	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down, "
 			"aborting resume\n");
-		goto Enable_irqs;
+		goto Unlock;
 	}
+
+	local_irq_disable();
+
 	sysdev_suspend(PMSG_QUIESCE);
 	/* We'll ignore saved state, but this gets preempt count (etc) right */
 	save_processor_state();
@@ -366,11 +376,16 @@ static int resume_target_kernel(void)
 	swsusp_free();
 	restore_processor_state();
 	touch_softlockup_watchdog();
+
 	sysdev_resume();
-	device_power_up(PMSG_RECOVER);
- Enable_irqs:
+
 	local_irq_enable();
+
+	device_power_up(PMSG_RECOVER);
+
+ Unlock:
 	device_pm_unlock();
+
 	return error;
 }
 
@@ -447,15 +462,16 @@ int hibernation_platform_enter(void)
 		goto Finish;
 
 	device_pm_lock();
-	local_irq_disable();
+
 	error = device_power_down(PMSG_HIBERNATE);
 	if (!error) {
+		local_irq_disable();
 		sysdev_suspend(PMSG_HIBERNATE);
 		hibernation_ops->enter();
 		/* We should never get here */
 		while (1);
 	}
-	local_irq_enable();
+
 	device_pm_unlock();
 
 	/*
@@ -464,12 +480,15 @@ int hibernation_platform_enter(void)
 	 */
  Finish:
 	hibernation_ops->finish();
+
  Resume_devices:
 	entering_platform_hibernation = false;
 	device_resume(PMSG_RESTORE);
 	resume_console();
+
  Close:
 	hibernation_ops->end();
+
 	return error;
 }
 
Index: linux-2.6/arch/x86/kernel/apm_32.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/apm_32.c
+++ linux-2.6/arch/x86/kernel/apm_32.c
@@ -1190,8 +1190,10 @@ static int suspend(int vetoable)
 	struct apm_user	*as;
 
 	device_suspend(PMSG_SUSPEND);
-	local_irq_disable();
+
 	device_power_down(PMSG_SUSPEND);
+
+	local_irq_disable();
 	sysdev_suspend(PMSG_SUSPEND);
 
 	local_irq_enable();
@@ -1209,9 +1211,12 @@ static int suspend(int vetoable)
 	if (err != APM_SUCCESS)
 		apm_error("suspend", err);
 	err = (err == APM_SUCCESS) ? 0 : -EIO;
+
 	sysdev_resume();
-	device_power_up(PMSG_RESUME);
 	local_irq_enable();
+
+	device_power_up(PMSG_RESUME);
+
 	device_resume(PMSG_RESUME);
 	queue_event(APM_NORMAL_RESUME, NULL);
 	spin_lock(&user_list_lock);
@@ -1228,8 +1233,9 @@ static void standby(void)
 {
 	int err;
 
-	local_irq_disable();
 	device_power_down(PMSG_SUSPEND);
+
+	local_irq_disable();
 	sysdev_suspend(PMSG_SUSPEND);
 	local_irq_enable();
 
@@ -1239,8 +1245,9 @@ static void standby(void)
 
 	local_irq_disable();
 	sysdev_resume();
-	device_power_up(PMSG_RESUME);
 	local_irq_enable();
+
+	device_power_up(PMSG_RESUME);
 }
 
 static apm_event_t get_event(void)
Index: linux-2.6/drivers/xen/manage.c
===================================================================
--- linux-2.6.orig/drivers/xen/manage.c
+++ linux-2.6/drivers/xen/manage.c
@@ -39,12 +39,6 @@ static int xen_suspend(void *data)
 
 	BUG_ON(!irqs_disabled());
 
-	err = device_power_down(PMSG_SUSPEND);
-	if (err) {
-		printk(KERN_ERR "xen_suspend: device_power_down failed: %d\n",
-		       err);
-		return err;
-	}
 	err = sysdev_suspend(PMSG_SUSPEND);
 	if (err) {
 		printk(KERN_ERR "xen_suspend: sysdev_suspend failed: %d\n",
@@ -69,7 +63,6 @@ static int xen_suspend(void *data)
 	xen_mm_unpin_all();
 
 	sysdev_resume();
-	device_power_up(PMSG_RESUME);
 
 	if (!*cancelled) {
 		xen_irq_resume();
@@ -108,6 +101,12 @@ static void do_suspend(void)
 	/* XXX use normal device tree? */
 	xenbus_suspend();
 
+	err = device_power_down(PMSG_SUSPEND);
+	if (err) {
+		printk(KERN_ERR "device_power_down failed: %d\n", err);
+		goto resume_devices;
+	}
+
 	err = stop_machine(xen_suspend, &cancelled, &cpumask_of_cpu(0));
 	if (err) {
 		printk(KERN_ERR "failed to start xen_suspend: %d\n", err);
@@ -120,6 +119,9 @@ static void do_suspend(void)
 	} else
 		xenbus_suspend_cancel();
 
+	device_power_up(PMSG_RESUME);
+
+resume_devices:
 	device_resume(PMSG_RESUME);
 
 	/* Make sure timer events get retriggered on all CPUs */
Index: linux-2.6/kernel/kexec.c
===================================================================
--- linux-2.6.orig/kernel/kexec.c
+++ linux-2.6/kernel/kexec.c
@@ -1454,7 +1454,6 @@ int kernel_kexec(void)
 		if (error)
 			goto Resume_devices;
 		device_pm_lock();
-		local_irq_disable();
 		/* At this point, device_suspend() has been called,
 		 * but *not* device_power_down(). We *must*
 		 * device_power_down() now.  Otherwise, drivers for
@@ -1464,8 +1463,9 @@ int kernel_kexec(void)
 		 */
 		error = device_power_down(PMSG_FREEZE);
 		if (error)
-			goto Enable_irqs;
+			goto Unlock_pm;
 
+		local_irq_disable();
 		/* Suspend system devices */
 		error = sysdev_suspend(PMSG_FREEZE);
 		if (error)
@@ -1484,9 +1484,9 @@ int kernel_kexec(void)
 	if (kexec_image->preserve_context) {
 		sysdev_resume();
  Power_up_devices:
-		device_power_up(PMSG_RESTORE);
- Enable_irqs:
 		local_irq_enable();
+		device_power_up(PMSG_RESTORE);
+ Unlock_pm:
 		device_pm_unlock();
 		enable_nonboot_cpus();
  Resume_devices:
Index: linux-2.6/include/linux/irq.h
===================================================================
--- linux-2.6.orig/include/linux/irq.h
+++ linux-2.6/include/linux/irq.h
@@ -65,6 +65,7 @@ typedef	void (*irq_flow_handler_t)(unsig
 #define IRQ_SPURIOUS_DISABLED	0x00800000	/* IRQ was disabled by the spurious trap */
 #define IRQ_MOVE_PCNTXT		0x01000000	/* IRQ migration from process context */
 #define IRQ_AFFINITY_SET	0x02000000	/* IRQ affinity was set from userspace*/
+#define IRQ_SUSPENDED		0x04000000	/* IRQ has gone through suspend sequence */
 
 #ifdef CONFIG_IRQ_PER_CPU
 # define CHECK_IRQ_PER_CPU(var) ((var) & IRQ_PER_CPU)
Index: linux-2.6/kernel/irq/pm.c
===================================================================
--- /dev/null
+++ linux-2.6/kernel/irq/pm.c
@@ -0,0 +1,79 @@
+/*
+ * linux/kernel/irq/pm.c
+ *
+ * Copyright (C) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
+ *
+ * This file contains power management functions related to interrupts.
+ */
+
+#include <linux/irq.h>
+#include <linux/module.h>
+#include <linux/interrupt.h>
+
+#include "internals.h"
+
+/**
+ * suspend_device_irqs - disable all currently enabled interrupt lines
+ *
+ * During system-wide suspend or hibernation device interrupts need to be
+ * disabled at the chip level and this function is provided for this purpose.
+ * It disables all interrupt lines that are enabled at the moment and sets the
+ * IRQ_SUSPENDED flag for them.
+ */
+void suspend_device_irqs(void)
+{
+	struct irq_desc *desc;
+	int irq;
+
+	for_each_irq_desc(irq, desc) {
+		unsigned long flags;
+
+		spin_lock_irqsave(&desc->lock, flags);
+		__disable_irq(desc, irq, true);
+		spin_unlock_irqrestore(&desc->lock, flags);
+	}
+
+	for_each_irq_desc(irq, desc)
+		if (desc->status & IRQ_SUSPENDED)
+			synchronize_irq(irq);
+}
+EXPORT_SYMBOL_GPL(suspend_device_irqs);
+
+/**
+ * resume_device_irqs - enable interrupt lines disabled by suspend_device_irqs()
+ *
+ * Enable all interrupt lines previously disabled by suspend_device_irqs() that
+ * have the IRQ_SUSPENDED flag set.
+ */
+void resume_device_irqs(void)
+{
+	struct irq_desc *desc;
+	int irq;
+
+	for_each_irq_desc(irq, desc) {
+		unsigned long flags;
+
+		if (!(desc->status & IRQ_SUSPENDED))
+			continue;
+
+		spin_lock_irqsave(&desc->lock, flags);
+		__enable_irq(desc, irq, true);
+		spin_unlock_irqrestore(&desc->lock, flags);
+	}
+}
+EXPORT_SYMBOL_GPL(resume_device_irqs);
+
+/**
+ * check_wakeup_irqs - check if any wake-up interrupts are pending
+ */
+int check_wakeup_irqs(void)
+{
+	struct irq_desc *desc;
+	int irq;
+
+	for_each_irq_desc(irq, desc)
+		if ((desc->status & IRQ_WAKEUP) && (desc->status & IRQ_PENDING))
+			return -EBUSY;
+
+	return 0;
+}
Index: linux-2.6/kernel/irq/Makefile
===================================================================
--- linux-2.6.orig/kernel/irq/Makefile
+++ linux-2.6/kernel/irq/Makefile
@@ -4,3 +4,4 @@ obj-$(CONFIG_GENERIC_IRQ_PROBE) += autop
 obj-$(CONFIG_PROC_FS) += proc.o
 obj-$(CONFIG_GENERIC_PENDING_IRQ) += migration.o
 obj-$(CONFIG_NUMA_MIGRATE_IRQ_DESC) += numa_migrate.o
+obj-$(CONFIG_PM_SLEEP) += pm.o
Index: linux-2.6/kernel/irq/manage.c
===================================================================
--- linux-2.6.orig/kernel/irq/manage.c
+++ linux-2.6/kernel/irq/manage.c
@@ -162,6 +162,20 @@ static inline int do_irq_select_affinity
 }
 #endif
 
+void __disable_irq(struct irq_desc *desc, unsigned int irq, bool suspend)
+{
+	if (suspend) {
+		if (desc->action && (desc->action->flags & IRQF_TIMER))
+			return;
+		desc->status |= IRQ_SUSPENDED;
+	}
+
+	if (!desc->depth++) {
+		desc->status |= IRQ_DISABLED;
+		desc->chip->disable(irq);
+	}
+}
+
 /**
  *	disable_irq_nosync - disable an irq without waiting
  *	@irq: Interrupt to disable
@@ -182,10 +196,7 @@ void disable_irq_nosync(unsigned int irq
 		return;
 
 	spin_lock_irqsave(&desc->lock, flags);
-	if (!desc->depth++) {
-		desc->status |= IRQ_DISABLED;
-		desc->chip->disable(irq);
-	}
+	__disable_irq(desc, irq, false);
 	spin_unlock_irqrestore(&desc->lock, flags);
 }
 EXPORT_SYMBOL(disable_irq_nosync);
@@ -215,15 +226,21 @@ void disable_irq(unsigned int irq)
 }
 EXPORT_SYMBOL(disable_irq);
 
-static void __enable_irq(struct irq_desc *desc, unsigned int irq)
+void __enable_irq(struct irq_desc *desc, unsigned int irq, bool resume)
 {
+	if (resume)
+		desc->status &= ~IRQ_SUSPENDED;
+
 	switch (desc->depth) {
 	case 0:
+ err_out:
 		WARN(1, KERN_WARNING "Unbalanced enable for IRQ %d\n", irq);
 		break;
 	case 1: {
 		unsigned int status = desc->status & ~IRQ_DISABLED;
 
+		if (desc->status & IRQ_SUSPENDED)
+			goto err_out;
 		/* Prevent probing on this irq: */
 		desc->status = status | IRQ_NOPROBE;
 		check_irq_resend(desc, irq);
@@ -253,7 +270,7 @@ void enable_irq(unsigned int irq)
 		return;
 
 	spin_lock_irqsave(&desc->lock, flags);
-	__enable_irq(desc, irq);
+	__enable_irq(desc, irq, false);
 	spin_unlock_irqrestore(&desc->lock, flags);
 }
 EXPORT_SYMBOL(enable_irq);
@@ -511,7 +528,7 @@ __setup_irq(unsigned int irq, struct irq
 	 */
 	if (shared && (desc->status & IRQ_SPURIOUS_DISABLED)) {
 		desc->status &= ~IRQ_SPURIOUS_DISABLED;
-		__enable_irq(desc, irq);
+		__enable_irq(desc, irq, false);
 	}
 
 	spin_unlock_irqrestore(&desc->lock, flags);
Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -23,6 +23,7 @@
 #include <linux/pm.h>
 #include <linux/resume-trace.h>
 #include <linux/rwsem.h>
+#include <linux/interrupt.h>
 
 #include "../base.h"
 #include "power.h"
@@ -305,7 +306,8 @@ static int resume_device_noirq(struct de
  *	Execute the appropriate "noirq resume" callback for all devices marked
  *	as DPM_OFF_IRQ.
  *
- *	Must be called with interrupts disabled and only one CPU running.
+ *	Must be called under dpm_list_mtx.  Device drivers should not receive
+ *	interrupts while it's being executed.
  */
 static void dpm_power_up(pm_message_t state)
 {
@@ -326,14 +328,13 @@ static void dpm_power_up(pm_message_t st
  *	device_power_up - Turn on all devices that need special attention.
  *	@state: PM transition of the system being carried out.
  *
- *	Power on system devices, then devices that required we shut them down
- *	with interrupts disabled.
- *
- *	Must be called with interrupts disabled.
+ *	Call the "early" resume handlers and enable device drivers to receive
+ *	interrupts.
  */
 void device_power_up(pm_message_t state)
 {
 	dpm_power_up(state);
+	resume_device_irqs();
 }
 EXPORT_SYMBOL_GPL(device_power_up);
 
@@ -558,16 +559,17 @@ static int suspend_device_noirq(struct d
  *	device_power_down - Shut down special devices.
  *	@state: PM transition of the system being carried out.
  *
- *	Power down devices that require interrupts to be disabled.
- *	Then power down system devices.
+ *	Prevent device drivers from receiving interrupts and call the "late"
+ *	suspend handlers.
  *
- *	Must be called with interrupts disabled and only one CPU running.
+ *	Must be called under dpm_list_mtx.
  */
 int device_power_down(pm_message_t state)
 {
 	struct device *dev;
 	int error = 0;
 
+	suspend_device_irqs();
 	list_for_each_entry_reverse(dev, &dpm_list, power.entry) {
 		error = suspend_device_noirq(dev, state);
 		if (error) {
@@ -577,7 +579,7 @@ int device_power_down(pm_message_t state
 		dev->power.status = DPM_OFF_IRQ;
 	}
 	if (error)
-		dpm_power_up(resume_event(state));
+		device_power_up(resume_event(state));
 	return error;
 }
 EXPORT_SYMBOL_GPL(device_power_down);
Index: linux-2.6/drivers/base/sys.c
===================================================================
--- linux-2.6.orig/drivers/base/sys.c
+++ linux-2.6/drivers/base/sys.c
@@ -22,6 +22,7 @@
 #include <linux/pm.h>
 #include <linux/device.h>
 #include <linux/mutex.h>
+#include <linux/interrupt.h>
 
 #include "base.h"
 
@@ -369,6 +370,13 @@ int sysdev_suspend(pm_message_t state)
 	struct sysdev_driver *drv, *err_drv;
 	int ret;
 
+	pr_debug("Checking wake-up interrupts\n");
+
+	/* Return error code if there are any wake-up interrupts pending */
+	ret = check_wakeup_irqs();
+	if (ret)
+		return ret;
+
 	pr_debug("Suspending System Devices\n");
 
 	list_for_each_entry_reverse(cls, &system_kset->list, kset.kobj.entry) {
Index: linux-2.6/kernel/irq/internals.h
===================================================================
--- linux-2.6.orig/kernel/irq/internals.h
+++ linux-2.6/kernel/irq/internals.h
@@ -12,6 +12,8 @@ extern void compat_irq_chip_set_default_
 
 extern int __irq_set_trigger(struct irq_desc *desc, unsigned int irq,
 		unsigned long flags);
+extern void __disable_irq(struct irq_desc *desc, unsigned int irq, bool susp);
+extern void __enable_irq(struct irq_desc *desc, unsigned int irq, bool resume);
 
 extern struct lock_class_key irq_desc_lock_class;
 extern void init_kstat_irqs(struct irq_desc *desc, int cpu, int nr);
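
[Editorial sketch, not part of the patch.]  The interplay between the disable
depth counter and the new IRQ_SUSPENDED flag in __disable_irq() and
__enable_irq() can be modelled in user space.  This is a minimal sketch that
assumes a hypothetical struct model_irq_desc and MODEL_* flag values; it
leaves out the chip callbacks, locking, and the resend logic of the real code.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values only; the real ones are in include/linux/irq.h. */
#define MODEL_IRQ_DISABLED  0x1u
#define MODEL_IRQ_SUSPENDED 0x2u

struct model_irq_desc {
	unsigned int status;	/* MODEL_IRQ_* flags */
	unsigned int depth;	/* nested disable count, 0 == enabled */
	bool is_timer;		/* stands in for action->flags & IRQF_TIMER */
};

/* Follows the shape of __disable_irq() in the patch. */
static void model_disable_irq(struct model_irq_desc *desc, bool suspend)
{
	if (suspend) {
		if (desc->is_timer)
			return;		/* timer lines stay enabled */
		desc->status |= MODEL_IRQ_SUSPENDED;
	}

	if (!desc->depth++)
		desc->status |= MODEL_IRQ_DISABLED;
}

/* Follows __enable_irq(); returns false where the kernel would WARN. */
static bool model_enable_irq(struct model_irq_desc *desc, bool resume)
{
	if (resume)
		desc->status &= ~MODEL_IRQ_SUSPENDED;

	switch (desc->depth) {
	case 0:
		return false;		/* unbalanced enable */
	case 1:
		if (desc->status & MODEL_IRQ_SUSPENDED)
			return false;	/* enable attempted on a suspended line */
		desc->status &= ~MODEL_IRQ_DISABLED;
		/* fall through */
	default:
		desc->depth--;
	}

	return true;
}
```

In this model a line that a driver had already disabled before suspend comes
back from resume with depth 1 and is still disabled, so the driver's own
disable/enable pairing is preserved.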

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [update, rev. 6] Re: [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5)
  2009-03-12 21:43               ` Rafael J. Wysocki
  2009-03-13  0:39                 ` Ingo Molnar
@ 2009-03-13  0:39                 ` Ingo Molnar
  2009-03-13 17:07                     ` Rafael J. Wysocki
  2009-03-13  7:15                   ` Arve Hjønnevåg
                                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 373+ messages in thread
From: Ingo Molnar @ 2009-03-13  0:39 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Thomas Gleixner, pm list, LKML, Linus Torvalds,
	Eric W. Biederman, Benjamin Herrenschmidt, Jeremy Fitzhardinge,
	Len Brown, Jesse Barnes, Frans Pop, Arve Hjønnevåg


* Rafael J. Wysocki <rjw@sisk.pl> wrote:

> On Thursday 12 March 2009, Rafael J. Wysocki wrote:
> > On Wednesday 11 March 2009, Thomas Gleixner wrote:
> > > On Wed, 11 Mar 2009, Thomas Gleixner wrote:
> > > > On Wed, 11 Mar 2009, Rafael J. Wysocki wrote:
> > > > > On Wednesday 11 March 2009, Thomas Gleixner wrote:
> > > > > > > +EXPORT_SYMBOL_GPL(suspend_device_irqs);
> > > > > > 
> > > > > >   I'm not too enthusiastic about this open coded implementation of
> > > > > >   disable_irq() with slightly different semantics.
> > > > > 
> > > > > The difference in semantics is important IMO, otherwise I wouldn't have
> > > > > done that.  In particular, the condition should be under the spinlock
> > > > > and I'd rather not synchronize all interrupts we don't really disable here.
> > > > 
> > > > I don't say that the difference is not relevant. But the code is
> > > > almost the same and disable_irq() could have the sync_irq optimization
> > > > as well.
> > > 
> > > Thought more about that. Avoiding the sync_irq() for irqs which have
> > > no action associated is fine, but you need to catch the following case
> > > as well:
> > > 
> > >    driver code calls disable_irq_nosync() from the handler (which is
> > >    still running)
> > > 
> > >    suspend code skips the sync due to depth > 0
> > > 
> > > The sync operation is not that expensive.
> > 
> > OK, what about this (untested, irrelevant parts skipped)?
> 
> Well, I guess I need to assume that no reaction means it's fine. ;-)
> 
> Below is the complete patch.  Thomas, Ingo, please let me know 
> if it is fine with you.

looks good - but you sure want to split it up some more, right?

> 13 files changed, 195 insertions(+), 47 deletions(-)

We want the non-intrusive 'add new APIs' bits [which give most 
of the linecount] separated from the 'all hell breaks loose' 
functional changes ;-) Makes it easier to revert, bisect, etc.

	Ingo

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [update, rev. 6] Re: [PATCH 1/10] PM: Rework handling of  interrupts during suspend-resume (rev. 5)
  2009-03-12 21:43               ` Rafael J. Wysocki
@ 2009-03-13  7:15                   ` Arve Hjønnevåg
  2009-03-13  0:39                 ` Ingo Molnar
                                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 373+ messages in thread
From: Arve Hjønnevåg @ 2009-03-13  7:15 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Thomas Gleixner, Ingo Molnar, pm list, LKML, Linus Torvalds,
	Eric W. Biederman, Benjamin Herrenschmidt, Jeremy Fitzhardinge,
	Len Brown, Jesse Barnes, Frans Pop

On Thu, Mar 12, 2009 at 1:43 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
>
> +void __disable_irq(struct irq_desc *desc, unsigned int irq, bool suspend)
> +{
> +       if (suspend) {
> +               if (desc->action && (desc->action->flags & IRQF_TIMER))
> +                       return;

Don't you want "(!desc->action || ..." here to avoid enabling unused
interrupts on resume?

-- 
Arve Hjønnevåg
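
[Editorial sketch, not part of the original message.]  The two conditions
under discussion differ only for interrupt lines that have no handler
installed.  A minimal user-space comparison, assuming a hypothetical
struct model_irqaction and an illustrative MODEL_IRQF_TIMER value:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_IRQF_TIMER 0x1u	/* illustrative value only */

struct model_irqaction {
	unsigned int flags;
};

/* Condition as posted in rev. 6: only timer lines are skipped. */
static bool skip_posted(const struct model_irqaction *action)
{
	return action && (action->flags & MODEL_IRQF_TIMER);
}

/* Condition as suggested in the reply: lines with no handler are skipped too. */
static bool skip_suggested(const struct model_irqaction *action)
{
	return !action || (action->flags & MODEL_IRQF_TIMER);
}
```

With a NULL action the posted condition does not skip the line, so it would be
flagged IRQ_SUSPENDED and touched again on resume; the suggested condition
leaves such a line alone.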

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [update, rev. 6] Re: [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5)
  2009-03-13  7:15                   ` Arve Hjønnevåg
  (?)
@ 2009-03-13 16:53                   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-13 16:53 UTC (permalink / raw)
  To: Arve Hjønnevåg
  Cc: Thomas Gleixner, Ingo Molnar, pm list, LKML, Linus Torvalds,
	Eric W. Biederman, Benjamin Herrenschmidt, Jeremy Fitzhardinge,
	Len Brown, Jesse Barnes, Frans Pop

On Friday 13 March 2009, Arve Hjønnevåg wrote:
> On Thu, Mar 12, 2009 at 1:43 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> >
> > +void __disable_irq(struct irq_desc *desc, unsigned int irq, bool suspend)
> > +{
> > +       if (suspend) {
> > +               if (desc->action && (desc->action->flags & IRQF_TIMER))
> > +                       return;
> 
> Don't you want "(!desc->action || ..." here to avoid enabling unused
> interrupts on resume?

Hmm, good idea, thanks.

Best,
Rafael

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [update, rev. 6] Re: [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5)
  2009-03-13  0:39                 ` Ingo Molnar
@ 2009-03-13 17:07                     ` Rafael J. Wysocki
  0 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-13 17:07 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, pm list, LKML, Linus Torvalds,
	Eric W. Biederman, Benjamin Herrenschmidt, Jeremy Fitzhardinge,
	Len Brown, Jesse Barnes, Frans Pop, Arve Hjønnevåg

On Friday 13 March 2009, Ingo Molnar wrote:
> 
> * Rafael J. Wysocki <rjw@sisk.pl> wrote:
> 
> > On Thursday 12 March 2009, Rafael J. Wysocki wrote:
> > > On Wednesday 11 March 2009, Thomas Gleixner wrote:
> > > > On Wed, 11 Mar 2009, Thomas Gleixner wrote:
> > > > > On Wed, 11 Mar 2009, Rafael J. Wysocki wrote:
> > > > > > On Wednesday 11 March 2009, Thomas Gleixner wrote:
> > > > > > > > +EXPORT_SYMBOL_GPL(suspend_device_irqs);
> > > > > > > 
> > > > > > >   I'm not too enthusiastic about this open coded implementation of
> > > > > > >   disable_irq() with slightly different semantics.
> > > > > > 
> > > > > > > The difference in semantics is important IMO, otherwise I wouldn't have
> > > > > > > done that.  In particular, the condition should be checked under the spinlock
> > > > > > > and I'd rather not synchronize all interrupts we don't really disable here.
> > > > > 
> > > > > I don't say that the difference is not relevant. But the code is
> > > > > almost the same and disable_irq() could have the sync_irq optimization
> > > > > as well.
> > > > 
> > > > Thought more about that. Avoiding the sync_irq() for irqs which have
> > > > no action associated is fine, but you need to catch the following case
> > > > as well:
> > > > 
> > > > >    driver code calls disable_irq_nosync() from the handler (which is
> > > >    still running)
> > > > 
> > > >    suspend code skips the sync due to depth > 0
> > > > 
> > > > The sync operation is not that expensive.
> > > 
> > > OK, what about this (untested, irrelevant parts skipped)?
> > 
> > Well, I guess I need to assume that no reaction means it's fine. ;-)
> > 
> > Below is the complete patch.  Thomas, Ingo, please let me know 
> > if it is fine with you.
> 
> looks good - but you sure want to split it up some more, right?

Well, in fact I didn't think about that.

> > 13 files changed, 195 insertions(+), 47 deletions(-)
> 
> We want the non-intrusive 'add new APIs' bits [which give most 
> of the linecount] separated from the 'all hell breaks loose'
> functional changes ;-) Makes it easier to revert, bisect, etc.

I can split it into a patch adding the new functions under kernel/irq and
another one making the suspend code use them, but that's going
to put the new functions somewhat out of context, IMO.

Still, I'll do it if you want me to.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [update, rev. 6] Re: [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5)
@ 2009-03-13 17:07                     ` Rafael J. Wysocki
  0 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-13 17:07 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arve, Jeremy Fitzhardinge, Frans Pop, LKML, Jesse Barnes,
	Eric W. Biederman, Thomas Gleixner, Linus Torvalds, pm list

On Friday 13 March 2009, Ingo Molnar wrote:
> 
> * Rafael J. Wysocki <rjw@sisk.pl> wrote:
> 
> > On Thursday 12 March 2009, Rafael J. Wysocki wrote:
> > > On Wednesday 11 March 2009, Thomas Gleixner wrote:
> > > > On Wed, 11 Mar 2009, Thomas Gleixner wrote:
> > > > > On Wed, 11 Mar 2009, Rafael J. Wysocki wrote:
> > > > > > On Wednesday 11 March 2009, Thomas Gleixner wrote:
> > > > > > > > +EXPORT_SYMBOL_GPL(suspend_device_irqs);
> > > > > > > 
> > > > > > >   I'm not too enthusiastic about this open coded implementation of
> > > > > > >   disable_irq() with slightly different semantics.
> > > > > > 
> > > > > > > The difference in semantics is important IMO, otherwise I wouldn't have
> > > > > > > done that.  In particular, the condition should be checked under the spinlock
> > > > > > > and I'd rather not synchronize all interrupts we don't really disable here.
> > > > > 
> > > > > I don't say that the difference is not relevant. But the code is
> > > > > almost the same and disable_irq() could have the sync_irq optimization
> > > > > as well.
> > > > 
> > > > Thought more about that. Avoiding the sync_irq() for irqs which have
> > > > no action associated is fine, but you need to catch the following case
> > > > as well:
> > > > 
> > > > >    driver code calls disable_irq_nosync() from the handler (which is
> > > >    still running)
> > > > 
> > > >    suspend code skips the sync due to depth > 0
> > > > 
> > > > The sync operation is not that expensive.
> > > 
> > > OK, what about this (untested, irrelevant parts skipped)?
> > 
> > Well, I guess I need to assume that no reaction means it's fine. ;-)
> > 
> > Below is the complete patch.  Thomas, Ingo, please let me know 
> > if it is fine with you.
> 
> looks good - but you sure want to split it up some more, right?

Well, in fact I didn't think about that.

> > 13 files changed, 195 insertions(+), 47 deletions(-)
> 
> We want the non-intrusive 'add new APIs' bits [which give most 
> of the linecount] separated from the 'all hell breaks loose'
> functional changes ;-) Makes it easier to revert, bisect, etc.

I can split it into a patch adding the new functions under kernel/irq and
another one making the suspend code use them, but that's going
to put the new functions somewhat out of context, IMO.

Still, I'll do it if you want me to.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [update, rev. 6] Re: [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5)
  2009-03-12 21:43               ` Rafael J. Wysocki
                                   ` (3 preceding siblings ...)
  2009-03-13 19:55                 ` Thomas Gleixner
@ 2009-03-13 19:55                 ` Thomas Gleixner
  2009-03-13 21:56                   ` Rafael J. Wysocki
                                     ` (3 more replies)
  4 siblings, 4 replies; 373+ messages in thread
From: Thomas Gleixner @ 2009-03-13 19:55 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Ingo Molnar, pm list, LKML, Linus Torvalds, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, Len Brown,
	Jesse Barnes, Frans Pop, Arve Hjønnevåg

On Thu, 12 Mar 2009, Rafael J. Wysocki wrote:
> +/**
> + * suspend_device_irqs - disable all currently enabled interrupt lines
> + *
> + * During system-wide suspend or hibernation device interrupts need to be
> + * disabled at the chip level and this function is provided for this purpose.
> + * It disables all interrupt lines that are enabled at the moment and sets the
> + * IRQ_SUSPENDED flag for them.
> + */
> +void suspend_device_irqs(void)
> +{
> +	struct irq_desc *desc;
> +	int irq;
> +
> +	for_each_irq_desc(irq, desc) {
> +		unsigned long flags;
> +
> +		spin_lock_irqsave(&desc->lock, flags);
> +		__disable_irq(desc, irq, true);
> +		spin_unlock_irqrestore(&desc->lock, flags);

Can we move the locking into __disable_irq ?

> +	}
> +
> +	for_each_irq_desc(irq, desc)
> +		if (desc->status & IRQ_SUSPENDED)
> +			synchronize_irq(irq);
> +}
> +EXPORT_SYMBOL_GPL(suspend_device_irqs);
> +
> +/**
> + * resume_device_irqs - enable interrupt lines disabled by suspend_device_irqs()
> + *
> + * Enable all interrupt lines previously disabled by suspend_device_irqs() that
> + * have the IRQ_SUSPENDED flag set.
> + */
> +void resume_device_irqs(void)
> +{
> +	struct irq_desc *desc;
> +	int irq;
> +
> +	for_each_irq_desc(irq, desc) {
> +		unsigned long flags;
> +
> +		if (!(desc->status & IRQ_SUSPENDED))
> +			continue;
> +
> +		spin_lock_irqsave(&desc->lock, flags);
> +		__enable_irq(desc, irq, true);
> +		spin_unlock_irqrestore(&desc->lock, flags);

Ditto

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [update, rev. 6] Re: [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5)
  2009-03-12 21:43               ` Rafael J. Wysocki
                                   ` (2 preceding siblings ...)
  2009-03-13  7:15                   ` Arve Hjønnevåg
@ 2009-03-13 19:55                 ` Thomas Gleixner
  2009-03-13 19:55                 ` Thomas Gleixner
  4 siblings, 0 replies; 373+ messages in thread
From: Thomas Gleixner @ 2009-03-13 19:55 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Jeremy Fitzhardinge, Frans Pop, LKML, Jesse Barnes,
	Eric W. Biederman, Ingo Molnar, Linus Torvalds, pm list

On Thu, 12 Mar 2009, Rafael J. Wysocki wrote:
> +/**
> + * suspend_device_irqs - disable all currently enabled interrupt lines
> + *
> + * During system-wide suspend or hibernation device interrupts need to be
> + * disabled at the chip level and this function is provided for this purpose.
> + * It disables all interrupt lines that are enabled at the moment and sets the
> + * IRQ_SUSPENDED flag for them.
> + */
> +void suspend_device_irqs(void)
> +{
> +	struct irq_desc *desc;
> +	int irq;
> +
> +	for_each_irq_desc(irq, desc) {
> +		unsigned long flags;
> +
> +		spin_lock_irqsave(&desc->lock, flags);
> +		__disable_irq(desc, irq, true);
> +		spin_unlock_irqrestore(&desc->lock, flags);

Can we move the locking into __disable_irq ?

> +	}
> +
> +	for_each_irq_desc(irq, desc)
> +		if (desc->status & IRQ_SUSPENDED)
> +			synchronize_irq(irq);
> +}
> +EXPORT_SYMBOL_GPL(suspend_device_irqs);
> +
> +/**
> + * resume_device_irqs - enable interrupt lines disabled by suspend_device_irqs()
> + *
> + * Enable all interrupt lines previously disabled by suspend_device_irqs() that
> + * have the IRQ_SUSPENDED flag set.
> + */
> +void resume_device_irqs(void)
> +{
> +	struct irq_desc *desc;
> +	int irq;
> +
> +	for_each_irq_desc(irq, desc) {
> +		unsigned long flags;
> +
> +		if (!(desc->status & IRQ_SUSPENDED))
> +			continue;
> +
> +		spin_lock_irqsave(&desc->lock, flags);
> +		__enable_irq(desc, irq, true);
> +		spin_unlock_irqrestore(&desc->lock, flags);

Ditto

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [update, rev. 6] Re: [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5)
  2009-03-13 19:55                 ` Thomas Gleixner
  2009-03-13 21:56                   ` Rafael J. Wysocki
@ 2009-03-13 21:56                   ` Rafael J. Wysocki
  2009-03-14  7:31                     ` Thomas Gleixner
  2009-03-14  7:31                     ` Thomas Gleixner
  2009-03-14  0:04                   ` Rafael J. Wysocki
  2009-03-14  0:04                   ` Rafael J. Wysocki
  3 siblings, 2 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-13 21:56 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, pm list, LKML, Linus Torvalds, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, Len Brown,
	Jesse Barnes, Frans Pop, Arve Hjønnevåg

On Friday 13 March 2009, Thomas Gleixner wrote:
> On Thu, 12 Mar 2009, Rafael J. Wysocki wrote:
> > +/**
> > + * suspend_device_irqs - disable all currently enabled interrupt lines
> > + *
> > + * During system-wide suspend or hibernation device interrupts need to be
> > + * disabled at the chip level and this function is provided for this purpose.
> > + * It disables all interrupt lines that are enabled at the moment and sets the
> > + * IRQ_SUSPENDED flag for them.
> > + */
> > +void suspend_device_irqs(void)
> > +{
> > +	struct irq_desc *desc;
> > +	int irq;
> > +
> > +	for_each_irq_desc(irq, desc) {
> > +		unsigned long flags;
> > +
> > +		spin_lock_irqsave(&desc->lock, flags);
> > +		__disable_irq(desc, irq, true);
> > +		spin_unlock_irqrestore(&desc->lock, flags);
> 
> Can we move the locking into __disable_irq ?

Well, yes, but (see below)

> > +	}
> > +
> > +	for_each_irq_desc(irq, desc)
> > +		if (desc->status & IRQ_SUSPENDED)
> > +			synchronize_irq(irq);
> > +}
> > +EXPORT_SYMBOL_GPL(suspend_device_irqs);
> > +
> > +/**
> > + * resume_device_irqs - enable interrupt lines disabled by suspend_device_irqs()
> > + *
> > + * Enable all interrupt lines previously disabled by suspend_device_irqs() that
> > + * have the IRQ_SUSPENDED flag set.
> > + */
> > +void resume_device_irqs(void)
> > +{
> > +	struct irq_desc *desc;
> > +	int irq;
> > +
> > +	for_each_irq_desc(irq, desc) {
> > +		unsigned long flags;
> > +
> > +		if (!(desc->status & IRQ_SUSPENDED))
> > +			continue;
> > +
> > +		spin_lock_irqsave(&desc->lock, flags);
> > +		__enable_irq(desc, irq, true);
> > +		spin_unlock_irqrestore(&desc->lock, flags);
> 
> Ditto

No, because of __setup_irq().

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [update, rev. 6] Re: [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5)
  2009-03-13 19:55                 ` Thomas Gleixner
@ 2009-03-13 21:56                   ` Rafael J. Wysocki
  2009-03-13 21:56                   ` Rafael J. Wysocki
                                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-13 21:56 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Arve, Jeremy Fitzhardinge, Frans Pop, LKML, Jesse Barnes,
	Eric W. Biederman, Ingo Molnar, Linus Torvalds, pm list

On Friday 13 March 2009, Thomas Gleixner wrote:
> On Thu, 12 Mar 2009, Rafael J. Wysocki wrote:
> > +/**
> > + * suspend_device_irqs - disable all currently enabled interrupt lines
> > + *
> > + * During system-wide suspend or hibernation device interrupts need to be
> > + * disabled at the chip level and this function is provided for this purpose.
> > + * It disables all interrupt lines that are enabled at the moment and sets the
> > + * IRQ_SUSPENDED flag for them.
> > + */
> > +void suspend_device_irqs(void)
> > +{
> > +	struct irq_desc *desc;
> > +	int irq;
> > +
> > +	for_each_irq_desc(irq, desc) {
> > +		unsigned long flags;
> > +
> > +		spin_lock_irqsave(&desc->lock, flags);
> > +		__disable_irq(desc, irq, true);
> > +		spin_unlock_irqrestore(&desc->lock, flags);
> 
> Can we move the locking into __disable_irq ?

Well, yes, but (see below)

> > +	}
> > +
> > +	for_each_irq_desc(irq, desc)
> > +		if (desc->status & IRQ_SUSPENDED)
> > +			synchronize_irq(irq);
> > +}
> > +EXPORT_SYMBOL_GPL(suspend_device_irqs);
> > +
> > +/**
> > + * resume_device_irqs - enable interrupt lines disabled by suspend_device_irqs()
> > + *
> > + * Enable all interrupt lines previously disabled by suspend_device_irqs() that
> > + * have the IRQ_SUSPENDED flag set.
> > + */
> > +void resume_device_irqs(void)
> > +{
> > +	struct irq_desc *desc;
> > +	int irq;
> > +
> > +	for_each_irq_desc(irq, desc) {
> > +		unsigned long flags;
> > +
> > +		if (!(desc->status & IRQ_SUSPENDED))
> > +			continue;
> > +
> > +		spin_lock_irqsave(&desc->lock, flags);
> > +		__enable_irq(desc, irq, true);
> > +		spin_unlock_irqrestore(&desc->lock, flags);
> 
> Ditto

No, because of __setup_irq().

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [update, rev. 6] Re: [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5)
  2009-03-13 19:55                 ` Thomas Gleixner
  2009-03-13 21:56                   ` Rafael J. Wysocki
  2009-03-13 21:56                   ` Rafael J. Wysocki
@ 2009-03-14  0:04                   ` Rafael J. Wysocki
  2009-03-14  0:04                   ` Rafael J. Wysocki
  3 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-14  0:04 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar
  Cc: pm list, LKML, Linus Torvalds, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, Len Brown,
	Jesse Barnes, Frans Pop, Arve Hjønnevåg

On Friday 13 March 2009, Thomas Gleixner wrote:
> On Thu, 12 Mar 2009, Rafael J. Wysocki wrote:
> > +/**
> > + * suspend_device_irqs - disable all currently enabled interrupt lines
> > + *
> > + * During system-wide suspend or hibernation device interrupts need to be
> > + * disabled at the chip level and this function is provided for this purpose.
> > + * It disables all interrupt lines that are enabled at the moment and sets the
> > + * IRQ_SUSPENDED flag for them.
> > + */
> > +void suspend_device_irqs(void)
> > +{
> > +	struct irq_desc *desc;
> > +	int irq;
> > +
> > +	for_each_irq_desc(irq, desc) {
> > +		unsigned long flags;
> > +
> > +		spin_lock_irqsave(&desc->lock, flags);
> > +		__disable_irq(desc, irq, true);
> > +		spin_unlock_irqrestore(&desc->lock, flags);
> 
> Can we move the locking into __disable_irq ?
> 
> > +	}
> > +
> > +	for_each_irq_desc(irq, desc)
> > +		if (desc->status & IRQ_SUSPENDED)
> > +			synchronize_irq(irq);
> > +}
> > +EXPORT_SYMBOL_GPL(suspend_device_irqs);
> > +
> > +/**
> > + * resume_device_irqs - enable interrupt lines disabled by suspend_device_irqs()
> > + *
> > + * Enable all interrupt lines previously disabled by suspend_device_irqs() that
> > + * have the IRQ_SUSPENDED flag set.
> > + */
> > +void resume_device_irqs(void)
> > +{
> > +	struct irq_desc *desc;
> > +	int irq;
> > +
> > +	for_each_irq_desc(irq, desc) {
> > +		unsigned long flags;
> > +
> > +		if (!(desc->status & IRQ_SUSPENDED))
> > +			continue;
> > +
> > +		spin_lock_irqsave(&desc->lock, flags);
> > +		__enable_irq(desc, irq, true);
> > +		spin_unlock_irqrestore(&desc->lock, flags);
> 
> Ditto

Well, I guess you'd prefer something like the appended patch, but Ingo probably
won't like it since it contains additional #ifdefs in irq/manage.c .  Sigh.

Thanks,
Rafael

---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: PM: Introduce functions for suspending and resuming device interrupts

Introduce two helper functions allowing us to prevent device drivers
from getting any interrupts (without disabling interrupts on the CPU)
during suspend (or hibernation) and to make them start to receive
interrupts again during the subsequent resume, respectively.  These
functions make it possible to keep timer interrupts enabled while the
"late" suspend and "early" resume callbacks provided by device
drivers are being executed.

These functions will be used to rework the handling of interrupts
during suspend (hibernation) and resume.  Namely, interrupts will
only be disabled on the CPU right before suspending sysdevs, while
device drivers will be prevented from receiving interrupts, with the
help of the new helper function, before their "late" suspend
callbacks run (and analogously during resume).

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 include/linux/interrupt.h |    5 +++
 include/linux/irq.h       |    1 
 kernel/irq/Makefile       |    1 
 kernel/irq/internals.h    |    2 +
 kernel/irq/manage.c       |   45 ++++++++++++++++++++++++++++---
 kernel/irq/pm.c           |   66 ++++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 116 insertions(+), 4 deletions(-)

Index: linux-2.6/include/linux/irq.h
===================================================================
--- linux-2.6.orig/include/linux/irq.h
+++ linux-2.6/include/linux/irq.h
@@ -65,6 +65,7 @@ typedef	void (*irq_flow_handler_t)(unsig
 #define IRQ_SPURIOUS_DISABLED	0x00800000	/* IRQ was disabled by the spurious trap */
 #define IRQ_MOVE_PCNTXT		0x01000000	/* IRQ migration from process context */
 #define IRQ_AFFINITY_SET	0x02000000	/* IRQ affinity was set from userspace*/
+#define IRQ_SUSPENDED		0x04000000	/* IRQ has gone through suspend sequence */
 
 #ifdef CONFIG_IRQ_PER_CPU
 # define CHECK_IRQ_PER_CPU(var) ((var) & IRQ_PER_CPU)
Index: linux-2.6/kernel/irq/manage.c
===================================================================
--- linux-2.6.orig/kernel/irq/manage.c
+++ linux-2.6/kernel/irq/manage.c
@@ -162,6 +162,14 @@ static inline int do_irq_select_affinity
 }
 #endif
 
+static void __disable_irq(struct irq_desc *desc, unsigned int irq)
+{
+	if (!desc->depth++) {
+		desc->status |= IRQ_DISABLED;
+		desc->chip->disable(irq);
+	}
+}
+
 /**
  *	disable_irq_nosync - disable an irq without waiting
  *	@irq: Interrupt to disable
@@ -182,10 +190,7 @@ void disable_irq_nosync(unsigned int irq
 		return;
 
 	spin_lock_irqsave(&desc->lock, flags);
-	if (!desc->depth++) {
-		desc->status |= IRQ_DISABLED;
-		desc->chip->disable(irq);
-	}
+	__disable_irq(desc, irq);
 	spin_unlock_irqrestore(&desc->lock, flags);
 }
 EXPORT_SYMBOL(disable_irq_nosync);
@@ -215,15 +220,32 @@ void disable_irq(unsigned int irq)
 }
 EXPORT_SYMBOL(disable_irq);
 
+#ifdef CONFIG_PM_SLEEP
+void suspend_irq(struct irq_desc *desc, unsigned int irq)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&desc->lock, flags);
+	if (desc->action && !(desc->action->flags & IRQF_TIMER)) {
+		__disable_irq(desc, irq);
+		desc->status |= IRQ_SUSPENDED;
+	}
+	spin_unlock_irqrestore(&desc->lock, flags);
+}
+#endif
+
 static void __enable_irq(struct irq_desc *desc, unsigned int irq)
 {
 	switch (desc->depth) {
 	case 0:
+ err_out:
 		WARN(1, KERN_WARNING "Unbalanced enable for IRQ %d\n", irq);
 		break;
 	case 1: {
 		unsigned int status = desc->status & ~IRQ_DISABLED;
 
+		if (desc->status & IRQ_SUSPENDED)
+			goto err_out;
 		/* Prevent probing on this irq: */
 		desc->status = status | IRQ_NOPROBE;
 		check_irq_resend(desc, irq);
@@ -258,6 +280,21 @@ void enable_irq(unsigned int irq)
 }
 EXPORT_SYMBOL(enable_irq);
 
+#ifdef CONFIG_PM_SLEEP
+void resume_irq(struct irq_desc *desc, unsigned int irq)
+{
+	unsigned long flags;
+
+	if (!(desc->status & IRQ_SUSPENDED))
+		return;
+
+	spin_lock_irqsave(&desc->lock, flags);
+	desc->status &= ~IRQ_SUSPENDED;
+	__enable_irq(desc, irq);
+	spin_unlock_irqrestore(&desc->lock, flags);
+}
+#endif
+
 static int set_irq_wake_real(unsigned int irq, unsigned int on)
 {
 	struct irq_desc *desc = irq_to_desc(irq);
Index: linux-2.6/kernel/irq/internals.h
===================================================================
--- linux-2.6.orig/kernel/irq/internals.h
+++ linux-2.6/kernel/irq/internals.h
@@ -12,6 +12,8 @@ extern void compat_irq_chip_set_default_
 
 extern int __irq_set_trigger(struct irq_desc *desc, unsigned int irq,
 		unsigned long flags);
+extern void suspend_irq(struct irq_desc *desc, unsigned int irq);
+extern void resume_irq(struct irq_desc *desc, unsigned int irq);
 
 extern struct lock_class_key irq_desc_lock_class;
 extern void init_kstat_irqs(struct irq_desc *desc, int cpu, int nr);
Index: linux-2.6/kernel/irq/Makefile
===================================================================
--- linux-2.6.orig/kernel/irq/Makefile
+++ linux-2.6/kernel/irq/Makefile
@@ -4,3 +4,4 @@ obj-$(CONFIG_GENERIC_IRQ_PROBE) += autop
 obj-$(CONFIG_PROC_FS) += proc.o
 obj-$(CONFIG_GENERIC_PENDING_IRQ) += migration.o
 obj-$(CONFIG_NUMA_MIGRATE_IRQ_DESC) += numa_migrate.o
+obj-$(CONFIG_PM_SLEEP) += pm.o
Index: linux-2.6/kernel/irq/pm.c
===================================================================
--- /dev/null
+++ linux-2.6/kernel/irq/pm.c
@@ -0,0 +1,66 @@
+/*
+ * linux/kernel/irq/pm.c
+ *
+ * Copyright (C) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
+ *
+ * This file contains power management functions related to interrupts.
+ */
+
+#include <linux/irq.h>
+#include <linux/module.h>
+#include <linux/interrupt.h>
+
+#include "internals.h"
+
+/**
+ * suspend_device_irqs - disable all currently enabled interrupt lines
+ *
+ * During system-wide suspend or hibernation device interrupts need to be
+ * disabled at the chip level and this function is provided for this purpose.
+ * It disables all interrupt lines that are enabled at the moment and sets the
+ * IRQ_SUSPENDED flag for them.
+ */
+void suspend_device_irqs(void)
+{
+	struct irq_desc *desc;
+	int irq;
+
+	for_each_irq_desc(irq, desc)
+		suspend_irq(desc, irq);
+
+	for_each_irq_desc(irq, desc)
+		if (desc->status & IRQ_SUSPENDED)
+			synchronize_irq(irq);
+}
+EXPORT_SYMBOL_GPL(suspend_device_irqs);
+
+/**
+ * resume_device_irqs - enable interrupt lines disabled by suspend_device_irqs()
+ *
+ * Enable all interrupt lines previously disabled by suspend_device_irqs() that
+ * have the IRQ_SUSPENDED flag set.
+ */
+void resume_device_irqs(void)
+{
+	struct irq_desc *desc;
+	int irq;
+
+	for_each_irq_desc(irq, desc)
+		resume_irq(desc, irq);
+}
+EXPORT_SYMBOL_GPL(resume_device_irqs);
+
+/**
+ * check_wakeup_irqs - check if any wake-up interrupts are pending
+ */
+int check_wakeup_irqs(void)
+{
+	struct irq_desc *desc;
+	int irq;
+
+	for_each_irq_desc(irq, desc)
+		if ((desc->status & IRQ_WAKEUP) && (desc->status & IRQ_PENDING))
+			return -EBUSY;
+
+	return 0;
+}
Index: linux-2.6/include/linux/interrupt.h
===================================================================
--- linux-2.6.orig/include/linux/interrupt.h
+++ linux-2.6/include/linux/interrupt.h
@@ -106,6 +106,11 @@ extern void disable_irq_nosync(unsigned 
 extern void disable_irq(unsigned int irq);
 extern void enable_irq(unsigned int irq);
 
+/* The following three functions are for the core kernel use only. */
+extern void suspend_device_irqs(void);
+extern void resume_device_irqs(void);
+extern int check_wakeup_irqs(void);
+
 #if defined(CONFIG_SMP) && defined(CONFIG_GENERIC_HARDIRQS)
 
 extern cpumask_var_t irq_default_affinity;
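
To illustrate the depth and flag bookkeeping done by suspend_irq() and
resume_irq() in the patch above, here is a userspace model of it.  It is a
sketch under stated assumptions: the flag values are made up, the model_*()
names are hypothetical, locking and the chip->disable() call are left out, and
the WARN() in __enable_irq() is represented by a -1 return value.

```c
#include <assert.h>
#include <stddef.h>

/* Flag values are stand-ins for this sketch, not the kernel's. */
#define IRQ_DISABLED	0x1u
#define IRQ_SUSPENDED	0x2u
#define IRQF_TIMER	0x1u

struct irqaction {
	unsigned int flags;
};

struct irq_desc {
	unsigned int status;
	unsigned int depth;		/* nested disable count */
	struct irqaction *action;	/* NULL if the line is unused */
};

/* Depth accounting of __disable_irq(), without the chip call. */
static void model_disable(struct irq_desc *desc)
{
	if (!desc->depth++)
		desc->status |= IRQ_DISABLED;
}

/* Bookkeeping of __enable_irq(); returns -1 where the patch would WARN. */
static int model_enable(struct irq_desc *desc)
{
	if (desc->depth == 0)
		return -1;		/* unbalanced enable */
	if (desc->depth == 1) {
		if (desc->status & IRQ_SUSPENDED)
			return -1;	/* refused while the line is suspended */
		desc->status &= ~IRQ_DISABLED;
	}
	desc->depth--;
	return 0;
}

/* Mirrors suspend_irq(): only lines with a non-timer action are touched. */
static void model_suspend_irq(struct irq_desc *desc)
{
	assert(desc != NULL);
	if (desc->action && !(desc->action->flags & IRQF_TIMER)) {
		model_disable(desc);
		desc->status |= IRQ_SUSPENDED;
	}
}

/* Mirrors resume_irq(): clear the flag first, then drop one disable level. */
static void model_resume_irq(struct irq_desc *desc)
{
	assert(desc != NULL);
	if (!(desc->status & IRQ_SUSPENDED))
		return;
	desc->status &= ~IRQ_SUSPENDED;
	model_enable(desc);
}
```

In this model a line that a driver had already disabled before suspend comes
back from resume still disabled with its original depth, and an enable attempt
at depth 1 while IRQ_SUSPENDED is set is rejected, which matches my reading of
the err_out path added to __enable_irq().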

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [update, rev. 6] Re: [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5)
  2009-03-13 19:55                 ` Thomas Gleixner
                                     ` (2 preceding siblings ...)
  2009-03-14  0:04                   ` Rafael J. Wysocki
@ 2009-03-14  0:04                   ` Rafael J. Wysocki
  3 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-14  0:04 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar
  Cc: Arve, Jeremy Fitzhardinge, Frans Pop, LKML, Jesse Barnes,
	Eric W. Biederman, pm list, Linus Torvalds

On Friday 13 March 2009, Thomas Gleixner wrote:
> On Thu, 12 Mar 2009, Rafael J. Wysocki wrote:
> > +/**
> > + * suspend_device_irqs - disable all currently enabled interrupt lines
> > + *
> > + * During system-wide suspend or hibernation device interrupts need to be
> > + * disabled at the chip level and this function is provided for this purpose.
> > + * It disables all interrupt lines that are enabled at the moment and sets the
> > + * IRQ_SUSPENDED flag for them.
> > + */
> > +void suspend_device_irqs(void)
> > +{
> > +	struct irq_desc *desc;
> > +	int irq;
> > +
> > +	for_each_irq_desc(irq, desc) {
> > +		unsigned long flags;
> > +
> > +		spin_lock_irqsave(&desc->lock, flags);
> > +		__disable_irq(desc, irq, true);
> > +		spin_unlock_irqrestore(&desc->lock, flags);
> 
> Can we move the locking into __disable_irq ?
> 
> > +	}
> > +
> > +	for_each_irq_desc(irq, desc)
> > +		if (desc->status & IRQ_SUSPENDED)
> > +			synchronize_irq(irq);
> > +}
> > +EXPORT_SYMBOL_GPL(suspend_device_irqs);
> > +
> > +/**
> > + * resume_device_irqs - enable interrupt lines disabled by suspend_device_irqs()
> > + *
> > + * Enable all interrupt lines previously disabled by suspend_device_irqs() that
> > + * have the IRQ_SUSPENDED flag set.
> > + */
> > +void resume_device_irqs(void)
> > +{
> > +	struct irq_desc *desc;
> > +	int irq;
> > +
> > +	for_each_irq_desc(irq, desc) {
> > +		unsigned long flags;
> > +
> > +		if (!(desc->status & IRQ_SUSPENDED))
> > +			continue;
> > +
> > +		spin_lock_irqsave(&desc->lock, flags);
> > +		__enable_irq(desc, irq, true);
> > +		spin_unlock_irqrestore(&desc->lock, flags);
> 
> Ditto

Well, I guess you'd prefer something like the appended patch, but Ingo probably
won't like it since it contains additional #ifdefs in irq/manage.c .  Sigh.

Thanks,
Rafael

---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: PM: Introduce functions for suspending and resuming device interrupts

Introduce two helper functions allowing us to prevent device drivers
from getting any interrupts (without disabling interrupts on the CPU)
during suspend (or hibernation) and to make them start to receive
interrupts again during the subsequent resume, respectively.  These
functions make it possible to keep timer interrupts enabled while the
"late" suspend and "early" resume callbacks provided by device
drivers are being executed.

These functions will be used to rework the handling of interrupts
during suspend (hibernation) and resume.  Namely, interrupts will
only be disabled on the CPU right before suspending sysdevs, while
device drivers will be prevented from receiving interrupts, with the
help of the new helper function, before their "late" suspend
callbacks run (and analogously during resume).

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 include/linux/interrupt.h |    5 +++
 include/linux/irq.h       |    1 
 kernel/irq/Makefile       |    1 
 kernel/irq/internals.h    |    2 +
 kernel/irq/manage.c       |   45 ++++++++++++++++++++++++++++---
 kernel/irq/pm.c           |   66 ++++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 116 insertions(+), 4 deletions(-)

Index: linux-2.6/include/linux/irq.h
===================================================================
--- linux-2.6.orig/include/linux/irq.h
+++ linux-2.6/include/linux/irq.h
@@ -65,6 +65,7 @@ typedef	void (*irq_flow_handler_t)(unsig
 #define IRQ_SPURIOUS_DISABLED	0x00800000	/* IRQ was disabled by the spurious trap */
 #define IRQ_MOVE_PCNTXT		0x01000000	/* IRQ migration from process context */
 #define IRQ_AFFINITY_SET	0x02000000	/* IRQ affinity was set from userspace*/
+#define IRQ_SUSPENDED		0x04000000	/* IRQ has gone through suspend sequence */
 
 #ifdef CONFIG_IRQ_PER_CPU
 # define CHECK_IRQ_PER_CPU(var) ((var) & IRQ_PER_CPU)
Index: linux-2.6/kernel/irq/manage.c
===================================================================
--- linux-2.6.orig/kernel/irq/manage.c
+++ linux-2.6/kernel/irq/manage.c
@@ -162,6 +162,14 @@ static inline int do_irq_select_affinity
 }
 #endif
 
+static void __disable_irq(struct irq_desc *desc, unsigned int irq)
+{
+	if (!desc->depth++) {
+		desc->status |= IRQ_DISABLED;
+		desc->chip->disable(irq);
+	}
+}
+
 /**
  *	disable_irq_nosync - disable an irq without waiting
  *	@irq: Interrupt to disable
@@ -182,10 +190,7 @@ void disable_irq_nosync(unsigned int irq
 		return;
 
 	spin_lock_irqsave(&desc->lock, flags);
-	if (!desc->depth++) {
-		desc->status |= IRQ_DISABLED;
-		desc->chip->disable(irq);
-	}
+	__disable_irq(desc, irq);
 	spin_unlock_irqrestore(&desc->lock, flags);
 }
 EXPORT_SYMBOL(disable_irq_nosync);
@@ -215,15 +220,32 @@ void disable_irq(unsigned int irq)
 }
 EXPORT_SYMBOL(disable_irq);
 
+#ifdef CONFIG_PM_SLEEP
+void suspend_irq(struct irq_desc *desc, unsigned int irq)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&desc->lock, flags);
+	if (desc->action && !(desc->action->flags & IRQF_TIMER)) {
+		__disable_irq(desc, irq);
+		desc->status |= IRQ_SUSPENDED;
+	}
+	spin_unlock_irqrestore(&desc->lock, flags);
+}
+#endif
+
 static void __enable_irq(struct irq_desc *desc, unsigned int irq)
 {
 	switch (desc->depth) {
 	case 0:
+ err_out:
 		WARN(1, KERN_WARNING "Unbalanced enable for IRQ %d\n", irq);
 		break;
 	case 1: {
 		unsigned int status = desc->status & ~IRQ_DISABLED;
 
+		if (desc->status & IRQ_SUSPENDED)
+			goto err_out;
 		/* Prevent probing on this irq: */
 		desc->status = status | IRQ_NOPROBE;
 		check_irq_resend(desc, irq);
@@ -258,6 +280,21 @@ void enable_irq(unsigned int irq)
 }
 EXPORT_SYMBOL(enable_irq);
 
+#ifdef CONFIG_PM_SLEEP
+void resume_irq(struct irq_desc *desc, unsigned int irq)
+{
+	unsigned long flags;
+
+	if (!(desc->status & IRQ_SUSPENDED))
+		return;
+
+	spin_lock_irqsave(&desc->lock, flags);
+	desc->status &= ~IRQ_SUSPENDED;
+	__enable_irq(desc, irq);
+	spin_unlock_irqrestore(&desc->lock, flags);
+}
+#endif
+
 static int set_irq_wake_real(unsigned int irq, unsigned int on)
 {
 	struct irq_desc *desc = irq_to_desc(irq);
Index: linux-2.6/kernel/irq/internals.h
===================================================================
--- linux-2.6.orig/kernel/irq/internals.h
+++ linux-2.6/kernel/irq/internals.h
@@ -12,6 +12,8 @@ extern void compat_irq_chip_set_default_
 
 extern int __irq_set_trigger(struct irq_desc *desc, unsigned int irq,
 		unsigned long flags);
+extern void suspend_irq(struct irq_desc *desc, unsigned int irq);
+extern void resume_irq(struct irq_desc *desc, unsigned int irq);
 
 extern struct lock_class_key irq_desc_lock_class;
 extern void init_kstat_irqs(struct irq_desc *desc, int cpu, int nr);
Index: linux-2.6/kernel/irq/Makefile
===================================================================
--- linux-2.6.orig/kernel/irq/Makefile
+++ linux-2.6/kernel/irq/Makefile
@@ -4,3 +4,4 @@ obj-$(CONFIG_GENERIC_IRQ_PROBE) += autop
 obj-$(CONFIG_PROC_FS) += proc.o
 obj-$(CONFIG_GENERIC_PENDING_IRQ) += migration.o
 obj-$(CONFIG_NUMA_MIGRATE_IRQ_DESC) += numa_migrate.o
+obj-$(CONFIG_PM_SLEEP) += pm.o
Index: linux-2.6/kernel/irq/pm.c
===================================================================
--- /dev/null
+++ linux-2.6/kernel/irq/pm.c
@@ -0,0 +1,66 @@
+/*
+ * linux/kernel/irq/pm.c
+ *
+ * Copyright (C) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
+ *
+ * This file contains power management functions related to interrupts.
+ */
+
+#include <linux/irq.h>
+#include <linux/module.h>
+#include <linux/interrupt.h>
+
+#include "internals.h"
+
+/**
+ * suspend_device_irqs - disable all currently enabled interrupt lines
+ *
+ * During system-wide suspend or hibernation device interrupts need to be
+ * disabled at the chip level and this function is provided for this purpose.
+ * It disables all interrupt lines that are enabled at the moment and sets the
+ * IRQ_SUSPENDED flag for them.
+ */
+void suspend_device_irqs(void)
+{
+	struct irq_desc *desc;
+	int irq;
+
+	for_each_irq_desc(irq, desc)
+		suspend_irq(desc, irq);
+
+	for_each_irq_desc(irq, desc)
+		if (desc->status & IRQ_SUSPENDED)
+			synchronize_irq(irq);
+}
+EXPORT_SYMBOL_GPL(suspend_device_irqs);
+
+/**
+ * resume_device_irqs - enable interrupt lines disabled by suspend_device_irqs()
+ *
+ * Enable all interrupt lines previously disabled by suspend_device_irqs() that
+ * have the IRQ_SUSPENDED flag set.
+ */
+void resume_device_irqs(void)
+{
+	struct irq_desc *desc;
+	int irq;
+
+	for_each_irq_desc(irq, desc)
+		resume_irq(desc, irq);
+}
+EXPORT_SYMBOL_GPL(resume_device_irqs);
+
+/**
+ * check_wakeup_irqs - check if any wake-up interrupts are pending
+ */
+int check_wakeup_irqs(void)
+{
+	struct irq_desc *desc;
+	int irq;
+
+	for_each_irq_desc(irq, desc)
+		if ((desc->status & IRQ_WAKEUP) && (desc->status & IRQ_PENDING))
+			return -EBUSY;
+
+	return 0;
+}
Index: linux-2.6/include/linux/interrupt.h
===================================================================
--- linux-2.6.orig/include/linux/interrupt.h
+++ linux-2.6/include/linux/interrupt.h
@@ -106,6 +106,11 @@ extern void disable_irq_nosync(unsigned 
 extern void disable_irq(unsigned int irq);
 extern void enable_irq(unsigned int irq);
 
+/* The following three functions are for the core kernel use only. */
+extern void suspend_device_irqs(void);
+extern void resume_device_irqs(void);
+extern int check_wakeup_irqs(void);
+
 #if defined(CONFIG_SMP) && defined(CONFIG_GENERIC_HARDIRQS)
 
 extern cpumask_var_t irq_default_affinity;

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [update, rev. 6] Re: [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5)
  2009-03-13 21:56                   ` Rafael J. Wysocki
  2009-03-14  7:31                     ` Thomas Gleixner
@ 2009-03-14  7:31                     ` Thomas Gleixner
  2009-03-14 10:01                       ` Rafael J. Wysocki
  2009-03-14 10:01                       ` Rafael J. Wysocki
  1 sibling, 2 replies; 373+ messages in thread
From: Thomas Gleixner @ 2009-03-14  7:31 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Ingo Molnar, pm list, LKML, Linus Torvalds, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, Len Brown,
	Jesse Barnes, Frans Pop, Arve Hjønnevåg

On Fri, 13 Mar 2009, Rafael J. Wysocki wrote:
> > > +		spin_unlock_irqrestore(&desc->lock, flags);
> > 
> > Ditto
> 
> No, because of __setup_irq().

Sorry, forgot about that. Ok. Keep the locking in pm.c then.

Thanks,

	tglx



^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH][0/8] PM: Rework suspend-resume ordering to avoid problems with shared interrupts
  2009-03-08 20:50     ` Rafael J. Wysocki
@ 2009-03-14  8:44       ` Frans Pop
  2009-03-14 11:59         ` Rafael J. Wysocki
  2009-03-14 11:59         ` Rafael J. Wysocki
  2009-03-14  8:44       ` Frans Pop
  1 sibling, 2 replies; 373+ messages in thread
From: Frans Pop @ 2009-03-14  8:44 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: linux-kernel, torvalds, linux-pm

On Sunday 08 March 2009, Rafael J. Wysocki wrote:
> > # These don't need restoring anymore?
>
> I think they generally do, but the restored values may be (and often are)
> identical to the current ones.
>
> >    -pci 0000:00:02.1: restoring config space at offset 0x4 (was 0x4, writing 0xe0500004)
> >    -pci 0000:00:02.1: restoring config space at offset 0x1 (was 0x900000, writing 0x900007)
> >    -pci 0000:00:03.0: restoring config space at offset 0xf (was 0x100, writing 0x1ff)
> >    -pci 0000:00:03.0: restoring config space at offset 0x4 (was 0xfed12004, writing 0xe0600004)
> >    -pci 0000:00:03.2: restoring config space at offset 0xf (was 0x300, writing 0x30b)
> >    -pci 0000:00:03.2: restoring config space at offset 0x8 (was 0x1, writing 0x2031)
> >    -pci 0000:00:03.2: restoring config space at offset 0x7 (was 0x1, writing 0x2021)
> >    -pci 0000:00:03.2: restoring config space at offset 0x6 (was 0x1, writing 0x2019)
> >    -pci 0000:00:03.2: restoring config space at offset 0x5 (was 0x1, writing 0x2011)
> >    -pci 0000:00:03.2: restoring config space at offset 0x4 (was 0x1, writing 0x2009)
> >    -pci 0000:00:03.2: restoring config space at offset 0x1 (was 0xb00000, writing 0xb00001)
[...]
> > # These have moved down to late resume.
>
> That's a bit strange.  It looks like the registers changed after we had
> restored them during "early" resume.  So either we hadn't actually
> restored them (it would be interesting to find out why), or they really
> changed (again, it would be interesting to see why).
>
> >    -uhci_hcd 0000:00:1a.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
> >    -uhci_hcd 0000:00:1a.0: restoring config space at offset 0x8 (was 0x1, writing 0x2081)
> >    -uhci_hcd 0000:00:1a.0: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
> >    -uhci_hcd 0000:00:1a.1: restoring config space at offset 0xf (was 0x200, writing 0x20a)
> >    -uhci_hcd 0000:00:1a.1: restoring config space at offset 0x8 (was 0x1, writing 0x20a1)
> >    -uhci_hcd 0000:00:1a.1: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)

These changes look to have been reverted somehow with rc8 + your latest
patch set. Not sure if it's due to changes in the patches, or just an
effect of local circumstances (such as (un)suspending while the system
is docked). Or sun spots of course.
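
For anyone reading the log excerpts above: the "restoring config space"
lines are only printed for registers whose current value differs from the
saved one, which is why identical values produce no message.  A minimal
user-space sketch of that compare-and-write loop follows.  The function
name model_restore_config() and the plain arrays standing in for config
space are assumptions made for illustration, not the PCI core's code.

```c
#include <stdint.h>
#include <stdio.h>

#define CFG_DWORDS 16	/* standard header: 16 dwords, offsets 0x0..0xf */

/*
 * Model only: cur[] stands in for the device's live config space and
 * saved[] for the copy taken before suspend.  Returns the number of
 * dwords that had to be rewritten.
 */
int model_restore_config(const char *name, uint32_t *cur, const uint32_t *saved)
{
	int written = 0;

	for (int i = CFG_DWORDS - 1; i >= 0; i--) {
		if (cur[i] == saved[i])
			continue;	/* unchanged: no write, no message */

		printf("%s: restoring config space at offset %#x "
		       "(was %#x, writing %#x)\n", name, (unsigned int)i,
		       (unsigned int)cur[i], (unsigned int)saved[i]);
		cur[i] = saved[i];
		written++;
	}
	return written;
}
```

Calling it a second time on the same arrays reports nothing, which matches
the "missing" messages when a register already holds the saved value.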

The "restoring config space" messages now look virtually the same
as for rc5, only some messages for the ricoh-mmc module are still
"missing", but I'm not worried about that.

Cheers,
FJP

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [update, rev. 6] Re: [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5)
  2009-03-14  7:31                     ` Thomas Gleixner
  2009-03-14 10:01                       ` Rafael J. Wysocki
@ 2009-03-14 10:01                       ` Rafael J. Wysocki
  1 sibling, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-14 10:01 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, pm list, LKML, Linus Torvalds, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, Len Brown,
	Jesse Barnes, Frans Pop, Arve Hjønnevåg

On Saturday 14 March 2009, Thomas Gleixner wrote:
> On Fri, 13 Mar 2009, Rafael J. Wysocki wrote:
> > > > +		spin_unlock_irqrestore(&desc->lock, flags);
> > > 
> > > Ditto
> > 
> > No, because of __setup_irq().
> 
> Sorry, forgot about that. Ok. Keep the locking in pm.c then.

Will do, thanks.

OK, it seems we're approaching the final version. :-)

I'm going to split the $subject patch as requested by Ingo (API changes and
functionality changes) and post the full series once again for completeness.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 373+ messages in thread

* [PATCH 0/11] PM: Rework suspend-resume ordering to avoid problems with shared interrupts (updated 2x)
  2009-02-22 17:37 ` Rafael J. Wysocki
                   ` (13 preceding siblings ...)
  (?)
@ 2009-03-14 11:24 ` Rafael J. Wysocki
  2009-03-14 11:26   ` [PATCH 1/11] PM: Introduce functions for suspending and resuming device interrupts Rafael J. Wysocki
                     ` (23 more replies)
  -1 siblings, 24 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-14 11:24 UTC (permalink / raw)
  To: pm list
  Cc: LKML, Linus Torvalds, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, Len Brown,
	Jesse Barnes, Thomas Gleixner, Arve Hjønnevåg,
	Linux PCI

Hi,

This is an update of the patch series reworking the handling of interrupts
during suspend-resume, addressing some comments from Thomas and Ingo.

The following patches modify the way in which we handle disabling interrupts
during suspend and enabling them during resume.  They also change the ordering
of the core suspend and hibernation code to take advantage of the new approach
to the interrupts and modify the PCI PM core to avoid a few problems.

Namely, interrupts are currently disabled on the boot CPU as soon as the
nonboot CPUs have been disabled, which doesn't allow device drivers' "late"
suspend and "early" resume callbacks to sleep.  Among other things this means
they cannot execute ACPI AML routines, which leads to problems with
suspend-resume of PCI devices, as recently discussed.

1/11 introduces helper functions used by the subsequent patches.
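
To make the bookkeeping behind those helpers easier to follow, here is a
small user-space model of it.  It mirrors the depth counter and the
IRQ_SUSPENDED flag from patch 1/11, but struct irq_line, the LINE_* flags
and the model_*() names are stand-ins invented for this sketch; there is no
locking and no chip access.

```c
#include <assert.h>
#include <stdbool.h>

#define LINE_DISABLED	0x1u	/* line is masked at the (modelled) chip */
#define LINE_SUSPENDED	0x2u	/* line was disabled by the suspend path */

struct irq_line {
	unsigned int status;	/* LINE_* flags */
	unsigned int depth;	/* nested disable count, 0 means enabled */
	bool has_handler;	/* models desc->action != NULL */
	bool is_timer;		/* models IRQF_TIMER on the handler */
};

/* Disable one line; on the suspend path, skip unused and timer lines. */
void model_disable(struct irq_line *l, bool suspend)
{
	assert(l);
	if (suspend) {
		if (!l->has_handler || l->is_timer)
			return;
		l->status |= LINE_SUSPENDED;
	}
	if (!l->depth++)
		l->status |= LINE_DISABLED;
}

/* Enable one line; returns false where the patch would WARN. */
bool model_enable(struct irq_line *l, bool resume)
{
	assert(l);
	if (resume)
		l->status &= ~LINE_SUSPENDED;

	if (l->depth == 0)
		return false;		/* unbalanced enable */
	if (l->depth == 1) {
		/* A plain enable must not undo a suspend-time disable. */
		if (l->status & LINE_SUSPENDED)
			return false;
		l->status &= ~LINE_DISABLED;
	}
	l->depth--;
	return true;
}

/* Counterpart of suspend_device_irqs() in the model. */
void model_suspend_all(struct irq_line *lines, int n)
{
	for (int i = 0; i < n; i++)
		model_disable(&lines[i], true);
}

/* Counterpart of resume_device_irqs(): only touch suspended lines. */
void model_resume_all(struct irq_line *lines, int n)
{
	for (int i = 0; i < n; i++)
		if (lines[i].status & LINE_SUSPENDED)
			model_enable(&lines[i], true);
}
```

In this model a timer line keeps depth 0 across model_suspend_all(), and a
driver calling the plain enable on a suspended line gets a refusal rather
than a live interrupt.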

2/11 modifies the [suspend|hibernation] and resume code, as well as the other
code using the device PM framework, so that device drivers will not receive
interrupts during the "late" suspend phase, although interrupts will only be
disabled on the CPU right before calling sysdev_suspend() (and analogously
during resume).

3/11 - 5/11 modify the suspend, hibernation and kexec jump code, respectively,
so that the "late" phase of suspending devices will happen before executing the
platform "prepare" callback and disabling nonboot CPUs (and analogously during
resume).
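
The ordering described for 2/11 - 5/11 can be sketched as a trace of steps.
This is only a model: every step is a string recorded by a stub, the step
names are labels chosen for the sketch (in particular "platform finish" and
the exact mirror order on resume are assumptions), and running "platform
prepare" before "disable nonboot CPUs" is assumed from the wording above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_STEPS 16

static const char *trace[MAX_STEPS];
static size_t ntrace;

/* Record one step of the modelled sequence. */
static void step(const char *name)
{
	assert(ntrace < MAX_STEPS);
	trace[ntrace++] = name;
}

/* Position of a step in the recorded trace, or -1 if it never ran. */
int model_step_index(const char *name)
{
	for (size_t i = 0; i < ntrace; i++)
		if (strcmp(trace[i], name) == 0)
			return (int)i;
	return -1;
}

/* Suspend side: device IRQs go quiet before the "late" callbacks. */
void model_suspend(void)
{
	ntrace = 0;
	step("suspend devices");
	step("suspend_device_irqs");	/* drivers stop seeing interrupts */
	step("late suspend");		/* may sleep: timer IRQs still run */
	step("platform prepare");
	step("disable nonboot CPUs");
	step("local_irq_disable");	/* only now are CPU IRQs off */
	step("sysdev_suspend");
}

/* Resume side: assumed to be the mirror image of the above. */
void model_resume(void)
{
	ntrace = 0;
	step("sysdev_resume");
	step("local_irq_enable");
	step("enable nonboot CPUs");
	step("platform finish");
	step("early resume");
	step("resume_device_irqs");
	step("resume devices");
}
```

The two properties the series relies on are visible in the trace: "late
suspend" runs after suspend_device_irqs but before interrupts are disabled
on the CPU, and it runs before the platform "prepare" step.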

6/11 is a patch that's already in the PCI linux-next tree and I included it in
the series, because the next patches depend on it.

7/11 makes the PCI PM core use pci_set_power_state() to put devices into
D0 during early resume, which allows the platform-specific operations to be
carried out at that time, if necessary.

8/11 uses the opportunity to move pci_restore_standard_config() to pci-driver.c,
where it belongs IMO.

9/11 makes the PCI PM core code put devices into low power states during the
"late" phase of suspend which allows us to avoid a long-standing race related
to shared interrupts and to handle devices that require some platform-specific
operations to be put into low power states appropriately at the same time.
[The second rev of the patch retains the current behavior during the
"power-off" phase of hibernation, which is that the devices without drivers or
without PM support in the drivers are not power managed by the core.]

10/11 fixes pci_set_power_state() so that it doesn't return an error code when
attempting to put a PCI device without PM support (either native or through the
platform) into D0 (such devices are always in D0).
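
The intended behaviour of 10/11 can be modelled in a few lines.  This is a
sketch under assumptions, not the PCI core's code: struct model_dev, the
MODEL_D* states and model_set_power_state() are hypothetical, and -EIO is
used only as a placeholder error value.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum model_state { MODEL_D0, MODEL_D1, MODEL_D2, MODEL_D3 };

struct model_dev {
	bool native_pm;		/* device has its own PM capability */
	bool platform_pm;	/* platform (e.g. firmware) can manage it */
	enum model_state state;
};

/*
 * Model only: a device with no PM support at all is always in D0, so a
 * request for D0 succeeds trivially and anything else is an error.
 */
int model_set_power_state(struct model_dev *dev, enum model_state target)
{
	assert(dev);
	if (!dev->native_pm && !dev->platform_pm)
		return target == MODEL_D0 ? 0 : -EIO;

	dev->state = target;
	return 0;
}
```

With this rule, callers that put every device into D0 during early resume
do not need a special case for devices without PM support.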

11/11 makes the PCI PM core save and restore the configuration spaces of
devices that have no drivers or no PM support in the drivers during suspend and
resume, respectively.

There is a git tree containing these patches, for easier testing, at:

git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6.git

(linux-next branch).  At the moment it has a merge conflict with the PCI
linux-next tree due to 6/11.

Thanks,
Rafael


^ permalink raw reply	[flat|nested] 373+ messages in thread

* [PATCH 1/11] PM: Introduce functions for suspending and resuming device interrupts
  2009-03-14 11:24 ` [PATCH 0/11] PM: Rework suspend-resume ordering to avoid problems with shared interrupts (updated 2x) Rafael J. Wysocki
@ 2009-03-14 11:26   ` Rafael J. Wysocki
  2009-03-14 11:26   ` Rafael J. Wysocki
                     ` (22 subsequent siblings)
  23 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-14 11:26 UTC (permalink / raw)
  To: pm list
  Cc: LKML, Linus Torvalds, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, Len Brown,
	Jesse Barnes, Thomas Gleixner, Arve Hjønnevåg,
	Linux PCI

From: Rafael J. Wysocki <rjw@sisk.pl>

Introduce helper functions allowing us to prevent device drivers from
getting any interrupts (without disabling interrupts on the CPU)
during suspend (or hibernation) and to make them start to receive
interrupts again during the subsequent resume.  These functions make it
possible to keep timer interrupts enabled while the "late" suspend and
"early" resume callbacks provided by device drivers are being
executed.  In turn, this allows device drivers' "late" suspend and
"early" resume callbacks to sleep, execute ACPI callbacks etc.

The functions introduced here will be used to rework the handling of
interrupts during suspend (hibernation) and resume.  Namely,
interrupts will only be disabled on the CPU right before suspending
sysdevs, while device drivers will be prevented from receiving
interrupts, with the help of the new helper function, before their
"late" suspend callbacks run (and analogously during resume).

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 include/linux/interrupt.h |    5 ++
 include/linux/irq.h       |    1 
 kernel/irq/Makefile       |    1 
 kernel/irq/internals.h    |    2 +
 kernel/irq/manage.c       |   31 +++++++++++++-----
 kernel/irq/pm.c           |   79 ++++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 112 insertions(+), 7 deletions(-)

Index: linux-2.6/include/linux/interrupt.h
===================================================================
--- linux-2.6.orig/include/linux/interrupt.h
+++ linux-2.6/include/linux/interrupt.h
@@ -106,6 +106,11 @@ extern void disable_irq_nosync(unsigned 
 extern void disable_irq(unsigned int irq);
 extern void enable_irq(unsigned int irq);
 
+/* The following three functions are for the core kernel use only. */
+extern void suspend_device_irqs(void);
+extern void resume_device_irqs(void);
+extern int check_wakeup_irqs(void);
+
 #if defined(CONFIG_SMP) && defined(CONFIG_GENERIC_HARDIRQS)
 
 extern cpumask_var_t irq_default_affinity;
Index: linux-2.6/include/linux/irq.h
===================================================================
--- linux-2.6.orig/include/linux/irq.h
+++ linux-2.6/include/linux/irq.h
@@ -65,6 +65,7 @@ typedef	void (*irq_flow_handler_t)(unsig
 #define IRQ_SPURIOUS_DISABLED	0x00800000	/* IRQ was disabled by the spurious trap */
 #define IRQ_MOVE_PCNTXT		0x01000000	/* IRQ migration from process context */
 #define IRQ_AFFINITY_SET	0x02000000	/* IRQ affinity was set from userspace*/
+#define IRQ_SUSPENDED		0x04000000	/* IRQ has gone through suspend sequence */
 
 #ifdef CONFIG_IRQ_PER_CPU
 # define CHECK_IRQ_PER_CPU(var) ((var) & IRQ_PER_CPU)
Index: linux-2.6/kernel/irq/pm.c
===================================================================
--- /dev/null
+++ linux-2.6/kernel/irq/pm.c
@@ -0,0 +1,79 @@
+/*
+ * linux/kernel/irq/pm.c
+ *
+ * Copyright (C) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
+ *
+ * This file contains power management functions related to interrupts.
+ */
+
+#include <linux/irq.h>
+#include <linux/module.h>
+#include <linux/interrupt.h>
+
+#include "internals.h"
+
+/**
+ * suspend_device_irqs - disable all currently enabled interrupt lines
+ *
+ * During system-wide suspend or hibernation device interrupts need to be
+ * disabled at the chip level and this function is provided for this purpose.
+ * It disables all interrupt lines that are enabled at the moment and sets the
+ * IRQ_SUSPENDED flag for them.
+ */
+void suspend_device_irqs(void)
+{
+	struct irq_desc *desc;
+	int irq;
+
+	for_each_irq_desc(irq, desc) {
+		unsigned long flags;
+
+		spin_lock_irqsave(&desc->lock, flags);
+		__disable_irq(desc, irq, true);
+		spin_unlock_irqrestore(&desc->lock, flags);
+	}
+
+	for_each_irq_desc(irq, desc)
+		if (desc->status & IRQ_SUSPENDED)
+			synchronize_irq(irq);
+}
+EXPORT_SYMBOL_GPL(suspend_device_irqs);
+
+/**
+ * resume_device_irqs - enable interrupt lines disabled by suspend_device_irqs()
+ *
+ * Enable all interrupt lines previously disabled by suspend_device_irqs() that
+ * have the IRQ_SUSPENDED flag set.
+ */
+void resume_device_irqs(void)
+{
+	struct irq_desc *desc;
+	int irq;
+
+	for_each_irq_desc(irq, desc) {
+		unsigned long flags;
+
+		if (!(desc->status & IRQ_SUSPENDED))
+			continue;
+
+		spin_lock_irqsave(&desc->lock, flags);
+		__enable_irq(desc, irq, true);
+		spin_unlock_irqrestore(&desc->lock, flags);
+	}
+}
+EXPORT_SYMBOL_GPL(resume_device_irqs);
+
+/**
+ * check_wakeup_irqs - check if any wake-up interrupts are pending
+ */
+int check_wakeup_irqs(void)
+{
+	struct irq_desc *desc;
+	int irq;
+
+	for_each_irq_desc(irq, desc)
+		if ((desc->status & IRQ_WAKEUP) && (desc->status & IRQ_PENDING))
+			return -EBUSY;
+
+	return 0;
+}
Index: linux-2.6/kernel/irq/Makefile
===================================================================
--- linux-2.6.orig/kernel/irq/Makefile
+++ linux-2.6/kernel/irq/Makefile
@@ -4,3 +4,4 @@ obj-$(CONFIG_GENERIC_IRQ_PROBE) += autop
 obj-$(CONFIG_PROC_FS) += proc.o
 obj-$(CONFIG_GENERIC_PENDING_IRQ) += migration.o
 obj-$(CONFIG_NUMA_MIGRATE_IRQ_DESC) += numa_migrate.o
+obj-$(CONFIG_PM_SLEEP) += pm.o
Index: linux-2.6/kernel/irq/manage.c
===================================================================
--- linux-2.6.orig/kernel/irq/manage.c
+++ linux-2.6/kernel/irq/manage.c
@@ -162,6 +162,20 @@ static inline int do_irq_select_affinity
 }
 #endif
 
+void __disable_irq(struct irq_desc *desc, unsigned int irq, bool suspend)
+{
+	if (suspend) {
+		if (!desc->action || (desc->action->flags & IRQF_TIMER))
+			return;
+		desc->status |= IRQ_SUSPENDED;
+	}
+
+	if (!desc->depth++) {
+		desc->status |= IRQ_DISABLED;
+		desc->chip->disable(irq);
+	}
+}
+
 /**
  *	disable_irq_nosync - disable an irq without waiting
  *	@irq: Interrupt to disable
@@ -182,10 +196,7 @@ void disable_irq_nosync(unsigned int irq
 		return;
 
 	spin_lock_irqsave(&desc->lock, flags);
-	if (!desc->depth++) {
-		desc->status |= IRQ_DISABLED;
-		desc->chip->disable(irq);
-	}
+	__disable_irq(desc, irq, false);
 	spin_unlock_irqrestore(&desc->lock, flags);
 }
 EXPORT_SYMBOL(disable_irq_nosync);
@@ -215,15 +226,21 @@ void disable_irq(unsigned int irq)
 }
 EXPORT_SYMBOL(disable_irq);
 
-static void __enable_irq(struct irq_desc *desc, unsigned int irq)
+void __enable_irq(struct irq_desc *desc, unsigned int irq, bool resume)
 {
+	if (resume)
+		desc->status &= ~IRQ_SUSPENDED;
+
 	switch (desc->depth) {
 	case 0:
+ err_out:
 		WARN(1, KERN_WARNING "Unbalanced enable for IRQ %d\n", irq);
 		break;
 	case 1: {
 		unsigned int status = desc->status & ~IRQ_DISABLED;
 
+		if (desc->status & IRQ_SUSPENDED)
+			goto err_out;
 		/* Prevent probing on this irq: */
 		desc->status = status | IRQ_NOPROBE;
 		check_irq_resend(desc, irq);
@@ -253,7 +270,7 @@ void enable_irq(unsigned int irq)
 		return;
 
 	spin_lock_irqsave(&desc->lock, flags);
-	__enable_irq(desc, irq);
+	__enable_irq(desc, irq, false);
 	spin_unlock_irqrestore(&desc->lock, flags);
 }
 EXPORT_SYMBOL(enable_irq);
@@ -511,7 +528,7 @@ __setup_irq(unsigned int irq, struct irq
 	 */
 	if (shared && (desc->status & IRQ_SPURIOUS_DISABLED)) {
 		desc->status &= ~IRQ_SPURIOUS_DISABLED;
-		__enable_irq(desc, irq);
+		__enable_irq(desc, irq, false);
 	}
 
 	spin_unlock_irqrestore(&desc->lock, flags);
Index: linux-2.6/kernel/irq/internals.h
===================================================================
--- linux-2.6.orig/kernel/irq/internals.h
+++ linux-2.6/kernel/irq/internals.h
@@ -12,6 +12,8 @@ extern void compat_irq_chip_set_default_
 
 extern int __irq_set_trigger(struct irq_desc *desc, unsigned int irq,
 		unsigned long flags);
+extern void __disable_irq(struct irq_desc *desc, unsigned int irq, bool susp);
+extern void __enable_irq(struct irq_desc *desc, unsigned int irq, bool resume);
 
 extern struct lock_class_key irq_desc_lock_class;
 extern void init_kstat_irqs(struct irq_desc *desc, int cpu, int nr);

^ permalink raw reply	[flat|nested] 373+ messages in thread

* [PATCH 1/11] PM: Introduce functions for suspending and resuming device interrupts
  2009-03-14 11:24 ` [PATCH 0/11] PM: Rework suspend-resume ordering to avoid problems with shared interrupts (updated 2x) Rafael J. Wysocki
  2009-03-14 11:26   ` [PATCH 1/11] PM: Introduce functions for suspending and resuming device interrupts Rafael J. Wysocki
@ 2009-03-14 11:26   ` Rafael J. Wysocki
  2009-03-14 11:27   ` [PATCH 2/11] PM: Rework handling of interrupts during suspend-resume Rafael J. Wysocki
                     ` (21 subsequent siblings)
  23 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-14 11:26 UTC (permalink / raw)
  To: pm list
  Cc: Arve Hjønnevåg, Jeremy Fitzhardinge, LKML, Jesse Barnes, Eric W. Biederman,
	Linux PCI, Ingo Molnar, Linus Torvalds, Thomas Gleixner

From: Rafael J. Wysocki <rjw@sisk.pl>

Introduce helper functions allowing us to prevent device drivers from
getting any interrupts (without disabling interrupts on the CPU)
during suspend (or hibernation) and to make them start to receive
interrupts again during the subsequent resume.  These functions make it
possible to keep timer interrupts enabled while the "late" suspend and
"early" resume callbacks provided by device drivers are being
executed.  In turn, this allows device drivers' "late" suspend and
"early" resume callbacks to sleep, execute ACPI callbacks etc.

The functions introduced here will be used to rework the handling of
interrupts during suspend (hibernation) and resume.  Namely,
interrupts will only be disabled on the CPU right before suspending
sysdevs, while device drivers will be prevented from receiving
interrupts, with the help of the new helper function, before their
"late" suspend callbacks run (and analogously during resume).

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 include/linux/interrupt.h |    5 ++
 include/linux/irq.h       |    1 
 kernel/irq/Makefile       |    1 
 kernel/irq/internals.h    |    2 +
 kernel/irq/manage.c       |   31 +++++++++++++-----
 kernel/irq/pm.c           |   79 ++++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 112 insertions(+), 7 deletions(-)

Index: linux-2.6/include/linux/interrupt.h
===================================================================
--- linux-2.6.orig/include/linux/interrupt.h
+++ linux-2.6/include/linux/interrupt.h
@@ -106,6 +106,11 @@ extern void disable_irq_nosync(unsigned 
 extern void disable_irq(unsigned int irq);
 extern void enable_irq(unsigned int irq);
 
+/* The following three functions are for the core kernel use only. */
+extern void suspend_device_irqs(void);
+extern void resume_device_irqs(void);
+extern int check_wakeup_irqs(void);
+
 #if defined(CONFIG_SMP) && defined(CONFIG_GENERIC_HARDIRQS)
 
 extern cpumask_var_t irq_default_affinity;
Index: linux-2.6/include/linux/irq.h
===================================================================
--- linux-2.6.orig/include/linux/irq.h
+++ linux-2.6/include/linux/irq.h
@@ -65,6 +65,7 @@ typedef	void (*irq_flow_handler_t)(unsig
 #define IRQ_SPURIOUS_DISABLED	0x00800000	/* IRQ was disabled by the spurious trap */
 #define IRQ_MOVE_PCNTXT		0x01000000	/* IRQ migration from process context */
 #define IRQ_AFFINITY_SET	0x02000000	/* IRQ affinity was set from userspace*/
+#define IRQ_SUSPENDED		0x04000000	/* IRQ has gone through suspend sequence */
 
 #ifdef CONFIG_IRQ_PER_CPU
 # define CHECK_IRQ_PER_CPU(var) ((var) & IRQ_PER_CPU)
Index: linux-2.6/kernel/irq/pm.c
===================================================================
--- /dev/null
+++ linux-2.6/kernel/irq/pm.c
@@ -0,0 +1,79 @@
+/*
+ * linux/kernel/irq/pm.c
+ *
+ * Copyright (C) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
+ *
+ * This file contains power management functions related to interrupts.
+ */
+
+#include <linux/irq.h>
+#include <linux/module.h>
+#include <linux/interrupt.h>
+
+#include "internals.h"
+
+/**
+ * suspend_device_irqs - disable all currently enabled interrupt lines
+ *
+ * During system-wide suspend or hibernation device interrupts need to be
+ * disabled at the chip level and this function is provided for this purpose.
+ * It disables all interrupt lines that are enabled at the moment and sets the
+ * IRQ_SUSPENDED flag for them.
+ */
+void suspend_device_irqs(void)
+{
+	struct irq_desc *desc;
+	int irq;
+
+	for_each_irq_desc(irq, desc) {
+		unsigned long flags;
+
+		spin_lock_irqsave(&desc->lock, flags);
+		__disable_irq(desc, irq, true);
+		spin_unlock_irqrestore(&desc->lock, flags);
+	}
+
+	for_each_irq_desc(irq, desc)
+		if (desc->status & IRQ_SUSPENDED)
+			synchronize_irq(irq);
+}
+EXPORT_SYMBOL_GPL(suspend_device_irqs);
+
+/**
+ * resume_device_irqs - enable interrupt lines disabled by suspend_device_irqs()
+ *
+ * Enable all interrupt lines previously disabled by suspend_device_irqs() that
+ * have the IRQ_SUSPENDED flag set.
+ */
+void resume_device_irqs(void)
+{
+	struct irq_desc *desc;
+	int irq;
+
+	for_each_irq_desc(irq, desc) {
+		unsigned long flags;
+
+		if (!(desc->status & IRQ_SUSPENDED))
+			continue;
+
+		spin_lock_irqsave(&desc->lock, flags);
+		__enable_irq(desc, irq, true);
+		spin_unlock_irqrestore(&desc->lock, flags);
+	}
+}
+EXPORT_SYMBOL_GPL(resume_device_irqs);
+
+/**
+ * check_wakeup_irqs - check if any wake-up interrupts are pending
+ */
+int check_wakeup_irqs(void)
+{
+	struct irq_desc *desc;
+	int irq;
+
+	for_each_irq_desc(irq, desc)
+		if ((desc->status & IRQ_WAKEUP) && (desc->status & IRQ_PENDING))
+			return -EBUSY;
+
+	return 0;
+}
Index: linux-2.6/kernel/irq/Makefile
===================================================================
--- linux-2.6.orig/kernel/irq/Makefile
+++ linux-2.6/kernel/irq/Makefile
@@ -4,3 +4,4 @@ obj-$(CONFIG_GENERIC_IRQ_PROBE) += autop
 obj-$(CONFIG_PROC_FS) += proc.o
 obj-$(CONFIG_GENERIC_PENDING_IRQ) += migration.o
 obj-$(CONFIG_NUMA_MIGRATE_IRQ_DESC) += numa_migrate.o
+obj-$(CONFIG_PM_SLEEP) += pm.o
Index: linux-2.6/kernel/irq/manage.c
===================================================================
--- linux-2.6.orig/kernel/irq/manage.c
+++ linux-2.6/kernel/irq/manage.c
@@ -162,6 +162,20 @@ static inline int do_irq_select_affinity
 }
 #endif
 
+void __disable_irq(struct irq_desc *desc, unsigned int irq, bool suspend)
+{
+	if (suspend) {
+		if (!desc->action || (desc->action->flags & IRQF_TIMER))
+			return;
+		desc->status |= IRQ_SUSPENDED;
+	}
+
+	if (!desc->depth++) {
+		desc->status |= IRQ_DISABLED;
+		desc->chip->disable(irq);
+	}
+}
+
 /**
  *	disable_irq_nosync - disable an irq without waiting
  *	@irq: Interrupt to disable
@@ -182,10 +196,7 @@ void disable_irq_nosync(unsigned int irq
 		return;
 
 	spin_lock_irqsave(&desc->lock, flags);
-	if (!desc->depth++) {
-		desc->status |= IRQ_DISABLED;
-		desc->chip->disable(irq);
-	}
+	__disable_irq(desc, irq, false);
 	spin_unlock_irqrestore(&desc->lock, flags);
 }
 EXPORT_SYMBOL(disable_irq_nosync);
@@ -215,15 +226,21 @@ void disable_irq(unsigned int irq)
 }
 EXPORT_SYMBOL(disable_irq);
 
-static void __enable_irq(struct irq_desc *desc, unsigned int irq)
+void __enable_irq(struct irq_desc *desc, unsigned int irq, bool resume)
 {
+	if (resume)
+		desc->status &= ~IRQ_SUSPENDED;
+
 	switch (desc->depth) {
 	case 0:
+ err_out:
 		WARN(1, KERN_WARNING "Unbalanced enable for IRQ %d\n", irq);
 		break;
 	case 1: {
 		unsigned int status = desc->status & ~IRQ_DISABLED;
 
+		if (desc->status & IRQ_SUSPENDED)
+			goto err_out;
 		/* Prevent probing on this irq: */
 		desc->status = status | IRQ_NOPROBE;
 		check_irq_resend(desc, irq);
@@ -253,7 +270,7 @@ void enable_irq(unsigned int irq)
 		return;
 
 	spin_lock_irqsave(&desc->lock, flags);
-	__enable_irq(desc, irq);
+	__enable_irq(desc, irq, false);
 	spin_unlock_irqrestore(&desc->lock, flags);
 }
 EXPORT_SYMBOL(enable_irq);
@@ -511,7 +528,7 @@ __setup_irq(unsigned int irq, struct irq
 	 */
 	if (shared && (desc->status & IRQ_SPURIOUS_DISABLED)) {
 		desc->status &= ~IRQ_SPURIOUS_DISABLED;
-		__enable_irq(desc, irq);
+		__enable_irq(desc, irq, false);
 	}
 
 	spin_unlock_irqrestore(&desc->lock, flags);
Index: linux-2.6/kernel/irq/internals.h
===================================================================
--- linux-2.6.orig/kernel/irq/internals.h
+++ linux-2.6/kernel/irq/internals.h
@@ -12,6 +12,8 @@ extern void compat_irq_chip_set_default_
 
 extern int __irq_set_trigger(struct irq_desc *desc, unsigned int irq,
 		unsigned long flags);
+extern void __disable_irq(struct irq_desc *desc, unsigned int irq, bool susp);
+extern void __enable_irq(struct irq_desc *desc, unsigned int irq, bool resume);
 
 extern struct lock_class_key irq_desc_lock_class;
 extern void init_kstat_irqs(struct irq_desc *desc, int cpu, int nr);
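
The depth accounting that this patch relies on can be illustrated with a
small user-space model.  This is only a sketch: struct model_irq, the
model_*() functions, the flag values and the table size are hypothetical
stand-ins and not kernel code.  It mirrors how __disable_irq() skips lines
without a handler and timer lines when called for suspend, and how
__enable_irq() only unmasks a line once the nesting count drops back to zero.

```c
/* User-space model of the depth-counted disable/enable logic; not kernel code. */
#include <assert.h>
#include <stdbool.h>

#define NR_IRQS        4	/* hypothetical table size */
#define IRQ_DISABLED   0x1u	/* flag values are illustrative only */
#define IRQ_SUSPENDED  0x2u

struct model_irq {
	unsigned int status;
	unsigned int depth;	/* nesting count of disable calls */
	bool has_action;	/* a handler is installed */
	bool is_timer;		/* stands in for IRQF_TIMER */
	bool masked;		/* stands in for chip->disable()/enable() */
};

static struct model_irq irqs[NR_IRQS];

static void model_disable(struct model_irq *d, bool suspend)
{
	assert(d);
	if (suspend) {
		/* Lines without a handler and timer lines are left alone. */
		if (!d->has_action || d->is_timer)
			return;
		d->status |= IRQ_SUSPENDED;
	}
	if (!d->depth++) {
		d->status |= IRQ_DISABLED;
		d->masked = true;
	}
}

static void model_enable(struct model_irq *d, bool resume)
{
	assert(d);
	if (resume)
		d->status &= ~IRQ_SUSPENDED;
	if (d->depth == 0)
		return;		/* unbalanced enable; the kernel WARNs here */
	if (d->depth == 1) {
		if (d->status & IRQ_SUSPENDED)
			return;	/* still suspended; the kernel WARNs here */
		d->status &= ~IRQ_DISABLED;
		d->masked = false;
	}
	d->depth--;
}

static void model_suspend_device_irqs(void)
{
	for (int i = 0; i < NR_IRQS; i++)
		model_disable(&irqs[i], true);
}

static void model_resume_device_irqs(void)
{
	for (int i = 0; i < NR_IRQS; i++)
		if (irqs[i].status & IRQ_SUSPENDED)
			model_enable(&irqs[i], true);
}
```

In this model, a line that a driver had already disabled before suspend goes
to depth 2 during suspend and back to depth 1 on resume, so it stays masked
until the driver enables it again.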

^ permalink raw reply	[flat|nested] 373+ messages in thread

* [PATCH 2/11] PM: Rework handling of interrupts during suspend-resume
  2009-03-14 11:24 ` [PATCH 0/11] PM: Rework suspend-resume ordering to avoid problems with shared interrupts (updated 2x) Rafael J. Wysocki
                     ` (2 preceding siblings ...)
  2009-03-14 11:27   ` [PATCH 2/11] PM: Rework handling of interrupts during suspend-resume Rafael J. Wysocki
@ 2009-03-14 11:27   ` Rafael J. Wysocki
  2009-03-14 11:28   ` [PATCH 3/11] PM: Change suspend code ordering Rafael J. Wysocki
                     ` (19 subsequent siblings)
  23 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-14 11:27 UTC (permalink / raw)
  To: pm list
  Cc: LKML, Linus Torvalds, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, Len Brown,
	Jesse Barnes, Thomas Gleixner, Arve Hjønnevåg,
	Linux PCI

From: Rafael J. Wysocki <rjw@sisk.pl>

Use the functions introduced by the previous patch,
suspend_device_irqs(), resume_device_irqs() and check_wakeup_irqs(),
to rework the handling of interrupts during suspend (hibernation) and
resume.  Namely, interrupts will only be disabled on the CPU right
before suspending sysdevs, while device drivers will be prevented
from receiving interrupts, with the help of the new helper function,
before their "late" suspend callbacks run (and analogously during
resume).

In addition, since device interrupts are now disabled before the CPU
has turned all interrupts off, the CPU will still ACK any that arrive
and set the IRQ_PENDING bit for them.  Check in sysdev_suspend() whether
any wake-up interrupts are pending and abort suspend if that's the
case.
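
The wake-up check described above can be sketched in user space as follows.
This is a minimal model, assuming a flat array of status words; the
model_*() names and the flag values are hypothetical, and only the
"wake-up and pending means -EBUSY" rule is taken from the patch.

```c
/* User-space model of the wake-up check done before suspending sysdevs. */
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define IRQ_PENDING 0x1u	/* flag values are illustrative only */
#define IRQ_WAKEUP  0x2u

/* Return -EBUSY if any line is both a wake-up source and pending. */
static int model_check_wakeup_irqs(const unsigned int *status, size_t n)
{
	assert(status || n == 0);
	for (size_t i = 0; i < n; i++)
		if ((status[i] & IRQ_WAKEUP) && (status[i] & IRQ_PENDING))
			return -EBUSY;
	return 0;
}

/* The check runs first, so a pending wake-up aborts before any sysdev work. */
static int model_sysdev_suspend(const unsigned int *status, size_t n,
				int *sysdevs_suspended)
{
	int ret = model_check_wakeup_irqs(status, n);

	if (ret)
		return ret;
	*sysdevs_suspended = 1;
	return 0;
}
```

A line that is pending but not a wake-up source, or a wake-up source that is
not pending, does not abort the transition in this model.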

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 arch/x86/kernel/apm_32.c  |   15 +++++++++++----
 drivers/base/power/main.c |   20 +++++++++++---------
 drivers/base/sys.c        |    8 ++++++++
 drivers/xen/manage.c      |   16 +++++++++-------
 kernel/kexec.c            |    8 ++++----
 kernel/power/disk.c       |   39 +++++++++++++++++++++++++++++----------
 kernel/power/main.c       |   17 +++++++++++------
 7 files changed, 83 insertions(+), 40 deletions(-)

Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -287,17 +287,19 @@ void __attribute__ ((weak)) arch_suspend
  */
 static int suspend_enter(suspend_state_t state)
 {
-	int error = 0;
+	int error;
 
 	device_pm_lock();
-	arch_suspend_disable_irqs();
-	BUG_ON(!irqs_disabled());
 
-	if ((error = device_power_down(PMSG_SUSPEND))) {
+	error = device_power_down(PMSG_SUSPEND);
+	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down\n");
 		goto Done;
 	}
 
+	arch_suspend_disable_irqs();
+	BUG_ON(!irqs_disabled());
+
 	error = sysdev_suspend(PMSG_SUSPEND);
 	if (!error) {
 		if (!suspend_test(TEST_CORE))
@@ -305,11 +307,14 @@ static int suspend_enter(suspend_state_t
 		sysdev_resume();
 	}
 
-	device_power_up(PMSG_RESUME);
- Done:
 	arch_suspend_enable_irqs();
 	BUG_ON(irqs_disabled());
+
+	device_power_up(PMSG_RESUME);
+
+ Done:
 	device_pm_unlock();
+
 	return error;
 }
 
Index: linux-2.6/kernel/power/disk.c
===================================================================
--- linux-2.6.orig/kernel/power/disk.c
+++ linux-2.6/kernel/power/disk.c
@@ -214,7 +214,7 @@ static int create_image(int platform_mod
 		return error;
 
 	device_pm_lock();
-	local_irq_disable();
+
 	/* At this point, device_suspend() has been called, but *not*
 	 * device_power_down(). We *must* call device_power_down() now.
 	 * Otherwise, drivers for some devices (e.g. interrupt controllers)
@@ -225,8 +225,11 @@ static int create_image(int platform_mod
 	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down, "
 			"aborting hibernation\n");
-		goto Enable_irqs;
+		goto Unlock;
 	}
+
+	local_irq_disable();
+
 	sysdev_suspend(PMSG_FREEZE);
 	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down, "
@@ -252,12 +255,16 @@ static int create_image(int platform_mod
 	/* NOTE:  device_power_up() is just a resume() for devices
 	 * that suspended with irqs off ... no overall powerup.
 	 */
+
  Power_up_devices:
+	local_irq_enable();
+
 	device_power_up(in_suspend ?
 		(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);
- Enable_irqs:
-	local_irq_enable();
+
+ Unlock:
 	device_pm_unlock();
+
 	return error;
 }
 
@@ -336,13 +343,16 @@ static int resume_target_kernel(void)
 	int error;
 
 	device_pm_lock();
-	local_irq_disable();
+
 	error = device_power_down(PMSG_QUIESCE);
 	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down, "
 			"aborting resume\n");
-		goto Enable_irqs;
+		goto Unlock;
 	}
+
+	local_irq_disable();
+
 	sysdev_suspend(PMSG_QUIESCE);
 	/* We'll ignore saved state, but this gets preempt count (etc) right */
 	save_processor_state();
@@ -366,11 +376,16 @@ static int resume_target_kernel(void)
 	swsusp_free();
 	restore_processor_state();
 	touch_softlockup_watchdog();
+
 	sysdev_resume();
-	device_power_up(PMSG_RECOVER);
- Enable_irqs:
+
 	local_irq_enable();
+
+	device_power_up(PMSG_RECOVER);
+
+ Unlock:
 	device_pm_unlock();
+
 	return error;
 }
 
@@ -447,15 +462,16 @@ int hibernation_platform_enter(void)
 		goto Finish;
 
 	device_pm_lock();
-	local_irq_disable();
+
 	error = device_power_down(PMSG_HIBERNATE);
 	if (!error) {
+		local_irq_disable();
 		sysdev_suspend(PMSG_HIBERNATE);
 		hibernation_ops->enter();
 		/* We should never get here */
 		while (1);
 	}
-	local_irq_enable();
+
 	device_pm_unlock();
 
 	/*
@@ -464,12 +480,15 @@ int hibernation_platform_enter(void)
 	 */
  Finish:
 	hibernation_ops->finish();
+
  Resume_devices:
 	entering_platform_hibernation = false;
 	device_resume(PMSG_RESTORE);
 	resume_console();
+
  Close:
 	hibernation_ops->end();
+
 	return error;
 }
 
Index: linux-2.6/arch/x86/kernel/apm_32.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/apm_32.c
+++ linux-2.6/arch/x86/kernel/apm_32.c
@@ -1190,8 +1190,10 @@ static int suspend(int vetoable)
 	struct apm_user	*as;
 
 	device_suspend(PMSG_SUSPEND);
-	local_irq_disable();
+
 	device_power_down(PMSG_SUSPEND);
+
+	local_irq_disable();
 	sysdev_suspend(PMSG_SUSPEND);
 
 	local_irq_enable();
@@ -1209,9 +1211,12 @@ static int suspend(int vetoable)
 	if (err != APM_SUCCESS)
 		apm_error("suspend", err);
 	err = (err == APM_SUCCESS) ? 0 : -EIO;
+
 	sysdev_resume();
-	device_power_up(PMSG_RESUME);
 	local_irq_enable();
+
+	device_power_up(PMSG_RESUME);
+
 	device_resume(PMSG_RESUME);
 	queue_event(APM_NORMAL_RESUME, NULL);
 	spin_lock(&user_list_lock);
@@ -1228,8 +1233,9 @@ static void standby(void)
 {
 	int err;
 
-	local_irq_disable();
 	device_power_down(PMSG_SUSPEND);
+
+	local_irq_disable();
 	sysdev_suspend(PMSG_SUSPEND);
 	local_irq_enable();
 
@@ -1239,8 +1245,9 @@ static void standby(void)
 
 	local_irq_disable();
 	sysdev_resume();
-	device_power_up(PMSG_RESUME);
 	local_irq_enable();
+
+	device_power_up(PMSG_RESUME);
 }
 
 static apm_event_t get_event(void)
Index: linux-2.6/drivers/xen/manage.c
===================================================================
--- linux-2.6.orig/drivers/xen/manage.c
+++ linux-2.6/drivers/xen/manage.c
@@ -39,12 +39,6 @@ static int xen_suspend(void *data)
 
 	BUG_ON(!irqs_disabled());
 
-	err = device_power_down(PMSG_SUSPEND);
-	if (err) {
-		printk(KERN_ERR "xen_suspend: device_power_down failed: %d\n",
-		       err);
-		return err;
-	}
 	err = sysdev_suspend(PMSG_SUSPEND);
 	if (err) {
 		printk(KERN_ERR "xen_suspend: sysdev_suspend failed: %d\n",
@@ -69,7 +63,6 @@ static int xen_suspend(void *data)
 	xen_mm_unpin_all();
 
 	sysdev_resume();
-	device_power_up(PMSG_RESUME);
 
 	if (!*cancelled) {
 		xen_irq_resume();
@@ -108,6 +101,12 @@ static void do_suspend(void)
 	/* XXX use normal device tree? */
 	xenbus_suspend();
 
+	err = device_power_down(PMSG_SUSPEND);
+	if (err) {
+		printk(KERN_ERR "device_power_down failed: %d\n", err);
+		goto resume_devices;
+	}
+
 	err = stop_machine(xen_suspend, &cancelled, &cpumask_of_cpu(0));
 	if (err) {
 		printk(KERN_ERR "failed to start xen_suspend: %d\n", err);
@@ -120,6 +119,9 @@ static void do_suspend(void)
 	} else
 		xenbus_suspend_cancel();
 
+	device_power_up(PMSG_RESUME);
+
+resume_devices:
 	device_resume(PMSG_RESUME);
 
 	/* Make sure timer events get retriggered on all CPUs */
Index: linux-2.6/kernel/kexec.c
===================================================================
--- linux-2.6.orig/kernel/kexec.c
+++ linux-2.6/kernel/kexec.c
@@ -1454,7 +1454,6 @@ int kernel_kexec(void)
 		if (error)
 			goto Resume_devices;
 		device_pm_lock();
-		local_irq_disable();
 		/* At this point, device_suspend() has been called,
 		 * but *not* device_power_down(). We *must*
 		 * device_power_down() now.  Otherwise, drivers for
@@ -1464,8 +1463,9 @@ int kernel_kexec(void)
 		 */
 		error = device_power_down(PMSG_FREEZE);
 		if (error)
-			goto Enable_irqs;
+			goto Unlock_pm;
 
+		local_irq_disable();
 		/* Suspend system devices */
 		error = sysdev_suspend(PMSG_FREEZE);
 		if (error)
@@ -1484,9 +1484,9 @@ int kernel_kexec(void)
 	if (kexec_image->preserve_context) {
 		sysdev_resume();
  Power_up_devices:
-		device_power_up(PMSG_RESTORE);
- Enable_irqs:
 		local_irq_enable();
+		device_power_up(PMSG_RESTORE);
+ Unlock_pm:
 		device_pm_unlock();
 		enable_nonboot_cpus();
  Resume_devices:
Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -23,6 +23,7 @@
 #include <linux/pm.h>
 #include <linux/resume-trace.h>
 #include <linux/rwsem.h>
+#include <linux/interrupt.h>
 
 #include "../base.h"
 #include "power.h"
@@ -305,7 +306,8 @@ static int resume_device_noirq(struct de
  *	Execute the appropriate "noirq resume" callback for all devices marked
  *	as DPM_OFF_IRQ.
  *
- *	Must be called with interrupts disabled and only one CPU running.
+ *	Must be called under dpm_list_mtx.  Device drivers should not receive
+ *	interrupts while it's being executed.
  */
 static void dpm_power_up(pm_message_t state)
 {
@@ -326,14 +328,13 @@ static void dpm_power_up(pm_message_t st
  *	device_power_up - Turn on all devices that need special attention.
  *	@state: PM transition of the system being carried out.
  *
- *	Power on system devices, then devices that required we shut them down
- *	with interrupts disabled.
- *
- *	Must be called with interrupts disabled.
+ *	Call the "early" resume handlers and enable device drivers to receive
+ *	interrupts.
  */
 void device_power_up(pm_message_t state)
 {
 	dpm_power_up(state);
+	resume_device_irqs();
 }
 EXPORT_SYMBOL_GPL(device_power_up);
 
@@ -558,16 +559,17 @@ static int suspend_device_noirq(struct d
  *	device_power_down - Shut down special devices.
  *	@state: PM transition of the system being carried out.
  *
- *	Power down devices that require interrupts to be disabled.
- *	Then power down system devices.
+ *	Prevent device drivers from receiving interrupts and call the "late"
+ *	suspend handlers.
  *
- *	Must be called with interrupts disabled and only one CPU running.
+ *	Must be called under dpm_list_mtx.
  */
 int device_power_down(pm_message_t state)
 {
 	struct device *dev;
 	int error = 0;
 
+	suspend_device_irqs();
 	list_for_each_entry_reverse(dev, &dpm_list, power.entry) {
 		error = suspend_device_noirq(dev, state);
 		if (error) {
@@ -577,7 +579,7 @@ int device_power_down(pm_message_t state
 		dev->power.status = DPM_OFF_IRQ;
 	}
 	if (error)
-		dpm_power_up(resume_event(state));
+		device_power_up(resume_event(state));
 	return error;
 }
 EXPORT_SYMBOL_GPL(device_power_down);
Index: linux-2.6/drivers/base/sys.c
===================================================================
--- linux-2.6.orig/drivers/base/sys.c
+++ linux-2.6/drivers/base/sys.c
@@ -22,6 +22,7 @@
 #include <linux/pm.h>
 #include <linux/device.h>
 #include <linux/mutex.h>
+#include <linux/interrupt.h>
 
 #include "base.h"
 
@@ -369,6 +370,13 @@ int sysdev_suspend(pm_message_t state)
 	struct sysdev_driver *drv, *err_drv;
 	int ret;
 
+	pr_debug("Checking wake-up interrupts\n");
+
+	/* Return error code if there are any wake-up interrupts pending */
+	ret = check_wakeup_irqs();
+	if (ret)
+		return ret;
+
 	pr_debug("Suspending System Devices\n");
 
 	list_for_each_entry_reverse(cls, &system_kset->list, kset.kobj.entry) {
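
The resulting call order in suspend_enter() can be summarized with a small
trace model.  This is a sketch only: step(), trace and the error parameters
are hypothetical, and the step names are shorthand for the kernel calls in
the hunks above ("irqs_off" for arch_suspend_disable_irqs(), "enter" for the
platform's enter callback, and so on).

```c
/* User-space trace of the suspend_enter() ordering after this patch. */
#include <assert.h>
#include <string.h>

#define TRACE_MAX 256

static char trace[TRACE_MAX];

/* Append one step name to the space-separated trace. */
static void step(const char *name)
{
	assert(strlen(trace) + strlen(name) + 2 < TRACE_MAX);
	if (trace[0])
		strcat(trace, " ");
	strcat(trace, name);
}

static int model_suspend_enter(int power_down_error, int sysdev_error)
{
	int error;

	trace[0] = '\0';

	/* "Late" suspend callbacks now run with CPU interrupts still on. */
	step("device_power_down");
	error = power_down_error;
	if (error)
		goto done;

	step("irqs_off");
	step("sysdev_suspend");
	error = sysdev_error;
	if (!error) {
		step("enter");
		step("sysdev_resume");
	}
	step("irqs_on");

	/* "Early" resume callbacks run after CPU interrupts are back on. */
	step("device_power_up");
done:
	return error;
}
```

The model leaves out one detail: when device_power_down() fails, the real
function calls device_power_up() itself before returning, which is why
suspend_enter() jumps straight to the unlock in that case.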

^ permalink raw reply	[flat|nested] 373+ messages in thread

* [PATCH 2/11] PM: Rework handling of interrupts during suspend-resume
  2009-03-14 11:24 ` [PATCH 0/11] PM: Rework suspend-resume ordering to avoid problems with shared interrupts (updated 2x) Rafael J. Wysocki
  2009-03-14 11:26   ` [PATCH 1/11] PM: Introduce functions for suspending and resuming device interrupts Rafael J. Wysocki
  2009-03-14 11:26   ` Rafael J. Wysocki
@ 2009-03-14 11:27   ` Rafael J. Wysocki
  2009-03-14 11:27   ` Rafael J. Wysocki
                     ` (20 subsequent siblings)
  23 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-14 11:27 UTC (permalink / raw)
  To: pm list
  Cc: Arve, Jeremy Fitzhardinge, LKML, Jesse Barnes, Eric W. Biederman,
	Linux PCI, Ingo Molnar, Linus Torvalds, Thomas Gleixner

From: Rafael J. Wysocki <rjw@sisk.pl>

Use the functions introduced by the previous patch,
suspend_device_irqs(), resume_device_irqs() and check_wakeup_irqs(),
to rework the handling of interrupts during suspend (hibernation) and
resume.  Namely, interrupts will only be disabled on the CPU right
before suspending sysdevs, while device drivers will be prevented
from receiving interrupts, with the help of the new helper function,
before their "late" suspend callbacks run (and analogously during
resume).

In addition, since device interrupts are now disabled before the CPU
has turned all interrupts off, the CPU will still ACK any that arrive
and set the IRQ_PENDING bit for them.  Check in sysdev_suspend() whether
any wake-up interrupts are pending and abort suspend if that's the
case.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 arch/x86/kernel/apm_32.c  |   15 +++++++++++----
 drivers/base/power/main.c |   20 +++++++++++---------
 drivers/base/sys.c        |    8 ++++++++
 drivers/xen/manage.c      |   16 +++++++++-------
 kernel/kexec.c            |    8 ++++----
 kernel/power/disk.c       |   39 +++++++++++++++++++++++++++++----------
 kernel/power/main.c       |   17 +++++++++++------
 7 files changed, 83 insertions(+), 40 deletions(-)

Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -287,17 +287,19 @@ void __attribute__ ((weak)) arch_suspend
  */
 static int suspend_enter(suspend_state_t state)
 {
-	int error = 0;
+	int error;
 
 	device_pm_lock();
-	arch_suspend_disable_irqs();
-	BUG_ON(!irqs_disabled());
 
-	if ((error = device_power_down(PMSG_SUSPEND))) {
+	error = device_power_down(PMSG_SUSPEND);
+	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down\n");
 		goto Done;
 	}
 
+	arch_suspend_disable_irqs();
+	BUG_ON(!irqs_disabled());
+
 	error = sysdev_suspend(PMSG_SUSPEND);
 	if (!error) {
 		if (!suspend_test(TEST_CORE))
@@ -305,11 +307,14 @@ static int suspend_enter(suspend_state_t
 		sysdev_resume();
 	}
 
-	device_power_up(PMSG_RESUME);
- Done:
 	arch_suspend_enable_irqs();
 	BUG_ON(irqs_disabled());
+
+	device_power_up(PMSG_RESUME);
+
+ Done:
 	device_pm_unlock();
+
 	return error;
 }
 
Index: linux-2.6/kernel/power/disk.c
===================================================================
--- linux-2.6.orig/kernel/power/disk.c
+++ linux-2.6/kernel/power/disk.c
@@ -214,7 +214,7 @@ static int create_image(int platform_mod
 		return error;
 
 	device_pm_lock();
-	local_irq_disable();
+
 	/* At this point, device_suspend() has been called, but *not*
 	 * device_power_down(). We *must* call device_power_down() now.
 	 * Otherwise, drivers for some devices (e.g. interrupt controllers)
@@ -225,8 +225,11 @@ static int create_image(int platform_mod
 	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down, "
 			"aborting hibernation\n");
-		goto Enable_irqs;
+		goto Unlock;
 	}
+
+	local_irq_disable();
+
 	sysdev_suspend(PMSG_FREEZE);
 	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down, "
@@ -252,12 +255,16 @@ static int create_image(int platform_mod
 	/* NOTE:  device_power_up() is just a resume() for devices
 	 * that suspended with irqs off ... no overall powerup.
 	 */
+
  Power_up_devices:
+	local_irq_enable();
+
 	device_power_up(in_suspend ?
 		(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);
- Enable_irqs:
-	local_irq_enable();
+
+ Unlock:
 	device_pm_unlock();
+
 	return error;
 }
 
@@ -336,13 +343,16 @@ static int resume_target_kernel(void)
 	int error;
 
 	device_pm_lock();
-	local_irq_disable();
+
 	error = device_power_down(PMSG_QUIESCE);
 	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down, "
 			"aborting resume\n");
-		goto Enable_irqs;
+		goto Unlock;
 	}
+
+	local_irq_disable();
+
 	sysdev_suspend(PMSG_QUIESCE);
 	/* We'll ignore saved state, but this gets preempt count (etc) right */
 	save_processor_state();
@@ -366,11 +376,16 @@ static int resume_target_kernel(void)
 	swsusp_free();
 	restore_processor_state();
 	touch_softlockup_watchdog();
+
 	sysdev_resume();
-	device_power_up(PMSG_RECOVER);
- Enable_irqs:
+
 	local_irq_enable();
+
+	device_power_up(PMSG_RECOVER);
+
+ Unlock:
 	device_pm_unlock();
+
 	return error;
 }
 
@@ -447,15 +462,16 @@ int hibernation_platform_enter(void)
 		goto Finish;
 
 	device_pm_lock();
-	local_irq_disable();
+
 	error = device_power_down(PMSG_HIBERNATE);
 	if (!error) {
+		local_irq_disable();
 		sysdev_suspend(PMSG_HIBERNATE);
 		hibernation_ops->enter();
 		/* We should never get here */
 		while (1);
 	}
-	local_irq_enable();
+
 	device_pm_unlock();
 
 	/*
@@ -464,12 +480,15 @@ int hibernation_platform_enter(void)
 	 */
  Finish:
 	hibernation_ops->finish();
+
  Resume_devices:
 	entering_platform_hibernation = false;
 	device_resume(PMSG_RESTORE);
 	resume_console();
+
  Close:
 	hibernation_ops->end();
+
 	return error;
 }
 
Index: linux-2.6/arch/x86/kernel/apm_32.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/apm_32.c
+++ linux-2.6/arch/x86/kernel/apm_32.c
@@ -1190,8 +1190,10 @@ static int suspend(int vetoable)
 	struct apm_user	*as;
 
 	device_suspend(PMSG_SUSPEND);
-	local_irq_disable();
+
 	device_power_down(PMSG_SUSPEND);
+
+	local_irq_disable();
 	sysdev_suspend(PMSG_SUSPEND);
 
 	local_irq_enable();
@@ -1209,9 +1211,12 @@ static int suspend(int vetoable)
 	if (err != APM_SUCCESS)
 		apm_error("suspend", err);
 	err = (err == APM_SUCCESS) ? 0 : -EIO;
+
 	sysdev_resume();
-	device_power_up(PMSG_RESUME);
 	local_irq_enable();
+
+	device_power_up(PMSG_RESUME);
+
 	device_resume(PMSG_RESUME);
 	queue_event(APM_NORMAL_RESUME, NULL);
 	spin_lock(&user_list_lock);
@@ -1228,8 +1233,9 @@ static void standby(void)
 {
 	int err;
 
-	local_irq_disable();
 	device_power_down(PMSG_SUSPEND);
+
+	local_irq_disable();
 	sysdev_suspend(PMSG_SUSPEND);
 	local_irq_enable();
 
@@ -1239,8 +1245,9 @@ static void standby(void)
 
 	local_irq_disable();
 	sysdev_resume();
-	device_power_up(PMSG_RESUME);
 	local_irq_enable();
+
+	device_power_up(PMSG_RESUME);
 }
 
 static apm_event_t get_event(void)
Index: linux-2.6/drivers/xen/manage.c
===================================================================
--- linux-2.6.orig/drivers/xen/manage.c
+++ linux-2.6/drivers/xen/manage.c
@@ -39,12 +39,6 @@ static int xen_suspend(void *data)
 
 	BUG_ON(!irqs_disabled());
 
-	err = device_power_down(PMSG_SUSPEND);
-	if (err) {
-		printk(KERN_ERR "xen_suspend: device_power_down failed: %d\n",
-		       err);
-		return err;
-	}
 	err = sysdev_suspend(PMSG_SUSPEND);
 	if (err) {
 		printk(KERN_ERR "xen_suspend: sysdev_suspend failed: %d\n",
@@ -69,7 +63,6 @@ static int xen_suspend(void *data)
 	xen_mm_unpin_all();
 
 	sysdev_resume();
-	device_power_up(PMSG_RESUME);
 
 	if (!*cancelled) {
 		xen_irq_resume();
@@ -108,6 +101,12 @@ static void do_suspend(void)
 	/* XXX use normal device tree? */
 	xenbus_suspend();
 
+	err = device_power_down(PMSG_SUSPEND);
+	if (err) {
+		printk(KERN_ERR "device_power_down failed: %d\n", err);
+		goto resume_devices;
+	}
+
 	err = stop_machine(xen_suspend, &cancelled, &cpumask_of_cpu(0));
 	if (err) {
 		printk(KERN_ERR "failed to start xen_suspend: %d\n", err);
@@ -120,6 +119,9 @@ static void do_suspend(void)
 	} else
 		xenbus_suspend_cancel();
 
+	device_power_up(PMSG_RESUME);
+
+resume_devices:
 	device_resume(PMSG_RESUME);
 
 	/* Make sure timer events get retriggered on all CPUs */
Index: linux-2.6/kernel/kexec.c
===================================================================
--- linux-2.6.orig/kernel/kexec.c
+++ linux-2.6/kernel/kexec.c
@@ -1454,7 +1454,6 @@ int kernel_kexec(void)
 		if (error)
 			goto Resume_devices;
 		device_pm_lock();
-		local_irq_disable();
 		/* At this point, device_suspend() has been called,
 		 * but *not* device_power_down(). We *must*
 		 * device_power_down() now.  Otherwise, drivers for
@@ -1464,8 +1463,9 @@ int kernel_kexec(void)
 		 */
 		error = device_power_down(PMSG_FREEZE);
 		if (error)
-			goto Enable_irqs;
+			goto Unlock_pm;
 
+		local_irq_disable();
 		/* Suspend system devices */
 		error = sysdev_suspend(PMSG_FREEZE);
 		if (error)
@@ -1484,9 +1484,9 @@ int kernel_kexec(void)
 	if (kexec_image->preserve_context) {
 		sysdev_resume();
  Power_up_devices:
-		device_power_up(PMSG_RESTORE);
- Enable_irqs:
 		local_irq_enable();
+		device_power_up(PMSG_RESTORE);
+ Unlock_pm:
 		device_pm_unlock();
 		enable_nonboot_cpus();
  Resume_devices:
Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -23,6 +23,7 @@
 #include <linux/pm.h>
 #include <linux/resume-trace.h>
 #include <linux/rwsem.h>
+#include <linux/interrupt.h>
 
 #include "../base.h"
 #include "power.h"
@@ -305,7 +306,8 @@ static int resume_device_noirq(struct de
  *	Execute the appropriate "noirq resume" callback for all devices marked
  *	as DPM_OFF_IRQ.
  *
- *	Must be called with interrupts disabled and only one CPU running.
+ *	Must be called under dpm_list_mtx.  Device drivers should not receive
+ *	interrupts while it's being executed.
  */
 static void dpm_power_up(pm_message_t state)
 {
@@ -326,14 +328,13 @@ static void dpm_power_up(pm_message_t st
  *	device_power_up - Turn on all devices that need special attention.
  *	@state: PM transition of the system being carried out.
  *
- *	Power on system devices, then devices that required we shut them down
- *	with interrupts disabled.
- *
- *	Must be called with interrupts disabled.
+ *	Call the "early" resume handlers and enable device drivers to receive
+ *	interrupts.
  */
 void device_power_up(pm_message_t state)
 {
 	dpm_power_up(state);
+	resume_device_irqs();
 }
 EXPORT_SYMBOL_GPL(device_power_up);
 
@@ -558,16 +559,17 @@ static int suspend_device_noirq(struct d
  *	device_power_down - Shut down special devices.
  *	@state: PM transition of the system being carried out.
  *
- *	Power down devices that require interrupts to be disabled.
- *	Then power down system devices.
+ *	Prevent device drivers from receiving interrupts and call the "late"
+ *	suspend handlers.
  *
- *	Must be called with interrupts disabled and only one CPU running.
+ *	Must be called under dpm_list_mtx.
  */
 int device_power_down(pm_message_t state)
 {
 	struct device *dev;
 	int error = 0;
 
+	suspend_device_irqs();
 	list_for_each_entry_reverse(dev, &dpm_list, power.entry) {
 		error = suspend_device_noirq(dev, state);
 		if (error) {
@@ -577,7 +579,7 @@ int device_power_down(pm_message_t state
 		dev->power.status = DPM_OFF_IRQ;
 	}
 	if (error)
-		dpm_power_up(resume_event(state));
+		device_power_up(resume_event(state));
 	return error;
 }
 EXPORT_SYMBOL_GPL(device_power_down);
Index: linux-2.6/drivers/base/sys.c
===================================================================
--- linux-2.6.orig/drivers/base/sys.c
+++ linux-2.6/drivers/base/sys.c
@@ -22,6 +22,7 @@
 #include <linux/pm.h>
 #include <linux/device.h>
 #include <linux/mutex.h>
+#include <linux/interrupt.h>
 
 #include "base.h"
 
@@ -369,6 +370,13 @@ int sysdev_suspend(pm_message_t state)
 	struct sysdev_driver *drv, *err_drv;
 	int ret;
 
+	pr_debug("Checking wake-up interrupts\n");
+
+	/* Return error code if there are any wake-up interrupts pending */
+	ret = check_wakeup_irqs();
+	if (ret)
+		return ret;
+
 	pr_debug("Suspending System Devices\n");
 
 	list_for_each_entry_reverse(cls, &system_kset->list, kset.kobj.entry) {

^ permalink raw reply	[flat|nested] 373+ messages in thread

* [PATCH 3/11] PM: Change suspend code ordering
  2009-03-14 11:24 ` [PATCH 0/11] PM: Rework suspend-resume ordering to avoid problems with shared interrupts (updated 2x) Rafael J. Wysocki
                     ` (3 preceding siblings ...)
  2009-03-14 11:27   ` Rafael J. Wysocki
@ 2009-03-14 11:28   ` Rafael J. Wysocki
  2009-03-14 11:28   ` Rafael J. Wysocki
                     ` (18 subsequent siblings)
  23 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-14 11:28 UTC (permalink / raw)
  To: pm list
  Cc: LKML, Linus Torvalds, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, Len Brown,
	Jesse Barnes, Thomas Gleixner, Arve Hjønnevåg,
	Linux PCI

From: Rafael J. Wysocki <rjw@sisk.pl>

Change the ordering of the suspend core code so that the platform
"prepare" callback is executed and the nonboot CPUs are disabled
after calling device drivers' "late suspend" methods.

This change will allow us to rework the PCI PM core so that the power
state of devices is changed in the "late" phase of suspend (and
analogously in the "early" phase of resume), which in turn will allow
us to avoid the race condition where a device using shared interrupts
is put into a low power state with interrupts enabled and then an
interrupt (for another device) comes in and confuses its driver.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 kernel/power/main.c |   38 ++++++++++++++++++++++----------------
 1 file changed, 22 insertions(+), 16 deletions(-)

Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -297,6 +297,19 @@ static int suspend_enter(suspend_state_t
 		goto Done;
 	}
 
+	if (suspend_ops->prepare) {
+		error = suspend_ops->prepare();
+		if (error)
+			goto Power_up_devices;
+	}
+
+	if (suspend_test(TEST_PLATFORM))
+		goto Platform_finish;
+
+	error = disable_nonboot_cpus();
+	if (error || suspend_test(TEST_CPUS))
+		goto Enable_cpus;
+
 	arch_suspend_disable_irqs();
 	BUG_ON(!irqs_disabled());
 
@@ -310,6 +323,14 @@ static int suspend_enter(suspend_state_t
 	arch_suspend_enable_irqs();
 	BUG_ON(irqs_disabled());
 
+ Enable_cpus:
+	enable_nonboot_cpus();
+
+ Platform_finish:
+	if (suspend_ops->finish)
+		suspend_ops->finish();
+
+ Power_up_devices:
 	device_power_up(PMSG_RESUME);
 
  Done:
@@ -346,23 +367,8 @@ int suspend_devices_and_enter(suspend_st
 	if (suspend_test(TEST_DEVICES))
 		goto Recover_platform;
 
-	if (suspend_ops->prepare) {
-		error = suspend_ops->prepare();
-		if (error)
-			goto Resume_devices;
-	}
-
-	if (suspend_test(TEST_PLATFORM))
-		goto Finish;
+	suspend_enter(state);
 
-	error = disable_nonboot_cpus();
-	if (!error && !suspend_test(TEST_CPUS))
-		suspend_enter(state);
-
-	enable_nonboot_cpus();
- Finish:
-	if (suspend_ops->finish)
-		suspend_ops->finish();
  Resume_devices:
 	suspend_test_start();
 	device_resume(PMSG_RESUME);

^ permalink raw reply	[flat|nested] 373+ messages in thread

* [PATCH 4/11] PM: Change hibernation code ordering
  2009-03-14 11:24 ` [PATCH 0/11] PM: Rework suspend-resume ordering to avoid problems with shared interrupts (updated 2x) Rafael J. Wysocki
                     ` (5 preceding siblings ...)
  2009-03-14 11:28   ` Rafael J. Wysocki
@ 2009-03-14 11:28   ` Rafael J. Wysocki
  2009-03-14 11:28   ` Rafael J. Wysocki
                     ` (16 subsequent siblings)
  23 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-14 11:28 UTC (permalink / raw)
  To: pm list
  Cc: LKML, Linus Torvalds, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, Len Brown,
	Jesse Barnes, Thomas Gleixner, Arve Hjønnevåg,
	Linux PCI

From: Rafael J. Wysocki <rjw@sisk.pl>

Change the ordering of the hibernation core code so that the platform
"prepare" callbacks are executed and the nonboot CPUs are disabled
after calling device drivers' "late suspend" methods.

This change (along with the previous analogous change of the suspend
core code) will allow us to rework the PCI PM core so that the power
state of devices is changed in the "late" phase of suspend (and
analogously in the "early" phase of resume), which in turn will allow
us to avoid the race condition where a device using shared interrupts
is put into a low power state with interrupts enabled and then an
interrupt (for another device) comes in and confuses its driver.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 kernel/power/disk.c |  109 +++++++++++++++++++++++++++++-----------------------
 1 file changed, 61 insertions(+), 48 deletions(-)
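
The shared-interrupt race mentioned in the changelog can be pictured with a small userspace sketch.  Everything in it is hypothetical (struct sim_dev, sim_read_status() and both handlers), and it assumes that reading a register of a powered-down device returns all ones, which is common for PCI but not guaranteed for every device.  The guarded handler is only one possible driver-side workaround; this series avoids the situation instead, by changing device power states in the "late" phase.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a device sitting on a shared interrupt line. */
struct sim_dev {
	bool powered;		/* false once put into a low power state */
	uint32_t status;	/* pending-event bits while powered */
};

/* Assumption: a powered-down device reads back as all ones. */
static uint32_t sim_read_status(const struct sim_dev *d)
{
	return d->powered ? d->status : 0xffffffffu;
}

/* Naive handler: any set bit is taken as "my device raised this". */
static bool naive_handler(const struct sim_dev *d)
{
	return sim_read_status(d) != 0;
}

/* Guarded handler: treats an all-ones read as "not my interrupt". */
static bool guarded_handler(const struct sim_dev *d)
{
	uint32_t status = sim_read_status(d);

	if (status == 0xffffffffu)
		return false;
	return status != 0;
}
```

With the device powered down, an interrupt raised by another device on the same line makes naive_handler() claim it, which is the confusion the changelog describes.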

Index: linux-2.6/kernel/power/disk.c
===================================================================
--- linux-2.6.orig/kernel/power/disk.c
+++ linux-2.6/kernel/power/disk.c
@@ -228,13 +228,22 @@ static int create_image(int platform_mod
 		goto Unlock;
 	}
 
+	error = platform_pre_snapshot(platform_mode);
+	if (error || hibernation_test(TEST_PLATFORM))
+		goto Platform_finish;
+
+	error = disable_nonboot_cpus();
+	if (error || hibernation_test(TEST_CPUS)
+	    || hibernation_testmode(HIBERNATION_TEST))
+		goto Enable_cpus;
+
 	local_irq_disable();
 
 	sysdev_suspend(PMSG_FREEZE);
 	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down, "
 			"aborting hibernation\n");
-		goto Power_up_devices;
+		goto Enable_irqs;
 	}
 
 	if (hibernation_test(TEST_CORE))
@@ -250,15 +259,22 @@ static int create_image(int platform_mod
 	restore_processor_state();
 	if (!in_suspend)
 		platform_leave(platform_mode);
+
  Power_up:
 	sysdev_resume();
 	/* NOTE:  device_power_up() is just a resume() for devices
 	 * that suspended with irqs off ... no overall powerup.
 	 */
 
- Power_up_devices:
+ Enable_irqs:
 	local_irq_enable();
 
+ Enable_cpus:
+	enable_nonboot_cpus();
+
+ Platform_finish:
+	platform_finish(platform_mode);
+
 	device_power_up(in_suspend ?
 		(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);
 
@@ -298,25 +314,9 @@ int hibernation_snapshot(int platform_mo
 	if (hibernation_test(TEST_DEVICES))
 		goto Recover_platform;
 
-	error = platform_pre_snapshot(platform_mode);
-	if (error || hibernation_test(TEST_PLATFORM))
-		goto Finish;
-
-	error = disable_nonboot_cpus();
-	if (!error) {
-		if (hibernation_test(TEST_CPUS))
-			goto Enable_cpus;
-
-		if (hibernation_testmode(HIBERNATION_TEST))
-			goto Enable_cpus;
+	error = create_image(platform_mode);
+	/* Control returns here after successful restore */
 
-		error = create_image(platform_mode);
-		/* Control returns here after successful restore */
-	}
- Enable_cpus:
-	enable_nonboot_cpus();
- Finish:
-	platform_finish(platform_mode);
  Resume_devices:
 	device_resume(in_suspend ?
 		(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);
@@ -338,7 +338,7 @@ int hibernation_snapshot(int platform_mo
  *	kernel.
  */
 
-static int resume_target_kernel(void)
+static int resume_target_kernel(bool platform_mode)
 {
 	int error;
 
@@ -351,9 +351,20 @@ static int resume_target_kernel(void)
 		goto Unlock;
 	}
 
+	error = platform_pre_restore(platform_mode);
+	if (error)
+		goto Cleanup;
+
+	error = disable_nonboot_cpus();
+	if (error)
+		goto Enable_cpus;
+
 	local_irq_disable();
 
-	sysdev_suspend(PMSG_QUIESCE);
+	error = sysdev_suspend(PMSG_QUIESCE);
+	if (error)
+		goto Enable_irqs;
+
 	/* We'll ignore saved state, but this gets preempt count (etc) right */
 	save_processor_state();
 	error = restore_highmem();
@@ -379,8 +390,15 @@ static int resume_target_kernel(void)
 
 	sysdev_resume();
 
+ Enable_irqs:
 	local_irq_enable();
 
+ Enable_cpus:
+	enable_nonboot_cpus();
+
+ Cleanup:
+	platform_restore_cleanup(platform_mode);
+
 	device_power_up(PMSG_RECOVER);
 
  Unlock:
@@ -405,19 +423,10 @@ int hibernation_restore(int platform_mod
 	pm_prepare_console();
 	suspend_console();
 	error = device_suspend(PMSG_QUIESCE);
-	if (error)
-		goto Finish;
-
-	error = platform_pre_restore(platform_mode);
 	if (!error) {
-		error = disable_nonboot_cpus();
-		if (!error)
-			error = resume_target_kernel();
-		enable_nonboot_cpus();
+		error = resume_target_kernel(platform_mode);
+		device_resume(PMSG_RECOVER);
 	}
-	platform_restore_cleanup(platform_mode);
-	device_resume(PMSG_RECOVER);
- Finish:
 	resume_console();
 	pm_restore_console();
 	return error;
@@ -453,34 +462,38 @@ int hibernation_platform_enter(void)
 		goto Resume_devices;
 	}
 
+	device_pm_lock();
+
+	error = device_power_down(PMSG_HIBERNATE);
+	if (error)
+		goto Unlock;
+
 	error = hibernation_ops->prepare();
 	if (error)
-		goto Resume_devices;
+		goto Platform_finish;
 
 	error = disable_nonboot_cpus();
 	if (error)
-		goto Finish;
-
-	device_pm_lock();
-
-	error = device_power_down(PMSG_HIBERNATE);
-	if (!error) {
-		local_irq_disable();
-		sysdev_suspend(PMSG_HIBERNATE);
-		hibernation_ops->enter();
-		/* We should never get here */
-		while (1);
-	}
+		goto Platform_finish;
 
-	device_pm_unlock();
+	local_irq_disable();
+	sysdev_suspend(PMSG_HIBERNATE);
+	hibernation_ops->enter();
+	/* We should never get here */
+	while (1);
 
 	/*
 	 * We don't need to reenable the nonboot CPUs or resume consoles, since
 	 * the system is going to be halted anyway.
 	 */
- Finish:
+ Platform_finish:
 	hibernation_ops->finish();
 
+	device_power_up(PMSG_RESTORE);
+
+ Unlock:
+	device_pm_unlock();
+
  Resume_devices:
 	entering_platform_hibernation = false;
 	device_resume(PMSG_RESTORE);

^ permalink raw reply	[flat|nested] 373+ messages in thread

* [PATCH 5/11] kexec: Change kexec jump code ordering
  2009-03-14 11:24 ` [PATCH 0/11] PM: Rework suspend-resume ordering to avoid problems with shared interrupts (updated 2x) Rafael J. Wysocki
                     ` (7 preceding siblings ...)
  2009-03-14 11:28   ` Rafael J. Wysocki
@ 2009-03-14 11:29   ` Rafael J. Wysocki
  2009-03-14 11:29   ` Rafael J. Wysocki
                     ` (14 subsequent siblings)
  23 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-14 11:29 UTC (permalink / raw)
  To: pm list
  Cc: LKML, Linus Torvalds, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, Len Brown,
	Jesse Barnes, Thomas Gleixner, Arve Hjønnevåg,
	Linux PCI

From: Rafael J. Wysocki <rjw@sisk.pl>

Change the ordering of the kexec jump code so that the nonboot CPUs
are disabled after calling device drivers' "late suspend" methods.

This change reflects the recent modifications of the power management
code that is also used by kexec jump.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 kernel/kexec.c |   19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)
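
As a sanity check of the reordered error paths, the sketch below models the kexec jump sequence in userspace and lets each step fail in turn.  It is only an illustration: struct sim_state, enum sim_kexec_fail, sim_kexec_jump() and sim_balanced() are hypothetical, and the comments name the kernel calls each line stands in for.

```c
#include <stdbool.h>

/* Hypothetical bookkeeping of what has been done and not yet undone. */
struct sim_state {
	int pm_lock_depth;
	bool devices_powered_down;
	bool cpus_offline;
	bool irqs_off;
};

enum sim_kexec_fail { KF_NONE, KF_POWER_DOWN, KF_CPUS, KF_SYSDEV };

static int sim_kexec_jump(struct sim_state *s, enum sim_kexec_fail fail_at)
{
	int error = 0;

	s->pm_lock_depth++;			/* device_pm_lock() */
	if (fail_at == KF_POWER_DOWN) {		/* device_power_down() failed */
		error = -1;
		goto resume_devices;
	}
	s->devices_powered_down = true;

	if (fail_at == KF_CPUS) {		/* disable_nonboot_cpus() failed */
		error = -1;
		goto enable_cpus;
	}
	s->cpus_offline = true;

	s->irqs_off = true;			/* local_irq_disable() */
	if (fail_at == KF_SYSDEV) {		/* sysdev_suspend() failed */
		error = -1;
		goto enable_irqs;
	}

	/* machine_kexec() and sysdev_resume() would run here. */

enable_irqs:
	s->irqs_off = false;			/* local_irq_enable() */
enable_cpus:
	s->cpus_offline = false;		/* enable_nonboot_cpus() */
	s->devices_powered_down = false;	/* device_power_up() */
resume_devices:
	s->pm_lock_depth--;			/* device_pm_unlock() */
	return error;
}

/* True when every step taken has been undone again. */
static bool sim_balanced(const struct sim_state *s)
{
	return s->pm_lock_depth == 0 && !s->devices_powered_down &&
	       !s->cpus_offline && !s->irqs_off;
}
```

Running sim_kexec_jump() with each failure value and checking sim_balanced() afterwards confirms that, in this model, every exit path releases the PM lock exactly once.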

Index: linux-2.6/kernel/kexec.c
===================================================================
--- linux-2.6.orig/kernel/kexec.c
+++ linux-2.6/kernel/kexec.c
@@ -1450,9 +1450,6 @@ int kernel_kexec(void)
 		error = device_suspend(PMSG_FREEZE);
 		if (error)
 			goto Resume_console;
-		error = disable_nonboot_cpus();
-		if (error)
-			goto Resume_devices;
 		device_pm_lock();
 		/* At this point, device_suspend() has been called,
 		 * but *not* device_power_down(). We *must*
@@ -1463,13 +1460,15 @@ int kernel_kexec(void)
 		 */
 		error = device_power_down(PMSG_FREEZE);
 		if (error)
-			goto Unlock_pm;
-
+			goto Resume_devices;
+		error = disable_nonboot_cpus();
+		if (error)
+			goto Enable_cpus;
 		local_irq_disable();
 		/* Suspend system devices */
 		error = sysdev_suspend(PMSG_FREEZE);
 		if (error)
-			goto Power_up_devices;
+			goto Enable_irqs;
 	} else
 #endif
 	{
@@ -1483,13 +1482,13 @@ int kernel_kexec(void)
 #ifdef CONFIG_KEXEC_JUMP
 	if (kexec_image->preserve_context) {
 		sysdev_resume();
- Power_up_devices:
+ Enable_irqs:
 		local_irq_enable();
-		device_power_up(PMSG_RESTORE);
- Unlock_pm:
-		device_pm_unlock();
+ Enable_cpus:
 		enable_nonboot_cpus();
+		device_power_up(PMSG_RESTORE);
  Resume_devices:
+		device_pm_unlock();
 		device_resume(PMSG_RESTORE);
  Resume_console:
 		resume_console();

^ permalink raw reply	[flat|nested] 373+ messages in thread

* [PATCH 6/11] PCI PM: Consistently use variable name "error" for pm call return values
  2009-03-14 11:24 ` [PATCH 0/11] PM: Rework suspend-resume ordering to avoid problems with shared interrupts (updated 2x) Rafael J. Wysocki
                     ` (10 preceding siblings ...)
  2009-03-14 11:30   ` [PATCH 6/11] PCI PM: Consistently use variable name "error" for pm call return values Rafael J. Wysocki
@ 2009-03-14 11:30   ` Rafael J. Wysocki
  2009-03-14 11:31   ` [PATCH 7/11] PCI PM: Use pci_set_power_state during early resume Rafael J. Wysocki
                     ` (11 subsequent siblings)
  23 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-14 11:30 UTC (permalink / raw)
  To: pm list
  Cc: LKML, Linus Torvalds, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, Len Brown,
	Jesse Barnes, Thomas Gleixner, Arve Hjønnevåg,
	Linux PCI, Frans Pop

From: Frans Pop <elendil@planet.nl>

I noticed two functions use a variable "i" to store the return value of PM
function calls while the rest of the file uses "error". As "i" normally
indicates a counter of some sort it seems better to keep this consistent.

Signed-off-by: Frans Pop <elendil@planet.nl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 drivers/pci/pci-driver.c |   20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

Index: linux-2.6/drivers/pci/pci-driver.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci-driver.c
+++ linux-2.6/drivers/pci/pci-driver.c
@@ -352,17 +352,17 @@ static int pci_legacy_suspend(struct dev
 {
 	struct pci_dev * pci_dev = to_pci_dev(dev);
 	struct pci_driver * drv = pci_dev->driver;
-	int i = 0;
+	int error = 0;
 
 	if (drv && drv->suspend) {
 		pci_power_t prev = pci_dev->current_state;
 
 		pci_dev->state_saved = false;
 
-		i = drv->suspend(pci_dev, state);
-		suspend_report_result(drv->suspend, i);
-		if (i)
-			return i;
+		error = drv->suspend(pci_dev, state);
+		suspend_report_result(drv->suspend, error);
+		if (error)
+			return error;
 
 		if (pci_dev->state_saved)
 			goto Fixup;
@@ -385,20 +385,20 @@ static int pci_legacy_suspend(struct dev
  Fixup:
 	pci_fixup_device(pci_fixup_suspend, pci_dev);
 
-	return i;
+	return error;
 }
 
 static int pci_legacy_suspend_late(struct device *dev, pm_message_t state)
 {
 	struct pci_dev * pci_dev = to_pci_dev(dev);
 	struct pci_driver * drv = pci_dev->driver;
-	int i = 0;
+	int error = 0;
 
 	if (drv && drv->suspend_late) {
-		i = drv->suspend_late(pci_dev, state);
-		suspend_report_result(drv->suspend_late, i);
+		error = drv->suspend_late(pci_dev, state);
+		suspend_report_result(drv->suspend_late, error);
 	}
-	return i;
+	return error;
 }
 
 static int pci_legacy_resume_early(struct device *dev)

^ permalink raw reply	[flat|nested] 373+ messages in thread

* [PATCH 7/11] PCI PM: Use pci_set_power_state during early resume
  2009-03-14 11:24 ` [PATCH 0/11] PM: Rework suspend-resume ordering to avoid problems with shared interrupts (updated 2x) Rafael J. Wysocki
                     ` (12 preceding siblings ...)
  2009-03-14 11:31   ` [PATCH 7/11] PCI PM: Use pci_set_power_state during early resume Rafael J. Wysocki
@ 2009-03-14 11:31   ` Rafael J. Wysocki
  2009-03-14 11:32   ` [PATCH 8/11] PCI PM: Move pci_restore_standard_config to pci-driver.c Rafael J. Wysocki
                     ` (9 subsequent siblings)
  23 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-14 11:31 UTC (permalink / raw)
  To: pm list
  Cc: LKML, Linus Torvalds, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, Len Brown,
	Jesse Barnes, Thomas Gleixner, Arve Hjønnevåg,
	Linux PCI

From: Rafael J. Wysocki <rjw@sisk.pl>

Now that timer interrupts can be enabled during the early phase of
resuming devices, we can use the generic pci_set_power_state() to put
PCI devices into D0 at that time.  This gives the platform-specific PM
code a chance to handle devices that don't implement native PCI PM or
that require additional, platform-specific operations to power them
up.  It also simplifies the code quite a bit.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 drivers/pci/pci.c |   48 +++++++++---------------------------------------
 1 file changed, 9 insertions(+), 39 deletions(-)
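
The delays that pci_restore_standard_config() no longer open-codes follow a simple rule, sketched below in userspace C.  The names (enum sim_pci_state, sim_transition_delay_us()) are hypothetical, and the values of 10 ms for transitions involving D3hot and 200 us for those involving D2 are my reading of the PCI PM specification and of the usual defaults of pci_pm_d3_delay and PCI_PM_D2_DELAY, so please treat them as assumptions.

```c
/* Hypothetical stand-ins for the PCI power states handled here. */
enum sim_pci_state { SIM_D0, SIM_D1, SIM_D2, SIM_D3HOT };

/*
 * Delay, in microseconds, to wait after writing the new state.
 * Assumed values: 10 ms when D3hot is involved, 200 us when D2 is.
 */
static unsigned int sim_transition_delay_us(enum sim_pci_state from,
					    enum sim_pci_state to)
{
	if (from == to)
		return 0;
	if (from == SIM_D3HOT || to == SIM_D3HOT)
		return 10000;
	if (from == SIM_D2 || to == SIM_D2)
		return 200;
	return 0;
}
```

For example, sim_transition_delay_us(SIM_D3HOT, SIM_D0) yields the longer delay, which is the case the early resume path hits most often.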

Index: linux-2.6/drivers/pci/pci.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -426,7 +426,6 @@ static inline int platform_pci_sleep_wak
  *                           given PCI device
  * @dev: PCI device to handle.
  * @state: PCI power state (D0, D1, D2, D3hot) to put the device into.
- * @wait: If 'true', wait for the device to change its power state
  *
  * RETURN VALUE:
  * -EINVAL if the requested state is invalid.
@@ -435,8 +434,7 @@ static inline int platform_pci_sleep_wak
  * 0 if device already is in the requested state.
  * 0 if device's power state has been successfully changed.
  */
-static int
-pci_raw_set_power_state(struct pci_dev *dev, pci_power_t state, bool wait)
+static int pci_raw_set_power_state(struct pci_dev *dev, pci_power_t state)
 {
 	u16 pmcsr;
 	bool need_restore = false;
@@ -481,10 +479,8 @@ pci_raw_set_power_state(struct pci_dev *
 		break;
 	case PCI_UNKNOWN: /* Boot-up */
 		if ((pmcsr & PCI_PM_CTRL_STATE_MASK) == PCI_D3hot
-		 && !(pmcsr & PCI_PM_CTRL_NO_SOFT_RESET)) {
+		 && !(pmcsr & PCI_PM_CTRL_NO_SOFT_RESET))
 			need_restore = true;
-			wait = true;
-		}
 		/* Fall-through: force to D0 */
 	default:
 		pmcsr = 0;
@@ -494,9 +490,6 @@ pci_raw_set_power_state(struct pci_dev *
 	/* enter specified state */
 	pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr);
 
-	if (!wait)
-		return 0;
-
 	/* Mandatory power management transition delays */
 	/* see PCI PM 1.1 5.6.1 table 18 */
 	if (state == PCI_D3hot || dev->current_state == PCI_D3hot)
@@ -521,7 +514,7 @@ pci_raw_set_power_state(struct pci_dev *
 	if (need_restore)
 		pci_restore_bars(dev);
 
-	if (wait && dev->bus->self)
+	if (dev->bus->self)
 		pcie_aspm_pm_state_change(dev->bus->self);
 
 	return 0;
@@ -591,7 +584,7 @@ int pci_set_power_state(struct pci_dev *
 	if (state == PCI_D3hot && (dev->dev_flags & PCI_DEV_FLAGS_NO_D3))
 		return 0;
 
-	error = pci_raw_set_power_state(dev, state, true);
+	error = pci_raw_set_power_state(dev, state);
 
 	if (state > PCI_D0 && platform_pci_power_manageable(dev)) {
 		/* Allow the platform to finalize the transition */
@@ -1390,37 +1383,14 @@ void pci_allocate_cap_save_buffers(struc
  */
 int pci_restore_standard_config(struct pci_dev *dev)
 {
-	pci_power_t prev_state;
-	int error;
-
-	pci_update_current_state(dev, PCI_D0);
-
-	prev_state = dev->current_state;
-	if (prev_state == PCI_D0)
-		goto Restore;
-
-	error = pci_raw_set_power_state(dev, PCI_D0, false);
-	if (error)
-		return error;
+	pci_update_current_state(dev, PCI_UNKNOWN);
 
-	/*
-	 * This assumes that we won't get a bus in B2 or B3 from the BIOS, but
-	 * we've made this assumption forever and it appears to be universally
-	 * satisfied.
-	 */
-	switch(prev_state) {
-	case PCI_D3cold:
-	case PCI_D3hot:
-		mdelay(pci_pm_d3_delay);
-		break;
-	case PCI_D2:
-		udelay(PCI_PM_D2_DELAY);
-		break;
+	if (dev->current_state != PCI_D0) {
+		int error = pci_set_power_state(dev, PCI_D0);
+		if (error)
+			return error;
 	}
 
-	pci_update_current_state(dev, PCI_D0);
-
- Restore:
 	return dev->state_saved ? pci_restore_state(dev) : 0;
 }
 

^ permalink raw reply	[flat|nested] 373+ messages in thread

* [PATCH 7/11] PCI PM: Use pci_set_power_state during early resume
  2009-03-14 11:24 ` [PATCH 0/11] PM: Rework suspend-resume ordering to avoid problems with shared interrupts (updated 2x) Rafael J. Wysocki
                     ` (11 preceding siblings ...)
  2009-03-14 11:30   ` Rafael J. Wysocki
@ 2009-03-14 11:31   ` Rafael J. Wysocki
  2009-03-14 11:31   ` Rafael J. Wysocki
                     ` (10 subsequent siblings)
  23 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-14 11:31 UTC (permalink / raw)
  To: pm list
  Cc: Arve, Jeremy Fitzhardinge, LKML, Jesse Barnes, Eric W. Biederman,
	Linux PCI, Ingo Molnar, Linus Torvalds, Thomas Gleixner

From: Rafael J. Wysocki <rjw@sisk.pl>

Now that timer interrupts can be enabled during the early phase of
resuming devices, we can use the generic pci_set_power_state() to put
PCI devices into D0 at that time.  This gives the platform-specific PM
code a chance to handle devices that don't implement native PCI PM or
that require additional, platform-specific operations to power them
up.  It also lets us simplify the code quite a bit.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 drivers/pci/pci.c |   48 +++++++++---------------------------------------
 1 file changed, 9 insertions(+), 39 deletions(-)

Index: linux-2.6/drivers/pci/pci.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -426,7 +426,6 @@ static inline int platform_pci_sleep_wak
  *                           given PCI device
  * @dev: PCI device to handle.
  * @state: PCI power state (D0, D1, D2, D3hot) to put the device into.
- * @wait: If 'true', wait for the device to change its power state
  *
  * RETURN VALUE:
  * -EINVAL if the requested state is invalid.
@@ -435,8 +434,7 @@ static inline int platform_pci_sleep_wak
  * 0 if device already is in the requested state.
  * 0 if device's power state has been successfully changed.
  */
-static int
-pci_raw_set_power_state(struct pci_dev *dev, pci_power_t state, bool wait)
+static int pci_raw_set_power_state(struct pci_dev *dev, pci_power_t state)
 {
 	u16 pmcsr;
 	bool need_restore = false;
@@ -481,10 +479,8 @@ pci_raw_set_power_state(struct pci_dev *
 		break;
 	case PCI_UNKNOWN: /* Boot-up */
 		if ((pmcsr & PCI_PM_CTRL_STATE_MASK) == PCI_D3hot
-		 && !(pmcsr & PCI_PM_CTRL_NO_SOFT_RESET)) {
+		 && !(pmcsr & PCI_PM_CTRL_NO_SOFT_RESET))
 			need_restore = true;
-			wait = true;
-		}
 		/* Fall-through: force to D0 */
 	default:
 		pmcsr = 0;
@@ -494,9 +490,6 @@ pci_raw_set_power_state(struct pci_dev *
 	/* enter specified state */
 	pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr);
 
-	if (!wait)
-		return 0;
-
 	/* Mandatory power management transition delays */
 	/* see PCI PM 1.1 5.6.1 table 18 */
 	if (state == PCI_D3hot || dev->current_state == PCI_D3hot)
@@ -521,7 +514,7 @@ pci_raw_set_power_state(struct pci_dev *
 	if (need_restore)
 		pci_restore_bars(dev);
 
-	if (wait && dev->bus->self)
+	if (dev->bus->self)
 		pcie_aspm_pm_state_change(dev->bus->self);
 
 	return 0;
@@ -591,7 +584,7 @@ int pci_set_power_state(struct pci_dev *
 	if (state == PCI_D3hot && (dev->dev_flags & PCI_DEV_FLAGS_NO_D3))
 		return 0;
 
-	error = pci_raw_set_power_state(dev, state, true);
+	error = pci_raw_set_power_state(dev, state);
 
 	if (state > PCI_D0 && platform_pci_power_manageable(dev)) {
 		/* Allow the platform to finalize the transition */
@@ -1390,37 +1383,14 @@ void pci_allocate_cap_save_buffers(struc
  */
 int pci_restore_standard_config(struct pci_dev *dev)
 {
-	pci_power_t prev_state;
-	int error;
-
-	pci_update_current_state(dev, PCI_D0);
-
-	prev_state = dev->current_state;
-	if (prev_state == PCI_D0)
-		goto Restore;
-
-	error = pci_raw_set_power_state(dev, PCI_D0, false);
-	if (error)
-		return error;
+	pci_update_current_state(dev, PCI_UNKNOWN);
 
-	/*
-	 * This assumes that we won't get a bus in B2 or B3 from the BIOS, but
-	 * we've made this assumption forever and it appears to be universally
-	 * satisfied.
-	 */
-	switch(prev_state) {
-	case PCI_D3cold:
-	case PCI_D3hot:
-		mdelay(pci_pm_d3_delay);
-		break;
-	case PCI_D2:
-		udelay(PCI_PM_D2_DELAY);
-		break;
+	if (dev->current_state != PCI_D0) {
+		int error = pci_set_power_state(dev, PCI_D0);
+		if (error)
+			return error;
 	}
 
-	pci_update_current_state(dev, PCI_D0);
-
- Restore:
 	return dev->state_saved ? pci_restore_state(dev) : 0;
 }
 

^ permalink raw reply	[flat|nested] 373+ messages in thread
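
The resume flow in the reworked pci_restore_standard_config() above can be
sketched in user space as follows.  This is only a model under stated
assumptions, not kernel code: every model_* name is hypothetical, and the
stubs stand in for pci_update_current_state(), pci_set_power_state() and
pci_restore_state(), whose real behaviour (PMCSR access, platform hooks,
transition delays) is not reproduced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
enum model_power_state { MODEL_D0, MODEL_D2, MODEL_D3HOT, MODEL_UNKNOWN };

struct model_pci_dev {
	enum model_power_state hw_state;      /* what the "hardware" reports */
	enum model_power_state current_state; /* what the core believes */
	bool state_saved;                     /* config space saved at suspend */
	bool config_restored;                 /* set by the restore stub */
	int power_up_error;                   /* error injected into power-up */
};

/* Assumed stand-in for pci_update_current_state(): re-read the state. */
static void model_update_current_state(struct model_pci_dev *dev)
{
	dev->current_state = dev->hw_state;
}

/* Assumed stand-in for pci_set_power_state(dev, PCI_D0). */
static int model_set_power_state_d0(struct model_pci_dev *dev)
{
	if (dev->power_up_error)
		return dev->power_up_error;
	dev->hw_state = MODEL_D0;
	dev->current_state = MODEL_D0;
	return 0;
}

/* Assumed stand-in for pci_restore_state(). */
static int model_restore_state(struct model_pci_dev *dev)
{
	dev->config_restored = true;
	return 0;
}

/* Mirrors the control flow of the reworked function in the patch. */
static int model_restore_standard_config(struct model_pci_dev *dev)
{
	assert(dev != NULL);

	model_update_current_state(dev);

	/* Power up through the generic path only if the device is not in D0. */
	if (dev->current_state != MODEL_D0) {
		int error = model_set_power_state_d0(dev);
		if (error)
			return error;
	}

	/* Restore config space only if it was saved during suspend. */
	return dev->state_saved ? model_restore_state(dev) : 0;
}
```

The point of the sketch is that the open-coded delay handling is gone: the
function defers to the generic power-up path and only adds the conditional
restore on top of it.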

* [PATCH 8/11] PCI PM: Move pci_restore_standard_config to pci-driver.c
  2009-03-14 11:24 ` [PATCH 0/11] PM: Rework suspend-resume ordering to avoid problems with shared interrupts (updated 2x) Rafael J. Wysocki
                     ` (13 preceding siblings ...)
  2009-03-14 11:31   ` Rafael J. Wysocki
@ 2009-03-14 11:32   ` Rafael J. Wysocki
  2009-03-14 11:32   ` Rafael J. Wysocki
                     ` (8 subsequent siblings)
  23 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-14 11:32 UTC (permalink / raw)
  To: pm list
  Cc: LKML, Linus Torvalds, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, Len Brown,
	Jesse Barnes, Thomas Gleixner, Arve Hjønnevåg,
	Linux PCI

From: Rafael J. Wysocki <rjw@sisk.pl>

Move pci_restore_standard_config() from pci.c to pci-driver.c and
make it static.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 drivers/pci/pci-driver.c |   17 +++++++++++++++++
 drivers/pci/pci.c        |   21 ---------------------
 drivers/pci/pci.h        |    1 -
 3 files changed, 17 insertions(+), 22 deletions(-)

Index: linux-2.6/drivers/pci/pci-driver.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci-driver.c
+++ linux-2.6/drivers/pci/pci-driver.c
@@ -423,6 +423,23 @@ static int pci_legacy_resume(struct devi
 
 /* Auxiliary functions used by the new power management framework */
 
+/**
+ * pci_restore_standard_config - restore standard config registers of PCI device
+ * @pci_dev: PCI device to handle
+ */
+static int pci_restore_standard_config(struct pci_dev *pci_dev)
+{
+	pci_update_current_state(pci_dev, PCI_UNKNOWN);
+
+	if (pci_dev->current_state != PCI_D0) {
+		int error = pci_set_power_state(pci_dev, PCI_D0);
+		if (error)
+			return error;
+	}
+
+	return pci_dev->state_saved ? pci_restore_state(pci_dev) : 0;
+}
+
 static void pci_pm_default_resume_noirq(struct pci_dev *pci_dev)
 {
 	pci_restore_standard_config(pci_dev);
Index: linux-2.6/drivers/pci/pci.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -1374,27 +1374,6 @@ void pci_allocate_cap_save_buffers(struc
 }
 
 /**
- * pci_restore_standard_config - restore standard config registers of PCI device
- * @dev: PCI device to handle
- *
- * This function assumes that the device's configuration space is accessible.
- * If the device needs to be powered up, the function will wait for it to
- * change the state.
- */
-int pci_restore_standard_config(struct pci_dev *dev)
-{
-	pci_update_current_state(dev, PCI_UNKNOWN);
-
-	if (dev->current_state != PCI_D0) {
-		int error = pci_set_power_state(dev, PCI_D0);
-		if (error)
-			return error;
-	}
-
-	return dev->state_saved ? pci_restore_state(dev) : 0;
-}
-
-/**
  * pci_enable_ari - enable ARI forwarding if hardware support it
  * @dev: the PCI device
  */
Index: linux-2.6/drivers/pci/pci.h
===================================================================
--- linux-2.6.orig/drivers/pci/pci.h
+++ linux-2.6/drivers/pci/pci.h
@@ -49,7 +49,6 @@ extern void pci_disable_enabled_device(s
 extern void pci_pm_init(struct pci_dev *dev);
 extern void platform_pci_wakeup_init(struct pci_dev *dev);
 extern void pci_allocate_cap_save_buffers(struct pci_dev *dev);
-extern int pci_restore_standard_config(struct pci_dev *dev);
 
 static inline bool pci_is_bridge(struct pci_dev *pci_dev)
 {

^ permalink raw reply	[flat|nested] 373+ messages in thread

* [PATCH 9/11] PCI PM: Put devices into low power states during late suspend (rev. 2)
  2009-03-14 11:24 ` [PATCH 0/11] PM: Rework suspend-resume ordering to avoid problems with shared interrupts (updated 2x) Rafael J. Wysocki
                     ` (16 preceding siblings ...)
  2009-03-14 11:32   ` [PATCH 9/11] PCI PM: Put devices into low power states during late suspend (rev. 2) Rafael J. Wysocki
@ 2009-03-14 11:32   ` Rafael J. Wysocki
  2009-03-14 11:33   ` [PATCH 10/11] PCI PM: Make pci_set_power_state() handle devices with no PM support Rafael J. Wysocki
                     ` (5 subsequent siblings)
  23 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-14 11:32 UTC (permalink / raw)
  To: pm list
  Cc: LKML, Linus Torvalds, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, Len Brown,
	Jesse Barnes, Thomas Gleixner, Arve Hjønnevåg,
	Linux PCI

From: Rafael J. Wysocki <rjw@sisk.pl>

Now that timer interrupts can be enabled during the late phase of
suspending devices, we can use the generic pci_set_power_state() to
put PCI devices into low power states at that time.  We can also use
some related platform callbacks, like the ones preparing devices for
wake-up, during the late suspend.

Doing this allows us to avoid the race condition where a device using
a shared interrupt is put into a low power state with interrupts
enabled and then an interrupt (for another device) comes in and
confuses its driver.  At the same time, devices that don't support
native PCI PM or that require additional, platform-specific operations
to put them into low power states will be handled as appropriate.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 drivers/pci/pci-driver.c |  134 ++++++++++++++++++++++++++++-------------------
 1 file changed, 81 insertions(+), 53 deletions(-)

Index: linux-2.6/drivers/pci/pci-driver.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci-driver.c
+++ linux-2.6/drivers/pci/pci-driver.c
@@ -352,53 +352,60 @@ static int pci_legacy_suspend(struct dev
 {
 	struct pci_dev * pci_dev = to_pci_dev(dev);
 	struct pci_driver * drv = pci_dev->driver;
-	int error = 0;
+
+	pci_dev->state_saved = false;
 
 	if (drv && drv->suspend) {
 		pci_power_t prev = pci_dev->current_state;
-
-		pci_dev->state_saved = false;
+		int error;
 
 		error = drv->suspend(pci_dev, state);
 		suspend_report_result(drv->suspend, error);
 		if (error)
 			return error;
 
-		if (pci_dev->state_saved)
-			goto Fixup;
-
-		if (pci_dev->current_state != PCI_D0
+		if (!pci_dev->state_saved && pci_dev->current_state != PCI_D0
 		    && pci_dev->current_state != PCI_UNKNOWN) {
 			WARN_ONCE(pci_dev->current_state != prev,
 				"PCI PM: Device state not saved by %pF\n",
 				drv->suspend);
-			goto Fixup;
 		}
 	}
 
-	pci_save_state(pci_dev);
-	/*
-	 * This is for compatibility with existing code with legacy PM support.
-	 */
-	pci_pm_set_unknown_state(pci_dev);
-
- Fixup:
 	pci_fixup_device(pci_fixup_suspend, pci_dev);
 
-	return error;
+	return 0;
 }
 
 static int pci_legacy_suspend_late(struct device *dev, pm_message_t state)
 {
 	struct pci_dev * pci_dev = to_pci_dev(dev);
 	struct pci_driver * drv = pci_dev->driver;
-	int error = 0;
 
 	if (drv && drv->suspend_late) {
+		pci_power_t prev = pci_dev->current_state;
+		int error;
+
 		error = drv->suspend_late(pci_dev, state);
 		suspend_report_result(drv->suspend_late, error);
+		if (error)
+			return error;
+
+		if (!pci_dev->state_saved && pci_dev->current_state != PCI_D0
+		    && pci_dev->current_state != PCI_UNKNOWN) {
+			WARN_ONCE(pci_dev->current_state != prev,
+				"PCI PM: Device state not saved by %pF\n",
+				drv->suspend_late);
+			return 0;
+		}
 	}
-	return error;
+
+	if (!pci_dev->state_saved)
+		pci_save_state(pci_dev);
+
+	pci_pm_set_unknown_state(pci_dev);
+
+	return 0;
 }
 
 static int pci_legacy_resume_early(struct device *dev)
@@ -460,7 +467,6 @@ static void pci_pm_default_suspend(struc
 	/* Disable non-bridge devices without PM support */
 	if (!pci_is_bridge(pci_dev))
 		pci_disable_enabled_device(pci_dev);
-	pci_save_state(pci_dev);
 }
 
 static bool pci_has_legacy_pm_support(struct pci_dev *pci_dev)
@@ -526,24 +532,14 @@ static int pci_pm_suspend(struct device 
 		if (error)
 			return error;
 
-		if (pci_dev->state_saved)
-			goto Fixup;
-
-		if (pci_dev->current_state != PCI_D0
+		if (!pci_dev->state_saved && pci_dev->current_state != PCI_D0
 		    && pci_dev->current_state != PCI_UNKNOWN) {
 			WARN_ONCE(pci_dev->current_state != prev,
 				"PCI PM: State of device not saved by %pF\n",
 				pm->suspend);
-			goto Fixup;
 		}
 	}
 
-	if (!pci_dev->state_saved) {
-		pci_save_state(pci_dev);
-		if (!pci_is_bridge(pci_dev))
-			pci_prepare_to_sleep(pci_dev);
-	}
-
  Fixup:
 	pci_fixup_device(pci_fixup_suspend, pci_dev);
 
@@ -553,21 +549,41 @@ static int pci_pm_suspend(struct device 
 static int pci_pm_suspend_noirq(struct device *dev)
 {
 	struct pci_dev *pci_dev = to_pci_dev(dev);
-	struct device_driver *drv = dev->driver;
-	int error = 0;
+	struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
 
 	if (pci_has_legacy_pm_support(pci_dev))
 		return pci_legacy_suspend_late(dev, PMSG_SUSPEND);
 
-	if (drv && drv->pm && drv->pm->suspend_noirq) {
-		error = drv->pm->suspend_noirq(dev);
-		suspend_report_result(drv->pm->suspend_noirq, error);
+	if (!pm)
+		return 0;
+
+	if (pm->suspend_noirq) {
+		pci_power_t prev = pci_dev->current_state;
+		int error;
+
+		error = pm->suspend_noirq(dev);
+		suspend_report_result(pm->suspend_noirq, error);
+		if (error)
+			return error;
+
+		if (!pci_dev->state_saved && pci_dev->current_state != PCI_D0
+		    && pci_dev->current_state != PCI_UNKNOWN) {
+			WARN_ONCE(pci_dev->current_state != prev,
+				"PCI PM: State of device not saved by %pF\n",
+				pm->suspend_noirq);
+			return 0;
+		}
 	}
 
-	if (!error)
-		pci_pm_set_unknown_state(pci_dev);
+	if (!pci_dev->state_saved) {
+		pci_save_state(pci_dev);
+		if (!pci_is_bridge(pci_dev))
+			pci_prepare_to_sleep(pci_dev);
+	}
 
-	return error;
+	pci_pm_set_unknown_state(pci_dev);
+
+	return 0;
 }
 
 static int pci_pm_resume_noirq(struct device *dev)
@@ -650,9 +666,6 @@ static int pci_pm_freeze(struct device *
 			return error;
 	}
 
-	if (!pci_dev->state_saved)
-		pci_save_state(pci_dev);
-
 	return 0;
 }
 
@@ -660,20 +673,25 @@ static int pci_pm_freeze_noirq(struct de
 {
 	struct pci_dev *pci_dev = to_pci_dev(dev);
 	struct device_driver *drv = dev->driver;
-	int error = 0;
 
 	if (pci_has_legacy_pm_support(pci_dev))
 		return pci_legacy_suspend_late(dev, PMSG_FREEZE);
 
 	if (drv && drv->pm && drv->pm->freeze_noirq) {
+		int error;
+
 		error = drv->pm->freeze_noirq(dev);
 		suspend_report_result(drv->pm->freeze_noirq, error);
+		if (error)
+			return error;
 	}
 
-	if (!error)
-		pci_pm_set_unknown_state(pci_dev);
+	if (!pci_dev->state_saved)
+		pci_save_state(pci_dev);
 
-	return error;
+	pci_pm_set_unknown_state(pci_dev);
+
+	return 0;
 }
 
 static int pci_pm_thaw_noirq(struct device *dev)
@@ -716,7 +734,6 @@ static int pci_pm_poweroff(struct device
 {
 	struct pci_dev *pci_dev = to_pci_dev(dev);
 	struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
-	int error = 0;
 
 	if (pci_has_legacy_pm_support(pci_dev))
 		return pci_legacy_suspend(dev, PMSG_HIBERNATE);
@@ -729,33 +746,44 @@ static int pci_pm_poweroff(struct device
 	pci_dev->state_saved = false;
 
 	if (pm->poweroff) {
+		int error;
+
 		error = pm->poweroff(dev);
 		suspend_report_result(pm->poweroff, error);
+		if (error)
+			return error;
 	}
 
-	if (!pci_dev->state_saved && !pci_is_bridge(pci_dev))
-		pci_prepare_to_sleep(pci_dev);
-
  Fixup:
 	pci_fixup_device(pci_fixup_suspend, pci_dev);
 
-	return error;
+	return 0;
 }
 
 static int pci_pm_poweroff_noirq(struct device *dev)
 {
+	struct pci_dev *pci_dev = to_pci_dev(dev);
 	struct device_driver *drv = dev->driver;
-	int error = 0;
 
 	if (pci_has_legacy_pm_support(to_pci_dev(dev)))
 		return pci_legacy_suspend_late(dev, PMSG_HIBERNATE);
 
-	if (drv && drv->pm && drv->pm->poweroff_noirq) {
+	if (!drv || !drv->pm)
+		return 0;
+
+	if (drv->pm->poweroff_noirq) {
+		int error;
+
 		error = drv->pm->poweroff_noirq(dev);
 		suspend_report_result(drv->pm->poweroff_noirq, error);
+		if (error)
+			return error;
 	}
 
-	return error;
+	if (!pci_dev->state_saved && !pci_is_bridge(pci_dev))
+		pci_prepare_to_sleep(pci_dev);
+
+	return 0;
 }
 
 static int pci_pm_restore_noirq(struct device *dev)

^ permalink raw reply	[flat|nested] 373+ messages in thread
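
The decision logic that this patch moves into pci_pm_suspend_noirq() can be
sketched in user space as follows.  It is a simplified model, not kernel code:
all model_* names are hypothetical, the stubs only record what pci_save_state()
and pci_prepare_to_sleep() would do, the behaviour of the
pci_pm_set_unknown_state() stub (D0 becomes "unknown") is an assumption, and
the early return for drivers without any dev_pm_ops is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum model_state { MODEL_D0, MODEL_D3HOT, MODEL_UNKNOWN };

struct model_pci_dev {
	enum model_state current_state;
	bool state_saved;
	bool is_bridge;
	bool prepared_to_sleep;  /* set by the pci_prepare_to_sleep() stub */
	int (*suspend_noirq)(struct model_pci_dev *dev); /* optional callback */
};

/* Stub: pretend the config space has been saved. */
static void model_save_state(struct model_pci_dev *dev)
{
	dev->state_saved = true;
}

/* Stub: pretend the core put the device into a low power state. */
static void model_prepare_to_sleep(struct model_pci_dev *dev)
{
	dev->prepared_to_sleep = true;
	dev->current_state = MODEL_D3HOT;
}

/* Assumed behaviour: a device left in D0 is marked as "unknown". */
static void model_set_unknown_state(struct model_pci_dev *dev)
{
	if (dev->current_state == MODEL_D0)
		dev->current_state = MODEL_UNKNOWN;
}

/* Example driver callback that saves state and powers down by itself. */
static int model_driver_handles_pm(struct model_pci_dev *dev)
{
	dev->state_saved = true;
	dev->current_state = MODEL_D3HOT;
	return 0;
}

static int model_suspend_noirq(struct model_pci_dev *dev)
{
	assert(dev != NULL);

	if (dev->suspend_noirq) {
		int error = dev->suspend_noirq(dev);
		if (error)
			return error;

		/*
		 * The driver changed the power state without saving the
		 * config space: leave the device alone (the kernel may
		 * warn at this point).
		 */
		if (!dev->state_saved && dev->current_state != MODEL_D0
		    && dev->current_state != MODEL_UNKNOWN)
			return 0;
	}

	/* The core takes over only if the driver did not save the state. */
	if (!dev->state_saved) {
		model_save_state(dev);
		if (!dev->is_bridge)
			model_prepare_to_sleep(dev);
	}

	model_set_unknown_state(dev);
	return 0;
}
```

In this model a plain device is saved and powered down by the core, a bridge
is saved but left powered, and a device whose driver handles everything is
not touched again.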

* [PATCH 10/11] PCI PM: Make pci_set_power_state() handle devices with no PM support
  2009-03-14 11:24 ` [PATCH 0/11] PM: Rework suspend-resume ordering to avoid problems with shared interrupts (updated 2x) Rafael J. Wysocki
                     ` (18 preceding siblings ...)
  2009-03-14 11:33   ` [PATCH 10/11] PCI PM: Make pci_set_power_state() handle devices with no PM support Rafael J. Wysocki
@ 2009-03-14 11:33   ` Rafael J. Wysocki
  2009-03-14 11:34   ` [PATCH 11/11] PCI PM: Restore config spaces of all devices during early resume Rafael J. Wysocki
                     ` (3 subsequent siblings)
  23 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-14 11:33 UTC (permalink / raw)
  To: pm list
  Cc: LKML, Linus Torvalds, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, Len Brown,
	Jesse Barnes, Thomas Gleixner, Arve Hjønnevåg,
	Linux PCI

From: Rafael J. Wysocki <rjw@sisk.pl>

For PCI devices without any PM support (either native or through the
platform), pci_set_power_state() always returns an error code, even
if they are being put into D0.  However, such devices are always in
D0, so pci_set_power_state() should return success when asked to put
such a device into D0.  It should also update the current_state field
for these devices as appropriate.  This modification is necessary so
that the standard configuration registers of these devices are
successfully restored by pci_restore_standard_config() during the
"early" phase of resume.

In addition, pci_set_power_state() should check the value of
current_state before calling the platform to change the power state
of the device, to avoid doing that unnecessarily.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 drivers/pci/pci.c |   18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

Index: linux-2.6/drivers/pci/pci.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -439,6 +439,10 @@ static int pci_raw_set_power_state(struc
 	u16 pmcsr;
 	bool need_restore = false;
 
+	/* Check if we're already there */
+	if (dev->current_state == state)
+		return 0;
+
 	if (!dev->pm_cap)
 		return -EIO;
 
@@ -449,10 +453,7 @@ static int pci_raw_set_power_state(struc
 	 * Can enter D0 from any state, but if we can only go deeper 
 	 * to sleep if we're already in a low power state
 	 */
-	if (dev->current_state == state) {
-		/* we're already there */
-		return 0;
-	} else if (state != PCI_D0 && dev->current_state <= PCI_D3cold
+	if (state != PCI_D0 && dev->current_state <= PCI_D3cold
 	    && dev->current_state > state) {
 		dev_err(&dev->dev, "invalid power transition "
 			"(from state %d to %d)\n", dev->current_state, state);
@@ -570,12 +571,17 @@ int pci_set_power_state(struct pci_dev *
 		 */
 		return 0;
 
-	if (state == PCI_D0 && platform_pci_power_manageable(dev)) {
+	/* Check if we're already there */
+	if (dev->current_state == state)
+		return 0;
+
+	if (state == PCI_D0) {
 		/*
 		 * Allow the platform to change the state, for example via ACPI
 		 * _PR0, _PS0 and some such, but do not trust it.
 		 */
-		int ret = platform_pci_set_power_state(dev, PCI_D0);
+		int ret = platform_pci_power_manageable(dev) ?
+			platform_pci_set_power_state(dev, PCI_D0) : 0;
 		if (!ret)
 			pci_update_current_state(dev, PCI_D0);
 	}
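
The control flow described in the changelog above can be illustrated with
a small userspace model.  This is only a sketch under stated assumptions,
not the kernel code: struct model_dev, the MODEL_* states and every
model_* function are hypothetical stand-ins, the platform call is assumed
to always succeed, and the other checks of the real function as well as
the actual PMCSR write are left out.

```c
/* Simplified userspace model of the patched flow; NOT the kernel code. */
#include <errno.h>
#include <stdbool.h>

enum model_state {
	MODEL_D0, MODEL_D1, MODEL_D2, MODEL_D3hot, MODEL_D3cold, MODEL_UNKNOWN
};

struct model_dev {
	enum model_state current_state;
	bool has_pm_cap;          /* stands in for dev->pm_cap != 0 */
	bool platform_manageable; /* stands in for platform_pci_power_manageable() */
};

/* Stand-in for platform_pci_set_power_state(); assumed to succeed. */
static int model_platform_set_state(struct model_dev *dev, enum model_state s)
{
	(void)dev;
	(void)s;
	return 0;
}

/*
 * Stand-in for pci_update_current_state().  Assumption: with a PM
 * capability the state is read back from hardware (the model simply keeps
 * the old value), otherwise the requested state is recorded.
 */
static void model_update_current_state(struct model_dev *dev, enum model_state s)
{
	if (!dev->has_pm_cap)
		dev->current_state = s;
}

/* Stand-in for pci_raw_set_power_state() with the early check added. */
static int model_raw_set_state(struct model_dev *dev, enum model_state s)
{
	if (dev->current_state == s)	/* already there */
		return 0;
	if (!dev->has_pm_cap)
		return -EIO;
	dev->current_state = s;		/* models the PMCSR write */
	return 0;
}

/* Stand-in for the relevant part of pci_set_power_state(). */
static int model_set_power_state(struct model_dev *dev, enum model_state s)
{
	if (dev->current_state == s)	/* already there */
		return 0;

	if (s == MODEL_D0) {
		int ret = dev->platform_manageable ?
			model_platform_set_state(dev, MODEL_D0) : 0;
		if (!ret)
			model_update_current_state(dev, MODEL_D0);
	}

	return model_raw_set_state(dev, s);
}
```

In this model a device with neither a PM capability nor platform support
that starts in MODEL_UNKNOWN reaches MODEL_D0 with a return value of 0,
while a request to move the same device out of D0 still fails with -EIO.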

^ permalink raw reply	[flat|nested] 373+ messages in thread

* [PATCH 11/11] PCI PM: Restore config spaces of all devices during early resume
  2009-03-14 11:24 ` [PATCH 0/11] PM: Rework suspend-resume ordering to avoid problems with shared interrupts (updated 2x) Rafael J. Wysocki
                     ` (20 preceding siblings ...)
  2009-03-14 11:34   ` [PATCH 11/11] PCI PM: Restore config spaces of all devices during early resume Rafael J. Wysocki
@ 2009-03-14 11:34   ` Rafael J. Wysocki
  2009-03-14 11:43   ` [PATCH 0/11] PM: Rework suspend-resume ordering to avoid problems with shared interrupts (updated 2x) Ingo Molnar
  2009-03-14 11:43   ` Ingo Molnar
  23 siblings, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-14 11:34 UTC (permalink / raw)
  To: pm list
  Cc: LKML, Linus Torvalds, Ingo Molnar, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, Len Brown,
	Jesse Barnes, Thomas Gleixner, Arve Hjønnevåg,
	Linux PCI

From: Rafael J. Wysocki <rjw@sisk.pl>

At present, the configuration spaces of PCI devices that have no
drivers, or whose drivers have no PM support (either legacy or
through a pm object), are not saved during suspend and, consequently,
are not restored during resume.  This may leave the state of the
system slightly inconsistent after resume, so it is better to save
and restore the configuration spaces of these devices as well.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 drivers/pci/pci-driver.c |   16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

Index: linux-2.6/drivers/pci/pci-driver.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci-driver.c
+++ linux-2.6/drivers/pci/pci-driver.c
@@ -516,13 +516,13 @@ static int pci_pm_suspend(struct device 
 	if (pci_has_legacy_pm_support(pci_dev))
 		return pci_legacy_suspend(dev, PMSG_SUSPEND);
 
+	pci_dev->state_saved = false;
+
 	if (!pm) {
 		pci_pm_default_suspend(pci_dev);
 		goto Fixup;
 	}
 
-	pci_dev->state_saved = false;
-
 	if (pm->suspend) {
 		pci_power_t prev = pci_dev->current_state;
 		int error;
@@ -554,8 +554,10 @@ static int pci_pm_suspend_noirq(struct d
 	if (pci_has_legacy_pm_support(pci_dev))
 		return pci_legacy_suspend_late(dev, PMSG_SUSPEND);
 
-	if (!pm)
+	if (!pm) {
+		pci_save_state(pci_dev);
 		return 0;
+	}
 
 	if (pm->suspend_noirq) {
 		pci_power_t prev = pci_dev->current_state;
@@ -650,13 +652,13 @@ static int pci_pm_freeze(struct device *
 	if (pci_has_legacy_pm_support(pci_dev))
 		return pci_legacy_suspend(dev, PMSG_FREEZE);
 
+	pci_dev->state_saved = false;
+
 	if (!pm) {
 		pci_pm_default_suspend(pci_dev);
 		return 0;
 	}
 
-	pci_dev->state_saved = false;
-
 	if (pm->freeze) {
 		int error;
 
@@ -738,13 +740,13 @@ static int pci_pm_poweroff(struct device
 	if (pci_has_legacy_pm_support(pci_dev))
 		return pci_legacy_suspend(dev, PMSG_HIBERNATE);
 
+	pci_dev->state_saved = false;
+
 	if (!pm) {
 		pci_pm_default_suspend(pci_dev);
 		goto Fixup;
 	}
 
-	pci_dev->state_saved = false;
-
 	if (pm->poweroff) {
 		int error;
 
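
The effect of this change on a device without a pm object can be sketched
with a small userspace model.  It is an illustration only: struct
model_pci_dev and the model_* functions are hypothetical, the driver
callbacks and the default suspend handling are omitted, and the model
assumes that saving the config space also sets state_saved.

```c
/* Userspace model of the save/restore flow for a driverless device. */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MODEL_CFG_DWORDS 16	/* standard config header, 16 dwords */

struct model_pci_dev {
	bool has_pm_ops;	/* false models "!pm" in the patch */
	bool state_saved;
	uint32_t config[MODEL_CFG_DWORDS];	/* "hardware" registers */
	uint32_t saved[MODEL_CFG_DWORDS];	/* saved copy */
};

/* Assumption: saving the config space marks the state as saved. */
static void model_save_state(struct model_pci_dev *d)
{
	memcpy(d->saved, d->config, sizeof(d->saved));
	d->state_saved = true;
}

/* Suspend phase: the flag is now cleared before the !pm test. */
static void model_suspend(struct model_pci_dev *d)
{
	d->state_saved = false;
	/* default suspend handling and driver ->suspend() omitted */
}

/* Late suspend phase: driverless devices now get their state saved. */
static void model_suspend_noirq(struct model_pci_dev *d)
{
	if (!d->has_pm_ops) {
		model_save_state(d);
		return;
	}
	/* driver ->suspend_noirq() omitted in this model */
}

/* Early resume phase: restore only if something was saved. */
static bool model_resume_noirq(struct model_pci_dev *d)
{
	if (!d->state_saved)
		return false;
	memcpy(d->config, d->saved, sizeof(d->config));
	d->state_saved = false;
	return true;
}
```

Running model_suspend(), model_suspend_noirq() and model_resume_noirq()
in that order on a device with has_pm_ops == false brings its registers
back to the pre-suspend values even if they were clobbered in between.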

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [PATCH 0/11] PM: Rework suspend-resume ordering to avoid problems with shared interrupts (updated 2x)
  2009-03-14 11:24 ` [PATCH 0/11] PM: Rework suspend-resume ordering to avoid problems with shared interrupts (updated 2x) Rafael J. Wysocki
                     ` (22 preceding siblings ...)
  2009-03-14 11:43   ` [PATCH 0/11] PM: Rework suspend-resume ordering to avoid problems with shared interrupts (updated 2x) Ingo Molnar
@ 2009-03-14 11:43   ` Ingo Molnar
  23 siblings, 0 replies; 373+ messages in thread
From: Ingo Molnar @ 2009-03-14 11:43 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: pm list, LKML, Linus Torvalds, Eric W. Biederman,
	Benjamin Herrenschmidt, Jeremy Fitzhardinge, Len Brown,
	Jesse Barnes, Thomas Gleixner, Arve Hjønnevåg,
	Linux PCI


* Rafael J. Wysocki <rjw@sisk.pl> wrote:

> Hi,
> 
> This is an update of the patch series reworking the handling 
> of interrupts during suspend-resume, addressing some comments 
> from Thomas and Ingo.

Looks very nice - thanks Rafael!

Acked-by: Ingo Molnar <mingo@elte.hu>

	Ingo

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH][0/8] PM: Rework suspend-resume ordering to avoid problems with shared interrupts
  2009-03-14  8:44       ` Frans Pop
@ 2009-03-14 11:59         ` Rafael J. Wysocki
  2009-03-14 14:11           ` Frans Pop
  2009-03-14 14:11           ` Frans Pop
  2009-03-14 11:59         ` Rafael J. Wysocki
  1 sibling, 2 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-14 11:59 UTC (permalink / raw)
  To: Frans Pop; +Cc: linux-kernel, torvalds, linux-pm

On Saturday 14 March 2009, Frans Pop wrote:
> On Sunday 08 March 2009, Rafael J. Wysocki wrote:
> > > # These don't need restoring anymore?
> >
> > I think they generally do, but the restored values may be (and often
> > are) identical to the current ones.
> >
> > >    -pci 0000:00:02.1: restoring config space at offset 0x4 (was 0x4, writing 0xe0500004)
> > >    -pci 0000:00:02.1: restoring config space at offset 0x1 (was 0x900000, writing 0x900007)
> > >    -pci 0000:00:03.0: restoring config space at offset 0xf (was 0x100, writing 0x1ff)
> > >    -pci 0000:00:03.0: restoring config space at offset 0x4 (was 0xfed12004, writing 0xe0600004)
> > >    -pci 0000:00:03.2: restoring config space at offset 0xf (was 0x300, writing 0x30b)
> > >    -pci 0000:00:03.2: restoring config space at offset 0x8 (was 0x1, writing 0x2031)
> > >    -pci 0000:00:03.2: restoring config space at offset 0x7 (was 0x1, writing 0x2021)
> > >    -pci 0000:00:03.2: restoring config space at offset 0x6 (was 0x1, writing 0x2019)
> > >    -pci 0000:00:03.2: restoring config space at offset 0x5 (was 0x1, writing 0x2011)
> > >    -pci 0000:00:03.2: restoring config space at offset 0x4 (was 0x1, writing 0x2009)
> > >    -pci 0000:00:03.2: restoring config space at offset 0x1 (was 0xb00000, writing 0xb00001)
> [...]
> > > # These have moved down to late resume.
> >
> > That's a bit strange.  It looks like the registers changed after we had
> > restored them during "early" resume.  So either we hadn't actually
> > restored them (it would be interesting to find out why), or they really
> > changed (again, it would be interesting to see why).
> >
> > >    -uhci_hcd 0000:00:1a.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
> > >    -uhci_hcd 0000:00:1a.0: restoring config space at offset 0x8 (was 0x1, writing 0x2081)
> > >    -uhci_hcd 0000:00:1a.0: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
> > >    -uhci_hcd 0000:00:1a.1: restoring config space at offset 0xf (was 0x200, writing 0x20a)
> > >    -uhci_hcd 0000:00:1a.1: restoring config space at offset 0x8 (was 0x1, writing 0x20a1)
> > >    -uhci_hcd 0000:00:1a.1: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
> 
> These changes look to have been reverted somehow with rc8 + your latest
> patch set. Not sure if it's due to changes in the patches, or just an
> effect of local circumstances (such as (un)suspending while the system
> is docked). Or sun spots of course.
> 
> The "restoring config space" messages now look virtually the same
> as for rc5, only some messages for the ricoh-mmc module are still
> "missing", but I'm not worried about that.

Thanks for testing!

Could you please also test the last iteration of the $subject patch series
(just sent) with the appended patch applied on top and post dmesg output?

Rafael

---
 drivers/pci/pci-driver.c |   23 +++++++++++++++++++++--
 drivers/pci/pci.c        |    5 +++++
 2 files changed, 26 insertions(+), 2 deletions(-)

Index: linux-2.6/drivers/pci/pci.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -732,6 +732,9 @@ int
 pci_save_state(struct pci_dev *dev)
 {
 	int i;
+
+	dev_info(&dev->dev, "saving PCI config space\n");
+
 	/* XXX: 100% dword access ok here? */
 	for (i = 0; i < 16; i++)
 		pci_read_config_dword(dev, i * 4,&dev->saved_config_space[i]);
@@ -753,6 +756,8 @@ pci_restore_state(struct pci_dev *dev)
 	int i;
 	u32 val;
 
+	dev_info(&dev->dev, "restoring PCI config space\n");
+
 	/* PCI Express register must be restored first */
 	pci_restore_pcie_state(dev);
 
Index: linux-2.6/drivers/pci/pci-driver.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci-driver.c
+++ linux-2.6/drivers/pci/pci-driver.c
@@ -438,10 +438,24 @@ static int pci_restore_standard_config(s
 {
 	pci_update_current_state(pci_dev, PCI_UNKNOWN);
 
+	switch (pci_dev->current_state) {
+	case PCI_UNKNOWN:
+	case PCI_POWER_ERROR:
+		dev_info(&pci_dev->dev, "%s: unknown power state\n",
+				__FUNCTION__);
+		break;
+	default:
+		dev_info(&pci_dev->dev, "%s: power state D%d\n",
+				__FUNCTION__, pci_dev->current_state);
+	}
+
 	if (pci_dev->current_state != PCI_D0) {
 		int error = pci_set_power_state(pci_dev, PCI_D0);
-		if (error)
+		if (error) {
+			dev_err(&pci_dev->dev,
+				"error %d while changing power state\n", error);
 			return error;
+		}
 	}
 
 	return pci_dev->state_saved ? pci_restore_state(pci_dev) : 0;
@@ -449,6 +463,8 @@ static int pci_restore_standard_config(s
 
 static void pci_pm_default_resume_noirq(struct pci_dev *pci_dev)
 {
+	dev_info(&pci_dev->dev, "%s: calling pci_restore_standard_config()\n",
+			__FUNCTION__);
 	pci_restore_standard_config(pci_dev);
 	pci_dev->state_saved = false;
 	pci_fixup_device(pci_fixup_resume_early, pci_dev);
@@ -615,8 +631,11 @@ static int pci_pm_resume(struct device *
 	 * This is necessary for the suspend error path in which resume is
 	 * called without restoring the standard config registers of the device.
 	 */
-	if (pci_dev->state_saved)
+	if (pci_dev->state_saved) {
+		dev_info(dev, "%s: restoring standard PCI config registers\n",
+				__FUNCTION__);
 		pci_restore_standard_config(pci_dev);
+	}
 
 	if (pci_has_legacy_pm_support(pci_dev))
 		return pci_legacy_resume(dev);
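
As a side note on the "restoring config space at offset ..." lines quoted
above: the remark that restored values are often identical to the current
ones can be pictured with the sketch below.  It assumes that the restore
walks the 16 saved dwords from the highest index down and writes back only
those that differ, which is what the "was ..., writing ..." wording and
the descending offsets in the quoted log suggest.  model_restore_config()
and MODEL_CFG_DWORDS are hypothetical names for this example.

```c
/* Userspace sketch: write back only the dwords that differ, and log them. */
#include <stdint.h>
#include <stdio.h>

#define MODEL_CFG_DWORDS 16

/*
 * Compare each "hardware" dword with its saved copy, highest index first.
 * Log and rewrite only the ones that differ.  Returns the number of
 * dwords written.
 */
static int model_restore_config(uint32_t *hw, const uint32_t *saved, FILE *log)
{
	int i, written = 0;

	for (i = MODEL_CFG_DWORDS - 1; i >= 0; i--) {
		if (hw[i] == saved[i])
			continue;	/* identical: no write, no message */

		fprintf(log,
			"restoring config space at offset %#x (was %#x, writing %#x)\n",
			(unsigned int)i, (unsigned int)hw[i],
			(unsigned int)saved[i]);
		hw[i] = saved[i];
		written++;
	}

	return written;
}
```

Under that assumption, a device whose registers already hold the saved
values produces no message at all, and a second restore right after the
first one is silent as well.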

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH][0/8] PM: Rework suspend-resume ordering to avoid problems with shared interrupts
  2009-03-14 11:59         ` Rafael J. Wysocki
  2009-03-14 14:11           ` Frans Pop
@ 2009-03-14 14:11           ` Frans Pop
  2009-03-14 22:31             ` Rafael J. Wysocki
  2009-03-14 22:31             ` Rafael J. Wysocki
  1 sibling, 2 replies; 373+ messages in thread
From: Frans Pop @ 2009-03-14 14:11 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: linux-kernel, torvalds, linux-pm

[-- Attachment #1: Type: text/plain, Size: 368 bytes --]

On Saturday 14 March 2009, you wrote:
> Could you please also test the last iteration of the $subject patch
> series (just sent) with the appended patch applied on top and post
> dmesg output?

Here you are:
- boot
- STR with wireless networking
- STD with wireless networking
- STR with wired networking and killswitch on wireless

No problems seen :-)

Cheers,
FJP


[-- Attachment #2: 2.6.29-rc8-rjw-test.gz --]
[-- Type: application/x-gzip, Size: 14185 bytes --]

^ permalink raw reply	[flat|nested] 373+ messages in thread

* Re: [RFC][PATCH][0/8] PM: Rework suspend-resume ordering to avoid problems with shared interrupts
  2009-03-14 14:11           ` Frans Pop
  2009-03-14 22:31             ` Rafael J. Wysocki
@ 2009-03-14 22:31             ` Rafael J. Wysocki
  1 sibling, 0 replies; 373+ messages in thread
From: Rafael J. Wysocki @ 2009-03-14 22:31 UTC (permalink / raw)
  To: Frans Pop; +Cc: linux-kernel, torvalds, linux-pm

On Saturday 14 March 2009, Frans Pop wrote:
> On Saturday 14 March 2009, you wrote:
> > Could you please also test the last iteration of the $subject patch
> > series (just sent) with the appended patch applied on top and post
> > dmesg output?
> 
> Here you are:
> - boot
> - STR with wireless networking
> - STD with wireless networking
> - STR with wired networking and killswitch on wireless
> 
> No problems seen :-)

Great, thanks for the log, it looks correct.

Best,
Rafael

^ permalink raw reply	[flat|nested] 373+ messages in thread

2009-02-27  0:27                                                     ` Linus Torvalds
2009-02-27  3:20                                                       ` [linux-pm] " Alan Stern
2009-02-27  4:43                                                         ` Linus Torvalds
2009-02-27  4:43                                                           ` Linus Torvalds
2009-02-27 14:59                                                           ` [linux-pm] " Alan Stern
2009-02-27 20:30                                                             ` Linus Torvalds
2009-02-27 20:30                                                             ` [linux-pm] " Linus Torvalds
2009-02-28  3:54                                                               ` Arve Hjønnevåg
2009-02-28  3:54                                                               ` [linux-pm] " Arve Hjønnevåg
2009-02-28 10:06                                                                 ` Rafael J. Wysocki
2009-02-28 10:06                                                                 ` [linux-pm] " Rafael J. Wysocki
2009-02-28 17:03                                                                   ` Linus Torvalds
2009-02-28 17:03                                                                     ` Linus Torvalds
2009-02-28 22:15                                                                   ` [linux-pm] " Arve Hjønnevåg
2009-02-28 22:15                                                                   ` Arve Hjønnevåg
2009-02-27 14:59                                                           ` Alan Stern
2009-02-27  3:20                                                       ` Alan Stern
2009-02-27  0:27                                                     ` Linus Torvalds
2009-02-27  0:00                                                   ` Arve Hjønnevåg
2009-02-26 22:30                                               ` Rafael J. Wysocki
2009-02-26 22:30                                               ` Rafael J. Wysocki
2009-02-26 21:58                                           ` Rafael J. Wysocki
2009-02-26 20:34                                         ` Arve Hjønnevåg
2009-02-26  9:50                                       ` Rafael J. Wysocki
2009-02-26  1:17                                     ` Arve Hjønnevåg
2009-02-24 23:09                                 ` Ingo Molnar
2009-02-24 23:07                               ` Rafael J. Wysocki
2009-02-25  4:16                               ` Eric W. Biederman
2009-02-25  4:26                                 ` Linus Torvalds
2009-02-25  4:26                                   ` Linus Torvalds
2009-02-25  4:59                                   ` Eric W. Biederman
2009-02-25  4:59                                   ` Eric W. Biederman
2009-02-25  4:16                               ` Eric W. Biederman
2009-02-25 15:32                             ` Alan Stern
2009-02-25 15:32                             ` [linux-pm] " Alan Stern
2009-02-25 16:19                               ` Linus Torvalds
2009-02-25 16:19                                 ` Linus Torvalds
2009-02-24 22:42                           ` Rafael J. Wysocki
2009-02-24  3:30                         ` Eric W. Biederman
2009-02-23 11:03                   ` Rafael J. Wysocki
2009-02-23 11:04                   ` Ingo Molnar
2009-02-23 14:45                     ` Rafael J. Wysocki
2009-02-23 15:06                       ` Ingo Molnar
2009-02-23 15:06                         ` Ingo Molnar
2009-02-23 21:59                         ` Rafael J. Wysocki
2009-02-23 21:59                           ` Rafael J. Wysocki
2009-02-23 14:45                     ` Rafael J. Wysocki
2009-02-23 11:04                   ` Ingo Molnar
2009-02-23 10:42                 ` Eric W. Biederman
2009-02-23  9:44               ` Ingo Molnar
2009-02-23 10:13               ` Benjamin Herrenschmidt
2009-02-23 10:13               ` Benjamin Herrenschmidt
2009-02-23  3:04         ` Eric W. Biederman
2009-02-23  8:36         ` Ingo Molnar
2009-02-23  8:36         ` Ingo Molnar
2009-02-23 11:29           ` Rafael J. Wysocki
2009-02-23 12:28             ` Ingo Molnar
2009-02-23 14:48               ` Rafael J. Wysocki
2009-02-23 14:48                 ` Rafael J. Wysocki
2009-02-23 20:49               ` Benjamin Herrenschmidt
2009-02-23 20:49               ` Benjamin Herrenschmidt
2009-02-23 12:28             ` Ingo Molnar
2009-02-23 12:45             ` Ingo Molnar
2009-02-23 15:07               ` Rafael J. Wysocki
2009-02-23 15:07               ` Rafael J. Wysocki
2009-02-23 12:45             ` Ingo Molnar
2009-02-23 15:52             ` Johannes Berg
2009-02-23 15:52             ` Johannes Berg
2009-02-23 17:16             ` Ingo Molnar
2009-02-23 17:16             ` Ingo Molnar
2009-02-23 17:28               ` Linus Torvalds
2009-02-23 17:28                 ` Linus Torvalds
2009-02-23 22:11                 ` Rafael J. Wysocki
2009-02-23 22:11                 ` Rafael J. Wysocki
2009-02-23 11:29           ` Rafael J. Wysocki
2009-02-22 23:48       ` Rafael J. Wysocki
2009-02-22 22:42     ` Rafael J. Wysocki
2009-02-22 18:01   ` Linus Torvalds
2009-02-23 22:11   ` Arve Hjønnevåg
2009-02-23 22:11   ` Arve Hjønnevåg
2009-02-23 22:23     ` Rafael J. Wysocki
2009-02-23 22:23       ` Rafael J. Wysocki
2009-02-23 22:44       ` Arve Hjønnevåg
2009-02-23 22:44       ` Arve Hjønnevåg
2009-02-22 17:39 ` Rafael J. Wysocki
2009-02-22 18:13 ` [RFC][PATCH 0/2] Rework disabling " Linus Torvalds
2009-02-22 18:13   ` Linus Torvalds
2009-02-22 18:18   ` Ingo Molnar
2009-02-22 18:25     ` Linus Torvalds
2009-02-22 18:25       ` Linus Torvalds
2009-02-22 18:35       ` Linus Torvalds
2009-02-22 18:35         ` Linus Torvalds
2009-02-22 18:18   ` Ingo Molnar
2009-02-22 22:37 ` Eric W. Biederman
2009-02-22 22:37 ` Eric W. Biederman
2009-02-22 22:56   ` Benjamin Herrenschmidt
2009-02-22 22:56   ` Benjamin Herrenschmidt
2009-02-22 23:02   ` Linus Torvalds
2009-02-22 23:02     ` Linus Torvalds
2009-03-01 22:21 ` [RFC][PATCH 0/4] " Rafael J. Wysocki
2009-03-01 22:21 ` Rafael J. Wysocki
2009-03-01 22:24   ` [RFC][PATCH 1/4] PM: Rework handling of interrupts during suspend-resume (rev. 4) Rafael J. Wysocki
2009-03-02 23:01     ` Arve Hjønnevåg
2009-03-02 23:01     ` Arve Hjønnevåg
2009-03-02 23:13       ` Rafael J. Wysocki
2009-03-02 23:18         ` Arve Hjønnevåg
2009-03-02 23:18         ` Arve Hjønnevåg
2009-03-02 23:27           ` Rafael J. Wysocki
2009-03-02 23:27           ` Rafael J. Wysocki
2009-03-03 22:56             ` Arve Hjønnevåg
2009-03-04 22:03               ` [Update, rev. 5] " Rafael J. Wysocki
2009-03-05 10:35                 ` Ingo Molnar
2009-03-05 10:35                 ` Ingo Molnar
2009-03-04 22:03               ` Rafael J. Wysocki
2009-03-03 22:56             ` Arve Hjønnevåg
2009-03-02 23:32           ` Linus Torvalds
2009-03-02 23:32             ` Linus Torvalds
2009-03-02 23:35             ` Linus Torvalds
2009-03-02 23:35               ` Linus Torvalds
2009-03-03  0:08               ` Arve Hjønnevåg
2009-03-03  0:08               ` Arve Hjønnevåg
2009-03-03  8:41                 ` Arve Hjønnevåg
2009-03-03  8:41                 ` Arve Hjønnevåg
2009-03-02 23:13       ` Rafael J. Wysocki
2009-03-01 22:24   ` Rafael J. Wysocki
2009-03-01 22:25   ` [RFC][PATCH 2/4] PM: Change suspend code ordering Rafael J. Wysocki
2009-03-01 22:25   ` Rafael J. Wysocki
2009-03-02 20:48     ` Linus Torvalds
2009-03-02 20:48       ` Linus Torvalds
2009-03-02 22:02       ` Rafael J. Wysocki
2009-03-02 22:02       ` Rafael J. Wysocki
2009-03-01 22:26   ` [RFC][PATCH 3/4] PM: Change hibernation " Rafael J. Wysocki
2009-03-01 22:26   ` Rafael J. Wysocki
2009-03-01 22:27   ` [RFC][PATCH 4/4] kexec: Change kexec jump " Rafael J. Wysocki
2009-03-01 22:27   ` Rafael J. Wysocki
2009-03-05 23:44   ` [RFC][PATCH 0/4] Rework disabling of interrupts during suspend-resume Linus Torvalds
2009-03-05 23:44     ` Linus Torvalds
2009-03-06  6:47     ` Sitsofe Wheeler
2009-03-06  6:47     ` Sitsofe Wheeler
2009-03-06 10:19     ` Rafael J. Wysocki
2009-03-06 10:19     ` Rafael J. Wysocki
2009-03-07 10:19 ` [RFC][PATCH][0/8] PM: Rework suspend-resume ordering to avoid problems with shared interrupts Rafael J. Wysocki
2009-03-07 10:19 ` Rafael J. Wysocki
2009-03-07 10:20   ` [RFC][PATCH][1/8] PM: Rework handling of interrupts during suspend-resume (rev. 5) Rafael J. Wysocki
2009-03-07 10:20     ` Rafael J. Wysocki
2009-03-07 16:51     ` [linux-pm] " Alan Stern
2009-03-07 17:56       ` Rafael J. Wysocki
2009-03-07 17:56       ` [linux-pm] " Rafael J. Wysocki
2009-03-08  3:53         ` Alan Stern
2009-03-08  3:53         ` [linux-pm] " Alan Stern
2009-03-08 10:00           ` Rafael J. Wysocki
2009-03-08 10:00           ` [linux-pm] " Rafael J. Wysocki
2009-03-08 12:37             ` Alan Stern
2009-03-08 12:37             ` [linux-pm] " Alan Stern
2009-03-08 17:20           ` Linus Torvalds
2009-03-08 20:40             ` Alan Stern
2009-03-08 20:40             ` [linux-pm] " Alan Stern
2009-03-08 21:37               ` Rafael J. Wysocki
2009-03-08 21:37               ` Rafael J. Wysocki
2009-03-09 14:59               ` Linus Torvalds
2009-03-09 14:59               ` [linux-pm] " Linus Torvalds
2009-03-09 15:13                 ` Alan Stern
2009-03-09 15:40                   ` Linus Torvalds
2009-03-09 15:40                   ` [linux-pm] " Linus Torvalds
2009-03-09 15:13                 ` Alan Stern
2009-03-08 17:20           ` Linus Torvalds
2009-03-07 16:51     ` Alan Stern
2009-03-07 10:21   ` [RFC][PATCH][2/8] PM: Change suspend code ordering Rafael J. Wysocki
2009-03-07 10:21   ` Rafael J. Wysocki
2009-03-07 10:22   ` [RFC][PATCH][3/8] PM: Change hibernation " Rafael J. Wysocki
2009-03-07 10:22   ` Rafael J. Wysocki
2009-03-07 10:23   ` [RFC][PATCH][4/8] kexec: Change kexec jump " Rafael J. Wysocki
2009-03-07 10:23   ` Rafael J. Wysocki
2009-03-07 10:24   ` [RFC][PATCH][5/8] PCI PM: Consistently use variable name "error" for pm call return values Rafael J. Wysocki
2009-03-07 10:24   ` Rafael J. Wysocki
2009-03-07 10:25   ` [RFC][PATCH][6/8] PCI PM: Use pci_set_power_state during early resume Rafael J. Wysocki
2009-03-07 10:25   ` Rafael J. Wysocki
2009-03-07 10:26   ` [RFC][PATCH][7/8] PCI PM: Move pci_restore_standard_config to pci-driver.c Rafael J. Wysocki
2009-03-07 10:26   ` Rafael J. Wysocki
2009-03-07 10:27   ` [RFC][PATCH][8/8] PCI PM: Put devices into low power states during late suspend Rafael J. Wysocki
2009-03-07 10:27   ` Rafael J. Wysocki
2009-03-08 19:28   ` [RFC][PATCH][0/8] PM: Rework suspend-resume ordering to avoid problems with shared interrupts Frans Pop
2009-03-08 20:50     ` Rafael J. Wysocki
2009-03-08 20:50     ` Rafael J. Wysocki
2009-03-14  8:44       ` Frans Pop
2009-03-14 11:59         ` Rafael J. Wysocki
2009-03-14 14:11           ` Frans Pop
2009-03-14 14:11           ` Frans Pop
2009-03-14 22:31             ` Rafael J. Wysocki
2009-03-14 22:31             ` Rafael J. Wysocki
2009-03-14 11:59         ` Rafael J. Wysocki
2009-03-14  8:44       ` Frans Pop
2009-03-08 19:28   ` Frans Pop
2009-03-11  9:30 ` [PATCH 0/10] PM: Rework suspend-resume ordering to avoid problems with shared interrupts (updated) Rafael J. Wysocki
2009-03-11  9:30 ` Rafael J. Wysocki
2009-03-11  9:36   ` [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5) Rafael J. Wysocki
2009-03-11  9:36   ` Rafael J. Wysocki
2009-03-11 10:33     ` Thomas Gleixner
2009-03-11 10:33     ` Thomas Gleixner
2009-03-11 20:59       ` Rafael J. Wysocki
2009-03-11 21:42         ` Thomas Gleixner
2009-03-11 21:42         ` Thomas Gleixner
2009-03-11 22:01           ` Rafael J. Wysocki
2009-03-11 22:01             ` Rafael J. Wysocki
2009-03-11 22:45           ` Thomas Gleixner
2009-03-12 13:36             ` Rafael J. Wysocki
2009-03-12 21:43               ` [update, rev. 6] " Rafael J. Wysocki
2009-03-12 21:43               ` Rafael J. Wysocki
2009-03-13  0:39                 ` Ingo Molnar
2009-03-13  0:39                 ` Ingo Molnar
2009-03-13 17:07                   ` Rafael J. Wysocki
2009-03-13 17:07                     ` Rafael J. Wysocki
2009-03-13  7:15                 ` Arve Hjønnevåg
2009-03-13  7:15                   ` Arve Hjønnevåg
2009-03-13 16:53                   ` Rafael J. Wysocki
2009-03-13 16:53                   ` Rafael J. Wysocki
2009-03-13 19:55                 ` Thomas Gleixner
2009-03-13 19:55                 ` Thomas Gleixner
2009-03-13 21:56                   ` Rafael J. Wysocki
2009-03-13 21:56                   ` Rafael J. Wysocki
2009-03-14  7:31                     ` Thomas Gleixner
2009-03-14  7:31                     ` Thomas Gleixner
2009-03-14 10:01                       ` Rafael J. Wysocki
2009-03-14 10:01                       ` Rafael J. Wysocki
2009-03-14  0:04                   ` Rafael J. Wysocki
2009-03-14  0:04                   ` Rafael J. Wysocki
2009-03-12 13:36             ` Rafael J. Wysocki
2009-03-11 22:45           ` Thomas Gleixner
2009-03-11 20:59       ` Rafael J. Wysocki
2009-03-11 21:15       ` Rafael J. Wysocki
2009-03-11 21:15       ` Rafael J. Wysocki
2009-03-11 21:35         ` Thomas Gleixner
2009-03-11 21:35         ` Thomas Gleixner
2009-03-11 21:50           ` Rafael J. Wysocki
2009-03-11 21:50           ` Rafael J. Wysocki
2009-03-11 21:53             ` Thomas Gleixner
2009-03-11 21:53             ` Thomas Gleixner
2009-03-11 22:01               ` Linus Torvalds
2009-03-11 22:01                 ` Linus Torvalds
2009-03-11 22:13                 ` Rafael J. Wysocki
2009-03-11 22:13                 ` Rafael J. Wysocki
2009-03-11 22:25                 ` Thomas Gleixner
2009-03-11 22:25                 ` Thomas Gleixner
2009-03-11 22:07               ` Rafael J. Wysocki
2009-03-11 22:07               ` Rafael J. Wysocki
2009-03-11  9:37   ` [PATCH 2/10] PM: Change suspend code ordering Rafael J. Wysocki
2009-03-11  9:37   ` Rafael J. Wysocki
2009-03-11  9:38   ` [PATCH 3/10] PM: Change hibernation " Rafael J. Wysocki
2009-03-11  9:38   ` Rafael J. Wysocki
2009-03-11  9:39   ` [PATCH 4/10] kexec: Change kexec jump " Rafael J. Wysocki
2009-03-11  9:39     ` Rafael J. Wysocki
2009-03-11  9:41   ` [PATCH 5/10] PCI PM: Consistently use variable name "error" for pm call return values Rafael J. Wysocki
2009-03-11  9:41   ` Rafael J. Wysocki
2009-03-11  9:42   ` [PATCH 6/10] PCI PM: Use pci_set_power_state during early resume Rafael J. Wysocki
2009-03-11  9:42   ` Rafael J. Wysocki
2009-03-11  9:47   ` [PATCH 7/10] PCI PM: Move pci_restore_standard_config to pci-driver.c Rafael J. Wysocki
2009-03-11  9:47   ` Rafael J. Wysocki
2009-03-11  9:48   ` [PATCH 8/10] PCI PM: Put devices into low power states during late suspend (rev. 2) Rafael J. Wysocki
2009-03-11  9:48   ` Rafael J. Wysocki
2009-03-11  9:55   ` [PATCH 9/10] PCI PM: Make pci_set_power_state() handle devices with no PM support Rafael J. Wysocki
2009-03-11  9:55   ` Rafael J. Wysocki
2009-03-11  9:56   ` [PATCH 10/10] PCI PM: Restore config spaces of all devices during early resume Rafael J. Wysocki
2009-03-11  9:56   ` Rafael J. Wysocki
2009-03-14 11:24 ` [PATCH 0/11] PM: Rework suspend-resume ordering to avoid problems with shared interrupts (updated 2x) Rafael J. Wysocki
2009-03-14 11:26   ` [PATCH 1/11] PM: Introduce functions for suspending and resuming device interrupts Rafael J. Wysocki
2009-03-14 11:26   ` Rafael J. Wysocki
2009-03-14 11:27   ` [PATCH 2/11] PM: Rework handling of interrupts during suspend-resume Rafael J. Wysocki
2009-03-14 11:27   ` Rafael J. Wysocki
2009-03-14 11:28   ` [PATCH 3/11] PM: Change suspend code ordering Rafael J. Wysocki
2009-03-14 11:28   ` Rafael J. Wysocki
2009-03-14 11:28   ` [PATCH 4/11] PM: Change hibernation " Rafael J. Wysocki
2009-03-14 11:28   ` Rafael J. Wysocki
2009-03-14 11:29   ` [PATCH 5/11] kexec: Change kexec jump " Rafael J. Wysocki
2009-03-14 11:29   ` Rafael J. Wysocki
2009-03-14 11:30   ` [PATCH 6/11] PCI PM: Consistently use variable name "error" for pm call return values Rafael J. Wysocki
2009-03-14 11:30   ` Rafael J. Wysocki
2009-03-14 11:31   ` [PATCH 7/11] PCI PM: Use pci_set_power_state during early resume Rafael J. Wysocki
2009-03-14 11:31   ` Rafael J. Wysocki
2009-03-14 11:32   ` [PATCH 8/11] PCI PM: Move pci_restore_standard_config to pci-driver.c Rafael J. Wysocki
2009-03-14 11:32   ` Rafael J. Wysocki
2009-03-14 11:32   ` [PATCH 9/11] PCI PM: Put devices into low power states during late suspend (rev. 2) Rafael J. Wysocki
2009-03-14 11:32   ` Rafael J. Wysocki
2009-03-14 11:33   ` [PATCH 10/11] PCI PM: Make pci_set_power_state() handle devices with no PM support Rafael J. Wysocki
2009-03-14 11:33   ` Rafael J. Wysocki
2009-03-14 11:34   ` [PATCH 11/11] PCI PM: Restore config spaces of all devices during early resume Rafael J. Wysocki
2009-03-14 11:34   ` Rafael J. Wysocki
2009-03-14 11:43   ` [PATCH 0/11] PM: Rework suspend-resume ordering to avoid problems with shared interrupts (updated 2x) Ingo Molnar
2009-03-14 11:43   ` Ingo Molnar
2009-03-14 11:24 ` Rafael J. Wysocki
