* [RFC][PATCH 0/2] PM: Rearrange core suspend code
@ 2009-06-06 22:54 Rafael J. Wysocki
  2009-06-06 22:55 ` [RFC][PATCH 1/2] PM: Separate suspend to RAM functionality from core Rafael J. Wysocki
                   ` (5 more replies)
  0 siblings, 6 replies; 199+ messages in thread
From: Rafael J. Wysocki @ 2009-06-06 22:54 UTC (permalink / raw)
  To: pm list; +Cc: Pavel Machek, LKML, ACPI Devel Maling List, Len Brown

Hi,

Here's something I've wanted to do for quite some time.

kernel/power/main.c has become increasingly difficult to maintain, since it
contains both the suspend to RAM core code and some common PM code that is
also used for hibernation.  For this reason [1/2] moves the suspend to RAM
code out of main.c and into two new files (the test facility is kept separate
from the core code for clarity).

[2/2] renames kernel/power/disk.c to kernel/power/hibernate.c, because the role
of this file is analogous to kernel/power/suspend.c (introduced by [1/2]).
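
A minimal userspace sketch of how the core code can keep calling the test
hooks once they live in their own file.  It assumes the hooks are exposed
through a header that supplies empty stubs when the facility is not built.
The function suspend_devices_sketch(), its fail parameter and the test_calls
counter are hypothetical; in the kernel the real hooks would be extern
functions in a separate object file rather than statics in one file.

```c
#include <stdio.h>

/*
 * Stand-in for the Kconfig symbol: build with -DCONFIG_PM_TEST_SUSPEND
 * to select the "real" hooks, or without it to get the empty stubs.
 */
#ifdef CONFIG_PM_TEST_SUSPEND
/* These would live in the separate test file. */
static int test_calls;	/* hypothetical counter, for illustration */

static void suspend_test_start(void)
{
	test_calls++;
}

static void suspend_test_finish(const char *label)
{
	test_calls++;
	printf("PM: %s finished\n", label);
}
#else
/* Empty stubs, as a shared header would provide them. */
static inline void suspend_test_start(void) {}
static inline void suspend_test_finish(const char *label) { (void)label; }
#endif

/* The caller uses the hooks unconditionally, with no #ifdef of its own. */
int suspend_devices_sketch(int fail)
{
	suspend_test_start();
	if (fail)
		return -1;	/* simulated device suspend failure */
	suspend_test_finish("suspend devices");
	return 0;
}
```

With the stubs selected, the calls compile away and the caller's control flow
is unchanged, which is what lets the two files be built independently.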

Comments welcome.

Best,
Rafael



* [RFC][PATCH 1/2] PM: Separate suspend to RAM functionality from core
  2009-06-06 22:54 [RFC][PATCH 0/2] PM: Rearrange core suspend code Rafael J. Wysocki
  2009-06-06 22:55 ` [RFC][PATCH 1/2] PM: Separate suspend to RAM functionality from core Rafael J. Wysocki
@ 2009-06-06 22:55 ` Rafael J. Wysocki
  2009-06-08  6:36   ` Pavel Machek
  2009-06-08  6:36   ` Pavel Machek
  2009-06-06 22:56 ` [RFC][PATCH 2/2] PM/Hibernate: Rename disk.c to hibernate.c Rafael J. Wysocki
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 199+ messages in thread
From: Rafael J. Wysocki @ 2009-06-06 22:55 UTC (permalink / raw)
  To: pm list; +Cc: Pavel Machek, LKML, ACPI Devel Maling List, Len Brown

From: Rafael J. Wysocki <rjw@sisk.pl>

Move the suspend to RAM and standby code from kernel/power/main.c
to two separate files, kernel/power/suspend.c containing the basic
functions and kernel/power/suspend-test.c containing the automatic
suspend test facility based on the RTC clock alarm.

There are no changes in functionality related to these modifications.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 kernel/power/Makefile       |    2 
 kernel/power/main.c         |  503 --------------------------------------------
 kernel/power/power.h        |   17 +
 kernel/power/suspend-test.c |  187 ++++++++++++++++
 kernel/power/suspend.c      |  300 ++++++++++++++++++++++++++
 5 files changed, 505 insertions(+), 504 deletions(-)
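
Not part of the patch: much of what moves into suspend.c is the goto-based
unwind in suspend_enter(), so a reduced userspace model of that ordering may
help when comparing the removed and added hunks.  The step() helper, the
trace buffer, the fail_at parameter and the label spellings are hypothetical,
and the CPU and IRQ stages are left out; only the order of the remaining
steps is taken from the code in this patch.

```c
#include <assert.h>
#include <string.h>

/* Records the order in which the simulated steps run. */
static char trace[128];

static int step(const char *name, int fail)
{
	/* Each name is short, so the buffer cannot overflow here. */
	assert(strlen(trace) + strlen(name) + 2 <= sizeof(trace));
	strcat(trace, name);
	strcat(trace, " ");
	return fail ? -1 : 0;
}

/*
 * fail_at selects the setup step that fails:
 * 1 = prepare, 2 = suspend_noirq, 3 = prepare_late, 0 = none.
 */
int suspend_enter_sketch(int fail_at)
{
	int error;

	trace[0] = '\0';

	error = step("prepare", fail_at == 1);
	if (error)
		return error;

	error = step("suspend_noirq", fail_at == 2);
	if (error)
		goto Platform_finish;

	error = step("prepare_late", fail_at == 3);
	if (error)
		goto Power_up_devices;

	error = step("enter", 0);

	step("wake", 0);
 Power_up_devices:
	step("resume_noirq", 0);
 Platform_finish:
	step("finish", 0);
	return error;
}

const char *suspend_enter_trace(void)
{
	return trace;
}
```

Each label undoes only the steps that completed before the failure, so a
failure in the noirq suspend stage runs "finish" alone, while a failure in
prepare_late also resumes the devices first.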

Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -8,20 +8,9 @@
  *
  */
 
-#include <linux/module.h>
-#include <linux/suspend.h>
 #include <linux/kobject.h>
 #include <linux/string.h>
-#include <linux/delay.h>
-#include <linux/errno.h>
-#include <linux/kmod.h>
-#include <linux/init.h>
-#include <linux/console.h>
-#include <linux/cpu.h>
 #include <linux/resume-trace.h>
-#include <linux/freezer.h>
-#include <linux/vmstat.h>
-#include <linux/syscalls.h>
 
 #include "power.h"
 
@@ -119,355 +108,6 @@ power_attr(pm_test);
 
 #endif /* CONFIG_PM_SLEEP */
 
-#ifdef CONFIG_SUSPEND
-
-static int suspend_test(int level)
-{
-#ifdef CONFIG_PM_DEBUG
-	if (pm_test_level == level) {
-		printk(KERN_INFO "suspend debug: Waiting for 5 seconds.\n");
-		mdelay(5000);
-		return 1;
-	}
-#endif /* !CONFIG_PM_DEBUG */
-	return 0;
-}
-
-#ifdef CONFIG_PM_TEST_SUSPEND
-
-/*
- * We test the system suspend code by setting an RTC wakealarm a short
- * time in the future, then suspending.  Suspending the devices won't
- * normally take long ... some systems only need a few milliseconds.
- *
- * The time it takes is system-specific though, so when we test this
- * during system bootup we allow a LOT of time.
- */
-#define TEST_SUSPEND_SECONDS	5
-
-static unsigned long suspend_test_start_time;
-
-static void suspend_test_start(void)
-{
-	/* FIXME Use better timebase than "jiffies", ideally a clocksource.
-	 * What we want is a hardware counter that will work correctly even
-	 * during the irqs-are-off stages of the suspend/resume cycle...
-	 */
-	suspend_test_start_time = jiffies;
-}
-
-static void suspend_test_finish(const char *label)
-{
-	long nj = jiffies - suspend_test_start_time;
-	unsigned msec;
-
-	msec = jiffies_to_msecs(abs(nj));
-	pr_info("PM: %s took %d.%03d seconds\n", label,
-			msec / 1000, msec % 1000);
-
-	/* Warning on suspend means the RTC alarm period needs to be
-	 * larger -- the system was sooo slooowwww to suspend that the
-	 * alarm (should have) fired before the system went to sleep!
-	 *
-	 * Warning on either suspend or resume also means the system
-	 * has some performance issues.  The stack dump of a WARN_ON
-	 * is more likely to get the right attention than a printk...
-	 */
-	WARN(msec > (TEST_SUSPEND_SECONDS * 1000), "Component: %s\n", label);
-}
-
-#else
-
-static void suspend_test_start(void)
-{
-}
-
-static void suspend_test_finish(const char *label)
-{
-}
-
-#endif
-
-static struct platform_suspend_ops *suspend_ops;
-
-/**
- *	suspend_set_ops - Set the global suspend method table.
- *	@ops:	Pointer to ops structure.
- */
-
-void suspend_set_ops(struct platform_suspend_ops *ops)
-{
-	mutex_lock(&pm_mutex);
-	suspend_ops = ops;
-	mutex_unlock(&pm_mutex);
-}
-
-/**
- * suspend_valid_only_mem - generic memory-only valid callback
- *
- * Platform drivers that implement mem suspend only and only need
- * to check for that in their .valid callback can use this instead
- * of rolling their own .valid callback.
- */
-int suspend_valid_only_mem(suspend_state_t state)
-{
-	return state == PM_SUSPEND_MEM;
-}
-
-/**
- *	suspend_prepare - Do prep work before entering low-power state.
- *
- *	This is common code that is called for each state that we're entering.
- *	Run suspend notifiers, allocate a console and stop all processes.
- */
-static int suspend_prepare(void)
-{
-	int error;
-
-	if (!suspend_ops || !suspend_ops->enter)
-		return -EPERM;
-
-	pm_prepare_console();
-
-	error = pm_notifier_call_chain(PM_SUSPEND_PREPARE);
-	if (error)
-		goto Finish;
-
-	error = usermodehelper_disable();
-	if (error)
-		goto Finish;
-
-	error = suspend_freeze_processes();
-	if (!error)
-		return 0;
-
-	suspend_thaw_processes();
-	usermodehelper_enable();
- Finish:
-	pm_notifier_call_chain(PM_POST_SUSPEND);
-	pm_restore_console();
-	return error;
-}
-
-/* default implementation */
-void __attribute__ ((weak)) arch_suspend_disable_irqs(void)
-{
-	local_irq_disable();
-}
-
-/* default implementation */
-void __attribute__ ((weak)) arch_suspend_enable_irqs(void)
-{
-	local_irq_enable();
-}
-
-/**
- *	suspend_enter - enter the desired system sleep state.
- *	@state:		state to enter
- *
- *	This function should be called after devices have been suspended.
- */
-static int suspend_enter(suspend_state_t state)
-{
-	int error;
-
-	if (suspend_ops->prepare) {
-		error = suspend_ops->prepare();
-		if (error)
-			return error;
-	}
-
-	error = dpm_suspend_noirq(PMSG_SUSPEND);
-	if (error) {
-		printk(KERN_ERR "PM: Some devices failed to power down\n");
-		goto Platfrom_finish;
-	}
-
-	if (suspend_ops->prepare_late) {
-		error = suspend_ops->prepare_late();
-		if (error)
-			goto Power_up_devices;
-	}
-
-	if (suspend_test(TEST_PLATFORM))
-		goto Platform_wake;
-
-	error = disable_nonboot_cpus();
-	if (error || suspend_test(TEST_CPUS))
-		goto Enable_cpus;
-
-	arch_suspend_disable_irqs();
-	BUG_ON(!irqs_disabled());
-
-	error = sysdev_suspend(PMSG_SUSPEND);
-	if (!error) {
-		if (!suspend_test(TEST_CORE))
-			error = suspend_ops->enter(state);
-		sysdev_resume();
-	}
-
-	arch_suspend_enable_irqs();
-	BUG_ON(irqs_disabled());
-
- Enable_cpus:
-	enable_nonboot_cpus();
-
- Platform_wake:
-	if (suspend_ops->wake)
-		suspend_ops->wake();
-
- Power_up_devices:
-	dpm_resume_noirq(PMSG_RESUME);
-
- Platfrom_finish:
-	if (suspend_ops->finish)
-		suspend_ops->finish();
-
-	return error;
-}
-
-/**
- *	suspend_devices_and_enter - suspend devices and enter the desired system
- *				    sleep state.
- *	@state:		  state to enter
- */
-int suspend_devices_and_enter(suspend_state_t state)
-{
-	int error;
-
-	if (!suspend_ops)
-		return -ENOSYS;
-
-	if (suspend_ops->begin) {
-		error = suspend_ops->begin(state);
-		if (error)
-			goto Close;
-	}
-	suspend_console();
-	suspend_test_start();
-	error = dpm_suspend_start(PMSG_SUSPEND);
-	if (error) {
-		printk(KERN_ERR "PM: Some devices failed to suspend\n");
-		goto Recover_platform;
-	}
-	suspend_test_finish("suspend devices");
-	if (suspend_test(TEST_DEVICES))
-		goto Recover_platform;
-
-	suspend_enter(state);
-
- Resume_devices:
-	suspend_test_start();
-	dpm_resume_end(PMSG_RESUME);
-	suspend_test_finish("resume devices");
-	resume_console();
- Close:
-	if (suspend_ops->end)
-		suspend_ops->end();
-	return error;
-
- Recover_platform:
-	if (suspend_ops->recover)
-		suspend_ops->recover();
-	goto Resume_devices;
-}
-
-/**
- *	suspend_finish - Do final work before exiting suspend sequence.
- *
- *	Call platform code to clean up, restart processes, and free the 
- *	console that we've allocated. This is not called for suspend-to-disk.
- */
-static void suspend_finish(void)
-{
-	suspend_thaw_processes();
-	usermodehelper_enable();
-	pm_notifier_call_chain(PM_POST_SUSPEND);
-	pm_restore_console();
-}
-
-
-
-
-static const char * const pm_states[PM_SUSPEND_MAX] = {
-	[PM_SUSPEND_STANDBY]	= "standby",
-	[PM_SUSPEND_MEM]	= "mem",
-};
-
-static inline int valid_state(suspend_state_t state)
-{
-	/* All states need lowlevel support and need to be valid
-	 * to the lowlevel implementation, no valid callback
-	 * implies that none are valid. */
-	if (!suspend_ops || !suspend_ops->valid || !suspend_ops->valid(state))
-		return 0;
-	return 1;
-}
-
-
-/**
- *	enter_state - Do common work of entering low-power state.
- *	@state:		pm_state structure for state we're entering.
- *
- *	Make sure we're the only ones trying to enter a sleep state. Fail
- *	if someone has beat us to it, since we don't want anything weird to
- *	happen when we wake up.
- *	Then, do the setup for suspend, enter the state, and cleaup (after
- *	we've woken up).
- */
-static int enter_state(suspend_state_t state)
-{
-	int error;
-
-	if (!valid_state(state))
-		return -ENODEV;
-
-	if (!mutex_trylock(&pm_mutex))
-		return -EBUSY;
-
-	printk(KERN_INFO "PM: Syncing filesystems ... ");
-	sys_sync();
-	printk("done.\n");
-
-	pr_debug("PM: Preparing system for %s sleep\n", pm_states[state]);
-	error = suspend_prepare();
-	if (error)
-		goto Unlock;
-
-	if (suspend_test(TEST_FREEZER))
-		goto Finish;
-
-	pr_debug("PM: Entering %s sleep\n", pm_states[state]);
-	error = suspend_devices_and_enter(state);
-
- Finish:
-	pr_debug("PM: Finishing wakeup.\n");
-	suspend_finish();
- Unlock:
-	mutex_unlock(&pm_mutex);
-	return error;
-}
-
-
-/**
- *	pm_suspend - Externally visible function for suspending system.
- *	@state:		Enumerated value of state to enter.
- *
- *	Determine whether or not value is within range, get state 
- *	structure, and enter (above).
- */
-
-int pm_suspend(suspend_state_t state)
-{
-	if (state > PM_SUSPEND_ON && state <= PM_SUSPEND_MAX)
-		return enter_state(state);
-	return -EINVAL;
-}
-
-EXPORT_SYMBOL(pm_suspend);
-
-#endif /* CONFIG_SUSPEND */
-
 struct kobject *power_kobj;
 
 /**
@@ -480,7 +120,6 @@ struct kobject *power_kobj;
  *	store() accepts one of those strings, translates it into the 
  *	proper enumerated value, and initiates a suspend transition.
  */
-
 static ssize_t state_show(struct kobject *kobj, struct kobj_attribute *attr,
 			  char *buf)
 {
@@ -578,7 +217,6 @@ static struct attribute_group attr_group
 	.attrs = g,
 };
 
-
 static int __init pm_init(void)
 {
 	power_kobj = kobject_create_and_add("power", NULL);
@@ -588,144 +226,3 @@ static int __init pm_init(void)
 }
 
 core_initcall(pm_init);
-
-
-#ifdef CONFIG_PM_TEST_SUSPEND
-
-#include <linux/rtc.h>
-
-/*
- * To test system suspend, we need a hands-off mechanism to resume the
- * system.  RTCs wake alarms are a common self-contained mechanism.
- */
-
-static void __init test_wakealarm(struct rtc_device *rtc, suspend_state_t state)
-{
-	static char err_readtime[] __initdata =
-		KERN_ERR "PM: can't read %s time, err %d\n";
-	static char err_wakealarm [] __initdata =
-		KERN_ERR "PM: can't set %s wakealarm, err %d\n";
-	static char err_suspend[] __initdata =
-		KERN_ERR "PM: suspend test failed, error %d\n";
-	static char info_test[] __initdata =
-		KERN_INFO "PM: test RTC wakeup from '%s' suspend\n";
-
-	unsigned long		now;
-	struct rtc_wkalrm	alm;
-	int			status;
-
-	/* this may fail if the RTC hasn't been initialized */
-	status = rtc_read_time(rtc, &alm.time);
-	if (status < 0) {
-		printk(err_readtime, dev_name(&rtc->dev), status);
-		return;
-	}
-	rtc_tm_to_time(&alm.time, &now);
-
-	memset(&alm, 0, sizeof alm);
-	rtc_time_to_tm(now + TEST_SUSPEND_SECONDS, &alm.time);
-	alm.enabled = true;
-
-	status = rtc_set_alarm(rtc, &alm);
-	if (status < 0) {
-		printk(err_wakealarm, dev_name(&rtc->dev), status);
-		return;
-	}
-
-	if (state == PM_SUSPEND_MEM) {
-		printk(info_test, pm_states[state]);
-		status = pm_suspend(state);
-		if (status == -ENODEV)
-			state = PM_SUSPEND_STANDBY;
-	}
-	if (state == PM_SUSPEND_STANDBY) {
-		printk(info_test, pm_states[state]);
-		status = pm_suspend(state);
-	}
-	if (status < 0)
-		printk(err_suspend, status);
-
-	/* Some platforms can't detect that the alarm triggered the
-	 * wakeup, or (accordingly) disable it after it afterwards.
-	 * It's supposed to give oneshot behavior; cope.
-	 */
-	alm.enabled = false;
-	rtc_set_alarm(rtc, &alm);
-}
-
-static int __init has_wakealarm(struct device *dev, void *name_ptr)
-{
-	struct rtc_device *candidate = to_rtc_device(dev);
-
-	if (!candidate->ops->set_alarm)
-		return 0;
-	if (!device_may_wakeup(candidate->dev.parent))
-		return 0;
-
-	*(const char **)name_ptr = dev_name(dev);
-	return 1;
-}
-
-/*
- * Kernel options like "test_suspend=mem" force suspend/resume sanity tests
- * at startup time.  They're normally disabled, for faster boot and because
- * we can't know which states really work on this particular system.
- */
-static suspend_state_t test_state __initdata = PM_SUSPEND_ON;
-
-static char warn_bad_state[] __initdata =
-	KERN_WARNING "PM: can't test '%s' suspend state\n";
-
-static int __init setup_test_suspend(char *value)
-{
-	unsigned i;
-
-	/* "=mem" ==> "mem" */
-	value++;
-	for (i = 0; i < PM_SUSPEND_MAX; i++) {
-		if (!pm_states[i])
-			continue;
-		if (strcmp(pm_states[i], value) != 0)
-			continue;
-		test_state = (__force suspend_state_t) i;
-		return 0;
-	}
-	printk(warn_bad_state, value);
-	return 0;
-}
-__setup("test_suspend", setup_test_suspend);
-
-static int __init test_suspend(void)
-{
-	static char		warn_no_rtc[] __initdata =
-		KERN_WARNING "PM: no wakealarm-capable RTC driver is ready\n";
-
-	char			*pony = NULL;
-	struct rtc_device	*rtc = NULL;
-
-	/* PM is initialized by now; is that state testable? */
-	if (test_state == PM_SUSPEND_ON)
-		goto done;
-	if (!valid_state(test_state)) {
-		printk(warn_bad_state, pm_states[test_state]);
-		goto done;
-	}
-
-	/* RTCs have initialized by now too ... can we use one? */
-	class_find_device(rtc_class, NULL, &pony, has_wakealarm);
-	if (pony)
-		rtc = rtc_class_open(pony);
-	if (!rtc) {
-		printk(warn_no_rtc);
-		goto done;
-	}
-
-	/* go for it */
-	test_wakealarm(rtc, test_state);
-	rtc_class_close(rtc);
-done:
-	return 0;
-}
-late_initcall(test_suspend);
-
-#endif /* CONFIG_PM_TEST_SUSPEND */
Index: linux-2.6/kernel/power/suspend.c
===================================================================
--- /dev/null
+++ linux-2.6/kernel/power/suspend.c
@@ -0,0 +1,300 @@
+/*
+ * kernel/power/suspend.c - Suspend to RAM and standby functionality.
+ *
+ * Copyright (c) 2003 Patrick Mochel
+ * Copyright (c) 2003 Open Source Development Lab
+ * Copyright (c) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
+ *
+ * This file is released under the GPLv2.
+ */
+
+#include <linux/string.h>
+#include <linux/delay.h>
+#include <linux/errno.h>
+#include <linux/init.h>
+#include <linux/console.h>
+#include <linux/cpu.h>
+#include <linux/syscalls.h>
+
+#include "power.h"
+
+const char *const pm_states[PM_SUSPEND_MAX] = {
+	[PM_SUSPEND_STANDBY]	= "standby",
+	[PM_SUSPEND_MEM]	= "mem",
+};
+
+static struct platform_suspend_ops *suspend_ops;
+
+/**
+ *	suspend_set_ops - Set the global suspend method table.
+ *	@ops:	Pointer to ops structure.
+ */
+void suspend_set_ops(struct platform_suspend_ops *ops)
+{
+	mutex_lock(&pm_mutex);
+	suspend_ops = ops;
+	mutex_unlock(&pm_mutex);
+}
+
+bool valid_state(suspend_state_t state)
+{
+	/*
+	 * All states need lowlevel support and need to be valid to the lowlevel
+	 * implementation, no valid callback implies that none are valid.
+	 */
+	return suspend_ops && suspend_ops->valid && suspend_ops->valid(state);
+}
+
+/**
+ * suspend_valid_only_mem - generic memory-only valid callback
+ *
+ * Platform drivers that implement mem suspend only and only need
+ * to check for that in their .valid callback can use this instead
+ * of rolling their own .valid callback.
+ */
+int suspend_valid_only_mem(suspend_state_t state)
+{
+	return state == PM_SUSPEND_MEM;
+}
+
+static int suspend_test(int level)
+{
+#ifdef CONFIG_PM_DEBUG
+	if (pm_test_level == level) {
+		printk(KERN_INFO "suspend debug: Waiting for 5 seconds.\n");
+		mdelay(5000);
+		return 1;
+	}
+#endif /* CONFIG_PM_DEBUG */
+	return 0;
+}
+
+/**
+ *	suspend_prepare - Do prep work before entering low-power state.
+ *
+ *	This is common code that is called for each state that we're entering.
+ *	Run suspend notifiers, allocate a console and stop all processes.
+ */
+static int suspend_prepare(void)
+{
+	int error;
+
+	if (!suspend_ops || !suspend_ops->enter)
+		return -EPERM;
+
+	pm_prepare_console();
+
+	error = pm_notifier_call_chain(PM_SUSPEND_PREPARE);
+	if (error)
+		goto Finish;
+
+	error = usermodehelper_disable();
+	if (error)
+		goto Finish;
+
+	error = suspend_freeze_processes();
+	if (!error)
+		return 0;
+
+	suspend_thaw_processes();
+	usermodehelper_enable();
+ Finish:
+	pm_notifier_call_chain(PM_POST_SUSPEND);
+	pm_restore_console();
+	return error;
+}
+
+/* default implementation */
+void __attribute__ ((weak)) arch_suspend_disable_irqs(void)
+{
+	local_irq_disable();
+}
+
+/* default implementation */
+void __attribute__ ((weak)) arch_suspend_enable_irqs(void)
+{
+	local_irq_enable();
+}
+
+/**
+ *	suspend_enter - enter the desired system sleep state.
+ *	@state:		state to enter
+ *
+ *	This function should be called after devices have been suspended.
+ */
+static int suspend_enter(suspend_state_t state)
+{
+	int error;
+
+	if (suspend_ops->prepare) {
+		error = suspend_ops->prepare();
+		if (error)
+			return error;
+	}
+
+	error = dpm_suspend_noirq(PMSG_SUSPEND);
+	if (error) {
+		printk(KERN_ERR "PM: Some devices failed to power down\n");
+		goto Platform_finish;
+	}
+
+	if (suspend_ops->prepare_late) {
+		error = suspend_ops->prepare_late();
+		if (error)
+			goto Power_up_devices;
+	}
+
+	if (suspend_test(TEST_PLATFORM))
+		goto Platform_wake;
+
+	error = disable_nonboot_cpus();
+	if (error || suspend_test(TEST_CPUS))
+		goto Enable_cpus;
+
+	arch_suspend_disable_irqs();
+	BUG_ON(!irqs_disabled());
+
+	error = sysdev_suspend(PMSG_SUSPEND);
+	if (!error) {
+		if (!suspend_test(TEST_CORE))
+			error = suspend_ops->enter(state);
+		sysdev_resume();
+	}
+
+	arch_suspend_enable_irqs();
+	BUG_ON(irqs_disabled());
+
+ Enable_cpus:
+	enable_nonboot_cpus();
+
+ Platform_wake:
+	if (suspend_ops->wake)
+		suspend_ops->wake();
+
+ Power_up_devices:
+	dpm_resume_noirq(PMSG_RESUME);
+
+ Platform_finish:
+	if (suspend_ops->finish)
+		suspend_ops->finish();
+
+	return error;
+}
+
+/**
+ *	suspend_devices_and_enter - suspend devices and enter the desired system
+ *				    sleep state.
+ *	@state:		  state to enter
+ */
+int suspend_devices_and_enter(suspend_state_t state)
+{
+	int error;
+
+	if (!suspend_ops)
+		return -ENOSYS;
+
+	if (suspend_ops->begin) {
+		error = suspend_ops->begin(state);
+		if (error)
+			goto Close;
+	}
+	suspend_console();
+	suspend_test_start();
+	error = dpm_suspend_start(PMSG_SUSPEND);
+	if (error) {
+		printk(KERN_ERR "PM: Some devices failed to suspend\n");
+		goto Recover_platform;
+	}
+	suspend_test_finish("suspend devices");
+	if (suspend_test(TEST_DEVICES))
+		goto Recover_platform;
+
+	suspend_enter(state);
+
+ Resume_devices:
+	suspend_test_start();
+	dpm_resume_end(PMSG_RESUME);
+	suspend_test_finish("resume devices");
+	resume_console();
+ Close:
+	if (suspend_ops->end)
+		suspend_ops->end();
+	return error;
+
+ Recover_platform:
+	if (suspend_ops->recover)
+		suspend_ops->recover();
+	goto Resume_devices;
+}
+
+/**
+ *	suspend_finish - Do final work before exiting suspend sequence.
+ *
+ *	Call platform code to clean up, restart processes, and free the
+ *	console that we've allocated. This is not called for suspend-to-disk.
+ */
+static void suspend_finish(void)
+{
+	suspend_thaw_processes();
+	usermodehelper_enable();
+	pm_notifier_call_chain(PM_POST_SUSPEND);
+	pm_restore_console();
+}
+
+/**
+ *	enter_state - Do common work of entering low-power state.
+ *	@state:		pm_state structure for state we're entering.
+ *
+ *	Make sure we're the only ones trying to enter a sleep state. Fail
+ *	if someone has beat us to it, since we don't want anything weird to
+ *	happen when we wake up.
+ *	Then, do the setup for suspend, enter the state, and clean up (after
+ *	we've woken up).
+ */
+int enter_state(suspend_state_t state)
+{
+	int error;
+
+	if (!valid_state(state))
+		return -ENODEV;
+
+	if (!mutex_trylock(&pm_mutex))
+		return -EBUSY;
+
+	printk(KERN_INFO "PM: Syncing filesystems ... ");
+	sys_sync();
+	printk("done.\n");
+
+	pr_debug("PM: Preparing system for %s sleep\n", pm_states[state]);
+	error = suspend_prepare();
+	if (error)
+		goto Unlock;
+
+	if (suspend_test(TEST_FREEZER))
+		goto Finish;
+
+	pr_debug("PM: Entering %s sleep\n", pm_states[state]);
+	error = suspend_devices_and_enter(state);
+
+ Finish:
+	pr_debug("PM: Finishing wakeup.\n");
+	suspend_finish();
+ Unlock:
+	mutex_unlock(&pm_mutex);
+	return error;
+}
+
+/**
+ *	pm_suspend - Externally visible function for suspending system.
+ *	@state:		Enumerated value of state to enter.
+ *
+ *	Determine whether or not value is within range, get state
+ *	structure, and enter (above).
+ */
+int pm_suspend(suspend_state_t state)
+{
+	if (state > PM_SUSPEND_ON && state <= PM_SUSPEND_MAX)
+		return enter_state(state);
+	return -EINVAL;
+}
+EXPORT_SYMBOL(pm_suspend);
Index: linux-2.6/kernel/power/power.h
===================================================================
--- linux-2.6.orig/kernel/power/power.h
+++ linux-2.6/kernel/power/power.h
@@ -160,15 +160,30 @@ extern void swsusp_show_speed(struct tim
 				unsigned int, char *);
 
 #ifdef CONFIG_SUSPEND
-/* kernel/power/main.c */
+/* kernel/power/suspend.c */
+extern const char *const pm_states[];
+
+extern bool valid_state(suspend_state_t state);
 extern int suspend_devices_and_enter(suspend_state_t state);
+extern int enter_state(suspend_state_t state);
 #else /* !CONFIG_SUSPEND */
 static inline int suspend_devices_and_enter(suspend_state_t state)
 {
 	return -ENOSYS;
 }
+static inline int enter_state(suspend_state_t state) { return -ENOSYS; }
+static inline bool valid_state(suspend_state_t state) { return false; }
 #endif /* !CONFIG_SUSPEND */
 
+#ifdef CONFIG_PM_TEST_SUSPEND
+/* kernel/power/suspend-test.c */
+extern void suspend_test_start(void);
+extern void suspend_test_finish(const char *label);
+#else /* !CONFIG_PM_TEST_SUSPEND */
+static inline void suspend_test_start(void) {}
+static inline void suspend_test_finish(const char *label) {}
+#endif /* !CONFIG_PM_TEST_SUSPEND */
+
 #ifdef CONFIG_PM_SLEEP
 /* kernel/power/main.c */
 extern int pm_notifier_call_chain(unsigned long val);
Index: linux-2.6/kernel/power/Makefile
===================================================================
--- linux-2.6.orig/kernel/power/Makefile
+++ linux-2.6/kernel/power/Makefile
@@ -6,6 +6,8 @@ endif
 obj-$(CONFIG_PM)		+= main.o
 obj-$(CONFIG_PM_SLEEP)		+= console.o
 obj-$(CONFIG_FREEZER)		+= process.o
+obj-$(CONFIG_SUSPEND)		+= suspend.o
+obj-$(CONFIG_PM_TEST_SUSPEND)	+= suspend-test.o
 obj-$(CONFIG_HIBERNATION)	+= swsusp.o disk.o snapshot.o swap.o user.o
 
 obj-$(CONFIG_MAGIC_SYSRQ)	+= poweroff.o
Index: linux-2.6/kernel/power/suspend-test.c
===================================================================
--- /dev/null
+++ linux-2.6/kernel/power/suspend-test.c
@@ -0,0 +1,187 @@
+/*
+ * kernel/power/suspend-test.c - Suspend to RAM and standby test facility.
+ *
+ * Copyright (c) 2009 Pavel Machek <pavel@ucw.cz>
+ *
+ * This file is released under the GPLv2.
+ */
+
+#include <linux/init.h>
+#include <linux/rtc.h>
+
+#include "power.h"
+
+/*
+ * We test the system suspend code by setting an RTC wakealarm a short
+ * time in the future, then suspending.  Suspending the devices won't
+ * normally take long ... some systems only need a few milliseconds.
+ *
+ * The time it takes is system-specific though, so when we test this
+ * during system bootup we allow a LOT of time.
+ */
+#define TEST_SUSPEND_SECONDS	5
+
+static unsigned long suspend_test_start_time;
+
+void suspend_test_start(void)
+{
+	/* FIXME Use better timebase than "jiffies", ideally a clocksource.
+	 * What we want is a hardware counter that will work correctly even
+	 * during the irqs-are-off stages of the suspend/resume cycle...
+	 */
+	suspend_test_start_time = jiffies;
+}
+
+void suspend_test_finish(const char *label)
+{
+	long nj = jiffies - suspend_test_start_time;
+	unsigned msec;
+
+	msec = jiffies_to_msecs(abs(nj));
+	pr_info("PM: %s took %d.%03d seconds\n", label,
+			msec / 1000, msec % 1000);
+
+	/* Warning on suspend means the RTC alarm period needs to be
+	 * larger -- the system was sooo slooowwww to suspend that the
+	 * alarm (should have) fired before the system went to sleep!
+	 *
+	 * Warning on either suspend or resume also means the system
+	 * has some performance issues.  The stack dump of a WARN_ON
+	 * is more likely to get the right attention than a printk...
+	 */
+	WARN(msec > (TEST_SUSPEND_SECONDS * 1000), "Component: %s\n", label);
+}
+
+/*
+ * To test system suspend, we need a hands-off mechanism to resume the
+ * system.  RTC wake alarms are a common self-contained mechanism.
+ */
+
+static void __init test_wakealarm(struct rtc_device *rtc, suspend_state_t state)
+{
+	static char err_readtime[] __initdata =
+		KERN_ERR "PM: can't read %s time, err %d\n";
+	static char err_wakealarm [] __initdata =
+		KERN_ERR "PM: can't set %s wakealarm, err %d\n";
+	static char err_suspend[] __initdata =
+		KERN_ERR "PM: suspend test failed, error %d\n";
+	static char info_test[] __initdata =
+		KERN_INFO "PM: test RTC wakeup from '%s' suspend\n";
+
+	unsigned long		now;
+	struct rtc_wkalrm	alm;
+	int			status;
+
+	/* this may fail if the RTC hasn't been initialized */
+	status = rtc_read_time(rtc, &alm.time);
+	if (status < 0) {
+		printk(err_readtime, dev_name(&rtc->dev), status);
+		return;
+	}
+	rtc_tm_to_time(&alm.time, &now);
+
+	memset(&alm, 0, sizeof alm);
+	rtc_time_to_tm(now + TEST_SUSPEND_SECONDS, &alm.time);
+	alm.enabled = true;
+
+	status = rtc_set_alarm(rtc, &alm);
+	if (status < 0) {
+		printk(err_wakealarm, dev_name(&rtc->dev), status);
+		return;
+	}
+
+	if (state == PM_SUSPEND_MEM) {
+		printk(info_test, pm_states[state]);
+		status = pm_suspend(state);
+		if (status == -ENODEV)
+			state = PM_SUSPEND_STANDBY;
+	}
+	if (state == PM_SUSPEND_STANDBY) {
+		printk(info_test, pm_states[state]);
+		status = pm_suspend(state);
+	}
+	if (status < 0)
+		printk(err_suspend, status);
+
+	/* Some platforms can't detect that the alarm triggered the
+	 * wakeup, or (accordingly) disable it afterwards.
+	 * It's supposed to give oneshot behavior; cope.
+	 */
+	alm.enabled = false;
+	rtc_set_alarm(rtc, &alm);
+}
+
+static int __init has_wakealarm(struct device *dev, void *name_ptr)
+{
+	struct rtc_device *candidate = to_rtc_device(dev);
+
+	if (!candidate->ops->set_alarm)
+		return 0;
+	if (!device_may_wakeup(candidate->dev.parent))
+		return 0;
+
+	*(const char **)name_ptr = dev_name(dev);
+	return 1;
+}
+
+/*
+ * Kernel options like "test_suspend=mem" force suspend/resume sanity tests
+ * at startup time.  They're normally disabled, for faster boot and because
+ * we can't know which states really work on this particular system.
+ */
+static suspend_state_t test_state __initdata = PM_SUSPEND_ON;
+
+static char warn_bad_state[] __initdata =
+	KERN_WARNING "PM: can't test '%s' suspend state\n";
+
+static int __init setup_test_suspend(char *value)
+{
+	unsigned i;
+
+	/* "=mem" ==> "mem" */
+	value++;
+	for (i = 0; i < PM_SUSPEND_MAX; i++) {
+		if (!pm_states[i])
+			continue;
+		if (strcmp(pm_states[i], value) != 0)
+			continue;
+		test_state = (__force suspend_state_t) i;
+		return 0;
+	}
+	printk(warn_bad_state, value);
+	return 0;
+}
+__setup("test_suspend", setup_test_suspend);
+
+static int __init test_suspend(void)
+{
+	static char		warn_no_rtc[] __initdata =
+		KERN_WARNING "PM: no wakealarm-capable RTC driver is ready\n";
+
+	char			*pony = NULL;
+	struct rtc_device	*rtc = NULL;
+
+	/* PM is initialized by now; is that state testable? */
+	if (test_state == PM_SUSPEND_ON)
+		goto done;
+	if (!valid_state(test_state)) {
+		printk(warn_bad_state, pm_states[test_state]);
+		goto done;
+	}
+
+	/* RTCs have initialized by now too ... can we use one? */
+	class_find_device(rtc_class, NULL, &pony, has_wakealarm);
+	if (pony)
+		rtc = rtc_class_open(pony);
+	if (!rtc) {
+		printk(warn_no_rtc);
+		goto done;
+	}
+
+	/* go for it */
+	test_wakealarm(rtc, test_state);
+	rtc_class_close(rtc);
+done:
+	return 0;
+}
+late_initcall(test_suspend);
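
Not part of the patch: a userspace sketch of the state-name lookup done by
setup_test_suspend() for options like "test_suspend=mem".  The numeric values
of the PM_SUSPEND_* constants are assumed here for illustration only, and
parse_test_suspend() is a hypothetical name.  Unlike the code above, the
sketch checks for the leading '=' before skipping it.

```c
#include <stddef.h>
#include <string.h>

/* Assumed values, for illustration only. */
enum {
	PM_SUSPEND_ON		= 0,
	PM_SUSPEND_STANDBY	= 1,
	PM_SUSPEND_MEM		= 3,
	PM_SUSPEND_MAX		= 4,
};

/* Sparse table: indices without a name stay NULL and are skipped. */
static const char *const pm_states[PM_SUSPEND_MAX] = {
	[PM_SUSPEND_STANDBY]	= "standby",
	[PM_SUSPEND_MEM]	= "mem",
};

/*
 * Map the text after "test_suspend" (e.g. "=mem") to a state index.
 * Returns PM_SUSPEND_ON, meaning "no test requested", when the value
 * is malformed or names an unknown state.
 */
int parse_test_suspend(const char *value)
{
	int i;

	if (*value != '=')
		return PM_SUSPEND_ON;
	value++;	/* "=mem" ==> "mem" */

	for (i = 0; i < PM_SUSPEND_MAX; i++) {
		if (pm_states[i] && strcmp(pm_states[i], value) == 0)
			return i;
	}
	return PM_SUSPEND_ON;
}
```

Using PM_SUSPEND_ON as the "nothing to test" value matches how test_suspend()
bails out early when test_state was never set.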



* [RFC][PATCH 1/2] PM: Separate suspend to RAM functionality from core
  2009-06-06 22:54 [RFC][PATCH 0/2] PM: Rearrange core suspend code Rafael J. Wysocki
@ 2009-06-06 22:55 ` Rafael J. Wysocki
  2009-06-06 22:55 ` Rafael J. Wysocki
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 199+ messages in thread
From: Rafael J. Wysocki @ 2009-06-06 22:55 UTC (permalink / raw)
  To: pm list; +Cc: ACPI Devel Maling List, LKML

From: Rafael J. Wysocki <rjw@sisk.pl>

Move the suspend to RAM and standby code from kernel/power/main.c
to two separate files, kernel/power/suspend.c containing the basic
functions and kernel/power/suspend-test.c containing the automatic
suspend test facility based on the RTC clock alarm.

There are no changes in functionality related to these modifications.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 kernel/power/Makefile       |    2 
 kernel/power/main.c         |  503 --------------------------------------------
 kernel/power/power.h        |   17 +
 kernel/power/suspend-test.c |  187 ++++++++++++++++
 kernel/power/suspend.c      |  300 ++++++++++++++++++++++++++
 5 files changed, 505 insertions(+), 504 deletions(-)

Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -8,20 +8,9 @@
  *
  */
 
-#include <linux/module.h>
-#include <linux/suspend.h>
 #include <linux/kobject.h>
 #include <linux/string.h>
-#include <linux/delay.h>
-#include <linux/errno.h>
-#include <linux/kmod.h>
-#include <linux/init.h>
-#include <linux/console.h>
-#include <linux/cpu.h>
 #include <linux/resume-trace.h>
-#include <linux/freezer.h>
-#include <linux/vmstat.h>
-#include <linux/syscalls.h>
 
 #include "power.h"
 
@@ -119,355 +108,6 @@ power_attr(pm_test);
 
 #endif /* CONFIG_PM_SLEEP */
 
-#ifdef CONFIG_SUSPEND
-
-static int suspend_test(int level)
-{
-#ifdef CONFIG_PM_DEBUG
-	if (pm_test_level == level) {
-		printk(KERN_INFO "suspend debug: Waiting for 5 seconds.\n");
-		mdelay(5000);
-		return 1;
-	}
-#endif /* !CONFIG_PM_DEBUG */
-	return 0;
-}
-
-#ifdef CONFIG_PM_TEST_SUSPEND
-
-/*
- * We test the system suspend code by setting an RTC wakealarm a short
- * time in the future, then suspending.  Suspending the devices won't
- * normally take long ... some systems only need a few milliseconds.
- *
- * The time it takes is system-specific though, so when we test this
- * during system bootup we allow a LOT of time.
- */
-#define TEST_SUSPEND_SECONDS	5
-
-static unsigned long suspend_test_start_time;
-
-static void suspend_test_start(void)
-{
-	/* FIXME Use better timebase than "jiffies", ideally a clocksource.
-	 * What we want is a hardware counter that will work correctly even
-	 * during the irqs-are-off stages of the suspend/resume cycle...
-	 */
-	suspend_test_start_time = jiffies;
-}
-
-static void suspend_test_finish(const char *label)
-{
-	long nj = jiffies - suspend_test_start_time;
-	unsigned msec;
-
-	msec = jiffies_to_msecs(abs(nj));
-	pr_info("PM: %s took %d.%03d seconds\n", label,
-			msec / 1000, msec % 1000);
-
-	/* Warning on suspend means the RTC alarm period needs to be
-	 * larger -- the system was sooo slooowwww to suspend that the
-	 * alarm (should have) fired before the system went to sleep!
-	 *
-	 * Warning on either suspend or resume also means the system
-	 * has some performance issues.  The stack dump of a WARN_ON
-	 * is more likely to get the right attention than a printk...
-	 */
-	WARN(msec > (TEST_SUSPEND_SECONDS * 1000), "Component: %s\n", label);
-}
-
-#else
-
-static void suspend_test_start(void)
-{
-}
-
-static void suspend_test_finish(const char *label)
-{
-}
-
-#endif
-
-static struct platform_suspend_ops *suspend_ops;
-
-/**
- *	suspend_set_ops - Set the global suspend method table.
- *	@ops:	Pointer to ops structure.
- */
-
-void suspend_set_ops(struct platform_suspend_ops *ops)
-{
-	mutex_lock(&pm_mutex);
-	suspend_ops = ops;
-	mutex_unlock(&pm_mutex);
-}
-
-/**
- * suspend_valid_only_mem - generic memory-only valid callback
- *
- * Platform drivers that implement mem suspend only and only need
- * to check for that in their .valid callback can use this instead
- * of rolling their own .valid callback.
- */
-int suspend_valid_only_mem(suspend_state_t state)
-{
-	return state == PM_SUSPEND_MEM;
-}
-
-/**
- *	suspend_prepare - Do prep work before entering low-power state.
- *
- *	This is common code that is called for each state that we're entering.
- *	Run suspend notifiers, allocate a console and stop all processes.
- */
-static int suspend_prepare(void)
-{
-	int error;
-
-	if (!suspend_ops || !suspend_ops->enter)
-		return -EPERM;
-
-	pm_prepare_console();
-
-	error = pm_notifier_call_chain(PM_SUSPEND_PREPARE);
-	if (error)
-		goto Finish;
-
-	error = usermodehelper_disable();
-	if (error)
-		goto Finish;
-
-	error = suspend_freeze_processes();
-	if (!error)
-		return 0;
-
-	suspend_thaw_processes();
-	usermodehelper_enable();
- Finish:
-	pm_notifier_call_chain(PM_POST_SUSPEND);
-	pm_restore_console();
-	return error;
-}
-
-/* default implementation */
-void __attribute__ ((weak)) arch_suspend_disable_irqs(void)
-{
-	local_irq_disable();
-}
-
-/* default implementation */
-void __attribute__ ((weak)) arch_suspend_enable_irqs(void)
-{
-	local_irq_enable();
-}
-
-/**
- *	suspend_enter - enter the desired system sleep state.
- *	@state:		state to enter
- *
- *	This function should be called after devices have been suspended.
- */
-static int suspend_enter(suspend_state_t state)
-{
-	int error;
-
-	if (suspend_ops->prepare) {
-		error = suspend_ops->prepare();
-		if (error)
-			return error;
-	}
-
-	error = dpm_suspend_noirq(PMSG_SUSPEND);
-	if (error) {
-		printk(KERN_ERR "PM: Some devices failed to power down\n");
-		goto Platfrom_finish;
-	}
-
-	if (suspend_ops->prepare_late) {
-		error = suspend_ops->prepare_late();
-		if (error)
-			goto Power_up_devices;
-	}
-
-	if (suspend_test(TEST_PLATFORM))
-		goto Platform_wake;
-
-	error = disable_nonboot_cpus();
-	if (error || suspend_test(TEST_CPUS))
-		goto Enable_cpus;
-
-	arch_suspend_disable_irqs();
-	BUG_ON(!irqs_disabled());
-
-	error = sysdev_suspend(PMSG_SUSPEND);
-	if (!error) {
-		if (!suspend_test(TEST_CORE))
-			error = suspend_ops->enter(state);
-		sysdev_resume();
-	}
-
-	arch_suspend_enable_irqs();
-	BUG_ON(irqs_disabled());
-
- Enable_cpus:
-	enable_nonboot_cpus();
-
- Platform_wake:
-	if (suspend_ops->wake)
-		suspend_ops->wake();
-
- Power_up_devices:
-	dpm_resume_noirq(PMSG_RESUME);
-
- Platfrom_finish:
-	if (suspend_ops->finish)
-		suspend_ops->finish();
-
-	return error;
-}
-
-/**
- *	suspend_devices_and_enter - suspend devices and enter the desired system
- *				    sleep state.
- *	@state:		  state to enter
- */
-int suspend_devices_and_enter(suspend_state_t state)
-{
-	int error;
-
-	if (!suspend_ops)
-		return -ENOSYS;
-
-	if (suspend_ops->begin) {
-		error = suspend_ops->begin(state);
-		if (error)
-			goto Close;
-	}
-	suspend_console();
-	suspend_test_start();
-	error = dpm_suspend_start(PMSG_SUSPEND);
-	if (error) {
-		printk(KERN_ERR "PM: Some devices failed to suspend\n");
-		goto Recover_platform;
-	}
-	suspend_test_finish("suspend devices");
-	if (suspend_test(TEST_DEVICES))
-		goto Recover_platform;
-
-	suspend_enter(state);
-
- Resume_devices:
-	suspend_test_start();
-	dpm_resume_end(PMSG_RESUME);
-	suspend_test_finish("resume devices");
-	resume_console();
- Close:
-	if (suspend_ops->end)
-		suspend_ops->end();
-	return error;
-
- Recover_platform:
-	if (suspend_ops->recover)
-		suspend_ops->recover();
-	goto Resume_devices;
-}
-
-/**
- *	suspend_finish - Do final work before exiting suspend sequence.
- *
- *	Call platform code to clean up, restart processes, and free the 
- *	console that we've allocated. This is not called for suspend-to-disk.
- */
-static void suspend_finish(void)
-{
-	suspend_thaw_processes();
-	usermodehelper_enable();
-	pm_notifier_call_chain(PM_POST_SUSPEND);
-	pm_restore_console();
-}
-
-
-
-
-static const char * const pm_states[PM_SUSPEND_MAX] = {
-	[PM_SUSPEND_STANDBY]	= "standby",
-	[PM_SUSPEND_MEM]	= "mem",
-};
-
-static inline int valid_state(suspend_state_t state)
-{
-	/* All states need lowlevel support and need to be valid
-	 * to the lowlevel implementation, no valid callback
-	 * implies that none are valid. */
-	if (!suspend_ops || !suspend_ops->valid || !suspend_ops->valid(state))
-		return 0;
-	return 1;
-}
-
-
-/**
- *	enter_state - Do common work of entering low-power state.
- *	@state:		pm_state structure for state we're entering.
- *
- *	Make sure we're the only ones trying to enter a sleep state. Fail
- *	if someone has beat us to it, since we don't want anything weird to
- *	happen when we wake up.
- *	Then, do the setup for suspend, enter the state, and cleaup (after
- *	we've woken up).
- */
-static int enter_state(suspend_state_t state)
-{
-	int error;
-
-	if (!valid_state(state))
-		return -ENODEV;
-
-	if (!mutex_trylock(&pm_mutex))
-		return -EBUSY;
-
-	printk(KERN_INFO "PM: Syncing filesystems ... ");
-	sys_sync();
-	printk("done.\n");
-
-	pr_debug("PM: Preparing system for %s sleep\n", pm_states[state]);
-	error = suspend_prepare();
-	if (error)
-		goto Unlock;
-
-	if (suspend_test(TEST_FREEZER))
-		goto Finish;
-
-	pr_debug("PM: Entering %s sleep\n", pm_states[state]);
-	error = suspend_devices_and_enter(state);
-
- Finish:
-	pr_debug("PM: Finishing wakeup.\n");
-	suspend_finish();
- Unlock:
-	mutex_unlock(&pm_mutex);
-	return error;
-}
-
-
-/**
- *	pm_suspend - Externally visible function for suspending system.
- *	@state:		Enumerated value of state to enter.
- *
- *	Determine whether or not value is within range, get state 
- *	structure, and enter (above).
- */
-
-int pm_suspend(suspend_state_t state)
-{
-	if (state > PM_SUSPEND_ON && state <= PM_SUSPEND_MAX)
-		return enter_state(state);
-	return -EINVAL;
-}
-
-EXPORT_SYMBOL(pm_suspend);
-
-#endif /* CONFIG_SUSPEND */
-
 struct kobject *power_kobj;
 
 /**
@@ -480,7 +120,6 @@ struct kobject *power_kobj;
  *	store() accepts one of those strings, translates it into the 
  *	proper enumerated value, and initiates a suspend transition.
  */
-
 static ssize_t state_show(struct kobject *kobj, struct kobj_attribute *attr,
 			  char *buf)
 {
@@ -578,7 +217,6 @@ static struct attribute_group attr_group
 	.attrs = g,
 };
 
-
 static int __init pm_init(void)
 {
 	power_kobj = kobject_create_and_add("power", NULL);
@@ -588,144 +226,3 @@ static int __init pm_init(void)
 }
 
 core_initcall(pm_init);
-
-
-#ifdef CONFIG_PM_TEST_SUSPEND
-
-#include <linux/rtc.h>
-
-/*
- * To test system suspend, we need a hands-off mechanism to resume the
- * system.  RTCs wake alarms are a common self-contained mechanism.
- */
-
-static void __init test_wakealarm(struct rtc_device *rtc, suspend_state_t state)
-{
-	static char err_readtime[] __initdata =
-		KERN_ERR "PM: can't read %s time, err %d\n";
-	static char err_wakealarm [] __initdata =
-		KERN_ERR "PM: can't set %s wakealarm, err %d\n";
-	static char err_suspend[] __initdata =
-		KERN_ERR "PM: suspend test failed, error %d\n";
-	static char info_test[] __initdata =
-		KERN_INFO "PM: test RTC wakeup from '%s' suspend\n";
-
-	unsigned long		now;
-	struct rtc_wkalrm	alm;
-	int			status;
-
-	/* this may fail if the RTC hasn't been initialized */
-	status = rtc_read_time(rtc, &alm.time);
-	if (status < 0) {
-		printk(err_readtime, dev_name(&rtc->dev), status);
-		return;
-	}
-	rtc_tm_to_time(&alm.time, &now);
-
-	memset(&alm, 0, sizeof alm);
-	rtc_time_to_tm(now + TEST_SUSPEND_SECONDS, &alm.time);
-	alm.enabled = true;
-
-	status = rtc_set_alarm(rtc, &alm);
-	if (status < 0) {
-		printk(err_wakealarm, dev_name(&rtc->dev), status);
-		return;
-	}
-
-	if (state == PM_SUSPEND_MEM) {
-		printk(info_test, pm_states[state]);
-		status = pm_suspend(state);
-		if (status == -ENODEV)
-			state = PM_SUSPEND_STANDBY;
-	}
-	if (state == PM_SUSPEND_STANDBY) {
-		printk(info_test, pm_states[state]);
-		status = pm_suspend(state);
-	}
-	if (status < 0)
-		printk(err_suspend, status);
-
-	/* Some platforms can't detect that the alarm triggered the
-	 * wakeup, or (accordingly) disable it after it afterwards.
-	 * It's supposed to give oneshot behavior; cope.
-	 */
-	alm.enabled = false;
-	rtc_set_alarm(rtc, &alm);
-}
-
-static int __init has_wakealarm(struct device *dev, void *name_ptr)
-{
-	struct rtc_device *candidate = to_rtc_device(dev);
-
-	if (!candidate->ops->set_alarm)
-		return 0;
-	if (!device_may_wakeup(candidate->dev.parent))
-		return 0;
-
-	*(const char **)name_ptr = dev_name(dev);
-	return 1;
-}
-
-/*
- * Kernel options like "test_suspend=mem" force suspend/resume sanity tests
- * at startup time.  They're normally disabled, for faster boot and because
- * we can't know which states really work on this particular system.
- */
-static suspend_state_t test_state __initdata = PM_SUSPEND_ON;
-
-static char warn_bad_state[] __initdata =
-	KERN_WARNING "PM: can't test '%s' suspend state\n";
-
-static int __init setup_test_suspend(char *value)
-{
-	unsigned i;
-
-	/* "=mem" ==> "mem" */
-	value++;
-	for (i = 0; i < PM_SUSPEND_MAX; i++) {
-		if (!pm_states[i])
-			continue;
-		if (strcmp(pm_states[i], value) != 0)
-			continue;
-		test_state = (__force suspend_state_t) i;
-		return 0;
-	}
-	printk(warn_bad_state, value);
-	return 0;
-}
-__setup("test_suspend", setup_test_suspend);
-
-static int __init test_suspend(void)
-{
-	static char		warn_no_rtc[] __initdata =
-		KERN_WARNING "PM: no wakealarm-capable RTC driver is ready\n";
-
-	char			*pony = NULL;
-	struct rtc_device	*rtc = NULL;
-
-	/* PM is initialized by now; is that state testable? */
-	if (test_state == PM_SUSPEND_ON)
-		goto done;
-	if (!valid_state(test_state)) {
-		printk(warn_bad_state, pm_states[test_state]);
-		goto done;
-	}
-
-	/* RTCs have initialized by now too ... can we use one? */
-	class_find_device(rtc_class, NULL, &pony, has_wakealarm);
-	if (pony)
-		rtc = rtc_class_open(pony);
-	if (!rtc) {
-		printk(warn_no_rtc);
-		goto done;
-	}
-
-	/* go for it */
-	test_wakealarm(rtc, test_state);
-	rtc_class_close(rtc);
-done:
-	return 0;
-}
-late_initcall(test_suspend);
-
-#endif /* CONFIG_PM_TEST_SUSPEND */
Index: linux-2.6/kernel/power/suspend.c
===================================================================
--- /dev/null
+++ linux-2.6/kernel/power/suspend.c
@@ -0,0 +1,300 @@
+/*
+ * kernel/power/suspend.c - Suspend to RAM and standby functionality.
+ *
+ * Copyright (c) 2003 Patrick Mochel
+ * Copyright (c) 2003 Open Source Development Lab
+ * Copyright (c) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
+ *
+ * This file is released under the GPLv2.
+ */
+
+#include <linux/string.h>
+#include <linux/delay.h>
+#include <linux/errno.h>
+#include <linux/init.h>
+#include <linux/console.h>
+#include <linux/cpu.h>
+#include <linux/syscalls.h>
+
+#include "power.h"
+
+const char *const pm_states[PM_SUSPEND_MAX] = {
+	[PM_SUSPEND_STANDBY]	= "standby",
+	[PM_SUSPEND_MEM]	= "mem",
+};
+
+static struct platform_suspend_ops *suspend_ops;
+
+/**
+ *	suspend_set_ops - Set the global suspend method table.
+ *	@ops:	Pointer to ops structure.
+ */
+void suspend_set_ops(struct platform_suspend_ops *ops)
+{
+	mutex_lock(&pm_mutex);
+	suspend_ops = ops;
+	mutex_unlock(&pm_mutex);
+}
+
+bool valid_state(suspend_state_t state)
+{
+	/*
+	 * All states need lowlevel support and need to be valid to the lowlevel
+	 * implementation, no valid callback implies that none are valid.
+	 */
+	return suspend_ops && suspend_ops->valid && suspend_ops->valid(state);
+}
+
+/**
+ * suspend_valid_only_mem - generic memory-only valid callback
+ *
+ * Platform drivers that implement mem suspend only and only need
+ * to check for that in their .valid callback can use this instead
+ * of rolling their own .valid callback.
+ */
+int suspend_valid_only_mem(suspend_state_t state)
+{
+	return state == PM_SUSPEND_MEM;
+}
+
+static int suspend_test(int level)
+{
+#ifdef CONFIG_PM_DEBUG
+	if (pm_test_level == level) {
+		printk(KERN_INFO "suspend debug: Waiting for 5 seconds.\n");
+		mdelay(5000);
+		return 1;
+	}
+#endif /* CONFIG_PM_DEBUG */
+	return 0;
+}
+
+/**
+ *	suspend_prepare - Do prep work before entering low-power state.
+ *
+ *	This is common code that is called for each state that we're entering.
+ *	Run suspend notifiers, allocate a console and stop all processes.
+ */
+static int suspend_prepare(void)
+{
+	int error;
+
+	if (!suspend_ops || !suspend_ops->enter)
+		return -EPERM;
+
+	pm_prepare_console();
+
+	error = pm_notifier_call_chain(PM_SUSPEND_PREPARE);
+	if (error)
+		goto Finish;
+
+	error = usermodehelper_disable();
+	if (error)
+		goto Finish;
+
+	error = suspend_freeze_processes();
+	if (!error)
+		return 0;
+
+	suspend_thaw_processes();
+	usermodehelper_enable();
+ Finish:
+	pm_notifier_call_chain(PM_POST_SUSPEND);
+	pm_restore_console();
+	return error;
+}
+
+/* default implementation */
+void __attribute__ ((weak)) arch_suspend_disable_irqs(void)
+{
+	local_irq_disable();
+}
+
+/* default implementation */
+void __attribute__ ((weak)) arch_suspend_enable_irqs(void)
+{
+	local_irq_enable();
+}
+
+/**
+ *	suspend_enter - enter the desired system sleep state.
+ *	@state:		state to enter
+ *
+ *	This function should be called after devices have been suspended.
+ */
+static int suspend_enter(suspend_state_t state)
+{
+	int error;
+
+	if (suspend_ops->prepare) {
+		error = suspend_ops->prepare();
+		if (error)
+			return error;
+	}
+
+	error = dpm_suspend_noirq(PMSG_SUSPEND);
+	if (error) {
+		printk(KERN_ERR "PM: Some devices failed to power down\n");
+		goto Platfrom_finish;
+	}
+
+	if (suspend_ops->prepare_late) {
+		error = suspend_ops->prepare_late();
+		if (error)
+			goto Power_up_devices;
+	}
+
+	if (suspend_test(TEST_PLATFORM))
+		goto Platform_wake;
+
+	error = disable_nonboot_cpus();
+	if (error || suspend_test(TEST_CPUS))
+		goto Enable_cpus;
+
+	arch_suspend_disable_irqs();
+	BUG_ON(!irqs_disabled());
+
+	error = sysdev_suspend(PMSG_SUSPEND);
+	if (!error) {
+		if (!suspend_test(TEST_CORE))
+			error = suspend_ops->enter(state);
+		sysdev_resume();
+	}
+
+	arch_suspend_enable_irqs();
+	BUG_ON(irqs_disabled());
+
+ Enable_cpus:
+	enable_nonboot_cpus();
+
+ Platform_wake:
+	if (suspend_ops->wake)
+		suspend_ops->wake();
+
+ Power_up_devices:
+	dpm_resume_noirq(PMSG_RESUME);
+
+ Platfrom_finish:
+	if (suspend_ops->finish)
+		suspend_ops->finish();
+
+	return error;
+}
+
+/**
+ *	suspend_devices_and_enter - suspend devices and enter the desired system
+ *				    sleep state.
+ *	@state:		  state to enter
+ */
+int suspend_devices_and_enter(suspend_state_t state)
+{
+	int error;
+
+	if (!suspend_ops)
+		return -ENOSYS;
+
+	if (suspend_ops->begin) {
+		error = suspend_ops->begin(state);
+		if (error)
+			goto Close;
+	}
+	suspend_console();
+	suspend_test_start();
+	error = dpm_suspend_start(PMSG_SUSPEND);
+	if (error) {
+		printk(KERN_ERR "PM: Some devices failed to suspend\n");
+		goto Recover_platform;
+	}
+	suspend_test_finish("suspend devices");
+	if (suspend_test(TEST_DEVICES))
+		goto Recover_platform;
+
+	suspend_enter(state);
+
+ Resume_devices:
+	suspend_test_start();
+	dpm_resume_end(PMSG_RESUME);
+	suspend_test_finish("resume devices");
+	resume_console();
+ Close:
+	if (suspend_ops->end)
+		suspend_ops->end();
+	return error;
+
+ Recover_platform:
+	if (suspend_ops->recover)
+		suspend_ops->recover();
+	goto Resume_devices;
+}
+
+/**
+ *	suspend_finish - Do final work before exiting suspend sequence.
+ *
+ *	Call platform code to clean up, restart processes, and free the
+ *	console that we've allocated. This is not called for suspend-to-disk.
+ */
+static void suspend_finish(void)
+{
+	suspend_thaw_processes();
+	usermodehelper_enable();
+	pm_notifier_call_chain(PM_POST_SUSPEND);
+	pm_restore_console();
+}
+
+/**
+ *	enter_state - Do common work of entering low-power state.
+ *	@state:		pm_state structure for state we're entering.
+ *
+ *	Make sure we're the only ones trying to enter a sleep state. Fail
+ *	if someone has beat us to it, since we don't want anything weird to
+ *	happen when we wake up.
+ *	Then, do the setup for suspend, enter the state, and clean up (after
+ *	we've woken up).
+ */
+int enter_state(suspend_state_t state)
+{
+	int error;
+
+	if (!valid_state(state))
+		return -ENODEV;
+
+	if (!mutex_trylock(&pm_mutex))
+		return -EBUSY;
+
+	printk(KERN_INFO "PM: Syncing filesystems ... ");
+	sys_sync();
+	printk("done.\n");
+
+	pr_debug("PM: Preparing system for %s sleep\n", pm_states[state]);
+	error = suspend_prepare();
+	if (error)
+		goto Unlock;
+
+	if (suspend_test(TEST_FREEZER))
+		goto Finish;
+
+	pr_debug("PM: Entering %s sleep\n", pm_states[state]);
+	error = suspend_devices_and_enter(state);
+
+ Finish:
+	pr_debug("PM: Finishing wakeup.\n");
+	suspend_finish();
+ Unlock:
+	mutex_unlock(&pm_mutex);
+	return error;
+}
+
+/**
+ *	pm_suspend - Externally visible function for suspending system.
+ *	@state:		Enumerated value of state to enter.
+ *
+ *	Determine whether or not value is within range, get state
+ *	structure, and enter (above).
+ */
+int pm_suspend(suspend_state_t state)
+{
+	if (state > PM_SUSPEND_ON && state <= PM_SUSPEND_MAX)
+		return enter_state(state);
+	return -EINVAL;
+}
+EXPORT_SYMBOL(pm_suspend);
Index: linux-2.6/kernel/power/power.h
===================================================================
--- linux-2.6.orig/kernel/power/power.h
+++ linux-2.6/kernel/power/power.h
@@ -160,15 +160,30 @@ extern void swsusp_show_speed(struct tim
 				unsigned int, char *);
 
 #ifdef CONFIG_SUSPEND
-/* kernel/power/main.c */
+/* kernel/power/suspend.c */
+extern const char *const pm_states[];
+
+extern bool valid_state(suspend_state_t state);
 extern int suspend_devices_and_enter(suspend_state_t state);
+extern int enter_state(suspend_state_t state);
 #else /* !CONFIG_SUSPEND */
 static inline int suspend_devices_and_enter(suspend_state_t state)
 {
 	return -ENOSYS;
 }
+static inline int enter_state(suspend_state_t state) { return -ENOSYS; }
+static inline bool valid_state(suspend_state_t state) { return false; }
 #endif /* !CONFIG_SUSPEND */
 
+#ifdef CONFIG_PM_TEST_SUSPEND
+/* kernel/power/suspend-test.c */
+extern void suspend_test_start(void);
+extern void suspend_test_finish(const char *label);
+#else /* !CONFIG_PM_TEST_SUSPEND */
+static inline void suspend_test_start(void) {}
+static inline void suspend_test_finish(const char *label) {}
+#endif /* !CONFIG_PM_TEST_SUSPEND */
+
 #ifdef CONFIG_PM_SLEEP
 /* kernel/power/main.c */
 extern int pm_notifier_call_chain(unsigned long val);
Index: linux-2.6/kernel/power/Makefile
===================================================================
--- linux-2.6.orig/kernel/power/Makefile
+++ linux-2.6/kernel/power/Makefile
@@ -6,6 +6,8 @@ endif
 obj-$(CONFIG_PM)		+= main.o
 obj-$(CONFIG_PM_SLEEP)		+= console.o
 obj-$(CONFIG_FREEZER)		+= process.o
+obj-$(CONFIG_SUSPEND)		+= suspend.o
+obj-$(CONFIG_PM_TEST_SUSPEND)	+= suspend-test.o
 obj-$(CONFIG_HIBERNATION)	+= swsusp.o disk.o snapshot.o swap.o user.o
 
 obj-$(CONFIG_MAGIC_SYSRQ)	+= poweroff.o
Index: linux-2.6/kernel/power/suspend-test.c
===================================================================
--- /dev/null
+++ linux-2.6/kernel/power/suspend-test.c
@@ -0,0 +1,187 @@
+/*
+ * kernel/power/suspend-test.c - Suspend to RAM and standby test facility.
+ *
+ * Copyright (c) 2009 Pavel Machek <pavel@ucw.cz>
+ *
+ * This file is released under the GPLv2.
+ */
+
+#include <linux/init.h>
+#include <linux/rtc.h>
+
+#include "power.h"
+
+/*
+ * We test the system suspend code by setting an RTC wakealarm a short
+ * time in the future, then suspending.  Suspending the devices won't
+ * normally take long ... some systems only need a few milliseconds.
+ *
+ * The time it takes is system-specific though, so when we test this
+ * during system bootup we allow a LOT of time.
+ */
+#define TEST_SUSPEND_SECONDS	5
+
+static unsigned long suspend_test_start_time;
+
+void suspend_test_start(void)
+{
+	/* FIXME Use better timebase than "jiffies", ideally a clocksource.
+	 * What we want is a hardware counter that will work correctly even
+	 * during the irqs-are-off stages of the suspend/resume cycle...
+	 */
+	suspend_test_start_time = jiffies;
+}
+
+void suspend_test_finish(const char *label)
+{
+	long nj = jiffies - suspend_test_start_time;
+	unsigned msec;
+
+	msec = jiffies_to_msecs(abs(nj));
+	pr_info("PM: %s took %d.%03d seconds\n", label,
+			msec / 1000, msec % 1000);
+
+	/* Warning on suspend means the RTC alarm period needs to be
+	 * larger -- the system was sooo slooowwww to suspend that the
+	 * alarm (should have) fired before the system went to sleep!
+	 *
+	 * Warning on either suspend or resume also means the system
+	 * has some performance issues.  The stack dump of a WARN_ON
+	 * is more likely to get the right attention than a printk...
+	 */
+	WARN(msec > (TEST_SUSPEND_SECONDS * 1000), "Component: %s\n", label);
+}
+
+/*
+ * To test system suspend, we need a hands-off mechanism to resume the
+ * system.  RTC wake alarms are a common self-contained mechanism.
+ */
+
+static void __init test_wakealarm(struct rtc_device *rtc, suspend_state_t state)
+{
+	static char err_readtime[] __initdata =
+		KERN_ERR "PM: can't read %s time, err %d\n";
+	static char err_wakealarm [] __initdata =
+		KERN_ERR "PM: can't set %s wakealarm, err %d\n";
+	static char err_suspend[] __initdata =
+		KERN_ERR "PM: suspend test failed, error %d\n";
+	static char info_test[] __initdata =
+		KERN_INFO "PM: test RTC wakeup from '%s' suspend\n";
+
+	unsigned long		now;
+	struct rtc_wkalrm	alm;
+	int			status;
+
+	/* this may fail if the RTC hasn't been initialized */
+	status = rtc_read_time(rtc, &alm.time);
+	if (status < 0) {
+		printk(err_readtime, dev_name(&rtc->dev), status);
+		return;
+	}
+	rtc_tm_to_time(&alm.time, &now);
+
+	memset(&alm, 0, sizeof alm);
+	rtc_time_to_tm(now + TEST_SUSPEND_SECONDS, &alm.time);
+	alm.enabled = true;
+
+	status = rtc_set_alarm(rtc, &alm);
+	if (status < 0) {
+		printk(err_wakealarm, dev_name(&rtc->dev), status);
+		return;
+	}
+
+	if (state == PM_SUSPEND_MEM) {
+		printk(info_test, pm_states[state]);
+		status = pm_suspend(state);
+		if (status == -ENODEV)
+			state = PM_SUSPEND_STANDBY;
+	}
+	if (state == PM_SUSPEND_STANDBY) {
+		printk(info_test, pm_states[state]);
+		status = pm_suspend(state);
+	}
+	if (status < 0)
+		printk(err_suspend, status);
+
+	/* Some platforms can't detect that the alarm triggered the
+	 * wakeup, or (accordingly) disable it afterwards.
+	 * It's supposed to give oneshot behavior; cope.
+	 */
+	alm.enabled = false;
+	rtc_set_alarm(rtc, &alm);
+}
+
+static int __init has_wakealarm(struct device *dev, void *name_ptr)
+{
+	struct rtc_device *candidate = to_rtc_device(dev);
+
+	if (!candidate->ops->set_alarm)
+		return 0;
+	if (!device_may_wakeup(candidate->dev.parent))
+		return 0;
+
+	*(const char **)name_ptr = dev_name(dev);
+	return 1;
+}
+
+/*
+ * Kernel options like "test_suspend=mem" force suspend/resume sanity tests
+ * at startup time.  They're normally disabled, for faster boot and because
+ * we can't know which states really work on this particular system.
+ */
+static suspend_state_t test_state __initdata = PM_SUSPEND_ON;
+
+static char warn_bad_state[] __initdata =
+	KERN_WARNING "PM: can't test '%s' suspend state\n";
+
+static int __init setup_test_suspend(char *value)
+{
+	unsigned i;
+
+	/* "=mem" ==> "mem" */
+	value++;
+	for (i = 0; i < PM_SUSPEND_MAX; i++) {
+		if (!pm_states[i])
+			continue;
+		if (strcmp(pm_states[i], value) != 0)
+			continue;
+		test_state = (__force suspend_state_t) i;
+		return 0;
+	}
+	printk(warn_bad_state, value);
+	return 0;
+}
+__setup("test_suspend", setup_test_suspend);
+
+static int __init test_suspend(void)
+{
+	static char		warn_no_rtc[] __initdata =
+		KERN_WARNING "PM: no wakealarm-capable RTC driver is ready\n";
+
+	char			*pony = NULL;
+	struct rtc_device	*rtc = NULL;
+
+	/* PM is initialized by now; is that state testable? */
+	if (test_state == PM_SUSPEND_ON)
+		goto done;
+	if (!valid_state(test_state)) {
+		printk(warn_bad_state, pm_states[test_state]);
+		goto done;
+	}
+
+	/* RTCs have initialized by now too ... can we use one? */
+	class_find_device(rtc_class, NULL, &pony, has_wakealarm);
+	if (pony)
+		rtc = rtc_class_open(pony);
+	if (!rtc) {
+		printk(warn_no_rtc);
+		goto done;
+	}
+
+	/* go for it */
+	test_wakealarm(rtc, test_state);
+	rtc_class_close(rtc);
+done:
+	return 0;
+}
+late_initcall(test_suspend);


* [RFC][PATCH 2/2] PM/Hibernate: Rename disk.c to hibernate.c
  2009-06-06 22:54 [RFC][PATCH 0/2] PM: Rearrange core suspend code Rafael J. Wysocki
                   ` (2 preceding siblings ...)
  2009-06-06 22:56 ` [RFC][PATCH 2/2] PM/Hibernate: Rename disk.c to hibernate.c Rafael J. Wysocki
@ 2009-06-06 22:56 ` Rafael J. Wysocki
  2009-06-08  6:37   ` Pavel Machek
  2009-06-08  6:37   ` Pavel Machek
  2009-06-07 20:51 ` [RFC][PATCH 0/2] PM: Rearrange core suspend code Alan Stern
  2009-06-07 20:51 ` [linux-pm] " Alan Stern
  5 siblings, 2 replies; 199+ messages in thread
From: Rafael J. Wysocki @ 2009-06-06 22:56 UTC (permalink / raw)
  To: pm list; +Cc: Pavel Machek, LKML, ACPI Devel Maling List, Len Brown

From: Rafael J. Wysocki <rjw@sisk.pl>

Change the name of kernel/power/disk.c to kernel/power/hibernate.c
in analogy with the file names introduced by the changes that
separated the suspend to RAM and standby functionality from the
common PM functions.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 kernel/power/Makefile    |    2 
 kernel/power/disk.c      |  955 -----------------------------------------------
 kernel/power/hibernate.c |  955 +++++++++++++++++++++++++++++++++++++++++++++++
 kernel/power/power.h     |    4 
 4 files changed, 958 insertions(+), 958 deletions(-)

Index: linux-2.6/kernel/power/power.h
===================================================================
--- linux-2.6.orig/kernel/power/power.h
+++ linux-2.6/kernel/power/power.h
@@ -45,7 +45,7 @@ static inline char *check_image_kernel(s
  */
 #define SPARE_PAGES	((1024 * 1024) >> PAGE_SHIFT)
 
-/* kernel/power/disk.c */
+/* kernel/power/hibernate.c */
 extern int hibernation_snapshot(int platform_mode);
 extern int hibernation_restore(int platform_mode);
 extern int hibernation_platform_enter(void);
@@ -147,7 +147,7 @@ extern int swsusp_swap_in_use(void);
  */
 #define SF_PLATFORM_MODE	1
 
-/* kernel/power/disk.c */
+/* kernel/power/hibernate.c */
 extern int swsusp_check(void);
 extern void swsusp_free(void);
 extern int swsusp_read(unsigned int *flags_p);
Index: linux-2.6/kernel/power/Makefile
===================================================================
--- linux-2.6.orig/kernel/power/Makefile
+++ linux-2.6/kernel/power/Makefile
@@ -8,6 +8,6 @@ obj-$(CONFIG_PM_SLEEP)		+= console.o
 obj-$(CONFIG_FREEZER)		+= process.o
 obj-$(CONFIG_SUSPEND)		+= suspend.o
 obj-$(CONFIG_PM_TEST_SUSPEND)	+= suspend-test.o
-obj-$(CONFIG_HIBERNATION)	+= swsusp.o disk.o snapshot.o swap.o user.o
+obj-$(CONFIG_HIBERNATION)	+= swsusp.o hibernate.o snapshot.o swap.o user.o
 
 obj-$(CONFIG_MAGIC_SYSRQ)	+= poweroff.o
Index: linux-2.6/kernel/power/disk.c
===================================================================
--- linux-2.6.orig/kernel/power/disk.c
+++ /dev/null
@@ -1,955 +0,0 @@
-/*
- * kernel/power/disk.c - Suspend-to-disk support.
- *
- * Copyright (c) 2003 Patrick Mochel
- * Copyright (c) 2003 Open Source Development Lab
- * Copyright (c) 2004 Pavel Machek <pavel@suse.cz>
- *
- * This file is released under the GPLv2.
- *
- */
-
-#include <linux/suspend.h>
-#include <linux/syscalls.h>
-#include <linux/reboot.h>
-#include <linux/string.h>
-#include <linux/device.h>
-#include <linux/kmod.h>
-#include <linux/delay.h>
-#include <linux/fs.h>
-#include <linux/mount.h>
-#include <linux/pm.h>
-#include <linux/console.h>
-#include <linux/cpu.h>
-#include <linux/freezer.h>
-#include <scsi/scsi_scan.h>
-#include <asm/suspend.h>
-
-#include "power.h"
-
-
-static int noresume = 0;
-static char resume_file[256] = CONFIG_PM_STD_PARTITION;
-dev_t swsusp_resume_device;
-sector_t swsusp_resume_block;
-
-enum {
-	HIBERNATION_INVALID,
-	HIBERNATION_PLATFORM,
-	HIBERNATION_TEST,
-	HIBERNATION_TESTPROC,
-	HIBERNATION_SHUTDOWN,
-	HIBERNATION_REBOOT,
-	/* keep last */
-	__HIBERNATION_AFTER_LAST
-};
-#define HIBERNATION_MAX (__HIBERNATION_AFTER_LAST-1)
-#define HIBERNATION_FIRST (HIBERNATION_INVALID + 1)
-
-static int hibernation_mode = HIBERNATION_SHUTDOWN;
-
-static struct platform_hibernation_ops *hibernation_ops;
-
-/**
- * hibernation_set_ops - set the global hibernate operations
- * @ops: the hibernation operations to use in subsequent hibernation transitions
- */
-
-void hibernation_set_ops(struct platform_hibernation_ops *ops)
-{
-	if (ops && !(ops->begin && ops->end &&  ops->pre_snapshot
-	    && ops->prepare && ops->finish && ops->enter && ops->pre_restore
-	    && ops->restore_cleanup)) {
-		WARN_ON(1);
-		return;
-	}
-	mutex_lock(&pm_mutex);
-	hibernation_ops = ops;
-	if (ops)
-		hibernation_mode = HIBERNATION_PLATFORM;
-	else if (hibernation_mode == HIBERNATION_PLATFORM)
-		hibernation_mode = HIBERNATION_SHUTDOWN;
-
-	mutex_unlock(&pm_mutex);
-}
-
-static bool entering_platform_hibernation;
-
-bool system_entering_hibernation(void)
-{
-	return entering_platform_hibernation;
-}
-EXPORT_SYMBOL(system_entering_hibernation);
-
-#ifdef CONFIG_PM_DEBUG
-static void hibernation_debug_sleep(void)
-{
-	printk(KERN_INFO "hibernation debug: Waiting for 5 seconds.\n");
-	mdelay(5000);
-}
-
-static int hibernation_testmode(int mode)
-{
-	if (hibernation_mode == mode) {
-		hibernation_debug_sleep();
-		return 1;
-	}
-	return 0;
-}
-
-static int hibernation_test(int level)
-{
-	if (pm_test_level == level) {
-		hibernation_debug_sleep();
-		return 1;
-	}
-	return 0;
-}
-#else /* !CONFIG_PM_DEBUG */
-static int hibernation_testmode(int mode) { return 0; }
-static int hibernation_test(int level) { return 0; }
-#endif /* !CONFIG_PM_DEBUG */
-
-/**
- *	platform_begin - tell the platform driver that we're starting
- *	hibernation
- */
-
-static int platform_begin(int platform_mode)
-{
-	return (platform_mode && hibernation_ops) ?
-		hibernation_ops->begin() : 0;
-}
-
-/**
- *	platform_end - tell the platform driver that we've entered the
- *	working state
- */
-
-static void platform_end(int platform_mode)
-{
-	if (platform_mode && hibernation_ops)
-		hibernation_ops->end();
-}
-
-/**
- *	platform_pre_snapshot - prepare the machine for hibernation using the
- *	platform driver if so configured and return an error code if it fails
- */
-
-static int platform_pre_snapshot(int platform_mode)
-{
-	return (platform_mode && hibernation_ops) ?
-		hibernation_ops->pre_snapshot() : 0;
-}
-
-/**
- *	platform_leave - prepare the machine for switching to the normal mode
- *	of operation using the platform driver (called with interrupts disabled)
- */
-
-static void platform_leave(int platform_mode)
-{
-	if (platform_mode && hibernation_ops)
-		hibernation_ops->leave();
-}
-
-/**
- *	platform_finish - switch the machine to the normal mode of operation
- *	using the platform driver (must be called after platform_prepare())
- */
-
-static void platform_finish(int platform_mode)
-{
-	if (platform_mode && hibernation_ops)
-		hibernation_ops->finish();
-}
-
-/**
- *	platform_pre_restore - prepare the platform for the restoration from a
- *	hibernation image.  If the restore fails after this function has been
- *	called, platform_restore_cleanup() must be called.
- */
-
-static int platform_pre_restore(int platform_mode)
-{
-	return (platform_mode && hibernation_ops) ?
-		hibernation_ops->pre_restore() : 0;
-}
-
-/**
- *	platform_restore_cleanup - switch the platform to the normal mode of
- *	operation after a failing restore.  If platform_pre_restore() has been
- *	called before the failing restore, this function must be called too,
- *	regardless of the result of platform_pre_restore().
- */
-
-static void platform_restore_cleanup(int platform_mode)
-{
-	if (platform_mode && hibernation_ops)
-		hibernation_ops->restore_cleanup();
-}
-
-/**
- *	platform_recover - recover the platform from a failure to suspend
- *	devices.
- */
-
-static void platform_recover(int platform_mode)
-{
-	if (platform_mode && hibernation_ops && hibernation_ops->recover)
-		hibernation_ops->recover();
-}
-
-/**
- *	create_image - freeze devices that need to be frozen with interrupts
- *	off, create the hibernation image and thaw those devices.  Control
- *	reappears in this routine after a restore.
- */
-
-static int create_image(int platform_mode)
-{
-	int error;
-
-	error = arch_prepare_suspend();
-	if (error)
-		return error;
-
-	/* At this point, dpm_suspend_start() has been called, but *not*
-	 * dpm_suspend_noirq(). We *must* call dpm_suspend_noirq() now.
-	 * Otherwise, drivers for some devices (e.g. interrupt controllers)
-	 * become desynchronized with the actual state of the hardware
-	 * at resume time, and evil weirdness ensues.
-	 */
-	error = dpm_suspend_noirq(PMSG_FREEZE);
-	if (error) {
-		printk(KERN_ERR "PM: Some devices failed to power down, "
-			"aborting hibernation\n");
-		return error;
-	}
-
-	error = platform_pre_snapshot(platform_mode);
-	if (error || hibernation_test(TEST_PLATFORM))
-		goto Platform_finish;
-
-	error = disable_nonboot_cpus();
-	if (error || hibernation_test(TEST_CPUS)
-	    || hibernation_testmode(HIBERNATION_TEST))
-		goto Enable_cpus;
-
-	local_irq_disable();
-
-	error = sysdev_suspend(PMSG_FREEZE);
-	if (error) {
-		printk(KERN_ERR "PM: Some system devices failed to power down, "
-			"aborting hibernation\n");
-		goto Enable_irqs;
-	}
-
-	if (hibernation_test(TEST_CORE))
-		goto Power_up;
-
-	in_suspend = 1;
-	save_processor_state();
-	error = swsusp_arch_suspend();
-	if (error)
-		printk(KERN_ERR "PM: Error %d creating hibernation image\n",
-			error);
-	/* Restore control flow magically appears here */
-	restore_processor_state();
-	if (!in_suspend)
-		platform_leave(platform_mode);
-
- Power_up:
-	sysdev_resume();
-	/* NOTE:  dpm_resume_noirq() is just a resume() for devices
-	 * that suspended with irqs off ... no overall powerup.
-	 */
-
- Enable_irqs:
-	local_irq_enable();
-
- Enable_cpus:
-	enable_nonboot_cpus();
-
- Platform_finish:
-	platform_finish(platform_mode);
-
-	dpm_resume_noirq(in_suspend ?
-		(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);
-
-	return error;
-}
-
-/**
- *	hibernation_snapshot - quiesce devices and create the hibernation
- *	snapshot image.
- *	@platform_mode - if set, use the platform driver, if available, to
- *			 prepare the platform firmware for the power transition.
- *
- *	Must be called with pm_mutex held
- */
-
-int hibernation_snapshot(int platform_mode)
-{
-	int error;
-
-	error = platform_begin(platform_mode);
-	if (error)
-		return error;
-
-	/* Free memory before shutting down devices. */
-	error = swsusp_shrink_memory();
-	if (error)
-		goto Close;
-
-	suspend_console();
-	error = dpm_suspend_start(PMSG_FREEZE);
-	if (error)
-		goto Recover_platform;
-
-	if (hibernation_test(TEST_DEVICES))
-		goto Recover_platform;
-
-	error = create_image(platform_mode);
-	/* Control returns here after successful restore */
-
- Resume_devices:
-	dpm_resume_end(in_suspend ?
-		(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);
-	resume_console();
- Close:
-	platform_end(platform_mode);
-	return error;
-
- Recover_platform:
-	platform_recover(platform_mode);
-	goto Resume_devices;
-}
-
-/**
- *	resume_target_kernel - prepare devices that need to be suspended with
- *	interrupts off, restore the contents of highmem that have not been
- *	restored yet from the image and run the low level code that will restore
- *	the remaining contents of memory and switch to the just restored target
- *	kernel.
- */
-
-static int resume_target_kernel(bool platform_mode)
-{
-	int error;
-
-	error = dpm_suspend_noirq(PMSG_QUIESCE);
-	if (error) {
-		printk(KERN_ERR "PM: Some devices failed to power down, "
-			"aborting resume\n");
-		return error;
-	}
-
-	error = platform_pre_restore(platform_mode);
-	if (error)
-		goto Cleanup;
-
-	error = disable_nonboot_cpus();
-	if (error)
-		goto Enable_cpus;
-
-	local_irq_disable();
-
-	error = sysdev_suspend(PMSG_QUIESCE);
-	if (error)
-		goto Enable_irqs;
-
-	/* We'll ignore saved state, but this gets preempt count (etc) right */
-	save_processor_state();
-	error = restore_highmem();
-	if (!error) {
-		error = swsusp_arch_resume();
-		/*
-		 * The code below is only ever reached in case of a failure.
-		 * Otherwise execution continues at place where
-		 * swsusp_arch_suspend() was called
-		 */
-		BUG_ON(!error);
-		/* This call to restore_highmem() undos the previous one */
-		restore_highmem();
-	}
-	/*
-	 * The only reason why swsusp_arch_resume() can fail is memory being
-	 * very tight, so we have to free it as soon as we can to avoid
-	 * subsequent failures
-	 */
-	swsusp_free();
-	restore_processor_state();
-	touch_softlockup_watchdog();
-
-	sysdev_resume();
-
- Enable_irqs:
-	local_irq_enable();
-
- Enable_cpus:
-	enable_nonboot_cpus();
-
- Cleanup:
-	platform_restore_cleanup(platform_mode);
-
-	dpm_resume_noirq(PMSG_RECOVER);
-
-	return error;
-}
-
-/**
- *	hibernation_restore - quiesce devices and restore the hibernation
- *	snapshot image.  If successful, control returns in hibernation_snaphot()
- *	@platform_mode - if set, use the platform driver, if available, to
- *			 prepare the platform firmware for the transition.
- *
- *	Must be called with pm_mutex held
- */
-
-int hibernation_restore(int platform_mode)
-{
-	int error;
-
-	pm_prepare_console();
-	suspend_console();
-	error = dpm_suspend_start(PMSG_QUIESCE);
-	if (!error) {
-		error = resume_target_kernel(platform_mode);
-		dpm_resume_end(PMSG_RECOVER);
-	}
-	resume_console();
-	pm_restore_console();
-	return error;
-}
-
-/**
- *	hibernation_platform_enter - enter the hibernation state using the
- *	platform driver (if available)
- */
-
-int hibernation_platform_enter(void)
-{
-	int error;
-
-	if (!hibernation_ops)
-		return -ENOSYS;
-
-	/*
-	 * We have cancelled the power transition by running
-	 * hibernation_ops->finish() before saving the image, so we should let
-	 * the firmware know that we're going to enter the sleep state after all
-	 */
-	error = hibernation_ops->begin();
-	if (error)
-		goto Close;
-
-	entering_platform_hibernation = true;
-	suspend_console();
-	error = dpm_suspend_start(PMSG_HIBERNATE);
-	if (error) {
-		if (hibernation_ops->recover)
-			hibernation_ops->recover();
-		goto Resume_devices;
-	}
-
-	error = dpm_suspend_noirq(PMSG_HIBERNATE);
-	if (error)
-		goto Resume_devices;
-
-	error = hibernation_ops->prepare();
-	if (error)
-		goto Platofrm_finish;
-
-	error = disable_nonboot_cpus();
-	if (error)
-		goto Platofrm_finish;
-
-	local_irq_disable();
-	sysdev_suspend(PMSG_HIBERNATE);
-	hibernation_ops->enter();
-	/* We should never get here */
-	while (1);
-
-	/*
-	 * We don't need to reenable the nonboot CPUs or resume consoles, since
-	 * the system is going to be halted anyway.
-	 */
- Platofrm_finish:
-	hibernation_ops->finish();
-
-	dpm_suspend_noirq(PMSG_RESTORE);
-
- Resume_devices:
-	entering_platform_hibernation = false;
-	dpm_resume_end(PMSG_RESTORE);
-	resume_console();
-
- Close:
-	hibernation_ops->end();
-
-	return error;
-}
-
-/**
- *	power_down - Shut the machine down for hibernation.
- *
- *	Use the platform driver, if configured so; otherwise try
- *	to power off or reboot.
- */
-
-static void power_down(void)
-{
-	switch (hibernation_mode) {
-	case HIBERNATION_TEST:
-	case HIBERNATION_TESTPROC:
-		break;
-	case HIBERNATION_REBOOT:
-		kernel_restart(NULL);
-		break;
-	case HIBERNATION_PLATFORM:
-		hibernation_platform_enter();
-	case HIBERNATION_SHUTDOWN:
-		kernel_power_off();
-		break;
-	}
-	kernel_halt();
-	/*
-	 * Valid image is on the disk, if we continue we risk serious data
-	 * corruption after resume.
-	 */
-	printk(KERN_CRIT "PM: Please power down manually\n");
-	while(1);
-}
-
-static int prepare_processes(void)
-{
-	int error = 0;
-
-	if (freeze_processes()) {
-		error = -EBUSY;
-		thaw_processes();
-	}
-	return error;
-}
-
-/**
- *	hibernate - The granpappy of the built-in hibernation management
- */
-
-int hibernate(void)
-{
-	int error;
-
-	mutex_lock(&pm_mutex);
-	/* The snapshot device should not be opened while we're running */
-	if (!atomic_add_unless(&snapshot_device_available, -1, 0)) {
-		error = -EBUSY;
-		goto Unlock;
-	}
-
-	pm_prepare_console();
-	error = pm_notifier_call_chain(PM_HIBERNATION_PREPARE);
-	if (error)
-		goto Exit;
-
-	error = usermodehelper_disable();
-	if (error)
-		goto Exit;
-
-	/* Allocate memory management structures */
-	error = create_basic_memory_bitmaps();
-	if (error)
-		goto Exit;
-
-	printk(KERN_INFO "PM: Syncing filesystems ... ");
-	sys_sync();
-	printk("done.\n");
-
-	error = prepare_processes();
-	if (error)
-		goto Finish;
-
-	if (hibernation_test(TEST_FREEZER))
-		goto Thaw;
-
-	if (hibernation_testmode(HIBERNATION_TESTPROC))
-		goto Thaw;
-
-	error = hibernation_snapshot(hibernation_mode == HIBERNATION_PLATFORM);
-	if (in_suspend && !error) {
-		unsigned int flags = 0;
-
-		if (hibernation_mode == HIBERNATION_PLATFORM)
-			flags |= SF_PLATFORM_MODE;
-		pr_debug("PM: writing image.\n");
-		error = swsusp_write(flags);
-		swsusp_free();
-		if (!error)
-			power_down();
-	} else {
-		pr_debug("PM: Image restored successfully.\n");
-		swsusp_free();
-	}
- Thaw:
-	thaw_processes();
- Finish:
-	free_basic_memory_bitmaps();
-	usermodehelper_enable();
- Exit:
-	pm_notifier_call_chain(PM_POST_HIBERNATION);
-	pm_restore_console();
-	atomic_inc(&snapshot_device_available);
- Unlock:
-	mutex_unlock(&pm_mutex);
-	return error;
-}
-
-
-/**
- *	software_resume - Resume from a saved image.
- *
- *	Called as a late_initcall (so all devices are discovered and
- *	initialized), we call swsusp to see if we have a saved image or not.
- *	If so, we quiesce devices, the restore the saved image. We will
- *	return above (in hibernate() ) if everything goes well.
- *	Otherwise, we fail gracefully and return to the normally
- *	scheduled program.
- *
- */
-
-static int software_resume(void)
-{
-	int error;
-	unsigned int flags;
-
-	/*
-	 * If the user said "noresume".. bail out early.
-	 */
-	if (noresume)
-		return 0;
-
-	/*
-	 * name_to_dev_t() below takes a sysfs buffer mutex when sysfs
-	 * is configured into the kernel. Since the regular hibernate
-	 * trigger path is via sysfs which takes a buffer mutex before
-	 * calling hibernate functions (which take pm_mutex) this can
-	 * cause lockdep to complain about a possible ABBA deadlock
-	 * which cannot happen since we're in the boot code here and
-	 * sysfs can't be invoked yet. Therefore, we use a subclass
-	 * here to avoid lockdep complaining.
-	 */
-	mutex_lock_nested(&pm_mutex, SINGLE_DEPTH_NESTING);
-
-	if (swsusp_resume_device)
-		goto Check_image;
-
-	if (!strlen(resume_file)) {
-		error = -ENOENT;
-		goto Unlock;
-	}
-
-	pr_debug("PM: Checking image partition %s\n", resume_file);
-
-	/* Check if the device is there */
-	swsusp_resume_device = name_to_dev_t(resume_file);
-	if (!swsusp_resume_device) {
-		/*
-		 * Some device discovery might still be in progress; we need
-		 * to wait for this to finish.
-		 */
-		wait_for_device_probe();
-		/*
-		 * We can't depend on SCSI devices being available after loading
-		 * one of their modules until scsi_complete_async_scans() is
-		 * called and the resume device usually is a SCSI one.
-		 */
-		scsi_complete_async_scans();
-
-		swsusp_resume_device = name_to_dev_t(resume_file);
-		if (!swsusp_resume_device) {
-			error = -ENODEV;
-			goto Unlock;
-		}
-	}
-
- Check_image:
-	pr_debug("PM: Resume from partition %d:%d\n",
-		MAJOR(swsusp_resume_device), MINOR(swsusp_resume_device));
-
-	pr_debug("PM: Checking hibernation image.\n");
-	error = swsusp_check();
-	if (error)
-		goto Unlock;
-
-	/* The snapshot device should not be opened while we're running */
-	if (!atomic_add_unless(&snapshot_device_available, -1, 0)) {
-		error = -EBUSY;
-		goto Unlock;
-	}
-
-	pm_prepare_console();
-	error = pm_notifier_call_chain(PM_RESTORE_PREPARE);
-	if (error)
-		goto Finish;
-
-	error = usermodehelper_disable();
-	if (error)
-		goto Finish;
-
-	error = create_basic_memory_bitmaps();
-	if (error)
-		goto Finish;
-
-	pr_debug("PM: Preparing processes for restore.\n");
-	error = prepare_processes();
-	if (error) {
-		swsusp_close(FMODE_READ);
-		goto Done;
-	}
-
-	pr_debug("PM: Reading hibernation image.\n");
-
-	error = swsusp_read(&flags);
-	if (!error)
-		hibernation_restore(flags & SF_PLATFORM_MODE);
-
-	printk(KERN_ERR "PM: Restore failed, recovering.\n");
-	swsusp_free();
-	thaw_processes();
- Done:
-	free_basic_memory_bitmaps();
-	usermodehelper_enable();
- Finish:
-	pm_notifier_call_chain(PM_POST_RESTORE);
-	pm_restore_console();
-	atomic_inc(&snapshot_device_available);
-	/* For success case, the suspend path will release the lock */
- Unlock:
-	mutex_unlock(&pm_mutex);
-	pr_debug("PM: Resume from disk failed.\n");
-	return error;
-}
-
-late_initcall(software_resume);
-
-
-static const char * const hibernation_modes[] = {
-	[HIBERNATION_PLATFORM]	= "platform",
-	[HIBERNATION_SHUTDOWN]	= "shutdown",
-	[HIBERNATION_REBOOT]	= "reboot",
-	[HIBERNATION_TEST]	= "test",
-	[HIBERNATION_TESTPROC]	= "testproc",
-};
-
-/**
- *	disk - Control hibernation mode
- *
- *	Suspend-to-disk can be handled in several ways. We have a few options
- *	for putting the system to sleep - using the platform driver (e.g. ACPI
- *	or other hibernation_ops), powering off the system or rebooting the
- *	system (for testing) as well as the two test modes.
- *
- *	The system can support 'platform', and that is known a priori (and
- *	encoded by the presence of hibernation_ops). However, the user may
- *	choose 'shutdown' or 'reboot' as alternatives, as well as one fo the
- *	test modes, 'test' or 'testproc'.
- *
- *	show() will display what the mode is currently set to.
- *	store() will accept one of
- *
- *	'platform'
- *	'shutdown'
- *	'reboot'
- *	'test'
- *	'testproc'
- *
- *	It will only change to 'platform' if the system
- *	supports it (as determined by having hibernation_ops).
- */
-
-static ssize_t disk_show(struct kobject *kobj, struct kobj_attribute *attr,
-			 char *buf)
-{
-	int i;
-	char *start = buf;
-
-	for (i = HIBERNATION_FIRST; i <= HIBERNATION_MAX; i++) {
-		if (!hibernation_modes[i])
-			continue;
-		switch (i) {
-		case HIBERNATION_SHUTDOWN:
-		case HIBERNATION_REBOOT:
-		case HIBERNATION_TEST:
-		case HIBERNATION_TESTPROC:
-			break;
-		case HIBERNATION_PLATFORM:
-			if (hibernation_ops)
-				break;
-			/* not a valid mode, continue with loop */
-			continue;
-		}
-		if (i == hibernation_mode)
-			buf += sprintf(buf, "[%s] ", hibernation_modes[i]);
-		else
-			buf += sprintf(buf, "%s ", hibernation_modes[i]);
-	}
-	buf += sprintf(buf, "\n");
-	return buf-start;
-}
-
-
-static ssize_t disk_store(struct kobject *kobj, struct kobj_attribute *attr,
-			  const char *buf, size_t n)
-{
-	int error = 0;
-	int i;
-	int len;
-	char *p;
-	int mode = HIBERNATION_INVALID;
-
-	p = memchr(buf, '\n', n);
-	len = p ? p - buf : n;
-
-	mutex_lock(&pm_mutex);
-	for (i = HIBERNATION_FIRST; i <= HIBERNATION_MAX; i++) {
-		if (len == strlen(hibernation_modes[i])
-		    && !strncmp(buf, hibernation_modes[i], len)) {
-			mode = i;
-			break;
-		}
-	}
-	if (mode != HIBERNATION_INVALID) {
-		switch (mode) {
-		case HIBERNATION_SHUTDOWN:
-		case HIBERNATION_REBOOT:
-		case HIBERNATION_TEST:
-		case HIBERNATION_TESTPROC:
-			hibernation_mode = mode;
-			break;
-		case HIBERNATION_PLATFORM:
-			if (hibernation_ops)
-				hibernation_mode = mode;
-			else
-				error = -EINVAL;
-		}
-	} else
-		error = -EINVAL;
-
-	if (!error)
-		pr_debug("PM: Hibernation mode set to '%s'\n",
-			 hibernation_modes[mode]);
-	mutex_unlock(&pm_mutex);
-	return error ? error : n;
-}
-
-power_attr(disk);
-
-static ssize_t resume_show(struct kobject *kobj, struct kobj_attribute *attr,
-			   char *buf)
-{
-	return sprintf(buf,"%d:%d\n", MAJOR(swsusp_resume_device),
-		       MINOR(swsusp_resume_device));
-}
-
-static ssize_t resume_store(struct kobject *kobj, struct kobj_attribute *attr,
-			    const char *buf, size_t n)
-{
-	unsigned int maj, min;
-	dev_t res;
-	int ret = -EINVAL;
-
-	if (sscanf(buf, "%u:%u", &maj, &min) != 2)
-		goto out;
-
-	res = MKDEV(maj,min);
-	if (maj != MAJOR(res) || min != MINOR(res))
-		goto out;
-
-	mutex_lock(&pm_mutex);
-	swsusp_resume_device = res;
-	mutex_unlock(&pm_mutex);
-	printk(KERN_INFO "PM: Starting manual resume from disk\n");
-	noresume = 0;
-	software_resume();
-	ret = n;
- out:
-	return ret;
-}
-
-power_attr(resume);
-
-static ssize_t image_size_show(struct kobject *kobj, struct kobj_attribute *attr,
-			       char *buf)
-{
-	return sprintf(buf, "%lu\n", image_size);
-}
-
-static ssize_t image_size_store(struct kobject *kobj, struct kobj_attribute *attr,
-				const char *buf, size_t n)
-{
-	unsigned long size;
-
-	if (sscanf(buf, "%lu", &size) == 1) {
-		image_size = size;
-		return n;
-	}
-
-	return -EINVAL;
-}
-
-power_attr(image_size);
-
-static struct attribute * g[] = {
-	&disk_attr.attr,
-	&resume_attr.attr,
-	&image_size_attr.attr,
-	NULL,
-};
-
-
-static struct attribute_group attr_group = {
-	.attrs = g,
-};
-
-
-static int __init pm_disk_init(void)
-{
-	return sysfs_create_group(power_kobj, &attr_group);
-}
-
-core_initcall(pm_disk_init);
-
-
-static int __init resume_setup(char *str)
-{
-	if (noresume)
-		return 1;
-
-	strncpy( resume_file, str, 255 );
-	return 1;
-}
-
-static int __init resume_offset_setup(char *str)
-{
-	unsigned long long offset;
-
-	if (noresume)
-		return 1;
-
-	if (sscanf(str, "%llu", &offset) == 1)
-		swsusp_resume_block = offset;
-
-	return 1;
-}
-
-static int __init noresume_setup(char *str)
-{
-	noresume = 1;
-	return 1;
-}
-
-__setup("noresume", noresume_setup);
-__setup("resume_offset=", resume_offset_setup);
-__setup("resume=", resume_setup);
Index: linux-2.6/kernel/power/hibernate.c
===================================================================
--- /dev/null
+++ linux-2.6/kernel/power/hibernate.c
@@ -0,0 +1,955 @@
+/*
+ * kernel/power/hibernate.c - Hibernation (a.k.a. suspend-to-disk) support.
+ *
+ * Copyright (c) 2003 Patrick Mochel
+ * Copyright (c) 2003 Open Source Development Lab
+ * Copyright (c) 2004 Pavel Machek <pavel@suse.cz>
+ * Copyright (c) 2009 Rafael J. Wysocki, Novell Inc.
+ *
+ * This file is released under the GPLv2.
+ */
+
+#include <linux/suspend.h>
+#include <linux/syscalls.h>
+#include <linux/reboot.h>
+#include <linux/string.h>
+#include <linux/device.h>
+#include <linux/kmod.h>
+#include <linux/delay.h>
+#include <linux/fs.h>
+#include <linux/mount.h>
+#include <linux/pm.h>
+#include <linux/console.h>
+#include <linux/cpu.h>
+#include <linux/freezer.h>
+#include <scsi/scsi_scan.h>
+#include <asm/suspend.h>
+
+#include "power.h"
+
+
+static int noresume = 0;
+static char resume_file[256] = CONFIG_PM_STD_PARTITION;
+dev_t swsusp_resume_device;
+sector_t swsusp_resume_block;
+
+enum {
+	HIBERNATION_INVALID,
+	HIBERNATION_PLATFORM,
+	HIBERNATION_TEST,
+	HIBERNATION_TESTPROC,
+	HIBERNATION_SHUTDOWN,
+	HIBERNATION_REBOOT,
+	/* keep last */
+	__HIBERNATION_AFTER_LAST
+};
+#define HIBERNATION_MAX (__HIBERNATION_AFTER_LAST-1)
+#define HIBERNATION_FIRST (HIBERNATION_INVALID + 1)
+
+static int hibernation_mode = HIBERNATION_SHUTDOWN;
+
+static struct platform_hibernation_ops *hibernation_ops;
+
+/**
+ * hibernation_set_ops - set the global hibernate operations
+ * @ops: the hibernation operations to use in subsequent hibernation transitions
+ */
+
+void hibernation_set_ops(struct platform_hibernation_ops *ops)
+{
+	if (ops && !(ops->begin && ops->end &&  ops->pre_snapshot
+	    && ops->prepare && ops->finish && ops->enter && ops->pre_restore
+	    && ops->restore_cleanup)) {
+		WARN_ON(1);
+		return;
+	}
+	mutex_lock(&pm_mutex);
+	hibernation_ops = ops;
+	if (ops)
+		hibernation_mode = HIBERNATION_PLATFORM;
+	else if (hibernation_mode == HIBERNATION_PLATFORM)
+		hibernation_mode = HIBERNATION_SHUTDOWN;
+
+	mutex_unlock(&pm_mutex);
+}
+
+static bool entering_platform_hibernation;
+
+bool system_entering_hibernation(void)
+{
+	return entering_platform_hibernation;
+}
+EXPORT_SYMBOL(system_entering_hibernation);
+
+#ifdef CONFIG_PM_DEBUG
+static void hibernation_debug_sleep(void)
+{
+	printk(KERN_INFO "hibernation debug: Waiting for 5 seconds.\n");
+	mdelay(5000);
+}
+
+static int hibernation_testmode(int mode)
+{
+	if (hibernation_mode == mode) {
+		hibernation_debug_sleep();
+		return 1;
+	}
+	return 0;
+}
+
+static int hibernation_test(int level)
+{
+	if (pm_test_level == level) {
+		hibernation_debug_sleep();
+		return 1;
+	}
+	return 0;
+}
+#else /* !CONFIG_PM_DEBUG */
+static int hibernation_testmode(int mode) { return 0; }
+static int hibernation_test(int level) { return 0; }
+#endif /* !CONFIG_PM_DEBUG */
+
+/**
+ *	platform_begin - tell the platform driver that we're starting
+ *	hibernation
+ */
+
+static int platform_begin(int platform_mode)
+{
+	return (platform_mode && hibernation_ops) ?
+		hibernation_ops->begin() : 0;
+}
+
+/**
+ *	platform_end - tell the platform driver that we've entered the
+ *	working state
+ */
+
+static void platform_end(int platform_mode)
+{
+	if (platform_mode && hibernation_ops)
+		hibernation_ops->end();
+}
+
+/**
+ *	platform_pre_snapshot - prepare the machine for hibernation using the
+ *	platform driver if so configured and return an error code if it fails
+ */
+
+static int platform_pre_snapshot(int platform_mode)
+{
+	return (platform_mode && hibernation_ops) ?
+		hibernation_ops->pre_snapshot() : 0;
+}
+
+/**
+ *	platform_leave - prepare the machine for switching to the normal mode
+ *	of operation using the platform driver (called with interrupts disabled)
+ */
+
+static void platform_leave(int platform_mode)
+{
+	if (platform_mode && hibernation_ops)
+		hibernation_ops->leave();
+}
+
+/**
+ *	platform_finish - switch the machine to the normal mode of operation
+ *	using the platform driver (must be called after platform_prepare())
+ */
+
+static void platform_finish(int platform_mode)
+{
+	if (platform_mode && hibernation_ops)
+		hibernation_ops->finish();
+}
+
+/**
+ *	platform_pre_restore - prepare the platform for the restoration from a
+ *	hibernation image.  If the restore fails after this function has been
+ *	called, platform_restore_cleanup() must be called.
+ */
+
+static int platform_pre_restore(int platform_mode)
+{
+	return (platform_mode && hibernation_ops) ?
+		hibernation_ops->pre_restore() : 0;
+}
+
+/**
+ *	platform_restore_cleanup - switch the platform to the normal mode of
+ *	operation after a failing restore.  If platform_pre_restore() has been
+ *	called before the failing restore, this function must be called too,
+ *	regardless of the result of platform_pre_restore().
+ */
+
+static void platform_restore_cleanup(int platform_mode)
+{
+	if (platform_mode && hibernation_ops)
+		hibernation_ops->restore_cleanup();
+}
+
+/**
+ *	platform_recover - recover the platform from a failure to suspend
+ *	devices.
+ */
+
+static void platform_recover(int platform_mode)
+{
+	if (platform_mode && hibernation_ops && hibernation_ops->recover)
+		hibernation_ops->recover();
+}
+
+/**
+ *	create_image - freeze devices that need to be frozen with interrupts
+ *	off, create the hibernation image and thaw those devices.  Control
+ *	reappears in this routine after a restore.
+ */
+
+static int create_image(int platform_mode)
+{
+	int error;
+
+	error = arch_prepare_suspend();
+	if (error)
+		return error;
+
+	/* At this point, dpm_suspend_start() has been called, but *not*
+	 * dpm_suspend_noirq(). We *must* call dpm_suspend_noirq() now.
+	 * Otherwise, drivers for some devices (e.g. interrupt controllers)
+	 * become desynchronized with the actual state of the hardware
+	 * at resume time, and evil weirdness ensues.
+	 */
+	error = dpm_suspend_noirq(PMSG_FREEZE);
+	if (error) {
+		printk(KERN_ERR "PM: Some devices failed to power down, "
+			"aborting hibernation\n");
+		return error;
+	}
+
+	error = platform_pre_snapshot(platform_mode);
+	if (error || hibernation_test(TEST_PLATFORM))
+		goto Platform_finish;
+
+	error = disable_nonboot_cpus();
+	if (error || hibernation_test(TEST_CPUS)
+	    || hibernation_testmode(HIBERNATION_TEST))
+		goto Enable_cpus;
+
+	local_irq_disable();
+
+	error = sysdev_suspend(PMSG_FREEZE);
+	if (error) {
+		printk(KERN_ERR "PM: Some system devices failed to power down, "
+			"aborting hibernation\n");
+		goto Enable_irqs;
+	}
+
+	if (hibernation_test(TEST_CORE))
+		goto Power_up;
+
+	in_suspend = 1;
+	save_processor_state();
+	error = swsusp_arch_suspend();
+	if (error)
+		printk(KERN_ERR "PM: Error %d creating hibernation image\n",
+			error);
+	/* Restore control flow magically appears here */
+	restore_processor_state();
+	if (!in_suspend)
+		platform_leave(platform_mode);
+
+ Power_up:
+	sysdev_resume();
+	/* NOTE:  dpm_resume_noirq() is just a resume() for devices
+	 * that suspended with irqs off ... no overall powerup.
+	 */
+
+ Enable_irqs:
+	local_irq_enable();
+
+ Enable_cpus:
+	enable_nonboot_cpus();
+
+ Platform_finish:
+	platform_finish(platform_mode);
+
+	dpm_resume_noirq(in_suspend ?
+		(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);
+
+	return error;
+}
+
+/**
+ *	hibernation_snapshot - quiesce devices and create the hibernation
+ *	snapshot image.
+ *	@platform_mode - if set, use the platform driver, if available, to
+ *			 prepare the platform firmware for the power transition.
+ *
+ *	Must be called with pm_mutex held
+ */
+
+int hibernation_snapshot(int platform_mode)
+{
+	int error;
+
+	error = platform_begin(platform_mode);
+	if (error)
+		return error;
+
+	/* Free memory before shutting down devices. */
+	error = swsusp_shrink_memory();
+	if (error)
+		goto Close;
+
+	suspend_console();
+	error = dpm_suspend_start(PMSG_FREEZE);
+	if (error)
+		goto Recover_platform;
+
+	if (hibernation_test(TEST_DEVICES))
+		goto Recover_platform;
+
+	error = create_image(platform_mode);
+	/* Control returns here after successful restore */
+
+ Resume_devices:
+	dpm_resume_end(in_suspend ?
+		(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);
+	resume_console();
+ Close:
+	platform_end(platform_mode);
+	return error;
+
+ Recover_platform:
+	platform_recover(platform_mode);
+	goto Resume_devices;
+}
+
+/**
+ *	resume_target_kernel - prepare devices that need to be suspended with
+ *	interrupts off, restore the contents of highmem that have not been
+ *	restored yet from the image and run the low level code that will restore
+ *	the remaining contents of memory and switch to the just restored target
+ *	kernel.
+ */
+
+static int resume_target_kernel(bool platform_mode)
+{
+	int error;
+
+	error = dpm_suspend_noirq(PMSG_QUIESCE);
+	if (error) {
+		printk(KERN_ERR "PM: Some devices failed to power down, "
+			"aborting resume\n");
+		return error;
+	}
+
+	error = platform_pre_restore(platform_mode);
+	if (error)
+		goto Cleanup;
+
+	error = disable_nonboot_cpus();
+	if (error)
+		goto Enable_cpus;
+
+	local_irq_disable();
+
+	error = sysdev_suspend(PMSG_QUIESCE);
+	if (error)
+		goto Enable_irqs;
+
+	/* We'll ignore saved state, but this gets preempt count (etc) right */
+	save_processor_state();
+	error = restore_highmem();
+	if (!error) {
+		error = swsusp_arch_resume();
+		/*
+		 * The code below is only ever reached in case of a failure.
+		 * Otherwise execution continues at the place where
+		 * swsusp_arch_suspend() was called
+		 */
+		BUG_ON(!error);
+		/* This call to restore_highmem() undoes the previous one */
+		restore_highmem();
+	}
+	/*
+	 * The only reason why swsusp_arch_resume() can fail is memory being
+	 * very tight, so we have to free it as soon as we can to avoid
+	 * subsequent failures
+	 */
+	swsusp_free();
+	restore_processor_state();
+	touch_softlockup_watchdog();
+
+	sysdev_resume();
+
+ Enable_irqs:
+	local_irq_enable();
+
+ Enable_cpus:
+	enable_nonboot_cpus();
+
+ Cleanup:
+	platform_restore_cleanup(platform_mode);
+
+	dpm_resume_noirq(PMSG_RECOVER);
+
+	return error;
+}
+
+/**
+ *	hibernation_restore - quiesce devices and restore the hibernation
+ *	snapshot image.  If successful, control returns in hibernation_snapshot()
+ *	@platform_mode - if set, use the platform driver, if available, to
+ *			 prepare the platform firmware for the transition.
+ *
+ *	Must be called with pm_mutex held
+ */
+
+int hibernation_restore(int platform_mode)
+{
+	int error;
+
+	pm_prepare_console();
+	suspend_console();
+	error = dpm_suspend_start(PMSG_QUIESCE);
+	if (!error) {
+		error = resume_target_kernel(platform_mode);
+		dpm_resume_end(PMSG_RECOVER);
+	}
+	resume_console();
+	pm_restore_console();
+	return error;
+}
+
+/**
+ *	hibernation_platform_enter - enter the hibernation state using the
+ *	platform driver (if available)
+ */
+
+int hibernation_platform_enter(void)
+{
+	int error;
+
+	if (!hibernation_ops)
+		return -ENOSYS;
+
+	/*
+	 * We have cancelled the power transition by running
+	 * hibernation_ops->finish() before saving the image, so we should let
+	 * the firmware know that we're going to enter the sleep state after all
+	 */
+	error = hibernation_ops->begin();
+	if (error)
+		goto Close;
+
+	entering_platform_hibernation = true;
+	suspend_console();
+	error = dpm_suspend_start(PMSG_HIBERNATE);
+	if (error) {
+		if (hibernation_ops->recover)
+			hibernation_ops->recover();
+		goto Resume_devices;
+	}
+
+	error = dpm_suspend_noirq(PMSG_HIBERNATE);
+	if (error)
+		goto Resume_devices;
+
+	error = hibernation_ops->prepare();
+	if (error)
+		goto Platform_finish;
+
+	error = disable_nonboot_cpus();
+	if (error)
+		goto Platform_finish;
+
+	local_irq_disable();
+	sysdev_suspend(PMSG_HIBERNATE);
+	hibernation_ops->enter();
+	/* We should never get here */
+	while (1);
+
+	/*
+	 * We don't need to reenable the nonboot CPUs or resume consoles, since
+	 * the system is going to be halted anyway.
+	 */
+ Platform_finish:
+	hibernation_ops->finish();
+
+	dpm_suspend_noirq(PMSG_RESTORE);
+
+ Resume_devices:
+	entering_platform_hibernation = false;
+	dpm_resume_end(PMSG_RESTORE);
+	resume_console();
+
+ Close:
+	hibernation_ops->end();
+
+	return error;
+}
+
+/**
+ *	power_down - Shut the machine down for hibernation.
+ *
+ *	Use the platform driver, if configured so; otherwise try
+ *	to power off or reboot.
+ */
+
+static void power_down(void)
+{
+	switch (hibernation_mode) {
+	case HIBERNATION_TEST:
+	case HIBERNATION_TESTPROC:
+		break;
+	case HIBERNATION_REBOOT:
+		kernel_restart(NULL);
+		break;
+	case HIBERNATION_PLATFORM:
+		hibernation_platform_enter();
+	case HIBERNATION_SHUTDOWN:
+		kernel_power_off();
+		break;
+	}
+	kernel_halt();
+	/*
+	 * Valid image is on the disk, if we continue we risk serious data
+	 * corruption after resume.
+	 */
+	printk(KERN_CRIT "PM: Please power down manually\n");
+	while(1);
+}
+
+static int prepare_processes(void)
+{
+	int error = 0;
+
+	if (freeze_processes()) {
+		error = -EBUSY;
+		thaw_processes();
+	}
+	return error;
+}
+
+/**
+ *	hibernate - The granpappy of the built-in hibernation management
+ */
+
+int hibernate(void)
+{
+	int error;
+
+	mutex_lock(&pm_mutex);
+	/* The snapshot device should not be opened while we're running */
+	if (!atomic_add_unless(&snapshot_device_available, -1, 0)) {
+		error = -EBUSY;
+		goto Unlock;
+	}
+
+	pm_prepare_console();
+	error = pm_notifier_call_chain(PM_HIBERNATION_PREPARE);
+	if (error)
+		goto Exit;
+
+	error = usermodehelper_disable();
+	if (error)
+		goto Exit;
+
+	/* Allocate memory management structures */
+	error = create_basic_memory_bitmaps();
+	if (error)
+		goto Exit;
+
+	printk(KERN_INFO "PM: Syncing filesystems ... ");
+	sys_sync();
+	printk("done.\n");
+
+	error = prepare_processes();
+	if (error)
+		goto Finish;
+
+	if (hibernation_test(TEST_FREEZER))
+		goto Thaw;
+
+	if (hibernation_testmode(HIBERNATION_TESTPROC))
+		goto Thaw;
+
+	error = hibernation_snapshot(hibernation_mode == HIBERNATION_PLATFORM);
+	if (in_suspend && !error) {
+		unsigned int flags = 0;
+
+		if (hibernation_mode == HIBERNATION_PLATFORM)
+			flags |= SF_PLATFORM_MODE;
+		pr_debug("PM: writing image.\n");
+		error = swsusp_write(flags);
+		swsusp_free();
+		if (!error)
+			power_down();
+	} else {
+		pr_debug("PM: Image restored successfully.\n");
+		swsusp_free();
+	}
+ Thaw:
+	thaw_processes();
+ Finish:
+	free_basic_memory_bitmaps();
+	usermodehelper_enable();
+ Exit:
+	pm_notifier_call_chain(PM_POST_HIBERNATION);
+	pm_restore_console();
+	atomic_inc(&snapshot_device_available);
+ Unlock:
+	mutex_unlock(&pm_mutex);
+	return error;
+}
+
+
+/**
+ *	software_resume - Resume from a saved image.
+ *
+ *	Called as a late_initcall (so all devices are discovered and
+ *	initialized), we call swsusp to see if we have a saved image or not.
+ *	If so, we quiesce devices, the restore the saved image. We will
+ *	return above (in hibernate() ) if everything goes well.
+ *	Otherwise, we fail gracefully and return to the normally
+ *	scheduled program.
+ *
+ */
+
+static int software_resume(void)
+{
+	int error;
+	unsigned int flags;
+
+	/*
+	 * If the user said "noresume".. bail out early.
+	 */
+	if (noresume)
+		return 0;
+
+	/*
+	 * name_to_dev_t() below takes a sysfs buffer mutex when sysfs
+	 * is configured into the kernel. Since the regular hibernate
+	 * trigger path is via sysfs which takes a buffer mutex before
+	 * calling hibernate functions (which take pm_mutex) this can
+	 * cause lockdep to complain about a possible ABBA deadlock
+	 * which cannot happen since we're in the boot code here and
+	 * sysfs can't be invoked yet. Therefore, we use a subclass
+	 * here to avoid lockdep complaining.
+	 */
+	mutex_lock_nested(&pm_mutex, SINGLE_DEPTH_NESTING);
+
+	if (swsusp_resume_device)
+		goto Check_image;
+
+	if (!strlen(resume_file)) {
+		error = -ENOENT;
+		goto Unlock;
+	}
+
+	pr_debug("PM: Checking image partition %s\n", resume_file);
+
+	/* Check if the device is there */
+	swsusp_resume_device = name_to_dev_t(resume_file);
+	if (!swsusp_resume_device) {
+		/*
+		 * Some device discovery might still be in progress; we need
+		 * to wait for this to finish.
+		 */
+		wait_for_device_probe();
+		/*
+		 * We can't depend on SCSI devices being available after loading
+		 * one of their modules until scsi_complete_async_scans() is
+		 * called and the resume device usually is a SCSI one.
+		 */
+		scsi_complete_async_scans();
+
+		swsusp_resume_device = name_to_dev_t(resume_file);
+		if (!swsusp_resume_device) {
+			error = -ENODEV;
+			goto Unlock;
+		}
+	}
+
+ Check_image:
+	pr_debug("PM: Resume from partition %d:%d\n",
+		MAJOR(swsusp_resume_device), MINOR(swsusp_resume_device));
+
+	pr_debug("PM: Checking hibernation image.\n");
+	error = swsusp_check();
+	if (error)
+		goto Unlock;
+
+	/* The snapshot device should not be opened while we're running */
+	if (!atomic_add_unless(&snapshot_device_available, -1, 0)) {
+		error = -EBUSY;
+		goto Unlock;
+	}
+
+	pm_prepare_console();
+	error = pm_notifier_call_chain(PM_RESTORE_PREPARE);
+	if (error)
+		goto Finish;
+
+	error = usermodehelper_disable();
+	if (error)
+		goto Finish;
+
+	error = create_basic_memory_bitmaps();
+	if (error)
+		goto Finish;
+
+	pr_debug("PM: Preparing processes for restore.\n");
+	error = prepare_processes();
+	if (error) {
+		swsusp_close(FMODE_READ);
+		goto Done;
+	}
+
+	pr_debug("PM: Reading hibernation image.\n");
+
+	error = swsusp_read(&flags);
+	if (!error)
+		hibernation_restore(flags & SF_PLATFORM_MODE);
+
+	printk(KERN_ERR "PM: Restore failed, recovering.\n");
+	swsusp_free();
+	thaw_processes();
+ Done:
+	free_basic_memory_bitmaps();
+	usermodehelper_enable();
+ Finish:
+	pm_notifier_call_chain(PM_POST_RESTORE);
+	pm_restore_console();
+	atomic_inc(&snapshot_device_available);
+	/* For success case, the suspend path will release the lock */
+ Unlock:
+	mutex_unlock(&pm_mutex);
+	pr_debug("PM: Resume from disk failed.\n");
+	return error;
+}
+
+late_initcall(software_resume);
+
+
+static const char * const hibernation_modes[] = {
+	[HIBERNATION_PLATFORM]	= "platform",
+	[HIBERNATION_SHUTDOWN]	= "shutdown",
+	[HIBERNATION_REBOOT]	= "reboot",
+	[HIBERNATION_TEST]	= "test",
+	[HIBERNATION_TESTPROC]	= "testproc",
+};
+
+/**
+ *	disk - Control hibernation mode
+ *
+ *	Suspend-to-disk can be handled in several ways. We have a few options
+ *	for putting the system to sleep - using the platform driver (e.g. ACPI
+ *	or other hibernation_ops), powering off the system or rebooting the
+ *	system (for testing) as well as the two test modes.
+ *
+ *	The system can support 'platform', and that is known a priori (and
+ *	encoded by the presence of hibernation_ops). However, the user may
+ *	choose 'shutdown' or 'reboot' as alternatives, as well as one fo the
+ *	test modes, 'test' or 'testproc'.
+ *
+ *	show() will display what the mode is currently set to.
+ *	store() will accept one of
+ *
+ *	'platform'
+ *	'shutdown'
+ *	'reboot'
+ *	'test'
+ *	'testproc'
+ *
+ *	It will only change to 'platform' if the system
+ *	supports it (as determined by having hibernation_ops).
+ */
+
+static ssize_t disk_show(struct kobject *kobj, struct kobj_attribute *attr,
+			 char *buf)
+{
+	int i;
+	char *start = buf;
+
+	for (i = HIBERNATION_FIRST; i <= HIBERNATION_MAX; i++) {
+		if (!hibernation_modes[i])
+			continue;
+		switch (i) {
+		case HIBERNATION_SHUTDOWN:
+		case HIBERNATION_REBOOT:
+		case HIBERNATION_TEST:
+		case HIBERNATION_TESTPROC:
+			break;
+		case HIBERNATION_PLATFORM:
+			if (hibernation_ops)
+				break;
+			/* not a valid mode, continue with loop */
+			continue;
+		}
+		if (i == hibernation_mode)
+			buf += sprintf(buf, "[%s] ", hibernation_modes[i]);
+		else
+			buf += sprintf(buf, "%s ", hibernation_modes[i]);
+	}
+	buf += sprintf(buf, "\n");
+	return buf-start;
+}
+
+
+static ssize_t disk_store(struct kobject *kobj, struct kobj_attribute *attr,
+			  const char *buf, size_t n)
+{
+	int error = 0;
+	int i;
+	int len;
+	char *p;
+	int mode = HIBERNATION_INVALID;
+
+	p = memchr(buf, '\n', n);
+	len = p ? p - buf : n;
+
+	mutex_lock(&pm_mutex);
+	for (i = HIBERNATION_FIRST; i <= HIBERNATION_MAX; i++) {
+		if (len == strlen(hibernation_modes[i])
+		    && !strncmp(buf, hibernation_modes[i], len)) {
+			mode = i;
+			break;
+		}
+	}
+	if (mode != HIBERNATION_INVALID) {
+		switch (mode) {
+		case HIBERNATION_SHUTDOWN:
+		case HIBERNATION_REBOOT:
+		case HIBERNATION_TEST:
+		case HIBERNATION_TESTPROC:
+			hibernation_mode = mode;
+			break;
+		case HIBERNATION_PLATFORM:
+			if (hibernation_ops)
+				hibernation_mode = mode;
+			else
+				error = -EINVAL;
+		}
+	} else
+		error = -EINVAL;
+
+	if (!error)
+		pr_debug("PM: Hibernation mode set to '%s'\n",
+			 hibernation_modes[mode]);
+	mutex_unlock(&pm_mutex);
+	return error ? error : n;
+}
+
+power_attr(disk);
+
+static ssize_t resume_show(struct kobject *kobj, struct kobj_attribute *attr,
+			   char *buf)
+{
+	return sprintf(buf,"%d:%d\n", MAJOR(swsusp_resume_device),
+		       MINOR(swsusp_resume_device));
+}
+
+static ssize_t resume_store(struct kobject *kobj, struct kobj_attribute *attr,
+			    const char *buf, size_t n)
+{
+	unsigned int maj, min;
+	dev_t res;
+	int ret = -EINVAL;
+
+	if (sscanf(buf, "%u:%u", &maj, &min) != 2)
+		goto out;
+
+	res = MKDEV(maj,min);
+	if (maj != MAJOR(res) || min != MINOR(res))
+		goto out;
+
+	mutex_lock(&pm_mutex);
+	swsusp_resume_device = res;
+	mutex_unlock(&pm_mutex);
+	printk(KERN_INFO "PM: Starting manual resume from disk\n");
+	noresume = 0;
+	software_resume();
+	ret = n;
+ out:
+	return ret;
+}
+
+power_attr(resume);
+
+static ssize_t image_size_show(struct kobject *kobj, struct kobj_attribute *attr,
+			       char *buf)
+{
+	return sprintf(buf, "%lu\n", image_size);
+}
+
+static ssize_t image_size_store(struct kobject *kobj, struct kobj_attribute *attr,
+				const char *buf, size_t n)
+{
+	unsigned long size;
+
+	if (sscanf(buf, "%lu", &size) == 1) {
+		image_size = size;
+		return n;
+	}
+
+	return -EINVAL;
+}
+
+power_attr(image_size);
+
+static struct attribute * g[] = {
+	&disk_attr.attr,
+	&resume_attr.attr,
+	&image_size_attr.attr,
+	NULL,
+};
+
+
+static struct attribute_group attr_group = {
+	.attrs = g,
+};
+
+
+static int __init pm_disk_init(void)
+{
+	return sysfs_create_group(power_kobj, &attr_group);
+}
+
+core_initcall(pm_disk_init);
+
+
+static int __init resume_setup(char *str)
+{
+	if (noresume)
+		return 1;
+
+	strncpy( resume_file, str, 255 );
+	return 1;
+}
+
+static int __init resume_offset_setup(char *str)
+{
+	unsigned long long offset;
+
+	if (noresume)
+		return 1;
+
+	if (sscanf(str, "%llu", &offset) == 1)
+		swsusp_resume_block = offset;
+
+	return 1;
+}
+
+static int __init noresume_setup(char *str)
+{
+	noresume = 1;
+	return 1;
+}
+
+__setup("noresume", noresume_setup);
+__setup("resume_offset=", resume_offset_setup);
+__setup("resume=", resume_setup);
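
For readers following the sysfs side of the code above: disk_store() strips an optional trailing newline from the written buffer and then requires an exact, full-length match against the mode table. The fragment below is a minimal user-space sketch of that matching step only, not the kernel code. The sk_ names are hypothetical stand-ins for the HIBERNATION_* constants, and locking and the hibernation_ops check for 'platform' are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the HIBERNATION_* constants in the patch. */
enum sk_mode {
	SK_MODE_INVALID,
	SK_MODE_PLATFORM,
	SK_MODE_TEST,
	SK_MODE_TESTPROC,
	SK_MODE_SHUTDOWN,
	SK_MODE_REBOOT,
	SK_MODE_AFTER_LAST	/* keep last */
};

static const char *const sk_mode_names[] = {
	[SK_MODE_PLATFORM]	= "platform",
	[SK_MODE_SHUTDOWN]	= "shutdown",
	[SK_MODE_REBOOT]	= "reboot",
	[SK_MODE_TEST]		= "test",
	[SK_MODE_TESTPROC]	= "testproc",
};

/*
 * Map a buffer of n bytes, optionally ended by '\n', to a mode.
 * Returns SK_MODE_INVALID unless the text before the newline matches
 * one table entry exactly (same length, same bytes).
 */
static int sk_parse_mode(const char *buf, size_t n)
{
	const char *nl;
	size_t len;
	int i;

	assert(buf != NULL);

	nl = memchr(buf, '\n', n);
	len = nl ? (size_t)(nl - buf) : n;

	for (i = SK_MODE_INVALID + 1; i < SK_MODE_AFTER_LAST; i++) {
		const char *name = sk_mode_names[i];

		/* The length check keeps "test" from matching "testproc". */
		if (name && len == strlen(name) && !strncmp(buf, name, len))
			return i;
	}
	return SK_MODE_INVALID;
}
```

Under these assumptions, sk_parse_mode("reboot\n", 7) yields SK_MODE_REBOOT, while a prefix such as "tes" is rejected because the lengths differ.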

^ permalink raw reply	[flat|nested] 199+ messages in thread

* [RFC][PATCH 2/2] PM/Hibernate: Rename disk.c to hibernate.c
  2009-06-06 22:54 [RFC][PATCH 0/2] PM: Rearrange core suspend code Rafael J. Wysocki
  2009-06-06 22:55 ` [RFC][PATCH 1/2] PM: Separate suspend to RAM functionality from core Rafael J. Wysocki
  2009-06-06 22:55 ` Rafael J. Wysocki
@ 2009-06-06 22:56 ` Rafael J. Wysocki
  2009-06-06 22:56 ` Rafael J. Wysocki
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 199+ messages in thread
From: Rafael J. Wysocki @ 2009-06-06 22:56 UTC (permalink / raw)
  To: pm list; +Cc: ACPI Devel Maling List, LKML

From: Rafael J. Wysocki <rjw@sisk.pl>

Change the name of kernel/power/disk.c to kernel/power/hibernate.c
in analogy with the file names introduced by the changes that
separated the suspend to RAM and standby functionality from the
common PM functions.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 kernel/power/Makefile    |    2 
 kernel/power/disk.c      |  955 -----------------------------------------------
 kernel/power/hibernate.c |  955 +++++++++++++++++++++++++++++++++++++++++++++++
 kernel/power/power.h     |    4 
 4 files changed, 958 insertions(+), 958 deletions(-)
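
A note on one of the handlers that moves with the file: resume_store() parses "major:minor" and then checks that the packed device number decodes back to the same two values, which rejects numbers too large for the encoding. The fragment below is a rough user-space sketch of that round-trip check only. The sk_ macros are hypothetical and assume a 32-bit device number with 20 minor bits, which is meant to mirror the kernel-internal layout but is not taken from this patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed layout: 12 major bits above 20 minor bits in a 32-bit value. */
#define SK_MINORBITS	20
#define SK_MINORMASK	((1U << SK_MINORBITS) - 1)
#define SK_MKDEV(ma, mi) \
	((uint32_t)(((uint32_t)(ma) << SK_MINORBITS) | (uint32_t)(mi)))
#define SK_MAJOR(dev)	((unsigned int)((dev) >> SK_MINORBITS))
#define SK_MINOR(dev)	((unsigned int)((dev) & SK_MINORMASK))

/*
 * Parse "major:minor" from buf into *out.
 * Returns 0 on success, -1 if the text is malformed or either number
 * does not survive the pack/unpack round trip.
 */
static int sk_parse_dev(const char *buf, uint32_t *out)
{
	unsigned int maj, min;
	uint32_t res;

	assert(buf != NULL && out != NULL);

	if (sscanf(buf, "%u:%u", &maj, &min) != 2)
		return -1;

	res = SK_MKDEV(maj, min);
	/* An oversized major or minor changes when decoded again. */
	if (maj != SK_MAJOR(res) || min != SK_MINOR(res))
		return -1;

	*out = res;
	return 0;
}
```

With this layout, "8:3" is accepted, while "4096:0" and "8:1048576" are rejected because they do not fit in 12 and 20 bits respectively.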

Index: linux-2.6/kernel/power/power.h
===================================================================
--- linux-2.6.orig/kernel/power/power.h
+++ linux-2.6/kernel/power/power.h
@@ -45,7 +45,7 @@ static inline char *check_image_kernel(s
  */
 #define SPARE_PAGES	((1024 * 1024) >> PAGE_SHIFT)
 
-/* kernel/power/disk.c */
+/* kernel/power/hibernate.c */
 extern int hibernation_snapshot(int platform_mode);
 extern int hibernation_restore(int platform_mode);
 extern int hibernation_platform_enter(void);
@@ -147,7 +147,7 @@ extern int swsusp_swap_in_use(void);
  */
 #define SF_PLATFORM_MODE	1
 
-/* kernel/power/disk.c */
+/* kernel/power/hibernate.c */
 extern int swsusp_check(void);
 extern void swsusp_free(void);
 extern int swsusp_read(unsigned int *flags_p);
Index: linux-2.6/kernel/power/Makefile
===================================================================
--- linux-2.6.orig/kernel/power/Makefile
+++ linux-2.6/kernel/power/Makefile
@@ -8,6 +8,6 @@ obj-$(CONFIG_PM_SLEEP)		+= console.o
 obj-$(CONFIG_FREEZER)		+= process.o
 obj-$(CONFIG_SUSPEND)		+= suspend.o
 obj-$(CONFIG_PM_TEST_SUSPEND)	+= suspend-test.o
-obj-$(CONFIG_HIBERNATION)	+= swsusp.o disk.o snapshot.o swap.o user.o
+obj-$(CONFIG_HIBERNATION)	+= swsusp.o hibernate.o snapshot.o swap.o user.o
 
 obj-$(CONFIG_MAGIC_SYSRQ)	+= poweroff.o
Index: linux-2.6/kernel/power/disk.c
===================================================================
--- linux-2.6.orig/kernel/power/disk.c
+++ /dev/null
@@ -1,955 +0,0 @@
-/*
- * kernel/power/disk.c - Suspend-to-disk support.
- *
- * Copyright (c) 2003 Patrick Mochel
- * Copyright (c) 2003 Open Source Development Lab
- * Copyright (c) 2004 Pavel Machek <pavel@suse.cz>
- *
- * This file is released under the GPLv2.
- *
- */
-
-#include <linux/suspend.h>
-#include <linux/syscalls.h>
-#include <linux/reboot.h>
-#include <linux/string.h>
-#include <linux/device.h>
-#include <linux/kmod.h>
-#include <linux/delay.h>
-#include <linux/fs.h>
-#include <linux/mount.h>
-#include <linux/pm.h>
-#include <linux/console.h>
-#include <linux/cpu.h>
-#include <linux/freezer.h>
-#include <scsi/scsi_scan.h>
-#include <asm/suspend.h>
-
-#include "power.h"
-
-
-static int noresume = 0;
-static char resume_file[256] = CONFIG_PM_STD_PARTITION;
-dev_t swsusp_resume_device;
-sector_t swsusp_resume_block;
-
-enum {
-	HIBERNATION_INVALID,
-	HIBERNATION_PLATFORM,
-	HIBERNATION_TEST,
-	HIBERNATION_TESTPROC,
-	HIBERNATION_SHUTDOWN,
-	HIBERNATION_REBOOT,
-	/* keep last */
-	__HIBERNATION_AFTER_LAST
-};
-#define HIBERNATION_MAX (__HIBERNATION_AFTER_LAST-1)
-#define HIBERNATION_FIRST (HIBERNATION_INVALID + 1)
-
-static int hibernation_mode = HIBERNATION_SHUTDOWN;
-
-static struct platform_hibernation_ops *hibernation_ops;
-
-/**
- * hibernation_set_ops - set the global hibernate operations
- * @ops: the hibernation operations to use in subsequent hibernation transitions
- */
-
-void hibernation_set_ops(struct platform_hibernation_ops *ops)
-{
-	if (ops && !(ops->begin && ops->end &&  ops->pre_snapshot
-	    && ops->prepare && ops->finish && ops->enter && ops->pre_restore
-	    && ops->restore_cleanup)) {
-		WARN_ON(1);
-		return;
-	}
-	mutex_lock(&pm_mutex);
-	hibernation_ops = ops;
-	if (ops)
-		hibernation_mode = HIBERNATION_PLATFORM;
-	else if (hibernation_mode == HIBERNATION_PLATFORM)
-		hibernation_mode = HIBERNATION_SHUTDOWN;
-
-	mutex_unlock(&pm_mutex);
-}
-
-static bool entering_platform_hibernation;
-
-bool system_entering_hibernation(void)
-{
-	return entering_platform_hibernation;
-}
-EXPORT_SYMBOL(system_entering_hibernation);
-
-#ifdef CONFIG_PM_DEBUG
-static void hibernation_debug_sleep(void)
-{
-	printk(KERN_INFO "hibernation debug: Waiting for 5 seconds.\n");
-	mdelay(5000);
-}
-
-static int hibernation_testmode(int mode)
-{
-	if (hibernation_mode == mode) {
-		hibernation_debug_sleep();
-		return 1;
-	}
-	return 0;
-}
-
-static int hibernation_test(int level)
-{
-	if (pm_test_level == level) {
-		hibernation_debug_sleep();
-		return 1;
-	}
-	return 0;
-}
-#else /* !CONFIG_PM_DEBUG */
-static int hibernation_testmode(int mode) { return 0; }
-static int hibernation_test(int level) { return 0; }
-#endif /* !CONFIG_PM_DEBUG */
-
-/**
- *	platform_begin - tell the platform driver that we're starting
- *	hibernation
- */
-
-static int platform_begin(int platform_mode)
-{
-	return (platform_mode && hibernation_ops) ?
-		hibernation_ops->begin() : 0;
-}
-
-/**
- *	platform_end - tell the platform driver that we've entered the
- *	working state
- */
-
-static void platform_end(int platform_mode)
-{
-	if (platform_mode && hibernation_ops)
-		hibernation_ops->end();
-}
-
-/**
- *	platform_pre_snapshot - prepare the machine for hibernation using the
- *	platform driver if so configured and return an error code if it fails
- */
-
-static int platform_pre_snapshot(int platform_mode)
-{
-	return (platform_mode && hibernation_ops) ?
-		hibernation_ops->pre_snapshot() : 0;
-}
-
-/**
- *	platform_leave - prepare the machine for switching to the normal mode
- *	of operation using the platform driver (called with interrupts disabled)
- */
-
-static void platform_leave(int platform_mode)
-{
-	if (platform_mode && hibernation_ops)
-		hibernation_ops->leave();
-}
-
-/**
- *	platform_finish - switch the machine to the normal mode of operation
- *	using the platform driver (must be called after platform_prepare())
- */
-
-static void platform_finish(int platform_mode)
-{
-	if (platform_mode && hibernation_ops)
-		hibernation_ops->finish();
-}
-
-/**
- *	platform_pre_restore - prepare the platform for the restoration from a
- *	hibernation image.  If the restore fails after this function has been
- *	called, platform_restore_cleanup() must be called.
- */
-
-static int platform_pre_restore(int platform_mode)
-{
-	return (platform_mode && hibernation_ops) ?
-		hibernation_ops->pre_restore() : 0;
-}
-
-/**
- *	platform_restore_cleanup - switch the platform to the normal mode of
- *	operation after a failing restore.  If platform_pre_restore() has been
- *	called before the failing restore, this function must be called too,
- *	regardless of the result of platform_pre_restore().
- */
-
-static void platform_restore_cleanup(int platform_mode)
-{
-	if (platform_mode && hibernation_ops)
-		hibernation_ops->restore_cleanup();
-}
-
-/**
- *	platform_recover - recover the platform from a failure to suspend
- *	devices.
- */
-
-static void platform_recover(int platform_mode)
-{
-	if (platform_mode && hibernation_ops && hibernation_ops->recover)
-		hibernation_ops->recover();
-}
-
-/**
- *	create_image - freeze devices that need to be frozen with interrupts
- *	off, create the hibernation image and thaw those devices.  Control
- *	reappears in this routine after a restore.
- */
-
-static int create_image(int platform_mode)
-{
-	int error;
-
-	error = arch_prepare_suspend();
-	if (error)
-		return error;
-
-	/* At this point, dpm_suspend_start() has been called, but *not*
-	 * dpm_suspend_noirq(). We *must* call dpm_suspend_noirq() now.
-	 * Otherwise, drivers for some devices (e.g. interrupt controllers)
-	 * become desynchronized with the actual state of the hardware
-	 * at resume time, and evil weirdness ensues.
-	 */
-	error = dpm_suspend_noirq(PMSG_FREEZE);
-	if (error) {
-		printk(KERN_ERR "PM: Some devices failed to power down, "
-			"aborting hibernation\n");
-		return error;
-	}
-
-	error = platform_pre_snapshot(platform_mode);
-	if (error || hibernation_test(TEST_PLATFORM))
-		goto Platform_finish;
-
-	error = disable_nonboot_cpus();
-	if (error || hibernation_test(TEST_CPUS)
-	    || hibernation_testmode(HIBERNATION_TEST))
-		goto Enable_cpus;
-
-	local_irq_disable();
-
-	error = sysdev_suspend(PMSG_FREEZE);
-	if (error) {
-		printk(KERN_ERR "PM: Some system devices failed to power down, "
-			"aborting hibernation\n");
-		goto Enable_irqs;
-	}
-
-	if (hibernation_test(TEST_CORE))
-		goto Power_up;
-
-	in_suspend = 1;
-	save_processor_state();
-	error = swsusp_arch_suspend();
-	if (error)
-		printk(KERN_ERR "PM: Error %d creating hibernation image\n",
-			error);
-	/* Restore control flow magically appears here */
-	restore_processor_state();
-	if (!in_suspend)
-		platform_leave(platform_mode);
-
- Power_up:
-	sysdev_resume();
-	/* NOTE:  dpm_resume_noirq() is just a resume() for devices
-	 * that suspended with irqs off ... no overall powerup.
-	 */
-
- Enable_irqs:
-	local_irq_enable();
-
- Enable_cpus:
-	enable_nonboot_cpus();
-
- Platform_finish:
-	platform_finish(platform_mode);
-
-	dpm_resume_noirq(in_suspend ?
-		(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);
-
-	return error;
-}
-
-/**
- *	hibernation_snapshot - quiesce devices and create the hibernation
- *	snapshot image.
- *	@platform_mode - if set, use the platform driver, if available, to
- *			 prepare the platform firmware for the power transition.
- *
- *	Must be called with pm_mutex held
- */
-
-int hibernation_snapshot(int platform_mode)
-{
-	int error;
-
-	error = platform_begin(platform_mode);
-	if (error)
-		return error;
-
-	/* Free memory before shutting down devices. */
-	error = swsusp_shrink_memory();
-	if (error)
-		goto Close;
-
-	suspend_console();
-	error = dpm_suspend_start(PMSG_FREEZE);
-	if (error)
-		goto Recover_platform;
-
-	if (hibernation_test(TEST_DEVICES))
-		goto Recover_platform;
-
-	error = create_image(platform_mode);
-	/* Control returns here after successful restore */
-
- Resume_devices:
-	dpm_resume_end(in_suspend ?
-		(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);
-	resume_console();
- Close:
-	platform_end(platform_mode);
-	return error;
-
- Recover_platform:
-	platform_recover(platform_mode);
-	goto Resume_devices;
-}
-
-/**
- *	resume_target_kernel - prepare devices that need to be suspended with
- *	interrupts off, restore the contents of highmem that have not been
- *	restored yet from the image and run the low level code that will restore
- *	the remaining contents of memory and switch to the just restored target
- *	kernel.
- */
-
-static int resume_target_kernel(bool platform_mode)
-{
-	int error;
-
-	error = dpm_suspend_noirq(PMSG_QUIESCE);
-	if (error) {
-		printk(KERN_ERR "PM: Some devices failed to power down, "
-			"aborting resume\n");
-		return error;
-	}
-
-	error = platform_pre_restore(platform_mode);
-	if (error)
-		goto Cleanup;
-
-	error = disable_nonboot_cpus();
-	if (error)
-		goto Enable_cpus;
-
-	local_irq_disable();
-
-	error = sysdev_suspend(PMSG_QUIESCE);
-	if (error)
-		goto Enable_irqs;
-
-	/* We'll ignore saved state, but this gets preempt count (etc) right */
-	save_processor_state();
-	error = restore_highmem();
-	if (!error) {
-		error = swsusp_arch_resume();
-		/*
-		 * The code below is only ever reached in case of a failure.
-		 * Otherwise execution continues at place where
-		 * swsusp_arch_suspend() was called
-		 */
-		BUG_ON(!error);
-		/* This call to restore_highmem() undos the previous one */
-		restore_highmem();
-	}
-	/*
-	 * The only reason why swsusp_arch_resume() can fail is memory being
-	 * very tight, so we have to free it as soon as we can to avoid
-	 * subsequent failures
-	 */
-	swsusp_free();
-	restore_processor_state();
-	touch_softlockup_watchdog();
-
-	sysdev_resume();
-
- Enable_irqs:
-	local_irq_enable();
-
- Enable_cpus:
-	enable_nonboot_cpus();
-
- Cleanup:
-	platform_restore_cleanup(platform_mode);
-
-	dpm_resume_noirq(PMSG_RECOVER);
-
-	return error;
-}
-
-/**
- *	hibernation_restore - quiesce devices and restore the hibernation
- *	snapshot image.  If successful, control returns in hibernation_snaphot()
- *	@platform_mode - if set, use the platform driver, if available, to
- *			 prepare the platform firmware for the transition.
- *
- *	Must be called with pm_mutex held
- */
-
-int hibernation_restore(int platform_mode)
-{
-	int error;
-
-	pm_prepare_console();
-	suspend_console();
-	error = dpm_suspend_start(PMSG_QUIESCE);
-	if (!error) {
-		error = resume_target_kernel(platform_mode);
-		dpm_resume_end(PMSG_RECOVER);
-	}
-	resume_console();
-	pm_restore_console();
-	return error;
-}
-
-/**
- *	hibernation_platform_enter - enter the hibernation state using the
- *	platform driver (if available)
- */
-
-int hibernation_platform_enter(void)
-{
-	int error;
-
-	if (!hibernation_ops)
-		return -ENOSYS;
-
-	/*
-	 * We have cancelled the power transition by running
-	 * hibernation_ops->finish() before saving the image, so we should let
-	 * the firmware know that we're going to enter the sleep state after all
-	 */
-	error = hibernation_ops->begin();
-	if (error)
-		goto Close;
-
-	entering_platform_hibernation = true;
-	suspend_console();
-	error = dpm_suspend_start(PMSG_HIBERNATE);
-	if (error) {
-		if (hibernation_ops->recover)
-			hibernation_ops->recover();
-		goto Resume_devices;
-	}
-
-	error = dpm_suspend_noirq(PMSG_HIBERNATE);
-	if (error)
-		goto Resume_devices;
-
-	error = hibernation_ops->prepare();
-	if (error)
-		goto Platofrm_finish;
-
-	error = disable_nonboot_cpus();
-	if (error)
-		goto Platofrm_finish;
-
-	local_irq_disable();
-	sysdev_suspend(PMSG_HIBERNATE);
-	hibernation_ops->enter();
-	/* We should never get here */
-	while (1);
-
-	/*
-	 * We don't need to reenable the nonboot CPUs or resume consoles, since
-	 * the system is going to be halted anyway.
-	 */
- Platofrm_finish:
-	hibernation_ops->finish();
-
-	dpm_suspend_noirq(PMSG_RESTORE);
-
- Resume_devices:
-	entering_platform_hibernation = false;
-	dpm_resume_end(PMSG_RESTORE);
-	resume_console();
-
- Close:
-	hibernation_ops->end();
-
-	return error;
-}
-
-/**
- *	power_down - Shut the machine down for hibernation.
- *
- *	Use the platform driver, if configured so; otherwise try
- *	to power off or reboot.
- */
-
-static void power_down(void)
-{
-	switch (hibernation_mode) {
-	case HIBERNATION_TEST:
-	case HIBERNATION_TESTPROC:
-		break;
-	case HIBERNATION_REBOOT:
-		kernel_restart(NULL);
-		break;
-	case HIBERNATION_PLATFORM:
-		hibernation_platform_enter();
-	case HIBERNATION_SHUTDOWN:
-		kernel_power_off();
-		break;
-	}
-	kernel_halt();
-	/*
-	 * Valid image is on the disk, if we continue we risk serious data
-	 * corruption after resume.
-	 */
-	printk(KERN_CRIT "PM: Please power down manually\n");
-	while(1);
-}
-
-static int prepare_processes(void)
-{
-	int error = 0;
-
-	if (freeze_processes()) {
-		error = -EBUSY;
-		thaw_processes();
-	}
-	return error;
-}
-
-/**
- *	hibernate - The granpappy of the built-in hibernation management
- */
-
-int hibernate(void)
-{
-	int error;
-
-	mutex_lock(&pm_mutex);
-	/* The snapshot device should not be opened while we're running */
-	if (!atomic_add_unless(&snapshot_device_available, -1, 0)) {
-		error = -EBUSY;
-		goto Unlock;
-	}
-
-	pm_prepare_console();
-	error = pm_notifier_call_chain(PM_HIBERNATION_PREPARE);
-	if (error)
-		goto Exit;
-
-	error = usermodehelper_disable();
-	if (error)
-		goto Exit;
-
-	/* Allocate memory management structures */
-	error = create_basic_memory_bitmaps();
-	if (error)
-		goto Exit;
-
-	printk(KERN_INFO "PM: Syncing filesystems ... ");
-	sys_sync();
-	printk("done.\n");
-
-	error = prepare_processes();
-	if (error)
-		goto Finish;
-
-	if (hibernation_test(TEST_FREEZER))
-		goto Thaw;
-
-	if (hibernation_testmode(HIBERNATION_TESTPROC))
-		goto Thaw;
-
-	error = hibernation_snapshot(hibernation_mode == HIBERNATION_PLATFORM);
-	if (in_suspend && !error) {
-		unsigned int flags = 0;
-
-		if (hibernation_mode == HIBERNATION_PLATFORM)
-			flags |= SF_PLATFORM_MODE;
-		pr_debug("PM: writing image.\n");
-		error = swsusp_write(flags);
-		swsusp_free();
-		if (!error)
-			power_down();
-	} else {
-		pr_debug("PM: Image restored successfully.\n");
-		swsusp_free();
-	}
- Thaw:
-	thaw_processes();
- Finish:
-	free_basic_memory_bitmaps();
-	usermodehelper_enable();
- Exit:
-	pm_notifier_call_chain(PM_POST_HIBERNATION);
-	pm_restore_console();
-	atomic_inc(&snapshot_device_available);
- Unlock:
-	mutex_unlock(&pm_mutex);
-	return error;
-}
-
-
-/**
- *	software_resume - Resume from a saved image.
- *
- *	Called as a late_initcall (so all devices are discovered and
- *	initialized), we call swsusp to see if we have a saved image or not.
- *	If so, we quiesce devices, the restore the saved image. We will
- *	return above (in hibernate() ) if everything goes well.
- *	Otherwise, we fail gracefully and return to the normally
- *	scheduled program.
- *
- */
-
-static int software_resume(void)
-{
-	int error;
-	unsigned int flags;
-
-	/*
-	 * If the user said "noresume".. bail out early.
-	 */
-	if (noresume)
-		return 0;
-
-	/*
-	 * name_to_dev_t() below takes a sysfs buffer mutex when sysfs
-	 * is configured into the kernel. Since the regular hibernate
-	 * trigger path is via sysfs which takes a buffer mutex before
-	 * calling hibernate functions (which take pm_mutex) this can
-	 * cause lockdep to complain about a possible ABBA deadlock
-	 * which cannot happen since we're in the boot code here and
-	 * sysfs can't be invoked yet. Therefore, we use a subclass
-	 * here to avoid lockdep complaining.
-	 */
-	mutex_lock_nested(&pm_mutex, SINGLE_DEPTH_NESTING);
-
-	if (swsusp_resume_device)
-		goto Check_image;
-
-	if (!strlen(resume_file)) {
-		error = -ENOENT;
-		goto Unlock;
-	}
-
-	pr_debug("PM: Checking image partition %s\n", resume_file);
-
-	/* Check if the device is there */
-	swsusp_resume_device = name_to_dev_t(resume_file);
-	if (!swsusp_resume_device) {
-		/*
-		 * Some device discovery might still be in progress; we need
-		 * to wait for this to finish.
-		 */
-		wait_for_device_probe();
-		/*
-		 * We can't depend on SCSI devices being available after loading
-		 * one of their modules until scsi_complete_async_scans() is
-		 * called and the resume device usually is a SCSI one.
-		 */
-		scsi_complete_async_scans();
-
-		swsusp_resume_device = name_to_dev_t(resume_file);
-		if (!swsusp_resume_device) {
-			error = -ENODEV;
-			goto Unlock;
-		}
-	}
-
- Check_image:
-	pr_debug("PM: Resume from partition %d:%d\n",
-		MAJOR(swsusp_resume_device), MINOR(swsusp_resume_device));
-
-	pr_debug("PM: Checking hibernation image.\n");
-	error = swsusp_check();
-	if (error)
-		goto Unlock;
-
-	/* The snapshot device should not be opened while we're running */
-	if (!atomic_add_unless(&snapshot_device_available, -1, 0)) {
-		error = -EBUSY;
-		goto Unlock;
-	}
-
-	pm_prepare_console();
-	error = pm_notifier_call_chain(PM_RESTORE_PREPARE);
-	if (error)
-		goto Finish;
-
-	error = usermodehelper_disable();
-	if (error)
-		goto Finish;
-
-	error = create_basic_memory_bitmaps();
-	if (error)
-		goto Finish;
-
-	pr_debug("PM: Preparing processes for restore.\n");
-	error = prepare_processes();
-	if (error) {
-		swsusp_close(FMODE_READ);
-		goto Done;
-	}
-
-	pr_debug("PM: Reading hibernation image.\n");
-
-	error = swsusp_read(&flags);
-	if (!error)
-		hibernation_restore(flags & SF_PLATFORM_MODE);
-
-	printk(KERN_ERR "PM: Restore failed, recovering.\n");
-	swsusp_free();
-	thaw_processes();
- Done:
-	free_basic_memory_bitmaps();
-	usermodehelper_enable();
- Finish:
-	pm_notifier_call_chain(PM_POST_RESTORE);
-	pm_restore_console();
-	atomic_inc(&snapshot_device_available);
-	/* For success case, the suspend path will release the lock */
- Unlock:
-	mutex_unlock(&pm_mutex);
-	pr_debug("PM: Resume from disk failed.\n");
-	return error;
-}
-
-late_initcall(software_resume);
-
-
-static const char * const hibernation_modes[] = {
-	[HIBERNATION_PLATFORM]	= "platform",
-	[HIBERNATION_SHUTDOWN]	= "shutdown",
-	[HIBERNATION_REBOOT]	= "reboot",
-	[HIBERNATION_TEST]	= "test",
-	[HIBERNATION_TESTPROC]	= "testproc",
-};
-
-/**
- *	disk - Control hibernation mode
- *
- *	Suspend-to-disk can be handled in several ways. We have a few options
- *	for putting the system to sleep - using the platform driver (e.g. ACPI
- *	or other hibernation_ops), powering off the system or rebooting the
- *	system (for testing) as well as the two test modes.
- *
- *	The system can support 'platform', and that is known a priori (and
- *	encoded by the presence of hibernation_ops). However, the user may
- *	choose 'shutdown' or 'reboot' as alternatives, as well as one fo the
- *	test modes, 'test' or 'testproc'.
- *
- *	show() will display what the mode is currently set to.
- *	store() will accept one of
- *
- *	'platform'
- *	'shutdown'
- *	'reboot'
- *	'test'
- *	'testproc'
- *
- *	It will only change to 'platform' if the system
- *	supports it (as determined by having hibernation_ops).
- */
-
-static ssize_t disk_show(struct kobject *kobj, struct kobj_attribute *attr,
-			 char *buf)
-{
-	int i;
-	char *start = buf;
-
-	for (i = HIBERNATION_FIRST; i <= HIBERNATION_MAX; i++) {
-		if (!hibernation_modes[i])
-			continue;
-		switch (i) {
-		case HIBERNATION_SHUTDOWN:
-		case HIBERNATION_REBOOT:
-		case HIBERNATION_TEST:
-		case HIBERNATION_TESTPROC:
-			break;
-		case HIBERNATION_PLATFORM:
-			if (hibernation_ops)
-				break;
-			/* not a valid mode, continue with loop */
-			continue;
-		}
-		if (i == hibernation_mode)
-			buf += sprintf(buf, "[%s] ", hibernation_modes[i]);
-		else
-			buf += sprintf(buf, "%s ", hibernation_modes[i]);
-	}
-	buf += sprintf(buf, "\n");
-	return buf-start;
-}
-
-
-static ssize_t disk_store(struct kobject *kobj, struct kobj_attribute *attr,
-			  const char *buf, size_t n)
-{
-	int error = 0;
-	int i;
-	int len;
-	char *p;
-	int mode = HIBERNATION_INVALID;
-
-	p = memchr(buf, '\n', n);
-	len = p ? p - buf : n;
-
-	mutex_lock(&pm_mutex);
-	for (i = HIBERNATION_FIRST; i <= HIBERNATION_MAX; i++) {
-		if (len == strlen(hibernation_modes[i])
-		    && !strncmp(buf, hibernation_modes[i], len)) {
-			mode = i;
-			break;
-		}
-	}
-	if (mode != HIBERNATION_INVALID) {
-		switch (mode) {
-		case HIBERNATION_SHUTDOWN:
-		case HIBERNATION_REBOOT:
-		case HIBERNATION_TEST:
-		case HIBERNATION_TESTPROC:
-			hibernation_mode = mode;
-			break;
-		case HIBERNATION_PLATFORM:
-			if (hibernation_ops)
-				hibernation_mode = mode;
-			else
-				error = -EINVAL;
-		}
-	} else
-		error = -EINVAL;
-
-	if (!error)
-		pr_debug("PM: Hibernation mode set to '%s'\n",
-			 hibernation_modes[mode]);
-	mutex_unlock(&pm_mutex);
-	return error ? error : n;
-}
-
-power_attr(disk);
-
-static ssize_t resume_show(struct kobject *kobj, struct kobj_attribute *attr,
-			   char *buf)
-{
-	return sprintf(buf,"%d:%d\n", MAJOR(swsusp_resume_device),
-		       MINOR(swsusp_resume_device));
-}
-
-static ssize_t resume_store(struct kobject *kobj, struct kobj_attribute *attr,
-			    const char *buf, size_t n)
-{
-	unsigned int maj, min;
-	dev_t res;
-	int ret = -EINVAL;
-
-	if (sscanf(buf, "%u:%u", &maj, &min) != 2)
-		goto out;
-
-	res = MKDEV(maj,min);
-	if (maj != MAJOR(res) || min != MINOR(res))
-		goto out;
-
-	mutex_lock(&pm_mutex);
-	swsusp_resume_device = res;
-	mutex_unlock(&pm_mutex);
-	printk(KERN_INFO "PM: Starting manual resume from disk\n");
-	noresume = 0;
-	software_resume();
-	ret = n;
- out:
-	return ret;
-}
-
-power_attr(resume);
-
-static ssize_t image_size_show(struct kobject *kobj, struct kobj_attribute *attr,
-			       char *buf)
-{
-	return sprintf(buf, "%lu\n", image_size);
-}
-
-static ssize_t image_size_store(struct kobject *kobj, struct kobj_attribute *attr,
-				const char *buf, size_t n)
-{
-	unsigned long size;
-
-	if (sscanf(buf, "%lu", &size) == 1) {
-		image_size = size;
-		return n;
-	}
-
-	return -EINVAL;
-}
-
-power_attr(image_size);
-
-static struct attribute * g[] = {
-	&disk_attr.attr,
-	&resume_attr.attr,
-	&image_size_attr.attr,
-	NULL,
-};
-
-
-static struct attribute_group attr_group = {
-	.attrs = g,
-};
-
-
-static int __init pm_disk_init(void)
-{
-	return sysfs_create_group(power_kobj, &attr_group);
-}
-
-core_initcall(pm_disk_init);
-
-
-static int __init resume_setup(char *str)
-{
-	if (noresume)
-		return 1;
-
-	strncpy( resume_file, str, 255 );
-	return 1;
-}
-
-static int __init resume_offset_setup(char *str)
-{
-	unsigned long long offset;
-
-	if (noresume)
-		return 1;
-
-	if (sscanf(str, "%llu", &offset) == 1)
-		swsusp_resume_block = offset;
-
-	return 1;
-}
-
-static int __init noresume_setup(char *str)
-{
-	noresume = 1;
-	return 1;
-}
-
-__setup("noresume", noresume_setup);
-__setup("resume_offset=", resume_offset_setup);
-__setup("resume=", resume_setup);
Index: linux-2.6/kernel/power/hibernate.c
===================================================================
--- /dev/null
+++ linux-2.6/kernel/power/hibernate.c
@@ -0,0 +1,955 @@
+/*
+ * kernel/power/hibernate.c - Hibernation (a.k.a suspend-to-disk) support.
+ *
+ * Copyright (c) 2003 Patrick Mochel
+ * Copyright (c) 2003 Open Source Development Lab
+ * Copyright (c) 2004 Pavel Machek <pavel@suse.cz>
+ * Copyright (c) 2009 Rafael J. Wysocki, Novell Inc.
+ *
+ * This file is released under the GPLv2.
+ */
+
+#include <linux/suspend.h>
+#include <linux/syscalls.h>
+#include <linux/reboot.h>
+#include <linux/string.h>
+#include <linux/device.h>
+#include <linux/kmod.h>
+#include <linux/delay.h>
+#include <linux/fs.h>
+#include <linux/mount.h>
+#include <linux/pm.h>
+#include <linux/console.h>
+#include <linux/cpu.h>
+#include <linux/freezer.h>
+#include <scsi/scsi_scan.h>
+#include <asm/suspend.h>
+
+#include "power.h"
+
+
+static int noresume = 0;
+static char resume_file[256] = CONFIG_PM_STD_PARTITION;
+dev_t swsusp_resume_device;
+sector_t swsusp_resume_block;
+
+enum {
+	HIBERNATION_INVALID,
+	HIBERNATION_PLATFORM,
+	HIBERNATION_TEST,
+	HIBERNATION_TESTPROC,
+	HIBERNATION_SHUTDOWN,
+	HIBERNATION_REBOOT,
+	/* keep last */
+	__HIBERNATION_AFTER_LAST
+};
+#define HIBERNATION_MAX (__HIBERNATION_AFTER_LAST-1)
+#define HIBERNATION_FIRST (HIBERNATION_INVALID + 1)
+
+static int hibernation_mode = HIBERNATION_SHUTDOWN;
+
+static struct platform_hibernation_ops *hibernation_ops;
+
+/**
+ * hibernation_set_ops - set the global hibernate operations
+ * @ops: the hibernation operations to use in subsequent hibernation transitions
+ */
+
+void hibernation_set_ops(struct platform_hibernation_ops *ops)
+{
+	if (ops && !(ops->begin && ops->end &&  ops->pre_snapshot
+	    && ops->prepare && ops->finish && ops->enter && ops->pre_restore
+	    && ops->restore_cleanup)) {
+		WARN_ON(1);
+		return;
+	}
+	mutex_lock(&pm_mutex);
+	hibernation_ops = ops;
+	if (ops)
+		hibernation_mode = HIBERNATION_PLATFORM;
+	else if (hibernation_mode == HIBERNATION_PLATFORM)
+		hibernation_mode = HIBERNATION_SHUTDOWN;
+
+	mutex_unlock(&pm_mutex);
+}
+
+static bool entering_platform_hibernation;
+
+bool system_entering_hibernation(void)
+{
+	return entering_platform_hibernation;
+}
+EXPORT_SYMBOL(system_entering_hibernation);
+
+#ifdef CONFIG_PM_DEBUG
+static void hibernation_debug_sleep(void)
+{
+	printk(KERN_INFO "hibernation debug: Waiting for 5 seconds.\n");
+	mdelay(5000);
+}
+
+static int hibernation_testmode(int mode)
+{
+	if (hibernation_mode == mode) {
+		hibernation_debug_sleep();
+		return 1;
+	}
+	return 0;
+}
+
+static int hibernation_test(int level)
+{
+	if (pm_test_level == level) {
+		hibernation_debug_sleep();
+		return 1;
+	}
+	return 0;
+}
+#else /* !CONFIG_PM_DEBUG */
+static int hibernation_testmode(int mode) { return 0; }
+static int hibernation_test(int level) { return 0; }
+#endif /* !CONFIG_PM_DEBUG */
+
+/**
+ *	platform_begin - tell the platform driver that we're starting
+ *	hibernation
+ */
+
+static int platform_begin(int platform_mode)
+{
+	return (platform_mode && hibernation_ops) ?
+		hibernation_ops->begin() : 0;
+}
+
+/**
+ *	platform_end - tell the platform driver that we've entered the
+ *	working state
+ */
+
+static void platform_end(int platform_mode)
+{
+	if (platform_mode && hibernation_ops)
+		hibernation_ops->end();
+}
+
+/**
+ *	platform_pre_snapshot - prepare the machine for hibernation using the
+ *	platform driver if so configured and return an error code if it fails
+ */
+
+static int platform_pre_snapshot(int platform_mode)
+{
+	return (platform_mode && hibernation_ops) ?
+		hibernation_ops->pre_snapshot() : 0;
+}
+
+/**
+ *	platform_leave - prepare the machine for switching to the normal mode
+ *	of operation using the platform driver (called with interrupts disabled)
+ */
+
+static void platform_leave(int platform_mode)
+{
+	if (platform_mode && hibernation_ops)
+		hibernation_ops->leave();
+}
+
+/**
+ *	platform_finish - switch the machine to the normal mode of operation
+ *	using the platform driver (must be called after platform_pre_snapshot())
+ */
+
+static void platform_finish(int platform_mode)
+{
+	if (platform_mode && hibernation_ops)
+		hibernation_ops->finish();
+}
+
+/**
+ *	platform_pre_restore - prepare the platform for the restoration from a
+ *	hibernation image.  If the restore fails after this function has been
+ *	called, platform_restore_cleanup() must be called.
+ */
+
+static int platform_pre_restore(int platform_mode)
+{
+	return (platform_mode && hibernation_ops) ?
+		hibernation_ops->pre_restore() : 0;
+}
+
+/**
+ *	platform_restore_cleanup - switch the platform to the normal mode of
+ *	operation after a failing restore.  If platform_pre_restore() has been
+ *	called before the failing restore, this function must be called too,
+ *	regardless of the result of platform_pre_restore().
+ */
+
+static void platform_restore_cleanup(int platform_mode)
+{
+	if (platform_mode && hibernation_ops)
+		hibernation_ops->restore_cleanup();
+}
+
+/**
+ *	platform_recover - recover the platform from a failure to suspend
+ *	devices.
+ */
+
+static void platform_recover(int platform_mode)
+{
+	if (platform_mode && hibernation_ops && hibernation_ops->recover)
+		hibernation_ops->recover();
+}
+
+/**
+ *	create_image - freeze devices that need to be frozen with interrupts
+ *	off, create the hibernation image and thaw those devices.  Control
+ *	reappears in this routine after a restore.
+ */
+
+static int create_image(int platform_mode)
+{
+	int error;
+
+	error = arch_prepare_suspend();
+	if (error)
+		return error;
+
+	/* At this point, dpm_suspend_start() has been called, but *not*
+	 * dpm_suspend_noirq(). We *must* call dpm_suspend_noirq() now.
+	 * Otherwise, drivers for some devices (e.g. interrupt controllers)
+	 * become desynchronized with the actual state of the hardware
+	 * at resume time, and evil weirdness ensues.
+	 */
+	error = dpm_suspend_noirq(PMSG_FREEZE);
+	if (error) {
+		printk(KERN_ERR "PM: Some devices failed to power down, "
+			"aborting hibernation\n");
+		return error;
+	}
+
+	error = platform_pre_snapshot(platform_mode);
+	if (error || hibernation_test(TEST_PLATFORM))
+		goto Platform_finish;
+
+	error = disable_nonboot_cpus();
+	if (error || hibernation_test(TEST_CPUS)
+	    || hibernation_testmode(HIBERNATION_TEST))
+		goto Enable_cpus;
+
+	local_irq_disable();
+
+	error = sysdev_suspend(PMSG_FREEZE);
+	if (error) {
+		printk(KERN_ERR "PM: Some system devices failed to power down, "
+			"aborting hibernation\n");
+		goto Enable_irqs;
+	}
+
+	if (hibernation_test(TEST_CORE))
+		goto Power_up;
+
+	in_suspend = 1;
+	save_processor_state();
+	error = swsusp_arch_suspend();
+	if (error)
+		printk(KERN_ERR "PM: Error %d creating hibernation image\n",
+			error);
+	/* Restore control flow magically appears here */
+	restore_processor_state();
+	if (!in_suspend)
+		platform_leave(platform_mode);
+
+ Power_up:
+	sysdev_resume();
+	/* NOTE:  dpm_resume_noirq() is just a resume() for devices
+	 * that suspended with irqs off ... no overall powerup.
+	 */
+
+ Enable_irqs:
+	local_irq_enable();
+
+ Enable_cpus:
+	enable_nonboot_cpus();
+
+ Platform_finish:
+	platform_finish(platform_mode);
+
+	dpm_resume_noirq(in_suspend ?
+		(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);
+
+	return error;
+}
+
+/**
+ *	hibernation_snapshot - quiesce devices and create the hibernation
+ *	snapshot image.
+ *	@platform_mode - if set, use the platform driver, if available, to
+ *			 prepare the platform firmware for the power transition.
+ *
+ *	Must be called with pm_mutex held
+ */
+
+int hibernation_snapshot(int platform_mode)
+{
+	int error;
+
+	error = platform_begin(platform_mode);
+	if (error)
+		return error;
+
+	/* Free memory before shutting down devices. */
+	error = swsusp_shrink_memory();
+	if (error)
+		goto Close;
+
+	suspend_console();
+	error = dpm_suspend_start(PMSG_FREEZE);
+	if (error)
+		goto Recover_platform;
+
+	if (hibernation_test(TEST_DEVICES))
+		goto Recover_platform;
+
+	error = create_image(platform_mode);
+	/* Control returns here after successful restore */
+
+ Resume_devices:
+	dpm_resume_end(in_suspend ?
+		(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);
+	resume_console();
+ Close:
+	platform_end(platform_mode);
+	return error;
+
+ Recover_platform:
+	platform_recover(platform_mode);
+	goto Resume_devices;
+}
+
+/**
+ *	resume_target_kernel - prepare devices that need to be suspended with
+ *	interrupts off, restore the contents of highmem that have not been
+ *	restored yet from the image and run the low level code that will restore
+ *	the remaining contents of memory and switch to the just restored target
+ *	kernel.
+ */
+
+static int resume_target_kernel(bool platform_mode)
+{
+	int error;
+
+	error = dpm_suspend_noirq(PMSG_QUIESCE);
+	if (error) {
+		printk(KERN_ERR "PM: Some devices failed to power down, "
+			"aborting resume\n");
+		return error;
+	}
+
+	error = platform_pre_restore(platform_mode);
+	if (error)
+		goto Cleanup;
+
+	error = disable_nonboot_cpus();
+	if (error)
+		goto Enable_cpus;
+
+	local_irq_disable();
+
+	error = sysdev_suspend(PMSG_QUIESCE);
+	if (error)
+		goto Enable_irqs;
+
+	/* We'll ignore saved state, but this gets preempt count (etc) right */
+	save_processor_state();
+	error = restore_highmem();
+	if (!error) {
+		error = swsusp_arch_resume();
+		/*
+		 * The code below is only ever reached in case of a failure.
+		 * Otherwise execution continues at place where
+		 * swsusp_arch_suspend() was called
+		 */
+		BUG_ON(!error);
+		/* This call to restore_highmem() undoes the previous one */
+		restore_highmem();
+	}
+	/*
+	 * The only reason why swsusp_arch_resume() can fail is memory being
+	 * very tight, so we have to free it as soon as we can to avoid
+	 * subsequent failures
+	 */
+	swsusp_free();
+	restore_processor_state();
+	touch_softlockup_watchdog();
+
+	sysdev_resume();
+
+ Enable_irqs:
+	local_irq_enable();
+
+ Enable_cpus:
+	enable_nonboot_cpus();
+
+ Cleanup:
+	platform_restore_cleanup(platform_mode);
+
+	dpm_resume_noirq(PMSG_RECOVER);
+
+	return error;
+}
+
+/**
+ *	hibernation_restore - quiesce devices and restore the hibernation
+ *	snapshot image.  If successful, control returns in hibernation_snapshot()
+ *	@platform_mode - if set, use the platform driver, if available, to
+ *			 prepare the platform firmware for the transition.
+ *
+ *	Must be called with pm_mutex held
+ */
+
+int hibernation_restore(int platform_mode)
+{
+	int error;
+
+	pm_prepare_console();
+	suspend_console();
+	error = dpm_suspend_start(PMSG_QUIESCE);
+	if (!error) {
+		error = resume_target_kernel(platform_mode);
+		dpm_resume_end(PMSG_RECOVER);
+	}
+	resume_console();
+	pm_restore_console();
+	return error;
+}
+
+/**
+ *	hibernation_platform_enter - enter the hibernation state using the
+ *	platform driver (if available)
+ */
+
+int hibernation_platform_enter(void)
+{
+	int error;
+
+	if (!hibernation_ops)
+		return -ENOSYS;
+
+	/*
+	 * We have cancelled the power transition by running
+	 * hibernation_ops->finish() before saving the image, so we should let
+	 * the firmware know that we're going to enter the sleep state after all
+	 */
+	error = hibernation_ops->begin();
+	if (error)
+		goto Close;
+
+	entering_platform_hibernation = true;
+	suspend_console();
+	error = dpm_suspend_start(PMSG_HIBERNATE);
+	if (error) {
+		if (hibernation_ops->recover)
+			hibernation_ops->recover();
+		goto Resume_devices;
+	}
+
+	error = dpm_suspend_noirq(PMSG_HIBERNATE);
+	if (error)
+		goto Resume_devices;
+
+	error = hibernation_ops->prepare();
+	if (error)
+		goto Platform_finish;
+
+	error = disable_nonboot_cpus();
+	if (error)
+		goto Platform_finish;
+
+	local_irq_disable();
+	sysdev_suspend(PMSG_HIBERNATE);
+	hibernation_ops->enter();
+	/* We should never get here */
+	while (1);
+
+	/*
+	 * We don't need to reenable the nonboot CPUs or resume consoles, since
+	 * the system is going to be halted anyway.
+	 */
+ Platform_finish:
+	hibernation_ops->finish();
+
+	dpm_resume_noirq(PMSG_RESTORE);
+
+ Resume_devices:
+	entering_platform_hibernation = false;
+	dpm_resume_end(PMSG_RESTORE);
+	resume_console();
+
+ Close:
+	hibernation_ops->end();
+
+	return error;
+}
+
+/**
+ *	power_down - Shut the machine down for hibernation.
+ *
+ *	Use the platform driver, if configured so; otherwise try
+ *	to power off or reboot.
+ */
+
+static void power_down(void)
+{
+	switch (hibernation_mode) {
+	case HIBERNATION_TEST:
+	case HIBERNATION_TESTPROC:
+		break;
+	case HIBERNATION_REBOOT:
+		kernel_restart(NULL);
+		break;
+	case HIBERNATION_PLATFORM:
+		hibernation_platform_enter();
+	case HIBERNATION_SHUTDOWN:
+		kernel_power_off();
+		break;
+	}
+	kernel_halt();
+	/*
+	 * Valid image is on the disk, if we continue we risk serious data
+	 * corruption after resume.
+	 */
+	printk(KERN_CRIT "PM: Please power down manually\n");
+	while(1);
+}
+
+static int prepare_processes(void)
+{
+	int error = 0;
+
+	if (freeze_processes()) {
+		error = -EBUSY;
+		thaw_processes();
+	}
+	return error;
+}
+
+/**
+ *	hibernate - The granpappy of the built-in hibernation management
+ */
+
+int hibernate(void)
+{
+	int error;
+
+	mutex_lock(&pm_mutex);
+	/* The snapshot device should not be opened while we're running */
+	if (!atomic_add_unless(&snapshot_device_available, -1, 0)) {
+		error = -EBUSY;
+		goto Unlock;
+	}
+
+	pm_prepare_console();
+	error = pm_notifier_call_chain(PM_HIBERNATION_PREPARE);
+	if (error)
+		goto Exit;
+
+	error = usermodehelper_disable();
+	if (error)
+		goto Exit;
+
+	/* Allocate memory management structures */
+	error = create_basic_memory_bitmaps();
+	if (error)
+		goto Exit;
+
+	printk(KERN_INFO "PM: Syncing filesystems ... ");
+	sys_sync();
+	printk("done.\n");
+
+	error = prepare_processes();
+	if (error)
+		goto Finish;
+
+	if (hibernation_test(TEST_FREEZER))
+		goto Thaw;
+
+	if (hibernation_testmode(HIBERNATION_TESTPROC))
+		goto Thaw;
+
+	error = hibernation_snapshot(hibernation_mode == HIBERNATION_PLATFORM);
+	if (in_suspend && !error) {
+		unsigned int flags = 0;
+
+		if (hibernation_mode == HIBERNATION_PLATFORM)
+			flags |= SF_PLATFORM_MODE;
+		pr_debug("PM: writing image.\n");
+		error = swsusp_write(flags);
+		swsusp_free();
+		if (!error)
+			power_down();
+	} else {
+		pr_debug("PM: Image restored successfully.\n");
+		swsusp_free();
+	}
+ Thaw:
+	thaw_processes();
+ Finish:
+	free_basic_memory_bitmaps();
+	usermodehelper_enable();
+ Exit:
+	pm_notifier_call_chain(PM_POST_HIBERNATION);
+	pm_restore_console();
+	atomic_inc(&snapshot_device_available);
+ Unlock:
+	mutex_unlock(&pm_mutex);
+	return error;
+}
+
+
+/**
+ *	software_resume - Resume from a saved image.
+ *
+ *	Called as a late_initcall (so all devices are discovered and
+ *	initialized), we call swsusp to see if we have a saved image or not.
+ *	If so, we quiesce devices, then restore the saved image. We will
+ *	return above (in hibernate() ) if everything goes well.
+ *	Otherwise, we fail gracefully and return to the normally
+ *	scheduled program.
+ *
+ */
+
+static int software_resume(void)
+{
+	int error;
+	unsigned int flags;
+
+	/*
+	 * If the user said "noresume", bail out early.
+	 */
+	if (noresume)
+		return 0;
+
+	/*
+	 * name_to_dev_t() below takes a sysfs buffer mutex when sysfs
+	 * is configured into the kernel. Since the regular hibernate
+	 * trigger path is via sysfs which takes a buffer mutex before
+	 * calling hibernate functions (which take pm_mutex) this can
+	 * cause lockdep to complain about a possible ABBA deadlock
+	 * which cannot happen since we're in the boot code here and
+	 * sysfs can't be invoked yet. Therefore, we use a subclass
+	 * here to avoid lockdep complaining.
+	 */
+	mutex_lock_nested(&pm_mutex, SINGLE_DEPTH_NESTING);
+
+	if (swsusp_resume_device)
+		goto Check_image;
+
+	if (!strlen(resume_file)) {
+		error = -ENOENT;
+		goto Unlock;
+	}
+
+	pr_debug("PM: Checking image partition %s\n", resume_file);
+
+	/* Check if the device is there */
+	swsusp_resume_device = name_to_dev_t(resume_file);
+	if (!swsusp_resume_device) {
+		/*
+		 * Some device discovery might still be in progress; we need
+		 * to wait for this to finish.
+		 */
+		wait_for_device_probe();
+		/*
+		 * We can't depend on SCSI devices being available after loading
+		 * one of their modules until scsi_complete_async_scans() is
+		 * called and the resume device usually is a SCSI one.
+		 */
+		scsi_complete_async_scans();
+
+		swsusp_resume_device = name_to_dev_t(resume_file);
+		if (!swsusp_resume_device) {
+			error = -ENODEV;
+			goto Unlock;
+		}
+	}
+
+ Check_image:
+	pr_debug("PM: Resume from partition %d:%d\n",
+		MAJOR(swsusp_resume_device), MINOR(swsusp_resume_device));
+
+	pr_debug("PM: Checking hibernation image.\n");
+	error = swsusp_check();
+	if (error)
+		goto Unlock;
+
+	/* The snapshot device should not be opened while we're running */
+	if (!atomic_add_unless(&snapshot_device_available, -1, 0)) {
+		error = -EBUSY;
+		goto Unlock;
+	}
+
+	pm_prepare_console();
+	error = pm_notifier_call_chain(PM_RESTORE_PREPARE);
+	if (error)
+		goto Finish;
+
+	error = usermodehelper_disable();
+	if (error)
+		goto Finish;
+
+	error = create_basic_memory_bitmaps();
+	if (error)
+		goto Finish;
+
+	pr_debug("PM: Preparing processes for restore.\n");
+	error = prepare_processes();
+	if (error) {
+		swsusp_close(FMODE_READ);
+		goto Done;
+	}
+
+	pr_debug("PM: Reading hibernation image.\n");
+
+	error = swsusp_read(&flags);
+	if (!error)
+		hibernation_restore(flags & SF_PLATFORM_MODE);
+
+	printk(KERN_ERR "PM: Restore failed, recovering.\n");
+	swsusp_free();
+	thaw_processes();
+ Done:
+	free_basic_memory_bitmaps();
+	usermodehelper_enable();
+ Finish:
+	pm_notifier_call_chain(PM_POST_RESTORE);
+	pm_restore_console();
+	atomic_inc(&snapshot_device_available);
+	/* For success case, the suspend path will release the lock */
+ Unlock:
+	mutex_unlock(&pm_mutex);
+	pr_debug("PM: Resume from disk failed.\n");
+	return error;
+}
+
+late_initcall(software_resume);
+
+
+static const char * const hibernation_modes[] = {
+	[HIBERNATION_PLATFORM]	= "platform",
+	[HIBERNATION_SHUTDOWN]	= "shutdown",
+	[HIBERNATION_REBOOT]	= "reboot",
+	[HIBERNATION_TEST]	= "test",
+	[HIBERNATION_TESTPROC]	= "testproc",
+};
+
+/**
+ *	disk - Control hibernation mode
+ *
+ *	Suspend-to-disk can be handled in several ways. We have a few options
+ *	for putting the system to sleep - using the platform driver (e.g. ACPI
+ *	or other hibernation_ops), powering off the system or rebooting the
+ *	system (for testing) as well as the two test modes.
+ *
+ *	The system can support 'platform', and that is known a priori (and
+ *	encoded by the presence of hibernation_ops). However, the user may
+ *	choose 'shutdown' or 'reboot' as alternatives, as well as one of the
+ *	test modes, 'test' or 'testproc'.
+ *
+ *	show() will display what the mode is currently set to.
+ *	store() will accept one of
+ *
+ *	'platform'
+ *	'shutdown'
+ *	'reboot'
+ *	'test'
+ *	'testproc'
+ *
+ *	It will only change to 'platform' if the system
+ *	supports it (as determined by having hibernation_ops).
+ */
+
+static ssize_t disk_show(struct kobject *kobj, struct kobj_attribute *attr,
+			 char *buf)
+{
+	int i;
+	char *start = buf;
+
+	for (i = HIBERNATION_FIRST; i <= HIBERNATION_MAX; i++) {
+		if (!hibernation_modes[i])
+			continue;
+		switch (i) {
+		case HIBERNATION_SHUTDOWN:
+		case HIBERNATION_REBOOT:
+		case HIBERNATION_TEST:
+		case HIBERNATION_TESTPROC:
+			break;
+		case HIBERNATION_PLATFORM:
+			if (hibernation_ops)
+				break;
+			/* not a valid mode, continue with loop */
+			continue;
+		}
+		if (i == hibernation_mode)
+			buf += sprintf(buf, "[%s] ", hibernation_modes[i]);
+		else
+			buf += sprintf(buf, "%s ", hibernation_modes[i]);
+	}
+	buf += sprintf(buf, "\n");
+	return buf-start;
+}
+
+
+static ssize_t disk_store(struct kobject *kobj, struct kobj_attribute *attr,
+			  const char *buf, size_t n)
+{
+	int error = 0;
+	int i;
+	int len;
+	char *p;
+	int mode = HIBERNATION_INVALID;
+
+	p = memchr(buf, '\n', n);
+	len = p ? p - buf : n;
+
+	mutex_lock(&pm_mutex);
+	for (i = HIBERNATION_FIRST; i <= HIBERNATION_MAX; i++) {
+		if (len == strlen(hibernation_modes[i])
+		    && !strncmp(buf, hibernation_modes[i], len)) {
+			mode = i;
+			break;
+		}
+	}
+	if (mode != HIBERNATION_INVALID) {
+		switch (mode) {
+		case HIBERNATION_SHUTDOWN:
+		case HIBERNATION_REBOOT:
+		case HIBERNATION_TEST:
+		case HIBERNATION_TESTPROC:
+			hibernation_mode = mode;
+			break;
+		case HIBERNATION_PLATFORM:
+			if (hibernation_ops)
+				hibernation_mode = mode;
+			else
+				error = -EINVAL;
+		}
+	} else
+		error = -EINVAL;
+
+	if (!error)
+		pr_debug("PM: Hibernation mode set to '%s'\n",
+			 hibernation_modes[mode]);
+	mutex_unlock(&pm_mutex);
+	return error ? error : n;
+}
+
+power_attr(disk);
+
+static ssize_t resume_show(struct kobject *kobj, struct kobj_attribute *attr,
+			   char *buf)
+{
+	return sprintf(buf,"%d:%d\n", MAJOR(swsusp_resume_device),
+		       MINOR(swsusp_resume_device));
+}
+
+static ssize_t resume_store(struct kobject *kobj, struct kobj_attribute *attr,
+			    const char *buf, size_t n)
+{
+	unsigned int maj, min;
+	dev_t res;
+	int ret = -EINVAL;
+
+	if (sscanf(buf, "%u:%u", &maj, &min) != 2)
+		goto out;
+
+	res = MKDEV(maj,min);
+	if (maj != MAJOR(res) || min != MINOR(res))
+		goto out;
+
+	mutex_lock(&pm_mutex);
+	swsusp_resume_device = res;
+	mutex_unlock(&pm_mutex);
+	printk(KERN_INFO "PM: Starting manual resume from disk\n");
+	noresume = 0;
+	software_resume();
+	ret = n;
+ out:
+	return ret;
+}
+
+power_attr(resume);
+
+static ssize_t image_size_show(struct kobject *kobj, struct kobj_attribute *attr,
+			       char *buf)
+{
+	return sprintf(buf, "%lu\n", image_size);
+}
+
+static ssize_t image_size_store(struct kobject *kobj, struct kobj_attribute *attr,
+				const char *buf, size_t n)
+{
+	unsigned long size;
+
+	if (sscanf(buf, "%lu", &size) == 1) {
+		image_size = size;
+		return n;
+	}
+
+	return -EINVAL;
+}
+
+power_attr(image_size);
+
+static struct attribute * g[] = {
+	&disk_attr.attr,
+	&resume_attr.attr,
+	&image_size_attr.attr,
+	NULL,
+};
+
+
+static struct attribute_group attr_group = {
+	.attrs = g,
+};
+
+
+static int __init pm_disk_init(void)
+{
+	return sysfs_create_group(power_kobj, &attr_group);
+}
+
+core_initcall(pm_disk_init);
+
+
+static int __init resume_setup(char *str)
+{
+	if (noresume)
+		return 1;
+
+	strncpy( resume_file, str, 255 );
+	return 1;
+}
+
+static int __init resume_offset_setup(char *str)
+{
+	unsigned long long offset;
+
+	if (noresume)
+		return 1;
+
+	if (sscanf(str, "%llu", &offset) == 1)
+		swsusp_resume_block = offset;
+
+	return 1;
+}
+
+static int __init noresume_setup(char *str)
+{
+	noresume = 1;
+	return 1;
+}
+
+__setup("noresume", noresume_setup);
+__setup("resume_offset=", resume_offset_setup);
+__setup("resume=", resume_setup);
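
For readers who want to see how the `disk` attribute behaves without reading
the whole file, here is a small userspace sketch of what disk_show() prints and
what disk_store() accepts.  It is not part of the patch: MODE_*, mode_names,
format_modes() and parse_mode() are names invented for this illustration, the
have_ops flag stands in for a non-NULL hibernation_ops, and the kernel code
returns -EINVAL where the sketch returns MODE_INVALID.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical userspace copy of the mode numbering used by the patch. */
enum {
	MODE_INVALID,
	MODE_PLATFORM,
	MODE_TEST,
	MODE_TESTPROC,
	MODE_SHUTDOWN,
	MODE_REBOOT,
	MODE_AFTER_LAST
};

static const char *const mode_names[MODE_AFTER_LAST] = {
	[MODE_PLATFORM]	= "platform",
	[MODE_TEST]	= "test",
	[MODE_TESTPROC]	= "testproc",
	[MODE_SHUTDOWN]	= "shutdown",
	[MODE_REBOOT]	= "reboot",
};

/*
 * Format the mode list the way disk_show() does: every available mode
 * followed by a space, the current one in brackets, then a newline.
 * Returns the number of characters written, or -1 if buf is too small.
 */
static int format_modes(char *buf, size_t size, int current, int have_ops)
{
	size_t used = 0;
	int i, n;

	assert(buf != NULL && size > 0);
	for (i = MODE_INVALID + 1; i < MODE_AFTER_LAST; i++) {
		if (i == MODE_PLATFORM && !have_ops)
			continue;	/* 'platform' is hidden without ops */
		if (i == current)
			n = snprintf(buf + used, size - used, "[%s] ",
				     mode_names[i]);
		else
			n = snprintf(buf + used, size - used, "%s ",
				     mode_names[i]);
		if (n < 0 || (size_t)n >= size - used)
			return -1;
		used += (size_t)n;
	}
	n = snprintf(buf + used, size - used, "\n");
	if (n < 0 || (size_t)n >= size - used)
		return -1;
	return (int)(used + (size_t)n);
}

/*
 * Parse a mode name the way disk_store() does: ignore everything from the
 * first newline on, then require an exact, full-length match.
 */
static int parse_mode(const char *buf, size_t n, int have_ops)
{
	const char *nl = memchr(buf, '\n', n);
	size_t len = nl ? (size_t)(nl - buf) : n;
	int i;

	for (i = MODE_INVALID + 1; i < MODE_AFTER_LAST; i++) {
		if (len == strlen(mode_names[i]) &&
		    !strncmp(buf, mode_names[i], len)) {
			if (i == MODE_PLATFORM && !have_ops)
				return MODE_INVALID;	/* kernel: -EINVAL */
			return i;
		}
	}
	return MODE_INVALID;
}
```

The listing order follows the enum, not the order of the string table, which
is why 'test' and 'testproc' come before 'shutdown' in the output.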

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [linux-pm] [RFC][PATCH 0/2] PM: Rearrange core suspend code
  2009-06-06 22:54 [RFC][PATCH 0/2] PM: Rearrange core suspend code Rafael J. Wysocki
                   ` (4 preceding siblings ...)
  2009-06-07 20:51 ` [RFC][PATCH 0/2] PM: Rearrange core suspend code Alan Stern
@ 2009-06-07 20:51 ` Alan Stern
  2009-06-07 21:46   ` Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code) Rafael J. Wysocki
  2009-06-07 21:46   ` Run-time PM idea (was: Re: [linux-pm] " Rafael J. Wysocki
  5 siblings, 2 replies; 199+ messages in thread
From: Alan Stern @ 2009-06-07 20:51 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: pm list, ACPI Devel Maling List, LKML

On Sun, 7 Jun 2009, Rafael J. Wysocki wrote:

> Hi,
> 
> Here's something I wanted to do quite some time ago.
> 
> kernel/power/main.c becomes more and more difficult to maintain over time,
> since it contains both the suspend to RAM core code and some common PM code
> that is also used for hibernation.  For this reason [1/2] separates the suspend
> to RAM code from main.c and puts it into two new files (the test facility is,
> again, separated from the core code for clarity).
> 
> [2/2] renames kernel/power/disk.c to kernel/power/hibernate.c, because the role
> of this file is analogous to kernel/power/suspend.c (introduced by [1/2]).
> 
> Comments welcome.

Looks like a good idea to me.

Alan Stern


^ permalink raw reply	[flat|nested] 199+ messages in thread

* Run-time PM idea (was: Re: [linux-pm] [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-07 20:51 ` [linux-pm] " Alan Stern
  2009-06-07 21:46   ` Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code) Rafael J. Wysocki
@ 2009-06-07 21:46   ` Rafael J. Wysocki
  2009-06-07 22:02     ` Run-time PM idea (was: " Oliver Neukum
                       ` (5 more replies)
  1 sibling, 6 replies; 199+ messages in thread
From: Rafael J. Wysocki @ 2009-06-07 21:46 UTC (permalink / raw)
  To: Alan Stern; +Cc: pm list, ACPI Devel Maling List, LKML, Magnus Damm

On Sunday 07 June 2009, Alan Stern wrote:
> On Sun, 7 Jun 2009, Rafael J. Wysocki wrote:
> 
> > Hi,
> > 
> > Here's something I wanted to do quite some time ago.
> > 
> > kernel/power/main.c becomes more and more difficult to maintain over time,
> > since it contains both the suspend to RAM core code and some common PM code
> > that is also used for hibernation.  For this reason [1/2] separates the suspend
> > to RAM code from main.c and puts it into two new files (the test facility is,
> > again, separated from the core code for clarity).
> > 
> > [2/2] renames kernel/power/disk.c to kernel/power/hibernate.c, because the role
> > of this file is analogous to kernel/power/suspend.c (introduced by [1/2]).
> > 
> > Comments welcome.
> 
> Looks like a good idea to me.

Great, thanks for your feedback! :-)

BTW, I've been thinking about run-time PM a bit recently and the result is
below (on top of this series).

I noticed that since resume can be scheduled while suspend is in progress,
we need two work structures in struct device, one for suspend and one for
resume.  Also, in theory, we may want to resume the device before the suspend
has a chance to run, so there should be some synchronization between them,
which is done with the help of the spinlock in dev_pm_info.
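
As a rough illustration only (this is not part of the patch), the point about
two work structures can be sketched in user space.  The types struct work and
struct pm_info and the local container_of() macro are stand-ins assumed for
this example; only the helper names suspend_work_to_device() and
resume_work_to_device() are taken from the patch below.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for a kernel work item. */
struct work {
	int pending;
};

/* Stand-in for dev_pm_info: one work item per kind of request. */
struct pm_info {
	struct work suspend_work;	/* carries the suspend request */
	struct work resume_work;	/* carries the resume request */
};

/* Stand-in for struct device with the PM data embedded in it. */
struct device {
	const char *name;
	struct pm_info power;
};

/* Local equivalent of the kernel macro of the same name. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Map an embedded suspend work item back to the device that owns it. */
struct device *suspend_work_to_device(struct work *w)
{
	struct pm_info *pi;

	assert(w != NULL);
	pi = container_of(w, struct pm_info, suspend_work);
	return container_of(pi, struct device, power);
}

/* Map an embedded resume work item back to the device that owns it. */
struct device *resume_work_to_device(struct work *w)
{
	struct pm_info *pi;

	assert(w != NULL);
	pi = container_of(w, struct pm_info, resume_work);
	return container_of(pi, struct device, power);
}
```

Because each request has its own embedded item, a resume can be queued while
the suspend item is still in use, and either callback can find its device
from nothing but the work pointer it is handed.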

The general idea is that drivers or bus types may use pm_schedule_suspend()
to put a suspend request into the work queue and pm_schedule_resume() to
queue a resume request or cancel a pending suspend request.  There's no
requirement to use these functions, but I think they may be helpful in some
simple cases.
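
The intended "queue or cancel" behaviour can be modelled in a few lines of
user-space C.  This is only a sketch under simplifying assumptions: the
model_*() names are hypothetical, booleans stand in for the queued work items,
the delay and the spinlock are left out, and the bus type callbacks are assumed
to succeed.  The state names are the ones from enum rpm_state in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same values as enum rpm_state in the patch. */
enum rpm_state {
	RPM_UNKNOWN = -1,
	RPM_ACTIVE,
	RPM_IDLE,
	RPM_SUSPENDING,
	RPM_SUSPENDED,
};

/* Hypothetical model: booleans replace the real queued work items. */
struct rpm_model {
	enum rpm_state status;
	bool suspend_queued;
	bool resume_queued;
};

void model_init(struct rpm_model *m)
{
	assert(m != NULL);
	m->status = RPM_ACTIVE;
	m->suspend_queued = false;
	m->resume_queued = false;
}

/* Modelled on __pm_schedule_suspend(): only an active device is queued. */
void model_schedule_suspend(struct rpm_model *m)
{
	if (m->status != RPM_ACTIVE)
		return;
	m->status = RPM_IDLE;
	m->suspend_queued = true;
}

/* Modelled on __pm_schedule_resume(): cancel a pending suspend or queue. */
void model_schedule_resume(struct rpm_model *m)
{
	if (m->status == RPM_IDLE) {
		m->suspend_queued = false;	/* suspend has not run yet */
		m->status = RPM_ACTIVE;
	} else if (m->status != RPM_ACTIVE) {
		m->resume_queued = true;
	}
}

/* Modelled on pm_autosuspend(), assuming the callback returns 0. */
void model_run_suspend(struct rpm_model *m)
{
	if (!m->suspend_queued)
		return;
	m->suspend_queued = false;
	m->status = RPM_SUSPENDED;
}

/* Modelled on pm_autoresume(), assuming the callback returns 0. */
void model_run_resume(struct rpm_model *m)
{
	if (!m->resume_queued)
		return;
	m->resume_queued = false;
	if (m->status == RPM_SUSPENDED)
		m->status = RPM_ACTIVE;
}
```

In this model a resume request that arrives while the device is only idle
never reaches the work queue; it just withdraws the suspend request.  A resume
request for a suspended device is queued and takes effect when it is run.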

It may be necessary to resume a device synchronously, but I'm still thinking
about how to implement that.

Please have a look.

Best,
Rafael

---
 drivers/base/power/Makefile  |    1 
 drivers/base/power/main.c    |    6 +
 drivers/base/power/runtime.c |  163 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/pm.h           |   36 ++++++++-
 include/linux/pm_runtime.h   |   82 +++++++++++++++++++++
 kernel/power/Kconfig         |   14 +++
 kernel/power/main.c          |   17 ++++
 7 files changed, 316 insertions(+), 3 deletions(-)

Index: linux-2.6/kernel/power/Kconfig
===================================================================
--- linux-2.6.orig/kernel/power/Kconfig
+++ linux-2.6/kernel/power/Kconfig
@@ -204,3 +204,17 @@ config APM_EMULATION
 	  random kernel OOPSes or reboots that don't seem to be related to
 	  anything, try disabling/enabling this option (or disabling/enabling
 	  APM in your BIOS).
+
+config PM_RUNTIME
+	bool "Run-time PM core functionality"
+	depends on PM
+	---help---
+	  Enable functionality allowing I/O devices to be put into energy-saving
+	  (low power) states at run time (or autosuspended) after a specified
+	  period of inactivity and woken up in response to a hardware-generated
+	  wake-up event or a driver's request.
+
+	  Hardware support is generally required for this functionality to work
+	  and the bus type drivers of the buses the devices are on are
+	  responsible for the actual handling of the autosuspend requests and
+	  wake-up events.
Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -11,6 +11,7 @@
 #include <linux/kobject.h>
 #include <linux/string.h>
 #include <linux/resume-trace.h>
+#include <linux/workqueue.h>
 
 #include "power.h"
 
@@ -217,8 +218,24 @@ static struct attribute_group attr_group
 	.attrs = g,
 };
 
+#ifdef CONFIG_PM_RUNTIME
+struct workqueue_struct *pm_wq;
+
+static int __init pm_start_workqueue(void)
+{
+	pm_wq = create_freezeable_workqueue("pm");
+
+	return pm_wq ? 0 : -ENOMEM;
+}
+#else
+static inline int pm_start_workqueue(void) { return 0; }
+#endif
+
 static int __init pm_init(void)
 {
+	int error = pm_start_workqueue();
+	if (error)
+		return error;
 	power_kobj = kobject_create_and_add("power", NULL);
 	if (!power_kobj)
 		return -ENOMEM;
Index: linux-2.6/include/linux/pm.h
===================================================================
--- linux-2.6.orig/include/linux/pm.h
+++ linux-2.6/include/linux/pm.h
@@ -22,6 +22,8 @@
 #define _LINUX_PM_H
 
 #include <linux/list.h>
+#include <linux/workqueue.h>
+#include <linux/spinlock.h>
 
 /*
  * Callbacks for platform drivers to implement.
@@ -165,6 +167,15 @@ typedef struct pm_message {
  * It is allowed to unregister devices while the above callbacks are being
  * executed.  However, it is not allowed to unregister a device from within any
  * of its own callbacks.
+ *
+ * There also are two callbacks related to run-time power management of devices:
+ *
+ * @autosuspend: Save the device registers and put it into an energy-saving (low
+ *	power) state at run-time, enable wake-up events as appropriate.
+ *
+ * @autoresume: Put the device into the full power state and restore its
+ *	registers (if applicable) at run time, in response to a wake-up event
+ *	generated by hardware or at a request of software.
  */
 
 struct dev_pm_ops {
@@ -182,6 +193,10 @@ struct dev_pm_ops {
 	int (*thaw_noirq)(struct device *dev);
 	int (*poweroff_noirq)(struct device *dev);
 	int (*restore_noirq)(struct device *dev);
+#ifdef CONFIG_PM_RUNTIME
+	int (*autosuspend)(struct device *dev);
+	int (*autoresume)(struct device *dev);
+#endif
 };
 
 /**
@@ -315,14 +330,31 @@ enum dpm_state {
 	DPM_OFF_IRQ,
 };
 
+enum rpm_state {
+	RPM_UNKNOWN = -1,
+	RPM_ACTIVE,
+	RPM_IDLE,
+	RPM_SUSPENDING,
+	RPM_SUSPENDED,
+};
+
 struct dev_pm_info {
 	pm_message_t		power_state;
-	unsigned		can_wakeup:1;
-	unsigned		should_wakeup:1;
+	unsigned int		can_wakeup:1;
+	unsigned int		should_wakeup:1;
 	enum dpm_state		status;		/* Owned by the PM core */
 #ifdef	CONFIG_PM_SLEEP
 	struct list_head	entry;
 #endif
+#ifdef	CONFIG_PM_RUNTIME
+	struct delayed_work	suspend_work;
+	struct work_struct	resume_work;
+	unsigned int		suspend_autocancel:1;
+	unsigned int		resume_autocancel:1;
+	unsigned int		suspend_aborted:1;
+	enum rpm_state		runtime_status;
+	spinlock_t		lock;
+#endif
 };
 
 /*
Index: linux-2.6/drivers/base/power/Makefile
===================================================================
--- linux-2.6.orig/drivers/base/power/Makefile
+++ linux-2.6/drivers/base/power/Makefile
@@ -1,5 +1,6 @@
 obj-$(CONFIG_PM)	+= sysfs.o
 obj-$(CONFIG_PM_SLEEP)	+= main.o
+obj-$(CONFIG_PM_RUNTIME)	+= runtime.o
 obj-$(CONFIG_PM_TRACE_RTC)	+= trace.o
 
 ccflags-$(CONFIG_DEBUG_DRIVER) := -DDEBUG
Index: linux-2.6/drivers/base/power/runtime.c
===================================================================
--- /dev/null
+++ linux-2.6/drivers/base/power/runtime.c
@@ -0,0 +1,163 @@
+/*
+ * drivers/base/power/runtime.c - Helper functions for device run-time PM
+ *
+ * Copyright (c) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
+ *
+ * This file is released under the GPLv2.
+ */
+
+#include <linux/pm_runtime.h>
+
+/**
+ * pm_runtime_reset - Clear all of the device run-time PM flags.
+ * @dev: Device object to clear the flags for.
+ */
+static void pm_runtime_reset(struct device *dev)
+{
+	dev->power.resume_autocancel = false;
+	dev->power.suspend_autocancel = false;
+	dev->power.suspend_aborted = false;
+	dev->power.runtime_status = RPM_ACTIVE;
+}
+
+/**
+ * pm_runtime_init - Initialize run-time PM fields in given device object.
+ * @dev: Device object to handle.
+ */
+void pm_runtime_init(struct device *dev)
+{
+	pm_runtime_reset(dev);
+	spin_lock_init(&dev->power.lock);
+}
+
+/**
+ * pm_autosuspend - Run autosuspend callback of given device object's bus type.
+ * @work: Work structure used for scheduling the execution of this function.
+ *
+ * Use @work to get the device object the suspend has been scheduled for,
+ * check if the suspend request hasn't been cancelled and run the
+ * ->autosuspend() callback from the device's bus type driver.  Update the
+ * run-time PM flags in the device object to reflect the current status of the
+ * device.
+ */
+static void pm_autosuspend(struct work_struct *work)
+{
+	struct delayed_work *dw = to_delayed_work(work);
+	struct device *dev = suspend_work_to_device(dw);
+	int error = 0;
+
+	pm_lock_device(dev);
+	if (dev->power.suspend_aborted) {
+		dev->power.runtime_status = RPM_ACTIVE;
+		goto out;
+	}
+	dev->power.suspend_autocancel = false;
+	dev->power.runtime_status = RPM_SUSPENDING;
+	pm_unlock_device(dev);
+
+	if (dev && dev->bus && dev->bus->pm && dev->bus->pm->autosuspend)
+		error = dev->bus->pm->autosuspend(dev);
+
+	pm_lock_device(dev);
+	dev->power.runtime_status = error ? RPM_UNKNOWN : RPM_SUSPENDED;
+ out:
+	pm_unlock_device(dev);
+}
+
+/**
+ * __pm_schedule_suspend - Schedule run-time suspend of given device.
+ * @dev: Device to suspend.
+ * @delay: Time to wait before attempting to suspend the device.
+ * @autocancel: If set, the request will be cancelled during a resume from a
+ *	system-wide sleep state if it happens before @delay elapses.
+ */
+void __pm_schedule_suspend(struct device *dev, unsigned long delay,
+			   bool autocancel)
+{
+	pm_lock_device(dev);
+	if (dev->power.runtime_status != RPM_ACTIVE)
+		goto out;
+	dev->power.suspend_autocancel = autocancel;
+	dev->power.suspend_aborted = false;
+	dev->power.runtime_status = RPM_IDLE;
+	INIT_DELAYED_WORK(&dev->power.suspend_work, pm_autosuspend);
+	queue_delayed_work(pm_wq, &dev->power.suspend_work, delay);
+ out:
+	pm_unlock_device(dev);
+}
+
+/**
+ * pm_autoresume - Run autoresume callback of given device object's bus type.
+ * @work: Work structure used for scheduling the execution of this function.
+ *
+ * Use @work to get the device object the resume has been scheduled for,
+ * check if the device is really suspended and run the ->autoresume() callback
+ * from the device's bus type driver.  Update the run-time PM flags in the
+ * device object to reflect the current status of the device.
+ */
+static void pm_autoresume(struct work_struct *work)
+{
+	struct device *dev = resume_work_to_device(work);
+	int error = 0;
+
+	pm_lock_device(dev);
+	dev->power.resume_autocancel = false;
+	if (dev->power.runtime_status != RPM_SUSPENDED)
+		goto out;
+	pm_unlock_device(dev);
+
+	if (dev && dev->bus && dev->bus->pm && dev->bus->pm->autoresume)
+		error = dev->bus->pm->autoresume(dev);
+
+	pm_lock_device(dev);
+	dev->power.runtime_status = error ? RPM_UNKNOWN : RPM_ACTIVE;
+ out:
+	pm_unlock_device(dev);
+}
+
+/**
+ * __pm_schedule_resume - Schedule run-time resume of given device.
+ * @dev: Device to resume.
+ * @autocancel: If set, the request will be cancelled during a resume from a
+ *	system-wide sleep state if it happens before pm_autoresume() can be run.
+ */
+void __pm_schedule_resume(struct device *dev, bool autocancel)
+{
+	pm_lock_device(dev);
+	if (dev->power.runtime_status == RPM_IDLE) {
+		dev->power.suspend_autocancel = false;
+		dev->power.suspend_aborted = true;
+		cancel_delayed_work(&dev->power.suspend_work);
+		dev->power.runtime_status = RPM_ACTIVE;
+	} else if (dev->power.runtime_status != RPM_ACTIVE) {
+		dev->power.resume_autocancel = autocancel;
+		INIT_WORK(&dev->power.resume_work, pm_autoresume);
+		queue_work(pm_wq, &dev->power.resume_work);
+	}
+	pm_unlock_device(dev);
+}
+
+/**
+ * pm_runtime_autocancel - Cancel run-time PM requests during system resume.
+ * @dev: Device to handle.
+ *
+ * If dev->power.suspend_autocancel is set during resume from a system sleep
+ * state, there is a run-time suspend request pending that has to be cancelled,
+ * so cancel it, and analogously for pending run-time resume requests.
+ *
+ * This function is only called by the PM core and must not be used by bus types
+ * and device drivers.  Moreover, it is called when the workqueue is frozen, so
+ * it is guaranteed that the autosuspend callbacks are not running at that time.
+ */
+void pm_runtime_autocancel(struct device *dev)
+{
+	pm_lock_device(dev);
+	if (dev->power.suspend_autocancel) {
+		cancel_delayed_work(&dev->power.suspend_work);
+		pm_runtime_reset(dev);
+	} else if (dev->power.resume_autocancel) {
+		work_clear_pending(&dev->power.resume_work);
+		pm_runtime_reset(dev);
+	}
+	pm_unlock_device(dev);
+}
Index: linux-2.6/include/linux/pm_runtime.h
===================================================================
--- /dev/null
+++ linux-2.6/include/linux/pm_runtime.h
@@ -0,0 +1,82 @@
+/*
+ * pm_runtime.h - Device run-time power management helper functions.
+ *
+ * Copyright (C) 2009 Rafael J. Wysocki <rjw@sisk.pl>
+ *
+ * This file is released under the GPLv2.
+ */
+
+#ifndef _LINUX_PM_RUNTIME_H
+#define _LINUX_PM_RUNTIME_H
+
+#include <linux/device.h>
+#include <linux/pm.h>
+
+#ifdef CONFIG_PM_RUNTIME
+extern struct workqueue_struct *pm_wq;
+
+extern void pm_runtime_init(struct device *dev);
+extern void __pm_schedule_suspend(struct device *dev, unsigned long delay,
+				   bool autocancel);
+extern void __pm_schedule_resume(struct device *dev, bool autocancel);
+extern void pm_runtime_autocancel(struct device *dev);
+
+static inline struct device *suspend_work_to_device(struct delayed_work *work)
+{
+	struct dev_pm_info *dpi;
+
+	dpi = container_of(work, struct dev_pm_info, suspend_work);
+	return container_of(dpi, struct device, power);
+}
+
+static inline struct device *resume_work_to_device(struct work_struct *work)
+{
+	struct dev_pm_info *dpi;
+
+	dpi = container_of(work, struct dev_pm_info, resume_work);
+	return container_of(dpi, struct device, power);
+}
+
+static inline void pm_lock_device(struct device *dev)
+{
+	spin_lock(&dev->power.lock);
+}
+
+static inline void pm_unlock_device(struct device *dev)
+{
+	spin_unlock(&dev->power.lock);
+}
+#else /* !CONFIG_PM_RUNTIME */
+static inline void pm_runtime_init(struct device *dev) {}
+static inline void __pm_schedule_suspend(struct device *dev,
+					  unsigned long delay,
+					  bool autocancel) {}
+static inline void __pm_schedule_resume(struct device *dev, bool autocancel) {}
+static inline void pm_runtime_autocancel(struct device *dev) {}
+
+static inline void pm_lock_device(struct device *dev) {}
+static inline void pm_unlock_device(struct device *dev) {}
+#endif /* !CONFIG_PM_RUNTIME */
+
+static inline void pm_schedule_suspend(struct device *dev, unsigned long delay)
+{
+	__pm_schedule_suspend(dev, delay, false);
+}
+
+static inline void pm_schedule_suspend_autocancel(struct device *dev,
+						   unsigned long delay)
+{
+	__pm_schedule_suspend(dev, delay, true);
+}
+
+static inline void pm_schedule_resume(struct device *dev)
+{
+	__pm_schedule_resume(dev, false);
+}
+
+static inline void pm_schedule_resume_autocancel(struct device *dev)
+{
+	__pm_schedule_resume(dev, true);
+}
+
+#endif
Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -21,6 +21,7 @@
 #include <linux/kallsyms.h>
 #include <linux/mutex.h>
 #include <linux/pm.h>
+#include <linux/pm_runtime.h>
 #include <linux/resume-trace.h>
 #include <linux/rwsem.h>
 #include <linux/interrupt.h>
@@ -88,6 +89,7 @@ void device_pm_add(struct device *dev)
 	}
 
 	list_add_tail(&dev->power.entry, &dpm_list);
+	pm_runtime_init(dev);
 	mutex_unlock(&dpm_list_mtx);
 }
 
@@ -355,7 +357,7 @@ void dpm_resume_noirq(pm_message_t state
 	struct device *dev;
 
 	mutex_lock(&dpm_list_mtx);
-	list_for_each_entry(dev, &dpm_list, power.entry)
+	list_for_each_entry(dev, &dpm_list, power.entry) {
 		if (dev->power.status > DPM_OFF) {
 			int error;
 
@@ -364,6 +366,8 @@ void dpm_resume_noirq(pm_message_t state
 			if (error)
 				pm_dev_err(dev, state, " early", error);
 		}
+		pm_runtime_autocancel(dev);
+	}
 	mutex_unlock(&dpm_list_mtx);
 	resume_device_irqs();
 }

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: Run-time PM idea (was: Re: [linux-pm] [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-07 21:46   ` Run-time PM idea (was: Re: [linux-pm] " Rafael J. Wysocki
@ 2009-06-07 22:02       ` Oliver Neukum
  2009-06-07 22:02       ` Oliver Neukum
                         ` (4 subsequent siblings)
  5 siblings, 0 replies; 199+ messages in thread
From: Oliver Neukum @ 2009-06-07 22:02 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Alan Stern, pm list, ACPI Devel Maling List, LKML, Magnus Damm

On Sunday, 7 June 2009 23:46:59, Rafael J. Wysocki wrote:
> + * Use @work to get the device object the resume has been scheduled for,
> + * check if the device is really suspended and run the ->autoresume() callback
> + * from the device's bus type driver.  Update the run-time PM flags in the
> + * device object to reflect the current status of the device.
> + */
> +static void pm_autoresume(struct work_struct *work)
> +{

Why do you pass it a struct work pointer?

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: Run-time PM idea (was: Re: [linux-pm] [RFC][PATCH 0/2] PM: Rearrange core suspend code)
@ 2009-06-07 22:02       ` Oliver Neukum
  0 siblings, 0 replies; 199+ messages in thread
From: Oliver Neukum @ 2009-06-07 22:02 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Alan Stern, pm list, ACPI Devel Maling List, LKML, Magnus Damm

Am Sonntag, 7. Juni 2009 23:46:59 schrieb Rafael J. Wysocki:
> + * Use @work to get the device object the resume has been scheduled for,
> + * check if the device is really suspended and run the ->autoresume() callback
> + * from the device's bus type driver.  Update the run-time PM flags in the
> + * device object to reflect the current status of the device.
> + */
> +static void pm_autoresume(struct work_struct *work)
> +{

Why do you pass it a struct work pointer?

	Regards
		Oliver


^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-07 21:46   ` Run-time PM idea (was: Re: [linux-pm] " Rafael J. Wysocki
@ 2009-06-07 22:02     ` Oliver Neukum
  2009-06-07 22:02       ` Oliver Neukum
                       ` (4 subsequent siblings)
  5 siblings, 0 replies; 199+ messages in thread
From: Oliver Neukum @ 2009-06-07 22:02 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: ACPI Devel Maling List, pm list, LKML

Am Sonntag, 7. Juni 2009 23:46:59 schrieb Rafael J. Wysocki:
> + * Use @work to get the device object the resume has been scheduled for,
> + * check if the device is really suspended and run the ->autoresume() callback
> + * from the device's bus type driver.  Update the run-time PM flags in the
> + * device object to reflect the current status of the device.
> + */
> +static void pm_autoresume(struct work_struct *work)
> +{

Why do you pass it a struct work pointer?

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-07 21:46   ` Run-time PM idea (was: Re: [linux-pm] " Rafael J. Wysocki
                       ` (2 preceding siblings ...)
  2009-06-07 22:05     ` Run-time PM idea (was: " Oliver Neukum
@ 2009-06-07 22:05     ` Oliver Neukum
  2009-06-08 11:29       ` Rafael J. Wysocki
  2009-06-08 11:29       ` [linux-pm] " Rafael J. Wysocki
  2009-06-08  6:54     ` Ingo Molnar
  2009-06-08  6:54     ` Run-time PM idea (was: Re: [linux-pm] " Ingo Molnar
  5 siblings, 2 replies; 199+ messages in thread
From: Oliver Neukum @ 2009-06-07 22:05 UTC (permalink / raw)
  To: linux-pm; +Cc: Rafael J. Wysocki, Alan Stern, ACPI Devel Maling List, LKML

Am Sonntag, 7. Juni 2009 23:46:59 schrieb Rafael J. Wysocki:
> It may be necessary to resume a device synchronously, but I'm still
> thinking how to implement that.

This will absolutely be the default. You resume a device because you want
it to do something now. It seems to me that you are making your problem worse
by using a spinlock as a lock. A mutex would make it easier.

	Regards
		Oliver


^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-07 21:46   ` Run-time PM idea (was: Re: [linux-pm] " Rafael J. Wysocki
  2009-06-07 22:02     ` Run-time PM idea (was: " Oliver Neukum
  2009-06-07 22:02       ` Oliver Neukum
@ 2009-06-07 22:05     ` Oliver Neukum
  2009-06-07 22:05     ` [linux-pm] " Oliver Neukum
                       ` (2 subsequent siblings)
  5 siblings, 0 replies; 199+ messages in thread
From: Oliver Neukum @ 2009-06-07 22:05 UTC (permalink / raw)
  To: linux-pm; +Cc: ACPI Devel Maling List, LKML

Am Sonntag, 7. Juni 2009 23:46:59 schrieb Rafael J. Wysocki:
> It may be necessary to resume a device synchronously, but I'm still
> thinking how to implement that.

This will absolutely be the default. You resume a device because you want
it to do something now. It seems to me that you are making your problem worse
by using a spinlock as a lock. A mutex would make it easier.

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [RFC][PATCH 1/2] PM: Separate suspend to RAM functionality from core
  2009-06-06 22:55 ` Rafael J. Wysocki
@ 2009-06-08  6:36   ` Pavel Machek
  2009-06-08  6:36   ` Pavel Machek
  1 sibling, 0 replies; 199+ messages in thread
From: Pavel Machek @ 2009-06-08  6:36 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: pm list, LKML, ACPI Devel Maling List, Len Brown

On Sun 2009-06-07 00:55:54, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rjw@sisk.pl>
> 
> Move the suspend to RAM and standby code from kernel/power/main.c
> to two separate files, kernel/power/suspend.c containing the basic
> functions and kernel/power/suspend-test.c containing the automatic
> suspend test facility based on the RTC clock alarm.
> 
> There are no changes in functionality related to these modifications.
> 
> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

ACK.

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [RFC][PATCH 1/2] PM: Separate suspend to RAM functionality from core
  2009-06-06 22:55 ` Rafael J. Wysocki
  2009-06-08  6:36   ` Pavel Machek
@ 2009-06-08  6:36   ` Pavel Machek
  1 sibling, 0 replies; 199+ messages in thread
From: Pavel Machek @ 2009-06-08  6:36 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: ACPI Devel Maling List, pm list, LKML

On Sun 2009-06-07 00:55:54, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rjw@sisk.pl>
> 
> Move the suspend to RAM and standby code from kernel/power/main.c
> to two separate files, kernel/power/suspend.c containing the basic
> functions and kernel/power/suspend-test.c containing the automatic
> suspend test facility based on the RTC clock alarm.
> 
> There are no changes in functionality related to these modifications.
> 
> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

ACK.

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [RFC][PATCH 2/2] PM/Hibernate: Rename disk.c to hibernate.c
  2009-06-06 22:56 ` Rafael J. Wysocki
  2009-06-08  6:37   ` Pavel Machek
@ 2009-06-08  6:37   ` Pavel Machek
  1 sibling, 0 replies; 199+ messages in thread
From: Pavel Machek @ 2009-06-08  6:37 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: pm list, LKML, ACPI Devel Maling List, Len Brown

On Sun 2009-06-07 00:56:54, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rjw@sisk.pl>
> 
> Change the name of kernel/power/disk.c to kernel/power/hibernate.c
> in analogy with the file names introduced by the changes that
> separated the suspend to RAM and standby functionality from the
> common PM functions.
> 
> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

ACK.

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [RFC][PATCH 2/2] PM/Hibernate: Rename disk.c to hibernate.c
  2009-06-06 22:56 ` Rafael J. Wysocki
@ 2009-06-08  6:37   ` Pavel Machek
  2009-06-08  6:37   ` Pavel Machek
  1 sibling, 0 replies; 199+ messages in thread
From: Pavel Machek @ 2009-06-08  6:37 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: ACPI Devel Maling List, pm list, LKML

On Sun 2009-06-07 00:56:54, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rjw@sisk.pl>
> 
> Change the name of kernel/power/disk.c to kernel/power/hibernate.c
> in analogy with the file names introduced by the changes that
> separated the suspend to RAM and standby functionality from the
> common PM functions.
> 
> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

ACK.

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: Run-time PM idea (was: Re: [linux-pm] [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-07 21:46   ` Run-time PM idea (was: Re: [linux-pm] " Rafael J. Wysocki
                       ` (4 preceding siblings ...)
  2009-06-08  6:54     ` Ingo Molnar
@ 2009-06-08  6:54     ` Ingo Molnar
  2009-06-08 11:30       ` Rafael J. Wysocki
  2009-06-08 11:30       ` Rafael J. Wysocki
  5 siblings, 2 replies; 199+ messages in thread
From: Ingo Molnar @ 2009-06-08  6:54 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Alan Stern, pm list, ACPI Devel Maling List, LKML, Magnus Damm


* Rafael J. Wysocki <rjw@sisk.pl> wrote:

> +config PM_RUNTIME
> +	bool "Run-time PM core functionality"
> +	depends on PM
> +	---help---
> +	  Enable functionality allowing I/O devices to be put into energy-saving
> +	  (low power) states at run time (or autosuspended) after a specified
> +	  period of inactivity and woken up in response to a hardware-generated
> +	  wake-up event or a driver's request.
> +
> +	  Hardware support is generally required for this functionality to work
> +	  and the bus type drivers of the buses the devices are on are
> +	  responsible for the actual handling of the autosuspend requests and
> +	  wake-up events.

Hallelujah! :-)

	Ingo

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-07 21:46   ` Run-time PM idea (was: Re: [linux-pm] " Rafael J. Wysocki
                       ` (3 preceding siblings ...)
  2009-06-07 22:05     ` [linux-pm] " Oliver Neukum
@ 2009-06-08  6:54     ` Ingo Molnar
  2009-06-08  6:54     ` Run-time PM idea (was: Re: [linux-pm] " Ingo Molnar
  5 siblings, 0 replies; 199+ messages in thread
From: Ingo Molnar @ 2009-06-08  6:54 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: ACPI Devel Maling List, pm list, LKML


* Rafael J. Wysocki <rjw@sisk.pl> wrote:

> +config PM_RUNTIME
> +	bool "Run-time PM core functionality"
> +	depends on PM
> +	---help---
> +	  Enable functionality allowing I/O devices to be put into energy-saving
> +	  (low power) states at run time (or autosuspended) after a specified
> +	  period of inactivity and woken up in response to a hardware-generated
> +	  wake-up event or a driver's request.
> +
> +	  Hardware support is generally required for this functionality to work
> +	  and the bus type drivers of the buses the devices are on are
> +	  responsible for the actual handling of the autosuspend requests and
> +	  wake-up events.

Hallelujah! :-)

	Ingo

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-07 22:05     ` [linux-pm] " Oliver Neukum
  2009-06-08 11:29       ` Rafael J. Wysocki
@ 2009-06-08 11:29       ` Rafael J. Wysocki
  2009-06-08 12:04         ` Oliver Neukum
                           ` (3 more replies)
  1 sibling, 4 replies; 199+ messages in thread
From: Rafael J. Wysocki @ 2009-06-08 11:29 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: linux-pm, Alan Stern, ACPI Devel Maling List, LKML

On Monday 08 June 2009, Oliver Neukum wrote:
> Am Sonntag, 7. Juni 2009 23:46:59 schrieb Rafael J. Wysocki:
> > It may be necessary to resume a device synchronously, but I'm still
> > thinking how to implement that.
> 
> This will absolutely be the default. You resume a device because you want
> it to do something now. It seems to me that you are making your problem worse
> by using a spinlock as a lock. A mutex would make it easier.

But I need to be able to call __pm_schedule_resume() (at least) from interrupt
context and I can't use a mutex from there.  Otherwise I'd have used a mutex. :-)

Anyway, below is a version with synchronous resume.

Thanks,
Rafael

---
 drivers/base/power/Makefile  |    1 
 drivers/base/power/main.c    |    6 -
 drivers/base/power/runtime.c |  223 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/pm.h           |   36 ++++++
 include/linux/pm_runtime.h   |   82 +++++++++++++++
 kernel/power/Kconfig         |   14 ++
 kernel/power/main.c          |   17 +++
 7 files changed, 376 insertions(+), 3 deletions(-)

Index: linux-2.6/kernel/power/Kconfig
===================================================================
--- linux-2.6.orig/kernel/power/Kconfig
+++ linux-2.6/kernel/power/Kconfig
@@ -204,3 +204,17 @@ config APM_EMULATION
 	  random kernel OOPSes or reboots that don't seem to be related to
 	  anything, try disabling/enabling this option (or disabling/enabling
 	  APM in your BIOS).
+
+config PM_RUNTIME
+	bool "Run-time PM core functionality"
+	depends on PM
+	---help---
+	  Enable functionality allowing I/O devices to be put into energy-saving
+	  (low power) states at run time (or autosuspended) after a specified
+	  period of inactivity and woken up in response to a hardware-generated
+	  wake-up event or a driver's request.
+
+	  Hardware support is generally required for this functionality to work
+	  and the bus type drivers of the buses the devices are on are
+	  responsible for the actual handling of the autosuspend requests and
+	  wake-up events.
Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -11,6 +11,7 @@
 #include <linux/kobject.h>
 #include <linux/string.h>
 #include <linux/resume-trace.h>
+#include <linux/workqueue.h>
 
 #include "power.h"
 
@@ -217,8 +218,24 @@ static struct attribute_group attr_group
 	.attrs = g,
 };
 
+#ifdef CONFIG_PM_RUNTIME
+struct workqueue_struct *pm_wq;
+
+static int __init pm_start_workqueue(void)
+{
+	pm_wq = create_freezeable_workqueue("pm");
+
+	return pm_wq ? 0 : -ENOMEM;
+}
+#else
+static inline int pm_start_workqueue(void) { return 0; }
+#endif
+
 static int __init pm_init(void)
 {
+	int error = pm_start_workqueue();
+	if (error)
+		return error;
 	power_kobj = kobject_create_and_add("power", NULL);
 	if (!power_kobj)
 		return -ENOMEM;
Index: linux-2.6/include/linux/pm.h
===================================================================
--- linux-2.6.orig/include/linux/pm.h
+++ linux-2.6/include/linux/pm.h
@@ -22,6 +22,8 @@
 #define _LINUX_PM_H
 
 #include <linux/list.h>
+#include <linux/workqueue.h>
+#include <linux/spinlock.h>
 
 /*
  * Callbacks for platform drivers to implement.
@@ -165,6 +167,15 @@ typedef struct pm_message {
  * It is allowed to unregister devices while the above callbacks are being
  * executed.  However, it is not allowed to unregister a device from within any
  * of its own callbacks.
+ *
+ * There also are two callbacks related to run-time power management of devices:
+ *
+ * @autosuspend: Save the device registers and put it into an energy-saving (low
+ *	power) state at run-time, enable wake-up events as appropriate.
+ *
+ * @autoresume: Put the device into the full power state and restore its
+ *	registers (if applicable) at run time, in response to a wake-up event
+ *	generated by hardware or at a request of software.
  */
 
 struct dev_pm_ops {
@@ -182,6 +193,10 @@ struct dev_pm_ops {
 	int (*thaw_noirq)(struct device *dev);
 	int (*poweroff_noirq)(struct device *dev);
 	int (*restore_noirq)(struct device *dev);
+#ifdef CONFIG_PM_RUNTIME
+	int (*autosuspend)(struct device *dev);
+	int (*autoresume)(struct device *dev);
+#endif
 };
 
 /**
@@ -315,14 +330,31 @@ enum dpm_state {
 	DPM_OFF_IRQ,
 };
 
+enum rpm_state {
+	RPM_UNKNOWN = -1,
+	RPM_ACTIVE,
+	RPM_IDLE,
+	RPM_SUSPENDING,
+	RPM_SUSPENDED,
+};
+
 struct dev_pm_info {
 	pm_message_t		power_state;
-	unsigned		can_wakeup:1;
-	unsigned		should_wakeup:1;
+	unsigned int		can_wakeup:1;
+	unsigned int		should_wakeup:1;
 	enum dpm_state		status;		/* Owned by the PM core */
 #ifdef	CONFIG_PM_SLEEP
 	struct list_head	entry;
 #endif
+#ifdef	CONFIG_PM_RUNTIME
+	struct delayed_work	suspend_work;
+	struct work_struct	resume_work;
+	unsigned int		suspend_autocancel:1;
+	unsigned int		resume_autocancel:1;
+	unsigned int		suspend_aborted:1;
+	enum rpm_state		runtime_status;
+	spinlock_t		lock;
+#endif
 };
 
 /*
Index: linux-2.6/drivers/base/power/Makefile
===================================================================
--- linux-2.6.orig/drivers/base/power/Makefile
+++ linux-2.6/drivers/base/power/Makefile
@@ -1,5 +1,6 @@
 obj-$(CONFIG_PM)	+= sysfs.o
 obj-$(CONFIG_PM_SLEEP)	+= main.o
+obj-$(CONFIG_PM_RUNTIME)	+= runtime.o
 obj-$(CONFIG_PM_TRACE_RTC)	+= trace.o
 
 ccflags-$(CONFIG_DEBUG_DRIVER) := -DDEBUG
Index: linux-2.6/drivers/base/power/runtime.c
===================================================================
--- /dev/null
+++ linux-2.6/drivers/base/power/runtime.c
@@ -0,0 +1,223 @@
+/*
+ * drivers/base/power/runtime.c - Helper functions for device run-time PM
+ *
+ * Copyright (c) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
+ *
+ * This file is released under the GPLv2.
+ */
+
+#include <linux/pm_runtime.h>
+
+/**
+ * pm_runtime_reset - Clear all of the device run-time PM flags.
+ * @dev: Device object to clear the flags for.
+ */
+static void pm_runtime_reset(struct device *dev)
+{
+	dev->power.resume_autocancel = false;
+	dev->power.suspend_autocancel = false;
+	dev->power.suspend_aborted = false;
+	dev->power.runtime_status = RPM_ACTIVE;
+}
+
+/**
+ * pm_autosuspend - Run autosuspend callback of given device object's bus type.
+ * @work: Work structure used for scheduling the execution of this function.
+ *
+ * Use @work to get the device object the suspend has been scheduled for,
+ * check if the suspend request hasn't been cancelled and run the
+ * ->autosuspend() callback from the device's bus type driver.  Update the
+ * run-time PM flags in the device object to reflect the current status of the
+ * device.
+ */
+static void pm_autosuspend(struct work_struct *work)
+{
+	struct delayed_work *dw = to_delayed_work(work);
+	struct device *dev = suspend_work_to_device(dw);
+	int error = 0;
+
+	pm_lock_device(dev);
+	if (dev->power.suspend_aborted) {
+		dev->power.runtime_status = RPM_ACTIVE;
+		goto out;
+	}
+	dev->power.suspend_autocancel = false;
+	dev->power.runtime_status = RPM_SUSPENDING;
+	pm_unlock_device(dev);
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->autosuspend)
+		error = dev->bus->pm->autosuspend(dev);
+
+	pm_lock_device(dev);
+	dev->power.runtime_status = error ? RPM_UNKNOWN : RPM_SUSPENDED;
+ out:
+	pm_unlock_device(dev);
+}
+
+/**
+ * __pm_schedule_suspend - Schedule run-time suspend of given device.
+ * @dev: Device to suspend.
+ * @delay: Time to wait before attempting to suspend the device.
+ * @autocancel: If set, the request will be cancelled during a resume from a
+ *	system-wide sleep state if it happens before @delay elapses.
+ */
+void __pm_schedule_suspend(struct device *dev, unsigned long delay,
+			   bool autocancel)
+{
+	pm_lock_device(dev);
+	if (dev->power.runtime_status != RPM_ACTIVE)
+		goto out;
+	dev->power.suspend_autocancel = autocancel;
+	dev->power.suspend_aborted = false;
+	dev->power.runtime_status = RPM_IDLE;
+	queue_delayed_work(pm_wq, &dev->power.suspend_work, delay);
+ out:
+	pm_unlock_device(dev);
+}
+
+/**
+ * pm_autoresume - Run autoresume callback of given device object's bus type.
+ * @work: Work structure used for scheduling the execution of this function.
+ *
+ * Use @work to get the device object the resume has been scheduled for,
+ * check if the device is really suspended and run the ->autoresume() callback
+ * from the device's bus type driver.  Update the run-time PM flags in the
+ * device object to reflect the current status of the device.
+ */
+static void pm_autoresume(struct work_struct *work)
+{
+	struct device *dev = resume_work_to_device(work);
+	int error = 0;
+
+	pm_lock_device(dev);
+	dev->power.resume_autocancel = false;
+	if (dev->power.runtime_status != RPM_SUSPENDED)
+		goto out;
+	pm_unlock_device(dev);
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->autoresume)
+		error = dev->bus->pm->autoresume(dev);
+
+	pm_lock_device(dev);
+	dev->power.runtime_status = error ? RPM_UNKNOWN : RPM_ACTIVE;
+ out:
+	pm_unlock_device(dev);
+}
+
+/**
+ * pm_cancel_suspend - Cancel a pending suspend request for given device.
+ * @dev: Device to cancel the suspend request for.
+ *
+ * Should be called under pm_lock_device() and only if we are sure that the
+ * ->autosuspend() callback hasn't started yet.
+ */
+static void pm_cancel_suspend(struct device *dev)
+{
+	dev->power.suspend_autocancel = false;
+	dev->power.suspend_aborted = true;
+	cancel_delayed_work(&dev->power.suspend_work);
+	dev->power.runtime_status = RPM_ACTIVE;
+}
+
+/**
+ * __pm_schedule_resume - Schedule run-time resume of given device.
+ * @dev: Device to resume.
+ * @autocancel: If set, the request will be cancelled during a resume from a
+ *	system-wide sleep state if it happens before pm_autoresume() can be run.
+ */
+void __pm_schedule_resume(struct device *dev, bool autocancel)
+{
+	pm_lock_device(dev);
+	if (dev->power.runtime_status == RPM_IDLE) {
+		/* ->autosuspend() hasn't started yet, no need to resume. */
+		pm_cancel_suspend(dev);
+	} else if (dev->power.runtime_status != RPM_ACTIVE) {
+		dev->power.resume_autocancel = autocancel;
+		queue_work(pm_wq, &dev->power.resume_work);
+	}
+	pm_unlock_device(dev);
+}
+
+/**
+ * pm_resume_sync - Resume given device waiting for the operation to complete.
+ * @dev: Device to resume.
+ *
+ * Resume the device synchronously, waiting for the operation to complete.  If
+ * autosuspend is in progress while this function is being run, wait for it to
+ * finish before resuming the device.  If the autosuspend is scheduled, but it
+ * hasn't started yet, cancel it and we're done.
+ */
+int pm_resume_sync(struct device *dev)
+{
+	int error = 0;
+
+	pm_lock_device(dev);
+	if (dev->power.runtime_status == RPM_IDLE) {
+		/* ->autosuspend() hasn't started yet, no need to resume. */
+		pm_cancel_suspend(dev);
+		goto out;
+	}
+
+	if (dev->power.runtime_status == RPM_SUSPENDING) {
+		/*
+		 * The ->autosuspend() callback is being executed right now,
+		 * wait for it to complete.
+		 */
+		pm_unlock_device(dev);
+		cancel_delayed_work_sync(&dev->power.suspend_work);
+		pm_lock_device(dev);
+	}
+
+	if (dev->power.runtime_status != RPM_SUSPENDED) {
+		error = -EINVAL;
+		goto out;
+	}
+	pm_unlock_device(dev);
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->autoresume)
+		error = dev->bus->pm->autoresume(dev);
+
+	pm_lock_device(dev);
+	dev->power.runtime_status = error ? RPM_UNKNOWN : RPM_ACTIVE;
+ out:
+	pm_unlock_device(dev);
+
+	return error;
+}
+
+/**
+ * pm_runtime_autocancel - Cancel run-time PM requests during system resume.
+ * @dev: Device to handle.
+ *
+ * If dev->power.suspend_autocancel is set during resume from a system sleep
+ * state, there is a run-time suspend request pending that has to be cancelled,
+ * so cancel it, and analogously for pending run-time resume requests.
+ *
+ * This function is only called by the PM core and must not be used by bus types
+ * and device drivers.  Moreover, it is called when the workqueue is frozen, so
+ * it is guaranteed that the autosuspend callbacks are not running at that time.
+ */
+void pm_runtime_autocancel(struct device *dev)
+{
+	pm_lock_device(dev);
+	if (dev->power.suspend_autocancel) {
+		cancel_delayed_work(&dev->power.suspend_work);
+		pm_runtime_reset(dev);
+	} else if (dev->power.resume_autocancel) {
+		work_clear_pending(&dev->power.resume_work);
+		pm_runtime_reset(dev);
+	}
+	pm_unlock_device(dev);
+}
+
+/**
+ * pm_runtime_init - Initialize run-time PM fields in given device object.
+ * @dev: Device object to handle.
+ */
+void pm_runtime_init(struct device *dev)
+{
+	pm_runtime_reset(dev);
+	spin_lock_init(&dev->power.lock);
+	INIT_DELAYED_WORK(&dev->power.suspend_work, pm_autosuspend);
+	INIT_WORK(&dev->power.resume_work, pm_autoresume);
+}
Index: linux-2.6/include/linux/pm_runtime.h
===================================================================
--- /dev/null
+++ linux-2.6/include/linux/pm_runtime.h
@@ -0,0 +1,82 @@
+/*
+ * pm_runtime.h - Device run-time power management helper functions.
+ *
+ * Copyright (C) 2009 Rafael J. Wysocki <rjw@sisk.pl>
+ *
+ * This file is released under the GPLv2.
+ */
+
+#ifndef _LINUX_PM_RUNTIME_H
+#define _LINUX_PM_RUNTIME_H
+
+#include <linux/device.h>
+#include <linux/pm.h>
+
+#ifdef CONFIG_PM_RUNTIME
+extern struct workqueue_struct *pm_wq;
+
+extern void pm_runtime_init(struct device *dev);
+extern void __pm_schedule_suspend(struct device *dev, unsigned long delay,
+				   bool autocancel);
+extern void __pm_schedule_resume(struct device *dev, bool autocancel);
+extern void pm_runtime_autocancel(struct device *dev);
+
+static inline struct device *suspend_work_to_device(struct delayed_work *work)
+{
+	struct dev_pm_info *dpi;
+
+	dpi = container_of(work, struct dev_pm_info, suspend_work);
+	return container_of(dpi, struct device, power);
+}
+
+static inline struct device *resume_work_to_device(struct work_struct *work)
+{
+	struct dev_pm_info *dpi;
+
+	dpi = container_of(work, struct dev_pm_info, resume_work);
+	return container_of(dpi, struct device, power);
+}
+
+static inline void pm_lock_device(struct device *dev)
+{
+	spin_lock(&dev->power.lock);
+}
+
+static inline void pm_unlock_device(struct device *dev)
+{
+	spin_unlock(&dev->power.lock);
+}
+#else /* !CONFIG_PM_RUNTIME */
+static inline void pm_runtime_init(struct device *dev) {}
+static inline void __pm_schedule_suspend(struct device *dev,
+					  unsigned long delay,
+					  bool autocancel) {}
+static inline void __pm_schedule_resume(struct device *dev, bool autocancel) {}
+static inline void pm_runtime_autocancel(struct device *dev) {}
+
+static inline void pm_lock_device(struct device *dev) {}
+static inline void pm_unlock_device(struct device *dev) {}
+#endif /* !CONFIG_PM_RUNTIME */
+
+static inline void pm_schedule_suspend(struct device *dev, unsigned long delay)
+{
+	__pm_schedule_suspend(dev, delay, false);
+}
+
+static inline void pm_schedule_suspend_autocancel(struct device *dev,
+						   unsigned long delay)
+{
+	__pm_schedule_suspend(dev, delay, true);
+}
+
+static inline void pm_schedule_resume(struct device *dev)
+{
+	__pm_schedule_resume(dev, false);
+}
+
+static inline void pm_schedule_resume_autocancel(struct device *dev)
+{
+	__pm_schedule_resume(dev, true);
+}
+
+#endif
Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -21,6 +21,7 @@
 #include <linux/kallsyms.h>
 #include <linux/mutex.h>
 #include <linux/pm.h>
+#include <linux/pm_runtime.h>
 #include <linux/resume-trace.h>
 #include <linux/rwsem.h>
 #include <linux/interrupt.h>
@@ -88,6 +89,7 @@ void device_pm_add(struct device *dev)
 	}
 
 	list_add_tail(&dev->power.entry, &dpm_list);
+	pm_runtime_init(dev);
 	mutex_unlock(&dpm_list_mtx);
 }
 
@@ -355,7 +357,7 @@ void dpm_resume_noirq(pm_message_t state
 	struct device *dev;
 
 	mutex_lock(&dpm_list_mtx);
-	list_for_each_entry(dev, &dpm_list, power.entry)
+	list_for_each_entry(dev, &dpm_list, power.entry) {
 		if (dev->power.status > DPM_OFF) {
 			int error;
 
@@ -364,6 +366,8 @@ void dpm_resume_noirq(pm_message_t state
 			if (error)
 				pm_dev_err(dev, state, " early", error);
 		}
+		pm_runtime_autocancel(dev);
+	}
 	mutex_unlock(&dpm_list_mtx);
 	resume_device_irqs();
 }

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-07 22:05     ` [linux-pm] " Oliver Neukum
@ 2009-06-08 11:29       ` Rafael J. Wysocki
  2009-06-08 11:29       ` [linux-pm] " Rafael J. Wysocki
  1 sibling, 0 replies; 199+ messages in thread
From: Rafael J. Wysocki @ 2009-06-08 11:29 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: ACPI Devel Maling List, linux-pm, LKML

On Monday 08 June 2009, Oliver Neukum wrote:
> Am Sonntag, 7. Juni 2009 23:46:59 schrieb Rafael J. Wysocki:
> > It may be necessary to resume a device synchronously, but I'm still
> > thinking how to implement that.
> 
> This will absolutely be the default. You resume a device because you want
> it to do something now. It seems to me that you are making your problem worse
> by using a spinlock as a lock. A mutex would make it easier.

But I need to be able to call __pm_schedule_resume() (at least) from interrupt
context and I can't use a mutex from there.  Otherwise I'd have used a mutex. :-)

Anyway, below is a version with synchronous resume.

Thanks,
Rafael

---
 drivers/base/power/Makefile  |    1 
 drivers/base/power/main.c    |    6 -
 drivers/base/power/runtime.c |  223 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/pm.h           |   36 ++++++
 include/linux/pm_runtime.h   |   82 +++++++++++++++
 kernel/power/Kconfig         |   14 ++
 kernel/power/main.c          |   17 +++
 7 files changed, 376 insertions(+), 3 deletions(-)

Index: linux-2.6/kernel/power/Kconfig
===================================================================
--- linux-2.6.orig/kernel/power/Kconfig
+++ linux-2.6/kernel/power/Kconfig
@@ -204,3 +204,17 @@ config APM_EMULATION
 	  random kernel OOPSes or reboots that don't seem to be related to
 	  anything, try disabling/enabling this option (or disabling/enabling
 	  APM in your BIOS).
+
+config PM_RUNTIME
+	bool "Run-time PM core functionality"
+	depends on PM
+	---help---
+	  Enable functionality allowing I/O devices to be put into energy-saving
+	  (low power) states at run time (or autosuspended) after a specified
+	  period of inactivity and woken up in response to a hardware-generated
+	  wake-up event or a driver's request.
+
+	  Hardware support is generally required for this functionality to work
+	  and the bus type drivers of the buses the devices are on are
+	  responsible for the actual handling of the autosuspend requests and
+	  wake-up events.
Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -11,6 +11,7 @@
 #include <linux/kobject.h>
 #include <linux/string.h>
 #include <linux/resume-trace.h>
+#include <linux/workqueue.h>
 
 #include "power.h"
 
@@ -217,8 +218,24 @@ static struct attribute_group attr_group
 	.attrs = g,
 };
 
+#ifdef CONFIG_PM_RUNTIME
+struct workqueue_struct *pm_wq;
+
+static int __init pm_start_workqueue(void)
+{
+	pm_wq = create_freezeable_workqueue("pm");
+
+	return pm_wq ? 0 : -ENOMEM;
+}
+#else
+static inline int pm_start_workqueue(void) { return 0; }
+#endif
+
 static int __init pm_init(void)
 {
+	int error = pm_start_workqueue();
+	if (error)
+		return error;
 	power_kobj = kobject_create_and_add("power", NULL);
 	if (!power_kobj)
 		return -ENOMEM;
Index: linux-2.6/include/linux/pm.h
===================================================================
--- linux-2.6.orig/include/linux/pm.h
+++ linux-2.6/include/linux/pm.h
@@ -22,6 +22,8 @@
 #define _LINUX_PM_H
 
 #include <linux/list.h>
+#include <linux/workqueue.h>
+#include <linux/spinlock.h>
 
 /*
  * Callbacks for platform drivers to implement.
@@ -165,6 +167,15 @@ typedef struct pm_message {
  * It is allowed to unregister devices while the above callbacks are being
  * executed.  However, it is not allowed to unregister a device from within any
  * of its own callbacks.
+ *
+ * There also are two callbacks related to run-time power management of devices:
+ *
+ * @autosuspend: Save the device registers and put it into an energy-saving (low
+ *	power) state at run-time, enable wake-up events as appropriate.
+ *
+ * @autoresume: Put the device into the full power state and restore its
+ *	registers (if applicable) at run time, in response to a wake-up event
+ *	generated by hardware or at a request of software.
  */
 
 struct dev_pm_ops {
@@ -182,6 +193,10 @@ struct dev_pm_ops {
 	int (*thaw_noirq)(struct device *dev);
 	int (*poweroff_noirq)(struct device *dev);
 	int (*restore_noirq)(struct device *dev);
+#ifdef CONFIG_PM_RUNTIME
+	int (*autosuspend)(struct device *dev);
+	int (*autoresume)(struct device *dev);
+#endif
 };
 
 /**
@@ -315,14 +330,31 @@ enum dpm_state {
 	DPM_OFF_IRQ,
 };
 
+enum rpm_state {
+	RPM_UNKNOWN = -1,
+	RPM_ACTIVE,
+	RPM_IDLE,
+	RPM_SUSPENDING,
+	RPM_SUSPENDED,
+};
+
 struct dev_pm_info {
 	pm_message_t		power_state;
-	unsigned		can_wakeup:1;
-	unsigned		should_wakeup:1;
+	unsigned int		can_wakeup:1;
+	unsigned int		should_wakeup:1;
 	enum dpm_state		status;		/* Owned by the PM core */
 #ifdef	CONFIG_PM_SLEEP
 	struct list_head	entry;
 #endif
+#ifdef	CONFIG_PM_RUNTIME
+	struct delayed_work	suspend_work;
+	struct work_struct	resume_work;
+	unsigned int		suspend_autocancel:1;
+	unsigned int		resume_autocancel:1;
+	unsigned int		suspend_aborted:1;
+	enum rpm_state		runtime_status;
+	spinlock_t		lock;
+#endif
 };
 
 /*
Index: linux-2.6/drivers/base/power/Makefile
===================================================================
--- linux-2.6.orig/drivers/base/power/Makefile
+++ linux-2.6/drivers/base/power/Makefile
@@ -1,5 +1,6 @@
 obj-$(CONFIG_PM)	+= sysfs.o
 obj-$(CONFIG_PM_SLEEP)	+= main.o
+obj-$(CONFIG_PM_RUNTIME)	+= runtime.o
 obj-$(CONFIG_PM_TRACE_RTC)	+= trace.o
 
 ccflags-$(CONFIG_DEBUG_DRIVER) := -DDEBUG
Index: linux-2.6/drivers/base/power/runtime.c
===================================================================
--- /dev/null
+++ linux-2.6/drivers/base/power/runtime.c
@@ -0,0 +1,223 @@
+/*
+ * drivers/base/power/runtime.c - Helper functions for device run-time PM
+ *
+ * Copyright (c) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
+ *
+ * This file is released under the GPLv2.
+ */
+
+#include <linux/pm_runtime.h>
+
+/**
+ * pm_runtime_reset - Clear all of the device run-time PM flags.
+ * @dev: Device object to clear the flags for.
+ */
+static void pm_runtime_reset(struct device *dev)
+{
+	dev->power.resume_autocancel = false;
+	dev->power.suspend_autocancel = false;
+	dev->power.suspend_aborted = false;
+	dev->power.runtime_status = RPM_ACTIVE;
+}
+
+/**
+ * pm_autosuspend - Run autosuspend callback of given device object's bus type.
+ * @work: Work structure used for scheduling the execution of this function.
+ *
+ * Use @work to get the device object the suspend has been scheduled for,
+ * check if the suspend request hasn't been cancelled and run the
+ * ->autosuspend() callback from the device's bus type driver.  Update the
+ * run-time PM flags in the device object to reflect the current status of the
+ * device.
+ */
+static void pm_autosuspend(struct work_struct *work)
+{
+	struct delayed_work *dw = to_delayed_work(work);
+	struct device *dev = suspend_work_to_device(dw);
+	int error = 0;
+
+	pm_lock_device(dev);
+	if (dev->power.suspend_aborted) {
+		dev->power.runtime_status = RPM_ACTIVE;
+		goto out;
+	}
+	dev->power.suspend_autocancel = false;
+	dev->power.runtime_status = RPM_SUSPENDING;
+	pm_unlock_device(dev);
+
+	if (dev && dev->bus && dev->bus->pm && dev->bus->pm->autosuspend)
+		error = dev->bus->pm->autosuspend(dev);
+
+	pm_lock_device(dev);
+	dev->power.runtime_status = error ? RPM_UNKNOWN : RPM_SUSPENDED;
+ out:
+	pm_unlock_device(dev);
+}
+
+/**
+ * __pm_schedule_suspend - Schedule run-time suspend of given device.
+ * @dev: Device to suspend.
+ * @delay: Time to wait before attempting to suspend the device.
+ * @autocancel: If set, the request will be cancelled during a resume from a
+ *	system-wide sleep state if it happens before @delay elapses.
+ */
+void __pm_schedule_suspend(struct device *dev, unsigned long delay,
+			   bool autocancel)
+{
+	pm_lock_device(dev);
+	if (dev->power.runtime_status != RPM_ACTIVE)
+		goto out;
+	dev->power.suspend_autocancel = autocancel;
+	dev->power.suspend_aborted = false;
+	dev->power.runtime_status = RPM_IDLE;
+	queue_delayed_work(pm_wq, &dev->power.suspend_work, delay);
+ out:
+	pm_unlock_device(dev);
+}
+
+/**
+ * pm_autoresume - Run autoresume callback of given device object's bus type.
+ * @work: Work structure used for scheduling the execution of this function.
+ *
+ * Use @work to get the device object the resume has been scheduled for,
+ * check if the device is really suspended and run the ->autoresume() callback
+ * from the device's bus type driver.  Update the run-time PM flags in the
+ * device object to reflect the current status of the device.
+ */
+static void pm_autoresume(struct work_struct *work)
+{
+	struct device *dev = resume_work_to_device(work);
+	int error = 0;
+
+	pm_lock_device(dev);
+	dev->power.resume_autocancel = false;
+	if (dev->power.runtime_status != RPM_SUSPENDED)
+		goto out;
+	pm_unlock_device(dev);
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->autoresume)
+		error = dev->bus->pm->autoresume(dev);
+
+	pm_lock_device(dev);
+	dev->power.runtime_status = error ? RPM_UNKNOWN : RPM_ACTIVE;
+ out:
+	pm_unlock_device(dev);
+}
+
+/**
+ * pm_cancel_suspend - Cancel a pending suspend request for given device.
+ * @dev: Device to cancel the suspend request for.
+ *
+ * Should be called under pm_lock_device() and only if we are sure that the
+ * ->autosuspend() callback hasn't started yet.
+ */
+static void pm_cancel_suspend(struct device *dev)
+{
+	dev->power.suspend_autocancel = false;
+	dev->power.suspend_aborted = true;
+	cancel_delayed_work(&dev->power.suspend_work);
+	dev->power.runtime_status = RPM_ACTIVE;
+}
+
+/**
+ * __pm_schedule_resume - Schedule run-time resume of given device.
+ * @dev: Device to resume.
+ * @autocancel: If set, the request will be cancelled during a resume from a
+ *	system-wide sleep state if it happens before pm_autoresume() can be run.
+ */
+void __pm_schedule_resume(struct device *dev, bool autocancel)
+{
+	pm_lock_device(dev);
+	if (dev->power.runtime_status == RPM_IDLE) {
+		/* ->autosuspend() hasn't started yet, no need to resume. */
+		pm_cancel_suspend(dev);
+	} else if (dev->power.runtime_status != RPM_ACTIVE) {
+		dev->power.resume_autocancel = autocancel;
+		queue_work(pm_wq, &dev->power.resume_work);
+	}
+	pm_unlock_device(dev);
+}
+
+/**
+ * pm_resume_sync - Resume given device waiting for the operation to complete.
+ * @dev: Device to resume.
+ *
+ * Resume the device synchronously, waiting for the operation to complete.  If
+ * autosuspend is in progress while this function is being run, wait for it to
+ * finish before resuming the device.  If the autosuspend is scheduled, but it
+ * hasn't started yet, cancel it and we're done.
+ */
+int pm_resume_sync(struct device *dev)
+{
+	int error = 0;
+
+	pm_lock_device(dev);
+	if (dev->power.runtime_status == RPM_IDLE) {
+		/* ->autosuspend() hasn't started yet, no need to resume. */
+		pm_cancel_suspend(dev);
+		goto out;
+	}
+
+	if (dev->power.runtime_status == RPM_SUSPENDING) {
+		/*
+		 * The ->autosuspend() callback is being executed right now,
+		 * wait for it to complete.
+		 */
+		pm_unlock_device(dev);
+		cancel_delayed_work_sync(&dev->power.suspend_work);
+		pm_lock_device(dev);
+	}
+
+	if (dev->power.runtime_status != RPM_SUSPENDED) {
+		error = -EINVAL;
+		goto out;
+	}
+	pm_unlock_device(dev);
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->autoresume)
+		error = dev->bus->pm->autoresume(dev);
+
+	pm_lock_device(dev);
+	dev->power.runtime_status = error ? RPM_UNKNOWN : RPM_ACTIVE;
+ out:
+	pm_unlock_device(dev);
+
+	return error;
+}
+
+/**
+ * pm_runtime_autocancel - Cancel run-time PM requests during system resume.
+ * @dev: Device to handle.
+ *
+ * If dev->power.suspend_autocancel is set during resume from a system sleep
+ * state, there is a run-time suspend request pending that has to be cancelled,
+ * so cancel it, and analogously for pending run-time resume requests.
+ *
+ * This function is only called by the PM core and must not be used by bus types
+ * and device drivers.  Moreover, it is called when the workqueue is frozen, so
+ * it is guaranteed that the autosuspend callbacks are not running at that time.
+ */
+void pm_runtime_autocancel(struct device *dev)
+{
+	pm_lock_device(dev);
+	if (dev->power.suspend_autocancel) {
+		cancel_delayed_work(&dev->power.suspend_work);
+		pm_runtime_reset(dev);
+	} else if (dev->power.resume_autocancel) {
+		work_clear_pending(&dev->power.resume_work);
+		pm_runtime_reset(dev);
+	}
+	pm_unlock_device(dev);
+}
+
+/**
+ * pm_runtime_init - Initialize run-time PM fields in given device object.
+ * @dev: Device object to handle.
+ */
+void pm_runtime_init(struct device *dev)
+{
+	pm_runtime_reset(dev);
+	spin_lock_init(&dev->power.lock);
+	INIT_DELAYED_WORK(&dev->power.suspend_work, pm_autosuspend);
+	INIT_WORK(&dev->power.resume_work, pm_autoresume);
+}
Index: linux-2.6/include/linux/pm_runtime.h
===================================================================
--- /dev/null
+++ linux-2.6/include/linux/pm_runtime.h
@@ -0,0 +1,82 @@
+/*
+ * pm_runtime.h - Device run-time power management helper functions.
+ *
+ * Copyright (C) 2009 Rafael J. Wysocki <rjw@sisk.pl>
+ *
+ * This file is released under the GPLv2.
+ */
+
+#ifndef _LINUX_PM_RUNTIME_H
+#define _LINUX_PM_RUNTIME_H
+
+#include <linux/device.h>
+#include <linux/pm.h>
+
+#ifdef CONFIG_PM_RUNTIME
+extern struct workqueue_struct *pm_wq;
+
+extern void pm_runtime_init(struct device *dev);
+extern void __pm_schedule_suspend(struct device *dev, unsigned long delay,
+				   bool autocancel);
+extern void __pm_schedule_resume(struct device *dev, bool autocancel);
+extern void pm_runtime_autocancel(struct device *dev);
+
+static inline struct device *suspend_work_to_device(struct delayed_work *work)
+{
+	struct dev_pm_info *dpi;
+
+	dpi = container_of(work, struct dev_pm_info, suspend_work);
+	return container_of(dpi, struct device, power);
+}
+
+static inline struct device *resume_work_to_device(struct work_struct *work)
+{
+	struct dev_pm_info *dpi;
+
+	dpi = container_of(work, struct dev_pm_info, resume_work);
+	return container_of(dpi, struct device, power);
+}
+
+static inline void pm_lock_device(struct device *dev)
+{
+	spin_lock(&dev->power.lock);
+}
+
+static inline void pm_unlock_device(struct device *dev)
+{
+	spin_unlock(&dev->power.lock);
+}
+#else /* !CONFIG_PM_RUNTIME */
+static inline void pm_runtime_init(struct device *dev) {}
+static inline void __pm_schedule_suspend(struct device *dev,
+					  unsigned long delay,
+					  bool autocancel) {}
+static inline void __pm_schedule_resume(struct device *dev, bool autocancel) {}
+static inline void pm_runtime_autocancel(struct device *dev) {}
+
+static inline void pm_lock_device(struct device *dev) {}
+static inline void pm_unlock_device(struct device *dev) {}
+#endif /* !CONFIG_PM_RUNTIME */
+
+static inline void pm_schedule_suspend(struct device *dev, unsigned long delay)
+{
+	__pm_schedule_suspend(dev, delay, false);
+}
+
+static inline void pm_schedule_suspend_autocancel(struct device *dev,
+						   unsigned long delay)
+{
+	__pm_schedule_suspend(dev, delay, true);
+}
+
+static inline void pm_schedule_resume(struct device *dev)
+{
+	__pm_schedule_resume(dev, false);
+}
+
+static inline void pm_schedule_resume_autocancel(struct device *dev)
+{
+	__pm_schedule_resume(dev, true);
+}
+
+#endif
Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -21,6 +21,7 @@
 #include <linux/kallsyms.h>
 #include <linux/mutex.h>
 #include <linux/pm.h>
+#include <linux/pm_runtime.h>
 #include <linux/resume-trace.h>
 #include <linux/rwsem.h>
 #include <linux/interrupt.h>
@@ -88,6 +89,7 @@ void device_pm_add(struct device *dev)
 	}
 
 	list_add_tail(&dev->power.entry, &dpm_list);
+	pm_runtime_init(dev);
 	mutex_unlock(&dpm_list_mtx);
 }
 
@@ -355,7 +357,7 @@ void dpm_resume_noirq(pm_message_t state
 	struct device *dev;
 
 	mutex_lock(&dpm_list_mtx);
-	list_for_each_entry(dev, &dpm_list, power.entry)
+	list_for_each_entry(dev, &dpm_list, power.entry) {
 		if (dev->power.status > DPM_OFF) {
 			int error;
 
@@ -364,6 +366,8 @@ void dpm_resume_noirq(pm_message_t state
 			if (error)
 				pm_dev_err(dev, state, " early", error);
 		}
+		pm_runtime_autocancel(dev);
+	}
 	mutex_unlock(&dpm_list_mtx);
 	resume_device_irqs();
 }
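
The status transitions implemented by the patch above can be sketched as a small single-threaded user-space model. This is only an illustration under stated assumptions: struct model_dev and the model_*() functions are hypothetical names, the cb_error arguments stand in for the return values of the bus type's ->autosuspend() and ->autoresume() callbacks, and the locking and workqueue handling of the real code are left out.

```c
#include <stdbool.h>

/* Mirrors enum rpm_state from the patch. */
enum rpm_state {
	RPM_UNKNOWN = -1,
	RPM_ACTIVE,
	RPM_IDLE,
	RPM_SUSPENDING,
	RPM_SUSPENDED,
};

/* Hypothetical stand-in for the run-time PM fields of struct dev_pm_info. */
struct model_dev {
	enum rpm_state status;
	bool suspend_aborted;
	bool resume_queued;
};

/* Models __pm_schedule_suspend(): only an active device can become idle. */
static void model_schedule_suspend(struct model_dev *d)
{
	if (d->status != RPM_ACTIVE)
		return;
	d->suspend_aborted = false;
	d->status = RPM_IDLE;
}

/* Models pm_autosuspend(): honour an abort, otherwise run the callback. */
static void model_run_suspend(struct model_dev *d, int cb_error)
{
	if (d->suspend_aborted) {
		d->status = RPM_ACTIVE;
		return;
	}
	d->status = RPM_SUSPENDING;	/* transient while the callback runs */
	d->status = cb_error ? RPM_UNKNOWN : RPM_SUSPENDED;
}

/* Models __pm_schedule_resume(): cancel a pending suspend or queue a resume. */
static void model_schedule_resume(struct model_dev *d)
{
	if (d->status == RPM_IDLE) {
		d->suspend_aborted = true;
		d->status = RPM_ACTIVE;
	} else if (d->status != RPM_ACTIVE) {
		d->resume_queued = true;
	}
}

/* Models pm_autoresume(): only a suspended device is actually resumed. */
static void model_run_resume(struct model_dev *d, int cb_error)
{
	d->resume_queued = false;
	if (d->status != RPM_SUSPENDED)
		return;
	d->status = cb_error ? RPM_UNKNOWN : RPM_ACTIVE;
}
```

In this model a resume request that arrives while the device is still RPM_IDLE never reaches the resume callback; it only cancels the pending suspend, which matches the comment in __pm_schedule_resume().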

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: Run-time PM idea (was: Re: [linux-pm] [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-08  6:54     ` Run-time PM idea (was: Re: [linux-pm] " Ingo Molnar
@ 2009-06-08 11:30       ` Rafael J. Wysocki
  2009-06-08 13:05         ` Ingo Molnar
  2009-06-08 13:05         ` Ingo Molnar
  2009-06-08 11:30       ` Rafael J. Wysocki
  1 sibling, 2 replies; 199+ messages in thread
From: Rafael J. Wysocki @ 2009-06-08 11:30 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Alan Stern, pm list, ACPI Devel Maling List, LKML, Magnus Damm

On Monday 08 June 2009, Ingo Molnar wrote:
> 
> * Rafael J. Wysocki <rjw@sisk.pl> wrote:
> 
> > +config PM_RUNTIME
> > +	bool "Run-time PM core functionality"
> > +	depends on PM
> > +	---help---
> > +	  Enable functionality allowing I/O devices to be put into energy-saving
> > +	  (low power) states at run time (or autosuspended) after a specified
> > +	  period of inactivity and woken up in response to a hardware-generated
> > +	  wake-up event or a driver's request.
> > +
> > +	  Hardware support is generally required for this functionality to work
> > +	  and the bus type drivers of the buses the devices are on are
> > +	  responsible for the actual handling of the autosuspend requests and
> > +	  wake-up events.
> 
> Halleluya! :-)

I guess this means you like the general idea. ;-)

Well, we've been discussing it for quite a while and since more and more people
are interested, I'm giving it a high priority.

Best,
Rafael

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-08  6:54     ` Run-time PM idea (was: Re: [linux-pm] " Ingo Molnar
  2009-06-08 11:30       ` Rafael J. Wysocki
@ 2009-06-08 11:30       ` Rafael J. Wysocki
  1 sibling, 0 replies; 199+ messages in thread
From: Rafael J. Wysocki @ 2009-06-08 11:30 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: ACPI Devel Maling List, pm list, LKML

On Monday 08 June 2009, Ingo Molnar wrote:
> 
> * Rafael J. Wysocki <rjw@sisk.pl> wrote:
> 
> > +config PM_RUNTIME
> > +	bool "Run-time PM core functionality"
> > +	depends on PM
> > +	---help---
> > +	  Enable functionality allowing I/O devices to be put into energy-saving
> > +	  (low power) states at run time (or autosuspended) after a specified
> > +	  period of inactivity and woken up in response to a hardware-generated
> > +	  wake-up event or a driver's request.
> > +
> > +	  Hardware support is generally required for this functionality to work
> > +	  and the bus type drivers of the buses the devices are on are
> > +	  responsible for the actual handling of the autosuspend requests and
> > +	  wake-up events.
> 
> Halleluya! :-)

I guess this means you like the general idea. ;-)

Well, we've been discussing it for quite a while and since more and more people
are interested, I'm giving it a high priority.

Best,
Rafael

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-08 11:29       ` [linux-pm] " Rafael J. Wysocki
@ 2009-06-08 12:04         ` Oliver Neukum
  2009-06-08 18:34           ` Rafael J. Wysocki
  2009-06-08 18:34           ` Rafael J. Wysocki
  2009-06-08 12:04         ` Oliver Neukum
                           ` (2 subsequent siblings)
  3 siblings, 2 replies; 199+ messages in thread
From: Oliver Neukum @ 2009-06-08 12:04 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: linux-pm, Alan Stern, ACPI Devel Maling List, LKML

Am Montag, 8. Juni 2009 13:29:26 schrieb Rafael J. Wysocki:

> But I need to be able to call __pm_schedule_resume() (at least) from
> interrupt context and I can't use a mutex from there.  Otherwise I'd have
> used a mutex. :-)

I see.

> Anyway, below is a version with synchronous resume.

You are assuming autosuspend should always be with a delay. Why?

Secondly, you are not using a counter. Therefore only one driver can
control the PM state of a device at a given time. Is that wise?
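
For illustration only, a minimal user-space sketch of what a counter-based scheme could look like. Nothing here is in the patch: struct rpm_counted_dev, rpm_get() and rpm_put() are hypothetical names, and real kernel code would need locking and would schedule the suspend instead of just flipping a flag.

```c
/* Hypothetical usage-counted device; none of these names are in the patch. */
struct rpm_counted_dev {
	int usage_count;	/* number of users that need the device active */
	int suspended;		/* nonzero while the device is suspended */
};

/* Take a reference; the first user brings the device back to full power. */
static void rpm_get(struct rpm_counted_dev *d)
{
	if (d->usage_count++ == 0 && d->suspended)
		d->suspended = 0;
}

/* Drop a reference; the device may only suspend once the last user is gone. */
static void rpm_put(struct rpm_counted_dev *d)
{
	if (d->usage_count > 0 && --d->usage_count == 0)
		d->suspended = 1;
}
```

With this shape, two drivers can each call rpm_get() and the device stays active until both have called rpm_put().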

> + * __pm_schedule_suspend - Schedule run-time suspend of given device.
> + * @dev: Device to suspend.
> + * @delay: Time to wait before attempting to suspend the device.

In which unit of time? If this is to go into kerneldoc that must be specified.
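
Since the patch passes @delay straight to queue_delayed_work(), the unit is presumably jiffies. A caller thinking in milliseconds would then have to convert first. The following user-space sketch shows that conversion under assumptions: MODEL_HZ and model_msecs_to_ticks() are made-up names for illustration, and the tick rate of a real kernel depends on its configuration.

```c
/* Assumed tick rate for illustration; the real HZ is a kernel config choice. */
#define MODEL_HZ 250UL

/* Convert milliseconds to ticks, rounding up so that a nonzero delay
 * never turns into zero ticks. */
static unsigned long model_msecs_to_ticks(unsigned long ms)
{
	return (ms * MODEL_HZ + 999UL) / 1000UL;
}
```

In kernel code the existing msecs_to_jiffies() helper would be the natural way to do this, so a two-second autosuspend delay would be requested as msecs_to_jiffies(2000).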

> + * @autocancel: If set, the request will be cancelled during a resume from a
> + *	system-wide sleep state if it happens before @delay elapses.

Why is this needed?

> + */
> +void __pm_schedule_suspend(struct device *dev, unsigned long delay,
> +			   bool autocancel)

[..]


> +
> +/**
> + * __pm_schedule_resume - Schedule run-time resume of given device.
> + * @dev: Device to resume.
> + * @autocancel: If set, the request will be cancelled during a resume from a
> + *	system-wide sleep state if it happens before pm_autoresume() can be run.
> + */

Eeek! This is a bad idea. You never want a resume to be cancelled.

> +void __pm_schedule_resume(struct device *dev, bool autocancel)

[..]
> +int pm_resume_sync(struct device *dev)
> +{
> +	int error = 0;
> +
> +	pm_lock_device(dev);
> +	if (dev->power.runtime_status == RPM_IDLE) {
> +		/* ->autosuspend() hasn't started yet, no need to resume. */
> +		pm_cancel_suspend(dev);
> +		goto out;
> +	}
> +
> +	if (dev->power.runtime_status == RPM_SUSPENDING) {
> +		/*
> +		 * The ->autosuspend() callback is being executed right now,
> +		 * wait for it to complete.
> +		 */
> +		pm_unlock_device(dev);
> +		cancel_delayed_work_sync(&dev->power.suspend_work);

That is the most glorious abuse of an API I've seen this year :-)

	Regards
		Oliver


^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-08 11:29       ` [linux-pm] " Rafael J. Wysocki
  2009-06-08 12:04         ` Oliver Neukum
@ 2009-06-08 12:04         ` Oliver Neukum
  2009-06-08 20:35         ` Alan Stern
  2009-06-08 20:35           ` Alan Stern
  3 siblings, 0 replies; 199+ messages in thread
From: Oliver Neukum @ 2009-06-08 12:04 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: ACPI Devel Maling List, linux-pm, LKML

Am Montag, 8. Juni 2009 13:29:26 schrieb Rafael J. Wysocki:

> But I need to be able to call __pm_schedule_resume() (at least) from
> interrupt context and I can't use a mutex from there.  Otherwise I'd have
> used a mutex. :-)

I see.

> Anyway, below is a version with synchronous resume.

You are assuming autosuspend should always be with a delay. Why?

Secondly, you are not using a counter. Therefore only one driver can
control the PM state of a device at a given time. Is that wise?

> + * __pm_schedule_suspend - Schedule run-time suspend of given device.
> + * @dev: Device to suspend.
> + * @delay: Time to wait before attempting to suspend the device.

In which unit of time? If this is to go into kerneldoc that must be specified.

> + * @autocancel: If set, the request will be cancelled during a resume from a
> + *	system-wide sleep state if it happens before @delay elapses.

Why is this needed?

> + */
> +void __pm_schedule_suspend(struct device *dev, unsigned long delay,
> +			   bool autocancel)

[..]


> +
> +/**
> + * __pm_schedule_resume - Schedule run-time resume of given device.
> + * @dev: Device to resume.
> + * @autocancel: If set, the request will be cancelled during a resume from a
> + *	system-wide sleep state if it happens before pm_autoresume() can be run.
> + */

Eeek! This is a bad idea. You never want a resume to be cancelled.

> +void __pm_schedule_resume(struct device *dev, bool autocancel)

[..]
> +int pm_resume_sync(struct device *dev)
> +{
> +	int error = 0;
> +
> +	pm_lock_device(dev);
> +	if (dev->power.runtime_status == RPM_IDLE) {
> +		/* ->autosuspend() hasn't started yet, no need to resume. */
> +		pm_cancel_suspend(dev);
> +		goto out;
> +	}
> +
> +	if (dev->power.runtime_status == RPM_SUSPENDING) {
> +		/*
> +		 * The ->autosuspend() callback is being executed right now,
> +		 * wait for it to complete.
> +		 */
> +		pm_unlock_device(dev);
> +		cancel_delayed_work_sync(&dev->power.suspend_work);

That is the most glorious abuse of an API I've seen this year :-)

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: Run-time PM idea (was: Re: [linux-pm] [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-08 11:30       ` Rafael J. Wysocki
@ 2009-06-08 13:05         ` Ingo Molnar
  2009-06-08 13:11           ` Matthew Garrett
  2009-06-08 13:11           ` Matthew Garrett
  2009-06-08 13:05         ` Ingo Molnar
  1 sibling, 2 replies; 199+ messages in thread
From: Ingo Molnar @ 2009-06-08 13:05 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Alan Stern, pm list, ACPI Devel Maling List, LKML, Magnus Damm


* Rafael J. Wysocki <rjw@sisk.pl> wrote:

> On Monday 08 June 2009, Ingo Molnar wrote:
> > 
> > * Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > 
> > > +config PM_RUNTIME
> > > +	bool "Run-time PM core functionality"
> > > +	depends on PM
> > > +	---help---
> > > +	  Enable functionality allowing I/O devices to be put into energy-saving
> > > +	  (low power) states at run time (or autosuspended) after a specified
> > > +	  period of inactivity and woken up in response to a hardware-generated
> > > +	  wake-up event or a driver's request.
> > > +
> > > +	  Hardware support is generally required for this functionality to work
> > > +	  and the bus type drivers of the buses the devices are on are
> > > +	  responsibile for the actual handling of the autosuspend requests and
> > > +	  wake-up events.
> > 
> > Halleluya! :-)
> 
> I guess this means you like the general idea. ;-)
> 
> Well, we've been discussing it for quite a while and since more 
> and more people are interested, I'm giving it a high priority.

Cool. I think that if, within a few years, every default distro (both
on desktops and on servers) triggered PM functionality at run time on
common hardware, we'd have both lower power consumption in general
and more robust suspend-resume code.

It would also be far more debuggable if the various suspend/resume
bits were triggered and used independently and at run time, allowing
bugs to be 'spread out'. Right now, if they fail, they fail in a very
hard-to-debug spot (in the s2ram/s2disk code paths), which makes
hacking on them rather challenging (which I'm sure you are well aware
of ;-).

So yes, I like the idea, a lot.

	Ingo

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-08 11:30       ` Rafael J. Wysocki
  2009-06-08 13:05         ` Ingo Molnar
@ 2009-06-08 13:05         ` Ingo Molnar
  1 sibling, 0 replies; 199+ messages in thread
From: Ingo Molnar @ 2009-06-08 13:05 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: ACPI Devel Maling List, pm list, LKML


* Rafael J. Wysocki <rjw@sisk.pl> wrote:

> On Monday 08 June 2009, Ingo Molnar wrote:
> > 
> > * Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > 
> > > +config PM_RUNTIME
> > > +	bool "Run-time PM core functionality"
> > > +	depends on PM
> > > +	---help---
> > > +	  Enable functionality allowing I/O devices to be put into energy-saving
> > > +	  (low power) states at run time (or autosuspended) after a specified
> > > +	  period of inactivity and woken up in response to a hardware-generated
> > > +	  wake-up event or a driver's request.
> > > +
> > > +	  Hardware support is generally required for this functionality to work
> > > +	  and the bus type drivers of the buses the devices are on are
> > > +	  responsibile for the actual handling of the autosuspend requests and
> > > +	  wake-up events.
> > 
> > Halleluya! :-)
> 
> I guess this means you like the general idea. ;-)
> 
> Well, we've been discussing it for quite a while and since more 
> and more people are interested, I'm giving it a high priority.

Cool. I think that if, within a few years, every default distro (both
on desktops and on servers) triggered PM functionality at run time on
common hardware, we'd have both lower power consumption in general
and more robust suspend-resume code.

It would also be far more debuggable if the various suspend/resume
bits were triggered and used independently and at run time, allowing
bugs to be 'spread out'. Right now, if they fail, they fail in a very
hard-to-debug spot (in the s2ram/s2disk code paths), which makes
hacking on them rather challenging (which I'm sure you are well aware
of ;-).

So yes, I like the idea, a lot.

	Ingo

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: Run-time PM idea (was: Re: [linux-pm] [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-08 13:05         ` Ingo Molnar
@ 2009-06-08 13:11           ` Matthew Garrett
  2009-06-08 13:22             ` Run-time PM idea (was: " Ingo Molnar
  2009-06-08 13:22             ` Run-time PM idea (was: Re: [linux-pm] " Ingo Molnar
  2009-06-08 13:11           ` Matthew Garrett
  1 sibling, 2 replies; 199+ messages in thread
From: Matthew Garrett @ 2009-06-08 13:11 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rafael J. Wysocki, Alan Stern, pm list, ACPI Devel Maling List,
	LKML, Magnus Damm

On Mon, Jun 08, 2009 at 03:05:09PM +0200, Ingo Molnar wrote:
> 
> * Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > Well, we've been discussing it for quite a while and since more 
> > and more people are interested, I'm giving it a high priority.
> 
> Cool. I think that if, within a few years, every default distro (both
> on desktops and on servers) triggered PM functionality at run time on
> common hardware, we'd have both lower power consumption in general
> and more robust suspend-resume code.

The difficulty is in determining when it's viable to autosuspend a given 
device. There's a limit to how much we can determine purely from kernel 
state (for instance, we could suspend ahci when there's no pending disk 
access, but we'd lose hotplug notifications) so there's going to have to 
be some level of userspace policy determination. Having the 
infrastructure in the kernel is an important part of this, but there'll 
be some distance to go after that.

-- 
Matthew Garrett | mjg59@srcf.ucam.org

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-08 13:05         ` Ingo Molnar
  2009-06-08 13:11           ` Matthew Garrett
@ 2009-06-08 13:11           ` Matthew Garrett
  1 sibling, 0 replies; 199+ messages in thread
From: Matthew Garrett @ 2009-06-08 13:11 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: LKML, ACPI Devel Maling List, pm list

On Mon, Jun 08, 2009 at 03:05:09PM +0200, Ingo Molnar wrote:
> 
> * Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > Well, we've been discussing it for quite a while and since more 
> > and more people are interested, I'm giving it a high priority.
> 
> Cool. I think that if, within a few years, every default distro (both
> on desktops and on servers) triggered PM functionality at run time on
> common hardware, we'd have both lower power consumption in general
> and more robust suspend-resume code.

The difficulty is in determining when it's viable to autosuspend a given 
device. There's a limit to how much we can determine purely from kernel 
state (for instance, we could suspend ahci when there's no pending disk 
access, but we'd lose hotplug notifications) so there's going to have to 
be some level of userspace policy determination. Having the 
infrastructure in the kernel is an important part of this, but there'll 
be some distance to go after that.

-- 
Matthew Garrett | mjg59@srcf.ucam.org

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: Run-time PM idea (was: Re: [linux-pm] [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-08 13:11           ` Matthew Garrett
  2009-06-08 13:22             ` Run-time PM idea (was: " Ingo Molnar
@ 2009-06-08 13:22             ` Ingo Molnar
  2009-06-08 13:32               ` Matthew Garrett
                                 ` (3 more replies)
  1 sibling, 4 replies; 199+ messages in thread
From: Ingo Molnar @ 2009-06-08 13:22 UTC (permalink / raw)
  To: Matthew Garrett
  Cc: Rafael J. Wysocki, Alan Stern, pm list, ACPI Devel Maling List,
	LKML, Magnus Damm


* Matthew Garrett <mjg59@srcf.ucam.org> wrote:

> On Mon, Jun 08, 2009 at 03:05:09PM +0200, Ingo Molnar wrote:
> > 
> > * Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > > Well, we've been discussing it for quite a while and since more 
> > > and more people are interested, I'm giving it a high priority.
> > 
> > > Cool. I think that if, within a few years, every default distro (both
> > > on desktops and on servers) triggered PM functionality at run time on
> > > common hardware, we'd have both lower power consumption in general
> > > and more robust suspend-resume code.
> 
> The difficulty is in determining when it's viable to autosuspend a 
> given device. There's a limit to how much we can determine purely 
> from kernel state (for instance, we could suspend ahci when 
> there's no pending disk access, but we'd lose hotplug 
> notifications) so there's going to have to be some level of 
> userspace policy determination. Having the infrastructure in the 
> kernel is an important part of this, but there'll be some distance 
> to go after that.

What will the 'user space policy' bit do that the kernel cannot?

If you mean the user has to configure something manually - that won't
really happen in practice. We are happy if they know where to plug
those USB sticks in ;-)

	Ingo

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-08 13:11           ` Matthew Garrett
@ 2009-06-08 13:22             ` Ingo Molnar
  2009-06-08 13:22             ` Run-time PM idea (was: Re: [linux-pm] " Ingo Molnar
  1 sibling, 0 replies; 199+ messages in thread
From: Ingo Molnar @ 2009-06-08 13:22 UTC (permalink / raw)
  To: Matthew Garrett; +Cc: LKML, ACPI Devel Maling List, pm list


* Matthew Garrett <mjg59@srcf.ucam.org> wrote:

> On Mon, Jun 08, 2009 at 03:05:09PM +0200, Ingo Molnar wrote:
> > 
> > * Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > > Well, we've been discussing it for quite a while and since more 
> > > and more people are interested, I'm giving it a high priority.
> > 
> > > Cool. I think that if, within a few years, every default distro (both
> > > on desktops and on servers) triggered PM functionality at run time on
> > > common hardware, we'd have both lower power consumption in general
> > > and more robust suspend-resume code.
> 
> The difficulty is in determining when it's viable to autosuspend a 
> given device. There's a limit to how much we can determine purely 
> from kernel state (for instance, we could suspend ahci when 
> there's no pending disk access, but we'd lose hotplug 
> notifications) so there's going to have to be some level of 
> userspace policy determination. Having the infrastructure in the 
> kernel is an important part of this, but there'll be some distance 
> to go after that.

What will the 'user space policy' bit do that the kernel cannot?

If you mean the user has to configure something manually - that won't
really happen in practice. We are happy if they know where to plug
those USB sticks in ;-)

	Ingo

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: Run-time PM idea (was: Re: [linux-pm] [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-08 13:22             ` Run-time PM idea (was: Re: [linux-pm] " Ingo Molnar
@ 2009-06-08 13:32               ` Matthew Garrett
  2009-06-08 13:46                 ` Run-time PM idea (was: " Ingo Molnar
  2009-06-08 13:46                 ` Run-time PM idea (was: Re: [linux-pm] " Ingo Molnar
  2009-06-08 13:32               ` Run-time PM idea (was: " Matthew Garrett
                                 ` (2 subsequent siblings)
  3 siblings, 2 replies; 199+ messages in thread
From: Matthew Garrett @ 2009-06-08 13:32 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rafael J. Wysocki, Alan Stern, pm list, ACPI Devel Maling List,
	LKML, Magnus Damm

On Mon, Jun 08, 2009 at 03:22:35PM +0200, Ingo Molnar wrote:
> 
> * Matthew Garrett <mjg59@srcf.ucam.org> wrote:
> > The difficulty is in determining when it's viable to autosuspend a 
> > given device. There's a limit to how much we can determine purely 
> > from kernel state (for instance, we could suspend ahci when 
> > there's no pending disk access, but we'd lose hotplug 
> > notifications) so there's going to have to be some level of 
> > userspace policy determination. Having the infrastructure in the 
> > kernel is an important part of this, but there'll be some distance 
> > to go after that.
> 
> What will the 'user space policy' bit do that the kernel cannot?

How does the kernel know whether the user cares about SATA hotplug or 
not?

> If you mean the user has to configure something manually - that won't
> really happen in practice. We are happy if they know where to plug
> those USB sticks in ;-)

It'll be up to the distributions to provide sane defaults and let them 
be reconfigured as necessary, depending on the information we have from 
the user and maybe platform-specific knowledge. But this is a difficult 
problem - we need to be smart about all the potential sources of 
information in order to pick an appropriate policy, and the kernel's not 
the right layer to do some of this information collection.

-- 
Matthew Garrett | mjg59@srcf.ucam.org

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: Run-time PM idea (was: Re: [linux-pm] [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-08 13:22             ` Run-time PM idea (was: Re: [linux-pm] " Ingo Molnar
                                 ` (2 preceding siblings ...)
  2009-06-08 13:39               ` Oliver Neukum
@ 2009-06-08 13:39               ` Oliver Neukum
  2009-06-08 13:44                 ` Run-time PM idea (was: " Matthew Garrett
                                   ` (3 more replies)
  3 siblings, 4 replies; 199+ messages in thread
From: Oliver Neukum @ 2009-06-08 13:39 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Matthew Garrett, Rafael J. Wysocki, Alan Stern, pm list,
	ACPI Devel Maling List, LKML, Magnus Damm

On Monday, 8 June 2009 15:22:35, Ingo Molnar wrote:

> What will the 'user space policy' bit do that the kernel cannot?
>
> If you mean the user has to configure something manually - that won't
> really happen in practice. We are happy if they know where to plug
> those USB sticks in ;-)

User space need not be the user. Currently user space doesn't tell
the kernel how much functionality it needs. open/close give a binary
distinction that maps badly onto the graduated capabilities devices
have.

For example, do you really need every key press while the screen saver
is running, or is it enough for the keyboard to be able to generate a
wakeup event?
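
As a rough illustration of the point about graduated capabilities (a
sketch only; the `Need` levels and both function names are invented
here and are not an existing kernel or user-space interface), each
client could declare how much of the device it needs instead of merely
holding it open:

```python
from enum import IntEnum


class Need(IntEnum):
    """Hypothetical levels a client could declare for an input device."""
    NOTHING = 0      # device may be powered down entirely
    WAKEUP_ONLY = 1  # a key press only has to generate a wakeup event
    ALL_EVENTS = 2   # every key press must be delivered


def required_level(client_needs):
    """Return the strongest need among all clients (NOTHING if none)."""
    return max(client_needs, default=Need.NOTHING)


def may_suspend(client_needs):
    """The device may be suspended unless some client needs every event."""
    return required_level(client_needs) < Need.ALL_EVENTS
```

With only a screen saver declaring `Need.WAKEUP_ONLY`, `may_suspend`
is true; as soon as one client declares `Need.ALL_EVENTS`, it is not.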

	Regards
		Oliver


^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: Run-time PM idea (was: Re: [linux-pm] [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-08 13:39               ` Run-time PM idea (was: Re: [linux-pm] " Oliver Neukum
  2009-06-08 13:44                 ` Run-time PM idea (was: " Matthew Garrett
@ 2009-06-08 13:44                 ` Matthew Garrett
  2009-06-08 14:21                 ` Ingo Molnar
  2009-06-08 14:21                 ` Run-time PM idea (was: " Ingo Molnar
  3 siblings, 0 replies; 199+ messages in thread
From: Matthew Garrett @ 2009-06-08 13:44 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Ingo Molnar, Rafael J. Wysocki, Alan Stern, pm list,
	ACPI Devel Maling List, LKML, Magnus Damm

On Mon, Jun 08, 2009 at 03:39:19PM +0200, Oliver Neukum wrote:

> For example, do you really need every key press while the screen saver
> is running, or is it enough for the keyboard to be able to generate a
> wakeup event?

Depends whether you have a media player running or not...

-- 
Matthew Garrett | mjg59@srcf.ucam.org

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: Run-time PM idea (was: Re: [linux-pm] [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-08 13:32               ` Matthew Garrett
  2009-06-08 13:46                 ` Run-time PM idea (was: " Ingo Molnar
@ 2009-06-08 13:46                 ` Ingo Molnar
  2009-06-08 13:54                   ` Run-time PM idea (was: " Matthew Garrett
                                     ` (3 more replies)
  1 sibling, 4 replies; 199+ messages in thread
From: Ingo Molnar @ 2009-06-08 13:46 UTC (permalink / raw)
  To: Matthew Garrett
  Cc: Rafael J. Wysocki, Alan Stern, pm list, ACPI Devel Maling List,
	LKML, Magnus Damm


* Matthew Garrett <mjg59@srcf.ucam.org> wrote:

> On Mon, Jun 08, 2009 at 03:22:35PM +0200, Ingo Molnar wrote:
> > 
> > * Matthew Garrett <mjg59@srcf.ucam.org> wrote:
> > > The difficulty is in determining when it's viable to autosuspend a 
> > > given device. There's a limit to how much we can determine purely 
> > > from kernel state (for instance, we could suspend ahci when 
> > > there's no pending disk access, but we'd lose hotplug 
> > > notifications) so there's going to have to be some level of 
> > > userspace policy determination. Having the infrastructure in the 
> > > kernel is an important part of this, but there'll be some distance 
> > > to go after that.
> > 
> > What will the 'user space policy' bit do that the kernel cannot?
> 
> How does the kernel know whether the user cares about SATA hotplug 
> or not?

The typical user probably doesn't know what 'SATA' means, and
probably has very vague concepts about 'hotplug' as well.

The kernel default should be: 'yes, if the kernel feature is enabled 
and if the hardware can support it' (it's not on a blacklist of some 
sort, etc., etc.).

> > If you mean the user has to configure something manually - that
> > won't really happen in practice. We are happy if they know where
> > to plug those USB sticks in ;-)
> 
> It'll be up to the distributions to provide sane defaults and let 
> them be reconfigured as necessary, depending on the information we 
> have from the user and maybe platform-specific knowledge. But this 
> is a difficult problem - we need to be smart about all the 
> potential sources of information in order to pick an appropriate 
> policy, and the kernel's not the right layer to do some of this 
> information collection.

What sources of information exactly? Again, the user won't enter
anything, in 95% of the cases - in the remaining 3% of cases what is
entered is wrong and only in another 2% of cases is it correct ;-)

Sane kernel defaults are important and the kernel sure knows what 
kind of hardware it runs on. This 'let the user decide policy' for 
something as fundamental (and also as arcane) as power saving mode 
is really a disease that has caused a lot of unnecessary pain in 
Linux in the past 15 years.

Sure, there might be tradeoffs when a piece of hardware cannot be
turned off sanely: obviously the monitor does not (currently) know
whether someone is watching it, and wake-on-packet-for-me is not
commonly implemented in wireless and wired networking cards, so
turning off an active networking card might not be possible without
the user explicitly allowing that imperfect mode of power saving.

But there are plenty of cases where turning off hardware is fine, 
and the broken special cases will go away as technology advances, 
and we should not design based on broken concepts.

( Providing a way to _override_ those defaults is of course natural,
  via sysfs, should the user express an interest in tweaking it, or
  should the kernel get it so wrong that a distro wants to work
  around it. But your argument seems to be "push configuration and
  handling into user-space" which is really backwards. )
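
A minimal sketch of what such an override could look like from user
space, assuming the device exposes a `power/wakeup` attribute under
its sysfs directory that accepts `enabled` or `disabled`. The helper
names and the device path in the usage note are illustrative
assumptions, not part of any existing tool:

```python
import os


def write_attr(device_dir, attr, value):
    """Write `value` to the attribute file `attr` below `device_dir`."""
    path = os.path.join(device_dir, attr)
    with open(path, "w") as f:
        f.write(value)


def read_attr(device_dir, attr):
    """Return the current contents of the attribute, stripped of whitespace."""
    path = os.path.join(device_dir, attr)
    with open(path) as f:
        return f.read().strip()
```

For instance, `write_attr("/sys/bus/usb/devices/1-1", "power/wakeup",
"enabled")` would, given sufficient privileges, ask for remote wakeup
on that device, where `1-1` stands in for whatever device is actually
present.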

	Ingo

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: Run-time PM idea (was: Re: [linux-pm] [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-08 13:46                 ` Run-time PM idea (was: Re: [linux-pm] " Ingo Molnar
  2009-06-08 13:54                   ` Run-time PM idea (was: " Matthew Garrett
@ 2009-06-08 13:54                   ` Matthew Garrett
  2009-06-08 14:24                     ` Run-time PM idea (was: " Ingo Molnar
  2009-06-08 14:24                     ` Run-time PM idea (was: Re: [linux-pm] " Ingo Molnar
  2009-06-08 13:58                   ` Oliver Neukum
  2009-06-08 13:58                     ` Oliver Neukum
  3 siblings, 2 replies; 199+ messages in thread
From: Matthew Garrett @ 2009-06-08 13:54 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rafael J. Wysocki, Alan Stern, pm list, ACPI Devel Maling List,
	LKML, Magnus Damm

On Mon, Jun 08, 2009 at 03:46:47PM +0200, Ingo Molnar wrote:
> 
> * Matthew Garrett <mjg59@srcf.ucam.org> wrote:
> > How does the kernel know whether the user cares about SATA hotplug 
> > or not?
> 
> The typical user probably doesn't know what 'SATA' means, and
> probably has very vague concepts about 'hotplug' as well.

eSATA is pretty common now.

> The kernel default should be: 'yes, if the kernel feature is enabled 
> and if the hardware can support it' (it's not on a blacklist of some 
> sort, etc., etc.).

The problem with this kind of default is that you get people who are 
confused that their hardware doesn't work. If the kernel doesn't have 
enough information to make a decision it should err on the side of 
functionality - we're talking about fairly low-level power savings, but 
potentially several years of aggregate confusion on the part of users.
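
The 'err on the side of functionality' rule can be sketched as a
three-valued check. The function name and the True/False/None encoding
are assumptions made purely for illustration, not existing kernel
behaviour:

```python
def allow_link_power_saving(port_is_hotpluggable):
    """Decide whether a port may be powered down when idle.

    `port_is_hotpluggable` is True, False, or None when the platform
    gives no reliable answer. Power saving is allowed only when the
    port is known not to be hotpluggable; an unknown port is treated
    as if it were, so hotplug notifications keep working.
    """
    return port_is_hotpluggable is False
```

The unknown case is the one the argument is about: it is deliberately
handled the same way as a port that is known to support hotplug.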

> > It'll be up to the distributions to provide sane defaults and let 
> > them be reconfigured as necessary, depending on the information we 
> > have from the user and maybe platform-specific knowledge. But this 
> > is a difficult problem - we need to be smart about all the 
> > potential sources of information in order to pick an appropriate 
> > policy, and the kernel's not the right layer to do some of this 
> > information collection.
> 
> What sources of information exactly? Again, the user won't enter
> anything, in 95% of the cases - in the remaining 3% of cases what is
> entered is wrong and only in another 2% of cases is it correct ;-)

Users are generally ok at realising correlation between a setting change 
and something no longer working, so as long as you provide that they'll 
be happy. I agree that this sucks. What we actually want is some means 
of reliably identifying whether a port is hotplug or not, but eSATA 
makes this very difficult.

> Sure, there might be tradeoffs when a piece of hardware cannot be
> turned off sanely: obviously the monitor does not (currently) know
> whether someone is watching it, and wake-on-packet-for-me is not
> commonly implemented in wireless and wired networking cards, so
> turning off an active networking card might not be possible without
> the user explicitly allowing that imperfect mode of power saving.

These cases can all be handled with sufficiently intelligent userland, 
so I'm not worried about them.

> ( Providing a way to _override_ those defaults is of course natural,
>   via sysfs, should the user express an interest in tweaking it, or
>   should the kernel get it so wrong that a distro wants to work
>   around it. But your argument seems to be "push configuration and
>   handling into user-space" which is really backwards. )

My argument is "Hardware should work, and if the kernel default is for 
it to be broken then the default is wrong". We went through this for USB 
autosuspend. Userspace simply has more available information than the 
kernel, and it's not just a matter of static configuration (though that 
may be part of it). For instance, Oliver's example of screensavers and 
USB keyboards. If nothing's paying attention to volume keys (or if the 
keyboard doesn't have any) then you can enable remote wakeup and suspend 
the keyboard. If something /is/ paying attention to volume keys, you 
can't do that. That's the kind of case I'm discussing.
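
The keyboard case above can be sketched as a small decision function.
This is only an illustration: the parameter names and the returned
strings are made up here, and a real policy would have to obtain this
information from the desktop session.

```python
def keyboard_action(screensaver_active, has_volume_keys, volume_key_listeners):
    """Pick a power action for a USB keyboard.

    Returns "keep-active" when key presses must still be delivered,
    otherwise "suspend-with-remote-wakeup".
    """
    if not screensaver_active:
        # Someone is presumably typing; every key press matters.
        return "keep-active"
    if has_volume_keys and volume_key_listeners > 0:
        # Something (e.g. a media player) still wants the volume keys.
        return "keep-active"
    # Any key press only needs to wake the device up again.
    return "suspend-with-remote-wakeup"
```

The second branch is the information the kernel does not have on its
own: whether anything is currently listening for the volume keys.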

-- 
Matthew Garrett | mjg59@srcf.ucam.org

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: Run-time PM idea (was: Re: [linux-pm] [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-08 13:46                 ` Run-time PM idea (was: Re: [linux-pm] " Ingo Molnar
@ 2009-06-08 13:58                     ` Oliver Neukum
  2009-06-08 13:54                   ` Run-time PM idea (was: Re: [linux-pm] " Matthew Garrett
                                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 199+ messages in thread
From: Oliver Neukum @ 2009-06-08 13:58 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Matthew Garrett, Rafael J. Wysocki, Alan Stern, pm list,
	ACPI Devel Maling List, LKML, Magnus Damm

On Monday, 8 June 2009 15:46:47, Ingo Molnar wrote:
> ( Providing a way to _override_ those defaults is of course natural,
>   via sysfs, should the user express an interest in tweaking it, or
>   should the kernel get it so wrong that a distro wants to work
>   around it. But your argument seems to be "push configuration and
>   handling into user-space" which is really backwards. )

If we agree that the default shall be that the kernel doesn't switch
off features of the hardware for power saving, does this make a
practical difference compared to keeping the configuration in user
space?

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: Run-time PM idea (was: Re: [linux-pm] [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-08 13:39               ` Run-time PM idea (was: Re: [linux-pm] " Oliver Neukum
  2009-06-08 13:44                 ` Run-time PM idea (was: " Matthew Garrett
  2009-06-08 13:44                 ` Run-time PM idea (was: Re: [linux-pm] " Matthew Garrett
@ 2009-06-08 14:21                 ` Ingo Molnar
  2009-06-08 14:30                   ` Matthew Garrett
                                     ` (3 more replies)
  2009-06-08 14:21                 ` Run-time PM idea (was: " Ingo Molnar
  3 siblings, 4 replies; 199+ messages in thread
From: Ingo Molnar @ 2009-06-08 14:21 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Matthew Garrett, Rafael J. Wysocki, Alan Stern, pm list,
	ACPI Devel Maling List, LKML, Magnus Damm


* Oliver Neukum <oliver@neukum.org> wrote:

> On Monday, 8 June 2009 15:22:35, Ingo Molnar wrote:
>
> > What will the 'user space policy' bit do that the kernel cannot?
> >
> > If you mean the user has to configure something manually - that
> > won't really happen in practice. We are happy if they know where
> > to plug those USB sticks in ;-)
>
> User space need not be the user. Currently user space doesn't tell
> the kernel how much functionality it needs. open/close give a
> binary distinction that maps badly onto the graduated capabilities
> devices have.

If the kernel isn't told what capabilities are used, then that's
buggy code.

> For example, do you really need every key press while the screen
> saver is running, or is it enough for the keyboard to be able to
> generate a wakeup event?

The sane default here is to suspend the keyboard, except if an audio 
app is running that binds to the volume keys of the keyboard.

If the 'keyboard' is properly abstracted in the kernel and the 
kernel driver _knows_ that the volume keys are in use, this is not a 
problem.

Arguing otherwise is just saying the equivalent of: "we have a 
broken model to utilize hardware, and instead of fixing it properly, 
introduce an _even more broken_ model, because in the current model 
things cannot be made to work".

The kernel _needs_ to have precise information about whether a piece 
of hardware is in use or not.

	Ingo

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-08 13:39               ` Run-time PM idea (was: Re: [linux-pm] " Oliver Neukum
                                   ` (2 preceding siblings ...)
  2009-06-08 14:21                 ` Ingo Molnar
@ 2009-06-08 14:21                 ` Ingo Molnar
  3 siblings, 0 replies; 199+ messages in thread
From: Ingo Molnar @ 2009-06-08 14:21 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: LKML, ACPI Devel Maling List, pm list


* Oliver Neukum <oliver@neukum.org> wrote:

> Am Montag, 8. Juni 2009 15:22:35 schrieb Ingo Molnar:
> 
> > What will the 'user space policy' bit do what the kernel cannot?
> >
> > If you mean the user has to configure something manually - that 
> > wont really happen in practice. We are happy if they know where 
> > to put those USB sticks in ;-)
> 
> User space need not be the user. Currently user space doesn't tell 
> the kernel how much functionality it needs. open/close give a 
> binary opposition which badly maps onto the graduated capabilities 
> devices have.

If the kernel isn't told what capabilities are used, then that's
buggy code.

> For example do you really need every key pressed while the screen 
> saver is running or is it enough for the keyboard to be able to 
> generate a wakeup event?

The sane default here is to suspend the keyboard, except if an audio 
app is running that binds to the volume keys of the keyboard.

If the 'keyboard' is properly abstracted in the kernel and the 
kernel driver _knows_ that the volume keys are in use, this is not a 
problem.

Arguing otherwise is just saying the equivalent of: "we have a 
broken model to utilize hardware, and instead of fixing it properly, 
introduce an _even more broken_ model, because in the current model 
things cannot be made to work".

The kernel _needs_ to have precise information about whether a piece 
of hardware is in use or not.

	Ingo

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: Run-time PM idea (was: Re: [linux-pm] [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-08 13:54                   ` Run-time PM idea (was: Re: [linux-pm] " Matthew Garrett
  2009-06-08 14:24                     ` Run-time PM idea (was: " Ingo Molnar
@ 2009-06-08 14:24                     ` Ingo Molnar
  2009-06-08 14:35                         ` Run-time PM idea (was: Re: [linux-pm] " Matthew Garrett
  1 sibling, 1 reply; 199+ messages in thread
From: Ingo Molnar @ 2009-06-08 14:24 UTC (permalink / raw)
  To: Matthew Garrett
  Cc: Rafael J. Wysocki, Alan Stern, pm list, ACPI Devel Maling List,
	LKML, Magnus Damm


* Matthew Garrett <mjg59@srcf.ucam.org> wrote:

> On Mon, Jun 08, 2009 at 03:46:47PM +0200, Ingo Molnar wrote:
> > 
> > * Matthew Garrett <mjg59@srcf.ucam.org> wrote:
>
> > > How does the kernel know whether the user cares about SATA 
> > > hotplug or not?
> > 
> > The typical user probably doesn't know what 'SATA' means, and 
> > probably has very vague concepts about 'hotplug' as well.
> 
> eSATA is pretty common now.

[ And 99% of the CPUs have an IDT, yet 99.9% of the users don't know
  what it is :) ]

> > The kernel default should be: 'yes, if the kernel feature is 
> > enabled and if the hardware can support it' (it's not on a 
> > blacklist of some sort, etc., etc.).
> 
> The problem with this kind of default is that you get people who 
> are confused that their hardware doesn't work.

If the hardware 'doesn't work', that is a kernel bug. Hardware that 
_cannot be suspended_ safely (physically) should not be 
auto-suspended, of course.

> If the kernel doesn't have enough information to make a decision 
> it should err on the side of functionality - we're talking about 
> fairly low-level power savings, but potentially several years of 
> aggregate confusion on the part of users.

The difference between a 10W and a 1W footprint is a long series of 
'low-level power savings'.

If users are getting confused and if hardware gets broken then that's 
a plain bug and the wrong path is being walked.

> > What sources of information exactly? Again, the user won't enter 
> > anything, in 95% of the cases - in the remaining 3% of cases 
> > what is entered is wrong and only in another 2% of cases is it 
> > correct ;-)
> 
> Users are generally ok at realising correlation between a setting 
> change and something no longer working, so as long as you provide 
> that they'll be happy. I agree that this sucks. What we actually 
> want is some means of reliably identifying whether a port is 
> hotplug or not, but eSATA makes this very difficult.

Is it impossible?

> > Sure, there might be tradeoffs when a piece of hardware cannot 
> > be turned off sanely: obviously the monitor might not (currently)
> > know whether someone is watching it, and
> > wake-on-packet-for-me is not commonly implemented in wireless
> > and wired networking cards, so turning off an active networking
> > card might not be possible without the user explicitly allowing
> > that imperfect mode of power saving.
> 
> These cases can all be handled with sufficiently intelligent 
> userland, so I'm not worried about them.
> 
> > ( Providing a way to _override_ those defaults is of course natural,
> >   via /sysfs, should the user express an interest in tweaking it, or
> >   should the kernel get it so wrong that a distro wants to work it
> >   around. But your argument seems to be "push configuration and
> >   handling into user-space" which is really backwards. )
> 
> My argument is "Hardware should work, and if the kernel default is 
> for it to be broken then the default is wrong". We went through 
> this for USB autosuspend. Userspace simply has more available 
> information than the kernel, and it's not just a matter of static 
> configuration (though that may be part of it). For instance, 
> Oliver's example of screensavers and USB keyboards. If nothing's 
> paying attention to volume keys (or if the keyboard doesn't have 
> any) then you can enable remote wakeup and suspend the keyboard. 
> If something /is/ paying attention to volume keys, you can't do 
> that. That's the kind of case I'm discussing.

See my reply to Oliver. This is really advocating a broken model of 
device usage. That volume key usage dependency is being hidden from 
the kernel, and then you want to kludge it around by pushing suspend 
functionality to user-space? That way lies madness. The proper way 
is to close the device if it's not used by anything. Then the kernel 
can auto-suspend it just like it could auto-suspend network 
interfaces that are not in use, or like it could auto-suspend a 
display port that has no monitor or other output device attached.

	Ingo

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-08 13:54                   ` Run-time PM idea (was: Re: [linux-pm] " Matthew Garrett
@ 2009-06-08 14:24                     ` Ingo Molnar
  2009-06-08 14:24                     ` Run-time PM idea (was: Re: [linux-pm] " Ingo Molnar
  1 sibling, 0 replies; 199+ messages in thread
From: Ingo Molnar @ 2009-06-08 14:24 UTC (permalink / raw)
  To: Matthew Garrett; +Cc: LKML, ACPI Devel Maling List, pm list


* Matthew Garrett <mjg59@srcf.ucam.org> wrote:

> On Mon, Jun 08, 2009 at 03:46:47PM +0200, Ingo Molnar wrote:
> > 
> > * Matthew Garrett <mjg59@srcf.ucam.org> wrote:
>
> > > How does the kernel know whether the user cares about SATA 
> > > hotplug or not?
> > 
> > The typical user probably doesn't know what 'SATA' means, and 
> > probably has very vague concepts about 'hotplug' as well.
> 
> eSATA is pretty common now.

[ And 99% of the CPUs have an IDT, yet 99.9% of the users don't know
  what it is :) ]

> > The kernel default should be: 'yes, if the kernel feature is 
> > enabled and if the hardware can support it' (it's not on a 
> > blacklist of some sort, etc., etc.).
> 
> The problem with this kind of default is that you get people who 
> are confused that their hardware doesn't work.

If the hardware 'doesn't work', that is a kernel bug. Hardware that 
_cannot be suspended_ safely (physically) should not be 
auto-suspended, of course.

> If the kernel doesn't have enough information to make a decision 
> it should err on the side of functionality - we're talking about 
> fairly low-level power savings, but potentially several years of 
> aggregate confusion on the part of users.

The difference between a 10W and a 1W footprint is a long series of 
'low-level power savings'.

If users are getting confused and if hardware gets broken then that's 
a plain bug and the wrong path is being walked.

> > What sources of information exactly? Again, the user won't enter 
> > anything, in 95% of the cases - in the remaining 3% of cases 
> > what is entered is wrong and only in another 2% of cases is it 
> > correct ;-)
> 
> Users are generally ok at realising correlation between a setting 
> change and something no longer working, so as long as you provide 
> that they'll be happy. I agree that this sucks. What we actually 
> want is some means of reliably identifying whether a port is 
> hotplug or not, but eSATA makes this very difficult.

Is it impossible?

> > Sure, there might be tradeoffs when a piece of hardware cannot 
> > be turned off sanely: obviously the monitor might not (currently)
> > know whether someone is watching it, and
> > wake-on-packet-for-me is not commonly implemented in wireless
> > and wired networking cards, so turning off an active networking
> > card might not be possible without the user explicitly allowing
> > that imperfect mode of power saving.
> 
> These cases can all be handled with sufficiently intelligent 
> userland, so I'm not worried about them.
> 
> > ( Providing a way to _override_ those defaults is of course natural,
> >   via /sysfs, should the user express an interest in tweaking it, or
> >   should the kernel get it so wrong that a distro wants to work it
> >   around. But your argument seems to be "push configuration and
> >   handling into user-space" which is really backwards. )
> 
> My argument is "Hardware should work, and if the kernel default is 
> for it to be broken then the default is wrong". We went through 
> this for USB autosuspend. Userspace simply has more available 
> information than the kernel, and it's not just a matter of static 
> configuration (though that may be part of it). For instance, 
> Oliver's example of screensavers and USB keyboards. If nothing's 
> paying attention to volume keys (or if the keyboard doesn't have 
> any) then you can enable remote wakeup and suspend the keyboard. 
> If something /is/ paying attention to volume keys, you can't do 
> that. That's the kind of case I'm discussing.

See my reply to Oliver. This is really advocating a broken model of 
device usage. That volume key usage dependency is being hidden from 
the kernel, and then you want to kludge it around by pushing suspend 
functionality to user-space? That way lies madness. The proper way 
is to close the device if it's not used by anything. Then the kernel 
can auto-suspend it just like it could auto-suspend network 
interfaces that are not in use, or like it could auto-suspend a 
display port that has no monitor or other output device attached.

	Ingo

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: Run-time PM idea (was: Re: [linux-pm] [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-08 14:21                 ` Ingo Molnar
@ 2009-06-08 14:30                   ` Matthew Garrett
  2009-06-08 15:06                     ` Run-time PM idea (was: " Ingo Molnar
  2009-06-08 15:06                     ` Run-time PM idea (was: Re: [linux-pm] " Ingo Molnar
  2009-06-08 14:30                   ` Run-time PM idea (was: " Matthew Garrett
                                     ` (2 subsequent siblings)
  3 siblings, 2 replies; 199+ messages in thread
From: Matthew Garrett @ 2009-06-08 14:30 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Oliver Neukum, Rafael J. Wysocki, Alan Stern, pm list,
	ACPI Devel Maling List, LKML, Magnus Damm

On Mon, Jun 08, 2009 at 04:21:54PM +0200, Ingo Molnar wrote:

> The kernel _needs_ to have precise information about whether a piece 
> of hardware is in use or not.

The kernel can only have that information if userspace tells it. What 
we're quibbling over is whether the kernel should be explicitly told 
about the requirement (i.e., every time an app makes a key grab in X the 
kernel gets told about it) or whether it should be implicit (userspace 
knows that a key grab has been made and so requests that the keyboard 
not be suspended).

We *can* put all of that complexity in the kernel. The question is 
whether it buys us anything. We'd have to modify huge chunks of 
userspace and in the process we'd end up limited to whatever policy 
happens to exist in the version of the kernel the user is running.

I'd like the kernel to expose this functionality but leave the policy 
decisions to userland.
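
A minimal sketch of the implicit model, in which userspace already
knows about the key grab and sets the tunables itself. The attribute
names power/control and power/wakeup come from the sysfs power
interface of later kernels and are an assumption here, as are the
helper names keyboard_power_policy() and apply_policy(); nothing in
this thread defines them.

```python
from pathlib import Path


def keyboard_power_policy(key_grab_active: bool) -> dict:
    """Pick sysfs tunable values for a USB keyboard (illustrative policy only)."""
    if key_grab_active:
        # Something is listening for e.g. volume keys: keep the device awake.
        return {"power/control": "on"}
    # Nothing needs every key press: allow autosuspend, rely on remote wakeup.
    return {"power/control": "auto", "power/wakeup": "enabled"}


def apply_policy(device_dir: Path, settings: dict) -> None:
    """Write each tunable under the device's sysfs directory (assumed layout)."""
    for attr, value in settings.items():
        (device_dir / attr).write_text(value)
```

A desktop session could call apply_policy() with the device's sysfs
directory whenever a grab is made or dropped; which directory that is
gets left to the caller.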

-- 
Matthew Garrett | mjg59@srcf.ucam.org

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-08 14:21                 ` Ingo Molnar
  2009-06-08 14:30                   ` Matthew Garrett
@ 2009-06-08 14:30                   ` Matthew Garrett
  2009-06-09 22:44                   ` Jiri Kosina
  2009-06-09 22:44                   ` Run-time PM idea (was: Re: [linux-pm] " Jiri Kosina
  3 siblings, 0 replies; 199+ messages in thread
From: Matthew Garrett @ 2009-06-08 14:30 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: LKML, ACPI Devel Maling List, pm list

On Mon, Jun 08, 2009 at 04:21:54PM +0200, Ingo Molnar wrote:

> The kernel _needs_ to have precise information about whether a piece 
> of hardware is in use or not.

The kernel can only have that information if userspace tells it. What 
we're quibbling over is whether the kernel should be explicitly told 
about the requirement (i.e., every time an app makes a key grab in X the 
kernel gets told about it) or whether it should be implicit (userspace 
knows that a key grab has been made and so requests that the keyboard 
not be suspended).

We *can* put all of that complexity in the kernel. The question is 
whether it buys us anything. We'd have to modify huge chunks of 
userspace and in the process we'd end up limited to whatever policy 
happens to exist in the version of the kernel the user is running.

I'd like the kernel to expose this functionality but leave the policy 
decisions to userland.

-- 
Matthew Garrett | mjg59@srcf.ucam.org

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-08 14:24                     ` Run-time PM idea (was: Re: [linux-pm] " Ingo Molnar
@ 2009-06-08 14:35                         ` Matthew Garrett
  0 siblings, 0 replies; 199+ messages in thread
From: Matthew Garrett @ 2009-06-08 14:35 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: LKML, ACPI Devel Maling List, pm list

On Mon, Jun 08, 2009 at 04:24:50PM +0200, Ingo Molnar wrote:
> * Matthew Garrett <mjg59@srcf.ucam.org> wrote:
> > eSATA is pretty common now.
> 
> [ And 99% of the CPUs have an IDT, yet 99.9% of the users don't know
>   what it is :) ]

Users know that there's a socket on the front of their computer that 
they can plug a hard drive into, and if that doesn't work then they're 
going to be upset.

> > The problem with this kind of default is that you get people who 
> > are confused that their hardware doesn't work.
> 
> If the hardware 'doesn't work', that is a kernel bug. Hardware that 
> _cannot be suspended_ safely (physically) should not be 
> auto-suspended, of course.

So, like I said, the kernel can't automatically suspend AHCI unless it's 
received some information from elsewhere that tells it it's ok to. The 
kernel can't know if there's an eSATA port or not.

> > If the kernel doesn't have enough information to make a decision 
> > it should err on the side of functionality - we're talking about 
> > fairly low-level power savings, but potentially several years of 
> > aggregate confusion on the part of users.
> 
> The difference between a 10W and a 1W footprint is a long series of 
> 'low-level power savings'.
> 
> If users are getting confused and if hardware gets broken then that's 
> a plain bug and the wrong path is being walked.

Yes. And powersaving is a tradeoff between functionality and power 
consumption. The kernel doesn't know what level of functionality a given 
user requires. It *can't* know that itself.

> > Users are generally ok at realising correlation between a setting 
> > change and something no longer working, so as long as you provide 
> > that they'll be happy. I agree that this sucks. What we actually 
> > want is some means of reliably identifying whether a port is 
> > hotplug or not, but eSATA makes this very difficult.
> 
> Is it impossible?

To the best of my knowledge, yes.

> > My argument is "Hardware should work, and if the kernel default is 
> > for it to be broken then the default is wrong". We went through 
> > this for USB autosuspend. Userspace simply has more available 
> > information than the kernel, and it's not just a matter of static 
> > configuration (though that may be part of it). For instance, 
> > Oliver's example of screensavers and USB keyboards. If nothing's 
> > paying attention to volume keys (or if the keyboard doesn't have 
> > any) then you can enable remote wakeup and suspend the keyboard. 
> > If something /is/ paying attention to volume keys, you can't do 
> > that. That's the kind of case I'm discussing.
> 
> See my reply to Oliver. This is really advocating a broken model of 
> device usage. That volume key usage dependency is being hidden from 
> the kernel, and then you want to kludge it around by pushing suspend 
> functionality to user-space? That way lies madness. The proper way 
> is to close the device if it's not used by anything. Then the kernel 
> can auto-suspend it just like it could auto-suspend network 
> interfaces that are not in use, or like it could auto-suspend a 
> display port that has no monitor or other output device attached.

No, we can't just close it - then we won't get notification that a key's 
been hit in order to unlock the screensaver. Yes, we can greatly expand 
the userland-visible interface to every piece of hardware in order to 
make this work, but that's a huge amount of effort to avoid a model 
where userspace sets some tunables appropriately.

-- 
Matthew Garrett | mjg59@srcf.ucam.org

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: Run-time PM idea (was: Re: [linux-pm] [RFC][PATCH 0/2] PM: Rearrange core suspend code)
@ 2009-06-08 14:35                         ` Matthew Garrett
  0 siblings, 0 replies; 199+ messages in thread
From: Matthew Garrett @ 2009-06-08 14:35 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rafael J. Wysocki, Alan Stern, pm list, ACPI Devel Maling List,
	LKML, Magnus Damm

On Mon, Jun 08, 2009 at 04:24:50PM +0200, Ingo Molnar wrote:
> * Matthew Garrett <mjg59@srcf.ucam.org> wrote:
> > eSATA is pretty common now.
> 
> [ And 99% of the CPUs have an IDT, yet 99.9% of the users don't know
>   what it is :) ]

Users know that there's a socket on the front of their computer that 
they can plug a hard drive into, and if that doesn't work then they're 
going to be upset.

> > The problem with this kind of default is that you get people who 
> > are confused that their hardware doesn't work.
> 
> If the hardware 'doesn't work', that is a kernel bug. Hardware that 
> _cannot be suspended_ safely (physically) should not be 
> auto-suspended, of course.

So, like I said, the kernel can't automatically suspend AHCI unless it's 
received some information from elsewhere that tells it it's ok to. The 
kernel can't know if there's an eSATA port or not.

> > If the kernel doesn't have enough information to make a decision 
> > it should err on the side of functionality - we're talking about 
> > fairly low-level power savings, but potentially several years of 
> > aggregate confusion on the part of users.
> 
> The difference between a 10W and a 1W footprint is a long series of 
> 'low-level power savings'.
> 
> If users are getting confused and if hardware gets broken then that's 
> a plain bug and the wrong path is being walked.

Yes. And powersaving is a tradeoff between functionality and power 
consumption. The kernel doesn't know what level of functionality a given 
user requires. It *can't* know that itself.

> > Users are generally ok at realising correlation between a setting 
> > change and something no longer working, so as long as you provide 
> > that they'll be happy. I agree that this sucks. What we actually 
> > want is some means of reliably identifying whether a port is 
> > hotplug or not, but eSATA makes this very difficult.
> 
> Is it impossible?

To the best of my knowledge, yes.

> > My argument is "Hardware should work, and if the kernel default is 
> > for it to be broken then the default is wrong". We went through 
> > this for USB autosuspend. Userspace simply has more available 
> > information than the kernel, and it's not just a matter of static 
> > configuration (though that may be part of it). For instance, 
> > Oliver's example of screensavers and USB keyboards. If nothing's 
> > paying attention to volume keys (or if the keyboard doesn't have 
> > any) then you can enable remote wakeup and suspend the keyboard. 
> > If something /is/ paying attention to volume keys, you can't do 
> > that. That's the kind of case I'm discussing.
> 
> See my reply to Oliver. This is really advocating a broken model of 
> device usage. That volume key usage dependency is being hidden from 
> the kernel, and then you want to kludge it around by pushing suspend 
> functionality to user-space? That way lies madness. The proper way 
> is to close the device if it's not used by anything. Then the kernel 
> can auto-suspend it just like it could auto-suspend network 
> interfaces that are not in use, or like it could auto-suspend a 
> display port that has no monitor or other output device attached.

No, we can't just close it - then we won't get notification that a key's 
been hit in order to unlock the screensaver. Yes, we can greatly expand 
the userland-visible interface to every piece of hardware in order to 
make this work, but that's a huge amount of effort to avoid a model 
where userspace sets some tunables appropriately.

-- 
Matthew Garrett | mjg59@srcf.ucam.org

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: Run-time PM idea (was: Re: [linux-pm] [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-08 14:35                         ` Run-time PM idea (was: Re: [linux-pm] " Matthew Garrett
  (?)
  (?)
@ 2009-06-08 14:44                         ` Ingo Molnar
  2009-06-08 14:51                           ` Matthew Garrett
  2009-06-08 14:51                           ` Run-time PM idea (was: " Matthew Garrett
  -1 siblings, 2 replies; 199+ messages in thread
From: Ingo Molnar @ 2009-06-08 14:44 UTC (permalink / raw)
  To: Matthew Garrett
  Cc: Rafael J. Wysocki, Alan Stern, pm list, ACPI Devel Maling List,
	LKML, Magnus Damm


* Matthew Garrett <mjg59@srcf.ucam.org> wrote:

> On Mon, Jun 08, 2009 at 04:24:50PM +0200, Ingo Molnar wrote:
> > * Matthew Garrett <mjg59@srcf.ucam.org> wrote:
> > > eSATA is pretty common now.
> > 
> > [ And 99% of the CPUs have an IDT, yet 99.9% of the users don't know
> >   what it is :) ]
> 
> Users know that there's a socket on the front of their computer that 
> they can plug a hard drive into, and if that doesn't work then they're 
> going to be upset.
> 
> > > The problem with this kind of default is that you get people who 
> > > are confused that their hardware doesn't work.
> > 
> > If the hardware 'doesn't work', that is a kernel bug. Hardware that 
> > _cannot be suspended_ safely (physically) should not be 
> > auto-suspended, of course.
> 
> So, like I said, the kernel can't automatically suspend AHCI unless it's 
> received some information from elsewhere that tells it it's ok to. The 
> kernel can't know if there's an eSATA port or not.
> 
> > > If the kernel doesn't have enough information to make a decision 
> > > it should err on the side of functionality - we're talking about 
> > > fairly low-level power savings, but potentially several years of 
> > > aggregate confusion on the part of users.
> > 
> > The difference between a 10W and a 1W footprint is a long series of 
> > 'low-level power savings'.
> > 
> > If users are getting confused and if hardware gets broken then that's 
> > a plain bug and the wrong path is being walked.
> 
> Yes. And powersaving is a tradeoff between functionality and power 
> consumption. The kernel doesn't know what level of functionality a 
> given user requires. It *can't* know that itself.
> 
> > > Users are generally ok at realising correlation between a setting 
> > > change and something no longer working, so as long as you provide 
> > > that they'll be happy. I agree that this sucks. What we actually 
> > > want is some means of reliably identifying whether a port is 
> > > hotplug or not, but eSATA makes this very difficult.
> > 
> > Is it impossible?
> 
> To the best of my knowledge, yes.
> 
> > > My argument is "Hardware should work, and if the kernel default is 
> > > for it to be broken then the default is wrong". We went through 
> > > this for USB autosuspend. Userspace simply has more available 
> > > information than the kernel, and it's not just a matter of static 
> > > configuration (though that may be part of it). For instance, 
> > > Oliver's example of screensavers and USB keyboards. If nothing's 
> > > paying attention to volume keys (or if the keyboard doesn't have 
> > > any) then you can enable remote wakeup and suspend the keyboard. 
> > > If something /is/ paying attention to volume keys, you can't do 
> > > that. That's the kind of case I'm discussing.
> > 
> > See my reply to Oliver. This is really advocating a broken model 
> > of device usage. That volume key usage dependency is being 
> > hidden from the kernel, and then you want to kludge it around by 
> > pushing suspend functionality to user-space? That way lies 
> > madness. The proper way is to close the device if it's not used 
> > by anything. Then the kernel can auto-suspend it just like it 
> > could auto-suspend network interfaces that are not in use, or 
> > like it could auto-suspend a display port that has no monitor or 
> > other output device attached.
> 
> No, we can't just close it - then we won't get notification that a 
> key's been hit in order to unlock the screensaver. [...]

Looks like a broken notification model.

> [...] Yes, we can greatly expand the userland-visible interface to 
> every piece of hardware in order to make this work, but that's a 
> huge amount of effort to avoid a model where userspace sets some 
> tunables appropriately.

What huge amount of effort? All you are doing is to track the "is 
the device really used" state in user-space - and, if the current 
desktop experience is any measure, highly imperfectly so.

What I'm suggesting is to track it properly in the kernel. It's not 
like the kernel doesn't need to know whether a piece of hardware is 
in use or not ...
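
As a rough illustration of in-kernel tracking, the toy model below
keeps a use count per capability instead of a single open/close bit.
It is written in Python for brevity rather than as kernel code, and
the class, method and capability names are all hypothetical.

```python
from collections import Counter


class DeviceUsage:
    """Toy model of per-capability use counts for one device (hypothetical)."""

    def __init__(self):
        self.claims = Counter()

    def claim(self, capability: str) -> None:
        # A user of the device declares which capability it depends on.
        self.claims[capability] += 1

    def release(self, capability: str) -> None:
        if self.claims[capability] <= 0:
            raise ValueError(f"{capability!r} was not claimed")
        self.claims[capability] -= 1

    def power_state(self) -> str:
        # Only capabilities with a positive count constrain the device.
        active = {cap for cap, n in self.claims.items() if n > 0}
        if not active:
            return "suspended"
        if active == {"wakeup"}:
            return "suspended-with-wakeup"
        return "active"
```

With a screensaver holding only "wakeup", the keyboard in this model
may suspend; once an audio application claims "volume-keys" it has to
stay active until that claim is released.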

	Ingo

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-08 14:35                         ` Run-time PM idea (was: Re: [linux-pm] " Matthew Garrett
  (?)
@ 2009-06-08 14:44                         ` Ingo Molnar
  -1 siblings, 0 replies; 199+ messages in thread
From: Ingo Molnar @ 2009-06-08 14:44 UTC (permalink / raw)
  To: Matthew Garrett; +Cc: LKML, ACPI Devel Maling List, pm list


* Matthew Garrett <mjg59@srcf.ucam.org> wrote:

> On Mon, Jun 08, 2009 at 04:24:50PM +0200, Ingo Molnar wrote:
> > * Matthew Garrett <mjg59@srcf.ucam.org> wrote:
> > > eSATA is pretty common now.
> > 
> > [ And 99% of the CPUs have an IDT, yet 99.9% of the users don't know
> >   what it is :) ]
> 
> Users know that there's a socket on the front of their computer that 
> they can plug a hard drive into, and if that doesn't work then they're 
> going to be upset.
> 
> > > The problem with this kind of default is that you get people who 
> > > are confused that their hardware doesn't work.
> > 
> > If the hardware 'doesn't work', that is a kernel bug. Hardware that 
> > _cannot be suspended_ safely (physically) should not be 
> > auto-suspended, of course.
> 
> So, like I said, the kernel can't automatically suspend AHCI unless it's 
> received some information from elsewhere that tells it it's ok to. The 
> kernel can't know if there's an eSATA port or not.
> 
> > > If the kernel doesn't have enough information to make a decision 
> > > it should err on the side of functionality - we're talking about 
> > > fairly low-level power savings, but potentially several years of 
> > > aggregate confusion on the part of users.
> > 
> > The difference between a 10W and a 1W footprint is a long series of 
> > 'low-level power savings'.
> > 
> > If users are getting confused and if hardware gets broken then that's 
> > a plain bug and the wrong path is being walked.
> 
> Yes. And powersaving is a tradeoff between functionality and power 
> consumption. The kernel doesn't know what level of functionality a 
> given user requires. It *can't* know that itself.
> 
> > > Users are generally ok at realising correlation between a setting 
> > > change and something no longer working, so as long as you provide 
> > > that they'll be happy. I agree that this sucks. What we actually 
> > > want is some means of reliably identifying whether a port is 
> > > hotplug or not, but eSATA makes this very difficult.
> > 
> > Is it impossible?
> 
> To the best of my knowledge, yes.
> 
> > > My argument is "Hardware should work, and if the kernel default is 
> > > for it to be broken then the default is wrong". We went through 
> > > this for USB autosuspend. Userspace simply has more available 
> > > information than the kernel, and it's not just a matter of static 
> > > configuration (though that may be part of it). For instance, 
> > > Oliver's example of screensavers and USB keyboards. If nothing's 
> > > paying attention to volume keys (or if the keyboard doesn't have 
> > > any) then you can enable remote wakeup and suspend the keyboard. 
> > > If something /is/ paying attention to volume keys, you can't do 
> > > that. That's the kind of case I'm discussing.
> > 
> > See my reply to Oliver. This is really advocating a broken model 
> > of device usage. That volume key usage dependency is being 
> > hidden from the kernel, and then you want to kludge it around by 
> > pushing suspend functionality to user-space? That way lies 
> > madness. The proper way is to close the device if it's not used 
> > by anything. Then the kernel can auto-suspend it just like it 
> > could auto-suspend network interfaces that are not in use, or 
> > like it could auto-suspend a display port that has no monitor or 
> > other output device attached.
> 
> No, we can't just close it - then we won't get notification that a 
> key's been hit in order to unlock the screensaver. [...]

Looks like a broken notification model.

> [...] Yes, we can greatly expand the userland-visible interface to 
> every piece of hardware in order to make this work, but that's a 
> huge amount of effort to avoid a model where userspace sets some 
> tunables appropriately.

What huge amount of effort? All you are doing is to track the "is 
the device really used" state in user-space - and, if the current 
desktop experience is any measure, highly imperfectly so.

What I'm suggesting is to track it properly in the kernel. It's not 
like the kernel doesn't need to know whether a piece of hardware is 
in use or not ...

	Ingo

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: Run-time PM idea (was: Re: [linux-pm] [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-08 14:44                         ` Run-time PM idea (was: Re: [linux-pm] " Ingo Molnar
@ 2009-06-08 14:51                           ` Matthew Garrett
  2009-06-24 15:03                               ` Run-time PM idea (was: Re: [linux-pm] " Pavel Machek
  2009-06-08 14:51                           ` Run-time PM idea (was: " Matthew Garrett
  1 sibling, 1 reply; 199+ messages in thread
From: Matthew Garrett @ 2009-06-08 14:51 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rafael J. Wysocki, Alan Stern, pm list, ACPI Devel Maling List,
	LKML, Magnus Damm

On Mon, Jun 08, 2009 at 04:44:55PM +0200, Ingo Molnar wrote:
> 
> * Matthew Garrett <mjg59@srcf.ucam.org> wrote:
> > No, we can't just close it - then we won't get notification that a 
> > key's been hit in order to unlock the screensaver. [...]
> 
> Looks like a broken notification model.

We've closed the input device. Where are we supposed to get the input 
event from?

> > [...] Yes, we can greatly expand the userland-visible interface to 
> > every piece of hardware in order to make this work, but that's a 
> > huge amount of effort to avoid a model where userspace sets some 
> > tunables appropriately.
> 
> What huge amount of effort? All you are doing is to track the "is 
> the device really used" state in user-space - and, if the current 
> desktop experience is any measure, highly imperfectly so.
> 
> What i'm suggesting is to track it properly in the kernel. It's not 
> like the kernel doesn't need to know whether a piece of hardware is 
> under use or not ...

So, for instance, we need to add interfaces like "I care about hotplug 
events on this SATA port" and "I'm listening for these keys so please 
don't suspend the device" and "The service bound to this port needs to 
maintain network connectivity and the one bound to this port doesn't, so 
only put the wireless card into deep powersave if the first exits", and 
then we need to wait for userspace to adopt these interfaces before we 
can enable any of the functionality because otherwise old userspace will 
be broken with new kernels.

-- 
Matthew Garrett | mjg59@srcf.ucam.org

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: Run-time PM idea (was: Re: [linux-pm] [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-08 14:30                   ` Matthew Garrett
  2009-06-08 15:06                     ` Run-time PM idea (was: " Ingo Molnar
@ 2009-06-08 15:06                     ` Ingo Molnar
  2009-06-08 15:11                       ` Matthew Garrett
                                         ` (3 more replies)
  1 sibling, 4 replies; 199+ messages in thread
From: Ingo Molnar @ 2009-06-08 15:06 UTC (permalink / raw)
  To: Matthew Garrett
  Cc: Oliver Neukum, Rafael J. Wysocki, Alan Stern, pm list,
	ACPI Devel Maling List, LKML, Magnus Damm


* Matthew Garrett <mjg59@srcf.ucam.org> wrote:

> On Mon, Jun 08, 2009 at 04:21:54PM +0200, Ingo Molnar wrote:
> 
> > The kernel _needs_ to have precise information about whether a piece 
> > of hardware is in use or not.
> 
> The kernel can only have that information if userspace tells it. 
> What we're quibbling over is whether the kernel should be 
> explicitly told about the requirement (ie, every time an app makes 
> a key grab in X the kernel gets told about it) or whether it 
> should be implicit (userspace knows that a key grab has been made 
> and so requests that the keyboard not be suspended).
> 
> We *can* put all of that complexity in the kernel. The question is 
> whether it buys us anything. We'd have to modify huge chunks of 
> userspace and in the process we'd end up limited to whatever 
> policy happens to exist in the version of the kernel the user is 
> running.
> 
> I'd like the kernel to expose this functionality but leave the 
> policy decisions to userland.

The thing is, suspending something that is being used and relied on 
by an app is a _bug_. This is rather fundamental: hardware state and 
usage tracking is _NOT POLICY_.

Having a global override of "don't ever suspend anything here, 
because i say so" _is_ policy.

Providing _essential_ functionality to not suspend a keyboard while 
an app relies on it is _NOT_ policy.

I will even buy the argument that most current hardware cannot be 
auto-suspended safely.

But if you think that tracking the usage state of the hardware is 
'complexity', then you very much don't know what you are talking 
about. The main task of the kernel is to track hardware usage and to 
abstract away the fact that the same hardware is used by multiple 
tasks, and to do it safely. It's what the kernel does all day.

	Ingo

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: Run-time PM idea (was: Re: [linux-pm] [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-08 15:06                     ` Run-time PM idea (was: Re: [linux-pm] " Ingo Molnar
@ 2009-06-08 15:11                       ` Matthew Garrett
  2009-06-08 15:11                       ` Run-time PM idea (was: " Matthew Garrett
                                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 199+ messages in thread
From: Matthew Garrett @ 2009-06-08 15:11 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Oliver Neukum, Rafael J. Wysocki, Alan Stern, pm list,
	ACPI Devel Maling List, LKML, Magnus Damm

On Mon, Jun 08, 2009 at 05:06:03PM +0200, Ingo Molnar wrote:

> But if you think that tracking the usage state of the hardware is 
> 'complexity', then you very much don't know what you are talking 
> about. The main task of the kernel is to track hardware usage and to 
> abstract away the fact that the same hardware is used by multiple 
> tasks, and to do it safely. It's what the kernel does all day.

What I'm saying is that you don't *know* what the usage state of the 
hardware is, and in many cases you can't know. A given user may be happy 
to sacrifice their SATA hotplug support. Another with identical hardware 
may not. A given network application may be mission critical and 
intolerant of the network interface being shut down. The same 
application in a different context may not. We'd need to provide a 
bewildering array of interfaces to distinguish between these situations, 
and we'd be unable to turn on autosuspend until the entirety of 
userspace had been ported to them.

-- 
Matthew Garrett | mjg59@srcf.ucam.org

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: Run-time PM idea (was: Re: [linux-pm] [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-08 15:06                     ` Run-time PM idea (was: Re: [linux-pm] " Ingo Molnar
@ 2009-06-08 16:29                         ` Ray Lee
  2009-06-08 15:11                       ` Run-time PM idea (was: " Matthew Garrett
                                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 199+ messages in thread
From: Ray Lee @ 2009-06-08 16:29 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Matthew Garrett, Oliver Neukum, Rafael J. Wysocki, Alan Stern,
	pm list, ACPI Devel Maling List, LKML, Magnus Damm

On Mon, Jun 8, 2009 at 8:06 AM, Ingo Molnar<mingo@elte.hu> wrote:
>
> * Matthew Garrett <mjg59@srcf.ucam.org> wrote:

>> I'd like the kernel to expose this functionality but leave the
>> policy decisions to userland.
>
> The thing is, suspending something that is being used and relied on
> by an app is a _bug_. This is rather fundamental: hardware state and
> usage tracking is _NOT POLICY_.

Yes. But, actual example time: what about the case where completely
turning off a laptop's DVD drive saves extra power, but then also
turns off kernel and userspace notification of a disc being inserted?

One of the Other OS's handles this by having a popup when the battery
gets low, asking if it's okay to turn the drive off. This is part of
what Matthew is talking about here.

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-08 12:04         ` Oliver Neukum
@ 2009-06-08 18:34           ` Rafael J. Wysocki
  2009-06-09  7:25             ` Oliver Neukum
  2009-06-09  7:25             ` [linux-pm] " Oliver Neukum
  2009-06-08 18:34           ` Rafael J. Wysocki
  1 sibling, 2 replies; 199+ messages in thread
From: Rafael J. Wysocki @ 2009-06-08 18:34 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: linux-pm, Alan Stern, ACPI Devel Maling List, LKML

On Monday 08 June 2009, Oliver Neukum wrote:
> Am Montag, 8. Juni 2009 13:29:26 schrieb Rafael J. Wysocki:
> 
> > But I need to be able to call __pm_schedule_resume() (at least) from
> > interrupt context and I can't use a mutex from there.  Otherwise I'd have
> > used a mutex. :-)
> 
> I see.
> 
> > Anyway, below is a version with synchronous resume.
> 
> You are assuming autosuspend should always be with a delay. Why?

I couldn't invent a valid case for doing it without a delay.  Perhaps my
imagination is constrained too much. ;-)

> Secondly, you are not using a counter. Therefore only one driver can
> control the PM state of a device at a given time. Is that wise?

I didn't think about it to be honest.  Obviously this patch doesn't cover all
of the possible cases and I'm not even sure it's worth trying to cover them
upfront.
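
For illustration only (this is not part of the patch): a counter-based scheme
of the kind Oliver describes could look roughly like the user-space sketch
below.  The struct and the rpm_*() helpers are hypothetical names invented for
this example, and locking is left out entirely.

```c
#include <stdbool.h>

/* Hypothetical per-device bookkeeping; not taken from the patch. */
struct rpm_dev {
	int usage_count;	/* number of users currently needing the device */
	bool suspended;		/* true once the device has been suspended */
};

/* Take a reference: the device must be awake while any user holds one. */
static void rpm_get(struct rpm_dev *dev)
{
	dev->usage_count++;
	dev->suspended = false;	/* stand-in for a real resume */
}

/* Drop a reference; returns true if nobody needs the device any more. */
static bool rpm_put(struct rpm_dev *dev)
{
	if (dev->usage_count > 0)
		dev->usage_count--;
	return dev->usage_count == 0;
}

/* Suspend only when the counter has dropped to zero. */
static bool rpm_try_suspend(struct rpm_dev *dev)
{
	if (dev->usage_count > 0 || dev->suspended)
		return false;
	dev->suspended = true;
	return true;
}
```

With a counter, two independent users can each call rpm_get() and the device
only becomes eligible for suspend after both have called rpm_put().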

> > + * __pm_schedule_suspend - Schedule run-time suspend of given device.
> > + * @dev: Device to suspend.
> > + * @delay: Time to wait before attempting to suspend the device.
> 
> In which unit of time? If this is to go into kerneldoc that must be specified.

That's in jiffies.  Yes, I should have documented it.
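
As a rough illustration of what the unit means for callers: a delay given in
milliseconds has to be converted to timer ticks first.  In the kernel that is
what msecs_to_jiffies() is for; the stand-alone sketch below only mimics the
idea, and both SKETCH_HZ and sketch_msecs_to_jiffies() are made-up names with
an assumed tick rate.

```c
/* Assumed tick rate for this sketch; the real HZ is a kernel config choice. */
#define SKETCH_HZ 250UL

/* Convert milliseconds to ticks, rounding up so short delays are not lost. */
static unsigned long sketch_msecs_to_jiffies(unsigned long ms)
{
	return (ms * SKETCH_HZ + 999UL) / 1000UL;
}
```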

> > + * @autocancel: If set, the request will be cancelled during a resume from a
> > + *	system-wide sleep state if it happens before @delay elapses.
> 
> Why is this needed?

In some subsystems, like PCI, devices will be resumed by the BIOS
unconditionally in the majority of cases and then it's not worth trying to
complete run-time PM requests from before the suspend.

> > + */
> > +void __pm_schedule_suspend(struct device *dev, unsigned long delay,
> > +			   bool autocancel)
> 
> [..]
> 
> 
> > +
> > +/**
> > + * __pm_schedule_resume - Schedule run-time resume of given device.
> > + * @dev: Device to resume.
> > + * @autocancel: If set, the request will be cancelled during a resume from a
> > + *	system-wide sleep state if it happens before pm_autoresume() can be run.
> > + */
> 
> Eeek! This is a bad idea. You never want a resume to be cancelled.

Sometimes I do (see above).

> > +void __pm_schedule_resume(struct device *dev, bool autocancel)
> 
> [..]
> > +int pm_resume_sync(struct device *dev)
> > +{
> > +	int error = 0;
> > +
> > +	pm_lock_device(dev);
> > +	if (dev->power.runtime_status == RPM_IDLE) {
> > +		/* ->autosuspend() hasn't started yet, no need to resume. */
> > +		pm_cancel_suspend(dev);
> > +		goto out;
> > +	}
> > +
> > +	if (dev->power.runtime_status == RPM_SUSPENDING) {
> > +		/*
> > +		 * The ->autosuspend() callback is being executed right now,
> > +		 * wait for it to complete.
> > +		 */
> > +		pm_unlock_device(dev);
> > +		cancel_delayed_work_sync(&dev->power.suspend_work);
> 
> That is the most glorious abuse of an API I've seen this year :-)

Heh.

OK, what would you do instead?

Rafael

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-08 11:29       ` [linux-pm] " Rafael J. Wysocki
@ 2009-06-08 20:35           ` Alan Stern
  2009-06-08 12:04         ` Oliver Neukum
                             ` (2 subsequent siblings)
  3 siblings, 0 replies; 199+ messages in thread
From: Alan Stern @ 2009-06-08 20:35 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Oliver Neukum, linux-pm, ACPI Devel Maling List, LKML

On Mon, 8 Jun 2009, Rafael J. Wysocki wrote:

> On Monday 08 June 2009, Oliver Neukum wrote:
> > Am Sonntag, 7. Juni 2009 23:46:59 schrieb Rafael J. Wysocki:
> > > It may be necessary to resume a device synchronously, but I'm still
> > > thinking how to implement that.
> > 
> > This will absolutely be the default. You resume a device because you want
> > it to do something now. It seems to me that you are making your problem worse
> > by using a spinlock as a lock. A mutex would make it easier.
> 
> But I need to be able to call __pm_schedule_resume() (at least) from interrupt
> context and I can't use a mutex from there.  Otherwise I'd have used a mutex. :-)
> 
> Anyway, below is a version with synchronous resume.

There are a few things here which need further thought:

The implementation of pm_lock_device() assumes it will never be called 
with interrupts disabled.  This is a bad assumption.

Use of the RPM_UNKNOWN state isn't good.  A bus may have valid reasons 
of its own for not carrying out an autosuspend.  When this happens the 
device's state isn't unknown.

The scheme doesn't include any mechanism for communicating runtime
power information up the device tree.  When a device is autosuspended,
its parent's driver should be told so that the driver can consider
autosuspending the parent.  Likewise, if we want to autoresume a device 
below an autosuspended parent, the parent should be autoresumed first.  
Did you want to make the bus subsystem responsible for all of this?  
What about devices whose parent belongs to a different subsystem?
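
To make the ordering concrete, here is a minimal user-space sketch (not kernel
code, and not what the patch does).  The pm_node structure and the node_*()
helpers are invented for illustration; a real implementation would also need
locking and would go through the bus type's callbacks.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device-tree node used only in this sketch. */
struct pm_node {
	struct pm_node *parent;
	bool suspended;
	int active_children;	/* children that are currently not suspended */
};

/* Resume a node, bringing up every suspended ancestor first. */
static void node_resume(struct pm_node *n)
{
	if (n == NULL || !n->suspended)
		return;
	node_resume(n->parent);		/* the parent must be running first */
	n->suspended = false;
	if (n->parent != NULL)
		n->parent->active_children++;
}

/* Suspend a node if no child still needs it, and tell the parent. */
static bool node_suspend(struct pm_node *n)
{
	if (n->suspended || n->active_children > 0)
		return false;
	n->suspended = true;
	if (n->parent != NULL)
		n->parent->active_children--;	/* parent may now consider suspending */
	return true;
}
```

The point of the counter in the parent is that the "tell the parent" step is
the same regardless of which subsystem the parent belongs to.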

There should be a sysfs interface (like the one in USB) to allow
userspace to prevent a device from being autosuspended -- and perhaps
also to force it to be suspended.

What about devices that have more than two runtime power states?  For
example, you can't squeeze PCI's {D0,D1,D2,D3hot} range into {running,
suspended}.

That's what I come up with on a first reading.  There may be more later 
on...  :-)

Alan Stern


^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-08 20:35           ` Alan Stern
  (?)
@ 2009-06-08 21:31           ` Rafael J. Wysocki
  2009-06-09  2:49               ` Alan Stern
                               ` (3 more replies)
  -1 siblings, 4 replies; 199+ messages in thread
From: Rafael J. Wysocki @ 2009-06-08 21:31 UTC (permalink / raw)
  To: Alan Stern; +Cc: Oliver Neukum, linux-pm, ACPI Devel Maling List, LKML

On Monday 08 June 2009, Alan Stern wrote:
> On Mon, 8 Jun 2009, Rafael J. Wysocki wrote:
> 
> > On Monday 08 June 2009, Oliver Neukum wrote:
> > > Am Sonntag, 7. Juni 2009 23:46:59 schrieb Rafael J. Wysocki:
> > > > It may be necessary to resume a device synchronously, but I'm still
> > > > thinking how to implement that.
> > > 
> > > This will absolutely be the default. You resume a device because you want
> > > it to do something now. It seems to me that you are making your problem worse
> > > by using a spinlock as a lock. A mutex would make it easier.
> > 
> > But I need to be able to call __pm_schedule_resume() (at least) from interrupt
> > context and I can't use a mutex from there.  Otherwise I'd have used a mutex. :-)
> > 
> > Anyway, below is a version with synchronous resume.
> 
> There are a few things here which need further thought:
> 
> The implementation of pm_lock_device() assumes it will never be called 
> with interrupts disabled.  This is a bad assumption.

Indeed.

> Use of the RPM_UNKNOWN state isn't good.  A bus may have valid reasons 
> of its own for not carrying out an autosuspend.  When this happens the 
> device's state isn't unknown.

I'm not sure what you mean exactly.

If ->autosuspend() fails, the device power state may be known, but the core
can't be sure if the device is active.  This information is available to the
driver and/or the bus type, which should change the status to whatever is
appropriate.

The name of this constant may be confusing, but I didn't have any better ideas.

> The scheme doesn't include any mechanism for communicating runtime
> power information up the device tree.  When a device is autosuspended,
> its parent's driver should be told so that the driver can consider
> autosuspending the parent.

I thought the bus type's ->autosuspend() callback could take care of this.

> Likewise, if we want to autoresume a device below an autosuspended parent,
> the parent should be autoresumed first.  Did you want to make the bus
> subsystem responsible for all of this?

Yes, that was the idea.
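
A minimal sketch of that ordering, written as plain user-space C rather than
kernel code.  The struct node type, its fields and both helpers are
hypothetical and do not come from the posted patch.  The sketch also
simplifies one point: the parent suspends as soon as its last active child is
gone, whereas a real parent driver would only be told and would decide for
itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device node; this is not the kernel's struct device. */
struct node {
	struct node *parent;
	int suspended;        /* 1 if this node is runtime-suspended */
	int active_children;  /* children that are currently not suspended */
};

/* Resume the parent chain first, then the node itself. */
void node_resume(struct node *n)
{
	if (!n->suspended)
		return;
	if (n->parent) {
		node_resume(n->parent);
		n->parent->active_children++;
	}
	n->suspended = 0;
}

/*
 * Suspend a node and tell its parent, which (in this simplified sketch)
 * suspends itself once its last active child is gone.
 * Returns 0 on success, -1 if the node still has active children.
 */
int node_suspend(struct node *n)
{
	if (n->suspended)
		return 0;
	if (n->active_children > 0)
		return -1;
	n->suspended = 1;
	if (n->parent) {
		n->parent->active_children--;
		if (n->parent->active_children == 0)
			node_suspend(n->parent);
	}
	return 0;
}
```

Keeping the walk up the tree in one place like this is also what makes the
cross-subsystem case awkward: the helper needs to reach a parent that may
belong to a different bus type.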
 
> What about devices whose parent belongs to a different subsystem?

Good question. :-)

I think that requires some research.  Probably a USB device attached to a PCI
USB controller is a good example here, but we first need to have a prototype
implementation for PCI to carry out some testing.

In fact I'd like to avoid the complexity for now and consider one bus type at a
time.  In particular, for PCI we won't autosuspend bridges initially, so this
case is going to be really simple.

> There should be a sysfs interface (like the one in USB) to allow
> userspace to prevent a device from being autosuspended -- and perhaps
> also to force it to be suspended.

To prevent a device from being suspended - yes.  To force it to stay suspended
- I'm not sure.

Anyway, that will be the next step.

> What about devices that have more than two runtime power states?  For
> example, you can't squeeze PCI's {D0,D1,D2,D3hot} range into {running,
> suspended}.

That has to be bus type-specific.

In the case of PCI all of the low power states (D1-D3) are in fact substates of
"suspended", because we generally need to quiesce the device before putting
it into any of these states.

I'm not sure if we can introduce more "levels of suspension", so to speak, at
the core level, but in any case we can easily distinguish between "device
quiesced and in a low power state" and "device fully active".

So, in this picture the device is "suspended" from the core's point of view
once its bus type's ->autosuspend() callback has been successfully executed.
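
As a rough illustration of that picture, the sketch below collapses the PCI
states named above into the two states the core would see.  The enum and
function names are made up for this example and are not taken from the patch
or from the PCI code.

```c
#include <assert.h>

/* PCI power states named in the thread (illustrative values only). */
enum pci_state { D0, D1, D2, D3HOT };

/* The two states the core would distinguish (hypothetical names). */
enum core_state { CORE_ACTIVE, CORE_SUSPENDED };

/* Every low-power state is a substate of "suspended" from the core's view. */
enum core_state core_view(enum pci_state s)
{
	return s == D0 ? CORE_ACTIVE : CORE_SUSPENDED;
}
```

Which of D1, D2 or D3hot is actually used stays a bus type decision in this
model; the core never sees the difference.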

> That's what I come up with on a first reading.  There may be more later 
> on...  :-)

Sure.

Thanks for your comments! :-)

Best,
Rafael

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-08 20:35           ` Alan Stern
  (?)
  (?)
@ 2009-06-08 21:31           ` Rafael J. Wysocki
  -1 siblings, 0 replies; 199+ messages in thread
From: Rafael J. Wysocki @ 2009-06-08 21:31 UTC (permalink / raw)
  To: Alan Stern; +Cc: ACPI Devel Maling List, linux-pm, LKML

On Monday 08 June 2009, Alan Stern wrote:
> On Mon, 8 Jun 2009, Rafael J. Wysocki wrote:
> 
> > On Monday 08 June 2009, Oliver Neukum wrote:
> > > On Sunday, 7 June 2009 23:46:59, Rafael J. Wysocki wrote:
> > > > It may be necessary to resume a device synchronously, but I'm still
> > > > thinking how to implement that.
> > > 
> > > This will absolutely be the default. You resume a device because you want
> > > it to do something now. It seems to me that you are making your problem worse
> > > by using a spinlock as a lock. A mutex would make it easier.
> > 
> > But I need to be able to call __pm_schedule_resume() (at least) from interrupt
> > context and I can't use a mutex from there.  Otherwise I'd have used a mutex. :-)
> > 
> > Anyway, below is a version with synchronous resume.
> 
> There are a few things here which need further thought:
> 
> The implementation of pm_lock_device() assumes it will never be called 
> with interrupts disabled.  This is a bad assumption.

Indeed.

> Use of the RPM_UNKNOWN state isn't good.  A bus may have valid reasons 
> of its own for not carrying out an autosuspend.  When this happens the 
> device's state isn't unknown.

I'm not sure what you mean exactly.

If ->autosuspend() fails, the device power state may be known, but the core
can't be sure if the device is active.  This information is available to the
driver and/or the bus type, which should change the status to whatever is
appropriate.

The name of this constant may be confusing, but I didn't have any better ideas.

> The scheme doesn't include any mechanism for communicating runtime
> power information up the device tree.  When a device is autosuspended,
> its parent's driver should be told so that the driver can consider
> autosuspending the parent.

I thought the bus type's ->autosuspend() callback could take care of this.

> Likewise, if we want to autoresume a device below an autosuspended parent,
> the parent should be autoresumed first.  Did you want to make the bus
> subsystem responsible for all of this?

Yes, that was the idea.
 
> What about devices whose parent belongs to a different subsystem?

Good question. :-)

I think that requires some research.  Probably a USB device attached to a PCI
USB controller is a good example here, but we first need to have a prototype
implementation for PCI to carry out some testing.

In fact I'd like to avoid the complexity for now and consider one bus type at a
time.  In particular, for PCI we won't autosuspend bridges initially, so this
case is going to be really simple.

> There should be a sysfs interface (like the one in USB) to allow
> userspace to prevent a device from being autosuspended -- and perhaps
> also to force it to be suspended.

To prevent a device from being suspended - yes.  To force it to stay suspended
- I'm not sure.

Anyway, that will be the next step.

> What about devices that have more than two runtime power states?  For
> example, you can't squeeze PCI's {D0,D1,D2,D3hot} range into {running,
> suspended}.

That has to be bus type-specific.

In the case of PCI all of the low power states (D1-D3) are in fact substates of
"suspended", because we generally need to quiesce the device before putting
it into any of these states.

I'm not sure if we can introduce more "levels of suspension", so to speak, at
the core level, but in any case we can easily distinguish between "device
quiesced and in a low power state" and "device fully active".

So, in this picture the device is "suspended" from the core's point of view
once its bus type's ->autosuspend() callback has been successfully executed.

> That's what I come up with on a first reading.  There may be more later 
> on...  :-)

Sure.

Thanks for your comments! :-)

Best,
Rafael

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-08 21:31           ` Rafael J. Wysocki
@ 2009-06-09  2:49               ` Alan Stern
  2009-06-09  2:49             ` Alan Stern
                                 ` (2 subsequent siblings)
  3 siblings, 0 replies; 199+ messages in thread
From: Alan Stern @ 2009-06-09  2:49 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Oliver Neukum, linux-pm, ACPI Devel Maling List, LKML

On Mon, 8 Jun 2009, Rafael J. Wysocki wrote:

> > Use of the RPM_UNKNOWN state isn't good.  A bus may have valid reasons 
> > of its own for not carrying out an autosuspend.  When this happens the 
> > device's state isn't unknown.
> 
> I'm not sure what you mean exactly.
> 
> If ->autosuspend() fails, the device power state may be known, but the core
> can't be sure if the device is active.  This information is available to the
> driver and/or the bus type, which should change the status to whatever is
> appropriate.

But no matter what the driver or bus type sets the state to, your 
pm_autosuspend() will change it to one of RPM_UNKNOWN or RPM_SUSPENDED.  
Neither might be right.

> The name of this constant may be confusing, but I didn't have any better ideas.

It's not clear what RPM_ACTIVE, RPM_IDLE, and RPM_SUSPENDED are 
supposed to mean; this should be documented in the code.  Also, why 
isn't there RPM_RESUMING?

By the way, a legitimate reason for aborting an autosuspend is if the
device's driver requires remote wakeup to be enabled during suspend but
the user has disabled it.

> > The scheme doesn't include any mechanism for communicating runtime
> > power information up the device tree.  When a device is autosuspended,
> > its parent's driver should be told so that the driver can consider
> > autosuspending the parent.
> 
> I thought the bus type's ->autosuspend() callback could take care of this.

Shouldn't this happen after the device's state has changed to 
RPM_SUSPENDED?  That's not until after the callback returns.

> > There should be a sysfs interface (like the one in USB) to allow
> > userspace to prevent a device from being autosuspended -- and perhaps
> > also to force it to be suspended.
> 
> To prevent a device from being suspended - yes.  To force it to stay suspended
> - I'm not sure.

I'm not sure either.  Oliver Neukum requested it originally and it has
been useful for debugging, but I haven't seen many places where it
would come in useful in practice.
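
For illustration only, here is a user-space sketch of how such a control
string could be parsed.  The three words are modelled loosely on what USB's
power/level attribute accepts; the enum, the function name and the handling
of a trailing newline are assumptions made for this example, not an existing
kernel interface.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical runtime PM policy selected by userspace. */
enum pm_policy {
	PM_POLICY_INVALID = -1,
	PM_POLICY_AUTO,     /* device may be autosuspended */
	PM_POLICY_ON,       /* autosuspend is prevented */
	PM_POLICY_SUSPEND   /* device is forced to stay suspended */
};

/* Parse one word written by userspace, ignoring a trailing newline. */
enum pm_policy parse_policy(const char *buf)
{
	size_t len = strcspn(buf, "\n");

	if (len == 4 && !strncmp(buf, "auto", 4))
		return PM_POLICY_AUTO;
	if (len == 2 && !strncmp(buf, "on", 2))
		return PM_POLICY_ON;
	if (len == 7 && !strncmp(buf, "suspend", 7))
		return PM_POLICY_SUSPEND;
	return PM_POLICY_INVALID;
}
```

Dropping PM_POLICY_SUSPEND from the enum would give the narrower interface,
the one that only lets userspace prevent autosuspend.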

> > What about devices that have more than two runtime power states?  For
> > example, you can't squeeze PCI's {D0,D1,D2,D3hot} range into {running,
> > suspended}.
> 
> That has to be bus type-specific.
> 
> In the case of PCI all of the low power states (D1-D3) are in fact substates of
> "suspended", because we generally need to quiesce the device before putting
> it into any of these states.
> 
> I'm not sure if we can introduce more "levels of suspension", so to speak, at
> the core level, but in any case we can easily distinguish between "device
> quiesced and in a low power state" and "device fully active".
> 
> So, in this picture the device is "suspended" from the core's point of view
> once its bus type's ->autosuspend() callback has been successfully executed.

This too should be documented in the code.  Or in a Documentation file.

Alan Stern


^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
@ 2009-06-09  2:49               ` Alan Stern
  0 siblings, 0 replies; 199+ messages in thread
From: Alan Stern @ 2009-06-09  2:49 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Oliver Neukum, linux-pm, ACPI Devel Maling List, LKML

On Mon, 8 Jun 2009, Rafael J. Wysocki wrote:

> > Use of the RPM_UNKNOWN state isn't good.  A bus may have valid reasons 
> > of its own for not carrying out an autosuspend.  When this happens the 
> > device's state isn't unknown.
> 
> I'm not sure what you mean exactly.
> 
> If ->autosuspend() fails, the device power state may be known, but the core
> can't be sure if the device is active.  This information is available to the
> driver and/or the bus type, which should change the status to whatever is
> appropriate.

But no matter what the driver or bus type sets the state to, your 
pm_autosuspend() will change it to one of RPM_UNKNOWN or RPM_SUSPENDED.  
Neither might be right.

> The name of this constant may be confusing, but I didn't have any better ideas.

It's not clear what RPM_ACTIVE, RPM_IDLE, and RPM_SUSPENDED are 
supposed to mean; this should be documented in the code.  Also, why 
isn't there RPM_RESUMING?

By the way, a legitimate reason for aborting an autosuspend is if the
device's driver requires remote wakeup to be enabled during suspend but
the user has disabled it.

> > The scheme doesn't include any mechanism for communicating runtime
> > power information up the device tree.  When a device is autosuspended,
> > its parent's driver should be told so that the driver can consider
> > autosuspending the parent.
> 
> I thought the bus type's ->autosuspend() callback could take care of this.

Shouldn't this happen after the device's state has changed to 
RPM_SUSPENDED?  That's not until after the callback returns.

> > There should be a sysfs interface (like the one in USB) to allow
> > userspace to prevent a device from being autosuspended -- and perhaps
> > also to force it to be suspended.
> 
> To prevent a device from being suspended - yes.  To force it to stay suspended
> - I'm not sure.

I'm not sure either.  Oliver Neukum requested it originally and it has
been useful for debugging, but I haven't seen many places where it
would come in useful in practice.

> > What about devices that have more than two runtime power states?  For
> > example, you can't squeeze PCI's {D0,D1,D2,D3hot} range into {running,
> > suspended}.
> 
> That has to be bus type-specific.
> 
> In the case of PCI all of the low power states (D1-D3) are in fact substates of
> "suspended", because we generally need to quiesce the device before putting
> it into any of these states.
> 
> I'm not sure if we can introduce more "levels of suspension", so to speak, at
> the core level, but in any case we can easily distinguish between "device
> quiesced and in a low power state" and "device fully active".
> 
> So, in this picture the device is "suspended" from the core's point of view
> once its bus type's ->autosuspend() callback has been successfully executed.

This too should be documented in the code.  Or in a Documentation file.

Alan Stern


^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-08 21:31           ` Rafael J. Wysocki
  2009-06-09  2:49               ` Alan Stern
@ 2009-06-09  2:49             ` Alan Stern
  2009-06-09  7:31               ` Oliver Neukum
  2009-06-09  7:31             ` Oliver Neukum
  3 siblings, 0 replies; 199+ messages in thread
From: Alan Stern @ 2009-06-09  2:49 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: ACPI Devel Maling List, linux-pm, LKML

On Mon, 8 Jun 2009, Rafael J. Wysocki wrote:

> > Use of the RPM_UNKNOWN state isn't good.  A bus may have valid reasons 
> > of its own for not carrying out an autosuspend.  When this happens the 
> > device's state isn't unknown.
> 
> I'm not sure what you mean exactly.
> 
> If ->autosuspend() fails, the device power state may be known, but the core
> can't be sure if the device is active.  This information is available to the
> driver and/or the bus type, which should change the status to whatever is
> appropriate.

But no matter what the driver or bus type sets the state to, your 
pm_autosuspend() will change it to one of RPM_UNKNOWN or RPM_SUSPENDED.  
Neither might be right.

> The name of this constant may be confusing, but I didn't have any better ideas.

It's not clear what RPM_ACTIVE, RPM_IDLE, and RPM_SUSPENDED are 
supposed to mean; this should be documented in the code.  Also, why 
isn't there RPM_RESUMING?

By the way, a legitimate reason for aborting an autosuspend is if the
device's driver requires remote wakeup to be enabled during suspend but
the user has disabled it.

> > The scheme doesn't include any mechanism for communicating runtime
> > power information up the device tree.  When a device is autosuspended,
> > its parent's driver should be told so that the driver can consider
> > autosuspending the parent.
> 
> I thought the bus type's ->autosuspend() callback could take care of this.

Shouldn't this happen after the device's state has changed to 
RPM_SUSPENDED?  That's not until after the callback returns.

> > There should be a sysfs interface (like the one in USB) to allow
> > userspace to prevent a device from being autosuspended -- and perhaps
> > also to force it to be suspended.
> 
> To prevent a device from being suspended - yes.  To force it to stay suspended
> - I'm not sure.

I'm not sure either.  Oliver Neukum requested it originally and it has
been useful for debugging, but I haven't seen many places where it
would come in useful in practice.

> > What about devices that have more than two runtime power states?  For
> > example, you can't squeeze PCI's {D0,D1,D2,D3hot} range into {running,
> > suspended}.
> 
> That has to be bus type-specific.
> 
> In the case of PCI all of the low power states (D1-D3) are in fact substates of
> "suspended", because we generally need to quiesce the device before putting
> it into any of these states.
> 
> I'm not sure if we can introduce more "levels of suspension", so to speak, at
> the core level, but in any case we can easily distinguish between "device
> quiesced and in a low power state" and "device fully active".
> 
> So, in this picture the device is "suspended" from the core's point of view
> once its bus type's ->autosuspend() callback has been successfully executed.

This too should be documented in the code.  Or in a Documentation file.

Alan Stern

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-08 18:34           ` Rafael J. Wysocki
  2009-06-09  7:25             ` Oliver Neukum
@ 2009-06-09  7:25             ` Oliver Neukum
  2009-06-09 14:33               ` Alan Stern
                                 ` (3 more replies)
  1 sibling, 4 replies; 199+ messages in thread
From: Oliver Neukum @ 2009-06-09  7:25 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: linux-pm, Alan Stern, ACPI Devel Maling List, LKML

On Monday, 8 June 2009 20:34:30, Rafael J. Wysocki wrote:
> On Monday 08 June 2009, Oliver Neukum wrote:
> > On Monday, 8 June 2009 13:29:26, Rafael J. Wysocki wrote:

> > Secondly, you are not using a counter. Therefore only one driver can
> > control the PM state of a device at a given time. Is that wise?
>
> I didn't think about it to be honest.  Obviously this patch doesn't cover
> all of the possible cases and I'm not even sure it's worth trying to cover
> them upfront.

I am thinking of multimedia cards, which have separate drivers for i2c, tuner
and so on.
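
To make the counter idea concrete, a minimal user-space sketch follows.
struct pm_dev and the get/put helpers are hypothetical names chosen for this
example; the posted patch contains nothing like them, and locking is left out
entirely.

```c
#include <assert.h>

/* Hypothetical per-device state; the posted patch has no such counter. */
struct pm_dev {
	int usage_count;  /* number of drivers that need the device active */
	int suspended;    /* 1 if the device is runtime-suspended */
};

/* A driver declares that it needs the device; the first user resumes it. */
void pm_dev_get(struct pm_dev *d)
{
	if (d->usage_count++ == 0)
		d->suspended = 0;
}

/* A driver drops its claim; the device may suspend after the last put. */
void pm_dev_put(struct pm_dev *d)
{
	assert(d->usage_count > 0);
	if (--d->usage_count == 0)
		d->suspended = 1;
}
```

With this shape the i2c and tuner drivers of one card can each take and drop
their own reference without knowing about each other.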

> > Why is this needed?
>
> In some subsystems, like PCI, devices will be resumed by the BIOS
> unconditionally in the majority of cases and then it's not worth trying to
> complete run-time PM requests from before the suspend.

But why is it worth canceling them? That feature seems to be an unnecessary
complication. As long as you can safely suspend them, why not do it?

> > > +/**
> > > + * __pm_schedule_resume - Schedule run-time resume of given device.
> > > + * @dev: Device to resume.
> > > + * @autocancel: If set, the request will be cancelled during a resume from a 
> > > + *	system-wide sleep state if it happens before pm_autoresume() can be run.
> > > + */
> >
> > Eeek! This is a bad idea. You never want a resume to be cancelled.
>
> Sometimes I do (see above).

Well no. A driver requests a resume because it has to.
This needs a defined call sequence.

Do you guarantee that autoresume follows autosuspend or not?
If not what sequences can happen? Obviously an autosuspended device
can be unplugged.
But the problem here is STR or STD. How do you notify drivers that the BIOS
has resumed their device instead of autoresume() being called? A driver
has to know that its device has become active without its knowledge.

> > > +		cancel_delayed_work_sync(&dev->power.suspend_work);
> >
> > That is the most glorious abuse of an API I've seen this year :-)
>
> Heh.
>
> OK, what would you do instead?

A waitqueue.

	Regards
		Oliver



^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-08 18:34           ` Rafael J. Wysocki
@ 2009-06-09  7:25             ` Oliver Neukum
  2009-06-09  7:25             ` [linux-pm] " Oliver Neukum
  1 sibling, 0 replies; 199+ messages in thread
From: Oliver Neukum @ 2009-06-09  7:25 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: ACPI Devel Maling List, linux-pm, LKML

On Monday, 8 June 2009 20:34:30, Rafael J. Wysocki wrote:
> On Monday 08 June 2009, Oliver Neukum wrote:
> > On Monday, 8 June 2009 13:29:26, Rafael J. Wysocki wrote:

> > Secondly, you are not using a counter. Therefore only one driver can
> > control the PM state of a device at a given time. Is that wise?
>
> I didn't think about it to be honest.  Obviously this patch doesn't cover
> all of the possible cases and I'm not even sure it's worth trying to cover
> them upfront.

I am thinking of multimedia cards, which have separate drivers for i2c, tuner
and so on.

> > Why is this needed?
>
> In some subsystems, like PCI, devices will be resumed by the BIOS
> unconditionally in the majority of cases and then it's not worth trying to
> complete run-time PM requests from before the suspend.

But why is it worth canceling them? That feature seems to be an unnecessary
complication. As long as you can safely suspend them, why not do it?

> > > +/**
> > > + * __pm_schedule_resume - Schedule run-time resume of given device.
> > > + * @dev: Device to resume.
> > > + * @autocancel: If set, the request will be cancelled during a resume from a 
> > > + *	system-wide sleep state if it happens before pm_autoresume() can be run.
> > > + */
> >
> > Eeek! This is a bad idea. You never want a resume to be cancelled.
>
> Sometimes I do (see above).

Well no. A driver requests a resume because it has to.
This needs a defined call sequence.

Do you guarantee that autoresume follows autosuspend or not?
If not what sequences can happen? Obviously an autosuspended device
can be unplugged.
But the problem here is STR or STD. How do you notify drivers that the BIOS
has resumed their device instead of autoresume() being called? A driver
has to know that its device has become active without its knowledge.

> > > +		cancel_delayed_work_sync(&dev->power.suspend_work);
> >
> > That is the most glorious abuse of an API I've seen this year :-)
>
> Heh.
>
> OK, what would you do instead?

A waitqueue.

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-08 21:31           ` Rafael J. Wysocki
@ 2009-06-09  7:31               ` Oliver Neukum
  2009-06-09  2:49             ` Alan Stern
                                 ` (2 subsequent siblings)
  3 siblings, 0 replies; 199+ messages in thread
From: Oliver Neukum @ 2009-06-09  7:31 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Alan Stern, linux-pm, ACPI Devel Maling List, LKML

On Monday, 8 June 2009 23:31:58, Rafael J. Wysocki wrote:
> If ->autosuspend() fails, the device power state may be known, but the core
> can't be sure if the device is active.  This information is available to
> the driver and/or the bus type, which should change the status to whatever
> is appropriate.

That is quite confusing. You'd better define error returns.
One that would mean that the suspension has failed but the device is
unaffected, and another one that means that the device is in an
undefined state now.
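
A sketch of what such error returns could look like, in plain C.  The result
codes and the helper are hypothetical; only the RPM_* names appear in the
thread, and the subset and numeric values used here are arbitrary.

```c
#include <assert.h>

/* Hypothetical result codes for an ->autosuspend() callback. */
enum autosuspend_result {
	AUTOSUSPEND_OK,       /* device is now suspended */
	AUTOSUSPEND_REFUSED,  /* suspend not carried out, device unaffected */
	AUTOSUSPEND_BROKEN    /* suspend failed, device state is undefined */
};

/* Subset of the status names used in the thread; values are arbitrary. */
enum rpm_status { RPM_ACTIVE, RPM_SUSPENDED, RPM_UNKNOWN };

/* Map a callback result to the status the core would record. */
enum rpm_status status_after_autosuspend(enum rpm_status before,
					 enum autosuspend_result res)
{
	switch (res) {
	case AUTOSUSPEND_OK:
		return RPM_SUSPENDED;
	case AUTOSUSPEND_REFUSED:
		return before;  /* nothing changed, keep the old status */
	default:
		return RPM_UNKNOWN;
	}
}
```

With two distinct failure codes the core only falls back to RPM_UNKNOWN when
the callback says the device really is in an undefined state.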

> > The scheme doesn't include any mechanism for communicating runtime
> > power information up the device tree.  When a device is autosuspended,
> > its parent's driver should be told so that the driver can consider
> > autosuspending the parent.
>
> I thought the bus type's ->autosuspend() callback could take care of this.

That can't work because you have to operate between busses.

> > Likewise, if we want to autoresume a device below an autosuspended
> > parent, the parent should be autoresumed first.  Did you want to make the
> > bus subsystem responsible for all of this?
>
> Yes, that was the idea.

That is an important point. Can some subsystems operate with a parent still
suspended?

	Regards
		Oliver


^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
@ 2009-06-09  7:31               ` Oliver Neukum
  0 siblings, 0 replies; 199+ messages in thread
From: Oliver Neukum @ 2009-06-09  7:31 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Alan Stern, linux-pm, ACPI Devel Maling List, LKML

On Monday, 8 June 2009 23:31:58, Rafael J. Wysocki wrote:
> If ->autosuspend() fails, the device power state may be known, but the core
> can't be sure if the device is active.  This information is available to
> the driver and/or the bus type, which should change the status to whatever
> is appropriate.

That is quite confusing. You'd better define error returns.
One that would mean that the suspension has failed but the device is
unaffected, and another one that means that the device is in an
undefined state now.

> > The scheme doesn't include any mechanism for communicating runtime
> > power information up the device tree.  When a device is autosuspended,
> > its parent's driver should be told so that the driver can consider
> > autosuspending the parent.
>
> I thought the bus type's ->autosuspend() callback could take care of this.

That can't work because you have to operate between busses.

> > Likewise, if we want to autoresume a device below an autosuspended
> > parent, the parent should be autoresumed first.  Did you want to make the
> > bus subsystem responsible for all of this?
>
> Yes, that was the idea.

That is an important point. Can some subsystems operate with a parent still
suspended?

	Regards
		Oliver


^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-08 21:31           ` Rafael J. Wysocki
                               ` (2 preceding siblings ...)
  2009-06-09  7:31               ` Oliver Neukum
@ 2009-06-09  7:31             ` Oliver Neukum
  3 siblings, 0 replies; 199+ messages in thread
From: Oliver Neukum @ 2009-06-09  7:31 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: ACPI Devel Maling List, linux-pm, LKML

On Monday, 8 June 2009 23:31:58, Rafael J. Wysocki wrote:
> If ->autosuspend() fails, the device power state may be known, but the core
> can't be sure if the device is active.  This information is available to
> the driver and/or the bus type, which should change the status to whatever
> is appropriate.

That is quite confusing. You'd better define error returns.
One that would mean that the suspension has failed but the device is
unaffected, and another one that means that the device is in an
undefined state now.

> > The scheme doesn't include any mechanism for communicating runtime
> > power information up the device tree.  When a device is autosuspended,
> > its parent's driver should be told so that the driver can consider
> > autosuspending the parent.
>
> I thought the bus type's ->autosuspend() callback could take care of this.

That can't work because you have to operate between busses.

> > Likewise, if we want to autoresume a device below an autosuspended
> > parent, the parent should be autoresumed first.  Did you want to make the
> > bus subsystem responsible for all of this?
>
> Yes, that was the idea.

That is an important point. Can some subsystems operate with a parent still
suspended?

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-09  7:25             ` [linux-pm] " Oliver Neukum
@ 2009-06-09 14:33                 ` Alan Stern
  2009-06-09 14:33                 ` Alan Stern
                                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 199+ messages in thread
From: Alan Stern @ 2009-06-09 14:33 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: Rafael J. Wysocki, linux-pm, ACPI Devel Maling List, LKML

On Tue, 9 Jun 2009, Oliver Neukum wrote:

> But the problem here is STR or STD. How do you notify drivers that the BIOS
> has resumed their device instead of autoresume() being called? A driver
> has to know that its device has become active without its knowledge.

That would be a wonderful contradiction in terms.  :-)

Alan Stern


^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
@ 2009-06-09 14:33                 ` Alan Stern
  0 siblings, 0 replies; 199+ messages in thread
From: Alan Stern @ 2009-06-09 14:33 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: Rafael J. Wysocki, linux-pm, ACPI Devel Maling List, LKML

On Tue, 9 Jun 2009, Oliver Neukum wrote:

> But the problem here is STR or STD. How do you notify drivers that the BIOS
> has resumed their device instead of autoresume() being called? A driver
> has to know that its device has become active without its knowledge.

That would be a wonderful contradiction in terms.  :-)

Alan Stern


^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-09  7:25             ` [linux-pm] " Oliver Neukum
@ 2009-06-09 14:33               ` Alan Stern
  2009-06-09 14:33                 ` Alan Stern
                                 ` (2 subsequent siblings)
  3 siblings, 0 replies; 199+ messages in thread
From: Alan Stern @ 2009-06-09 14:33 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: ACPI Devel Maling List, linux-pm, LKML

On Tue, 9 Jun 2009, Oliver Neukum wrote:

> But the problem here is STR or STD. How do you notify drivers that the BIOS
> has resumed their device instead of autoresume() being called? A driver
> has to know that its device has become active without its knowledge.

That would be a wonderful contradiction in terms.  :-)

Alan Stern

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-09 14:33                 ` Alan Stern
@ 2009-06-09 14:48                   ` Oliver Neukum
  -1 siblings, 0 replies; 199+ messages in thread
From: Oliver Neukum @ 2009-06-09 14:48 UTC (permalink / raw)
  To: Alan Stern; +Cc: Rafael J. Wysocki, linux-pm, ACPI Devel Maling List, LKML

On Tuesday, 9 June 2009 16:33:12, Alan Stern wrote:
> On Tue, 9 Jun 2009, Oliver Neukum wrote:
> > But the problem here is STR or STD. How do you notify drivers that the
> > BIOS has resumed their device instead of autoresume() being called? A
> > driver has to know that its device has become active without its
> > knowledge.
>
> That would be a wonderful contradiction in terms.  :-)

A practical application of the uncertainty principle in quantum computing.

But you have to notify a driver if you notice that a device's power state
has been changed without the knowledge of its driver.

	Regards
		Oliver


^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
@ 2009-06-09 14:48                   ` Oliver Neukum
  0 siblings, 0 replies; 199+ messages in thread
From: Oliver Neukum @ 2009-06-09 14:48 UTC (permalink / raw)
  To: Alan Stern; +Cc: Rafael J. Wysocki, linux-pm, ACPI Devel Maling List, LKML

On Tuesday, 9 June 2009 16:33:12, Alan Stern wrote:
> On Tue, 9 Jun 2009, Oliver Neukum wrote:
> > But the problem here is STR or STD. How do you notify drivers that the
> > BIOS has resumed their device instead of autoresume() being called? A
> > driver has to know that its device has become active without its
> > knowledge.
>
> That would be a wonderful contradiction in terms.  :-)

A practical application of the uncertainty principle in quantum computing.

But you have to notify a driver if you notice that a device's power state
has been changed without the knowledge of its driver.

	Regards
		Oliver


^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-09 14:33                 ` Alan Stern
  (?)
  (?)
@ 2009-06-09 14:48                 ` Oliver Neukum
  -1 siblings, 0 replies; 199+ messages in thread
From: Oliver Neukum @ 2009-06-09 14:48 UTC (permalink / raw)
  To: Alan Stern; +Cc: ACPI Devel Maling List, linux-pm, LKML

Am Dienstag, 9. Juni 2009 16:33:12 schrieb Alan Stern:
> On Tue, 9 Jun 2009, Oliver Neukum wrote:
> > But the problem here is STR or STD. How do you notify drivers that the
> > BIOS has resumed their device instead of autoresume() being called? A
> > driver has to know that its device has become active without its
> > knowledge.
>
> That would be a wonderful contradiction in terms.  :-)

A practical application of the uncertainty principle in quantum computing.

But you have to notify a driver if you notice that a device's power state
has been changed without the knowledge of its driver.

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-09  7:25             ` [linux-pm] " Oliver Neukum
  2009-06-09 14:33               ` Alan Stern
  2009-06-09 14:33                 ` Alan Stern
@ 2009-06-09 22:44               ` Rafael J. Wysocki
  2009-06-09 22:44               ` Rafael J. Wysocki
  3 siblings, 0 replies; 199+ messages in thread
From: Rafael J. Wysocki @ 2009-06-09 22:44 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: linux-pm, Alan Stern, ACPI Devel Maling List, LKML

On Tuesday 09 June 2009, Oliver Neukum wrote:
> On Monday, 8 June 2009 20:34:30, Rafael J. Wysocki wrote:
> > On Monday 08 June 2009, Oliver Neukum wrote:
> > > On Monday, 8 June 2009 13:29:26, Rafael J. Wysocki wrote:
> 
> > > Secondly, you are not using a counter. Therefore only one driver can
> > > control the PM state of a device at a given time. Is that wise?
> >
> > I didn't think about it to be honest.  Obviously this patch doesn't cover
> > all of the possible cases and I'm not even sure it's worth trying to cover
> > them upfront.
> 
> I am thinking of multimedia cards, which have separate drivers for i2c, tuner
> and so on.

Hmm, OK there.  But there's only one bus type per device anyway, isn't there?
So I'm not sure how a counter can help in this case.
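
For reference, a counter-based scheme of the kind Oliver seems to be suggesting
might look roughly like the user-space sketch below.  This is only an
illustration and not part of the patch: struct fake_dev, fake_pm_get() and
fake_pm_put() are made-up names, and the assumption is that every sub-driver
takes a reference for as long as it needs the device at full power.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a device shared by several sub-drivers. */
struct fake_dev {
	int usage_count;	/* number of users that need full power */
	bool suspended;		/* true once the device has been powered down */
};

/* A user needs the device: bring it up on the first reference. */
static void fake_pm_get(struct fake_dev *dev)
{
	if (dev->usage_count++ == 0)
		dev->suspended = false;
}

/* A user is done: the device may be suspended only when nobody is left. */
static void fake_pm_put(struct fake_dev *dev)
{
	if (dev->usage_count > 0 && --dev->usage_count == 0)
		dev->suspended = true;
}
```

With two users (say, the i2c and tuner drivers of a multimedia card), the
device would stay powered until both of them have dropped their references.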

> > > Why is this needed?
> >
> > In some subsystems, like PCI, devices will be resumed by the BIOS
> > unconditionally in the majority of cases and then it's not worth trying to
> > complete run-time PM requests from before the suspend.
> 
> But why is it worth canceling them? That feature seems to be an unnecessary
> complication. As long as you can safely suspend them, why not do it?

Because that operation may not be necessary.  I'd like to avoid
unnecessary operations, but you're right, it can be done differently.

> > > > +/**
> > > > + * __pm_schedule_resume - Schedule run-time resume of given device.
> > > > + * @dev: Device to resume.
> > > > + * @autocancel: If set, the request will be cancelled during a resume from a 
> > > > + *	system-wide sleep state if it happens before pm_autoresume() can be run.
> > > > + */
> > >
> > > Eeek! This is a bad idea. You never want a resume to be cancelled.
> >
> > Sometimes I do (see above).
> 
> Well no. A driver requests a resume because it has to.
> This needs a defined call sequence.
> 
> Do you guarantee that autoresume follows autosuspend or not?

Not necessarily.  If there's an autoresume request pending during STR or STD,
the "sleep resume" will do very much the same thing as the autoresume: it will
put the device into the full power state.  IOW, the "sleep resume" can satisfy
an autoresume request, so there should be a mechanism to cancel pending
autoresume requests during "sleep resume".  Still, it may be better if the
driver's or bus type's ->resume() does that.

> If not what sequences can happen? Obviously an autosuspended device
> can be unplugged.
> But the problem here is STR or STD. How do you notify drivers that the BIOS
> has resumed their device instead of autoresume() being called? A driver
> has to know that its device has become active without its knowledge.

Actually, the driver will know what happens to the device anyway, because
its ->resume() callback is going to be executed and it has to be synchronized
with the ->auto[suspend|resume]() callbacks.

> > > > +		cancel_delayed_work_sync(&dev->power.suspend_work);
> > >
> > > That is the most glorious abuse of an API I've seen this year :-)
> >
> > Heh.
> >
> > OK, what would you do instead?
> 
> A waitqueue.

Or perhaps a completion?

OK

I tried to address the majority of your comments in the new version of the
patch, which I'm going to send shortly in a reply to Alan's message.

Best,
Rafael


^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: Run-time PM idea (was: Re: [linux-pm] [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-08 14:21                 ` Ingo Molnar
                                     ` (2 preceding siblings ...)
  2009-06-09 22:44                   ` Jiri Kosina
@ 2009-06-09 22:44                   ` Jiri Kosina
  3 siblings, 0 replies; 199+ messages in thread
From: Jiri Kosina @ 2009-06-09 22:44 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Oliver Neukum, Matthew Garrett, Rafael J. Wysocki, Alan Stern,
	pm list, ACPI Devel Maling List, LKML, Magnus Damm

On Mon, 8 Jun 2009, Ingo Molnar wrote:

> > For example do you really need every key pressed while the screen 
> > saver is running or is it enough for the keyboard to be able to 
> > generate a wakeup event?
> The sane default here is to suspend the keyboard, except if an audio 
> app is running that binds to the volume keys of the keyboard.
> If the 'keyboard' is properly abstracted in the kernel and the 
> kernel driver _knows_ that the volume keys are in use, this is not a 
> problem.

So, if you want to abstract this properly, you are proposing that the 
application should in some sense "bind to keyboard keys"?

That has several drawbacks:

- applications in the current universe don't do that
- it's awful overhead: 
  + it apparently wouldn't have any other use than for waking up from 
    autosuspended mode (possibly while screensaver is running)
  + I believe that application writers will find it a little boring to 
    have to start all their main() functions with explicit enumeration of
    the keys the application is expecting :)
- even if we require applications to do so, there will be ones violating
  this rule (i.e. kernel only knows what userspace tells it, in this
  situation ... is this reliable enough?)

To sum it up -- I don't think that what you are proposing will work.

Thanks,

-- 
Jiri Kosina
SUSE Labs


^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-09  2:49               ` Alan Stern
  (?)
@ 2009-06-09 22:57               ` Rafael J. Wysocki
  2009-06-10  8:29                 ` [patch update] " Rafael J. Wysocki
                                   ` (3 more replies)
  -1 siblings, 4 replies; 199+ messages in thread
From: Rafael J. Wysocki @ 2009-06-09 22:57 UTC (permalink / raw)
  To: Alan Stern; +Cc: Oliver Neukum, linux-pm, ACPI Devel Maling List, LKML

On Tuesday 09 June 2009, Alan Stern wrote:
> On Mon, 8 Jun 2009, Rafael J. Wysocki wrote:
> 
> > > Use of the RPM_UNKNOWN state isn't good.  A bus may have valid reasons 
> > > of its own for not carrying out an autosuspend.  When this happens the 
> > > device's state isn't unknown.
> > 
> > I'm not sure what you mean exactly.
> > 
> > If ->autosuspend() fails, the device power state may be known, but the core
> > can't be sure if the device is active.  This information is available to the
> > driver and/or the bus type, which should change the status to whatever is
> > appropriate.
> 
> But no matter what the driver or bus type sets the state to, your 
> pm_autosuspend() will change it to one of RPM_UNKNOWN or RPM_SUSPENDED.  
> Neither might be right.

The idea is that if ->autosuspend() or ->autoresume() returns an error code,
this is a situation the PM core cannot recover from by itself, so it shouldn't
pretend it knows what's happened.  Instead, it marks the device as "I don't
know if it is safe to touch this" and won't handle it until the device driver
or bus type clears the status.
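
To make the intended behaviour a bit more concrete, here is a minimal
user-space sketch of it.  Only the RPM_* labels are taken from the patch at the
end of this message; struct fake_dev and the three helpers are hypothetical
names used purely for illustration and they leave out all of the locking.

```c
#include <errno.h>

/* State labels as in the patch; everything else here is illustrative. */
enum rpm_state {
	RPM_ERROR = -1,
	RPM_ACTIVE,
	RPM_IDLE,
	RPM_SUSPENDING,
	RPM_SUSPENDED,
	RPM_WAKE,
	RPM_RESUMING,
};

struct fake_dev {
	enum rpm_state status;
};

/* Core side: record the outcome of a (simulated) ->autosuspend() call. */
static void rpm_finish_suspend(struct fake_dev *dev, int callback_error)
{
	dev->status = callback_error ? RPM_ERROR : RPM_SUSPENDED;
}

/* Core side: refuse to handle the device while its status is RPM_ERROR. */
static int rpm_request_resume(struct fake_dev *dev)
{
	if (dev->status == RPM_ERROR)
		return -EINVAL;
	if (dev->status == RPM_SUSPENDED)
		dev->status = RPM_WAKE;
	return 0;
}

/* Driver or bus type side: only these two values may replace RPM_ERROR. */
static int rpm_driver_clear(struct fake_dev *dev, enum rpm_state new_status)
{
	if (dev->status != RPM_ERROR)
		return -EINVAL;
	if (new_status != RPM_ACTIVE && new_status != RPM_SUSPENDED)
		return -EINVAL;
	dev->status = new_status;
	return 0;
}
```

In this sketch a failing callback leaves the device in RPM_ERROR, every
subsequent request is rejected, and things only start moving again after the
driver or bus type has told the core which of the two stable states the device
is really in.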

> > The name of this constant may be confusing, but I didn't have any better ideas.
> 
> It's not clear what RPM_ACTIVE, RPM_IDLE, and RPM_SUSPENDED are 
> supposed to mean; this should be documented in the code.  Also, why 
> isn't there RPM_RESUMING?

Yes, there should be.  In fact it's in the current version of the patch, which
is appended.  Also, there's a comment explaining the meaning of the RPM_*
constants in pm.h .

> By the way, a legitimate reason for aborting an autosuspend is if the
> device's driver requires remote wakeup to be enabled during suspend but
> the user has disabled it.

Do you mean the user has disabled the remote wakeup?

> > > The scheme doesn't include any mechanism for communicating runtime
> > > power information up the device tree.  When a device is autosuspended,
> > > its parent's driver should be told so that the driver can consider
> > > autosuspending the parent.
> > 
> > I thought the bus type's ->autosuspend() callback could take care of this.
> 
> Shouldn't this happen after the device's state has changed to 
> RPM_SUSPENDED?  That's not until after the callback returns.

OK, I tried to address the issue of parent suspend/resume in the new
version of the patch below (I'm not sure if I did the nesting of spinlocks in
pm_request_resume() correctly).

> > > There should be a sysfs interface (like the one in USB) to allow
> > > userspace to prevent a device from being autosuspended -- and perhaps
> > > also to force it to be suspended.
> > 
> > To prevent a device from being suspended - yes.  To force it to stay suspended
> > - I'm not sure.
> 
> I'm not sure either.  Oliver Neukum requested it originally and it has
> been useful for debugging, but I haven't seen many places where it
> would come in useful in practice.

The problem with it is that user space may not know if it is safe to keep
a device suspended and if it is not, the kernel will have to ignore the setting
anyway, so I'm not sure what the point is (except for debugging).

> > > What about devices that have more than two runtime power states?  For
> > > example, you can't squeeze PCI's {D0,D1,D2,D3hot} range into {running,
> > > suspended}.
> > 
> > That has to be bus type-specific.
> > 
> > In the case of PCI all of the low power states (D1-D3) are in fact substates of
> > "suspended", because we generally need to quiesce the device before putting
> > it into any of these states.
> > 
> > I'm not sure if we can introduce more "levels of suspension", so to speak, at
> > the core level, but in any case we can easily distinguish between "device
> > quiesced and in a low power state" and "device fully active".
> > 
> > So, in this picture the device is "suspended" from the core's point of view
> > once its bus type's ->autosuspend() callback has been successfully executed.
> 
> This too should be documented in the code.  Or in a Documentation file.

OK
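
For the record, the two-level view quoted above could be sketched as follows.
This is a user-space illustration only; the enum labels and the helper name are
made up for the example and nothing like this is in the patch, where the choice
of the actual low power state is left to the bus type.

```c
/* Illustrative labels for the PCI power states mentioned above. */
enum fake_pci_state { FAKE_D0, FAKE_D1, FAKE_D2, FAKE_D3HOT };

/* The two levels the core would distinguish between. */
enum fake_core_view { FAKE_CORE_ACTIVE, FAKE_CORE_SUSPENDED };

/*
 * Anything other than D0 is treated as a substate of "suspended", because
 * the device has to be quiesced before it is put into any of those states.
 */
static enum fake_core_view fake_core_view_of(enum fake_pci_state state)
{
	return state == FAKE_D0 ? FAKE_CORE_ACTIVE : FAKE_CORE_SUSPENDED;
}
```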

I tried to address your comments and Oliver's comments too in the new
version of the patch below.  Please have a look and tell me what you think.

Best,
Rafael

---
 drivers/base/power/Makefile  |    1 
 drivers/base/power/main.c    |    2 
 drivers/base/power/runtime.c |  318 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/pm.h           |   76 ++++++++++
 include/linux/pm_runtime.h   |   50 ++++++
 kernel/power/Kconfig         |   14 +
 kernel/power/main.c          |   17 ++
 7 files changed, 476 insertions(+), 2 deletions(-)

Index: linux-2.6/kernel/power/Kconfig
===================================================================
--- linux-2.6.orig/kernel/power/Kconfig
+++ linux-2.6/kernel/power/Kconfig
@@ -208,3 +208,17 @@ config APM_EMULATION
 	  random kernel OOPSes or reboots that don't seem to be related to
 	  anything, try disabling/enabling this option (or disabling/enabling
 	  APM in your BIOS).
+
+config PM_RUNTIME
+	bool "Run-time PM core functionality"
+	depends on PM
+	---help---
+	  Enable functionality allowing I/O devices to be put into energy-saving
+	  (low power) states at run time (or autosuspended) after a specified
+	  period of inactivity and woken up in response to a hardware-generated
+	  wake-up event or a driver's request.
+
+	  Hardware support is generally required for this functionality to work
+	  and the bus type drivers of the buses the devices are on are
+	  responsible for the actual handling of the autosuspend requests and
+	  wake-up events.
Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -11,6 +11,7 @@
 #include <linux/kobject.h>
 #include <linux/string.h>
 #include <linux/resume-trace.h>
+#include <linux/workqueue.h>
 
 #include "power.h"
 
@@ -217,8 +218,24 @@ static struct attribute_group attr_group
 	.attrs = g,
 };
 
+#ifdef CONFIG_PM_RUNTIME
+struct workqueue_struct *pm_wq;
+
+static int __init pm_start_workqueue(void)
+{
+	pm_wq = create_freezeable_workqueue("pm");
+
+	return pm_wq ? 0 : -ENOMEM;
+}
+#else
+static inline int pm_start_workqueue(void) { return 0; }
+#endif
+
 static int __init pm_init(void)
 {
+	int error = pm_start_workqueue();
+	if (error)
+		return error;
 	power_kobj = kobject_create_and_add("power", NULL);
 	if (!power_kobj)
 		return -ENOMEM;
Index: linux-2.6/include/linux/pm.h
===================================================================
--- linux-2.6.orig/include/linux/pm.h
+++ linux-2.6/include/linux/pm.h
@@ -22,6 +22,9 @@
 #define _LINUX_PM_H
 
 #include <linux/list.h>
+#include <linux/workqueue.h>
+#include <linux/spinlock.h>
+#include <linux/completion.h>
 
 /*
  * Callbacks for platform drivers to implement.
@@ -165,6 +168,15 @@ typedef struct pm_message {
  * It is allowed to unregister devices while the above callbacks are being
  * executed.  However, it is not allowed to unregister a device from within any
  * of its own callbacks.
+ *
+ * There also are two callbacks related to run-time power management of devices:
+ *
+ * @autosuspend: Save the device registers and put it into an energy-saving (low
+ *	power) state at run-time, enable wake-up events as appropriate.
+ *
+ * @autoresume: Put the device into the full power state and restore its
+ *	registers (if applicable) at run time, in response to a wake-up event
+ *	generated by hardware or at a request of software.
  */
 
 struct dev_pm_ops {
@@ -182,6 +194,10 @@ struct dev_pm_ops {
 	int (*thaw_noirq)(struct device *dev);
 	int (*poweroff_noirq)(struct device *dev);
 	int (*restore_noirq)(struct device *dev);
+#ifdef CONFIG_PM_RUNTIME
+	int (*autosuspend)(struct device *dev);
+	int (*autoresume)(struct device *dev);
+#endif
 };
 
 /**
@@ -315,14 +331,70 @@ enum dpm_state {
 	DPM_OFF_IRQ,
 };
 
+/**
+ * Device run-time power management state.
+ *
+ * These state labels are used internally by the PM core to indicate the current
+ * status of a device with respect to the PM core operations.  They do not
+ * reflect the actual power state of the device or its status as seen by the
+ * driver.
+ *
+ * RPM_ACTIVE		Device is fully operational, no run-time PM requests are
+ *			pending for it.
+ *
+ * RPM_IDLE		It has been requested that the device be suspended.
+ *			Suspend request has been put into the run-time PM
+ *			workqueue and it's pending execution.
+ *
+ * RPM_SUSPENDING	Device bus type's ->autosuspend() callback is being
+ *			executed.
+ *
+ * RPM_SUSPENDED	Device bus type's ->autosuspend() callback has completed
+ *			successfully.  The device is regarded as suspended.
+ *
+ * RPM_WAKE		It has been requested that the device be woken up.
+ *			Resume request has been put into the run-time PM
+ *			workqueue and it's pending execution.
+ *
+ * RPM_RESUMING		Device bus type's ->autoresume() callback is being
+ *			executed.
+ *
+ * RPM_ERROR		Represents a condition from which the PM core cannot
+ *			recover by itself.  If the device's run-time PM status
+ *			field has this value, all of the run-time PM operations
+ *			carried out for the device by the core will fail, until
+ *			the status field is changed to either RPM_ACTIVE or
+ *			RPM_SUSPENDED (it is not valid to use the other values
+ *			in such a situation) by the device's driver or bus type.
+ *			This happens when the device bus type's ->autosuspend()
+ *			or ->autoresume() callback returns an error code.
+ */
+enum rpm_state {
+	RPM_ERROR = -1,
+	RPM_ACTIVE,
+	RPM_IDLE,
+	RPM_SUSPENDING,
+	RPM_SUSPENDED,
+	RPM_WAKE,
+	RPM_RESUMING,
+};
+
 struct dev_pm_info {
 	pm_message_t		power_state;
-	unsigned		can_wakeup:1;
-	unsigned		should_wakeup:1;
+	unsigned int		can_wakeup:1;
+	unsigned int		should_wakeup:1;
 	enum dpm_state		status;		/* Owned by the PM core */
 #ifdef	CONFIG_PM_SLEEP
 	struct list_head	entry;
 #endif
+#ifdef	CONFIG_PM_RUNTIME
+	struct delayed_work	suspend_work;
+	struct completion	suspend_done;
+	unsigned int		suspend_aborted:1;
+	struct work_struct	resume_work;
+	enum rpm_state		runtime_status;
+	spinlock_t		lock;
+#endif
 };
 
 /*
Index: linux-2.6/drivers/base/power/Makefile
===================================================================
--- linux-2.6.orig/drivers/base/power/Makefile
+++ linux-2.6/drivers/base/power/Makefile
@@ -1,5 +1,6 @@
 obj-$(CONFIG_PM)	+= sysfs.o
 obj-$(CONFIG_PM_SLEEP)	+= main.o
+obj-$(CONFIG_PM_RUNTIME)	+= runtime.o
 obj-$(CONFIG_PM_TRACE_RTC)	+= trace.o
 
 ccflags-$(CONFIG_DEBUG_DRIVER) := -DDEBUG
Index: linux-2.6/drivers/base/power/runtime.c
===================================================================
--- /dev/null
+++ linux-2.6/drivers/base/power/runtime.c
@@ -0,0 +1,318 @@
+/*
+ * drivers/base/power/runtime.c - Helper functions for device run-time PM
+ *
+ * Copyright (c) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
+ *
+ * This file is released under the GPLv2.
+ */
+
+#include <linux/pm_runtime.h>
+
+/**
+ * pm_runtime_reset - Clear all of the device run-time PM flags.
+ * @dev: Device object to clear the flags for.
+ */
+static void pm_runtime_reset(struct device *dev)
+{
+	dev->power.suspend_aborted = false;
+	dev->power.runtime_status = RPM_ACTIVE;
+}
+
+/**
+ * pm_device_suspended - Check if given device has been suspended at run time.
+ * @dev: Device to check.
+ * @data: Ignored.
+ *
+ * Returns 0 if the device has been suspended or -EBUSY otherwise.
+ */
+static int pm_device_suspended(struct device *dev, void *data)
+{
+	int ret;
+
+	spin_lock(&dev->power.lock);
+
+	ret = dev->power.runtime_status == RPM_SUSPENDED ? 0 : -EBUSY;
+
+	spin_unlock(&dev->power.lock);
+
+	return ret;
+}
+
+/**
+ * pm_check_children - Check if all children of a device have been suspended.
+ * @dev: Device to check.
+ *
+ * Returns 0 if all children of the device have been suspended or -EBUSY
+ * otherwise.
+ */
+static int pm_check_children(struct device *dev)
+{
+	return device_for_each_child(dev, NULL, pm_device_suspended);
+}
+
+/**
+ * pm_autosuspend - Run autosuspend callback of given device object's bus type.
+ * @work: Work structure used for scheduling the execution of this function.
+ *
+ * Use @work to get the device object the suspend has been scheduled for,
+ * check if the suspend request hasn't been cancelled and run the
+ * ->autosuspend() callback from the device's bus type driver.  Update the
+ * run-time PM flags in the device object to reflect the current status of the
+ * device.
+ */
+static void pm_autosuspend(struct work_struct *work)
+{
+	struct delayed_work *dw = to_delayed_work(work);
+	struct device *dev = suspend_work_to_device(dw);
+	int error = 0;
+
+	spin_lock(&dev->power.lock);
+
+	if (dev->power.suspend_aborted) {
+		dev->power.runtime_status = RPM_ACTIVE;
+		goto out;
+	} else if (dev->power.runtime_status != RPM_IDLE) {
+		goto out;
+	} else if (pm_check_children(dev)) {
+		/*
+		 * We can only suspend the device if all of its children have
+		 * been suspended.
+		 */
+		goto out;
+	}
+
+	dev->power.runtime_status = RPM_SUSPENDING;
+	init_completion(&dev->power.suspend_done);
+
+	spin_unlock(&dev->power.lock);
+
+	if (dev && dev->bus && dev->bus->pm && dev->bus->pm->autosuspend)
+		error = dev->bus->pm->autosuspend(dev);
+
+	spin_lock(&dev->power.lock);
+
+	dev->power.runtime_status = error ? RPM_ERROR : RPM_SUSPENDED;
+	complete(&dev->power.suspend_done);
+
+ out:
+	spin_unlock(&dev->power.lock);
+}
+
+/**
+ * pm_request_suspend - Schedule run-time suspend of given device.
+ * @dev: Device to suspend.
+ * @delay: Time to wait before attempting to suspend the device.
+ */
+void pm_request_suspend(struct device *dev, unsigned long delay)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.runtime_status != RPM_ACTIVE)
+		goto out;
+
+	dev->power.runtime_status = RPM_IDLE;
+	dev->power.suspend_aborted = false;
+	queue_delayed_work(pm_wq, &dev->power.suspend_work, delay);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+
+/**
+ * pm_cancel_suspend - Cancel a pending suspend request for given device.
+ * @dev: Device to cancel the suspend request for.
+ *
+ * Should be called under pm_lock_device() and only if we are sure that the
+ * ->autosuspend() callback hasn't started yet.
+ */
+static void pm_cancel_suspend(struct device *dev)
+{
+	dev->power.suspend_aborted = true;
+	cancel_delayed_work(&dev->power.suspend_work);
+	dev->power.runtime_status = RPM_ACTIVE;
+}
+
+/**
+ * pm_autoresume - Run autoresume callback of given device object's bus type.
+ * @work: Work structure used for scheduling the execution of this function.
+ *
+ * Use @work to get the device object the resume has been scheduled for,
+ * check if the device is really suspended and run the ->autoresume() callback
+ * from the device's bus type driver.  Update the run-time PM flags in the
+ * device object to reflect the current status of the device.
+ */
+static void pm_autoresume(struct work_struct *work)
+{
+	struct device *dev = resume_work_to_device(work);
+	int error = 0;
+
+	spin_lock(&dev->power.lock);
+
+	if (dev->power.runtime_status != RPM_WAKE)
+		goto out;
+
+	dev->power.runtime_status = RPM_RESUMING;
+
+	spin_unlock(&dev->power.lock);
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->autoresume)
+		error = dev->bus->pm->autoresume(dev);
+
+	spin_lock(&dev->power.lock);
+
+	dev->power.runtime_status = error ? RPM_ERROR : RPM_ACTIVE;
+
+ out:
+	spin_unlock(&dev->power.lock);
+}
+
+/**
+ * pm_request_resume - Schedule run-time resume of given device.
+ * @dev: Device to resume.
+ */
+void pm_request_resume(struct device *dev)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->parent->power.lock, flags);
+	spin_lock(&dev->power.lock);
+
+	if (dev->power.runtime_status == RPM_IDLE) {
+		/* ->autosuspend() hasn't started yet, no need to resume. */
+		pm_cancel_suspend(dev);
+		goto out;
+	} else if (dev->power.runtime_status != RPM_SUSPENDING
+	    && dev->power.runtime_status != RPM_SUSPENDED) {
+		goto out;
+	}
+
+	dev->power.runtime_status = RPM_WAKE;
+	queue_work(pm_wq, &dev->power.resume_work);
+
+ out:
+	spin_unlock(&dev->power.lock);
+	spin_unlock_irqrestore(&dev->parent->power.lock, flags);
+}
+
+/**
+ * pm_resume_sync - Resume given device waiting for the operation to complete.
+ * @dev: Device to resume.
+ *
+ * Resume the device synchronously, waiting for the operation to complete.  If
+ * autosuspend is in progress while this function is being run, wait for it to
+ * finish before resuming the device.  If the autosuspend is scheduled, but it
+ * hasn't started yet, cancel it and we're done.
+ */
+int pm_resume_sync(struct device *dev)
+{
+	int error = 0;
+
+	spin_lock(&dev->power.lock);
+
+	if (dev->power.runtime_status == RPM_ACTIVE) {
+		goto out;
+	} else if (dev->power.runtime_status == RPM_IDLE) {
+		/* ->autosuspend() hasn't started yet, no need to resume. */
+		pm_cancel_suspend(dev);
+		goto out;
+	}
+
+	if (dev->power.runtime_status == RPM_SUSPENDING) {
+		spin_unlock(&dev->power.lock);
+
+		/*
+		 * The ->autosuspend() callback is being executed right now,
+		 * wait for it to complete.
+		 */
+		wait_for_completion(&dev->power.suspend_done);
+	} else if (dev->power.runtime_status == RPM_SUSPENDED) {
+		spin_unlock(&dev->power.lock);
+
+		/* The device's parent may also be suspended.  Resume it. */
+		error = pm_resume_sync(dev->parent);
+		if (error)
+			return error;
+	} else {
+		spin_unlock(&dev->power.lock);
+	}
+
+	spin_lock(&dev->parent->power.lock);
+	spin_lock(&dev->power.lock);
+
+	if (dev->power.runtime_status == RPM_RESUMING)
+		/* There's another resume running in parallel with us. */
+		error = -EAGAIN;
+	else if (dev->power.runtime_status != RPM_SUSPENDED)
+		error = -EINVAL;
+	if (error) {
+		spin_unlock(&dev->parent->power.lock);
+		goto out;
+	}
+
+	dev->power.runtime_status = RPM_RESUMING;
+
+	spin_unlock(&dev->power.lock);
+	spin_unlock(&dev->parent->power.lock);
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->autoresume)
+		error = dev->bus->pm->autoresume(dev);
+
+	spin_lock(&dev->power.lock);
+
+	dev->power.runtime_status = error ? RPM_ERROR : RPM_ACTIVE;
+
+ out:
+	spin_unlock(&dev->power.lock);
+
+	return error;
+}
+
+/**
+ * pm_cancel_autosuspend - Cancel a pending autosuspend request for given device
+ * @dev: Device to handle.
+ *
+ * This routine is only supposed to be called when the run-time PM workqueue is
+ * frozen (i.e. during system-wide suspend or hibernation) when it is guaranteed
+ * that no work items are being executed.
+ */
+void pm_cancel_autosuspend(struct device *dev)
+{
+	spin_lock(&dev->power.lock);
+
+	cancel_delayed_work(&dev->power.suspend_work);
+	pm_runtime_reset(dev);
+
+	spin_unlock(&dev->power.lock);
+}
+
+/**
+ * pm_cancel_autoresume - Cancel a pending autoresume request for given device
+ * @dev: Device to handle.
+ *
+ * This routine is only supposed to be called when the run-time PM workqueue is
+ * frozen (i.e. during system-wide suspend or hibernation) when it is guaranteed
+ * that no work items are being executed.
+ */
+void pm_cancel_autoresume(struct device *dev)
+{
+	spin_lock(&dev->power.lock);
+
+	work_clear_pending(&dev->power.resume_work);
+	pm_runtime_reset(dev);
+
+	spin_unlock(&dev->power.lock);
+}
+
+/**
+ * pm_runtime_init - Initialize run-time PM fields in given device object.
+ * @dev: Device object to handle.
+ */
+void pm_runtime_init(struct device *dev)
+{
+	pm_runtime_reset(dev);
+	spin_lock_init(&dev->power.lock);
+	INIT_DELAYED_WORK(&dev->power.suspend_work, pm_autosuspend);
+	INIT_WORK(&dev->power.resume_work, pm_autoresume);
+}
Index: linux-2.6/include/linux/pm_runtime.h
===================================================================
--- /dev/null
+++ linux-2.6/include/linux/pm_runtime.h
@@ -0,0 +1,50 @@
+/*
+ * pm_runtime.h - Device run-time power management helper functions.
+ *
+ * Copyright (C) 2009 Rafael J. Wysocki <rjw@sisk.pl>
+ *
+ * This file is released under the GPLv2.
+ */
+
+#ifndef _LINUX_PM_RUNTIME_H
+#define _LINUX_PM_RUNTIME_H
+
+#include <linux/device.h>
+#include <linux/pm.h>
+
+#ifdef CONFIG_PM_RUNTIME
+extern struct workqueue_struct *pm_wq;
+
+extern void pm_runtime_init(struct device *dev);
+extern void pm_request_suspend(struct device *dev, unsigned long delay);
+extern void pm_request_resume(struct device *dev);
+extern int pm_resume_sync(struct device *dev);
+extern void pm_cancel_autosuspend(struct device *dev);
+extern void pm_cancel_autoresume(struct device *dev);
+
+static inline struct device *suspend_work_to_device(struct delayed_work *work)
+{
+	struct dev_pm_info *dpi;
+
+	dpi = container_of(work, struct dev_pm_info, suspend_work);
+	return container_of(dpi, struct device, power);
+}
+
+static inline struct device *resume_work_to_device(struct work_struct *work)
+{
+	struct dev_pm_info *dpi;
+
+	dpi = container_of(work, struct dev_pm_info, resume_work);
+	return container_of(dpi, struct device, power);
+}
+
+#else /* !CONFIG_PM_RUNTIME */
+static inline void pm_runtime_init(struct device *dev) {}
+static inline void pm_request_suspend(struct device *dev, unsigned long delay) {}
+static inline void pm_request_resume(struct device *dev) {}
+static inline int pm_resume_sync(struct device *dev) { return -ENOSYS; }
+static inline void pm_cancel_autosuspend(struct device *dev) {}
+static inline void pm_cancel_autoresume(struct device *dev) {}
+#endif /* !CONFIG_PM_RUNTIME */
+
+#endif
Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -21,6 +21,7 @@
 #include <linux/kallsyms.h>
 #include <linux/mutex.h>
 #include <linux/pm.h>
+#include <linux/pm_runtime.h>
 #include <linux/resume-trace.h>
 #include <linux/rwsem.h>
 #include <linux/interrupt.h>
@@ -88,6 +89,7 @@ void device_pm_add(struct device *dev)
 	}
 
 	list_add_tail(&dev->power.entry, &dpm_list);
+	pm_runtime_init(dev);
 	mutex_unlock(&dpm_list_mtx);
 }
 


^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-09  2:49               ` Alan Stern
  (?)
  (?)
@ 2009-06-09 22:57               ` Rafael J. Wysocki
  -1 siblings, 0 replies; 199+ messages in thread
From: Rafael J. Wysocki @ 2009-06-09 22:57 UTC (permalink / raw)
  To: Alan Stern; +Cc: ACPI Devel Maling List, linux-pm, LKML

On Tuesday 09 June 2009, Alan Stern wrote:
> On Mon, 8 Jun 2009, Rafael J. Wysocki wrote:
> 
> > > Use of the RPM_UNKNOWN state isn't good.  A bus may have valid reasons 
> > > of its own for not carrying out an autosuspend.  When this happens the 
> > > device's state isn't unknown.
> > 
> > I'm not sure what you mean exactly.
> > 
> > If ->autosuspend() fails, the device power state may be known, but the core
> > can't be sure if the device is active.  This information is available to the
> > driver and/or the bus type, which should change the status to whatever is
> > appropriate.
> 
> But no matter what the driver or bus type sets the state to, your 
> pm_autosuspend() will change it to one of RPM_UNKNOWN or RPM_SUSPENDED.  
> Neither might be right.

The idea is that if ->autosuspend() or ->autoresume() returns an error code,
this is a situation the PM core cannot recover from by itself, so it shouldn't
pretend it knows what's happened.  Instead, it marks the device as "I don't
know if it is safe to touch this" and won't handle it until the device driver
or bus type clears the status.
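
To make the intended behaviour concrete, here is a minimal user-space sketch of
the "error latch" described above.  It is not code from the patch: struct
model_dev, model_suspend(), model_clear_error() and failing_autosuspend() are
hypothetical names used only for illustration, and the particular error codes
the core returns are my assumption.  Only the RPM_* values mirror the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Same values as enum rpm_state in the patch. */
enum rpm_state {
	RPM_ERROR = -1,
	RPM_ACTIVE,
	RPM_IDLE,
	RPM_SUSPENDING,
	RPM_SUSPENDED,
	RPM_WAKE,
	RPM_RESUMING,
};

/* Hypothetical stand-in for struct device; not a kernel structure. */
struct model_dev {
	enum rpm_state status;
	int (*autosuspend)(struct model_dev *dev);	/* models the bus callback */
};

/* Core side: refuse to handle a device whose status is RPM_ERROR. */
int model_suspend(struct model_dev *dev)
{
	int error;

	assert(dev);
	if (dev->status == RPM_ERROR)
		return -EINVAL;		/* assumed error code */
	if (dev->status != RPM_ACTIVE)
		return -EBUSY;		/* assumed error code */

	dev->status = RPM_SUSPENDING;
	error = dev->autosuspend ? dev->autosuspend(dev) : 0;
	/* A failing callback latches RPM_ERROR, as in pm_autosuspend(). */
	dev->status = error ? RPM_ERROR : RPM_SUSPENDED;
	return error;
}

/* Driver or bus type side: only RPM_ACTIVE and RPM_SUSPENDED are valid. */
int model_clear_error(struct model_dev *dev, enum rpm_state new_status)
{
	assert(dev);
	if (dev->status != RPM_ERROR)
		return -EINVAL;
	if (new_status != RPM_ACTIVE && new_status != RPM_SUSPENDED)
		return -EINVAL;
	dev->status = new_status;
	return 0;
}

/* Example callback that always fails. */
int failing_autosuspend(struct model_dev *dev)
{
	(void)dev;
	return -EIO;
}
```

With failing_autosuspend() installed, the first model_suspend() call returns
-EIO and leaves the status at RPM_ERROR; every later call is refused until
model_clear_error() sets the status back to RPM_ACTIVE or RPM_SUSPENDED.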

> > The name of this constant may be confusing, but I didn't have any better ideas.
> 
> It's not clear what RPM_ACTIVE, RPM_IDLE, and RPM_SUSPENDED are 
> supposed to mean; this should be documented in the code.  Also, why 
> isn't there RPM_RESUMING?

Yes, there should be.  In fact it's in the current version of the patch, which
is appended.  Also, there's a comment explaining the meaning of the RPM_*
constants in pm.h .

> By the way, a legitimate reason for aborting an autosuspend is if the
> device's driver requires remote wakeup to be enabled during suspend but
> the user has disabled it.

Do you mean the user has disabled the remote wakeup?

> > > The scheme doesn't include any mechanism for communicating runtime
> > > power information up the device tree.  When a device is autosuspended,
> > > its parent's driver should be told so that the driver can consider
> > > autosuspending the parent.
> > 
> > I thought the bus type's ->autosuspend() callback could take care of this.
> 
> Shouldn't this happen after the device's state has changed to 
> RPM_SUSPENDED?  That's not until after the callback returns.

OK, I tried to address the issue of parent suspend/resume in the new
version of the patch below (I'm not sure if I did the nesting of spinlocks in
pm_request_resume() correctly).
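
For reference, the ordering I have in mind for the synchronous resume path is
"parent first, then the device itself".  The following user-space sketch shows
only that ordering, without any locking.  struct node and node_resume_sync()
are hypothetical names, and the NULL check for a device without a parent is an
assumption of the sketch rather than something the patch does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a device in the tree; not a kernel structure. */
struct node {
	struct node *parent;	/* NULL for a root device (assumption) */
	int suspended;		/* non-zero if the device is suspended */
	int resume_order;	/* position in the resume sequence, for illustration */
};

/*
 * Resume @dev, resuming its suspended ancestors first.  @counter counts the
 * devices resumed so far.  Returns 0 on success.
 */
int node_resume_sync(struct node *dev, int *counter)
{
	int error;

	assert(counter);
	if (!dev)
		return 0;	/* walked past the root, nothing to do */
	if (!dev->suspended)
		return 0;	/* an active device implies active ancestors */

	/* The parent may also be suspended, so resume it before the child. */
	error = node_resume_sync(dev->parent, counter);
	if (error)
		return error;

	dev->suspended = 0;
	dev->resume_order = ++*counter;
	return 0;
}
```

For a suspended chain root -> hub -> leaf, calling node_resume_sync() on the
leaf resumes the root, then the hub, then the leaf.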

> > > There should be a sysfs interface (like the one in USB) to allow
> > > userspace to prevent a device from being autosuspended -- and perhaps
> > > also to force it to be suspended.
> > 
> > To prevent a device from being suspended - yes.  To force it to stay suspended
> > - I'm not sure.
> 
> I'm not sure either.  Oliver Neukum requested it originally and it has
> been useful for debugging, but I haven't seen many places where it
> would come in useful in practice.

The problem with it is that the user space may not know if it is safe to keep
a device suspended and if it is not, the kernel will have to ignore the setting
anyway, so I'm not sure what's the point (except for debugging).

> > > What about devices that have more than two runtime power states?  For
> > > example, you can't squeeze PCI's {D0,D1,D2,D3hot} range into {running,
> > > suspended}.
> > 
> > That has to be bus type-specific.
> > 
> > In the case of PCI all of the low power states (D1-D3) are in fact substates of
> > "suspended", because we generally need to quiesce the device before putting
> > it into any of these states.
> > 
> > I'm not sure if we can introduce more "levels of suspension", so to speak, at
> > the core level, but in any case we can easily distinguish between "device
> > quiesced and in a low power state" and "device fully active".
> > 
> > So, in this picture the device is "suspended" from the core's point of view
> > once its bus type's ->autosuspend() callback has been successfully executed.
> 
> This too should be documented in the code.  Or in a Documentation file.

OK
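
As a starting point for that documentation, the PCI case quoted above could be
illustrated roughly as follows.  This is a sketch only: the MODEL_* and CORE_*
names and core_view_of() are hypothetical and do not correspond to anything in
the PCI code or in the patch.

```c
#include <assert.h>

/* Hypothetical labels for the PCI power states discussed above. */
enum model_dstate { MODEL_D0, MODEL_D1, MODEL_D2, MODEL_D3HOT };

/* What the PM core would see: only two run-time states. */
enum core_view { CORE_ACTIVE, CORE_SUSPENDED };

/*
 * Every low-power state is a substate of "suspended" from the core's point
 * of view, because the device has to be quiesced before entering any of them.
 */
enum core_view core_view_of(enum model_dstate s)
{
	assert(s >= MODEL_D0 && s <= MODEL_D3HOT);
	return s == MODEL_D0 ? CORE_ACTIVE : CORE_SUSPENDED;
}
```

Which of D1-D3hot is actually used would remain the bus type's decision; the
core would only track the two-valued view.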

I tried to address your comments and Oliver's comments too in the new
version of the patch below.  Please have a look and tell me what you think.

Best,
Rafael

---
 drivers/base/power/Makefile  |    1 
 drivers/base/power/main.c    |    2 
 drivers/base/power/runtime.c |  318 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/pm.h           |   76 ++++++++++
 include/linux/pm_runtime.h   |   50 ++++++
 kernel/power/Kconfig         |   14 +
 kernel/power/main.c          |   17 ++
 7 files changed, 476 insertions(+), 2 deletions(-)

Index: linux-2.6/kernel/power/Kconfig
===================================================================
--- linux-2.6.orig/kernel/power/Kconfig
+++ linux-2.6/kernel/power/Kconfig
@@ -208,3 +208,17 @@ config APM_EMULATION
 	  random kernel OOPSes or reboots that don't seem to be related to
 	  anything, try disabling/enabling this option (or disabling/enabling
 	  APM in your BIOS).
+
+config PM_RUNTIME
+	bool "Run-time PM core functionality"
+	depends on PM
+	---help---
+	  Enable functionality allowing I/O devices to be put into energy-saving
+	  (low power) states at run time (or autosuspended) after a specified
+	  period of inactivity and woken up in response to a hardware-generated
+	  wake-up event or a driver's request.
+
+	  Hardware support is generally required for this functionality to work
+	  and the bus type drivers of the buses the devices are on are
+	  responsible for the actual handling of the autosuspend requests and
+	  wake-up events.
Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -11,6 +11,7 @@
 #include <linux/kobject.h>
 #include <linux/string.h>
 #include <linux/resume-trace.h>
+#include <linux/workqueue.h>
 
 #include "power.h"
 
@@ -217,8 +218,24 @@ static struct attribute_group attr_group
 	.attrs = g,
 };
 
+#ifdef CONFIG_PM_RUNTIME
+struct workqueue_struct *pm_wq;
+
+static int __init pm_start_workqueue(void)
+{
+	pm_wq = create_freezeable_workqueue("pm");
+
+	return pm_wq ? 0 : -ENOMEM;
+}
+#else
+static inline int pm_start_workqueue(void) { return 0; }
+#endif
+
 static int __init pm_init(void)
 {
+	int error = pm_start_workqueue();
+	if (error)
+		return error;
 	power_kobj = kobject_create_and_add("power", NULL);
 	if (!power_kobj)
 		return -ENOMEM;
Index: linux-2.6/include/linux/pm.h
===================================================================
--- linux-2.6.orig/include/linux/pm.h
+++ linux-2.6/include/linux/pm.h
@@ -22,6 +22,9 @@
 #define _LINUX_PM_H
 
 #include <linux/list.h>
+#include <linux/workqueue.h>
+#include <linux/spinlock.h>
+#include <linux/completion.h>
 
 /*
  * Callbacks for platform drivers to implement.
@@ -165,6 +168,15 @@ typedef struct pm_message {
  * It is allowed to unregister devices while the above callbacks are being
  * executed.  However, it is not allowed to unregister a device from within any
  * of its own callbacks.
+ *
+ * There also are two callbacks related to run-time power management of devices:
+ *
+ * @autosuspend: Save the device registers and put it into an energy-saving (low
+ *	power) state at run-time, enable wake-up events as appropriate.
+ *
+ * @autoresume: Put the device into the full power state and restore its
+ *	registers (if applicable) at run time, in response to a wake-up event
+ *	generated by hardware or at a request of software.
  */
 
 struct dev_pm_ops {
@@ -182,6 +194,10 @@ struct dev_pm_ops {
 	int (*thaw_noirq)(struct device *dev);
 	int (*poweroff_noirq)(struct device *dev);
 	int (*restore_noirq)(struct device *dev);
+#ifdef CONFIG_PM_RUNTIME
+	int (*autosuspend)(struct device *dev);
+	int (*autoresume)(struct device *dev);
+#endif
 };
 
 /**
@@ -315,14 +331,70 @@ enum dpm_state {
 	DPM_OFF_IRQ,
 };
 
+/**
+ * Device run-time power management state.
+ *
+ * These state labels are used internally by the PM core to indicate the current
+ * status of a device with respect to the PM core operations.  They do not
+ * reflect the actual power state of the device or its status as seen by the
+ * driver.
+ *
+ * RPM_ACTIVE		Device is fully operational, no run-time PM requests are
+ *			pending for it.
+ *
+ * RPM_IDLE		It has been requested that the device be suspended.
+ *			Suspend request has been put into the run-time PM
+ *			workqueue and it's pending execution.
+ *
+ * RPM_SUSPENDING	Device bus type's ->autosuspend() callback is being
+ *			executed.
+ *
+ * RPM_SUSPENDED	Device bus type's ->autosuspend() callback has completed
+ *			successfully.  The device is regarded as suspended.
+ *
+ * RPM_WAKE		It has been requested that the device be woken up.
+ *			Resume request has been put into the run-time PM
+ *			workqueue and it's pending execution.
+ *
+ * RPM_RESUMING		Device bus type's ->autoresume() callback is being
+ *			executed.
+ *
+ * RPM_ERROR		Represents a condition from which the PM core cannot
+ *			recover by itself.  If the device's run-time PM status
+ *			field has this value, all of the run-time PM operations
+ *			carried out for the device by the core will fail, until
+ *			the status field is changed to either RPM_ACTIVE or
+ *			RPM_SUSPENDED (it is not valid to use the other values
+ *			in such a situation) by the device's driver or bus type.
+ *			This happens when the device bus type's ->autosuspend()
+ *			or ->autoresume() callback returns an error code.
+ */
+enum rpm_state {
+	RPM_ERROR = -1,
+	RPM_ACTIVE,
+	RPM_IDLE,
+	RPM_SUSPENDING,
+	RPM_SUSPENDED,
+	RPM_WAKE,
+	RPM_RESUMING,
+};
+
 struct dev_pm_info {
 	pm_message_t		power_state;
-	unsigned		can_wakeup:1;
-	unsigned		should_wakeup:1;
+	unsigned int		can_wakeup:1;
+	unsigned int		should_wakeup:1;
 	enum dpm_state		status;		/* Owned by the PM core */
 #ifdef	CONFIG_PM_SLEEP
 	struct list_head	entry;
 #endif
+#ifdef	CONFIG_PM_RUNTIME
+	struct delayed_work	suspend_work;
+	struct completion	suspend_done;
+	unsigned int		suspend_aborted:1;
+	struct work_struct	resume_work;
+	enum rpm_state		runtime_status;
+	spinlock_t		lock;
+#endif
 };
 
 /*
Index: linux-2.6/drivers/base/power/Makefile
===================================================================
--- linux-2.6.orig/drivers/base/power/Makefile
+++ linux-2.6/drivers/base/power/Makefile
@@ -1,5 +1,6 @@
 obj-$(CONFIG_PM)	+= sysfs.o
 obj-$(CONFIG_PM_SLEEP)	+= main.o
+obj-$(CONFIG_PM_RUNTIME)	+= runtime.o
 obj-$(CONFIG_PM_TRACE_RTC)	+= trace.o
 
 ccflags-$(CONFIG_DEBUG_DRIVER) := -DDEBUG
Index: linux-2.6/drivers/base/power/runtime.c
===================================================================
--- /dev/null
+++ linux-2.6/drivers/base/power/runtime.c
@@ -0,0 +1,318 @@
+/*
+ * drivers/base/power/runtime.c - Helper functions for device run-time PM
+ *
+ * Copyright (c) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
+ *
+ * This file is released under the GPLv2.
+ */
+
+#include <linux/pm_runtime.h>
+
+/**
+ * pm_runtime_reset - Clear all of the device run-time PM flags.
+ * @dev: Device object to clear the flags for.
+ */
+static void pm_runtime_reset(struct device *dev)
+{
+	dev->power.suspend_aborted = false;
+	dev->power.runtime_status = RPM_ACTIVE;
+}
+
+/**
+ * pm_device_suspended - Check if given device has been suspended at run time.
+ * @dev: Device to check.
+ * @data: Ignored.
+ *
+ * Returns 0 if the device has been suspended or -EBUSY otherwise.
+ */
+static int pm_device_suspended(struct device *dev, void *data)
+{
+	int ret;
+
+	spin_lock(&dev->power.lock);
+
+	ret = dev->power.runtime_status == RPM_SUSPENDED ? 0 : -EBUSY;
+
+	spin_unlock(&dev->power.lock);
+
+	return ret;
+}
+
+/**
+ * pm_check_children - Check if all children of a device have been suspended.
+ * @dev: Device to check.
+ *
+ * Returns 0 if all children of the device have been suspended or -EBUSY
+ * otherwise.
+ */
+static int pm_check_children(struct device *dev)
+{
+	return device_for_each_child(dev, NULL, pm_device_suspended);
+}
+
+/**
+ * pm_autosuspend - Run autosuspend callback of given device object's bus type.
+ * @work: Work structure used for scheduling the execution of this function.
+ *
+ * Use @work to get the device object the suspend has been scheduled for,
+ * check if the suspend request hasn't been cancelled and run the
+ * ->autosuspend() callback from the device's bus type driver.  Update the
+ * run-time PM flags in the device object to reflect the current status of the
+ * device.
+ */
+static void pm_autosuspend(struct work_struct *work)
+{
+	struct delayed_work *dw = to_delayed_work(work);
+	struct device *dev = suspend_work_to_device(dw);
+	int error = 0;
+
+	spin_lock(&dev->power.lock);
+
+	if (dev->power.suspend_aborted) {
+		dev->power.runtime_status = RPM_ACTIVE;
+		goto out;
+	} else if (dev->power.runtime_status != RPM_IDLE) {
+		goto out;
+	} else if (pm_check_children(dev)) {
+		/*
+		 * We can only suspend the device if all of its children have
+		 * been suspended.
+		 */
+		goto out;
+	}
+
+	dev->power.runtime_status = RPM_SUSPENDING;
+	init_completion(&dev->power.suspend_done);
+
+	spin_unlock(&dev->power.lock);
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->autosuspend)
+		error = dev->bus->pm->autosuspend(dev);
+
+	spin_lock(&dev->power.lock);
+
+	dev->power.runtime_status = error ? RPM_ERROR : RPM_SUSPENDED;
+	complete(&dev->power.suspend_done);
+
+ out:
+	spin_unlock(&dev->power.lock);
+}
+
+/**
+ * pm_request_suspend - Schedule run-time suspend of given device.
+ * @dev: Device to suspend.
+ * @delay: Time to wait before attempting to suspend the device.
+ */
+void pm_request_suspend(struct device *dev, unsigned long delay)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.runtime_status != RPM_ACTIVE)
+		goto out;
+
+	dev->power.runtime_status = RPM_IDLE;
+	dev->power.suspend_aborted = false;
+	queue_delayed_work(pm_wq, &dev->power.suspend_work, delay);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+
+/**
+ * pm_cancel_suspend - Cancel a pending suspend request for given device.
+ * @dev: Device to cancel the suspend request for.
+ *
+ * Should be called with dev->power.lock held and only if we are sure that the
+ * ->autosuspend() callback hasn't started yet.
+ */
+static void pm_cancel_suspend(struct device *dev)
+{
+	dev->power.suspend_aborted = true;
+	cancel_delayed_work(&dev->power.suspend_work);
+	dev->power.runtime_status = RPM_ACTIVE;
+}
+
+/**
+ * pm_autoresume - Run autoresume callback of given device object's bus type.
+ * @work: Work structure used for scheduling the execution of this function.
+ *
+ * Use @work to get the device object the resume has been scheduled for,
+ * check if the device is really suspended and run the ->autoresume() callback
+ * from the device's bus type driver.  Update the run-time PM flags in the
+ * device object to reflect the current status of the device.
+ */
+static void pm_autoresume(struct work_struct *work)
+{
+	struct device *dev = resume_work_to_device(work);
+	int error = 0;
+
+	spin_lock(&dev->power.lock);
+
+	if (dev->power.runtime_status != RPM_WAKE)
+		goto out;
+
+	dev->power.runtime_status = RPM_RESUMING;
+
+	spin_unlock(&dev->power.lock);
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->autoresume)
+		error = dev->bus->pm->autoresume(dev);
+
+	spin_lock(&dev->power.lock);
+
+	dev->power.runtime_status = error ? RPM_ERROR : RPM_ACTIVE;
+
+ out:
+	spin_unlock(&dev->power.lock);
+}
+
+/**
+ * pm_request_resume - Schedule run-time resume of given device.
+ * @dev: Device to resume.
+ */
+void pm_request_resume(struct device *dev)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->parent->power.lock, flags);
+	spin_lock(&dev->power.lock);
+
+	if (dev->power.runtime_status == RPM_IDLE) {
+		/* ->autosuspend() hasn't started yet, no need to resume. */
+		pm_cancel_suspend(dev);
+		goto out;
+	} else if (dev->power.runtime_status != RPM_SUSPENDING
+	    && dev->power.runtime_status != RPM_SUSPENDED) {
+		goto out;
+	}
+
+	dev->power.runtime_status = RPM_WAKE;
+	queue_work(pm_wq, &dev->power.resume_work);
+
+ out:
+	spin_unlock(&dev->power.lock);
+	spin_unlock_irqrestore(&dev->parent->power.lock, flags);
+}
+
+/**
+ * pm_resume_sync - Resume given device waiting for the operation to complete.
+ * @dev: Device to resume.
+ *
+ * Resume the device synchronously, waiting for the operation to complete.  If
+ * autosuspend is in progress while this function is being run, wait for it to
+ * finish before resuming the device.  If the autosuspend is scheduled, but it
+ * hasn't started yet, cancel it and we're done.
+ */
+int pm_resume_sync(struct device *dev)
+{
+	int error = 0;
+
+	spin_lock(&dev->power.lock);
+
+	if (dev->power.runtime_status == RPM_ACTIVE) {
+		goto out;
+	} else if (dev->power.runtime_status == RPM_IDLE) {
+		/* ->autosuspend() hasn't started yet, no need to resume. */
+		pm_cancel_suspend(dev);
+		goto out;
+	}
+
+	if (dev->power.runtime_status == RPM_SUSPENDING) {
+		spin_unlock(&dev->power.lock);
+
+		/*
+		 * The ->autosuspend() callback is being executed right now,
+		 * wait for it to complete.
+		 */
+		wait_for_completion(&dev->power.suspend_done);
+	} else if (dev->power.runtime_status == RPM_SUSPENDED) {
+		spin_unlock(&dev->power.lock);
+
+		/* The device's parent may also be suspended.  Resume it. */
+		error = pm_resume_sync(dev->parent);
+		if (error)
+			return error;
+	} else {
+		spin_unlock(&dev->power.lock);
+	}
+
+	spin_lock(&dev->parent->power.lock);
+	spin_lock(&dev->power.lock);
+
+	if (dev->power.runtime_status == RPM_RESUMING)
+		/* There's another resume running in parallel with us. */
+		error = -EAGAIN;
+	else if (dev->power.runtime_status != RPM_SUSPENDED)
+		error = -EINVAL;
+	if (error) {
+		spin_unlock(&dev->parent->power.lock);
+		goto out;
+	}
+
+	dev->power.runtime_status = RPM_RESUMING;
+
+	spin_unlock(&dev->power.lock);
+	spin_unlock(&dev->parent->power.lock);
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->autoresume)
+		error = dev->bus->pm->autoresume(dev);
+
+	spin_lock(&dev->power.lock);
+
+	dev->power.runtime_status = error ? RPM_ERROR : RPM_ACTIVE;
+
+ out:
+	spin_unlock(&dev->power.lock);
+
+	return error;
+}
+
+/**
+ * pm_cancel_autosuspend - Cancel a pending autosuspend request for given device
+ * @dev: Device to handle.
+ *
+ * This routine is only supposed to be called when the run-time PM workqueue is
+ * frozen (i.e. during system-wide suspend or hibernation) when it is guaranteed
+ * that no work items are being executed.
+ */
+void pm_cancel_autosuspend(struct device *dev)
+{
+	spin_lock(&dev->power.lock);
+
+	cancel_delayed_work(&dev->power.suspend_work);
+	pm_runtime_reset(dev);
+
+	spin_unlock(&dev->power.lock);
+}
+
+/**
+ * pm_cancel_autoresume - Cancel a pending autoresume request for given device
+ * @dev: Device to handle.
+ *
+ * This routine is only supposed to be called when the run-time PM workqueue is
+ * frozen (i.e. during system-wide suspend or hibernation) when it is guaranteed
+ * that no work items are being executed.
+ */
+void pm_cancel_autoresume(struct device *dev)
+{
+	spin_lock(&dev->power.lock);
+
+	work_clear_pending(&dev->power.resume_work);
+	pm_runtime_reset(dev);
+
+	spin_unlock(&dev->power.lock);
+}
+
+/**
+ * pm_runtime_init - Initialize run-time PM fields in given device object.
+ * @dev: Device object to handle.
+ */
+void pm_runtime_init(struct device *dev)
+{
+	pm_runtime_reset(dev);
+	spin_lock_init(&dev->power.lock);
+	INIT_DELAYED_WORK(&dev->power.suspend_work, pm_autosuspend);
+	INIT_WORK(&dev->power.resume_work, pm_autoresume);
+}
Index: linux-2.6/include/linux/pm_runtime.h
===================================================================
--- /dev/null
+++ linux-2.6/include/linux/pm_runtime.h
@@ -0,0 +1,50 @@
+/*
+ * pm_runtime.h - Device run-time power management helper functions.
+ *
+ * Copyright (C) 2009 Rafael J. Wysocki <rjw@sisk.pl>
+ *
+ * This file is released under the GPLv2.
+ */
+
+#ifndef _LINUX_PM_RUNTIME_H
+#define _LINUX_PM_RUNTIME_H
+
+#include <linux/device.h>
+#include <linux/pm.h>
+
+#ifdef CONFIG_PM_RUNTIME
+extern struct workqueue_struct *pm_wq;
+
+extern void pm_runtime_init(struct device *dev);
+extern void pm_request_suspend(struct device *dev, unsigned long delay);
+extern void pm_request_resume(struct device *dev);
+extern int pm_resume_sync(struct device *dev);
+extern void pm_cancel_autosuspend(struct device *dev);
+extern void pm_cancel_autoresume(struct device *dev);
+
+static inline struct device *suspend_work_to_device(struct delayed_work *work)
+{
+	struct dev_pm_info *dpi;
+
+	dpi = container_of(work, struct dev_pm_info, suspend_work);
+	return container_of(dpi, struct device, power);
+}
+
+static inline struct device *resume_work_to_device(struct work_struct *work)
+{
+	struct dev_pm_info *dpi;
+
+	dpi = container_of(work, struct dev_pm_info, resume_work);
+	return container_of(dpi, struct device, power);
+}
+
+#else /* !CONFIG_PM_RUNTIME */
+static inline void pm_runtime_init(struct device *dev) {}
+static inline void pm_request_suspend(struct device *dev, unsigned long delay) {}
+static inline void pm_request_resume(struct device *dev) {}
+static inline int pm_resume_sync(struct device *dev) { return -ENOSYS; }
+static inline void pm_cancel_autosuspend(struct device *dev) {}
+static inline void pm_cancel_autoresume(struct device *dev) {}
+#endif /* !CONFIG_PM_RUNTIME */
+
+#endif
Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -21,6 +21,7 @@
 #include <linux/kallsyms.h>
 #include <linux/mutex.h>
 #include <linux/pm.h>
+#include <linux/pm_runtime.h>
 #include <linux/resume-trace.h>
 #include <linux/rwsem.h>
 #include <linux/interrupt.h>
@@ -88,6 +89,7 @@ void device_pm_add(struct device *dev)
 	}
 
 	list_add_tail(&dev->power.entry, &dpm_list);
+	pm_runtime_init(dev);
 	mutex_unlock(&dpm_list_mtx);
 }
 

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-09  7:31               ` Oliver Neukum
  (?)
  (?)
@ 2009-06-09 23:02               ` Rafael J. Wysocki
  -1 siblings, 0 replies; 199+ messages in thread
From: Rafael J. Wysocki @ 2009-06-09 23:02 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: Alan Stern, linux-pm, ACPI Devel Maling List, LKML

On Tuesday 09 June 2009, Oliver Neukum wrote:
> Am Montag, 8. Juni 2009 23:31:58 schrieb Rafael J. Wysocki:
> > If ->autosuspend() fails, the device power state may be known, but the core
> > can't be sure if the device is active.  This information is available to
> > the driver and/or the bus type, which should change the status to whatever
> > is appropriate.
> 
> That is quite confusing. You'd better define error returns.

That might work too, but the information need not be available to the driver
immediately.  It may need to schedule a reset of the device to recover from
the error condition, for example.
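
For comparison, the alternative of defined error returns discussed in this
exchange might look like the sketch below.  It is a user-space illustration
only: status_after_autosuspend() is a hypothetical helper, and using -EBUSY to
mean "suspend refused, device unaffected" is my assumption, not something the
patch or this thread defines.

```c
#include <assert.h>
#include <errno.h>

/* Same values as enum rpm_state in the patch. */
enum rpm_state {
	RPM_ERROR = -1,
	RPM_ACTIVE,
	RPM_IDLE,
	RPM_SUSPENDING,
	RPM_SUSPENDED,
	RPM_WAKE,
	RPM_RESUMING,
};

/*
 * Map the return value of an ->autosuspend() style callback to the status
 * the core would record afterwards.
 */
enum rpm_state status_after_autosuspend(int ret)
{
	assert(ret <= 0);		/* callbacks return 0 or a negative errno */

	if (ret == 0)
		return RPM_SUSPENDED;	/* suspend succeeded */
	if (ret == -EBUSY)
		return RPM_ACTIVE;	/* assumed: refused, device unaffected */
	return RPM_ERROR;		/* anything else: state is undefined */
}
```

The drawback mentioned above still applies to this variant: the driver has to
know which case it is in at the moment the callback returns.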

> One that would mean that the suspension has failed but the device is
> unaffected, and another one that means that the device is in an
> undefined state now.
>
> > > The scheme doesn't include any mechanism for communicating runtime
> > > power information up the device tree.  When a device is autosuspended,
> > > its parent's driver should be told so that the driver can consider
> > > autosuspending the parent.
> >
> > I thought the bus type's ->autosuspend() callback could take care of this.
> 
> That can't work because you have to operate between busses.

OK, point taken.

> > > Likewise, if we want to autoresume a device below an autosuspended
> > > parent, the parent should be autoresumed first.  Did you want to make the
> > > bus subsystem responsible for all of this?
> >
> > Yes, that was the idea.
> 
> That is an important point. Can some subsystems operate with a parent still
> suspended?

OK, I see the value of doing that at the core level.

I tried to address this in the new version of the patch, which has been sent
in my last reply to Alan.

Thanks,
Rafael


^ permalink raw reply	[flat|nested] 199+ messages in thread

* [patch update] Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-09 22:57               ` Rafael J. Wysocki
  2009-06-10  8:29                 ` [patch update] " Rafael J. Wysocki
@ 2009-06-10  8:29                 ` Rafael J. Wysocki
  2009-06-10 14:20                   ` [patch update] " Oliver Neukum
                                     ` (5 more replies)
  2009-06-10 20:48                   ` Alan Stern
  2009-06-10 20:48                 ` Alan Stern
  3 siblings, 6 replies; 199+ messages in thread
From: Rafael J. Wysocki @ 2009-06-10  8:29 UTC (permalink / raw)
  To: Alan Stern; +Cc: Oliver Neukum, linux-pm, ACPI Devel Maling List, LKML

On Wednesday 10 June 2009, Rafael J. Wysocki wrote:
> On Tuesday 09 June 2009, Alan Stern wrote:
> > On Mon, 8 Jun 2009, Rafael J. Wysocki wrote:
> > 
> > > > Use of the RPM_UNKNOWN state isn't good.  A bus may have valid reasons 
> > > > of its own for not carrying out an autosuspend.  When this happens the 
> > > > device's state isn't unknown.
> > > 
> > > I'm not sure what you mean exactly.
> > > 
> > > If ->autosuspend() fails, the device power state may be known, but the core
> > > can't be sure if the device is active.  This information is available to the
> > > driver and/or the bus type, which should change the status to whatever is
> > > appropriate.
> > 
> > But no matter what the driver or bus type sets the state to, your 
> > pm_autosuspend() will change it to one of RPM_UNKNOWN or RPM_SUSPENDED.  
> > Neither might be right.
> 
> The idea is that if ->autosuspend() or ->autoresume() returns an error code,
> this is a situation the PM core cannot recover from by itself, so it shouldn't
> pretend it knows what's happened.  Instead, it marks the device as "I don't
> know if it is safe to touch this" and won't handle it until the device driver
> or bus type clears the status.
> 
> > > The name of this constant may be confusing, but I didn't have any better ideas.
> > 
> > It's not clear what RPM_ACTIVE, RPM_IDLE, and RPM_SUSPENDED are 
> > supposed to mean; this should be documented in the code.  Also, why 
> > isn't there RPM_RESUMING?
> 
> Yes, there should be.  In fact it's in the current version of the patch, which
> is appended.  Also, there's a comment explaining the meaning of the RPM_*
> constants in pm.h .
> 
> > By the way, a legitimate reason for aborting an autosuspend is if the
> > device's driver requires remote wakeup to be enabled during suspend but
> > the user has disabled it.
> 
> Do you mean the user has disabled the remote wakeup?
> 
> > > > The scheme doesn't include any mechanism for communicating runtime
> > > > power information up the device tree.  When a device is autosuspended,
> > > > its parent's driver should be told so that the driver can consider
> > > > autosuspending the parent.
> > > 
> > > I thought the bus type's ->autosuspend() callback could take care of this.
> > 
> > Shouldn't this happen after the device's state has changed to 
> > RPM_SUSPENDED?  That's not until after the callback returns.
> 
> OK, I tried to address the issue of parent suspend/resume in the new
> version of the patch below (I'm not sure if I did the nesting of spinlocks in
> pm_request_resume() correctly).
> 
> > > > There should be a sysfs interface (like the one in USB) to allow
> > > > userspace to prevent a device from being autosuspended -- and perhaps
> > > > also to force it to be suspended.
> > > 
> > > To prevent a device from being suspended - yes.  To force it to stay suspended
> > > - I'm not sure.
> > 
> > I'm not sure either.  Oliver Neukum requested it originally and it has
> > been useful for debugging, but I haven't seen many places where it
> > would come in useful in practice.
> 
> The problem with it is that the user space may not know if it is safe to keep
> a device suspended and if it is not, the kernel will have to ignore the setting
> anyway, so I'm not sure what's the point (except for debugging).
> 
> > > > What about devices that have more than two runtime power states?  For
> > > > example, you can't squeeze PCI's {D0,D1,D2,D3hot} range into {running,
> > > > suspended}.
> > > 
> > > That has to be bus type-specific.
> > > 
> > > In the case of PCI all of the low power states (D1-D3) are in fact substates of
> > > "suspended", because we generally need to quiesce the device before putting
> > > it into any of these states.
> > > 
> > > I'm not sure if we can introduce more "levels of suspension", so to speak, at
> > > the core level, but in any case we can easily distinguish between "device
> > > quiesced and in a low power state" and "device fully active".
> > > 
> > > So, in this picture the device is "suspended" from the core's point of view
> > > once its bus type's ->autosuspend() callback has been successfully executed.
> > 
> > This too should be documented in the code.  Or in a Documentation file.
> 
> OK
> 
> I tried to address your comments and Oliver's comments too in the new
> version of the patch below.  Please have a look and tell me what you think.

Argh, I forgot about some important things.

First, there are devices with no parent (actually, it would be much easier if
they had a default dummy parent, but that's a separate issue).

Second, the parent has to be taken into account in the asynchronous resume
path too (which BTW is more complicated).
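
To illustrate the two points above, here is a minimal user-space sketch (not
part of the patch) of the ordering the resume paths have to respect: a
suspended parent is brought up before its child, and a device without a parent
is simply the end of the chain.  The names model_dev and model_resume are
hypothetical and exist only for this illustration; there is no locking and no
workqueue here.

```c
#include <stddef.h>

enum model_state { MODEL_ACTIVE, MODEL_SUSPENDED };

/* Hypothetical stand-in for struct device, reduced to what the sketch needs. */
struct model_dev {
	struct model_dev *parent;	/* NULL for a device with no parent */
	enum model_state state;
	int resume_order;		/* 0 until resumed, then 1, 2, 3, ... */
};

/*
 * Resume @dev, resuming any suspended ancestors first.  @counter records the
 * order in which devices were actually resumed.
 */
static void model_resume(struct model_dev *dev, int *counter)
{
	if (dev->state == MODEL_ACTIVE)
		return;

	/* A device with no parent has nothing to wait for. */
	if (dev->parent)
		model_resume(dev->parent, counter);

	dev->state = MODEL_ACTIVE;
	dev->resume_order = ++(*counter);
}
```

Calling model_resume() on a leaf whose whole chain is suspended resumes the
root first and the leaf last, which is the ordering pm_request_resume() and
pm_resume_sync() try to enforce with the real locks.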

Finally, I decided to follow Oliver's suggestion that some error codes returned
by ->autosuspend() and ->autoresume() may be regarded as "go back to the
previous state" information.  I chose to use -EAGAIN and -EBUSY for this
purpose.
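
As a rough sketch of that rule (user-space only, not code from the patch), the
status update after a callback returns boils down to the mapping below.  The
enum is copied from the pm.h hunk; the helper rpm_status_after() is a
hypothetical name used only here, since the patch open-codes the same switch
in pm_autosuspend(), pm_autoresume() and pm_resume_sync().

```c
#include <errno.h>

/* User-space copy of enum rpm_state from the patch, for illustration. */
enum rpm_state {
	RPM_ERROR = -1,
	RPM_ACTIVE,
	RPM_IDLE,
	RPM_SUSPENDING,
	RPM_SUSPENDED,
	RPM_WAKE,
	RPM_RESUMING,
};

/*
 * Hypothetical helper: choose the status to store after a callback returns.
 * @success is the status for a return value of 0, @previous is the status to
 * go back to on -EAGAIN or -EBUSY.  Any other error yields RPM_ERROR.
 */
static enum rpm_state rpm_status_after(int error, enum rpm_state success,
				       enum rpm_state previous)
{
	switch (error) {
	case 0:
		return success;
	case -EAGAIN:
	case -EBUSY:
		return previous;
	default:
		return RPM_ERROR;
	}
}
```

For ->autosuspend() the call would be rpm_status_after(error, RPM_SUSPENDED,
RPM_ACTIVE), and for ->autoresume() the two state arguments are swapped.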

Updated patch follows, sorry for the confusion.

Best,
Rafael

---
 drivers/base/power/Makefile  |    1 
 drivers/base/power/main.c    |    2 
 drivers/base/power/runtime.c |  393 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/pm.h           |   78 ++++++++
 include/linux/pm_runtime.h   |   50 +++++
 kernel/power/Kconfig         |   14 +
 kernel/power/main.c          |   17 +
 7 files changed, 553 insertions(+), 2 deletions(-)

Index: linux-2.6/kernel/power/Kconfig
===================================================================
--- linux-2.6.orig/kernel/power/Kconfig
+++ linux-2.6/kernel/power/Kconfig
@@ -208,3 +208,17 @@ config APM_EMULATION
 	  random kernel OOPSes or reboots that don't seem to be related to
 	  anything, try disabling/enabling this option (or disabling/enabling
 	  APM in your BIOS).
+
+config PM_RUNTIME
+	bool "Run-time PM core functionality"
+	depends on PM
+	---help---
+	  Enable functionality allowing I/O devices to be put into energy-saving
+	  (low power) states at run time (or autosuspended) after a specified
+	  period of inactivity and woken up in response to a hardware-generated
+	  wake-up event or a driver's request.
+
+	  Hardware support is generally required for this functionality to work
+	  and the bus type drivers of the buses the devices are on are
+	  responsible for the actual handling of the autosuspend requests and
+	  wake-up events.
Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -11,6 +11,7 @@
 #include <linux/kobject.h>
 #include <linux/string.h>
 #include <linux/resume-trace.h>
+#include <linux/workqueue.h>
 
 #include "power.h"
 
@@ -217,8 +218,24 @@ static struct attribute_group attr_group
 	.attrs = g,
 };
 
+#ifdef CONFIG_PM_RUNTIME
+struct workqueue_struct *pm_wq;
+
+static int __init pm_start_workqueue(void)
+{
+	pm_wq = create_freezeable_workqueue("pm");
+
+	return pm_wq ? 0 : -ENOMEM;
+}
+#else
+static inline int pm_start_workqueue(void) { return 0; }
+#endif
+
 static int __init pm_init(void)
 {
+	int error = pm_start_workqueue();
+	if (error)
+		return error;
 	power_kobj = kobject_create_and_add("power", NULL);
 	if (!power_kobj)
 		return -ENOMEM;
Index: linux-2.6/include/linux/pm.h
===================================================================
--- linux-2.6.orig/include/linux/pm.h
+++ linux-2.6/include/linux/pm.h
@@ -22,6 +22,9 @@
 #define _LINUX_PM_H
 
 #include <linux/list.h>
+#include <linux/workqueue.h>
+#include <linux/spinlock.h>
+#include <linux/completion.h>
 
 /*
  * Callbacks for platform drivers to implement.
@@ -165,6 +168,15 @@ typedef struct pm_message {
  * It is allowed to unregister devices while the above callbacks are being
  * executed.  However, it is not allowed to unregister a device from within any
  * of its own callbacks.
+ *
+ * There also are two callbacks related to run-time power management of devices:
+ *
+ * @autosuspend: Save the device registers and put it into an energy-saving (low
+ *	power) state at run-time, enable wake-up events as appropriate.
+ *
+ * @autoresume: Put the device into the full power state and restore its
+ *	registers (if applicable) at run time, in response to a wake-up event
+ *	generated by hardware or at a request of software.
  */
 
 struct dev_pm_ops {
@@ -182,6 +194,10 @@ struct dev_pm_ops {
 	int (*thaw_noirq)(struct device *dev);
 	int (*poweroff_noirq)(struct device *dev);
 	int (*restore_noirq)(struct device *dev);
+#ifdef CONFIG_PM_RUNTIME
+	int (*autosuspend)(struct device *dev);
+	int (*autoresume)(struct device *dev);
+#endif
 };
 
 /**
@@ -315,14 +331,72 @@ enum dpm_state {
 	DPM_OFF_IRQ,
 };
 
+/**
+ * Device run-time power management state.
+ *
+ * These state labels are used internally by the PM core to indicate the current
+ * status of a device with respect to the PM core operations.  They do not
+ * reflect the actual power state of the device or its status as seen by the
+ * driver.
+ *
+ * RPM_ACTIVE		Device is fully operational, no run-time PM requests are
+ *			pending for it.
+ *
+ * RPM_IDLE		It has been requested that the device be suspended.
+ *			Suspend request has been put into the run-time PM
+ *			workqueue and it's pending execution.
+ *
+ * RPM_SUSPENDING	Device bus type's ->autosuspend() callback is being
+ *			executed.
+ *
+ * RPM_SUSPENDED	Device bus type's ->autosuspend() callback has completed
+ *			successfully.  The device is regarded as suspended.
+ *
+ * RPM_WAKE		It has been requested that the device be woken up.
+ *			Resume request has been put into the run-time PM
+ *			workqueue and it's pending execution.
+ *
+ * RPM_RESUMING		Device bus type's ->autoresume() callback is being
+ *			executed.
+ *
+ * RPM_ERROR		Represents a condition from which the PM core cannot
+ *			recover by itself.  If the device's run-time PM status
+ *			field has this value, all of the run-time PM operations
+ *			carried out for the device by the core will fail, until
+ *			the status field is changed to either RPM_ACTIVE or
+ *			RPM_SUSPENDED (it is not valid to use the other values
+ *			in such a situation) by the device's driver or bus type.
+ *			This happens when the device bus type's ->autosuspend()
+ *			or ->autoresume() callback returns error code other than
+ *			-EAGAIN or -EBUSY.
+ */
+
+enum rpm_state {
+	RPM_ERROR = -1,
+	RPM_ACTIVE,
+	RPM_IDLE,
+	RPM_SUSPENDING,
+	RPM_SUSPENDED,
+	RPM_WAKE,
+	RPM_RESUMING,
+};
+
 struct dev_pm_info {
 	pm_message_t		power_state;
-	unsigned		can_wakeup:1;
-	unsigned		should_wakeup:1;
+	unsigned int		can_wakeup:1;
+	unsigned int		should_wakeup:1;
 	enum dpm_state		status;		/* Owned by the PM core */
 #ifdef	CONFIG_PM_SLEEP
 	struct list_head	entry;
 #endif
+#ifdef	CONFIG_PM_RUNTIME
+	struct delayed_work	suspend_work;
+	unsigned int		suspend_aborted:1;
+	struct work_struct	resume_work;
+	struct completion	work_done;
+	enum rpm_state		runtime_status;
+	spinlock_t		lock;
+#endif
 };
 
 /*
Index: linux-2.6/drivers/base/power/Makefile
===================================================================
--- linux-2.6.orig/drivers/base/power/Makefile
+++ linux-2.6/drivers/base/power/Makefile
@@ -1,5 +1,6 @@
 obj-$(CONFIG_PM)	+= sysfs.o
 obj-$(CONFIG_PM_SLEEP)	+= main.o
+obj-$(CONFIG_PM_RUNTIME)	+= runtime.o
 obj-$(CONFIG_PM_TRACE_RTC)	+= trace.o
 
 ccflags-$(CONFIG_DEBUG_DRIVER) := -DDEBUG
Index: linux-2.6/drivers/base/power/runtime.c
===================================================================
--- /dev/null
+++ linux-2.6/drivers/base/power/runtime.c
@@ -0,0 +1,393 @@
+/*
+ * drivers/base/power/runtime.c - Helper functions for device run-time PM
+ *
+ * Copyright (c) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
+ *
+ * This file is released under the GPLv2.
+ */
+
+#include <linux/pm_runtime.h>
+
+/**
+ * pm_runtime_reset - Clear all of the device run-time PM flags.
+ * @dev: Device object to clear the flags for.
+ */
+static void pm_runtime_reset(struct device *dev)
+{
+	dev->power.suspend_aborted = false;
+	dev->power.runtime_status = RPM_ACTIVE;
+}
+
+/**
+ * pm_device_suspended - Check if given device has been suspended at run time.
+ * @dev: Device to check.
+ * @data: Ignored.
+ *
+ * Returns 0 if the device has been suspended or -EBUSY otherwise.
+ */
+static int pm_device_suspended(struct device *dev, void *data)
+{
+	int ret;
+
+	spin_lock(&dev->power.lock);
+
+	ret = dev->power.runtime_status == RPM_SUSPENDED ? 0 : -EBUSY;
+
+	spin_unlock(&dev->power.lock);
+
+	return ret;
+}
+
+/**
+ * pm_check_children - Check if all children of a device have been suspended.
+ * @dev: Device to check.
+ *
+ * Returns 0 if all children of the device have been suspended or -EBUSY
+ * otherwise.
+ */
+static int pm_check_children(struct device *dev)
+{
+	return device_for_each_child(dev, NULL, pm_device_suspended);
+}
+
+/**
+ * pm_autosuspend - Run autosuspend callback of given device object's bus type.
+ * @work: Work structure used for scheduling the execution of this function.
+ *
+ * Use @work to get the device object the suspend has been scheduled for,
+ * check if the suspend request hasn't been cancelled and run the
+ * ->autosuspend() callback from the device's bus type driver.  Update the
+ * run-time PM flags in the device object to reflect the current status of the
+ * device.
+ */
+static void pm_autosuspend(struct work_struct *work)
+{
+	struct delayed_work *dw = to_delayed_work(work);
+	struct device *dev = suspend_work_to_device(dw);
+	int error = 0;
+
+	spin_lock(&dev->power.lock);
+
+	if (dev->power.suspend_aborted) {
+		dev->power.runtime_status = RPM_ACTIVE;
+		goto out;
+	} else if (dev->power.runtime_status != RPM_IDLE) {
+		goto out;
+	} else if (pm_check_children(dev)) {
+		/*
+		 * We can only suspend the device if all of its children have
+		 * been suspended.
+		 */
+		goto out;
+	}
+
+	dev->power.runtime_status = RPM_SUSPENDING;
+	init_completion(&dev->power.work_done);
+
+	spin_unlock(&dev->power.lock);
+
+	if (dev && dev->bus && dev->bus->pm && dev->bus->pm->autosuspend)
+		error = dev->bus->pm->autosuspend(dev);
+
+	spin_lock(&dev->power.lock);
+
+	switch (error) {
+	case 0:
+		dev->power.runtime_status = RPM_SUSPENDED;
+		break;
+	case -EAGAIN:
+	case -EBUSY:
+		dev->power.runtime_status = RPM_ACTIVE;
+		break;
+	default:
+		dev->power.runtime_status = RPM_ERROR;
+	}
+	complete(&dev->power.work_done);
+
+ out:
+	spin_unlock(&dev->power.lock);
+}
+
+/**
+ * pm_request_suspend - Schedule run-time suspend of given device.
+ * @dev: Device to suspend.
+ * @delay: Time to wait before attempting to suspend the device.
+ */
+void pm_request_suspend(struct device *dev, unsigned long delay)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.runtime_status != RPM_ACTIVE)
+		goto out;
+
+	dev->power.runtime_status = RPM_IDLE;
+	dev->power.suspend_aborted = false;
+	queue_delayed_work(pm_wq, &dev->power.suspend_work, delay);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+
+/**
+ * pm_cancel_suspend - Cancel a pending suspend request for given device.
+ * @dev: Device to cancel the suspend request for.
+ *
+ * Should be called with dev->power.lock held and only if we are sure that the
+ * ->autosuspend() callback hasn't started yet.
+ */
+static void pm_cancel_suspend(struct device *dev)
+{
+	dev->power.suspend_aborted = true;
+	cancel_delayed_work(&dev->power.suspend_work);
+	dev->power.runtime_status = RPM_ACTIVE;
+}
+
+/**
+ * pm_autoresume - Run autoresume callback of given device object's bus type.
+ * @work: Work structure used for scheduling the execution of this function.
+ *
+ * Use @work to get the device object the resume has been scheduled for,
+ * check if the device is really suspended and run the ->autoresume() callback
+ * from the device's bus type driver.  Update the run-time PM flags in the
+ * device object to reflect the current status of the device.
+ */
+static void pm_autoresume(struct work_struct *work)
+{
+	struct device *dev = resume_work_to_device(work);
+	int error = 0;
+
+	if (dev->parent)
+		spin_lock(&dev->parent->power.lock);
+	spin_lock(&dev->power.lock);
+
+ repeat:
+	if (dev->power.runtime_status != RPM_WAKE) {
+		if (dev->parent)
+			spin_unlock(&dev->parent->power.lock);
+		goto out;
+	} else if (dev->parent
+	    && dev->parent->power.runtime_status != RPM_ACTIVE) {
+		if (dev->parent->power.runtime_status == RPM_RESUMING) {
+			spin_unlock(&dev->power.lock);
+			spin_unlock(&dev->parent->power.lock);
+
+			wait_for_completion(&dev->parent->power.work_done);
+
+			spin_lock(&dev->parent->power.lock);
+			spin_lock(&dev->power.lock);
+		}
+		if (dev->parent->power.runtime_status != RPM_ACTIVE) {
+			spin_unlock(&dev->parent->power.lock);
+			goto out;
+		}
+		goto repeat;
+	}
+
+	dev->power.runtime_status = RPM_RESUMING;
+	init_completion(&dev->power.work_done);
+
+	spin_unlock(&dev->power.lock);
+	if (dev->parent)
+		spin_unlock(&dev->parent->power.lock);
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->autoresume)
+		error = dev->bus->pm->autoresume(dev);
+
+	spin_lock(&dev->power.lock);
+
+	switch (error) {
+	case 0:
+		dev->power.runtime_status = RPM_ACTIVE;
+		break;
+	case -EAGAIN:
+	case -EBUSY:
+		dev->power.runtime_status = RPM_SUSPENDED;
+		break;
+	default:
+		dev->power.runtime_status = RPM_ERROR;
+	}
+	complete(&dev->power.work_done);
+
+ out:
+	spin_unlock(&dev->power.lock);
+}
+
+/**
+ * pm_request_resume - Schedule run-time resume of given device.
+ * @dev: Device to resume.
+ */
+void pm_request_resume(struct device *dev)
+{
+	unsigned long parent_flags = 0, flags;
+
+ repeat:
+	if (dev->parent)
+		spin_lock_irqsave(&dev->parent->power.lock, parent_flags);
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.runtime_status == RPM_IDLE) {
+		/* Autosuspend request is pending, no need to resume. */
+		pm_cancel_suspend(dev);
+		goto out;
+	} else if (dev->power.runtime_status != RPM_SUSPENDING
+	    && dev->power.runtime_status != RPM_SUSPENDED) {
+		goto out;
+	} else if (dev->parent
+	    && (dev->parent->power.runtime_status == RPM_IDLE
+	      || dev->parent->power.runtime_status == RPM_SUSPENDING
+	      || dev->parent->power.runtime_status == RPM_SUSPENDED)) {
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+		spin_unlock_irqrestore(&dev->parent->power.lock, parent_flags);
+
+		/* We have to resume the parent first. */
+		pm_request_resume(dev->parent);
+
+		goto repeat;
+	}
+
+	dev->power.runtime_status = RPM_WAKE;
+	queue_work(pm_wq, &dev->power.resume_work);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+	if (dev->parent)
+		spin_unlock_irqrestore(&dev->parent->power.lock, parent_flags);
+}
+
+/**
+ * pm_resume_sync - Resume given device waiting for the operation to complete.
+ * @dev: Device to resume.
+ *
+ * Resume the device synchronously, waiting for the operation to complete.  If
+ * autosuspend is in progress while this function is being run, wait for it to
+ * finish before resuming the device.  If the autosuspend is scheduled, but it
+ * hasn't started yet, cancel it and we're done.
+ */
+int pm_resume_sync(struct device *dev)
+{
+	int error = 0;
+
+	spin_lock(&dev->power.lock);
+
+	if (dev->power.runtime_status == RPM_ACTIVE) {
+		goto out;
+	} else if (dev->power.runtime_status == RPM_IDLE) {
+		/* ->autosuspend() hasn't started yet, no need to resume. */
+		pm_cancel_suspend(dev);
+		goto out;
+	}
+
+	if (dev->power.runtime_status == RPM_SUSPENDING) {
+		spin_unlock(&dev->power.lock);
+
+		/*
+		 * The ->autosuspend() callback is being executed right now,
+		 * wait for it to complete.
+		 */
+		wait_for_completion(&dev->power.work_done);
+	} else if (dev->power.runtime_status == RPM_SUSPENDED && dev->parent) {
+		spin_unlock(&dev->power.lock);
+
+		/* The device's parent may also be suspended.  Resume it. */
+		error = pm_resume_sync(dev->parent);
+		if (error)
+			return error;
+	} else {
+		spin_unlock(&dev->power.lock);
+	}
+
+	if (dev->parent)
+		spin_lock(&dev->parent->power.lock);
+	spin_lock(&dev->power.lock);
+
+	if (dev->power.runtime_status == RPM_RESUMING)
+		/* There's another resume running in parallel with us. */
+		error = -EAGAIN;
+	else if (dev->power.runtime_status != RPM_SUSPENDED)
+		error = -EINVAL;
+	if (error) {
+		if (dev->parent)
+			spin_unlock(&dev->parent->power.lock);
+		goto out;
+	}
+
+	dev->power.runtime_status = RPM_RESUMING;
+	init_completion(&dev->power.work_done);
+
+	spin_unlock(&dev->power.lock);
+	if (dev->parent)
+		spin_unlock(&dev->parent->power.lock);
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->autoresume)
+		error = dev->bus->pm->autoresume(dev);
+
+	spin_lock(&dev->power.lock);
+
+	switch (error) {
+	case 0:
+		dev->power.runtime_status = RPM_ACTIVE;
+		break;
+	case -EAGAIN:
+	case -EBUSY:
+		dev->power.runtime_status = RPM_SUSPENDED;
+		break;
+	default:
+		dev->power.runtime_status = RPM_ERROR;
+	}
+	complete(&dev->power.work_done);
+
+ out:
+	spin_unlock(&dev->power.lock);
+
+	return error;
+}
+
+/**
+ * pm_cancel_autosuspend - Cancel a pending autosuspend request for given device
+ * @dev: Device to handle.
+ *
+ * This routine is only supposed to be called when the run-time PM workqueue is
+ * frozen (i.e. during system-wide suspend or hibernation) when it is guaranteed
+ * that no work items are being executed.
+ */
+void pm_cancel_autosuspend(struct device *dev)
+{
+	spin_lock(&dev->power.lock);
+
+	cancel_delayed_work(&dev->power.suspend_work);
+	pm_runtime_reset(dev);
+
+	spin_unlock(&dev->power.lock);
+}
+
+/**
+ * pm_cancel_autoresume - Cancel a pending autoresume request for given device
+ * @dev: Device to handle.
+ *
+ * This routine is only supposed to be called when the run-time PM workqueue is
+ * frozen (i.e. during system-wide suspend or hibernation) when it is guaranteed
+ * that no work items are being executed.
+ */
+void pm_cancel_autoresume(struct device *dev)
+{
+	spin_lock(&dev->power.lock);
+
+	work_clear_pending(&dev->power.resume_work);
+	pm_runtime_reset(dev);
+
+	spin_unlock(&dev->power.lock);
+}
+
+/**
+ * pm_runtime_init - Initialize run-time PM fields in given device object.
+ * @dev: Device object to handle.
+ */
+void pm_runtime_init(struct device *dev)
+{
+	pm_runtime_reset(dev);
+	spin_lock_init(&dev->power.lock);
+	INIT_DELAYED_WORK(&dev->power.suspend_work, pm_autosuspend);
+	INIT_WORK(&dev->power.resume_work, pm_autoresume);
+}
Index: linux-2.6/include/linux/pm_runtime.h
===================================================================
--- /dev/null
+++ linux-2.6/include/linux/pm_runtime.h
@@ -0,0 +1,50 @@
+/*
+ * pm_runtime.h - Device run-time power management helper functions.
+ *
+ * Copyright (C) 2009 Rafael J. Wysocki <rjw@sisk.pl>
+ *
+ * This file is released under the GPLv2.
+ */
+
+#ifndef _LINUX_PM_RUNTIME_H
+#define _LINUX_PM_RUNTIME_H
+
+#include <linux/device.h>
+#include <linux/pm.h>
+
+#ifdef CONFIG_PM_RUNTIME
+extern struct workqueue_struct *pm_wq;
+
+extern void pm_runtime_init(struct device *dev);
+extern void pm_request_suspend(struct device *dev, unsigned long delay);
+extern void pm_request_resume(struct device *dev);
+extern int pm_resume_sync(struct device *dev);
+extern void pm_cancel_autosuspend(struct device *dev);
+extern void pm_cancel_autoresume(struct device *dev);
+
+static inline struct device *suspend_work_to_device(struct delayed_work *work)
+{
+	struct dev_pm_info *dpi;
+
+	dpi = container_of(work, struct dev_pm_info, suspend_work);
+	return container_of(dpi, struct device, power);
+}
+
+static inline struct device *resume_work_to_device(struct work_struct *work)
+{
+	struct dev_pm_info *dpi;
+
+	dpi = container_of(work, struct dev_pm_info, resume_work);
+	return container_of(dpi, struct device, power);
+}
+
+#else /* !CONFIG_PM_RUNTIME */
+static inline void pm_runtime_init(struct device *dev) {}
+static inline void pm_request_suspend(struct device *dev, unsigned long delay) {}
+static inline void pm_request_resume(struct device *dev) {}
+static inline int pm_resume_sync(struct device *dev) { return -ENOSYS; }
+static inline void pm_cancel_autosuspend(struct device *dev) {}
+static inline void pm_cancel_autoresume(struct device *dev) {}
+#endif /* !CONFIG_PM_RUNTIME */
+
+#endif
Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -21,6 +21,7 @@
 #include <linux/kallsyms.h>
 #include <linux/mutex.h>
 #include <linux/pm.h>
+#include <linux/pm_runtime.h>
 #include <linux/resume-trace.h>
 #include <linux/rwsem.h>
 #include <linux/interrupt.h>
@@ -88,6 +89,7 @@ void device_pm_add(struct device *dev)
 	}
 
 	list_add_tail(&dev->power.entry, &dpm_list);
+	pm_runtime_init(dev);
 	mutex_unlock(&dpm_list_mtx);
 }
 

^ permalink raw reply	[flat|nested] 199+ messages in thread

* [patch update] Re: Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-09 22:57               ` Rafael J. Wysocki
@ 2009-06-10  8:29                 ` Rafael J. Wysocki
  2009-06-10  8:29                 ` [patch update] Re: [linux-pm] " Rafael J. Wysocki
                                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 199+ messages in thread
From: Rafael J. Wysocki @ 2009-06-10  8:29 UTC (permalink / raw)
  To: Alan Stern; +Cc: ACPI Devel Maling List, linux-pm, LKML

On Wednesday 10 June 2009, Rafael J. Wysocki wrote:
> On Tuesday 09 June 2009, Alan Stern wrote:
> > On Mon, 8 Jun 2009, Rafael J. Wysocki wrote:
> > 
> > > > Use of the RPM_UNKNOWN state isn't good.  A bus may have valid reasons 
> > > > of its own for not carrying out an autosuspend.  When this happens the 
> > > > device's state isn't unknown.
> > > 
> > > I'm not sure what you mean exactly.
> > > 
> > > If ->autosuspend() fails, the device power state may be known, but the core
> > > can't be sure if the device is active.  This information is available to the
> > > driver and/or the bus type, which should change the status to whatever is
> > > appropriate.
> > 
> > But no matter what the driver or bus type sets the state to, your 
> > pm_autosuspend() will change it to one of RPM_UNKNOWN or RPM_SUSPENDED.  
> > Neither might be right.
> 
> The idea is that if ->autosuspend() or ->autoresume() returns an error code,
> this is a situation the PM core cannot recover from by itself, so it shouldn't
> pretend it knows what's happened.  Instead, it marks the device as "I don't
> know if it is safe to touch this" and won't handle it until the device driver
> or bus type clears the status.
> 
> > > The name of this constant may be confusing, but I didn't have any better ideas.
> > 
> > It's not clear what RPM_ACTIVE, RPM_IDLE, and RPM_SUSPENDED are 
> > supposed to mean; this should be documented in the code.  Also, why 
> > isn't there RPM_RESUMING?
> 
> Yes, there should be.  In fact it's in the current version of the patch, which
> is appended.  Also, there's a comment explaining the meaning of the RPM_*
> constants in pm.h .
> 
> > By the way, a legitimate reason for aborting an autosuspend is if the
> > device's driver requires remote wakeup to be enabled during suspend but
> > the user has disabled it.
> 
> Do you mean the user has disabled the remote wakeup?
> 
> > > > The scheme doesn't include any mechanism for communicating runtime
> > > > power information up the device tree.  When a device is autosuspended,
> > > > its parent's driver should be told so that the driver can consider
> > > > autosuspending the parent.
> > > 
> > > I thought the bus type's ->autosuspend() callback could take care of this.
> > 
> > Shouldn't this happen after the device's state has changed to 
> > RPM_SUSPENDED?  That's not until after the callback returns.
> 
> OK, I tried to address the issue of parent suspend/resume in the new
> version of the patch below (I'm not sure if I did the nesting of spinlocks in
> pm_request_resume() correctly).
> 
> > > > There should be a sysfs interface (like the one in USB) to allow
> > > > userspace to prevent a device from being autosuspended -- and perhaps
> > > > also to force it to be suspended.
> > > 
> > > To prevent a device from being suspended - yes.  To force it to stay suspended
> > > - I'm not sure.
> > 
> > I'm not sure either.  Oliver Neukum requested it originally and it has
> > been useful for debugging, but I haven't seen many places where it
> > would come in useful in practice.
> 
> The problem with it is that the user space may not know if it is safe to keep
> a device suspended and if it is not, the kernel will have to ignore the setting
> anyway, so I'm not sure what's the point (except for debugging).
> 
> > > > What about devices that have more than two runtime power states?  For
> > > > example, you can't squeeze PCI's {D0,D1,D2,D3hot} range into {running,
> > > > suspended}.
> > > 
> > > That has to be bus type-specific.
> > > 
> > > In the case of PCI all of the low power states (D1-D3) are in fact substates of
> > > "suspended", because we generally need to quiesce the device before putting
> > > it into any of these states.
> > > 
> > > I'm not sure if we can introduce more "levels of suspension", so to speak, at
> > > the core level, but in any case we can easily distinguish between "device
> > > quiesced and in a low power state" and "device fully active".
> > > 
> > > So, in this picture the device is "suspended" from the core's point of view
> > > once its bus type's ->autosuspend() callback has been successfully executed.
> > 
> > This too should be documented in the code.  Or in a Documentation file.
> 
> OK
> 
> I tried to address your comments and Oliver's comments too in the new
> version of the patch below.  Please have a look and tell me what you think.

Argh, I forgot about some important things.

First, there are devices with no parent (actually, it would be much easier if
they had a default dummy parent, but that's a separate issue).

Second, the parent has to be taken into account in the asynchronous resume
path too (which BTW is more complicated).

Finally, I decided to follow Oliver's suggestion that some error codes returned
by ->autosuspend() and ->autoresume() may be regarded as "go back to the
previous state" information.  I chose to use -EAGAIN and -EBUSY for this
purpose.

Updated patch follows, sorry for the confusion.

Best,
Rafael

---
 drivers/base/power/Makefile  |    1 
 drivers/base/power/main.c    |    2 
 drivers/base/power/runtime.c |  393 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/pm.h           |   78 ++++++++
 include/linux/pm_runtime.h   |   50 +++++
 kernel/power/Kconfig         |   14 +
 kernel/power/main.c          |   17 +
 7 files changed, 553 insertions(+), 2 deletions(-)

Index: linux-2.6/kernel/power/Kconfig
===================================================================
--- linux-2.6.orig/kernel/power/Kconfig
+++ linux-2.6/kernel/power/Kconfig
@@ -208,3 +208,17 @@ config APM_EMULATION
 	  random kernel OOPSes or reboots that don't seem to be related to
 	  anything, try disabling/enabling this option (or disabling/enabling
 	  APM in your BIOS).
+
+config PM_RUNTIME
+	bool "Run-time PM core functionality"
+	depends on PM
+	---help---
+	  Enable functionality allowing I/O devices to be put into energy-saving
+	  (low power) states at run time (or autosuspended) after a specified
+	  period of inactivity and woken up in response to a hardware-generated
+	  wake-up event or a driver's request.
+
+	  Hardware support is generally required for this functionality to work
+	  and the bus type drivers of the buses the devices are on are
+	  responsible for the actual handling of the autosuspend requests and
+	  wake-up events.
Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -11,6 +11,7 @@
 #include <linux/kobject.h>
 #include <linux/string.h>
 #include <linux/resume-trace.h>
+#include <linux/workqueue.h>
 
 #include "power.h"
 
@@ -217,8 +218,24 @@ static struct attribute_group attr_group
 	.attrs = g,
 };
 
+#ifdef CONFIG_PM_RUNTIME
+struct workqueue_struct *pm_wq;
+
+static int __init pm_start_workqueue(void)
+{
+	pm_wq = create_freezeable_workqueue("pm");
+
+	return pm_wq ? 0 : -ENOMEM;
+}
+#else
+static inline int pm_start_workqueue(void) { return 0; }
+#endif
+
 static int __init pm_init(void)
 {
+	int error = pm_start_workqueue();
+	if (error)
+		return error;
 	power_kobj = kobject_create_and_add("power", NULL);
 	if (!power_kobj)
 		return -ENOMEM;
Index: linux-2.6/include/linux/pm.h
===================================================================
--- linux-2.6.orig/include/linux/pm.h
+++ linux-2.6/include/linux/pm.h
@@ -22,6 +22,9 @@
 #define _LINUX_PM_H
 
 #include <linux/list.h>
+#include <linux/workqueue.h>
+#include <linux/spinlock.h>
+#include <linux/completion.h>
 
 /*
  * Callbacks for platform drivers to implement.
@@ -165,6 +168,15 @@ typedef struct pm_message {
  * It is allowed to unregister devices while the above callbacks are being
  * executed.  However, it is not allowed to unregister a device from within any
  * of its own callbacks.
+ *
+ * There also are two callbacks related to run-time power management of devices:
+ *
+ * @autosuspend: Save the device registers and put it into an energy-saving (low
+ *	power) state at run-time, enable wake-up events as appropriate.
+ *
+ * @autoresume: Put the device into the full power state and restore its
+ *	registers (if applicable) at run time, in response to a wake-up event
+ *	generated by hardware or at a request of software.
  */
 
 struct dev_pm_ops {
@@ -182,6 +194,10 @@ struct dev_pm_ops {
 	int (*thaw_noirq)(struct device *dev);
 	int (*poweroff_noirq)(struct device *dev);
 	int (*restore_noirq)(struct device *dev);
+#ifdef CONFIG_PM_RUNTIME
+	int (*autosuspend)(struct device *dev);
+	int (*autoresume)(struct device *dev);
+#endif
 };
 
 /**
@@ -315,14 +331,72 @@ enum dpm_state {
 	DPM_OFF_IRQ,
 };
 
+/**
+ * Device run-time power management state.
+ *
+ * These state labels are used internally by the PM core to indicate the current
+ * status of a device with respect to the PM core operations.  They do not
+ * reflect the actual power state of the device or its status as seen by the
+ * driver.
+ *
+ * RPM_ACTIVE		Device is fully operational, no run-time PM requests are
+ *			pending for it.
+ *
+ * RPM_IDLE		It has been requested that the device be suspended.
+ *			Suspend request has been put into the run-time PM
+ *			workqueue and it's pending execution.
+ *
+ * RPM_SUSPENDING	Device bus type's ->autosuspend() callback is being
+ *			executed.
+ *
+ * RPM_SUSPENDED	Device bus type's ->autosuspend() callback has completed
+ *			successfully.  The device is regarded as suspended.
+ *
+ * RPM_WAKE		It has been requested that the device be woken up.
+ *			Resume request has been put into the run-time PM
+ *			workqueue and it's pending execution.
+ *
+ * RPM_RESUMING		Device bus type's ->autoresume() callback is being
+ *			executed.
+ *
+ * RPM_ERROR		Represents a condition from which the PM core cannot
+ *			recover by itself.  If the device's run-time PM status
+ *			field has this value, all of the run-time PM operations
+ *			carried out for the device by the core will fail, until
+ *			the status field is changed to either RPM_ACTIVE or
+ *			RPM_SUSPENDED (it is not valid to use the other values
+ *			in such a situation) by the device's driver or bus type.
+ *			This happens when the device bus type's ->autosuspend()
+ *			or ->autoresume() callback returns an error code other
+ *			than -EAGAIN or -EBUSY.
+ */
+
+enum rpm_state {
+	RPM_ERROR = -1,
+	RPM_ACTIVE,
+	RPM_IDLE,
+	RPM_SUSPENDING,
+	RPM_SUSPENDED,
+	RPM_WAKE,
+	RPM_RESUMING,
+};
+
 struct dev_pm_info {
 	pm_message_t		power_state;
-	unsigned		can_wakeup:1;
-	unsigned		should_wakeup:1;
+	unsigned int		can_wakeup:1;
+	unsigned int		should_wakeup:1;
 	enum dpm_state		status;		/* Owned by the PM core */
 #ifdef	CONFIG_PM_SLEEP
 	struct list_head	entry;
 #endif
+#ifdef	CONFIG_PM_RUNTIME
+	struct delayed_work	suspend_work;
+	unsigned int		suspend_aborted:1;
+	struct work_struct	resume_work;
+	struct completion	work_done;
+	enum rpm_state		runtime_status;
+	spinlock_t		lock;
+#endif
 };
 
 /*
Index: linux-2.6/drivers/base/power/Makefile
===================================================================
--- linux-2.6.orig/drivers/base/power/Makefile
+++ linux-2.6/drivers/base/power/Makefile
@@ -1,5 +1,6 @@
 obj-$(CONFIG_PM)	+= sysfs.o
 obj-$(CONFIG_PM_SLEEP)	+= main.o
+obj-$(CONFIG_PM_RUNTIME)	+= runtime.o
 obj-$(CONFIG_PM_TRACE_RTC)	+= trace.o
 
 ccflags-$(CONFIG_DEBUG_DRIVER) := -DDEBUG
Index: linux-2.6/drivers/base/power/runtime.c
===================================================================
--- /dev/null
+++ linux-2.6/drivers/base/power/runtime.c
@@ -0,0 +1,393 @@
+/*
+ * drivers/base/power/runtime.c - Helper functions for device run-time PM
+ *
+ * Copyright (c) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
+ *
+ * This file is released under the GPLv2.
+ */
+
+#include <linux/pm_runtime.h>
+
+/**
+ * pm_runtime_reset - Clear all of the device run-time PM flags.
+ * @dev: Device object to clear the flags for.
+ */
+static void pm_runtime_reset(struct device *dev)
+{
+	dev->power.suspend_aborted = false;
+	dev->power.runtime_status = RPM_ACTIVE;
+}
+
+/**
+ * pm_device_suspended - Check if given device has been suspended at run time.
+ * @dev: Device to check.
+ * @data: Ignored.
+ *
+ * Returns 0 if the device has been suspended or -EBUSY otherwise.
+ */
+static int pm_device_suspended(struct device *dev, void *data)
+{
+	int ret;
+
+	spin_lock(&dev->power.lock);
+
+	ret = dev->power.runtime_status == RPM_SUSPENDED ? 0 : -EBUSY;
+
+	spin_unlock(&dev->power.lock);
+
+	return ret;
+}
+
+/**
+ * pm_check_children - Check if all children of a device have been suspended.
+ * @dev: Device to check.
+ *
+ * Returns 0 if all children of the device have been suspended or -EBUSY
+ * otherwise.
+ */
+static int pm_check_children(struct device *dev)
+{
+	return device_for_each_child(dev, NULL, pm_device_suspended);
+}
+
+/**
+ * pm_autosuspend - Run autosuspend callback of given device object's bus type.
+ * @work: Work structure used for scheduling the execution of this function.
+ *
+ * Use @work to get the device object the suspend has been scheduled for,
+ * check if the suspend request hasn't been cancelled and run the
+ * ->autosuspend() callback from the device's bus type driver.  Update the
+ * run-time PM flags in the device object to reflect the current status of the
+ * device.
+ */
+static void pm_autosuspend(struct work_struct *work)
+{
+	struct delayed_work *dw = to_delayed_work(work);
+	struct device *dev = suspend_work_to_device(dw);
+	int error = 0;
+
+	spin_lock(&dev->power.lock);
+
+	if (dev->power.suspend_aborted) {
+		dev->power.runtime_status = RPM_ACTIVE;
+		goto out;
+	} else if (dev->power.runtime_status != RPM_IDLE) {
+		goto out;
+	} else if (pm_check_children(dev)) {
+		/*
+		 * We can only suspend the device if all of its children have
+		 * been suspended.
+		 */
+		goto out;
+	}
+
+	dev->power.runtime_status = RPM_SUSPENDING;
+	init_completion(&dev->power.work_done);
+
+	spin_unlock(&dev->power.lock);
+
+	if (dev && dev->bus && dev->bus->pm && dev->bus->pm->autosuspend)
+		error = dev->bus->pm->autosuspend(dev);
+
+	spin_lock(&dev->power.lock);
+
+	switch (error) {
+	case 0:
+		dev->power.runtime_status = RPM_SUSPENDED;
+		break;
+	case -EAGAIN:
+	case -EBUSY:
+		dev->power.runtime_status = RPM_ACTIVE;
+		break;
+	default:
+		dev->power.runtime_status = RPM_ERROR;
+	}
+	complete(&dev->power.work_done);
+
+ out:
+	spin_unlock(&dev->power.lock);
+}
+
+/**
+ * pm_request_suspend - Schedule run-time suspend of given device.
+ * @dev: Device to suspend.
+ * @delay: Time to wait before attempting to suspend the device.
+ */
+void pm_request_suspend(struct device *dev, unsigned long delay)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.runtime_status != RPM_ACTIVE)
+		goto out;
+
+	dev->power.runtime_status = RPM_IDLE;
+	dev->power.suspend_aborted = false;
+	queue_delayed_work(pm_wq, &dev->power.suspend_work, delay);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+
+/**
+ * pm_cancel_suspend - Cancel a pending suspend request for given device.
+ * @dev: Device to cancel the suspend request for.
+ *
+ * Should be called under pm_lock_device() and only if we are sure that the
+ * ->autosuspend() callback hasn't started yet.
+ */
+static void pm_cancel_suspend(struct device *dev)
+{
+	dev->power.suspend_aborted = true;
+	cancel_delayed_work(&dev->power.suspend_work);
+	dev->power.runtime_status = RPM_ACTIVE;
+}
+
+/**
+ * pm_autoresume - Run autoresume callback of given device object's bus type.
+ * @work: Work structure used for scheduling the execution of this function.
+ *
+ * Use @work to get the device object the resume has been scheduled for,
+ * check if the device is really suspended and run the ->autoresume() callback
+ * from the device's bus type driver.  Update the run-time PM flags in the
+ * device object to reflect the current status of the device.
+ */
+static void pm_autoresume(struct work_struct *work)
+{
+	struct device *dev = resume_work_to_device(work);
+	int error = 0;
+
+	if (dev->parent)
+		spin_lock(&dev->parent->power.lock);
+	spin_lock(&dev->power.lock);
+
+ repeat:
+	if (dev->power.runtime_status != RPM_WAKE) {
+		if (dev->parent)
+			spin_unlock(&dev->parent->power.lock);
+		goto out;
+	} else if (dev->parent
+	    && dev->parent->power.runtime_status != RPM_ACTIVE) {
+		if (dev->parent->power.runtime_status == RPM_RESUMING) {
+			spin_unlock(&dev->power.lock);
+			spin_unlock(&dev->parent->power.lock);
+
+			wait_for_completion(&dev->parent->power.work_done);
+
+			spin_lock(&dev->parent->power.lock);
+			spin_lock(&dev->power.lock);
+		}
+		if (dev->parent->power.runtime_status != RPM_ACTIVE) {
+			spin_unlock(&dev->parent->power.lock);
+			goto out;
+		}
+		goto repeat;
+	}
+
+	dev->power.runtime_status = RPM_RESUMING;
+	init_completion(&dev->power.work_done);
+
+	spin_unlock(&dev->power.lock);
+	if (dev->parent)
+		spin_unlock(&dev->parent->power.lock);
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->autoresume)
+		error = dev->bus->pm->autoresume(dev);
+
+	spin_lock(&dev->power.lock);
+
+	switch (error) {
+	case 0:
+		dev->power.runtime_status = RPM_ACTIVE;
+		break;
+	case -EAGAIN:
+	case -EBUSY:
+		dev->power.runtime_status = RPM_SUSPENDED;
+		break;
+	default:
+		dev->power.runtime_status = RPM_ERROR;
+	}
+	complete(&dev->power.work_done);
+
+ out:
+	spin_unlock(&dev->power.lock);
+}
+
+/**
+ * pm_request_resume - Schedule run-time resume of given device.
+ * @dev: Device to resume.
+ */
+void pm_request_resume(struct device *dev)
+{
+	unsigned long parent_flags = 0, flags;
+
+ repeat:
+	if (dev->parent)
+		spin_lock_irqsave(&dev->parent->power.lock, parent_flags);
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.runtime_status == RPM_IDLE) {
+		/* Autosuspend request is pending, no need to resume. */
+		pm_cancel_suspend(dev);
+		goto out;
+	} else if (dev->power.runtime_status != RPM_SUSPENDING
+	    && dev->power.runtime_status != RPM_SUSPENDED) {
+		goto out;
+	} else if (dev->parent
+	    && (dev->parent->power.runtime_status == RPM_IDLE
+	      || dev->parent->power.runtime_status == RPM_SUSPENDING
+	      || dev->parent->power.runtime_status == RPM_SUSPENDED)) {
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+		spin_unlock_irqrestore(&dev->parent->power.lock, parent_flags);
+
+		/* We have to resume the parent first. */
+		pm_request_resume(dev->parent);
+
+		goto repeat;
+	}
+
+	dev->power.runtime_status = RPM_WAKE;
+	queue_work(pm_wq, &dev->power.resume_work);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+	if (dev->parent)
+		spin_unlock_irqrestore(&dev->parent->power.lock, parent_flags);
+}
+
+/**
+ * pm_resume_sync - Resume given device waiting for the operation to complete.
+ * @dev: Device to resume.
+ *
+ * Resume the device synchronously, waiting for the operation to complete.  If
+ * autosuspend is in progress while this function is being run, wait for it to
+ * finish before resuming the device.  If the autosuspend is scheduled, but it
+ * hasn't started yet, cancel it and we're done.
+ */
+int pm_resume_sync(struct device *dev)
+{
+	int error = 0;
+
+	spin_lock(&dev->power.lock);
+
+	if (dev->power.runtime_status == RPM_ACTIVE) {
+		goto out;
+	} else if (dev->power.runtime_status == RPM_IDLE) {
+		/* ->autosuspend() hasn't started yet, no need to resume. */
+		pm_cancel_suspend(dev);
+		goto out;
+	}
+
+	if (dev->power.runtime_status == RPM_SUSPENDING) {
+		spin_unlock(&dev->power.lock);
+
+		/*
+		 * The ->autosuspend() callback is being executed right now,
+		 * wait for it to complete.
+		 */
+		wait_for_completion(&dev->power.work_done);
+	} else if (dev->power.runtime_status == RPM_SUSPENDED && dev->parent) {
+		spin_unlock(&dev->power.lock);
+
+		/* The device's parent may also be suspended.  Resume it. */
+		error = pm_resume_sync(dev->parent);
+		if (error)
+			return error;
+	} else {
+		spin_unlock(&dev->power.lock);
+	}
+
+	if (dev->parent)
+		spin_lock(&dev->parent->power.lock);
+	spin_lock(&dev->power.lock);
+
+	if (dev->power.runtime_status == RPM_RESUMING)
+		/* There's another resume running in parallel with us. */
+		error = -EAGAIN;
+	else if (dev->power.runtime_status != RPM_SUSPENDED)
+		error = -EINVAL;
+	if (error) {
+		if (dev->parent)
+			spin_unlock(&dev->parent->power.lock);
+		goto out;
+	}
+
+	dev->power.runtime_status = RPM_RESUMING;
+	init_completion(&dev->power.work_done);
+
+	spin_unlock(&dev->power.lock);
+	if (dev->parent)
+		spin_unlock(&dev->parent->power.lock);
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->autoresume)
+		error = dev->bus->pm->autoresume(dev);
+
+	spin_lock(&dev->power.lock);
+
+	switch (error) {
+	case 0:
+		dev->power.runtime_status = RPM_ACTIVE;
+		break;
+	case -EAGAIN:
+	case -EBUSY:
+		dev->power.runtime_status = RPM_SUSPENDED;
+		break;
+	default:
+		dev->power.runtime_status = RPM_ERROR;
+	}
+	complete(&dev->power.work_done);
+
+ out:
+	spin_unlock(&dev->power.lock);
+
+	return error;
+}
+
+/**
+ * pm_cancel_autosuspend - Cancel a pending autosuspend request for given device
+ * @dev: Device to handle.
+ *
+ * This routine is only supposed to be called when the run-time PM workqueue is
+ * frozen (i.e. during system-wide suspend or hibernation) when it is guaranteed
+ * that no work items are being executed.
+ */
+void pm_cancel_autosuspend(struct device *dev)
+{
+	spin_lock(&dev->power.lock);
+
+	cancel_delayed_work(&dev->power.suspend_work);
+	pm_runtime_reset(dev);
+
+	spin_unlock(&dev->power.lock);
+}
+
+/**
+ * pm_cancel_autoresume - Cancel a pending autoresume request for given device
+ * @dev: Device to handle.
+ *
+ * This routine is only supposed to be called when the run-time PM workqueue is
+ * frozen (i.e. during system-wide suspend or hibernation) when it is guaranteed
+ * that no work items are being executed.
+ */
+void pm_cancel_autoresume(struct device *dev)
+{
+	spin_lock(&dev->power.lock);
+
+	work_clear_pending(&dev->power.resume_work);
+	pm_runtime_reset(dev);
+
+	spin_unlock(&dev->power.lock);
+}
+
+/**
+ * pm_runtime_init - Initialize run-time PM fields in given device object.
+ * @dev: Device object to handle.
+ */
+void pm_runtime_init(struct device *dev)
+{
+	pm_runtime_reset(dev);
+	spin_lock_init(&dev->power.lock);
+	INIT_DELAYED_WORK(&dev->power.suspend_work, pm_autosuspend);
+	INIT_WORK(&dev->power.resume_work, pm_autoresume);
+}
Index: linux-2.6/include/linux/pm_runtime.h
===================================================================
--- /dev/null
+++ linux-2.6/include/linux/pm_runtime.h
@@ -0,0 +1,50 @@
+/*
+ * pm_runtime.h - Device run-time power management helper functions.
+ *
+ * Copyright (C) 2009 Rafael J. Wysocki <rjw@sisk.pl>
+ *
+ * This file is released under the GPLv2.
+ */
+
+#ifndef _LINUX_PM_RUNTIME_H
+#define _LINUX_PM_RUNTIME_H
+
+#include <linux/device.h>
+#include <linux/pm.h>
+
+#ifdef CONFIG_PM_RUNTIME
+extern struct workqueue_struct *pm_wq;
+
+extern void pm_runtime_init(struct device *dev);
+extern void pm_request_suspend(struct device *dev, unsigned long delay);
+extern void pm_request_resume(struct device *dev);
+extern int pm_resume_sync(struct device *dev);
+extern void pm_cancel_autosuspend(struct device *dev);
+extern void pm_cancel_autoresume(struct device *dev);
+
+static inline struct device *suspend_work_to_device(struct delayed_work *work)
+{
+	struct dev_pm_info *dpi;
+
+	dpi = container_of(work, struct dev_pm_info, suspend_work);
+	return container_of(dpi, struct device, power);
+}
+
+static inline struct device *resume_work_to_device(struct work_struct *work)
+{
+	struct dev_pm_info *dpi;
+
+	dpi = container_of(work, struct dev_pm_info, resume_work);
+	return container_of(dpi, struct device, power);
+}
+
+#else /* !CONFIG_PM_RUNTIME */
+static inline void pm_runtime_init(struct device *dev) {}
+static inline void pm_request_suspend(struct device *dev, unsigned long delay) {}
+static inline void pm_request_resume(struct device *dev) {}
+static inline int pm_resume_sync(struct device *dev) { return -ENOSYS; }
+static inline void pm_cancel_autosuspend(struct device *dev) {}
+static inline void pm_cancel_autoresume(struct device *dev) {}
+#endif /* !CONFIG_PM_RUNTIME */
+
+#endif
Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -21,6 +21,7 @@
 #include <linux/kallsyms.h>
 #include <linux/mutex.h>
 #include <linux/pm.h>
+#include <linux/pm_runtime.h>
 #include <linux/resume-trace.h>
 #include <linux/rwsem.h>
 #include <linux/interrupt.h>
@@ -88,6 +89,7 @@ void device_pm_add(struct device *dev)
 	}
 
 	list_add_tail(&dev->power.entry, &dpm_list);
+	pm_runtime_init(dev);
 	mutex_unlock(&dpm_list_mtx);
 }
 

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-10  8:29                 ` [patch update] Re: [linux-pm] " Rafael J. Wysocki
  2009-06-10 14:20                   ` [patch update] " Oliver Neukum
@ 2009-06-10 14:20                   ` Oliver Neukum
  2009-06-10 19:27                     ` Rafael J. Wysocki
  2009-06-10 19:27                     ` Rafael J. Wysocki
  2009-06-10 21:14                   ` Alan Stern
                                     ` (3 subsequent siblings)
  5 siblings, 2 replies; 199+ messages in thread
From: Oliver Neukum @ 2009-06-10 14:20 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Alan Stern, linux-pm, ACPI Devel Maling List, LKML

Am Mittwoch, 10. Juni 2009 10:29:26 schrieb Rafael J. Wysocki:
> Argh, I forgot about some important things.
>
> First, there are devices with no parent (actually, it would be much easier
> if they had a default dummy parent, but that's a separate issue).
>
> Second, the parent has to be taken into account in the asynchronous resume
> path too (which BTW is more complicated).

What happens if the parent's parent is also suspended? It seems to me that
you must code this recursively.

	Regards
		Oliver


^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-10 14:20                   ` [patch update] Re: [linux-pm] " Oliver Neukum
@ 2009-06-10 19:27                     ` Rafael J. Wysocki
  2009-06-10 21:38                         ` Oliver Neukum
  2009-06-10 21:38                       ` Oliver Neukum
  2009-06-10 19:27                     ` Rafael J. Wysocki
  1 sibling, 2 replies; 199+ messages in thread
From: Rafael J. Wysocki @ 2009-06-10 19:27 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: Alan Stern, linux-pm, ACPI Devel Maling List, LKML

On Wednesday 10 June 2009, Oliver Neukum wrote:
> Am Mittwoch, 10. Juni 2009 10:29:26 schrieb Rafael J. Wysocki:
> > Argh, I forgot about some important things.
> >
> > First, there are devices with no parent (actually, it would be much easier
> > if they had a default dummy parent, but that's a separate issue).
> >
> > Second, the parent has to be taken into account in the asynchronous resume
> > path too (which BTW is more complicated).
> 
> What happens if the parent's parent is also suspended? It seems to me that
> you must code this recursively.

Hmm, I thought I did.

[Looks]

pm_request_resume(dev) will call pm_request_resume(dev->parent), if necessary,
and that will call pm_request_resume(dev->parent->parent) and so on.  Each of
them will queue a work item and the one for the topmost parent will be queued
first.  So, the resume requests for all parents will be executed before the
one for the device, because the workqueue is single-threaded.

Well, there is a related bug: pm_autosuspend() may change the status to
RPM_SUSPENDED after pm_request_resume() has changed it to RPM_WAKE.  That
needs fixing.
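
The ordering argument can be illustrated with a small user-space model.
This is only a sketch, not the code from the patch: struct dev, the fixed
array standing in for the workqueue, request_resume() and run_queue() are
all hypothetical names, and locking is left out entirely.  It assumes a
request recurses into the parent before queuing its own item, and that a
single worker drains the queue in FIFO order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

enum rpm_state { RPM_ACTIVE, RPM_SUSPENDED, RPM_WAKE };

/* Hypothetical stand-in for struct device, reduced to what the model needs. */
struct dev {
	const char *name;
	struct dev *parent;
	enum rpm_state status;
};

/* Fixed-size FIFO standing in for the single-threaded workqueue. */
#define QUEUE_MAX 8
static struct dev *queue[QUEUE_MAX];
static size_t queue_len;

/* Queue resume requests for suspended ancestors first, then for dev. */
static void request_resume(struct dev *dev)
{
	if (dev->status != RPM_SUSPENDED)
		return;
	if (dev->parent)
		request_resume(dev->parent);
	dev->status = RPM_WAKE;
	assert(queue_len < QUEUE_MAX);
	queue[queue_len++] = dev;
}

/* Run the queued items strictly in FIFO order, as one worker would. */
static void run_queue(void)
{
	for (size_t i = 0; i < queue_len; i++) {
		queue[i]->status = RPM_ACTIVE;
		printf("resumed %s\n", queue[i]->name);
	}
	queue_len = 0;
}
```

Calling request_resume() on a leaf whose parent and grandparent are both
suspended queues the grandparent, then the parent, then the leaf, so
run_queue() brings them up from the top of the tree down.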

Best,
Rafael

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-09 22:57               ` Rafael J. Wysocki
@ 2009-06-10 20:48                   ` Alan Stern
  2009-06-10  8:29                 ` [patch update] Re: [linux-pm] " Rafael J. Wysocki
                                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 199+ messages in thread
From: Alan Stern @ 2009-06-10 20:48 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Oliver Neukum, linux-pm, ACPI Devel Maling List, LKML

On Wed, 10 Jun 2009, Rafael J. Wysocki wrote:

> > By the way, a legitimate reason for aborting an autosuspend is if the
> > device's driver requires remote wakeup to be enabled during suspend but
> > the user has disabled it.
> 
> Do you mean the user has disabled the remote wakeup?

Yes, by writing to the power/wakeup attribute.


> > > > There should be a sysfs interface (like the one in USB) to allow
> > > > userspace to prevent a device from being autosuspended -- and perhaps
> > > > also to force it to be suspended.
> > > 
> > > To prevent a device from being suspended - yes.  To force it to stay suspended
> > > - I'm not sure.
> > 
> > I'm not sure either.  Oliver Neukum requested it originally and it has
> > been useful for debugging, but I haven't seen many places where it
> > would come in useful in practice.

I did think of one use for this feature.  It's unique to USB,
however...

In Windows, you're not supposed to unplug a hot-unpluggable device
without first telling the OS -- there's a "Safely Remove Hardware"  
applet.  When you tell the applet you want to remove a USB device, the
system disables the device's port and then says it's okay to unplug the
device.  Now Linux doesn't have any user API for disabling USB ports,
but suspending a port has the same effect (the device can't distinguish
a disable from a suspend).

It turns out that some devices (MP3 players, for instance) have
incorporated this into their design.  They display a "Safe to unplug"  
message when their port is disabled or suspended.  People like to see
this message -- it makes them feel good about unplugging the device --
and the only way to get it under Linux is by forcing the device to be
suspended.  :-)

> The problem with it is that the user space may not know if it is safe to keep
> a device suspended and if it is not, the kernel will have to ignore the setting
> anyway, so I'm not sure what's the point (except for debugging).

This falls into the category of "The user knows better".  If the user
specifically tells the kernel to suspend a device (rather than just
letting it autosuspend), and this causes a problem, then it's the
user's own fault.

After all, who's really the master?  Us or the kernel?

Alan Stern

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-10  8:29                 ` [patch update] Re: [linux-pm] " Rafael J. Wysocki
                                     ` (2 preceding siblings ...)
  2009-06-10 21:14                   ` Alan Stern
@ 2009-06-10 21:14                   ` Alan Stern
  2009-06-10 21:31                     ` [patch update] " Rafael J. Wysocki
  2009-06-10 21:31                     ` [patch update] Re: [linux-pm] " Rafael J. Wysocki
  2009-06-11  5:18                     ` Magnus Damm
  2009-06-11  5:18                   ` Magnus Damm
  5 siblings, 2 replies; 199+ messages in thread
From: Alan Stern @ 2009-06-10 21:14 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Oliver Neukum, Linux-pm mailing list, ACPI Devel Maling List, LKML

On Wed, 10 Jun 2009, Rafael J. Wysocki wrote:

> > The idea is that if ->autosuspend() or ->autoresume() returns an error code,
> > this is a situation the PM core cannot recover from by itself, so it shouldn't
> > pretend it knows what's happened.  Instead, it marks the device as "I don't
> > know if it is safe to touch this" and won't handle it until the device driver
> > or bus type clears the status.

I'm still not sure this is a good idea.  When would the device driver 
clear the status?  The autosuspend and autoresume methods run 
asynchronously, so after they're done the driver doesn't get a chance 
to do anything.

It might be best just to set the status to RPM_ACTIVE if a runtime 
suspend fails and RPM_SUSPENDED if a runtime resume fails.

> Finally, I decided to follow Oliver's suggestion that some error codes returned
> by ->autosuspend() and ->autoresume() may be regarded as "go back to the
> previous state" information.  I chose to use -EAGAIN and -EBUSY for this
> purpose.

Maybe...
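
To make the two error-handling proposals above concrete, here is a minimal
userspace sketch (not kernel code).  It assumes a three-value status; the
sketch_* names and the SKETCH_ERROR value are hypothetical and do not come
from the patch.  The "strict" function follows the quoted description
(-EAGAIN and -EBUSY mean "go back to the previous state", any other error
marks the device as unsafe to touch), while the "rollback" functions follow
the simpler alternative of always restoring the previous status on failure.

```c
#include <errno.h>

/* Hypothetical status values; only ACTIVE and SUSPENDED are named in the thread. */
enum sketch_status {
	SKETCH_ACTIVE,
	SKETCH_SUSPENDED,
	SKETCH_ERROR,		/* "I don't know if it is safe to touch this" */
};

/* Quoted proposal: -EAGAIN and -EBUSY mean "go back", anything else is fatal. */
enum sketch_status sketch_after_suspend_strict(int err)
{
	if (!err)
		return SKETCH_SUSPENDED;
	if (err == -EAGAIN || err == -EBUSY)
		return SKETCH_ACTIVE;
	return SKETCH_ERROR;
}

/* Alternative: a failed suspend always leaves the device active. */
enum sketch_status sketch_after_suspend_rollback(int err)
{
	return err ? SKETCH_ACTIVE : SKETCH_SUSPENDED;
}

/* Alternative: a failed resume always leaves the device suspended. */
enum sketch_status sketch_after_resume_rollback(int err)
{
	return err ? SKETCH_SUSPENDED : SKETCH_ACTIVE;
}
```

With the rollback variant no status ever needs to be cleared by the driver
afterwards, which is the point of the objection above.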


>  struct dev_pm_info {
>  	pm_message_t		power_state;
> -	unsigned		can_wakeup:1;
> -	unsigned		should_wakeup:1;
> +	unsigned int		can_wakeup:1;
> +	unsigned int		should_wakeup:1;
>  	enum dpm_state		status;		/* Owned by the PM core */
>  #ifdef	CONFIG_PM_SLEEP
>  	struct list_head	entry;
>  #endif
> +#ifdef	CONFIG_PM_RUNTIME
> +	struct delayed_work	suspend_work;
> +	unsigned int		suspend_aborted:1;
> +	struct work_struct	resume_work;
> +	struct completion	work_done;
> +	enum rpm_state		runtime_status;
> +	spinlock_t		lock;
> +#endif
>  };

You know, it doesn't make any sense to have a suspend and a resume 
both pending at the same time.  So you could add only a delayed_work 
structure and use its embedded work_struct for resume requests.

Also, you might borrow a trick from Dave Brownell.  Define the RPM_*
values so that the individual bits have meanings.  Then instead of
testing for multiple possible values of runtime_status, you could do a
simple bit test.
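
As a rough illustration of the bit-encoding idea, the following userspace
sketch assigns a meaning to each bit of the status value.  The state names
follow the thread, but the SKETCH_ prefix, the flag names and every numeric
value are assumptions made for this example, not the values used by any
patch.

```c
#include <stdbool.h>

/* Hypothetical flag bits; each one carries a meaning of its own. */
#define SKETCH_RPM_F_POWERED	0x1	/* target state is full power */
#define SKETCH_RPM_F_BUSY	0x2	/* a transition is queued or running */
#define SKETCH_RPM_F_ERROR	0x4	/* a callback failed, state unknown */

enum sketch_rpm_state {
	SKETCH_RPM_SUSPENDED	= 0,
	SKETCH_RPM_ACTIVE	= SKETCH_RPM_F_POWERED,
	SKETCH_RPM_SUSPENDING	= SKETCH_RPM_F_BUSY,
	SKETCH_RPM_RESUMING	= SKETCH_RPM_F_POWERED | SKETCH_RPM_F_BUSY,
	SKETCH_RPM_ERROR	= SKETCH_RPM_F_ERROR,
};

/* One bit test replaces "state == SUSPENDING || state == RESUMING". */
bool sketch_rpm_in_transition(enum sketch_rpm_state state)
{
	return (state & SKETCH_RPM_F_BUSY) != 0;
}

/* One bit test replaces "state == ACTIVE || state == RESUMING". */
bool sketch_rpm_wants_power(enum sketch_rpm_state state)
{
	return (state & SKETCH_RPM_F_POWERED) != 0;
}
```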

> +/**
> + * pm_device_suspended - Check if given device has been suspended at run time.
> + * @dev: Device to check.
> + * @data: Ignored.
> + *
> + * Returns 0 if the device has been suspended or -EBUSY otherwise.
> + */
> +static int pm_device_suspended(struct device *dev, void *data)
> +{
> +	int ret;
> +
> +	spin_lock(&dev->power.lock);
> +
> +	ret = dev->power.runtime_status == RPM_SUSPENDED ? 0 : -EBUSY;
> +
> +	spin_unlock(&dev->power.lock);

How does acquiring the lock help here?

> +/**
> + * pm_check_children - Check if all children of a device have been suspended.
> + * @dev: Device to check.
> + *
> + * Returns 0 if all children of the device have been suspended or -EBUSY
> + * otherwise.
> + */

We might want to do a runtime suspend even if the device's children
aren't already suspended.  For example, you could suspend a link while
leaving the device on the other end of the link at full power --
especially if powering down the device is slow but changing the link's
power level is fast.

> +/**
> + * pm_autosuspend - Run autosuspend callback of given device object's bus type.
> + * @work: Work structure used for scheduling the execution of this function.
> + *
> + * Use @work to get the device object the suspend has been scheduled for,
> + * check if the suspend request hasn't been cancelled and run the
> + * ->autosuspend() callback from the device's bus type driver.  Update the
> + * run-time PM flags in the device object to reflect the current status of the
> + * device.
> + */
> +static void pm_autosuspend(struct work_struct *work)

Can we call this something else?  "Autosuspend" implies that the 
suspend originated from within the kernel.  How about "pm_suspend_work" 
or "pm_runtime_suspend"?  Likewise for the resume routines.

I haven't checked the details of the code yet.  More later...

Alan Stern


^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-10 20:48                   ` Alan Stern
  (?)
  (?)
@ 2009-06-10 21:15                   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 199+ messages in thread
From: Rafael J. Wysocki @ 2009-06-10 21:15 UTC (permalink / raw)
  To: Alan Stern; +Cc: Oliver Neukum, linux-pm, ACPI Devel Maling List, LKML

On Wednesday 10 June 2009, Alan Stern wrote:
> On Wed, 10 Jun 2009, Rafael J. Wysocki wrote:
> 
> > > By the way, a legitimate reason for aborting an autosuspend is if the
> > > device's driver requires remote wakeup to be enabled during suspend but
> > > the user has disabled it.
> > 
> > Do you mean the user has disabled the remote wakeup?
> 
> Yes, by writing to the power/wakeup attribute.
> 
> 
> > > > > There should be a sysfs interface (like the one in USB) to allow
> > > > > userspace to prevent a device from being autosuspended -- and perhaps
> > > > > also to force it to be suspended.
> > > > 
> > > > To prevent a device from being suspended - yes.  To force it to stay suspended
> > > > - I'm not sure.
> > > 
> > > I'm not sure either.  Oliver Neukum requested it originally and it has
> > > been useful for debugging, but I haven't seen many places where it
> > > would come in useful in practice.
> 
> I did think of one use for this feature.  It's unique to USB,
> however...
> 
> In Windows, you're not supposed to unplug a hot-unpluggable device
> without first telling the OS -- there's a "Safely Remove Hardware"  
> applet.  When you tell the applet you want to remove a USB device, the
> system disables the device's port and then says it's okay to unplug the
> device.  Now Linux doesn't have any user API for disabling USB ports,
> but suspending a port has the same effect (the device can't distinguish
> a disable from a suspend).
> 
> It turns out that some devices (MP3 players, for instance) have
> incorporated this into their design.  They display a "Safe to unplug"  
> message when their port is disabled or suspended.  People like to see
> this message -- it makes them feel good about unplugging the device --
> and the only way to get it under Linux is by forcing the device to be
> suspended.  :-)

Well, I'd very much prefer to have a separate mechanism for that.

> > The problem with it is that the user space may not know if it is safe to keep
> > a device suspended and if it is not, the kernel will have to ignore the setting
> > anyway, so I'm not sure what's the point (except for debugging).
> 
> This falls into the category of "The user knows better".  If the user
> specifically tells the kernel to suspend a device (rather than just
> letting it autosuspend), and this causes a problem, then it's the
> user's own fault.
> 
> After all, who's really the master?  Us or the kernel?

Oh, that depends on who the user is.  If I'm the user, I'm the master, but in
the case of a typical Windows user I'm afraid the kernel has to know better. ;-)

Best,
Rafael

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-10 21:14                   ` [patch update] Re: [linux-pm] " Alan Stern
  2009-06-10 21:31                     ` [patch update] " Rafael J. Wysocki
@ 2009-06-10 21:31                     ` Rafael J. Wysocki
  2009-06-10 23:15                       ` [patch update] " Oliver Neukum
                                         ` (3 more replies)
  1 sibling, 4 replies; 199+ messages in thread
From: Rafael J. Wysocki @ 2009-06-10 21:31 UTC (permalink / raw)
  To: Alan Stern
  Cc: Oliver Neukum, Linux-pm mailing list, ACPI Devel Maling List, LKML

On Wednesday 10 June 2009, Alan Stern wrote:
> On Wed, 10 Jun 2009, Rafael J. Wysocki wrote:
> 
> > > The idea is that if ->autosuspend() or ->autoresume() returns an error code,
> > > this is a situation the PM core cannot recover from by itself, so it shouldn't
> > > pretend it knows what's happened.  Instead, it marks the device as "I don't
> > > know if it is safe to touch this" and won't handle it until the device driver
> > > or bus type clears the status.
> 
> I'm still not sure this is a good idea.  When would the device driver 
> clear the status?  The autosuspend and autoresume methods run 
> asynchronously, so after they're done the driver doesn't get a chance 
> to do anything.
> 
> It might be best just to set the status to RPM_ACTIVE if a runtime 
> suspend fails and RPM_SUSPENDED if a runtime resume fails.
> 
> > Finally, I decided to follow Oliver's suggestion that some error codes returned
> > by ->autosuspend() and ->autoresume() may be regarded as "go back to the
> > previous state" information.  I chose to use -EAGAIN and -EBUSY for this
> > purpose.
> 
> Maybe...
> 
> 
> >  struct dev_pm_info {
> >  	pm_message_t		power_state;
> > -	unsigned		can_wakeup:1;
> > -	unsigned		should_wakeup:1;
> > +	unsigned int		can_wakeup:1;
> > +	unsigned int		should_wakeup:1;
> >  	enum dpm_state		status;		/* Owned by the PM core */
> >  #ifdef	CONFIG_PM_SLEEP
> >  	struct list_head	entry;
> >  #endif
> > +#ifdef	CONFIG_PM_RUNTIME
> > +	struct delayed_work	suspend_work;
> > +	unsigned int		suspend_aborted:1;
> > +	struct work_struct	resume_work;
> > +	struct completion	work_done;
> > +	enum rpm_state		runtime_status;
> > +	spinlock_t		lock;
> > +#endif
> >  };
> 
> You know, it doesn't make any sense to have a suspend and a resume 
> both pending at the same time.
>
> So you could add only a delayed_work structure and use its embedded
> work_struct for resume requests.

I thought so too, but I was wrong. ;-)

If resume is requested while the suspend hasn't completed yet, we should
queue it (it's totally valid to request a suspending device to resume IMO), but
the delayed work is still being used by the workqueue code, so we can't modify
it.

> Also, you might borrow a trick from Dave Brownell.  Define the RPM_*
> values so that the individual bits have meanings.  Then instead of
> testing for multiple possible values of runtime_status, you could do a
> simple bit test.

Yes, I'm seriously considering using this approach.

> > +/**
> > + * pm_device_suspended - Check if given device has been suspended at run time.
> > + * @dev: Device to check.
> > + * @data: Ignored.
> > + *
> > + * Returns 0 if the device has been suspended or -EBUSY otherwise.
> > + */
> > +static int pm_device_suspended(struct device *dev, void *data)
> > +{
> > +	int ret;
> > +
> > +	spin_lock(&dev->power.lock);
> > +
> > +	ret = dev->power.runtime_status == RPM_SUSPENDED ? 0 : -EBUSY;
> > +
> > +	spin_unlock(&dev->power.lock);
> 
> How does acquiring the lock help here?

OK, it doesn't.

> > +/**
> > + * pm_check_children - Check if all children of a device have been suspended.
> > + * @dev: Device to check.
> > + *
> > + * Returns 0 if all children of the device have been suspended or -EBUSY
> > + * otherwise.
> > + */
> 
> We might want to do a runtime suspend even if the device's children
> aren't already suspended.  For example, you could suspend a link while
> leaving the device on the other end of the link at full power --
> especially if powering down the device is slow but changing the link's
> power level is fast.

Well, this means that the dependencies between devices in the device tree are
pretty much useless for the run-time PM as far as the core is concerned.  In
which case, why did you mention them at all?

> > +/**
> > + * pm_autosuspend - Run autosuspend callback of given device object's bus type.
> > + * @work: Work structure used for scheduling the execution of this function.
> > + *
> > + * Use @work to get the device object the suspend has been scheduled for,
> > + * check if the suspend request hasn't been cancelled and run the
> > + * ->autosuspend() callback from the device's bus type driver.  Update the
> > + * run-time PM flags in the device object to reflect the current status of the
> > + * device.
> > + */
> > +static void pm_autosuspend(struct work_struct *work)
> 
> Can we call this something else?  "Autosuspend" implies that the 
> suspend originated from within the kernel.  How about "pm_suspend_work" 
> or "pm_runtime_suspend"?  Likewise for the resume routines.

OK

> I haven't checked the details of the code yet.  More later...

OK, thanks.

Best,
Rafael

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-10 19:27                     ` Rafael J. Wysocki
@ 2009-06-10 21:38                         ` Oliver Neukum
  2009-06-10 21:38                       ` Oliver Neukum
  1 sibling, 0 replies; 199+ messages in thread
From: Oliver Neukum @ 2009-06-10 21:38 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Alan Stern, linux-pm, ACPI Devel Maling List, LKML

On Wednesday, 10 June 2009 21:27:56, Rafael J. Wysocki wrote:
> > What happens if the parent's parent is also suspended? It seems to me
> > that you must code this recursively.
>
> Hmm, I thought I did.
>
> [Looks]
>
> pm_request_resume(dev) will call pm_request_resume(dev->parent), if
> necessary, and that will call pm_request_resume(dev->parent->parent) and so
> on.  Each of them will queue a work item and the one for the topmost parent
> will be queued first.  So, the resume requests for all parents will be
> executed before the one for the device, due to the fact that the workqueue
> is singlethread.

Sneaky, I overlooked that.

> Well, there is a bug related to it, namely pm_autosuspend() may change the
> status to RPM_SUSPENDED after pm_request_resume() has changed it to
> RPM_WAKE, that needs fixing.

Ok, maybe this is related. You recurse if the parent isn't in RPM_ACTIVE.
But that is not enough. You must ensure that all the nodes higher up stay
in RPM_ACTIVE. It seems to me that you must go up until you find an
active node (or the root) and put it in a blocked state.

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-10 21:38                         ` Oliver Neukum
  (?)
  (?)
@ 2009-06-10 22:01                         ` Rafael J. Wysocki
  2009-06-10 23:07                             ` Oliver Neukum
  2009-06-10 23:07                           ` [patch update] " Oliver Neukum
  -1 siblings, 2 replies; 199+ messages in thread
From: Rafael J. Wysocki @ 2009-06-10 22:01 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: Alan Stern, linux-pm, ACPI Devel Maling List, LKML

On Wednesday 10 June 2009, Oliver Neukum wrote:
> On Wednesday, 10 June 2009 21:27:56, Rafael J. Wysocki wrote:
> > > What happens if the parent's parent is also suspended? It seems to me
> > > that you must code this recursively.
> >
> > Hmm, I thought I did.
> >
> > [Looks]
> >
> > pm_request_resume(dev) will call pm_request_resume(dev->parent), if
> > necessary, and that will call pm_request_resume(dev->parent->parent) and so
> > on.  Each of them will queue a work item and the one for the topmost parent
> > will be queued first.  So, the resume requests for all parents will be
> > executed before the one for the device, due to the fact that the workqueue
> > is singlethread.
> 
> Sneaky, I overlooked that.
> 
> > Well, there is a bug related to it, namely pm_autosuspend() may change the
> > status to RPM_SUSPENDED after pm_request_resume() has changed it to
> > RPM_WAKE, that needs fixing.
> 
> Ok, maybe this is related. You recurse if the parent isn't in RPM_ACTIVE.
> But that is not enough. You must ensure that all the nodes higher up stay
> in RPM_ACTIVE. It seems to me that you must go up until you find an
> active node (or the root) and put it in a blocked state.

If you're referring to pm_autoresume(), then this again is tricky.

We have queued up resume requests for the device's parent, its parent etc.,
the topmost one goes first.  The workqueue is singlethread, so pm_autoresume()
is going to be run for all parents before the device itself, so if that were the
only resume mechanism, it would be enough to check if the parent is RPM_ACTIVE.
*However*, there also is pm_resume_sync(), which can take the device directly
from RPM_SUSPENDED to RPM_RESUMING and that may be done in parallel with our
pm_autoresume().  That's why I put the wait_for_completion() in there.
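
To make the ordering argument concrete, here is a minimal user-space sketch
(not the patch code).  struct fifo, request_resume() and run_queue() are
made-up names for illustration, and the FIFO array only stands in for a
singlethread workqueue; locking and pm_resume_sync() are left out entirely:

```c
#include <stddef.h>

enum rpm_status { RPM_ACTIVE, RPM_SUSPENDED, RPM_WAKE };

struct device {
	struct device *parent;
	enum rpm_status status;
};

#define QUEUE_MAX 16

/* Stand-in for a singlethread workqueue: items run strictly in FIFO order. */
struct fifo {
	struct device *item[QUEUE_MAX];
	size_t head, tail;
};

/* Queue a resume for dev, queueing the parent's resume first if needed. */
int request_resume(struct fifo *q, struct device *dev)
{
	if (!dev || dev->status != RPM_SUSPENDED)
		return 0;
	if (request_resume(q, dev->parent))
		return -1;
	if (q->tail == QUEUE_MAX)
		return -1;
	dev->status = RPM_WAKE;
	q->item[q->tail++] = dev;
	return 0;
}

/* Run the queued items in order; fail if a parent is not active in time. */
int run_queue(struct fifo *q)
{
	while (q->head < q->tail) {
		struct device *dev = q->item[q->head++];

		if (dev->parent && dev->parent->status != RPM_ACTIVE)
			return -1;
		dev->status = RPM_ACTIVE;
	}
	return 0;
}
```

With a chain A <- B <- C, all suspended, request_resume() on C queues A, B, C
in that order, so each parent is active by the time its child's item runs.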

Best,
Rafael

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-10 21:38                         ` Oliver Neukum
  (?)
@ 2009-06-10 22:01                         ` Rafael J. Wysocki
  -1 siblings, 0 replies; 199+ messages in thread
From: Rafael J. Wysocki @ 2009-06-10 22:01 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: ACPI Devel Maling List, linux-pm, LKML

On Wednesday 10 June 2009, Oliver Neukum wrote:
> On Wednesday, 10 June 2009 21:27:56, Rafael J. Wysocki wrote:
> > > What happens if the parent's parent is also suspended? It seems to me
> > > that you must code this recursively.
> >
> > Hmm, I thought I did.
> >
> > [Looks]
> >
> > pm_request_resume(dev) will call pm_request_resume(dev->parent), if
> > necessary, and that will call pm_request_resume(dev->parent->parent) and so
> > on.  Each of them will queue a work item and the one for the topmost parent
> > will be queued first.  So, the resume requests for all parents will be
> > executed before the one for the device, due to the fact that the workqueue
> > is singlethread.
> 
> Sneaky, I overlooked that.
> 
> > Well, there is a bug related to it, namely pm_autosuspend() may change the
> > status to RPM_SUSPENDED after pm_request_resume() has changed it to
> > RPM_WAKE, that needs fixing.
> 
> Ok, maybe this is related. You recurse if the parent isn't in RPM_ACTIVE.
> But that is not enough. You must ensure that all the nodes higher up stay
> in RPM_ACTIVE. It seems to me that you must go up until you find an
> active node (or the root) and put it in a blocked state.

If you're referring to pm_autoresume(), then this again is tricky.

We have queued up resume requests for the device's parent, its parent etc.,
the topmost one goes first.  The workqueue is singlethread, so pm_autoresume()
is going to be run for all parents before the device itself, so if that were the
only resume mechanism, it would be enough to check if the parent is RPM_ACTIVE.
*However*, there also is pm_resume_sync(), which can take the device directly
from RPM_SUSPENDED to RPM_RESUMING and that may be done in parallel with our
pm_autoresume().  That's why I put the wait_for_completion() in there.

Best,
Rafael

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-10 22:01                         ` [patch update] Re: [linux-pm] " Rafael J. Wysocki
@ 2009-06-10 23:07                             ` Oliver Neukum
  2009-06-10 23:07                           ` [patch update] " Oliver Neukum
  1 sibling, 0 replies; 199+ messages in thread
From: Oliver Neukum @ 2009-06-10 23:07 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Alan Stern, linux-pm, ACPI Devel Maling List, LKML

On Thursday, 11 June 2009 00:01:20, Rafael J. Wysocki wrote:
> We have queued up resume requests for the device's parent, its parent etc.,
> the topmost one goes first.  The workqueue is singlethread, so
> pm_autoresume() is going to be run for all parents before the device
> itself, so if that were the only resume mechanism, it would be enough to
> check if the parent is RPM_ACTIVE.

           A (IDLE)
        /          \
B (SUSPENDED)    C (SUSPENDED)

Suppose C is to be resumed. This means first in case of A the request
to suspend would be cancelled. Here you drop the locks:

+           && (dev->parent->power.runtime_status == RPM_IDLE
+             || dev->parent->power.runtime_status == RPM_SUSPENDING
+             || dev->parent->power.runtime_status == RPM_SUSPENDED)) {
+               spin_unlock_irqrestore(&dev->power.lock, flags);
+               spin_unlock_irqrestore(&dev->parent->power.lock, parent_flags);
+
+               /* We have to resume the parent first. */
+               pm_request_resume(dev->parent);

But after pm_request_resume() returns there's no means to make sure
nothing alters it back to RPM_SUSPENDED. The workqueue doesn't help
you because you've scheduled nothing by that time. The suspension will
work because C is still in RPM_SUSPENDED.

	Regards
		Oliver


^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
@ 2009-06-10 23:07                             ` Oliver Neukum
  0 siblings, 0 replies; 199+ messages in thread
From: Oliver Neukum @ 2009-06-10 23:07 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Alan Stern, linux-pm, ACPI Devel Maling List, LKML

On Thursday, 11 June 2009 00:01:20, Rafael J. Wysocki wrote:
> We have queued up resume requests for the device's parent, its parent etc.,
> the topmost one goes first.  The workqueue is singlethread, so
> pm_autoresume() is going to be run for all parents before the device
> itself, so if that were the only resume mechanism, it would be enough to
> check if the parent is RPM_ACTIVE.

           A (IDLE)
        /          \
B (SUSPENDED)    C (SUSPENDED)

Suppose C is to be resumed. This means first in case of A the request
to suspend would be cancelled. Here you drop the locks:

+           && (dev->parent->power.runtime_status == RPM_IDLE
+             || dev->parent->power.runtime_status == RPM_SUSPENDING
+             || dev->parent->power.runtime_status == RPM_SUSPENDED)) {
+               spin_unlock_irqrestore(&dev->power.lock, flags);
+               spin_unlock_irqrestore(&dev->parent->power.lock, parent_flags);
+
+               /* We have to resume the parent first. */
+               pm_request_resume(dev->parent);

But after pm_request_resume() returns there's no means to make sure
nothing alters it back to RPM_SUSPENDED. The workqueue doesn't help
you because you've scheduled nothing by that time. The suspension will
work because C is still in RPM_SUSPENDED.

	Regards
		Oliver


^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-10 22:01                         ` [patch update] Re: [linux-pm] " Rafael J. Wysocki
  2009-06-10 23:07                             ` Oliver Neukum
@ 2009-06-10 23:07                           ` Oliver Neukum
  1 sibling, 0 replies; 199+ messages in thread
From: Oliver Neukum @ 2009-06-10 23:07 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: ACPI Devel Maling List, linux-pm, LKML

On Thursday, 11 June 2009 00:01:20, Rafael J. Wysocki wrote:
> We have queued up resume requests for the device's parent, its parent etc.,
> the topmost one goes first.  The workqueue is singlethread, so
> pm_autoresume() is going to be run for all parents before the device
> itself, so if that were the only resume mechanism, it would be enough to
> check if the parent is RPM_ACTIVE.

           A (IDLE)
        /          \
B (SUSPENDED)    C (SUSPENDED)

Suppose C is to be resumed. This means first in case of A the request
to suspend would be cancelled. Here you drop the locks:

+           && (dev->parent->power.runtime_status == RPM_IDLE
+             || dev->parent->power.runtime_status == RPM_SUSPENDING
+             || dev->parent->power.runtime_status == RPM_SUSPENDED)) {
+               spin_unlock_irqrestore(&dev->power.lock, flags);
+               spin_unlock_irqrestore(&dev->parent->power.lock, parent_flags);
+
+               /* We have to resume the parent first. */
+               pm_request_resume(dev->parent);

But after pm_request_resume() returns there's no means to make sure
nothing alters it back to RPM_SUSPENDED. The workqueue doesn't help
you because you've scheduled nothing by that time. The suspension will
work because C is still in RPM_SUSPENDED.

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-10 21:31                     ` [patch update] Re: [linux-pm] " Rafael J. Wysocki
@ 2009-06-10 23:15                         ` Oliver Neukum
  2009-06-10 23:15                         ` Oliver Neukum
                                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 199+ messages in thread
From: Oliver Neukum @ 2009-06-10 23:15 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Alan Stern, Linux-pm mailing list, ACPI Devel Maling List, LKML

On Wednesday, 10 June 2009 23:31:13, Rafael J. Wysocki wrote:
> > > +/**
> > > + * pm_check_children - Check if all children of a device have been suspended.
> > > + * @dev: Device to check.
> > > + *
> > > + * Returns 0 if all children of the device have been suspended or -EBUSY
> > > + * otherwise.
> > > + */
> >
> > We might want to do a runtime suspend even if the device's children
> > aren't already suspended.  For example, you could suspend a link while
> > leaving the device on the other end of the link at full power --
> > especially if powering down the device is slow but changing the link's
> > power level is fast.
>
> Well, this means that the dependencies between devices in the device tree
> are pretty much useless for the run-time PM as far as the core is
> concerned.  In which case, why did you mention them at all?

Some bus systems need this constraint; others don't, or need it only for some nodes.
We need a way to communicate this to the core.

	Regards
		Oliver



^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
@ 2009-06-10 23:15                         ` Oliver Neukum
  0 siblings, 0 replies; 199+ messages in thread
From: Oliver Neukum @ 2009-06-10 23:15 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Alan Stern, Linux-pm mailing list, ACPI Devel Maling List, LKML

On Wednesday, 10 June 2009 23:31:13, Rafael J. Wysocki wrote:
> > > +/**
> > > + * pm_check_children - Check if all children of a device have been suspended.
> > > + * @dev: Device to check.
> > > + *
> > > + * Returns 0 if all children of the device have been suspended or -EBUSY
> > > + * otherwise.
> > > + */
> >
> > We might want to do a runtime suspend even if the device's children
> > aren't already suspended.  For example, you could suspend a link while
> > leaving the device on the other end of the link at full power --
> > especially if powering down the device is slow but changing the link's
> > power level is fast.
>
> Well, this means that the dependencies between devices in the device tree
> are pretty much useless for the run-time PM as far as the core is
> concerned.  In which case, why did you mention them at all?

Some bus systems need this constraint; others don't, or need it only for some nodes.
We need a way to communicate this to the core.

	Regards
		Oliver



^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-10 21:31                     ` [patch update] Re: [linux-pm] " Rafael J. Wysocki
@ 2009-06-10 23:15                       ` Oliver Neukum
  2009-06-10 23:15                         ` Oliver Neukum
                                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 199+ messages in thread
From: Oliver Neukum @ 2009-06-10 23:15 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: ACPI Devel Maling List, Linux-pm mailing list, LKML

On Wednesday, 10 June 2009 23:31:13, Rafael J. Wysocki wrote:
> > > +/**
> > > + * pm_check_children - Check if all children of a device have been suspended.
> > > + * @dev: Device to check.
> > > + *
> > > + * Returns 0 if all children of the device have been suspended or -EBUSY
> > > + * otherwise.
> > > + */
> >
> > We might want to do a runtime suspend even if the device's children
> > aren't already suspended.  For example, you could suspend a link while
> > leaving the device on the other end of the link at full power --
> > especially if powering down the device is slow but changing the link's
> > power level is fast.
>
> Well, this means that the dependencies between devices in the device tree
> are pretty much useless for the run-time PM as far as the core is
> concerned.  In which case, why did you mention them at all?

Some bus systems need this constraint; others don't, or need it only for some nodes.
We need a way to communicate this to the core.

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-10 21:31                     ` [patch update] Re: [linux-pm] " Rafael J. Wysocki
  2009-06-10 23:15                       ` [patch update] " Oliver Neukum
  2009-06-10 23:15                         ` Oliver Neukum
@ 2009-06-10 23:42                       ` Alan Stern
  2009-06-11 14:17                           ` [patch update] Re: [linux-pm] " Rafael J. Wysocki
  2009-06-10 23:42                       ` [patch update] " Alan Stern
  3 siblings, 1 reply; 199+ messages in thread
From: Alan Stern @ 2009-06-10 23:42 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Oliver Neukum, Linux-pm mailing list, ACPI Devel Maling List, LKML

On Wed, 10 Jun 2009, Rafael J. Wysocki wrote:

> > You know, it doesn't make any sense to have a suspend and a resume 
> > both pending at the same time.
> >
> > So you could add only a delayed_work structure and use its embedded
> > work_struct for resume requests.
> 
> I thought so too, but I was wrong. ;-)
> 
> If resume is requested while the suspend hasn't completed yet, we should
> queue it (it's totally valid to request a suspending device to resume IMO), but
> the delayed work is still being used by the workqueue code, so we can't modify
> it.

Where is the delayed work still being used?  There's even a comment in 
run_workqueue() that says a work_struct can be freed by the function it 
calls.

> > We might want to do a runtime suspend even if the device's children
> > aren't already suspended.  For example, you could suspend a link while
> > leaving the device on the other end of the link at full power --
> > especially if powering down the device is slow but changing the link's
> > power level is fast.
> 
> Well, this means that the dependencies between devices in the device tree are
> pretty much useless for the run-time PM as far as the core is concerned.  In
> which case, why did you mention them at all?

The dependencies aren't totally useless.  It's still true that before
you resume a device, you have to autoresume its parent.  And it's still
true that when you suspend a device, the parent should be given a
chance to autosuspend.

I guess the real point is that the decision about whether all children
must be suspended should be made by the driver, not the PM core.
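
One way this could look, as a rough user-space sketch only: the driver sets a
per-device flag and the core merely honours it.  The needs_children_suspended
field and the check_children() helper are hypothetical names, not part of the
posted patch:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct device {
	bool suspended;
	bool needs_children_suspended;	/* hypothetical flag set by the driver */
	struct device **children;
	size_t nr_children;
};

/*
 * Return 0 if dev may be suspended now, -EBUSY otherwise.  The core only
 * enforces the children constraint when the driver has asked for it.
 */
int check_children(const struct device *dev)
{
	size_t i;

	if (!dev->needs_children_suspended)
		return 0;

	for (i = 0; i < dev->nr_children; i++)
		if (!dev->children[i]->suspended)
			return -EBUSY;

	return 0;
}
```

A link that can be powered down independently would leave the flag clear,
while a parent that really depends on its children being down would set it.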

> > I haven't checked the details of the code yet.  More later...

One more thought...  The autosuspend and autoresume callbacks need to 
be mutually exclusive with probe and remove.  So somehow the driver 
core will need to block runtime PM calls.

It might also be nice to make sure that the driver core autoresumes a 
device before probing it and autosuspends a device (after some 
reasonable delay) after unbinding its driver.

Alan Stern


^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-10 21:31                     ` [patch update] Re: [linux-pm] " Rafael J. Wysocki
                                         ` (2 preceding siblings ...)
  2009-06-10 23:42                       ` Alan Stern
@ 2009-06-10 23:42                       ` Alan Stern
  3 siblings, 0 replies; 199+ messages in thread
From: Alan Stern @ 2009-06-10 23:42 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: ACPI Devel Maling List, Linux-pm mailing list, LKML

On Wed, 10 Jun 2009, Rafael J. Wysocki wrote:

> > You know, it doesn't make any sense to have a suspend and a resume 
> > both pending at the same time.
> >
> > So you could add only a delayed_work structure and use its embedded
> > work_struct for resume requests.
> 
> I thought so too, but I was wrong. ;-)
> 
> If resume is requested while the suspend hasn't completed yet, we should
> queue it (it's totally valid to request a suspending device to resume IMO), but
> the delayed work is still being used by the workqueue code, so we can't modify
> it.

Where is the delayed work still being used?  There's even a comment in 
run_workqueue() that says a work_struct can be freed by the function it 
calls.

> > We might want to do a runtime suspend even if the device's children
> > aren't already suspended.  For example, you could suspend a link while
> > leaving the device on the other end of the link at full power --
> > especially if powering down the device is slow but changing the link's
> > power level is fast.
> 
> Well, this means that the dependencies between devices in the device tree are
> pretty much useless for the run-time PM as far as the core is concerned.  In
> which case, why did you mention them at all?

The dependencies aren't totally useless.  It's still true that before
you resume a device, you have to autoresume its parent.  And it's still
true that when you suspend a device, the parent should be given a
chance to autosuspend.

I guess the real point is that the decision about whether all children
must be suspended should be made by the driver, not the PM core.

> > I haven't checked the details of the code yet.  More later...

One more thought...  The autosuspend and autoresume callbacks need to 
be mutually exclusive with probe and remove.  So somehow the driver 
core will need to block runtime PM calls.

It might also be nice to make sure that the driver core autoresumes a 
device before probing it and autosuspends a device (after some 
reasonable delay) after unbinding its driver.

Alan Stern

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-10 23:07                             ` Oliver Neukum
@ 2009-06-10 23:42                               ` Alan Stern
  -1 siblings, 0 replies; 199+ messages in thread
From: Alan Stern @ 2009-06-10 23:42 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: Rafael J. Wysocki, linux-pm, ACPI Devel Maling List, LKML

On Thu, 11 Jun 2009, Oliver Neukum wrote:

> On Thursday, 11 June 2009 00:01:20, Rafael J. Wysocki wrote:
> > We have queued up resume requests for the device's parent, its parent etc.,
> > the topmost one goes first.  The workqueue is singlethread, so
> > pm_autoresume() is going to be run for all parents before the device
> > itself, so if that were the only resume mechanism, it would be enough to
> > check if the parent is RPM_ACTIVE.
> 
>            A (IDLE)
>         /          \
> B (SUSPENDED)    C (SUSPENDED)
> 
> Suppose C is to be resumed. This means first in case of A the request
> to suspend would be cancelled. Here you drop the locks:
> 
> +           && (dev->parent->power.runtime_status == RPM_IDLE
> +             || dev->parent->power.runtime_status == RPM_SUSPENDING
> +             || dev->parent->power.runtime_status == RPM_SUSPENDED)) {
> +               spin_unlock_irqrestore(&dev->power.lock, flags);
> +               spin_unlock_irqrestore(&dev->parent->power.lock, parent_flags);
> +
> +               /* We have to resume the parent first. */
> +               pm_request_resume(dev->parent);
> 
> But after pm_request_resume() returns there's no means to make sure
> nothing alters it back to RPM_SUSPENDED. The workqueue doesn't help
> you because you've scheduled nothing by that time. The suspension will
> work because C is still in RPM_SUSPENDED.

This is an example where usage counters come in handy.
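
A minimal user-space sketch of the idea, with the caveat that usage_count,
try_suspend(), resume_with_parents() and suspend_and_release() are
illustrative names and not anything from the patch, and that locking is
omitted.  The child pins its parent before resuming, and a suspend is refused
while the count is nonzero:

```c
#include <errno.h>
#include <stddef.h>

enum rpm_status { RPM_ACTIVE, RPM_IDLE, RPM_SUSPENDED };

struct device {
	struct device *parent;
	enum rpm_status status;
	unsigned int usage_count;	/* hypothetical: users that need dev active */
};

/* Try to suspend dev; refuse while anyone still holds a reference. */
int try_suspend(struct device *dev)
{
	if (dev->usage_count > 0)
		return -EBUSY;
	dev->status = RPM_SUSPENDED;
	return 0;
}

/* Resume dev, pinning each ancestor before it is resumed. */
void resume_with_parents(struct device *dev)
{
	if (dev->status == RPM_ACTIVE)
		return;
	if (dev->parent) {
		dev->parent->usage_count++;
		resume_with_parents(dev->parent);
	}
	dev->status = RPM_ACTIVE;
}

/* Suspend dev and, on success, drop its pin on the parent. */
int suspend_and_release(struct device *dev)
{
	int ret;

	if (dev->status == RPM_SUSPENDED)
		return 0;
	ret = try_suspend(dev);
	if (ret)
		return ret;
	if (dev->parent && dev->parent->usage_count > 0)
		dev->parent->usage_count--;
	return 0;
}
```

In the A/B/C case above, resuming C bumps A's counter first, so a pending
suspend of A fails with -EBUSY instead of racing with the resume.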

Alan Stern


^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
@ 2009-06-10 23:42                               ` Alan Stern
  0 siblings, 0 replies; 199+ messages in thread
From: Alan Stern @ 2009-06-10 23:42 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: Rafael J. Wysocki, linux-pm, ACPI Devel Maling List, LKML

On Thu, 11 Jun 2009, Oliver Neukum wrote:

> On Thursday, 11 June 2009 00:01:20, Rafael J. Wysocki wrote:
> > We have queued up resume requests for the device's parent, its parent etc.,
> > the topmost one goes first.  The workqueue is singlethread, so
> > pm_autoresume() is going to be run for all parents before the device
> > itself, so if that were the only resume mechanism, it would be enough to
> > check if the parent is RPM_ACTIVE.
> 
>            A (IDLE)
>         /          \
> B (SUSPENDED)    C (SUSPENDED)
> 
> Suppose C is to be resumed. This means first in case of A the request
> to suspend would be cancelled. Here you drop the locks:
> 
> +           && (dev->parent->power.runtime_status == RPM_IDLE
> +             || dev->parent->power.runtime_status == RPM_SUSPENDING
> +             || dev->parent->power.runtime_status == RPM_SUSPENDED)) {
> +               spin_unlock_irqrestore(&dev->power.lock, flags);
> +               spin_unlock_irqrestore(&dev->parent->power.lock, parent_flags);
> +
> +               /* We have to resume the parent first. */
> +               pm_request_resume(dev->parent);
> 
> But after pm_request_resume() returns there's no means to make sure
> nothing alters it back to RPM_SUSPENDED. The workqueue doesn't help
> you because you've scheduled nothing by that time. The suspension will
> work because C is still in RPM_SUSPENDED.

This is an example where usage counters come in handy.

Alan Stern


^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-10 23:07                             ` Oliver Neukum
  (?)
@ 2009-06-10 23:42                             ` Alan Stern
  -1 siblings, 0 replies; 199+ messages in thread
From: Alan Stern @ 2009-06-10 23:42 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: ACPI Devel Maling List, linux-pm, LKML

On Thu, 11 Jun 2009, Oliver Neukum wrote:

> On Thursday, 11 June 2009 00:01:20, Rafael J. Wysocki wrote:
> > We have queued up resume requests for the device's parent, its parent etc.,
> > the topmost one goes first.  The workqueue is singlethread, so
> > pm_autoresume() is going to be run for all parents before the device
> > itself, so if that were the only resume mechanism, it would be enough to
> > check if the parent is RPM_ACTIVE.
> 
>            A (IDLE)
>         /          \
> B (SUSPENDED)    C (SUSPENDED)
> 
> Suppose C is to be resumed. This means first in case of A the request
> to suspend would be cancelled. Here you drop the locks:
> 
> +           && (dev->parent->power.runtime_status == RPM_IDLE
> +             || dev->parent->power.runtime_status == RPM_SUSPENDING
> +             || dev->parent->power.runtime_status == RPM_SUSPENDED)) {
> +               spin_unlock_irqrestore(&dev->power.lock, flags);
> +               spin_unlock_irqrestore(&dev->parent->power.lock, parent_flags);
> +
> +               /* We have to resume the parent first. */
> +               pm_request_resume(dev->parent);
> 
> But after pm_request_resume() returns there's no means to make sure
> nothing alters it back to RPM_SUSPENDED. The workqueue doesn't help
> you because you've scheduled nothing by that time. The suspension will
> work because C is still in RPM_SUSPENDED.

This is an example where usage counters come in handy.

Alan Stern

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-10  8:29                 ` [patch update] Re: [linux-pm] " Rafael J. Wysocki
@ 2009-06-11  5:18                     ` Magnus Damm
  2009-06-10 14:20                   ` [patch update] Re: [linux-pm] " Oliver Neukum
                                       ` (4 subsequent siblings)
  5 siblings, 0 replies; 199+ messages in thread
From: Magnus Damm @ 2009-06-11  5:18 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Alan Stern, Oliver Neukum, linux-pm, ACPI Devel Maling List, LKML

Hi Rafael,

On Wed, Jun 10, 2009 at 5:29 PM, Rafael J. Wysocki<rjw@sisk.pl> wrote:
>> I tried to address your comments and Oliver's comments too in the new
>> version of the patch below.  Please have a look and tell me what you think.
>
> Argh, I forgot about some important things.
>
> First, there are devices with no parent (actually, it would be much easier if
> they had a default dummy parent, but that's a separate issue).
>
> Second, the parent has to be taken into account in the asynchronous resume
> path too (which BTW is more complicated).
>
> Finally, I decided to follow Oliver's suggestion that some error codes returned
> by ->autosuspend() and ->autoresume() may be regarded as "go back to the
> previous state" information.  I chose to use -EAGAIN and -EBUSY for this
> purpose.
>
> Updated patch follows, sorry for the confusion.

Thanks for your work on this. I think the patch looks very good in
general so I will not comment on the code itself. I do however have a
few high level questions:

Q1) Regarding pm_request_suspend(), would it be possible to get a
synchronous version or to make use of the completion somehow?

Q2) As pm_request_suspend() works today, the device is marked as
RPM_IDLE and the delayed work is queued up. There is no real decision
making going on except the time out. I'd like to let the bus code
decide when to autosuspend a bunch of devices. Maybe the idle handling
should be broken out into a pm_request_idle() and pm_request_suspend()
can be modified to synchronously suspend devices marked with
pm_request_idle()?

Q3) Have you thought about how device drivers can inform the Runtime
PM that the device is idle or that they need the hardware to be
woken up? I'd like something similar to my "[PATCH 02/04] Driver Core:
Add idle and wakeup functions" patch but maybe not specific to
platform devices. We talked about adding some hooks to the bus_type
for this. Any ideas?

This is how I'm thinking of integrating your Runtime PM code with our
SuperH platform devices:

Device Idle Handling:

1) Device drivers call pm_device_idle() (See Q3) which invokes arch
specific platform bus code in the case of our SoC platform devices.
This bus code marks the device as idle using pm_request_idle(). (See
Q2). At this point light weight power management like clock stopping
may be performed as well.

2) The arch specific bus code knows how the platform devices are
grouped together (thanks to the data area in "[PATCH] Driver Core: Add
platform device arch data V3"), and when all devices in one power
domain are marked as idle the bus code calls the synchronous
pm_request_suspend() (no delay).

3) When all devices in the power domain are suspended the bus code can
turn off the power. The reason why I'd like to only autosuspend when
all devices are idle is simply that we don't get any power savings
from the per device autosuspend() callbacks, only from turning off
power to the entire power domain. So blindly autosuspending and
autoresuming devices is just pure overhead unless we know we can do it
for all devices in the domain.

4) Over time, the code in 2) should be extended to handle latencies.

Device Wakeup Handling:

1) Device drivers call pm_device_wakeup() (See Q3). This invokes arch
specific bus code for our SoC platform devices. The bus code enables
clocks if needed and also calls pm_resume_sync().

2) After the call to pm_resume_sync() the pm_device_wakeup() call
returns and the device driver can access the hardware as usual.
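
As a rough sketch of the idle and wakeup flow above (user-space C, not tied
to any real bus code): struct pm_domain, device_idle() and device_wakeup()
are hypothetical names used only for illustration, and the per-device
autosuspend()/autoresume() callbacks are reduced to a flag:

```c
#include <stdbool.h>
#include <stddef.h>

struct pm_domain;

struct device {
	struct pm_domain *domain;
	bool idle;
	bool suspended;
};

struct pm_domain {
	struct device **devs;
	size_t nr_devs;
	bool powered;
};

/* True only when every device in the domain has been marked idle. */
bool domain_all_idle(const struct pm_domain *d)
{
	size_t i;

	for (i = 0; i < d->nr_devs; i++)
		if (!d->devs[i]->idle)
			return false;
	return true;
}

/* Driver reports idleness; suspend everything once the whole domain is idle. */
void device_idle(struct device *dev)
{
	struct pm_domain *d = dev->domain;
	size_t i;

	dev->idle = true;
	if (!domain_all_idle(d))
		return;

	for (i = 0; i < d->nr_devs; i++)
		d->devs[i]->suspended = true;
	d->powered = false;
}

/* Driver needs the hardware: power the domain and resume just this device. */
void device_wakeup(struct device *dev)
{
	dev->domain->powered = true;
	dev->suspended = false;
	dev->idle = false;
}
```

The point of the design is that no device is suspended on its own: the
suspend callbacks only run once the whole domain can actually be powered off.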

That's it! Far from perfect but maybe a good start at least.

I have seen quite a few patches from you lately, nice work. To make
tracking of the Runtime PM patches easier, can you please consider
including the version of the patch in the subject when you post new
versions?

Thanks!

/ magnus

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: [linux-pm] Run-time PM idea (was: Re:  [RFC][PATCH 0/2] PM: Rearrange core suspend code)
@ 2009-06-11  5:18                     ` Magnus Damm
  0 siblings, 0 replies; 199+ messages in thread
From: Magnus Damm @ 2009-06-11  5:18 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Alan Stern, Oliver Neukum, linux-pm, ACPI Devel Maling List, LKML

Hi Rafael,

On Wed, Jun 10, 2009 at 5:29 PM, Rafael J. Wysocki<rjw@sisk.pl> wrote:
>> I tried to address your comments and Oliver's comments too in the new
>> version of the patch below.  Please have a look and tell me what you think.
>
> Argh, I forgot about some important things.
>
> First, there are devices with no parent (actually, it would be much easier if
> they had a default dummy parent, but that's a separate issue).
>
> Second, the parent has to be taken into account in the asynchronous resume
> path too (which BTW is more complicated).
>
> Finally, I decided to follow Oliver's suggestion that some error codes returned
> by ->autosuspend() and ->autoresume() may be regarded as "go back to the
> previous state" information.  I chose to use -EAGAIN and -EBUSY for this
> purpose.
>
> Updated patch follows, sorry for the confusion.

Thanks for your work on this. I think the patch looks very good in
general so I will not comment on the code itself. I do however have a
few high level questions:

Q1) Regarding pm_request_suspend(), would it be possible to get a
synchronous version or to make use of the completion somehow?

Q2) As pm_request_suspend() works today, the device is marked as
RPM_IDLE and the delayed work is queued up. There is no real decision
making going on except the time out. I'd like to let the bus code
decide when to autosuspend a bunch of devices. Maybe the idle handling
should be broken out into a pm_request_idle() and pm_request_suspend()
can be modified to synchronously suspend devices marked with
pm_request_idle()?

Q3) Have you thought about how device drivers can inform the Runtime
PM that the device is idle or that they need the hardware to be
woken up? I'd like something similar to my "[PATCH 02/04] Driver Core:
Add idle and wakeup functions" patch but maybe not specific to
platform devices. We talked about adding some hooks to the bus_type
for this. Any ideas?

This is how I'm thinking of integrating your Runtime PM code with our
SuperH platform devices:

Device Idle Handling:

1) Device drivers call pm_device_idle() (See Q3) which invokes arch
specific platform bus code in the case of our SoC platform devices.
This bus code marks the device as idle using pm_request_idle(). (See
Q2). At this point light weight power management like clock stopping
may be performed as well.

2) The arch specific bus code knows how the platform devices are
grouped together (thanks to the data area in "[PATCH] Driver Core: Add
platform device arch data V3"), and when all devices in one power
domain are marked as idle the bus code calls the synchronous
pm_request_suspend() (no delay).

3) When all devices in the power domain are suspended the bus code can
turn off the power. The reason why I'd like to only autosuspend when
all devices are idle is simply that we don't get any power savings
from the per device autosuspend() callbacks, only from turning off
power to the entire power domain. So blindly autosuspending and
autoresuming devices is just pure overhead unless we know we can do it
for all devices in the domain.

4) Over time the code in 2) should be extended to handle latencies.

Device Wakeup Handling:

1) Device drivers call pm_device_wakeup() (See Q3). This invokes arch
specific bus code for our SoC platform devices. The bus code enables
clocks if needed and also calls pm_resume_sync().

2) After the call to pm_resume_sync() the pm_device_wakeup() call
returns and the device driver can access the hardware as usual.
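
The idle and wakeup flows above can be sketched in plain user-space C.
This is only a minimal sketch under stated assumptions, not code from
the patch: struct power_domain, struct soc_device, soc_device_idle() and
soc_device_wakeup() are hypothetical names, and flipping pd->powered
stands in for suspending the devices and switching the domain's power.

```c
#include <stdbool.h>

/* Hypothetical bookkeeping for one power domain (not in the patch). */
struct power_domain {
	unsigned int nr_devices;	/* devices that belong to the domain */
	unsigned int nr_idle;		/* devices currently marked idle */
	bool powered;			/* true while the domain has power */
};

/* Hypothetical per-device state kept by the arch-specific bus code. */
struct soc_device {
	struct power_domain *domain;
	bool idle;
};

/* Mark a device idle; cut domain power once every device is idle. */
void soc_device_idle(struct soc_device *dev)
{
	struct power_domain *pd = dev->domain;

	if (dev->idle)
		return;		/* already counted */
	dev->idle = true;
	if (++pd->nr_idle == pd->nr_devices)
		pd->powered = false;	/* placeholder for suspend + power off */
}

/* Wake a device; the domain must have power before hardware access. */
void soc_device_wakeup(struct soc_device *dev)
{
	struct power_domain *pd = dev->domain;

	if (!dev->idle)
		return;		/* device was never marked idle */
	dev->idle = false;
	pd->nr_idle--;
	pd->powered = true;	/* placeholder for power on + resume */
}
```

The point of the counter is that a single busy device keeps the whole
domain powered, so no per-device suspend work is done until it can
actually save power.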

That's it! Far from perfect but maybe a good start at least.

I have seen quite a few patches from you lately, nice work. To make
tracking of the Runtime PM patches easier, can you please consider
including the version of the patch in the subject when you post new
versions?

Thanks!

/ magnus

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-10 23:15                         ` Oliver Neukum
@ 2009-06-11  5:27                           ` Magnus Damm
  -1 siblings, 0 replies; 199+ messages in thread
From: Magnus Damm @ 2009-06-11  5:27 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Rafael J. Wysocki, Alan Stern, Linux-pm mailing list,
	ACPI Devel Maling List, LKML

On Thu, Jun 11, 2009 at 8:15 AM, Oliver Neukum<oliver@neukum.org> wrote:
> On Wednesday, 10 June 2009 23:31:13, Rafael J. Wysocki wrote:
>> > > +/**
>> > > + * pm_check_children - Check if all children of a device have been suspended.
>> > > + * @dev: Device to check.
>> > > + *
>> > > + * Returns 0 if all children of the device have been suspended or -EBUSY
>> > > + * otherwise.
>> > > + */
>> >
>> > We might want to do a runtime suspend even if the device's children
>> > aren't already suspended.  For example, you could suspend a link while
>> > leaving the device on the other end of the link at full power --
>> > especially if powering down the device is slow but changing the link's
>> > power level is fast.
>>
>> Well, this means that the dependencies between devices in the device tree
>> are pretty much useless for the run-time PM as far as the core is
>> concerned.  In which case, why did you mention them at all?
>
> Some bus systems need this constraint, others don't or only for some nodes.
> We need a way to communicate this to the core.

I agree that this depends on the bus.

Our SuperH on-chip SoC platform devices are arranged in a flat fashion
so no real problem there, but if there were dependencies then I
think we need to manage it recursively somehow.

Compare that to PM of our I2C driver and the I2C bus hanging off from
that. In that case I'd like to be able to autosuspend the I2C master
driver regardless of the I2C slave devices and their PM state.
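
One way to tell the core about this could be a per-device flag set by
the bus type. The following is a minimal user-space sketch under that
assumption; struct rpm_node, its ignore_children field and
rpm_may_suspend() are hypothetical names and are not part of the posted
patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the run-time PM part of struct device. */
struct rpm_node {
	bool suspended;			/* device has been suspended */
	bool ignore_children;		/* hypothetical: set by the bus type */
	struct rpm_node **children;	/* child devices, may be NULL */
	size_t nr_children;
};

/*
 * Return true if the node may be suspended: either its bus type said
 * that children do not matter, or every child is already suspended.
 */
bool rpm_may_suspend(const struct rpm_node *node)
{
	size_t i;

	if (node->ignore_children)
		return true;
	for (i = 0; i < node->nr_children; i++)
		if (!node->children[i]->suspended)
			return false;
	return true;
}
```

With the flag set for the I2C master, an active slave no longer blocks
the suspend of the master, while buses that need the constraint simply
leave the flag clear.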

/ magnus

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-11  5:18                     ` Magnus Damm
  (?)
@ 2009-06-11  9:08                     ` Oliver Neukum
  2009-06-12  3:13                         ` [patch update] Re: [linux-pm] " Magnus Damm
  -1 siblings, 1 reply; 199+ messages in thread
From: Oliver Neukum @ 2009-06-11  9:08 UTC (permalink / raw)
  To: Magnus Damm
  Cc: Rafael J. Wysocki, Alan Stern, linux-pm, ACPI Devel Maling List, LKML

On Thursday, 11 June 2009 07:18:46, Magnus Damm wrote:
> 3) When all devices in the power domain are suspended the bus code can
> turn off the power. The reason why I'd like to only autosuspend when

So you are saying that you have power dependencies independent
of the device tree?

> all devices are idle is simply that we don't get any power savings
> from the per device autosuspend() callbacks, only from turning off
> power to the entire power domain. So blindly autosuspending and
> autoresuming devices is just pure overhead unless we know we can do it
> for all devices in the domain.

Why can't you do this within the framework? You simply suspend when
all of a domain's devices have been autosuspended.
I suppose we could have a helper.

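int pm_autosuspend_in_domain(struct device *dev)
{
	int err = 0;

	/* Assumes the (hypothetical) domain structure embeds a mutex "lock". */
	mutex_lock(&dev->power_domain->lock);
	/* Power the domain down once its last active device has suspended. */
	if (!--dev->power_domain->active_devices)
		err = dev->power_domain->power_down(dev->power_domain);
	mutex_unlock(&dev->power_domain->lock);

	return err;
}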

	Regards
		Oliver


^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-10 23:07                             ` Oliver Neukum
                                               ` (3 preceding siblings ...)
  (?)
@ 2009-06-11 13:46                             ` Rafael J. Wysocki
  -1 siblings, 0 replies; 199+ messages in thread
From: Rafael J. Wysocki @ 2009-06-11 13:46 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: Alan Stern, linux-pm, ACPI Devel Maling List, LKML

On Thursday 11 June 2009, Oliver Neukum wrote:
> On Thursday, 11 June 2009 00:01:20, Rafael J. Wysocki wrote:
> > We have queued up resume requests for the device's parent, its parent etc.,
> > the topmost one goes first.  The workqueue is singlethread, so
> > pm_autoresume() is going to be run for all parents before the device
> > itself, so if that were the only resume mechanism, it would be enough to
> > check if the parent is RPM_ACTIVE.
> 
>            A (IDLE)
>          /          \
> B (SUSPENDED)    C (SUSPENDED)
> 
> Suppose C is to be resumed. This means first in case of A the request
> to suspend would be cancelled. Here you drop the locks:
> 
> +           && (dev->parent->power.runtime_status == RPM_IDLE
> +             || dev->parent->power.runtime_status == RPM_SUSPENDING
> +             || dev->parent->power.runtime_status == RPM_SUSPENDED)) {
> +               spin_unlock_irqrestore(&dev->power.lock, flags);
> +               spin_unlock_irqrestore(&dev->parent->power.lock, parent_flags);
> +
> +               /* We have to resume the parent first. */
> +               pm_request_resume(dev->parent);
> 
> But after pm_request_resume() returns there's no means to make sure
> nothing alters it back to RPM_SUSPENDED. The workqueue doesn't help
> you because you've scheduled nothing by that time. The suspension will
> work because C is still in RPM_SUSPENDED.

That is exactly the bug I told you about in one of the previous messages. :-)

The solution I used in the current version of the patch (appended) is to have
separate bits for RPM_WAKE and RPM_SUSPENDED (and for the other status
constants), so that both can be set at the same time.
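
The effect of the separate bits can be shown with a small user-space
sketch. The RPM_* values are copied from the appended patch; the helpers
rpm_request_wake() and rpm_finish_suspend() are hypothetical
simplifications of what pm_request_resume() and pm_runtime_suspend() do
to runtime_status, with locking and workqueue handling left out.

```c
/* Status bits as defined in the appended patch. */
#define RPM_ACTIVE	0
#define RPM_IDLE	0x01
#define RPM_SUSPENDING	0x02
#define RPM_SUSPENDED	0x04
#define RPM_WAKE	0x08
#define RPM_RESUMING	0x10

#define RPM_IN_SUSPEND	(RPM_SUSPENDING | RPM_SUSPENDED)

/* Hypothetical helper: record a resume request without losing state. */
unsigned int rpm_request_wake(unsigned int status)
{
	/* Only a suspending or suspended device needs a wake-up. */
	if (status & RPM_IN_SUSPEND)
		status |= RPM_WAKE;
	return status;
}

/* Hypothetical helper: a successful suspend callback has returned. */
unsigned int rpm_finish_suspend(unsigned int status)
{
	/* RPM_WAKE, if set in the meantime, survives this update. */
	status &= ~RPM_SUSPENDING;
	return status | RPM_SUSPENDED;
}
```

A resume request that arrives while the suspend callback is running
leaves the status as RPM_SUSPENDING | RPM_WAKE, and after the callback
returns it becomes RPM_SUSPENDED | RPM_WAKE, so the pending wake-up is
still visible to the resume work item.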

Well, there probably still are some bugs lurking in it ...

Best,
Rafael

---
 drivers/base/power/Makefile  |    1 
 drivers/base/power/main.c    |    2 
 drivers/base/power/runtime.c |  415 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/pm.h           |   82 ++++++++
 include/linux/pm_runtime.h   |   50 +++++
 kernel/power/Kconfig         |   14 +
 kernel/power/main.c          |   17 +
 7 files changed, 578 insertions(+), 3 deletions(-)

Index: linux-2.6/kernel/power/Kconfig
===================================================================
--- linux-2.6.orig/kernel/power/Kconfig
+++ linux-2.6/kernel/power/Kconfig
@@ -208,3 +208,17 @@ config APM_EMULATION
 	  random kernel OOPSes or reboots that don't seem to be related to
 	  anything, try disabling/enabling this option (or disabling/enabling
 	  APM in your BIOS).
+
+config PM_RUNTIME
+	bool "Run-time PM core functionality"
+	depends on PM
+	---help---
+	  Enable functionality allowing I/O devices to be put into energy-saving
+	  (low power) states at run time (or autosuspended) after a specified
+	  period of inactivity and woken up in response to a hardware-generated
+	  wake-up event or a driver's request.
+
+	  Hardware support is generally required for this functionality to work
+	  and the bus type drivers of the buses the devices are on are
+	  responsible for the actual handling of the autosuspend requests and
+	  wake-up events.
Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -11,6 +11,7 @@
 #include <linux/kobject.h>
 #include <linux/string.h>
 #include <linux/resume-trace.h>
+#include <linux/workqueue.h>
 
 #include "power.h"
 
@@ -217,8 +218,24 @@ static struct attribute_group attr_group
 	.attrs = g,
 };
 
+#ifdef CONFIG_PM_RUNTIME
+struct workqueue_struct *pm_wq;
+
+static int __init pm_start_workqueue(void)
+{
+	pm_wq = create_freezeable_workqueue("pm");
+
+	return pm_wq ? 0 : -ENOMEM;
+}
+#else
+static inline int pm_start_workqueue(void) { return 0; }
+#endif
+
 static int __init pm_init(void)
 {
+	int error = pm_start_workqueue();
+	if (error)
+		return error;
 	power_kobj = kobject_create_and_add("power", NULL);
 	if (!power_kobj)
 		return -ENOMEM;
Index: linux-2.6/include/linux/pm.h
===================================================================
--- linux-2.6.orig/include/linux/pm.h
+++ linux-2.6/include/linux/pm.h
@@ -22,6 +22,9 @@
 #define _LINUX_PM_H
 
 #include <linux/list.h>
+#include <linux/workqueue.h>
+#include <linux/spinlock.h>
+#include <linux/completion.h>
 
 /*
  * Callbacks for platform drivers to implement.
@@ -165,6 +168,15 @@ typedef struct pm_message {
  * It is allowed to unregister devices while the above callbacks are being
  * executed.  However, it is not allowed to unregister a device from within any
  * of its own callbacks.
+ *
+ * There also are two callbacks related to run-time power management of devices:
+ *
+ * @runtime_suspend: Save the device registers and put it into an energy-saving
+ *	(low power) state at run-time, enable wake-up events as appropriate.
+ *
+ * @runtime_resume: Put the device into the full power state and restore its
+ *	registers (if applicable) at run time, in response to a wake-up event
+ *	generated by hardware or at a request of software.
  */
 
 struct dev_pm_ops {
@@ -182,6 +194,10 @@ struct dev_pm_ops {
 	int (*thaw_noirq)(struct device *dev);
 	int (*poweroff_noirq)(struct device *dev);
 	int (*restore_noirq)(struct device *dev);
+#ifdef CONFIG_PM_RUNTIME
+	int (*runtime_suspend)(struct device *dev);
+	int (*runtime_resume)(struct device *dev);
+#endif
 };
 
 /**
@@ -315,14 +331,74 @@ enum dpm_state {
 	DPM_OFF_IRQ,
 };
 
+/**
+ * Device run-time power management state.
+ *
+ * These state labels are used internally by the PM core to indicate the current
+ * status of a device with respect to the PM core operations.  They do not
+ * reflect the actual power state of the device or its status as seen by the
+ * driver.
+ *
+ * RPM_ACTIVE		Device is fully operational, no run-time PM requests are
+ *			pending for it.
+ *
+ * RPM_IDLE		It has been requested that the device be suspended.
+ *			Suspend request has been put into the run-time PM
+ *			workqueue and it's pending execution.
+ *
+ * RPM_SUSPENDING	Device bus type's ->runtime_suspend() callback is being
+ *			executed.
+ *
+ * RPM_SUSPENDED	Device bus type's ->runtime_suspend() callback has
+ *			completed successfully.  The device is regarded as
+ *			suspended.
+ *
+ * RPM_WAKE		It has been requested that the device be woken up.
+ *			Resume request has been put into the run-time PM
+ *			workqueue and it's pending execution.
+ *
+ * RPM_RESUMING		Device bus type's ->runtime_resume() callback is being
+ *			executed.
+ *
+ * RPM_ERROR		Represents a condition from which the PM core cannot
+ *			recover by itself.  If the device's run-time PM status
+ *			field has this value, all of the run-time PM operations
+ *			carried out for the device by the core will fail, until
+ *			the status field is changed to either RPM_ACTIVE or
+ *			RPM_SUSPENDED (it is not valid to use the other values
+ *			in such a situation) by the device's driver or bus type.
+ *			This happens when the device bus type's
+ *			->runtime_suspend() or ->runtime_resume() callback
+ *			returns error code different from -EAGAIN or -EBUSY.
+ */
+
+#define RPM_ACTIVE	0
+#define RPM_IDLE	0x01
+#define RPM_SUSPENDING	0x02
+#define RPM_SUSPENDED	0x04
+#define RPM_WAKE	0x08
+#define RPM_RESUMING	0x10
+
+#define RPM_IN_SUSPEND	(RPM_SUSPENDING | RPM_SUSPENDED)
+#define RPM_INACTIVE	(RPM_IDLE | RPM_IN_SUSPEND)
+#define RPM_ERROR	(-1)
+
 struct dev_pm_info {
 	pm_message_t		power_state;
-	unsigned		can_wakeup:1;
-	unsigned		should_wakeup:1;
+	unsigned int		can_wakeup:1;
+	unsigned int		should_wakeup:1;
 	enum dpm_state		status;		/* Owned by the PM core */
-#ifdef	CONFIG_PM_SLEEP
+#ifdef CONFIG_PM_SLEEP
 	struct list_head	entry;
 #endif
+#ifdef CONFIG_PM_RUNTIME
+	struct delayed_work	suspend_work;
+	struct work_struct	resume_work;
+	struct completion	work_done;
+	unsigned int		suspend_aborted:1;
+	unsigned int		runtime_status:5;
+	spinlock_t		lock;
+#endif
 };
 
 /*
Index: linux-2.6/drivers/base/power/Makefile
===================================================================
--- linux-2.6.orig/drivers/base/power/Makefile
+++ linux-2.6/drivers/base/power/Makefile
@@ -1,5 +1,6 @@
 obj-$(CONFIG_PM)	+= sysfs.o
 obj-$(CONFIG_PM_SLEEP)	+= main.o
+obj-$(CONFIG_PM_RUNTIME)	+= runtime.o
 obj-$(CONFIG_PM_TRACE_RTC)	+= trace.o
 
 ccflags-$(CONFIG_DEBUG_DRIVER) := -DDEBUG
Index: linux-2.6/drivers/base/power/runtime.c
===================================================================
--- /dev/null
+++ linux-2.6/drivers/base/power/runtime.c
@@ -0,0 +1,415 @@
+/*
+ * drivers/base/power/runtime.c - Helper functions for device run-time PM
+ *
+ * Copyright (c) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
+ *
+ * This file is released under the GPLv2.
+ */
+
+#include <linux/pm_runtime.h>
+
+/**
+ * pm_runtime_reset - Clear all of the device run-time PM flags.
+ * @dev: Device object to clear the flags for.
+ */
+static void pm_runtime_reset(struct device *dev)
+{
+	dev->power.suspend_aborted = false;
+	dev->power.runtime_status = RPM_ACTIVE;
+}
+
+/**
+ * pm_device_suspended - Check if given device has been suspended at run time.
+ * @dev: Device to check.
+ * @data: Ignored.
+ *
+ * Returns 0 if the device has been suspended and it hasn't been requested to
+ * resume or -EBUSY otherwise.
+ */
+static int pm_device_suspended(struct device *dev, void *data)
+{
+	return dev->power.runtime_status == RPM_SUSPENDED ? 0 : -EBUSY;
+}
+
+/**
+ * pm_check_children - Check if all children of a device have been suspended.
+ * @dev: Device to check.
+ *
+ * Returns 0 if all children of the device have been suspended or -EBUSY
+ * otherwise.
+ */
+static int pm_check_children(struct device *dev)
+{
+	return device_for_each_child(dev, NULL, pm_device_suspended);
+}
+
+/**
+ * pm_runtime_suspend - Run a device bus type's runtime_suspend() callback.
+ * @work: Work structure used for scheduling the execution of this function.
+ *
+ * Use @work to get the device object the suspend has been scheduled for,
+ * check if the suspend request hasn't been cancelled and run the
+ * ->runtime_suspend() callback provided by the device's bus type driver.
+ * Update the run-time PM flags in the device object to reflect the current
+ * status of the device.
+ */
+static void pm_runtime_suspend(struct work_struct *work)
+{
+	struct delayed_work *dw = to_delayed_work(work);
+	struct device *dev = suspend_work_to_device(dw);
+	int error = 0;
+
+	spin_lock(&dev->power.lock);
+
+	if (dev->power.suspend_aborted) {
+		dev->power.runtime_status = RPM_ACTIVE;
+		goto out;
+	} else if (dev->power.runtime_status != RPM_IDLE) {
+		goto out;
+	} else if (pm_check_children(dev)) {
+		/*
+		 * We can only suspend the device if all of its children have
+		 * been suspended.
+		 */
+		goto out;
+	}
+
+	dev->power.runtime_status = RPM_SUSPENDING;
+	init_completion(&dev->power.work_done);
+
+	spin_unlock(&dev->power.lock);
+
+	if (dev && dev->bus && dev->bus->pm && dev->bus->pm->runtime_suspend)
+		error = dev->bus->pm->runtime_suspend(dev);
+
+	spin_lock(&dev->power.lock);
+
+	/*
+	 * Resume request might have been queued in the meantime, in which case
+	 * the RPM_WAKE bit is also set in runtime_status.
+	 */
+	dev->power.runtime_status &= ~RPM_SUSPENDING;
+	switch (error) {
+	case 0:
+		dev->power.runtime_status |= RPM_SUSPENDED;
+		break;
+	case -EAGAIN:
+	case -EBUSY:
+		dev->power.runtime_status = RPM_ACTIVE;
+		break;
+	default:
+		dev->power.runtime_status = RPM_ERROR;
+	}
+	complete(&dev->power.work_done);
+
+ out:
+	spin_unlock(&dev->power.lock);
+}
+
+/**
+ * pm_request_suspend - Schedule run-time suspend of given device.
+ * @dev: Device to suspend.
+ * @delay: Time to wait before attempting to suspend the device.
+ */
+void pm_request_suspend(struct device *dev, unsigned long delay)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.runtime_status != RPM_ACTIVE)
+		goto out;
+
+	dev->power.runtime_status = RPM_IDLE;
+	dev->power.suspend_aborted = false;
+	queue_delayed_work(pm_wq, &dev->power.suspend_work, delay);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+
+/**
+ * pm_cancel_suspend - Cancel a pending suspend request for given device.
+ * @dev: Device to cancel the suspend request for.
+ *
+ * Should be called under pm_lock_device() and only if we are sure that the
+ * ->runtime_suspend() callback hasn't started yet.
+ */
+static void pm_cancel_suspend(struct device *dev)
+{
+	dev->power.suspend_aborted = true;
+	cancel_delayed_work(&dev->power.suspend_work);
+	dev->power.runtime_status = RPM_ACTIVE;
+}
+
+/**
+ * pm_runtime_resume - Run a device bus type's runtime_resume() callback.
+ * @work: Work structure used for scheduling the execution of this function.
+ *
+ * Use @work to get the device object the resume has been scheduled for,
+ * check if the device is really suspended and run the ->runtime_resume()
+ * callback provided by the device's bus type driver.  Update the run-time PM
+ * flags in the device object to reflect the current status of the device.
+ */
+static void pm_runtime_resume(struct work_struct *work)
+{
+	struct device *dev = resume_work_to_device(work);
+	int error = 0;
+
+	if (dev->parent)
+		spin_lock(&dev->parent->power.lock);
+	spin_lock(&dev->power.lock);
+
+	/*
+	 * Since the PM workqueue is singlethread, this function cannot run
+	 * in parallel with pm_runtime_suspend().  For this reason it is not
+	 * necessary to check if RPM_SUSPENDING is set in runtime_status of the
+	 * device.
+	 */
+ repeat:
+	if (!(dev->power.runtime_status & RPM_WAKE)) {
+		if (dev->parent)
+			spin_unlock(&dev->parent->power.lock);
+		goto out;
+	} else if (dev->parent
+	    && dev->parent->power.runtime_status != RPM_ACTIVE) {
+		/*
+		 * Although this function cannot run in parallel with another
+		 * instance of itself, it may be running in parallel with the
+		 * synchronous resume of another device.  In particular, that
+		 * may be the device's parent.
+		 */
+		if (dev->parent->power.runtime_status & RPM_RESUMING) {
+			spin_unlock(&dev->power.lock);
+			spin_unlock(&dev->parent->power.lock);
+
+			wait_for_completion(&dev->parent->power.work_done);
+
+			spin_lock(&dev->parent->power.lock);
+			spin_lock(&dev->power.lock);
+		}
+		if (dev->parent->power.runtime_status != RPM_ACTIVE) {
+			spin_unlock(&dev->parent->power.lock);
+			goto out;
+		}
+		goto repeat;
+	}
+
+	dev->power.runtime_status = RPM_RESUMING;
+	init_completion(&dev->power.work_done);
+
+	spin_unlock(&dev->power.lock);
+	if (dev->parent)
+		spin_unlock(&dev->parent->power.lock);
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_resume)
+		error = dev->bus->pm->runtime_resume(dev);
+
+	spin_lock(&dev->power.lock);
+
+	switch (error) {
+	case 0:
+		dev->power.runtime_status = RPM_ACTIVE;
+		break;
+	case -EAGAIN:
+	case -EBUSY:
+		dev->power.runtime_status = RPM_SUSPENDED;
+		break;
+	default:
+		dev->power.runtime_status = RPM_ERROR;
+	}
+	complete(&dev->power.work_done);
+
+ out:
+	spin_unlock(&dev->power.lock);
+}
+
+/**
+ * pm_request_resume - Schedule run-time resume of given device.
+ * @dev: Device to resume.
+ */
+void pm_request_resume(struct device *dev)
+{
+	unsigned long parent_flags = 0, flags;
+
+ repeat:
+	if (dev->parent)
+		spin_lock_irqsave(&dev->parent->power.lock, parent_flags);
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.runtime_status == RPM_IDLE) {
+		/* Autosuspend request is pending, no need to resume. */
+		pm_cancel_suspend(dev);
+		goto out;
+	} else if (!(dev->power.runtime_status & RPM_IN_SUSPEND)) {
+		goto out;
+	} else if (dev->parent
+	    && (dev->parent->power.runtime_status & RPM_INACTIVE)) {
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+		spin_unlock_irqrestore(&dev->parent->power.lock, parent_flags);
+
+		/* We have to resume the parent first. */
+		pm_request_resume(dev->parent);
+
+		goto repeat;
+	}
+
+	/*
+	 * The device may be suspending at the moment and we can't clear the
+	 * RPM_SUSPENDING bit in its runtime_status just yet.
+	 */
+	dev->power.runtime_status |= RPM_WAKE;
+	queue_work(pm_wq, &dev->power.resume_work);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+	if (dev->parent)
+		spin_unlock_irqrestore(&dev->parent->power.lock, parent_flags);
+}
+
+/**
+ * pm_resume_sync - Resume given device waiting for the operation to complete.
+ * @dev: Device to resume.
+ *
+ * Resume the device synchronously, waiting for the operation to complete.  If
+ * autosuspend is in progress while this function is being run, wait for it to
+ * finish before resuming the device.  If the autosuspend is scheduled, but it
+ * hasn't started yet, cancel it and we're done.
+ */
+int pm_resume_sync(struct device *dev)
+{
+	int error = 0;
+
+	spin_lock(&dev->power.lock);
+
+	if (dev->power.runtime_status == RPM_ACTIVE) {
+		goto out;
+	} else if (dev->power.runtime_status == RPM_IDLE) {
+		/* ->runtime_suspend() hasn't started yet, no need to resume. */
+		pm_cancel_suspend(dev);
+		goto out;
+	}
+
+	if (dev->power.runtime_status & RPM_SUSPENDING) {
+		spin_unlock(&dev->power.lock);
+
+		/*
+		 * The ->runtime_suspend() callback is being executed right now,
+		 * wait for it to complete.
+		 */
+		wait_for_completion(&dev->power.work_done);
+	} else if (dev->power.runtime_status == RPM_SUSPENDED && dev->parent) {
+		spin_unlock(&dev->power.lock);
+
+		/* The device's parent may also be suspended.  Resume it. */
+		error = pm_resume_sync(dev->parent);
+		if (error)
+			return error;
+	} else {
+		spin_unlock(&dev->power.lock);
+	}
+
+	if (dev->parent)
+		spin_lock(&dev->parent->power.lock);
+	spin_lock(&dev->power.lock);
+
+	if (dev->power.runtime_status & RPM_WAKE) {
+		/* There's a pending resume request that can be cancelled. */
+		work_clear_pending(&dev->power.resume_work);
+	} else if (dev->power.runtime_status == RPM_RESUMING) {
+		spin_unlock(&dev->power.lock);
+		if (dev->parent)
+			spin_unlock(&dev->parent->power.lock);
+
+		/*
+		 * There's another resume running in parallel with us. Wait for
+		 * it to complete.
+		 */
+		wait_for_completion(&dev->power.work_done);
+
+		return dev->power.runtime_status == RPM_ACTIVE ? 0 : -EAGAIN;
+	} else if (!(dev->power.runtime_status & RPM_SUSPENDED)) {
+		error = -EINVAL;
+		if (dev->parent)
+			spin_unlock(&dev->parent->power.lock);
+		goto out;
+	}
+
+	dev->power.runtime_status = RPM_RESUMING;
+	init_completion(&dev->power.work_done);
+
+	spin_unlock(&dev->power.lock);
+	if (dev->parent)
+		spin_unlock(&dev->parent->power.lock);
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_resume)
+		error = dev->bus->pm->runtime_resume(dev);
+
+	spin_lock(&dev->power.lock);
+
+	switch (error) {
+	case 0:
+		dev->power.runtime_status = RPM_ACTIVE;
+		break;
+	case -EAGAIN:
+	case -EBUSY:
+		dev->power.runtime_status = RPM_SUSPENDED;
+		break;
+	default:
+		dev->power.runtime_status = RPM_ERROR;
+	}
+	complete(&dev->power.work_done);
+
+ out:
+	spin_unlock(&dev->power.lock);
+
+	return error;
+}
+
+/**
+ * pm_cancel_autosuspend - Cancel a pending autosuspend request for given device
+ * @dev: Device to handle.
+ *
+ * This routine is only supposed to be called when the run-time PM workqueue is
+ * frozen (i.e. during system-wide suspend or hibernation) when it is guaranteed
+ * that no work items are being executed.
+ */
+void pm_cancel_autosuspend(struct device *dev)
+{
+	spin_lock(&dev->power.lock);
+
+	cancel_delayed_work(&dev->power.suspend_work);
+	pm_runtime_reset(dev);
+
+	spin_unlock(&dev->power.lock);
+}
+
+/**
+ * pm_cancel_autoresume - Cancel a pending autoresume request for given device
+ * @dev: Device to handle.
+ *
+ * This routine is only supposed to be called when the run-time PM workqueue is
+ * frozen (i.e. during system-wide suspend or hibernation) when it is guaranteed
+ * that no work items are being executed.
+ */
+void pm_cancel_autoresume(struct device *dev)
+{
+	spin_lock(&dev->power.lock);
+
+	work_clear_pending(&dev->power.resume_work);
+	pm_runtime_reset(dev);
+
+	spin_unlock(&dev->power.lock);
+}
+
+/**
+ * pm_runtime_init - Initialize run-time PM fields in given device object.
+ * @dev: Device object to handle.
+ */
+void pm_runtime_init(struct device *dev)
+{
+	pm_runtime_reset(dev);
+	spin_lock_init(&dev->power.lock);
+	INIT_DELAYED_WORK(&dev->power.suspend_work, pm_runtime_suspend);
+	INIT_WORK(&dev->power.resume_work, pm_runtime_resume);
+}
Index: linux-2.6/include/linux/pm_runtime.h
===================================================================
--- /dev/null
+++ linux-2.6/include/linux/pm_runtime.h
@@ -0,0 +1,50 @@
+/*
+ * pm_runtime.h - Device run-time power management helper functions.
+ *
+ * Copyright (C) 2009 Rafael J. Wysocki <rjw@sisk.pl>
+ *
+ * This file is released under the GPLv2.
+ */
+
+#ifndef _LINUX_PM_RUNTIME_H
+#define _LINUX_PM_RUNTIME_H
+
+#include <linux/device.h>
+#include <linux/pm.h>
+
+#ifdef CONFIG_PM_RUNTIME
+extern struct workqueue_struct *pm_wq;
+
+extern void pm_runtime_init(struct device *dev);
+extern void pm_request_suspend(struct device *dev, unsigned long delay);
+extern void pm_request_resume(struct device *dev);
+extern int pm_resume_sync(struct device *dev);
+extern void pm_cancel_autosuspend(struct device *dev);
+extern void pm_cancel_autoresume(struct device *dev);
+
+static inline struct device *suspend_work_to_device(struct delayed_work *work)
+{
+	struct dev_pm_info *dpi;
+
+	dpi = container_of(work, struct dev_pm_info, suspend_work);
+	return container_of(dpi, struct device, power);
+}
+
+static inline struct device *resume_work_to_device(struct work_struct *work)
+{
+	struct dev_pm_info *dpi;
+
+	dpi = container_of(work, struct dev_pm_info, resume_work);
+	return container_of(dpi, struct device, power);
+}
+
+#else /* !CONFIG_PM_RUNTIME */
+static inline void pm_runtime_init(struct device *dev) {}
+static inline void pm_request_suspend(struct device *dev, unsigned long delay) {}
+static inline void pm_request_resume(struct device *dev) {}
+static inline int pm_resume_sync(struct device *dev) { return -ENOSYS; }
+static inline void pm_cancel_autosuspend(struct device *dev) {}
+static inline void pm_cancel_autoresume(struct device *dev) {}
+#endif /* !CONFIG_PM_RUNTIME */
+
+#endif
Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -21,6 +21,7 @@
 #include <linux/kallsyms.h>
 #include <linux/mutex.h>
 #include <linux/pm.h>
+#include <linux/pm_runtime.h>
 #include <linux/resume-trace.h>
 #include <linux/rwsem.h>
 #include <linux/interrupt.h>
@@ -88,6 +89,7 @@ void device_pm_add(struct device *dev)
 	}
 
 	list_add_tail(&dev->power.entry, &dpm_list);
+	pm_runtime_init(dev);
 	mutex_unlock(&dpm_list_mtx);
 }
 


* Re: [patch update] Re: Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-10 23:07                             ` Oliver Neukum
                                               ` (2 preceding siblings ...)
  (?)
@ 2009-06-11 13:46                             ` Rafael J. Wysocki
  -1 siblings, 0 replies; 199+ messages in thread
From: Rafael J. Wysocki @ 2009-06-11 13:46 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: ACPI Devel Maling List, linux-pm, LKML

On Thursday 11 June 2009, Oliver Neukum wrote:
> Am Donnerstag, 11. Juni 2009 00:01:20 schrieb Rafael J. Wysocki:
> > We have queued up resume requests for the device's parent, its parent etc.,
> > the topmost one goes first.  The workqueue is singlethread, so
> > pm_autoresume() is going to be run for all parents before the device
> > itself, so if that were the only resume mechanism, it would be enough to
> > check if the parent is RPM_ACTIVE.
> 
>          A (IDLE)
>         /        \
> B (SUSPENDED)    C (SUSPENDED)
> 
> Suppose C is to be resumed. This means first in case of A the request
> to suspend would be cancelled. Here you drop the locks:
> 
> +           && (dev->parent->power.runtime_status == RPM_IDLE
> +             || dev->parent->power.runtime_status == RPM_SUSPENDING
> +             || dev->parent->power.runtime_status == RPM_SUSPENDED)) {
> +               spin_unlock_irqrestore(&dev->power.lock, flags);
> +               spin_unlock_irqrestore(&dev->parent->power.lock, parent_flags);
> +
> +               /* We have to resume the parent first. */
> +               pm_request_resume(dev->parent);
> 
> But after pm_request_resume() returns there's no means to make sure
> nothing alters it back to RPM_SUSPENDED. The workqueue doesn't help
> you because you've scheduled nothing by that time. The suspension will
> work because C is still in RPM_SUSPENDED.

That is exactly the bug I told you about in one of the previous messages. :-)

The solution I used in the current version of the patch (appended) is to have
separate bits for RPM_WAKE and RPM_SUSPENDED (and for the other status
constants), so that both of them can be set at the same time.

Well, there probably still are some bugs lurking in it ...
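
To illustrate the separate-bits idea outside the kernel, here is a minimal
user-space sketch.  The RPM_* values are copied from the include/linux/pm.h
hunk of the appended patch; the helper names (rpm_request_wake(),
rpm_suspend_done(), rpm_flags_selfcheck()) are hypothetical and exist only for
this illustration.  It walks through the case where a resume request arrives
while ->runtime_suspend() is still running:

```c
#include <assert.h>

/* Values copied from the include/linux/pm.h hunk of the patch. */
#define RPM_ACTIVE	0
#define RPM_IDLE	0x01
#define RPM_SUSPENDING	0x02
#define RPM_SUSPENDED	0x04
#define RPM_WAKE	0x08
#define RPM_RESUMING	0x10

/* Hypothetical helper: record a resume request without losing other bits. */
static unsigned int rpm_request_wake(unsigned int status)
{
	return status | RPM_WAKE;
}

/* Hypothetical helper: the suspend callback has returned 0. */
static unsigned int rpm_suspend_done(unsigned int status)
{
	status &= ~RPM_SUSPENDING;
	return status | RPM_SUSPENDED;
}

/* Suspend is running, a wake-up is requested, then the suspend completes. */
static void rpm_flags_selfcheck(void)
{
	unsigned int status = RPM_SUSPENDING;

	status = rpm_request_wake(status);
	status = rpm_suspend_done(status);

	/* Both facts survive: the device is suspended and a resume is pending. */
	assert(status == (RPM_SUSPENDED | RPM_WAKE));
	/* A plain equality test against RPM_SUSPENDED no longer matches. */
	assert(status != RPM_SUSPENDED);
}
```

With a single-valued status the wake-up request would have been overwritten
when the suspend completed; with one bit per condition it is preserved until
the resume work item runs.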

Best,
Rafael

---
 drivers/base/power/Makefile  |    1 
 drivers/base/power/main.c    |    2 
 drivers/base/power/runtime.c |  415 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/pm.h           |   82 ++++++++
 include/linux/pm_runtime.h   |   50 +++++
 kernel/power/Kconfig         |   14 +
 kernel/power/main.c          |   17 +
 7 files changed, 578 insertions(+), 3 deletions(-)

Index: linux-2.6/kernel/power/Kconfig
===================================================================
--- linux-2.6.orig/kernel/power/Kconfig
+++ linux-2.6/kernel/power/Kconfig
@@ -208,3 +208,17 @@ config APM_EMULATION
 	  random kernel OOPSes or reboots that don't seem to be related to
 	  anything, try disabling/enabling this option (or disabling/enabling
 	  APM in your BIOS).
+
+config PM_RUNTIME
+	bool "Run-time PM core functionality"
+	depends on PM
+	---help---
+	  Enable functionality allowing I/O devices to be put into energy-saving
+	  (low power) states at run time (or autosuspended) after a specified
+	  period of inactivity and woken up in response to a hardware-generated
+	  wake-up event or a driver's request.
+
+	  Hardware support is generally required for this functionality to work
+	  and the bus type drivers of the buses the devices are on are
+	  responsible for the actual handling of the autosuspend requests and
+	  wake-up events.
Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -11,6 +11,7 @@
 #include <linux/kobject.h>
 #include <linux/string.h>
 #include <linux/resume-trace.h>
+#include <linux/workqueue.h>
 
 #include "power.h"
 
@@ -217,8 +218,24 @@ static struct attribute_group attr_group
 	.attrs = g,
 };
 
+#ifdef CONFIG_PM_RUNTIME
+struct workqueue_struct *pm_wq;
+
+static int __init pm_start_workqueue(void)
+{
+	pm_wq = create_freezeable_workqueue("pm");
+
+	return pm_wq ? 0 : -ENOMEM;
+}
+#else
+static inline int pm_start_workqueue(void) { return 0; }
+#endif
+
 static int __init pm_init(void)
 {
+	int error = pm_start_workqueue();
+	if (error)
+		return error;
 	power_kobj = kobject_create_and_add("power", NULL);
 	if (!power_kobj)
 		return -ENOMEM;
Index: linux-2.6/include/linux/pm.h
===================================================================
--- linux-2.6.orig/include/linux/pm.h
+++ linux-2.6/include/linux/pm.h
@@ -22,6 +22,9 @@
 #define _LINUX_PM_H
 
 #include <linux/list.h>
+#include <linux/workqueue.h>
+#include <linux/spinlock.h>
+#include <linux/completion.h>
 
 /*
  * Callbacks for platform drivers to implement.
@@ -165,6 +168,15 @@ typedef struct pm_message {
  * It is allowed to unregister devices while the above callbacks are being
  * executed.  However, it is not allowed to unregister a device from within any
  * of its own callbacks.
+ *
+ * There also are two callbacks related to run-time power management of devices:
+ *
+ * @runtime_suspend: Save the device registers and put it into an energy-saving
+ *	(low power) state at run time, enable wake-up events as appropriate.
+ *
+ * @runtime_resume: Put the device into the full power state and restore its
+ *	registers (if applicable) at run time, in response to a wake-up event
+ *	generated by hardware or at a request of software.
  */
 
 struct dev_pm_ops {
@@ -182,6 +194,10 @@ struct dev_pm_ops {
 	int (*thaw_noirq)(struct device *dev);
 	int (*poweroff_noirq)(struct device *dev);
 	int (*restore_noirq)(struct device *dev);
+#ifdef CONFIG_PM_RUNTIME
+	int (*runtime_suspend)(struct device *dev);
+	int (*runtime_resume)(struct device *dev);
+#endif
 };
 
 /**
@@ -315,14 +331,74 @@ enum dpm_state {
 	DPM_OFF_IRQ,
 };
 
+/**
+ * Device run-time power management state.
+ *
+ * These state labels are used internally by the PM core to indicate the current
+ * status of a device with respect to the PM core operations.  They do not
+ * reflect the actual power state of the device or its status as seen by the
+ * driver.
+ *
+ * RPM_ACTIVE		Device is fully operational, no run-time PM requests are
+ *			pending for it.
+ *
+ * RPM_IDLE		It has been requested that the device be suspended.
+ *			Suspend request has been put into the run-time PM
+ *			workqueue and it's pending execution.
+ *
+ * RPM_SUSPENDING	Device bus type's ->runtime_suspend() callback is being
+ *			executed.
+ *
+ * RPM_SUSPENDED	Device bus type's ->runtime_suspend() callback has
+ *			completed successfully.  The device is regarded as
+ *			suspended.
+ *
+ * RPM_WAKE		It has been requested that the device be woken up.
+ *			Resume request has been put into the run-time PM
+ *			workqueue and it's pending execution.
+ *
+ * RPM_RESUMING		Device bus type's ->runtime_resume() callback is being
+ *			executed.
+ *
+ * RPM_ERROR		Represents a condition from which the PM core cannot
+ *			recover by itself.  If the device's run-time PM status
+ *			field has this value, all of the run-time PM operations
+ *			carried out for the device by the core will fail, until
+ *			the status field is changed to either RPM_ACTIVE or
+ *			RPM_SUSPENDED (it is not valid to use the other values
+ *			in such a situation) by the device's driver or bus type.
+ *			This happens when the device bus type's
+ *			->runtime_suspend() or ->runtime_resume() callback
+ *			returns error code different from -EAGAIN or -EBUSY.
+ */
+
+#define RPM_ACTIVE	0
+#define RPM_IDLE	0x01
+#define RPM_SUSPENDING	0x02
+#define RPM_SUSPENDED	0x04
+#define RPM_WAKE	0x08
+#define RPM_RESUMING	0x10
+
+#define RPM_IN_SUSPEND	(RPM_SUSPENDING | RPM_SUSPENDED)
+#define RPM_INACTIVE	(RPM_IDLE | RPM_IN_SUSPEND)
+#define RPM_ERROR	(-1)
+
 struct dev_pm_info {
 	pm_message_t		power_state;
-	unsigned		can_wakeup:1;
-	unsigned		should_wakeup:1;
+	unsigned int		can_wakeup:1;
+	unsigned int		should_wakeup:1;
 	enum dpm_state		status;		/* Owned by the PM core */
-#ifdef	CONFIG_PM_SLEEP
+#ifdef CONFIG_PM_SLEEP
 	struct list_head	entry;
 #endif
+#ifdef CONFIG_PM_RUNTIME
+	struct delayed_work	suspend_work;
+	struct work_struct	resume_work;
+	struct completion	work_done;
+	unsigned int		suspend_aborted:1;
+	unsigned int		runtime_status:5;
+	spinlock_t		lock;
+#endif
 };
 
 /*
Index: linux-2.6/drivers/base/power/Makefile
===================================================================
--- linux-2.6.orig/drivers/base/power/Makefile
+++ linux-2.6/drivers/base/power/Makefile
@@ -1,5 +1,6 @@
 obj-$(CONFIG_PM)	+= sysfs.o
 obj-$(CONFIG_PM_SLEEP)	+= main.o
+obj-$(CONFIG_PM_RUNTIME)	+= runtime.o
 obj-$(CONFIG_PM_TRACE_RTC)	+= trace.o
 
 ccflags-$(CONFIG_DEBUG_DRIVER) := -DDEBUG
Index: linux-2.6/drivers/base/power/runtime.c
===================================================================
--- /dev/null
+++ linux-2.6/drivers/base/power/runtime.c
@@ -0,0 +1,415 @@
+/*
+ * drivers/base/power/runtime.c - Helper functions for device run-time PM
+ *
+ * Copyright (c) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
+ *
+ * This file is released under the GPLv2.
+ */
+
+#include <linux/pm_runtime.h>
+
+/**
+ * pm_runtime_reset - Clear all of the device run-time PM flags.
+ * @dev: Device object to clear the flags for.
+ */
+static void pm_runtime_reset(struct device *dev)
+{
+	dev->power.suspend_aborted = false;
+	dev->power.runtime_status = RPM_ACTIVE;
+}
+
+/**
+ * pm_device_suspended - Check if given device has been suspended at run time.
+ * @dev: Device to check.
+ * @data: Ignored.
+ *
+ * Returns 0 if the device has been suspended and it hasn't been requested to
+ * resume or -EBUSY otherwise.
+ */
+static int pm_device_suspended(struct device *dev, void *data)
+{
+	return dev->power.runtime_status == RPM_SUSPENDED ? 0 : -EBUSY;
+}
+
+/**
+ * pm_check_children - Check if all children of a device have been suspended.
+ * @dev: Device to check.
+ *
+ * Returns 0 if all children of the device have been suspended or -EBUSY
+ * otherwise.
+ */
+static int pm_check_children(struct device *dev)
+{
+	return device_for_each_child(dev, NULL, pm_device_suspended);
+}
+
+/**
+ * pm_runtime_suspend - Run a device bus type's runtime_suspend() callback.
+ * @work: Work structure used for scheduling the execution of this function.
+ *
+ * Use @work to get the device object the suspend has been scheduled for,
+ * check if the suspend request hasn't been cancelled and run the
+ * ->runtime_suspend() callback provided by the device's bus type driver.
+ * Update the run-time PM flags in the device object to reflect the current
+ * status of the device.
+ */
+static void pm_runtime_suspend(struct work_struct *work)
+{
+	struct delayed_work *dw = to_delayed_work(work);
+	struct device *dev = suspend_work_to_device(dw);
+	int error = 0;
+
+	spin_lock(&dev->power.lock);
+
+	if (dev->power.suspend_aborted) {
+		dev->power.runtime_status = RPM_ACTIVE;
+		goto out;
+	} else if (dev->power.runtime_status != RPM_IDLE) {
+		goto out;
+	} else if (pm_check_children(dev)) {
+		/*
+		 * We can only suspend the device if all of its children have
+		 * been suspended.
+		 */
+		goto out;
+	}
+
+	dev->power.runtime_status = RPM_SUSPENDING;
+	init_completion(&dev->power.work_done);
+
+	spin_unlock(&dev->power.lock);
+
+	if (dev && dev->bus && dev->bus->pm && dev->bus->pm->runtime_suspend)
+		error = dev->bus->pm->runtime_suspend(dev);
+
+	spin_lock(&dev->power.lock);
+
+	/*
+	 * Resume request might have been queued in the meantime, in which case
+	 * the RPM_WAKE bit is also set in runtime_status.
+	 */
+	dev->power.runtime_status &= ~RPM_SUSPENDING;
+	switch (error) {
+	case 0:
+		dev->power.runtime_status |= RPM_SUSPENDED;
+		break;
+	case -EAGAIN:
+	case -EBUSY:
+		dev->power.runtime_status = RPM_ACTIVE;
+		break;
+	default:
+		dev->power.runtime_status = RPM_ERROR;
+	}
+	complete(&dev->power.work_done);
+
+ out:
+	spin_unlock(&dev->power.lock);
+}
+
+/**
+ * pm_request_suspend - Schedule run-time suspend of given device.
+ * @dev: Device to suspend.
+ * @delay: Time to wait before attempting to suspend the device.
+ */
+void pm_request_suspend(struct device *dev, unsigned long delay)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.runtime_status != RPM_ACTIVE)
+		goto out;
+
+	dev->power.runtime_status = RPM_IDLE;
+	dev->power.suspend_aborted = false;
+	queue_delayed_work(pm_wq, &dev->power.suspend_work, delay);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+
+/**
+ * pm_cancel_suspend - Cancel a pending suspend request for given device.
+ * @dev: Device to cancel the suspend request for.
+ *
+ * Should be called with dev->power.lock held and only if we are sure that the
+ * ->runtime_suspend() callback hasn't started yet.
+ */
+static void pm_cancel_suspend(struct device *dev)
+{
+	dev->power.suspend_aborted = true;
+	cancel_delayed_work(&dev->power.suspend_work);
+	dev->power.runtime_status = RPM_ACTIVE;
+}
+
+/**
+ * pm_runtime_resume - Run a device bus type's runtime_resume() callback.
+ * @work: Work structure used for scheduling the execution of this function.
+ *
+ * Use @work to get the device object the resume has been scheduled for,
+ * check if the device is really suspended and run the ->runtime_resume()
+ * callback provided by the device's bus type driver.  Update the run-time PM
+ * flags in the device object to reflect the current status of the device.
+ */
+static void pm_runtime_resume(struct work_struct *work)
+{
+	struct device *dev = resume_work_to_device(work);
+	int error = 0;
+
+	if (dev->parent)
+		spin_lock(&dev->parent->power.lock);
+	spin_lock(&dev->power.lock);
+
+	/*
+	 * Since the PM workqueue is singlethread, this function cannot run
+	 * in parallel with pm_runtime_suspend().  For this reason it is not
+	 * necessary to check if RPM_SUSPENDING is set in runtime_status of the
+	 * device.
+	 */
+ repeat:
+	if (!(dev->power.runtime_status & RPM_WAKE)) {
+		if (dev->parent)
+			spin_unlock(&dev->parent->power.lock);
+		goto out;
+	} else if (dev->parent
+	    && dev->parent->power.runtime_status != RPM_ACTIVE) {
+		/*
+		 * Although this function cannot run in parallel with another
+		 * instance of itself, it may be running in parallel with the
+		 * synchronous resume of another device.  In particular, that
+		 * may be the device's parent.
+		 */
+		if (dev->parent->power.runtime_status & RPM_RESUMING) {
+			spin_unlock(&dev->power.lock);
+			spin_unlock(&dev->parent->power.lock);
+
+			wait_for_completion(&dev->parent->power.work_done);
+
+			spin_lock(&dev->parent->power.lock);
+			spin_lock(&dev->power.lock);
+		}
+		if (dev->parent->power.runtime_status != RPM_ACTIVE) {
+			spin_unlock(&dev->parent->power.lock);
+			goto out;
+		}
+		goto repeat;
+	}
+
+	dev->power.runtime_status = RPM_RESUMING;
+	init_completion(&dev->power.work_done);
+
+	spin_unlock(&dev->power.lock);
+	if (dev->parent)
+		spin_unlock(&dev->parent->power.lock);
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_resume)
+		error = dev->bus->pm->runtime_resume(dev);
+
+	spin_lock(&dev->power.lock);
+
+	switch (error) {
+	case 0:
+		dev->power.runtime_status = RPM_ACTIVE;
+		break;
+	case -EAGAIN:
+	case -EBUSY:
+		dev->power.runtime_status = RPM_SUSPENDED;
+		break;
+	default:
+		dev->power.runtime_status = RPM_ERROR;
+	}
+	complete(&dev->power.work_done);
+
+ out:
+	spin_unlock(&dev->power.lock);
+}
+
+/**
+ * pm_request_resume - Schedule run-time resume of given device.
+ * @dev: Device to resume.
+ */
+void pm_request_resume(struct device *dev)
+{
+	unsigned long parent_flags = 0, flags;
+
+ repeat:
+	if (dev->parent)
+		spin_lock_irqsave(&dev->parent->power.lock, parent_flags);
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.runtime_status == RPM_IDLE) {
+		/* Autosuspend request is pending, no need to resume. */
+		pm_cancel_suspend(dev);
+		goto out;
+	} else if (!(dev->power.runtime_status & RPM_IN_SUSPEND)) {
+		goto out;
+	} else if (dev->parent
+	    && (dev->parent->power.runtime_status & RPM_INACTIVE)) {
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+		spin_unlock_irqrestore(&dev->parent->power.lock, parent_flags);
+
+		/* We have to resume the parent first. */
+		pm_request_resume(dev->parent);
+
+		goto repeat;
+	}
+
+	/*
+	 * The device may be suspending at the moment and we can't clear the
+	 * RPM_SUSPENDING bit in its runtime_status just yet.
+	 */
+	dev->power.runtime_status |= RPM_WAKE;
+	queue_work(pm_wq, &dev->power.resume_work);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+	if (dev->parent)
+		spin_unlock_irqrestore(&dev->parent->power.lock, parent_flags);
+}
+
+/**
+ * pm_resume_sync - Resume given device waiting for the operation to complete.
+ * @dev: Device to resume.
+ *
+ * Resume the device synchronously, waiting for the operation to complete.  If
+ * autosuspend is in progress while this function is being run, wait for it to
+ * finish before resuming the device.  If the autosuspend is scheduled, but it
+ * hasn't started yet, cancel it and we're done.
+ */
+int pm_resume_sync(struct device *dev)
+{
+	int error = 0;
+
+	spin_lock(&dev->power.lock);
+
+	if (dev->power.runtime_status == RPM_ACTIVE) {
+		goto out;
+	} else if (dev->power.runtime_status == RPM_IDLE) {
+		/* ->runtime_suspend() hasn't started yet, no need to resume. */
+		pm_cancel_suspend(dev);
+		goto out;
+	}
+
+	if (dev->power.runtime_status & RPM_SUSPENDING) {
+		spin_unlock(&dev->power.lock);
+
+		/*
+		 * The ->runtime_suspend() callback is being executed right now,
+		 * wait for it to complete.
+		 */
+		wait_for_completion(&dev->power.work_done);
+	} else if (dev->power.runtime_status == RPM_SUSPENDED && dev->parent) {
+		spin_unlock(&dev->power.lock);
+
+		/* The device's parent may also be suspended.  Resume it. */
+		error = pm_resume_sync(dev->parent);
+		if (error)
+			return error;
+	} else {
+		spin_unlock(&dev->power.lock);
+	}
+
+	if (dev->parent)
+		spin_lock(&dev->parent->power.lock);
+	spin_lock(&dev->power.lock);
+
+	if (dev->power.runtime_status & RPM_WAKE) {
+		/* There's a pending resume request that can be cancelled. */
+		work_clear_pending(&dev->power.resume_work);
+	} else if (dev->power.runtime_status == RPM_RESUMING) {
+		spin_unlock(&dev->power.lock);
+		if (dev->parent)
+			spin_unlock(&dev->parent->power.lock);
+
+		/*
+		 * There's another resume running in parallel with us. Wait for
+		 * it to complete.
+		 */
+		wait_for_completion(&dev->power.work_done);
+
+		return dev->power.runtime_status == RPM_ACTIVE ? 0 : -EAGAIN;
+	} else if (!(dev->power.runtime_status & RPM_SUSPENDED)) {
+		error = -EINVAL;
+		if (dev->parent)
+			spin_unlock(&dev->parent->power.lock);
+		goto out;
+	}
+
+	dev->power.runtime_status = RPM_RESUMING;
+	init_completion(&dev->power.work_done);
+
+	spin_unlock(&dev->power.lock);
+	if (dev->parent)
+		spin_unlock(&dev->parent->power.lock);
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_resume)
+		error = dev->bus->pm->runtime_resume(dev);
+
+	spin_lock(&dev->power.lock);
+
+	switch (error) {
+	case 0:
+		dev->power.runtime_status = RPM_ACTIVE;
+		break;
+	case -EAGAIN:
+	case -EBUSY:
+		dev->power.runtime_status = RPM_SUSPENDED;
+		break;
+	default:
+		dev->power.runtime_status = RPM_ERROR;
+	}
+	complete(&dev->power.work_done);
+
+ out:
+	spin_unlock(&dev->power.lock);
+
+	return error;
+}
+
+/**
+ * pm_cancel_autosuspend - Cancel a pending autosuspend request for given device
+ * @dev: Device to handle.
+ *
+ * This routine is only supposed to be called when the run-time PM workqueue is
+ * frozen (i.e. during system-wide suspend or hibernation) when it is guaranteed
+ * that no work items are being executed.
+ */
+void pm_cancel_autosuspend(struct device *dev)
+{
+	spin_lock(&dev->power.lock);
+
+	cancel_delayed_work(&dev->power.suspend_work);
+	pm_runtime_reset(dev);
+
+	spin_unlock(&dev->power.lock);
+}
+
+/**
+ * pm_cancel_autoresume - Cancel a pending autoresume request for given device
+ * @dev: Device to handle.
+ *
+ * This routine is only supposed to be called when the run-time PM workqueue is
+ * frozen (i.e. during system-wide suspend or hibernation) when it is guaranteed
+ * that no work items are being executed.
+ */
+void pm_cancel_autoresume(struct device *dev)
+{
+	spin_lock(&dev->power.lock);
+
+	work_clear_pending(&dev->power.resume_work);
+	pm_runtime_reset(dev);
+
+	spin_unlock(&dev->power.lock);
+}
+
+/**
+ * pm_runtime_init - Initialize run-time PM fields in given device object.
+ * @dev: Device object to handle.
+ */
+void pm_runtime_init(struct device *dev)
+{
+	pm_runtime_reset(dev);
+	spin_lock_init(&dev->power.lock);
+	INIT_DELAYED_WORK(&dev->power.suspend_work, pm_runtime_suspend);
+	INIT_WORK(&dev->power.resume_work, pm_runtime_resume);
+}
Index: linux-2.6/include/linux/pm_runtime.h
===================================================================
--- /dev/null
+++ linux-2.6/include/linux/pm_runtime.h
@@ -0,0 +1,50 @@
+/*
+ * pm_runtime.h - Device run-time power management helper functions.
+ *
+ * Copyright (C) 2009 Rafael J. Wysocki <rjw@sisk.pl>
+ *
+ * This file is released under the GPLv2.
+ */
+
+#ifndef _LINUX_PM_RUNTIME_H
+#define _LINUX_PM_RUNTIME_H
+
+#include <linux/device.h>
+#include <linux/pm.h>
+
+#ifdef CONFIG_PM_RUNTIME
+extern struct workqueue_struct *pm_wq;
+
+extern void pm_runtime_init(struct device *dev);
+extern void pm_request_suspend(struct device *dev, unsigned long delay);
+extern void pm_request_resume(struct device *dev);
+extern int pm_resume_sync(struct device *dev);
+extern void pm_cancel_autosuspend(struct device *dev);
+extern void pm_cancel_autoresume(struct device *dev);
+
+static inline struct device *suspend_work_to_device(struct delayed_work *work)
+{
+	struct dev_pm_info *dpi;
+
+	dpi = container_of(work, struct dev_pm_info, suspend_work);
+	return container_of(dpi, struct device, power);
+}
+
+static inline struct device *resume_work_to_device(struct work_struct *work)
+{
+	struct dev_pm_info *dpi;
+
+	dpi = container_of(work, struct dev_pm_info, resume_work);
+	return container_of(dpi, struct device, power);
+}
+
+#else /* !CONFIG_PM_RUNTIME */
+static inline void pm_runtime_init(struct device *dev) {}
+static inline void pm_request_suspend(struct device *dev, unsigned long delay) {}
+static inline void pm_request_resume(struct device *dev) {}
+static inline int pm_resume_sync(struct device *dev) { return -ENOSYS; }
+static inline void pm_cancel_autosuspend(struct device *dev) {}
+static inline void pm_cancel_autoresume(struct device *dev) {}
+#endif /* !CONFIG_PM_RUNTIME */
+
+#endif
Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -21,6 +21,7 @@
 #include <linux/kallsyms.h>
 #include <linux/mutex.h>
 #include <linux/pm.h>
+#include <linux/pm_runtime.h>
 #include <linux/resume-trace.h>
 #include <linux/rwsem.h>
 #include <linux/interrupt.h>
@@ -88,6 +89,7 @@ void device_pm_add(struct device *dev)
 	}
 
 	list_add_tail(&dev->power.entry, &dpm_list);
+	pm_runtime_init(dev);
 	mutex_unlock(&dpm_list_mtx);
 }
 

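As an aside, and not part of the patch above: pm_runtime_resume() and
pm_resume_sync() both end with the same switch that maps the bus type
callback's return value to a run-time PM status.  A minimal user-space sketch
of that mapping follows, assuming the RPM_* values from the pm.h hunk;
rpm_status_after_resume() is a hypothetical name used only for illustration.

```c
#include <assert.h>
#include <errno.h>

/* Values copied from the include/linux/pm.h hunk of the patch. */
#define RPM_ACTIVE	0
#define RPM_SUSPENDED	0x04
#define RPM_ERROR	(-1)

/*
 * Hypothetical helper: status the device ends up with after its
 * ->runtime_resume() callback has returned @error.
 */
static int rpm_status_after_resume(int error)
{
	switch (error) {
	case 0:
		/* Resume succeeded, the device is operational again. */
		return RPM_ACTIVE;
	case -EAGAIN:
	case -EBUSY:
		/* Temporary failure, the device stays suspended. */
		return RPM_SUSPENDED;
	default:
		/* The core cannot recover from anything else by itself. */
		return RPM_ERROR;
	}
}
```

The suspend path uses the mirror image of this mapping: 0 leads to
RPM_SUSPENDED, while -EAGAIN and -EBUSY lead back to RPM_ACTIVE.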

* Re: [patch update] Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-10 23:42                               ` Alan Stern
  (?)
@ 2009-06-11 13:48                               ` Rafael J. Wysocki
  2009-06-11 13:57                                 ` [patch update] " Oliver Neukum
  2009-06-11 13:57                                 ` [patch update] Re: [linux-pm] " Oliver Neukum
  -1 siblings, 2 replies; 199+ messages in thread
From: Rafael J. Wysocki @ 2009-06-11 13:48 UTC (permalink / raw)
  To: Alan Stern; +Cc: Oliver Neukum, linux-pm, ACPI Devel Maling List, LKML

On Thursday 11 June 2009, Alan Stern wrote:
> On Thu, 11 Jun 2009, Oliver Neukum wrote:
> 
> > Am Donnerstag, 11. Juni 2009 00:01:20 schrieb Rafael J. Wysocki:
> > > We have queued up resume requests for the device's parent, its parent etc.,
> > > the topmost one goes first.  The workqueue is singlethread, so
> > > pm_autoresume() is going to be run for all parents before the device
> > > itself, so if that were the only resume mechanism, it would be enough to
> > > check if the parent is RPM_ACTIVE.
> > 
> >          A (IDLE)
> >         /        \
> > B (SUSPENDED)    C (SUSPENDED)
> > 
> > Suppose C is to be resumed. This means first in case of A the request
> > to suspend would be cancelled. Here you drop the locks:
> > 
> > +           && (dev->parent->power.runtime_status == RPM_IDLE
> > +             || dev->parent->power.runtime_status == RPM_SUSPENDING
> > +             || dev->parent->power.runtime_status == RPM_SUSPENDED)) {
> > +               spin_unlock_irqrestore(&dev->power.lock, flags);
> > +               spin_unlock_irqrestore(&dev->parent->power.lock, parent_flags);
> > +
> > +               /* We have to resume the parent first. */
> > +               pm_request_resume(dev->parent);
> > 
> > But after pm_request_resume() returns there's no means to make sure
> > nothing alters it back to RPM_SUSPENDED. The workqueue doesn't help
> > you because you've scheduled nothing by that time. The suspension will
> > work because C is still in RPM_SUSPENDED.
> 
> This is an example where usage counters come in handy.

Do you mean we can count suspend/resume requests for a device?

Rafael

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-11 13:48                               ` Rafael J. Wysocki
  2009-06-11 13:57                                 ` [patch update] " Oliver Neukum
@ 2009-06-11 13:57                                 ` Oliver Neukum
  2009-06-11 14:16                                   ` [patch update] " Alan Stern
  2009-06-11 14:16                                     ` Alan Stern
  1 sibling, 2 replies; 199+ messages in thread
From: Oliver Neukum @ 2009-06-11 13:57 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Alan Stern, linux-pm, ACPI Devel Maling List, LKML

On Thursday, 11 June 2009 15:48:33, Rafael J. Wysocki wrote:
> > > But after pm_request_resume() returns there's no means to make sure
> > > nothing alters it back to RPM_SUSPENDED. The workqueue doesn't help
> > > you because you've scheduled nothing by that time. The suspension will
> > > work because C is still in RPM_SUSPENDED.
> >
> > This is an example where usage counters come in handy.
>
> Do you mean we can count suspend/resume requests for a device?

No, we count reasons a device cannot be suspended. Drivers are allowed to
add their own reasons. The core uses that mechanism to indicate that an
ongoing resumption lower down is also a reason.
The count going to zero is equivalent to a request to suspend.
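
As a minimal sketch of such a counter (not taken from the patch: struct
rpm_dev, rpm_get() and rpm_put() are hypothetical names, and locking is
left out), the idea could look roughly like this in user-space C:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the run-time PM fields of struct device. */
struct rpm_dev {
	int usage_count;        /* number of reasons the device must stay active */
	bool suspend_requested; /* set when the count drops to zero */
};

/* Register one more reason not to suspend; cancels any pending request. */
static void rpm_get(struct rpm_dev *dev)
{
	dev->usage_count++;
	dev->suspend_requested = false;
}

/* Drop one reason; reaching zero is equivalent to a request to suspend. */
static void rpm_put(struct rpm_dev *dev)
{
	assert(dev->usage_count > 0);
	if (--dev->usage_count == 0)
		dev->suspend_requested = true;
}
```

Each rpm_get() is meant to be paired with one rpm_put(); in the kernel the
counter would additionally need a lock or an atomic type.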

	Regards
		Oliver


^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-11 13:57                                 ` [patch update] Re: [linux-pm] " Oliver Neukum
@ 2009-06-11 14:16                                     ` Alan Stern
  2009-06-11 14:16                                     ` Alan Stern
  1 sibling, 0 replies; 199+ messages in thread
From: Alan Stern @ 2009-06-11 14:16 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: Rafael J. Wysocki, linux-pm, ACPI Devel Maling List, LKML

On Thu, 11 Jun 2009, Oliver Neukum wrote:

> On Thursday, 11 June 2009 15:48:33, Rafael J. Wysocki wrote:
> > > > But after pm_request_resume() returns there's no means to make sure
> > > > nothing alters it back to RPM_SUSPENDED. The workqueue doesn't help
> > > > you because you've scheduled nothing by that time. The suspension will
> > > > work because C is still in RPM_SUSPENDED.
> > >
> > > This is an example where usage counters come in handy.
> >
> > Do you mean we can count suspend/resume requests for a device?
> 
> No, we count reasons a device cannot be suspended. Drivers are allowed to
> add their own reasons. The core uses that mechanism to indicate that an
> ongoing resumption lower down is also a reason.
> The count going to zero is equivalent to a request to suspend.

Right.

Here's a related thought.  Change the resume routines as follows:

void pm_runtime_resume(struct device *dev)
{
	/* Do the actual resume ... */
}
EXPORT_SYMBOL_GPL(pm_runtime_resume);

static void pm_runtime_resume_work(struct work_struct *work)
{
	pm_runtime_resume(resume_work_to_device(work));
}

Then there's no need for a separate pm_resume_sync(); drivers can
simply call pm_runtime_resume() directly.  The same trick works for 
suspending.

Of course, this means you have to give up the notion that all suspends 
and resumes are funnelled through the workqueue.  IMO that notion isn't 
worth keeping in any case.
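
To make the layering concrete, here is a user-space sketch under stated
assumptions: struct device and struct work_struct are cut-down stand-ins,
the body of resume_work_to_device() is a guess at a container_of()-style
helper, and the resume itself is only a placeholder counter.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-ins for the kernel types; not the real definitions. */
struct work_struct {
	void (*func)(struct work_struct *work);
};

struct device {
	int resumed;                    /* counts completed resumes */
	struct work_struct resume_work; /* embedded work item */
};

/* Assumed helper: recover the device from its embedded work item. */
static struct device *resume_work_to_device(struct work_struct *work)
{
	return (struct device *)((char *)work -
				 offsetof(struct device, resume_work));
}

/* Synchronous entry point that drivers may call directly. */
static void pm_runtime_resume(struct device *dev)
{
	dev->resumed++; /* placeholder for the actual resume */
}

/* Workqueue entry point: a thin wrapper around the synchronous routine. */
static void pm_runtime_resume_work(struct work_struct *work)
{
	pm_runtime_resume(resume_work_to_device(work));
}
```

Both paths end up in the same routine, so the only difference between a
queued and a direct resume is who calls it and in which context.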

Alan Stern


^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-10 23:42                       ` Alan Stern
@ 2009-06-11 14:17                           ` Rafael J. Wysocki
  0 siblings, 0 replies; 199+ messages in thread
From: Rafael J. Wysocki @ 2009-06-11 14:17 UTC (permalink / raw)
  To: Alan Stern; +Cc: ACPI Devel Maling List, Linux-pm mailing list, LKML

On Thursday 11 June 2009, Alan Stern wrote:
> On Wed, 10 Jun 2009, Rafael J. Wysocki wrote:
> 
> > > You know, it doesn't make any sense to have a suspend and a resume 
> > > both pending at the same time.
> > >
> > > So you could add only a delayed_work structure and use its embedded
> > > work_struct for resume requests.
> > 
> > I thought so too, but I was wrong. ;-)
> > 
> > If resume is requested while the suspend hasn't completed yet, we should
> > queue it (it's totally valid to request a suspending device to resume IMO), but
> > the delayed work is still being used by the workqueue code, so we can't modify
> > it.
> 
> Where is the delayed work still being used?  There's even a comment in 
> run_workqueue() that says a work_struct can be freed by the function it 
> calls.

You are right, I overlooked the comment and it wasn't clear to me from looking
at the code.

> > > We might want to do a runtime suspend even if the device's children
> > > aren't already suspended.  For example, you could suspend a link while
> > > leaving the device on the other end of the link at full power --
> > > especially if powering down the device is slow but changing the link's
> > > power level is fast.
> > 
> > Well, this means that the dependencies between devices in the device tree are
> > pretty much useless for the run-time PM as far as the core is concerned.  In
> > which case, why did you mention them at all?
> 
> The dependencies aren't totally useless.  It's still true that before
> you resume a device, you have to autoresume its parent.

Well, in fact if we don't have the requirement that the children of a device
have to be suspended for it to be able to suspend too, we have to check
all parents up the device tree up to the one that doesn't have a parent
and autoresume the ones that aren't active.

> And it's still true that when you suspend a device, the parent should be
> given a chance to autosuspend.
> 
> I guess the real point is that the decision about whether all children
> must be suspended should be made by the driver, not the PM core.

The point here is what the core is supposed to do.  Does it need to handle
this at all or leave it to the bus type and driver?

After reconsidering it for a while I think that we should define what
"suspended" is supposed to mean from the core point of view.  And my opinion
is that it should mean "device doesn't communicate with the CPUs and RAM due
to power management".  That need not be power management of the device itself,
but such that leads to the device not doing I/O.

Under this definition all devices behind an inactive link are suspended,
because they can't do any I/O.  Which appears to make sense, because their
drivers have to be notified before the link is suspended and the link has to be
turned on for the devices to be able to communicate with the CPU and RAM.

If this definition is adopted, then it's quite clear that the device can only
be suspended if all of its children are suspended and it's always necessary
to resume the parent of a device in order to resume the device itself.
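
A rough user-space sketch of these two rules, assuming a hypothetical
struct node with a per-device count of active children (none of these
names come from the patch, and locking is ignored):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum rpm_status { RPM_ACTIVE, RPM_SUSPENDED };

/* Hypothetical device node; the real struct device is far richer. */
struct node {
	struct node *parent;
	enum rpm_status status;
	int active_children; /* children currently in RPM_ACTIVE */
};

/* A device may suspend only once all of its children are suspended. */
static bool node_suspend(struct node *n)
{
	if (n->status == RPM_SUSPENDED)
		return true;
	if (n->active_children > 0)
		return false;
	n->status = RPM_SUSPENDED;
	if (n->parent)
		n->parent->active_children--;
	return true;
}

/* Resuming a device resumes its ancestors first, topmost one first. */
static void node_resume(struct node *n)
{
	if (n->status == RPM_ACTIVE)
		return;
	if (n->parent) {
		node_resume(n->parent);
		n->parent->active_children++;
	}
	n->status = RPM_ACTIVE;
}
```

With the counter in place the core does not have to walk the list of
children on every suspend attempt; it only checks one integer.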

> > > I haven't checked the details of the code yet.  More later...
> 
> One more thought...  The autosuspend and autoresume callbacks need to 
> be mutually exclusive with probe and remove.  So somehow the driver 
> core will need to block runtime PM calls.

That's correct and I'm going to take care of this.

> It might also be nice to make sure that the driver core autoresumes a 
> device before probing it and autosuspends a device (after some 
> reasonable delay) after unbinding its driver.

Agreed.

Best,
Rafael

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-11 14:17                           ` [patch update] Re: [linux-pm] " Rafael J. Wysocki
  (?)
  (?)
@ 2009-06-11 14:52                           ` Alan Stern
  2009-06-11 15:06                               ` Oliver Neukum
                                               ` (3 more replies)
  -1 siblings, 4 replies; 199+ messages in thread
From: Alan Stern @ 2009-06-11 14:52 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Oliver Neukum, Linux-pm mailing list, ACPI Devel Maling List, LKML

On Thu, 11 Jun 2009, Rafael J. Wysocki wrote:

> The point here is what the core is supposed to do.  Does it need to handle
> this at all or leave it to the bus type and driver?
> 
> After reconsidering it for a while I think that we should define what
> "suspended" is supposed to mean from the core point of view.  And my opinion
> is that it should mean "device doesn't communicate with the CPUs and RAM due
> to power management".  That need not be power management of the device itself,
> but such that leads to the device not doing I/O.
> 
> Under this definition all devices behind an inactive link are suspended,
> because they can't do any I/O.  Which appears to make sense, because their
> drivers have to be notified before the link is suspended and the link has to be
> turned on for the devices to be able to communicate with the CPU and RAM.
> 
> If this definition is adopted, then it's quite clear that the device can only
> be suspended if all of its children are suspended and it's always necessary
> to resume the parent of a device in order to resume the device itself.

Okay, I'll agree to that.  It should be made clear that a device which 
is "suspended" according to this definition is not necessarily in a 
low-power state.  For example, before powering down the link to a disk 
drive you might want the drive's suspend method to flush the drive's 
cache, but it wouldn't have to spin the drive down.

(But this example leaves open the question of how we would go about
spinning down the disk.  Submitting a 15-minute (or whatever) delayed
autosuspend request wouldn't work; the request wouldn't be accepted
because the disk is already suspended as far as the PM core is
concerned.  The disk's driver would have to implement its own 
spin-down delayed_work.)

> > > > I haven't checked the details of the code yet.  More later...
> > 
> > One more thought...  The autosuspend and autoresume callbacks need to 
> > be mutually exclusive with probe and remove.  So somehow the driver 
> > core will need to block runtime PM calls.
> 
> That's correct and I'm going to take care of this.
> 
> > It might also be nice to make sure that the driver core autoresumes a 
> > device before probing it and autosuspends a device (after some 
> > reasonable delay) after unbinding its driver.
> 
> Agreed.

This is another case where a usage counter comes in handy.  The driver
core resumes the device and increments the counter -- thus preventing
any unwanted autosuspends -- before making the probe and remove calls.
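
For illustration only, a user-space sketch of that sequence; struct pdev,
try_autosuspend(), core_probe() and sample_probe() are hypothetical names,
and the real driver core would of course need proper locking:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device state relevant to run-time PM. */
struct pdev {
	int usage_count; /* nonzero blocks autosuspend */
	bool suspended;
};

/* Autosuspend attempt: refused while anyone holds a usage reference. */
static bool try_autosuspend(struct pdev *dev)
{
	if (dev->usage_count > 0)
		return false;
	dev->suspended = true;
	return true;
}

/* Hypothetical driver-core wrapper around a driver's probe callback. */
static int core_probe(struct pdev *dev, int (*probe)(struct pdev *))
{
	int ret;

	dev->suspended = false; /* resume before probing */
	dev->usage_count++;     /* block autosuspend during probe */
	ret = probe(dev);
	dev->usage_count--;     /* autosuspend is allowed again */
	return ret;
}

/* Example probe: an autosuspend racing with it must be refused. */
static int sample_probe(struct pdev *dev)
{
	return try_autosuspend(dev) ? -1 : 0;
}
```

The same wrapper shape would apply to the remove call; once the counter is
dropped again, a delayed autosuspend can go through as usual.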

Alan Stern


^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-11 14:52                           ` [patch update] Re: [linux-pm] " Alan Stern
@ 2009-06-11 15:06                               ` Oliver Neukum
  2009-06-11 15:06                             ` Oliver Neukum
                                                 ` (2 subsequent siblings)
  3 siblings, 0 replies; 199+ messages in thread
From: Oliver Neukum @ 2009-06-11 15:06 UTC (permalink / raw)
  To: Alan Stern
  Cc: Rafael J. Wysocki, Linux-pm mailing list, ACPI Devel Maling List, LKML

On Thursday, 11 June 2009 16:52:03, Alan Stern wrote:
> > Under this definition all devices behind an inactive link are suspended,
> > because they can't do any I/O.  Which appears to make sense, because
> > their drivers have to be notified before the link is suspended and the
> > link has to be turned on for the devices to be able to communicate with
> > the CPU and RAM.
> >
> > If this definition is adopted, then it's quite clear that the device can
> > only be suspended if all of its children are suspended and it's always
> > necessary to resume the parent of a device in order to resume the device
> > itself.
>
> Okay, I'll agree to that.  It should be made clear that a device which
> is "suspended" according to this definition is not necessarily in a
> low-power state.  For example, before powering down the link to a disk
> drive you might want the drive's suspend method to flush the drive's
> cache, but it wouldn't have to spin the drive down.

This precludes handling busses that have low power states that are
left automatically. If such links are stacked, the management of acceptable
latencies cannot be left to the busses.
An actual example is the set of link states in USB 3.0.

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-11 15:06                               ` Oliver Neukum
@ 2009-06-11 15:22                                 ` Alan Stern
  -1 siblings, 0 replies; 199+ messages in thread
From: Alan Stern @ 2009-06-11 15:22 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Rafael J. Wysocki, Linux-pm mailing list, ACPI Devel Maling List, LKML

On Thu, 11 Jun 2009, Oliver Neukum wrote:

> On Thursday, 11 June 2009 16:52:03, Alan Stern wrote:
> > > Under this definition all devices behind an inactive link are suspended,
> > > because they can't do any I/O.  Which appears to make sense, because
> > > their drivers have to be notified before the link is suspended and the
> > > link has to be turned on for the devices to be able to communicate with
> > > the CPU and RAM.
> > >
> > > If this definition is adopted, then it's quite clear that the device can
> > > only be suspended if all of its children are suspended and it's always
> > > necessary to resume the parent of a device in order to resume the device
> > > itself.
> >
> > Okay, I'll agree to that.  It should be made clear that a device which
> > is "suspended" according to this definition is not necessarily in a
> > low-power state.  For example, before powering down the link to a disk
> > drive you might want the drive's suspend method to flush the drive's
> > cache, but it wouldn't have to spin the drive down.
> 
> This precludes handling busses that have low power states that are
> left automatically. If such links are stacked the management of acceptable
> latencies cannot be left to the busses.
> An actual example is the link states of USB 3.0

I don't understand.  Can you explain more fully?

Alan Stern

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-11 15:22                                 ` Alan Stern
@ 2009-06-11 16:05                                   ` Oliver Neukum
  -1 siblings, 0 replies; 199+ messages in thread
From: Oliver Neukum @ 2009-06-11 16:05 UTC (permalink / raw)
  To: Alan Stern
  Cc: Rafael J. Wysocki, Linux-pm mailing list, ACPI Devel Maling List, LKML

On Thursday, 11 June 2009 17:22:06, Alan Stern wrote:
> > > Okay, I'll agree to that.  It should be made clear that a device which
> > > is "suspended" according to this definition is not necessarily in a
> > > low-power state.  For example, before powering down the link to a disk
> > > drive you might want the drive's suspend method to flush the drive's
> > > cache, but it wouldn't have to spin the drive down.
> >
> > This precludes handling busses that have low power states that are
> > left automatically. If such links are stacked the management of
> > acceptable latencies cannot be left to the busses.
> > > An actual example is the link states of USB 3.0
>
> I don't understand.  Can you explain more fully?

I am talking about the U1 and U2 features of USB 3.0.

Or, abstractly, any power-saving state that autoresumes in hardware.
In these cases you know that you can enter a power-saving state that
will add X latency.

In terms of user space API we'll probably add a way for user space
to specify how much latency may be added for power management's sake.
If busses are stacked, the "latency budget" has to be handled at core level.
Furthermore, if states that allow I/O but with additional latency are
ignored, the budget will be calculated incorrectly.
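
A minimal sketch of what handling the "latency budget" at core level
could mean, assuming each link or device in a stacked topology reports
the latency its current state adds.  The names pm_node, path_latency_us()
and state_fits_budget() are hypothetical and exist only for illustration;
nothing like them appears in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: one node per link or device in a stacked topology. */
struct pm_node {
	struct pm_node *parent;		/* upstream link, NULL at the root */
	unsigned int exit_latency_us;	/* latency added by the current state,
					 * including hardware-autoresume
					 * states that still allow I/O */
};

/* Sum the latency added by every node from @node up to the root. */
unsigned int path_latency_us(const struct pm_node *node)
{
	unsigned int total = 0;

	for (; node; node = node->parent)
		total += node->exit_latency_us;
	return total;
}

/*
 * Would a state costing @new_latency_us on @node keep the whole path
 * within @budget_us?  Upstream nodes count with their current latency.
 */
bool state_fits_budget(const struct pm_node *node,
		       unsigned int new_latency_us,
		       unsigned int budget_us)
{
	assert(node != NULL);
	return path_latency_us(node->parent) + new_latency_us <= budget_us;
}
```

The point of the sketch is that an upstream link sitting in a state
that still allows I/O contributes to the sum; treating such a state as
zero latency would make the check pass when it should fail.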

	Regards
		Oliver


^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-11 16:05                                   ` Oliver Neukum
@ 2009-06-11 18:36                                     ` Alan Stern
  -1 siblings, 0 replies; 199+ messages in thread
From: Alan Stern @ 2009-06-11 18:36 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Rafael J. Wysocki, Linux-pm mailing list, ACPI Devel Maling List, LKML

On Thu, 11 Jun 2009, Oliver Neukum wrote:

> On Thursday, 11 June 2009 17:22:06, Alan Stern wrote:
> > > > Okay, I'll agree to that.  It should be made clear that a device which
> > > > is "suspended" according to this definition is not necessarily in a
> > > > low-power state.  For example, before powering down the link to a disk
> > > > drive you might want the drive's suspend method to flush the drive's
> > > > cache, but it wouldn't have to spin the drive down.
> > >
> > > This precludes handling busses that have low power states that are
> > > left automatically. If such links are stacked the management of
> > > acceptable latencies cannot be left to the busses.
> > > > An actual example is the link states of USB 3.0
> >
> > I don't understand.  Can you explain more fully?
> 
> I am talking about the U1 and U2 feature of USB 3.0.
> 
> Or abstractly any power saving state that does autoresume in hardware.
> In these cases you know that you can enter a powersaving state that
> will add X latency.
> 
> In terms of user space API we'll probably add a way for user space
> to specify how much latency may be added for power management's sake.
> If busses are stacked the "latency budget" has to be handled at core level.
> If furthermore states that allow IO but with additional latency are ignored,
> the budget will be calculated wrongly.

Okay, fine.  What does this have to do with Rafael's work?  Why does 
setting the status to RPM_SUSPENDED even when a device is not in a 
low-power state preclude handling buses that automatically change their 
power state?

I don't see any connection between Rafael's work and managing
latencies, beyond the obvious fact that a device will have a higher
latency when it is suspended than when it isn't.

Alan Stern

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-11 14:16                                     ` Alan Stern
  (?)
  (?)
@ 2009-06-11 19:38                                     ` Rafael J. Wysocki
  -1 siblings, 0 replies; 199+ messages in thread
From: Rafael J. Wysocki @ 2009-06-11 19:38 UTC (permalink / raw)
  To: Alan Stern; +Cc: Oliver Neukum, linux-pm, ACPI Devel Maling List, LKML

On Thursday 11 June 2009, Alan Stern wrote:
> On Thu, 11 Jun 2009, Oliver Neukum wrote:
> 
> > On Thursday, 11 June 2009 15:48:33, Rafael J. Wysocki wrote:
> > > > > But after pm_request_resume() returns there's no means to make sure
> > > > > nothing alters it back to RPM_SUSPENDED. The workqueue doesn't help
> > > > > you because you've scheduled nothing by that time. The suspension will
> > > > > work because C is still in RPM_SUSPENDED.
> > > >
> > > > This is an example where usage counters come in handy.
> > >
> > > Do you mean we can count suspend/resume requests for a device?
> > 
> > No, we count reasons a device cannot be suspended. Drivers are allowed to
> > add their own reasons. The core uses that mechanism to indicate that an
> > ongoing resumption lower down is also a reason.
> > The count going to zero is equivalent to a request to suspend.
> 
> Right.

Ah.  *That* is what you had in mind.  Yes, we can do that.
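
A minimal sketch of the counting scheme described in the quoted text,
assuming a single-threaded model (real code would need a lock or
atomics).  The struct and the rpm_get()/rpm_put() helpers are
hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-device state; not the structure from the patch. */
struct rpm_dev {
	int usage_count;	/* reasons the device must stay active */
	bool suspend_requested;	/* set when the last reason goes away */
};

/* Add one reason not to suspend; this cancels a pending request. */
void rpm_get(struct rpm_dev *dev)
{
	dev->usage_count++;
	dev->suspend_requested = false;
}

/* Drop one reason; reaching zero is equivalent to a suspend request. */
void rpm_put(struct rpm_dev *dev)
{
	assert(dev->usage_count > 0);
	if (--dev->usage_count == 0)
		dev->suspend_requested = true;
}

/* A suspend may proceed only while no reason is held against it. */
bool rpm_may_suspend(const struct rpm_dev *dev)
{
	return dev->usage_count == 0;
}
```

In this model a driver and the core (for an ongoing resume lower down)
would each call rpm_get() for their own reason, and neither needs to
know about the other's.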
 
> Here's a related thought.  Change the resume routines as follows:
>
> void pm_runtime_resume(struct device *dev)
> {
> // Do the actual resume ...
> }
> EXPORT_SYMBOL_GPL(pm_runtime_resume);
> 
> static void pm_runtime_resume_work(struct work_struct *work)
> {
> 	pm_runtime_resume(resume_work_to_device(work));
> }
> 
> Then there's no need for a separate pm_resume_sync(); drivers can
> simply call pm_runtime_resume() directly.  The same trick works for 
> suspending.
> 
> Of course, this means you have to give up the notion that all suspends 
> and resumes are funnelled through the workqueue.  IMO that notion isn't 
> worth keeping in any case.

That's already not the case for resuming.

Well, ISTR a reason why I thought pm_resume_sync() was needed anyway, but the
idea is actually good.
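
A userspace sketch of the pattern Alan describes: one synchronous
routine that drivers may call directly, plus a thin work function that
forwards to it.  The work_struct and container_of() below are simplified
stand-ins for the kernel's, and rpm_dev, rpm_resume() and
rpm_resume_work() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's work item. */
struct work_struct {
	void (*func)(struct work_struct *work);
};

/* Simplified stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

enum rpm_status { RPM_SUSPENDED, RPM_ACTIVE };

/* Hypothetical device with an embedded resume work item. */
struct rpm_dev {
	enum rpm_status status;
	struct work_struct resume_work;
};

/* Synchronous resume: drivers can call this directly. */
void rpm_resume(struct rpm_dev *dev)
{
	dev->status = RPM_ACTIVE;	/* the actual resume would go here */
}

/* Asynchronous path: the work function only forwards to rpm_resume(). */
void rpm_resume_work(struct work_struct *work)
{
	assert(work != NULL);
	rpm_resume(container_of(work, struct rpm_dev, resume_work));
}
```

Both paths end in the same routine, so any locking or status checks
would live in one place rather than being split between a synchronous
and a queued variant.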

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-11 14:52                           ` [patch update] Re: [linux-pm] " Alan Stern
                                               ` (2 preceding siblings ...)
  2009-06-11 19:43                             ` Rafael J. Wysocki
@ 2009-06-11 19:43                             ` Rafael J. Wysocki
  2009-06-12 14:25                               ` [patch update] " Alan Stern
  2009-06-12 14:25                               ` [patch update] Re: [linux-pm] " Alan Stern
  3 siblings, 2 replies; 199+ messages in thread
From: Rafael J. Wysocki @ 2009-06-11 19:43 UTC (permalink / raw)
  To: Alan Stern
  Cc: Oliver Neukum, Linux-pm mailing list, ACPI Devel Maling List, LKML

On Thursday 11 June 2009, Alan Stern wrote:
> On Thu, 11 Jun 2009, Rafael J. Wysocki wrote:
> 
> > The point here is what the core is supposed to do.  Does it need to handle
> > this at all or leave it to the bus type and driver?
> > 
> > After reconsidering it for a while I think that we should define what
> > "suspended" is supposed to mean from the core point of view.  And my opinion
> > is that it should mean "device doesn't communicate with the CPUs and RAM due
> > to power management".  That need not be power management of the device itself,
> > but such that leads to the device not doing I/O.
> > 
> > Under this definition all devices behind an inactive link are suspended,
> > because they can't do any I/O.  Which appears to make sense, because their
> > drivers have to be notified before the link is suspended and the link has to be
> > turned on for the devices to be able to communicate with the CPU and RAM.
> > 
> > If this definition is adopted, then it's quite clear that the device can only
> > be suspended if all of its children are suspended and it's always necessary
> > to resume the parent of a device in order to resume the device itself.
> 
> Okay, I'll agree to that.  It should be made clear that a device which 
> is "suspended" according to this definition is not necessarily in a 
> low-power state.  For example, before powering down the link to a disk 
> drive you might want the drive's suspend method to flush the drive's 
> cache, but it wouldn't have to spin the drive down.

Exactly.

> (But this example leaves open the question of how we would go about
> spinning down the disk.  Submitting a 15-minute (or whatever) delayed
> autosuspend request wouldn't work; the request wouldn't be accepted
> because the disk is already suspended as far as the PM core is
> concerned.  The disk's driver would have to implement its own 
> spin-down delayed_work.)

Yes, it would.

> > > > > I haven't checked the details of the code yet.  More later...
> > > 
> > > One more thought...  The autosuspend and autoresume callbacks need to 
> > > be mutually exclusive with probe and remove.  So somehow the driver 
> > > core will need to block runtime PM calls.
> > 
> > That's correct and I'm going to take care of this.
> > 
> > > It might also be nice to make sure that the driver core autoresumes a 
> > > device before probing it and autosuspends a device (after some 
> > > reasonable delay) after unbinding its driver.
> > 
> > Agreed.
> 
> This is another case where a usage counter comes in handy.  The driver
> core resumes the device and increments the counter -- thus preventing
> any unwanted autosuspends -- before making the probe and remove calls.

I like this idea.
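
A sketch of the idea, assuming a simple single-threaded model: the
device is resumed and its counter raised before the driver callback
runs, and the counter is dropped afterwards.  All names here
(rpm_get_sync(), probe_with_pm_held(), demo_probe()) are hypothetical,
and the sketch does not say where in the driver core such a wrapper
would belong.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-device state. */
struct rpm_dev {
	int usage_count;	/* reasons the device must not autosuspend */
	bool active;		/* false while runtime-suspended */
};

typedef int (*probe_fn)(struct rpm_dev *dev);

/* Resume the device and pin it so autosuspend cannot interfere. */
void rpm_get_sync(struct rpm_dev *dev)
{
	dev->usage_count++;
	dev->active = true;
}

/* Drop the pin taken by rpm_get_sync(). */
void rpm_put(struct rpm_dev *dev)
{
	assert(dev->usage_count > 0);
	dev->usage_count--;
}

/* Run a probe (or remove) callback with the device held active. */
int probe_with_pm_held(struct rpm_dev *dev, probe_fn probe)
{
	int ret;

	rpm_get_sync(dev);
	ret = probe(dev);
	rpm_put(dev);
	return ret;
}

/* Demo callback: succeeds only if the device is active and pinned. */
int demo_probe(struct rpm_dev *dev)
{
	return (dev->active && dev->usage_count > 0) ? 0 : -1;
}
```

After the callback returns the counter is back to its previous value,
so a later autosuspend (after some delay) remains possible.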

BTW, where exactly should the counter be incremented in that case?

I thought of driver_probe_device(), but is it sufficient?  Or is there a better
place?

Best,
Rafael

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-11 18:36                                     ` Alan Stern
@ 2009-06-11 21:05                                       ` Oliver Neukum
  -1 siblings, 0 replies; 199+ messages in thread
From: Oliver Neukum @ 2009-06-11 21:05 UTC (permalink / raw)
  To: Alan Stern
  Cc: Rafael J. Wysocki, Linux-pm mailing list, ACPI Devel Maling List, LKML

On Thursday, 11 June 2009 20:36:30, Alan Stern wrote:
> > Or abstractly any power saving state that does autoresume in hardware.
> > In these cases you know that you can enter a powersaving state that
> > will add X latency.
> >
> > In terms of user space API we'll probably add a way for user space
> > to specify how much latency may be added for power management's sake.
> > If busses are stacked the "latency budget" has to be handled at core
> > level. If furthermore states that allow IO but with additional latency
> > are ignored, the budget will be calculated wrongly.
>
> Okay, fine.  What does this have to do with Rafael's work?  Why does
> setting the status to RPM_SUSPENDED even when a device is not in a
> low-power state preclude handling buses that automatically change their
> power state?

For these cases the tree constraint does not apply.
I think there are devices that can be suspended while their children are
active and devices that cannot be. This is an attribute of the device and
should be evaluated by the core.
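
A minimal sketch of how the core could evaluate such an attribute,
assuming it tracks the number of active children per device.  The
ignore_children flag and rpm_can_suspend() are hypothetical names for
illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device state kept by the core. */
struct rpm_dev {
	unsigned int active_children;	/* children not currently suspended */
	bool ignore_children;		/* device may suspend regardless */
};

/*
 * The tree constraint applies unless the device declares that it can
 * be suspended while its children remain active.
 */
bool rpm_can_suspend(const struct rpm_dev *dev)
{
	assert(dev != NULL);
	return dev->ignore_children || dev->active_children == 0;
}
```

With the flag clear this reduces to the rule discussed earlier in the
thread: a device can only be suspended once all of its children are.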

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
@ 2009-06-11 21:05                                       ` Oliver Neukum
  0 siblings, 0 replies; 199+ messages in thread
From: Oliver Neukum @ 2009-06-11 21:05 UTC (permalink / raw)
  To: Alan Stern
  Cc: Rafael J. Wysocki, Linux-pm mailing list, ACPI Devel Maling List, LKML

On Thursday, 11 June 2009 20:36:30, Alan Stern wrote:
> > Or abstractly any power saving state that does autoresume in hardware.
> > In these cases you know that you can enter a powersaving state that
> > will add X latency.
> >
> > In terms of user space API we'll probably add a way for user space
> > to specify how much latency may be added for power management's sake.
> > If busses are stacked the "latency budget" has to be handled at core
> > level. If furthermore states that allow IO but with additional latency
> > are ignored, the budget will be calculated wrongly.
>
> Okay, fine.  What does this have to do with Rafael's work?  Why does
> setting the status to RPM_SUSPENDED even when a device is not in a
> low-power state preclude handling buses that automatically change their
> power state?

For these cases the tree constraint does not apply.
I think there are devices which can be suspended while children are active
and devices which cannot be. This is an attribute of the device and should
be evaluated by the core.

	Regards
		Oliver


^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-11 18:36                                     ` Alan Stern
  (?)
@ 2009-06-11 21:05                                     ` Oliver Neukum
  -1 siblings, 0 replies; 199+ messages in thread
From: Oliver Neukum @ 2009-06-11 21:05 UTC (permalink / raw)
  To: Alan Stern; +Cc: ACPI Devel Maling List, Linux-pm mailing list, LKML

On Thursday, 11 June 2009 20:36:30, Alan Stern wrote:
> > Or abstractly any power saving state that does autoresume in hardware.
> > In these cases you know that you can enter a powersaving state that
> > will add X latency.
> >
> > In terms of user space API we'll probably add a way for user space
> > to specify how much latency may be added for power management's sake.
> > If busses are stacked the "latency budget" has to be handled at core
> > level. If furthermore states that allow IO but with additional latency
> > are ignored, the budget will be calculated wrongly.
>
> Okay, fine.  What does this have to do with Rafael's work?  Why does
> setting the status to RPM_SUSPENDED even when a device is not in a
> low-power state preclude handling buses that automatically change their
> power state?

For these cases the tree constraint does not apply.
I think there are devices which can be suspended while children are active
and devices which cannot be. This is an attribute of the device and should
be evaluated by the core.

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-11 21:05                                       ` Oliver Neukum
@ 2009-06-12  2:16                                         ` Alan Stern
  -1 siblings, 0 replies; 199+ messages in thread
From: Alan Stern @ 2009-06-12  2:16 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Rafael J. Wysocki, Linux-pm mailing list, ACPI Devel Maling List, LKML

On Thu, 11 Jun 2009, Oliver Neukum wrote:

> On Thursday, 11 June 2009 20:36:30, Alan Stern wrote:
> > > Or abstractly any power saving state that does autoresume in hardware.
> > > In these cases you know that you can enter a powersaving state that
> > > will add X latency.
> > >
> > > In terms of user space API we'll probably add a way for user space
> > > to specify how much latency may be added for power management's sake.
> > > If busses are stacked the "latency budget" has to be handled at core
> > > level. If furthermore states that allow IO but with additional latency
> > > are ignored, the budget will be calculated wrongly.
> >
> > Okay, fine.  What does this have to do with Rafael's work?  Why does
> > setting the status to RPM_SUSPENDED even when a device is not in a
> > low-power state preclude handling buses that automatically change their
> > power state?
> 
> For these cases the tree constraint does not apply.

What tree constraint?  You mean that the PM core shouldn't allow 
devices to suspend unless all their children are suspended?  Why 
doesn't it still apply?

Remember, when Rafael and I say "suspend" here, we don't mean "go to a 
low-power state".  We mean "the PM core calls the runtime_suspend 
method".  No matter what actions the link hardware may decide to take 
on its own, the PM core will still want to observe the 
all-children-suspended restriction when calling runtime_suspend 
methods.

> I think there are devices which can be suspended while children are active
> and devices which cannot be. This is an attribute of the device and should
> be evaluated by the core.

Clearly it should be decided by the driver.  Should there be a bit for
it in the dev_pm_info structure?
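
To make the question concrete, here is a minimal sketch of the
all-children-suspended rule with a per-device opt-out bit. It is a toy
model, not the kernel's struct device or dev_pm_info; struct toy_dev, the
ignore_children flag and both function names are hypothetical and chosen
only for illustration. "Suspended" here means that the runtime_suspend
callback has been called, not that the hardware is in a low-power state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, not the kernel's struct device or dev_pm_info. */
struct toy_dev {
	struct toy_dev *parent;
	unsigned int active_children;	/* children not yet runtime-suspended */
	bool suspended;			/* runtime_suspend has been called */
	bool ignore_children;		/* hypothetical opt-out bit set by the driver */
};

/* May the core call runtime_suspend for this device now? */
static bool toy_may_suspend(const struct toy_dev *dev)
{
	assert(dev != NULL);
	if (dev->suspended)
		return false;
	return dev->ignore_children || dev->active_children == 0;
}

/* Core-side bookkeeping around the driver's runtime_suspend callback. */
static bool toy_runtime_suspend(struct toy_dev *dev)
{
	if (!toy_may_suspend(dev))
		return false;
	dev->suspended = true;
	if (dev->parent) {
		assert(dev->parent->active_children > 0);
		dev->parent->active_children--;
	}
	return true;
}
```

With the bit clear, a parent is refused until its last child has been
suspended; with the bit set, the driver has told the core that the check
can be skipped for this device.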

Alan Stern


^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
@ 2009-06-12  2:16                                         ` Alan Stern
  0 siblings, 0 replies; 199+ messages in thread
From: Alan Stern @ 2009-06-12  2:16 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Rafael J. Wysocki, Linux-pm mailing list, ACPI Devel Maling List, LKML

On Thu, 11 Jun 2009, Oliver Neukum wrote:

> On Thursday, 11 June 2009 20:36:30, Alan Stern wrote:
> > > Or abstractly any power saving state that does autoresume in hardware.
> > > In these cases you know that you can enter a powersaving state that
> > > will add X latency.
> > >
> > > In terms of user space API we'll probably add a way for user space
> > > to specify how much latency may be added for power management's sake.
> > > If busses are stacked the "latency budget" has to be handled at core
> > > level. If furthermore states that allow IO but with additional latency
> > > are ignored, the budget will be calculated wrongly.
> >
> > Okay, fine.  What does this have to do with Rafael's work?  Why does
> > setting the status to RPM_SUSPENDED even when a device is not in a
> > low-power state preclude handling buses that automatically change their
> > power state?
> 
> For these cases the tree constraint does not apply.

What tree constraint?  You mean that the PM core shouldn't allow 
devices to suspend unless all their children are suspended?  Why 
doesn't it still apply?

Remember, when Rafael and I say "suspend" here, we don't mean "go to a 
low-power state".  We mean "the PM core calls the runtime_suspend 
method".  No matter what actions the link hardware may decide to take 
on its own, the PM core will still want to observe the 
all-children-suspended restriction when calling runtime_suspend 
methods.

> I think there are devices which can be suspended while children are active
> and devices which cannot be. This is an attribute of the device and should
> be evaluated by the core.

Clearly it should be decided by the driver.  Should there be a bit for
it in the dev_pm_info structure?

Alan Stern


^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-11 21:05                                       ` Oliver Neukum
  (?)
  (?)
@ 2009-06-12  2:16                                       ` Alan Stern
  -1 siblings, 0 replies; 199+ messages in thread
From: Alan Stern @ 2009-06-12  2:16 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: ACPI Devel Maling List, Linux-pm mailing list, LKML

On Thu, 11 Jun 2009, Oliver Neukum wrote:

> On Thursday, 11 June 2009 20:36:30, Alan Stern wrote:
> > > Or abstractly any power saving state that does autoresume in hardware.
> > > In these cases you know that you can enter a powersaving state that
> > > will add X latency.
> > >
> > > In terms of user space API we'll probably add a way for user space
> > > to specify how much latency may be added for power management's sake.
> > > If busses are stacked the "latency budget" has to be handled at core
> > > level. If furthermore states that allow IO but with additional latency
> > > are ignored, the budget will be calculated wrongly.
> >
> > Okay, fine.  What does this have to do with Rafael's work?  Why does
> > setting the status to RPM_SUSPENDED even when a device is not in a
> > low-power state preclude handling buses that automatically change their
> > power state?
> 
> For these cases the tree constraint does not apply.

What tree constraint?  You mean that the PM core shouldn't allow 
devices to suspend unless all their children are suspended?  Why 
doesn't it still apply?

Remember, when Rafael and I say "suspend" here, we don't mean "go to a 
low-power state".  We mean "the PM core calls the runtime_suspend 
method".  No matter what actions the link hardware may decide to take 
on its own, the PM core will still want to observe the 
all-children-suspended restriction when calling runtime_suspend 
methods.

> I think there are devices which can be suspended while children are active
> and devices which cannot be. This is an attribute of the device and should
> be evaluated by the core.

Clearly it should be decided by the driver.  Should there be a bit for
it in the dev_pm_info structure?

Alan Stern

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-11  9:08                     ` Oliver Neukum
@ 2009-06-12  3:13                         ` Magnus Damm
  0 siblings, 0 replies; 199+ messages in thread
From: Magnus Damm @ 2009-06-12  3:13 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: ACPI Devel Maling List, linux-pm, LKML

Hi Oliver,

On Thu, Jun 11, 2009 at 6:08 PM, Oliver Neukum<oliver@neukum.org> wrote:
> On Thursday, 11 June 2009 07:18:46, Magnus Damm wrote:
>> 3) When all devices in the power domain are suspended the bus code can
>> turn off the power. The reason why I'd like to only autosuspend when
>
> So you are saying that you have power dependencies independent
> of the device tree?

I can think of the following power dependencies:
- hardware bus topology
- clocks
- power domains

>> all devices are idle is simply that we don't get any power savings
>> from the per device autosuspend() callbacks, only from turning off
>> power to the entire per-domain. So blindly autosuspending and
>> autoresuming devices is just pure overhead unless we know we can do it
>> for all devices in the domain.
>
> Why can't you do this within the framework? You simply suspend when
> all a domain's devices have been autosuspended.

So you mean I should handle that in my arch/bus specific
dev->bus->pm->autosuspend() code? So instead of calling
dev->driver->pm->autosuspend() straight away I keep track of the use
count of the power domain and when the domain is unused I call
dev->driver->pm->autosuspend() for all devices in the power domain
before powering off?

I guess hooking in things in dev->bus->pm->autosuspend() is doable,
but then dev->power.runtime_status will be set to RPM_SUSPENDED even
though the actual device isn't suspended at all. And RPM_IDLE state
will be more or less unused since the drivers should pass a delay of
zero to make sure the bus code gets notified about the idle state
straight away.

Basically, for my use case it would make more sense to let the
bus_type directly decide when a device should be suspended instead of
using a timeout before calling the bus_type code. I'd rather let the
bus_type decide if a timeout should be used or not instead of using it
for all bus_types.

So I guess the plan is that drivers directly should invoke
pm_request_suspend() to notify the bus that they are idle? (I guess
similar to my platform_device_idle()?)

For my use case there is no point in having the delay in
pm_request_suspend(), we want to notify the bus about the per-device
idleness straight away. Using a delay in pm_request_suspend() before
calling the bus type autosuspend will just keep the current per-device
state away from the bus level and make sure we _cannot_ enter deep
sleep states. Which I believe will result in worse battery life
because we spend more time than necessary in not-so-deep sleep states.

So yes, I'd like to do things in dev->bus->pm->autosuspend(), and the
code is quite close. I can't figure out why anyone would want the
suspend delay at the current level though, but I guess other busses
want to use that?

Thanks for your comments,

/ magnus

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: [linux-pm] Run-time PM idea (was: Re:  [RFC][PATCH 0/2] PM: Rearrange core suspend code)
@ 2009-06-12  3:13                         ` Magnus Damm
  0 siblings, 0 replies; 199+ messages in thread
From: Magnus Damm @ 2009-06-12  3:13 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Rafael J. Wysocki, Alan Stern, linux-pm, ACPI Devel Maling List, LKML

Hi Oliver,

On Thu, Jun 11, 2009 at 6:08 PM, Oliver Neukum<oliver@neukum.org> wrote:
> On Thursday, 11 June 2009 07:18:46, Magnus Damm wrote:
>> 3) When all devices in the power domain are suspended the bus code can
>> turn off the power. The reason why I'd like to only autosuspend when
>
> So you are saying that you have power dependencies independent
> of the device tree?

I can think of the following power dependencies:
- hardware bus topology
- clocks
- power domains

>> all devices are idle is simply that we don't get any power savings
>> from the per device autosuspend() callbacks, only from turning off
>> power to the entire per-domain. So blindly autosuspending and
>> autoresuming devices is just pure overhead unless we know we can do it
>> for all devices in the domain.
>
> Why can't you do this within the framework? You simply suspend when
> all a domain's devices have been autosuspended.

So you mean I should handle that in my arch/bus specific
dev->bus->pm->autosuspend() code? So instead of calling
dev->driver->pm->autosuspend() straight away I keep track of the use
count of the power domain and when the domain is unused I call
dev->driver->pm->autosuspend() for all devices in the power domain
before powering off?

I guess hooking in things in dev->bus->pm->autosuspend() is doable,
but then dev->power.runtime_status will be set to RPM_SUSPENDED even
though the actual device isn't suspended at all. And RPM_IDLE state
will be more or less unused since the drivers should pass a delay of
zero to make sure the bus code gets notified about the idle state
straight away.

Basically, for my use case it would make more sense to let the
bus_type directly decide when a device should be suspended instead of
using a timeout before calling the bus_type code. I'd rather let the
bus_type decide if a timeout should be used or not instead of using it
for all bus_types.

So I guess the plan is that drivers directly should invoke
pm_request_suspend() to notify the bus that they are idle? (I guess
similar to my platform_device_idle()?)

For my use case there is no point in having the delay in
pm_request_suspend(), we want to notify the bus about the per-device
idleness straight away. Using a delay in pm_request_suspend() before
calling the bus type autosuspend will just keep the current per-device
state away from the bus level and make sure we _cannot_ enter deep
sleep states. Which I believe will result in worse battery life
because we spend more time than necessary in not-so-deep sleep states.

So yes, I'd like to do things in dev->bus->pm->autosuspend(), and the
code is quite close. I can't figure out why anyone would want the
suspend delay at the current level though, but I guess other busses
want to use that?

Thanks for your comments,

/ magnus

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-12  3:13                         ` [patch update] Re: [linux-pm] " Magnus Damm
  (?)
@ 2009-06-12  8:11                         ` Oliver Neukum
  2009-06-12 10:54                           ` [patch update] " Magnus Damm
  2009-06-12 10:54                             ` Magnus Damm
  -1 siblings, 2 replies; 199+ messages in thread
From: Oliver Neukum @ 2009-06-12  8:11 UTC (permalink / raw)
  To: Magnus Damm
  Cc: Rafael J. Wysocki, Alan Stern, linux-pm, ACPI Devel Maling List, LKML

On Friday, 12 June 2009 05:13:12, Magnus Damm wrote:
> Hi Oliver,
>
> On Thu, Jun 11, 2009 at 6:08 PM, Oliver Neukum<oliver@neukum.org> wrote:
> > On Thursday, 11 June 2009 07:18:46, Magnus Damm wrote:
> >> 3) When all devices in the power domain are suspended the bus code can
> >> turn off the power. The reason why I'd like to only autosuspend when
> >
> > So you are saying that you have power dependencies independent
> > of the device tree?
>
> I can think of the following power dependencies:
> - hardware bus topology
> - clocks
> - power domains

That means that some devices otherwise unrelated have a common power
switch, doesn't it?

> >> all devices are idle is simply that we don't get any power savings
> >> from the per device autosuspend() callbacks, only from turning off
> >> power to the entire per-domain. So blindly autosuspending and
> >> autoresuming devices is just pure overhead unless we know we can do it
> >> for all devices in the domain.
> >
> > Why can't you do this within the framework? You simply suspend when
> > all a domain's devices have been autosuspended.
>
> So you mean I should handle that in my arch/bus specific
> dev->bus->pm->autosuspend() code? So instead of calling
> dev->driver->pm->autosuspend() straight away I keep track of the use
> count of the power domain and when the domain is unused I call
> dev->driver->pm->autosuspend() for all devices in the power domain
> before powering off?

How much overhead do you have in autosuspend() if it actually powers
down the devices? If it is small, I suggest you really run the autosuspend
methods in the drivers but use a counter for the actual power switching
on a bus level.
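
A minimal sketch of that counter, assuming a hypothetical struct
toy_domain that the bus code keeps per power domain. The names
toy_domain_put() and toy_domain_get() are made up for this example and are
not part of the patch under discussion. The driver callbacks run per
device as usual; only the actual power switch is gated by the count.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical power domain shared by several otherwise unrelated devices. */
struct toy_domain {
	unsigned int active_devices;	/* devices that have not autosuspended */
	bool powered;
};

/* Called by the bus code after a driver's autosuspend callback succeeded. */
static void toy_domain_put(struct toy_domain *d)
{
	assert(d->active_devices > 0);
	if (--d->active_devices == 0)
		d->powered = false;	/* last user gone: flip the power switch */
}

/* Called by the bus code before a driver's autoresume callback runs. */
static void toy_domain_get(struct toy_domain *d)
{
	if (d->active_devices++ == 0)
		d->powered = true;	/* first user back: power the domain up */
}
```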

> I guess hooking in things in dev->bus->pm->autosuspend() is doable,
> but then dev->power.runtime_status will be set to RPM_SUSPENDED even
> though the actual device isn't suspended at all. And RPM_IDLE state

Why do you care about a device being in RPM_SUSPENDED while
active? The inverse is a bug, but this seems harmless.

> will be more or less unused since the drivers should pass a delay of
> zero to make sure the bus code gets notified about the idle state
> straight away.

So? You are not getting common code without a little overhead
for some cases.

> Basically, for my use case it would make more sense to let the
> bus_type directly decide when a device should be suspended instead of
> using a timeout before calling the bus_type code. I'd rather let the
> bus_type decide if a timeout should be used or not instead of using it
> for all bus_types.

So call with a delay of 0.

> So I guess the plan is that drivers directly should invoke
> pm_request_suspend() to notify the bus that they are idle? (I guess
> similar to my platform_device_idle()?)

Yes.

> So yes, I'd like to do things in dev->bus->pm->autosuspend(), and the
> code is quite close. I can't figure out why anyone would want the
> suspend delay at the current level though, but I guess other busses
> want to use that?

In your case resumption seems to cost almost no energy. In other
cases you must avoid short sleeps, as you conserve less energy
sleeping than it takes to resume.
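
The break-even point can be sketched as follows. A sleep of length t saves
(idle power - sleep power) * t, and that has to exceed the energy spent on
suspending and resuming. The function names and all parameters are
illustrative assumptions, not measurements from any real device.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Shortest sleep that pays off, in seconds.
 * idle_w and sleep_w are power draws in watts, transition_j is the energy
 * spent suspending and resuming, in joules. All three are illustrative
 * inputs, not values taken from any real device.
 */
static double toy_break_even_s(double idle_w, double sleep_w, double transition_j)
{
	assert(idle_w > sleep_w);
	return transition_j / (idle_w - sleep_w);
}

/* A suspend is only worth it if the device is expected to stay idle longer. */
static bool toy_worth_suspending(double expected_idle_s, double idle_w,
				 double sleep_w, double transition_j)
{
	return expected_idle_s > toy_break_even_s(idle_w, sleep_w, transition_j);
}
```

A suspend delay is a cheap way to approximate expected_idle_s: a device
that has already been idle for the delay is assumed likely to stay idle
for longer than the break-even time.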

	Regards
		Oliver


^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-12  3:13                         ` [patch update] Re: [linux-pm] " Magnus Damm
  (?)
  (?)
@ 2009-06-12  8:11                         ` Oliver Neukum
  -1 siblings, 0 replies; 199+ messages in thread
From: Oliver Neukum @ 2009-06-12  8:11 UTC (permalink / raw)
  To: Magnus Damm; +Cc: ACPI Devel Maling List, linux-pm, LKML

On Friday, 12 June 2009 05:13:12, Magnus Damm wrote:
> Hi Oliver,
>
> On Thu, Jun 11, 2009 at 6:08 PM, Oliver Neukum<oliver@neukum.org> wrote:
> > On Thursday, 11 June 2009 07:18:46, Magnus Damm wrote:
> >> 3) When all devices in the power domain are suspended the bus code can
> >> turn off the power. The reason why I'd like to only autosuspend when
> >
> > So you are saying that you have power dependencies independent
> > of the device tree?
>
> I can think of the following power dependencies:
> - hardware bus topology
> - clocks
> - power domains

That means that some devices otherwise unrelated have a common power
switch, doesn't it?

> >> all devices are idle is simply that we don't get any power savings
> >> from the per device autosuspend() callbacks, only from turning off
> >> power to the entire per-domain. So blindly autosuspending and
> >> autoresuming devices is just pure overhead unless we know we can do it
> >> for all devices in the domain.
> >
> > Why can't you do this within the framework? You simply suspend when
> > all a domain's devices have been autosuspended.
>
> So you mean I should handle that in my arch/bus specific
> dev->bus->pm->autosuspend() code? So instead of calling
> dev->driver->pm->autosuspend() straight away I keep track of the use
> count of the power domain and when the domain is unused I call
> dev->driver->pm->autosuspend() for all devices in the power domain
> before powering off?

How much overhead do you have in autosuspend() if it actually powers
down the devices? If it is small, I suggest you really run the autosuspend
methods in the drivers but use a counter for the actual power switching
on a bus level.

> I guess hooking in things in dev->bus->pm->autosuspend() is doable,
> but then dev->power.runtime_status will be set to RPM_SUSPENDED even
> though the actual device isn't suspended at all. And RPM_IDLE state

Why do you care about a device being in RPM_SUSPENDED while
active? The inverse is a bug, but this seems harmless.

> will be more or less unused since the drivers should pass a delay of
> zero to make sure the bus code gets notified about the idle state
> straight away.

So? You are not getting common code without a little overhead
for some cases.

> Basically, for my use case it would make more sense to let the
> bus_type directly decide when a device should be suspended instead of
> using a timeout before calling the bus_type code. I'd rather let the
> bus_type decide if a timeout should be used or not instead of using it
> for all bus_types.

So call with a delay of 0.

> So I guess the plan is that drivers directly should invoke
> pm_request_suspend() to notify the bus that they are idle? (I guess
> similar to my platform_device_idle()?)

Yes.

> So yes, I'd like to do things in dev->bus->pm->autosuspend(), and the
> code is quite close. I can't figure out why anyone would want the
> suspend delay at the current level though, but I guess other busses
> want to use that?

In your case resumption seems to cost almost no energy. In other
cases you must avoid short sleeps, as you conserve less energy
sleeping than it takes to resume.

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-12  2:16                                         ` Alan Stern
  (?)
@ 2009-06-12  8:15                                         ` Oliver Neukum
  2009-06-12 14:32                                           ` [patch update] " Alan Stern
  2009-06-12 14:32                                             ` Alan Stern
  -1 siblings, 2 replies; 199+ messages in thread
From: Oliver Neukum @ 2009-06-12  8:15 UTC (permalink / raw)
  To: Alan Stern
  Cc: Rafael J. Wysocki, Linux-pm mailing list, ACPI Devel Maling List, LKML

On Friday, 12 June 2009 04:16:10, Alan Stern wrote:
> What tree constraint?  You mean that the PM core shouldn't allow
> devices to suspend unless all their children are suspended?  Why
> doesn't it still apply?

Because the hardware doesn't need it.

> Remember, when Rafael and I say "suspend" here, we don't mean "go to a
> low-power state".  We mean "the PM core calls the runtime_suspend
> method".  No matter what actions the link hardware may decide to take
> on its own, the PM core will still want to observe the
> all-children-suspended restriction when calling runtime_suspend
> methods.

No. If the core insists that all children be suspended, it will not
use the hardware's full capabilities.
If it leaves such power saving measures to the drivers, latency
accounting will be wrong.

> > I think there are devices which can be suspended while children are active
> > and devices which cannot be. This is an attribute of the device and
> > should be evaluated by the core.
>
> Clearly it should be decided by the driver.  Should there be a bit for
> it in the dev_pm_info structure?

Yes.

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-12  2:16                                         ` Alan Stern
  (?)
  (?)
@ 2009-06-12  8:15                                         ` Oliver Neukum
  -1 siblings, 0 replies; 199+ messages in thread
From: Oliver Neukum @ 2009-06-12  8:15 UTC (permalink / raw)
  To: Alan Stern; +Cc: ACPI Devel Maling List, Linux-pm mailing list, LKML

On Friday, 12 June 2009 04:16:10, Alan Stern wrote:
> What tree constraint?  You mean that the PM core shouldn't allow
> devices to suspend unless all their children are suspended?  Why
> doesn't it still apply?

Because the hardware doesn't need it.

> Remember, when Rafael and I say "suspend" here, we don't mean "go to a
> low-power state".  We mean "the PM core calls the runtime_suspend
> method".  No matter what actions the link hardware may decide to take
> on its own, the PM core will still want to observe the
> all-children-suspended restriction when calling runtime_suspend
> methods.

No. If the core insists that all children be suspended, it will not
use the hardware's full capabilities.
If it leaves such power saving measures to the drivers, latency
accounting will be wrong.

> > I think there are devices which can be suspended while children are active
> > and devices which cannot be. This is an attribute of the device and
> > should be evaluated by the core.
>
> Clearly it should be decided by the driver.  Should there be a bit for
> it in the dev_pm_info structure?

Yes.

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-12  8:11                         ` Oliver Neukum
@ 2009-06-12 10:54                             ` Magnus Damm
  2009-06-12 10:54                             ` Magnus Damm
  1 sibling, 0 replies; 199+ messages in thread
From: Magnus Damm @ 2009-06-12 10:54 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Rafael J. Wysocki, Alan Stern, linux-pm, ACPI Devel Maling List, LKML

On Fri, Jun 12, 2009 at 5:11 PM, Oliver Neukum<oliver@neukum.org> wrote:
> On Friday, 12 June 2009 05:13:12, Magnus Damm wrote:
>> Hi Oliver,
>>
>> On Thu, Jun 11, 2009 at 6:08 PM, Oliver Neukum<oliver@neukum.org> wrote:
>> > On Thursday, 11 June 2009 07:18:46, Magnus Damm wrote:
>> >> 3) When all devices in the power domain are suspended the bus code can
>> >> turn off the power. The reason why I'd like to only autosuspend when
>> >
>> > So you are saying that you have power dependencies independent
>> > of the device tree?
>>
>> I can think of the following power dependencies:
>> - hardware bus topology
>> - clocks
>> - power domains
>
> That means that some devices otherwise unrelated have a common power
> switch, doesn't it?

Yes, various on chip devices share the same power switch.

On the SuperH Mobile SoCs that I've seen we basically have two power
domains. One big which contains almost everything except what's in the
second domain: a timer, watchdog and a rtc. I guess the devices are
related to each other because they are in the same SoC, but exactly
how they relate to each other depends on actual processor type. Same
for clocks.

This presentation may give you an overview of the SuperH Mobile
hardware and where we are today:

http://www.celinuxforum.org/CelfPubWiki/ELC2009Presentations?action=AttachFile&do=view&target=Runtime-Power-Management-on-SuperH-Mobile-20090407.pdf

>> >> all devices are idle is simply that we don't get any power savings
>> >> from the per device autosuspend() callbacks, only from turning off
>> >> power to the entire per-domain. So blindly autosuspending and
>> >> autoresuming devices is just pure overhead unless we know we can do it
>> >> for all devices in the domain.
>> >
>> > Why can't you do this within the framework? You simply suspend when
>> > all a domain's devices have been autosuspended.
>>
>> So you mean I should handle that in my arch/bus specific
>> dev->bus->pm->autosuspend() code? So instead of calling
>> dev->driver->pm->autosuspend() straight away I keep track of the use
>> count of the power domain and when the domain is unused I call
>> dev->driver->pm->autosuspend() for all devices in the power domain
>> before powering off?
>
> How much overhead do you have in autosuspend() if it actually powers
> down the devices? If it is small, I suggest you really run the autosuspend
> methods in the drivers but use a counter for the actual power switching
> on a bus level.

On a SoC scale, returning from the deepest sleep state takes a bit of
time. It depends on clock configuration, but worst case I think it
takes ~3ms to come back from the deepest sleep, just to let the CPU
core start executing instructions. And on the way back we also have to
set up almost the entire system from scratch, which probably takes quite
a bit of time.

The reason I don't want to call dev->driver->pm->autosuspend() directly
is basically our clock dependencies, which are hard-coded per processor
model. Some open devices may block deep sleep; for instance, if a serial
port is open we may not deep sleep unless we can live with stopping the
clock and potentially losing incoming data. So if these open devices
block the entire power domain then there is no point in executing the
dev->driver->pm autosuspend()/autoresume() callbacks all the time, since
we basically will never power off the domain.

The autosuspend()/autoresume() driver callbacks will save and restore
registers which is quite expensive since each memory access is
uncached. So I don't want to do that more than absolutely necessary.

But I can handle that in the bus type code, no problem. I'm sure ARM
can as well.
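
Roughly what I have in mind for the bus type code, as a toy sketch. All
names here (struct toy_pdev, struct toy_pdomain, toy_bus_device_idle())
are hypothetical, and toy_save_registers() merely stands in for the
driver's expensive autosuspend callback. The bus only records idleness
and defers the driver callbacks until every device in the domain is idle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_DEVS 8

struct toy_pdev {
	bool idle;		/* driver reported idleness to the bus */
	unsigned int saves;	/* how often the expensive register save ran */
};

struct toy_pdomain {
	struct toy_pdev *devs[TOY_MAX_DEVS];
	size_t ndevs;
	bool powered;
};

/* Stand-in for the driver's autosuspend callback (uncached register save). */
static void toy_save_registers(struct toy_pdev *dev)
{
	dev->saves++;
}

/* Bus-level hook: record idleness, suspend only if the whole domain is idle. */
static void toy_bus_device_idle(struct toy_pdomain *dom, struct toy_pdev *dev)
{
	size_t i;

	assert(dom->ndevs <= TOY_MAX_DEVS);
	dev->idle = true;
	for (i = 0; i < dom->ndevs; i++)
		if (!dom->devs[i]->idle)
			return;		/* someone blocks the domain: do nothing */

	for (i = 0; i < dom->ndevs; i++)
		toy_save_registers(dom->devs[i]);
	dom->powered = false;
}
```

As long as one device, say an open serial port, never reports idle, no
register save runs for any device in the domain.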

>> I guess hooking in things in dev->bus->pm->autosuspend() is doable,
>> but then dev->power.runtime_status will be set to RPM_SUSPENDED even
>> though the actual device isn't suspended at all. And RPM_IDLE state
>
> Why do you care about a device being in RPM_SUSPENDED while
> active. The inverse is a bug, but this seems harmless.

Ok, just wanted to check if you guys agreed with that as a valid combination.

>> will be more or less unused since the drivers should pass a delay of
>> zero to make sure the bus code gets notified about the idle state
>> straight away.
>
> So? You are not getting a common code without a little overhead
> for some cases.

Some overhead is of course acceptable. If it's too much then we can
always optimize later.

>> Basically, for my use case it would make more sense to let the
>> bus_type directly decide when a device should be suspended instead of
>> using a timeout before calling the bus_type code. I rather let the
>> bus_type decide if a timeout should be used or not instead of using it
>> for all bus_types.
>
> So call with a delay of 0.

Sure.

>> So I guess the plan is that drivers directly should invoke
>> pm_request_suspend() to notify the bus that they are idle? (I guess
>> similar to my platform_device_idle()?)
>
> Yes.

Ok, sounds good.

>> So yes, I'd like to do things in dev->bus->pm->autosuspend(), and the
>> code is quite close. I can't figure out why anyone would want the
>> suspend delay at the current level though, but I guess other busses
>> want to use that?
>
> In your case resumption seems to cost almost no energy. In other
> cases you must avoid short sleeps, as you conserve less energy
> sleeping than it takes to resume.

I want to avoid that as well, but I don't think a per-device timeout
is enough. I need to tie latencies into this somehow. But I prefer
to do that after getting the basic stuff working. Step by step.
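
The short-sleep concern can be sketched as a break-even check: a sleep
pays off only if the energy saved while asleep exceeds the energy spent
on suspending and resuming. The function name, parameters and units
below are assumptions for illustration; nothing like this exists in the
proposed patch.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical break-even test.
 * Power is given in microwatts, the transition cost in microjoules
 * and the expected idle period in microseconds.
 */
static bool sleep_is_worthwhile(unsigned long long active_uw,
				unsigned long long sleep_uw,
				unsigned long long transition_uj,
				unsigned long long expected_idle_us)
{
	unsigned long long saved_pj, cost_pj;

	if (active_uw <= sleep_uw)
		return false; /* sleeping saves nothing */

	/* uW * us = pJ, so convert the transition cost from uJ to pJ. */
	saved_pj = (active_uw - sleep_uw) * expected_idle_us;
	cost_pj = transition_uj * 1000000ULL;
	return saved_pj > cost_pj;
}
```

A per-device timeout is a crude stand-in for expected_idle_us here; the
transition cost is the latency-related part that a timeout alone does
not capture.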

Thanks,

/ magnus

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: [linux-pm] Run-time PM idea (was: Re:  [RFC][PATCH 0/2] PM: Rearrange core suspend code)
@ 2009-06-12 10:54                             ` Magnus Damm
  0 siblings, 0 replies; 199+ messages in thread
From: Magnus Damm @ 2009-06-12 10:54 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Rafael J. Wysocki, Alan Stern, linux-pm, ACPI Devel Maling List, LKML

On Fri, Jun 12, 2009 at 5:11 PM, Oliver Neukum<oliver@neukum.org> wrote:
> Am Freitag, 12. Juni 2009 05:13:12 schrieb Magnus Damm:
>> Hi Oliver,
>>
>> On Thu, Jun 11, 2009 at 6:08 PM, Oliver Neukum<oliver@neukum.org> wrote:
>> > Am Donnerstag, 11. Juni 2009 07:18:46 schrieb Magnus Damm:
>> >> 3) When all devices in the power domain are suspended the bus code can
>> >> turn off the power. The reason why I'd like to only autosuspend when
>> >
>> > So you are saying that you have power dependencies independent
>> > of the device tree?
>>
>> I can think of the following power dependencies:
>> - hardware bus topology
>> - clocks
>> - power domains
>
> That means that some devices otherwise unrelated have a common power
> switch, doesn't it?

Yes, various on chip devices share the same power switch.

On the SuperH Mobile SoCs that I've seen we basically have two power
domains: one big domain that contains almost everything, and a second
domain that contains a timer, a watchdog and an RTC. I guess the devices
are related to each other because they are in the same SoC, but exactly
how they relate to each other depends on the actual processor type. The
same goes for clocks.

This presentation may give you an overview of the SuperH Mobile
hardware and where we are today:

http://www.celinuxforum.org/CelfPubWiki/ELC2009Presentations?action=AttachFile&do=view&target=Runtime-Power-Management-on-SuperH-Mobile-20090407.pdf

>> >> all devices are idle is simply that we don't get any power savings
>> >> from the per device autosuspend() callbacks, only from turning off
>> >> power to the entire power domain. So blindly autosuspending and
>> >> autoresuming devices is just pure overhead unless we know we can do it
>> >> for all devices in the domain.
>> >
>> > Why can't you do this within the framework? You simply suspend when
>> > all a domain's devices have been autosuspended.
>>
>> So you mean I should handle that in my arch/bus specific
>> dev->bus->pm->autosuspend() code? So instead of calling
>> dev->driver->pm->autosuspend() straight away I keep track of the use
>> count of the power domain and when the domain is unused I call
>> dev->driver->pm->autosuspend() for all devices in the power domain
>> before powering off?
>
> How much overhead do you have in autosuspend() if it actually powers
> down the devices? If it is small, I suggest you really run the autosuspend
> methods in the drivers but use a counter for the actual power switching
> on a bus level.

On a SoC scale, returning from the deepest sleep state takes a bit of
time. It depends on clock configuration, but worst case I think it
takes ~3ms to come back from the deepest sleep, just to let the CPU
core start executing instructions. And on the way back we also have to
set up almost the entire system from scratch, which probably takes quite
a bit of time.

The reason I don't want to call dev->driver->pm->autosuspend() directly
is basically the hard-coded clock dependencies of each processor model.
Some open devices may block deep sleep. For instance, if a serial port
is open we may not deep sleep unless we can live with stopping the clock
and potentially losing incoming data. So if these open devices block
the entire power domain then there is no point in executing the
dev->driver->pm->autosuspend()/autoresume() callbacks all the time,
since we basically will never power off the domain.

The autosuspend()/autoresume() driver callbacks will save and restore
registers which is quite expensive since each memory access is
uncached. So I don't want to do that more than absolutely necessary.

But I can handle that in the bus type code, no problem. I'm sure ARM
can as well.

>> I guess hooking in things in dev->bus->pm->autosuspend() is doable,
>> but then dev->power.runtime_status will be set to RPM_SUSPENDED even
>> though the actual device isn't suspended at all. And RPM_IDLE state
>
> Why do you care about a device being in RPM_SUSPENDED while
> active? The inverse is a bug, but this seems harmless.

Ok, just wanted to check if you guys agreed with that as a valid combination.

>> will be more or less unused since the drivers should pass a delay of
>> zero to make sure the bus code gets notified about the idle state
>> straight away.
>
> So? You are not getting a common code without a little overhead
> for some cases.

Some overhead is of course acceptable. If it's too much then we can
always optimize later.

>> Basically, for my use case it would make more sense to let the
>> bus_type directly decide when a device should be suspended instead of
>> using a timeout before calling the bus_type code. I rather let the
>> bus_type decide if a timeout should be used or not instead of using it
>> for all bus_types.
>
> So call with a delay of 0.

Sure.

>> So I guess the plan is that drivers directly should invoke
>> pm_request_suspend() to notify the bus that they are idle? (I guess
>> similar to my platform_device_idle()?)
>
> Yes.

Ok, sounds good.

>> So yes, I'd like to do things in dev->bus->pm->autosuspend(), and the
>> code is quite close. I can't figure out why anyone would want the
>> suspend delay at the current level though, but I guess other busses
>> want to use that?
>
> In your case resumption seems to cost almost no energy. In other
> cases you must avoid short sleeps, as you conserve less energy
> sleeping than it takes to resume.

I want to avoid that as well, but I don't think a per-device timeout
is enough. I need to tie latencies into this somehow. But I prefer
to do that after getting the basic stuff working. Step by step.

Thanks,

/ magnus

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-12  8:11                         ` Oliver Neukum
@ 2009-06-12 10:54                           ` Magnus Damm
  2009-06-12 10:54                             ` Magnus Damm
  1 sibling, 0 replies; 199+ messages in thread
From: Magnus Damm @ 2009-06-12 10:54 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: ACPI Devel Maling List, linux-pm, LKML

On Fri, Jun 12, 2009 at 5:11 PM, Oliver Neukum<oliver@neukum.org> wrote:
> Am Freitag, 12. Juni 2009 05:13:12 schrieb Magnus Damm:
>> Hi Oliver,
>>
>> On Thu, Jun 11, 2009 at 6:08 PM, Oliver Neukum<oliver@neukum.org> wrote:
>> > Am Donnerstag, 11. Juni 2009 07:18:46 schrieb Magnus Damm:
>> >> 3) When all devices in the power domain are suspended the bus code can
>> >> turn off the power. The reason why I'd like to only autosuspend when
>> >
>> > So you are saying that you have power dependencies independent
>> > of the device tree?
>>
>> I can think of the following power dependencies:
>> - hardware bus topology
>> - clocks
>> - power domains
>
> That means that some devices otherwise unrelated have a common power
> switch, doesn't it?

Yes, various on chip devices share the same power switch.

On the SuperH Mobile SoCs that I've seen we basically have two power
domains: one big domain that contains almost everything, and a second
domain that contains a timer, a watchdog and an RTC. I guess the devices
are related to each other because they are in the same SoC, but exactly
how they relate to each other depends on the actual processor type. The
same goes for clocks.

This presentation may give you an overview of the SuperH Mobile
hardware and where we are today:

http://www.celinuxforum.org/CelfPubWiki/ELC2009Presentations?action=AttachFile&do=view&target=Runtime-Power-Management-on-SuperH-Mobile-20090407.pdf

>> >> all devices are idle is simply that we don't get any power savings
>> >> from the per device autosuspend() callbacks, only from turning off
>> >> power to the entire power domain. So blindly autosuspending and
>> >> autoresuming devices is just pure overhead unless we know we can do it
>> >> for all devices in the domain.
>> >
>> > Why can't you do this within the framework? You simply suspend when
>> > all a domain's devices have been autosuspended.
>>
>> So you mean I should handle that in my arch/bus specific
>> dev->bus->pm->autosuspend() code? So instead of calling
>> dev->driver->pm->autosuspend() straight away I keep track of the use
>> count of the power domain and when the domain is unused I call
>> dev->driver->pm->autosuspend() for all devices in the power domain
>> before powering off?
>
> How much overhead do you have in autosuspend() if it actually powers
> down the devices? If it is small, I suggest you really run the autosuspend
> methods in the drivers but use a counter for the actual power switching
> on a bus level.

On a SoC scale, returning from the deepest sleep state takes a bit of
time. It depends on clock configuration, but worst case I think it
takes ~3ms to come back from the deepest sleep, just to let the CPU
core start executing instructions. And on the way back we also have to
set up almost the entire system from scratch, which probably takes quite
a bit of time.

The reason I don't want to call dev->driver->pm->autosuspend() directly
is basically the hard-coded clock dependencies of each processor model.
Some open devices may block deep sleep. For instance, if a serial port
is open we may not deep sleep unless we can live with stopping the clock
and potentially losing incoming data. So if these open devices block
the entire power domain then there is no point in executing the
dev->driver->pm->autosuspend()/autoresume() callbacks all the time,
since we basically will never power off the domain.

The autosuspend()/autoresume() driver callbacks will save and restore
registers which is quite expensive since each memory access is
uncached. So I don't want to do that more than absolutely necessary.

But I can handle that in the bus type code, no problem. I'm sure ARM
can as well.

>> I guess hooking in things in dev->bus->pm->autosuspend() is doable,
>> but then dev->power.runtime_status will be set to RPM_SUSPENDED even
>> though the actual device isn't suspended at all. And RPM_IDLE state
>
> Why do you care about a device being in RPM_SUSPENDED while
> active? The inverse is a bug, but this seems harmless.

Ok, just wanted to check if you guys agreed with that as a valid combination.

>> will be more or less unused since the drivers should pass a delay of
>> zero to make sure the bus code gets notified about the idle state
>> straight away.
>
> So? You are not getting a common code without a little overhead
> for some cases.

Some overhead is of course acceptable. If it's too much then we can
always optimize later.

>> Basically, for my use case it would make more sense to let the
>> bus_type directly decide when a device should be suspended instead of
>> using a timeout before calling the bus_type code. I rather let the
>> bus_type decide if a timeout should be used or not instead of using it
>> for all bus_types.
>
> So call with a delay of 0.

Sure.

>> So I guess the plan is that drivers directly should invoke
>> pm_request_suspend() to notify the bus that they are idle? (I guess
>> similar to my platform_device_idle()?)
>
> Yes.

Ok, sounds good.

>> So yes, I'd like to do things in dev->bus->pm->autosuspend(), and the
>> code is quite close. I can't figure out why anyone would want the
>> suspend delay at the current level though, but I guess other busses
>> want to use that?
>
> In your case resumption seems to cost almost no energy. In other
> cases you must avoid short sleeps, as you conserve less energy
> sleeping than it takes to resume.

I want to avoid that as well, but I don't think a per-device timeout
is enough. I need to tie latencies into this somehow. But I prefer
to do that after getting the basic stuff working. Step by step.

Thanks,

/ magnus

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-11 19:43                             ` [patch update] Re: [linux-pm] " Rafael J. Wysocki
  2009-06-12 14:25                               ` [patch update] " Alan Stern
@ 2009-06-12 14:25                               ` Alan Stern
  1 sibling, 0 replies; 199+ messages in thread
From: Alan Stern @ 2009-06-12 14:25 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Oliver Neukum, Linux-pm mailing list, ACPI Devel Maling List, LKML

On Thu, 11 Jun 2009, Rafael J. Wysocki wrote:

> > > > It might also be nice to make sure that the driver core autoresumes a 
> > > > device before probing it and autosuspends a device (after some 
> > > > reasonable delay) after unbinding its driver.
> > > 
> > > Agreed.
> > 
> > This is another case where a usage counter comes in handy.  The driver
> > core resumes the device and increments the counter -- thus preventing
> > any unwanted autosuspends -- before making the probe and remove calls.
> 
> I like this idea.
> 
> BTW, where exactly the counter should be increased in that case?
> 
> I thought of driver_probe_device(), but is it sufficient?  Or is there a better
> place?

That's okay.  Or you could put it in really_probe().  Either one.
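
A minimal sketch of the usage-counter idea, modelled in plain C. The
sketch_* names are hypothetical stand-ins and not the driver core's
actual interfaces; the point is only that a reference held across the
probe call makes any autosuspend attempt fail until it is dropped.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the runtime PM state of a device. */
struct sketch_dev {
	int usage_count; /* a value above zero blocks autosuspend */
	bool suspended;
};

/* Take a reference and make sure the device is resumed. */
static void sketch_get(struct sketch_dev *dev)
{
	dev->usage_count++;
	dev->suspended = false;
}

/* Drop a reference taken with sketch_get(). */
static void sketch_put(struct sketch_dev *dev)
{
	assert(dev->usage_count > 0);
	dev->usage_count--;
}

/* Returns true if the device was suspended, false if the counter blocked it. */
static bool sketch_autosuspend(struct sketch_dev *dev)
{
	if (dev->usage_count > 0)
		return false;
	dev->suspended = true;
	return true;
}

/* Models the driver core holding a reference around the probe call. */
static int sketch_probe_device(struct sketch_dev *dev,
			       int (*probe)(struct sketch_dev *))
{
	int ret;

	sketch_get(dev);
	ret = probe(dev);
	sketch_put(dev);
	return ret;
}

/* Example probe: fails if an autosuspend could sneak in during probing. */
static int sample_probe(struct sketch_dev *dev)
{
	return sketch_autosuspend(dev) ? -1 : 0;
}
```

Whether the get/put pair sits in driver_probe_device() or in
really_probe() does not change the sketch; either way the counter is
held for the whole duration of the probe call.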

Alan Stern


^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-11 19:43                             ` [patch update] Re: [linux-pm] " Rafael J. Wysocki
@ 2009-06-12 14:25                               ` Alan Stern
  2009-06-12 14:25                               ` [patch update] Re: [linux-pm] " Alan Stern
  1 sibling, 0 replies; 199+ messages in thread
From: Alan Stern @ 2009-06-12 14:25 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: ACPI Devel Maling List, Linux-pm mailing list, LKML

On Thu, 11 Jun 2009, Rafael J. Wysocki wrote:

> > > > It might also be nice to make sure that the driver core autoresumes a 
> > > > device before probing it and autosuspends a device (after some 
> > > > reasonable delay) after unbinding its driver.
> > > 
> > > Agreed.
> > 
> > This is another case where a usage counter comes in handy.  The driver
> > core resumes the device and increments the counter -- thus preventing
> > any unwanted autosuspends -- before making the probe and remove calls.
> 
> I like this idea.
> 
> BTW, where exactly the counter should be increased in that case?
> 
> I thought of driver_probe_device(), but is it sufficient?  Or is there a better
> place?

That's okay.  Or you could put it in really_probe().  Either one.

Alan Stern

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-12  8:15                                         ` Oliver Neukum
@ 2009-06-12 14:32                                             ` Alan Stern
  2009-06-12 14:32                                             ` Alan Stern
  1 sibling, 0 replies; 199+ messages in thread
From: Alan Stern @ 2009-06-12 14:32 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Rafael J. Wysocki, Linux-pm mailing list, ACPI Devel Maling List, LKML

On Fri, 12 Jun 2009, Oliver Neukum wrote:

> Am Freitag, 12. Juni 2009 04:16:10 schrieb Alan Stern:
> > What tree constraint?  You mean that the PM core shouldn't allow
> > devices to suspend unless all their children are suspended?  Why
> > doesn't it still apply?
> 
> Because the hardware doesn't need it.

But maybe drivers need it.

> > Remember, when Rafael and I say "suspend" here, we don't mean "go to a
> > low-power state".  We mean "the PM core calls the runtime_suspend
> > method".  No matter what actions the link hardware may decide to take
> > on its own, the PM core will still want to observe the
> > all-children-suspended restriction when calling runtime_suspend
> > methods.
> 
> No. The core if it insists all children be suspended will not use
> the hardware's full capabilities.

That isn't what I said.  The core does not insist that all children be 
suspended, i.e., be in a low-power state.  It insists only that the 
children's drivers' runtime_suspend methods have been called.  Those 
methods are not obligated to put the children in a low-power state.

> If it leaves such power saving measures to the drivers, latency
> accounting will be wrong.
> 
> > > I think there are devices who can be suspended while children are active
> > > and devices which can not be. This is an attribute of the device and
> > > should be evaluated by the core.
> >
> > Clearly it should be decided by the driver.  Should there be a bit for
> > it in the dev_pm_info structure?
> 
> Yes.

That would resolve the issue.

Alan Stern

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
@ 2009-06-12 14:32                                             ` Alan Stern
  0 siblings, 0 replies; 199+ messages in thread
From: Alan Stern @ 2009-06-12 14:32 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Rafael J. Wysocki, Linux-pm mailing list, ACPI Devel Maling List, LKML

On Fri, 12 Jun 2009, Oliver Neukum wrote:

> Am Freitag, 12. Juni 2009 04:16:10 schrieb Alan Stern:
> > What tree constraint?  You mean that the PM core shouldn't allow
> > devices to suspend unless all their children are suspended?  Why
> > doesn't it still apply?
> 
> Because the hardware doesn't need it.

But maybe drivers need it.

> > Remember, when Rafael and I say "suspend" here, we don't mean "go to a
> > low-power state".  We mean "the PM core calls the runtime_suspend
> > method".  No matter what actions the link hardware may decide to take
> > on its own, the PM core will still want to observe the
> > all-children-suspended restriction when calling runtime_suspend
> > methods.
> 
> No. The core if it insists all children be suspended will not use
> the hardware's full capabilities.

That isn't what I said.  The core does not insist that all children be 
suspended, i.e., be in a low-power state.  It insists only that the 
children's drivers' runtime_suspend methods have been called.  Those 
methods are not obligated to put the children in a low-power state.

> If it leaves such power saving measures to the drivers, latency
> accounting will be wrong.
> 
> > > I think there are devices who can be suspended while children are active
> > > and devices which can not be. This is an attribute of the device and
> > > should be evaluated by the core.
> >
> > Clearly it should be decided by the driver.  Should there be a bit for
> > it in the dev_pm_info structure?
> 
> Yes.

That would resolve the issue.

Alan Stern


^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-12  8:15                                         ` Oliver Neukum
@ 2009-06-12 14:32                                           ` Alan Stern
  2009-06-12 14:32                                             ` Alan Stern
  1 sibling, 0 replies; 199+ messages in thread
From: Alan Stern @ 2009-06-12 14:32 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: ACPI Devel Maling List, Linux-pm mailing list, LKML

On Fri, 12 Jun 2009, Oliver Neukum wrote:

> Am Freitag, 12. Juni 2009 04:16:10 schrieb Alan Stern:
> > What tree constraint?  You mean that the PM core shouldn't allow
> > devices to suspend unless all their children are suspended?  Why
> > doesn't it still apply?
> 
> Because the hardware doesn't need it.

But maybe drivers need it.

> > Remember, when Rafael and I say "suspend" here, we don't mean "go to a
> > low-power state".  We mean "the PM core calls the runtime_suspend
> > method".  No matter what actions the link hardware may decide to take
> > on its own, the PM core will still want to observe the
> > all-children-suspended restriction when calling runtime_suspend
> > methods.
> 
> No. The core if it insists all children be suspended will not use
> the hardware's full capabilities.

That isn't what I said.  The core does not insist that all children be 
suspended, i.e., be in a low-power state.  It insists only that the 
children's drivers' runtime_suspend methods have been called.  Those 
methods are not obligated to put the children in a low-power state.

> If it leaves such power saving measures to the drivers, latency
> accounting will be wrong.
> 
> > > I think there are devices who can be suspended while children are active
> > > and devices which can not be. This is an attribute of the device and
> > > should be evaluated by the core.
> >
> > Clearly it should be decided by the driver.  Should there be a bit for
> > it in the dev_pm_info structure?
> 
> Yes.

That would resolve the issue.

Alan Stern

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-12 14:32                                             ` Alan Stern
  (?)
  (?)
@ 2009-06-12 19:09                                             ` Rafael J. Wysocki
  2009-06-12 19:48                                               ` Alan Stern
  2009-06-12 19:48                                               ` Alan Stern
  -1 siblings, 2 replies; 199+ messages in thread
From: Rafael J. Wysocki @ 2009-06-12 19:09 UTC (permalink / raw)
  To: Alan Stern
  Cc: Oliver Neukum, Linux-pm mailing list, ACPI Devel Maling List, LKML

On Friday 12 June 2009, Alan Stern wrote:
> On Fri, 12 Jun 2009, Oliver Neukum wrote:
> 
> > Am Freitag, 12. Juni 2009 04:16:10 schrieb Alan Stern:
> > > What tree constraint?  You mean that the PM core shouldn't allow
> > > devices to suspend unless all their children are suspended?  Why
> > > doesn't it still apply?
> > 
> > Because the hardware doesn't need it.
> 
> But maybe drivers need it.
> 
> > > Remember, when Rafael and I say "suspend" here, we don't mean "go to a
> > > low-power state".  We mean "the PM core calls the runtime_suspend
> > > method".  No matter what actions the link hardware may decide to take
> > > on its own, the PM core will still want to observe the
> > > all-children-suspended restriction when calling runtime_suspend
> > > methods.
> > 
> > No. The core if it insists all children be suspended will not use
> > the hardware's full capabilities.
> 
> That isn't what I said.  The core does not insist that all children be 
> suspended, i.e., be in a low-power state.  It insists only that the 
> children's drivers' runtime_suspend methods have been called.  Those 
> methods are not obligated to put the children in a low-power state.
> 
> > If it leaves such power saving measures to the drivers, latency
> > accounting will be wrong.
> > 
> > > > I think there are devices who can be suspended while children are active
> > > > and devices which can not be. This is an attribute of the device and
> > > > should be evaluated by the core.
> > >
> > > Clearly it should be decided by the driver.  Should there be a bit for
> > > it in the dev_pm_info structure?
> > 
> > Yes.
> 
> That would resolve the issue.

So, are you suggesting that the core should only check the "all children
suspended" condition if a special flag is set in dev_pm_info?

Best,
Rafael

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-12 14:32                                             ` Alan Stern
  (?)
@ 2009-06-12 19:09                                             ` Rafael J. Wysocki
  -1 siblings, 0 replies; 199+ messages in thread
From: Rafael J. Wysocki @ 2009-06-12 19:09 UTC (permalink / raw)
  To: Alan Stern; +Cc: ACPI Devel Maling List, Linux-pm mailing list, LKML

On Friday 12 June 2009, Alan Stern wrote:
> On Fri, 12 Jun 2009, Oliver Neukum wrote:
> 
> > Am Freitag, 12. Juni 2009 04:16:10 schrieb Alan Stern:
> > > What tree constraint?  You mean that the PM core shouldn't allow
> > > devices to suspend unless all their children are suspended?  Why
> > > doesn't it still apply?
> > 
> > Because the hardware doesn't need it.
> 
> But maybe drivers need it.
> 
> > > Remember, when Rafael and I say "suspend" here, we don't mean "go to a
> > > low-power state".  We mean "the PM core calls the runtime_suspend
> > > method".  No matter what actions the link hardware may decide to take
> > > on its own, the PM core will still want to observe the
> > > all-children-suspended restriction when calling runtime_suspend
> > > methods.
> > 
> > No. The core if it insists all children be suspended will not use
> > the hardware's full capabilities.
> 
> That isn't what I said.  The core does not insist that all children be 
> suspended, i.e., be in a low-power state.  It insists only that the 
> children's drivers' runtime_suspend methods have been called.  Those 
> methods are not obligated to put the children in a low-power state.
> 
> > If it leaves such power saving measures to the drivers, latency
> > accounting will be wrong.
> > 
> > > > I think there are devices who can be suspended while children are active
> > > > and devices which can not be. This is an attribute of the device and
> > > > should be evaluated by the core.
> > >
> > > Clearly it should be decided by the driver.  Should there be a bit for
> > > it in the dev_pm_info structure?
> > 
> > Yes.
> 
> That would resolve the issue.

So, are you suggesting that the core should only check the "all children
suspended" condition if a special flag is set in dev_pm_info?

Best,
Rafael

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-12 19:09                                             ` [patch update] Re: [linux-pm] " Rafael J. Wysocki
@ 2009-06-12 19:48                                               ` Alan Stern
  2009-06-12 19:56                                                 ` [patch update] " Rafael J. Wysocki
  2009-06-12 19:56                                                 ` [patch update] Re: [linux-pm] " Rafael J. Wysocki
  2009-06-12 19:48                                               ` Alan Stern
  1 sibling, 2 replies; 199+ messages in thread
From: Alan Stern @ 2009-06-12 19:48 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Oliver Neukum, Linux-pm mailing list, ACPI Devel Maling List, LKML

On Fri, 12 Jun 2009, Rafael J. Wysocki wrote:

> So, are you suggesting that the core should only check the "all children
> suspended" condition if special flag is set in dev_pm_info?

Or rather, check it only if the special flag _isn't_ set.
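
A minimal sketch of that check, assuming a flag that defaults to unset.
The field names ignore_children and active_children are made up for
illustration; at this point the thread has only agreed that some bit in
dev_pm_info should exist.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of the per-device PM information. */
struct sketch_pm_info {
	bool ignore_children; /* set by the driver; the default is unset */
	int active_children;  /* children whose runtime_suspend has not run */
};

/*
 * The "all children suspended" condition is enforced only when the
 * special flag is NOT set.
 */
static bool sketch_may_suspend(const struct sketch_pm_info *pm)
{
	assert(pm != NULL);
	if (pm->ignore_children)
		return true;
	return pm->active_children == 0;
}
```

With the default (flag unset) a device with an active child is refused;
a driver that sets the flag, such as one for a link that can power down
independently of what is attached to it, is allowed to suspend anyway.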

Alan Stern


^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-12 19:09                                             ` [patch update] Re: [linux-pm] " Rafael J. Wysocki
  2009-06-12 19:48                                               ` Alan Stern
@ 2009-06-12 19:48                                               ` Alan Stern
  1 sibling, 0 replies; 199+ messages in thread
From: Alan Stern @ 2009-06-12 19:48 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: ACPI Devel Maling List, Linux-pm mailing list, LKML

On Fri, 12 Jun 2009, Rafael J. Wysocki wrote:

> So, are you suggesting that the core should only check the "all children
> suspended" condition if special flag is set in dev_pm_info?

Or rather, check it only if the special flag _isn't_ set.

Alan Stern

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-12 19:48                                               ` Alan Stern
  2009-06-12 19:56                                                 ` [patch update] " Rafael J. Wysocki
@ 2009-06-12 19:56                                                 ` Rafael J. Wysocki
  2009-06-12 21:23                                                   ` Alan Stern
  2009-06-12 21:23                                                   ` [patch update] " Alan Stern
  1 sibling, 2 replies; 199+ messages in thread
From: Rafael J. Wysocki @ 2009-06-12 19:56 UTC (permalink / raw)
  To: Alan Stern
  Cc: Oliver Neukum, Linux-pm mailing list, ACPI Devel Maling List, LKML

On Friday 12 June 2009, Alan Stern wrote:
> On Fri, 12 Jun 2009, Rafael J. Wysocki wrote:
> 
> > So, are you suggesting that the core should only check the "all children
> > suspended" condition if special flag is set in dev_pm_info?
> 
> Or rather, check it only if the special flag _isn't_ set.

Where the default is unset, I guess?

But then, what about resuming the parents before the device is resumed?
Should the parents be resumed regardless of the flag state?  And if so, what's
the condition for ending the recursion?  Surely it's not sufficient to check
if the parent is active, because its parent need not be active if it has this
special flag set.

Best,
Rafael

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-12 19:48                                               ` Alan Stern
@ 2009-06-12 19:56                                                 ` Rafael J. Wysocki
  2009-06-12 19:56                                                 ` [patch update] Re: [linux-pm] " Rafael J. Wysocki
  1 sibling, 0 replies; 199+ messages in thread
From: Rafael J. Wysocki @ 2009-06-12 19:56 UTC (permalink / raw)
  To: Alan Stern; +Cc: ACPI Devel Maling List, Linux-pm mailing list, LKML

On Friday 12 June 2009, Alan Stern wrote:
> On Fri, 12 Jun 2009, Rafael J. Wysocki wrote:
> 
> > So, are you suggesting that the core should only check the "all children
> > suspended" condition if special flag is set in dev_pm_info?
> 
> Or rather, check it only if the special flag _isn't_ set.

Where the default is unset, I guess?

But then, what about resuming the parents before the device is resumed?
Should the parents be resumed regardless of the flag state?  And if so, what's
the condition for ending the recursion?  Surely it's not sufficient to check
if the parent is active, because its parent need not be active if it has this
special flag set.

Best,
Rafael

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [patch update] Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-12 19:56                                                 ` [patch update] Re: [linux-pm] " Rafael J. Wysocki
@ 2009-06-12 21:23                                                   ` Alan Stern
  2009-06-12 23:06                                                     ` [patch update] " Rafael J. Wysocki
  2009-06-12 23:06                                                     ` [patch update] Re: [linux-pm] " Rafael J. Wysocki
  2009-06-12 21:23                                                   ` [patch update] " Alan Stern
  1 sibling, 2 replies; 199+ messages in thread
From: Alan Stern @ 2009-06-12 21:23 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Oliver Neukum, Linux-pm mailing list, ACPI Devel Maling List, LKML

On Fri, 12 Jun 2009, Rafael J. Wysocki wrote:

> On Friday 12 June 2009, Alan Stern wrote:
> > On Fri, 12 Jun 2009, Rafael J. Wysocki wrote:
> > 
> > > So, are you suggesting that the core should only check the "all children
> > > suspended" condition if a special flag is set in dev_pm_info?
> > 
> > Or rather, check it only if the special flag _isn't_ set.
> 
> Where the default is unset, I guess?

Yep.

> But then, what about the resuming of the parents before the device is resumed?
> Should the parents be resumed regardless of the flag state?

Yes.  In general you should assume a device's parent (and the device
itself!) needs to be resumed whenever the kernel wants to do something
with the device.  The special flag arises because sometimes it's safe
to suspend the parent without suspending the device _if_ the kernel
isn't using the device.

Imagine an idle disk at the end of a link.  We might want to 
autosuspend the link without spinning down the disk.  When we have to 
communicate with the disk again, we autoresume the link.  (Including 
the case where the communication is a "spin-down" command.)
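
A minimal sketch of that rule, written as plain userspace C rather than
kernel code.  The structure, the function and the flag name
`ignore_children` are hypothetical stand-ins for whatever ends up in
dev_pm_info; the only point illustrated is that the "all children
suspended" check is skipped when the flag is set, and that the flag is
unset by default.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the run-time PM part of a device. */
struct rpm_dev {
	struct rpm_dev *parent;
	bool suspended;
	bool ignore_children;		/* the "special flag"; unset by default */
	unsigned int active_children;	/* children that are not suspended */
};

/* May the core suspend @dev right now? */
static bool rpm_may_suspend(const struct rpm_dev *dev)
{
	if (dev->suspended)
		return false;		/* already suspended, nothing to do */

	/* Check "all children suspended" only if the flag is NOT set. */
	if (!dev->ignore_children && dev->active_children > 0)
		return false;

	return true;
}
```

With the flag unset, a link with a spun-up disk behind it is refused;
with the flag set, the same link may be suspended while the disk stays
as it is.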

>  And if so, what's
> the condition for breaking the recurrence?  Surely it's not sufficient to check
> if the parent is active, because its parent need not be active if it has this
> special flag set.

That's a good question.  Let's assume that situations like this will be 
handled by the drivers.

For example, suppose A is the parent of B, which is the parent of C, and
A is suspended but B isn't and C is.  What happens when somebody wants
to use C?

An autoresume request is generated for C.  Since C's parent is already
resumed, the runtime_resume method in C's driver is called.  The driver
has to do some I/O in order to resume C, so it passes an I/O request up
to B's driver.  The request then gets passed up to A's driver.  This
driver knows that A is suspended, so it starts an autoresume of A and
waits for the autoresume to complete before carrying out the request.

Then the I/O can go through, so C gets resumed and everything works 
out.
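
The A/B/C walk-through can be modelled roughly as follows.  This is a
userspace sketch under simplifying assumptions: `struct node`,
`driver_io()` and `runtime_resume()` are made-up names, "waiting for the
autoresume to complete" is modelled as a plain synchronous call, and
every resume is assumed to need I/O through the parent.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device node; not a kernel structure. */
struct node {
	struct node *parent;
	bool suspended;
	int resumes;		/* how many times this node was resumed */
};

static void runtime_resume(struct node *dev);

/* A driver receives an I/O request addressed to (or through) @dev. */
static void driver_io(struct node *dev)
{
	if (dev->suspended)
		runtime_resume(dev);	/* autoresume, then carry on */
	if (dev->parent)
		driver_io(dev->parent);	/* pass the request up */
}

/* Resume callback: resuming needs I/O through the parent, if any. */
static void runtime_resume(struct node *dev)
{
	if (dev->parent)
		driver_io(dev->parent);
	dev->suspended = false;
	dev->resumes++;
}
```

Calling runtime_resume() on C with A suspended and B active resumes A
from inside A's own driver, never touches B's state, and then completes
C's resume.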

I don't know how often this sort of pattern will arise.  It certainly
could be used in usb-storage; there would be no difficulty starting an
autoresume when an I/O request arrives from the SCSI layer below.  In
fact, that is exactly how some early runtime-PM patches for usb-storage
worked.

Alan Stern


* Re: [patch update] Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-12 21:23                                                   ` Alan Stern
  2009-06-12 23:06                                                     ` [patch update] " Rafael J. Wysocki
@ 2009-06-12 23:06                                                     ` Rafael J. Wysocki
  2009-06-13 18:08                                                       ` [patch update] " Alan Stern
  2009-06-13 18:08                                                       ` [patch update] Re: [linux-pm] " Alan Stern
  1 sibling, 2 replies; 199+ messages in thread
From: Rafael J. Wysocki @ 2009-06-12 23:06 UTC (permalink / raw)
  To: Alan Stern
  Cc: Oliver Neukum, Linux-pm mailing list, ACPI Devel Maling List, LKML

On Friday 12 June 2009, Alan Stern wrote:
> On Fri, 12 Jun 2009, Rafael J. Wysocki wrote:
> 
> > On Friday 12 June 2009, Alan Stern wrote:
> > > On Fri, 12 Jun 2009, Rafael J. Wysocki wrote:
> > > 
> > > > So, are you suggesting that the core should only check the "all children
> > > > suspended" condition if a special flag is set in dev_pm_info?
> > > 
> > > Or rather, check it only if the special flag _isn't_ set.
> > 
> > Where the default is unset, I guess?
> 
> Yep.
> 
> > But then, what about the resuming of the parents before the device is resumed?
> > Should the parents be resumed regardless of the flag state?
> 
> Yes.  In general you should assume a device's parent (and the device
> itself!) needs to be resumed whenever the kernel wants to do something
> with the device.  The special flag arises because sometimes it's safe
> to suspend the parent without suspending the device _if_ the kernel
> isn't using the device.
> 
> Imagine an idle disk at the end of a link.  We might want to 
> autosuspend the link without spinning down the disk.  When we have to 
> communicate with the disk again, we autoresume the link.  (Including 
> the case where the communication is a "spin-down" command.)
> 
> >  And if so, what's
> > the condition for breaking the recurrence?  Surely it's not sufficient to check
> > if the parent is active, because its parent need not be active if it has this
> > special flag set.
> 
> That's a good question.  Let's assume that situations like this will be 
> handled by the drivers.
> 
> For example, suppose A is the parent of B, which is the parent of C, and
> A is suspended but B isn't and C is.  What happens when somebody wants
> to use C?
> 
> An autoresume request is generated for C.  Since C's parent is already
> resumed, the runtime_resume method in C's driver is called.  The driver
> has to do some I/O in order to resume C, so it passes an I/O request up
> to B's driver.  The request then gets passed up to A's driver.  This
> driver knows that A is suspended, so it starts an autoresume of A and
> waits for the autoresume to complete before carrying out the request.
> 
> Then the I/O can go through, so C gets resumed and everything works 
> out.
> 
> I don't know how often this sort of pattern will arise.  It certainly
> could be used in usb-storage; there would be no difficulty starting an
> autoresume when an I/O request arrives from the SCSI layer below.  In
> fact, that is exactly how some early runtime-PM patches for usb-storage
> worked.

So, the conclusion seems to be that we should break the recurrence
at the point we find an already active device or a device with no parent and
let the driver(s) handle the more complicated cases.  Is this correct?
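
In code, the stopping rule I have in mind would look something like the
sketch below.  It is plain userspace C, and `struct pm_node` and
`resume_chain()` are invented for illustration; none of it is taken from
the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device node with only the fields needed here. */
struct pm_node {
	struct pm_node *parent;
	bool active;
};

/*
 * Resume @dev and as many ancestors as the core is responsible for.
 * The walk stops at the first device that is already active, or when
 * there is no parent left.  Returns the number of devices resumed.
 */
static int resume_chain(struct pm_node *dev)
{
	int resumed;

	if (dev == NULL || dev->active)
		return 0;

	resumed = resume_chain(dev->parent);	/* parents first */
	dev->active = true;
	return resumed + 1;
}
```

Note that with A suspended, B active and C suspended, this resumes only
C and leaves A alone, which is exactly the case left to the drivers.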

BTW, is __device_release_driver() the right place for blocking the run-time PM
temporarily during remove?

Best,
Rafael

* Re: [patch update] Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-12 23:06                                                     ` [patch update] Re: [linux-pm] " Rafael J. Wysocki
  2009-06-13 18:08                                                       ` [patch update] " Alan Stern
@ 2009-06-13 18:08                                                       ` Alan Stern
  2009-06-13 22:04                                                         ` [patch update] " Rafael J. Wysocki
  2009-06-13 22:04                                                         ` [patch update] Re: [linux-pm] " Rafael J. Wysocki
  1 sibling, 2 replies; 199+ messages in thread
From: Alan Stern @ 2009-06-13 18:08 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Oliver Neukum, Linux-pm mailing list, ACPI Devel Maling List, LKML

On Sat, 13 Jun 2009, Rafael J. Wysocki wrote:

> So, the conclusion seems to be that we should break the recurrence
> at the point we find an already active device or a device with no parent and
> let the driver(s) handle the more complicated cases.  Is this correct?

That's right.

> BTW, is __device_release_driver() the right place for blocking the run-time PM
> temporarily during remove?

It is.  And for submitting a delayed autosuspend request afterward; we
may as well try to suspend devices that don't have drivers.
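
Roughly, the shape would be as sketched below.  This is a userspace
model only: `struct rpm_state`, the `rpm_*()` helpers, `release_driver()`
and `sample_remove()` are hypothetical names, and the real
__device_release_driver() does far more than this.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device run-time PM state. */
struct rpm_state {
	int block_count;	/* > 0: run-time PM requests are refused */
	bool suspend_pending;	/* a delayed autosuspend request is queued */
};

static void rpm_block(struct rpm_state *s)
{
	s->block_count++;
}

static void rpm_unblock(struct rpm_state *s)
{
	if (s->block_count > 0)
		s->block_count--;
}

/* Queue a delayed autosuspend; refused while run-time PM is blocked. */
static bool rpm_request_suspend(struct rpm_state *s)
{
	if (s->block_count > 0)
		return false;
	s->suspend_pending = true;
	return true;
}

/* Models the release path: block, call ->remove(), unblock, request. */
static void release_driver(struct rpm_state *s,
			   void (*remove)(struct rpm_state *))
{
	rpm_block(s);
	if (remove)
		remove(s);
	rpm_unblock(s);
	rpm_request_suspend(s);	/* try to suspend the driverless device */
}

/* Example ->remove() that records whether a request was refused. */
static bool remove_saw_refusal;

static void sample_remove(struct rpm_state *s)
{
	remove_saw_refusal = !rpm_request_suspend(s);
}
```

A counter rather than a single flag is used so that nested blockers
would not unblock each other; that is a design choice of the sketch,
not something settled in this thread.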

Alan Stern


* Re: [patch update] Re: [linux-pm] Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-13 18:08                                                       ` [patch update] Re: [linux-pm] " Alan Stern
  2009-06-13 22:04                                                         ` [patch update] " Rafael J. Wysocki
@ 2009-06-13 22:04                                                         ` Rafael J. Wysocki
  1 sibling, 0 replies; 199+ messages in thread
From: Rafael J. Wysocki @ 2009-06-13 22:04 UTC (permalink / raw)
  To: Alan Stern
  Cc: Oliver Neukum, Linux-pm mailing list, ACPI Devel Maling List, LKML

On Saturday 13 June 2009, Alan Stern wrote:
> On Sat, 13 Jun 2009, Rafael J. Wysocki wrote:
> 
> > So, the conclusion seems to be that we should break the recurrence
> > at the point we find an already active device or a device with no parent and
> > let the driver(s) handle the more complicated cases.  Is this correct?
> 
> That's right.

OK

> > BTW, is __device_release_driver() the right place for blocking the run-time PM
> > temporarily during remove?
> 
> It is.

OK

> And for submitting a delayed autosuspend request afterward; we
> may as well try to suspend devices that don't have drivers.

OK, but I'd like to add this functionality in the future, when at least one
bus type starts using the framework.

I think I have all of the ducks in a row now, so I'm going to post a cleaned-up
patch in a new thread in a while.

Thanks,
Rafael

* Re: Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-08 14:35                         ` Run-time PM idea (was: Re: [linux-pm] " Matthew Garrett
                                           ` (3 preceding siblings ...)
  (?)
@ 2009-06-19  1:50                         ` Robert Hancock
  -1 siblings, 0 replies; 199+ messages in thread
From: Robert Hancock @ 2009-06-19  1:50 UTC (permalink / raw)
  To: Matthew Garrett; +Cc: Ingo Molnar, LKML, ACPI Devel Maling List, pm list, ide

On 06/08/2009 08:35 AM, Matthew Garrett wrote:
> On Mon, Jun 08, 2009 at 04:24:50PM +0200, Ingo Molnar wrote:
>> * Matthew Garrett<mjg59@srcf.ucam.org>  wrote:
>>> eSATA is pretty common now.
>> [ And 99% of the CPUs have an IDT, yet 99.9% of the users don't know
>>    what it is :) ]
>
> Users know that there's a socket on the front of their computer that
> they can plug a hard drive into, and if that doesn't work then they're
> going to be upset.
>
>>> The problem with this kind of default is that you get people who
>>> are confused that their hardware doesn't work.
>> If the hardware 'doesn't work' that is a kernel bug. Hardware that
>> _cannot be suspended_ safely (physically) should not be
>> auto-suspended, of course.
>
> So, like I said, the kernel can't automatically suspend AHCI unless it's
> received some information from elsewhere that tells it it's ok to. The
> kernel can't know if there's an eSATA port or not.
>
>>> If the kernel doesn't have enough information to make a decision
>>> it should err on the side of functionality - we're talking about
>>> fairly low-level power savings, but potentially several years of
>>> aggregate confusion on the part of users.
>> the difference between a 10W and a 1W footprint is a long series of
>> 'low-level power savings'.
>>
>> If users are getting confused and if hardware gets broken then that's
>> a plain bug and the wrong path is being walked.
>
> Yes. And powersaving is a tradeoff between functionality and power
> consumption. The kernel doesn't know what level of functionality a given
> user requires. It *can't* know that itself.
>
>>> Users are generally ok at realising correlation between a setting
>>> change and something no longer working, so as long as you provide
>>> that they'll be happy. I agree that this sucks. What we actually
>>> want is some means of reliably identifying whether a port is
>>> hotplug or not, but eSATA makes this very difficult.
>> Is it impossible?
>
> To the best of my knowledge, yes.

Well, in some cases we can get an idea - the current AHCI spec has bits 
in the PxCMD register (External SATA Port and Hot Plug Capable Port) 
which can indicate which ports are externally accessible and thus are 
likely to receive hotplug events. Of course, these are supposed to be 
programmed by the BIOS based on the particular motherboard/machine, and 
we all know how accurate BIOS-reported information can be...
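
For illustration, a check along these lines could look like the sketch
below.  The macro and function names are made up for this example, and
the bit positions (18 for Hot Plug Capable Port, 21 for External SATA
Port) are as I understand them from the AHCI 1.x specification, so they
should be verified against the spec revision in use.

```c
#include <stdbool.h>
#include <stdint.h>

/* PxCMD bits, assumed positions; names are local to this sketch. */
#define PXCMD_HPCP	(UINT32_C(1) << 18)	/* Hot Plug Capable Port */
#define PXCMD_ESP	(UINT32_C(1) << 21)	/* External SATA Port */

/*
 * Does the firmware claim that this port may see hotplug events?
 * @pxcmd is the value read from the port's PxCMD register.
 */
static bool port_may_hotplug(uint32_t pxcmd)
{
	return (pxcmd & (PXCMD_HPCP | PXCMD_ESP)) != 0;
}
```

A port for which this returns false would be a candidate for more
aggressive power management, but only to the extent that the BIOS has
programmed the bits correctly.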

* Re: Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code)
  2009-06-08 14:51                           ` Matthew Garrett
@ 2009-06-24 15:03                               ` Pavel Machek
  0 siblings, 0 replies; 199+ messages in thread
From: Pavel Machek @ 2009-06-24 15:03 UTC (permalink / raw)
  To: Matthew Garrett; +Cc: LKML, ACPI Devel Maling List, Ingo Molnar, pm list


> > > [...] Yes, we can greatly expand the userland-visible interface to 
> > > every piece of hardware in order to make this work, but that's a 
> > > huge amount of effort to avoid a model where userspace sets some 
> > > tunables appropriately.
> > 
> > What huge amount of effort? All you are doing is to track the "is 
> > the device really used" state in user-space - and, if the current 
> > desktop experience is any measure, highly imperfectly so.
> > 
> > What I'm suggesting is to track it properly in the kernel. It's not 
> > like the kernel doesn't need to know whether a piece of hardware is 
> > under use or not ...
> 
> So, for instance, we need to add interfaces like "I care about hotplug 
> events on this SATA port" and "I'm listening for these keys so please 
> don't suspend the device" and "The service bound to this port needs to 
> maintain network connectivity and the one bound to this port doesn't, so 
> only put the wireless card into deep powersave if the first exits", and 
> then we need to wait for userspace to adopt these interfaces before we 
> can enable any of the functionality because otherwise old userspace will 
> be broken with new kernels.

Yes, that's the way to go. It is not a particularly easy way, but at
least such userspace will work with upcoming hardware and the kernel
will be able to get features such as 'system autosuspend'...
								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

end of thread, other threads:[~2009-06-27 11:27 UTC | newest]

Thread overview: 199+ messages
2009-06-06 22:54 [RFC][PATCH 0/2] PM: Rearrange core suspend code Rafael J. Wysocki
2009-06-06 22:55 ` [RFC][PATCH 1/2] PM: Separate suspend to RAM functionality from core Rafael J. Wysocki
2009-06-06 22:55 ` Rafael J. Wysocki
2009-06-08  6:36   ` Pavel Machek
2009-06-08  6:36   ` Pavel Machek
2009-06-06 22:56 ` [RFC][PATCH 2/2] PM/Hibernate: Rename disk.c to hibernate.c Rafael J. Wysocki
2009-06-06 22:56 ` Rafael J. Wysocki
2009-06-08  6:37   ` Pavel Machek
2009-06-08  6:37   ` Pavel Machek
2009-06-07 20:51 ` [RFC][PATCH 0/2] PM: Rearrange core suspend code Alan Stern
2009-06-07 20:51 ` [linux-pm] " Alan Stern
2009-06-07 21:46   ` Run-time PM idea (was: Re: [RFC][PATCH 0/2] PM: Rearrange core suspend code) Rafael J. Wysocki
2009-06-07 21:46   ` Run-time PM idea (was: Re: [linux-pm] " Rafael J. Wysocki
2009-06-07 22:02     ` Run-time PM idea (was: " Oliver Neukum
2009-06-07 22:02     ` Run-time PM idea (was: Re: [linux-pm] " Oliver Neukum
2009-06-07 22:02       ` Oliver Neukum
2009-06-07 22:05     ` Run-time PM idea (was: " Oliver Neukum
2009-06-07 22:05     ` [linux-pm] " Oliver Neukum
2009-06-08 11:29       ` Rafael J. Wysocki
2009-06-08 11:29       ` [linux-pm] " Rafael J. Wysocki
2009-06-08 12:04         ` Oliver Neukum
2009-06-08 18:34           ` Rafael J. Wysocki
2009-06-09  7:25             ` Oliver Neukum
2009-06-09  7:25             ` [linux-pm] " Oliver Neukum
2009-06-09 14:33               ` Alan Stern
2009-06-09 14:33               ` [linux-pm] " Alan Stern
2009-06-09 14:33                 ` Alan Stern
2009-06-09 14:48                 ` Oliver Neukum
2009-06-09 14:48                   ` Oliver Neukum
2009-06-09 14:48                 ` Oliver Neukum
2009-06-09 22:44               ` [linux-pm] " Rafael J. Wysocki
2009-06-09 22:44               ` Rafael J. Wysocki
2009-06-08 18:34           ` Rafael J. Wysocki
2009-06-08 12:04         ` Oliver Neukum
2009-06-08 20:35         ` Alan Stern
2009-06-08 20:35         ` [linux-pm] " Alan Stern
2009-06-08 20:35           ` Alan Stern
2009-06-08 21:31           ` Rafael J. Wysocki
2009-06-09  2:49             ` Alan Stern
2009-06-09  2:49               ` Alan Stern
2009-06-09 22:57               ` Rafael J. Wysocki
2009-06-10  8:29                 ` [patch update] " Rafael J. Wysocki
2009-06-10  8:29                 ` [patch update] Re: [linux-pm] " Rafael J. Wysocki
2009-06-10 14:20                   ` [patch update] " Oliver Neukum
2009-06-10 14:20                   ` [patch update] Re: [linux-pm] " Oliver Neukum
2009-06-10 19:27                     ` Rafael J. Wysocki
2009-06-10 21:38                       ` Oliver Neukum
2009-06-10 21:38                         ` Oliver Neukum
2009-06-10 22:01                         ` [patch update] " Rafael J. Wysocki
2009-06-10 22:01                         ` [patch update] Re: [linux-pm] " Rafael J. Wysocki
2009-06-10 23:07                           ` Oliver Neukum
2009-06-10 23:07                             ` Oliver Neukum
2009-06-10 23:42                             ` [patch update] " Alan Stern
2009-06-10 23:42                             ` [patch update] Re: [linux-pm] " Alan Stern
2009-06-10 23:42                               ` Alan Stern
2009-06-11 13:48                               ` Rafael J. Wysocki
2009-06-11 13:57                                 ` [patch update] " Oliver Neukum
2009-06-11 13:57                                 ` [patch update] Re: [linux-pm] " Oliver Neukum
2009-06-11 14:16                                   ` [patch update] " Alan Stern
2009-06-11 14:16                                   ` [patch update] Re: [linux-pm] " Alan Stern
2009-06-11 14:16                                     ` Alan Stern
2009-06-11 19:38                                     ` [patch update] " Rafael J. Wysocki
2009-06-11 19:38                                     ` [patch update] Re: [linux-pm] " Rafael J. Wysocki
2009-06-11 13:48                               ` [patch update] " Rafael J. Wysocki
2009-06-11 13:46                             ` Rafael J. Wysocki
2009-06-11 13:46                             ` [patch update] Re: [linux-pm] " Rafael J. Wysocki
2009-06-10 23:07                           ` [patch update] " Oliver Neukum
2009-06-10 21:38                       ` Oliver Neukum
2009-06-10 19:27                     ` Rafael J. Wysocki
2009-06-10 21:14                   ` Alan Stern
2009-06-10 21:14                   ` [patch update] Re: [linux-pm] " Alan Stern
2009-06-10 21:31                     ` [patch update] " Rafael J. Wysocki
2009-06-10 21:31                     ` [patch update] Re: [linux-pm] " Rafael J. Wysocki
2009-06-10 23:15                       ` [patch update] " Oliver Neukum
2009-06-10 23:15                       ` [patch update] Re: [linux-pm] " Oliver Neukum
2009-06-10 23:15                         ` Oliver Neukum
2009-06-11  5:27                         ` [patch update] " Magnus Damm
2009-06-11  5:27                         ` [patch update] Re: [linux-pm] " Magnus Damm
2009-06-11  5:27                           ` Magnus Damm
2009-06-10 23:42                       ` Alan Stern
2009-06-11 14:17                         ` [patch update] " Rafael J. Wysocki
2009-06-11 14:17                           ` [patch update] Re: [linux-pm] " Rafael J. Wysocki
2009-06-11 14:52                           ` [patch update] " Alan Stern
2009-06-11 14:52                           ` [patch update] Re: [linux-pm] " Alan Stern
2009-06-11 15:06                             ` Oliver Neukum
2009-06-11 15:06                               ` Oliver Neukum
2009-06-11 15:22                               ` Alan Stern
2009-06-11 15:22                                 ` Alan Stern
2009-06-11 16:05                                 ` Oliver Neukum
2009-06-11 16:05                                   ` Oliver Neukum
2009-06-11 18:36                                   ` [patch update] " Alan Stern
2009-06-11 18:36                                   ` [patch update] Re: [linux-pm] " Alan Stern
2009-06-11 18:36                                     ` Alan Stern
2009-06-11 21:05                                     ` [patch update] " Oliver Neukum
2009-06-11 21:05                                     ` [patch update] Re: [linux-pm] " Oliver Neukum
2009-06-11 21:05                                       ` Oliver Neukum
2009-06-12  2:16                                       ` Alan Stern
2009-06-12  2:16                                         ` Alan Stern
2009-06-12  8:15                                         ` Oliver Neukum
2009-06-12 14:32                                           ` [patch update] " Alan Stern
2009-06-12 14:32                                           ` [patch update] Re: [linux-pm] " Alan Stern
2009-06-12 14:32                                             ` Alan Stern
2009-06-12 19:09                                             ` [patch update] " Rafael J. Wysocki
2009-06-12 19:09                                             ` [patch update] Re: [linux-pm] " Rafael J. Wysocki
2009-06-12 19:48                                               ` Alan Stern
2009-06-12 19:56                                                 ` [patch update] " Rafael J. Wysocki
2009-06-12 19:56                                                 ` [patch update] Re: [linux-pm] " Rafael J. Wysocki
2009-06-12 21:23                                                   ` Alan Stern
2009-06-12 23:06                                                     ` [patch update] " Rafael J. Wysocki
2009-06-12 23:06                                                     ` [patch update] Re: [linux-pm] " Rafael J. Wysocki
2009-06-13 18:08                                                       ` [patch update] " Alan Stern
2009-06-13 18:08                                                       ` [patch update] Re: [linux-pm] " Alan Stern
2009-06-13 22:04                                                         ` [patch update] " Rafael J. Wysocki
2009-06-13 22:04                                                         ` [patch update] Re: [linux-pm] " Rafael J. Wysocki
2009-06-12 21:23                                                   ` [patch update] " Alan Stern
2009-06-12 19:48                                               ` Alan Stern
2009-06-12  8:15                                         ` Oliver Neukum
2009-06-12  2:16                                       ` Alan Stern
2009-06-11 16:05                                 ` Oliver Neukum
2009-06-11 15:22                               ` Alan Stern
2009-06-11 15:06                             ` Oliver Neukum
2009-06-11 19:43                             ` Rafael J. Wysocki
2009-06-11 19:43                             ` [patch update] Re: [linux-pm] " Rafael J. Wysocki
2009-06-12 14:25                               ` [patch update] " Alan Stern
2009-06-12 14:25                               ` [patch update] Re: [linux-pm] " Alan Stern
2009-06-10 23:42                       ` [patch update] " Alan Stern
2009-06-11  5:18                   ` [patch update] Re: [linux-pm] " Magnus Damm
2009-06-11  5:18                     ` Magnus Damm
2009-06-11  9:08                     ` Oliver Neukum
2009-06-12  3:13                       ` [patch update] " Magnus Damm
2009-06-12  3:13                         ` [patch update] Re: [linux-pm] " Magnus Damm
2009-06-12  8:11                         ` Oliver Neukum
2009-06-12 10:54                           ` [patch update] " Magnus Damm
2009-06-12 10:54                           ` [patch update] Re: [linux-pm] " Magnus Damm
2009-06-12 10:54                             ` Magnus Damm
2009-06-12  8:11                         ` [patch update] " Oliver Neukum
2009-06-11  9:08                     ` Oliver Neukum
2009-06-11  5:18                   ` Magnus Damm
2009-06-10 20:48                 ` [linux-pm] " Alan Stern
2009-06-10 20:48                   ` Alan Stern
2009-06-10 21:15                   ` Rafael J. Wysocki
2009-06-10 21:15                   ` [linux-pm] " Rafael J. Wysocki
2009-06-10 20:48                 ` Alan Stern
2009-06-09 22:57               ` Rafael J. Wysocki
2009-06-09  2:49             ` Alan Stern
2009-06-09  7:31             ` [linux-pm] " Oliver Neukum
2009-06-09  7:31               ` Oliver Neukum
2009-06-09 23:02               ` Rafael J. Wysocki
2009-06-09 23:02               ` [linux-pm] " Rafael J. Wysocki
2009-06-09  7:31             ` Oliver Neukum
2009-06-08 21:31           ` Rafael J. Wysocki
2009-06-08  6:54     ` Ingo Molnar
2009-06-08  6:54     ` Run-time PM idea (was: Re: [linux-pm] " Ingo Molnar
2009-06-08 11:30       ` Rafael J. Wysocki
2009-06-08 13:05         ` Ingo Molnar
2009-06-08 13:11           ` Matthew Garrett
2009-06-08 13:22             ` Run-time PM idea (was: " Ingo Molnar
2009-06-08 13:22             ` Run-time PM idea (was: Re: [linux-pm] " Ingo Molnar
2009-06-08 13:32               ` Matthew Garrett
2009-06-08 13:46                 ` Run-time PM idea (was: " Ingo Molnar
2009-06-08 13:46                 ` Run-time PM idea (was: Re: [linux-pm] " Ingo Molnar
2009-06-08 13:54                   ` Run-time PM idea (was: " Matthew Garrett
2009-06-08 13:54                   ` Run-time PM idea (was: Re: [linux-pm] " Matthew Garrett
2009-06-08 14:24                     ` Run-time PM idea (was: " Ingo Molnar
2009-06-08 14:24                     ` Run-time PM idea (was: Re: [linux-pm] " Ingo Molnar
2009-06-08 14:35                       ` Run-time PM idea (was: " Matthew Garrett
2009-06-08 14:35                         ` Run-time PM idea (was: Re: [linux-pm] " Matthew Garrett
2009-06-08 14:44                         ` Run-time PM idea (was: " Ingo Molnar
2009-06-08 14:44                         ` Run-time PM idea (was: Re: [linux-pm] " Ingo Molnar
2009-06-08 14:51                           ` Matthew Garrett
2009-06-24 15:03                             ` Run-time PM idea (was: " Pavel Machek
2009-06-24 15:03                               ` Run-time PM idea (was: Re: [linux-pm] " Pavel Machek
2009-06-08 14:51                           ` Run-time PM idea (was: " Matthew Garrett
2009-06-19  1:50                         ` Robert Hancock
2009-06-19  1:50                         ` Robert Hancock
2009-06-08 13:58                   ` Oliver Neukum
2009-06-08 13:58                   ` Run-time PM idea (was: Re: [linux-pm] " Oliver Neukum
2009-06-08 13:58                     ` Oliver Neukum
2009-06-08 13:32               ` Run-time PM idea (was: " Matthew Garrett
2009-06-08 13:39               ` Oliver Neukum
2009-06-08 13:39               ` Run-time PM idea (was: Re: [linux-pm] " Oliver Neukum
2009-06-08 13:44                 ` Run-time PM idea (was: " Matthew Garrett
2009-06-08 13:44                 ` Run-time PM idea (was: Re: [linux-pm] " Matthew Garrett
2009-06-08 14:21                 ` Ingo Molnar
2009-06-08 14:30                   ` Matthew Garrett
2009-06-08 15:06                     ` Run-time PM idea (was: " Ingo Molnar
2009-06-08 15:06                     ` Run-time PM idea (was: Re: [linux-pm] " Ingo Molnar
2009-06-08 15:11                       ` Matthew Garrett
2009-06-08 15:11                       ` Run-time PM idea (was: " Matthew Garrett
2009-06-08 16:29                       ` Ray Lee
2009-06-08 16:29                       ` Run-time PM idea (was: Re: [linux-pm] " Ray Lee
2009-06-08 16:29                         ` Ray Lee
2009-06-08 14:30                   ` Run-time PM idea (was: " Matthew Garrett
2009-06-09 22:44                   ` Jiri Kosina
2009-06-09 22:44                   ` Run-time PM idea (was: Re: [linux-pm] " Jiri Kosina
2009-06-08 14:21                 ` Run-time PM idea (was: " Ingo Molnar
2009-06-08 13:11           ` Matthew Garrett
2009-06-08 13:05         ` Ingo Molnar
2009-06-08 11:30       ` Rafael J. Wysocki
