* [PATCH] PM: Introduce core framework for run-time PM of I/O devices (rev. 14)
@ 2009-08-08 14:25 Rafael J. Wysocki
From: Rafael J. Wysocki @ 2009-08-08 14:25 UTC
  To: Alan Stern, Magnus Damm
  Cc: Greg KH, Pavel Machek, Len Brown, LKML, Linux-pm mailing list

Hi,

Below is rev. 14 of the run-time PM patch.  Hopefully it addresses all of
the remaining reservations.

Thanks,
Rafael

---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: PM: Introduce core framework for run-time PM of I/O devices (rev. 14)

Introduce a core framework for run-time power management of I/O
devices.  Add device run-time PM fields to 'struct dev_pm_info'
and device run-time PM callbacks to 'struct dev_pm_ops'.  Introduce
a run-time PM workqueue and define some device run-time PM helper
functions at the core level.  Document all these things.
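To illustrate the reference-counting rule behind the new __pm_runtime_get() and __pm_runtime_put() helpers, here is a minimal userspace model in C. It is a sketch under stated assumptions, not kernel code: struct model_dev, model_get() and model_put() are hypothetical names, and locking, atomics and the asynchronous (request-based) variants are left out.

```c
#include <assert.h>

/*
 * Userspace model (hypothetical names, not the kernel API) of the rule that
 * __pm_runtime_get() and __pm_runtime_put() implement: only the 0 -> 1
 * transition of the usage counter triggers a resume, and only the 1 -> 0
 * transition triggers an idle notification.
 */
struct model_dev {
	int usage_count;
	int resumes;	/* times the resume path was taken */
	int idles;	/* times the idle-notification path was taken */
};

static void model_get(struct model_dev *d)
{
	/* The kernel uses atomic_add_return(1, ...) == 1 here. */
	if (++d->usage_count == 1)
		d->resumes++;	/* stands in for pm_runtime_resume() */
}

static void model_put(struct model_dev *d)
{
	/* The kernel uses atomic_dec_and_test() here. */
	if (--d->usage_count == 0)
		d->idles++;	/* stands in for pm_runtime_idle() */
}
```

In the real helpers the resume or idle step is either carried out synchronously (pm_runtime_resume(), pm_runtime_idle()) or queued (pm_request_resume(), pm_request_idle()), depending on the @sync argument.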

Special thanks to Alan Stern for his help with the design and
multiple detailed reviews of the preceding versions of this patch
and to Magnus Damm for testing feedback.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 Documentation/power/runtime_pm.txt |  386 ++++++++++++++
 drivers/base/dd.c                  |   21 
 drivers/base/power/Makefile        |    1 
 drivers/base/power/main.c          |   20 
 drivers/base/power/power.h         |   31 -
 drivers/base/power/runtime.c       |  969 +++++++++++++++++++++++++++++++++++++
 include/linux/pm.h                 |  101 +++
 include/linux/pm_runtime.h         |  115 ++++
 kernel/power/Kconfig               |   14 
 kernel/power/main.c                |   17 
 10 files changed, 1664 insertions(+), 11 deletions(-)

Index: linux-2.6/kernel/power/Kconfig
===================================================================
--- linux-2.6.orig/kernel/power/Kconfig
+++ linux-2.6/kernel/power/Kconfig
@@ -208,3 +208,17 @@ config APM_EMULATION
 	  random kernel OOPSes or reboots that don't seem to be related to
 	  anything, try disabling/enabling this option (or disabling/enabling
 	  APM in your BIOS).
+
+config PM_RUNTIME
+	bool "Run-time PM core functionality"
+	depends on PM
+	---help---
+	  Enable functionality allowing I/O devices to be put into energy-saving
+	  (low power) states at run time (or autosuspended) after a specified
+	  period of inactivity and woken up in response to a hardware-generated
+	  wake-up event or a driver's request.
+
+	  Hardware support is generally required for this functionality to
+	  work.  The bus type drivers for the buses the devices sit on are
+	  responsible for actually handling the autosuspend requests and
+	  wake-up events.
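The "period of inactivity" in this help text is tracked by a per-device timer whose deadline is stored in jiffies (pm_schedule_suspend() sets it to jiffies + msecs_to_jiffies(delay)), and pm_suspend_timer_fn() submits the suspend request only if that deadline is armed and has passed. Below is a hedged userspace sketch of the check; model_time_after() reproduces the wraparound-safe comparison behind the kernel's time_after() macro, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Wraparound-safe "a is after b" test, as used by time_after():
 * unsigned subtraction followed by a signed cast stays correct when the
 * tick counter wraps around.
 */
static bool model_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/*
 * Mirrors the condition in pm_suspend_timer_fn(): a deadline of 0 means
 * "no timer armed"; otherwise the request is due once the deadline is not
 * after the current time.
 */
static bool model_suspend_due(unsigned long expires, unsigned long now)
{
	return expires > 0 && !model_time_after(expires, now);
}
```

Comparing the raw values with `<` instead would misfire when jiffies wraps between arming the timer and its expiry, which is why the kernel funnels such comparisons through time_after().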
Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -11,6 +11,7 @@
 #include <linux/kobject.h>
 #include <linux/string.h>
 #include <linux/resume-trace.h>
+#include <linux/workqueue.h>
 
 #include "power.h"
 
@@ -217,8 +218,24 @@ static struct attribute_group attr_group
 	.attrs = g,
 };
 
+#ifdef CONFIG_PM_RUNTIME
+struct workqueue_struct *pm_wq;
+
+static int __init pm_start_workqueue(void)
+{
+	pm_wq = create_freezeable_workqueue("pm");
+
+	return pm_wq ? 0 : -ENOMEM;
+}
+#else
+static inline int pm_start_workqueue(void) { return 0; }
+#endif
+
 static int __init pm_init(void)
 {
+	int error = pm_start_workqueue();
+	if (error)
+		return error;
 	power_kobj = kobject_create_and_add("power", NULL);
 	if (!power_kobj)
 		return -ENOMEM;
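pm_wq, created above, runs the per-device work items that pm_runtime_init() sets up. The work function, pm_runtime_work() in drivers/base/power/runtime.c, snapshots the single pending request, clears the slot and dispatches on the snapshot, so a request cancelled by resetting it to RPM_REQ_NONE becomes a no-op. The userspace sketch below models only that dispatch step; the enum and struct names are hypothetical and the locking done under dev->power.lock is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of enum rpm_request from include/linux/pm.h. */
enum model_req { REQ_NONE = 0, REQ_IDLE, REQ_SUSPEND, REQ_RESUME };

struct model_dev {
	bool request_pending;		/* a work item is queued */
	enum model_req request;		/* what it should do */
	enum model_req last_run;	/* what the model actually executed */
};

/* Models pm_runtime_work(): snapshot, clear, then dispatch. */
static void model_work(struct model_dev *d)
{
	enum model_req req;

	if (!d->request_pending)
		return;

	req = d->request;
	d->request = REQ_NONE;
	d->request_pending = false;

	switch (req) {
	case REQ_NONE:
		/* Request was cancelled after the work was queued. */
		break;
	case REQ_IDLE:
	case REQ_SUSPEND:
	case REQ_RESUME:
		/* Stands in for the matching __pm_runtime_*() call. */
		d->last_run = req;
		break;
	}
}
```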
Index: linux-2.6/include/linux/pm.h
===================================================================
--- linux-2.6.orig/include/linux/pm.h
+++ linux-2.6/include/linux/pm.h
@@ -22,6 +22,10 @@
 #define _LINUX_PM_H
 
 #include <linux/list.h>
+#include <linux/workqueue.h>
+#include <linux/spinlock.h>
+#include <linux/wait.h>
+#include <linux/timer.h>
 
 /*
  * Callbacks for platform drivers to implement.
@@ -165,6 +169,28 @@ typedef struct pm_message {
  * It is allowed to unregister devices while the above callbacks are being
  * executed.  However, it is not allowed to unregister a device from within any
  * of its own callbacks.
+ *
+ * There also are the following callbacks related to run-time power management
+ * of devices:
+ *
+ * @runtime_suspend: Prepare the device for a condition in which it won't be
+ *	able to communicate with the CPU(s) and RAM due to power management.
+ *	This need not mean that the device should be put into a low power state.
+ *	For example, if the device is behind a link which is about to be turned
+ *	off, the device may remain at full power.  If the device does go to low
+ *	power and if device_may_wakeup(dev) is true, remote wake-up (i.e., a
+ *	hardware mechanism allowing the device to request a change of its power
+ *	state, such as PCI PME) should be enabled for it.
+ *
+ * @runtime_resume: Put the device into the fully active state in response to a
+ *	wake-up event generated by hardware or at the request of software.  If
+ *	necessary, put the device into the full power state and restore its
+ *	registers, so that it is fully operational.
+ *
+ * @runtime_idle: Device appears to be inactive and it might be put into a low
+ *	power state if all of the necessary conditions are satisfied.  Check
+ *	these conditions and handle the device as appropriate, possibly queueing
+ *	a suspend request for it.
  */
 
 struct dev_pm_ops {
@@ -182,6 +208,9 @@ struct dev_pm_ops {
 	int (*thaw_noirq)(struct device *dev);
 	int (*poweroff_noirq)(struct device *dev);
 	int (*restore_noirq)(struct device *dev);
+	int (*runtime_suspend)(struct device *dev);
+	int (*runtime_resume)(struct device *dev);
+	void (*runtime_idle)(struct device *dev);
 };
 
 /*
@@ -329,14 +358,80 @@ enum dpm_state {
 	DPM_OFF_IRQ,
 };
 
+/**
+ * Device run-time power management status.
+ *
+ * These status labels are used internally by the PM core to indicate the
+ * current status of a device with respect to the PM core operations.  They do
+ * not reflect the actual power state of the device or its status as seen by the
+ * driver.
+ *
+ * RPM_ACTIVE		Device is fully operational.  Indicates that the device
+ *			bus type's ->runtime_resume() callback has completed
+ *			successfully.
+ *
+ * RPM_SUSPENDED	Device bus type's ->runtime_suspend() callback has
+ *			completed successfully.  The device is regarded as
+ *			suspended.
+ *
+ * RPM_RESUMING		Device bus type's ->runtime_resume() callback is being
+ *			executed.
+ *
+ * RPM_SUSPENDING	Device bus type's ->runtime_suspend() callback is being
+ *			executed.
+ */
+
+enum rpm_status {
+	RPM_ACTIVE = 0,
+	RPM_RESUMING,
+	RPM_SUSPENDED,
+	RPM_SUSPENDING,
+};
+
+/**
+ * Device run-time power management request types.
+ *
+ * RPM_REQ_NONE		Do nothing.
+ *
+ * RPM_REQ_IDLE		Run the device bus type's ->runtime_idle() callback
+ *
+ * RPM_REQ_SUSPEND	Run the device bus type's ->runtime_suspend() callback
+ *
+ * RPM_REQ_RESUME	Run the device bus type's ->runtime_resume() callback
+ */
+
+enum rpm_request {
+	RPM_REQ_NONE = 0,
+	RPM_REQ_IDLE,
+	RPM_REQ_SUSPEND,
+	RPM_REQ_RESUME,
+};
+
 struct dev_pm_info {
 	pm_message_t		power_state;
-	unsigned		can_wakeup:1;
-	unsigned		should_wakeup:1;
+	unsigned int		can_wakeup:1;
+	unsigned int		should_wakeup:1;
 	enum dpm_state		status;		/* Owned by the PM core */
-#ifdef	CONFIG_PM_SLEEP
+#ifdef CONFIG_PM_SLEEP
 	struct list_head	entry;
 #endif
+#ifdef CONFIG_PM_RUNTIME
+	struct timer_list	suspend_timer;
+	unsigned long		timer_expires;
+	struct work_struct	work;
+	wait_queue_head_t	wait_queue;
+	spinlock_t		lock;
+	atomic_t		usage_count;
+	atomic_t		child_count;
+	unsigned int		disable_depth:3;
+	unsigned int		ignore_children:1;
+	unsigned int		idle_notification:1;
+	unsigned int		request_pending:1;
+	unsigned int		deferred_resume:1;
+	enum rpm_request	request;
+	enum rpm_status		runtime_status;
+	int			runtime_error;
+#endif
 };
 
 /*
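enum rpm_request backs a single request slot per device, and the helpers in drivers/base/power/runtime.c coalesce new requests into it in a fixed order of precedence: resume over suspend over idle. A simplified userspace model of that rule follows; the names are hypothetical, and it ignores the status, usage-count and child checks that the real helpers perform first.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical mirror of enum rpm_request; numeric order = precedence. */
enum model_req { REQ_NONE = 0, REQ_IDLE, REQ_SUSPEND, REQ_RESUME };

struct model_slot {
	bool pending;		/* a work item is queued */
	enum model_req request;	/* what it will do when it runs */
};

/*
 * A new request replaces a queued one of lower precedence, merges with an
 * equal one and is refused with -EAGAIN if a higher-precedence request is
 * already queued (as __pm_request_idle() and __pm_request_suspend() do).
 */
static int model_submit(struct model_slot *s, enum model_req req)
{
	if (!s->pending) {
		s->pending = true;	/* real code: queue_work(pm_wq, ...) */
		s->request = req;
		return 0;
	}
	if (s->request > req)
		return -EAGAIN;
	s->request = req;
	return 0;
}
```

Keeping one slot per device, instead of a queue of requests, means at most one work item is ever outstanding and a later, stronger request simply overwrites a weaker one before the work function runs.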
Index: linux-2.6/drivers/base/power/Makefile
===================================================================
--- linux-2.6.orig/drivers/base/power/Makefile
+++ linux-2.6/drivers/base/power/Makefile
@@ -1,5 +1,6 @@
 obj-$(CONFIG_PM)	+= sysfs.o
 obj-$(CONFIG_PM_SLEEP)	+= main.o
+obj-$(CONFIG_PM_RUNTIME)	+= runtime.o
 obj-$(CONFIG_PM_TRACE_RTC)	+= trace.o
 
 ccflags-$(CONFIG_DEBUG_DRIVER) := -DDEBUG
Index: linux-2.6/drivers/base/power/runtime.c
===================================================================
--- /dev/null
+++ linux-2.6/drivers/base/power/runtime.c
@@ -0,0 +1,969 @@
+/*
+ * drivers/base/power/runtime.c - Helper functions for device run-time PM
+ *
+ * Copyright (c) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
+ *
+ * This file is released under the GPLv2.
+ */
+
+#include <linux/sched.h>
+#include <linux/pm_runtime.h>
+#include <linux/jiffies.h>
+
+static int __pm_runtime_resume(struct device *dev, bool from_wq);
+static int __pm_request_idle(struct device *dev);
+static int __pm_request_resume(struct device *dev);
+
+/**
+ * pm_runtime_deactivate_timer - Deactivate given device's suspend timer.
+ * @dev: Device to handle.
+ */
+static void pm_runtime_deactivate_timer(struct device *dev)
+{
+	if (dev->power.timer_expires > 0) {
+		del_timer(&dev->power.suspend_timer);
+		dev->power.timer_expires = 0;
+	}
+}
+
+/**
+ * pm_runtime_cancel_pending - Deactivate suspend timer and cancel requests.
+ * @dev: Device to handle.
+ */
+static void pm_runtime_cancel_pending(struct device *dev)
+{
+	pm_runtime_deactivate_timer(dev);
+	/*
+	 * In case there's a request pending, make sure its work function will
+	 * return without doing anything.
+	 */
+	dev->power.request = RPM_REQ_NONE;
+}
+
+/**
+ * __pm_runtime_idle - Notify device bus type if the device can be suspended.
+ * @dev: Device to notify the bus type about.
+ *
+ * This function must be called under dev->power.lock with interrupts disabled.
+ */
+static int __pm_runtime_idle(struct device *dev)
+	__releases(&dev->power.lock) __acquires(&dev->power.lock)
+{
+	int retval = 0;
+
+	dev_dbg(dev, "__pm_runtime_idle()!\n");
+
+	if (dev->power.runtime_error)
+		retval = -EINVAL;
+	else if (dev->power.idle_notification)
+		retval = -EINPROGRESS;
+	else if (atomic_read(&dev->power.usage_count) > 0
+	    || dev->power.disable_depth > 0
+	    || dev->power.runtime_status != RPM_ACTIVE)
+		retval = -EAGAIN;
+	else if (!pm_children_suspended(dev))
+		retval = -EBUSY;
+	if (retval)
+		goto out;
+
+	if (dev->power.request_pending) {
+		/*
+		 * If an idle notification request is pending, cancel it.  Any
+		 * other pending request takes precedence over us.
+		 */
+		if (dev->power.request == RPM_REQ_IDLE) {
+			dev->power.request = RPM_REQ_NONE;
+		} else if (dev->power.request != RPM_REQ_NONE) {
+			retval = -EAGAIN;
+			goto out;
+		}
+	}
+
+	dev->power.idle_notification = true;
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_idle) {
+		spin_unlock_irq(&dev->power.lock);
+
+		dev->bus->pm->runtime_idle(dev);
+
+		spin_lock_irq(&dev->power.lock);
+	}
+
+	dev->power.idle_notification = false;
+	wake_up_all(&dev->power.wait_queue);
+
+ out:
+	dev_dbg(dev, "__pm_runtime_idle() returns %d!\n", retval);
+
+	return retval;
+}
+
+/**
+ * pm_runtime_idle - Notify device bus type if the device can be suspended.
+ * @dev: Device to notify the bus type about.
+ */
+int pm_runtime_idle(struct device *dev)
+{
+	int retval;
+
+	spin_lock_irq(&dev->power.lock);
+	retval = __pm_runtime_idle(dev);
+	spin_unlock_irq(&dev->power.lock);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(pm_runtime_idle);
+
+/**
+ * __pm_runtime_suspend - Carry out run-time suspend of given device.
+ * @dev: Device to suspend.
+ * @from_wq: If set, the function has been called via pm_wq.
+ *
+ * Check if the device can be suspended and run the ->runtime_suspend() callback
+ * provided by its bus type.  If another suspend has been started earlier, wait
+ * for it to finish.  If an idle notification or suspend request is pending or
+ * scheduled, cancel it.
+ *
+ * This function must be called under dev->power.lock with interrupts disabled.
+ */
+int __pm_runtime_suspend(struct device *dev, bool from_wq)
+	__releases(&dev->power.lock) __acquires(&dev->power.lock)
+{
+	struct device *parent = NULL;
+	bool notify = false;
+	int retval = 0;
+
+	dev_dbg(dev, "__pm_runtime_suspend()%s!\n",
+		from_wq ? " from workqueue" : "");
+
+ repeat:
+	if (dev->power.runtime_error) {
+		retval = -EINVAL;
+		goto out;
+	}
+
+	/* Pending resume requests take precedence over us. */
+	if (dev->power.request_pending
+	    && dev->power.request == RPM_REQ_RESUME) {
+		retval = -EAGAIN;
+		goto out;
+	}
+
+	/* Other scheduled or pending requests need to be canceled. */
+	pm_runtime_cancel_pending(dev);
+
+	if (dev->power.runtime_status == RPM_SUSPENDED)
+		retval = 1;
+	else if (dev->power.runtime_status == RPM_RESUMING
+	    || dev->power.disable_depth > 0
+	    || atomic_read(&dev->power.usage_count) > 0)
+		retval = -EAGAIN;
+	else if (!pm_children_suspended(dev))
+		retval = -EBUSY;
+	if (retval)
+		goto out;
+
+	if (dev->power.runtime_status == RPM_SUSPENDING) {
+		DEFINE_WAIT(wait);
+
+		if (from_wq) {
+			retval = -EINPROGRESS;
+			goto out;
+		}
+
+		/* Wait for the other suspend running in parallel with us. */
+		for (;;) {
+			prepare_to_wait(&dev->power.wait_queue, &wait,
+					TASK_UNINTERRUPTIBLE);
+			if (dev->power.runtime_status != RPM_SUSPENDING)
+				break;
+
+			spin_unlock_irq(&dev->power.lock);
+
+			schedule();
+
+			spin_lock_irq(&dev->power.lock);
+		}
+		finish_wait(&dev->power.wait_queue, &wait);
+		goto repeat;
+	}
+
+	dev->power.runtime_status = RPM_SUSPENDING;
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_suspend) {
+		spin_unlock_irq(&dev->power.lock);
+
+		retval = dev->bus->pm->runtime_suspend(dev);
+
+		spin_lock_irq(&dev->power.lock);
+		dev->power.runtime_error = retval;
+	} else {
+		retval = -ENOSYS;
+	}
+
+	if (retval) {
+		dev->power.runtime_status = RPM_ACTIVE;
+		pm_runtime_cancel_pending(dev);
+		dev->power.deferred_resume = false;
+
+		if (retval == -EAGAIN || retval == -EBUSY) {
+			notify = true;
+			dev->power.runtime_error = 0;
+		}
+	} else {
+		dev->power.runtime_status = RPM_SUSPENDED;
+
+		if (dev->parent) {
+			parent = dev->parent;
+			atomic_add_unless(&parent->power.child_count, -1, 0);
+		}
+	}
+	wake_up_all(&dev->power.wait_queue);
+
+	if (dev->power.deferred_resume) {
+		dev->power.deferred_resume = false;
+		__pm_runtime_resume(dev, false);
+		retval = -EAGAIN;
+		goto out;
+	}
+
+	if (notify)
+		__pm_runtime_idle(dev);
+
+	if (parent && !parent->power.ignore_children) {
+		spin_unlock_irq(&dev->power.lock);
+
+		pm_request_idle(parent);
+
+		spin_lock_irq(&dev->power.lock);
+	}
+
+ out:
+	dev_dbg(dev, "__pm_runtime_suspend() returns %d!\n", retval);
+
+	return retval;
+}
+
+/**
+ * pm_runtime_suspend - Carry out run-time suspend of given device.
+ * @dev: Device to suspend.
+ */
+int pm_runtime_suspend(struct device *dev)
+{
+	int retval;
+
+	spin_lock_irq(&dev->power.lock);
+	retval = __pm_runtime_suspend(dev, false);
+	spin_unlock_irq(&dev->power.lock);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(pm_runtime_suspend);
+
+/**
+ * __pm_runtime_resume - Carry out run-time resume of given device.
+ * @dev: Device to resume.
+ * @from_wq: If set, the function has been called via pm_wq.
+ *
+ * Check if the device can be woken up and run the ->runtime_resume() callback
+ * provided by its bus type.  If another resume has been started earlier, wait
+ * for it to finish.  If there's a suspend running in parallel with this
+ * function, wait for it to finish and resume the device.  Cancel any scheduled
+ * or pending requests.
+ *
+ * This function must be called under dev->power.lock with interrupts disabled.
+ */
+int __pm_runtime_resume(struct device *dev, bool from_wq)
+	__releases(&dev->power.lock) __acquires(&dev->power.lock)
+{
+	struct device *parent = NULL;
+	int retval = 0;
+
+	dev_dbg(dev, "__pm_runtime_resume()%s!\n",
+		from_wq ? " from workqueue" : "");
+
+ repeat:
+	if (dev->power.runtime_error) {
+		retval = -EINVAL;
+		goto out;
+	}
+
+	pm_runtime_cancel_pending(dev);
+
+	if (dev->power.runtime_status == RPM_ACTIVE)
+		retval = 1;
+	else if (dev->power.disable_depth > 0)
+		retval = -EAGAIN;
+	if (retval)
+		goto out;
+
+	if (dev->power.runtime_status == RPM_RESUMING
+	    || dev->power.runtime_status == RPM_SUSPENDING) {
+		DEFINE_WAIT(wait);
+
+		if (from_wq) {
+			if (dev->power.runtime_status == RPM_SUSPENDING)
+				dev->power.deferred_resume = true;
+			retval = -EINPROGRESS;
+			goto out;
+		}
+
+		/* Wait for the operation carried out in parallel with us. */
+		for (;;) {
+			prepare_to_wait(&dev->power.wait_queue, &wait,
+					TASK_UNINTERRUPTIBLE);
+			if (dev->power.runtime_status != RPM_RESUMING
+			    && dev->power.runtime_status != RPM_SUSPENDING)
+				break;
+
+			spin_unlock_irq(&dev->power.lock);
+
+			schedule();
+
+			spin_lock_irq(&dev->power.lock);
+		}
+		finish_wait(&dev->power.wait_queue, &wait);
+		goto repeat;
+	}
+
+	if (!parent && dev->parent) {
+		/*
+		 * Increment the parent's resume counter and resume it if
+		 * necessary.
+		 */
+		parent = dev->parent;
+		spin_unlock_irq(&dev->power.lock);
+
+		pm_runtime_get_noresume(parent);
+
+		spin_lock_irq(&parent->power.lock);
+		/*
+		 * We can resume if the parent's run-time PM is disabled or it
+		 * is set to ignore children.
+		 */
+		if (!parent->power.disable_depth
+		    && !parent->power.ignore_children) {
+			__pm_runtime_resume(parent, false);
+			if (parent->power.runtime_status != RPM_ACTIVE)
+				retval = -EBUSY;
+		}
+		spin_unlock_irq(&parent->power.lock);
+
+		spin_lock_irq(&dev->power.lock);
+		if (retval)
+			goto out;
+		goto repeat;
+	}
+
+	dev->power.runtime_status = RPM_RESUMING;
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_resume) {
+		spin_unlock_irq(&dev->power.lock);
+
+		retval = dev->bus->pm->runtime_resume(dev);
+
+		spin_lock_irq(&dev->power.lock);
+		dev->power.runtime_error = retval;
+	} else {
+		retval = -ENOSYS;
+	}
+
+	if (retval) {
+		dev->power.runtime_status = RPM_SUSPENDED;
+		pm_runtime_cancel_pending(dev);
+	} else {
+		dev->power.runtime_status = RPM_ACTIVE;
+		if (parent)
+			atomic_inc(&parent->power.child_count);
+	}
+	wake_up_all(&dev->power.wait_queue);
+
+	if (!retval)
+		__pm_request_idle(dev);
+
+ out:
+	if (parent) {
+		spin_unlock_irq(&dev->power.lock);
+
+		pm_runtime_put(parent);
+
+		spin_lock_irq(&dev->power.lock);
+	}
+
+	dev_dbg(dev, "__pm_runtime_resume() returns %d!\n", retval);
+
+	return retval;
+}
+
+/**
+ * pm_runtime_resume - Carry out run-time resume of given device.
+ * @dev: Device to resume.
+ */
+int pm_runtime_resume(struct device *dev)
+{
+	int retval;
+
+	spin_lock_irq(&dev->power.lock);
+	retval = __pm_runtime_resume(dev, false);
+	spin_unlock_irq(&dev->power.lock);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(pm_runtime_resume);
+
+/**
+ * pm_runtime_work - Universal run-time PM work function.
+ * @work: Work structure used for scheduling the execution of this function.
+ *
+ * Use @work to get the device object the work is to be done for, determine what
+ * is to be done and execute the appropriate run-time PM function.
+ */
+static void pm_runtime_work(struct work_struct *work)
+{
+	struct device *dev = container_of(work, struct device, power.work);
+	enum rpm_request req;
+
+	spin_lock_irq(&dev->power.lock);
+
+	if (!dev->power.request_pending)
+		goto out;
+
+	req = dev->power.request;
+	dev->power.request = RPM_REQ_NONE;
+	dev->power.request_pending = false;
+
+	switch (req) {
+	case RPM_REQ_NONE:
+		break;
+	case RPM_REQ_IDLE:
+		__pm_runtime_idle(dev);
+		break;
+	case RPM_REQ_SUSPEND:
+		__pm_runtime_suspend(dev, true);
+		break;
+	case RPM_REQ_RESUME:
+		__pm_runtime_resume(dev, true);
+		break;
+	}
+
+ out:
+	spin_unlock_irq(&dev->power.lock);
+}
+
+/**
+ * __pm_request_idle - Submit an idle notification request for given device.
+ * @dev: Device to handle.
+ *
+ * Check if the device's run-time PM status is correct for suspending the device
+ * and queue up a request to run __pm_runtime_idle() for it.
+ *
+ * This function must be called under dev->power.lock with interrupts disabled.
+ */
+static int __pm_request_idle(struct device *dev)
+{
+	int retval = 0;
+
+	if (dev->power.runtime_error)
+		retval = -EINVAL;
+	else if (atomic_read(&dev->power.usage_count) > 0
+	    || dev->power.disable_depth > 0
+	    || dev->power.runtime_status == RPM_SUSPENDED
+	    || dev->power.runtime_status == RPM_SUSPENDING)
+		retval = -EAGAIN;
+	else if (!pm_children_suspended(dev))
+		retval = -EBUSY;
+	if (retval)
+		return retval;
+
+	if (dev->power.request_pending) {
+		/* Any requests other than RPM_REQ_IDLE take precedence. */
+		if (dev->power.request == RPM_REQ_NONE)
+			dev->power.request = RPM_REQ_IDLE;
+		else if (dev->power.request != RPM_REQ_IDLE)
+			retval = -EAGAIN;
+		return retval;
+	}
+
+	dev->power.request = RPM_REQ_IDLE;
+	dev->power.request_pending = true;
+	queue_work(pm_wq, &dev->power.work);
+
+	return retval;
+}
+
+/**
+ * pm_request_idle - Submit an idle notification request for given device.
+ * @dev: Device to handle.
+ */
+int pm_request_idle(struct device *dev)
+{
+	unsigned long flags;
+	int retval;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+	retval = __pm_request_idle(dev);
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(pm_request_idle);
+
+/**
+ * __pm_request_suspend - Submit a suspend request for given device.
+ * @dev: Device to suspend.
+ *
+ * This function must be called under dev->power.lock with interrupts disabled.
+ */
+static int __pm_request_suspend(struct device *dev)
+{
+	int retval = 0;
+
+	if (dev->power.runtime_error)
+		return -EINVAL;
+
+	if (dev->power.runtime_status == RPM_SUSPENDED)
+		retval = 1;
+	else if (atomic_read(&dev->power.usage_count) > 0
+	    || dev->power.disable_depth > 0)
+		retval = -EAGAIN;
+	else if (dev->power.runtime_status == RPM_SUSPENDING)
+		retval = -EINPROGRESS;
+	else if (!pm_children_suspended(dev))
+		retval = -EBUSY;
+	if (retval < 0)
+		return retval;
+
+	pm_runtime_deactivate_timer(dev);
+
+	if (dev->power.request_pending) {
+		/*
+		 * Pending resume requests take precedence over us, but we can
+		 * overtake any other pending request.
+		 */
+		if (dev->power.request == RPM_REQ_RESUME)
+			retval = -EAGAIN;
+		else if (dev->power.request != RPM_REQ_SUSPEND)
+			dev->power.request = retval ?
+						RPM_REQ_NONE : RPM_REQ_SUSPEND;
+		return retval;
+	} else if (retval) {
+		return retval;
+	}
+
+	dev->power.request = RPM_REQ_SUSPEND;
+	dev->power.request_pending = true;
+	queue_work(pm_wq, &dev->power.work);
+
+	return 0;
+}
+
+/**
+ * pm_suspend_timer_fn - Timer function for pm_schedule_suspend().
+ * @data: Device pointer passed by pm_schedule_suspend().
+ *
+ * Check if the time is right and execute __pm_request_suspend() in that case.
+ */
+static void pm_suspend_timer_fn(unsigned long data)
+{
+	struct device *dev = (struct device *)data;
+	unsigned long flags;
+	unsigned long expires;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	expires = dev->power.timer_expires;
+	/* If 'expires' is after 'jiffies', we've been called too early. */
+	if (expires > 0 && !time_after(expires, jiffies)) {
+		dev->power.timer_expires = 0;
+		__pm_request_suspend(dev);
+	}
+
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+
+/**
+ * pm_schedule_suspend - Set up a timer to submit a suspend request in future.
+ * @dev: Device to suspend.
+ * @delay: Time to wait before submitting a suspend request, in milliseconds.
+ */
+int pm_schedule_suspend(struct device *dev, unsigned int delay)
+{
+	unsigned long flags;
+	int retval = 0;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.runtime_error) {
+		retval = -EINVAL;
+		goto out;
+	}
+
+	if (!delay) {
+		retval = __pm_request_suspend(dev);
+		goto out;
+	}
+
+	pm_runtime_deactivate_timer(dev);
+
+	if (dev->power.request_pending) {
+		/*
+		 * Pending resume requests take precedence over us, but any
+		 * other pending requests have to be canceled.
+		 */
+		if (dev->power.request == RPM_REQ_RESUME) {
+			retval = -EAGAIN;
+			goto out;
+		}
+		dev->power.request = RPM_REQ_NONE;
+	}
+
+	if (dev->power.runtime_status == RPM_SUSPENDED)
+		retval = 1;
+	else if (dev->power.runtime_status == RPM_SUSPENDING)
+		retval = -EINPROGRESS;
+	else if (atomic_read(&dev->power.usage_count) > 0
+	    || dev->power.disable_depth > 0)
+		retval = -EAGAIN;
+	else if (!pm_children_suspended(dev))
+		retval = -EBUSY;
+	if (retval)
+		goto out;
+
+	dev->power.timer_expires = jiffies + msecs_to_jiffies(delay);
+	mod_timer(&dev->power.suspend_timer, dev->power.timer_expires);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(pm_schedule_suspend);
+
+/**
+ * pm_request_resume - Submit a resume request for given device.
+ * @dev: Device to resume.
+ *
+ * This function must be called under dev->power.lock with interrupts disabled.
+ */
+static int __pm_request_resume(struct device *dev)
+{
+	int retval = 0;
+
+	if (dev->power.runtime_error)
+		return -EINVAL;
+
+	if (dev->power.runtime_status == RPM_ACTIVE)
+		retval = 1;
+	else if (dev->power.runtime_status == RPM_RESUMING)
+		retval = -EINPROGRESS;
+	else if (dev->power.disable_depth > 0)
+		retval = -EAGAIN;
+	if (retval < 0)
+		return retval;
+
+	pm_runtime_deactivate_timer(dev);
+
+	if (dev->power.request_pending) {
+		/* If non-resume request is pending, we can overtake it. */
+		dev->power.request = retval ? RPM_REQ_NONE : RPM_REQ_RESUME;
+		return retval;
+	} else if (retval) {
+		return retval;
+	}
+
+	dev->power.request = RPM_REQ_RESUME;
+	dev->power.request_pending = true;
+	queue_work(pm_wq, &dev->power.work);
+
+	return retval;
+}
+
+/**
+ * pm_request_resume - Submit a resume request for given device.
+ * @dev: Device to resume.
+ */
+int pm_request_resume(struct device *dev)
+{
+	unsigned long flags;
+	int retval;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+	retval = __pm_request_resume(dev);
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(pm_request_resume);
+
+/**
+ * __pm_runtime_get - Reference count a device and wake it up, if necessary.
+ * @dev: Device to handle.
+ * @sync: If set and the device is suspended, resume it synchronously.
+ *
+ * Increment the usage count of the device and if it was zero previously,
+ * resume it or submit a resume request for it, depending on the value of @sync.
+ */
+int __pm_runtime_get(struct device *dev, bool sync)
+{
+	int retval = 1;
+
+	if (atomic_add_return(1, &dev->power.usage_count) == 1)
+		retval = sync ? pm_runtime_resume(dev) : pm_request_resume(dev);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(__pm_runtime_get);
+
+/**
+ * __pm_runtime_put - Decrement the device's usage counter and notify its bus.
+ * @dev: Device to handle.
+ * @sync: If the device's bus type is to be notified, do that synchronously.
+ *
+ * Decrement the usage count of the device and if it reaches zero, carry out a
+ * synchronous idle notification or submit an idle notification request for it,
+ * depending on the value of @sync.
+ */
+int __pm_runtime_put(struct device *dev, bool sync)
+{
+	int retval = 0;
+
+	if (atomic_dec_and_test(&dev->power.usage_count))
+		retval = sync ? pm_runtime_idle(dev) : pm_request_idle(dev);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(__pm_runtime_put);
+
+/**
+ * __pm_runtime_set_status - Set run-time PM status of a device.
+ * @dev: Device to handle.
+ * @status: New run-time PM status of the device.
+ *
+ * If run-time PM of the device is disabled or its power.runtime_error field is
+ * different from zero, the status may be changed either to RPM_ACTIVE, or to
+ * RPM_SUSPENDED, as long as that reflects the actual state of the device.
+ * However, if the device has a parent and the parent is not active, and the
+ * parent's power.ignore_children flag is unset, the device's status cannot be
+ * set to RPM_ACTIVE, so -EBUSY is returned in that case.
+ *
+ * If successful, __pm_runtime_set_status() clears the power.runtime_error field
+ * and the device parent's counter of unsuspended children is modified to
+ * reflect the new status.  If the new status is RPM_SUSPENDED, an idle
+ * notification request for the parent is submitted.
+ */
+int __pm_runtime_set_status(struct device *dev, unsigned int status)
+{
+	struct device *parent = dev->parent;
+	unsigned long flags;
+	bool notify_parent = false;
+	int error = 0;
+
+	if (status != RPM_ACTIVE && status != RPM_SUSPENDED)
+		return -EINVAL;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (!dev->power.runtime_error && !dev->power.disable_depth) {
+		error = -EAGAIN;
+		goto out;
+	}
+
+	if (dev->power.runtime_status == status)
+		goto out_set;
+
+	if (status == RPM_SUSPENDED) {
+		/* It always is possible to set the status to 'suspended'. */
+		if (parent) {
+			atomic_add_unless(&parent->power.child_count, -1, 0);
+			notify_parent = !parent->power.ignore_children;
+		}
+		goto out_set;
+	}
+
+	if (parent) {
+		/* IRQs are already off: dev->power.lock was taken irqsave. */
+		spin_lock_nested(&parent->power.lock, SINGLE_DEPTH_NESTING);
+		/*
+		 * It is invalid to put an active child under a parent that is
+		 * not active, has run-time PM enabled and the
+		 * 'power.ignore_children' flag unset.
+		 */
+		if (!parent->power.disable_depth
+		    && !parent->power.ignore_children
+		    && parent->power.runtime_status != RPM_ACTIVE) {
+			error = -EBUSY;
+		} else {
+			if (dev->power.runtime_status == RPM_SUSPENDED)
+				atomic_inc(&parent->power.child_count);
+		}
+
+		spin_unlock(&parent->power.lock);
+
+		if (error)
+			goto out;
+	}
+
+ out_set:
+	dev->power.runtime_status = status;
+	dev->power.runtime_error = 0;
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	if (notify_parent)
+		pm_request_idle(parent);
+
+	return error;
+}
+EXPORT_SYMBOL_GPL(__pm_runtime_set_status);
+
+/**
+ * pm_runtime_enable - Enable run-time PM of a device.
+ * @dev: Device to handle.
+ */
+void pm_runtime_enable(struct device *dev)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.disable_depth > 0)
+		dev->power.disable_depth--;
+	else
+		dev_warn(dev, "Unbalanced %s!\n", __func__);
+
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_enable);
+
+/**
+ * __pm_runtime_disable - Disable run-time PM of a device.
+ * @dev: Device to handle.
+ * @check_resume: If set, check if there's a resume request for the device.
+ *
+ * Increment power.disable_depth for the device and if it was zero previously,
+ * cancel all pending run-time PM requests for the device and wait for all
+ * operations in progress to complete.  The device can be either active or
+ * suspended after its run-time PM has been disabled.
+ *
+ * If @check_resume is set and there's a resume request pending when
+ * __pm_runtime_disable() is called and power.disable_depth is zero, the
+ * function will wake up the device before disabling its run-time PM and will
+ * return 1.  Otherwise, 0 is returned.
+ */
+int __pm_runtime_disable(struct device *dev, bool check_resume)
+{
+	int retval = 0;
+
+	spin_lock_irq(&dev->power.lock);
+
+	if (dev->power.disable_depth > 0) {
+		dev->power.disable_depth++;
+		goto out;
+	}
+
+	/*
+	 * Wake up the device if there's a resume request pending, because that
+	 * means there probably is some I/O to process and disabling run-time PM
+	 * shouldn't prevent the device from processing the I/O.
+	 */
+	if (check_resume && dev->power.request_pending
+	    && dev->power.request == RPM_REQ_RESUME) {
+		/*
+		 * Prevent suspends and idle notifications from being carried
+		 * out after we have woken up the device.
+		 */
+		pm_runtime_get_noresume(dev);
+
+		__pm_runtime_resume(dev, false);
+
+		pm_runtime_put_noidle(dev);
+		retval = 1;
+	}
+
+	if (dev->power.disable_depth++ > 0)
+		goto out;
+
+	pm_runtime_deactivate_timer(dev);
+
+	if (dev->power.request_pending) {
+		dev->power.request = RPM_REQ_NONE;
+		spin_unlock_irq(&dev->power.lock);
+
+		cancel_work_sync(&dev->power.work);
+
+		spin_lock_irq(&dev->power.lock);
+		dev->power.request_pending = false;
+	}
+
+	if (dev->power.runtime_status == RPM_SUSPENDING
+	    || dev->power.runtime_status == RPM_RESUMING
+	    || dev->power.idle_notification) {
+		DEFINE_WAIT(wait);
+
+		/* Suspend or wake-up in progress. */
+		for (;;) {
+			prepare_to_wait(&dev->power.wait_queue, &wait,
+					TASK_UNINTERRUPTIBLE);
+			if (dev->power.runtime_status != RPM_SUSPENDING
+			    && dev->power.runtime_status != RPM_RESUMING
+			    && !dev->power.idle_notification)
+				break;
+			spin_unlock_irq(&dev->power.lock);
+
+			schedule();
+
+			spin_lock_irq(&dev->power.lock);
+		}
+		finish_wait(&dev->power.wait_queue, &wait);
+	}
+
+ out:
+	spin_unlock_irq(&dev->power.lock);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(__pm_runtime_disable);
+
+/**
+ * pm_runtime_init - Initialize run-time PM fields in given device object.
+ * @dev: Device object to initialize.
+ */
+void pm_runtime_init(struct device *dev)
+{
+	spin_lock_init(&dev->power.lock);
+
+	dev->power.runtime_status = RPM_SUSPENDED;
+	dev->power.idle_notification = false;
+
+	dev->power.disable_depth = 1;
+	atomic_set(&dev->power.usage_count, 0);
+
+	dev->power.runtime_error = 0;
+
+	atomic_set(&dev->power.child_count, 0);
+	pm_suspend_ignore_children(dev, false);
+
+	dev->power.request_pending = false;
+	dev->power.request = RPM_REQ_NONE;
+	dev->power.deferred_resume = false;
+	INIT_WORK(&dev->power.work, pm_runtime_work);
+
+	dev->power.timer_expires = 0;
+	setup_timer(&dev->power.suspend_timer, pm_suspend_timer_fn,
+			(unsigned long)dev);
+
+	init_waitqueue_head(&dev->power.wait_queue);
+}
+
+/**
+ * pm_runtime_remove - Prepare for removing a device from the device hierarchy.
+ * @dev: Device object being removed from the device hierarchy.
+ */
+void pm_runtime_remove(struct device *dev)
+{
+	__pm_runtime_disable(dev, false);
+
+	/* Change the status back to 'suspended' to match the initial status. */
+	if (dev->power.runtime_status == RPM_ACTIVE)
+		pm_runtime_set_suspended(dev);
+}
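The nesting rules of pm_runtime_enable() and __pm_runtime_disable() are easy to lose among the locking and wait-queue code above. The following is a minimal user-space model, not kernel code: struct model_dev and the model_* functions are hypothetical, locking, the work item, the timer and the wait queue are left out, and a pending resume request is reduced to a boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of the disable_depth bookkeeping only. */
struct model_dev {
	unsigned int disable_depth;	/* 1 right after initialization */
	bool resume_pending;		/* models request == RPM_REQ_RESUME */
	bool active;			/* models runtime_status == RPM_ACTIVE */
};

void model_init(struct model_dev *d)
{
	d->disable_depth = 1;	/* run-time PM starts out disabled */
	d->resume_pending = false;
	d->active = false;	/* initial status is 'suspended' */
}

void model_enable(struct model_dev *d)
{
	if (d->disable_depth > 0)
		d->disable_depth--;
	else
		fprintf(stderr, "Unbalanced %s!\n", __func__);
}

/* Returns 1 if a pending resume request had to be carried out first. */
int model_disable(struct model_dev *d, bool check_resume)
{
	int retval = 0;

	if (d->disable_depth > 0) {
		d->disable_depth++;	/* already disabled: just nest */
		return 0;
	}
	if (check_resume && d->resume_pending) {
		d->active = true;	/* stands in for the synchronous resume */
		retval = 1;
	}
	d->disable_depth++;
	d->resume_pending = false;	/* pending requests are cancelled */
	return retval;
}

/* Usage: enable, a disable that honours a pending resume, then nesting. */
void model_demo(void)
{
	struct model_dev d;

	model_init(&d);
	model_enable(&d);			/* depth 1 -> 0: enabled */
	assert(d.disable_depth == 0);

	d.resume_pending = true;
	assert(model_disable(&d, true) == 1);	/* resume carried out */
	assert(d.active && d.disable_depth == 1);

	assert(model_disable(&d, true) == 0);	/* nested disable */
	assert(d.disable_depth == 2);
	model_enable(&d);
	model_enable(&d);
	assert(d.disable_depth == 0);
}
```

Every disable must be matched by an enable; an extra enable only produces the "Unbalanced" warning instead of driving the depth negative.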
Index: linux-2.6/include/linux/pm_runtime.h
===================================================================
--- /dev/null
+++ linux-2.6/include/linux/pm_runtime.h
@@ -0,0 +1,115 @@
+/*
+ * pm_runtime.h - Device run-time power management helper functions.
+ *
+ * Copyright (C) 2009 Rafael J. Wysocki <rjw@sisk.pl>
+ *
+ * This file is released under the GPLv2.
+ */
+
+#ifndef _LINUX_PM_RUNTIME_H
+#define _LINUX_PM_RUNTIME_H
+
+#include <linux/device.h>
+#include <linux/pm.h>
+
+#ifdef CONFIG_PM_RUNTIME
+
+extern struct workqueue_struct *pm_wq;
+
+extern int pm_runtime_idle(struct device *dev);
+extern int pm_runtime_suspend(struct device *dev);
+extern int pm_runtime_resume(struct device *dev);
+extern int pm_request_idle(struct device *dev);
+extern int pm_schedule_suspend(struct device *dev, unsigned int delay);
+extern int pm_request_resume(struct device *dev);
+extern int __pm_runtime_get(struct device *dev, bool sync);
+extern int __pm_runtime_put(struct device *dev, bool sync);
+extern int __pm_runtime_set_status(struct device *dev, unsigned int status);
+extern void pm_runtime_enable(struct device *dev);
+extern int __pm_runtime_disable(struct device *dev, bool check_resume);
+
+static inline bool pm_children_suspended(struct device *dev)
+{
+	return dev->power.ignore_children
+		|| !atomic_read(&dev->power.child_count);
+}
+
+static inline void pm_suspend_ignore_children(struct device *dev, bool enable)
+{
+	dev->power.ignore_children = enable;
+}
+
+static inline void pm_runtime_get_noresume(struct device *dev)
+{
+	atomic_inc(&dev->power.usage_count);
+}
+
+static inline void pm_runtime_put_noidle(struct device *dev)
+{
+	atomic_add_unless(&dev->power.usage_count, -1, 0);
+}
+
+#else /* !CONFIG_PM_RUNTIME */
+
+static inline int pm_runtime_idle(struct device *dev) { return -ENOSYS; }
+static inline int pm_runtime_suspend(struct device *dev) { return -ENOSYS; }
+static inline int pm_runtime_resume(struct device *dev) { return 0; }
+static inline int pm_request_idle(struct device *dev) { return -ENOSYS; }
+static inline int pm_schedule_suspend(struct device *dev, unsigned int delay)
+{
+	return -ENOSYS;
+}
+static inline int pm_request_resume(struct device *dev) { return 0; }
+static inline int __pm_runtime_get(struct device *dev, bool sync) { return 1; }
+static inline int __pm_runtime_put(struct device *dev, bool sync) { return 0; }
+static inline int __pm_runtime_set_status(struct device *dev,
+					    unsigned int status) { return 0; }
+static inline void pm_runtime_enable(struct device *dev) {}
+static inline int __pm_runtime_disable(struct device *dev, bool check_resume)
+{
+	return 0;
+}
+
+static inline bool pm_children_suspended(struct device *dev) { return false; }
+static inline void pm_suspend_ignore_children(struct device *dev, bool en) {}
+static inline void pm_runtime_get_noresume(struct device *dev) {}
+static inline void pm_runtime_put_noidle(struct device *dev) {}
+
+#endif /* !CONFIG_PM_RUNTIME */
+
+static inline int pm_runtime_get(struct device *dev)
+{
+	return __pm_runtime_get(dev, false);
+}
+
+static inline int pm_runtime_get_sync(struct device *dev)
+{
+	return __pm_runtime_get(dev, true);
+}
+
+static inline int pm_runtime_put(struct device *dev)
+{
+	return __pm_runtime_put(dev, false);
+}
+
+static inline int pm_runtime_put_sync(struct device *dev)
+{
+	return __pm_runtime_put(dev, true);
+}
+
+static inline int pm_runtime_set_active(struct device *dev)
+{
+	return __pm_runtime_set_status(dev, RPM_ACTIVE);
+}
+
+static inline void pm_runtime_set_suspended(struct device *dev)
+{
+	__pm_runtime_set_status(dev, RPM_SUSPENDED);
+}
+
+static inline int pm_runtime_disable(struct device *dev)
+{
+	return __pm_runtime_disable(dev, true);
+}
+
+#endif
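pm_runtime_put_noidle() relies on atomic_add_unless(&usage_count, -1, 0), so the usage counter can never be driven below zero. As a sketch of those semantics only, the following reimplements atomic_add_unless() with C11 <stdatomic.h> compare-and-swap; model_atomic_add_unless(), model_get_noresume() and model_put_noidle() are hypothetical names, and the kernel's own atomic_add_unless() is implemented per architecture.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Add @a to @v unless @v equals @u; returns true if the add happened. */
static bool model_atomic_add_unless(atomic_int *v, int a, int u)
{
	int old = atomic_load(v);

	while (old != u) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + a))
			return true;
	}
	return false;
}

/* Hypothetical models of the two inline helpers in pm_runtime.h. */
void model_get_noresume(atomic_int *usage_count)
{
	atomic_fetch_add(usage_count, 1);
}

void model_put_noidle(atomic_int *usage_count)
{
	/* Never lets the usage counter drop below zero. */
	model_atomic_add_unless(usage_count, -1, 0);
}

/* Usage: an extra put (e.g. by ->remove()) is harmless. */
void model_usage_demo(void)
{
	atomic_int count;

	atomic_init(&count, 0);
	model_get_noresume(&count);	/* taken by the core before ->remove() */
	model_put_noidle(&count);	/* dropped by ->remove() itself */
	model_put_noidle(&count);	/* the core's own put: stays at zero */
	assert(atomic_load(&count) == 0);
}
```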
Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -21,6 +21,7 @@
 #include <linux/kallsyms.h>
 #include <linux/mutex.h>
 #include <linux/pm.h>
+#include <linux/pm_runtime.h>
 #include <linux/resume-trace.h>
 #include <linux/rwsem.h>
 #include <linux/interrupt.h>
@@ -49,6 +50,16 @@ static DEFINE_MUTEX(dpm_list_mtx);
 static bool transition_started;
 
 /**
+ * device_pm_init - Initialize the PM-related part of a device object
+ * @dev: Device object being initialized.
+ */
+void device_pm_init(struct device *dev)
+{
+	dev->power.status = DPM_ON;
+	pm_runtime_init(dev);
+}
+
+/**
  *	device_pm_lock - lock the list of active devices used by the PM core
  */
 void device_pm_lock(void)
@@ -105,6 +116,7 @@ void device_pm_remove(struct device *dev
 	mutex_lock(&dpm_list_mtx);
 	list_del_init(&dev->power.entry);
 	mutex_unlock(&dpm_list_mtx);
+	pm_runtime_remove(dev);
 }
 
 /**
@@ -512,6 +524,7 @@ static void dpm_complete(pm_message_t st
 			mutex_unlock(&dpm_list_mtx);
 
 			device_complete(dev, state);
+			pm_runtime_enable(dev);
 
 			mutex_lock(&dpm_list_mtx);
 		}
@@ -757,11 +770,16 @@ static int dpm_prepare(pm_message_t stat
 		dev->power.status = DPM_PREPARING;
 		mutex_unlock(&dpm_list_mtx);
 
-		error = device_prepare(dev, state);
+		if (pm_runtime_disable(dev) && device_may_wakeup(dev))
+			/* Wake-up requested during system sleep transition. */
+			error = -EBUSY;
+		else
+			error = device_prepare(dev, state);
 
 		mutex_lock(&dpm_list_mtx);
 		if (error) {
 			dev->power.status = DPM_ON;
+			pm_runtime_enable(dev);
 			if (error == -EAGAIN) {
 				put_device(dev);
 				error = 0;
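The decision added to dpm_prepare() can be read as a small pure function. The sketch below is a hypothetical model: its arguments stand in for the return values of pm_runtime_disable(), device_may_wakeup() and device_prepare(), and the list handling and the subsequent remapping of -EAGAIN to 0 are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * 'resumed' models pm_runtime_disable()'s return value (1 if a pending
 * resume request had to be carried out), 'may_wakeup' models
 * device_may_wakeup() and 'prepare_ret' what device_prepare() would
 * return.  *reenable tells whether pm_runtime_enable() must be called
 * to balance the disable.
 */
int model_prepare_one(int resumed, bool may_wakeup, int prepare_ret,
		      bool *reenable)
{
	int error;

	if (resumed && may_wakeup)
		error = -EBUSY;		/* wake-up during the sleep transition */
	else
		error = prepare_ret;	/* ->prepare() is only run here */

	/* Any error, -EAGAIN included, rolls back the disable. */
	*reenable = (error != 0);
	return error;
}

void model_prepare_demo(void)
{
	bool reenable;

	/* Resume was pending and the device may wake the system: abort. */
	assert(model_prepare_one(1, true, 0, &reenable) == -EBUSY && reenable);
	/* Resume was pending, but wake-up is not allowed: carry on. */
	assert(model_prepare_one(1, false, 0, &reenable) == 0 && !reenable);
	/* ->prepare() itself failed: run-time PM is re-enabled. */
	assert(model_prepare_one(0, true, -EIO, &reenable) == -EIO && reenable);
}
```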
Index: linux-2.6/drivers/base/dd.c
===================================================================
--- linux-2.6.orig/drivers/base/dd.c
+++ linux-2.6/drivers/base/dd.c
@@ -23,6 +23,7 @@
 #include <linux/kthread.h>
 #include <linux/wait.h>
 #include <linux/async.h>
+#include <linux/pm_runtime.h>
 
 #include "base.h"
 #include "power/power.h"
@@ -202,7 +203,17 @@ int driver_probe_device(struct device_dr
 	pr_debug("bus: '%s': %s: matched device %s with driver %s\n",
 		 drv->bus->name, __func__, dev_name(dev), drv->name);
 
+	/*
+	 * Wait for run-time PM calls to complete and prevent new suspend calls
+	 * until the probe is done.
+	 */
+	pm_runtime_disable(dev);
+	pm_runtime_get_noresume(dev);
+	pm_runtime_enable(dev);
 	ret = really_probe(dev, drv);
+	pm_runtime_put_noidle(dev);
+	if (!ret)
+		pm_runtime_idle(dev);
 
 	return ret;
 }
@@ -306,6 +317,14 @@ static void __device_release_driver(stru
 
 	drv = dev->driver;
 	if (drv) {
+		/*
+		 * Wait for run-time PM calls to complete and prevent new
+		 * suspend calls until the remove is done.
+		 */
+		pm_runtime_disable(dev);
+		pm_runtime_get_noresume(dev);
+		pm_runtime_enable(dev);
+
 		driver_sysfs_remove(dev);
 
 		if (dev->bus)
@@ -324,6 +343,8 @@ static void __device_release_driver(stru
 			blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
 						     BUS_NOTIFY_UNBOUND_DRIVER,
 						     dev);
+
+		pm_runtime_put_noidle(dev);
 	}
 }
 
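The bracket added to driver_probe_device() keeps the usage counter above zero while ->probe() runs, which is why a suspend attempted from ->probe() fails with -EAGAIN, as Section 5 of runtime_pm.txt explains. Below is a hypothetical user-space model of just that effect. It tracks only the usage counter and the number of idle notifications, and omits the pm_runtime_disable()/pm_runtime_enable() pair, which only waits for operations in progress.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model: only the usage counter and idle notifications. */
struct model_dev {
	int usage_count;
	int idle_calls;		/* how often ->runtime_idle() would run */
};

/* Models only the usage-counter check done before suspending. */
static int model_runtime_suspend(struct model_dev *d)
{
	return d->usage_count > 0 ? -EAGAIN : 0;
}

static void model_runtime_idle(struct model_dev *d)
{
	if (d->usage_count == 0)
		d->idle_calls++;
}

/* A hypothetical ->probe() that tries to suspend the device right away. */
static int model_really_probe(struct model_dev *d, int probe_ret,
			      int *suspend_ret)
{
	*suspend_ret = model_runtime_suspend(d);
	return probe_ret;
}

/* Mirrors the usage-counter bracket added to driver_probe_device(). */
int model_driver_probe_device(struct model_dev *d, int probe_ret,
			      int *suspend_ret)
{
	int ret;

	d->usage_count++;			/* pm_runtime_get_noresume() */
	ret = model_really_probe(d, probe_ret, suspend_ret);
	if (d->usage_count > 0)
		d->usage_count--;		/* pm_runtime_put_noidle() */
	if (!ret)
		model_runtime_idle(d);		/* pm_runtime_idle() */
	return ret;
}

void model_probe_demo(void)
{
	struct model_dev d = { 0, 0 };
	int sret;

	assert(model_driver_probe_device(&d, 0, &sret) == 0);
	assert(sret == -EAGAIN);		/* suspend during probe fails */
	assert(d.usage_count == 0 && d.idle_calls == 1);

	assert(model_driver_probe_device(&d, -ENODEV, &sret) == -ENODEV);
	assert(d.idle_calls == 1);		/* no idle after failed probe */
}
```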
Index: linux-2.6/drivers/base/power/power.h
===================================================================
--- linux-2.6.orig/drivers/base/power/power.h
+++ linux-2.6/drivers/base/power/power.h
@@ -1,7 +1,14 @@
-static inline void device_pm_init(struct device *dev)
-{
-	dev->power.status = DPM_ON;
-}
+#ifdef CONFIG_PM_RUNTIME
+
+extern void pm_runtime_init(struct device *dev);
+extern void pm_runtime_remove(struct device *dev);
+
+#else /* !CONFIG_PM_RUNTIME */
+
+static inline void pm_runtime_init(struct device *dev) {}
+static inline void pm_runtime_remove(struct device *dev) {}
+
+#endif /* !CONFIG_PM_RUNTIME */
 
 #ifdef CONFIG_PM_SLEEP
 
@@ -16,23 +23,33 @@ static inline struct device *to_device(s
 	return container_of(entry, struct device, power.entry);
 }
 
+extern void device_pm_init(struct device *dev);
 extern void device_pm_add(struct device *);
 extern void device_pm_remove(struct device *);
 extern void device_pm_move_before(struct device *, struct device *);
 extern void device_pm_move_after(struct device *, struct device *);
 extern void device_pm_move_last(struct device *);
 
-#else /* CONFIG_PM_SLEEP */
+#else /* !CONFIG_PM_SLEEP */
+
+static inline void device_pm_init(struct device *dev)
+{
+	pm_runtime_init(dev);
+}
+
+static inline void device_pm_remove(struct device *dev)
+{
+	pm_runtime_remove(dev);
+}
 
 static inline void device_pm_add(struct device *dev) {}
-static inline void device_pm_remove(struct device *dev) {}
 static inline void device_pm_move_before(struct device *deva,
 					 struct device *devb) {}
 static inline void device_pm_move_after(struct device *deva,
 					struct device *devb) {}
 static inline void device_pm_move_last(struct device *dev) {}
 
-#endif
+#endif /* !CONFIG_PM_SLEEP */
 
 #ifdef CONFIG_PM
 
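power.h now declares pm_runtime_init() and pm_runtime_remove() as real functions under CONFIG_PM_RUNTIME and as empty inline stubs otherwise. pm_runtime.h uses the same pattern, so callers such as device_pm_init() need no #ifdefs. The following self-contained sketch shows the pattern. DEMO_PM_RUNTIME is a hypothetical macro standing in for CONFIG_PM_RUNTIME, all demo_* names are hypothetical, and the stub that returns 1 follows the __pm_runtime_get() stub in pm_runtime.h.

```c
#include <assert.h>

/*
 * DEMO_PM_RUNTIME plays the role of CONFIG_PM_RUNTIME; it is left
 * undefined here, so the inline stubs are what gets built.
 */
struct demo_dev {
	int usage_count;	/* unused by the stubs */
};

#ifdef DEMO_PM_RUNTIME
extern void demo_runtime_init(struct demo_dev *dev);
extern int demo_runtime_get_sync(struct demo_dev *dev);
#else
static inline void demo_runtime_init(struct demo_dev *dev) { (void)dev; }
/* Returns 1, which the helpers use to mean "already 'active'". */
static inline int demo_runtime_get_sync(struct demo_dev *dev)
{
	(void)dev;
	return 1;
}
#endif

/* Callers are written once, without #ifdefs of their own. */
void demo_add(struct demo_dev *dev)
{
	demo_runtime_init(dev);		/* like device_pm_init() */
}

int demo_open(struct demo_dev *dev)
{
	int ret = demo_runtime_get_sync(dev);

	return ret < 0 ? ret : 0;	/* only negative values are errors */
}

void demo_check(void)
{
	struct demo_dev d = { 0 };

	demo_add(&d);
	assert(demo_open(&d) == 0);
}
```

Because the stub reports success rather than an error, a driver that only treats negative return values as failures behaves the same with run-time PM compiled out.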
Index: linux-2.6/Documentation/power/runtime_pm.txt
===================================================================
--- /dev/null
+++ linux-2.6/Documentation/power/runtime_pm.txt
@@ -0,0 +1,386 @@
+Run-time Power Management Framework for I/O Devices
+
+(C) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
+
+1. Introduction
+
+Support for run-time power management (run-time PM) of I/O devices is provided
+at the power management core (PM core) level by means of:
+
+* The power management workqueue pm_wq in which bus types and device drivers can
+  put their PM-related work items.  It is strongly recommended that pm_wq be
+  used for queuing all work items related to run-time PM, because this allows
+  them to be synchronized with system-wide power transitions (suspend to RAM,
+  hibernation and resume from system sleep states).  pm_wq is declared in
+  include/linux/pm_runtime.h and defined in kernel/power/main.c.
+
+* A number of run-time PM fields in the 'power' member of 'struct device' (which
+  is of the type 'struct dev_pm_info', defined in include/linux/pm.h) that can
+  be used for synchronizing run-time PM operations with one another.
+
+* Three device run-time PM callbacks in 'struct dev_pm_ops' (defined in
+  include/linux/pm.h).
+
+* A set of helper functions defined in drivers/base/power/runtime.c that can be
+  used for carrying out run-time PM operations in such a way that the
+  synchronization between them is taken care of by the PM core.  Bus types and
+  device drivers are encouraged to use these functions.
+
+The run-time PM callbacks present in 'struct dev_pm_ops', the device run-time PM
+fields of 'struct dev_pm_info' and the core helper functions provided for
+run-time PM are described below.
+
+2. Device Run-time PM Callbacks
+
+There are three device run-time PM callbacks defined in 'struct dev_pm_ops':
+
+struct dev_pm_ops {
+	...
+	int (*runtime_suspend)(struct device *dev);
+	int (*runtime_resume)(struct device *dev);
+	void (*runtime_idle)(struct device *dev);
+	...
+};
+
+The ->runtime_suspend() callback is executed by the PM core for the bus type of
+the device being suspended.  The bus type's callback is then _entirely_
+_responsible_ for handling the device as appropriate, which may, but need not,
+include executing the device driver's own ->runtime_suspend() callback (from the
+PM core's point of view it is not necessary to implement a ->runtime_suspend()
+callback in a device driver as long as the bus type's ->runtime_suspend() knows
+what to do to handle the device).
+
+  * Once the bus type's ->runtime_suspend() callback has completed successfully
+    for a given device, the PM core regards the device as suspended, which need
+    not mean that the device has been put into a low power state.  It is
+    supposed to mean, however, that the device will not process data and will
+    not communicate with the CPU(s) and RAM until its bus type's
+    ->runtime_resume() callback is executed for it.  The run-time PM status of
+    a device after successful execution of its bus type's ->runtime_suspend()
+    callback is 'suspended'.
+
+  * If the bus type's ->runtime_suspend() callback returns -EBUSY or -EAGAIN,
+    the device's run-time PM status is supposed to be 'active', which means that
+    the device _must_ be fully operational afterwards.
+
+  * If the bus type's ->runtime_suspend() callback returns an error code
+    different from -EBUSY or -EAGAIN, the PM core regards this as a fatal
+    error and will refuse to run the helper functions described in Section 4
+    for the device, until its status is directly set either to 'active'
+    or to 'suspended' (the PM core provides special helper functions for this
+    purpose).
+
+In particular, if the driver requires remote wake-up capability for proper
+functioning and device_may_wakeup() returns 'false' for the device, then
+->runtime_suspend() should return -EBUSY.  On the other hand, if
+device_may_wakeup() returns 'true' for the device and the device is put
+into a low power state during the execution of its bus type's
+->runtime_suspend(), it is expected that remote wake-up (i.e. a hardware
+mechanism allowing the device to request a change of its power state, such
+as PCI PME) will be enabled for the device.  Generally, remote wake-up should
+be enabled for all input devices put into a low power state at run time.
+
+The ->runtime_resume() callback is executed by the PM core for the bus type of
+the device being woken up.  The bus type's callback is then _entirely_
+_responsible_ for handling the device as appropriate, which may, but need not,
+include executing the device driver's own ->runtime_resume() callback (from the
+PM core's point of view it is not necessary to implement a ->runtime_resume()
+callback in a device driver as long as the bus type's ->runtime_resume() knows
+what to do to handle the device).
+
+  * Once the bus type's ->runtime_resume() callback has completed successfully,
+    the PM core regards the device as fully operational, which means that the
+    device _must_ be able to complete I/O operations as needed.  The run-time
+    PM status of the device is then 'active'.
+
+  * If the bus type's ->runtime_resume() callback returns an error code, the PM
+    core regards this as a fatal error and will refuse to run the helper
+    functions described in Section 4 for the device, until its status is
+    directly set either to 'active' or to 'suspended' (the PM core provides
+    special helper functions for this purpose).
+
+The ->runtime_idle() callback is executed by the PM core for the bus type of
+a given device whenever the device appears to be idle, which is indicated to
+the PM core by two counters, the device's usage counter and the counter of
+'active' children of the device.
+
+  * If either of these counters is decreased using a helper function provided
+    by the PM core and it turns out to be equal to zero, the other counter is
+    checked.  If that counter is also equal to zero, the PM core executes the
+    device bus type's ->runtime_idle() callback (with the device as an
+    argument).
+
+The action performed by a bus type's ->runtime_idle() callback is totally
+dependent on the bus type in question, but the expected and recommended action
+is to check if the device can be suspended (i.e. if all of the conditions
+necessary for suspending the device are satisfied) and to queue up a suspend
+request for the device in that case.
+
+The helper functions provided by the PM core, described in Section 4, guarantee
+that the following constraints are met with respect to the bus type's run-time
+PM callbacks:
+
+(1) The callbacks are mutually exclusive (e.g. it is forbidden to execute
+    ->runtime_suspend() in parallel with ->runtime_resume() or with another
+    instance of ->runtime_suspend() for the same device) with the exception that
+    ->runtime_suspend() or ->runtime_resume() can be executed in parallel with
+    ->runtime_idle() (although ->runtime_idle() will not be started while any
+    of the other callbacks is being executed for the same device).
+
+(2) ->runtime_idle() and ->runtime_suspend() can only be executed for 'active'
+    devices (i.e. the PM core will only execute ->runtime_idle() or
+    ->runtime_suspend() for devices whose run-time PM status is
+    'active').
+
+(3) ->runtime_idle() and ->runtime_suspend() can only be executed for a device
+    whose usage counter is equal to zero _and_ whose counter of 'active'
+    children is equal to zero or whose 'power.ignore_children' flag is
+    set.
+
+(4) ->runtime_resume() can only be executed for 'suspended' devices (i.e. the
+    PM core will only execute ->runtime_resume() for devices whose run-time
+    PM status is 'suspended').
+
+Additionally, the helper functions provided by the PM core obey the following
+rules:
+
+  * If ->runtime_suspend() is about to be executed or there's a pending request
+    to execute it, ->runtime_idle() will not be executed for the same device.
+
+  * A request to execute or to schedule the execution of ->runtime_suspend()
+    will cancel any pending requests to execute ->runtime_idle() for the same
+    device.
+
+  * If ->runtime_resume() is about to be executed or there's a pending request
+    to execute it, the other callbacks will not be executed for the same device.
+
+  * A request to execute ->runtime_resume() will cancel any pending or
+    scheduled requests to execute the other callbacks for the same device.
+
+3. Run-time PM Device Fields
+
+The following device run-time PM fields are present in 'struct dev_pm_info', as
+defined in include/linux/pm.h:
+
+  struct timer_list suspend_timer;
+    - timer used for scheduling (delayed) suspend requests
+
+  unsigned long timer_expires;
+    - timer expiration time, in jiffies (if this is different from zero, the
+      timer is running and will expire at that time, otherwise the timer is not
+      running)
+
+  struct work_struct work;
+    - work structure used for queuing up requests (i.e. work items in pm_wq)
+
+  wait_queue_head_t wait_queue;
+    - wait queue used if any of the helper functions needs to wait for another
+      one to complete
+
+  spinlock_t lock;
+    - lock used for synchronisation
+
+  atomic_t usage_count;
+    - the usage counter of the device
+
+  atomic_t child_count;
+    - the count of 'active' children of the device
+
+  unsigned int ignore_children;
+    - if set, the value of child_count is ignored (but still updated)
+
+  unsigned int disable_depth;
+    - used for disabling the helper functions (they work normally if this is
+      equal to zero); the initial value of it is 1 (i.e. run-time PM is
+      initially disabled for all devices)
+
+  unsigned int runtime_error;
+    - if set, there was a fatal error (one of the callbacks returned an error
+      code as described in Section 2), so the helper functions will not work
+      until this flag is cleared; this is the error code returned by the
+      failing callback
+
+  unsigned int idle_notification;
+    - if set, ->runtime_idle() is being executed
+
+  unsigned int request_pending;
+    - if set, there's a pending request (i.e. a work item queued up into pm_wq)
+
+  enum rpm_request request;
+    - type of request that's pending (valid if request_pending is set)
+
+  unsigned int deferred_resume;
+    - set if ->runtime_resume() is about to be run while ->runtime_suspend() is
+      being executed for that device and it is not practical to wait for the
+      suspend to complete; means "start a resume as soon as you've suspended"
+
+  enum rpm_status runtime_status;
+    - the run-time PM status of the device; this field's initial value is
+      RPM_SUSPENDED, which means that each device is initially regarded by the
+      PM core as 'suspended', regardless of its real hardware status
+
+All of the above fields are members of the 'power' member of 'struct device'.
+
+4. Run-time PM Device Helper Functions
+
+The following run-time PM helper functions are defined in
+drivers/base/power/runtime.c and include/linux/pm_runtime.h:
+
+  void pm_runtime_init(struct device *dev);
+    - initialize the device run-time PM fields in 'struct dev_pm_info'
+
+  void pm_runtime_remove(struct device *dev);
+    - make sure that the run-time PM of the device will be disabled after
+      removing the device from the device hierarchy
+
+  int pm_runtime_idle(struct device *dev);
+    - execute ->runtime_idle() for the device's bus type; returns 0 on success
+      or an error code on failure, where -EINPROGRESS means that
+      ->runtime_idle() is already being executed
+
+  int pm_runtime_suspend(struct device *dev);
+    - execute ->runtime_suspend() for the device's bus type; returns 0 on
+      success, 1 if the device's run-time PM status was already 'suspended', or
+      an error code on failure, where -EAGAIN or -EBUSY means it is safe to
+      attempt to suspend the device again in the future
+
+  int pm_runtime_resume(struct device *dev);
+    - execute ->runtime_resume() for the device's bus type; returns 0 on
+      success, 1 if the device's run-time PM status was already 'active', or
+      an error code on failure, where -EAGAIN means it may be safe to attempt
+      to resume the device again in the future, but 'power.runtime_error'
+      should be checked additionally
+
+  int pm_request_idle(struct device *dev);
+    - submit a request to execute ->runtime_idle() for the device's bus type
+      (the request is represented by a work item in pm_wq); returns 0 on success
+      or an error code if the request has not been queued up
+
+  int pm_schedule_suspend(struct device *dev, unsigned int delay);
+    - schedule the execution of ->runtime_suspend() for the device's bus type
+      in the future, where 'delay' is the time to wait before queuing up a
+      suspend work item in pm_wq, in milliseconds (if 'delay' is zero, the
+      work item is queued up immediately); returns 0 on success, 1 if the
+      device's run-time PM status was already 'suspended', or an error code
+      if the request hasn't been scheduled (or queued up if 'delay' is 0); if
+      the execution of ->runtime_suspend() is already scheduled and not yet
+      expired, the new value of 'delay' will be used as the time to wait
+
+  int pm_request_resume(struct device *dev);
+    - submit a request to execute ->runtime_resume() for the device's bus type
+      (the request is represented by a work item in pm_wq); returns 0 on
+      success, 1 if the device's run-time PM status was already 'active', or
+      an error code if the request hasn't been queued up
+
+  void pm_runtime_get_noresume(struct device *dev);
+    - increment the device's usage counter
+
+  int pm_runtime_get(struct device *dev);
+    - increment the device's usage counter, run pm_request_resume(dev) and
+      return its result
+
+  int pm_runtime_get_sync(struct device *dev);
+    - increment the device's usage counter, run pm_runtime_resume(dev) and
+      return its result
+
+  void pm_runtime_put_noidle(struct device *dev);
+    - decrement the device's usage counter
+
+  int pm_runtime_put(struct device *dev);
+    - decrement the device's usage counter, run pm_request_idle(dev) and return
+      its result
+
+  int pm_runtime_put_sync(struct device *dev);
+    - decrement the device's usage counter, run pm_runtime_idle(dev) and return
+      its result
+
+  void pm_runtime_enable(struct device *dev);
+    - enable the run-time PM helper functions to run the device bus type's
+      run-time PM callbacks described in Section 2
+
+  int pm_runtime_disable(struct device *dev);
+    - prevent the run-time PM helper functions from running the device bus
+      type's run-time PM callbacks, make sure that all of the pending run-time
+      PM operations on the device are either completed or canceled; returns
+      1 if there was a resume request pending and it was necessary to execute
+      ->runtime_resume() for the device's bus type to satisfy that request,
+      otherwise 0 is returned
+
+  void pm_suspend_ignore_children(struct device *dev, bool enable);
+    - set/unset the power.ignore_children flag of the device
+
+  int pm_runtime_set_active(struct device *dev);
+    - clear the device's 'power.runtime_error' flag, set the device's run-time
+      PM status to 'active' and update its parent's counter of 'active'
+      children as appropriate (it is only valid to use this function if
+      'power.runtime_error' is set or 'power.disable_depth' is greater than
+      zero); it will fail and return an error code if the device has a parent
+      that is not active and whose 'power.ignore_children' flag is unset
+
+  void pm_runtime_set_suspended(struct device *dev);
+    - clear the device's 'power.runtime_error' flag, set the device's run-time
+      PM status to 'suspended' and update its parent's counter of 'active'
+      children as appropriate (it is only valid to use this function if
+      'power.runtime_error' is set or 'power.disable_depth' is greater than
+      zero)
+
+It is safe to execute the following helper functions from interrupt context:
+
+pm_request_idle()
+pm_schedule_suspend()
+pm_request_resume()
+pm_runtime_get_noresume()
+pm_runtime_get()
+pm_runtime_put_noidle()
+pm_runtime_put()
+pm_suspend_ignore_children()
+pm_runtime_set_active()
+pm_runtime_set_suspended()
+pm_runtime_enable()
+
+5. Run-time PM Initialization
+
+Initially, run-time PM is disabled for all devices, which means that the
+majority of the run-time PM helper functions described in Section 4 will return
+-EAGAIN until pm_runtime_enable() is called for the device.
+
+In addition to that, the initial run-time PM status of all devices is
+'suspended', but it need not reflect the actual physical state of the device.
+Thus, if the device is initially active (i.e. it is able to process I/O), its
+run-time PM status must be changed to 'active', with the help of
+pm_runtime_set_active(), before pm_runtime_enable() is called for the device.
+
+However, if the device has a parent and the parent's run-time PM is enabled,
+calling pm_runtime_set_active() for the device will affect the parent, unless
+the parent's 'power.ignore_children' flag is set.  Namely, in that case the
+parent won't be able to suspend at run time, using the PM core's helper
+functions, as long as the child's status is 'active', even if the child's
+run-time PM is still disabled (i.e. pm_runtime_enable() hasn't been called for
+the child yet or pm_runtime_disable() has been called for it).  For this reason,
+once pm_runtime_set_active() has been called for the device, pm_runtime_enable()
+should be called for it too as soon as reasonably possible or its run-time PM
+status should be changed back to 'suspended' with the help of
+pm_runtime_set_suspended().
+
+If the default initial run-time PM status of the device (i.e. 'suspended')
+reflects the actual state of the device, its bus type's or its driver's
+->probe() callback will likely need to wake it up using one of the PM core's
+helper functions described in Section 4.  In that case, pm_runtime_resume()
+should be used.  Of course, for this purpose the device's run-time PM has to be
+enabled earlier by calling pm_runtime_enable().
+
+If ->probe() calls pm_runtime_suspend() or pm_runtime_idle() or their
+asynchronous counterparts, they will fail and return -EAGAIN, because the
+device's usage counter is incremented by the core before executing ->probe().
+Still, it may be desirable to suspend the device as soon as ->probe() has
+finished, so the core uses pm_runtime_idle() to invoke the device bus type's
+->runtime_idle() callback at that time, but only if ->probe() is successful.
+
+If the device driver's or bus type's ->remove() callback executes
+pm_runtime_suspend() or pm_runtime_idle() or their asynchronous counterparts
+without preparation, they will fail and return -EAGAIN, because the device's
+usage counter is incremented by the core before executing ->remove().  However,
+if ->remove() wants to suspend the device, it can safely execute any of the
+pm_runtime_put*() helpers to decrement the device's usage counter, because the
+pm_runtime_put_noidle() called by the core after ->remove() has returned is
+guaranteed not to decrease the usage counter below zero.
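Rules (2) to (4) in Section 2 boil down to a few predicates over the fields listed in Section 3. The following is a hypothetical user-space model of those predicates. model_dev and its enum only mirror the documented fields and statuses, and the real helpers also look at disable_depth, runtime_error and pending requests, which are left out here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the documented run-time PM statuses. */
enum model_status {
	MODEL_ACTIVE,
	MODEL_SUSPENDING,
	MODEL_SUSPENDED,
	MODEL_RESUMING,
};

struct model_dev {
	enum model_status runtime_status;
	int usage_count;
	int child_count;	/* number of 'active' children */
	bool ignore_children;
};

/* Same test as pm_children_suspended() in pm_runtime.h. */
static bool model_children_suspended(const struct model_dev *d)
{
	return d->ignore_children || d->child_count == 0;
}

/* Rules (2) and (3): may ->runtime_idle() or ->runtime_suspend() run? */
bool model_may_idle_or_suspend(const struct model_dev *d)
{
	return d->runtime_status == MODEL_ACTIVE
		&& d->usage_count == 0
		&& model_children_suspended(d);
}

/* Rule (4): ->runtime_resume() only runs for 'suspended' devices. */
bool model_may_resume(const struct model_dev *d)
{
	return d->runtime_status == MODEL_SUSPENDED;
}

void model_rules_demo(void)
{
	struct model_dev d = { MODEL_ACTIVE, 0, 1, false };

	assert(!model_may_idle_or_suspend(&d));	/* one active child */
	d.ignore_children = true;
	assert(model_may_idle_or_suspend(&d));	/* children ignored */
	d.usage_count = 1;
	assert(!model_may_idle_or_suspend(&d));	/* device in use */
	d.usage_count = 0;
	d.runtime_status = MODEL_SUSPENDED;
	assert(!model_may_idle_or_suspend(&d) && model_may_resume(&d));
}
```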

* [PATCH update] PM: Introduce core framework for run-time PM of I/O devices (rev. 15)
  2009-08-08 14:25 [PATCH] PM: Introduce core framework for run-time PM of I/O devices (rev. 14) Rafael J. Wysocki
  2009-08-09 21:13 ` [PATCH update] PM: Introduce core framework for run-time PM of I/O devices (rev. 15) Rafael J. Wysocki
@ 2009-08-09 21:13 ` Rafael J. Wysocki
  2009-08-12 10:37   ` Magnus Damm
                     ` (5 more replies)
  1 sibling, 6 replies; 90+ messages in thread
From: Rafael J. Wysocki @ 2009-08-09 21:13 UTC
  To: Alan Stern
  Cc: Magnus Damm, Greg KH, Pavel Machek, Len Brown, LKML,
	Linux-pm mailing list

Hi,

One more update.  This one should address your comments from this thread
http://lkml.org/lkml/2009/8/8/113

Thanks,
Rafael

---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: PM: Introduce core framework for run-time PM of I/O devices (rev. 15)

Introduce a core framework for run-time power management of I/O
devices.  Add device run-time PM fields to 'struct dev_pm_info'
and device run-time PM callbacks to 'struct dev_pm_ops'.  Introduce
a run-time PM workqueue and define some device run-time PM helper
functions at the core level.  Document all these things.

Special thanks to Alan Stern for his help with the design and
multiple detailed reviews of the preceding versions of this patch
and to Magnus Damm for testing feedback.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 Documentation/power/runtime_pm.txt |  386 ++++++++++++++
 drivers/base/dd.c                  |   11 
 drivers/base/power/Makefile        |    1 
 drivers/base/power/main.c          |   22 
 drivers/base/power/power.h         |   31 -
 drivers/base/power/runtime.c       | 1011 +++++++++++++++++++++++++++++++++++++
 include/linux/pm.h                 |  101 +++
 include/linux/pm_runtime.h         |  114 ++++
 kernel/power/Kconfig               |   14 
 kernel/power/main.c                |   17 
 10 files changed, 1697 insertions(+), 11 deletions(-)

Index: linux-2.6/kernel/power/Kconfig
===================================================================
--- linux-2.6.orig/kernel/power/Kconfig
+++ linux-2.6/kernel/power/Kconfig
@@ -208,3 +208,17 @@ config APM_EMULATION
 	  random kernel OOPSes or reboots that don't seem to be related to
 	  anything, try disabling/enabling this option (or disabling/enabling
 	  APM in your BIOS).
+
+config PM_RUNTIME
+	bool "Run-time PM core functionality"
+	depends on PM
+	---help---
+	  Enable functionality allowing I/O devices to be put into energy-saving
+	  (low power) states at run time (or autosuspended) after a specified
+	  period of inactivity and woken up in response to a hardware-generated
+	  wake-up event or a driver's request.
+
+	  Hardware support is generally required for this functionality to work,
+	  and the drivers of the bus types that the devices belong to are
+	  responsible for the actual handling of the autosuspend requests and
+	  wake-up events.
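The help text speaks of suspending a device after a period of inactivity. In this patch, pm_schedule_suspend() arms a timer. pm_suspend_timer_fn() then treats the suspend request as due only if timer_expires is nonzero and not after jiffies. It uses time_after() for this comparison, so the check stays correct when jiffies wraps. Below is a userspace sketch of that check. model_time_after() is a hypothetical helper that follows the signed-difference idea behind time_after(); it is not the kernel macro itself.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * True if a is later than b.  Subtracting in unsigned arithmetic and reading
 * the result as signed keeps the answer right across wraparound, as long as
 * the two values are less than half the counter range apart.
 */
bool model_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/*
 * Mirrors the test in pm_suspend_timer_fn(): expires == 0 means "no timer
 * armed"; otherwise the request is due unless expires is still in the future.
 */
bool model_timer_due(unsigned long expires, unsigned long now)
{
	return expires > 0 && !model_time_after(expires, now);
}

/* A few sample cases, including one where the counter has wrapped. */
void model_self_check(void)
{
	assert(!model_timer_due(0, 100));	/* nothing armed */
	assert(!model_timer_due(200, 100));	/* timer ran too early */
	assert(model_timer_due(100, 100));	/* exactly on time */
	assert(model_timer_due(100, 150));	/* late but still due */
	/* expires just before the wrap, "now" just after it: due */
	assert(model_timer_due((unsigned long)-6, 10));
}
```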
Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -11,6 +11,7 @@
 #include <linux/kobject.h>
 #include <linux/string.h>
 #include <linux/resume-trace.h>
+#include <linux/workqueue.h>
 
 #include "power.h"
 
@@ -217,8 +218,24 @@ static struct attribute_group attr_group
 	.attrs = g,
 };
 
+#ifdef CONFIG_PM_RUNTIME
+struct workqueue_struct *pm_wq;
+
+static int __init pm_start_workqueue(void)
+{
+	pm_wq = create_freezeable_workqueue("pm");
+
+	return pm_wq ? 0 : -ENOMEM;
+}
+#else
+static inline int pm_start_workqueue(void) { return 0; }
+#endif
+
 static int __init pm_init(void)
 {
+	int error = pm_start_workqueue();
+	if (error)
+		return error;
 	power_kobj = kobject_create_and_add("power", NULL);
 	if (!power_kobj)
 		return -ENOMEM;
Index: linux-2.6/include/linux/pm.h
===================================================================
--- linux-2.6.orig/include/linux/pm.h
+++ linux-2.6/include/linux/pm.h
@@ -22,6 +22,10 @@
 #define _LINUX_PM_H
 
 #include <linux/list.h>
+#include <linux/workqueue.h>
+#include <linux/spinlock.h>
+#include <linux/wait.h>
+#include <linux/timer.h>
 
 /*
  * Callbacks for platform drivers to implement.
@@ -165,6 +169,28 @@ typedef struct pm_message {
  * It is allowed to unregister devices while the above callbacks are being
  * executed.  However, it is not allowed to unregister a device from within any
  * of its own callbacks.
+ *
+ * There also are the following callbacks related to run-time power management
+ * of devices:
+ *
+ * @runtime_suspend: Prepare the device for a condition in which it won't be
+ *	able to communicate with the CPU(s) and RAM due to power management.
+ *	This need not mean that the device should be put into a low power state.
+ *	For example, if the device is behind a link which is about to be turned
+ *	off, the device may remain at full power.  If the device does go to low
+ *	power and if device_may_wakeup(dev) is true, remote wake-up (i.e., a
+ *	hardware mechanism allowing the device to request a change of its power
+ *	state, such as PCI PME) should be enabled for it.
+ *
+ * @runtime_resume: Put the device into the fully active state in response to a
+ *	wake-up event generated by hardware or at the request of software.  If
+ *	necessary, put the device into the full power state and restore its
+ *	registers, so that it is fully operational.
+ *
+ * @runtime_idle: Device appears to be inactive and it might be put into a low
+ *	power state if all of the necessary conditions are satisfied.  Check
+ *	these conditions and handle the device as appropriate, possibly queueing
+ *	a suspend request for it.
  */
 
 struct dev_pm_ops {
@@ -182,6 +208,9 @@ struct dev_pm_ops {
 	int (*thaw_noirq)(struct device *dev);
 	int (*poweroff_noirq)(struct device *dev);
 	int (*restore_noirq)(struct device *dev);
+	int (*runtime_suspend)(struct device *dev);
+	int (*runtime_resume)(struct device *dev);
+	void (*runtime_idle)(struct device *dev);
 };
 
 /*
@@ -329,14 +358,80 @@ enum dpm_state {
 	DPM_OFF_IRQ,
 };
 
+/**
+ * Device run-time power management status.
+ *
+ * These status labels are used internally by the PM core to indicate the
+ * current status of a device with respect to the PM core operations.  They do
+ * not reflect the actual power state of the device or its status as seen by the
+ * driver.
+ *
+ * RPM_ACTIVE		Device is fully operational.  Indicates that the device
+ *			bus type's ->runtime_resume() callback has completed
+ *			successfully.
+ *
+ * RPM_SUSPENDED	Device bus type's ->runtime_suspend() callback has
+ *			completed successfully.  The device is regarded as
+ *			suspended.
+ *
+ * RPM_RESUMING		Device bus type's ->runtime_resume() callback is being
+ *			executed.
+ *
+ * RPM_SUSPENDING	Device bus type's ->runtime_suspend() callback is being
+ *			executed.
+ */
+
+enum rpm_status {
+	RPM_ACTIVE = 0,
+	RPM_RESUMING,
+	RPM_SUSPENDED,
+	RPM_SUSPENDING,
+};
+
+/**
+ * Device run-time power management request types.
+ *
+ * RPM_REQ_NONE		Do nothing.
+ *
+ * RPM_REQ_IDLE		Run the device bus type's ->runtime_idle() callback
+ *
+ * RPM_REQ_SUSPEND	Run the device bus type's ->runtime_suspend() callback
+ *
+ * RPM_REQ_RESUME	Run the device bus type's ->runtime_resume() callback
+ */
+
+enum rpm_request {
+	RPM_REQ_NONE = 0,
+	RPM_REQ_IDLE,
+	RPM_REQ_SUSPEND,
+	RPM_REQ_RESUME,
+};
+
 struct dev_pm_info {
 	pm_message_t		power_state;
-	unsigned		can_wakeup:1;
-	unsigned		should_wakeup:1;
+	unsigned int		can_wakeup:1;
+	unsigned int		should_wakeup:1;
 	enum dpm_state		status;		/* Owned by the PM core */
-#ifdef	CONFIG_PM_SLEEP
+#ifdef CONFIG_PM_SLEEP
 	struct list_head	entry;
 #endif
+#ifdef CONFIG_PM_RUNTIME
+	struct timer_list	suspend_timer;
+	unsigned long		timer_expires;
+	struct work_struct	work;
+	wait_queue_head_t	wait_queue;
+	spinlock_t		lock;
+	atomic_t		usage_count;
+	atomic_t		child_count;
+	unsigned int		disable_depth:3;
+	unsigned int		ignore_children:1;
+	unsigned int		idle_notification:1;
+	unsigned int		request_pending:1;
+	unsigned int		deferred_resume:1;
+	enum rpm_request	request;
+	enum rpm_status		runtime_status;
+	int			runtime_error;
+#endif
 };
 
 /*
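The rpm_status labels above track what the PM core is doing, not the physical power state of the device. The sketch below is a rough userspace illustration of how runtime.c moves the status around a ->runtime_suspend() or ->runtime_resume() call. RPM_SUSPENDING or RPM_RESUMING is set while the callback runs, and the callback's result decides the final label. It deliberately leaves out locking, power.runtime_error, parents and waiting. model_dev, cb_ok and cb_busy are hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Same labels as the rpm_status enum added to include/linux/pm.h. */
enum rpm_status {
	RPM_ACTIVE = 0,
	RPM_RESUMING,
	RPM_SUSPENDED,
	RPM_SUSPENDING,
};

/* Hypothetical stand-in for a device plus its bus type's callbacks. */
struct model_dev {
	enum rpm_status status;
	int (*runtime_suspend)(struct model_dev *d);
	int (*runtime_resume)(struct model_dev *d);
};

/* RPM_SUSPENDING while the callback runs; RPM_SUSPENDED only on success. */
int model_suspend(struct model_dev *d)
{
	int err;

	assert(d->status == RPM_ACTIVE);
	d->status = RPM_SUSPENDING;
	err = d->runtime_suspend ? d->runtime_suspend(d) : -ENOSYS;
	d->status = err ? RPM_ACTIVE : RPM_SUSPENDED;
	return err;
}

/* RPM_RESUMING while the callback runs; RPM_ACTIVE only on success. */
int model_resume(struct model_dev *d)
{
	int err;

	assert(d->status == RPM_SUSPENDED);
	d->status = RPM_RESUMING;
	err = d->runtime_resume ? d->runtime_resume(d) : -ENOSYS;
	d->status = err ? RPM_SUSPENDED : RPM_ACTIVE;
	return err;
}

/* Example callback: succeeds, and sees that a transition is in progress. */
int cb_ok(struct model_dev *d)
{
	return (d->status == RPM_SUSPENDING || d->status == RPM_RESUMING) ?
		0 : -EINVAL;
}

/* Example callback: refuses to suspend because the device is busy. */
int cb_busy(struct model_dev *d)
{
	(void)d;
	return -EBUSY;
}
```

In the patch, a callback failure other than -EAGAIN or -EBUSY is also recorded in power.runtime_error. That makes the helpers return -EINVAL until __pm_runtime_set_status() clears it. The model leaves this out.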
Index: linux-2.6/drivers/base/power/Makefile
===================================================================
--- linux-2.6.orig/drivers/base/power/Makefile
+++ linux-2.6/drivers/base/power/Makefile
@@ -1,5 +1,6 @@
 obj-$(CONFIG_PM)	+= sysfs.o
 obj-$(CONFIG_PM_SLEEP)	+= main.o
+obj-$(CONFIG_PM_RUNTIME)	+= runtime.o
 obj-$(CONFIG_PM_TRACE_RTC)	+= trace.o
 
 ccflags-$(CONFIG_DEBUG_DRIVER) := -DDEBUG
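When a request is already pending in pm_wq, the __pm_request_idle(), __pm_request_suspend() and __pm_request_resume() helpers in runtime.c decide whether a new request replaces it. Those helpers also run status and usage-counter checks. Setting those checks aside, the precedence they implement can be summarised by the hypothetical model below. An idle request only fills an empty slot, a suspend request overtakes idle but yields to resume, and a resume request overtakes everything.

```c
#include <assert.h>
#include <errno.h>

/* Same labels as the rpm_request enum added to include/linux/pm.h. */
enum rpm_request {
	RPM_REQ_NONE = 0,
	RPM_REQ_IDLE,
	RPM_REQ_SUSPEND,
	RPM_REQ_RESUME,
};

/*
 * Hypothetical model: *pending is the request already queued for the device
 * (RPM_REQ_NONE if it was cancelled).  Update it for a newly submitted request
 * and return 0 if that request was accepted or -EAGAIN if it was refused.
 */
int model_merge(enum rpm_request *pending, enum rpm_request incoming)
{
	assert(pending);

	switch (incoming) {
	case RPM_REQ_IDLE:
		/* An idle request only fills an empty slot. */
		if (*pending == RPM_REQ_NONE)
			*pending = RPM_REQ_IDLE;
		return *pending == RPM_REQ_IDLE ? 0 : -EAGAIN;
	case RPM_REQ_SUSPEND:
		/* A suspend request yields to resume but overtakes idle. */
		if (*pending == RPM_REQ_RESUME)
			return -EAGAIN;
		*pending = RPM_REQ_SUSPEND;
		return 0;
	case RPM_REQ_RESUME:
		/* A resume request overtakes anything else. */
		*pending = RPM_REQ_RESUME;
		return 0;
	case RPM_REQ_NONE:
		break;	/* nothing to submit */
	}
	return 0;
}
```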
Index: linux-2.6/drivers/base/power/runtime.c
===================================================================
--- /dev/null
+++ linux-2.6/drivers/base/power/runtime.c
@@ -0,0 +1,1011 @@
+/*
+ * drivers/base/power/runtime.c - Helper functions for device run-time PM
+ *
+ * Copyright (c) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
+ *
+ * This file is released under the GPLv2.
+ */
+
+#include <linux/sched.h>
+#include <linux/pm_runtime.h>
+#include <linux/jiffies.h>
+
+static int __pm_runtime_resume(struct device *dev, bool from_wq);
+static int __pm_request_idle(struct device *dev);
+static int __pm_request_resume(struct device *dev);
+
+/**
+ * pm_runtime_deactivate_timer - Deactivate given device's suspend timer.
+ * @dev: Device to handle.
+ */
+static void pm_runtime_deactivate_timer(struct device *dev)
+{
+	if (dev->power.timer_expires > 0) {
+		del_timer(&dev->power.suspend_timer);
+		dev->power.timer_expires = 0;
+	}
+}
+
+/**
+ * pm_runtime_cancel_pending - Deactivate suspend timer and cancel requests.
+ * @dev: Device to handle.
+ */
+static void pm_runtime_cancel_pending(struct device *dev)
+{
+	pm_runtime_deactivate_timer(dev);
+	/*
+	 * In case there's a request pending, make sure its work function will
+	 * return without doing anything.
+	 */
+	dev->power.request = RPM_REQ_NONE;
+}
+
+/**
+ * __pm_runtime_idle - Notify device bus type if the device can be suspended.
+ * @dev: Device to notify the bus type about.
+ *
+ * This function must be called under dev->power.lock with interrupts disabled.
+ */
+static int __pm_runtime_idle(struct device *dev)
+	__releases(&dev->power.lock) __acquires(&dev->power.lock)
+{
+	int retval = 0;
+
+	dev_dbg(dev, "__pm_runtime_idle()!\n");
+
+	if (dev->power.runtime_error)
+		retval = -EINVAL;
+	else if (dev->power.idle_notification)
+		retval = -EINPROGRESS;
+	else if (atomic_read(&dev->power.usage_count) > 0
+	    || dev->power.disable_depth > 0
+	    || dev->power.runtime_status != RPM_ACTIVE)
+		retval = -EAGAIN;
+	else if (!pm_children_suspended(dev))
+		retval = -EBUSY;
+	if (retval)
+		goto out;
+
+	if (dev->power.request_pending) {
+		/*
+		 * If an idle notification request is pending, cancel it.  Any
+		 * other pending request takes precedence over us.
+		 */
+		if (dev->power.request == RPM_REQ_IDLE) {
+			dev->power.request = RPM_REQ_NONE;
+		} else if (dev->power.request != RPM_REQ_NONE) {
+			retval = -EAGAIN;
+			goto out;
+		}
+	}
+
+	dev->power.idle_notification = true;
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_idle) {
+		spin_unlock_irq(&dev->power.lock);
+
+		dev->bus->pm->runtime_idle(dev);
+
+		spin_lock_irq(&dev->power.lock);
+	}
+
+	dev->power.idle_notification = false;
+	wake_up_all(&dev->power.wait_queue);
+
+ out:
+	dev_dbg(dev, "__pm_runtime_idle() returns %d!\n", retval);
+
+	return retval;
+}
+
+/**
+ * pm_runtime_idle - Notify device bus type if the device can be suspended.
+ * @dev: Device to notify the bus type about.
+ */
+int pm_runtime_idle(struct device *dev)
+{
+	int retval;
+
+	spin_lock_irq(&dev->power.lock);
+	retval = __pm_runtime_idle(dev);
+	spin_unlock_irq(&dev->power.lock);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(pm_runtime_idle);
+
+/**
+ * __pm_runtime_suspend - Carry out run-time suspend of given device.
+ * @dev: Device to suspend.
+ * @from_wq: If set, the function has been called via pm_wq.
+ *
+ * Check if the device can be suspended and run the ->runtime_suspend() callback
+ * provided by its bus type.  If another suspend has been started earlier, wait
+ * for it to finish.  If an idle notification or suspend request is pending or
+ * scheduled, cancel it.
+ *
+ * This function must be called under dev->power.lock with interrupts disabled.
+ */
+int __pm_runtime_suspend(struct device *dev, bool from_wq)
+	__releases(&dev->power.lock) __acquires(&dev->power.lock)
+{
+	struct device *parent = NULL;
+	bool notify = false;
+	int retval = 0;
+
+	dev_dbg(dev, "__pm_runtime_suspend()%s!\n",
+		from_wq ? " from workqueue" : "");
+
+ repeat:
+	if (dev->power.runtime_error) {
+		retval = -EINVAL;
+		goto out;
+	}
+
+	/* Pending resume requests take precedence over us. */
+	if (dev->power.request_pending
+	    && dev->power.request == RPM_REQ_RESUME) {
+		retval = -EAGAIN;
+		goto out;
+	}
+
+	/* Other scheduled or pending requests need to be canceled. */
+	pm_runtime_cancel_pending(dev);
+
+	if (dev->power.runtime_status == RPM_SUSPENDED)
+		retval = 1;
+	else if (dev->power.runtime_status == RPM_RESUMING
+	    || dev->power.disable_depth > 0
+	    || atomic_read(&dev->power.usage_count) > 0)
+		retval = -EAGAIN;
+	else if (!pm_children_suspended(dev))
+		retval = -EBUSY;
+	if (retval)
+		goto out;
+
+	if (dev->power.runtime_status == RPM_SUSPENDING) {
+		DEFINE_WAIT(wait);
+
+		if (from_wq) {
+			retval = -EINPROGRESS;
+			goto out;
+		}
+
+		/* Wait for the other suspend running in parallel with us. */
+		for (;;) {
+			prepare_to_wait(&dev->power.wait_queue, &wait,
+					TASK_UNINTERRUPTIBLE);
+			if (dev->power.runtime_status != RPM_SUSPENDING)
+				break;
+
+			spin_unlock_irq(&dev->power.lock);
+
+			schedule();
+
+			spin_lock_irq(&dev->power.lock);
+		}
+		finish_wait(&dev->power.wait_queue, &wait);
+		goto repeat;
+	}
+
+	dev->power.runtime_status = RPM_SUSPENDING;
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_suspend) {
+		spin_unlock_irq(&dev->power.lock);
+
+		retval = dev->bus->pm->runtime_suspend(dev);
+
+		spin_lock_irq(&dev->power.lock);
+		dev->power.runtime_error = retval;
+	} else {
+		retval = -ENOSYS;
+	}
+
+	if (retval) {
+		dev->power.runtime_status = RPM_ACTIVE;
+		pm_runtime_cancel_pending(dev);
+		dev->power.deferred_resume = false;
+
+		if (retval == -EAGAIN || retval == -EBUSY) {
+			notify = true;
+			dev->power.runtime_error = 0;
+		}
+	} else {
+		dev->power.runtime_status = RPM_SUSPENDED;
+
+		if (dev->parent) {
+			parent = dev->parent;
+			atomic_add_unless(&parent->power.child_count, -1, 0);
+		}
+	}
+	wake_up_all(&dev->power.wait_queue);
+
+	if (dev->power.deferred_resume) {
+		dev->power.deferred_resume = false;
+		__pm_runtime_resume(dev, false);
+		retval = -EAGAIN;
+		goto out;
+	}
+
+	if (notify)
+		__pm_runtime_idle(dev);
+
+	if (parent && !parent->power.ignore_children) {
+		spin_unlock_irq(&dev->power.lock);
+
+		pm_request_idle(parent);
+
+		spin_lock_irq(&dev->power.lock);
+	}
+
+ out:
+	dev_dbg(dev, "__pm_runtime_suspend() returns %d!\n", retval);
+
+	return retval;
+}
+
+/**
+ * pm_runtime_suspend - Carry out run-time suspend of given device.
+ * @dev: Device to suspend.
+ */
+int pm_runtime_suspend(struct device *dev)
+{
+	int retval;
+
+	spin_lock_irq(&dev->power.lock);
+	retval = __pm_runtime_suspend(dev, false);
+	spin_unlock_irq(&dev->power.lock);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(pm_runtime_suspend);
+
+/**
+ * __pm_runtime_resume - Carry out run-time resume of given device.
+ * @dev: Device to resume.
+ * @from_wq: If set, the function has been called via pm_wq.
+ *
+ * Check if the device can be woken up and run the ->runtime_resume() callback
+ * provided by its bus type.  If another resume has been started earlier, wait
+ * for it to finish.  If there's a suspend running in parallel with this
+ * function, wait for it to finish and resume the device.  Cancel any scheduled
+ * or pending requests.
+ *
+ * This function must be called under dev->power.lock with interrupts disabled.
+ */
+int __pm_runtime_resume(struct device *dev, bool from_wq)
+	__releases(&dev->power.lock) __acquires(&dev->power.lock)
+{
+	struct device *parent = NULL;
+	int retval = 0;
+
+	dev_dbg(dev, "__pm_runtime_resume()%s!\n",
+		from_wq ? " from workqueue" : "");
+
+ repeat:
+	if (dev->power.runtime_error) {
+		retval = -EINVAL;
+		goto out;
+	}
+
+	pm_runtime_cancel_pending(dev);
+
+	if (dev->power.runtime_status == RPM_ACTIVE)
+		retval = 1;
+	else if (dev->power.disable_depth > 0)
+		retval = -EAGAIN;
+	if (retval)
+		goto out;
+
+	if (dev->power.runtime_status == RPM_RESUMING
+	    || dev->power.runtime_status == RPM_SUSPENDING) {
+		DEFINE_WAIT(wait);
+
+		if (from_wq) {
+			if (dev->power.runtime_status == RPM_SUSPENDING)
+				dev->power.deferred_resume = true;
+			retval = -EINPROGRESS;
+			goto out;
+		}
+
+		/* Wait for the operation carried out in parallel with us. */
+		for (;;) {
+			prepare_to_wait(&dev->power.wait_queue, &wait,
+					TASK_UNINTERRUPTIBLE);
+			if (dev->power.runtime_status != RPM_RESUMING
+			    && dev->power.runtime_status != RPM_SUSPENDING)
+				break;
+
+			spin_unlock_irq(&dev->power.lock);
+
+			schedule();
+
+			spin_lock_irq(&dev->power.lock);
+		}
+		finish_wait(&dev->power.wait_queue, &wait);
+		goto repeat;
+	}
+
+	if (!parent && dev->parent) {
+		/*
+		 * Increment the parent's resume counter and resume it if
+		 * necessary.
+		 */
+		parent = dev->parent;
+		spin_unlock_irq(&dev->power.lock);
+
+		pm_runtime_get_noresume(parent);
+
+		spin_lock_irq(&parent->power.lock);
+		/*
+		 * We can resume if the parent's run-time PM is disabled or it
+		 * is set to ignore children.
+		 */
+		if (!parent->power.disable_depth
+		    && !parent->power.ignore_children) {
+			__pm_runtime_resume(parent, false);
+			if (parent->power.runtime_status != RPM_ACTIVE)
+				retval = -EBUSY;
+		}
+		spin_unlock_irq(&parent->power.lock);
+
+		spin_lock_irq(&dev->power.lock);
+		if (retval)
+			goto out;
+		goto repeat;
+	}
+
+	dev->power.runtime_status = RPM_RESUMING;
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_resume) {
+		spin_unlock_irq(&dev->power.lock);
+
+		retval = dev->bus->pm->runtime_resume(dev);
+
+		spin_lock_irq(&dev->power.lock);
+		dev->power.runtime_error = retval;
+	} else {
+		retval = -ENOSYS;
+	}
+
+	if (retval) {
+		dev->power.runtime_status = RPM_SUSPENDED;
+		pm_runtime_cancel_pending(dev);
+	} else {
+		dev->power.runtime_status = RPM_ACTIVE;
+		if (parent)
+			atomic_inc(&parent->power.child_count);
+	}
+	wake_up_all(&dev->power.wait_queue);
+
+	if (!retval)
+		__pm_request_idle(dev);
+
+ out:
+	if (parent) {
+		spin_unlock_irq(&dev->power.lock);
+
+		pm_runtime_put(parent);
+
+		spin_lock_irq(&dev->power.lock);
+	}
+
+	dev_dbg(dev, "__pm_runtime_resume() returns %d!\n", retval);
+
+	return retval;
+}
+
+/**
+ * pm_runtime_resume - Carry out run-time resume of given device.
+ * @dev: Device to resume.
+ */
+int pm_runtime_resume(struct device *dev)
+{
+	int retval;
+
+	spin_lock_irq(&dev->power.lock);
+	retval = __pm_runtime_resume(dev, false);
+	spin_unlock_irq(&dev->power.lock);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(pm_runtime_resume);
+
+/**
+ * pm_runtime_work - Universal run-time PM work function.
+ * @work: Work structure used for scheduling the execution of this function.
+ *
+ * Use @work to get the device object the work is to be done for, determine what
+ * is to be done and execute the appropriate run-time PM function.
+ */
+static void pm_runtime_work(struct work_struct *work)
+{
+	struct device *dev = container_of(work, struct device, power.work);
+	enum rpm_request req;
+
+	spin_lock_irq(&dev->power.lock);
+
+	if (!dev->power.request_pending)
+		goto out;
+
+	req = dev->power.request;
+	dev->power.request = RPM_REQ_NONE;
+	dev->power.request_pending = false;
+
+	switch (req) {
+	case RPM_REQ_NONE:
+		break;
+	case RPM_REQ_IDLE:
+		__pm_runtime_idle(dev);
+		break;
+	case RPM_REQ_SUSPEND:
+		__pm_runtime_suspend(dev, true);
+		break;
+	case RPM_REQ_RESUME:
+		__pm_runtime_resume(dev, true);
+		break;
+	}
+
+ out:
+	spin_unlock_irq(&dev->power.lock);
+}
+
+/**
+ * __pm_request_idle - Submit an idle notification request for given device.
+ * @dev: Device to handle.
+ *
+ * Check if the device's run-time PM status is correct for suspending the device
+ * and queue up a request to run __pm_runtime_idle() for it.
+ *
+ * This function must be called under dev->power.lock with interrupts disabled.
+ */
+static int __pm_request_idle(struct device *dev)
+{
+	int retval = 0;
+
+	if (dev->power.runtime_error)
+		retval = -EINVAL;
+	else if (atomic_read(&dev->power.usage_count) > 0
+	    || dev->power.disable_depth > 0
+	    || dev->power.runtime_status == RPM_SUSPENDED
+	    || dev->power.runtime_status == RPM_SUSPENDING)
+		retval = -EAGAIN;
+	else if (!pm_children_suspended(dev))
+		retval = -EBUSY;
+	if (retval)
+		return retval;
+
+	if (dev->power.request_pending) {
+		/* Any requests other than RPM_REQ_IDLE take precedence. */
+		if (dev->power.request == RPM_REQ_NONE)
+			dev->power.request = RPM_REQ_IDLE;
+		else if (dev->power.request != RPM_REQ_IDLE)
+			retval = -EAGAIN;
+		return retval;
+	}
+
+	dev->power.request = RPM_REQ_IDLE;
+	dev->power.request_pending = true;
+	queue_work(pm_wq, &dev->power.work);
+
+	return retval;
+}
+
+/**
+ * pm_request_idle - Submit an idle notification request for given device.
+ * @dev: Device to handle.
+ */
+int pm_request_idle(struct device *dev)
+{
+	unsigned long flags;
+	int retval;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+	retval = __pm_request_idle(dev);
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(pm_request_idle);
+
+/**
+ * __pm_request_suspend - Submit a suspend request for given device.
+ * @dev: Device to suspend.
+ *
+ * This function must be called under dev->power.lock with interrupts disabled.
+ */
+static int __pm_request_suspend(struct device *dev)
+{
+	int retval = 0;
+
+	if (dev->power.runtime_error)
+		return -EINVAL;
+
+	if (dev->power.runtime_status == RPM_SUSPENDED)
+		retval = 1;
+	else if (atomic_read(&dev->power.usage_count) > 0
+	    || dev->power.disable_depth > 0)
+		retval = -EAGAIN;
+	else if (dev->power.runtime_status == RPM_SUSPENDING)
+		retval = -EINPROGRESS;
+	else if (!pm_children_suspended(dev))
+		retval = -EBUSY;
+	if (retval < 0)
+		return retval;
+
+	pm_runtime_deactivate_timer(dev);
+
+	if (dev->power.request_pending) {
+		/*
+		 * Pending resume requests take precedence over us, but we can
+		 * overtake any other pending request.
+		 */
+		if (dev->power.request == RPM_REQ_RESUME)
+			retval = -EAGAIN;
+		else if (dev->power.request != RPM_REQ_SUSPEND)
+			dev->power.request = retval ?
+						RPM_REQ_NONE : RPM_REQ_SUSPEND;
+		return retval;
+	} else if (retval) {
+		return retval;
+	}
+
+	dev->power.request = RPM_REQ_SUSPEND;
+	dev->power.request_pending = true;
+	queue_work(pm_wq, &dev->power.work);
+
+	return 0;
+}
+
+/**
+ * pm_suspend_timer_fn - Timer function for pm_schedule_suspend().
+ * @data: Device pointer passed by pm_schedule_suspend().
+ *
+ * Check if the time is right and execute __pm_request_suspend() in that case.
+ */
+static void pm_suspend_timer_fn(unsigned long data)
+{
+	struct device *dev = (struct device *)data;
+	unsigned long flags;
+	unsigned long expires;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	expires = dev->power.timer_expires;
+	/* If 'expires' is after 'jiffies', we've been called too early. */
+	if (expires > 0 && !time_after(expires, jiffies)) {
+		dev->power.timer_expires = 0;
+		__pm_request_suspend(dev);
+	}
+
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+
+/**
+ * pm_schedule_suspend - Set up a timer to submit a suspend request in future.
+ * @dev: Device to suspend.
+ * @delay: Time to wait before submitting a suspend request, in milliseconds.
+ */
+int pm_schedule_suspend(struct device *dev, unsigned int delay)
+{
+	unsigned long flags;
+	int retval = 0;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.runtime_error) {
+		retval = -EINVAL;
+		goto out;
+	}
+
+	if (!delay) {
+		retval = __pm_request_suspend(dev);
+		goto out;
+	}
+
+	pm_runtime_deactivate_timer(dev);
+
+	if (dev->power.request_pending) {
+		/*
+		 * Pending resume requests take precedence over us, but any
+		 * other pending requests have to be canceled.
+		 */
+		if (dev->power.request == RPM_REQ_RESUME) {
+			retval = -EAGAIN;
+			goto out;
+		}
+		dev->power.request = RPM_REQ_NONE;
+	}
+
+	if (dev->power.runtime_status == RPM_SUSPENDED)
+		retval = 1;
+	else if (dev->power.runtime_status == RPM_SUSPENDING)
+		retval = -EINPROGRESS;
+	else if (atomic_read(&dev->power.usage_count) > 0
+	    || dev->power.disable_depth > 0)
+		retval = -EAGAIN;
+	else if (!pm_children_suspended(dev))
+		retval = -EBUSY;
+	if (retval)
+		goto out;
+
+	dev->power.timer_expires = jiffies + msecs_to_jiffies(delay);
+	mod_timer(&dev->power.suspend_timer, dev->power.timer_expires);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(pm_schedule_suspend);
+
+/**
+ * __pm_request_resume - Submit a resume request for given device.
+ * @dev: Device to resume.
+ *
+ * This function must be called under dev->power.lock with interrupts disabled.
+ */
+static int __pm_request_resume(struct device *dev)
+{
+	int retval = 0;
+
+	if (dev->power.runtime_error)
+		return -EINVAL;
+
+	if (dev->power.runtime_status == RPM_ACTIVE)
+		retval = 1;
+	else if (dev->power.runtime_status == RPM_RESUMING)
+		retval = -EINPROGRESS;
+	else if (dev->power.disable_depth > 0)
+		retval = -EAGAIN;
+	if (retval < 0)
+		return retval;
+
+	pm_runtime_deactivate_timer(dev);
+
+	if (dev->power.request_pending) {
+		/* If non-resume request is pending, we can overtake it. */
+		dev->power.request = retval ? RPM_REQ_NONE : RPM_REQ_RESUME;
+		return retval;
+	} else if (retval) {
+		return retval;
+	}
+
+	dev->power.request = RPM_REQ_RESUME;
+	dev->power.request_pending = true;
+	queue_work(pm_wq, &dev->power.work);
+
+	return retval;
+}
+
+/**
+ * pm_request_resume - Submit a resume request for given device.
+ * @dev: Device to resume.
+ */
+int pm_request_resume(struct device *dev)
+{
+	unsigned long flags;
+	int retval;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+	retval = __pm_request_resume(dev);
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(pm_request_resume);
+
+/**
+ * __pm_runtime_get - Reference count a device and wake it up, if necessary.
+ * @dev: Device to handle.
+ * @sync: If set and the device is suspended, resume it synchronously.
+ *
+ * Increment the usage count of the device and if it was zero previously,
+ * resume it or submit a resume request for it, depending on the value of @sync.
+ */
+int __pm_runtime_get(struct device *dev, bool sync)
+{
+	int retval = 1;
+
+	if (atomic_add_return(1, &dev->power.usage_count) == 1)
+		retval = sync ? pm_runtime_resume(dev) : pm_request_resume(dev);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(__pm_runtime_get);
+
+/**
+ * __pm_runtime_put - Decrement the device's usage counter and notify its bus.
+ * @dev: Device to handle.
+ * @sync: If the device's bus type is to be notified, do that synchronously.
+ *
+ * Decrement the usage count of the device and if it reaches zero, carry out a
+ * synchronous idle notification or submit an idle notification request for it,
+ * depending on the value of @sync.
+ */
+int __pm_runtime_put(struct device *dev, bool sync)
+{
+	int retval = 0;
+
+	if (atomic_dec_and_test(&dev->power.usage_count))
+		retval = sync ? pm_runtime_idle(dev) : pm_request_idle(dev);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(__pm_runtime_put);
+
+/**
+ * __pm_runtime_set_status - Set run-time PM status of a device.
+ * @dev: Device to handle.
+ * @status: New run-time PM status of the device.
+ *
+ * If run-time PM of the device is disabled or its power.runtime_error field is
+ * different from zero, the status may be changed either to RPM_ACTIVE, or to
+ * RPM_SUSPENDED, as long as that reflects the actual state of the device.
+ * However, if the device has a parent that has run-time PM enabled, is not
+ * active and has its power.ignore_children flag unset, the device's status
+ * cannot be set to RPM_ACTIVE, so -EBUSY is returned in that case.
+ *
+ * If successful, __pm_runtime_set_status() clears the power.runtime_error field
+ * and the device parent's counter of unsuspended children is modified to
+ * reflect the new status.  If the new status is RPM_SUSPENDED, the parent
+ * is sent an idle notification request unless power.ignore_children is set.
+ */
+int __pm_runtime_set_status(struct device *dev, unsigned int status)
+{
+	struct device *parent = dev->parent;
+	unsigned long flags;
+	bool notify_parent = false;
+	int error = 0;
+
+	if (status != RPM_ACTIVE && status != RPM_SUSPENDED)
+		return -EINVAL;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (!dev->power.runtime_error && !dev->power.disable_depth) {
+		error = -EAGAIN;
+		goto out;
+	}
+
+	if (dev->power.runtime_status == status)
+		goto out_set;
+
+	if (status == RPM_SUSPENDED) {
+		/* It always is possible to set the status to 'suspended'. */
+		if (parent) {
+			atomic_add_unless(&parent->power.child_count, -1, 0);
+			notify_parent = !parent->power.ignore_children;
+		}
+		goto out_set;
+	}
+
+	if (parent) {
+		spin_lock_nested(&parent->power.lock, SINGLE_DEPTH_NESTING);
+
+		/*
+		 * It is invalid to put an active child under a parent that is
+		 * not active, has run-time PM enabled and the
+		 * 'power.ignore_children' flag unset.
+		 */
+		if (!parent->power.disable_depth
+		    && !parent->power.ignore_children
+		    && parent->power.runtime_status != RPM_ACTIVE) {
+			error = -EBUSY;
+		} else {
+			if (dev->power.runtime_status == RPM_SUSPENDED)
+				atomic_inc(&parent->power.child_count);
+		}
+
+		spin_unlock(&parent->power.lock);
+
+		if (error)
+			goto out;
+	}
+
+ out_set:
+	dev->power.runtime_status = status;
+	dev->power.runtime_error = 0;
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	if (notify_parent)
+		pm_request_idle(parent);
+
+	return error;
+}
+EXPORT_SYMBOL_GPL(__pm_runtime_set_status);
+
+/**
+ * __pm_runtime_barrier - Cancel pending requests and wait for completions.
+ * @dev: Device to handle.
+ *
+ * Flush all pending requests for the device from pm_wq and wait for all
+ * run-time PM operations involving the device in progress to complete.
+ *
+ * Should be called under dev->power.lock with interrupts disabled.
+ */
+static void __pm_runtime_barrier(struct device *dev)
+{
+	pm_runtime_deactivate_timer(dev);
+
+	if (dev->power.request_pending) {
+		dev->power.request = RPM_REQ_NONE;
+		spin_unlock_irq(&dev->power.lock);
+
+		cancel_work_sync(&dev->power.work);
+
+		spin_lock_irq(&dev->power.lock);
+		dev->power.request_pending = false;
+	}
+
+	if (dev->power.runtime_status == RPM_SUSPENDING
+	    || dev->power.runtime_status == RPM_RESUMING
+	    || dev->power.idle_notification) {
+		DEFINE_WAIT(wait);
+
+		/* Suspend, wake-up or idle notification in progress. */
+		for (;;) {
+			prepare_to_wait(&dev->power.wait_queue, &wait,
+					TASK_UNINTERRUPTIBLE);
+			if (dev->power.runtime_status != RPM_SUSPENDING
+			    && dev->power.runtime_status != RPM_RESUMING
+			    && !dev->power.idle_notification)
+				break;
+			spin_unlock_irq(&dev->power.lock);
+
+			schedule();
+
+			spin_lock_irq(&dev->power.lock);
+		}
+		finish_wait(&dev->power.wait_queue, &wait);
+	}
+}
+
+/**
+ * pm_runtime_barrier - Flush pending requests and wait for completions.
+ * @dev: Device to handle.
+ *
+ * Prevent the device from being suspended by incrementing its usage counter and
+ * if there's a pending resume request for the device, wake the device up.
+ * Next, make sure that all pending requests for the device have been flushed
+ * from pm_wq and wait for all run-time PM operations involving the device in
+ * progress to complete.
+ *
+ * Return value:
+ * 1, if there was a resume request pending and the device had to be woken up,
+ * 0, otherwise
+ */
+int pm_runtime_barrier(struct device *dev)
+{
+	int retval = 0;
+
+	pm_runtime_get_noresume(dev);
+	spin_lock_irq(&dev->power.lock);
+
+	if (dev->power.request_pending
+	    && dev->power.request == RPM_REQ_RESUME) {
+		__pm_runtime_resume(dev, false);
+		retval = 1;
+	}
+
+	__pm_runtime_barrier(dev);
+
+	spin_unlock_irq(&dev->power.lock);
+	pm_runtime_put_noidle(dev);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(pm_runtime_barrier);
+
+/**
+ * __pm_runtime_disable - Disable run-time PM of a device.
+ * @dev: Device to handle.
+ * @check_resume: If set, check if there's a resume request for the device.
+ *
+ * Increment power.disable_depth for the device and if it was zero previously,
+ * cancel all pending run-time PM requests for the device and wait for all
+ * operations in progress to complete.  The device can be either active or
+ * suspended after its run-time PM has been disabled.
+ *
+ * If @check_resume is set and there's a resume request pending when
+ * __pm_runtime_disable() is called and power.disable_depth is zero, the
+ * function will wake up the device before disabling its run-time PM.
+ */
+void __pm_runtime_disable(struct device *dev, bool check_resume)
+{
+	spin_lock_irq(&dev->power.lock);
+
+	if (dev->power.disable_depth > 0) {
+		dev->power.disable_depth++;
+		goto out;
+	}
+
+	/*
+	 * Wake up the device if there's a resume request pending, because that
+	 * means there probably is some I/O to process and disabling run-time PM
+	 * shouldn't prevent the device from processing the I/O.
+	 */
+	if (check_resume && dev->power.request_pending
+	    && dev->power.request == RPM_REQ_RESUME) {
+		/*
+		 * Prevent suspends and idle notifications from being carried
+		 * out after we have woken up the device.
+		 */
+		pm_runtime_get_noresume(dev);
+
+		__pm_runtime_resume(dev, false);
+
+		pm_runtime_put_noidle(dev);
+	}
+
+	if (!dev->power.disable_depth++)
+		__pm_runtime_barrier(dev);
+
+ out:
+	spin_unlock_irq(&dev->power.lock);
+}
+EXPORT_SYMBOL_GPL(__pm_runtime_disable);
+
+/**
+ * pm_runtime_enable - Enable run-time PM of a device.
+ * @dev: Device to handle.
+ */
+void pm_runtime_enable(struct device *dev)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.disable_depth > 0)
+		dev->power.disable_depth--;
+	else
+		dev_warn(dev, "Unbalanced %s!\n", __func__);
+
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_enable);
+
+/**
+ * pm_runtime_init - Initialize run-time PM fields in given device object.
+ * @dev: Device object to initialize.
+ */
+void pm_runtime_init(struct device *dev)
+{
+	spin_lock_init(&dev->power.lock);
+
+	dev->power.runtime_status = RPM_SUSPENDED;
+	dev->power.idle_notification = false;
+
+	dev->power.disable_depth = 1;
+	atomic_set(&dev->power.usage_count, 0);
+
+	dev->power.runtime_error = 0;
+
+	atomic_set(&dev->power.child_count, 0);
+	pm_suspend_ignore_children(dev, false);
+
+	dev->power.request_pending = false;
+	dev->power.request = RPM_REQ_NONE;
+	dev->power.deferred_resume = false;
+	INIT_WORK(&dev->power.work, pm_runtime_work);
+
+	dev->power.timer_expires = 0;
+	setup_timer(&dev->power.suspend_timer, pm_suspend_timer_fn,
+			(unsigned long)dev);
+
+	init_waitqueue_head(&dev->power.wait_queue);
+}
+
+/**
+ * pm_runtime_remove - Prepare for removing a device from device hierarchy.
+ * @dev: Device object being removed from device hierarchy.
+ */
+void pm_runtime_remove(struct device *dev)
+{
+	__pm_runtime_disable(dev, false);
+
+	/* Change the status back to 'suspended' to match the initial status. */
+	if (dev->power.runtime_status == RPM_ACTIVE)
+		pm_runtime_set_suspended(dev);
+}
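The disable_depth handling in __pm_runtime_disable() and pm_runtime_enable() above works as a nesting counter. Only the call that raises it from zero cancels pending requests and waits for operations in progress. The helpers work again only once every disable has been matched by an enable. The following userspace C sketch models just that counter. struct model_pm_info and the model_* functions are hypothetical stand-ins, not kernel code, and the sketch leaves out locking and the pending-resume check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the disable_depth field of struct dev_pm_info. */
struct model_pm_info {
	unsigned int disable_depth;
	bool warned_unbalanced;
};

/* Like pm_runtime_init(): run-time PM starts out disabled (depth 1). */
static void model_init(struct model_pm_info *pm)
{
	pm->disable_depth = 1;
	pm->warned_unbalanced = false;
}

/*
 * Like __pm_runtime_disable() without the resume check: returns true only for
 * the 0 -> 1 transition, i.e. when __pm_runtime_barrier() would be called.
 */
static bool model_disable(struct model_pm_info *pm)
{
	return pm->disable_depth++ == 0;
}

/* Like pm_runtime_enable(): decrement, but warn instead of going below zero. */
static void model_enable(struct model_pm_info *pm)
{
	if (pm->disable_depth > 0) {
		pm->disable_depth--;
	} else {
		pm->warned_unbalanced = true;
		fprintf(stderr, "Unbalanced model_enable!\n");
	}
}

/* The helper functions only do their work while the depth is zero. */
static bool model_enabled(const struct model_pm_info *pm)
{
	return pm->disable_depth == 0;
}

/* Walk through a nested disable/enable sequence. */
void model_demo(void)
{
	struct model_pm_info pm;
	bool barrier;

	model_init(&pm);
	assert(!model_enabled(&pm));	/* disabled after init */
	model_enable(&pm);
	assert(model_enabled(&pm));	/* 1 -> 0 */

	barrier = model_disable(&pm);	/* 0 -> 1: cancel and wait */
	assert(barrier);
	barrier = model_disable(&pm);	/* 1 -> 2: nested, nothing to do */
	assert(!barrier);

	model_enable(&pm);
	assert(!model_enabled(&pm));	/* still disabled at depth 1 */
	model_enable(&pm);
	assert(model_enabled(&pm));

	model_enable(&pm);		/* unbalanced: warns, stays at 0 */
	assert(pm.warned_unbalanced && pm.disable_depth == 0);
}
```

Because pm_runtime_init() starts the depth at 1, a single pm_runtime_enable() call is needed before the helpers do anything. Nested disable calls must be balanced one for one.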
Index: linux-2.6/include/linux/pm_runtime.h
===================================================================
--- /dev/null
+++ linux-2.6/include/linux/pm_runtime.h
@@ -0,0 +1,114 @@
+/*
+ * pm_runtime.h - Device run-time power management helper functions.
+ *
+ * Copyright (C) 2009 Rafael J. Wysocki <rjw@sisk.pl>
+ *
+ * This file is released under the GPLv2.
+ */
+
+#ifndef _LINUX_PM_RUNTIME_H
+#define _LINUX_PM_RUNTIME_H
+
+#include <linux/device.h>
+#include <linux/pm.h>
+
+#ifdef CONFIG_PM_RUNTIME
+
+extern struct workqueue_struct *pm_wq;
+
+extern int pm_runtime_idle(struct device *dev);
+extern int pm_runtime_suspend(struct device *dev);
+extern int pm_runtime_resume(struct device *dev);
+extern int pm_request_idle(struct device *dev);
+extern int pm_schedule_suspend(struct device *dev, unsigned int delay);
+extern int pm_request_resume(struct device *dev);
+extern int __pm_runtime_get(struct device *dev, bool sync);
+extern int __pm_runtime_put(struct device *dev, bool sync);
+extern int __pm_runtime_set_status(struct device *dev, unsigned int status);
+extern int pm_runtime_barrier(struct device *dev);
+extern void pm_runtime_enable(struct device *dev);
+extern void __pm_runtime_disable(struct device *dev, bool check_resume);
+
+static inline bool pm_children_suspended(struct device *dev)
+{
+	return dev->power.ignore_children
+		|| !atomic_read(&dev->power.child_count);
+}
+
+static inline void pm_suspend_ignore_children(struct device *dev, bool enable)
+{
+	dev->power.ignore_children = enable;
+}
+
+static inline void pm_runtime_get_noresume(struct device *dev)
+{
+	atomic_inc(&dev->power.usage_count);
+}
+
+static inline void pm_runtime_put_noidle(struct device *dev)
+{
+	atomic_add_unless(&dev->power.usage_count, -1, 0);
+}
+
+#else /* !CONFIG_PM_RUNTIME */
+
+static inline int pm_runtime_idle(struct device *dev) { return -ENOSYS; }
+static inline int pm_runtime_suspend(struct device *dev) { return -ENOSYS; }
+static inline int pm_runtime_resume(struct device *dev) { return 0; }
+static inline int pm_request_idle(struct device *dev) { return -ENOSYS; }
+static inline int pm_schedule_suspend(struct device *dev, unsigned int delay)
+{
+	return -ENOSYS;
+}
+static inline int pm_request_resume(struct device *dev) { return 0; }
+static inline int __pm_runtime_get(struct device *dev, bool sync) { return 1; }
+static inline int __pm_runtime_put(struct device *dev, bool sync) { return 0; }
+static inline int __pm_runtime_set_status(struct device *dev,
+					    unsigned int status) { return 0; }
+static inline int pm_runtime_barrier(struct device *dev) { return 0; }
+static inline void pm_runtime_enable(struct device *dev) {}
+static inline void __pm_runtime_disable(struct device *dev, bool c) {}
+
+static inline bool pm_children_suspended(struct device *dev) { return false; }
+static inline void pm_suspend_ignore_children(struct device *dev, bool en) {}
+static inline void pm_runtime_get_noresume(struct device *dev) {}
+static inline void pm_runtime_put_noidle(struct device *dev) {}
+
+#endif /* !CONFIG_PM_RUNTIME */
+
+static inline int pm_runtime_get(struct device *dev)
+{
+	return __pm_runtime_get(dev, false);
+}
+
+static inline int pm_runtime_get_sync(struct device *dev)
+{
+	return __pm_runtime_get(dev, true);
+}
+
+static inline int pm_runtime_put(struct device *dev)
+{
+	return __pm_runtime_put(dev, false);
+}
+
+static inline int pm_runtime_put_sync(struct device *dev)
+{
+	return __pm_runtime_put(dev, true);
+}
+
+static inline int pm_runtime_set_active(struct device *dev)
+{
+	return __pm_runtime_set_status(dev, RPM_ACTIVE);
+}
+
+static inline void pm_runtime_set_suspended(struct device *dev)
+{
+	__pm_runtime_set_status(dev, RPM_SUSPENDED);
+}
+
+static inline void pm_runtime_disable(struct device *dev)
+{
+	__pm_runtime_disable(dev, true);
+}
+
+#endif
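pm_runtime_put_noidle() above relies on atomic_add_unless(&usage_count, -1, 0), so an extra "put" leaves the counter at zero instead of making it negative. This is what allows ->remove() to drop the reference the core took before calling it (Documentation/power/runtime_pm.txt, Section 5). Below is a userspace sketch of those semantics using C11 <stdatomic.h>. model_add_unless() is a hypothetical compare-and-swap reimplementation of the kernel primitive, not the kernel's own code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical userspace version of the kernel's atomic_add_unless(v, a, u):
 * add a to *v unless *v equals u; return true if the add was performed.
 */
static bool model_add_unless(atomic_int *v, int a, int u)
{
	int old = atomic_load(v);

	while (old != u) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + a))
			return true;
	}
	return false;
}

/* Like pm_runtime_get_noresume(): unconditional increment. */
static void model_get_noresume(atomic_int *usage_count)
{
	atomic_fetch_add(usage_count, 1);
}

/* Like pm_runtime_put_noidle(): decrement, but never below zero. */
static void model_put_noidle(atomic_int *usage_count)
{
	model_add_unless(usage_count, -1, 0);
}

/* The ->remove() scenario: the driver drops the core's reference early. */
void model_demo(void)
{
	atomic_int usage_count;

	atomic_init(&usage_count, 0);		/* as set by pm_runtime_init() */
	model_get_noresume(&usage_count);	/* core, before ->remove() */
	model_put_noidle(&usage_count);		/* driver's put in ->remove() */
	model_put_noidle(&usage_count);		/* core, after ->remove() */
	assert(atomic_load(&usage_count) == 0);
}
```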
Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -21,6 +21,7 @@
 #include <linux/kallsyms.h>
 #include <linux/mutex.h>
 #include <linux/pm.h>
+#include <linux/pm_runtime.h>
 #include <linux/resume-trace.h>
 #include <linux/rwsem.h>
 #include <linux/interrupt.h>
@@ -49,6 +50,16 @@ static DEFINE_MUTEX(dpm_list_mtx);
 static bool transition_started;
 
 /**
+ * device_pm_init - Initialize the PM-related part of a device object
+ * @dev: Device object being initialized.
+ */
+void device_pm_init(struct device *dev)
+{
+	dev->power.status = DPM_ON;
+	pm_runtime_init(dev);
+}
+
+/**
  *	device_pm_lock - lock the list of active devices used by the PM core
  */
 void device_pm_lock(void)
@@ -105,6 +116,7 @@ void device_pm_remove(struct device *dev
 	mutex_lock(&dpm_list_mtx);
 	list_del_init(&dev->power.entry);
 	mutex_unlock(&dpm_list_mtx);
+	pm_runtime_remove(dev);
 }
 
 /**
@@ -512,6 +524,7 @@ static void dpm_complete(pm_message_t st
 			mutex_unlock(&dpm_list_mtx);
 
 			device_complete(dev, state);
+			pm_runtime_put_noidle(dev);
 
 			mutex_lock(&dpm_list_mtx);
 		}
@@ -757,7 +770,14 @@ static int dpm_prepare(pm_message_t stat
 		dev->power.status = DPM_PREPARING;
 		mutex_unlock(&dpm_list_mtx);
 
-		error = device_prepare(dev, state);
+		pm_runtime_get_noresume(dev);
+		if (pm_runtime_barrier(dev) && device_may_wakeup(dev)) {
+			/* Wake-up requested during system sleep transition. */
+			pm_runtime_put_noidle(dev);
+			error = -EBUSY;
+		} else {
+			error = device_prepare(dev, state);
+		}
 
 		mutex_lock(&dpm_list_mtx);
 		if (error) {
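The dpm_prepare() change above takes a usage-counter reference before the device is prepared. On the success path that reference is held through the system sleep transition and dropped in dpm_complete(). The transition is aborted with -EBUSY if pm_runtime_barrier() had to service a pending resume request for a wakeup-capable device.

The sketch below models that bookkeeping in userspace C. struct model_dev and its fields are hypothetical inputs standing in for pm_runtime_barrier(), device_may_wakeup() and device_prepare(). device_prepare() is assumed to succeed, since its error path is not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical inputs standing in for the calls made by dpm_prepare(). */
struct model_dev {
	int usage_count;
	bool resume_was_pending;	/* pm_runtime_barrier() returned 1 */
	bool may_wakeup;		/* device_may_wakeup() */
	bool prepared;			/* device_prepare() ran (assumed to succeed) */
};

/* Like pm_runtime_put_noidle(): never below zero. */
static void model_put_noidle(struct model_dev *d)
{
	if (d->usage_count > 0)
		d->usage_count--;
}

/* Mirrors the new dpm_prepare() logic for a single device. */
static int model_prepare(struct model_dev *d)
{
	d->usage_count++;			/* pm_runtime_get_noresume() */
	if (d->resume_was_pending && d->may_wakeup) {
		/* Wake-up requested during the sleep transition: abort it. */
		model_put_noidle(d);
		return -EBUSY;
	}
	d->prepared = true;			/* device_prepare() */
	return 0;
}

/* Mirrors dpm_complete(): the reference taken in prepare is dropped here. */
static void model_complete(struct model_dev *d)
{
	d->prepared = false;			/* device_complete() */
	model_put_noidle(d);
}

void model_demo(void)
{
	struct model_dev wake = { .resume_was_pending = true, .may_wakeup = true };
	struct model_dev quiet = { .resume_was_pending = true, .may_wakeup = false };
	int err;

	err = model_prepare(&wake);
	assert(err == -EBUSY && wake.usage_count == 0 && !wake.prepared);

	err = model_prepare(&quiet);
	assert(err == 0 && quiet.usage_count == 1);	/* held while asleep */
	model_complete(&quiet);
	assert(quiet.usage_count == 0 && !quiet.prepared);
}
```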
Index: linux-2.6/drivers/base/dd.c
===================================================================
--- linux-2.6.orig/drivers/base/dd.c
+++ linux-2.6/drivers/base/dd.c
@@ -23,6 +23,7 @@
 #include <linux/kthread.h>
 #include <linux/wait.h>
 #include <linux/async.h>
+#include <linux/pm_runtime.h>
 
 #include "base.h"
 #include "power/power.h"
@@ -202,7 +203,12 @@ int driver_probe_device(struct device_dr
 	pr_debug("bus: '%s': %s: matched device %s with driver %s\n",
 		 drv->bus->name, __func__, dev_name(dev), drv->name);
 
+	pm_runtime_get_noresume(dev);
+	pm_runtime_barrier(dev);
 	ret = really_probe(dev, drv);
+	pm_runtime_put_noidle(dev);
+	if (!ret)
+		pm_runtime_idle(dev);
 
 	return ret;
 }
@@ -306,6 +312,9 @@ static void __device_release_driver(stru
 
 	drv = dev->driver;
 	if (drv) {
+		pm_runtime_get_noresume(dev);
+		pm_runtime_barrier(dev);
+
 		driver_sysfs_remove(dev);
 
 		if (dev->bus)
@@ -324,6 +333,8 @@ static void __device_release_driver(stru
 			blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
 						     BUS_NOTIFY_UNBOUND_DRIVER,
 						     dev);
+
+		pm_runtime_put_noidle(dev);
 	}
 }
 
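driver_probe_device() now holds a usage-counter reference across really_probe(). As a result, a run-time suspend attempted from ->probe() fails with -EAGAIN, and pm_runtime_idle() is called only after a successful probe. A minimal userspace model of that sequence follows. All model_* and probe_* names are hypothetical, and model_suspend() reduces the helper's checks to the usage counter alone.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the relevant dev->power state. */
struct model_dev {
	int usage_count;
	int idle_calls;		/* times pm_runtime_idle() would be invoked */
};

/*
 * Per runtime_pm.txt, Section 5, a suspend attempted while the usage counter
 * is nonzero fails with -EAGAIN; other checks are left out of this model.
 */
static int model_suspend(struct model_dev *d)
{
	return d->usage_count > 0 ? -EAGAIN : 0;
}

/* Mirrors the new driver_probe_device() sequence. */
static int model_probe_device(struct model_dev *d,
			      int (*probe)(struct model_dev *))
{
	int ret;

	d->usage_count++;		/* pm_runtime_get_noresume() */
	/* pm_runtime_barrier(): no requests are pending in this model */
	ret = probe(d);			/* really_probe() */
	if (d->usage_count > 0)
		d->usage_count--;	/* pm_runtime_put_noidle() */
	if (!ret)
		d->idle_calls++;	/* pm_runtime_idle(), on success only */
	return ret;
}

static int suspend_ret_in_probe;

/* Hypothetical ->probe() that tries to suspend the device right away. */
static int probe_ok(struct model_dev *d)
{
	suspend_ret_in_probe = model_suspend(d);
	return 0;
}

/* Hypothetical ->probe() that fails. */
static int probe_fail(struct model_dev *d)
{
	(void)d;
	return -ENODEV;
}

void model_demo(void)
{
	struct model_dev a = { 0 }, b = { 0 };
	int ret;

	ret = model_probe_device(&a, probe_ok);
	assert(ret == 0 && suspend_ret_in_probe == -EAGAIN);
	assert(a.usage_count == 0 && a.idle_calls == 1);

	ret = model_probe_device(&b, probe_fail);
	assert(ret == -ENODEV && b.usage_count == 0 && b.idle_calls == 0);
}
```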
Index: linux-2.6/drivers/base/power/power.h
===================================================================
--- linux-2.6.orig/drivers/base/power/power.h
+++ linux-2.6/drivers/base/power/power.h
@@ -1,7 +1,14 @@
-static inline void device_pm_init(struct device *dev)
-{
-	dev->power.status = DPM_ON;
-}
+#ifdef CONFIG_PM_RUNTIME
+
+extern void pm_runtime_init(struct device *dev);
+extern void pm_runtime_remove(struct device *dev);
+
+#else /* !CONFIG_PM_RUNTIME */
+
+static inline void pm_runtime_init(struct device *dev) {}
+static inline void pm_runtime_remove(struct device *dev) {}
+
+#endif /* !CONFIG_PM_RUNTIME */
 
 #ifdef CONFIG_PM_SLEEP
 
@@ -16,23 +23,33 @@ static inline struct device *to_device(s
 	return container_of(entry, struct device, power.entry);
 }
 
+extern void device_pm_init(struct device *dev);
 extern void device_pm_add(struct device *);
 extern void device_pm_remove(struct device *);
 extern void device_pm_move_before(struct device *, struct device *);
 extern void device_pm_move_after(struct device *, struct device *);
 extern void device_pm_move_last(struct device *);
 
-#else /* CONFIG_PM_SLEEP */
+#else /* !CONFIG_PM_SLEEP */
+
+static inline void device_pm_init(struct device *dev)
+{
+	pm_runtime_init(dev);
+}
+
+static inline void device_pm_remove(struct device *dev)
+{
+	pm_runtime_remove(dev);
+}
 
 static inline void device_pm_add(struct device *dev) {}
-static inline void device_pm_remove(struct device *dev) {}
 static inline void device_pm_move_before(struct device *deva,
 					 struct device *devb) {}
 static inline void device_pm_move_after(struct device *deva,
 					struct device *devb) {}
 static inline void device_pm_move_last(struct device *dev) {}
 
-#endif
+#endif /* !CONFIG_PM_SLEEP */
 
 #ifdef CONFIG_PM
 
Index: linux-2.6/Documentation/power/runtime_pm.txt
===================================================================
--- /dev/null
+++ linux-2.6/Documentation/power/runtime_pm.txt
@@ -0,0 +1,386 @@
+Run-time Power Management Framework for I/O Devices
+
+(C) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
+
+1. Introduction
+
+Support for run-time power management (run-time PM) of I/O devices is provided
+at the power management core (PM core) level by means of:
+
+* The power management workqueue pm_wq in which bus types and device drivers can
+  put their PM-related work items.  It is strongly recommended that pm_wq be
+  used for queuing all work items related to run-time PM, because this allows
+  them to be synchronized with system-wide power transitions (suspend to RAM,
+  hibernation and resume from system sleep states).  pm_wq is declared in
+  include/linux/pm_runtime.h and defined in kernel/power/main.c.
+
+* A number of run-time PM fields in the 'power' member of 'struct device' (which
+  is of the type 'struct dev_pm_info', defined in include/linux/pm.h) that can
+  be used for synchronizing run-time PM operations with one another.
+
+* Three device run-time PM callbacks in 'struct dev_pm_ops' (defined in
+  include/linux/pm.h).
+
+* A set of helper functions defined in drivers/base/power/runtime.c that can be
+  used for carrying out run-time PM operations in such a way that the
+  synchronization between them is taken care of by the PM core.  Bus types and
+  device drivers are encouraged to use these functions.
+
+The run-time PM callbacks present in 'struct dev_pm_ops', the device run-time PM
+fields of 'struct dev_pm_info' and the core helper functions provided for
+run-time PM are described below.
+
+2. Device Run-time PM Callbacks
+
+There are three device run-time PM callbacks defined in 'struct dev_pm_ops':
+
+struct dev_pm_ops {
+	...
+	int (*runtime_suspend)(struct device *dev);
+	int (*runtime_resume)(struct device *dev);
+	void (*runtime_idle)(struct device *dev);
+	...
+};
+
+The ->runtime_suspend() callback is executed by the PM core for the bus type of
+the device being suspended.  The bus type's callback is then _entirely_
+_responsible_ for handling the device as appropriate, which may, but need not,
+include executing the device driver's own ->runtime_suspend() callback (from the
+PM core's point of view it is not necessary to implement a ->runtime_suspend()
+callback in a device driver as long as the bus type's ->runtime_suspend() knows
+what to do to handle the device).
+
+  * Once the bus type's ->runtime_suspend() callback has completed successfully
+    for a given device, the PM core regards the device as suspended, which need
+    not mean that the device has been put into a low power state.  It is
+    supposed to mean, however, that the device will not process data and will
+    not communicate with the CPU(s) and RAM until its bus type's
+    ->runtime_resume() callback is executed for it.  The run-time PM status of
+    a device after successful execution of its bus type's ->runtime_suspend()
+    callback is 'suspended'.
+
+  * If the bus type's ->runtime_suspend() callback returns -EBUSY or -EAGAIN,
+    the device's run-time PM status is supposed to be 'active', which means that
+    the device _must_ be fully operational afterwards.
+
+  * If the bus type's ->runtime_suspend() callback returns an error code
+    different from -EBUSY or -EAGAIN, the PM core regards this as a fatal
+    error and will refuse to run the helper functions described in Section 4
+    for the device until its status is directly set either to 'active'
+    or to 'suspended' (the PM core provides special helper functions for this
+    purpose).
+
+In particular, if the driver requires remote wakeup capability for proper
+functioning and device_may_wakeup() returns 'false' for the device, then
+->runtime_suspend() should return -EBUSY.  On the other hand, if
+device_may_wakeup() returns 'true' for the device and the device is put
+into a low power state during the execution of its bus type's
+->runtime_suspend(), it is expected that remote wake-up (i.e. a hardware mechanism
+allowing the device to request a change of its power state, such as PCI PME)
+will be enabled for the device.  Generally, remote wake-up should be enabled
+for all input devices put into a low power state at run time.
+
+The ->runtime_resume() callback is executed by the PM core for the bus type of
+the device being woken up.  The bus type's callback is then _entirely_
+_responsible_ for handling the device as appropriate, which may, but need not,
+include executing the device driver's own ->runtime_resume() callback (from the
+PM core's point of view it is not necessary to implement a ->runtime_resume()
+callback in a device driver as long as the bus type's ->runtime_resume() knows
+what to do to handle the device).
+
+  * Once the bus type's ->runtime_resume() callback has completed successfully,
+    the PM core regards the device as fully operational, which means that the
+    device _must_ be able to complete I/O operations as needed.  The run-time
+    PM status of the device is then 'active'.
+
+  * If the bus type's ->runtime_resume() callback returns an error code, the PM
+    core regards this as a fatal error and will refuse to run the helper
+    functions described in Section 4 for the device, until its status is
+    directly set either to 'active' or to 'suspended' (the PM core provides
+    special helper functions for this purpose).
+
+The ->runtime_idle() callback is executed by the PM core for the bus type of
+a given device whenever the device appears to be idle, which is indicated to the
+PM core by two counters, the device's usage counter and the counter of 'active'
+children of the device.
+
+  * If either of these counters is decremented using a helper function provided by
+    the PM core and it turns out to be equal to zero, the other counter is
+    checked.  If that counter also is equal to zero, the PM core executes the
+    device bus type's ->runtime_idle() callback (with the device as an
+    argument).
+
+The action performed by a bus type's ->runtime_idle() callback is totally
+dependent on the bus type in question, but the expected and recommended action
+is to check if the device can be suspended (i.e. if all of the conditions
+necessary for suspending the device are satisfied) and to queue up a suspend
+request for the device in that case.
+
+The helper functions provided by the PM core, described in Section 4, guarantee
+that the following constraints are met with respect to the bus type's run-time
+PM callbacks:
+
+(1) The callbacks are mutually exclusive (e.g. it is forbidden to execute
+    ->runtime_suspend() in parallel with ->runtime_resume() or with another
+    instance of ->runtime_suspend() for the same device) with the exception that
+    ->runtime_suspend() or ->runtime_resume() can be executed in parallel with
+    ->runtime_idle() (although ->runtime_idle() will not be started while any
+    of the other callbacks is being executed for the same device).
+
+(2) ->runtime_idle() and ->runtime_suspend() can only be executed for 'active'
+    devices (i.e. the PM core will only execute ->runtime_idle() or
+    ->runtime_suspend() for devices whose run-time PM status is
+    'active').
+
+(3) ->runtime_idle() and ->runtime_suspend() can only be executed for a device
+    whose usage counter is equal to zero _and_ whose counter of 'active'
+    children is either equal to zero or ignored because its
+    'power.ignore_children' flag is set.
+
+(4) ->runtime_resume() can only be executed for 'suspended' devices (i.e. the
+    PM core will only execute ->runtime_resume() for devices whose run-time PM
+    status is 'suspended').
+
+Additionally, the helper functions provided by the PM core obey the following
+rules:
+
+  * If ->runtime_suspend() is about to be executed or there's a pending request
+    to execute it, ->runtime_idle() will not be executed for the same device.
+
+  * A request to execute or to schedule the execution of ->runtime_suspend()
+    will cancel any pending requests to execute ->runtime_idle() for the same
+    device.
+
+  * If ->runtime_resume() is about to be executed or there's a pending request
+    to execute it, the other callbacks will not be executed for the same device.
+
+  * A request to execute ->runtime_resume() will cancel any pending or
+    scheduled requests to execute the other callbacks for the same device.
+
+3. Run-time PM Device Fields
+
+The following device run-time PM fields are present in 'struct dev_pm_info', as
+defined in include/linux/pm.h:
+
+  struct timer_list suspend_timer;
+    - timer used for scheduling (delayed) suspend request
+
+  unsigned long timer_expires;
+    - timer expiration time, in jiffies (if this is different from zero, the
+      timer is running and will expire at that time, otherwise the timer is not
+      running)
+
+  struct work_struct work;
+    - work structure used for queuing up requests (i.e. work items in pm_wq)
+
+  wait_queue_head_t wait_queue;
+    - wait queue used if any of the helper functions needs to wait for another
+      one to complete
+
+  spinlock_t lock;
+    - lock used for synchronisation
+
+  atomic_t usage_count;
+    - the usage counter of the device
+
+  atomic_t child_count;
+    - the count of 'active' children of the device
+
+  unsigned int ignore_children;
+    - if set, the value of child_count is ignored (but still updated)
+
+  unsigned int disable_depth;
+    - used for disabling the helper functions (they work normally if this is
+      equal to zero); the initial value of it is 1 (i.e. run-time PM is
+      initially disabled for all devices)
+
+  unsigned int runtime_error;
+    - if set, there was a fatal error (one of the callbacks returned an error
+      code as described in Section 2), so the helper functions will not work
+      until this flag is cleared; this is the error code returned by the failing
+      callback
+
+  unsigned int idle_notification;
+    - if set, ->runtime_idle() is being executed
+
+  unsigned int request_pending;
+    - if set, there's a pending request (i.e. a work item queued up into pm_wq)
+
+  enum rpm_request request;
+    - type of request that's pending (valid if request_pending is set)
+
+  unsigned int deferred_resume;
+    - set if ->runtime_resume() is about to be run while ->runtime_suspend() is
+      being executed for that device and it is not practical to wait for the
+      suspend to complete; means "start a resume as soon as you've suspended"
+
+  enum rpm_status runtime_status;
+    - the run-time PM status of the device; this field's initial value is
+      RPM_SUSPENDED, which means that each device is initially regarded by the
+      PM core as 'suspended', regardless of its real hardware status
+
+All of the above fields are members of the 'power' member of 'struct device'.
+
+4. Run-time PM Device Helper Functions
+
+The following run-time PM helper functions are defined in
+drivers/base/power/runtime.c and include/linux/pm_runtime.h:
+
+  void pm_runtime_init(struct device *dev);
+    - initialize the device run-time PM fields in 'struct dev_pm_info'
+
+  void pm_runtime_remove(struct device *dev);
+    - make sure that the run-time PM of the device will be disabled after
+      removing the device from the device hierarchy
+
+  int pm_runtime_idle(struct device *dev);
+    - execute ->runtime_idle() for the device's bus type; returns 0 on success
+      or error code on failure, where -EINPROGRESS means that ->runtime_idle()
+      is already being executed
+
+  int pm_runtime_suspend(struct device *dev);
+    - execute ->runtime_suspend() for the device's bus type; returns 0 on
+      success, 1 if the device's run-time PM status was already 'suspended', or
+      error code on failure, where -EAGAIN or -EBUSY means it is safe to attempt
+      to suspend the device again in future
+
+  int pm_runtime_resume(struct device *dev);
+    - execute ->runtime_resume() for the device's bus type; returns 0 on
+      success, 1 if the device's run-time PM status was already 'active', or
+      error code on failure, where -EAGAIN means it may be safe to attempt to
+      resume the device again in future, but 'power.runtime_error' should be
+      checked additionally
+
+  int pm_request_idle(struct device *dev);
+    - submit a request to execute ->runtime_idle() for the device's bus type
+      (the request is represented by a work item in pm_wq); returns 0 on success
+      or error code if the request has not been queued up
+
+  int pm_schedule_suspend(struct device *dev, unsigned int delay);
+    - schedule the execution of ->runtime_suspend() for the device's bus type
+      in future, where 'delay' is the time to wait before queuing up a suspend
+      work item in pm_wq, in milliseconds (if 'delay' is zero, the work item is
+      queued up immediately); returns 0 on success, 1 if the device's run-time
+      PM status was already 'suspended', or error code if the request
+      hasn't been scheduled (or queued up if 'delay' is 0); if the execution of
+      ->runtime_suspend() is already scheduled and not yet expired, the new
+      value of 'delay' will be used as the time to wait
+
+  int pm_request_resume(struct device *dev);
+    - submit a request to execute ->runtime_resume() for the device's bus type
+      (the request is represented by a work item in pm_wq); returns 0 on
+      success, 1 if the device's run-time PM status was already 'active', or
+      error code if the request hasn't been queued up
+
+  void pm_runtime_get_noresume(struct device *dev);
+    - increment the device's usage counter
+
+  int pm_runtime_get(struct device *dev);
+    - increment the device's usage counter, run pm_request_resume(dev) and
+      return its result
+
+  int pm_runtime_get_sync(struct device *dev);
+    - increment the device's usage counter, run pm_runtime_resume(dev) and
+      return its result
+
+  void pm_runtime_put_noidle(struct device *dev);
+    - decrement the device's usage counter
+
+  int pm_runtime_put(struct device *dev);
+    - decrement the device's usage counter, run pm_request_idle(dev) and return
+      its result
+
+  int pm_runtime_put_sync(struct device *dev);
+    - decrement the device's usage counter, run pm_runtime_idle(dev) and return
+      its result
+
+  void pm_runtime_enable(struct device *dev);
+    - enable the run-time PM helper functions to run the device bus type's
+      run-time PM callbacks described in Section 2
+
+  void pm_runtime_disable(struct device *dev);
+    - prevent the run-time PM helper functions from running the device bus
+      type's run-time PM callbacks and make sure that all of the pending
+      run-time PM operations on the device are either completed or canceled;
+      if there is a resume request pending while run-time PM is still enabled
+      for the device, ->runtime_resume() is executed for the device's bus type
+      to satisfy that request before run-time PM is disabled
+
+  void pm_suspend_ignore_children(struct device *dev, bool enable);
+    - set/unset the power.ignore_children flag of the device
+
+  int pm_runtime_set_active(struct device *dev);
+    - clear the device's 'power.runtime_error' flag, set the device's run-time
+      PM status to 'active' and update its parent's counter of 'active'
+      children as appropriate (it is only valid to use this function if
+      'power.runtime_error' is set or 'power.disable_depth' is greater than
+      zero); it will fail and return an error code if the device has a parent
+      which is not active and whose 'power.ignore_children' flag is unset
+
+  void pm_runtime_set_suspended(struct device *dev);
+    - clear the device's 'power.runtime_error' flag, set the device's run-time
+      PM status to 'suspended' and update its parent's counter of 'active'
+      children as appropriate (it is only valid to use this function if
+      'power.runtime_error' is set or 'power.disable_depth' is greater than
+      zero)
+
+It is safe to execute the following helper functions from interrupt context:
+
+pm_request_idle()
+pm_schedule_suspend()
+pm_request_resume()
+pm_runtime_get_noresume()
+pm_runtime_get()
+pm_runtime_put_noidle()
+pm_runtime_put()
+pm_suspend_ignore_children()
+pm_runtime_set_active()
+pm_runtime_set_suspended()
+pm_runtime_enable()
+
+5. Run-time PM Initialization, Device Probing and Removal
+
+Initially, the run-time PM is disabled for all devices, which means that the
+majority of the run-time PM helper functions described in Section 4 will return
+-EAGAIN until pm_runtime_enable() is called for the device.
+
+In addition to that, the initial run-time PM status of all devices is
+'suspended', but it need not reflect the actual physical state of the device.
+Thus, if the device is initially active (i.e. it is able to process I/O), its
+run-time PM status must be changed to 'active', with the help of
+pm_runtime_set_active(), before pm_runtime_enable() is called for the device.
+
+However, if the device has a parent and the parent's run-time PM is enabled,
+calling pm_runtime_set_active() for the device will affect the parent, unless
+the parent's 'power.ignore_children' flag is set.  Namely, in that case the
+parent won't be able to suspend at run time, using the PM core's helper
+functions, as long as the child's status is 'active', even if the child's
+run-time PM is still disabled (i.e. pm_runtime_enable() hasn't been called for
+the child yet or pm_runtime_disable() has been called for it).  For this reason,
+once pm_runtime_set_active() has been called for the device, pm_runtime_enable()
+should be called for it too as soon as reasonably possible or its run-time PM
+status should be changed back to 'suspended' with the help of
+pm_runtime_set_suspended().
+
+If the default initial run-time PM status of the device (i.e. 'suspended')
+reflects the actual state of the device, its bus type's or its driver's
+->probe() callback will likely need to wake it up using one of the PM core's
+helper functions described in Section 4.  In that case, pm_runtime_resume()
+should be used.  Of course, for this purpose the device's run-time PM has to be
+enabled earlier by calling pm_runtime_enable().
+
+If ->probe() calls pm_runtime_suspend() or pm_runtime_idle() or their
+asynchronous counterparts, they will fail returning -EAGAIN, because the
+device's usage counter is incremented by the core before executing ->probe().
+Still, it may be desirable to suspend the device as soon as ->probe() has
+finished, so the core uses pm_runtime_idle() to invoke the device bus type's
+->runtime_idle() callback at that time, but only if ->probe() is successful.
+
+If the device driver's or bus type's ->remove() callback executes
+pm_runtime_suspend() or pm_runtime_idle() or their asynchronous counterparts
+without preparation, they will fail returning -EAGAIN, because the device's
+usage counter is incremented by the core before executing ->remove().  However,
+if ->remove() wants to suspend the device, it can safely execute any of the
+pm_runtime_put*() helpers to decrement the device's usage counter, because the
+pm_runtime_put_noidle() called by the core after ->remove() has returned is
+guaranteed not to decrease the usage counter below zero.
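The idle-notification rule of Section 2, together with constraint (3), can be expressed as a predicate over three things: the usage counter, the counter of 'active' children, and the ignore_children flag. The userspace sketch below models that predicate. The structure and function names are hypothetical. Only the zero tests and the ignore_children exemption come from the text above and from pm_children_suspended(). Locking, the run-time PM status and disable_depth are ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the counters in struct dev_pm_info. */
struct model_pm_info {
	int usage_count;
	int child_count;		/* 'active' children */
	bool ignore_children;
	int idle_notifications;		/* times ->runtime_idle() would run */
};

/* Same test as pm_children_suspended() in pm_runtime.h. */
static bool model_children_suspended(const struct model_pm_info *pm)
{
	return pm->ignore_children || pm->child_count == 0;
}

/* Constraint (3): usage counter zero and children suspended or ignored. */
static bool model_may_idle(const struct model_pm_info *pm)
{
	return pm->usage_count == 0 && model_children_suspended(pm);
}

/* Decrement the usage counter; notify if it hits zero and (3) holds. */
static void model_put(struct model_pm_info *pm)
{
	if (pm->usage_count > 0 && --pm->usage_count == 0 && model_may_idle(pm))
		pm->idle_notifications++;
}

/* A child became 'suspended': the same check from the other counter's side. */
static void model_child_suspended(struct model_pm_info *pm)
{
	if (pm->child_count > 0 && --pm->child_count == 0 && model_may_idle(pm))
		pm->idle_notifications++;
}

void model_demo(void)
{
	struct model_pm_info pm = { .usage_count = 1, .child_count = 1 };
	struct model_pm_info hub = { .usage_count = 1, .child_count = 2,
				     .ignore_children = true };

	model_put(&pm);			/* usage 1 -> 0, but a child is active */
	assert(pm.idle_notifications == 0);
	model_child_suspended(&pm);	/* children 1 -> 0, usage already 0 */
	assert(pm.idle_notifications == 1);

	model_put(&hub);		/* active children are ignored */
	assert(hub.idle_notifications == 1);
}
```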

* [PATCH update] PM: Introduce core framework for run-time PM of I/O devices (rev. 15)
  2009-08-08 14:25 [PATCH] PM: Introduce core framework for run-time PM of I/O devices (rev. 14) Rafael J. Wysocki
@ 2009-08-09 21:13 ` Rafael J. Wysocki
  2009-08-09 21:13 ` Rafael J. Wysocki
  1 sibling, 0 replies; 90+ messages in thread
From: Rafael J. Wysocki @ 2009-08-09 21:13 UTC
  To: Alan Stern; +Cc: Greg KH, LKML, Linux-pm mailing list

Hi,

One more update.  This one should address your comments from this thread
http://lkml.org/lkml/2009/8/8/113

Thanks,
Rafael

---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: PM: Introduce core framework for run-time PM of I/O devices (rev. 15)

Introduce a core framework for run-time power management of I/O
devices.  Add device run-time PM fields to 'struct dev_pm_info'
and device run-time PM callbacks to 'struct dev_pm_ops'.  Introduce
a run-time PM workqueue and define some device run-time PM helper
functions at the core level.  Document all these things.

Special thanks to Alan Stern for his help with the design and
multiple detailed reviews of the preceding versions of this patch
and to Magnus Damm for testing feedback.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 Documentation/power/runtime_pm.txt |  386 ++++++++++++++
 drivers/base/dd.c                  |   11 
 drivers/base/power/Makefile        |    1 
 drivers/base/power/main.c          |   22 
 drivers/base/power/power.h         |   31 -
 drivers/base/power/runtime.c       | 1011 +++++++++++++++++++++++++++++++++++++
 include/linux/pm.h                 |  101 +++
 include/linux/pm_runtime.h         |  114 ++++
 kernel/power/Kconfig               |   14 
 kernel/power/main.c                |   17 
 10 files changed, 1697 insertions(+), 11 deletions(-)

Index: linux-2.6/kernel/power/Kconfig
===================================================================
--- linux-2.6.orig/kernel/power/Kconfig
+++ linux-2.6/kernel/power/Kconfig
@@ -208,3 +208,17 @@ config APM_EMULATION
 	  random kernel OOPSes or reboots that don't seem to be related to
 	  anything, try disabling/enabling this option (or disabling/enabling
 	  APM in your BIOS).
+
+config PM_RUNTIME
+	bool "Run-time PM core functionality"
+	depends on PM
+	---help---
+	  Enable functionality allowing I/O devices to be put into energy-saving
+	  (low power) states at run time (or autosuspended) after a specified
+	  period of inactivity and woken up in response to a hardware-generated
+	  wake-up event or a driver's request.
+
+	  Hardware support is generally required for this functionality to work,
+	  and the drivers of the bus types the devices are on are responsible
+	  for the actual handling of the autosuspend requests and wake-up
+	  events.
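The "specified period of inactivity" in the help text above is implemented by pm_schedule_suspend() and pm_suspend_timer_fn() in drivers/base/power/runtime.c further down in this patch: the former stores jiffies + msecs_to_jiffies(delay) in power.timer_expires, and the latter submits the suspend request only if that time is no longer after jiffies. The following is a minimal userspace sketch of that check, assuming the usual kernel definition of time_after() as a signed comparison of the difference. model_time_after() and model_timer_due() are hypothetical names, not kernel API.

```c
#include <limits.h>
#include <stdbool.h>

/*
 * Model of the kernel's time_after(a, b): true if a is later than b.
 * Taking the unsigned difference and reading it as signed keeps the
 * result correct across counter wraparound, provided the two values
 * are less than LONG_MAX ticks apart.
 */
static bool model_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/*
 * Model of the test in pm_suspend_timer_fn(): an expiry of 0 means no
 * suspend timer is armed; otherwise the request is due once 'now' has
 * reached or passed 'expires'.
 */
static bool model_timer_due(unsigned long expires, unsigned long now)
{
	return expires > 0 && !model_time_after(expires, now);
}
```

The signed cast is what keeps the check correct when the jiffies counter wraps around, which a plain expires <= now comparison would get wrong.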
Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -11,6 +11,7 @@
 #include <linux/kobject.h>
 #include <linux/string.h>
 #include <linux/resume-trace.h>
+#include <linux/workqueue.h>
 
 #include "power.h"
 
@@ -217,8 +218,24 @@ static struct attribute_group attr_group
 	.attrs = g,
 };
 
+#ifdef CONFIG_PM_RUNTIME
+struct workqueue_struct *pm_wq;
+
+static int __init pm_start_workqueue(void)
+{
+	pm_wq = create_freezeable_workqueue("pm");
+
+	return pm_wq ? 0 : -ENOMEM;
+}
+#else
+static inline int pm_start_workqueue(void) { return 0; }
+#endif
+
 static int __init pm_init(void)
 {
+	int error = pm_start_workqueue();
+	if (error)
+		return error;
 	power_kobj = kobject_create_and_add("power", NULL);
 	if (!power_kobj)
 		return -ENOMEM;
Index: linux-2.6/include/linux/pm.h
===================================================================
--- linux-2.6.orig/include/linux/pm.h
+++ linux-2.6/include/linux/pm.h
@@ -22,6 +22,10 @@
 #define _LINUX_PM_H
 
 #include <linux/list.h>
+#include <linux/workqueue.h>
+#include <linux/spinlock.h>
+#include <linux/wait.h>
+#include <linux/timer.h>
 
 /*
  * Callbacks for platform drivers to implement.
@@ -165,6 +169,28 @@ typedef struct pm_message {
  * It is allowed to unregister devices while the above callbacks are being
  * executed.  However, it is not allowed to unregister a device from within any
  * of its own callbacks.
+ *
+ * There are also the following callbacks related to run-time power management
+ * of devices:
+ *
+ * @runtime_suspend: Prepare the device for a condition in which it won't be
+ *	able to communicate with the CPU(s) and RAM due to power management.
+ *	This need not mean that the device should be put into a low power state.
+ *	For example, if the device is behind a link which is about to be turned
+ *	off, the device may remain at full power.  If the device does go to low
+ *	power and if device_may_wakeup(dev) is true, remote wake-up (i.e., a
+ *	hardware mechanism allowing the device to request a change of its power
+ *	state, such as PCI PME) should be enabled for it.
+ *
+ * @runtime_resume: Put the device into the fully active state in response to a
+ *	wake-up event generated by hardware or at the request of software.  If
+ *	necessary, put the device into the full power state and restore its
+ *	registers, so that it is fully operational.
+ *
+ * @runtime_idle: Device appears to be inactive and it might be put into a low
+ *	power state if all of the necessary conditions are satisfied.  Check
+ *	these conditions and handle the device as appropriate, possibly queueing
+ *	a suspend request for it.
  */
 
 struct dev_pm_ops {
@@ -182,6 +208,9 @@ struct dev_pm_ops {
 	int (*thaw_noirq)(struct device *dev);
 	int (*poweroff_noirq)(struct device *dev);
 	int (*restore_noirq)(struct device *dev);
+	int (*runtime_suspend)(struct device *dev);
+	int (*runtime_resume)(struct device *dev);
+	void (*runtime_idle)(struct device *dev);
 };
 
 /*
@@ -329,14 +358,80 @@ enum dpm_state {
 	DPM_OFF_IRQ,
 };
 
+/**
+ * Device run-time power management status.
+ *
+ * These status labels are used internally by the PM core to indicate the
+ * current status of a device with respect to the PM core operations.  They do
+ * not reflect the actual power state of the device or its status as seen by the
+ * driver.
+ *
+ * RPM_ACTIVE		Device is fully operational.  Indicates that the device
+ *			bus type's ->runtime_resume() callback has completed
+ *			successfully.
+ *
+ * RPM_SUSPENDED	Device bus type's ->runtime_suspend() callback has
+ *			completed successfully.  The device is regarded as
+ *			suspended.
+ *
+ * RPM_RESUMING		Device bus type's ->runtime_resume() callback is being
+ *			executed.
+ *
+ * RPM_SUSPENDING	Device bus type's ->runtime_suspend() callback is being
+ *			executed.
+ */
+
+enum rpm_status {
+	RPM_ACTIVE = 0,
+	RPM_RESUMING,
+	RPM_SUSPENDED,
+	RPM_SUSPENDING,
+};
+
+/**
+ * Device run-time power management request types.
+ *
+ * RPM_REQ_NONE		Do nothing.
+ *
+ * RPM_REQ_IDLE		Run the device bus type's ->runtime_idle() callback
+ *
+ * RPM_REQ_SUSPEND	Run the device bus type's ->runtime_suspend() callback
+ *
+ * RPM_REQ_RESUME	Run the device bus type's ->runtime_resume() callback
+ */
+
+enum rpm_request {
+	RPM_REQ_NONE = 0,
+	RPM_REQ_IDLE,
+	RPM_REQ_SUSPEND,
+	RPM_REQ_RESUME,
+};
+
 struct dev_pm_info {
 	pm_message_t		power_state;
-	unsigned		can_wakeup:1;
-	unsigned		should_wakeup:1;
+	unsigned int		can_wakeup:1;
+	unsigned int		should_wakeup:1;
 	enum dpm_state		status;		/* Owned by the PM core */
-#ifdef	CONFIG_PM_SLEEP
+#ifdef CONFIG_PM_SLEEP
 	struct list_head	entry;
 #endif
+#ifdef CONFIG_PM_RUNTIME
+	struct timer_list	suspend_timer;
+	unsigned long		timer_expires;
+	struct work_struct	work;
+	wait_queue_head_t	wait_queue;
+	spinlock_t		lock;
+	atomic_t		usage_count;
+	atomic_t		child_count;
+	unsigned int		disable_depth:3;
+	unsigned int		ignore_children:1;
+	unsigned int		idle_notification:1;
+	unsigned int		request_pending:1;
+	unsigned int		deferred_resume:1;
+	enum rpm_request	request;
+	enum rpm_status		runtime_status;
+	int			runtime_error;
+#endif
 };
 
 /*
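The RPM_* status labels above change around the bus type's ->runtime_suspend() callback in __pm_runtime_suspend() (drivers/base/power/runtime.c below): the status is set to RPM_SUSPENDING before the callback runs, to RPM_SUSPENDED if it returns 0, and back to RPM_ACTIVE on any error, with -EAGAIN and -EBUSY treated as transient. The sketch below models only that bookkeeping in userspace, assuming the bus type provides the callback (without one the core returns -ENOSYS and leaves power.runtime_error alone). It ignores locking, request cancellation and deferred resumes, and all model_* names are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>

/* Mirrors the order of enum rpm_status in include/linux/pm.h. */
enum model_rpm_status {
	MODEL_RPM_ACTIVE = 0,
	MODEL_RPM_RESUMING,
	MODEL_RPM_SUSPENDED,
	MODEL_RPM_SUSPENDING,
};

/* Hypothetical stand-in for the relevant fields of struct dev_pm_info. */
struct model_pm_info {
	enum model_rpm_status runtime_status;
	int runtime_error;
};

/* Status set before ->runtime_suspend() is invoked. */
static void model_begin_suspend(struct model_pm_info *pm)
{
	pm->runtime_status = MODEL_RPM_SUSPENDING;
}

/*
 * Status update after ->runtime_suspend() returned 'ret'.  Returns true
 * if the core would follow up with an idle notification.
 */
static bool model_finish_suspend(struct model_pm_info *pm, int ret)
{
	pm->runtime_error = ret;
	if (ret == 0) {
		pm->runtime_status = MODEL_RPM_SUSPENDED;
		return false;
	}

	/* Any failure leaves the device regarded as active. */
	pm->runtime_status = MODEL_RPM_ACTIVE;
	if (ret == -EAGAIN || ret == -EBUSY) {
		/* Transient failures are not latched in runtime_error. */
		pm->runtime_error = 0;
		return true;
	}
	/*
	 * Other errors stick: later helper calls return -EINVAL until the
	 * status is reset with __pm_runtime_set_status().
	 */
	return false;
}
```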
Index: linux-2.6/drivers/base/power/Makefile
===================================================================
--- linux-2.6.orig/drivers/base/power/Makefile
+++ linux-2.6/drivers/base/power/Makefile
@@ -1,5 +1,6 @@
 obj-$(CONFIG_PM)	+= sysfs.o
 obj-$(CONFIG_PM_SLEEP)	+= main.o
+obj-$(CONFIG_PM_RUNTIME)	+= runtime.o
 obj-$(CONFIG_PM_TRACE_RTC)	+= trace.o
 
 ccflags-$(CONFIG_DEBUG_DRIVER) := -DDEBUG
Index: linux-2.6/drivers/base/power/runtime.c
===================================================================
--- /dev/null
+++ linux-2.6/drivers/base/power/runtime.c
@@ -0,0 +1,1011 @@
+/*
+ * drivers/base/power/runtime.c - Helper functions for device run-time PM
+ *
+ * Copyright (c) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
+ *
+ * This file is released under the GPLv2.
+ */
+
+#include <linux/sched.h>
+#include <linux/pm_runtime.h>
+#include <linux/jiffies.h>
+
+static int __pm_runtime_resume(struct device *dev, bool from_wq);
+static int __pm_request_idle(struct device *dev);
+static int __pm_request_resume(struct device *dev);
+
+/**
+ * pm_runtime_deactivate_timer - Deactivate given device's suspend timer.
+ * @dev: Device to handle.
+ */
+static void pm_runtime_deactivate_timer(struct device *dev)
+{
+	if (dev->power.timer_expires > 0) {
+		del_timer(&dev->power.suspend_timer);
+		dev->power.timer_expires = 0;
+	}
+}
+
+/**
+ * pm_runtime_cancel_pending - Deactivate suspend timer and cancel requests.
+ * @dev: Device to handle.
+ */
+static void pm_runtime_cancel_pending(struct device *dev)
+{
+	pm_runtime_deactivate_timer(dev);
+	/*
+	 * In case there's a request pending, make sure its work function will
+	 * return without doing anything.
+	 */
+	dev->power.request = RPM_REQ_NONE;
+}
+
+/**
+ * __pm_runtime_idle - Notify device bus type if the device can be suspended.
+ * @dev: Device to notify the bus type about.
+ *
+ * This function must be called under dev->power.lock with interrupts disabled.
+ */
+static int __pm_runtime_idle(struct device *dev)
+	__releases(&dev->power.lock) __acquires(&dev->power.lock)
+{
+	int retval = 0;
+
+	dev_dbg(dev, "__pm_runtime_idle()!\n");
+
+	if (dev->power.runtime_error)
+		retval = -EINVAL;
+	else if (dev->power.idle_notification)
+		retval = -EINPROGRESS;
+	else if (atomic_read(&dev->power.usage_count) > 0
+	    || dev->power.disable_depth > 0
+	    || dev->power.runtime_status != RPM_ACTIVE)
+		retval = -EAGAIN;
+	else if (!pm_children_suspended(dev))
+		retval = -EBUSY;
+	if (retval)
+		goto out;
+
+	if (dev->power.request_pending) {
+		/*
+		 * If an idle notification request is pending, cancel it.  Any
+		 * other pending request takes precedence over us.
+		 */
+		if (dev->power.request == RPM_REQ_IDLE) {
+			dev->power.request = RPM_REQ_NONE;
+		} else if (dev->power.request != RPM_REQ_NONE) {
+			retval = -EAGAIN;
+			goto out;
+		}
+	}
+
+	dev->power.idle_notification = true;
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_idle) {
+		spin_unlock_irq(&dev->power.lock);
+
+		dev->bus->pm->runtime_idle(dev);
+
+		spin_lock_irq(&dev->power.lock);
+	}
+
+	dev->power.idle_notification = false;
+	wake_up_all(&dev->power.wait_queue);
+
+ out:
+	dev_dbg(dev, "__pm_runtime_idle() returns %d!\n", retval);
+
+	return retval;
+}
+
+/**
+ * pm_runtime_idle - Notify device bus type if the device can be suspended.
+ * @dev: Device to notify the bus type about.
+ */
+int pm_runtime_idle(struct device *dev)
+{
+	int retval;
+
+	spin_lock_irq(&dev->power.lock);
+	retval = __pm_runtime_idle(dev);
+	spin_unlock_irq(&dev->power.lock);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(pm_runtime_idle);
+
+/**
+ * __pm_runtime_suspend - Carry out run-time suspend of given device.
+ * @dev: Device to suspend.
+ * @from_wq: If set, the function has been called via pm_wq.
+ *
+ * Check if the device can be suspended and run the ->runtime_suspend() callback
+ * provided by its bus type.  If another suspend has been started earlier, wait
+ * for it to finish.  If an idle notification or suspend request is pending or
+ * scheduled, cancel it.
+ *
+ * This function must be called under dev->power.lock with interrupts disabled.
+ */
+int __pm_runtime_suspend(struct device *dev, bool from_wq)
+	__releases(&dev->power.lock) __acquires(&dev->power.lock)
+{
+	struct device *parent = NULL;
+	bool notify = false;
+	int retval = 0;
+
+	dev_dbg(dev, "__pm_runtime_suspend()%s!\n",
+		from_wq ? " from workqueue" : "");
+
+ repeat:
+	if (dev->power.runtime_error) {
+		retval = -EINVAL;
+		goto out;
+	}
+
+	/* Pending resume requests take precedence over us. */
+	if (dev->power.request_pending
+	    && dev->power.request == RPM_REQ_RESUME) {
+		retval = -EAGAIN;
+		goto out;
+	}
+
+	/* Other scheduled or pending requests need to be canceled. */
+	pm_runtime_cancel_pending(dev);
+
+	if (dev->power.runtime_status == RPM_SUSPENDED)
+		retval = 1;
+	else if (dev->power.runtime_status == RPM_RESUMING
+	    || dev->power.disable_depth > 0
+	    || atomic_read(&dev->power.usage_count) > 0)
+		retval = -EAGAIN;
+	else if (!pm_children_suspended(dev))
+		retval = -EBUSY;
+	if (retval)
+		goto out;
+
+	if (dev->power.runtime_status == RPM_SUSPENDING) {
+		DEFINE_WAIT(wait);
+
+		if (from_wq) {
+			retval = -EINPROGRESS;
+			goto out;
+		}
+
+		/* Wait for the other suspend running in parallel with us. */
+		for (;;) {
+			prepare_to_wait(&dev->power.wait_queue, &wait,
+					TASK_UNINTERRUPTIBLE);
+			if (dev->power.runtime_status != RPM_SUSPENDING)
+				break;
+
+			spin_unlock_irq(&dev->power.lock);
+
+			schedule();
+
+			spin_lock_irq(&dev->power.lock);
+		}
+		finish_wait(&dev->power.wait_queue, &wait);
+		goto repeat;
+	}
+
+	dev->power.runtime_status = RPM_SUSPENDING;
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_suspend) {
+		spin_unlock_irq(&dev->power.lock);
+
+		retval = dev->bus->pm->runtime_suspend(dev);
+
+		spin_lock_irq(&dev->power.lock);
+		dev->power.runtime_error = retval;
+	} else {
+		retval = -ENOSYS;
+	}
+
+	if (retval) {
+		dev->power.runtime_status = RPM_ACTIVE;
+		pm_runtime_cancel_pending(dev);
+		dev->power.deferred_resume = false;
+
+		if (retval == -EAGAIN || retval == -EBUSY) {
+			notify = true;
+			dev->power.runtime_error = 0;
+		}
+	} else {
+		dev->power.runtime_status = RPM_SUSPENDED;
+
+		if (dev->parent) {
+			parent = dev->parent;
+			atomic_add_unless(&parent->power.child_count, -1, 0);
+		}
+	}
+	wake_up_all(&dev->power.wait_queue);
+
+	if (dev->power.deferred_resume) {
+		dev->power.deferred_resume = false;
+		__pm_runtime_resume(dev, false);
+		retval = -EAGAIN;
+		goto out;
+	}
+
+	if (notify)
+		__pm_runtime_idle(dev);
+
+	if (parent && !parent->power.ignore_children) {
+		spin_unlock_irq(&dev->power.lock);
+
+		pm_request_idle(parent);
+
+		spin_lock_irq(&dev->power.lock);
+	}
+
+ out:
+	dev_dbg(dev, "__pm_runtime_suspend() returns %d!\n", retval);
+
+	return retval;
+}
+
+/**
+ * pm_runtime_suspend - Carry out run-time suspend of given device.
+ * @dev: Device to suspend.
+ */
+int pm_runtime_suspend(struct device *dev)
+{
+	int retval;
+
+	spin_lock_irq(&dev->power.lock);
+	retval = __pm_runtime_suspend(dev, false);
+	spin_unlock_irq(&dev->power.lock);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(pm_runtime_suspend);
+
+/**
+ * __pm_runtime_resume - Carry out run-time resume of given device.
+ * @dev: Device to resume.
+ * @from_wq: If set, the function has been called via pm_wq.
+ *
+ * Check if the device can be woken up and run the ->runtime_resume() callback
+ * provided by its bus type.  If another resume has been started earlier, wait
+ * for it to finish.  If there's a suspend running in parallel with this
+ * function, wait for it to finish and resume the device.  Cancel any scheduled
+ * or pending requests.
+ *
+ * This function must be called under dev->power.lock with interrupts disabled.
+ */
+int __pm_runtime_resume(struct device *dev, bool from_wq)
+	__releases(&dev->power.lock) __acquires(&dev->power.lock)
+{
+	struct device *parent = NULL;
+	int retval = 0;
+
+	dev_dbg(dev, "__pm_runtime_resume()%s!\n",
+		from_wq ? " from workqueue" : "");
+
+ repeat:
+	if (dev->power.runtime_error) {
+		retval = -EINVAL;
+		goto out;
+	}
+
+	pm_runtime_cancel_pending(dev);
+
+	if (dev->power.runtime_status == RPM_ACTIVE)
+		retval = 1;
+	else if (dev->power.disable_depth > 0)
+		retval = -EAGAIN;
+	if (retval)
+		goto out;
+
+	if (dev->power.runtime_status == RPM_RESUMING
+	    || dev->power.runtime_status == RPM_SUSPENDING) {
+		DEFINE_WAIT(wait);
+
+		if (from_wq) {
+			if (dev->power.runtime_status == RPM_SUSPENDING)
+				dev->power.deferred_resume = true;
+			retval = -EINPROGRESS;
+			goto out;
+		}
+
+		/* Wait for the operation carried out in parallel with us. */
+		for (;;) {
+			prepare_to_wait(&dev->power.wait_queue, &wait,
+					TASK_UNINTERRUPTIBLE);
+			if (dev->power.runtime_status != RPM_RESUMING
+			    && dev->power.runtime_status != RPM_SUSPENDING)
+				break;
+
+			spin_unlock_irq(&dev->power.lock);
+
+			schedule();
+
+			spin_lock_irq(&dev->power.lock);
+		}
+		finish_wait(&dev->power.wait_queue, &wait);
+		goto repeat;
+	}
+
+	if (!parent && dev->parent) {
+		/*
+		 * Increment the parent's resume counter and resume it if
+		 * necessary.
+		 */
+		parent = dev->parent;
+		spin_unlock_irq(&dev->power.lock);
+
+		pm_runtime_get_noresume(parent);
+
+		spin_lock_irq(&parent->power.lock);
+		/*
+		 * We can resume if the parent's run-time PM is disabled or it
+		 * is set to ignore children.
+		 */
+		if (!parent->power.disable_depth
+		    && !parent->power.ignore_children) {
+			__pm_runtime_resume(parent, false);
+			if (parent->power.runtime_status != RPM_ACTIVE)
+				retval = -EBUSY;
+		}
+		spin_unlock_irq(&parent->power.lock);
+
+		spin_lock_irq(&dev->power.lock);
+		if (retval)
+			goto out;
+		goto repeat;
+	}
+
+	dev->power.runtime_status = RPM_RESUMING;
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_resume) {
+		spin_unlock_irq(&dev->power.lock);
+
+		retval = dev->bus->pm->runtime_resume(dev);
+
+		spin_lock_irq(&dev->power.lock);
+		dev->power.runtime_error = retval;
+	} else {
+		retval = -ENOSYS;
+	}
+
+	if (retval) {
+		dev->power.runtime_status = RPM_SUSPENDED;
+		pm_runtime_cancel_pending(dev);
+	} else {
+		dev->power.runtime_status = RPM_ACTIVE;
+		if (parent)
+			atomic_inc(&parent->power.child_count);
+	}
+	wake_up_all(&dev->power.wait_queue);
+
+	if (!retval)
+		__pm_request_idle(dev);
+
+ out:
+	if (parent) {
+		spin_unlock_irq(&dev->power.lock);
+
+		pm_runtime_put(parent);
+
+		spin_lock_irq(&dev->power.lock);
+	}
+
+	dev_dbg(dev, "__pm_runtime_resume() returns %d!\n", retval);
+
+	return retval;
+}
+
+/**
+ * pm_runtime_resume - Carry out run-time resume of given device.
+ * @dev: Device to resume.
+ */
+int pm_runtime_resume(struct device *dev)
+{
+	int retval;
+
+	spin_lock_irq(&dev->power.lock);
+	retval = __pm_runtime_resume(dev, false);
+	spin_unlock_irq(&dev->power.lock);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(pm_runtime_resume);
+
+/**
+ * pm_runtime_work - Universal run-time PM work function.
+ * @work: Work structure used for scheduling the execution of this function.
+ *
+ * Use @work to get the device object the work is to be done for, determine what
+ * is to be done and execute the appropriate run-time PM function.
+ */
+static void pm_runtime_work(struct work_struct *work)
+{
+	struct device *dev = container_of(work, struct device, power.work);
+	enum rpm_request req;
+
+	spin_lock_irq(&dev->power.lock);
+
+	if (!dev->power.request_pending)
+		goto out;
+
+	req = dev->power.request;
+	dev->power.request = RPM_REQ_NONE;
+	dev->power.request_pending = false;
+
+	switch (req) {
+	case RPM_REQ_NONE:
+		break;
+	case RPM_REQ_IDLE:
+		__pm_runtime_idle(dev);
+		break;
+	case RPM_REQ_SUSPEND:
+		__pm_runtime_suspend(dev, true);
+		break;
+	case RPM_REQ_RESUME:
+		__pm_runtime_resume(dev, true);
+		break;
+	}
+
+ out:
+	spin_unlock_irq(&dev->power.lock);
+}
+
+/**
+ * __pm_request_idle - Submit an idle notification request for given device.
+ * @dev: Device to handle.
+ *
+ * Check if the device's run-time PM status is correct for suspending the device
+ * and queue up a request to run __pm_runtime_idle() for it.
+ *
+ * This function must be called under dev->power.lock with interrupts disabled.
+ */
+static int __pm_request_idle(struct device *dev)
+{
+	int retval = 0;
+
+	if (dev->power.runtime_error)
+		retval = -EINVAL;
+	else if (atomic_read(&dev->power.usage_count) > 0
+	    || dev->power.disable_depth > 0
+	    || dev->power.runtime_status == RPM_SUSPENDED
+	    || dev->power.runtime_status == RPM_SUSPENDING)
+		retval = -EAGAIN;
+	else if (!pm_children_suspended(dev))
+		retval = -EBUSY;
+	if (retval)
+		return retval;
+
+	if (dev->power.request_pending) {
+		/* Any requests other than RPM_REQ_IDLE take precedence. */
+		if (dev->power.request == RPM_REQ_NONE)
+			dev->power.request = RPM_REQ_IDLE;
+		else if (dev->power.request != RPM_REQ_IDLE)
+			retval = -EAGAIN;
+		return retval;
+	}
+
+	dev->power.request = RPM_REQ_IDLE;
+	dev->power.request_pending = true;
+	queue_work(pm_wq, &dev->power.work);
+
+	return retval;
+}
+
+/**
+ * pm_request_idle - Submit an idle notification request for given device.
+ * @dev: Device to handle.
+ */
+int pm_request_idle(struct device *dev)
+{
+	unsigned long flags;
+	int retval;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+	retval = __pm_request_idle(dev);
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(pm_request_idle);
+
+/**
+ * __pm_request_suspend - Submit a suspend request for given device.
+ * @dev: Device to suspend.
+ *
+ * This function must be called under dev->power.lock with interrupts disabled.
+ */
+static int __pm_request_suspend(struct device *dev)
+{
+	int retval = 0;
+
+	if (dev->power.runtime_error)
+		return -EINVAL;
+
+	if (dev->power.runtime_status == RPM_SUSPENDED)
+		retval = 1;
+	else if (atomic_read(&dev->power.usage_count) > 0
+	    || dev->power.disable_depth > 0)
+		retval = -EAGAIN;
+	else if (dev->power.runtime_status == RPM_SUSPENDING)
+		retval = -EINPROGRESS;
+	else if (!pm_children_suspended(dev))
+		retval = -EBUSY;
+	if (retval < 0)
+		return retval;
+
+	pm_runtime_deactivate_timer(dev);
+
+	if (dev->power.request_pending) {
+		/*
+		 * Pending resume requests take precedence over us, but we can
+		 * overtake any other pending request.
+		 */
+		if (dev->power.request == RPM_REQ_RESUME)
+			retval = -EAGAIN;
+		else if (dev->power.request != RPM_REQ_SUSPEND)
+			dev->power.request = retval ?
+						RPM_REQ_NONE : RPM_REQ_SUSPEND;
+		return retval;
+	} else if (retval) {
+		return retval;
+	}
+
+	dev->power.request = RPM_REQ_SUSPEND;
+	dev->power.request_pending = true;
+	queue_work(pm_wq, &dev->power.work);
+
+	return 0;
+}
+
+/**
+ * pm_suspend_timer_fn - Timer function for pm_schedule_suspend().
+ * @data: Device pointer registered with the timer by pm_runtime_init().
+ *
+ * Check if the time is right and execute __pm_request_suspend() in that case.
+ */
+static void pm_suspend_timer_fn(unsigned long data)
+{
+	struct device *dev = (struct device *)data;
+	unsigned long flags;
+	unsigned long expires;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	expires = dev->power.timer_expires;
+	/* If 'expires' is after 'jiffies' we've been called too early. */
+	if (expires > 0 && !time_after(expires, jiffies)) {
+		dev->power.timer_expires = 0;
+		__pm_request_suspend(dev);
+	}
+
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+
+/**
+ * pm_schedule_suspend - Set up a timer to submit a suspend request in future.
+ * @dev: Device to suspend.
+ * @delay: Time to wait before submitting a suspend request, in milliseconds.
+ */
+int pm_schedule_suspend(struct device *dev, unsigned int delay)
+{
+	unsigned long flags;
+	int retval = 0;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.runtime_error) {
+		retval = -EINVAL;
+		goto out;
+	}
+
+	if (!delay) {
+		retval = __pm_request_suspend(dev);
+		goto out;
+	}
+
+	pm_runtime_deactivate_timer(dev);
+
+	if (dev->power.request_pending) {
+		/*
+		 * Pending resume requests take precedence over us, but any
+		 * other pending requests have to be canceled.
+		 */
+		if (dev->power.request == RPM_REQ_RESUME) {
+			retval = -EAGAIN;
+			goto out;
+		}
+		dev->power.request = RPM_REQ_NONE;
+	}
+
+	if (dev->power.runtime_status == RPM_SUSPENDED)
+		retval = 1;
+	else if (dev->power.runtime_status == RPM_SUSPENDING)
+		retval = -EINPROGRESS;
+	else if (atomic_read(&dev->power.usage_count) > 0
+	    || dev->power.disable_depth > 0)
+		retval = -EAGAIN;
+	else if (!pm_children_suspended(dev))
+		retval = -EBUSY;
+	if (retval)
+		goto out;
+
+	dev->power.timer_expires = jiffies + msecs_to_jiffies(delay);
+	mod_timer(&dev->power.suspend_timer, dev->power.timer_expires);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(pm_schedule_suspend);
+
+/**
+ * pm_request_resume - Submit a resume request for given device.
+ * @dev: Device to resume.
+ *
+ * This function must be called under dev->power.lock with interrupts disabled.
+ */
+static int __pm_request_resume(struct device *dev)
+{
+	int retval = 0;
+
+	if (dev->power.runtime_error)
+		return -EINVAL;
+
+	if (dev->power.runtime_status == RPM_ACTIVE)
+		retval = 1;
+	else if (dev->power.runtime_status == RPM_RESUMING)
+		retval = -EINPROGRESS;
+	else if (dev->power.disable_depth > 0)
+		retval = -EAGAIN;
+	if (retval < 0)
+		return retval;
+
+	pm_runtime_deactivate_timer(dev);
+
+	if (dev->power.request_pending) {
+		/* If a non-resume request is pending, we can overtake it. */
+		dev->power.request = retval ? RPM_REQ_NONE : RPM_REQ_RESUME;
+		return retval;
+	} else if (retval) {
+		return retval;
+	}
+
+	dev->power.request = RPM_REQ_RESUME;
+	dev->power.request_pending = true;
+	queue_work(pm_wq, &dev->power.work);
+
+	return retval;
+}
+
+/**
+ * pm_request_resume - Submit a resume request for given device.
+ * @dev: Device to resume.
+ */
+int pm_request_resume(struct device *dev)
+{
+	unsigned long flags;
+	int retval;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+	retval = __pm_request_resume(dev);
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(pm_request_resume);
+
+/**
+ * __pm_runtime_get - Reference count a device and wake it up, if necessary.
+ * @dev: Device to handle.
+ * @sync: If set and the device is suspended, resume it synchronously.
+ *
+ * Increment the usage count of the device and if it was zero previously,
+ * resume it or submit a resume request for it, depending on the value of @sync.
+ */
+int __pm_runtime_get(struct device *dev, bool sync)
+{
+	int retval = 1;
+
+	if (atomic_add_return(1, &dev->power.usage_count) == 1)
+		retval = sync ? pm_runtime_resume(dev) : pm_request_resume(dev);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(__pm_runtime_get);
+
+/**
+ * __pm_runtime_put - Decrement the device's usage counter and notify its bus.
+ * @dev: Device to handle.
+ * @sync: If the device's bus type is to be notified, do that synchronously.
+ *
+ * Decrement the usage count of the device and if it reaches zero, carry out a
+ * synchronous idle notification or submit an idle notification request for it,
+ * depending on the value of @sync.
+ */
+int __pm_runtime_put(struct device *dev, bool sync)
+{
+	int retval = 0;
+
+	if (atomic_dec_and_test(&dev->power.usage_count))
+		retval = sync ? pm_runtime_idle(dev) : pm_request_idle(dev);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(__pm_runtime_put);
+
+/**
+ * __pm_runtime_set_status - Set run-time PM status of a device.
+ * @dev: Device to handle.
+ * @status: New run-time PM status of the device.
+ *
+ * If run-time PM of the device is disabled or its power.runtime_error field is
+ * different from zero, the status may be changed either to RPM_ACTIVE, or to
+ * RPM_SUSPENDED, as long as that reflects the actual state of the device.
+ * However, if the device has a parent that is not active, has run-time PM
+ * enabled and has its power.ignore_children flag unset, the device's status
+ * cannot be set to RPM_ACTIVE, so -EBUSY is returned in that case.
+ *
+ * If successful, __pm_runtime_set_status() clears the power.runtime_error field
+ * and the device parent's counter of unsuspended children is modified to
+ * reflect the new status.  If the new status is RPM_SUSPENDED, an idle
+ * notification request for the parent is submitted.
+ */
+int __pm_runtime_set_status(struct device *dev, unsigned int status)
+{
+	struct device *parent = dev->parent;
+	unsigned long flags;
+	bool notify_parent = false;
+	int error = 0;
+
+	if (status != RPM_ACTIVE && status != RPM_SUSPENDED)
+		return -EINVAL;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (!dev->power.runtime_error && !dev->power.disable_depth) {
+		error = -EAGAIN;
+		goto out;
+	}
+
+	if (dev->power.runtime_status == status)
+		goto out_set;
+
+	if (status == RPM_SUSPENDED) {
+		/* It is always possible to set the status to 'suspended'. */
+		if (parent) {
+			atomic_add_unless(&parent->power.child_count, -1, 0);
+			notify_parent = !parent->power.ignore_children;
+		}
+		goto out_set;
+	}
+
+	if (parent) {
+		spin_lock_nested(&parent->power.lock, SINGLE_DEPTH_NESTING);
+
+		/*
+		 * It is invalid to put an active child under a parent that is
+		 * not active, has run-time PM enabled and the
+		 * 'power.ignore_children' flag unset.
+		 */
+		if (!parent->power.disable_depth
+		    && !parent->power.ignore_children
+		    && parent->power.runtime_status != RPM_ACTIVE) {
+			error = -EBUSY;
+		} else {
+			if (dev->power.runtime_status == RPM_SUSPENDED)
+				atomic_inc(&parent->power.child_count);
+		}
+
+		spin_unlock(&parent->power.lock);
+
+		if (error)
+			goto out;
+	}
+
+ out_set:
+	dev->power.runtime_status = status;
+	dev->power.runtime_error = 0;
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	if (notify_parent)
+		pm_request_idle(parent);
+
+	return error;
+}
+EXPORT_SYMBOL_GPL(__pm_runtime_set_status);
+
+/**
+ * __pm_runtime_barrier - Cancel pending requests and wait for completions.
+ * @dev: Device to handle.
+ *
+ * Flush all pending requests for the device from pm_wq and wait for all
+ * run-time PM operations involving the device in progress to complete.
+ *
+ * Should be called under dev->power.lock with interrupts disabled.
+ */
+static void __pm_runtime_barrier(struct device *dev)
+{
+	pm_runtime_deactivate_timer(dev);
+
+	if (dev->power.request_pending) {
+		dev->power.request = RPM_REQ_NONE;
+		spin_unlock_irq(&dev->power.lock);
+
+		cancel_work_sync(&dev->power.work);
+
+		spin_lock_irq(&dev->power.lock);
+		dev->power.request_pending = false;
+	}
+
+	if (dev->power.runtime_status == RPM_SUSPENDING
+	    || dev->power.runtime_status == RPM_RESUMING
+	    || dev->power.idle_notification) {
+		DEFINE_WAIT(wait);
+
+		/* Suspend, wake-up or idle notification in progress. */
+		for (;;) {
+			prepare_to_wait(&dev->power.wait_queue, &wait,
+					TASK_UNINTERRUPTIBLE);
+			if (dev->power.runtime_status != RPM_SUSPENDING
+			    && dev->power.runtime_status != RPM_RESUMING
+			    && !dev->power.idle_notification)
+				break;
+			spin_unlock_irq(&dev->power.lock);
+
+			schedule();
+
+			spin_lock_irq(&dev->power.lock);
+		}
+		finish_wait(&dev->power.wait_queue, &wait);
+	}
+}
+
+/**
+ * pm_runtime_barrier - Flush pending requests and wait for completions.
+ * @dev: Device to handle.
+ *
+ * Prevent the device from being suspended by incrementing its usage counter and
+ * if there's a pending resume request for the device, wake the device up.
+ * Next, make sure that all pending requests for the device have been flushed
+ * from pm_wq and wait for all run-time PM operations involving the device in
+ * progress to complete.
+ *
+ * Return value:
+ * 1, if there was a resume request pending and the device had to be woken up,
+ * 0, otherwise
+ */
+int pm_runtime_barrier(struct device *dev)
+{
+	int retval = 0;
+
+	pm_runtime_get_noresume(dev);
+	spin_lock_irq(&dev->power.lock);
+
+	if (dev->power.request_pending
+	    && dev->power.request == RPM_REQ_RESUME) {
+		__pm_runtime_resume(dev, false);
+		retval = 1;
+	}
+
+	__pm_runtime_barrier(dev);
+
+	spin_unlock_irq(&dev->power.lock);
+	pm_runtime_put_noidle(dev);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(pm_runtime_barrier);
+
+/**
+ * __pm_runtime_disable - Disable run-time PM of a device.
+ * @dev: Device to handle.
+ * @check_resume: If set, check if there's a resume request for the device.
+ *
+ * Increment power.disable_depth for the device and if it was zero previously,
+ * cancel all pending run-time PM requests for the device and wait for all
+ * operations in progress to complete.  The device can be either active or
+ * suspended after its run-time PM has been disabled.
+ *
+ * If @check_resume is set and there's a resume request pending when
+ * __pm_runtime_disable() is called and power.disable_depth is zero, the
+ * function will wake up the device before disabling its run-time PM.
+ */
+void __pm_runtime_disable(struct device *dev, bool check_resume)
+{
+	spin_lock_irq(&dev->power.lock);
+
+	if (dev->power.disable_depth > 0) {
+		dev->power.disable_depth++;
+		goto out;
+	}
+
+	/*
+	 * Wake up the device if there's a resume request pending, because that
+	 * means there probably is some I/O to process and disabling run-time PM
+	 * shouldn't prevent the device from processing the I/O.
+	 */
+	if (check_resume && dev->power.request_pending
+	    && dev->power.request == RPM_REQ_RESUME) {
+		/*
+		 * Prevent suspends and idle notifications from being carried
+		 * out after we have woken up the device.
+		 */
+		pm_runtime_get_noresume(dev);
+
+		__pm_runtime_resume(dev, false);
+
+		pm_runtime_put_noidle(dev);
+	}
+
+	if (!dev->power.disable_depth++)
+		__pm_runtime_barrier(dev);
+
+ out:
+	spin_unlock_irq(&dev->power.lock);
+}
+EXPORT_SYMBOL_GPL(__pm_runtime_disable);
+
+/**
+ * pm_runtime_enable - Enable run-time PM of a device.
+ * @dev: Device to handle.
+ */
+void pm_runtime_enable(struct device *dev)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.disable_depth > 0)
+		dev->power.disable_depth--;
+	else
+		dev_warn(dev, "Unbalanced %s!\n", __func__);
+
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_enable);
+
+/**
+ * pm_runtime_init - Initialize run-time PM fields in given device object.
+ * @dev: Device object to initialize.
+ */
+void pm_runtime_init(struct device *dev)
+{
+	spin_lock_init(&dev->power.lock);
+
+	dev->power.runtime_status = RPM_SUSPENDED;
+	dev->power.idle_notification = false;
+
+	dev->power.disable_depth = 1;
+	atomic_set(&dev->power.usage_count, 0);
+
+	dev->power.runtime_error = 0;
+
+	atomic_set(&dev->power.child_count, 0);
+	pm_suspend_ignore_children(dev, false);
+
+	dev->power.request_pending = false;
+	dev->power.request = RPM_REQ_NONE;
+	dev->power.deferred_resume = false;
+	INIT_WORK(&dev->power.work, pm_runtime_work);
+
+	dev->power.timer_expires = 0;
+	setup_timer(&dev->power.suspend_timer, pm_suspend_timer_fn,
+			(unsigned long)dev);
+
+	init_waitqueue_head(&dev->power.wait_queue);
+}
+
+/**
+ * pm_runtime_remove - Prepare for removing a device from device hierarchy.
+ * @dev: Device object being removed from device hierarchy.
+ */
+void pm_runtime_remove(struct device *dev)
+{
+	__pm_runtime_disable(dev, false);
+
+	/* Change the status back to 'suspended' to match the initial status. */
+	if (dev->power.runtime_status == RPM_ACTIVE)
+		pm_runtime_set_suspended(dev);
+}
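
A minimal userspace sketch of the power.disable_depth nesting implemented by
pm_runtime_init(), __pm_runtime_disable() and pm_runtime_enable() above may
help when reviewing them.  It is only a model under stated assumptions: the
model_* names are hypothetical, and locking, the check_resume handling and
__pm_runtime_barrier() are deliberately left out.

```c
/*
 * Hypothetical userspace model of power.disable_depth (not kernel code).
 * Locking, check_resume and __pm_runtime_barrier() are omitted.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct model_dev {
	unsigned int disable_depth;
};

/* Like pm_runtime_init(): run-time PM starts out disabled. */
static void model_init(struct model_dev *dev)
{
	dev->disable_depth = 1;
}

/*
 * Like __pm_runtime_disable(): always increments the depth.  Returns true
 * only for the call that took the depth from 0 to 1, i.e. the point at
 * which the kernel code would run __pm_runtime_barrier().
 */
static bool model_disable(struct model_dev *dev)
{
	return dev->disable_depth++ == 0;
}

/*
 * Like pm_runtime_enable(): decrements the depth, but refuses to go below
 * zero and reports the unbalanced call instead (the kernel uses dev_warn()).
 */
static bool model_enable(struct model_dev *dev)
{
	if (dev->disable_depth > 0) {
		dev->disable_depth--;
		return true;
	}
	fprintf(stderr, "Unbalanced %s!\n", __func__);
	return false;
}

/* The helper functions only do real work when the depth is zero. */
static bool model_enabled(const struct model_dev *dev)
{
	return dev->disable_depth == 0;
}
```

With this model a nested disable only bumps the counter, and run-time PM is
enabled again only after a matching number of enable calls.
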
Index: linux-2.6/include/linux/pm_runtime.h
===================================================================
--- /dev/null
+++ linux-2.6/include/linux/pm_runtime.h
@@ -0,0 +1,114 @@
+/*
+ * pm_runtime.h - Device run-time power management helper functions.
+ *
+ * Copyright (C) 2009 Rafael J. Wysocki <rjw@sisk.pl>
+ *
+ * This file is released under the GPLv2.
+ */
+
+#ifndef _LINUX_PM_RUNTIME_H
+#define _LINUX_PM_RUNTIME_H
+
+#include <linux/device.h>
+#include <linux/pm.h>
+
+#ifdef CONFIG_PM_RUNTIME
+
+extern struct workqueue_struct *pm_wq;
+
+extern int pm_runtime_idle(struct device *dev);
+extern int pm_runtime_suspend(struct device *dev);
+extern int pm_runtime_resume(struct device *dev);
+extern int pm_request_idle(struct device *dev);
+extern int pm_schedule_suspend(struct device *dev, unsigned int delay);
+extern int pm_request_resume(struct device *dev);
+extern int __pm_runtime_get(struct device *dev, bool sync);
+extern int __pm_runtime_put(struct device *dev, bool sync);
+extern int __pm_runtime_set_status(struct device *dev, unsigned int status);
+extern int pm_runtime_barrier(struct device *dev);
+extern void pm_runtime_enable(struct device *dev);
+extern void __pm_runtime_disable(struct device *dev, bool check_resume);
+
+static inline bool pm_children_suspended(struct device *dev)
+{
+	return dev->power.ignore_children
+		|| !atomic_read(&dev->power.child_count);
+}
+
+static inline void pm_suspend_ignore_children(struct device *dev, bool enable)
+{
+	dev->power.ignore_children = enable;
+}
+
+static inline void pm_runtime_get_noresume(struct device *dev)
+{
+	atomic_inc(&dev->power.usage_count);
+}
+
+static inline void pm_runtime_put_noidle(struct device *dev)
+{
+	atomic_add_unless(&dev->power.usage_count, -1, 0);
+}
+
+#else /* !CONFIG_PM_RUNTIME */
+
+static inline int pm_runtime_idle(struct device *dev) { return -ENOSYS; }
+static inline int pm_runtime_suspend(struct device *dev) { return -ENOSYS; }
+static inline int pm_runtime_resume(struct device *dev) { return 0; }
+static inline int pm_request_idle(struct device *dev) { return -ENOSYS; }
+static inline int pm_schedule_suspend(struct device *dev, unsigned int delay)
+{
+	return -ENOSYS;
+}
+static inline int pm_request_resume(struct device *dev) { return 0; }
+static inline int __pm_runtime_get(struct device *dev, bool sync) { return 1; }
+static inline int __pm_runtime_put(struct device *dev, bool sync) { return 0; }
+static inline int __pm_runtime_set_status(struct device *dev,
+					    unsigned int status) { return 0; }
+static inline int pm_runtime_barrier(struct device *dev) { return 0; }
+static inline void pm_runtime_enable(struct device *dev) {}
+static inline void __pm_runtime_disable(struct device *dev, bool c) {}
+
+static inline bool pm_children_suspended(struct device *dev) { return false; }
+static inline void pm_suspend_ignore_children(struct device *dev, bool en) {}
+static inline void pm_runtime_get_noresume(struct device *dev) {}
+static inline void pm_runtime_put_noidle(struct device *dev) {}
+
+#endif /* !CONFIG_PM_RUNTIME */
+
+static inline int pm_runtime_get(struct device *dev)
+{
+	return __pm_runtime_get(dev, false);
+}
+
+static inline int pm_runtime_get_sync(struct device *dev)
+{
+	return __pm_runtime_get(dev, true);
+}
+
+static inline int pm_runtime_put(struct device *dev)
+{
+	return __pm_runtime_put(dev, false);
+}
+
+static inline int pm_runtime_put_sync(struct device *dev)
+{
+	return __pm_runtime_put(dev, true);
+}
+
+static inline int pm_runtime_set_active(struct device *dev)
+{
+	return __pm_runtime_set_status(dev, RPM_ACTIVE);
+}
+
+static inline void pm_runtime_set_suspended(struct device *dev)
+{
+	__pm_runtime_set_status(dev, RPM_SUSPENDED);
+}
+
+static inline void pm_runtime_disable(struct device *dev)
+{
+	__pm_runtime_disable(dev, true);
+}
+
+#endif
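
The counter half of the "device appears to be idle" test, which
Documentation/power/runtime_pm.txt in this patch describes and to which
pm_children_suspended() above contributes, can be sketched in userspace C.
This is an illustration under stated assumptions only: C11 atomics stand in
for the kernel's atomic_t, and model_pm_info, model_pm_init(),
model_children_suspended() and model_may_notify_idle() are hypothetical names.

```c
/*
 * Hypothetical userspace model of the idle counters (not kernel code).
 * C11 atomics stand in for atomic_t.
 */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct model_pm_info {
	atomic_int usage_count;
	atomic_int child_count;
	bool ignore_children;
};

static void model_pm_init(struct model_pm_info *p)
{
	atomic_init(&p->usage_count, 0);
	atomic_init(&p->child_count, 0);
	p->ignore_children = false;
}

/* Mirrors pm_children_suspended(): child_count is ignored if requested. */
static bool model_children_suspended(struct model_pm_info *p)
{
	return p->ignore_children || atomic_load(&p->child_count) == 0;
}

/*
 * Counter part only of the conditions for running ->runtime_idle(): usage
 * counter at zero and no active children (unless they are ignored).  The
 * real code also checks the run-time PM status, disable_depth and any
 * pending requests, none of which are modelled here.
 */
static bool model_may_notify_idle(struct model_pm_info *p)
{
	return atomic_load(&p->usage_count) == 0 &&
	       model_children_suspended(p);
}
```
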
Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -21,6 +21,7 @@
 #include <linux/kallsyms.h>
 #include <linux/mutex.h>
 #include <linux/pm.h>
+#include <linux/pm_runtime.h>
 #include <linux/resume-trace.h>
 #include <linux/rwsem.h>
 #include <linux/interrupt.h>
@@ -49,6 +50,16 @@ static DEFINE_MUTEX(dpm_list_mtx);
 static bool transition_started;
 
 /**
+ * device_pm_init - Initialize the PM-related part of a device object
+ * @dev: Device object being initialized.
+ */
+void device_pm_init(struct device *dev)
+{
+	dev->power.status = DPM_ON;
+	pm_runtime_init(dev);
+}
+
+/**
  *	device_pm_lock - lock the list of active devices used by the PM core
  */
 void device_pm_lock(void)
@@ -105,6 +116,7 @@ void device_pm_remove(struct device *dev
 	mutex_lock(&dpm_list_mtx);
 	list_del_init(&dev->power.entry);
 	mutex_unlock(&dpm_list_mtx);
+	pm_runtime_remove(dev);
 }
 
 /**
@@ -512,6 +524,7 @@ static void dpm_complete(pm_message_t st
 			mutex_unlock(&dpm_list_mtx);
 
 			device_complete(dev, state);
+			pm_runtime_put_noidle(dev);
 
 			mutex_lock(&dpm_list_mtx);
 		}
@@ -757,7 +770,14 @@ static int dpm_prepare(pm_message_t stat
 		dev->power.status = DPM_PREPARING;
 		mutex_unlock(&dpm_list_mtx);
 
-		error = device_prepare(dev, state);
+		pm_runtime_get_noresume(dev);
+		if (pm_runtime_barrier(dev) && device_may_wakeup(dev)) {
+			/* Wake-up requested during system sleep transition. */
+			pm_runtime_put_noidle(dev);
+			error = -EBUSY;
+		} else {
+			error = device_prepare(dev, state);
+		}
 
 		mutex_lock(&dpm_list_mtx);
 		if (error) {
Index: linux-2.6/drivers/base/dd.c
===================================================================
--- linux-2.6.orig/drivers/base/dd.c
+++ linux-2.6/drivers/base/dd.c
@@ -23,6 +23,7 @@
 #include <linux/kthread.h>
 #include <linux/wait.h>
 #include <linux/async.h>
+#include <linux/pm_runtime.h>
 
 #include "base.h"
 #include "power/power.h"
@@ -202,7 +203,12 @@ int driver_probe_device(struct device_dr
 	pr_debug("bus: '%s': %s: matched device %s with driver %s\n",
 		 drv->bus->name, __func__, dev_name(dev), drv->name);
 
+	pm_runtime_get_noresume(dev);
+	pm_runtime_barrier(dev);
 	ret = really_probe(dev, drv);
+	pm_runtime_put_noidle(dev);
+	if (!ret)
+		pm_runtime_idle(dev);
 
 	return ret;
 }
@@ -306,6 +312,9 @@ static void __device_release_driver(stru
 
 	drv = dev->driver;
 	if (drv) {
+		pm_runtime_get_noresume(dev);
+		pm_runtime_barrier(dev);
+
 		driver_sysfs_remove(dev);
 
 		if (dev->bus)
@@ -324,6 +333,8 @@ static void __device_release_driver(stru
 			blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
 						     BUS_NOTIFY_UNBOUND_DRIVER,
 						     dev);
+
+		pm_runtime_put_noidle(dev);
 	}
 }
 
Index: linux-2.6/drivers/base/power/power.h
===================================================================
--- linux-2.6.orig/drivers/base/power/power.h
+++ linux-2.6/drivers/base/power/power.h
@@ -1,7 +1,14 @@
-static inline void device_pm_init(struct device *dev)
-{
-	dev->power.status = DPM_ON;
-}
+#ifdef CONFIG_PM_RUNTIME
+
+extern void pm_runtime_init(struct device *dev);
+extern void pm_runtime_remove(struct device *dev);
+
+#else /* !CONFIG_PM_RUNTIME */
+
+static inline void pm_runtime_init(struct device *dev) {}
+static inline void pm_runtime_remove(struct device *dev) {}
+
+#endif /* !CONFIG_PM_RUNTIME */
 
 #ifdef CONFIG_PM_SLEEP
 
@@ -16,23 +23,33 @@ static inline struct device *to_device(s
 	return container_of(entry, struct device, power.entry);
 }
 
+extern void device_pm_init(struct device *dev);
 extern void device_pm_add(struct device *);
 extern void device_pm_remove(struct device *);
 extern void device_pm_move_before(struct device *, struct device *);
 extern void device_pm_move_after(struct device *, struct device *);
 extern void device_pm_move_last(struct device *);
 
-#else /* CONFIG_PM_SLEEP */
+#else /* !CONFIG_PM_SLEEP */
+
+static inline void device_pm_init(struct device *dev)
+{
+	pm_runtime_init(dev);
+}
+
+static inline void device_pm_remove(struct device *dev)
+{
+	pm_runtime_remove(dev);
+}
 
 static inline void device_pm_add(struct device *dev) {}
-static inline void device_pm_remove(struct device *dev) {}
 static inline void device_pm_move_before(struct device *deva,
 					 struct device *devb) {}
 static inline void device_pm_move_after(struct device *deva,
 					struct device *devb) {}
 static inline void device_pm_move_last(struct device *dev) {}
 
-#endif
+#endif /* !CONFIG_PM_SLEEP */
 
 #ifdef CONFIG_PM
 
Index: linux-2.6/Documentation/power/runtime_pm.txt
===================================================================
--- /dev/null
+++ linux-2.6/Documentation/power/runtime_pm.txt
@@ -0,0 +1,386 @@
+Run-time Power Management Framework for I/O Devices
+
+(C) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
+
+1. Introduction
+
+Support for run-time power management (run-time PM) of I/O devices is provided
+at the power management core (PM core) level by means of:
+
+* The power management workqueue pm_wq in which bus types and device drivers can
+  put their PM-related work items.  It is strongly recommended that pm_wq be
+  used for queuing all work items related to run-time PM, because this allows
+  them to be synchronized with system-wide power transitions (suspend to RAM,
+  hibernation and resume from system sleep states).  pm_wq is declared in
+  include/linux/pm_runtime.h and defined in kernel/power/main.c.
+
+* A number of run-time PM fields in the 'power' member of 'struct device' (which
+  is of the type 'struct dev_pm_info', defined in include/linux/pm.h) that can
+  be used for synchronizing run-time PM operations with one another.
+
+* Three device run-time PM callbacks in 'struct dev_pm_ops' (defined in
+  include/linux/pm.h).
+
+* A set of helper functions defined in drivers/base/power/runtime.c that can be
+  used for carrying out run-time PM operations in such a way that the
+  synchronization between them is taken care of by the PM core.  Bus types and
+  device drivers are encouraged to use these functions.
+
+The run-time PM callbacks present in 'struct dev_pm_ops', the device run-time PM
+fields of 'struct dev_pm_info' and the core helper functions provided for
+run-time PM are described below.
+
+2. Device Run-time PM Callbacks
+
+There are three device run-time PM callbacks defined in 'struct dev_pm_ops':
+
+struct dev_pm_ops {
+	...
+	int (*runtime_suspend)(struct device *dev);
+	int (*runtime_resume)(struct device *dev);
+	void (*runtime_idle)(struct device *dev);
+	...
+};
+
+The ->runtime_suspend() callback is executed by the PM core for the bus type of
+the device being suspended.  The bus type's callback is then _entirely_
+_responsible_ for handling the device as appropriate, which may, but need not,
+include executing the device driver's own ->runtime_suspend() callback (from the
+PM core's point of view it is not necessary to implement a ->runtime_suspend()
+callback in a device driver as long as the bus type's ->runtime_suspend() knows
+what to do to handle the device).
+
+  * Once the bus type's ->runtime_suspend() callback has completed successfully
+    for a given device, the PM core regards the device as suspended, which need
+    not mean that the device has been put into a low power state.  It is
+    supposed to mean, however, that the device will not process data and will
+    not communicate with the CPU(s) and RAM until its bus type's
+    ->runtime_resume() callback is executed for it.  The run-time PM status of
+    a device after successful execution of its bus type's ->runtime_suspend()
+    callback is 'suspended'.
+
+  * If the bus type's ->runtime_suspend() callback returns -EBUSY or -EAGAIN,
+    the device's run-time PM status is supposed to be 'active', which means that
+    the device _must_ be fully operational afterwards.
+
+  * If the bus type's ->runtime_suspend() callback returns an error code
+    different from -EBUSY or -EAGAIN, the PM core regards this as a fatal
+    error and will refuse to run the helper functions described in Section 4
+    for the device, until its status is directly set either to 'active' or
+    to 'suspended' (the PM core provides special helper functions for this
+    purpose).
+
+In particular, if the driver requires remote wake-up capability for proper
+functioning and device_may_wakeup() returns 'false' for the device, then
+->runtime_suspend() should return -EBUSY.  On the other hand, if
+device_may_wakeup() returns 'true' for the device and the device is put
+into a low power state during the execution of its bus type's
+->runtime_suspend(), it is expected that remote wake-up (i.e. a hardware
+mechanism allowing the device to request a change of its power state, such
+as PCI PME) will be enabled for the device.  Generally, remote wake-up should
+be enabled for all input devices put into a low power state at run time.
+
+The ->runtime_resume() callback is executed by the PM core for the bus type of
+the device being woken up.  The bus type's callback is then _entirely_
+_responsible_ for handling the device as appropriate, which may, but need not,
+include executing the device driver's own ->runtime_resume() callback (from the
+PM core's point of view it is not necessary to implement a ->runtime_resume()
+callback in a device driver as long as the bus type's ->runtime_resume() knows
+what to do to handle the device).
+
+  * Once the bus type's ->runtime_resume() callback has completed successfully,
+    the PM core regards the device as fully operational, which means that the
+    device _must_ be able to complete I/O operations as needed.  The run-time
+    PM status of the device is then 'active'.
+
+  * If the bus type's ->runtime_resume() callback returns an error code, the PM
+    core regards this as a fatal error and will refuse to run the helper
+    functions described in Section 4 for the device, until its status is
+    directly set either to 'active' or to 'suspended' (the PM core provides
+    special helper functions for this purpose).
+
+The ->runtime_idle() callback is executed by the PM core for the bus type of
+a given device whenever the device appears to be idle, which is indicated to
+the PM core by two counters, the device's usage counter and the counter of
+'active' children of the device.
+
+  * If either of these counters is decremented using a helper function
+    provided by the PM core and it turns out to be equal to zero, the other
+    counter is checked.  If that counter also is equal to zero, the PM core
+    executes the device bus type's ->runtime_idle() callback (with the device
+    as an argument).
+
+The action performed by a bus type's ->runtime_idle() callback is totally
+dependent on the bus type in question, but the expected and recommended action
+is to check if the device can be suspended (i.e. if all of the conditions
+necessary for suspending the device are satisfied) and to queue up a suspend
+request for the device in that case.
+
+The helper functions provided by the PM core, described in Section 4, guarantee
+that the following constraints are met with respect to the bus type's run-time
+PM callbacks:
+
+(1) The callbacks are mutually exclusive (e.g. it is forbidden to execute
+    ->runtime_suspend() in parallel with ->runtime_resume() or with another
+    instance of ->runtime_suspend() for the same device) with the exception that
+    ->runtime_suspend() or ->runtime_resume() can be executed in parallel with
+    ->runtime_idle() (although ->runtime_idle() will not be started while any
+    of the other callbacks is being executed for the same device).
+
+(2) ->runtime_idle() and ->runtime_suspend() can only be executed for 'active'
+    devices (i.e. the PM core will only execute ->runtime_idle() or
+    ->runtime_suspend() for devices whose run-time PM status is
+    'active').
+
+(3) ->runtime_idle() and ->runtime_suspend() can only be executed for a device
+    whose usage counter is equal to zero _and_ which either has no 'active'
+    children (i.e. its counter of 'active' children is equal to zero) or has
+    the 'power.ignore_children' flag set.
+
+(4) ->runtime_resume() can only be executed for 'suspended' devices (i.e. the
+    PM core will only execute ->runtime_resume() for devices whose run-time
+    PM status is 'suspended').
+
+Additionally, the helper functions provided by the PM core obey the following
+rules:
+
+  * If ->runtime_suspend() is about to be executed or there's a pending request
+    to execute it, ->runtime_idle() will not be executed for the same device.
+
+  * A request to execute or to schedule the execution of ->runtime_suspend()
+    will cancel any pending requests to execute ->runtime_idle() for the same
+    device.
+
+  * If ->runtime_resume() is about to be executed or there's a pending request
+    to execute it, the other callbacks will not be executed for the same device.
+
+  * A request to execute ->runtime_resume() will cancel any pending or
+    scheduled requests to execute the other callbacks for the same device.
+
+3. Run-time PM Device Fields
+
+The following device run-time PM fields are present in 'struct dev_pm_info', as
+defined in include/linux/pm.h:
+
+  struct timer_list suspend_timer;
+    - timer used for scheduling (delayed) suspend request
+
+  unsigned long timer_expires;
+    - timer expiration time, in jiffies (if this is different from zero, the
+      timer is running and will expire at that time, otherwise the timer is not
+      running)
+
+  struct work_struct work;
+    - work structure used for queuing up requests (i.e. work items in pm_wq)
+
+  wait_queue_head_t wait_queue;
+    - wait queue used if any of the helper functions needs to wait for another
+      one to complete
+
+  spinlock_t lock;
+    - lock used for synchronisation
+
+  atomic_t usage_count;
+    - the usage counter of the device
+
+  atomic_t child_count;
+    - the count of 'active' children of the device
+
+  unsigned int ignore_children;
+    - if set, the value of child_count is ignored (but still updated)
+
+  unsigned int disable_depth;
+    - used for disabling the helper functions (they work normally if this is
+      equal to zero); the initial value of it is 1 (i.e. run-time PM is
+      initially disabled for all devices)
+
+  unsigned int runtime_error;
+    - if set, there was a fatal error (one of the callbacks returned an error
+      code as described in Section 2), so the helper functions will not work
+      until this field is cleared; its value is the error code returned by the
+      failing callback
+
+  unsigned int idle_notification;
+    - if set, ->runtime_idle() is being executed
+
+  unsigned int request_pending;
+    - if set, there's a pending request (i.e. a work item queued up into pm_wq)
+
+  enum rpm_request request;
+    - type of request that's pending (valid if request_pending is set)
+
+  unsigned int deferred_resume;
+    - set if ->runtime_resume() is about to be run while ->runtime_suspend() is
+      being executed for that device and it is not practical to wait for the
+      suspend to complete; means "start a resume as soon as you've suspended"
+
+  enum rpm_status runtime_status;
+    - the run-time PM status of the device; this field's initial value is
+      RPM_SUSPENDED, which means that each device is initially regarded by the
+      PM core as 'suspended', regardless of its real hardware status
+
+All of the above fields are members of the 'power' member of 'struct device'.
+
+4. Run-time PM Device Helper Functions
+
+The following run-time PM helper functions are defined in
+drivers/base/power/runtime.c and include/linux/pm_runtime.h:
+
+  void pm_runtime_init(struct device *dev);
+    - initialize the device run-time PM fields in 'struct dev_pm_info'
+
+  void pm_runtime_remove(struct device *dev);
+    - make sure that the run-time PM of the device will be disabled after
+      removing the device from the device hierarchy
+
+  int pm_runtime_idle(struct device *dev);
+    - execute ->runtime_idle() for the device's bus type; returns 0 on success
+      or error code on failure, where -EINPROGRESS means that ->runtime_idle()
+      is already being executed
+
+  int pm_runtime_suspend(struct device *dev);
+    - execute ->runtime_suspend() for the device's bus type; returns 0 on
+      success, 1 if the device's run-time PM status was already 'suspended', or
+      error code on failure, where -EAGAIN or -EBUSY means it is safe to attempt
+      to suspend the device again in the future
+
+  int pm_runtime_resume(struct device *dev);
+    - execute ->runtime_resume() for the device's bus type; returns 0 on
+      success, 1 if the device's run-time PM status was already 'active', or
+      error code on failure, where -EAGAIN means it may be safe to attempt to
+      resume the device again in the future, but 'power.runtime_error' should
+      be checked additionally
+
+  int pm_request_idle(struct device *dev);
+    - submit a request to execute ->runtime_idle() for the device's bus type
+      (the request is represented by a work item in pm_wq); returns 0 on success
+      or error code if the request has not been queued up
+
+  int pm_schedule_suspend(struct device *dev, unsigned int delay);
+    - schedule the execution of ->runtime_suspend() for the device's bus type
+      in the future, where 'delay' is the time to wait before queuing up a
+      suspend work item in pm_wq, in milliseconds (if 'delay' is zero, the work
+      item is queued up immediately); returns 0 on success, 1 if the device's
+      run-time PM status was already 'suspended', or error code if the request
+      hasn't been scheduled (or queued up if 'delay' is 0); if the execution of
+      ->runtime_suspend() is already scheduled and not yet expired, the new
+      value of 'delay' will be used as the time to wait
+
+  int pm_request_resume(struct device *dev);
+    - submit a request to execute ->runtime_resume() for the device's bus type
+      (the request is represented by a work item in pm_wq); returns 0 on
+      success, 1 if the device's run-time PM status was already 'active', or
+      error code if the request hasn't been queued up
+
+  void pm_runtime_get_noresume(struct device *dev);
+    - increment the device's usage counter
+
+  int pm_runtime_get(struct device *dev);
+    - increment the device's usage counter, run pm_request_resume(dev) and
+      return its result
+
+  int pm_runtime_get_sync(struct device *dev);
+    - increment the device's usage counter, run pm_runtime_resume(dev) and
+      return its result
+
+  void pm_runtime_put_noidle(struct device *dev);
+    - decrement the device's usage counter
+
+  int pm_runtime_put(struct device *dev);
+    - decrement the device's usage counter, run pm_request_idle(dev) and return
+      its result
+
+  int pm_runtime_put_sync(struct device *dev);
+    - decrement the device's usage counter, run pm_runtime_idle(dev) and return
+      its result
+
+  void pm_runtime_enable(struct device *dev);
+    - enable the run-time PM helper functions to run the device bus type's
+      run-time PM callbacks described in Section 2
+
+  void pm_runtime_disable(struct device *dev);
+    - prevent the run-time PM helper functions from running the device bus
+      type's run-time PM callbacks and make sure that all of the pending
+      run-time PM operations on the device are either completed or canceled;
+      if run-time PM was enabled and a resume request is pending, execute
+      ->runtime_resume() for the device's bus type first to satisfy that
+      request
+
+  void pm_suspend_ignore_children(struct device *dev, bool enable);
+    - set/unset the power.ignore_children flag of the device
+
+  int pm_runtime_set_active(struct device *dev);
+    - clear the device's 'power.runtime_error' flag, set the device's run-time
+      PM status to 'active' and update its parent's counter of 'active'
+      children as appropriate (it is only valid to use this function if
+      'power.runtime_error' is set or 'power.disable_depth' is greater than
+      zero); it will fail and return error code if the device has a parent
+      which is not active and the 'power.ignore_children' flag of which is unset
+
+  void pm_runtime_set_suspended(struct device *dev);
+    - clear the device's 'power.runtime_error' flag, set the device's run-time
+      PM status to 'suspended' and update its parent's counter of 'active'
+      children as appropriate (it is only valid to use this function if
+      'power.runtime_error' is set or 'power.disable_depth' is greater than
+      zero)
+
+It is safe to execute the following helper functions from interrupt context:
+
+pm_request_idle()
+pm_schedule_suspend()
+pm_request_resume()
+pm_runtime_get_noresume()
+pm_runtime_get()
+pm_runtime_put_noidle()
+pm_runtime_put()
+pm_suspend_ignore_children()
+pm_runtime_set_active()
+pm_runtime_set_suspended()
+pm_runtime_enable()
+
+5. Run-time PM Initialization, Device Probing and Removal
+
+Initially, run-time PM is disabled for all devices, which means that the
+majority of the run-time PM helper functions described in Section 4 will return
+-EAGAIN until pm_runtime_enable() is called for the device.
+
+In addition to that, the initial run-time PM status of all devices is
+'suspended', but it need not reflect the actual physical state of the device.
+Thus, if the device is initially active (i.e. it is able to process I/O), its
+run-time PM status must be changed to 'active', with the help of
+pm_runtime_set_active(), before pm_runtime_enable() is called for the device.
+
+However, if the device has a parent and the parent's run-time PM is enabled,
+calling pm_runtime_set_active() for the device will affect the parent, unless
+the parent's 'power.ignore_children' flag is set.  Namely, in that case the
+parent won't be able to suspend at run time, using the PM core's helper
+functions, as long as the child's status is 'active', even if the child's
+run-time PM is still disabled (i.e. pm_runtime_enable() hasn't been called for
+the child yet or pm_runtime_disable() has been called for it).  For this reason,
+once pm_runtime_set_active() has been called for the device, pm_runtime_enable()
+should be called for it too as soon as reasonably possible or its run-time PM
+status should be changed back to 'suspended' with the help of
+pm_runtime_set_suspended().
+
+If the default initial run-time PM status of the device (i.e. 'suspended')
+reflects the actual state of the device, its bus type's or its driver's
+->probe() callback will likely need to wake it up using one of the PM core's
+helper functions described in Section 4.  In that case, pm_runtime_resume()
+should be used.  Of course, for this purpose the device's run-time PM has to be
+enabled earlier by calling pm_runtime_enable().
+
+If ->probe() calls pm_runtime_suspend() or pm_runtime_idle() or their
+asynchronous counterparts, they will fail, returning -EAGAIN, because the
+device's usage counter is incremented by the core before executing ->probe().
+Still, it may be desirable to suspend the device as soon as ->probe() has
+finished, so the core uses pm_runtime_idle() to invoke the device bus type's
+->runtime_idle() callback at that time, but only if ->probe() is successful.
+
+If the device driver's or bus type's ->remove() callback executes
+pm_runtime_suspend() or pm_runtime_idle() or their asynchronous counterparts
+without decrementing the usage counter first, they will fail, returning
+-EAGAIN, because the device's usage counter is incremented by the core before
+executing ->remove().  However, if ->remove() wants to suspend the device, it
+can safely execute any of the pm_runtime_put*() helpers to decrement the
+device's usage counter, because the pm_runtime_put_noidle() called by the core
+after ->remove() has returned will not decrease the counter below zero.
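
The ->remove() rule above relies on pm_runtime_put_noidle() being implemented
with atomic_add_unless(&dev->power.usage_count, -1, 0), i.e. "decrement unless
zero".  A userspace sketch of that behaviour might look like the following;
it assumes C11 atomics, uses a compare-and-swap loop in place of the kernel's
atomic_add_unless(), and the model_* names are hypothetical.

```c
/*
 * Hypothetical userspace model of pm_runtime_get_noresume() and
 * pm_runtime_put_noidle() (not kernel code).
 */
#include <assert.h>
#include <stdatomic.h>

/* Like pm_runtime_get_noresume(): plain increment. */
static void model_get_noresume(atomic_int *usage_count)
{
	atomic_fetch_add(usage_count, 1);
}

/* Like pm_runtime_put_noidle(): decrement, but never below zero. */
static void model_put_noidle(atomic_int *usage_count)
{
	int old = atomic_load(usage_count);

	/*
	 * Stop once the counter is seen as zero; otherwise try to store
	 * old - 1.  On failure the CAS reloads 'old' and we retry.
	 */
	while (old != 0 &&
	       !atomic_compare_exchange_weak(usage_count, &old, old - 1))
		;
}
```

Calling model_get_noresume() once (the core, before ->remove()) and then
model_put_noidle() twice (the driver dropping the reference, then the core
after ->remove()) leaves the counter at 0 rather than -1.
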


* Re: [PATCH update] PM: Introduce core framework for run-time PM of  I/O devices (rev. 15)
  2009-08-09 21:13 ` Rafael J. Wysocki
@ 2009-08-12 10:37   ` Magnus Damm
  2009-08-12 15:47     ` Alan Stern
                       ` (3 more replies)
  2009-08-12 10:37   ` Magnus Damm
                     ` (4 subsequent siblings)
  5 siblings, 4 replies; 90+ messages in thread
From: Magnus Damm @ 2009-08-12 10:37 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Alan Stern, Greg KH, Pavel Machek, Len Brown, LKML,
	Linux-pm mailing list

Hi Rafael,

On Mon, Aug 10, 2009 at 6:13 AM, Rafael J. Wysocki<rjw@sisk.pl> wrote:
> From: Rafael J. Wysocki <rjw@sisk.pl>
> Subject: PM: Introduce core framework for run-time PM of I/O devices (rev. 15)
>
> Introduce a core framework for run-time power management of I/O
> devices.  Add device run-time PM fields to 'struct dev_pm_info'
> and device run-time PM callbacks to 'struct dev_pm_ops'.  Introduce
> a run-time PM workqueue and define some device run-time PM helper
> functions at the core level.  Document all these things.
>
> Special thanks to Alan Stern for his help with the design and
> multiple detailed reviews of the pereceding versions of this patch
> and to Magnus Damm for testing feedback.
>
> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

Looking good! I have a few nitpicks below, but from a functional
perspective it's all good. I've tested v15 with platform device
drivers for I2C, UIO and framebuffer. Before adding my "Acked-by"  I
also want to test the V4L capture driver, but I need to wait a few
days until I can get my hands on such a hardware platform.

Thanks for folding in and fixing up the debug patch. I was able to
drop most remaining patches thanks to feedback from Alan. So the only
needed patch apart from this one (and the ones in your linux-next
branch) is the one in this micro-series: "PM: Runtime PM v15 for
Platform Devices 20090812".

> --- linux-2.6.orig/include/linux/pm.h
> +++ linux-2.6/include/linux/pm.h
[..]
>  struct dev_pm_info {
>        pm_message_t            power_state;
> -       unsigned                can_wakeup:1;
> -       unsigned                should_wakeup:1;
> +       unsigned int            can_wakeup:1;
> +       unsigned int            should_wakeup:1;
>        enum dpm_state          status;         /* Owned by the PM core */
> -#ifdef CONFIG_PM_SLEEP
> +#ifdef CONFIG_PM_SLEEP
>        struct list_head        entry;
>  #endif
> +#ifdef CONFIG_PM_RUNTIME
> +       struct timer_list       suspend_timer;
> +       unsigned long           timer_expires;
> +       struct work_struct      work;
> +       wait_queue_head_t       wait_queue;
> +       spinlock_t              lock;
> +       atomic_t                usage_count;
> +       atomic_t                child_count;

I suppose child_count has to be atomic?

> --- /dev/null
> +++ linux-2.6/drivers/base/power/runtime.c
[...]
> +int __pm_runtime_suspend(struct device *dev, bool from_wq)
> +       __releases(&dev->power.lock) __acquires(&dev->power.lock)
[...]
> +       if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_suspend) {
> +               spin_unlock_irq(&dev->power.lock);
> +
> +               retval = dev->bus->pm->runtime_suspend(dev);
> +
> +               spin_lock_irq(&dev->power.lock);
> +               dev->power.runtime_error = retval;
> +       } else {
> +               retval = -ENOSYS;
> +       }

Nit: { and } above do not follow the regular coding style.

> +int __pm_runtime_resume(struct device *dev, bool from_wq)
> +       __releases(&dev->power.lock) __acquires(&dev->power.lock)
[...]
> +       if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_resume) {
> +               spin_unlock_irq(&dev->power.lock);
> +
> +               retval = dev->bus->pm->runtime_resume(dev);
> +
> +               spin_lock_irq(&dev->power.lock);
> +               dev->power.runtime_error = retval;
> +       } else {
> +               retval = -ENOSYS;
> +       }

Same minor issue here.

Apart from that all is fine. Thank you.

/ magnus

* Re: [PATCH update] PM: Introduce core framework for run-time PM of I/O devices (rev. 15)
  2009-08-12 10:37   ` Magnus Damm
@ 2009-08-12 15:47     ` Alan Stern
  2009-08-12 15:47     ` Alan Stern
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 90+ messages in thread
From: Alan Stern @ 2009-08-12 15:47 UTC (permalink / raw)
  To: Magnus Damm
  Cc: Rafael J. Wysocki, Greg KH, Pavel Machek, Len Brown, LKML,
	Linux-pm mailing list

On Wed, 12 Aug 2009, Magnus Damm wrote:

> > --- /dev/null
> > +++ linux-2.6/drivers/base/power/runtime.c
> [...]
> > +int __pm_runtime_suspend(struct device *dev, bool from_wq)
> > +       __releases(&dev->power.lock) __acquires(&dev->power.lock)
> [...]
> > +       if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_suspend) {
> > +               spin_unlock_irq(&dev->power.lock);
> > +
> > +               retval = dev->bus->pm->runtime_suspend(dev);
> > +
> > +               spin_lock_irq(&dev->power.lock);
> > +               dev->power.runtime_error = retval;
> > +       } else {
> > +               retval = -ENOSYS;
> > +       }
> 
> Nit: { and } above do not follow the regular coding style.

As a matter of fact they do.  From Documentation/CodingStyle:


Do not unnecessarily use braces where a single statement will do.

if (condition)
	action();

This does not apply if one branch of a conditional statement is a single
statement. Use braces in both branches.

if (condition) {
	do_this();
	do_that();
} else {
	otherwise();
}


Alan Stern
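The lock-drop pattern in the quoted __pm_runtime_suspend() fragment can be modelled outside the kernel. There, __releases()/__acquires() are sparse annotations saying the function drops and retakes dev->power.lock; dropping the spinlock around the bus callback lets that callback sleep. This is a minimal sketch, assuming a pthread mutex in place of the spinlock; struct fake_dev, invoke_suspend_locked(), ok_cb() and busy_cb() are hypothetical names, not kernel API. It also follows the bracing rule quoted above: the else branch is a single statement but keeps its braces because the if branch needs them.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a device and its PM info. */
struct fake_dev {
	pthread_mutex_t lock;	/* plays the role of dev->power.lock */
	int runtime_error;	/* plays the role of dev->power.runtime_error */
	int (*runtime_suspend)(struct fake_dev *dev);	/* may be NULL */
};

/* Must be called with dev->lock held; returns with it held again. */
static int invoke_suspend_locked(struct fake_dev *dev)
{
	int retval;

	if (dev->runtime_suspend) {
		/* Drop the lock so the callback may sleep or take other locks. */
		pthread_mutex_unlock(&dev->lock);
		retval = dev->runtime_suspend(dev);
		pthread_mutex_lock(&dev->lock);
		/* Record the callback's result under the lock. */
		dev->runtime_error = retval;
	} else {
		/* Braced although single: the other branch needs braces. */
		retval = -ENOSYS;
	}
	return retval;
}

/* Hypothetical callbacks used to exercise both outcomes. */
static int ok_cb(struct fake_dev *dev) { (void)dev; return 0; }
static int busy_cb(struct fake_dev *dev) { (void)dev; return -EBUSY; }
```

As in the quoted fragment, the -ENOSYS path leaves runtime_error untouched; only a callback's result is recorded.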


* Re: [PATCH update] PM: Introduce core framework for run-time PM of I/O devices (rev. 15)
  2009-08-12 10:37   ` Magnus Damm
                       ` (2 preceding siblings ...)
  2009-08-12 20:13     ` Rafael J. Wysocki
@ 2009-08-12 20:13     ` Rafael J. Wysocki
  3 siblings, 0 replies; 90+ messages in thread
From: Rafael J. Wysocki @ 2009-08-12 20:13 UTC (permalink / raw)
  To: Magnus Damm
  Cc: Alan Stern, Greg KH, Pavel Machek, Len Brown, LKML,
	Linux-pm mailing list

On Wednesday 12 August 2009, Magnus Damm wrote:
> Hi Rafael,

Hi,

> On Mon, Aug 10, 2009 at 6:13 AM, Rafael J. Wysocki<rjw@sisk.pl> wrote:
> > From: Rafael J. Wysocki <rjw@sisk.pl>
> > Subject: PM: Introduce core framework for run-time PM of I/O devices (rev. 15)
> >
> > Introduce a core framework for run-time power management of I/O
> > devices.  Add device run-time PM fields to 'struct dev_pm_info'
> > and device run-time PM callbacks to 'struct dev_pm_ops'.  Introduce
> > a run-time PM workqueue and define some device run-time PM helper
> > functions at the core level.  Document all these things.
> >
> > Special thanks to Alan Stern for his help with the design and
> > multiple detailed reviews of the preceding versions of this patch
> > and to Magnus Damm for testing feedback.
> >
> > Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> 
> Looking good! I have a few nitpicks below, but from a functional
> perspective it's all good. I've tested v15 with platform device
> drivers for I2C, UIO and framebuffer. Before adding my "Acked-by"  I
> also want to test the V4L capture driver, but I need to wait a few
> days until I can get my hands on such a hardware platform.
> 
> Thanks for folding in and fixing up the debug patch. I was able to
> drop most remaining patches thanks to feedback from Alan. So the only
> needed patch apart from this one (and the ones in your linux-next
> branch) is the one in this micro-series: "PM: Runtime PM v15 for
> Platform Devices 20090812".
> 
> > --- linux-2.6.orig/include/linux/pm.h
> > +++ linux-2.6/include/linux/pm.h
> [..]
> >  struct dev_pm_info {
> >        pm_message_t            power_state;
> > -       unsigned                can_wakeup:1;
> > -       unsigned                should_wakeup:1;
> > +       unsigned int            can_wakeup:1;
> > +       unsigned int            should_wakeup:1;
> >        enum dpm_state          status;         /* Owned by the PM core */
> > -#ifdef CONFIG_PM_SLEEP
> > +#ifdef CONFIG_PM_SLEEP
> >        struct list_head        entry;
> >  #endif
> > +#ifdef CONFIG_PM_RUNTIME
> > +       struct timer_list       suspend_timer;
> > +       unsigned long           timer_expires;
> > +       struct work_struct      work;
> > +       wait_queue_head_t       wait_queue;
> > +       spinlock_t              lock;
> > +       atomic_t                usage_count;
> > +       atomic_t                child_count;
> 
> I suppose child_count has to be atomic?

I'd say so, it's modified in a few places without locking.
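Why a counter that is modified without a lock has to be atomic can be sketched with a userspace analogue. This is a minimal sketch, assuming C11 <stdatomic.h> and POSIX threads in place of the kernel's atomic_t with atomic_inc()/atomic_dec(); child_count, child_activity() and run_children() are hypothetical names. Several threads move the counter up and down with no lock held; because each update is an atomic read-modify-write, none is lost and the count always returns to zero, whereas a plain int updated with ++/-- could lose updates.

```c
#include <pthread.h>
#include <stdatomic.h>

#define NTHREADS 4
#define ITERS 100000

/* Hypothetical userspace stand-in for dev->power.child_count. */
static atomic_int child_count;

/* Each thread models children becoming active and suspending again. */
static void *child_activity(void *arg)
{
	(void)arg;
	for (int i = 0; i < ITERS; i++) {
		atomic_fetch_add(&child_count, 1);	/* child resumed */
		atomic_fetch_sub(&child_count, 1);	/* child suspended */
	}
	return NULL;
}

/* Runs the threads concurrently, without any lock, and returns the final count. */
int run_children(void)
{
	pthread_t t[NTHREADS];
	int created = 0;

	atomic_store(&child_count, 0);
	for (; created < NTHREADS; created++)
		if (pthread_create(&t[created], NULL, child_activity, NULL) != 0)
			break;
	for (int i = 0; i < created; i++)
		pthread_join(t[i], NULL);
	return atomic_load(&child_count);
}
```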

> > --- /dev/null
> > +++ linux-2.6/drivers/base/power/runtime.c
> [...]
> > +int __pm_runtime_suspend(struct device *dev, bool from_wq)
> > +       __releases(&dev->power.lock) __acquires(&dev->power.lock)
> [...]
> > +       if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_suspend) {
> > +               spin_unlock_irq(&dev->power.lock);
> > +
> > +               retval = dev->bus->pm->runtime_suspend(dev);
> > +
> > +               spin_lock_irq(&dev->power.lock);
> > +               dev->power.runtime_error = retval;
> > +       } else {
> > +               retval = -ENOSYS;
> > +       }
> 
> Nit: { and } above do not follow the regular coding style.

Well, you've got a very good answer to this from Alan. ;-)

Thanks,
Rafael

* [RFC] PCI: Runtime power management
  2009-08-09 21:13 ` Rafael J. Wysocki
                     ` (2 preceding siblings ...)
  2009-08-13  0:29   ` [RFC] PCI: Runtime power management Matthew Garrett
@ 2009-08-13  0:29   ` Matthew Garrett
  2009-08-13  0:35     ` [RFC] usb: Add support for runtime power management of the hcd Matthew Garrett
                       ` (7 more replies)
  2009-08-13 20:56   ` [PATCH update 2x] PM: Introduce core framework for run-time PM of I/O devices (rev. 16) Rafael J. Wysocki
  2009-08-13 20:56   ` [PATCH update 2x] PM: Introduce core framework for run-time PM of I/O devices (rev. 16) Rafael J. Wysocki
  5 siblings, 8 replies; 90+ messages in thread
From: Matthew Garrett @ 2009-08-13  0:29 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Alan Stern, Greg KH, LKML, Linux-pm mailing list, linux-pci, linux-usb

I got a fixed BIOS from Dell and have been able to get this working now.
It seems entirely happy with USB, but I'd like some sanity checks on
whether I'm doing this correctly. There are certainly a couple of quirks
related to setting the ACPI GPE type that would need a little bit of
work in the ACPI layer, and it breaks ACPI-mediated PCI hotplug, though
that's easy enough to fix by just calling into the hotplug code from the
core notifier.

This patch builds on top of Rafael's work on systemwide runtime power
management. It supports suspending and resuming PCI devices at runtime,
enabling platform wakeup events that allow the devices to automatically
resume when appropriate. It currently requires platform support, but PCIe
setups could be supported natively once native PCIe PME code has been added
to the kernel.
---
 drivers/pci/pci-acpi.c   |   55 +++++++++++++++++++++++++
 drivers/pci/pci-driver.c |  100 ++++++++++++++++++++++++++++++++++++++++++++++
 drivers/pci/pci.c        |   87 ++++++++++++++++++++++++++++++++++++++++
 drivers/pci/pci.h        |    3 +
 include/linux/pci.h      |    3 +
 5 files changed, 248 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
index ea15b05..a98a777 100644
--- a/drivers/pci/pci-acpi.c
+++ b/drivers/pci/pci-acpi.c
@@ -12,6 +12,7 @@
 #include <linux/pci.h>
 #include <linux/module.h>
 #include <linux/pci-aspm.h>
+#include <linux/pm_runtime.h>
 #include <acpi/acpi.h>
 #include <acpi/acpi_bus.h>
 
@@ -120,14 +121,62 @@ static int acpi_pci_sleep_wake(struct pci_dev *dev, bool enable)
 	return error;
 }
 
+static int acpi_pci_runtime_wake(struct pci_dev *dev, bool enable)
+{
+	acpi_status status;
+	acpi_handle handle = DEVICE_ACPI_HANDLE(&dev->dev);
+	struct acpi_device *acpi_dev;
+
+	if (!handle)
+		return -ENODEV;
+
+	status = acpi_bus_get_device(handle, &acpi_dev);
+	if (ACPI_FAILURE(status))
+		return -ENODEV;
+
+	if (enable) {
+		acpi_set_gpe_type(acpi_dev->wakeup.gpe_device,
+				  acpi_dev->wakeup.gpe_number,
+				  ACPI_GPE_TYPE_WAKE_RUN);
+		acpi_enable_gpe(acpi_dev->wakeup.gpe_device,
+				acpi_dev->wakeup.gpe_number);
+	} else {
+		acpi_set_gpe_type(acpi_dev->wakeup.gpe_device,
+				  acpi_dev->wakeup.gpe_number,
+				  ACPI_GPE_TYPE_WAKE);
+		acpi_disable_gpe(acpi_dev->wakeup.gpe_device,
+				 acpi_dev->wakeup.gpe_number);
+	}
+	return 0;
+}
+
+
 static struct pci_platform_pm_ops acpi_pci_platform_pm = {
 	.is_manageable = acpi_pci_power_manageable,
 	.set_state = acpi_pci_set_power_state,
 	.choose_state = acpi_pci_choose_state,
 	.can_wakeup = acpi_pci_can_wakeup,
 	.sleep_wake = acpi_pci_sleep_wake,
+	.runtime_wake = acpi_pci_runtime_wake,
 };
 
+static void pci_device_notify(acpi_handle handle, u32 event, void *data)
+{
+	struct device *dev = data;
+
+	if (event == ACPI_NOTIFY_DEVICE_WAKE)
+		pm_runtime_resume(dev);
+}
+
+static void pci_root_bridge_notify(acpi_handle handle, u32 event, void *data)
+{
+	struct device *dev = data;
+	struct pci_dev *pci_dev = to_pci_dev(dev);
+
+	if (event == ACPI_NOTIFY_DEVICE_WAKE)
+		pci_bus_pme_event(pci_dev);
+}
+
 /* ACPI bus type */
 static int acpi_pci_find_device(struct device *dev, acpi_handle *handle)
 {
@@ -140,6 +189,9 @@ static int acpi_pci_find_device(struct device *dev, acpi_handle *handle)
 	*handle = acpi_get_child(DEVICE_ACPI_HANDLE(dev->parent), addr);
 	if (!*handle)
 		return -ENODEV;
+
+	acpi_install_notify_handler(*handle, ACPI_SYSTEM_NOTIFY,
+				    pci_device_notify, dev);
 	return 0;
 }
 
@@ -158,6 +210,9 @@ static int acpi_pci_find_root_bridge(struct device *dev, acpi_handle *handle)
 	*handle = acpi_get_pci_rootbridge_handle(seg, bus);
 	if (!*handle)
 		return -ENODEV;
+
+	acpi_install_notify_handler(*handle, ACPI_SYSTEM_NOTIFY,
+				    pci_root_bridge_notify, dev);
 	return 0;
 }
 
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index d76c4c8..1f605d8 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -11,12 +11,14 @@
 #include <linux/pci.h>
 #include <linux/module.h>
 #include <linux/init.h>
+#include <linux/interrupt.h>
 #include <linux/device.h>
 #include <linux/mempolicy.h>
 #include <linux/string.h>
 #include <linux/slab.h>
 #include <linux/sched.h>
 #include <linux/cpu.h>
+#include <linux/pm_runtime.h>
 #include "pci.h"
 
 /*
@@ -910,6 +912,101 @@ static int pci_pm_restore(struct device *dev)
 
 #endif /* !CONFIG_HIBERNATION */
 
+#ifdef CONFIG_PM_RUNTIME
+
+static int pci_pm_runtime_suspend(struct device *dev)
+{
+	struct pci_dev *pci_dev = to_pci_dev(dev);
+	struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
+	int error;
+
+	device_set_wakeup_enable(dev, 1);
+	error = pci_enable_runtime_wake(pci_dev, true);
+
+	if (error)
+		return -EBUSY;
+
+	if (pm && pm->runtime_suspend)
+		error = pm->runtime_suspend(dev);
+
+	if (error)
+		goto out;
+
+	error = pci_pm_suspend(dev);
+
+	if (error)
+		goto resume;
+
+	disable_irq(pci_dev->irq);
+	error = pci_pm_suspend_noirq(dev);
+	enable_irq(pci_dev->irq);
+
+	if (error)
+		goto resume_noirq;
+
+	return 0;
+
+resume_noirq:
+	disable_irq(pci_dev->irq);
+	pci_pm_resume_noirq(dev);
+	enable_irq(pci_dev->irq);
+resume:
+	pci_pm_resume(dev);
+out:
+	pci_enable_runtime_wake(pci_dev, false);
+	return error;
+}
+
+static int pci_pm_runtime_resume(struct device *dev)
+{
+	struct pci_dev *pci_dev = to_pci_dev(dev);
+	struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
+	int error = 0;
+
+	disable_irq(pci_dev->irq);
+	error = pci_pm_resume_noirq(dev);
+	enable_irq(pci_dev->irq);
+
+	if (error)
+		return error;
+
+	error = pci_pm_resume(dev);
+
+	if (error)
+		return error;
+
+	if (pm && pm->runtime_resume)
+		error = pm->runtime_resume(dev);
+
+	if (error)
+		return error;
+
+	error = pci_enable_runtime_wake(pci_dev, false);
+
+	if (error)
+		return error;
+
+	return 0;
+}
+
+static void pci_pm_runtime_idle(struct device *dev)
+{
+	struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
+
+	if (pm && pm->runtime_idle)
+		pm->runtime_idle(dev);
+
+	pm_schedule_suspend(dev, 0);
+}
+
+#else /* !CONFIG_PM_RUNTIME */
+
+#define pci_pm_runtime_suspend	NULL
+#define pci_pm_runtime_resume	NULL
+#define pci_pm_runtime_idle	NULL
+
+#endif
+
 struct dev_pm_ops pci_dev_pm_ops = {
 	.prepare = pci_pm_prepare,
 	.complete = pci_pm_complete,
@@ -925,6 +1022,9 @@ struct dev_pm_ops pci_dev_pm_ops = {
 	.thaw_noirq = pci_pm_thaw_noirq,
 	.poweroff_noirq = pci_pm_poweroff_noirq,
 	.restore_noirq = pci_pm_restore_noirq,
+	.runtime_suspend = pci_pm_runtime_suspend,
+	.runtime_resume = pci_pm_runtime_resume,
+	.runtime_idle = pci_pm_runtime_idle,
 };
 
 #define PCI_PM_OPS_PTR	(&pci_dev_pm_ops)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index dbd0f94..ab3a116 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -18,6 +18,7 @@
 #include <linux/log2.h>
 #include <linux/pci-aspm.h>
 #include <linux/pm_wakeup.h>
+#include <linux/pm_runtime.h>
 #include <linux/interrupt.h>
 #include <asm/dma.h>	/* isa_dma_bridge_buggy */
 #include <linux/device.h>
@@ -428,6 +429,12 @@ static inline int platform_pci_sleep_wake(struct pci_dev *dev, bool enable)
 			pci_platform_pm->sleep_wake(dev, enable) : -ENODEV;
 }
 
+static inline int platform_pci_runtime_wake(struct pci_dev *dev, bool enable)
+{
+	return pci_platform_pm ?
+			pci_platform_pm->runtime_wake(dev, enable) : -ENODEV;
+}
+
 /**
  * pci_raw_set_power_state - Use PCI PM registers to set the power state of
  *                           given PCI device
@@ -1239,6 +1246,38 @@ int pci_enable_wake(struct pci_dev *dev, pci_power_t state, bool enable)
 }
 
 /**
+ * pci_enable_runtime_wake - enable PCI device as runtime wakeup event source
+ * @dev: PCI device affected
+ * @enable: True to enable event generation; false to disable
+ *
+ * This enables the device as a runtime wakeup event source, or disables it.
+ * This typically requires platform support.
+ *
+ * RETURN VALUE:
+ * 0 is returned on success
+ * -EINVAL is returned if device is not supposed to wake up the system
+ * -ENODEV is returned if platform cannot support runtime PM on the device
+ */
+int pci_enable_runtime_wake(struct pci_dev *dev, bool enable)
+{
+	int error = 0;
+	bool pme_done = false;
+
+	if (!enable && platform_pci_can_wakeup(dev))
+		error = platform_pci_runtime_wake(dev, false);
+
+	if (!enable || pci_pme_capable(dev, PCI_D3hot)) {
+		pci_pme_active(dev, enable);
+		pme_done = true;
+	}
+
+	if (enable && platform_pci_can_wakeup(dev))
+		error = platform_pci_runtime_wake(dev, true);
+
+	return pme_done ? 0 : error;
+}
+
+/**
  * pci_wake_from_d3 - enable/disable device to wake up from D3_hot or D3_cold
  * @dev: PCI device to prepare
  * @enable: True to enable wake-up event generation; false to disable
@@ -1346,6 +1385,54 @@ int pci_back_from_sleep(struct pci_dev *dev)
 }
 
 /**
+ * pci_dev_pme_event - check whether a device has a pending PME
+ *
+ * @dev: Device to handle.
+ */
+
+int pci_dev_pme_event(struct pci_dev *dev)
+{
+	u16 pmcsr;
+
+	if (!dev->pm_cap)
+		return -ENODEV;
+
+	pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
+
+	if (pmcsr & PCI_PM_CTRL_PME_STATUS) {
+		pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr);
+		pm_runtime_get(&dev->dev);
+		return 0;
+	}
+
+	return -ENODEV;
+}
+
+/**
+ * pci_bus_pme_event - search for subordinate devices with a pending
+ *		   PME and handle them
+ *
+ * @dev: Parent device to handle
+ */
+int pci_bus_pme_event(struct pci_dev *dev)
+{
+	struct pci_bus *bus;
+	struct pci_dev *pdev;
+
+	if (pci_is_root_bus(dev->bus))
+		bus = dev->bus;
+	else if (dev->subordinate)
+		bus = dev->subordinate;
+	else
+		return -ENODEV;
+
+	list_for_each_entry(pdev, &bus->devices, bus_list)
+		pci_dev_pme_event(pdev);
+
+	return 0;
+}
+
+/**
  * pci_pm_init - Initialize PM functions of given PCI device
  * @dev: PCI device to handle.
  */
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index f73bcbe..a81aff2 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -34,6 +34,8 @@ extern int pci_mmap_fits(struct pci_dev *pdev, int resno,
  *
  * @sleep_wake: enables/disables the system wake up capability of given device
  *
+ * @runtime_wake: enables/disables the runtime wakeup capability of given device
+ *
  * If given platform is generally capable of power managing PCI devices, all of
  * these callbacks are mandatory.
  */
@@ -43,6 +45,7 @@ struct pci_platform_pm_ops {
 	pci_power_t (*choose_state)(struct pci_dev *dev);
 	bool (*can_wakeup)(struct pci_dev *dev);
 	int (*sleep_wake)(struct pci_dev *dev, bool enable);
+	int (*runtime_wake)(struct pci_dev *dev, bool enable);
 };
 
 extern int pci_set_platform_pm(struct pci_platform_pm_ops *ops);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 115fb7b..8a3fea0 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -734,10 +734,13 @@ pci_power_t pci_choose_state(struct pci_dev *dev, pm_message_t state);
 bool pci_pme_capable(struct pci_dev *dev, pci_power_t state);
 void pci_pme_active(struct pci_dev *dev, bool enable);
 int pci_enable_wake(struct pci_dev *dev, pci_power_t state, bool enable);
+int pci_enable_runtime_wake(struct pci_dev *dev, bool enable);
 int pci_wake_from_d3(struct pci_dev *dev, bool enable);
 pci_power_t pci_target_state(struct pci_dev *dev);
 int pci_prepare_to_sleep(struct pci_dev *dev);
 int pci_back_from_sleep(struct pci_dev *dev);
+int pci_dev_pme_event(struct pci_dev *dev);
+int pci_bus_pme_event(struct pci_dev *dev);
 
 /* Functions for PCI Hotplug drivers to use */
 int pci_bus_find_capability(struct pci_bus *bus, unsigned int devfn, int cap);

-- 
Matthew Garrett | mjg59@srcf.ucam.org
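pci_dev_pme_event() clears a pending PME simply by writing back the PMCSR value it has just read. That works because PME_Status is a write-one-to-clear bit, while writing the remaining bits back unchanged leaves them as they were. Below is a minimal model of that behaviour, assuming a simplified register in which every bit other than PME_Status is plain read/write (the real PMCSR also has read-only fields). struct fake_pmcsr, pmcsr_read(), pmcsr_write() and pme_pending_and_clear() are hypothetical names; the two bit values match PCI_PM_CTRL_PME_ENABLE and PCI_PM_CTRL_PME_STATUS in <linux/pci_regs.h>.

```c
#include <stdint.h>

/* Bit values as in <linux/pci_regs.h>. */
#define PCI_PM_CTRL_PME_ENABLE	0x0100
#define PCI_PM_CTRL_PME_STATUS	0x8000

/* Hypothetical model of the PM control/status register. */
struct fake_pmcsr {
	uint16_t value;
};

static uint16_t pmcsr_read(const struct fake_pmcsr *r)
{
	return r->value;
}

/* PME_Status is RW1C: writing 1 clears it, writing 0 leaves it alone. */
static void pmcsr_write(struct fake_pmcsr *r, uint16_t val)
{
	uint16_t status = r->value & PCI_PM_CTRL_PME_STATUS;

	if (val & PCI_PM_CTRL_PME_STATUS)
		status = 0;
	/* All other bits are treated as plain read/write in this model. */
	r->value = (uint16_t)((val & ~PCI_PM_CTRL_PME_STATUS) | status);
}

/* Mirrors the read / test / write-back sequence of pci_dev_pme_event(). */
static int pme_pending_and_clear(struct fake_pmcsr *r)
{
	uint16_t pmcsr = pmcsr_read(r);

	if (pmcsr & PCI_PM_CTRL_PME_STATUS) {
		pmcsr_write(r, pmcsr);	/* writing the 1 back clears the status */
		return 1;
	}
	return 0;
}
```

Starting from PME_En set, PME_Status set and power state D3hot (0x3), one call reports the event and leaves only PME_En and the power-state bits set; a second call finds nothing pending.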

* [RFC] PCI: Runtime power management
  2009-08-09 21:13 ` Rafael J. Wysocki
  2009-08-12 10:37   ` Magnus Damm
  2009-08-12 10:37   ` Magnus Damm
@ 2009-08-13  0:29   ` Matthew Garrett
  2009-08-13  0:29   ` Matthew Garrett
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 90+ messages in thread
From: Matthew Garrett @ 2009-08-13  0:29 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-usb, linux-pci, Greg KH, LKML, Linux-pm mailing list

I got a fixed BIOS from Dell and have been able to get this working now.
It seems entirely happy with USB, but I'd like some sanity checks on
whether I'm doing this correctly. There are certainly a couple of quirks
related to setting the ACPI GPE type that would need a little bit of
work in the ACPI layer, and it breaks ACPI-mediated PCI hotplug, though
that's easy enough to fix by just calling into the hotplug code from the
core notifier.

This patch builds on top of Rafael's work on systemwide runtime power
management. It supports suspending and resuming PCI devices at runtime,
enabling platform wakeup events that allow the devices to automatically
resume when appropriate. It currently requires platform support, but PCIe
setups could be supported natively once native PCIe PME code has been added
to the kernel.
---
 drivers/pci/pci-acpi.c   |   55 +++++++++++++++++++++++++
 drivers/pci/pci-driver.c |  100 ++++++++++++++++++++++++++++++++++++++++++++++
 drivers/pci/pci.c        |   87 ++++++++++++++++++++++++++++++++++++++++
 drivers/pci/pci.h        |    3 +
 include/linux/pci.h      |    3 +
 5 files changed, 248 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
index ea15b05..a98a777 100644
--- a/drivers/pci/pci-acpi.c
+++ b/drivers/pci/pci-acpi.c
@@ -12,6 +12,7 @@
 #include <linux/pci.h>
 #include <linux/module.h>
 #include <linux/pci-aspm.h>
+#include <linux/pm_runtime.h>
 #include <acpi/acpi.h>
 #include <acpi/acpi_bus.h>
 
@@ -120,14 +121,62 @@ static int acpi_pci_sleep_wake(struct pci_dev *dev, bool enable)
 	return error;
 }
 
+static int acpi_pci_runtime_wake(struct pci_dev *dev, bool enable)
+{
+	acpi_status status;
+	acpi_handle handle = DEVICE_ACPI_HANDLE(&dev->dev);
+	struct acpi_device *acpi_dev;
+
+	if (!handle)
+		return -ENODEV;
+
+	status = acpi_bus_get_device(handle, &acpi_dev);
+	if (ACPI_FAILURE(status))
+		return -ENODEV;
+
+	if (enable) {
+		acpi_set_gpe_type(acpi_dev->wakeup.gpe_device,
+				  acpi_dev->wakeup.gpe_number,
+				  ACPI_GPE_TYPE_WAKE_RUN);
+		acpi_enable_gpe(acpi_dev->wakeup.gpe_device,
+				acpi_dev->wakeup.gpe_number);
+	} else {
+		acpi_set_gpe_type(acpi_dev->wakeup.gpe_device,
+				  acpi_dev->wakeup.gpe_number,
+				  ACPI_GPE_TYPE_WAKE);
+		acpi_disable_gpe(acpi_dev->wakeup.gpe_device,
+				 acpi_dev->wakeup.gpe_number);
+	}
+	return 0;
+}
+
+
 static struct pci_platform_pm_ops acpi_pci_platform_pm = {
 	.is_manageable = acpi_pci_power_manageable,
 	.set_state = acpi_pci_set_power_state,
 	.choose_state = acpi_pci_choose_state,
 	.can_wakeup = acpi_pci_can_wakeup,
 	.sleep_wake = acpi_pci_sleep_wake,
+	.runtime_wake = acpi_pci_runtime_wake,
 };
 
+static void pci_device_notify(acpi_handle handle, u32 event, void *data)
+{
+	struct device *dev = data;
+
+	if (event == ACPI_NOTIFY_DEVICE_WAKE)
+		pm_runtime_resume(dev);
+}
+
+static void pci_root_bridge_notify(acpi_handle handle, u32 event, void *data)
+{
+	struct device *dev = data;
+	struct pci_dev *pci_dev = to_pci_dev(dev);
+
+	if (event == ACPI_NOTIFY_DEVICE_WAKE)
+		pci_bus_pme_event(pci_dev);
+}
+
 /* ACPI bus type */
 static int acpi_pci_find_device(struct device *dev, acpi_handle *handle)
 {
@@ -140,6 +189,9 @@ static int acpi_pci_find_device(struct device *dev, acpi_handle *handle)
 	*handle = acpi_get_child(DEVICE_ACPI_HANDLE(dev->parent), addr);
 	if (!*handle)
 		return -ENODEV;
+
+	acpi_install_notify_handler(*handle, ACPI_SYSTEM_NOTIFY,
+				    pci_device_notify, dev);
 	return 0;
 }
 
@@ -158,6 +210,9 @@ static int acpi_pci_find_root_bridge(struct device *dev, acpi_handle *handle)
 	*handle = acpi_get_pci_rootbridge_handle(seg, bus);
 	if (!*handle)
 		return -ENODEV;
+
+	acpi_install_notify_handler(*handle, ACPI_SYSTEM_NOTIFY,
+				    pci_root_bridge_notify, dev);
 	return 0;
 }
 
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index d76c4c8..1f605d8 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -11,12 +11,14 @@
 #include <linux/pci.h>
 #include <linux/module.h>
 #include <linux/init.h>
+#include <linux/interrupt.h>
 #include <linux/device.h>
 #include <linux/mempolicy.h>
 #include <linux/string.h>
 #include <linux/slab.h>
 #include <linux/sched.h>
 #include <linux/cpu.h>
+#include <linux/pm_runtime.h>
 #include "pci.h"
 
 /*
@@ -910,6 +912,101 @@ static int pci_pm_restore(struct device *dev)
 
 #endif /* !CONFIG_HIBERNATION */
 
+#ifdef CONFIG_PM_RUNTIME
+
+static int pci_pm_runtime_suspend(struct device *dev)
+{
+	struct pci_dev *pci_dev = to_pci_dev(dev);
+	struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
+	int error;
+
+	device_set_wakeup_enable(dev, 1);
+	error = pci_enable_runtime_wake(pci_dev, true);
+
+	if (error)
+		return -EBUSY;
+
+	if (pm && pm->runtime_suspend)
+		error = pm->runtime_suspend(dev);
+
+	if (error)
+		goto out;
+
+	error = pci_pm_suspend(dev);
+
+	if (error)
+		goto resume;
+
+	disable_irq(pci_dev->irq);
+	error = pci_pm_suspend_noirq(dev);
+	enable_irq(pci_dev->irq);
+
+	if (error)
+		goto resume_noirq;
+
+	return 0;
+
+resume_noirq:
+	disable_irq(pci_dev->irq);
+	pci_pm_resume_noirq(dev);
+	enable_irq(pci_dev->irq);
+resume:
+	pci_pm_resume(dev);
+out:
+	pci_enable_runtime_wake(pci_dev, false);
+	return error;
+}
+
+static int pci_pm_runtime_resume(struct device *dev)
+{
+	struct pci_dev *pci_dev = to_pci_dev(dev);
+	struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
+	int error = 0;
+
+	disable_irq(pci_dev->irq);
+	error = pci_pm_resume_noirq(dev);
+	enable_irq(pci_dev->irq);
+
+	if (error)
+		return error;
+
+	error = pci_pm_resume(dev);
+
+	if (error)
+		return error;
+
+	if (pm->runtime_resume)
+		error = pm->runtime_resume(dev);
+
+	if (error)
+		return error;
+
+	error = pci_enable_runtime_wake(pci_dev, false);
+
+	if (error)
+		return error;
+
+	return 0;
+}
+
+static void pci_pm_runtime_idle(struct device *dev)
+{
+	struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
+
+	if (pm && pm->runtime_idle)
+		pm->runtime_idle(dev);
+
+	pm_schedule_suspend(dev, 0);
+}
+
+#else /* !CONFIG_PM_RUNTIME */
+
+#define pci_pm_runtime_suspend	NULL
+#define pci_pm_runtime_resume	NULL
+#define pci_pm_runtime_idle	NULL
+
+#endif
+
 struct dev_pm_ops pci_dev_pm_ops = {
 	.prepare = pci_pm_prepare,
 	.complete = pci_pm_complete,
@@ -925,6 +1022,9 @@ struct dev_pm_ops pci_dev_pm_ops = {
 	.thaw_noirq = pci_pm_thaw_noirq,
 	.poweroff_noirq = pci_pm_poweroff_noirq,
 	.restore_noirq = pci_pm_restore_noirq,
+	.runtime_suspend = pci_pm_runtime_suspend,
+	.runtime_resume = pci_pm_runtime_resume,
+	.runtime_idle = pci_pm_runtime_idle,
 };
 
 #define PCI_PM_OPS_PTR	(&pci_dev_pm_ops)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index dbd0f94..ab3a116 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -18,6 +18,7 @@
 #include <linux/log2.h>
 #include <linux/pci-aspm.h>
 #include <linux/pm_wakeup.h>
+#include <linux/pm_runtime.h>
 #include <linux/interrupt.h>
 #include <asm/dma.h>	/* isa_dma_bridge_buggy */
 #include <linux/device.h>
@@ -428,6 +429,12 @@ static inline int platform_pci_sleep_wake(struct pci_dev *dev, bool enable)
 			pci_platform_pm->sleep_wake(dev, enable) : -ENODEV;
 }
 
+static inline int platform_pci_runtime_wake(struct pci_dev *dev, bool enable)
+{
+	return pci_platform_pm ?
+			pci_platform_pm->runtime_wake(dev, enable) : -ENODEV;
+}
+
 /**
  * pci_raw_set_power_state - Use PCI PM registers to set the power state of
  *                           given PCI device
@@ -1239,6 +1246,38 @@ int pci_enable_wake(struct pci_dev *dev, pci_power_t state, bool enable)
 }
 
 /**
+ * pci_enable_runtime_wake - enable PCI device as runtime wakeup event source
+ * @dev: PCI device affected
+ * @enable: True to enable event generation; false to disable
+ *
+ * This enables the device as a runtime wakeup event source, or disables it.
+ * This typically requires platform support.
+ *
+ * RETURN VALUE:
+ * 0 is returned on success
+ * -EINVAL is returned if device is not supposed to wake up the system
+ * -ENODEV is returned if platform cannot support runtime PM on the device
+ */
+int pci_enable_runtime_wake(struct pci_dev *dev, bool enable)
+{
+	int error = 0;
+	bool pme_done = false;
+
+	if (!enable && platform_pci_can_wakeup(dev))
+		error = platform_pci_runtime_wake(dev, false);
+
+	if (!enable || pci_pme_capable(dev, PCI_D3hot)) {
+		pci_pme_active(dev, enable);
+		pme_done = true;
+	}
+
+	if (enable && platform_pci_can_wakeup(dev))
+		error = platform_pci_runtime_wake(dev, true);
+
+	return pme_done ? 0 : error;
+}
+
+/**
  * pci_wake_from_d3 - enable/disable device to wake up from D3_hot or D3_cold
  * @dev: PCI device to prepare
  * @enable: True to enable wake-up event generation; false to disable
@@ -1346,6 +1385,54 @@ int pci_back_from_sleep(struct pci_dev *dev)
 }
 
 /**
+ * pci_dev_pme_event - check if a device has a pending pme
+ *
+ * @dev: Device to handle.
+ */
+
+int pci_dev_pme_event(struct pci_dev *dev)
+{
+	u16 pmcsr;
+
+	if (!dev->pm_cap)
+		return -ENODEV;
+
+	pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
+
+	if (pmcsr & PCI_PM_CTRL_PME_STATUS) {
+		pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr);
+		pm_runtime_get(&dev->dev);
+		return 0;
+	}
+
+	return -ENODEV;
+}
+
+/**
+ * pci_bus_pme_event - search for subordinate devices with a pending
+ *		   pme and handle them
+ *
+ * @dev: Parent device to handle
+ */
+int pci_bus_pme_event(struct pci_dev *dev)
+{
+	struct pci_bus *bus;
+	struct pci_dev *pdev;
+
+	if (pci_is_root_bus(dev->bus))
+		bus = dev->bus;
+	else if (dev->subordinate)
+		bus = dev->subordinate;
+	else
+		return -ENODEV;
+
+	list_for_each_entry(pdev, &bus->devices, bus_list)
+		pci_dev_pme_event(pdev);
+
+	return 0;
+}
+
+/**
  * pci_pm_init - Initialize PM functions of given PCI device
  * @dev: PCI device to handle.
  */
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index f73bcbe..a81aff2 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -34,6 +34,8 @@ extern int pci_mmap_fits(struct pci_dev *pdev, int resno,
  *
  * @sleep_wake: enables/disables the system wake up capability of given device
  *
+ * @runtime_wake: enables/disables the runtime wakeup capability of given device
+ *
  * If given platform is generally capable of power managing PCI devices, all of
  * these callbacks are mandatory.
  */
@@ -43,6 +45,7 @@ struct pci_platform_pm_ops {
 	pci_power_t (*choose_state)(struct pci_dev *dev);
 	bool (*can_wakeup)(struct pci_dev *dev);
 	int (*sleep_wake)(struct pci_dev *dev, bool enable);
+	int (*runtime_wake)(struct pci_dev *dev, bool enable);
 };
 
 extern int pci_set_platform_pm(struct pci_platform_pm_ops *ops);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 115fb7b..8a3fea0 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -734,10 +734,13 @@ pci_power_t pci_choose_state(struct pci_dev *dev, pm_message_t state);
 bool pci_pme_capable(struct pci_dev *dev, pci_power_t state);
 void pci_pme_active(struct pci_dev *dev, bool enable);
 int pci_enable_wake(struct pci_dev *dev, pci_power_t state, bool enable);
+int pci_enable_runtime_wake(struct pci_dev *dev, bool enable);
 int pci_wake_from_d3(struct pci_dev *dev, bool enable);
 pci_power_t pci_target_state(struct pci_dev *dev);
 int pci_prepare_to_sleep(struct pci_dev *dev);
 int pci_back_from_sleep(struct pci_dev *dev);
+int pci_dev_pme_event(struct pci_dev *dev);
+int pci_bus_pme_event(struct pci_dev *dev);
 
 /* Functions for PCI Hotplug drivers to use */
 int pci_bus_find_capability(struct pci_bus *bus, unsigned int devfn, int cap);

-- 
Matthew Garrett | mjg59@srcf.ucam.org

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC] usb: Add support for runtime power management of the hcd
  2009-08-13  0:29   ` Matthew Garrett
@ 2009-08-13  0:35     ` Matthew Garrett
  2009-08-13 12:16       ` Oliver Neukum
                         ` (3 more replies)
  2009-08-13  0:35     ` Matthew Garrett
                       ` (6 subsequent siblings)
  7 siblings, 4 replies; 90+ messages in thread
From: Matthew Garrett @ 2009-08-13  0:35 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Alan Stern, Greg KH, LKML, Linux-pm mailing list, linux-pci, linux-usb

The PCI runtime power management code means that we can implement support
for powering down PCI host controllers. This patchset adds core support code
along with a new hcd flag (HCD_RUNTIME_PM) that allows hosts to opt in if
they support this functionality. Successfully tested with Intel EHCI and
UHCI, though the UHCI code may need to pay more attention to vendor-specific
features.

The power savings from this are measurable but not huge - it still seems 
like a decent optimisation. The main problem is that BIOS bugs on some 
Dell laptops will kill USB if this is used, so we either default to off 
or add some quirks to handle that case (I have some ideas in that 
respect).
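The opt-in check can be modelled in a few lines of userspace C. The flag values are copied from the hcd.h hunk below; struct model_hc_driver and model_hcd_runtime_suspend() are hypothetical reductions of struct hc_driver and hcd_pci_runtime_suspend(), keeping only the flags test that vetoes runtime suspend for controllers that did not opt in.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Flag values as defined in the hcd.h hunk of this patch. */
#define HCD_MEMORY	0x0001	/* HC regs use memory (else I/O) */
#define HCD_LOCAL_MEM	0x0002	/* HC needs local memory */
#define HCD_RUNTIME_PM	0x0004	/* HC supports handling runtime PM */
#define HCD_USB11	0x0010	/* USB 1.1 */
#define HCD_USB2	0x0020	/* USB 2.0 */

/* Hypothetical reduced model of struct hc_driver: only the flags. */
struct model_hc_driver {
	int flags;
};

/* Controllers whose driver did not set HCD_RUNTIME_PM veto suspend. */
int model_hcd_runtime_suspend(const struct model_hc_driver *drv)
{
	assert(drv != NULL);
	if (!(drv->flags & HCD_RUNTIME_PM))
		return -EINVAL;
	return 0;	/* opted in: allow runtime suspend */
}
```

With the flags from the ehci-pci.c and uhci-hcd.c hunks both controllers pass the check; a driver that keeps its old flags gets -EINVAL.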

---
 drivers/usb/core/hcd-pci.c  |   13 +++++++++++++
 drivers/usb/core/hcd.c      |    9 +++++++++
 drivers/usb/core/hcd.h      |    1 +
 drivers/usb/host/ehci-pci.c |    2 +-
 drivers/usb/host/uhci-hcd.c |    2 +-
 5 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/core/hcd-pci.c b/drivers/usb/core/hcd-pci.c
index 91f2885..e86324b 100644
--- a/drivers/usb/core/hcd-pci.c
+++ b/drivers/usb/core/hcd-pci.c
@@ -363,6 +363,18 @@ static int hcd_pci_restore(struct device *dev)
 	return resume_common(dev, true);
 }
 
+static int hcd_pci_runtime_suspend(struct device *dev)
+{
+	struct pci_dev *pci_dev = to_pci_dev(dev);
+	struct usb_hcd *hcd = pci_get_drvdata(pci_dev);
+	struct usb_device *rhdev = hcd->self.root_hub;
+
+	if (!(hcd->driver->flags & HCD_RUNTIME_PM))
+		return -EINVAL;
+
+	return 0;
+}
+
 struct dev_pm_ops usb_hcd_pci_pm_ops = {
 	.suspend	= hcd_pci_suspend,
 	.suspend_noirq	= hcd_pci_suspend_noirq,
@@ -376,6 +388,7 @@ struct dev_pm_ops usb_hcd_pci_pm_ops = {
 	.poweroff_noirq	= hcd_pci_suspend_noirq,
 	.restore_noirq	= hcd_pci_resume_noirq,
 	.restore	= hcd_pci_restore,
+	.runtime_suspend = hcd_pci_runtime_suspend,
 };
 EXPORT_SYMBOL_GPL(usb_hcd_pci_pm_ops);
 
diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c
index 95ccfa0..a8f8784 100644
--- a/drivers/usb/core/hcd.c
+++ b/drivers/usb/core/hcd.c
@@ -38,6 +38,7 @@
 #include <asm/unaligned.h>
 #include <linux/platform_device.h>
 #include <linux/workqueue.h>
+#include <linux/pm_runtime.h>
 
 #include <linux/usb.h>
 
@@ -1747,6 +1748,7 @@ int hcd_bus_suspend(struct usb_device *rhdev, pm_message_t msg)
 	if (status == 0) {
 		usb_set_device_state(rhdev, USB_STATE_SUSPENDED);
 		hcd->state = HC_STATE_SUSPENDED;
+		pm_runtime_put(hcd->self.controller);
 	} else {
 		hcd->state = old_state;
 		dev_dbg(&rhdev->dev, "bus %s fail, err %d\n",
@@ -1768,6 +1770,7 @@ int hcd_bus_resume(struct usb_device *rhdev, pm_message_t msg)
 	if (hcd->state == HC_STATE_RUNNING)
 		return 0;
 
+	pm_runtime_get_sync(hcd->self.controller);
 	hcd->state = HC_STATE_RESUMING;
 	status = hcd->driver->bus_resume(hcd);
 	if (status == 0) {
@@ -1781,6 +1784,7 @@ int hcd_bus_resume(struct usb_device *rhdev, pm_message_t msg)
 		hcd->state = old_state;
 		dev_dbg(&rhdev->dev, "bus %s fail, err %d\n",
 				"resume", status);
+		pm_runtime_put(hcd->self.controller);
 		if (status != -ESHUTDOWN)
 			usb_hc_died(hcd);
 	}
@@ -1968,6 +1972,9 @@ struct usb_hcd *usb_create_hcd (const struct hc_driver *driver,
 	INIT_WORK(&hcd->wakeup_work, hcd_resume_work);
 #endif
 
+	pm_runtime_enable(dev);
+	pm_runtime_get(dev);
+
 	hcd->driver = driver;
 	hcd->product_desc = (driver->product_desc) ? driver->product_desc :
 			"USB Host Controller";
@@ -1979,6 +1986,8 @@ static void hcd_release (struct kref *kref)
 {
 	struct usb_hcd *hcd = container_of (kref, struct usb_hcd, kref);
 
+	pm_runtime_put_sync(hcd->self.controller);
+
 	kfree(hcd);
 }
 
diff --git a/drivers/usb/core/hcd.h b/drivers/usb/core/hcd.h
index ec5c67e..4dc12a8 100644
--- a/drivers/usb/core/hcd.h
+++ b/drivers/usb/core/hcd.h
@@ -171,6 +171,7 @@ struct hc_driver {
 	int	flags;
 #define	HCD_MEMORY	0x0001		/* HC regs use memory (else I/O) */
 #define	HCD_LOCAL_MEM	0x0002		/* HC needs local memory */
+#define	HCD_RUNTIME_PM	0x0004		/* HC supports handling runtime PM */
 #define	HCD_USB11	0x0010		/* USB 1.1 */
 #define	HCD_USB2	0x0020		/* USB 2.0 */
 #define	HCD_USB3	0x0040		/* USB 3.0 */
diff --git a/drivers/usb/host/ehci-pci.c b/drivers/usb/host/ehci-pci.c
index c2f1b7d..9583621 100644
--- a/drivers/usb/host/ehci-pci.c
+++ b/drivers/usb/host/ehci-pci.c
@@ -368,7 +368,7 @@ static const struct hc_driver ehci_pci_hc_driver = {
 	 * generic hardware linkage
 	 */
 	.irq =			ehci_irq,
-	.flags =		HCD_MEMORY | HCD_USB2,
+	.flags =		HCD_MEMORY | HCD_USB2 | HCD_RUNTIME_PM,
 
 	/*
 	 * basic lifecycle operations
diff --git a/drivers/usb/host/uhci-hcd.c b/drivers/usb/host/uhci-hcd.c
index 274751b..36a3a4a 100644
--- a/drivers/usb/host/uhci-hcd.c
+++ b/drivers/usb/host/uhci-hcd.c
@@ -900,7 +900,7 @@ static const struct hc_driver uhci_driver = {
 
 	/* Generic hardware linkage */
 	.irq =			uhci_irq,
-	.flags =		HCD_USB11,
+	.flags =		HCD_USB11 | HCD_RUNTIME_PM,
 
 	/* Basic lifecycle operations */
 	.reset =		uhci_init,

-- 
Matthew Garrett | mjg59@srcf.ucam.org

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* Re: [linux-pm] [RFC] usb: Add support for runtime power management of the hcd
  2009-08-13  0:35     ` [RFC] usb: Add support for runtime power management of the hcd Matthew Garrett
  2009-08-13 12:16       ` Oliver Neukum
@ 2009-08-13 12:16       ` Oliver Neukum
  2009-08-13 12:30         ` Matthew Garrett
  2009-08-13 12:30         ` [linux-pm] " Matthew Garrett
  2009-08-13 15:22       ` Alan Stern
  2009-08-13 15:22       ` Alan Stern
  3 siblings, 2 replies; 90+ messages in thread
From: Oliver Neukum @ 2009-08-13 12:16 UTC (permalink / raw)
  To: linux-pm
  Cc: Matthew Garrett, Rafael J. Wysocki, linux-usb, linux-pci, Greg KH, LKML

On Thursday, 13 August 2009 02:35:44, Matthew Garrett wrote:

> The power savings from this are measurable but not huge - it still seems

How large?

> like a decent optimisation. The main problem is that BIOS bugs on some
> Dell laptops will kill USB if this is used, so we either default to off
> or add some quirks to handle that case (I have some ideas in that
> respect).

Your earlier failures don't look promising regarding BIOSes.
What do you have in mind?

> @@ -1968,6 +1972,9 @@ struct usb_hcd *usb_create_hcd (const struct hc_driver *driver,
>  	INIT_WORK(&hcd->wakeup_work, hcd_resume_work);
>  #endif
>
> +	pm_runtime_enable(dev);

So you don't get a reference from that?

> +	pm_runtime_get(dev);

What happens if you get a runtime suspend request in between? Is this a flaw
of the API?

	Regards
		Oliver


^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [linux-pm] [RFC] usb: Add support for runtime power management of the hcd
  2009-08-13 12:16       ` [linux-pm] " Oliver Neukum
  2009-08-13 12:30         ` Matthew Garrett
@ 2009-08-13 12:30         ` Matthew Garrett
  2009-08-13 14:26           ` Oliver Neukum
  2009-08-13 14:26           ` [linux-pm] " Oliver Neukum
  1 sibling, 2 replies; 90+ messages in thread
From: Matthew Garrett @ 2009-08-13 12:30 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: linux-pm, Rafael J. Wysocki, linux-usb, linux-pci, Greg KH, LKML

On Thu, Aug 13, 2009 at 02:16:41PM +0200, Oliver Neukum wrote:
> On Thursday, 13 August 2009 02:35:44, Matthew Garrett wrote:
> 
> > The power savings from this are measurable but not huge - it still seems
> 
> How large?

About 0.2 W on an ICH9 system.

> > like a decent optimisation. The main problem is that BIOS bugs on some
> > Dell laptops will kill USB if this is used, so we either default to off
> > or add some quirks to handle that case (I have some ideas in that
> > respect).
> 
> Your earlier failures don't look promising regarding BIOSes.
> What do you have in mind?

They range from pragmatic to ugly. We could blacklist all Dells, though 
I'm trying to find out if there's a BIOS date that guarantees the system 
is fixed. Alternatively, it's a single-line bug in the DSDT - we could 
implement some kind of fixup in the ACPI parsing code. I find the latter 
interesting but possibly too hideous to live :)
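The BIOS-date option could look roughly like the sketch below. Everything in it is an assumption for illustration: the helper names are hypothetical, the vendor string "Dell Inc." is only an example of a DMI system-vendor match, and the cutoff date is a caller-supplied placeholder, since no known-fixed date has been identified yet. It assumes DMI-style mm/dd/yyyy dates and treats unparseable dates as affected, so that a malformed BIOS string fails safe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Convert an "mm/dd/yyyy" BIOS date into yyyymmdd so dates compare as
 * plain integers. Returns -1 if the string does not parse. */
long bios_date_key(const char *date)
{
	int m, d, y;

	if (!date || sscanf(date, "%d/%d/%d", &m, &d, &y) != 3)
		return -1;
	if (m < 1 || m > 12 || d < 1 || d > 31)
		return -1;
	return (long)y * 10000 + m * 100 + d;
}

/* Hypothetical policy: disable HCD runtime PM on the affected vendor
 * unless the BIOS is at least as new as a known-fixed cutoff date. */
bool hcd_runtime_pm_blacklisted(const char *vendor, const char *bios_date,
				const char *fixed_cutoff)
{
	long have, cutoff;

	assert(vendor != NULL && fixed_cutoff != NULL);
	if (strcmp(vendor, "Dell Inc.") != 0)
		return false;		/* other vendors are not affected */
	have = bios_date_key(bios_date);
	cutoff = bios_date_key(fixed_cutoff);
	if (have < 0 || cutoff < 0)
		return true;		/* unknown date: assume affected */
	return have < cutoff;
}
```

Blacklisting all Dells is the same function with the date comparison removed; the date cutoff only helps once a BIOS release that is known to be fixed can be identified.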

> > @@ -1968,6 +1972,9 @@ struct usb_hcd *usb_create_hcd (const struct hc_driver *driver,
> >  	INIT_WORK(&hcd->wakeup_work, hcd_resume_work);
> >  #endif
> >
> > +	pm_runtime_enable(dev);
> 
> So you don't get a reference from that?

No, but...

> > +	pm_runtime_get(dev);
> 
> What happens if you get a runtime suspend request in between? Is this a flaw
> of the API?

I suspect that just swapping the order of those two lines would be fine.
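Why swapping the two lines helps can be shown with a tiny userspace model of the runtime-PM state. It assumes that a suspend request is honoured only while runtime PM is enabled and the usage count is zero; struct rpm_model and the rpm_*/probe_* functions are hypothetical, and the model ignores any resume that the real get helpers may queue. Taking the reference before enabling leaves no window in which a racing request can suspend the controller.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the runtime-PM state that matters here;
 * not the kernel's struct dev_pm_info. */
struct rpm_model {
	bool enabled;		/* pm_runtime_enable() has been called */
	int usage_count;	/* references held via get/put */
	bool suspended;
};

/* Honour a suspend request only if enabled and unreferenced. */
void rpm_suspend_request(struct rpm_model *s)
{
	if (s->enabled && s->usage_count == 0)
		s->suspended = true;
}

void rpm_enable(struct rpm_model *s)
{
	s->enabled = true;
}

void rpm_get(struct rpm_model *s)
{
	s->usage_count++;
	assert(s->usage_count > 0);
}

/* Order used in the RFC: a request landing between the calls wins. */
bool probe_enable_then_get(void)
{
	struct rpm_model s = { 0 };

	rpm_enable(&s);
	rpm_suspend_request(&s);	/* racing request */
	rpm_get(&s);
	return s.suspended;
}

/* Swapped order: the reference blocks both the racing request and
 * any request that arrives after enabling. */
bool probe_get_then_enable(void)
{
	struct rpm_model s = { 0 };

	rpm_get(&s);
	rpm_suspend_request(&s);	/* racing request */
	rpm_enable(&s);
	rpm_suspend_request(&s);	/* later request */
	return s.suspended;
}
```

probe_enable_then_get() ends with the controller suspended while probe_get_then_enable() does not.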

-- 
Matthew Garrett | mjg59@srcf.ucam.org

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [linux-pm] [RFC] usb: Add support for runtime power management of the hcd
  2009-08-13 12:30         ` [linux-pm] " Matthew Garrett
  2009-08-13 14:26           ` Oliver Neukum
@ 2009-08-13 14:26           ` Oliver Neukum
  2009-08-13 21:42             ` Matthew Garrett
  2009-08-13 21:42             ` Matthew Garrett
  1 sibling, 2 replies; 90+ messages in thread
From: Oliver Neukum @ 2009-08-13 14:26 UTC (permalink / raw)
  To: Matthew Garrett
  Cc: linux-pm, Rafael J. Wysocki, linux-usb, linux-pci, Greg KH, LKML

On Thursday, 13 August 2009 14:30:34, Matthew Garrett wrote:
> > Your earlier failures don't look promising regarding BIOSes.
> > What do you have in mind?
>
> They range from pragmatic to ugly. We could blacklist all Dells, though
> I'm trying to find out if there's a BIOS date that guarantees the system
> is fixed. Alternatively, it's a single-line bug in the DSDT - we could
> implement some kind of fixup in the ACPI parsing code. I find the latter
> interesting but possibly too hideous to live :)

Is there any indication that only those BIOSes are affected?

	Regards
		Oliver


^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC] PCI: Runtime power management
  2009-08-13  0:29   ` Matthew Garrett
  2009-08-13  0:35     ` [RFC] usb: Add support for runtime power management of the hcd Matthew Garrett
  2009-08-13  0:35     ` Matthew Garrett
@ 2009-08-13 15:17     ` Alan Stern
  2009-08-13 21:47       ` Matthew Garrett
  2009-08-13 21:47       ` Matthew Garrett
  2009-08-13 15:17     ` Alan Stern
                       ` (4 subsequent siblings)
  7 siblings, 2 replies; 90+ messages in thread
From: Alan Stern @ 2009-08-13 15:17 UTC (permalink / raw)
  To: Matthew Garrett
  Cc: Rafael J. Wysocki, Greg KH, LKML, Linux-pm mailing list,
	linux-pci, linux-usb

On Thu, 13 Aug 2009, Matthew Garrett wrote:

> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c

> +#ifdef CONFIG_PM_RUNTIME
> +
> +static int pci_pm_runtime_suspend(struct device *dev)
> +{
> +	struct pci_dev *pci_dev = to_pci_dev(dev);
> +	struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
> +	int error;
> +
> +	device_set_wakeup_enable(dev, 1);

This is a userspace policy parameter.  Kernel code should not alter it.
Instead you should test device_may_wakeup.
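A sketch of the suggested change, as a userspace model in which the kernel reads the wakeup policy instead of writing it. struct wake_model and the model_* helpers are hypothetical; model_may_wakeup() mirrors the assumption that device_may_wakeup() combines the device's wakeup capability with the user-space policy flag. Refusing runtime suspend with -EBUSY when wakeup is not permitted is one possible policy, not necessarily what the PCI code should end up doing.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model: user space owns should_wakeup (the sysfs
 * wakeup policy); kernel code only reads it. */
struct wake_model {
	bool can_wakeup;	/* hardware/platform capability */
	bool should_wakeup;	/* user-space policy */
	bool wake_armed;
};

static bool model_may_wakeup(const struct wake_model *w)
{
	return w->can_wakeup && w->should_wakeup;
}

/* Test the policy instead of overwriting it; refuse runtime suspend
 * when the device has no permitted way to wake itself. */
int model_runtime_suspend_prologue(struct wake_model *w)
{
	bool policy_before = w->should_wakeup;
	int ret = 0;

	if (!model_may_wakeup(w))
		ret = -EBUSY;
	else
		w->wake_armed = true;

	assert(w->should_wakeup == policy_before);	/* policy untouched */
	return ret;
}
```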

> +	error = pci_enable_runtime_wake(pci_dev, true);
> +
> +	if (error)
> +		return -EBUSY;
> +
> +	if (pm && pm->runtime_suspend)
> +		error = pm->runtime_suspend(dev);
> +
> +	if (error)
> +		goto out;
> +
> +	error = pci_pm_suspend(dev);
> +
> +	if (error)
> +		goto resume;
> +
> +	disable_irq(pci_dev->irq);
> +	error = pci_pm_suspend_noirq(dev);
> +	enable_irq(pci_dev->irq);
> +
> +	if (error)
> +		goto resume_noirq;
> +
> +	return 0;
> +
> +resume_noirq:
> +	disable_irq(pci_dev->irq);
> +	pci_pm_resume_noirq(dev);
> +	enable_irq(pci_dev->irq);
> +resume:
> +	pci_pm_resume(dev);
> +out:
> +	pci_enable_runtime_wake(pci_dev, false);
> +	return error;
> +}

The goto statements and unwinding code don't match up.
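One reading of this comment: each error label should undo only the steps that already succeeded, in reverse order. In the patch, a failing pci_pm_suspend() jumps to a label that calls pci_pm_resume(), undoing the step that just failed, and no error path calls pm->runtime_resume after the driver's runtime_suspend succeeded. The userspace sketch below shows the conventional kernel unwinding pattern; struct trace, append(), step() and model_pci_runtime_suspend() are hypothetical, and each step only records a letter (upper case for do, lower case for undo) so the unwinding order can be checked.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical trace: fail_at picks which step (1..4) fails, 0 = none. */
struct trace {
	char log[32];
	int fail_at;
};

static void append(struct trace *t, const char *tag)
{
	assert(strlen(t->log) + strlen(tag) < sizeof(t->log));
	strcat(t->log, tag);
}

static int step(struct trace *t, int n, const char *tag)
{
	append(t, tag);
	return n == t->fail_at ? -1 : 0;	/* -1: stand-in error code */
}

int model_pci_runtime_suspend(struct trace *t)
{
	int error;

	error = step(t, 1, "W");	/* arm runtime wakeup */
	if (error)
		return error;
	error = step(t, 2, "D");	/* driver ->runtime_suspend */
	if (error)
		goto disarm;
	error = step(t, 3, "S");	/* pci_pm_suspend() */
	if (error)
		goto driver_resume;
	error = step(t, 4, "N");	/* pci_pm_suspend_noirq() */
	if (error)
		goto pci_resume;
	return 0;

pci_resume:
	append(t, "s");			/* pci_pm_resume() */
driver_resume:
	append(t, "d");			/* driver ->runtime_resume */
disarm:
	append(t, "w");			/* disarm runtime wakeup */
	return error;
}
```

With fail_at set to 3 the log reads "WDSdw": the failed pci_pm_suspend() step is not undone, but the driver's runtime_suspend and the wakeup arming are.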

> +static int pci_pm_runtime_resume(struct device *dev)
> +{
> +	struct pci_dev *pci_dev = to_pci_dev(dev);
> +	struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
> +	int error = 0;
> +
> +	disable_irq(pci_dev->irq);
> +	error = pci_pm_resume_noirq(dev);
> +	enable_irq(pci_dev->irq);
> +
> +	if (error)
> +		return error;
> +
> +	error = pci_pm_resume(dev);
> +
> +	if (error)
> +		return error;
> +
> +	if (pm->runtime_resume)
> +		error = pm->runtime_resume(dev);
> +
> +	if (error)
> +		return error;
> +
> +	error = pci_enable_runtime_wake(pci_dev, false);
> +
> +	if (error)
> +		return error;
> +
> +	return 0;
> +}

Log an error message when something goes wrong?
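
Below is a hypothetical sketch of what that error reporting could look like. It is modeled in userspace with snprintf() into a buffer, where kernel code would use dev_err(). It also checks the ops pointer itself, which the quoted pci_pm_runtime_resume() does not do: it tests pm->runtime_resume even though pm may be NULL. All `model_*` names are made up for illustration.

```c
#include <stdio.h>
#include <string.h>

struct model_dev;

/* Hypothetical stand-in for the driver's struct dev_pm_ops. */
struct model_pm_ops {
	int (*runtime_resume)(struct model_dev *dev);
};

struct model_dev {
	const char *name;
	const struct model_pm_ops *pm;			/* NULL if the driver has no ops */
	int (*bus_resume)(struct model_dev *dev);	/* stands in for pci_pm_resume() */
	char last_msg[128];				/* stands in for the kernel log */
};

/* Sample callbacks for trying the model out; -5 is -EIO on Linux. */
static int model_ok(struct model_dev *dev) { (void)dev; return 0; }
static int model_fail_io(struct model_dev *dev) { (void)dev; return -5; }

static int model_runtime_resume(struct model_dev *dev)
{
	int error;

	memset(dev->last_msg, 0, sizeof(dev->last_msg));

	error = dev->bus_resume(dev);
	if (error) {
		snprintf(dev->last_msg, sizeof(dev->last_msg),
			 "%s: bus resume failed (%d)", dev->name, error);
		return error;
	}

	/* Test the ops pointer itself before looking at its members. */
	if (dev->pm && dev->pm->runtime_resume) {
		error = dev->pm->runtime_resume(dev);
		if (error)
			snprintf(dev->last_msg, sizeof(dev->last_msg),
				 "%s: driver runtime_resume failed (%d)",
				 dev->name, error);
	}
	return error;
}
```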

> +static void pci_pm_runtime_idle(struct device *dev)
> +{
> +	struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
> +
> +	if (pm && pm->runtime_idle)
> +		pm->runtime_idle(dev);
> +
> +	pm_schedule_suspend(dev, 0);
> +}

This misses the point.  The whole idea of runtime_idle is to tell you 
that the device is idle and might be ready to be suspended.  If you're 
going to call pm_schedule_suspend anyway, there's no reason to invoke 
pm->runtime_idle.
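
A minimal sketch of the division of labour suggested here. It assumes the rev. 16 signature, in which ->runtime_idle() returns void, so a driver that wants to suspend must request it itself. The bus-level callback schedules a suspend directly only when the driver provides no idle callback. All names are hypothetical, and model_schedule_suspend() stands in for pm_schedule_suspend(dev, 0).

```c
#include <stdbool.h>
#include <stddef.h>

struct model_dev;

/* Hypothetical driver callbacks; runtime_idle returns void as in rev. 16. */
struct model_pm_ops {
	void (*runtime_idle)(struct model_dev *dev);
};

struct model_dev {
	const struct model_pm_ops *pm;	/* driver callbacks, may be NULL */
	bool busy;			/* driver-private "still in use" state */
	int suspend_requests;		/* counts model_schedule_suspend() calls */
};

/* Stand-in for pm_schedule_suspend(dev, 0). */
static void model_schedule_suspend(struct model_dev *dev)
{
	dev->suspend_requests++;
}

/* Bus-level idle callback: let the driver decide when it has a callback. */
static void model_bus_runtime_idle(struct model_dev *dev)
{
	if (dev->pm && dev->pm->runtime_idle)
		dev->pm->runtime_idle(dev);	/* driver may or may not request suspend */
	else
		model_schedule_suspend(dev);	/* no driver input: suspend right away */
}

/* Sample driver callback: request a suspend only when really idle. */
static void model_drv_runtime_idle(struct model_dev *dev)
{
	if (!dev->busy)
		model_schedule_suspend(dev);
}
```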

Alan Stern


^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC] usb: Add support for runtime power management of the hcd
  2009-08-13  0:35     ` [RFC] usb: Add support for runtime power management of the hcd Matthew Garrett
                         ` (2 preceding siblings ...)
  2009-08-13 15:22       ` Alan Stern
@ 2009-08-13 15:22       ` Alan Stern
  2009-08-13 21:47         ` Matthew Garrett
  3 siblings, 2 replies; 90+ messages in thread
From: Alan Stern @ 2009-08-13 15:22 UTC (permalink / raw)
  To: Matthew Garrett
  Cc: Rafael J. Wysocki, Greg KH, LKML, Linux-pm mailing list,
	linux-pci, linux-usb

On Thu, 13 Aug 2009, Matthew Garrett wrote:

> --- a/drivers/usb/core/hcd-pci.c
> +++ b/drivers/usb/core/hcd-pci.c
> @@ -363,6 +363,18 @@ static int hcd_pci_restore(struct device *dev)
>  	return resume_common(dev, true);
>  }
>  
> +static int hcd_pci_runtime_suspend(struct device *dev)
> +{
> +	struct pci_dev *pci_dev = to_pci_dev(dev);
> +	struct usb_hcd *hcd = pci_get_drvdata(pci_dev);
> +	struct usb_device *rhdev = hcd->self.root_hub;
> +
> +	if (!(hcd->driver->flags & HCD_RUNTIME_PM))
> +		return -EINVAL;
> +
> +	return 0;
> +}

You have to call the HCD's pci_suspend method!  Not to mention calling 
synchronize_irq and all the other stuff in hcd_pci_suspend and 
hcd_pci_suspend_noirq.

Ditto for resuming.

Alan Stern


^ permalink raw reply	[flat|nested] 90+ messages in thread

* [PATCH update 2x] PM: Introduce core framework for run-time PM of I/O devices (rev. 16)
  2009-08-09 21:13 ` Rafael J. Wysocki
                     ` (3 preceding siblings ...)
  2009-08-13  0:29   ` Matthew Garrett
@ 2009-08-13 20:56   ` Rafael J. Wysocki
  2009-08-13 21:03     ` Paul Mundt
                       ` (5 more replies)
  2009-08-13 20:56   ` [PATCH update 2x] PM: Introduce core framework for run-time PM of I/O devices (rev. 16) Rafael J. Wysocki
  5 siblings, 6 replies; 90+ messages in thread
From: Rafael J. Wysocki @ 2009-08-13 20:56 UTC (permalink / raw)
  To: Alan Stern
  Cc: Magnus Damm, Greg KH, Pavel Machek, Len Brown, LKML,
	Linux-pm mailing list

From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: PM: Introduce core framework for run-time PM of I/O devices (rev. 16)

Introduce a core framework for run-time power management of I/O
devices.  Add device run-time PM fields to 'struct dev_pm_info'
and device run-time PM callbacks to 'struct dev_pm_ops'.  Introduce
a run-time PM workqueue and define some device run-time PM helper
functions at the core level.  Document all these things.

Special thanks to Alan Stern for his help with the design and
multiple detailed reviews of the preceding versions of this patch
and to Magnus Damm for testing feedback.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 Documentation/power/runtime_pm.txt |  378 +++++++++++++
 drivers/base/dd.c                  |   11 
 drivers/base/power/Makefile        |    1 
 drivers/base/power/main.c          |   22 
 drivers/base/power/power.h         |   31 -
 drivers/base/power/runtime.c       | 1011 +++++++++++++++++++++++++++++++++++++
 include/linux/pm.h                 |  101 +++
 include/linux/pm_runtime.h         |  114 ++++
 kernel/power/Kconfig               |   14 
 kernel/power/main.c                |   17 
 10 files changed, 1689 insertions(+), 11 deletions(-)
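
The request-precedence rules implemented by __pm_request_idle(), __pm_request_suspend() and __pm_request_resume() in runtime.c can be modeled in userspace C as follows. This is a simplified sketch. It keeps only the request_pending/request bookkeeping: a pending resume beats a suspend, and either one beats an idle notification. It deliberately omits the runtime_status, usage-count, child-count, error and timer checks of the real functions. All `model_*` and `MREQ_*` names are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical mirror of enum rpm_request. */
enum model_req { MREQ_NONE, MREQ_IDLE, MREQ_SUSPEND, MREQ_RESUME };

struct model_pm {
	bool request_pending;	/* a work item is queued on the model "pm_wq" */
	enum model_req request;	/* what that work item will do when it runs */
	int queued;		/* number of queue_work() calls made */
};

/* Queue a new work item; only called when none is pending. */
static int model_submit(struct model_pm *pm, enum model_req req)
{
	pm->request = req;
	pm->request_pending = true;
	pm->queued++;
	return 0;
}

static int model_request_idle(struct model_pm *pm)
{
	if (!pm->request_pending)
		return model_submit(pm, MREQ_IDLE);
	/* Any request other than idle takes precedence. */
	if (pm->request == MREQ_NONE)
		pm->request = MREQ_IDLE;
	else if (pm->request != MREQ_IDLE)
		return -EAGAIN;
	return 0;
}

static int model_request_suspend(struct model_pm *pm)
{
	if (!pm->request_pending)
		return model_submit(pm, MREQ_SUSPEND);
	/* A pending resume wins; anything else is overtaken. */
	if (pm->request == MREQ_RESUME)
		return -EAGAIN;
	pm->request = MREQ_SUSPEND;
	return 0;
}

static int model_request_resume(struct model_pm *pm)
{
	if (!pm->request_pending)
		return model_submit(pm, MREQ_RESUME);
	/* Resume overtakes any pending request. */
	pm->request = MREQ_RESUME;
	return 0;
}

/* Like the start of pm_runtime_work(): consume the pending request. */
static enum model_req model_run_work(struct model_pm *pm)
{
	enum model_req req = MREQ_NONE;

	if (pm->request_pending) {
		req = pm->request;
		pm->request = MREQ_NONE;
		pm->request_pending = false;
	}
	return req;
}
```

In this model, submitting idle, then suspend, then resume leaves a single queued work item that will perform a resume. Later suspend or idle requests get -EAGAIN until the work function consumes it.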

Index: linux-2.6/kernel/power/Kconfig
===================================================================
--- linux-2.6.orig/kernel/power/Kconfig
+++ linux-2.6/kernel/power/Kconfig
@@ -208,3 +208,17 @@ config APM_EMULATION
 	  random kernel OOPSes or reboots that don't seem to be related to
 	  anything, try disabling/enabling this option (or disabling/enabling
 	  APM in your BIOS).
+
+config PM_RUNTIME
+	bool "Run-time PM core functionality"
+	depends on PM
+	---help---
+	  Enable functionality allowing I/O devices to be put into energy-saving
+	  (low power) states at run time (or autosuspended) after a specified
+	  period of inactivity and woken up in response to a hardware-generated
+	  wake-up event or a driver's request.
+
+	  Hardware support is generally required for this functionality to work,
+	  and the bus type drivers of the buses the devices are on are
+	  responsible for the actual handling of the autosuspend requests and
+	  wake-up events.
Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -11,6 +11,7 @@
 #include <linux/kobject.h>
 #include <linux/string.h>
 #include <linux/resume-trace.h>
+#include <linux/workqueue.h>
 
 #include "power.h"
 
@@ -217,8 +218,24 @@ static struct attribute_group attr_group
 	.attrs = g,
 };
 
+#ifdef CONFIG_PM_RUNTIME
+struct workqueue_struct *pm_wq;
+
+static int __init pm_start_workqueue(void)
+{
+	pm_wq = create_freezeable_workqueue("pm");
+
+	return pm_wq ? 0 : -ENOMEM;
+}
+#else
+static inline int pm_start_workqueue(void) { return 0; }
+#endif
+
 static int __init pm_init(void)
 {
+	int error = pm_start_workqueue();
+	if (error)
+		return error;
 	power_kobj = kobject_create_and_add("power", NULL);
 	if (!power_kobj)
 		return -ENOMEM;
Index: linux-2.6/include/linux/pm.h
===================================================================
--- linux-2.6.orig/include/linux/pm.h
+++ linux-2.6/include/linux/pm.h
@@ -22,6 +22,10 @@
 #define _LINUX_PM_H
 
 #include <linux/list.h>
+#include <linux/workqueue.h>
+#include <linux/spinlock.h>
+#include <linux/wait.h>
+#include <linux/timer.h>
 
 /*
  * Callbacks for platform drivers to implement.
@@ -165,6 +169,28 @@ typedef struct pm_message {
  * It is allowed to unregister devices while the above callbacks are being
  * executed.  However, it is not allowed to unregister a device from within any
  * of its own callbacks.
+ *
+ * There are also the following callbacks related to run-time power management
+ * of devices:
+ *
+ * @runtime_suspend: Prepare the device for a condition in which it won't be
+ *	able to communicate with the CPU(s) and RAM due to power management.
+ *	This need not mean that the device should be put into a low power state.
+ *	For example, if the device is behind a link which is about to be turned
+ *	off, the device may remain at full power.  If the device does go to low
+ *	power and if device_may_wakeup(dev) is true, remote wake-up (i.e., a
+ *	hardware mechanism allowing the device to request a change of its power
+ *	state, such as PCI PME) should be enabled for it.
+ *
+ * @runtime_resume: Put the device into the fully active state in response to a
+ *	wake-up event generated by hardware or at the request of software.  If
+ *	necessary, put the device into the full power state and restore its
+ *	registers, so that it is fully operational.
+ *
+ * @runtime_idle: Device appears to be inactive and it might be put into a low
+ *	power state if all of the necessary conditions are satisfied.  Check
+ *	these conditions and handle the device as appropriate, possibly queueing
+ *	a suspend request for it.
  */
 
 struct dev_pm_ops {
@@ -182,6 +208,9 @@ struct dev_pm_ops {
 	int (*thaw_noirq)(struct device *dev);
 	int (*poweroff_noirq)(struct device *dev);
 	int (*restore_noirq)(struct device *dev);
+	int (*runtime_suspend)(struct device *dev);
+	int (*runtime_resume)(struct device *dev);
+	void (*runtime_idle)(struct device *dev);
 };
 
 /*
@@ -329,14 +358,80 @@ enum dpm_state {
 	DPM_OFF_IRQ,
 };
 
+/**
+ * Device run-time power management status.
+ *
+ * These status labels are used internally by the PM core to indicate the
+ * current status of a device with respect to the PM core operations.  They do
+ * not reflect the actual power state of the device or its status as seen by the
+ * driver.
+ *
+ * RPM_ACTIVE		Device is fully operational.  Indicates that the device
+ *			bus type's ->runtime_resume() callback has completed
+ *			successfully.
+ *
+ * RPM_SUSPENDED	Device bus type's ->runtime_suspend() callback has
+ *			completed successfully.  The device is regarded as
+ *			suspended.
+ *
+ * RPM_RESUMING		Device bus type's ->runtime_resume() callback is being
+ *			executed.
+ *
+ * RPM_SUSPENDING	Device bus type's ->runtime_suspend() callback is being
+ *			executed.
+ */
+
+enum rpm_status {
+	RPM_ACTIVE = 0,
+	RPM_RESUMING,
+	RPM_SUSPENDED,
+	RPM_SUSPENDING,
+};
+
+/**
+ * Device run-time power management request types.
+ *
+ * RPM_REQ_NONE		Do nothing.
+ *
+ * RPM_REQ_IDLE		Run the device bus type's ->runtime_idle() callback
+ *
+ * RPM_REQ_SUSPEND	Run the device bus type's ->runtime_suspend() callback
+ *
+ * RPM_REQ_RESUME	Run the device bus type's ->runtime_resume() callback
+ */
+
+enum rpm_request {
+	RPM_REQ_NONE = 0,
+	RPM_REQ_IDLE,
+	RPM_REQ_SUSPEND,
+	RPM_REQ_RESUME,
+};
+
 struct dev_pm_info {
 	pm_message_t		power_state;
-	unsigned		can_wakeup:1;
-	unsigned		should_wakeup:1;
+	unsigned int		can_wakeup:1;
+	unsigned int		should_wakeup:1;
 	enum dpm_state		status;		/* Owned by the PM core */
-#ifdef	CONFIG_PM_SLEEP
+#ifdef CONFIG_PM_SLEEP
 	struct list_head	entry;
 #endif
+#ifdef CONFIG_PM_RUNTIME
+	struct timer_list	suspend_timer;
+	unsigned long		timer_expires;
+	struct work_struct	work;
+	wait_queue_head_t	wait_queue;
+	spinlock_t		lock;
+	atomic_t		usage_count;
+	atomic_t		child_count;
+	unsigned int		disable_depth:3;
+	unsigned int		ignore_children:1;
+	unsigned int		idle_notification:1;
+	unsigned int		request_pending:1;
+	unsigned int		deferred_resume:1;
+	enum rpm_request	request;
+	enum rpm_status		runtime_status;
+	int			runtime_error;
+#endif
 };
 
 /*
Index: linux-2.6/drivers/base/power/Makefile
===================================================================
--- linux-2.6.orig/drivers/base/power/Makefile
+++ linux-2.6/drivers/base/power/Makefile
@@ -1,5 +1,6 @@
 obj-$(CONFIG_PM)	+= sysfs.o
 obj-$(CONFIG_PM_SLEEP)	+= main.o
+obj-$(CONFIG_PM_RUNTIME)	+= runtime.o
 obj-$(CONFIG_PM_TRACE_RTC)	+= trace.o
 
 ccflags-$(CONFIG_DEBUG_DRIVER) := -DDEBUG
Index: linux-2.6/drivers/base/power/runtime.c
===================================================================
--- /dev/null
+++ linux-2.6/drivers/base/power/runtime.c
@@ -0,0 +1,1011 @@
+/*
+ * drivers/base/power/runtime.c - Helper functions for device run-time PM
+ *
+ * Copyright (c) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
+ *
+ * This file is released under the GPLv2.
+ */
+
+#include <linux/sched.h>
+#include <linux/pm_runtime.h>
+#include <linux/jiffies.h>
+
+static int __pm_runtime_resume(struct device *dev, bool from_wq);
+static int __pm_request_idle(struct device *dev);
+static int __pm_request_resume(struct device *dev);
+
+/**
+ * pm_runtime_deactivate_timer - Deactivate given device's suspend timer.
+ * @dev: Device to handle.
+ */
+static void pm_runtime_deactivate_timer(struct device *dev)
+{
+	if (dev->power.timer_expires > 0) {
+		del_timer(&dev->power.suspend_timer);
+		dev->power.timer_expires = 0;
+	}
+}
+
+/**
+ * pm_runtime_cancel_pending - Deactivate suspend timer and cancel requests.
+ * @dev: Device to handle.
+ */
+static void pm_runtime_cancel_pending(struct device *dev)
+{
+	pm_runtime_deactivate_timer(dev);
+	/*
+	 * In case there's a request pending, make sure its work function will
+	 * return without doing anything.
+	 */
+	dev->power.request = RPM_REQ_NONE;
+}
+
+/**
+ * __pm_runtime_idle - Notify device bus type if the device can be suspended.
+ * @dev: Device to notify the bus type about.
+ *
+ * This function must be called under dev->power.lock with interrupts disabled.
+ */
+static int __pm_runtime_idle(struct device *dev)
+	__releases(&dev->power.lock) __acquires(&dev->power.lock)
+{
+	int retval = 0;
+
+	dev_dbg(dev, "__pm_runtime_idle()!\n");
+
+	if (dev->power.runtime_error)
+		retval = -EINVAL;
+	else if (dev->power.idle_notification)
+		retval = -EINPROGRESS;
+	else if (atomic_read(&dev->power.usage_count) > 0
+	    || dev->power.disable_depth > 0
+	    || dev->power.runtime_status != RPM_ACTIVE)
+		retval = -EAGAIN;
+	else if (!pm_children_suspended(dev))
+		retval = -EBUSY;
+	if (retval)
+		goto out;
+
+	if (dev->power.request_pending) {
+		/*
+		 * If an idle notification request is pending, cancel it.  Any
+		 * other pending request takes precedence over us.
+		 */
+		if (dev->power.request == RPM_REQ_IDLE) {
+			dev->power.request = RPM_REQ_NONE;
+		} else if (dev->power.request != RPM_REQ_NONE) {
+			retval = -EAGAIN;
+			goto out;
+		}
+	}
+
+	dev->power.idle_notification = true;
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_idle) {
+		spin_unlock_irq(&dev->power.lock);
+
+		dev->bus->pm->runtime_idle(dev);
+
+		spin_lock_irq(&dev->power.lock);
+	}
+
+	dev->power.idle_notification = false;
+	wake_up_all(&dev->power.wait_queue);
+
+ out:
+	dev_dbg(dev, "__pm_runtime_idle() returns %d!\n", retval);
+
+	return retval;
+}
+
+/**
+ * pm_runtime_idle - Notify device bus type if the device can be suspended.
+ * @dev: Device to notify the bus type about.
+ */
+int pm_runtime_idle(struct device *dev)
+{
+	int retval;
+
+	spin_lock_irq(&dev->power.lock);
+	retval = __pm_runtime_idle(dev);
+	spin_unlock_irq(&dev->power.lock);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(pm_runtime_idle);
+
+/**
+ * __pm_runtime_suspend - Carry out run-time suspend of given device.
+ * @dev: Device to suspend.
+ * @from_wq: If set, the function has been called via pm_wq.
+ *
+ * Check if the device can be suspended and run the ->runtime_suspend() callback
+ * provided by its bus type.  If another suspend has been started earlier, wait
+ * for it to finish.  If an idle notification or suspend request is pending or
+ * scheduled, cancel it.
+ *
+ * This function must be called under dev->power.lock with interrupts disabled.
+ */
+int __pm_runtime_suspend(struct device *dev, bool from_wq)
+	__releases(&dev->power.lock) __acquires(&dev->power.lock)
+{
+	struct device *parent = NULL;
+	bool notify = false;
+	int retval = 0;
+
+	dev_dbg(dev, "__pm_runtime_suspend()%s!\n",
+		from_wq ? " from workqueue" : "");
+
+ repeat:
+	if (dev->power.runtime_error) {
+		retval = -EINVAL;
+		goto out;
+	}
+
+	/* Pending resume requests take precedence over us. */
+	if (dev->power.request_pending
+	    && dev->power.request == RPM_REQ_RESUME) {
+		retval = -EAGAIN;
+		goto out;
+	}
+
+	/* Other scheduled or pending requests need to be canceled. */
+	pm_runtime_cancel_pending(dev);
+
+	if (dev->power.runtime_status == RPM_SUSPENDED)
+		retval = 1;
+	else if (dev->power.runtime_status == RPM_RESUMING
+	    || dev->power.disable_depth > 0
+	    || atomic_read(&dev->power.usage_count) > 0)
+		retval = -EAGAIN;
+	else if (!pm_children_suspended(dev))
+		retval = -EBUSY;
+	if (retval)
+		goto out;
+
+	if (dev->power.runtime_status == RPM_SUSPENDING) {
+		DEFINE_WAIT(wait);
+
+		if (from_wq) {
+			retval = -EINPROGRESS;
+			goto out;
+		}
+
+		/* Wait for the other suspend running in parallel with us. */
+		for (;;) {
+			prepare_to_wait(&dev->power.wait_queue, &wait,
+					TASK_UNINTERRUPTIBLE);
+			if (dev->power.runtime_status != RPM_SUSPENDING)
+				break;
+
+			spin_unlock_irq(&dev->power.lock);
+
+			schedule();
+
+			spin_lock_irq(&dev->power.lock);
+		}
+		finish_wait(&dev->power.wait_queue, &wait);
+		goto repeat;
+	}
+
+	dev->power.runtime_status = RPM_SUSPENDING;
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_suspend) {
+		spin_unlock_irq(&dev->power.lock);
+
+		retval = dev->bus->pm->runtime_suspend(dev);
+
+		spin_lock_irq(&dev->power.lock);
+		dev->power.runtime_error = retval;
+	} else {
+		retval = -ENOSYS;
+	}
+
+	if (retval) {
+		dev->power.runtime_status = RPM_ACTIVE;
+		pm_runtime_cancel_pending(dev);
+		dev->power.deferred_resume = false;
+
+		if (retval == -EAGAIN || retval == -EBUSY) {
+			notify = true;
+			dev->power.runtime_error = 0;
+		}
+	} else {
+		dev->power.runtime_status = RPM_SUSPENDED;
+
+		if (dev->parent) {
+			parent = dev->parent;
+			atomic_add_unless(&parent->power.child_count, -1, 0);
+		}
+	}
+	wake_up_all(&dev->power.wait_queue);
+
+	if (dev->power.deferred_resume) {
+		dev->power.deferred_resume = false;
+		__pm_runtime_resume(dev, false);
+		retval = -EAGAIN;
+		goto out;
+	}
+
+	if (notify)
+		__pm_runtime_idle(dev);
+
+	if (parent && !parent->power.ignore_children) {
+		spin_unlock_irq(&dev->power.lock);
+
+		pm_request_idle(parent);
+
+		spin_lock_irq(&dev->power.lock);
+	}
+
+ out:
+	dev_dbg(dev, "__pm_runtime_suspend() returns %d!\n", retval);
+
+	return retval;
+}
+
+/**
+ * pm_runtime_suspend - Carry out run-time suspend of given device.
+ * @dev: Device to suspend.
+ */
+int pm_runtime_suspend(struct device *dev)
+{
+	int retval;
+
+	spin_lock_irq(&dev->power.lock);
+	retval = __pm_runtime_suspend(dev, false);
+	spin_unlock_irq(&dev->power.lock);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(pm_runtime_suspend);
+
+/**
+ * __pm_runtime_resume - Carry out run-time resume of given device.
+ * @dev: Device to resume.
+ * @from_wq: If set, the function has been called via pm_wq.
+ *
+ * Check if the device can be woken up and run the ->runtime_resume() callback
+ * provided by its bus type.  If another resume has been started earlier, wait
+ * for it to finish.  If there's a suspend running in parallel with this
+ * function, wait for it to finish and resume the device.  Cancel any scheduled
+ * or pending requests.
+ *
+ * This function must be called under dev->power.lock with interrupts disabled.
+ */
+int __pm_runtime_resume(struct device *dev, bool from_wq)
+	__releases(&dev->power.lock) __acquires(&dev->power.lock)
+{
+	struct device *parent = NULL;
+	int retval = 0;
+
+	dev_dbg(dev, "__pm_runtime_resume()%s!\n",
+		from_wq ? " from workqueue" : "");
+
+ repeat:
+	if (dev->power.runtime_error) {
+		retval = -EINVAL;
+		goto out;
+	}
+
+	pm_runtime_cancel_pending(dev);
+
+	if (dev->power.runtime_status == RPM_ACTIVE)
+		retval = 1;
+	else if (dev->power.disable_depth > 0)
+		retval = -EAGAIN;
+	if (retval)
+		goto out;
+
+	if (dev->power.runtime_status == RPM_RESUMING
+	    || dev->power.runtime_status == RPM_SUSPENDING) {
+		DEFINE_WAIT(wait);
+
+		if (from_wq) {
+			if (dev->power.runtime_status == RPM_SUSPENDING)
+				dev->power.deferred_resume = true;
+			retval = -EINPROGRESS;
+			goto out;
+		}
+
+		/* Wait for the operation carried out in parallel with us. */
+		for (;;) {
+			prepare_to_wait(&dev->power.wait_queue, &wait,
+					TASK_UNINTERRUPTIBLE);
+			if (dev->power.runtime_status != RPM_RESUMING
+			    && dev->power.runtime_status != RPM_SUSPENDING)
+				break;
+
+			spin_unlock_irq(&dev->power.lock);
+
+			schedule();
+
+			spin_lock_irq(&dev->power.lock);
+		}
+		finish_wait(&dev->power.wait_queue, &wait);
+		goto repeat;
+	}
+
+	if (!parent && dev->parent) {
+		/*
+		 * Increment the parent's resume counter and resume it if
+		 * necessary.
+		 */
+		parent = dev->parent;
+		spin_unlock_irq(&dev->power.lock);
+
+		pm_runtime_get_noresume(parent);
+
+		spin_lock_irq(&parent->power.lock);
+		/*
+		 * We can resume if the parent's run-time PM is disabled or it
+		 * is set to ignore children.
+		 */
+		if (!parent->power.disable_depth
+		    && !parent->power.ignore_children) {
+			__pm_runtime_resume(parent, false);
+			if (parent->power.runtime_status != RPM_ACTIVE)
+				retval = -EBUSY;
+		}
+		spin_unlock_irq(&parent->power.lock);
+
+		spin_lock_irq(&dev->power.lock);
+		if (retval)
+			goto out;
+		goto repeat;
+	}
+
+	dev->power.runtime_status = RPM_RESUMING;
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_resume) {
+		spin_unlock_irq(&dev->power.lock);
+
+		retval = dev->bus->pm->runtime_resume(dev);
+
+		spin_lock_irq(&dev->power.lock);
+		dev->power.runtime_error = retval;
+	} else {
+		retval = -ENOSYS;
+	}
+
+	if (retval) {
+		dev->power.runtime_status = RPM_SUSPENDED;
+		pm_runtime_cancel_pending(dev);
+	} else {
+		dev->power.runtime_status = RPM_ACTIVE;
+		if (parent)
+			atomic_inc(&parent->power.child_count);
+	}
+	wake_up_all(&dev->power.wait_queue);
+
+	if (!retval)
+		__pm_request_idle(dev);
+
+ out:
+	if (parent) {
+		spin_unlock_irq(&dev->power.lock);
+
+		pm_runtime_put(parent);
+
+		spin_lock_irq(&dev->power.lock);
+	}
+
+	dev_dbg(dev, "__pm_runtime_resume() returns %d!\n", retval);
+
+	return retval;
+}
+
+/**
+ * pm_runtime_resume - Carry out run-time resume of given device.
+ * @dev: Device to resume.
+ */
+int pm_runtime_resume(struct device *dev)
+{
+	int retval;
+
+	spin_lock_irq(&dev->power.lock);
+	retval = __pm_runtime_resume(dev, false);
+	spin_unlock_irq(&dev->power.lock);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(pm_runtime_resume);
+
+/**
+ * pm_runtime_work - Universal run-time PM work function.
+ * @work: Work structure used for scheduling the execution of this function.
+ *
+ * Use @work to get the device object the work is to be done for, determine what
+ * is to be done and execute the appropriate run-time PM function.
+ */
+static void pm_runtime_work(struct work_struct *work)
+{
+	struct device *dev = container_of(work, struct device, power.work);
+	enum rpm_request req;
+
+	spin_lock_irq(&dev->power.lock);
+
+	if (!dev->power.request_pending)
+		goto out;
+
+	req = dev->power.request;
+	dev->power.request = RPM_REQ_NONE;
+	dev->power.request_pending = false;
+
+	switch (req) {
+	case RPM_REQ_NONE:
+		break;
+	case RPM_REQ_IDLE:
+		__pm_runtime_idle(dev);
+		break;
+	case RPM_REQ_SUSPEND:
+		__pm_runtime_suspend(dev, true);
+		break;
+	case RPM_REQ_RESUME:
+		__pm_runtime_resume(dev, true);
+		break;
+	}
+
+ out:
+	spin_unlock_irq(&dev->power.lock);
+}
+
+/**
+ * __pm_request_idle - Submit an idle notification request for given device.
+ * @dev: Device to handle.
+ *
+ * Check if the device's run-time PM status is correct for suspending the device
+ * and queue up a request to run __pm_runtime_idle() for it.
+ *
+ * This function must be called under dev->power.lock with interrupts disabled.
+ */
+static int __pm_request_idle(struct device *dev)
+{
+	int retval = 0;
+
+	if (dev->power.runtime_error)
+		retval = -EINVAL;
+	else if (atomic_read(&dev->power.usage_count) > 0
+	    || dev->power.disable_depth > 0
+	    || dev->power.runtime_status == RPM_SUSPENDED
+	    || dev->power.runtime_status == RPM_SUSPENDING)
+		retval = -EAGAIN;
+	else if (!pm_children_suspended(dev))
+		retval = -EBUSY;
+	if (retval)
+		return retval;
+
+	if (dev->power.request_pending) {
+		/* Any requests other than RPM_REQ_IDLE take precedence. */
+		if (dev->power.request == RPM_REQ_NONE)
+			dev->power.request = RPM_REQ_IDLE;
+		else if (dev->power.request != RPM_REQ_IDLE)
+			retval = -EAGAIN;
+		return retval;
+	}
+
+	dev->power.request = RPM_REQ_IDLE;
+	dev->power.request_pending = true;
+	queue_work(pm_wq, &dev->power.work);
+
+	return retval;
+}
+
+/**
+ * pm_request_idle - Submit an idle notification request for given device.
+ * @dev: Device to handle.
+ */
+int pm_request_idle(struct device *dev)
+{
+	unsigned long flags;
+	int retval;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+	retval = __pm_request_idle(dev);
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(pm_request_idle);
+
+/**
+ * __pm_request_suspend - Submit a suspend request for given device.
+ * @dev: Device to suspend.
+ *
+ * This function must be called under dev->power.lock with interrupts disabled.
+ */
+static int __pm_request_suspend(struct device *dev)
+{
+	int retval = 0;
+
+	if (dev->power.runtime_error)
+		return -EINVAL;
+
+	if (dev->power.runtime_status == RPM_SUSPENDED)
+		retval = 1;
+	else if (atomic_read(&dev->power.usage_count) > 0
+	    || dev->power.disable_depth > 0)
+		retval = -EAGAIN;
+	else if (dev->power.runtime_status == RPM_SUSPENDING)
+		retval = -EINPROGRESS;
+	else if (!pm_children_suspended(dev))
+		retval = -EBUSY;
+	if (retval < 0)
+		return retval;
+
+	pm_runtime_deactivate_timer(dev);
+
+	if (dev->power.request_pending) {
+		/*
+		 * Pending resume requests take precedence over us, but we can
+		 * overtake any other pending request.
+		 */
+		if (dev->power.request == RPM_REQ_RESUME)
+			retval = -EAGAIN;
+		else if (dev->power.request != RPM_REQ_SUSPEND)
+			dev->power.request = retval ?
+						RPM_REQ_NONE : RPM_REQ_SUSPEND;
+		return retval;
+	} else if (retval) {
+		return retval;
+	}
+
+	dev->power.request = RPM_REQ_SUSPEND;
+	dev->power.request_pending = true;
+	queue_work(pm_wq, &dev->power.work);
+
+	return 0;
+}
+
+/**
+ * pm_suspend_timer_fn - Timer function for pm_schedule_suspend().
+ * @data: Device pointer passed by pm_schedule_suspend().
+ *
+ * Check if the time is right and execute __pm_request_suspend() in that case.
+ */
+static void pm_suspend_timer_fn(unsigned long data)
+{
+	struct device *dev = (struct device *)data;
+	unsigned long flags;
+	unsigned long expires;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	expires = dev->power.timer_expires;
+	/* If 'expires' is after 'jiffies', we've been called too early. */
+	if (expires > 0 && !time_after(expires, jiffies)) {
+		dev->power.timer_expires = 0;
+		__pm_request_suspend(dev);
+	}
+
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+
+/**
+ * pm_schedule_suspend - Set up a timer to submit a suspend request in future.
+ * @dev: Device to suspend.
+ * @delay: Time to wait before submitting a suspend request, in milliseconds.
+ */
+int pm_schedule_suspend(struct device *dev, unsigned int delay)
+{
+	unsigned long flags;
+	int retval = 0;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.runtime_error) {
+		retval = -EINVAL;
+		goto out;
+	}
+
+	if (!delay) {
+		retval = __pm_request_suspend(dev);
+		goto out;
+	}
+
+	pm_runtime_deactivate_timer(dev);
+
+	if (dev->power.request_pending) {
+		/*
+		 * Pending resume requests take precedence over us, but any
+		 * other pending requests have to be canceled.
+		 */
+		if (dev->power.request == RPM_REQ_RESUME) {
+			retval = -EAGAIN;
+			goto out;
+		}
+		dev->power.request = RPM_REQ_NONE;
+	}
+
+	if (dev->power.runtime_status == RPM_SUSPENDED)
+		retval = 1;
+	else if (dev->power.runtime_status == RPM_SUSPENDING)
+		retval = -EINPROGRESS;
+	else if (atomic_read(&dev->power.usage_count) > 0
+	    || dev->power.disable_depth > 0)
+		retval = -EAGAIN;
+	else if (!pm_children_suspended(dev))
+		retval = -EBUSY;
+	if (retval)
+		goto out;
+
+	dev->power.timer_expires = jiffies + msecs_to_jiffies(delay);
+	mod_timer(&dev->power.suspend_timer, dev->power.timer_expires);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(pm_schedule_suspend);
+
+/**
+ * __pm_request_resume - Submit a resume request for given device.
+ * @dev: Device to resume.
+ *
+ * This function must be called under dev->power.lock with interrupts disabled.
+ */
+static int __pm_request_resume(struct device *dev)
+{
+	int retval = 0;
+
+	if (dev->power.runtime_error)
+		return -EINVAL;
+
+	if (dev->power.runtime_status == RPM_ACTIVE)
+		retval = 1;
+	else if (dev->power.runtime_status == RPM_RESUMING)
+		retval = -EINPROGRESS;
+	else if (dev->power.disable_depth > 0)
+		retval = -EAGAIN;
+	if (retval < 0)
+		return retval;
+
+	pm_runtime_deactivate_timer(dev);
+
+	if (dev->power.request_pending) {
+		/* If a non-resume request is pending, we can overtake it. */
+		dev->power.request = retval ? RPM_REQ_NONE : RPM_REQ_RESUME;
+		return retval;
+	} else if (retval) {
+		return retval;
+	}
+
+	dev->power.request = RPM_REQ_RESUME;
+	dev->power.request_pending = true;
+	queue_work(pm_wq, &dev->power.work);
+
+	return retval;
+}
+
+/**
+ * pm_request_resume - Submit a resume request for given device.
+ * @dev: Device to resume.
+ */
+int pm_request_resume(struct device *dev)
+{
+	unsigned long flags;
+	int retval;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+	retval = __pm_request_resume(dev);
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(pm_request_resume);
+
+/**
+ * __pm_runtime_get - Reference count a device and wake it up, if necessary.
+ * @dev: Device to handle.
+ * @sync: If set and the device is suspended, resume it synchronously.
+ *
+ * Increment the usage count of the device and if it was zero previously,
+ * resume it or submit a resume request for it, depending on the value of @sync.
+ */
+int __pm_runtime_get(struct device *dev, bool sync)
+{
+	int retval = 1;
+
+	if (atomic_add_return(1, &dev->power.usage_count) == 1)
+		retval = sync ? pm_runtime_resume(dev) : pm_request_resume(dev);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(__pm_runtime_get);
+
+/**
+ * __pm_runtime_put - Decrement the device's usage counter and notify its bus.
+ * @dev: Device to handle.
+ * @sync: If the device's bus type is to be notified, do that synchronously.
+ *
+ * Decrement the usage count of the device and if it reaches zero, carry out a
+ * synchronous idle notification or submit an idle notification request for it,
+ * depending on the value of @sync.
+ */
+int __pm_runtime_put(struct device *dev, bool sync)
+{
+	int retval = 0;
+
+	if (atomic_dec_and_test(&dev->power.usage_count))
+		retval = sync ? pm_runtime_idle(dev) : pm_request_idle(dev);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(__pm_runtime_put);
+
+/**
+ * __pm_runtime_set_status - Set run-time PM status of a device.
+ * @dev: Device to handle.
+ * @status: New run-time PM status of the device.
+ *
+ * If run-time PM of the device is disabled or its power.runtime_error field is
+ * different from zero, the status may be changed either to RPM_ACTIVE or to
+ * RPM_SUSPENDED, as long as that reflects the actual state of the device.
+ * However, if the device has a parent that is not active, has run-time PM
+ * enabled and has its power.ignore_children flag unset, the device's status
+ * cannot be set to RPM_ACTIVE, so -EBUSY is returned in that case.
+ *
+ * If successful, __pm_runtime_set_status() clears the power.runtime_error field
+ * and updates the parent's counter of unsuspended children accordingly.  If
+ * the new status is RPM_SUSPENDED and the parent's power.ignore_children flag
+ * is unset, an idle notification request for the parent is submitted.
+ */
+int __pm_runtime_set_status(struct device *dev, unsigned int status)
+{
+	struct device *parent = dev->parent;
+	unsigned long flags;
+	bool notify_parent = false;
+	int error = 0;
+
+	if (status != RPM_ACTIVE && status != RPM_SUSPENDED)
+		return -EINVAL;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (!dev->power.runtime_error && !dev->power.disable_depth) {
+		error = -EAGAIN;
+		goto out;
+	}
+
+	if (dev->power.runtime_status == status)
+		goto out_set;
+
+	if (status == RPM_SUSPENDED) {
+		/* It is always possible to set the status to 'suspended'. */
+		if (parent) {
+			atomic_add_unless(&parent->power.child_count, -1, 0);
+			notify_parent = !parent->power.ignore_children;
+		}
+		goto out_set;
+	}
+
+	if (parent) {
+		spin_lock_nested(&parent->power.lock, SINGLE_DEPTH_NESTING);
+
+		/*
+		 * It is invalid to put an active child under a parent that is
+		 * not active, has run-time PM enabled and the
+		 * 'power.ignore_children' flag unset.
+		 */
+		if (!parent->power.disable_depth
+		    && !parent->power.ignore_children
+		    && parent->power.runtime_status != RPM_ACTIVE) {
+			error = -EBUSY;
+		} else {
+			if (dev->power.runtime_status == RPM_SUSPENDED)
+				atomic_inc(&parent->power.child_count);
+		}
+
+		spin_unlock(&parent->power.lock);
+
+		if (error)
+			goto out;
+	}
+
+ out_set:
+	dev->power.runtime_status = status;
+	dev->power.runtime_error = 0;
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	if (notify_parent)
+		pm_request_idle(parent);
+
+	return error;
+}
+EXPORT_SYMBOL_GPL(__pm_runtime_set_status);
+
+/**
+ * __pm_runtime_barrier - Cancel pending requests and wait for completions.
+ * @dev: Device to handle.
+ *
+ * Flush all pending requests for the device from pm_wq and wait for all
+ * run-time PM operations involving the device in progress to complete.
+ *
+ * Should be called under dev->power.lock with interrupts disabled.
+ */
+static void __pm_runtime_barrier(struct device *dev)
+{
+	pm_runtime_deactivate_timer(dev);
+
+	if (dev->power.request_pending) {
+		dev->power.request = RPM_REQ_NONE;
+		spin_unlock_irq(&dev->power.lock);
+
+		cancel_work_sync(&dev->power.work);
+
+		spin_lock_irq(&dev->power.lock);
+		dev->power.request_pending = false;
+	}
+
+	if (dev->power.runtime_status == RPM_SUSPENDING
+	    || dev->power.runtime_status == RPM_RESUMING
+	    || dev->power.idle_notification) {
+		DEFINE_WAIT(wait);
+
+		/* Suspend, wake-up or idle notification in progress. */
+		for (;;) {
+			prepare_to_wait(&dev->power.wait_queue, &wait,
+					TASK_UNINTERRUPTIBLE);
+			if (dev->power.runtime_status != RPM_SUSPENDING
+			    && dev->power.runtime_status != RPM_RESUMING
+			    && !dev->power.idle_notification)
+				break;
+			spin_unlock_irq(&dev->power.lock);
+
+			schedule();
+
+			spin_lock_irq(&dev->power.lock);
+		}
+		finish_wait(&dev->power.wait_queue, &wait);
+	}
+}
+
+/**
+ * pm_runtime_barrier - Flush pending requests and wait for completions.
+ * @dev: Device to handle.
+ *
+ * Prevent the device from being suspended by incrementing its usage counter and
+ * if there's a pending resume request for the device, wake the device up.
+ * Next, make sure that all pending requests for the device have been flushed
+ * from pm_wq and wait for all run-time PM operations involving the device in
+ * progress to complete.
+ *
+ * Return value:
+ * 1, if there was a resume request pending and the device had to be woken up,
+ * 0, otherwise
+ */
+int pm_runtime_barrier(struct device *dev)
+{
+	int retval = 0;
+
+	pm_runtime_get_noresume(dev);
+	spin_lock_irq(&dev->power.lock);
+
+	if (dev->power.request_pending
+	    && dev->power.request == RPM_REQ_RESUME) {
+		__pm_runtime_resume(dev, false);
+		retval = 1;
+	}
+
+	__pm_runtime_barrier(dev);
+
+	spin_unlock_irq(&dev->power.lock);
+	pm_runtime_put_noidle(dev);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(pm_runtime_barrier);
+
+/**
+ * __pm_runtime_disable - Disable run-time PM of a device.
+ * @dev: Device to handle.
+ * @check_resume: If set, check if there's a resume request for the device.
+ *
+ * Increment power.disable_depth for the device and, if it was zero previously,
+ * cancel all pending run-time PM requests for the device and wait for all
+ * operations in progress to complete.  The device can be either active or
+ * suspended after its run-time PM has been disabled.
+ *
+ * If @check_resume is set and there's a resume request pending when
+ * __pm_runtime_disable() is called and power.disable_depth is zero, the
+ * function will wake up the device before disabling its run-time PM.
+ */
+void __pm_runtime_disable(struct device *dev, bool check_resume)
+{
+	spin_lock_irq(&dev->power.lock);
+
+	if (dev->power.disable_depth > 0) {
+		dev->power.disable_depth++;
+		goto out;
+	}
+
+	/*
+	 * Wake up the device if there's a resume request pending, because that
+	 * means there probably is some I/O to process and disabling run-time PM
+	 * shouldn't prevent the device from processing the I/O.
+	 */
+	if (check_resume && dev->power.request_pending
+	    && dev->power.request == RPM_REQ_RESUME) {
+		/*
+		 * Prevent suspends and idle notifications from being carried
+		 * out after we have woken up the device.
+		 */
+		pm_runtime_get_noresume(dev);
+
+		__pm_runtime_resume(dev, false);
+
+		pm_runtime_put_noidle(dev);
+	}
+
+	if (!dev->power.disable_depth++)
+		__pm_runtime_barrier(dev);
+
+ out:
+	spin_unlock_irq(&dev->power.lock);
+}
+EXPORT_SYMBOL_GPL(__pm_runtime_disable);
+
+/**
+ * pm_runtime_enable - Enable run-time PM of a device.
+ * @dev: Device to handle.
+ */
+void pm_runtime_enable(struct device *dev)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.disable_depth > 0)
+		dev->power.disable_depth--;
+	else
+		dev_warn(dev, "Unbalanced %s!\n", __func__);
+
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_enable);
+
+/**
+ * pm_runtime_init - Initialize run-time PM fields in given device object.
+ * @dev: Device object to initialize.
+ */
+void pm_runtime_init(struct device *dev)
+{
+	spin_lock_init(&dev->power.lock);
+
+	dev->power.runtime_status = RPM_SUSPENDED;
+	dev->power.idle_notification = false;
+
+	dev->power.disable_depth = 1;
+	atomic_set(&dev->power.usage_count, 0);
+
+	dev->power.runtime_error = 0;
+
+	atomic_set(&dev->power.child_count, 0);
+	pm_suspend_ignore_children(dev, false);
+
+	dev->power.request_pending = false;
+	dev->power.request = RPM_REQ_NONE;
+	dev->power.deferred_resume = false;
+	INIT_WORK(&dev->power.work, pm_runtime_work);
+
+	dev->power.timer_expires = 0;
+	setup_timer(&dev->power.suspend_timer, pm_suspend_timer_fn,
+			(unsigned long)dev);
+
+	init_waitqueue_head(&dev->power.wait_queue);
+}
+
+/**
+ * pm_runtime_remove - Prepare for removing a device from device hierarchy.
+ * @dev: Device object being removed from device hierarchy.
+ */
+void pm_runtime_remove(struct device *dev)
+{
+	__pm_runtime_disable(dev, false);
+
+	/* Change the status back to 'suspended' to match the initial status. */
+	if (dev->power.runtime_status == RPM_ACTIVE)
+		pm_runtime_set_suspended(dev);
+}
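The get/put helpers above act only on counter transitions: __pm_runtime_get() resumes the device (or queues a resume) when the usage count goes from 0 to 1, and __pm_runtime_put() runs (or queues) an idle notification when the count drops back to 0. The following is a minimal userspace sketch of that rule, assuming C11 <stdatomic.h>; struct model_dev, model_init(), model_get(), model_put() and the resume/idle stand-ins are hypothetical names used for illustration only, not kernel APIs.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Hypothetical userspace model of the usage-counter logic in
 * __pm_runtime_get() and __pm_runtime_put(); this is not kernel code.
 */
struct model_dev {
	atomic_int usage_count;
	int resumes;	/* resume attempts triggered so far */
	int idles;	/* idle notifications triggered so far */
};

static void model_init(struct model_dev *d)
{
	atomic_init(&d->usage_count, 0);
	d->resumes = 0;
	d->idles = 0;
}

/* Stand-ins for pm_runtime_resume()/pm_request_resume() and the idle calls. */
static int model_resume(struct model_dev *d) { d->resumes++; return 0; }
static int model_idle(struct model_dev *d) { d->idles++; return 0; }

/* Like __pm_runtime_get(): only the 0 -> 1 transition triggers a resume. */
static int model_get(struct model_dev *d)
{
	if (atomic_fetch_add(&d->usage_count, 1) == 0)
		return model_resume(d);
	return 1;	/* device already in use, nothing to do */
}

/* Like __pm_runtime_put(): only the 1 -> 0 transition triggers idle. */
static int model_put(struct model_dev *d)
{
	int old = atomic_fetch_sub(&d->usage_count, 1);

	assert(old > 0);	/* an unbalanced put is a caller bug */
	if (old == 1)
		return model_idle(d);
	return 0;
}
```

Two nested model_get() calls therefore trigger a single resume, and the two matching model_put() calls trigger a single idle notification.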
Index: linux-2.6/include/linux/pm_runtime.h
===================================================================
--- /dev/null
+++ linux-2.6/include/linux/pm_runtime.h
@@ -0,0 +1,114 @@
+/*
+ * pm_runtime.h - Device run-time power management helper functions.
+ *
+ * Copyright (C) 2009 Rafael J. Wysocki <rjw@sisk.pl>
+ *
+ * This file is released under the GPLv2.
+ */
+
+#ifndef _LINUX_PM_RUNTIME_H
+#define _LINUX_PM_RUNTIME_H
+
+#include <linux/device.h>
+#include <linux/pm.h>
+
+#ifdef CONFIG_PM_RUNTIME
+
+extern struct workqueue_struct *pm_wq;
+
+extern int pm_runtime_idle(struct device *dev);
+extern int pm_runtime_suspend(struct device *dev);
+extern int pm_runtime_resume(struct device *dev);
+extern int pm_request_idle(struct device *dev);
+extern int pm_schedule_suspend(struct device *dev, unsigned int delay);
+extern int pm_request_resume(struct device *dev);
+extern int __pm_runtime_get(struct device *dev, bool sync);
+extern int __pm_runtime_put(struct device *dev, bool sync);
+extern int __pm_runtime_set_status(struct device *dev, unsigned int status);
+extern int pm_runtime_barrier(struct device *dev);
+extern void pm_runtime_enable(struct device *dev);
+extern void __pm_runtime_disable(struct device *dev, bool check_resume);
+
+static inline bool pm_children_suspended(struct device *dev)
+{
+	return dev->power.ignore_children
+		|| !atomic_read(&dev->power.child_count);
+}
+
+static inline void pm_suspend_ignore_children(struct device *dev, bool enable)
+{
+	dev->power.ignore_children = enable;
+}
+
+static inline void pm_runtime_get_noresume(struct device *dev)
+{
+	atomic_inc(&dev->power.usage_count);
+}
+
+static inline void pm_runtime_put_noidle(struct device *dev)
+{
+	atomic_add_unless(&dev->power.usage_count, -1, 0);
+}
+
+#else /* !CONFIG_PM_RUNTIME */
+
+static inline int pm_runtime_idle(struct device *dev) { return -ENOSYS; }
+static inline int pm_runtime_suspend(struct device *dev) { return -ENOSYS; }
+static inline int pm_runtime_resume(struct device *dev) { return 0; }
+static inline int pm_request_idle(struct device *dev) { return -ENOSYS; }
+static inline int pm_schedule_suspend(struct device *dev, unsigned int delay)
+{
+	return -ENOSYS;
+}
+static inline int pm_request_resume(struct device *dev) { return 0; }
+static inline int __pm_runtime_get(struct device *dev, bool sync) { return 1; }
+static inline int __pm_runtime_put(struct device *dev, bool sync) { return 0; }
+static inline int __pm_runtime_set_status(struct device *dev,
+					    unsigned int status) { return 0; }
+static inline int pm_runtime_barrier(struct device *dev) { return 0; }
+static inline void pm_runtime_enable(struct device *dev) {}
+static inline void __pm_runtime_disable(struct device *dev, bool c) {}
+
+static inline bool pm_children_suspended(struct device *dev) { return false; }
+static inline void pm_suspend_ignore_children(struct device *dev, bool en) {}
+static inline void pm_runtime_get_noresume(struct device *dev) {}
+static inline void pm_runtime_put_noidle(struct device *dev) {}
+
+#endif /* !CONFIG_PM_RUNTIME */
+
+static inline int pm_runtime_get(struct device *dev)
+{
+	return __pm_runtime_get(dev, false);
+}
+
+static inline int pm_runtime_get_sync(struct device *dev)
+{
+	return __pm_runtime_get(dev, true);
+}
+
+static inline int pm_runtime_put(struct device *dev)
+{
+	return __pm_runtime_put(dev, false);
+}
+
+static inline int pm_runtime_put_sync(struct device *dev)
+{
+	return __pm_runtime_put(dev, true);
+}
+
+static inline int pm_runtime_set_active(struct device *dev)
+{
+	return __pm_runtime_set_status(dev, RPM_ACTIVE);
+}
+
+static inline void pm_runtime_set_suspended(struct device *dev)
+{
+	__pm_runtime_set_status(dev, RPM_SUSPENDED);
+}
+
+static inline void pm_runtime_disable(struct device *dev)
+{
+	__pm_runtime_disable(dev, true);
+}
+
+#endif
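pm_runtime_put_noidle() decrements the usage counter with atomic_add_unless(&dev->power.usage_count, -1, 0), so an unbalanced call leaves the counter at zero instead of driving it negative. As a rough userspace analogue, assuming C11 atomics, the same "add unless equal to" operation can be written as a compare-and-exchange loop. model_add_unless() and model_put_noidle() are hypothetical names; the kernel supplies its own atomic_add_unless().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical userspace analogue of the kernel's atomic_add_unless():
 * add @a to *@v unless *@v currently equals @u.  Returns true if the
 * addition was performed, false if *@v was already @u.
 */
static bool model_add_unless(atomic_int *v, int a, int u)
{
	int cur = atomic_load(v);

	while (cur != u) {
		/* On failure, cur is refreshed with the value now in *v. */
		if (atomic_compare_exchange_weak(v, &cur, cur + a))
			return true;
	}
	return false;
}

/* Like pm_runtime_put_noidle(): the counter never drops below zero. */
static void model_put_noidle(atomic_int *usage_count)
{
	model_add_unless(usage_count, -1, 0);
}
```

Calling model_put_noidle() on a counter that is already 0 is a no-op, which is the property the real helper relies on.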
Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -21,6 +21,7 @@
 #include <linux/kallsyms.h>
 #include <linux/mutex.h>
 #include <linux/pm.h>
+#include <linux/pm_runtime.h>
 #include <linux/resume-trace.h>
 #include <linux/rwsem.h>
 #include <linux/interrupt.h>
@@ -49,6 +50,16 @@ static DEFINE_MUTEX(dpm_list_mtx);
 static bool transition_started;
 
 /**
+ * device_pm_init - Initialize the PM-related part of a device object.
+ * @dev: Device object being initialized.
+ */
+void device_pm_init(struct device *dev)
+{
+	dev->power.status = DPM_ON;
+	pm_runtime_init(dev);
+}
+
+/**
  *	device_pm_lock - lock the list of active devices used by the PM core
  */
 void device_pm_lock(void)
@@ -105,6 +116,7 @@ void device_pm_remove(struct device *dev
 	mutex_lock(&dpm_list_mtx);
 	list_del_init(&dev->power.entry);
 	mutex_unlock(&dpm_list_mtx);
+	pm_runtime_remove(dev);
 }
 
 /**
@@ -512,6 +524,7 @@ static void dpm_complete(pm_message_t st
 			mutex_unlock(&dpm_list_mtx);
 
 			device_complete(dev, state);
+			pm_runtime_put_noidle(dev);
 
 			mutex_lock(&dpm_list_mtx);
 		}
@@ -757,7 +770,14 @@ static int dpm_prepare(pm_message_t stat
 		dev->power.status = DPM_PREPARING;
 		mutex_unlock(&dpm_list_mtx);
 
-		error = device_prepare(dev, state);
+		pm_runtime_get_noresume(dev);
+		if (pm_runtime_barrier(dev) && device_may_wakeup(dev)) {
+			/* Wake-up requested during system sleep transition. */
+			pm_runtime_put_noidle(dev);
+			error = -EBUSY;
+		} else {
+			error = device_prepare(dev, state);
+		}
 
 		mutex_lock(&dpm_list_mtx);
 		if (error) {
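The dpm_prepare() change pins each device with pm_runtime_get_noresume() and then uses the return value of pm_runtime_barrier() to detect a resume request that was pending when the system sleep transition started; for wake-up capable devices this aborts the transition with -EBUSY. A simplified, single-threaded sketch of that per-device decision, together with the matching reference drop in dpm_complete(), might look like this. Every name is hypothetical, and failures of the real device_prepare() are not modeled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the parts of struct device used here. */
struct model_dev {
	int usage_count;
	bool resume_pending;	/* a run-time resume request is queued */
	bool may_wakeup;	/* what device_may_wakeup() would report */
	int prepared;		/* times the device_prepare() stand-in ran */
};

/* Stand-in for pm_runtime_barrier(): 1 if a pending resume was carried out. */
static int model_barrier(struct model_dev *d)
{
	if (d->resume_pending) {
		d->resume_pending = false;
		return 1;
	}
	return 0;
}

/* Mirrors the per-device logic added to dpm_prepare(). */
static int model_prepare_one(struct model_dev *d)
{
	d->usage_count++;			/* pm_runtime_get_noresume() */
	if (model_barrier(d) && d->may_wakeup) {
		/* Wake-up requested during the system sleep transition. */
		d->usage_count--;		/* pm_runtime_put_noidle() */
		return -EBUSY;
	}
	d->prepared++;				/* device_prepare() */
	return 0;
}

/* Mirrors dpm_complete(): drop the reference taken in the prepare phase. */
static void model_complete_one(struct model_dev *d)
{
	assert(d->usage_count > 0);
	d->usage_count--;			/* pm_runtime_put_noidle() */
}
```

Because the reference is held from prepare until complete, run-time suspends of the device are blocked for the whole system sleep transition.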
Index: linux-2.6/drivers/base/dd.c
===================================================================
--- linux-2.6.orig/drivers/base/dd.c
+++ linux-2.6/drivers/base/dd.c
@@ -23,6 +23,7 @@
 #include <linux/kthread.h>
 #include <linux/wait.h>
 #include <linux/async.h>
+#include <linux/pm_runtime.h>
 
 #include "base.h"
 #include "power/power.h"
@@ -202,7 +203,10 @@ int driver_probe_device(struct device_dr
 	pr_debug("bus: '%s': %s: matched device %s with driver %s\n",
 		 drv->bus->name, __func__, dev_name(dev), drv->name);
 
+	pm_runtime_get_noresume(dev);
+	pm_runtime_barrier(dev);
 	ret = really_probe(dev, drv);
+	pm_runtime_put_sync(dev);
 
 	return ret;
 }
@@ -245,7 +249,9 @@ int device_attach(struct device *dev)
 			ret = 0;
 		}
 	} else {
+		pm_runtime_get_noresume(dev);
 		ret = bus_for_each_drv(dev->bus, NULL, dev, __device_attach);
+		pm_runtime_put_sync(dev);
 	}
 	up(&dev->sem);
 	return ret;
@@ -306,6 +312,9 @@ static void __device_release_driver(stru
 
 	drv = dev->driver;
 	if (drv) {
+		pm_runtime_get_noresume(dev);
+		pm_runtime_barrier(dev);
+
 		driver_sysfs_remove(dev);
 
 		if (dev->bus)
@@ -324,6 +333,8 @@ static void __device_release_driver(stru
 			blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
 						     BUS_NOTIFY_UNBOUND_DRIVER,
 						     dev);
+
+		pm_runtime_put_sync(dev);
 	}
 }
 
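Holding a usage-count reference across really_probe() and driver removal is what keeps run-time suspends away from those code paths: with the counter above zero, pm_schedule_suspend() bails out with -EAGAIN before arming its timer. The sketch below reproduces the status and counter checks from pm_schedule_suspend() (with a non-zero delay) in plain C. The runtime_error check, pending-request handling and the pm_runtime_barrier() call are left out, and all identifiers, including example_probe(), are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical mirror of the patch's run-time PM statuses. */
enum model_status { M_ACTIVE, M_RESUMING, M_SUSPENDED, M_SUSPENDING };

/* Hypothetical stand-in for the relevant dev->power fields. */
struct model_dev {
	enum model_status status;
	int usage_count;
	int disable_depth;
	int child_count;
	bool ignore_children;
};

/*
 * Same order of checks as pm_schedule_suspend() with a non-zero delay:
 * 0 means a suspend timer would be armed; anything else is the value the
 * real function would return.
 */
static int model_suspend_check(const struct model_dev *d)
{
	if (d->status == M_SUSPENDED)
		return 1;
	if (d->status == M_SUSPENDING)
		return -EINPROGRESS;
	if (d->usage_count > 0 || d->disable_depth > 0)
		return -EAGAIN;
	if (!d->ignore_children && d->child_count > 0)
		return -EBUSY;	/* i.e. !pm_children_suspended(dev) */
	return 0;
}

/* Result of the suspend check made from inside the probe stand-in. */
static int check_during_probe;

/* Hypothetical probe routine that tries to suspend its own device. */
static int example_probe(struct model_dev *d)
{
	check_during_probe = model_suspend_check(d);
	return 0;
}

/* Mirrors the bracketing added to driver_probe_device(). */
static int model_probe_device(struct model_dev *d)
{
	int ret;

	d->usage_count++;		/* pm_runtime_get_noresume() */
	ret = example_probe(d);		/* really_probe() */
	d->usage_count--;		/* pm_runtime_put_sync(), idle call omitted */
	return ret;
}
```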
Index: linux-2.6/drivers/base/power/power.h
===================================================================
--- linux-2.6.orig/drivers/base/power/power.h
+++ linux-2.6/drivers/base/power/power.h
@@ -1,7 +1,14 @@
-static inline void device_pm_init(struct device *dev)
-{
-	dev->power.status = DPM_ON;
-}
+#ifdef CONFIG_PM_RUNTIME
+
+extern void pm_runtime_init(struct device *dev);
+extern void pm_runtime_remove(struct device *dev);
+
+#else /* !CONFIG_PM_RUNTIME */
+
+static inline void pm_runtime_init(struct device *dev) {}
+static inline void pm_runtime_remove(struct device *dev) {}
+
+#endif /* !CONFIG_PM_RUNTIME */
 
 #ifdef CONFIG_PM_SLEEP
 
@@ -16,23 +23,33 @@ static inline struct device *to_device(s
 	return container_of(entry, struct device, power.entry);
 }
 
+extern void device_pm_init(struct device *dev);
 extern void device_pm_add(struct device *);
 extern void device_pm_remove(struct device *);
 extern void device_pm_move_before(struct device *, struct device *);
 extern void device_pm_move_after(struct device *, struct device *);
 extern void device_pm_move_last(struct device *);
 
-#else /* CONFIG_PM_SLEEP */
+#else /* !CONFIG_PM_SLEEP */
+
+static inline void device_pm_init(struct device *dev)
+{
+	pm_runtime_init(dev);
+}
+
+static inline void device_pm_remove(struct device *dev)
+{
+	pm_runtime_remove(dev);
+}
 
 static inline void device_pm_add(struct device *dev) {}
-static inline void device_pm_remove(struct device *dev) {}
 static inline void device_pm_move_before(struct device *deva,
 					 struct device *devb) {}
 static inline void device_pm_move_after(struct device *deva,
 					struct device *devb) {}
 static inline void device_pm_move_last(struct device *dev) {}
 
-#endif
+#endif /* !CONFIG_PM_SLEEP */
 
 #ifdef CONFIG_PM
 
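With this change, device_pm_init() calls pm_runtime_init() in both the CONFIG_PM_SLEEP and !CONFIG_PM_SLEEP cases, so with CONFIG_PM_RUNTIME set every device starts with power.disable_depth equal to 1 (run-time PM disabled). The nesting behaviour of that counter, as implemented by pm_runtime_enable() and __pm_runtime_disable(), can be modeled roughly as follows. This is a single-threaded sketch with hypothetical names that ignores locking and the @check_resume path.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical userspace model of power.disable_depth handling. */
struct model_dev {
	unsigned int disable_depth;
	int barriers;	/* times the __pm_runtime_barrier() stand-in ran */
};

/* Like pm_runtime_init(): run-time PM starts out disabled. */
static void model_init(struct model_dev *d)
{
	d->disable_depth = 1;
	d->barriers = 0;
}

/* Like __pm_runtime_disable(): only the 0 -> 1 transition flushes. */
static void model_disable(struct model_dev *d)
{
	if (d->disable_depth++ == 0)
		d->barriers++;	/* cancel requests, wait for callbacks */
}

/* Like pm_runtime_enable(): warns instead of underflowing. */
static void model_enable(struct model_dev *d)
{
	if (d->disable_depth > 0)
		d->disable_depth--;
	else
		fprintf(stderr, "Unbalanced %s!\n", __func__);
}
```

A driver therefore has to call pm_runtime_enable() once before the helpers do anything, and each later disable must be balanced by exactly one enable.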
Index: linux-2.6/Documentation/power/runtime_pm.txt
===================================================================
--- /dev/null
+++ linux-2.6/Documentation/power/runtime_pm.txt
@@ -0,0 +1,378 @@
+Run-time Power Management Framework for I/O Devices
+
+(C) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
+
+1. Introduction
+
+Support for run-time power management (run-time PM) of I/O devices is provided
+at the power management core (PM core) level by means of:
+
+* The power management workqueue pm_wq in which bus types and device drivers can
+  put their PM-related work items.  It is strongly recommended that pm_wq be
+  used for queuing all work items related to run-time PM, because this allows
+  them to be synchronized with system-wide power transitions (suspend to RAM,
+  hibernation and resume from system sleep states).  pm_wq is declared in
+  include/linux/pm_runtime.h and defined in kernel/power/main.c.
+
+* A number of run-time PM fields in the 'power' member of 'struct device' (which
+  is of the type 'struct dev_pm_info', defined in include/linux/pm.h) that can
+  be used for synchronizing run-time PM operations with one another.
+
+* Three device run-time PM callbacks in 'struct dev_pm_ops' (defined in
+  include/linux/pm.h).
+
+* A set of helper functions defined in drivers/base/power/runtime.c that can be
+  used for carrying out run-time PM operations in such a way that the
+  synchronization between them is taken care of by the PM core.  Bus types and
+  device drivers are encouraged to use these functions.
+
+The run-time PM callbacks present in 'struct dev_pm_ops', the device run-time PM
+fields of 'struct dev_pm_info' and the core helper functions provided for
+run-time PM are described below.
+
+2. Device Run-time PM Callbacks
+
+There are three device run-time PM callbacks defined in 'struct dev_pm_ops':
+
+struct dev_pm_ops {
+	...
+	int (*runtime_suspend)(struct device *dev);
+	int (*runtime_resume)(struct device *dev);
+	void (*runtime_idle)(struct device *dev);
+	...
+};
+
+The ->runtime_suspend() callback is executed by the PM core for the bus type of
+the device being suspended.  The bus type's callback is then _entirely_
+_responsible_ for handling the device as appropriate, which may, but need not,
+include executing the device driver's own ->runtime_suspend() callback (from the
+PM core's point of view it is not necessary to implement a ->runtime_suspend()
+callback in a device driver as long as the bus type's ->runtime_suspend() knows
+what to do to handle the device).
+
+  * Once the bus type's ->runtime_suspend() callback has completed successfully
+    for a given device, the PM core regards the device as suspended, which need
+    not mean that the device has been put into a low power state.  It is
+    supposed to mean, however, that the device will not process data and will
+    not communicate with the CPU(s) and RAM until its bus type's
+    ->runtime_resume() callback is executed for it.  The run-time PM status of
+    a device after successful execution of its bus type's ->runtime_suspend()
+    callback is 'suspended'.
+
+  * If the bus type's ->runtime_suspend() callback returns -EBUSY or -EAGAIN,
+    the device's run-time PM status is supposed to be 'active', which means that
+    the device _must_ be fully operational afterwards.
+
+  * If the bus type's ->runtime_suspend() callback returns an error code
+    different from -EBUSY or -EAGAIN, the PM core regards this as a fatal
+    error and will refuse to run the helper functions described in Section 4
+    for the device, until its status is directly set either to 'active'
+    or to 'suspended' (the PM core provides special helper functions for this
+    purpose).
+
+In particular, if the driver requires remote wakeup capability for proper
+functioning and device_may_wakeup() returns 'false' for the device, then
+->runtime_suspend() should return -EBUSY.  On the other hand, if
+device_may_wakeup() returns 'true' for the device and the device is put
+into a low power state during the execution of its bus type's
+->runtime_suspend(), it is expected that remote wake-up (i.e. a hardware
+mechanism allowing the device to request a change of its power state, such as
+PCI PME) will be enabled for the device.  Generally, remote wake-up should be
+enabled for all input devices put into a low power state at run time.
+
+The ->runtime_resume() callback is executed by the PM core for the bus type of
+the device being woken up.  The bus type's callback is then _entirely_
+_responsible_ for handling the device as appropriate, which may, but need not,
+include executing the device driver's own ->runtime_resume() callback (from the
+PM core's point of view it is not necessary to implement a ->runtime_resume()
+callback in a device driver as long as the bus type's ->runtime_resume() knows
+what to do to handle the device).
+
+  * Once the bus type's ->runtime_resume() callback has completed successfully,
+    the PM core regards the device as fully operational, which means that the
+    device _must_ be able to complete I/O operations as needed.  The run-time
+    PM status of the device is then 'active'.
+
+  * If the bus type's ->runtime_resume() callback returns an error code, the PM
+    core regards this as a fatal error and will refuse to run the helper
+    functions described in Section 4 for the device, until its status is
+    directly set either to 'active' or to 'suspended' (the PM core provides
+    special helper functions for this purpose).
+
+The ->runtime_idle() callback is executed by the PM core for the bus type of
+a given device whenever the device appears to be idle, which is indicated to
+the PM core by two counters, the device's usage counter and the counter of
+'active' children of the device.
+
+  * If either of these counters is decreased using a helper function provided
+    by the PM core and it turns out to be equal to zero, the other counter is
+    checked.  If that counter is also equal to zero, the PM core executes the
+    device bus type's ->runtime_idle() callback (with the device as an
+    argument).
+
+The action performed by a bus type's ->runtime_idle() callback is totally
+dependent on the bus type in question, but the expected and recommended action
+is to check if the device can be suspended (i.e. if all of the conditions
+necessary for suspending the device are satisfied) and to queue up a suspend
+request for the device in that case.
+
+The helper functions provided by the PM core, described in Section 4, guarantee
+that the following constraints are met with respect to the bus type's run-time
+PM callbacks:
+
+(1) The callbacks are mutually exclusive (e.g. it is forbidden to execute
+    ->runtime_suspend() in parallel with ->runtime_resume() or with another
+    instance of ->runtime_suspend() for the same device) with the exception that
+    ->runtime_suspend() or ->runtime_resume() can be executed in parallel with
+    ->runtime_idle() (although ->runtime_idle() will not be started while any
+    of the other callbacks is being executed for the same device).
+
+(2) ->runtime_idle() and ->runtime_suspend() can only be executed for 'active'
+    devices (i.e. the PM core will only execute ->runtime_idle() or
+    ->runtime_suspend() for devices whose run-time PM status is
+    'active').
+
+(3) ->runtime_idle() and ->runtime_suspend() can only be executed for a device
+    whose usage counter is equal to zero _and_ which either has its counter of
+    'active' children equal to zero, or has its 'power.ignore_children' flag
+    set.
+
+(4) ->runtime_resume() can only be executed for 'suspended' devices (i.e. the
+    PM core will only execute ->runtime_resume() for devices whose run-time PM
+    status is 'suspended').
+
+Additionally, the helper functions provided by the PM core obey the following
+rules:
+
+  * If ->runtime_suspend() is about to be executed or there's a pending request
+    to execute it, ->runtime_idle() will not be executed for the same device.
+
+  * A request to execute or to schedule the execution of ->runtime_suspend()
+    will cancel any pending requests to execute ->runtime_idle() for the same
+    device.
+
+  * If ->runtime_resume() is about to be executed or there's a pending request
+    to execute it, the other callbacks will not be executed for the same device.
+
+  * A request to execute ->runtime_resume() will cancel any pending or
+    scheduled requests to execute the other callbacks for the same device.
+
+3. Run-time PM Device Fields
+
+The following device run-time PM fields are present in 'struct dev_pm_info', as
+defined in include/linux/pm.h:
+
+  struct timer_list suspend_timer;
+    - timer used for scheduling (delayed) suspend requests
+
+  unsigned long timer_expires;
+    - timer expiration time, in jiffies (if this is different from zero, the
+      timer is running and will expire at that time, otherwise the timer is not
+      running)
+
+  struct work_struct work;
+    - work structure used for queuing up requests (i.e. work items in pm_wq)
+
+  wait_queue_head_t wait_queue;
+    - wait queue used if any of the helper functions needs to wait for another
+      one to complete
+
+  spinlock_t lock;
+    - lock used for synchronisation
+
+  atomic_t usage_count;
+    - the usage counter of the device
+
+  atomic_t child_count;
+    - the count of 'active' children of the device
+
+  unsigned int ignore_children;
+    - if set, the value of child_count is ignored (but still updated)
+
+  unsigned int disable_depth;
+    - used for disabling the helper functions (they work normally if this is
+      equal to zero); the initial value of it is 1 (i.e. run-time PM is
+      initially disabled for all devices)
+
+  unsigned int runtime_error;
+    - if set, there was a fatal error (one of the callbacks returned an error
+      code as described in Section 2), so the helper functions will not work
+      until this flag is cleared; this is the error code returned by the
+      failing callback
+
+  unsigned int idle_notification;
+    - if set, ->runtime_idle() is being executed
+
+  unsigned int request_pending;
+    - if set, there's a pending request (i.e. a work item queued up into pm_wq)
+
+  enum rpm_request request;
+    - type of request that's pending (valid if request_pending is set)
+
+  unsigned int deferred_resume;
+    - set if ->runtime_resume() is about to be run while ->runtime_suspend() is
+      being executed for that device and it is not practical to wait for the
+      suspend to complete; means "start a resume as soon as you've suspended"
+
+  enum rpm_status runtime_status;
+    - the run-time PM status of the device; this field's initial value is
+      RPM_SUSPENDED, which means that each device is initially regarded by the
+      PM core as 'suspended', regardless of its real hardware status
+
+All of the above fields are members of the 'power' member of 'struct device'.
+
+4. Run-time PM Device Helper Functions
+
+The following run-time PM helper functions are defined in
+drivers/base/power/runtime.c and include/linux/pm_runtime.h:
+
+  void pm_runtime_init(struct device *dev);
+    - initialize the device run-time PM fields in 'struct dev_pm_info'
+
+  void pm_runtime_remove(struct device *dev);
+    - make sure that the run-time PM of the device will be disabled after
+      removing the device from the device hierarchy
+
+  int pm_runtime_idle(struct device *dev);
+    - execute ->runtime_idle() for the device's bus type; returns 0 on success
+      or error code on failure, where -EINPROGRESS means that ->runtime_idle()
+      is already being executed
+
+  int pm_runtime_suspend(struct device *dev);
+    - execute ->runtime_suspend() for the device's bus type; returns 0 on
+      success, 1 if the device's run-time PM status was already 'suspended', or
+      error code on failure, where -EAGAIN or -EBUSY means it is safe to attempt
+      to suspend the device again in future
+
+  int pm_runtime_resume(struct device *dev);
+    - execute ->runtime_resume() for the device's bus type; returns 0 on
+      success, 1 if the device's run-time PM status was already 'active', or
+      error code on failure, where -EAGAIN means it may be safe to attempt to
+      resume the device again in future, but 'power.runtime_error' should be
+      checked additionally
+
+  int pm_request_idle(struct device *dev);
+    - submit a request to execute ->runtime_idle() for the device's bus type
+      (the request is represented by a work item in pm_wq); returns 0 on success
+      or error code if the request has not been queued up
+
+  int pm_schedule_suspend(struct device *dev, unsigned int delay);
+    - schedule the execution of ->runtime_suspend() for the device's bus type
+      in future, where 'delay' is the time to wait before queuing up a suspend
+      work item in pm_wq, in milliseconds (if 'delay' is zero, the work item is
+      queued up immediately); returns 0 on success, 1 if the device's run-time
+      PM status was already 'suspended', or error code if the request
+      hasn't been scheduled (or queued up if 'delay' is 0); if the execution of
+      ->runtime_suspend() is already scheduled and not yet expired, the new
+      value of 'delay' will be used as the time to wait
+
+  int pm_request_resume(struct device *dev);
+    - submit a request to execute ->runtime_resume() for the device's bus type
+      (the request is represented by a work item in pm_wq); returns 0 on
+      success, 1 if the device's run-time PM status was already 'active', or
+      error code if the request hasn't been queued up
+
+  void pm_runtime_get_noresume(struct device *dev);
+    - increment the device's usage counter
+
+  int pm_runtime_get(struct device *dev);
+    - increment the device's usage counter, run pm_request_resume(dev) and
+      return its result
+
+  int pm_runtime_get_sync(struct device *dev);
+    - increment the device's usage counter, run pm_runtime_resume(dev) and
+      return its result
+
+  void pm_runtime_put_noidle(struct device *dev);
+    - decrement the device's usage counter
+
+  int pm_runtime_put(struct device *dev);
+    - decrement the device's usage counter, run pm_request_idle(dev) and return
+      its result
+
+  int pm_runtime_put_sync(struct device *dev);
+    - decrement the device's usage counter, run pm_runtime_idle(dev) and return
+      its result
+
+  void pm_runtime_enable(struct device *dev);
+    - enable the run-time PM helper functions to run the device bus type's
+      run-time PM callbacks described in Section 2
+
+  int pm_runtime_disable(struct device *dev);
+    - prevent the run-time PM helper functions from running the device bus
+      type's run-time PM callbacks, make sure that all of the pending run-time
+      PM operations on the device are either completed or canceled; returns
+      1 if there was a resume request pending and it was necessary to execute
+      ->runtime_resume() for the device's bus type to satisfy that request,
+      otherwise 0 is returned
+
+  void pm_suspend_ignore_children(struct device *dev, bool enable);
+    - set/unset the power.ignore_children flag of the device
+
+  int pm_runtime_set_active(struct device *dev);
+    - clear the device's 'power.runtime_error' flag, set the device's run-time
+      PM status to 'active' and update its parent's counter of 'active'
+      children as appropriate (it is only valid to use this function if
+      'power.runtime_error' is set or 'power.disable_depth' is greater than
+      zero); it will fail and return an error code if the device has a parent
+      which is not active and whose 'power.ignore_children' flag is unset
+
+  void pm_runtime_set_suspended(struct device *dev);
+    - clear the device's 'power.runtime_error' flag, set the device's run-time
+      PM status to 'suspended' and update its parent's counter of 'active'
+      children as appropriate (it is only valid to use this function if
+      'power.runtime_error' is set or 'power.disable_depth' is greater than
+      zero)
+
+It is safe to execute the following helper functions from interrupt context:
+
+pm_request_idle()
+pm_schedule_suspend()
+pm_request_resume()
+pm_runtime_get_noresume()
+pm_runtime_get()
+pm_runtime_put_noidle()
+pm_runtime_put()
+pm_suspend_ignore_children()
+pm_runtime_set_active()
+pm_runtime_set_suspended()
+pm_runtime_enable()
+
+5. Run-time PM Initialization, Device Probing and Removal
+
+Initially, the run-time PM is disabled for all devices, which means that the
+majority of the run-time PM helper functions described in Section 4 will return
+-EAGAIN until pm_runtime_enable() is called for the device.
+
+In addition to that, the initial run-time PM status of all devices is
+'suspended', but it need not reflect the actual physical state of the device.
+Thus, if the device is initially active (i.e. it is able to process I/O), its
+run-time PM status must be changed to 'active', with the help of
+pm_runtime_set_active(), before pm_runtime_enable() is called for the device.
+
+However, if the device has a parent and the parent's run-time PM is enabled,
+calling pm_runtime_set_active() for the device will affect the parent, unless
+the parent's 'power.ignore_children' flag is set.  Specifically, the parent
+won't be able to suspend at run time, using the PM core's helper functions,
+as long as the child's status is 'active', even if the child's
+run-time PM is still disabled (i.e. pm_runtime_enable() hasn't been called for
+the child yet or pm_runtime_disable() has been called for it).  For this reason,
+once pm_runtime_set_active() has been called for the device, pm_runtime_enable()
+should be called for it too as soon as reasonably possible or its run-time PM
+status should be changed back to 'suspended' with the help of
+pm_runtime_set_suspended().
+
+If the default initial run-time PM status of the device (i.e. 'suspended')
+reflects the actual state of the device, its bus type's or its driver's
+->probe() callback will likely need to wake it up using one of the PM core's
+helper functions described in Section 4.  In that case, pm_runtime_resume()
+should be used.  Of course, for this purpose the device's run-time PM has to be
+enabled earlier by calling pm_runtime_enable().
+
+If the device bus type's or driver's ->probe() or ->remove() callback runs
+pm_runtime_suspend() or pm_runtime_idle() or their asynchronous counterparts,
+they will fail, returning -EAGAIN, because the device's usage counter is
+incremented by the core before executing ->probe() and ->remove().  Still, it
+may be desirable to suspend the device as soon as ->probe() or ->remove() has
+finished, so the PM core uses pm_runtime_idle_sync() to invoke the device bus
+type's ->runtime_idle() callback at that time.

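The usage-counter rules above can be illustrated with a small user-space model. This is only a sketch, not PM core code: `struct model_dev`, `model_get()`, `model_put()`, `model_idle_check()` and the other `model_*` helpers are hypothetical names, and locking, callbacks and pm_wq are replaced by plain counters. It follows __pm_runtime_get() and __pm_runtime_put() as they appear in this patch, which only submit a request on the 0 -> 1 and 1 -> 0 transitions of the counter.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the run-time PM status (RPM_* in pm.h). */
enum model_status { MODEL_ACTIVE, MODEL_SUSPENDED };

/* Hypothetical model of a few 'struct dev_pm_info' fields. No locking:
 * this is a single-threaded illustration only. */
struct model_dev {
	atomic_int usage_count;     /* like power.usage_count */
	unsigned int disable_depth; /* like power.disable_depth */
	enum model_status status;   /* like power.runtime_status */
	int resume_requests;        /* resume work items "queued" so far */
	int idle_requests;          /* idle work items "queued" so far */
};

static void model_init(struct model_dev *d, enum model_status status)
{
	atomic_init(&d->usage_count, 0);
	d->disable_depth = 0;
	d->status = status;
	d->resume_requests = 0;
	d->idle_requests = 0;
}

/* Simplified version of the -EAGAIN checks made before idle notification. */
static int model_idle_check(struct model_dev *d)
{
	if (atomic_load(&d->usage_count) > 0 || d->disable_depth > 0
	    || d->status != MODEL_ACTIVE)
		return -EAGAIN;
	return 0;
}

/* Stand-in for pm_request_resume(): 1 if already active, else "queue" work. */
static int model_request_resume(struct model_dev *d)
{
	if (d->status == MODEL_ACTIVE)
		return 1;
	if (d->disable_depth > 0)
		return -EAGAIN;
	d->resume_requests++;
	return 0;
}

/* Stand-in for pm_request_idle(): "queue" an idle request if allowed. */
static int model_request_idle(struct model_dev *d)
{
	int ret = model_idle_check(d);

	if (ret == 0)
		d->idle_requests++;
	return ret;
}

/* Like pm_runtime_get(): only the 0 -> 1 transition submits a resume request. */
static int model_get(struct model_dev *d)
{
	if (atomic_fetch_add(&d->usage_count, 1) == 0)
		return model_request_resume(d);
	return 1;
}

/* Like pm_runtime_put(): only the 1 -> 0 transition submits an idle request. */
static int model_put(struct model_dev *d)
{
	assert(atomic_load(&d->usage_count) > 0); /* catch unbalanced puts */
	if (atomic_fetch_sub(&d->usage_count, 1) == 1)
		return model_request_idle(d);
	return 0;
}
```

While the counter is held (as the core does around ->probe() and ->remove()), model_idle_check() fails with -EAGAIN; only the final put submits the idle request.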

* [PATCH update 2x] PM: Introduce core framework for run-time PM of I/O devices (rev. 16)
@ 2009-08-13 20:56 Rafael J. Wysocki
  5 siblings, 0 replies; 90+ messages in thread
From: Rafael J. Wysocki @ 2009-08-13 20:56 UTC (permalink / raw)
  To: Alan Stern; +Cc: Greg KH, LKML, Linux-pm mailing list

From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: PM: Introduce core framework for run-time PM of I/O devices (rev. 16)

Introduce a core framework for run-time power management of I/O
devices.  Add device run-time PM fields to 'struct dev_pm_info'
and device run-time PM callbacks to 'struct dev_pm_ops'.  Introduce
a run-time PM workqueue and define some device run-time PM helper
functions at the core level.  Document all these things.

Special thanks to Alan Stern for his help with the design and
multiple detailed reviews of the preceding versions of this patch
and to Magnus Damm for testing feedback.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 Documentation/power/runtime_pm.txt |  378 +++++++++++++
 drivers/base/dd.c                  |   11 
 drivers/base/power/Makefile        |    1 
 drivers/base/power/main.c          |   22 
 drivers/base/power/power.h         |   31 -
 drivers/base/power/runtime.c       | 1011 +++++++++++++++++++++++++++++++++++++
 include/linux/pm.h                 |  101 +++
 include/linux/pm_runtime.h         |  114 ++++
 kernel/power/Kconfig               |   14 
 kernel/power/main.c                |   17 
 10 files changed, 1689 insertions(+), 11 deletions(-)

Index: linux-2.6/kernel/power/Kconfig
===================================================================
--- linux-2.6.orig/kernel/power/Kconfig
+++ linux-2.6/kernel/power/Kconfig
@@ -208,3 +208,17 @@ config APM_EMULATION
 	  random kernel OOPSes or reboots that don't seem to be related to
 	  anything, try disabling/enabling this option (or disabling/enabling
 	  APM in your BIOS).
+
+config PM_RUNTIME
+	bool "Run-time PM core functionality"
+	depends on PM
+	---help---
+	  Enable functionality allowing I/O devices to be put into energy-saving
+	  (low power) states at run time (or autosuspended) after a specified
+	  period of inactivity and woken up in response to a hardware-generated
+	  wake-up event or a driver's request.
+
+	  Hardware support is generally required for this functionality to work
+	  and the bus type drivers of the buses the devices are on are
+	  responsible for the actual handling of the autosuspend requests and
+	  wake-up events.
Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -11,6 +11,7 @@
 #include <linux/kobject.h>
 #include <linux/string.h>
 #include <linux/resume-trace.h>
+#include <linux/workqueue.h>
 
 #include "power.h"
 
@@ -217,8 +218,24 @@ static struct attribute_group attr_group
 	.attrs = g,
 };
 
+#ifdef CONFIG_PM_RUNTIME
+struct workqueue_struct *pm_wq;
+
+static int __init pm_start_workqueue(void)
+{
+	pm_wq = create_freezeable_workqueue("pm");
+
+	return pm_wq ? 0 : -ENOMEM;
+}
+#else
+static inline int pm_start_workqueue(void) { return 0; }
+#endif
+
 static int __init pm_init(void)
 {
+	int error = pm_start_workqueue();
+	if (error)
+		return error;
 	power_kobj = kobject_create_and_add("power", NULL);
 	if (!power_kobj)
 		return -ENOMEM;
Index: linux-2.6/include/linux/pm.h
===================================================================
--- linux-2.6.orig/include/linux/pm.h
+++ linux-2.6/include/linux/pm.h
@@ -22,6 +22,10 @@
 #define _LINUX_PM_H
 
 #include <linux/list.h>
+#include <linux/workqueue.h>
+#include <linux/spinlock.h>
+#include <linux/wait.h>
+#include <linux/timer.h>
 
 /*
  * Callbacks for platform drivers to implement.
@@ -165,6 +169,28 @@ typedef struct pm_message {
  * It is allowed to unregister devices while the above callbacks are being
  * executed.  However, it is not allowed to unregister a device from within any
  * of its own callbacks.
+ *
+ * There also are the following callbacks related to run-time power management
+ * of devices:
+ *
+ * @runtime_suspend: Prepare the device for a condition in which it won't be
+ *	able to communicate with the CPU(s) and RAM due to power management.
+ *	This need not mean that the device should be put into a low power state.
+ *	For example, if the device is behind a link which is about to be turned
+ *	off, the device may remain at full power.  If the device does go to low
+ *	power and if device_may_wakeup(dev) is true, remote wake-up (i.e., a
+ *	hardware mechanism allowing the device to request a change of its power
+ *	state, such as PCI PME) should be enabled for it.
+ *
+ * @runtime_resume: Put the device into the fully active state in response to a
+ *	wake-up event generated by hardware or at the request of software.  If
+ *	necessary, put the device into the full power state and restore its
+ *	registers, so that it is fully operational.
+ *
+ * @runtime_idle: Device appears to be inactive and it might be put into a low
+ *	power state if all of the necessary conditions are satisfied.  Check
+ *	these conditions and handle the device as appropriate, possibly queueing
+ *	a suspend request for it.
  */
 
 struct dev_pm_ops {
@@ -182,6 +208,9 @@ struct dev_pm_ops {
 	int (*thaw_noirq)(struct device *dev);
 	int (*poweroff_noirq)(struct device *dev);
 	int (*restore_noirq)(struct device *dev);
+	int (*runtime_suspend)(struct device *dev);
+	int (*runtime_resume)(struct device *dev);
+	void (*runtime_idle)(struct device *dev);
 };
 
 /*
@@ -329,14 +358,80 @@ enum dpm_state {
 	DPM_OFF_IRQ,
 };
 
+/**
+ * Device run-time power management status.
+ *
+ * These status labels are used internally by the PM core to indicate the
+ * current status of a device with respect to the PM core operations.  They do
+ * not reflect the actual power state of the device or its status as seen by the
+ * driver.
+ *
+ * RPM_ACTIVE		Device is fully operational.  Indicates that the device
+ *			bus type's ->runtime_resume() callback has completed
+ *			successfully.
+ *
+ * RPM_SUSPENDED	Device bus type's ->runtime_suspend() callback has
+ *			completed successfully.  The device is regarded as
+ *			suspended.
+ *
+ * RPM_RESUMING		Device bus type's ->runtime_resume() callback is being
+ *			executed.
+ *
+ * RPM_SUSPENDING	Device bus type's ->runtime_suspend() callback is being
+ *			executed.
+ */
+
+enum rpm_status {
+	RPM_ACTIVE = 0,
+	RPM_RESUMING,
+	RPM_SUSPENDED,
+	RPM_SUSPENDING,
+};
+
+/**
+ * Device run-time power management request types.
+ *
+ * RPM_REQ_NONE		Do nothing.
+ *
+ * RPM_REQ_IDLE		Run the device bus type's ->runtime_idle() callback
+ *
+ * RPM_REQ_SUSPEND	Run the device bus type's ->runtime_suspend() callback
+ *
+ * RPM_REQ_RESUME	Run the device bus type's ->runtime_resume() callback
+ */
+
+enum rpm_request {
+	RPM_REQ_NONE = 0,
+	RPM_REQ_IDLE,
+	RPM_REQ_SUSPEND,
+	RPM_REQ_RESUME,
+};
+
 struct dev_pm_info {
 	pm_message_t		power_state;
-	unsigned		can_wakeup:1;
-	unsigned		should_wakeup:1;
+	unsigned int		can_wakeup:1;
+	unsigned int		should_wakeup:1;
 	enum dpm_state		status;		/* Owned by the PM core */
-#ifdef	CONFIG_PM_SLEEP
+#ifdef CONFIG_PM_SLEEP
 	struct list_head	entry;
 #endif
+#ifdef CONFIG_PM_RUNTIME
+	struct timer_list	suspend_timer;
+	unsigned long		timer_expires;
+	struct work_struct	work;
+	wait_queue_head_t	wait_queue;
+	spinlock_t		lock;
+	atomic_t		usage_count;
+	atomic_t		child_count;
+	unsigned int		disable_depth:3;
+	unsigned int		ignore_children:1;
+	unsigned int		idle_notification:1;
+	unsigned int		request_pending:1;
+	unsigned int		deferred_resume:1;
+	enum rpm_request	request;
+	enum rpm_status		runtime_status;
+	int			runtime_error;
+#endif
 };
 
 /*
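The precedence rules that runtime.c (added further down in this patch) applies when a new request meets one that is already pending in power.request can be summarized in a small model. This is a hypothetical sketch: `enum model_req` and `merge_request()` are not PM core names, and the model covers only the request_pending branches of __pm_request_idle(), __pm_request_suspend() and __pm_request_resume(), leaving out the status, usage-counter and child checks those functions make first. A pending value of REQ_NONE stands for a work item whose request has been canceled.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical mirror of 'enum rpm_request' from include/linux/pm.h. */
enum model_req { REQ_NONE = 0, REQ_IDLE, REQ_SUSPEND, REQ_RESUME };

/*
 * A work item is already pending with request 'pending' and a new request
 * 'incoming' arrives.  Return the request that stays pending, or -EAGAIN
 * if the new request is refused and 'pending' is left unchanged.
 */
static int merge_request(enum model_req pending, enum model_req incoming)
{
	switch (incoming) {
	case REQ_IDLE:
		/* An idle request only fills an empty slot; anything else wins. */
		if (pending == REQ_NONE || pending == REQ_IDLE)
			return REQ_IDLE;
		return -EAGAIN;
	case REQ_SUSPEND:
		/* A pending resume takes precedence; others are overtaken. */
		if (pending == REQ_RESUME)
			return -EAGAIN;
		return REQ_SUSPEND;
	case REQ_RESUME:
		/* A resume request overtakes whatever is pending. */
		return REQ_RESUME;
	case REQ_NONE:
	default:
		return pending;
	}
}
```

For example, merge_request(REQ_RESUME, REQ_SUSPEND) gives -EAGAIN, matching the "pending resume requests take precedence over us" comments in the code.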
Index: linux-2.6/drivers/base/power/Makefile
===================================================================
--- linux-2.6.orig/drivers/base/power/Makefile
+++ linux-2.6/drivers/base/power/Makefile
@@ -1,5 +1,6 @@
 obj-$(CONFIG_PM)	+= sysfs.o
 obj-$(CONFIG_PM_SLEEP)	+= main.o
+obj-$(CONFIG_PM_RUNTIME)	+= runtime.o
 obj-$(CONFIG_PM_TRACE_RTC)	+= trace.o
 
 ccflags-$(CONFIG_DEBUG_DRIVER) := -DDEBUG
Index: linux-2.6/drivers/base/power/runtime.c
===================================================================
--- /dev/null
+++ linux-2.6/drivers/base/power/runtime.c
@@ -0,0 +1,1011 @@
+/*
+ * drivers/base/power/runtime.c - Helper functions for device run-time PM
+ *
+ * Copyright (c) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
+ *
+ * This file is released under the GPLv2.
+ */
+
+#include <linux/sched.h>
+#include <linux/pm_runtime.h>
+#include <linux/jiffies.h>
+
+static int __pm_runtime_resume(struct device *dev, bool from_wq);
+static int __pm_request_idle(struct device *dev);
+static int __pm_request_resume(struct device *dev);
+
+/**
+ * pm_runtime_deactivate_timer - Deactivate given device's suspend timer.
+ * @dev: Device to handle.
+ */
+static void pm_runtime_deactivate_timer(struct device *dev)
+{
+	if (dev->power.timer_expires > 0) {
+		del_timer(&dev->power.suspend_timer);
+		dev->power.timer_expires = 0;
+	}
+}
+
+/**
+ * pm_runtime_cancel_pending - Deactivate suspend timer and cancel requests.
+ * @dev: Device to handle.
+ */
+static void pm_runtime_cancel_pending(struct device *dev)
+{
+	pm_runtime_deactivate_timer(dev);
+	/*
+	 * In case there's a request pending, make sure its work function will
+	 * return without doing anything.
+	 */
+	dev->power.request = RPM_REQ_NONE;
+}
+
+/**
+ * __pm_runtime_idle - Notify device bus type if the device can be suspended.
+ * @dev: Device to notify the bus type about.
+ *
+ * This function must be called under dev->power.lock with interrupts disabled.
+ */
+static int __pm_runtime_idle(struct device *dev)
+	__releases(&dev->power.lock) __acquires(&dev->power.lock)
+{
+	int retval = 0;
+
+	dev_dbg(dev, "__pm_runtime_idle()!\n");
+
+	if (dev->power.runtime_error)
+		retval = -EINVAL;
+	else if (dev->power.idle_notification)
+		retval = -EINPROGRESS;
+	else if (atomic_read(&dev->power.usage_count) > 0
+	    || dev->power.disable_depth > 0
+	    || dev->power.runtime_status != RPM_ACTIVE)
+		retval = -EAGAIN;
+	else if (!pm_children_suspended(dev))
+		retval = -EBUSY;
+	if (retval)
+		goto out;
+
+	if (dev->power.request_pending) {
+		/*
+		 * If an idle notification request is pending, cancel it.  Any
+		 * other pending request takes precedence over us.
+		 */
+		if (dev->power.request == RPM_REQ_IDLE) {
+			dev->power.request = RPM_REQ_NONE;
+		} else if (dev->power.request != RPM_REQ_NONE) {
+			retval = -EAGAIN;
+			goto out;
+		}
+	}
+
+	dev->power.idle_notification = true;
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_idle) {
+		spin_unlock_irq(&dev->power.lock);
+
+		dev->bus->pm->runtime_idle(dev);
+
+		spin_lock_irq(&dev->power.lock);
+	}
+
+	dev->power.idle_notification = false;
+	wake_up_all(&dev->power.wait_queue);
+
+ out:
+	dev_dbg(dev, "__pm_runtime_idle() returns %d!\n", retval);
+
+	return retval;
+}
+
+/**
+ * pm_runtime_idle - Notify device bus type if the device can be suspended.
+ * @dev: Device to notify the bus type about.
+ */
+int pm_runtime_idle(struct device *dev)
+{
+	int retval;
+
+	spin_lock_irq(&dev->power.lock);
+	retval = __pm_runtime_idle(dev);
+	spin_unlock_irq(&dev->power.lock);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(pm_runtime_idle);
+
+/**
+ * __pm_runtime_suspend - Carry out run-time suspend of given device.
+ * @dev: Device to suspend.
+ * @from_wq: If set, the function has been called via pm_wq.
+ *
+ * Check if the device can be suspended and run the ->runtime_suspend() callback
+ * provided by its bus type.  If another suspend has been started earlier, wait
+ * for it to finish.  If an idle notification or suspend request is pending or
+ * scheduled, cancel it.
+ *
+ * This function must be called under dev->power.lock with interrupts disabled.
+ */
+int __pm_runtime_suspend(struct device *dev, bool from_wq)
+	__releases(&dev->power.lock) __acquires(&dev->power.lock)
+{
+	struct device *parent = NULL;
+	bool notify = false;
+	int retval = 0;
+
+	dev_dbg(dev, "__pm_runtime_suspend()%s!\n",
+		from_wq ? " from workqueue" : "");
+
+ repeat:
+	if (dev->power.runtime_error) {
+		retval = -EINVAL;
+		goto out;
+	}
+
+	/* Pending resume requests take precedence over us. */
+	if (dev->power.request_pending
+	    && dev->power.request == RPM_REQ_RESUME) {
+		retval = -EAGAIN;
+		goto out;
+	}
+
+	/* Other scheduled or pending requests need to be canceled. */
+	pm_runtime_cancel_pending(dev);
+
+	if (dev->power.runtime_status == RPM_SUSPENDED)
+		retval = 1;
+	else if (dev->power.runtime_status == RPM_RESUMING
+	    || dev->power.disable_depth > 0
+	    || atomic_read(&dev->power.usage_count) > 0)
+		retval = -EAGAIN;
+	else if (!pm_children_suspended(dev))
+		retval = -EBUSY;
+	if (retval)
+		goto out;
+
+	if (dev->power.runtime_status == RPM_SUSPENDING) {
+		DEFINE_WAIT(wait);
+
+		if (from_wq) {
+			retval = -EINPROGRESS;
+			goto out;
+		}
+
+		/* Wait for the other suspend running in parallel with us. */
+		for (;;) {
+			prepare_to_wait(&dev->power.wait_queue, &wait,
+					TASK_UNINTERRUPTIBLE);
+			if (dev->power.runtime_status != RPM_SUSPENDING)
+				break;
+
+			spin_unlock_irq(&dev->power.lock);
+
+			schedule();
+
+			spin_lock_irq(&dev->power.lock);
+		}
+		finish_wait(&dev->power.wait_queue, &wait);
+		goto repeat;
+	}
+
+	dev->power.runtime_status = RPM_SUSPENDING;
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_suspend) {
+		spin_unlock_irq(&dev->power.lock);
+
+		retval = dev->bus->pm->runtime_suspend(dev);
+
+		spin_lock_irq(&dev->power.lock);
+		dev->power.runtime_error = retval;
+	} else {
+		retval = -ENOSYS;
+	}
+
+	if (retval) {
+		dev->power.runtime_status = RPM_ACTIVE;
+		pm_runtime_cancel_pending(dev);
+		dev->power.deferred_resume = false;
+
+		if (retval == -EAGAIN || retval == -EBUSY) {
+			notify = true;
+			dev->power.runtime_error = 0;
+		}
+	} else {
+		dev->power.runtime_status = RPM_SUSPENDED;
+
+		if (dev->parent) {
+			parent = dev->parent;
+			atomic_add_unless(&parent->power.child_count, -1, 0);
+		}
+	}
+	wake_up_all(&dev->power.wait_queue);
+
+	if (dev->power.deferred_resume) {
+		dev->power.deferred_resume = false;
+		__pm_runtime_resume(dev, false);
+		retval = -EAGAIN;
+		goto out;
+	}
+
+	if (notify)
+		__pm_runtime_idle(dev);
+
+	if (parent && !parent->power.ignore_children) {
+		spin_unlock_irq(&dev->power.lock);
+
+		pm_request_idle(parent);
+
+		spin_lock_irq(&dev->power.lock);
+	}
+
+ out:
+	dev_dbg(dev, "__pm_runtime_suspend() returns %d!\n", retval);
+
+	return retval;
+}
+
+/**
+ * pm_runtime_suspend - Carry out run-time suspend of given device.
+ * @dev: Device to suspend.
+ */
+int pm_runtime_suspend(struct device *dev)
+{
+	int retval;
+
+	spin_lock_irq(&dev->power.lock);
+	retval = __pm_runtime_suspend(dev, false);
+	spin_unlock_irq(&dev->power.lock);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(pm_runtime_suspend);
+
+/**
+ * __pm_runtime_resume - Carry out run-time resume of given device.
+ * @dev: Device to resume.
+ * @from_wq: If set, the function has been called via pm_wq.
+ *
+ * Check if the device can be woken up and run the ->runtime_resume() callback
+ * provided by its bus type.  If another resume has been started earlier, wait
+ * for it to finish.  If there's a suspend running in parallel with this
+ * function, wait for it to finish and resume the device.  Cancel any scheduled
+ * or pending requests.
+ *
+ * This function must be called under dev->power.lock with interrupts disabled.
+ */
+int __pm_runtime_resume(struct device *dev, bool from_wq)
+	__releases(&dev->power.lock) __acquires(&dev->power.lock)
+{
+	struct device *parent = NULL;
+	int retval = 0;
+
+	dev_dbg(dev, "__pm_runtime_resume()%s!\n",
+		from_wq ? " from workqueue" : "");
+
+ repeat:
+	if (dev->power.runtime_error) {
+		retval = -EINVAL;
+		goto out;
+	}
+
+	pm_runtime_cancel_pending(dev);
+
+	if (dev->power.runtime_status == RPM_ACTIVE)
+		retval = 1;
+	else if (dev->power.disable_depth > 0)
+		retval = -EAGAIN;
+	if (retval)
+		goto out;
+
+	if (dev->power.runtime_status == RPM_RESUMING
+	    || dev->power.runtime_status == RPM_SUSPENDING) {
+		DEFINE_WAIT(wait);
+
+		if (from_wq) {
+			if (dev->power.runtime_status == RPM_SUSPENDING)
+				dev->power.deferred_resume = true;
+			retval = -EINPROGRESS;
+			goto out;
+		}
+
+		/* Wait for the operation carried out in parallel with us. */
+		for (;;) {
+			prepare_to_wait(&dev->power.wait_queue, &wait,
+					TASK_UNINTERRUPTIBLE);
+			if (dev->power.runtime_status != RPM_RESUMING
+			    && dev->power.runtime_status != RPM_SUSPENDING)
+				break;
+
+			spin_unlock_irq(&dev->power.lock);
+
+			schedule();
+
+			spin_lock_irq(&dev->power.lock);
+		}
+		finish_wait(&dev->power.wait_queue, &wait);
+		goto repeat;
+	}
+
+	if (!parent && dev->parent) {
+		/*
+		 * Increment the parent's resume counter and resume it if
+		 * necessary.
+		 */
+		parent = dev->parent;
+		spin_unlock_irq(&dev->power.lock);
+
+		pm_runtime_get_noresume(parent);
+
+		spin_lock_irq(&parent->power.lock);
+		/*
+		 * We can resume if the parent's run-time PM is disabled or it
+		 * is set to ignore children.
+		 */
+		if (!parent->power.disable_depth
+		    && !parent->power.ignore_children) {
+			__pm_runtime_resume(parent, false);
+			if (parent->power.runtime_status != RPM_ACTIVE)
+				retval = -EBUSY;
+		}
+		spin_unlock_irq(&parent->power.lock);
+
+		spin_lock_irq(&dev->power.lock);
+		if (retval)
+			goto out;
+		goto repeat;
+	}
+
+	dev->power.runtime_status = RPM_RESUMING;
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_resume) {
+		spin_unlock_irq(&dev->power.lock);
+
+		retval = dev->bus->pm->runtime_resume(dev);
+
+		spin_lock_irq(&dev->power.lock);
+		dev->power.runtime_error = retval;
+	} else {
+		retval = -ENOSYS;
+	}
+
+	if (retval) {
+		dev->power.runtime_status = RPM_SUSPENDED;
+		pm_runtime_cancel_pending(dev);
+	} else {
+		dev->power.runtime_status = RPM_ACTIVE;
+		if (parent)
+			atomic_inc(&parent->power.child_count);
+	}
+	wake_up_all(&dev->power.wait_queue);
+
+	if (!retval)
+		__pm_request_idle(dev);
+
+ out:
+	if (parent) {
+		spin_unlock_irq(&dev->power.lock);
+
+		pm_runtime_put(parent);
+
+		spin_lock_irq(&dev->power.lock);
+	}
+
+	dev_dbg(dev, "__pm_runtime_resume() returns %d!\n", retval);
+
+	return retval;
+}
+
+/**
+ * pm_runtime_resume - Carry out run-time resume of given device.
+ * @dev: Device to resume.
+ */
+int pm_runtime_resume(struct device *dev)
+{
+	int retval;
+
+	spin_lock_irq(&dev->power.lock);
+	retval = __pm_runtime_resume(dev, false);
+	spin_unlock_irq(&dev->power.lock);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(pm_runtime_resume);
+
+/**
+ * pm_runtime_work - Universal run-time PM work function.
+ * @work: Work structure used for scheduling the execution of this function.
+ *
+ * Use @work to get the device object the work is to be done for, determine what
+ * is to be done and execute the appropriate run-time PM function.
+ */
+static void pm_runtime_work(struct work_struct *work)
+{
+	struct device *dev = container_of(work, struct device, power.work);
+	enum rpm_request req;
+
+	spin_lock_irq(&dev->power.lock);
+
+	if (!dev->power.request_pending)
+		goto out;
+
+	req = dev->power.request;
+	dev->power.request = RPM_REQ_NONE;
+	dev->power.request_pending = false;
+
+	switch (req) {
+	case RPM_REQ_NONE:
+		break;
+	case RPM_REQ_IDLE:
+		__pm_runtime_idle(dev);
+		break;
+	case RPM_REQ_SUSPEND:
+		__pm_runtime_suspend(dev, true);
+		break;
+	case RPM_REQ_RESUME:
+		__pm_runtime_resume(dev, true);
+		break;
+	}
+
+ out:
+	spin_unlock_irq(&dev->power.lock);
+}
+
+/**
+ * __pm_request_idle - Submit an idle notification request for given device.
+ * @dev: Device to handle.
+ *
+ * Check if the device's run-time PM status is correct for suspending the device
+ * and queue up a request to run __pm_runtime_idle() for it.
+ *
+ * This function must be called under dev->power.lock with interrupts disabled.
+ */
+static int __pm_request_idle(struct device *dev)
+{
+	int retval = 0;
+
+	if (dev->power.runtime_error)
+		retval = -EINVAL;
+	else if (atomic_read(&dev->power.usage_count) > 0
+	    || dev->power.disable_depth > 0
+	    || dev->power.runtime_status == RPM_SUSPENDED
+	    || dev->power.runtime_status == RPM_SUSPENDING)
+		retval = -EAGAIN;
+	else if (!pm_children_suspended(dev))
+		retval = -EBUSY;
+	if (retval)
+		return retval;
+
+	if (dev->power.request_pending) {
+		/* Any requests other than RPM_REQ_IDLE take precedence. */
+		if (dev->power.request == RPM_REQ_NONE)
+			dev->power.request = RPM_REQ_IDLE;
+		else if (dev->power.request != RPM_REQ_IDLE)
+			retval = -EAGAIN;
+		return retval;
+	}
+
+	dev->power.request = RPM_REQ_IDLE;
+	dev->power.request_pending = true;
+	queue_work(pm_wq, &dev->power.work);
+
+	return retval;
+}
+
+/**
+ * pm_request_idle - Submit an idle notification request for given device.
+ * @dev: Device to handle.
+ */
+int pm_request_idle(struct device *dev)
+{
+	unsigned long flags;
+	int retval;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+	retval = __pm_request_idle(dev);
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(pm_request_idle);
+
+/**
+ * __pm_request_suspend - Submit a suspend request for given device.
+ * @dev: Device to suspend.
+ *
+ * This function must be called under dev->power.lock with interrupts disabled.
+ */
+static int __pm_request_suspend(struct device *dev)
+{
+	int retval = 0;
+
+	if (dev->power.runtime_error)
+		return -EINVAL;
+
+	if (dev->power.runtime_status == RPM_SUSPENDED)
+		retval = 1;
+	else if (atomic_read(&dev->power.usage_count) > 0
+	    || dev->power.disable_depth > 0)
+		retval = -EAGAIN;
+	else if (dev->power.runtime_status == RPM_SUSPENDING)
+		retval = -EINPROGRESS;
+	else if (!pm_children_suspended(dev))
+		retval = -EBUSY;
+	if (retval < 0)
+		return retval;
+
+	pm_runtime_deactivate_timer(dev);
+
+	if (dev->power.request_pending) {
+		/*
+		 * Pending resume requests take precedence over us, but we can
+		 * overtake any other pending request.
+		 */
+		if (dev->power.request == RPM_REQ_RESUME)
+			retval = -EAGAIN;
+		else if (dev->power.request != RPM_REQ_SUSPEND)
+			dev->power.request = retval ?
+						RPM_REQ_NONE : RPM_REQ_SUSPEND;
+		return retval;
+	} else if (retval) {
+		return retval;
+	}
+
+	dev->power.request = RPM_REQ_SUSPEND;
+	dev->power.request_pending = true;
+	queue_work(pm_wq, &dev->power.work);
+
+	return 0;
+}
+
+/**
+ * pm_suspend_timer_fn - Timer function for pm_schedule_suspend().
+ * @data: Device pointer passed by pm_schedule_suspend().
+ *
+ * Check if the time is right and execute __pm_request_suspend() in that case.
+ */
+static void pm_suspend_timer_fn(unsigned long data)
+{
+	struct device *dev = (struct device *)data;
+	unsigned long flags;
+	unsigned long expires;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	expires = dev->power.timer_expires;
+	/* If 'expires' is after 'jiffies', we've been called too early. */
+	if (expires > 0 && !time_after(expires, jiffies)) {
+		dev->power.timer_expires = 0;
+		__pm_request_suspend(dev);
+	}
+
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+
+/**
+ * pm_schedule_suspend - Set up a timer to submit a suspend request in future.
+ * @dev: Device to suspend.
+ * @delay: Time to wait before submitting a suspend request, in milliseconds.
+ */
+int pm_schedule_suspend(struct device *dev, unsigned int delay)
+{
+	unsigned long flags;
+	int retval = 0;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.runtime_error) {
+		retval = -EINVAL;
+		goto out;
+	}
+
+	if (!delay) {
+		retval = __pm_request_suspend(dev);
+		goto out;
+	}
+
+	pm_runtime_deactivate_timer(dev);
+
+	if (dev->power.request_pending) {
+		/*
+		 * Pending resume requests take precedence over us, but any
+		 * other pending requests have to be canceled.
+		 */
+		if (dev->power.request == RPM_REQ_RESUME) {
+			retval = -EAGAIN;
+			goto out;
+		}
+		dev->power.request = RPM_REQ_NONE;
+	}
+
+	if (dev->power.runtime_status == RPM_SUSPENDED)
+		retval = 1;
+	else if (dev->power.runtime_status == RPM_SUSPENDING)
+		retval = -EINPROGRESS;
+	else if (atomic_read(&dev->power.usage_count) > 0
+	    || dev->power.disable_depth > 0)
+		retval = -EAGAIN;
+	else if (!pm_children_suspended(dev))
+		retval = -EBUSY;
+	if (retval)
+		goto out;
+
+	dev->power.timer_expires = jiffies + msecs_to_jiffies(delay);
+	mod_timer(&dev->power.suspend_timer, dev->power.timer_expires);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(pm_schedule_suspend);
+
+/**
+ * __pm_request_resume - Submit a resume request for given device.
+ * @dev: Device to resume.
+ *
+ * This function must be called under dev->power.lock with interrupts disabled.
+ */
+static int __pm_request_resume(struct device *dev)
+{
+	int retval = 0;
+
+	if (dev->power.runtime_error)
+		return -EINVAL;
+
+	if (dev->power.runtime_status == RPM_ACTIVE)
+		retval = 1;
+	else if (dev->power.runtime_status == RPM_RESUMING)
+		retval = -EINPROGRESS;
+	else if (dev->power.disable_depth > 0)
+		retval = -EAGAIN;
+	if (retval < 0)
+		return retval;
+
+	pm_runtime_deactivate_timer(dev);
+
+	if (dev->power.request_pending) {
+		/* If non-resume request is pending, we can overtake it. */
+		dev->power.request = retval ? RPM_REQ_NONE : RPM_REQ_RESUME;
+		return retval;
+	} else if (retval) {
+		return retval;
+	}
+
+	dev->power.request = RPM_REQ_RESUME;
+	dev->power.request_pending = true;
+	queue_work(pm_wq, &dev->power.work);
+
+	return retval;
+}
+
+/**
+ * pm_request_resume - Submit a resume request for given device.
+ * @dev: Device to resume.
+ */
+int pm_request_resume(struct device *dev)
+{
+	unsigned long flags;
+	int retval;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+	retval = __pm_request_resume(dev);
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(pm_request_resume);
+
+/**
+ * __pm_runtime_get - Reference count a device and wake it up, if necessary.
+ * @dev: Device to handle.
+ * @sync: If set and the device is suspended, resume it synchronously.
+ *
+ * Increment the usage count of the device and if it was zero previously,
+ * resume it or submit a resume request for it, depending on the value of @sync.
+ */
+int __pm_runtime_get(struct device *dev, bool sync)
+{
+	int retval = 1;
+
+	if (atomic_add_return(1, &dev->power.usage_count) == 1)
+		retval = sync ? pm_runtime_resume(dev) : pm_request_resume(dev);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(__pm_runtime_get);
+
+/**
+ * __pm_runtime_put - Decrement the device's usage counter and notify its bus.
+ * @dev: Device to handle.
+ * @sync: If the device's bus type is to be notified, do that synchronously.
+ *
+ * Decrement the usage count of the device and if it reaches zero, carry out a
+ * synchronous idle notification or submit an idle notification request for it,
+ * depending on the value of @sync.
+ */
+int __pm_runtime_put(struct device *dev, bool sync)
+{
+	int retval = 0;
+
+	if (atomic_dec_and_test(&dev->power.usage_count))
+		retval = sync ? pm_runtime_idle(dev) : pm_request_idle(dev);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(__pm_runtime_put);
+
+/**
+ * __pm_runtime_set_status - Set run-time PM status of a device.
+ * @dev: Device to handle.
+ * @status: New run-time PM status of the device.
+ *
+ * If run-time PM of the device is disabled or its power.runtime_error field is
+ * different from zero, the status may be changed either to RPM_ACTIVE, or to
+ * RPM_SUSPENDED, as long as that reflects the actual state of the device.
+ * However, if the device has a parent that is not active, has run-time PM
+ * enabled and has its power.ignore_children flag unset, the device's status
+ * cannot be set to RPM_ACTIVE, so -EBUSY is returned in that case.
+ *
+ * If successful, __pm_runtime_set_status() clears the power.runtime_error field
+ * and the device parent's counter of unsuspended children is modified to
+ * reflect the new status.  If the new status is RPM_SUSPENDED, an idle
+ * notification request for the parent is submitted.
+ */
+int __pm_runtime_set_status(struct device *dev, unsigned int status)
+{
+	struct device *parent = dev->parent;
+	unsigned long flags;
+	bool notify_parent = false;
+	int error = 0;
+
+	if (status != RPM_ACTIVE && status != RPM_SUSPENDED)
+		return -EINVAL;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (!dev->power.runtime_error && !dev->power.disable_depth) {
+		error = -EAGAIN;
+		goto out;
+	}
+
+	if (dev->power.runtime_status == status)
+		goto out_set;
+
+	if (status == RPM_SUSPENDED) {
+		/* It always is possible to set the status to 'suspended'. */
+		if (parent) {
+			atomic_add_unless(&parent->power.child_count, -1, 0);
+			notify_parent = !parent->power.ignore_children;
+		}
+		goto out_set;
+	}
+
+	if (parent) {
+		/*
+		 * Interrupts are already disabled by spin_lock_irqsave() on
+		 * dev->power.lock, so the parent's lock must not re-enable them.
+		 */
+		spin_lock_nested(&parent->power.lock, SINGLE_DEPTH_NESTING);
+
+		/*
+		 * It is invalid to put an active child under a parent that is
+		 * not active, has run-time PM enabled and the
+		 * 'power.ignore_children' flag unset.
+		 */
+		if (!parent->power.disable_depth
+		    && !parent->power.ignore_children
+		    && parent->power.runtime_status != RPM_ACTIVE) {
+			error = -EBUSY;
+		} else {
+			if (dev->power.runtime_status == RPM_SUSPENDED)
+				atomic_inc(&parent->power.child_count);
+		}
+
+		spin_unlock(&parent->power.lock);
+
+		if (error)
+			goto out;
+	}
+
+ out_set:
+	dev->power.runtime_status = status;
+	dev->power.runtime_error = 0;
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	if (notify_parent)
+		pm_request_idle(parent);
+
+	return error;
+}
+EXPORT_SYMBOL_GPL(__pm_runtime_set_status);
+
+/**
+ * __pm_runtime_barrier - Cancel pending requests and wait for completions.
+ * @dev: Device to handle.
+ *
+ * Flush all pending requests for the device from pm_wq and wait for all
+ * run-time PM operations involving the device in progress to complete.
+ *
+ * Should be called under dev->power.lock with interrupts disabled.
+ */
+static void __pm_runtime_barrier(struct device *dev)
+{
+	pm_runtime_deactivate_timer(dev);
+
+	if (dev->power.request_pending) {
+		dev->power.request = RPM_REQ_NONE;
+		spin_unlock_irq(&dev->power.lock);
+
+		cancel_work_sync(&dev->power.work);
+
+		spin_lock_irq(&dev->power.lock);
+		dev->power.request_pending = false;
+	}
+
+	if (dev->power.runtime_status == RPM_SUSPENDING
+	    || dev->power.runtime_status == RPM_RESUMING
+	    || dev->power.idle_notification) {
+		DEFINE_WAIT(wait);
+
+		/* Suspend, wake-up or idle notification in progress. */
+		for (;;) {
+			prepare_to_wait(&dev->power.wait_queue, &wait,
+					TASK_UNINTERRUPTIBLE);
+			if (dev->power.runtime_status != RPM_SUSPENDING
+			    && dev->power.runtime_status != RPM_RESUMING
+			    && !dev->power.idle_notification)
+				break;
+			spin_unlock_irq(&dev->power.lock);
+
+			schedule();
+
+			spin_lock_irq(&dev->power.lock);
+		}
+		finish_wait(&dev->power.wait_queue, &wait);
+	}
+}
+
+/**
+ * pm_runtime_barrier - Flush pending requests and wait for completions.
+ * @dev: Device to handle.
+ *
+ * Prevent the device from being suspended by incrementing its usage counter and
+ * if there's a pending resume request for the device, wake the device up.
+ * Next, make sure that all pending requests for the device have been flushed
+ * from pm_wq and wait for all run-time PM operations involving the device in
+ * progress to complete.
+ *
+ * Return value:
+ * 1, if there was a resume request pending and the device had to be woken up,
+ * 0, otherwise
+ */
+int pm_runtime_barrier(struct device *dev)
+{
+	int retval = 0;
+
+	pm_runtime_get_noresume(dev);
+	spin_lock_irq(&dev->power.lock);
+
+	if (dev->power.request_pending
+	    && dev->power.request == RPM_REQ_RESUME) {
+		__pm_runtime_resume(dev, false);
+		retval = 1;
+	}
+
+	__pm_runtime_barrier(dev);
+
+	spin_unlock_irq(&dev->power.lock);
+	pm_runtime_put_noidle(dev);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(pm_runtime_barrier);
+
+/**
+ * __pm_runtime_disable - Disable run-time PM of a device.
+ * @dev: Device to handle.
+ * @check_resume: If set, check if there's a resume request for the device.
+ *
+ * Increment power.disable_depth for the device and if it was zero previously,
+ * cancel all pending run-time PM requests for the device and wait for all
+ * operations in progress to complete.  The device can be either active or
+ * suspended after its run-time PM has been disabled.
+ *
+ * If @check_resume is set and there's a resume request pending when
+ * __pm_runtime_disable() is called and power.disable_depth is zero, the
+ * function will wake up the device before disabling its run-time PM.
+ */
+void __pm_runtime_disable(struct device *dev, bool check_resume)
+{
+	spin_lock_irq(&dev->power.lock);
+
+	if (dev->power.disable_depth > 0) {
+		dev->power.disable_depth++;
+		goto out;
+	}
+
+	/*
+	 * Wake up the device if there's a resume request pending, because that
+	 * means there probably is some I/O to process and disabling run-time PM
+	 * shouldn't prevent the device from processing the I/O.
+	 */
+	if (check_resume && dev->power.request_pending
+	    && dev->power.request == RPM_REQ_RESUME) {
+		/*
+		 * Prevent suspends and idle notifications from being carried
+		 * out after we have woken up the device.
+		 */
+		pm_runtime_get_noresume(dev);
+
+		__pm_runtime_resume(dev, false);
+
+		pm_runtime_put_noidle(dev);
+	}
+
+	if (!dev->power.disable_depth++)
+		__pm_runtime_barrier(dev);
+
+ out:
+	spin_unlock_irq(&dev->power.lock);
+}
+EXPORT_SYMBOL_GPL(__pm_runtime_disable);
+
+/**
+ * pm_runtime_enable - Enable run-time PM of a device.
+ * @dev: Device to handle.
+ */
+void pm_runtime_enable(struct device *dev)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.disable_depth > 0)
+		dev->power.disable_depth--;
+	else
+		dev_warn(dev, "Unbalanced %s!\n", __func__);
+
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_enable);
+
+/**
+ * pm_runtime_init - Initialize run-time PM fields in given device object.
+ * @dev: Device object to initialize.
+ */
+void pm_runtime_init(struct device *dev)
+{
+	spin_lock_init(&dev->power.lock);
+
+	dev->power.runtime_status = RPM_SUSPENDED;
+	dev->power.idle_notification = false;
+
+	dev->power.disable_depth = 1;
+	atomic_set(&dev->power.usage_count, 0);
+
+	dev->power.runtime_error = 0;
+
+	atomic_set(&dev->power.child_count, 0);
+	pm_suspend_ignore_children(dev, false);
+
+	dev->power.request_pending = false;
+	dev->power.request = RPM_REQ_NONE;
+	dev->power.deferred_resume = false;
+	INIT_WORK(&dev->power.work, pm_runtime_work);
+
+	dev->power.timer_expires = 0;
+	setup_timer(&dev->power.suspend_timer, pm_suspend_timer_fn,
+			(unsigned long)dev);
+
+	init_waitqueue_head(&dev->power.wait_queue);
+}
+
+/**
+ * pm_runtime_remove - Prepare for removing a device from the device hierarchy.
+ * @dev: Device object being removed from the device hierarchy.
+ */
+void pm_runtime_remove(struct device *dev)
+{
+	__pm_runtime_disable(dev, false);
+
+	/* Change the status back to 'suspended' to match the initial status. */
+	if (dev->power.runtime_status == RPM_ACTIVE)
+		pm_runtime_set_suspended(dev);
+}
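A minimal userspace sketch of the branch structure of __pm_request_resume() above. The struct m_dev, the m_* names, and the timer_active and queued fields are hypothetical stand-ins for dev->power, pm_runtime_deactivate_timer() and queue_work(pm_wq, ...). Locking is left to the caller, as it is in the real function, which runs under dev->power.lock.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the dev->power fields consulted by
 * __pm_request_resume(); these are not kernel types. */
enum m_status { M_ACTIVE, M_RESUMING, M_SUSPENDED, M_SUSPENDING };
enum m_req { M_REQ_NONE, M_REQ_IDLE, M_REQ_SUSPEND, M_REQ_RESUME };

struct m_dev {
	int runtime_error;
	enum m_status status;
	unsigned int disable_depth;
	bool request_pending;
	enum m_req request;
	bool timer_active; /* models the suspend timer */
	int queued;        /* counts would-be queue_work(pm_wq, ...) calls */
};

/* Same decisions as __pm_request_resume(); the caller is assumed to
 * provide the locking that dev->power.lock gives the real code. */
static int m_request_resume(struct m_dev *dev)
{
	int retval = 0;

	assert(dev);
	if (dev->runtime_error)
		return -EINVAL;

	if (dev->status == M_ACTIVE)
		retval = 1;
	else if (dev->status == M_RESUMING)
		retval = -EINPROGRESS;
	else if (dev->disable_depth > 0)
		retval = -EAGAIN;
	if (retval < 0)
		return retval;

	/* Stand-in for pm_runtime_deactivate_timer(). */
	dev->timer_active = false;

	if (dev->request_pending) {
		/* Reuse the already-queued work item instead of queuing more. */
		dev->request = retval ? M_REQ_NONE : M_REQ_RESUME;
		return retval;
	} else if (retval) {
		return retval;
	}

	dev->request = M_REQ_RESUME;
	dev->request_pending = true;
	dev->queued++;
	return retval;
}
```

In this model, an active device with a pending idle or suspend request returns 1 and has that request cleared. A suspended device with a pending request has that request turned into a resume request without queuing a second work item.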
Index: linux-2.6/include/linux/pm_runtime.h
===================================================================
--- /dev/null
+++ linux-2.6/include/linux/pm_runtime.h
@@ -0,0 +1,114 @@
+/*
+ * pm_runtime.h - Device run-time power management helper functions.
+ *
+ * Copyright (C) 2009 Rafael J. Wysocki <rjw@sisk.pl>
+ *
+ * This file is released under the GPLv2.
+ */
+
+#ifndef _LINUX_PM_RUNTIME_H
+#define _LINUX_PM_RUNTIME_H
+
+#include <linux/device.h>
+#include <linux/pm.h>
+
+#ifdef CONFIG_PM_RUNTIME
+
+extern struct workqueue_struct *pm_wq;
+
+extern int pm_runtime_idle(struct device *dev);
+extern int pm_runtime_suspend(struct device *dev);
+extern int pm_runtime_resume(struct device *dev);
+extern int pm_request_idle(struct device *dev);
+extern int pm_schedule_suspend(struct device *dev, unsigned int delay);
+extern int pm_request_resume(struct device *dev);
+extern int __pm_runtime_get(struct device *dev, bool sync);
+extern int __pm_runtime_put(struct device *dev, bool sync);
+extern int __pm_runtime_set_status(struct device *dev, unsigned int status);
+extern int pm_runtime_barrier(struct device *dev);
+extern void pm_runtime_enable(struct device *dev);
+extern void __pm_runtime_disable(struct device *dev, bool check_resume);
+
+static inline bool pm_children_suspended(struct device *dev)
+{
+	return dev->power.ignore_children
+		|| !atomic_read(&dev->power.child_count);
+}
+
+static inline void pm_suspend_ignore_children(struct device *dev, bool enable)
+{
+	dev->power.ignore_children = enable;
+}
+
+static inline void pm_runtime_get_noresume(struct device *dev)
+{
+	atomic_inc(&dev->power.usage_count);
+}
+
+static inline void pm_runtime_put_noidle(struct device *dev)
+{
+	atomic_add_unless(&dev->power.usage_count, -1, 0);
+}
+
+#else /* !CONFIG_PM_RUNTIME */
+
+static inline int pm_runtime_idle(struct device *dev) { return -ENOSYS; }
+static inline int pm_runtime_suspend(struct device *dev) { return -ENOSYS; }
+static inline int pm_runtime_resume(struct device *dev) { return 0; }
+static inline int pm_request_idle(struct device *dev) { return -ENOSYS; }
+static inline int pm_schedule_suspend(struct device *dev, unsigned int delay)
+{
+	return -ENOSYS;
+}
+static inline int pm_request_resume(struct device *dev) { return 0; }
+static inline int __pm_runtime_get(struct device *dev, bool sync) { return 1; }
+static inline int __pm_runtime_put(struct device *dev, bool sync) { return 0; }
+static inline int __pm_runtime_set_status(struct device *dev,
+					    unsigned int status) { return 0; }
+static inline int pm_runtime_barrier(struct device *dev) { return 0; }
+static inline void pm_runtime_enable(struct device *dev) {}
+static inline void __pm_runtime_disable(struct device *dev, bool c) {}
+
+static inline bool pm_children_suspended(struct device *dev) { return false; }
+static inline void pm_suspend_ignore_children(struct device *dev, bool en) {}
+static inline void pm_runtime_get_noresume(struct device *dev) {}
+static inline void pm_runtime_put_noidle(struct device *dev) {}
+
+#endif /* !CONFIG_PM_RUNTIME */
+
+static inline int pm_runtime_get(struct device *dev)
+{
+	return __pm_runtime_get(dev, false);
+}
+
+static inline int pm_runtime_get_sync(struct device *dev)
+{
+	return __pm_runtime_get(dev, true);
+}
+
+static inline int pm_runtime_put(struct device *dev)
+{
+	return __pm_runtime_put(dev, false);
+}
+
+static inline int pm_runtime_put_sync(struct device *dev)
+{
+	return __pm_runtime_put(dev, true);
+}
+
+static inline int pm_runtime_set_active(struct device *dev)
+{
+	return __pm_runtime_set_status(dev, RPM_ACTIVE);
+}
+
+static inline void pm_runtime_set_suspended(struct device *dev)
+{
+	__pm_runtime_set_status(dev, RPM_SUSPENDED);
+}
+
+static inline void pm_runtime_disable(struct device *dev)
+{
+	__pm_runtime_disable(dev, true);
+}
+
+#endif
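pm_runtime_set_active() and pm_runtime_set_suspended() above are thin wrappers around __pm_runtime_set_status(). The following is a minimal userspace sketch of the checks that function performs, assuming plain integers in place of atomic_t and no locking. The m_* names are hypothetical, and the idle request the real code submits for the parent is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace model of the checks in __pm_runtime_set_status();
 * the m_* names are illustrative only. */
enum m_status { M_ACTIVE, M_RESUMING, M_SUSPENDED, M_SUSPENDING };

struct m_pm {
	enum m_status status;
	unsigned int disable_depth;
	int runtime_error;
	bool ignore_children;
	int child_count;       /* 'active' children, as in power.child_count */
	struct m_pm *parent;
};

static int m_set_status(struct m_pm *dev, enum m_status status)
{
	struct m_pm *parent;

	assert(dev);
	parent = dev->parent;

	if (status != M_ACTIVE && status != M_SUSPENDED)
		return -EINVAL;

	/* Only allowed while run-time PM is disabled or after a fatal error. */
	if (!dev->runtime_error && !dev->disable_depth)
		return -EAGAIN;

	if (dev->status == status)
		goto out_set;

	if (status == M_SUSPENDED) {
		/* Like atomic_add_unless(&child_count, -1, 0). The real code
		 * also queues an idle request for the parent; omitted here. */
		if (parent && parent->child_count > 0)
			parent->child_count--;
		goto out_set;
	}

	if (parent) {
		/* An active child needs an active parent unless the parent has
		 * run-time PM disabled or ignores its children. */
		if (!parent->disable_depth && !parent->ignore_children
		    && parent->status != M_ACTIVE)
			return -EBUSY;
		if (dev->status == M_SUSPENDED)
			parent->child_count++;
	}

 out_set:
	dev->status = status;
	dev->runtime_error = 0;
	return 0;
}
```

The model makes the ordering constraint explicit: a child can only be marked active under a suspended parent if that parent has run-time PM disabled or has its ignore_children flag set.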
Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -21,6 +21,7 @@
 #include <linux/kallsyms.h>
 #include <linux/mutex.h>
 #include <linux/pm.h>
+#include <linux/pm_runtime.h>
 #include <linux/resume-trace.h>
 #include <linux/rwsem.h>
 #include <linux/interrupt.h>
@@ -49,6 +50,16 @@ static DEFINE_MUTEX(dpm_list_mtx);
 static bool transition_started;
 
 /**
+ * device_pm_init - Initialize the PM-related part of a device object
+ * @dev: Device object being initialized.
+ */
+void device_pm_init(struct device *dev)
+{
+	dev->power.status = DPM_ON;
+	pm_runtime_init(dev);
+}
+
+/**
  *	device_pm_lock - lock the list of active devices used by the PM core
  */
 void device_pm_lock(void)
@@ -105,6 +116,7 @@ void device_pm_remove(struct device *dev
 	mutex_lock(&dpm_list_mtx);
 	list_del_init(&dev->power.entry);
 	mutex_unlock(&dpm_list_mtx);
+	pm_runtime_remove(dev);
 }
 
 /**
@@ -512,6 +524,7 @@ static void dpm_complete(pm_message_t st
 			mutex_unlock(&dpm_list_mtx);
 
 			device_complete(dev, state);
+			pm_runtime_put_noidle(dev);
 
 			mutex_lock(&dpm_list_mtx);
 		}
@@ -757,7 +770,14 @@ static int dpm_prepare(pm_message_t stat
 		dev->power.status = DPM_PREPARING;
 		mutex_unlock(&dpm_list_mtx);
 
-		error = device_prepare(dev, state);
+		pm_runtime_get_noresume(dev);
+		if (pm_runtime_barrier(dev) && device_may_wakeup(dev)) {
+			/* Wake-up requested during system sleep transition. */
+			pm_runtime_put_noidle(dev);
+			error = -EBUSY;
+		} else {
+			error = device_prepare(dev, state);
+		}
 
 		mutex_lock(&dpm_list_mtx);
 		if (error) {
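The dpm_prepare() and dpm_complete() hunks above hold a usage-counter reference on each device for the duration of a system sleep transition. The following is a minimal userspace sketch of that bracketing for a single device. Its fields are hypothetical stand-ins: 'resume_pending' represents pm_runtime_barrier() returning 1, 'may_wakeup' represents device_may_wakeup(), and 'prepare_ret' is the value device_prepare() would return.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of one device's pass through dpm_prepare() and
 * dpm_complete(); none of these names are kernel API. */
struct m_sleep_dev {
	int usage;
	bool resume_pending;
	bool may_wakeup;
	int prepare_ret;
};

static int m_prepare_one(struct m_sleep_dev *dev)
{
	assert(dev);
	dev->usage++;                   /* pm_runtime_get_noresume() */
	if (dev->resume_pending && dev->may_wakeup) {
		/* A wake-up arrived during the transition: abort it. */
		dev->usage--;           /* pm_runtime_put_noidle() */
		return -EBUSY;
	}
	return dev->prepare_ret;        /* device_prepare() */
}

static void m_complete_one(struct m_sleep_dev *dev)
{
	assert(dev);
	/* dpm_complete() drops the reference with pm_runtime_put_noidle(),
	 * which never takes the counter below zero. */
	if (dev->usage > 0)
		dev->usage--;
}
```

While the reference is held, the usage counter is nonzero. Under constraint (3) in the documentation, this keeps ->runtime_idle() and ->runtime_suspend() from being run for the device during the transition.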
Index: linux-2.6/drivers/base/dd.c
===================================================================
--- linux-2.6.orig/drivers/base/dd.c
+++ linux-2.6/drivers/base/dd.c
@@ -23,6 +23,7 @@
 #include <linux/kthread.h>
 #include <linux/wait.h>
 #include <linux/async.h>
+#include <linux/pm_runtime.h>
 
 #include "base.h"
 #include "power/power.h"
@@ -202,7 +203,10 @@ int driver_probe_device(struct device_dr
 	pr_debug("bus: '%s': %s: matched device %s with driver %s\n",
 		 drv->bus->name, __func__, dev_name(dev), drv->name);
 
+	pm_runtime_get_noresume(dev);
+	pm_runtime_barrier(dev);
 	ret = really_probe(dev, drv);
+	pm_runtime_put_sync(dev);
 
 	return ret;
 }
@@ -245,7 +249,9 @@ int device_attach(struct device *dev)
 			ret = 0;
 		}
 	} else {
+		pm_runtime_get_noresume(dev);
 		ret = bus_for_each_drv(dev->bus, NULL, dev, __device_attach);
+		pm_runtime_put_sync(dev);
 	}
 	up(&dev->sem);
 	return ret;
@@ -306,6 +312,9 @@ static void __device_release_driver(stru
 
 	drv = dev->driver;
 	if (drv) {
+		pm_runtime_get_noresume(dev);
+		pm_runtime_barrier(dev);
+
 		driver_sysfs_remove(dev);
 
 		if (dev->bus)
@@ -324,6 +333,8 @@ static void __device_release_driver(stru
 			blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
 						     BUS_NOTIFY_UNBOUND_DRIVER,
 						     dev);
+
+		pm_runtime_put_sync(dev);
 	}
 }
 
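driver_probe_device(), device_attach() and __device_release_driver() bracket their work with pm_runtime_get_noresume() and pm_runtime_put_sync(). The following is a minimal userspace sketch of the usage-counter rules behind those calls, using C11 atomics in place of atomic_t. The m_* names and the resumes and idles tallies are hypothetical; they only record where the kernel would start a resume or an idle notification.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of dev->power.usage_count. */
struct m_usage {
	atomic_int count;
	int resumes;  /* where pm_runtime_resume()/pm_request_resume() runs */
	int idles;    /* where pm_runtime_idle()/pm_request_idle() runs */
};

static void m_init(struct m_usage *u)
{
	atomic_init(&u->count, 0);
	u->resumes = 0;
	u->idles = 0;
}

/* pm_runtime_get_noresume(): plain increment, never resumes. */
static void m_get_noresume(struct m_usage *u)
{
	atomic_fetch_add(&u->count, 1);
}

/* __pm_runtime_get(): resume only on the 0 -> 1 transition. */
static void m_get(struct m_usage *u)
{
	if (atomic_fetch_add(&u->count, 1) == 0)
		u->resumes++;
}

/* __pm_runtime_put(): idle notification only on the 1 -> 0 transition. */
static void m_put(struct m_usage *u)
{
	int old = atomic_fetch_sub(&u->count, 1);

	assert(old > 0);  /* model-only check; the kernel code has none */
	if (old == 1)
		u->idles++;
}

/* pm_runtime_put_noidle(): like atomic_add_unless(count, -1, 0). */
static void m_put_noidle(struct m_usage *u)
{
	int old = atomic_load(&u->count);

	/* A failed exchange reloads 'old'; retry until it succeeds or
	 * the counter is seen to be zero. */
	while (old != 0 &&
	       !atomic_compare_exchange_weak(&u->count, &old, old - 1))
		;
}
```

In this model, the reference taken before probing suppresses idle notifications for as long as it is held. The final put then delivers exactly one notification, when the count returns to zero.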
Index: linux-2.6/drivers/base/power/power.h
===================================================================
--- linux-2.6.orig/drivers/base/power/power.h
+++ linux-2.6/drivers/base/power/power.h
@@ -1,7 +1,14 @@
-static inline void device_pm_init(struct device *dev)
-{
-	dev->power.status = DPM_ON;
-}
+#ifdef CONFIG_PM_RUNTIME
+
+extern void pm_runtime_init(struct device *dev);
+extern void pm_runtime_remove(struct device *dev);
+
+#else /* !CONFIG_PM_RUNTIME */
+
+static inline void pm_runtime_init(struct device *dev) {}
+static inline void pm_runtime_remove(struct device *dev) {}
+
+#endif /* !CONFIG_PM_RUNTIME */
 
 #ifdef CONFIG_PM_SLEEP
 
@@ -16,23 +23,33 @@ static inline struct device *to_device(s
 	return container_of(entry, struct device, power.entry);
 }
 
+extern void device_pm_init(struct device *dev);
 extern void device_pm_add(struct device *);
 extern void device_pm_remove(struct device *);
 extern void device_pm_move_before(struct device *, struct device *);
 extern void device_pm_move_after(struct device *, struct device *);
 extern void device_pm_move_last(struct device *);
 
-#else /* CONFIG_PM_SLEEP */
+#else /* !CONFIG_PM_SLEEP */
+
+static inline void device_pm_init(struct device *dev)
+{
+	pm_runtime_init(dev);
+}
+
+static inline void device_pm_remove(struct device *dev)
+{
+	pm_runtime_remove(dev);
+}
 
 static inline void device_pm_add(struct device *dev) {}
-static inline void device_pm_remove(struct device *dev) {}
 static inline void device_pm_move_before(struct device *deva,
 					 struct device *devb) {}
 static inline void device_pm_move_after(struct device *deva,
 					struct device *devb) {}
 static inline void device_pm_move_last(struct device *dev) {}
 
-#endif
+#endif /* !CONFIG_PM_SLEEP */
 
 #ifdef CONFIG_PM
 
Index: linux-2.6/Documentation/power/runtime_pm.txt
===================================================================
--- /dev/null
+++ linux-2.6/Documentation/power/runtime_pm.txt
@@ -0,0 +1,378 @@
+Run-time Power Management Framework for I/O Devices
+
+(C) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
+
+1. Introduction
+
+Support for run-time power management (run-time PM) of I/O devices is provided
+at the power management core (PM core) level by means of:
+
+* The power management workqueue pm_wq in which bus types and device drivers can
+  put their PM-related work items.  It is strongly recommended that pm_wq be
+  used for queuing all work items related to run-time PM, because this allows
+  them to be synchronized with system-wide power transitions (suspend to RAM,
+  hibernation and resume from system sleep states).  pm_wq is declared in
+  include/linux/pm_runtime.h and defined in kernel/power/main.c.
+
+* A number of run-time PM fields in the 'power' member of 'struct device' (which
+  is of the type 'struct dev_pm_info', defined in include/linux/pm.h) that can
+  be used for synchronizing run-time PM operations with one another.
+
+* Three device run-time PM callbacks in 'struct dev_pm_ops' (defined in
+  include/linux/pm.h).
+
+* A set of helper functions defined in drivers/base/power/runtime.c that can be
+  used for carrying out run-time PM operations in such a way that the
+  synchronization between them is taken care of by the PM core.  Bus types and
+  device drivers are encouraged to use these functions.
+
+The run-time PM callbacks present in 'struct dev_pm_ops', the device run-time PM
+fields of 'struct dev_pm_info' and the core helper functions provided for
+run-time PM are described below.
+
+2. Device Run-time PM Callbacks
+
+There are three device run-time PM callbacks defined in 'struct dev_pm_ops':
+
+struct dev_pm_ops {
+	...
+	int (*runtime_suspend)(struct device *dev);
+	int (*runtime_resume)(struct device *dev);
+	void (*runtime_idle)(struct device *dev);
+	...
+};
+
+The ->runtime_suspend() callback is executed by the PM core for the bus type of
+the device being suspended.  The bus type's callback is then _entirely_
+_responsible_ for handling the device as appropriate, which may, but need not,
+include executing the device driver's own ->runtime_suspend() callback (from the
+PM core's point of view it is not necessary to implement a ->runtime_suspend()
+callback in a device driver as long as the bus type's ->runtime_suspend() knows
+what to do to handle the device).
+
+  * Once the bus type's ->runtime_suspend() callback has completed successfully
+    for a given device, the PM core regards the device as suspended, which need
+    not mean that the device has been put into a low power state.  It is
+    supposed to mean, however, that the device will not process data and will
+    not communicate with the CPU(s) and RAM until its bus type's
+    ->runtime_resume() callback is executed for it.  The run-time PM status of
+    a device after successful execution of its bus type's ->runtime_suspend()
+    callback is 'suspended'.
+
+  * If the bus type's ->runtime_suspend() callback returns -EBUSY or -EAGAIN,
+    the device's run-time PM status is supposed to be 'active', which means that
+    the device _must_ be fully operational afterwards.
+
+  * If the bus type's ->runtime_suspend() callback returns an error code
+    different from -EBUSY or -EAGAIN, the PM core regards this as a fatal
+    error and will refuse to run the helper functions described in Section 4
+    for the device until its status is directly set either to 'active'
+    or to 'suspended' (the PM core provides special helper functions for this
+    purpose).
+
+In particular, if the driver requires remote wakeup capability for proper
+functioning and device_may_wakeup() returns 'false' for the device, then
+->runtime_suspend() should return -EBUSY.  On the other hand, if
+device_may_wakeup() returns 'true' for the device and the device is put
+into a low power state during the execution of its bus type's
+->runtime_suspend(), it is expected that remote wake-up (i.e. a hardware
+mechanism allowing the device to request a change of its power state, such as
+PCI PME) will be enabled for the device.  Generally, remote wake-up should be
+enabled for all input devices put into a low power state at run time.
+
+The ->runtime_resume() callback is executed by the PM core for the bus type of
+the device being woken up.  The bus type's callback is then _entirely_
+_responsible_ for handling the device as appropriate, which may, but need not,
+include executing the device driver's own ->runtime_resume() callback (from the
+PM core's point of view it is not necessary to implement a ->runtime_resume()
+callback in a device driver as long as the bus type's ->runtime_resume() knows
+what to do to handle the device).
+
+  * Once the bus type's ->runtime_resume() callback has completed successfully,
+    the PM core regards the device as fully operational, which means that the
+    device _must_ be able to complete I/O operations as needed.  The run-time
+    PM status of the device is then 'active'.
+
+  * If the bus type's ->runtime_resume() callback returns an error code, the PM
+    core regards this as a fatal error and will refuse to run the helper
+    functions described in Section 4 for the device, until its status is
+    directly set either to 'active' or to 'suspended' (the PM core provides
+    special helper functions for this purpose).
+
+The ->runtime_idle() callback is executed by the PM core for the bus type of
+given device whenever the device appears to be idle, which is indicated to the
+PM core by two counters, the device's usage counter and the counter of 'active'
+children of the device.
+
+  * If either of these counters is decremented using a helper function
+    provided by the PM core and it turns out to be equal to zero, the other
+    counter is checked.  If that counter also is equal to zero, the PM core
+    executes the device bus type's ->runtime_idle() callback (with the device
+    as an argument).
+
+The action performed by a bus type's ->runtime_idle() callback is totally
+dependent on the bus type in question, but the expected and recommended action
+is to check if the device can be suspended (i.e. if all of the conditions
+necessary for suspending the device are satisfied) and to queue up a suspend
+request for the device in that case.
+
+The helper functions provided by the PM core, described in Section 4, guarantee
+that the following constraints are met with respect to the bus type's run-time
+PM callbacks:
+
+(1) The callbacks are mutually exclusive (e.g. it is forbidden to execute
+    ->runtime_suspend() in parallel with ->runtime_resume() or with another
+    instance of ->runtime_suspend() for the same device) with the exception that
+    ->runtime_suspend() or ->runtime_resume() can be executed in parallel with
+    ->runtime_idle() (although ->runtime_idle() will not be started while any
+    of the other callbacks is being executed for the same device).
+
+(2) ->runtime_idle() and ->runtime_suspend() can only be executed for 'active'
+    devices (i.e. the PM core will only execute ->runtime_idle() or
+    ->runtime_suspend() for devices whose run-time PM status is 'active').
+
+(3) ->runtime_idle() and ->runtime_suspend() can only be executed for a device
+    whose usage counter is equal to zero _and_ which either has no 'active'
+    children or has the 'power.ignore_children' flag set.
+
+(4) ->runtime_resume() can only be executed for 'suspended' devices (i.e. the
+    PM core will only execute ->runtime_resume() for devices whose run-time PM
+    status is 'suspended').
+
+Additionally, the helper functions provided by the PM core obey the following
+rules:
+
+  * If ->runtime_suspend() is about to be executed or there's a pending request
+    to execute it, ->runtime_idle() will not be executed for the same device.
+
+  * A request to execute or to schedule the execution of ->runtime_suspend()
+    will cancel any pending requests to execute ->runtime_idle() for the same
+    device.
+
+  * If ->runtime_resume() is about to be executed or there's a pending request
+    to execute it, the other callbacks will not be executed for the same device.
+
+  * A request to execute ->runtime_resume() will cancel any pending or
+    scheduled requests to execute the other callbacks for the same device.
+
+3. Run-time PM Device Fields
+
+The following device run-time PM fields are present in 'struct dev_pm_info', as
+defined in include/linux/pm.h:
+
+  struct timer_list suspend_timer;
+    - timer used for scheduling (delayed) suspend request
+
+  unsigned long timer_expires;
+    - timer expiration time, in jiffies (if this is different from zero, the
+      timer is running and will expire at that time, otherwise the timer is not
+      running)
+
+  struct work_struct work;
+    - work structure used for queuing up requests (i.e. work items in pm_wq)
+
+  wait_queue_head_t wait_queue;
+    - wait queue used if any of the helper functions needs to wait for another
+      one to complete
+
+  spinlock_t lock;
+    - lock used for synchronisation
+
+  atomic_t usage_count;
+    - the usage counter of the device
+
+  atomic_t child_count;
+    - the count of 'active' children of the device
+
+  unsigned int ignore_children;
+    - if set, the value of child_count is ignored (but still updated)
+
+  unsigned int disable_depth;
+    - used for disabling the helper functions (they work normally if this is
+      equal to zero); the initial value of it is 1 (i.e. run-time PM is
+      initially disabled for all devices)
+
+  unsigned int runtime_error;
+    - if set, there was a fatal error (one of the callbacks returned an error
+      code as described in Section 2), so the helper functions will not work
+      until this field is cleared; its value is the error code returned by the
+      failing callback
+
+  unsigned int idle_notification;
+    - if set, ->runtime_idle() is being executed
+
+  unsigned int request_pending;
+    - if set, there's a pending request (i.e. a work item queued up into pm_wq)
+
+  enum rpm_request request;
+    - type of request that's pending (valid if request_pending is set)
+
+  unsigned int deferred_resume;
+    - set if ->runtime_resume() is about to be run while ->runtime_suspend() is
+      being executed for that device and it is not practical to wait for the
+      suspend to complete; means "start a resume as soon as you've suspended"
+
+  enum rpm_status runtime_status;
+    - the run-time PM status of the device; this field's initial value is
+      RPM_SUSPENDED, which means that each device is initially regarded by the
+      PM core as 'suspended', regardless of its real hardware status
+
+All of the above fields are members of the 'power' member of 'struct device'.
+
+4. Run-time PM Device Helper Functions
+
+The following run-time PM helper functions are defined in
+drivers/base/power/runtime.c and include/linux/pm_runtime.h:
+
+  void pm_runtime_init(struct device *dev);
+    - initialize the device run-time PM fields in 'struct dev_pm_info'
+
+  void pm_runtime_remove(struct device *dev);
+    - make sure that the run-time PM of the device will be disabled after
+      removing the device from the device hierarchy
+
+  int pm_runtime_idle(struct device *dev);
+    - execute ->runtime_idle() for the device's bus type; returns 0 on success
+      or error code on failure, where -EINPROGRESS means that ->runtime_idle()
+      is already being executed
+
+  int pm_runtime_suspend(struct device *dev);
+    - execute ->runtime_suspend() for the device's bus type; returns 0 on
+      success, 1 if the device's run-time PM status was already 'suspended', or
+      an error code on failure, where -EAGAIN or -EBUSY means it is safe to
+      attempt to suspend the device again in the future
+
+  int pm_runtime_resume(struct device *dev);
+    - execute ->runtime_resume() for the device's bus type; returns 0 on
+      success, 1 if the device's run-time PM status was already 'active', or
+      an error code on failure, where -EAGAIN means it may be safe to attempt
+      to resume the device again in the future, but 'power.runtime_error'
+      should be checked additionally
+
+  int pm_request_idle(struct device *dev);
+    - submit a request to execute ->runtime_idle() for the device's bus type
+      (the request is represented by a work item in pm_wq); returns 0 on success
+      or error code if the request has not been queued up
+
+  int pm_schedule_suspend(struct device *dev, unsigned int delay);
+    - schedule the execution of ->runtime_suspend() for the device's bus type
+      in the future, where 'delay' is the time to wait before queuing up a
+      suspend work item in pm_wq, in milliseconds (if 'delay' is zero, the work
+      item is queued up immediately); returns 0 on success, 1 if the device's
+      run-time PM status was already 'suspended', or an error code if the
+      request hasn't been scheduled (or queued up if 'delay' is 0); if the
+      execution of ->runtime_suspend() is already scheduled and not yet
+      expired, the new value of 'delay' will be used as the time to wait
+
+  int pm_request_resume(struct device *dev);
+    - submit a request to execute ->runtime_resume() for the device's bus type
+      (the request is represented by a work item in pm_wq); returns 0 on
+      success, 1 if the device's run-time PM status was already 'active', or
+      error code if the request hasn't been queued up
+
+  void pm_runtime_get_noresume(struct device *dev);
+    - increment the device's usage counter
+
+  int pm_runtime_get(struct device *dev);
+    - increment the device's usage counter, run pm_request_resume(dev) and
+      return its result
+
+  int pm_runtime_get_sync(struct device *dev);
+    - increment the device's usage counter, run pm_runtime_resume(dev) and
+      return its result
+
+  void pm_runtime_put_noidle(struct device *dev);
+    - decrement the device's usage counter
+
+  int pm_runtime_put(struct device *dev);
+    - decrement the device's usage counter, run pm_request_idle(dev) and return
+      its result
+
+  int pm_runtime_put_sync(struct device *dev);
+    - decrement the device's usage counter, run pm_runtime_idle(dev) and return
+      its result
+
+  void pm_runtime_enable(struct device *dev);
+    - enable the run-time PM helper functions to run the device bus type's
+      run-time PM callbacks described in Section 2
+
+  int pm_runtime_disable(struct device *dev);
+    - prevent the run-time PM helper functions from running the device bus
+      type's run-time PM callbacks, make sure that all of the pending run-time
+      PM operations on the device are either completed or canceled; returns
+      1 if there was a resume request pending and it was necessary to execute
+      ->runtime_resume() for the device's bus type to satisfy that request,
+      otherwise 0 is returned
+
+  void pm_suspend_ignore_children(struct device *dev, bool enable);
+    - set/unset the power.ignore_children flag of the device
+
+  int pm_runtime_set_active(struct device *dev);
+    - clear the device's 'power.runtime_error' flag, set the device's run-time
+      PM status to 'active' and update its parent's counter of 'active'
+      children as appropriate (it is only valid to use this function if
+      'power.runtime_error' is set or 'power.disable_depth' is greater than
+      zero); it will fail and return error code if the device has a parent
+      which is not active and the 'power.ignore_children' flag of which is unset
+
+  void pm_runtime_set_suspended(struct device *dev);
+    - clear the device's 'power.runtime_error' flag, set the device's run-time
+      PM status to 'suspended' and update its parent's counter of 'active'
+      children as appropriate (it is only valid to use this function if
+      'power.runtime_error' is set or 'power.disable_depth' is greater than
+      zero)
+
+It is safe to execute the following helper functions from interrupt context:
+
+pm_request_idle()
+pm_schedule_suspend()
+pm_request_resume()
+pm_runtime_get_noresume()
+pm_runtime_get()
+pm_runtime_put_noidle()
+pm_runtime_put()
+pm_suspend_ignore_children()
+pm_runtime_set_active()
+pm_runtime_set_suspended()
+pm_runtime_enable()
+
+5. Run-time PM Initialization, Device Probing and Removal
+
+Initially, the run-time PM is disabled for all devices, which means that the
+majority of the run-time PM helper functions described in Section 4 will return
+-EAGAIN until pm_runtime_enable() is called for the device.
+
+In addition to that, the initial run-time PM status of all devices is
+'suspended', but it need not reflect the actual physical state of the device.
+Thus, if the device is initially active (i.e. it is able to process I/O), its
+run-time PM status must be changed to 'active', with the help of
+pm_runtime_set_active(), before pm_runtime_enable() is called for the device.
+
+However, if the device has a parent and the parent's run-time PM is enabled,
+calling pm_runtime_set_active() for the device will affect the parent, unless
+the parent's 'power.ignore_children' flag is set.  Namely, in that case the
+parent won't be able to suspend at run time, using the PM core's helper
+functions, as long as the child's status is 'active', even if the child's
+run-time PM is still disabled (i.e. pm_runtime_enable() hasn't been called for
+the child yet or pm_runtime_disable() has been called for it).  For this reason,
+once pm_runtime_set_active() has been called for the device, pm_runtime_enable()
+should be called for it too as soon as reasonably possible or its run-time PM
+status should be changed back to 'suspended' with the help of
+pm_runtime_set_suspended().
+
+If the default initial run-time PM status of the device (i.e. 'suspended')
+reflects the actual state of the device, its bus type's or its driver's
+->probe() callback will likely need to wake it up using one of the PM core's
+helper functions described in Section 4.  In that case, pm_runtime_resume()
+should be used.  Of course, for this purpose the device's run-time PM has to be
+enabled earlier by calling pm_runtime_enable().
+
+If the device bus type's or driver's ->probe() or ->remove() callback runs
+pm_runtime_suspend() or pm_runtime_idle() or their asynchronous counterparts,
+they will fail and return -EAGAIN, because the device's usage counter is
+incremented by the core before executing ->probe() and ->remove().  Still, it
+may be desirable to suspend the device as soon as ->probe() or ->remove() has
+finished, so the PM core uses pm_runtime_idle_sync() to invoke the device bus
+type's ->runtime_idle() callback at that time.
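The interplay described in Sections 4 and 5 (the usage counter, 'power.disable_depth', the run-time PM status and the parent's counter of 'active' children) can be illustrated with a toy, single-threaded user-space model. This is a sketch under stated assumptions, not the PM core's code: every name in it (struct toy_dev, toy_set_active(), toy_get_sync() and so on) is hypothetical, no callbacks or work items are involved, 'power.runtime_error' is not modeled, the idle step is collapsed into a direct suspend, and the -EBUSY returned for the parent checks is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a device's run-time PM state (not kernel code). */
enum toy_status { TOY_ACTIVE, TOY_SUSPENDED };

struct toy_dev {
	struct toy_dev *parent;
	enum toy_status status;
	int usage_count;	/* the device's usage counter */
	int disable_depth;	/* 'power.disable_depth' */
	int active_children;	/* counter of this device's 'active' children */
	bool ignore_children;	/* 'power.ignore_children' */
};

/* Initial state: run-time PM disabled, status 'suspended'. */
static void toy_init(struct toy_dev *dev, struct toy_dev *parent)
{
	*dev = (struct toy_dev){ .parent = parent, .status = TOY_SUSPENDED,
				 .disable_depth = 1 };
}

static void toy_enable(struct toy_dev *dev)
{
	if (dev->disable_depth > 0)
		dev->disable_depth--;
}

/* A child may become active only if its parent is active or ignores it. */
static bool toy_parent_allows(const struct toy_dev *dev)
{
	const struct toy_dev *p = dev->parent;
	return !p || p->status == TOY_ACTIVE || p->ignore_children;
}

static void toy_mark_active(struct toy_dev *dev)
{
	dev->status = TOY_ACTIVE;
	if (dev->parent)
		dev->parent->active_children++;
}

/* Models pm_runtime_set_active(): only allowed while run-time PM is disabled. */
static int toy_set_active(struct toy_dev *dev)
{
	if (dev->disable_depth == 0)
		return -EAGAIN;
	if (dev->status == TOY_ACTIVE)
		return 0;
	if (!toy_parent_allows(dev))
		return -EBUSY;
	toy_mark_active(dev);
	return 0;
}

static int toy_resume(struct toy_dev *dev)
{
	if (dev->disable_depth > 0)
		return -EAGAIN;
	if (dev->status == TOY_ACTIVE)
		return 1;
	/* Simplification: refuse rather than resume the parent first. */
	if (!toy_parent_allows(dev))
		return -EBUSY;
	toy_mark_active(dev);
	return 0;
}

/* Fails with -EAGAIN while disabled or while the usage counter is held. */
static int toy_suspend(struct toy_dev *dev)
{
	if (dev->disable_depth > 0 || dev->usage_count > 0)
		return -EAGAIN;
	if (dev->status == TOY_SUSPENDED)
		return 1;
	if (dev->active_children > 0 && !dev->ignore_children)
		return -EBUSY;
	dev->status = TOY_SUSPENDED;
	if (dev->parent)
		dev->parent->active_children--;
	return 0;
}

static int toy_get_sync(struct toy_dev *dev)
{
	dev->usage_count++;
	return toy_resume(dev);
}

/* Drops the reference; the idle step is collapsed into a direct suspend. */
static int toy_put_sync(struct toy_dev *dev)
{
	assert(dev->usage_count > 0);
	dev->usage_count--;
	return toy_suspend(dev);
}
```

In this model, once the child has been marked active, the parent cannot be suspended until the child is enabled and suspended again, which is the situation the text above warns about.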

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH update 2x] PM: Introduce core framework for run-time PM of I/O devices (rev. 16)
  2009-08-13 20:56   ` [PATCH update 2x] PM: Introduce core framework for run-time PM of I/O devices (rev. 16) Rafael J. Wysocki
  2009-08-13 21:03     ` Paul Mundt
@ 2009-08-13 21:03     ` Paul Mundt
  2009-08-13 21:14       ` Rafael J. Wysocki
  2009-08-13 21:14       ` Rafael J. Wysocki
  2009-08-14  9:08     ` Magnus Damm
                       ` (3 subsequent siblings)
  5 siblings, 2 replies; 90+ messages in thread
From: Paul Mundt @ 2009-08-13 21:03 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Alan Stern, Magnus Damm, Greg KH, Pavel Machek, Len Brown, LKML,
	Linux-pm mailing list

On Thu, Aug 13, 2009 at 10:56:42PM +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rjw@sisk.pl>
> Subject: PM: Introduce core framework for run-time PM of I/O devices (rev. 16)
> 
> Introduce a core framework for run-time power management of I/O
> devices.  Add device run-time PM fields to 'struct dev_pm_info'
> and device run-time PM callbacks to 'struct dev_pm_ops'.  Introduce
> a run-time PM workqueue and define some device run-time PM helper
> functions at the core level.  Document all these things.
> 
> Special thanks to Alan Stern for his help with the design and
> multiple detailed reviews of the preceding versions of this patch
> and to Magnus Damm for testing feedback.
> 
> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

If there are no more outstanding issues with this, any chance of having
this rolled in to a topic branch along with Magnus's platform bus
support? It would be nice if these could spend some time in -next,
especially as it seems there are no longer any glaring issues with the
patch series.

* Re: [PATCH update 2x] PM: Introduce core framework for run-time PM of I/O devices (rev. 16)
  2009-08-13 21:03     ` Paul Mundt
  2009-08-13 21:14       ` Rafael J. Wysocki
@ 2009-08-13 21:14       ` Rafael J. Wysocki
  1 sibling, 0 replies; 90+ messages in thread
From: Rafael J. Wysocki @ 2009-08-13 21:14 UTC (permalink / raw)
  To: Paul Mundt
  Cc: Alan Stern, Magnus Damm, Greg KH, Pavel Machek, Len Brown, LKML,
	Linux-pm mailing list

On Thursday 13 August 2009, Paul Mundt wrote:
> On Thu, Aug 13, 2009 at 10:56:42PM +0200, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rjw@sisk.pl>
> > Subject: PM: Introduce core framework for run-time PM of I/O devices (rev. 16)
> > 
> > Introduce a core framework for run-time power management of I/O
> > devices.  Add device run-time PM fields to 'struct dev_pm_info'
> > and device run-time PM callbacks to 'struct dev_pm_ops'.  Introduce
> > a run-time PM workqueue and define some device run-time PM helper
> > functions at the core level.  Document all these things.
> > 
> > Special thanks to Alan Stern for his help with the design and
> > multiple detailed reviews of the preceding versions of this patch
> > and to Magnus Damm for testing feedback.
> > 
> > Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> 
> If there are no more outstanding issues with this, any chance of having
> this rolled in to a topic branch along with Magnus's platform bus
> support? It would be nice if these could spend some time in -next,
> especially as it seems there are no longer any glaring issues with the
> patch series.

I completely agree and I hope there are no major issues with this patch any
more.

If there are no objections, I'm going to put the patch into my linux-next
branch shortly, from where it will go to the for-linus branch.  I'm going to do
that with Magnus' patch too.

Thanks,
Rafael

* Re: [linux-pm] [RFC] usb: Add support for runtime power management of the hcd
  2009-08-13 14:26           ` [linux-pm] " Oliver Neukum
@ 2009-08-13 21:42             ` Matthew Garrett
  2009-08-13 21:42             ` Matthew Garrett
  1 sibling, 0 replies; 90+ messages in thread
From: Matthew Garrett @ 2009-08-13 21:42 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: linux-pm, Rafael J. Wysocki, linux-usb, linux-pci, Greg KH, LKML

On Thu, Aug 13, 2009 at 04:26:33PM +0200, Oliver Neukum wrote:
> Am Donnerstag, 13. August 2009 14:30:34 schrieb Matthew Garrett:
> > > Your earlier failures don't look promising regarding BIOSes.
> > > What do you have in mind?
> >
> > They range from pragmatic to ugly. We could blacklist all Dells, though
> > I'm trying to find out if there's a BIOS date that guarantees the system
> > is fixed. Alternatively, it's a single-line bug in the DSDT - we could
> > implement some kind of fixup in the ACPI parsing code. I find the latter
> > interesting but possibly too hideous to live :)
> 
> Is there any indication only those BIOSes are affected?

Based on what I've looked at, other BIOSes either indicate that they 
don't support this or should work properly. Real life may disagree.

-- 
Matthew Garrett | mjg59@srcf.ucam.org

* Re: [RFC] PCI: Runtime power management
  2009-08-13 15:17     ` [RFC] PCI: Runtime power management Alan Stern
@ 2009-08-13 21:47       ` Matthew Garrett
  2009-08-14 12:30         ` Matthew Garrett
  2009-08-14 12:30         ` [linux-pm] " Matthew Garrett
  2009-08-13 21:47       ` Matthew Garrett
  1 sibling, 2 replies; 90+ messages in thread
From: Matthew Garrett @ 2009-08-13 21:47 UTC (permalink / raw)
  To: Alan Stern
  Cc: Rafael J. Wysocki, Greg KH, LKML, Linux-pm mailing list,
	linux-pci, linux-usb

On Thu, Aug 13, 2009 at 11:17:05AM -0400, Alan Stern wrote:
> On Thu, 13 Aug 2009, Matthew Garrett wrote:
> 
> > --- a/drivers/pci/pci-driver.c
> > +++ b/drivers/pci/pci-driver.c
> 
> > +#ifdef CONFIG_PM_RUNTIME
> > +
> > +static int pci_pm_runtime_suspend(struct device *dev)
> > +{
> > +	struct pci_dev *pci_dev = to_pci_dev(dev);
> > +	struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
> > +	int error;
> > +
> > +	device_set_wakeup_enable(dev, 1);
> 
> This is a userspace policy parameter.  Kernel code should not alter it.
> Instead you should test device_may_wakeup.

Ugh. I'd really prefer us to assume that drivers are able to cope unless 
proven otherwise. Userspace policy makes sense where we don't have any 
idea whether something will work or not, but I'd really expect that most 
PCI drivers will either cope (in which case they'll have enabling code) 
or won't (in which case they won't). Why would we want userspace to 
influence this?

> > +	enable_irq(pci_dev->irq);
> > +resume:
> > +	pci_pm_resume(dev);
> > +out:
> > +	pci_enable_runtime_wake(pci_dev, false);
> > +	return error;
> > +}
> 
> The goto statements and unwinding code don't match up.

I'll look at that.
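The usual kernel idiom Alan refers to is that each error label undoes exactly the steps that succeeded before the failing one, in reverse order. The quoted function is only partly visible, so the sketch below does not reproduce the PCI code: the numbered steps, toy_step() and toy_undo() are hypothetical placeholders, and the trace string only exists to make the unwinding order observable.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical context: which step fails, plus a trace of what ran. */
struct unwind_ctx {
	int fail_at;		/* step number that fails, 0 for none */
	char trace[16];		/* 'A'.. for steps, 'a'.. for undos */
	size_t len;
};

static void toy_record(struct unwind_ctx *c, char ch)
{
	if (c->len + 1 < sizeof(c->trace)) {
		c->trace[c->len++] = ch;
		c->trace[c->len] = '\0';
	}
}

/* Hypothetical setup step n; succeeds unless n == fail_at. */
static int toy_step(struct unwind_ctx *c, int n)
{
	if (n == c->fail_at)
		return -EIO;
	toy_record(c, (char)('A' + n - 1));
	return 0;
}

/* Undoes setup step n. */
static void toy_undo(struct unwind_ctx *c, int n)
{
	toy_record(c, (char)('a' + n - 1));
}

/* Each label undoes only what succeeded, in reverse order. */
static int toy_runtime_suspend(struct unwind_ctx *c)
{
	int error;

	error = toy_step(c, 1);
	if (error)
		goto out;
	error = toy_step(c, 2);
	if (error)
		goto undo_1;
	error = toy_step(c, 3);
	if (error)
		goto undo_2;
	return 0;

undo_2:
	toy_undo(c, 2);
undo_1:
	toy_undo(c, 1);
out:
	return error;
}
```

Falling through from one label to the next is what keeps the undo order the reverse of the setup order.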

> > +	if (error)
> > +		return error;
> > +
> > +	return 0;
> > +}
> 
> Log an error message when something goes wrong?

Seems fair.

> > +static void pci_pm_runtime_idle(struct device *dev)
> > +{
> > +	struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
> > +
> > +	if (pm && pm->runtime_idle)
> > +		pm->runtime_idle(dev);
> > +
> > +	pm_schedule_suspend(dev, 0);
> > +}
> 
> This misses the point.  The whole idea of runtime_idle is to tell you 
> that the device is idle and might be ready to be suspended.  If you're 
> going to call pm_schedule_suspend anyway, there's no reason to invoke 
> pm->runtime_idle.

My understanding of the API was that pm_device_put() invokes 
runtime_idle if the refcount hits 0. The bus layer has no idea of the 
refcount, and calling suspend directly from the driver would defeat the 
point of the system-wide refcounting.

-- 
Matthew Garrett | mjg59@srcf.ucam.org

* Re: [RFC] usb: Add support for runtime power management of the hcd
  2009-08-13 15:22       ` Alan Stern
  2009-08-13 21:47         ` Matthew Garrett
@ 2009-08-13 21:47         ` Matthew Garrett
  1 sibling, 0 replies; 90+ messages in thread
From: Matthew Garrett @ 2009-08-13 21:47 UTC (permalink / raw)
  To: Alan Stern
  Cc: Rafael J. Wysocki, Greg KH, LKML, Linux-pm mailing list,
	linux-pci, linux-usb

On Thu, Aug 13, 2009 at 11:22:44AM -0400, Alan Stern wrote:

> You have to call the HCD's pci_suspend method!  Not to mention calling 
> synchronize_irq and all the other stuff in hcd_pci_suspend and 
> hcd_pci_suspend_noirq.

The bus level code does this, assuming that the driver-level code 
doesn't return an error.
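The ordering described here, where the bus-level callback runs the driver's callback first and does its own work (such as synchronizing the IRQ) only if the driver succeeded, can be sketched in user-space C. The names are hypothetical and the bus-level work is reduced to a counter; this is not the PCI or USB HCD code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical driver-level PM callbacks. */
struct toy_drv_pm_ops {
	int (*runtime_suspend)(void);
};

/* Counts how often the bus-level work (IRQ sync, power-down, ...) ran. */
static int toy_bus_steps;

static int toy_bus_runtime_suspend(const struct toy_drv_pm_ops *pm)
{
	int error = 0;

	/* The driver gets the first chance to fail. */
	if (pm && pm->runtime_suspend)
		error = pm->runtime_suspend();
	if (error)
		return error;	/* bus-level work is skipped entirely */

	toy_bus_steps++;	/* stands in for the bus-level suspend steps */
	return 0;
}

/* Two example driver callbacks. */
static int toy_drv_ok(void) { return 0; }
static int toy_drv_busy(void) { return -EBUSY; }
```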

-- 
Matthew Garrett | mjg59@srcf.ucam.org

* Re: [PATCH update 2x] PM: Introduce core framework for run-time PM of  I/O devices (rev. 16)
  2009-08-13 20:56   ` [PATCH update 2x] PM: Introduce core framework for run-time PM of I/O devices (rev. 16) Rafael J. Wysocki
  2009-08-13 21:03     ` Paul Mundt
  2009-08-13 21:03     ` Paul Mundt
@ 2009-08-14  9:08     ` Magnus Damm
  2009-08-14 17:19       ` Rafael J. Wysocki
  2009-08-14 17:19       ` Rafael J. Wysocki
  2009-08-14  9:08     ` Magnus Damm
                       ` (2 subsequent siblings)
  5 siblings, 2 replies; 90+ messages in thread
From: Magnus Damm @ 2009-08-14  9:08 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Alan Stern, Greg KH, Pavel Machek, Len Brown, LKML,
	Linux-pm mailing list

On Fri, Aug 14, 2009 at 5:56 AM, Rafael J. Wysocki<rjw@sisk.pl> wrote:
> From: Rafael J. Wysocki <rjw@sisk.pl>
> Subject: PM: Introduce core framework for run-time PM of I/O devices (rev. 16)
>
> Introduce a core framework for run-time power management of I/O
> devices.  Add device run-time PM fields to 'struct dev_pm_info'
> and device run-time PM callbacks to 'struct dev_pm_ops'.  Introduce
> a run-time PM workqueue and define some device run-time PM helper
> functions at the core level.  Document all these things.
>
> Special thanks to Alan Stern for his help with the design and
> multiple detailed reviews of the preceding versions of this patch
> and to Magnus Damm for testing feedback.
>
> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

I've now tested v16 with my SuperH Mobile platform device Runtime PM
implementation. All seems fine and my patches for drivers
such as i2c master, fbdev, v4l2 and uio behave as expected.

Acked-by: Magnus Damm <damm@igel.co.jp>

The patch "PM: Runtime PM v15 - Platform Device Bus Support" works
well with v16 and does not need any update.

Cheers,

/ magnus

* Re: [linux-pm] [RFC] PCI: Runtime power management
  2009-08-13 21:47       ` Matthew Garrett
  2009-08-14 12:30         ` Matthew Garrett
@ 2009-08-14 12:30         ` Matthew Garrett
  2009-08-14 14:43           ` Alan Stern
  2009-08-14 14:43           ` [linux-pm] " Alan Stern
  1 sibling, 2 replies; 90+ messages in thread
From: Matthew Garrett @ 2009-08-14 12:30 UTC (permalink / raw)
  To: Alan Stern; +Cc: linux-usb, linux-pci, Greg KH, LKML, Linux-pm mailing list

On Thu, Aug 13, 2009 at 10:47:01PM +0100, Matthew Garrett wrote:

> Ugh. I'd really prefer us to assume that drivers are able to cope unless 
> proven otherwise. Userspace policy makes sense where we don't have any 
> idea whether something will work or not, but I'd really expect that most 
> PCI drivers will either cope (in which case they'll have enabling code) 
> or won't (in which case they won't). Why would we want userspace to 
> influence this?

Though, thinking about it, you're right that setting this does override 
user policy. I think we need an additional flag to indicate that the 
device supports runtime wakeup and test that as well when doing 
device_may_wakeup().

> > This misses the point.  The whole idea of runtime_idle is to tell you 
> > that the device is idle and might be ready to be suspended.  If you're 
> > going to call pm_schedule_suspend anyway, there's no reason to invoke 
> > pm->runtime_idle.
> 
> My understanding of the API was that pm_device_put() invokes 
> runtime_idle if the refcount hits 0. The bus layer has no idea of the 
> refcount, and calling suspend directly from the driver would defeat the 
> point of the system-wide refcounting.

From the API docs:

"The action performed by a bus type's ->runtime_idle() callback is 
totally dependent on the bus type in question, but the expected and 
recommended action is to check if the device can be suspended (i.e. if 
all of the conditions necessary for suspending the device are satisfied) 
and to queue up a suspend request for the device in that case."

Though perhaps the device level runtime_idle shouldn't be void - that 
way the bus can ask the driver whether its suspend conditions have been 
satisfied? Right now there doesn't seem to be any way for the bus to ask 
that.

-- 
Matthew Garrett | mjg59@srcf.ucam.org

* Re: [linux-pm] [RFC] PCI: Runtime power management
  2009-08-14 12:30         ` [linux-pm] " Matthew Garrett
  2009-08-14 14:43           ` Alan Stern
@ 2009-08-14 14:43           ` Alan Stern
  2009-08-14 17:05             ` Rafael J. Wysocki
                               ` (3 more replies)
  1 sibling, 4 replies; 90+ messages in thread
From: Alan Stern @ 2009-08-14 14:43 UTC (permalink / raw)
  To: Matthew Garrett
  Cc: linux-usb, linux-pci, Greg KH, LKML, Linux-pm mailing list

On Thu, 13 Aug 2009, Matthew Garrett wrote:

> On Thu, Aug 13, 2009 at 11:22:44AM -0400, Alan Stern wrote:
>
> > You have to call the HCD's pci_suspend method!  Not to mention calling
> > synchronize_irq and all the other stuff in hcd_pci_suspend and
> > hcd_pci_suspend_noirq.
>
> The bus level code does this, assuming that the driver-level code
> doesn't return an error.

So it does; my mistake.


On Fri, 14 Aug 2009, Matthew Garrett wrote:

> On Thu, Aug 13, 2009 at 10:47:01PM +0100, Matthew Garrett wrote:
> 
> > Ugh. I'd really prefer us to assume that drivers are able to cope unless 
> > proven otherwise. Userspace policy makes sense where we don't have any 
> > idea whether something will work or not, but I'd really expect that most 
> > PCI drivers will either cope (in which case they'll have enabling code) 
> > or won't (in which case they won't). Why would we want userspace to 
> > influence this?
> 
> Though, thinking about it, you're right that setting this does override 
> user policy. I think we need an additional flag to indicate that the 
> device supports runtime wakeup and test that as well when doing 
> device_may_wakeup().

You are suggesting separate flag sets for system-wide wakeup and
runtime wakeup?  I don't disagree, but implementing them will be
problematical.

That's because it's not always possible to change a device's wakeup 
setting while it is suspended.  Thus if a device was runtime suspended 
with wakeup enabled, and then we want to do a system sleep and change 
the device's wakeup setting to disabled, we would have to wake the 
device back up in order to do it.
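The conflict described above can be sketched as a small decision function. This is a userspace mock under stated assumptions: `struct mock_dev`, its fields, `mock_runtime_suspend()` and `needs_resume_for_system_sleep()` are hypothetical names invented for illustration, not kernel APIs. The sketch assumes that the wakeup setting is latched when the device is runtime suspended and cannot be reprogrammed until the device is resumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical per-device state for illustration only; these are not the
 * fields of the kernel's struct dev_pm_info.
 */
struct mock_dev {
	bool runtime_suspended;  /* device is currently runtime suspended */
	bool wakeup_armed;       /* wakeup setting latched at suspend time */
	bool may_wakeup_runtime; /* policy: wakeup wanted during runtime PM */
	bool may_wakeup_system;  /* policy: wakeup wanted during system sleep */
};

/* Runtime suspend: the wakeup setting is programmed from the runtime policy. */
static void mock_runtime_suspend(struct mock_dev *dev)
{
	assert(dev != NULL);
	dev->wakeup_armed = dev->may_wakeup_runtime;
	dev->runtime_suspended = true;
}

/*
 * Before a system sleep, decide whether the device must be woken up first.
 * An active device can have its wakeup setting changed directly.  A
 * runtime-suspended device needs a resume only if the setting it was
 * suspended with differs from what the system sleep policy requires,
 * because (by assumption) the setting cannot be changed while suspended.
 */
static bool needs_resume_for_system_sleep(const struct mock_dev *dev)
{
	assert(dev != NULL);
	if (!dev->runtime_suspended)
		return false;
	return dev->wakeup_armed != dev->may_wakeup_system;
}
```

With runtime wakeup enabled and system wakeup disabled, the function reports that a resume is needed, which is exactly the extra wake-up that separate flag sets would force.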


> > > This misses the point.  The whole idea of runtime_idle is to tell you 
> > > that the device is idle and might be ready to be suspended.  If you're 
> > > going to call pm_schedule_suspend anyway, there's no reason to invoke 
> > > pm->runtime_idle.
> > 
> > My understanding of the API was that pm_device_put() invokes 
> > runtime_idle if the refcount hits 0. The bus layer has no idea of the 
> > refcount, and calling suspend directly from the driver would defeat the 
> > point of the system-wide refcounting.
> 
> From the API docs:
> 
> "The action performed by a bus type's ->runtime_idle() callback is 
> totally dependent on the bus type in question, but the expected and 
> recommended action is to check if the device can be suspended (i.e. if 
> all of the conditions necessary for suspending the device are satisfied) 
> and to queue up a suspend request for the device in that case."
> 
> Though perhaps the device level runtime_idle shouldn't be void - that 
> way the bus can ask the driver whether its suspend conditions have been 
> satisfied? Right now there doesn't seem to be any way for the bus to ask 
> that.

If you want to get the device-level runtime_idle involved, you can make
_it_ responsible for scheduling the suspend.  Then the bus-level code
simply has to check whether everything is okay at the bus level, and if
it is, call the device-level routine.

However changing the return type wouldn't hurt anything, and it would 
allow the pm_schedule_suspend call to be centralized in the bus code.  
You could ask Rafael about it, or just send him a patch.
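A minimal userspace sketch of the second arrangement, assuming the device-level callback returns an int: the bus-level idle routine checks its own conditions, asks the driver, and only then queues the suspend, so the scheduling call lives in one place. All names (`struct mock_device`, `mock_bus_runtime_idle()`, `mock_drv_runtime_idle()`, `mock_schedule_suspend()`) are hypothetical. `mock_schedule_suspend()` merely stands in for pm_schedule_suspend() by recording the request, and the 100 ms delay is an arbitrary assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct mock_device;

/* Hypothetical driver-level PM operations with an int-returning idle hook. */
struct mock_driver_pm_ops {
	/* 0: driver's suspend conditions are met; negative errno: not yet. */
	int (*runtime_idle)(struct mock_device *dev);
};

struct mock_device {
	bool bus_link_busy;                      /* example bus-level condition */
	int pending_io;                          /* example driver-level condition */
	const struct mock_driver_pm_ops *drv_pm; /* may be NULL */
	int suspend_delay_ms;                    /* -1 while no suspend is queued */
};

/* Stand-in for pm_schedule_suspend(): only records the requested delay. */
static int mock_schedule_suspend(struct mock_device *dev, unsigned int delay_ms)
{
	dev->suspend_delay_ms = (int)delay_ms;
	return 0;
}

/* Bus-level idle hook: bus checks, then the driver, then one scheduling call. */
static int mock_bus_runtime_idle(struct mock_device *dev)
{
	assert(dev != NULL);
	if (dev->bus_link_busy)
		return -EBUSY;
	if (dev->drv_pm && dev->drv_pm->runtime_idle) {
		int ret = dev->drv_pm->runtime_idle(dev);
		if (ret)
			return ret; /* the driver vetoed the suspend */
	}
	return mock_schedule_suspend(dev, 100); /* delay is an arbitrary example */
}

/* Example driver hook: refuse while I/O is still outstanding. */
static int mock_drv_runtime_idle(struct mock_device *dev)
{
	return dev->pending_io > 0 ? -EBUSY : 0;
}

static const struct mock_driver_pm_ops mock_drv_pm = {
	.runtime_idle = mock_drv_runtime_idle,
};
```

The first alternative, making the driver's callback responsible for queueing the suspend, would move the mock_schedule_suspend() call into mock_drv_runtime_idle(), and the bus hook would no longer need the driver's return value.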

Alan Stern


* Re: [linux-pm] [RFC] PCI: Runtime power management
From: Rafael J. Wysocki @ 2009-08-14 17:05 UTC
  To: Alan Stern
  Cc: Matthew Garrett, linux-usb, linux-pci, Greg KH, LKML,
	Linux-pm mailing list

On Friday 14 August 2009, Alan Stern wrote:
> On Thu, 13 Aug 2009, Matthew Garrett wrote:
> 
> > On Thu, Aug 13, 2009 at 11:22:44AM -0400, Alan Stern wrote:
> >
> > > You have to call the HCD's pci_suspend method!  Not to mention calling
> > > synchronize_irq and all the other stuff in hcd_pci_suspend and
> > > hcd_pci_suspend_noirq.
> >
> > The bus level code does this, assuming that the driver-level code
> > doesn't return an error.
> 
> So it does; my mistake.
> 
> 
> On Fri, 14 Aug 2009, Matthew Garrett wrote:
> 
> > On Thu, Aug 13, 2009 at 10:47:01PM +0100, Matthew Garrett wrote:
> > 
> > > Ugh. I'd really prefer us to assume that drivers are able to cope unless 
> > > proven otherwise. Userspace policy makes sense where we don't have any 
> > > idea whether something will work or not, but I'd really expect that most 
> > > PCI drivers will either cope (in which case they'll have enabling code) 
> > > or won't (in which case they won't). Why would we want userspace to 
> > > influence this?
> > 
> > Though, thinking about it, you're right that setting this does override 
> > user policy. I think we need an additional flag to indicate that the 
> > device supports runtime wakeup and test that as well when doing 
> > device_may_wakeup().
> 
> You are suggesting separate flag sets for system-wide wakeup and
> runtime wakeup?  I don't disagree, but implementing them will be
> problematical.
> 
> That's because it's not always possible to change a device's wakeup 
> setting while it is suspended.  Thus if a device was runtime suspended 
> with wakeup enabled, and then we want to do a system sleep and change 
> the device's wakeup setting to disabled, we would have to wake the 
> device back up in order to do it.
> 
> 
> > > > This misses the point.  The whole idea of runtime_idle is to tell you 
> > > > that the device is idle and might be ready to be suspended.  If you're 
> > > > going to call pm_schedule_suspend anyway, there's no reason to invoke 
> > > > pm->runtime_idle.
> > > 
> > > My understanding of the API was that pm_device_put() invokes 
> > > runtime_idle if the refcount hits 0. The bus layer has no idea of the 
> > > refcount, and calling suspend directly from the driver would defeat the 
> > > point of the system-wide refcounting.
> > 
> > From the API docs:
> > 
> > "The action performed by a bus type's ->runtime_idle() callback is 
> > totally dependent on the bus type in question, but the expected and 
> > recommended action is to check if the device can be suspended (i.e. if 
> > all of the conditions necessary for suspending the device are satisfied) 
> > and to queue up a suspend request for the device in that case."
> > 
> > Though perhaps the device level runtime_idle shouldn't be void - that 
> > way the bus can ask the driver whether its suspend conditions have been 
> > satisfied? Right now there doesn't seem to be any way for the bus to ask 
> > that.
> 
> If you want to get the device-level runtime_idle involved, you can make
> _it_ responsible for scheduling the suspend.  Then the bus-level code
> simply has to check whether everything is okay at the bus level, and if
> it is, call the device-level routine.
> 
> However changing the return type wouldn't hurt anything, and it would 
> allow the pm_schedule_suspend call to be centralized in the bus code.  
> You could ask Rafael about it, or just send him a patch.

Well, I'm not against that, but what should pm_runtime_idle() do with the
result returned by it?  Just pass it to the caller?

Rafael

* Re: [linux-pm] [RFC] PCI: Runtime power management
From: Rafael J. Wysocki @ 2009-08-14 17:13 UTC
  To: Alan Stern
  Cc: Matthew Garrett, linux-usb, linux-pci, Greg KH, LKML,
	Linux-pm mailing list

On Friday 14 August 2009, Rafael J. Wysocki wrote:
> On Friday 14 August 2009, Alan Stern wrote:
> > On Thu, 13 Aug 2009, Matthew Garrett wrote:
> > 
> > > On Thu, Aug 13, 2009 at 11:22:44AM -0400, Alan Stern wrote:
...
> > > Though perhaps the device level runtime_idle shouldn't be void - that 
> > > way the bus can ask the driver whether its suspend conditions have been 
> > > satisfied? Right now there doesn't seem to be any way for the bus to ask 
> > > that.
> > 
> > If you want to get the device-level runtime_idle involved, you can make
> > _it_ responsible for scheduling the suspend.  Then the bus-level code
> > simply has to check whether everything is okay at the bus level, and if
> > it is, call the device-level routine.
> > 
> > However changing the return type wouldn't hurt anything, and it would 
> > allow the pm_schedule_suspend call to be centralized in the bus code.  
> > You could ask Rafael about it, or just send him a patch.
> 
> Well, I'm not against that, but what should pm_runtime_idle() do with the
> result returned by it?  Just pass it to the caller?

Hm, perhaps it's better to ignore it, though.
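The policy settled on here, where the core invokes the bus type's idle callback and discards its result, can be modelled as follows. This is a mock under stated assumptions: `struct mock_dev`, `mock_core_idle()`, `mock_put()` and `refusing_bus_idle()` are hypothetical names. The checks loosely follow the order used by __pm_runtime_idle() in the rev. 17 patch (notification already in progress, then usage count), while locking, the disable depth, child counts and pending requests are omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct mock_dev;

/* Hypothetical bus-type idle callback; its return value is advisory only. */
typedef int (*mock_idle_cb)(struct mock_dev *dev);

struct mock_dev {
	int usage_count;        /* simplified, non-atomic usage counter */
	bool idle_notification; /* set while the idle callback runs */
	mock_idle_cb bus_runtime_idle;
	int callback_calls;     /* instrumentation for the example */
};

/* Core idle notification: the callback's result is deliberately ignored. */
static int mock_core_idle(struct mock_dev *dev)
{
	assert(dev != NULL);
	if (dev->idle_notification)
		return -EINPROGRESS;
	if (dev->usage_count > 0)
		return -EAGAIN;

	dev->idle_notification = true;
	if (dev->bus_runtime_idle)
		(void)dev->bus_runtime_idle(dev);
	dev->idle_notification = false;
	return 0;
}

/* Drop a usage reference and send an idle notification when it hits zero. */
static int mock_put(struct mock_dev *dev)
{
	assert(dev != NULL);
	if (dev->usage_count <= 0)
		return -EINVAL; /* unbalanced put (hypothetical handling) */
	if (--dev->usage_count > 0)
		return 0;
	return mock_core_idle(dev);
}

/* Example bus callback that declines to suspend. */
static int refusing_bus_idle(struct mock_dev *dev)
{
	dev->callback_calls++;
	return -EBUSY;
}
```

Even though refusing_bus_idle() returns -EBUSY, the final put reports 0: in this model the core's return value says whether the notification was delivered, not whether the device will actually be suspended.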

Rafael

* Re: [PATCH update 2x] PM: Introduce core framework for run-time PM of I/O devices (rev. 16)
From: Rafael J. Wysocki @ 2009-08-14 17:19 UTC
  To: Magnus Damm
  Cc: Alan Stern, Greg KH, Pavel Machek, Len Brown, LKML,
	Linux-pm mailing list, Matthew Garrett

On Friday 14 August 2009, Magnus Damm wrote:
> On Fri, Aug 14, 2009 at 5:56 AM, Rafael J. Wysocki<rjw@sisk.pl> wrote:
> > From: Rafael J. Wysocki <rjw@sisk.pl>
> > Subject: PM: Introduce core framework for run-time PM of I/O devices (rev. 16)
> >
> > Introduce a core framework for run-time power management of I/O
> > devices.  Add device run-time PM fields to 'struct dev_pm_info'
> > and device run-time PM callbacks to 'struct dev_pm_ops'.  Introduce
> > a run-time PM workqueue and define some device run-time PM helper
> > functions at the core level.  Document all these things.
> >
> > Special thanks to Alan Stern for his help with the design and
> > multiple detailed reviews of the preceding versions of this patch
> > and to Magnus Damm for testing feedback.
> >
> > Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> 
> I've now tested v16 with my SuperH Mobile platform device Runtime PM
> implementation. All seems fine and my patches for drivers
> such as i2c master, fbdev, v4l2 and uio behave as expected.
> 
> Acked-by: Magnus Damm <damm@igel.co.jp>
> 
> The patch "PM: Runtime PM v15 - Platform Device Bus Support" works
> well with v16 and does not need any update.

Thanks for the ACK, but I have one more update to the patch.  Namely, I've
decided to change the return value of the ->runtime_idle() callback to int
as a result of the recent discussion on the PCI runtime PM patch from Matthew.

I'll post the updated patch in a little while.

Thanks,
Rafael

* [PATCH update 3x] PM: Introduce core framework for run-time PM of I/O devices (rev. 17)
From: Rafael J. Wysocki @ 2009-08-14 17:25 UTC
  To: Linux-pm mailing list
  Cc: Alan Stern, Magnus Damm, Greg KH, Pavel Machek, Len Brown, LKML,
	Matthew Garrett, Paul Mundt

Hi,

The only difference between this one and rev. 16 is that now the runtime_idle()
callback is defined to return int, as suggested by Matthew in the
"PCI: Runtime power management" thread.

Thanks,
Rafael

---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: PM: Introduce core framework for run-time PM of I/O devices (rev. 17)

Introduce a core framework for run-time power management of I/O
devices.  Add device run-time PM fields to 'struct dev_pm_info'
and device run-time PM callbacks to 'struct dev_pm_ops'.  Introduce
a run-time PM workqueue and define some device run-time PM helper
functions at the core level.  Document all these things.

Special thanks to Alan Stern for his help with the design and
multiple detailed reviews of the preceding versions of this patch
and to Magnus Damm for testing feedback.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Magnus Damm <damm@igel.co.jp>
---
 Documentation/power/runtime_pm.txt |  378 +++++++++++++
 drivers/base/dd.c                  |   11 
 drivers/base/power/Makefile        |    1 
 drivers/base/power/main.c          |   22 
 drivers/base/power/power.h         |   31 -
 drivers/base/power/runtime.c       | 1011 +++++++++++++++++++++++++++++++++++++
 include/linux/pm.h                 |  101 +++
 include/linux/pm_runtime.h         |  114 ++++
 kernel/power/Kconfig               |   14 
 kernel/power/main.c                |   17 
 10 files changed, 1689 insertions(+), 11 deletions(-)

Index: linux-2.6/kernel/power/Kconfig
===================================================================
--- linux-2.6.orig/kernel/power/Kconfig
+++ linux-2.6/kernel/power/Kconfig
@@ -208,3 +208,17 @@ config APM_EMULATION
 	  random kernel OOPSes or reboots that don't seem to be related to
 	  anything, try disabling/enabling this option (or disabling/enabling
 	  APM in your BIOS).
+
+config PM_RUNTIME
+	bool "Run-time PM core functionality"
+	depends on PM
+	---help---
+	  Enable functionality allowing I/O devices to be put into energy-saving
+	  (low power) states at run time (or autosuspended) after a specified
+	  period of inactivity and woken up in response to a hardware-generated
+	  wake-up event or a driver's request.
+
+	  Hardware support is generally required for this functionality to work
+	  and the bus type drivers of the buses the devices are on are
+	  responsible for the actual handling of the autosuspend requests and
+	  wake-up events.
Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -11,6 +11,7 @@
 #include <linux/kobject.h>
 #include <linux/string.h>
 #include <linux/resume-trace.h>
+#include <linux/workqueue.h>
 
 #include "power.h"
 
@@ -217,8 +218,24 @@ static struct attribute_group attr_group
 	.attrs = g,
 };
 
+#ifdef CONFIG_PM_RUNTIME
+struct workqueue_struct *pm_wq;
+
+static int __init pm_start_workqueue(void)
+{
+	pm_wq = create_freezeable_workqueue("pm");
+
+	return pm_wq ? 0 : -ENOMEM;
+}
+#else
+static inline int pm_start_workqueue(void) { return 0; }
+#endif
+
 static int __init pm_init(void)
 {
+	int error = pm_start_workqueue();
+	if (error)
+		return error;
 	power_kobj = kobject_create_and_add("power", NULL);
 	if (!power_kobj)
 		return -ENOMEM;
Index: linux-2.6/include/linux/pm.h
===================================================================
--- linux-2.6.orig/include/linux/pm.h
+++ linux-2.6/include/linux/pm.h
@@ -22,6 +22,10 @@
 #define _LINUX_PM_H
 
 #include <linux/list.h>
+#include <linux/workqueue.h>
+#include <linux/spinlock.h>
+#include <linux/wait.h>
+#include <linux/timer.h>
 
 /*
  * Callbacks for platform drivers to implement.
@@ -165,6 +169,28 @@ typedef struct pm_message {
  * It is allowed to unregister devices while the above callbacks are being
  * executed.  However, it is not allowed to unregister a device from within any
  * of its own callbacks.
+ *
+ * There also are the following callbacks related to run-time power management
+ * of devices:
+ *
+ * @runtime_suspend: Prepare the device for a condition in which it won't be
+ *	able to communicate with the CPU(s) and RAM due to power management.
+ *	This need not mean that the device should be put into a low power state.
+ *	For example, if the device is behind a link which is about to be turned
+ *	off, the device may remain at full power.  If the device does go to low
+ *	power and if device_may_wakeup(dev) is true, remote wake-up (i.e., a
+ *	hardware mechanism allowing the device to request a change of its power
+ *	state, such as PCI PME) should be enabled for it.
+ *
+ * @runtime_resume: Put the device into the fully active state in response to a
+ *	wake-up event generated by hardware or at the request of software.  If
+ *	necessary, put the device into the full power state and restore its
+ *	registers, so that it is fully operational.
+ *
+ * @runtime_idle: Device appears to be inactive and it might be put into a low
+ *	power state if all of the necessary conditions are satisfied.  Check
+ *	these conditions and handle the device as appropriate, possibly queueing
+ *	a suspend request for it.  The return value is ignored by the PM core.
  */
 
 struct dev_pm_ops {
@@ -182,6 +208,9 @@ struct dev_pm_ops {
 	int (*thaw_noirq)(struct device *dev);
 	int (*poweroff_noirq)(struct device *dev);
 	int (*restore_noirq)(struct device *dev);
+	int (*runtime_suspend)(struct device *dev);
+	int (*runtime_resume)(struct device *dev);
+	int (*runtime_idle)(struct device *dev);
 };
 
 /*
@@ -329,14 +358,80 @@ enum dpm_state {
 	DPM_OFF_IRQ,
 };
 
+/**
+ * Device run-time power management status.
+ *
+ * These status labels are used internally by the PM core to indicate the
+ * current status of a device with respect to the PM core operations.  They do
+ * not reflect the actual power state of the device or its status as seen by the
+ * driver.
+ *
+ * RPM_ACTIVE		Device is fully operational.  Indicates that the device
+ *			bus type's ->runtime_resume() callback has completed
+ *			successfully.
+ *
+ * RPM_SUSPENDED	Device bus type's ->runtime_suspend() callback has
+ *			completed successfully.  The device is regarded as
+ *			suspended.
+ *
+ * RPM_RESUMING		Device bus type's ->runtime_resume() callback is being
+ *			executed.
+ *
+ * RPM_SUSPENDING	Device bus type's ->runtime_suspend() callback is being
+ *			executed.
+ */
+
+enum rpm_status {
+	RPM_ACTIVE = 0,
+	RPM_RESUMING,
+	RPM_SUSPENDED,
+	RPM_SUSPENDING,
+};
+
+/**
+ * Device run-time power management request types.
+ *
+ * RPM_REQ_NONE		Do nothing.
+ *
+ * RPM_REQ_IDLE		Run the device bus type's ->runtime_idle() callback
+ *
+ * RPM_REQ_SUSPEND	Run the device bus type's ->runtime_suspend() callback
+ *
+ * RPM_REQ_RESUME	Run the device bus type's ->runtime_resume() callback
+ */
+
+enum rpm_request {
+	RPM_REQ_NONE = 0,
+	RPM_REQ_IDLE,
+	RPM_REQ_SUSPEND,
+	RPM_REQ_RESUME,
+};
+
 struct dev_pm_info {
 	pm_message_t		power_state;
-	unsigned		can_wakeup:1;
-	unsigned		should_wakeup:1;
+	unsigned int		can_wakeup:1;
+	unsigned int		should_wakeup:1;
 	enum dpm_state		status;		/* Owned by the PM core */
-#ifdef	CONFIG_PM_SLEEP
+#ifdef CONFIG_PM_SLEEP
 	struct list_head	entry;
 #endif
+#ifdef CONFIG_PM_RUNTIME
+	struct timer_list	suspend_timer;
+	unsigned long		timer_expires;
+	struct work_struct	work;
+	wait_queue_head_t	wait_queue;
+	spinlock_t		lock;
+	atomic_t		usage_count;
+	atomic_t		child_count;
+	unsigned int		disable_depth:3;
+	unsigned int		ignore_children:1;
+	unsigned int		idle_notification:1;
+	unsigned int		request_pending:1;
+	unsigned int		deferred_resume:1;
+	enum rpm_request	request;
+	enum rpm_status		runtime_status;
+	int			runtime_error;
+#endif
 };
 
 /*
Index: linux-2.6/drivers/base/power/Makefile
===================================================================
--- linux-2.6.orig/drivers/base/power/Makefile
+++ linux-2.6/drivers/base/power/Makefile
@@ -1,5 +1,6 @@
 obj-$(CONFIG_PM)	+= sysfs.o
 obj-$(CONFIG_PM_SLEEP)	+= main.o
+obj-$(CONFIG_PM_RUNTIME)	+= runtime.o
 obj-$(CONFIG_PM_TRACE_RTC)	+= trace.o
 
 ccflags-$(CONFIG_DEBUG_DRIVER) := -DDEBUG
Index: linux-2.6/drivers/base/power/runtime.c
===================================================================
--- /dev/null
+++ linux-2.6/drivers/base/power/runtime.c
@@ -0,0 +1,1011 @@
+/*
+ * drivers/base/power/runtime.c - Helper functions for device run-time PM
+ *
+ * Copyright (c) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
+ *
+ * This file is released under the GPLv2.
+ */
+
+#include <linux/sched.h>
+#include <linux/pm_runtime.h>
+#include <linux/jiffies.h>
+
+static int __pm_runtime_resume(struct device *dev, bool from_wq);
+static int __pm_request_idle(struct device *dev);
+static int __pm_request_resume(struct device *dev);
+
+/**
+ * pm_runtime_deactivate_timer - Deactivate given device's suspend timer.
+ * @dev: Device to handle.
+ */
+static void pm_runtime_deactivate_timer(struct device *dev)
+{
+	if (dev->power.timer_expires > 0) {
+		del_timer(&dev->power.suspend_timer);
+		dev->power.timer_expires = 0;
+	}
+}
+
+/**
+ * pm_runtime_cancel_pending - Deactivate suspend timer and cancel requests.
+ * @dev: Device to handle.
+ */
+static void pm_runtime_cancel_pending(struct device *dev)
+{
+	pm_runtime_deactivate_timer(dev);
+	/*
+	 * In case there's a request pending, make sure its work function will
+	 * return without doing anything.
+	 */
+	dev->power.request = RPM_REQ_NONE;
+}
+
+/**
+ * __pm_runtime_idle - Notify device bus type if the device can be suspended.
+ * @dev: Device to notify the bus type about.
+ *
+ * This function must be called under dev->power.lock with interrupts disabled.
+ */
+static int __pm_runtime_idle(struct device *dev)
+	__releases(&dev->power.lock) __acquires(&dev->power.lock)
+{
+	int retval = 0;
+
+	dev_dbg(dev, "__pm_runtime_idle()!\n");
+
+	if (dev->power.runtime_error)
+		retval = -EINVAL;
+	else if (dev->power.idle_notification)
+		retval = -EINPROGRESS;
+	else if (atomic_read(&dev->power.usage_count) > 0
+	    || dev->power.disable_depth > 0
+	    || dev->power.runtime_status != RPM_ACTIVE)
+		retval = -EAGAIN;
+	else if (!pm_children_suspended(dev))
+		retval = -EBUSY;
+	if (retval)
+		goto out;
+
+	if (dev->power.request_pending) {
+		/*
+		 * If an idle notification request is pending, cancel it.  Any
+		 * other pending request takes precedence over us.
+		 */
+		if (dev->power.request == RPM_REQ_IDLE) {
+			dev->power.request = RPM_REQ_NONE;
+		} else if (dev->power.request != RPM_REQ_NONE) {
+			retval = -EAGAIN;
+			goto out;
+		}
+	}
+
+	dev->power.idle_notification = true;
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_idle) {
+		spin_unlock_irq(&dev->power.lock);
+
+		dev->bus->pm->runtime_idle(dev);
+
+		spin_lock_irq(&dev->power.lock);
+	}
+
+	dev->power.idle_notification = false;
+	wake_up_all(&dev->power.wait_queue);
+
+ out:
+	dev_dbg(dev, "__pm_runtime_idle() returns %d!\n", retval);
+
+	return retval;
+}
+
+/**
+ * pm_runtime_idle - Notify device bus type if the device can be suspended.
+ * @dev: Device to notify the bus type about.
+ */
+int pm_runtime_idle(struct device *dev)
+{
+	int retval;
+
+	spin_lock_irq(&dev->power.lock);
+	retval = __pm_runtime_idle(dev);
+	spin_unlock_irq(&dev->power.lock);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(pm_runtime_idle);
+
+/**
+ * __pm_runtime_suspend - Carry out run-time suspend of given device.
+ * @dev: Device to suspend.
+ * @from_wq: If set, the function has been called via pm_wq.
+ *
+ * Check if the device can be suspended and run the ->runtime_suspend() callback
+ * provided by its bus type.  If another suspend has been started earlier, wait
+ * for it to finish.  If an idle notification or suspend request is pending or
+ * scheduled, cancel it.
+ *
+ * This function must be called under dev->power.lock with interrupts disabled.
+ */
+int __pm_runtime_suspend(struct device *dev, bool from_wq)
+	__releases(&dev->power.lock) __acquires(&dev->power.lock)
+{
+	struct device *parent = NULL;
+	bool notify = false;
+	int retval = 0;
+
+	dev_dbg(dev, "__pm_runtime_suspend()%s!\n",
+		from_wq ? " from workqueue" : "");
+
+ repeat:
+	if (dev->power.runtime_error) {
+		retval = -EINVAL;
+		goto out;
+	}
+
+	/* Pending resume requests take precedence over us. */
+	if (dev->power.request_pending
+	    && dev->power.request == RPM_REQ_RESUME) {
+		retval = -EAGAIN;
+		goto out;
+	}
+
+	/* Other scheduled or pending requests need to be canceled. */
+	pm_runtime_cancel_pending(dev);
+
+	if (dev->power.runtime_status == RPM_SUSPENDED)
+		retval = 1;
+	else if (dev->power.runtime_status == RPM_RESUMING
+	    || dev->power.disable_depth > 0
+	    || atomic_read(&dev->power.usage_count) > 0)
+		retval = -EAGAIN;
+	else if (!pm_children_suspended(dev))
+		retval = -EBUSY;
+	if (retval)
+		goto out;
+
+	if (dev->power.runtime_status == RPM_SUSPENDING) {
+		DEFINE_WAIT(wait);
+
+		if (from_wq) {
+			retval = -EINPROGRESS;
+			goto out;
+		}
+
+		/* Wait for the other suspend running in parallel with us. */
+		for (;;) {
+			prepare_to_wait(&dev->power.wait_queue, &wait,
+					TASK_UNINTERRUPTIBLE);
+			if (dev->power.runtime_status != RPM_SUSPENDING)
+				break;
+
+			spin_unlock_irq(&dev->power.lock);
+
+			schedule();
+
+			spin_lock_irq(&dev->power.lock);
+		}
+		finish_wait(&dev->power.wait_queue, &wait);
+		goto repeat;
+	}
+
+	dev->power.runtime_status = RPM_SUSPENDING;
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_suspend) {
+		spin_unlock_irq(&dev->power.lock);
+
+		retval = dev->bus->pm->runtime_suspend(dev);
+
+		spin_lock_irq(&dev->power.lock);
+		dev->power.runtime_error = retval;
+	} else {
+		retval = -ENOSYS;
+	}
+
+	if (retval) {
+		dev->power.runtime_status = RPM_ACTIVE;
+		pm_runtime_cancel_pending(dev);
+		dev->power.deferred_resume = false;
+
+		if (retval == -EAGAIN || retval == -EBUSY) {
+			notify = true;
+			dev->power.runtime_error = 0;
+		}
+	} else {
+		dev->power.runtime_status = RPM_SUSPENDED;
+
+		if (dev->parent) {
+			parent = dev->parent;
+			atomic_add_unless(&parent->power.child_count, -1, 0);
+		}
+	}
+	wake_up_all(&dev->power.wait_queue);
+
+	if (dev->power.deferred_resume) {
+		dev->power.deferred_resume = false;
+		__pm_runtime_resume(dev, false);
+		retval = -EAGAIN;
+		goto out;
+	}
+
+	if (notify)
+		__pm_runtime_idle(dev);
+
+	if (parent && !parent->power.ignore_children) {
+		spin_unlock_irq(&dev->power.lock);
+
+		pm_request_idle(parent);
+
+		spin_lock_irq(&dev->power.lock);
+	}
+
+ out:
+	dev_dbg(dev, "__pm_runtime_suspend() returns %d!\n", retval);
+
+	return retval;
+}
+
+/**
+ * pm_runtime_suspend - Carry out run-time suspend of given device.
+ * @dev: Device to suspend.
+ */
+int pm_runtime_suspend(struct device *dev)
+{
+	int retval;
+
+	spin_lock_irq(&dev->power.lock);
+	retval = __pm_runtime_suspend(dev, false);
+	spin_unlock_irq(&dev->power.lock);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(pm_runtime_suspend);
+
+/**
+ * __pm_runtime_resume - Carry out run-time resume of given device.
+ * @dev: Device to resume.
+ * @from_wq: If set, the function has been called via pm_wq.
+ *
+ * Check if the device can be woken up and run the ->runtime_resume() callback
+ * provided by its bus type.  If another resume has been started earlier, wait
+ * for it to finish.  If there's a suspend running in parallel with this
+ * function, wait for it to finish and resume the device.  Cancel any scheduled
+ * or pending requests.
+ *
+ * This function must be called under dev->power.lock with interrupts disabled.
+ */
+int __pm_runtime_resume(struct device *dev, bool from_wq)
+	__releases(&dev->power.lock) __acquires(&dev->power.lock)
+{
+	struct device *parent = NULL;
+	int retval = 0;
+
+	dev_dbg(dev, "__pm_runtime_resume()%s!\n",
+		from_wq ? " from workqueue" : "");
+
+ repeat:
+	if (dev->power.runtime_error) {
+		retval = -EINVAL;
+		goto out;
+	}
+
+	pm_runtime_cancel_pending(dev);
+
+	if (dev->power.runtime_status == RPM_ACTIVE)
+		retval = 1;
+	else if (dev->power.disable_depth > 0)
+		retval = -EAGAIN;
+	if (retval)
+		goto out;
+
+	if (dev->power.runtime_status == RPM_RESUMING
+	    || dev->power.runtime_status == RPM_SUSPENDING) {
+		DEFINE_WAIT(wait);
+
+		if (from_wq) {
+			if (dev->power.runtime_status == RPM_SUSPENDING)
+				dev->power.deferred_resume = true;
+			retval = -EINPROGRESS;
+			goto out;
+		}
+
+		/* Wait for the operation carried out in parallel with us. */
+		for (;;) {
+			prepare_to_wait(&dev->power.wait_queue, &wait,
+					TASK_UNINTERRUPTIBLE);
+			if (dev->power.runtime_status != RPM_RESUMING
+			    && dev->power.runtime_status != RPM_SUSPENDING)
+				break;
+
+			spin_unlock_irq(&dev->power.lock);
+
+			schedule();
+
+			spin_lock_irq(&dev->power.lock);
+		}
+		finish_wait(&dev->power.wait_queue, &wait);
+		goto repeat;
+	}
+
+	if (!parent && dev->parent) {
+		/*
+		 * Increment the parent's usage counter and resume it if
+		 * necessary.
+		 */
+		parent = dev->parent;
+		spin_unlock_irq(&dev->power.lock);
+
+		pm_runtime_get_noresume(parent);
+
+		spin_lock_irq(&parent->power.lock);
+		/*
+		 * The parent need not be resumed if its run-time PM is
+		 * disabled or it is set to ignore children.
+		 */
+		if (!parent->power.disable_depth
+		    && !parent->power.ignore_children) {
+			__pm_runtime_resume(parent, false);
+			if (parent->power.runtime_status != RPM_ACTIVE)
+				retval = -EBUSY;
+		}
+		spin_unlock_irq(&parent->power.lock);
+
+		spin_lock_irq(&dev->power.lock);
+		if (retval)
+			goto out;
+		goto repeat;
+	}
+
+	dev->power.runtime_status = RPM_RESUMING;
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_resume) {
+		spin_unlock_irq(&dev->power.lock);
+
+		retval = dev->bus->pm->runtime_resume(dev);
+
+		spin_lock_irq(&dev->power.lock);
+		dev->power.runtime_error = retval;
+	} else {
+		retval = -ENOSYS;
+	}
+
+	if (retval) {
+		dev->power.runtime_status = RPM_SUSPENDED;
+		pm_runtime_cancel_pending(dev);
+	} else {
+		dev->power.runtime_status = RPM_ACTIVE;
+		if (parent)
+			atomic_inc(&parent->power.child_count);
+	}
+	wake_up_all(&dev->power.wait_queue);
+
+	if (!retval)
+		__pm_request_idle(dev);
+
+ out:
+	if (parent) {
+		spin_unlock_irq(&dev->power.lock);
+
+		pm_runtime_put(parent);
+
+		spin_lock_irq(&dev->power.lock);
+	}
+
+	dev_dbg(dev, "__pm_runtime_resume() returns %d!\n", retval);
+
+	return retval;
+}
+
+/**
+ * pm_runtime_resume - Carry out run-time resume of given device.
+ * @dev: Device to resume.
+ */
+int pm_runtime_resume(struct device *dev)
+{
+	int retval;
+
+	spin_lock_irq(&dev->power.lock);
+	retval = __pm_runtime_resume(dev, false);
+	spin_unlock_irq(&dev->power.lock);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(pm_runtime_resume);
+
+/**
+ * pm_runtime_work - Universal run-time PM work function.
+ * @work: Work structure used for scheduling the execution of this function.
+ *
+ * Use @work to get the device object the work is to be done for, determine what
+ * is to be done and execute the appropriate run-time PM function.
+ */
+static void pm_runtime_work(struct work_struct *work)
+{
+	struct device *dev = container_of(work, struct device, power.work);
+	enum rpm_request req;
+
+	spin_lock_irq(&dev->power.lock);
+
+	if (!dev->power.request_pending)
+		goto out;
+
+	req = dev->power.request;
+	dev->power.request = RPM_REQ_NONE;
+	dev->power.request_pending = false;
+
+	switch (req) {
+	case RPM_REQ_NONE:
+		break;
+	case RPM_REQ_IDLE:
+		__pm_runtime_idle(dev);
+		break;
+	case RPM_REQ_SUSPEND:
+		__pm_runtime_suspend(dev, true);
+		break;
+	case RPM_REQ_RESUME:
+		__pm_runtime_resume(dev, true);
+		break;
+	}
+
+ out:
+	spin_unlock_irq(&dev->power.lock);
+}
+
+/**
+ * __pm_request_idle - Submit an idle notification request for given device.
+ * @dev: Device to handle.
+ *
+ * Check if the device's run-time PM status is correct for suspending the device
+ * and queue up a request to run __pm_runtime_idle() for it.
+ *
+ * This function must be called under dev->power.lock with interrupts disabled.
+ */
+static int __pm_request_idle(struct device *dev)
+{
+	int retval = 0;
+
+	if (dev->power.runtime_error)
+		retval = -EINVAL;
+	else if (atomic_read(&dev->power.usage_count) > 0
+	    || dev->power.disable_depth > 0
+	    || dev->power.runtime_status == RPM_SUSPENDED
+	    || dev->power.runtime_status == RPM_SUSPENDING)
+		retval = -EAGAIN;
+	else if (!pm_children_suspended(dev))
+		retval = -EBUSY;
+	if (retval)
+		return retval;
+
+	if (dev->power.request_pending) {
+		/* Any requests other than RPM_REQ_IDLE take precedence. */
+		if (dev->power.request == RPM_REQ_NONE)
+			dev->power.request = RPM_REQ_IDLE;
+		else if (dev->power.request != RPM_REQ_IDLE)
+			retval = -EAGAIN;
+		return retval;
+	}
+
+	dev->power.request = RPM_REQ_IDLE;
+	dev->power.request_pending = true;
+	queue_work(pm_wq, &dev->power.work);
+
+	return retval;
+}
+
+/**
+ * pm_request_idle - Submit an idle notification request for given device.
+ * @dev: Device to handle.
+ */
+int pm_request_idle(struct device *dev)
+{
+	unsigned long flags;
+	int retval;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+	retval = __pm_request_idle(dev);
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(pm_request_idle);
+
+/**
+ * __pm_request_suspend - Submit a suspend request for given device.
+ * @dev: Device to suspend.
+ *
+ * This function must be called under dev->power.lock with interrupts disabled.
+ */
+static int __pm_request_suspend(struct device *dev)
+{
+	int retval = 0;
+
+	if (dev->power.runtime_error)
+		return -EINVAL;
+
+	if (dev->power.runtime_status == RPM_SUSPENDED)
+		retval = 1;
+	else if (atomic_read(&dev->power.usage_count) > 0
+	    || dev->power.disable_depth > 0)
+		retval = -EAGAIN;
+	else if (dev->power.runtime_status == RPM_SUSPENDING)
+		retval = -EINPROGRESS;
+	else if (!pm_children_suspended(dev))
+		retval = -EBUSY;
+	if (retval < 0)
+		return retval;
+
+	pm_runtime_deactivate_timer(dev);
+
+	if (dev->power.request_pending) {
+		/*
+		 * Pending resume requests take precedence over us, but we can
+		 * overtake any other pending request.
+		 */
+		if (dev->power.request == RPM_REQ_RESUME)
+			retval = -EAGAIN;
+		else if (dev->power.request != RPM_REQ_SUSPEND)
+			dev->power.request = retval ?
+						RPM_REQ_NONE : RPM_REQ_SUSPEND;
+		return retval;
+	} else if (retval) {
+		return retval;
+	}
+
+	dev->power.request = RPM_REQ_SUSPEND;
+	dev->power.request_pending = true;
+	queue_work(pm_wq, &dev->power.work);
+
+	return 0;
+}
+
+/**
+ * pm_suspend_timer_fn - Timer function for pm_schedule_suspend().
+ * @data: Device pointer passed by pm_schedule_suspend().
+ *
+ * Check if the time is right and execute __pm_request_suspend() in that case.
+ */
+static void pm_suspend_timer_fn(unsigned long data)
+{
+	struct device *dev = (struct device *)data;
+	unsigned long flags;
+	unsigned long expires;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	expires = dev->power.timer_expires;
+	/* If 'expires' is after 'jiffies' we've been called too early. */
+	if (expires > 0 && !time_after(expires, jiffies)) {
+		dev->power.timer_expires = 0;
+		__pm_request_suspend(dev);
+	}
+
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+
+/**
+ * pm_schedule_suspend - Set up a timer to submit a suspend request in future.
+ * @dev: Device to suspend.
+ * @delay: Time to wait before submitting a suspend request, in milliseconds.
+ */
+int pm_schedule_suspend(struct device *dev, unsigned int delay)
+{
+	unsigned long flags;
+	int retval = 0;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.runtime_error) {
+		retval = -EINVAL;
+		goto out;
+	}
+
+	if (!delay) {
+		retval = __pm_request_suspend(dev);
+		goto out;
+	}
+
+	pm_runtime_deactivate_timer(dev);
+
+	if (dev->power.request_pending) {
+		/*
+		 * Pending resume requests take precedence over us, but any
+		 * other pending requests have to be canceled.
+		 */
+		if (dev->power.request == RPM_REQ_RESUME) {
+			retval = -EAGAIN;
+			goto out;
+		}
+		dev->power.request = RPM_REQ_NONE;
+	}
+
+	if (dev->power.runtime_status == RPM_SUSPENDED)
+		retval = 1;
+	else if (dev->power.runtime_status == RPM_SUSPENDING)
+		retval = -EINPROGRESS;
+	else if (atomic_read(&dev->power.usage_count) > 0
+	    || dev->power.disable_depth > 0)
+		retval = -EAGAIN;
+	else if (!pm_children_suspended(dev))
+		retval = -EBUSY;
+	if (retval)
+		goto out;
+
+	dev->power.timer_expires = jiffies + msecs_to_jiffies(delay);
+	dev->power.timer_expires += !dev->power.timer_expires;
+	mod_timer(&dev->power.suspend_timer, dev->power.timer_expires);
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(pm_schedule_suspend);
+
+/**
+ * __pm_request_resume - Submit a resume request for given device.
+ * @dev: Device to resume.
+ *
+ * This function must be called under dev->power.lock with interrupts disabled.
+ */
+static int __pm_request_resume(struct device *dev)
+{
+	int retval = 0;
+
+	if (dev->power.runtime_error)
+		return -EINVAL;
+
+	if (dev->power.runtime_status == RPM_ACTIVE)
+		retval = 1;
+	else if (dev->power.runtime_status == RPM_RESUMING)
+		retval = -EINPROGRESS;
+	else if (dev->power.disable_depth > 0)
+		retval = -EAGAIN;
+	if (retval < 0)
+		return retval;
+
+	pm_runtime_deactivate_timer(dev);
+
+	if (dev->power.request_pending) {
+		/* If a non-resume request is pending, we can overtake it. */
+		dev->power.request = retval ? RPM_REQ_NONE : RPM_REQ_RESUME;
+		return retval;
+	} else if (retval) {
+		return retval;
+	}
+
+	dev->power.request = RPM_REQ_RESUME;
+	dev->power.request_pending = true;
+	queue_work(pm_wq, &dev->power.work);
+
+	return retval;
+}
+
+/**
+ * pm_request_resume - Submit a resume request for given device.
+ * @dev: Device to resume.
+ */
+int pm_request_resume(struct device *dev)
+{
+	unsigned long flags;
+	int retval;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+	retval = __pm_request_resume(dev);
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(pm_request_resume);
+
+/**
+ * __pm_runtime_get - Reference count a device and wake it up, if necessary.
+ * @dev: Device to handle.
+ * @sync: If set and the device is suspended, resume it synchronously.
+ *
+ * Increment the usage count of the device and if it was zero previously,
+ * resume it or submit a resume request for it, depending on the value of @sync.
+ */
+int __pm_runtime_get(struct device *dev, bool sync)
+{
+	int retval = 1;
+
+	if (atomic_add_return(1, &dev->power.usage_count) == 1)
+		retval = sync ? pm_runtime_resume(dev) : pm_request_resume(dev);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(__pm_runtime_get);
+
+/**
+ * __pm_runtime_put - Decrement the device's usage counter and notify its bus.
+ * @dev: Device to handle.
+ * @sync: If the device's bus type is to be notified, do that synchronously.
+ *
+ * Decrement the usage count of the device and if it reaches zero, carry out a
+ * synchronous idle notification or submit an idle notification request for it,
+ * depending on the value of @sync.
+ */
+int __pm_runtime_put(struct device *dev, bool sync)
+{
+	int retval = 0;
+
+	if (atomic_dec_and_test(&dev->power.usage_count))
+		retval = sync ? pm_runtime_idle(dev) : pm_request_idle(dev);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(__pm_runtime_put);
+
+/**
+ * __pm_runtime_set_status - Set run-time PM status of a device.
+ * @dev: Device to handle.
+ * @status: New run-time PM status of the device.
+ *
+ * If run-time PM of the device is disabled or its power.runtime_error field is
+ * different from zero, the status may be changed either to RPM_ACTIVE, or to
+ * RPM_SUSPENDED, as long as that reflects the actual state of the device.
+ * However, if the device has a parent and the parent is not active, and the
+ * parent's power.ignore_children flag is unset, the device's status cannot be
+ * set to RPM_ACTIVE, so -EBUSY is returned in that case.
+ *
+ * If successful, __pm_runtime_set_status() clears the power.runtime_error field
+ * and the device parent's counter of unsuspended children is modified to
+ * reflect the new status.  If the new status is RPM_SUSPENDED, an idle
+ * notification request is submitted for the parent unless it ignores children.
+ */
+int __pm_runtime_set_status(struct device *dev, unsigned int status)
+{
+	struct device *parent = dev->parent;
+	unsigned long flags;
+	bool notify_parent = false;
+	int error = 0;
+
+	if (status != RPM_ACTIVE && status != RPM_SUSPENDED)
+		return -EINVAL;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (!dev->power.runtime_error && !dev->power.disable_depth) {
+		error = -EAGAIN;
+		goto out;
+	}
+
+	if (dev->power.runtime_status == status)
+		goto out_set;
+
+	if (status == RPM_SUSPENDED) {
+		/* It is always possible to set the status to 'suspended'. */
+		if (parent) {
+			atomic_add_unless(&parent->power.child_count, -1, 0);
+			notify_parent = !parent->power.ignore_children;
+		}
+		goto out_set;
+	}
+
+	if (parent) {
+		spin_lock_nested(&parent->power.lock, SINGLE_DEPTH_NESTING);
+
+		/*
+		 * It is invalid to put an active child under a parent that is
+		 * not active, has run-time PM enabled and the
+		 * 'power.ignore_children' flag unset.
+		 */
+		if (!parent->power.disable_depth
+		    && !parent->power.ignore_children
+		    && parent->power.runtime_status != RPM_ACTIVE) {
+			error = -EBUSY;
+		} else {
+			if (dev->power.runtime_status == RPM_SUSPENDED)
+				atomic_inc(&parent->power.child_count);
+		}
+
+		spin_unlock(&parent->power.lock);
+
+		if (error)
+			goto out;
+	}
+
+ out_set:
+	dev->power.runtime_status = status;
+	dev->power.runtime_error = 0;
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	if (notify_parent)
+		pm_request_idle(parent);
+
+	return error;
+}
+EXPORT_SYMBOL_GPL(__pm_runtime_set_status);
+
+/**
+ * __pm_runtime_barrier - Cancel pending requests and wait for completions.
+ * @dev: Device to handle.
+ *
+ * Flush all pending requests for the device from pm_wq and wait for all
+ * run-time PM operations involving the device in progress to complete.
+ *
+ * Should be called under dev->power.lock with interrupts disabled.
+ */
+static void __pm_runtime_barrier(struct device *dev)
+{
+	pm_runtime_deactivate_timer(dev);
+
+	if (dev->power.request_pending) {
+		dev->power.request = RPM_REQ_NONE;
+		spin_unlock_irq(&dev->power.lock);
+
+		cancel_work_sync(&dev->power.work);
+
+		spin_lock_irq(&dev->power.lock);
+		dev->power.request_pending = false;
+	}
+
+	if (dev->power.runtime_status == RPM_SUSPENDING
+	    || dev->power.runtime_status == RPM_RESUMING
+	    || dev->power.idle_notification) {
+		DEFINE_WAIT(wait);
+
+		/* Suspend, wake-up or idle notification in progress. */
+		for (;;) {
+			prepare_to_wait(&dev->power.wait_queue, &wait,
+					TASK_UNINTERRUPTIBLE);
+			if (dev->power.runtime_status != RPM_SUSPENDING
+			    && dev->power.runtime_status != RPM_RESUMING
+			    && !dev->power.idle_notification)
+				break;
+			spin_unlock_irq(&dev->power.lock);
+
+			schedule();
+
+			spin_lock_irq(&dev->power.lock);
+		}
+		finish_wait(&dev->power.wait_queue, &wait);
+	}
+}
+
+/**
+ * pm_runtime_barrier - Flush pending requests and wait for completions.
+ * @dev: Device to handle.
+ *
+ * Prevent the device from being suspended by incrementing its usage counter and
+ * if there's a pending resume request for the device, wake the device up.
+ * Next, make sure that all pending requests for the device have been flushed
+ * from pm_wq and wait for all run-time PM operations involving the device in
+ * progress to complete.
+ *
+ * Return value:
+ * 1, if there was a resume request pending and the device had to be woken up,
+ * 0, otherwise
+ */
+int pm_runtime_barrier(struct device *dev)
+{
+	int retval = 0;
+
+	pm_runtime_get_noresume(dev);
+	spin_lock_irq(&dev->power.lock);
+
+	if (dev->power.request_pending
+	    && dev->power.request == RPM_REQ_RESUME) {
+		__pm_runtime_resume(dev, false);
+		retval = 1;
+	}
+
+	__pm_runtime_barrier(dev);
+
+	spin_unlock_irq(&dev->power.lock);
+	pm_runtime_put_noidle(dev);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(pm_runtime_barrier);
+
+/**
+ * __pm_runtime_disable - Disable run-time PM of a device.
+ * @dev: Device to handle.
+ * @check_resume: If set, check if there's a resume request for the device.
+ *
+ * Increment power.disable_depth for the device and if it was zero previously,
+ * cancel all pending run-time PM requests for the device and wait for all
+ * operations in progress to complete.  The device can be either active or
+ * suspended after its run-time PM has been disabled.
+ *
+ * If @check_resume is set and there's a resume request pending when
+ * __pm_runtime_disable() is called and power.disable_depth is zero, the
+ * function will wake up the device before disabling its run-time PM.
+ */
+void __pm_runtime_disable(struct device *dev, bool check_resume)
+{
+	spin_lock_irq(&dev->power.lock);
+
+	if (dev->power.disable_depth > 0) {
+		dev->power.disable_depth++;
+		goto out;
+	}
+
+	/*
+	 * Wake up the device if there's a resume request pending, because that
+	 * means there probably is some I/O to process and disabling run-time PM
+	 * shouldn't prevent the device from processing the I/O.
+	 */
+	if (check_resume && dev->power.request_pending
+	    && dev->power.request == RPM_REQ_RESUME) {
+		/*
+		 * Prevent suspends and idle notifications from being carried
+		 * out after we have woken up the device.
+		 */
+		pm_runtime_get_noresume(dev);
+
+		__pm_runtime_resume(dev, false);
+
+		pm_runtime_put_noidle(dev);
+	}
+
+	if (!dev->power.disable_depth++)
+		__pm_runtime_barrier(dev);
+
+ out:
+	spin_unlock_irq(&dev->power.lock);
+}
+EXPORT_SYMBOL_GPL(__pm_runtime_disable);
+
+/**
+ * pm_runtime_enable - Enable run-time PM of a device.
+ * @dev: Device to handle.
+ */
+void pm_runtime_enable(struct device *dev)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.disable_depth > 0)
+		dev->power.disable_depth--;
+	else
+		dev_warn(dev, "Unbalanced %s!\n", __func__);
+
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_enable);
+
+/**
+ * pm_runtime_init - Initialize run-time PM fields in given device object.
+ * @dev: Device object to initialize.
+ */
+void pm_runtime_init(struct device *dev)
+{
+	spin_lock_init(&dev->power.lock);
+
+	dev->power.runtime_status = RPM_SUSPENDED;
+	dev->power.idle_notification = false;
+
+	dev->power.disable_depth = 1;
+	atomic_set(&dev->power.usage_count, 0);
+
+	dev->power.runtime_error = 0;
+
+	atomic_set(&dev->power.child_count, 0);
+	pm_suspend_ignore_children(dev, false);
+
+	dev->power.request_pending = false;
+	dev->power.request = RPM_REQ_NONE;
+	dev->power.deferred_resume = false;
+	INIT_WORK(&dev->power.work, pm_runtime_work);
+
+	dev->power.timer_expires = 0;
+	setup_timer(&dev->power.suspend_timer, pm_suspend_timer_fn,
+			(unsigned long)dev);
+
+	init_waitqueue_head(&dev->power.wait_queue);
+}
+
+/**
+ * pm_runtime_remove - Prepare for removing a device from device hierarchy.
+ * @dev: Device object being removed from device hierarchy.
+ */
+void pm_runtime_remove(struct device *dev)
+{
+	__pm_runtime_disable(dev, false);
+
+	/* Change the status back to 'suspended' to match the initial status. */
+	if (dev->power.runtime_status == RPM_ACTIVE)
+		pm_runtime_set_suspended(dev);
+}
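For readers following the timer logic in pm_schedule_suspend() and pm_suspend_timer_fn() above: the timer callback only submits a suspend request if power.timer_expires is nonzero (zero means no timer is armed) and the expiry time has been reached, and it uses time_after() so the comparison stays correct when jiffies wraps around. The following is a minimal userspace sketch of that check, not kernel code; model_time_after() and model_timer_should_fire() are hypothetical names, and model_time_after() assumes the usual signed-difference definition of the kernel's time_after() macro instead of including kernel headers.

```c
#include <stdbool.h>

/*
 * Hypothetical userspace stand-in for the kernel's time_after(a, b):
 * true if time a is after time b.  Interpreting the difference as a
 * signed value keeps the result correct across counter wraparound,
 * provided the two values are less than half the counter range apart.
 */
static bool model_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/*
 * Mirrors the test in pm_suspend_timer_fn(): an expiry of 0 means
 * "no timer armed"; otherwise fire once 'now' has reached 'expires'.
 */
static bool model_timer_should_fire(unsigned long expires, unsigned long now)
{
	return expires > 0 && !model_time_after(expires, now);
}
```

For example, an expiry of 4 computed shortly before the counter wraps (with "now" just below ULONG_MAX) is correctly treated as still in the future, whereas a plain `expires <= now` comparison would fire immediately.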
Index: linux-2.6/include/linux/pm_runtime.h
===================================================================
--- /dev/null
+++ linux-2.6/include/linux/pm_runtime.h
@@ -0,0 +1,114 @@
+/*
+ * pm_runtime.h - Device run-time power management helper functions.
+ *
+ * Copyright (C) 2009 Rafael J. Wysocki <rjw@sisk.pl>
+ *
+ * This file is released under the GPLv2.
+ */
+
+#ifndef _LINUX_PM_RUNTIME_H
+#define _LINUX_PM_RUNTIME_H
+
+#include <linux/device.h>
+#include <linux/pm.h>
+
+#ifdef CONFIG_PM_RUNTIME
+
+extern struct workqueue_struct *pm_wq;
+
+extern int pm_runtime_idle(struct device *dev);
+extern int pm_runtime_suspend(struct device *dev);
+extern int pm_runtime_resume(struct device *dev);
+extern int pm_request_idle(struct device *dev);
+extern int pm_schedule_suspend(struct device *dev, unsigned int delay);
+extern int pm_request_resume(struct device *dev);
+extern int __pm_runtime_get(struct device *dev, bool sync);
+extern int __pm_runtime_put(struct device *dev, bool sync);
+extern int __pm_runtime_set_status(struct device *dev, unsigned int status);
+extern int pm_runtime_barrier(struct device *dev);
+extern void pm_runtime_enable(struct device *dev);
+extern void __pm_runtime_disable(struct device *dev, bool check_resume);
+
+static inline bool pm_children_suspended(struct device *dev)
+{
+	return dev->power.ignore_children
+		|| !atomic_read(&dev->power.child_count);
+}
+
+static inline void pm_suspend_ignore_children(struct device *dev, bool enable)
+{
+	dev->power.ignore_children = enable;
+}
+
+static inline void pm_runtime_get_noresume(struct device *dev)
+{
+	atomic_inc(&dev->power.usage_count);
+}
+
+static inline void pm_runtime_put_noidle(struct device *dev)
+{
+	atomic_add_unless(&dev->power.usage_count, -1, 0);
+}
+
+#else /* !CONFIG_PM_RUNTIME */
+
+static inline int pm_runtime_idle(struct device *dev) { return -ENOSYS; }
+static inline int pm_runtime_suspend(struct device *dev) { return -ENOSYS; }
+static inline int pm_runtime_resume(struct device *dev) { return 0; }
+static inline int pm_request_idle(struct device *dev) { return -ENOSYS; }
+static inline int pm_schedule_suspend(struct device *dev, unsigned int delay)
+{
+	return -ENOSYS;
+}
+static inline int pm_request_resume(struct device *dev) { return 0; }
+static inline int __pm_runtime_get(struct device *dev, bool sync) { return 1; }
+static inline int __pm_runtime_put(struct device *dev, bool sync) { return 0; }
+static inline int __pm_runtime_set_status(struct device *dev,
+					    unsigned int status) { return 0; }
+static inline int pm_runtime_barrier(struct device *dev) { return 0; }
+static inline void pm_runtime_enable(struct device *dev) {}
+static inline void __pm_runtime_disable(struct device *dev, bool c) {}
+
+static inline bool pm_children_suspended(struct device *dev) { return false; }
+static inline void pm_suspend_ignore_children(struct device *dev, bool en) {}
+static inline void pm_runtime_get_noresume(struct device *dev) {}
+static inline void pm_runtime_put_noidle(struct device *dev) {}
+
+#endif /* !CONFIG_PM_RUNTIME */
+
+static inline int pm_runtime_get(struct device *dev)
+{
+	return __pm_runtime_get(dev, false);
+}
+
+static inline int pm_runtime_get_sync(struct device *dev)
+{
+	return __pm_runtime_get(dev, true);
+}
+
+static inline int pm_runtime_put(struct device *dev)
+{
+	return __pm_runtime_put(dev, false);
+}
+
+static inline int pm_runtime_put_sync(struct device *dev)
+{
+	return __pm_runtime_put(dev, true);
+}
+
+static inline int pm_runtime_set_active(struct device *dev)
+{
+	return __pm_runtime_set_status(dev, RPM_ACTIVE);
+}
+
+static inline void pm_runtime_set_suspended(struct device *dev)
+{
+	__pm_runtime_set_status(dev, RPM_SUSPENDED);
+}
+
+static inline void pm_runtime_disable(struct device *dev)
+{
+	__pm_runtime_disable(dev, true);
+}
+
+#endif
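The pm_runtime_get*() and pm_runtime_put*() wrappers above are thin layers over one atomic usage counter. __pm_runtime_get() resumes the device (or queues a resume) only when the counter goes from 0 to 1. __pm_runtime_put() runs or queues an idle notification only when it drops back to 0. pm_runtime_get_noresume() and pm_runtime_put_noidle() adjust the counter without side effects, and the latter refuses to go below zero. The sketch below models those rules in userspace with C11 atomics. struct model_dev and the model_* functions are hypothetical, and the model only counts how often the resume and idle paths would be taken instead of calling into the PM core.

```c
#include <stdatomic.h>

/* Hypothetical model of a device's run-time PM usage counter. */
struct model_dev {
	atomic_int usage_count;
	int resume_calls;	/* how often the 0 -> 1 transition would resume */
	int idle_calls;		/* how often the 1 -> 0 transition would notify */
};

/* Like __pm_runtime_get(): only the first reference triggers a resume. */
static int model_get(struct model_dev *d)
{
	if (atomic_fetch_add(&d->usage_count, 1) == 0) {
		d->resume_calls++;
		return 0;	/* placeholder for the resume helper's result */
	}
	return 1;		/* already referenced, nothing else to do */
}

/* Like __pm_runtime_put(): only dropping the last reference notifies. */
static int model_put(struct model_dev *d)
{
	if (atomic_fetch_sub(&d->usage_count, 1) == 1)
		d->idle_calls++;
	return 0;
}

/* Like pm_runtime_get_noresume(): take a reference, nothing else. */
static void model_get_noresume(struct model_dev *d)
{
	atomic_fetch_add(&d->usage_count, 1);
}

/* Like pm_runtime_put_noidle(): decrement unless already zero. */
static void model_put_noidle(struct model_dev *d)
{
	int old = atomic_load(&d->usage_count);

	/* On failure the CAS reloads 'old', so the zero check is redone. */
	while (old != 0 &&
	       !atomic_compare_exchange_weak(&d->usage_count, &old, old - 1))
		;
}
```

Keeping the 0 -> 1 and 1 -> 0 transitions as the only triggers lets nested get/put pairs from different code paths share one counter without issuing redundant resumes or idle notifications.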
Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -21,6 +21,7 @@
 #include <linux/kallsyms.h>
 #include <linux/mutex.h>
 #include <linux/pm.h>
+#include <linux/pm_runtime.h>
 #include <linux/resume-trace.h>
 #include <linux/rwsem.h>
 #include <linux/interrupt.h>
@@ -49,6 +50,16 @@ static DEFINE_MUTEX(dpm_list_mtx);
 static bool transition_started;
 
 /**
+ * device_pm_init - Initialize the PM-related part of a device object.
+ * @dev: Device object being initialized.
+ */
+void device_pm_init(struct device *dev)
+{
+	dev->power.status = DPM_ON;
+	pm_runtime_init(dev);
+}
+
+/**
  *	device_pm_lock - lock the list of active devices used by the PM core
  */
 void device_pm_lock(void)
@@ -105,6 +116,7 @@ void device_pm_remove(struct device *dev
 	mutex_lock(&dpm_list_mtx);
 	list_del_init(&dev->power.entry);
 	mutex_unlock(&dpm_list_mtx);
+	pm_runtime_remove(dev);
 }
 
 /**
@@ -512,6 +524,7 @@ static void dpm_complete(pm_message_t st
 			mutex_unlock(&dpm_list_mtx);
 
 			device_complete(dev, state);
+			pm_runtime_put_noidle(dev);
 
 			mutex_lock(&dpm_list_mtx);
 		}
@@ -757,7 +770,14 @@ static int dpm_prepare(pm_message_t stat
 		dev->power.status = DPM_PREPARING;
 		mutex_unlock(&dpm_list_mtx);
 
-		error = device_prepare(dev, state);
+		pm_runtime_get_noresume(dev);
+		if (pm_runtime_barrier(dev) && device_may_wakeup(dev)) {
+			/* Wake-up requested during system sleep transition. */
+			pm_runtime_put_noidle(dev);
+			error = -EBUSY;
+		} else {
+			error = device_prepare(dev, state);
+		}
 
 		mutex_lock(&dpm_list_mtx);
 		if (error) {
Index: linux-2.6/drivers/base/dd.c
===================================================================
--- linux-2.6.orig/drivers/base/dd.c
+++ linux-2.6/drivers/base/dd.c
@@ -23,6 +23,7 @@
 #include <linux/kthread.h>
 #include <linux/wait.h>
 #include <linux/async.h>
+#include <linux/pm_runtime.h>
 
 #include "base.h"
 #include "power/power.h"
@@ -202,7 +203,10 @@ int driver_probe_device(struct device_dr
 	pr_debug("bus: '%s': %s: matched device %s with driver %s\n",
 		 drv->bus->name, __func__, dev_name(dev), drv->name);
 
+	pm_runtime_get_noresume(dev);
+	pm_runtime_barrier(dev);
 	ret = really_probe(dev, drv);
+	pm_runtime_put_sync(dev);
 
 	return ret;
 }
@@ -245,7 +249,9 @@ int device_attach(struct device *dev)
 			ret = 0;
 		}
 	} else {
+		pm_runtime_get_noresume(dev);
 		ret = bus_for_each_drv(dev->bus, NULL, dev, __device_attach);
+		pm_runtime_put_sync(dev);
 	}
 	up(&dev->sem);
 	return ret;
@@ -306,6 +312,9 @@ static void __device_release_driver(stru
 
 	drv = dev->driver;
 	if (drv) {
+		pm_runtime_get_noresume(dev);
+		pm_runtime_barrier(dev);
+
 		driver_sysfs_remove(dev);
 
 		if (dev->bus)
@@ -324,6 +333,8 @@ static void __device_release_driver(stru
 			blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
 						     BUS_NOTIFY_UNBOUND_DRIVER,
 						     dev);
+
+		pm_runtime_put_sync(dev);
 	}
 }
 
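driver_probe_device() brackets really_probe() with pm_runtime_get_noresume() and pm_runtime_put_sync(). As a result, an idle or suspend attempt made from inside ->probe() sees a nonzero usage counter and fails, while the final put gives the bus type a chance to idle the device. The sketch below models only the usage counter in userspace. struct model_dev and the model_* functions are hypothetical, and every other precondition (status, children, disable_depth) is assumed to be satisfied.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of the state involved; not the kernel's structures. */
struct model_dev {
	int usage_count;	/* models power.usage_count */
	int idle_calls;		/* times ->runtime_idle() would have run */
	int probe_idle_ret;	/* result of an idle attempt inside ->probe() */
};

/* Simplified pm_runtime_idle(): only the usage counter check is modeled. */
static int model_idle(struct model_dev *dev)
{
	if (dev->usage_count > 0)
		return -EAGAIN;
	dev->idle_calls++;	/* bus type's ->runtime_idle() would run here */
	return 0;
}

/* Simplified pm_runtime_put_sync(): decrement, then try to idle. */
static int model_put_sync(struct model_dev *dev)
{
	assert(dev->usage_count > 0);
	dev->usage_count--;
	return model_idle(dev);
}

/* A driver ->probe() that tries to idle the device right away. */
static int model_probe(struct model_dev *dev)
{
	dev->probe_idle_ret = model_idle(dev);
	return 0;
}

/* Mirrors the call order used in driver_probe_device(). */
static int model_driver_probe_device(struct model_dev *dev)
{
	int ret;

	dev->usage_count++;		/* pm_runtime_get_noresume() */
	/* pm_runtime_barrier() would flush pending requests here. */
	ret = model_probe(dev);		/* really_probe() */
	model_put_sync(dev);		/* pm_runtime_put_sync() */
	return ret;
}
```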
Index: linux-2.6/drivers/base/power/power.h
===================================================================
--- linux-2.6.orig/drivers/base/power/power.h
+++ linux-2.6/drivers/base/power/power.h
@@ -1,7 +1,14 @@
-static inline void device_pm_init(struct device *dev)
-{
-	dev->power.status = DPM_ON;
-}
+#ifdef CONFIG_PM_RUNTIME
+
+extern void pm_runtime_init(struct device *dev);
+extern void pm_runtime_remove(struct device *dev);
+
+#else /* !CONFIG_PM_RUNTIME */
+
+static inline void pm_runtime_init(struct device *dev) {}
+static inline void pm_runtime_remove(struct device *dev) {}
+
+#endif /* !CONFIG_PM_RUNTIME */
 
 #ifdef CONFIG_PM_SLEEP
 
@@ -16,23 +23,33 @@ static inline struct device *to_device(s
 	return container_of(entry, struct device, power.entry);
 }
 
+extern void device_pm_init(struct device *dev);
 extern void device_pm_add(struct device *);
 extern void device_pm_remove(struct device *);
 extern void device_pm_move_before(struct device *, struct device *);
 extern void device_pm_move_after(struct device *, struct device *);
 extern void device_pm_move_last(struct device *);
 
-#else /* CONFIG_PM_SLEEP */
+#else /* !CONFIG_PM_SLEEP */
+
+static inline void device_pm_init(struct device *dev)
+{
+	pm_runtime_init(dev);
+}
+
+static inline void device_pm_remove(struct device *dev)
+{
+	pm_runtime_remove(dev);
+}
 
 static inline void device_pm_add(struct device *dev) {}
-static inline void device_pm_remove(struct device *dev) {}
 static inline void device_pm_move_before(struct device *deva,
 					 struct device *devb) {}
 static inline void device_pm_move_after(struct device *deva,
 					struct device *devb) {}
 static inline void device_pm_move_last(struct device *dev) {}
 
-#endif
+#endif /* !CONFIG_PM_SLEEP */
 
 #ifdef CONFIG_PM
 
Index: linux-2.6/Documentation/power/runtime_pm.txt
===================================================================
--- /dev/null
+++ linux-2.6/Documentation/power/runtime_pm.txt
@@ -0,0 +1,378 @@
+Run-time Power Management Framework for I/O Devices
+
+(C) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
+
+1. Introduction
+
+Support for run-time power management (run-time PM) of I/O devices is provided
+at the power management core (PM core) level by means of:
+
+* The power management workqueue pm_wq in which bus types and device drivers can
+  put their PM-related work items.  It is strongly recommended that pm_wq be
+  used for queuing all work items related to run-time PM, because this allows
+  them to be synchronized with system-wide power transitions (suspend to RAM,
+  hibernation and resume from system sleep states).  pm_wq is declared in
+  include/linux/pm_runtime.h and defined in kernel/power/main.c.
+
+* A number of run-time PM fields in the 'power' member of 'struct device' (which
+  is of the type 'struct dev_pm_info', defined in include/linux/pm.h) that can
+  be used for synchronizing run-time PM operations with one another.
+
+* Three device run-time PM callbacks in 'struct dev_pm_ops' (defined in
+  include/linux/pm.h).
+
+* A set of helper functions defined in drivers/base/power/runtime.c that can be
+  used for carrying out run-time PM operations in such a way that the
+  synchronization between them is taken care of by the PM core.  Bus types and
+  device drivers are encouraged to use these functions.
+
+The run-time PM callbacks present in 'struct dev_pm_ops', the device run-time PM
+fields of 'struct dev_pm_info' and the core helper functions provided for
+run-time PM are described below.
+
+2. Device Run-time PM Callbacks
+
+There are three device run-time PM callbacks defined in 'struct dev_pm_ops':
+
+struct dev_pm_ops {
+	...
+	int (*runtime_suspend)(struct device *dev);
+	int (*runtime_resume)(struct device *dev);
+	void (*runtime_idle)(struct device *dev);
+	...
+};
+
+The ->runtime_suspend() callback is executed by the PM core for the bus type of
+the device being suspended.  The bus type's callback is then _entirely_
+_responsible_ for handling the device as appropriate, which may, but need not,
+include executing the device driver's own ->runtime_suspend() callback (from the
+PM core's point of view it is not necessary to implement a ->runtime_suspend()
+callback in a device driver as long as the bus type's ->runtime_suspend() knows
+what to do to handle the device).
+
+  * Once the bus type's ->runtime_suspend() callback has completed successfully
+    for a given device, the PM core regards the device as suspended, which need
+    not mean that the device has been put into a low power state.  It is
+    supposed to mean, however, that the device will not process data and will
+    not communicate with the CPU(s) and RAM until its bus type's
+    ->runtime_resume() callback is executed for it.  The run-time PM status of
+    a device after successful execution of its bus type's ->runtime_suspend()
+    callback is 'suspended'.
+
+  * If the bus type's ->runtime_suspend() callback returns -EBUSY or -EAGAIN,
+    the device's run-time PM status is supposed to be 'active', which means that
+    the device _must_ be fully operational afterwards.
+
+  * If the bus type's ->runtime_suspend() callback returns an error code
+    different from -EBUSY or -EAGAIN, the PM core regards this as a fatal
+    error and will refuse to run the helper functions described in Section 4
+    for the device until its status is directly set either to 'active'
+    or to 'suspended' (the PM core provides special helper functions for this
+    purpose).
+
+In particular, if the driver requires remote wake-up capability for proper
+functioning and device_may_wakeup() returns 'false' for the device, then
+->runtime_suspend() should return -EBUSY.  On the other hand, if
+device_may_wakeup() returns 'true' for the device and the device is put
+into a low power state during the execution of its bus type's
+->runtime_suspend(), it is expected that remote wake-up (i.e. a hardware
+mechanism allowing the device to request a change of its power state, such as
+PCI PME) will be enabled for the device.  Generally, remote wake-up should be
+enabled for all input devices put into a low power state at run time.
+
+The ->runtime_resume() callback is executed by the PM core for the bus type of
+the device being woken up.  The bus type's callback is then _entirely_
+_responsible_ for handling the device as appropriate, which may, but need not,
+include executing the device driver's own ->runtime_resume() callback (from the
+PM core's point of view it is not necessary to implement a ->runtime_resume()
+callback in a device driver as long as the bus type's ->runtime_resume() knows
+what to do to handle the device).
+
+  * Once the bus type's ->runtime_resume() callback has completed successfully,
+    the PM core regards the device as fully operational, which means that the
+    device _must_ be able to complete I/O operations as needed.  The run-time
+    PM status of the device is then 'active'.
+
+  * If the bus type's ->runtime_resume() callback returns an error code, the PM
+    core regards this as a fatal error and will refuse to run the helper
+    functions described in Section 4 for the device, until its status is
+    directly set either to 'active' or to 'suspended' (the PM core provides
+    special helper functions for this purpose).
+
+The ->runtime_idle() callback is executed by the PM core for the bus type of
+a given device whenever the device appears to be idle, which is indicated to the
+PM core by two counters, the device's usage counter and the counter of 'active'
+children of the device.
+
+  * If either of these counters is decremented using a helper function
+    provided by the PM core and it turns out to be equal to zero, the other
+    counter is checked.  If that counter is also equal to zero, the PM core
+    executes the device bus type's ->runtime_idle() callback (with the device
+    as an argument).
+
+The action performed by a bus type's ->runtime_idle() callback is totally
+dependent on the bus type in question, but the expected and recommended action
+is to check if the device can be suspended (i.e. if all of the conditions
+necessary for suspending the device are satisfied) and to queue up a suspend
+request for the device in that case.
+
+The helper functions provided by the PM core, described in Section 4, guarantee
+that the following constraints are met with respect to the bus type's run-time
+PM callbacks:
+
+(1) The callbacks are mutually exclusive (e.g. it is forbidden to execute
+    ->runtime_suspend() in parallel with ->runtime_resume() or with another
+    instance of ->runtime_suspend() for the same device) with the exception that
+    ->runtime_suspend() or ->runtime_resume() can be executed in parallel with
+    ->runtime_idle() (although ->runtime_idle() will not be started while any
+    of the other callbacks is being executed for the same device).
+
+(2) ->runtime_idle() and ->runtime_suspend() can only be executed for 'active'
+    devices (i.e. the PM core will only execute ->runtime_idle() or
+    ->runtime_suspend() for devices whose run-time PM status is
+    'active').
+
+(3) ->runtime_idle() and ->runtime_suspend() can only be executed for a device
+    whose usage counter is equal to zero _and_ which either has a counter of
+    'active' children equal to zero or has its 'power.ignore_children' flag
+    set.
+
+(4) ->runtime_resume() can only be executed for 'suspended' devices (i.e. the
+    PM core will only execute ->runtime_resume() for devices whose run-time
+    PM status is 'suspended').
+
+Additionally, the helper functions provided by the PM core obey the following
+rules:
+
+  * If ->runtime_suspend() is about to be executed or there's a pending request
+    to execute it, ->runtime_idle() will not be executed for the same device.
+
+  * A request to execute or to schedule the execution of ->runtime_suspend()
+    will cancel any pending requests to execute ->runtime_idle() for the same
+    device.
+
+  * If ->runtime_resume() is about to be executed or there's a pending request
+    to execute it, the other callbacks will not be executed for the same device.
+
+  * A request to execute ->runtime_resume() will cancel any pending or
+    scheduled requests to execute the other callbacks for the same device.
+
+3. Run-time PM Device Fields
+
+The following device run-time PM fields are present in 'struct dev_pm_info', as
+defined in include/linux/pm.h:
+
+  struct timer_list suspend_timer;
+    - timer used for scheduling (delayed) suspend requests
+
+  unsigned long timer_expires;
+    - timer expiration time, in jiffies (if this is different from zero, the
+      timer is running and will expire at that time, otherwise the timer is not
+      running)
+
+  struct work_struct work;
+    - work structure used for queuing up requests (i.e. work items in pm_wq)
+
+  wait_queue_head_t wait_queue;
+    - wait queue used if any of the helper functions needs to wait for another
+      one to complete
+
+  spinlock_t lock;
+    - lock used for synchronisation
+
+  atomic_t usage_count;
+    - the usage counter of the device
+
+  atomic_t child_count;
+    - the count of 'active' children of the device
+
+  unsigned int ignore_children;
+    - if set, the value of child_count is ignored (but still updated)
+
+  unsigned int disable_depth;
+    - used for disabling the helper functions (they work normally if this is
+      equal to zero); the initial value of it is 1 (i.e. run-time PM is
+      initially disabled for all devices)
+
+  unsigned int runtime_error;
+    - if set, there was a fatal error (one of the callbacks returned an error
+      code as described in Section 2), so the helper functions will not work
+      until this flag is cleared; the value is the error code returned by the
+      failing callback
+
+  unsigned int idle_notification;
+    - if set, ->runtime_idle() is being executed
+
+  unsigned int request_pending;
+    - if set, there's a pending request (i.e. a work item queued up into pm_wq)
+
+  enum rpm_request request;
+    - type of request that's pending (valid if request_pending is set)
+
+  unsigned int deferred_resume;
+    - set if ->runtime_resume() is about to be run while ->runtime_suspend() is
+      being executed for that device and it is not practical to wait for the
+      suspend to complete; means "start a resume as soon as you've suspended"
+
+  enum rpm_status runtime_status;
+    - the run-time PM status of the device; this field's initial value is
+      RPM_SUSPENDED, which means that each device is initially regarded by the
+      PM core as 'suspended', regardless of its real hardware status
+
+All of the above fields are members of the 'power' member of 'struct device'.
+
+4. Run-time PM Device Helper Functions
+
+The following run-time PM helper functions are defined in
+drivers/base/power/runtime.c and include/linux/pm_runtime.h:
+
+  void pm_runtime_init(struct device *dev);
+    - initialize the device run-time PM fields in 'struct dev_pm_info'
+
+  void pm_runtime_remove(struct device *dev);
+    - make sure that the run-time PM of the device will be disabled after
+      removing the device from the device hierarchy
+
+  int pm_runtime_idle(struct device *dev);
+    - execute ->runtime_idle() for the device's bus type; returns 0 on success
+      or an error code on failure, where -EINPROGRESS means that
+      ->runtime_idle() is already being executed
+
+  int pm_runtime_suspend(struct device *dev);
+    - execute ->runtime_suspend() for the device's bus type; returns 0 on
+      success, 1 if the device's run-time PM status was already 'suspended', or
+      an error code on failure, where -EAGAIN or -EBUSY means it is safe to
+      attempt to suspend the device again in the future
+
+  int pm_runtime_resume(struct device *dev);
+    - execute ->runtime_resume() for the device's bus type; returns 0 on
+      success, 1 if the device's run-time PM status was already 'active', or
+      an error code on failure, where -EAGAIN means it may be safe to attempt
+      to resume the device again in the future, but 'power.runtime_error'
+      should be checked additionally
+
+  int pm_request_idle(struct device *dev);
+    - submit a request to execute ->runtime_idle() for the device's bus type
+      (the request is represented by a work item in pm_wq); returns 0 on success
+      or an error code if the request has not been queued up
+
+  int pm_schedule_suspend(struct device *dev, unsigned int delay);
+    - schedule the execution of ->runtime_suspend() for the device's bus type
+      in the future, where 'delay' is the time to wait before queuing up a
+      suspend work item in pm_wq, in milliseconds (if 'delay' is zero, the work
+      item is queued up immediately); returns 0 on success, 1 if the device's
+      run-time PM status was already 'suspended', or an error code if the
+      request hasn't been scheduled (or queued up if 'delay' is 0); if the
+      execution of ->runtime_suspend() is already scheduled and not yet
+      expired, the new value of 'delay' will be used as the time to wait
+
+  int pm_request_resume(struct device *dev);
+    - submit a request to execute ->runtime_resume() for the device's bus type
+      (the request is represented by a work item in pm_wq); returns 0 on
+      success, 1 if the device's run-time PM status was already 'active', or
+      an error code if the request hasn't been queued up
+
+  void pm_runtime_get_noresume(struct device *dev);
+    - increment the device's usage counter
+
+  int pm_runtime_get(struct device *dev);
+    - increment the device's usage counter, run pm_request_resume(dev) and
+      return its result
+
+  int pm_runtime_get_sync(struct device *dev);
+    - increment the device's usage counter, run pm_runtime_resume(dev) and
+      return its result
+
+  void pm_runtime_put_noidle(struct device *dev);
+    - decrement the device's usage counter
+
+  int pm_runtime_put(struct device *dev);
+    - decrement the device's usage counter, run pm_request_idle(dev) and return
+      its result
+
+  int pm_runtime_put_sync(struct device *dev);
+    - decrement the device's usage counter, run pm_runtime_idle(dev) and return
+      its result
+
+  void pm_runtime_enable(struct device *dev);
+    - enable the run-time PM helper functions to run the device bus type's
+      run-time PM callbacks described in Section 2
+
+  int pm_runtime_disable(struct device *dev);
+    - prevent the run-time PM helper functions from running the device bus
+      type's run-time PM callbacks, make sure that all of the pending run-time
+      PM operations on the device are either completed or canceled; returns
+      1 if there was a resume request pending and it was necessary to execute
+      ->runtime_resume() for the device's bus type to satisfy that request,
+      otherwise 0 is returned
+
+  void pm_suspend_ignore_children(struct device *dev, bool enable);
+    - set/unset the power.ignore_children flag of the device
+
+  int pm_runtime_set_active(struct device *dev);
+    - clear the device's 'power.runtime_error' flag, set the device's run-time
+      PM status to 'active' and update its parent's counter of 'active'
+      children as appropriate (it is only valid to use this function if
+      'power.runtime_error' is set or 'power.disable_depth' is greater than
+      zero); it will fail and return an error code if the device has a parent
+      that is not active and whose 'power.ignore_children' flag is unset
+
+  void pm_runtime_set_suspended(struct device *dev);
+    - clear the device's 'power.runtime_error' flag, set the device's run-time
+      PM status to 'suspended' and update its parent's counter of 'active'
+      children as appropriate (it is only valid to use this function if
+      'power.runtime_error' is set or 'power.disable_depth' is greater than
+      zero)
+
+It is safe to execute the following helper functions from interrupt context:
+
+pm_request_idle()
+pm_schedule_suspend()
+pm_request_resume()
+pm_runtime_get_noresume()
+pm_runtime_get()
+pm_runtime_put_noidle()
+pm_runtime_put()
+pm_suspend_ignore_children()
+pm_runtime_set_active()
+pm_runtime_set_suspended()
+pm_runtime_enable()
+
+5. Run-time PM Initialization, Device Probing and Removal
+
+Initially, run-time PM is disabled for all devices, which means that most of
+the run-time PM helper functions described in Section 4 will return -EAGAIN
+until pm_runtime_enable() is called for the device.
+
+In addition to that, the initial run-time PM status of all devices is
+'suspended', but it need not reflect the actual physical state of the device.
+Thus, if the device is initially active (i.e. it is able to process I/O), its
+run-time PM status must be changed to 'active', with the help of
+pm_runtime_set_active(), before pm_runtime_enable() is called for the device.
+
+However, if the device has a parent and the parent's run-time PM is enabled,
+calling pm_runtime_set_active() for the device will affect the parent, unless
+the parent's 'power.ignore_children' flag is set.  Namely, in that case the
+parent won't be able to suspend at run time, using the PM core's helper
+functions, as long as the child's status is 'active', even if the child's
+run-time PM is still disabled (i.e. pm_runtime_enable() hasn't been called for
+the child yet or pm_runtime_disable() has been called for it).  For this reason,
+once pm_runtime_set_active() has been called for the device, pm_runtime_enable()
+should be called for it too as soon as reasonably possible or its run-time PM
+status should be changed back to 'suspended' with the help of
+pm_runtime_set_suspended().
+
+If the default initial run-time PM status of the device (i.e. 'suspended')
+reflects the actual state of the device, its bus type's or its driver's
+->probe() callback will likely need to wake it up using one of the PM core's
+helper functions described in Section 4.  In that case, pm_runtime_resume()
+should be used.  Of course, for this purpose the device's run-time PM has to be
+enabled earlier by calling pm_runtime_enable().
+
+If the device bus type's or driver's ->probe() or ->remove() callback runs
+pm_runtime_suspend() or pm_runtime_idle() or their asynchronous counterparts,
+they will fail and return -EAGAIN, because the device's usage counter is
+incremented by the core before executing ->probe() and ->remove().  Still, it
+may be desirable to suspend the device as soon as ->probe() or ->remove() has
+finished, so the PM core uses pm_runtime_put_sync() to invoke the device bus
+type's ->runtime_idle() callback at that time.
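The parent/child interaction described in Section 5 (a child set to 'active' keeps its parent from suspending at run time unless the parent ignores children) can be modeled with plain counters. This is a userspace sketch under simplifying assumptions. struct model_dev and the model_* helpers are hypothetical. The requirement that run-time PM be disabled while the status is changed is not checked. The -EBUSY returned for a suspended parent is also an assumption, since Section 4 only says that an error code is returned.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum model_status { MODEL_ACTIVE, MODEL_SUSPENDED };

struct model_dev {
	struct model_dev *parent;
	enum model_status status;	/* models power.runtime_status */
	int child_count;		/* models power.child_count */
	bool ignore_children;		/* models power.ignore_children */
};

/* Simplified pm_runtime_set_active(); the error value is an assumption. */
static int model_set_active(struct model_dev *dev)
{
	struct model_dev *parent = dev->parent;

	if (dev->status == MODEL_ACTIVE)
		return 0;
	if (parent) {
		if (parent->status != MODEL_ACTIVE && !parent->ignore_children)
			return -EBUSY;
		parent->child_count++;	/* updated even if ignored */
	}
	dev->status = MODEL_ACTIVE;
	return 0;
}

/* Simplified pm_runtime_set_suspended(). */
static void model_set_suspended(struct model_dev *dev)
{
	if (dev->status == MODEL_SUSPENDED)
		return;
	if (dev->parent) {
		assert(dev->parent->child_count > 0);
		dev->parent->child_count--;
	}
	dev->status = MODEL_SUSPENDED;
}

/* The children part of rule (3) in Section 2. */
static bool model_children_allow_suspend(const struct model_dev *dev)
{
	return dev->ignore_children || dev->child_count == 0;
}
```

With this model, calling model_set_suspended() on the child is what releases the parent again, matching the advice to either enable the child's run-time PM soon or set it back to 'suspended'.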


* [PATCH update 3x] PM: Introduce core framework for run-time PM of I/O devices (rev. 17)
  2009-08-13 20:56   ` [PATCH update 2x] PM: Introduce core framework for run-time PM of I/O devices (rev. 16) Rafael J. Wysocki
  2009-08-14 17:25     ` [PATCH update 3x] PM: Introduce core framework for run-time PM of I/O devices (rev. 17) Rafael J. Wysocki
@ 2009-08-14 17:25     ` Rafael J. Wysocki
  5 siblings, 0 replies; 90+ messages in thread
From: Rafael J. Wysocki @ 2009-08-14 17:25 UTC (permalink / raw)
  To: Linux-pm mailing list; +Cc: Greg KH, LKML

Hi,

The only difference between this one and rev. 16 is that now the runtime_idle()
callback is defined to return int, as suggested by Matthew in the
"PCI: Runtime power management" thread.

Thanks,
Rafael

---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: PM: Introduce core framework for run-time PM of I/O devices (rev. 17)

Introduce a core framework for run-time power management of I/O
devices.  Add device run-time PM fields to 'struct dev_pm_info'
and device run-time PM callbacks to 'struct dev_pm_ops'.  Introduce
a run-time PM workqueue and define some device run-time PM helper
functions at the core level.  Document all these things.

Special thanks to Alan Stern for his help with the design and
multiple detailed reviews of the preceding versions of this patch
and to Magnus Damm for testing feedback.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Magnus Damm <damm@igel.co.jp>
---
 Documentation/power/runtime_pm.txt |  378 +++++++++++++
 drivers/base/dd.c                  |   11 
 drivers/base/power/Makefile        |    1 
 drivers/base/power/main.c          |   22 
 drivers/base/power/power.h         |   31 -
 drivers/base/power/runtime.c       | 1011 +++++++++++++++++++++++++++++++++++++
 include/linux/pm.h                 |  101 +++
 include/linux/pm_runtime.h         |  114 ++++
 kernel/power/Kconfig               |   14 
 kernel/power/main.c                |   17 
 10 files changed, 1689 insertions(+), 11 deletions(-)

Index: linux-2.6/kernel/power/Kconfig
===================================================================
--- linux-2.6.orig/kernel/power/Kconfig
+++ linux-2.6/kernel/power/Kconfig
@@ -208,3 +208,17 @@ config APM_EMULATION
 	  random kernel OOPSes or reboots that don't seem to be related to
 	  anything, try disabling/enabling this option (or disabling/enabling
 	  APM in your BIOS).
+
+config PM_RUNTIME
+	bool "Run-time PM core functionality"
+	depends on PM
+	---help---
+	  Enable functionality allowing I/O devices to be put into energy-saving
+	  (low power) states at run time (or autosuspended) after a specified
+	  period of inactivity and woken up in response to a hardware-generated
+	  wake-up event or a driver's request.
+
+	  Hardware support is generally required for this functionality to work
+	  and the bus type drivers of the buses the devices are on are
+	  responsible for the actual handling of the autosuspend requests and
+	  wake-up events.
Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -11,6 +11,7 @@
 #include <linux/kobject.h>
 #include <linux/string.h>
 #include <linux/resume-trace.h>
+#include <linux/workqueue.h>
 
 #include "power.h"
 
@@ -217,8 +218,24 @@ static struct attribute_group attr_group
 	.attrs = g,
 };
 
+#ifdef CONFIG_PM_RUNTIME
+struct workqueue_struct *pm_wq;
+
+static int __init pm_start_workqueue(void)
+{
+	pm_wq = create_freezeable_workqueue("pm");
+
+	return pm_wq ? 0 : -ENOMEM;
+}
+#else
+static inline int pm_start_workqueue(void) { return 0; }
+#endif
+
 static int __init pm_init(void)
 {
+	int error = pm_start_workqueue();
+	if (error)
+		return error;
 	power_kobj = kobject_create_and_add("power", NULL);
 	if (!power_kobj)
 		return -ENOMEM;
Index: linux-2.6/include/linux/pm.h
===================================================================
--- linux-2.6.orig/include/linux/pm.h
+++ linux-2.6/include/linux/pm.h
@@ -22,6 +22,10 @@
 #define _LINUX_PM_H
 
 #include <linux/list.h>
+#include <linux/workqueue.h>
+#include <linux/spinlock.h>
+#include <linux/wait.h>
+#include <linux/timer.h>
 
 /*
  * Callbacks for platform drivers to implement.
@@ -165,6 +169,28 @@ typedef struct pm_message {
  * It is allowed to unregister devices while the above callbacks are being
  * executed.  However, it is not allowed to unregister a device from within any
  * of its own callbacks.
+ *
+ * There also are the following callbacks related to run-time power management
+ * of devices:
+ *
+ * @runtime_suspend: Prepare the device for a condition in which it won't be
+ *	able to communicate with the CPU(s) and RAM due to power management.
+ *	This need not mean that the device should be put into a low power state.
+ *	For example, if the device is behind a link which is about to be turned
+ *	off, the device may remain at full power.  If the device does go to low
+ *	power and if device_may_wakeup(dev) is true, remote wake-up (i.e., a
+ *	hardware mechanism allowing the device to request a change of its power
+ *	state, such as PCI PME) should be enabled for it.
+ *
+ * @runtime_resume: Put the device into the fully active state in response to a
+ *	wake-up event generated by hardware or at the request of software.  If
+ *	necessary, put the device into the full power state and restore its
+ *	registers, so that it is fully operational.
+ *
+ * @runtime_idle: Device appears to be inactive and it might be put into a low
+ *	power state if all of the necessary conditions are satisfied.  Check
+ *	these conditions and handle the device as appropriate, possibly queueing
+ *	a suspend request for it.  The return value is ignored by the PM core.
  */
 
 struct dev_pm_ops {
@@ -182,6 +208,9 @@ struct dev_pm_ops {
 	int (*thaw_noirq)(struct device *dev);
 	int (*poweroff_noirq)(struct device *dev);
 	int (*restore_noirq)(struct device *dev);
+	int (*runtime_suspend)(struct device *dev);
+	int (*runtime_resume)(struct device *dev);
+	int (*runtime_idle)(struct device *dev);
 };
 
 /*
@@ -329,14 +358,80 @@ enum dpm_state {
 	DPM_OFF_IRQ,
 };
 
+/**
+ * Device run-time power management status.
+ *
+ * These status labels are used internally by the PM core to indicate the
+ * current status of a device with respect to the PM core operations.  They do
+ * not reflect the actual power state of the device or its status as seen by the
+ * driver.
+ *
+ * RPM_ACTIVE		Device is fully operational.  Indicates that the device
+ *			bus type's ->runtime_resume() callback has completed
+ *			successfully.
+ *
+ * RPM_SUSPENDED	Device bus type's ->runtime_suspend() callback has
+ *			completed successfully.  The device is regarded as
+ *			suspended.
+ *
+ * RPM_RESUMING		Device bus type's ->runtime_resume() callback is being
+ *			executed.
+ *
+ * RPM_SUSPENDING	Device bus type's ->runtime_suspend() callback is being
+ *			executed.
+ */
+
+enum rpm_status {
+	RPM_ACTIVE = 0,
+	RPM_RESUMING,
+	RPM_SUSPENDED,
+	RPM_SUSPENDING,
+};
+
+/**
+ * Device run-time power management request types.
+ *
+ * RPM_REQ_NONE		Do nothing.
+ *
+ * RPM_REQ_IDLE		Run the device bus type's ->runtime_idle() callback
+ *
+ * RPM_REQ_SUSPEND	Run the device bus type's ->runtime_suspend() callback
+ *
+ * RPM_REQ_RESUME	Run the device bus type's ->runtime_resume() callback
+ */
+
+enum rpm_request {
+	RPM_REQ_NONE = 0,
+	RPM_REQ_IDLE,
+	RPM_REQ_SUSPEND,
+	RPM_REQ_RESUME,
+};
+
 struct dev_pm_info {
 	pm_message_t		power_state;
-	unsigned		can_wakeup:1;
-	unsigned		should_wakeup:1;
+	unsigned int		can_wakeup:1;
+	unsigned int		should_wakeup:1;
 	enum dpm_state		status;		/* Owned by the PM core */
-#ifdef	CONFIG_PM_SLEEP
+#ifdef CONFIG_PM_SLEEP
 	struct list_head	entry;
 #endif
+#ifdef CONFIG_PM_RUNTIME
+	struct timer_list	suspend_timer;
+	unsigned long		timer_expires;
+	struct work_struct	work;
+	wait_queue_head_t	wait_queue;
+	spinlock_t		lock;
+	atomic_t		usage_count;
+	atomic_t		child_count;
+	unsigned int		disable_depth:3;
+	unsigned int		ignore_children:1;
+	unsigned int		idle_notification:1;
+	unsigned int		request_pending:1;
+	unsigned int		deferred_resume:1;
+	enum rpm_request	request;
+	enum rpm_status		runtime_status;
+	int			runtime_error;
+#endif
 };
 
 /*
Index: linux-2.6/drivers/base/power/Makefile
===================================================================
--- linux-2.6.orig/drivers/base/power/Makefile
+++ linux-2.6/drivers/base/power/Makefile
@@ -1,5 +1,6 @@
 obj-$(CONFIG_PM)	+= sysfs.o
 obj-$(CONFIG_PM_SLEEP)	+= main.o
+obj-$(CONFIG_PM_RUNTIME)	+= runtime.o
 obj-$(CONFIG_PM_TRACE_RTC)	+= trace.o
 
 ccflags-$(CONFIG_DEBUG_DRIVER) := -DDEBUG
Index: linux-2.6/drivers/base/power/runtime.c
===================================================================
--- /dev/null
+++ linux-2.6/drivers/base/power/runtime.c
@@ -0,0 +1,1011 @@
+/*
+ * drivers/base/power/runtime.c - Helper functions for device run-time PM
+ *
+ * Copyright (c) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
+ *
+ * This file is released under the GPLv2.
+ */
+
+#include <linux/sched.h>
+#include <linux/pm_runtime.h>
+#include <linux/jiffies.h>
+
+static int __pm_runtime_resume(struct device *dev, bool from_wq);
+static int __pm_request_idle(struct device *dev);
+static int __pm_request_resume(struct device *dev);
+
+/**
+ * pm_runtime_deactivate_timer - Deactivate given device's suspend timer.
+ * @dev: Device to handle.
+ */
+static void pm_runtime_deactivate_timer(struct device *dev)
+{
+	if (dev->power.timer_expires > 0) {
+		del_timer(&dev->power.suspend_timer);
+		dev->power.timer_expires = 0;
+	}
+}
+
+/**
+ * pm_runtime_cancel_pending - Deactivate suspend timer and cancel requests.
+ * @dev: Device to handle.
+ */
+static void pm_runtime_cancel_pending(struct device *dev)
+{
+	pm_runtime_deactivate_timer(dev);
+	/*
+	 * In case there's a request pending, make sure its work function will
+	 * return without doing anything.
+	 */
+	dev->power.request = RPM_REQ_NONE;
+}
+
+/**
+ * __pm_runtime_idle - Notify device bus type if the device can be suspended.
+ * @dev: Device to notify the bus type about.
+ *
+ * This function must be called under dev->power.lock with interrupts disabled.
+ */
+static int __pm_runtime_idle(struct device *dev)
+	__releases(&dev->power.lock) __acquires(&dev->power.lock)
+{
+	int retval = 0;
+
+	dev_dbg(dev, "__pm_runtime_idle()!\n");
+
+	if (dev->power.runtime_error)
+		retval = -EINVAL;
+	else if (dev->power.idle_notification)
+		retval = -EINPROGRESS;
+	else if (atomic_read(&dev->power.usage_count) > 0
+	    || dev->power.disable_depth > 0
+	    || dev->power.runtime_status != RPM_ACTIVE)
+		retval = -EAGAIN;
+	else if (!pm_children_suspended(dev))
+		retval = -EBUSY;
+	if (retval)
+		goto out;
+
+	if (dev->power.request_pending) {
+		/*
+		 * If an idle notification request is pending, cancel it.  Any
+		 * other pending request takes precedence over us.
+		 */
+		if (dev->power.request == RPM_REQ_IDLE) {
+			dev->power.request = RPM_REQ_NONE;
+		} else if (dev->power.request != RPM_REQ_NONE) {
+			retval = -EAGAIN;
+			goto out;
+		}
+	}
+
+	dev->power.idle_notification = true;
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_idle) {
+		spin_unlock_irq(&dev->power.lock);
+
+		dev->bus->pm->runtime_idle(dev);
+
+		spin_lock_irq(&dev->power.lock);
+	}
+
+	dev->power.idle_notification = false;
+	wake_up_all(&dev->power.wait_queue);
+
+ out:
+	dev_dbg(dev, "__pm_runtime_idle() returns %d!\n", retval);
+
+	return retval;
+}
+
+/**
+ * pm_runtime_idle - Notify device bus type if the device can be suspended.
+ * @dev: Device to notify the bus type about.
+ */
+int pm_runtime_idle(struct device *dev)
+{
+	int retval;
+
+	spin_lock_irq(&dev->power.lock);
+	retval = __pm_runtime_idle(dev);
+	spin_unlock_irq(&dev->power.lock);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(pm_runtime_idle);
+
+/**
+ * __pm_runtime_suspend - Carry out run-time suspend of given device.
+ * @dev: Device to suspend.
+ * @from_wq: If set, the function has been called via pm_wq.
+ *
+ * Check if the device can be suspended and run the ->runtime_suspend() callback
+ * provided by its bus type.  If another suspend has been started earlier, wait
+ * for it to finish.  If an idle notification or suspend request is pending or
+ * scheduled, cancel it.
+ *
+ * This function must be called under dev->power.lock with interrupts disabled.
+ */
+static int __pm_runtime_suspend(struct device *dev, bool from_wq)
+	__releases(&dev->power.lock) __acquires(&dev->power.lock)
+{
+	struct device *parent = NULL;
+	bool notify = false;
+	int retval = 0;
+
+	dev_dbg(dev, "__pm_runtime_suspend()%s!\n",
+		from_wq ? " from workqueue" : "");
+
+ repeat:
+	if (dev->power.runtime_error) {
+		retval = -EINVAL;
+		goto out;
+	}
+
+	/* Pending resume requests take precedence over us. */
+	if (dev->power.request_pending
+	    && dev->power.request == RPM_REQ_RESUME) {
+		retval = -EAGAIN;
+		goto out;
+	}
+
+	/* Other scheduled or pending requests need to be canceled. */
+	pm_runtime_cancel_pending(dev);
+
+	if (dev->power.runtime_status == RPM_SUSPENDED)
+		retval = 1;
+	else if (dev->power.runtime_status == RPM_RESUMING
+	    || dev->power.disable_depth > 0
+	    || atomic_read(&dev->power.usage_count) > 0)
+		retval = -EAGAIN;
+	else if (!pm_children_suspended(dev))
+		retval = -EBUSY;
+	if (retval)
+		goto out;
+
+	if (dev->power.runtime_status == RPM_SUSPENDING) {
+		DEFINE_WAIT(wait);
+
+		if (from_wq) {
+			retval = -EINPROGRESS;
+			goto out;
+		}
+
+		/* Wait for the other suspend running in parallel with us. */
+		for (;;) {
+			prepare_to_wait(&dev->power.wait_queue, &wait,
+					TASK_UNINTERRUPTIBLE);
+			if (dev->power.runtime_status != RPM_SUSPENDING)
+				break;
+
+			spin_unlock_irq(&dev->power.lock);
+
+			schedule();
+
+			spin_lock_irq(&dev->power.lock);
+		}
+		finish_wait(&dev->power.wait_queue, &wait);
+		goto repeat;
+	}
+
+	dev->power.runtime_status = RPM_SUSPENDING;
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_suspend) {
+		spin_unlock_irq(&dev->power.lock);
+
+		retval = dev->bus->pm->runtime_suspend(dev);
+
+		spin_lock_irq(&dev->power.lock);
+		dev->power.runtime_error = retval;
+	} else {
+		retval = -ENOSYS;
+	}
+
+	if (retval) {
+		dev->power.runtime_status = RPM_ACTIVE;
+		pm_runtime_cancel_pending(dev);
+		dev->power.deferred_resume = false;
+
+		if (retval == -EAGAIN || retval == -EBUSY) {
+			notify = true;
+			dev->power.runtime_error = 0;
+		}
+	} else {
+		dev->power.runtime_status = RPM_SUSPENDED;
+
+		if (dev->parent) {
+			parent = dev->parent;
+			atomic_add_unless(&parent->power.child_count, -1, 0);
+		}
+	}
+	wake_up_all(&dev->power.wait_queue);
+
+	if (dev->power.deferred_resume) {
+		dev->power.deferred_resume = false;
+		__pm_runtime_resume(dev, false);
+		retval = -EAGAIN;
+		goto out;
+	}
+
+	if (notify)
+		__pm_runtime_idle(dev);
+
+	if (parent && !parent->power.ignore_children) {
+		spin_unlock_irq(&dev->power.lock);
+
+		pm_request_idle(parent);
+
+		spin_lock_irq(&dev->power.lock);
+	}
+
+ out:
+	dev_dbg(dev, "__pm_runtime_suspend() returns %d!\n", retval);
+
+	return retval;
+}
+
+/**
+ * pm_runtime_suspend - Carry out run-time suspend of given device.
+ * @dev: Device to suspend.
+ */
+int pm_runtime_suspend(struct device *dev)
+{
+	int retval;
+
+	spin_lock_irq(&dev->power.lock);
+	retval = __pm_runtime_suspend(dev, false);
+	spin_unlock_irq(&dev->power.lock);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(pm_runtime_suspend);
+
+/**
+ * __pm_runtime_resume - Carry out run-time resume of given device.
+ * @dev: Device to resume.
+ * @from_wq: If set, the function has been called via pm_wq.
+ *
+ * Check if the device can be woken up and run the ->runtime_resume() callback
+ * provided by its bus type.  If another resume has been started earlier, wait
+ * for it to finish.  If there's a suspend running in parallel with this
+ * function, wait for it to finish and resume the device.  Cancel any scheduled
+ * or pending requests.
+ *
+ * This function must be called under dev->power.lock with interrupts disabled.
+ */
+static int __pm_runtime_resume(struct device *dev, bool from_wq)
+	__releases(&dev->power.lock) __acquires(&dev->power.lock)
+{
+	struct device *parent = NULL;
+	int retval = 0;
+
+	dev_dbg(dev, "__pm_runtime_resume()%s!\n",
+		from_wq ? " from workqueue" : "");
+
+ repeat:
+	if (dev->power.runtime_error) {
+		retval = -EINVAL;
+		goto out;
+	}
+
+	pm_runtime_cancel_pending(dev);
+
+	if (dev->power.runtime_status == RPM_ACTIVE)
+		retval = 1;
+	else if (dev->power.disable_depth > 0)
+		retval = -EAGAIN;
+	if (retval)
+		goto out;
+
+	if (dev->power.runtime_status == RPM_RESUMING
+	    || dev->power.runtime_status == RPM_SUSPENDING) {
+		DEFINE_WAIT(wait);
+
+		if (from_wq) {
+			if (dev->power.runtime_status == RPM_SUSPENDING)
+				dev->power.deferred_resume = true;
+			retval = -EINPROGRESS;
+			goto out;
+		}
+
+		/* Wait for the operation carried out in parallel with us. */
+		for (;;) {
+			prepare_to_wait(&dev->power.wait_queue, &wait,
+					TASK_UNINTERRUPTIBLE);
+			if (dev->power.runtime_status != RPM_RESUMING
+			    && dev->power.runtime_status != RPM_SUSPENDING)
+				break;
+
+			spin_unlock_irq(&dev->power.lock);
+
+			schedule();
+
+			spin_lock_irq(&dev->power.lock);
+		}
+		finish_wait(&dev->power.wait_queue, &wait);
+		goto repeat;
+	}
+
+	if (!parent && dev->parent) {
+		/*
+		 * Increment the parent's usage counter and resume it if
+		 * necessary.
+		 */
+		parent = dev->parent;
+		spin_unlock_irq(&dev->power.lock);
+
+		pm_runtime_get_noresume(parent);
+
+		spin_lock_irq(&parent->power.lock);
+		/*
+		 * We can resume if the parent's run-time PM is disabled or it
+		 * is set to ignore children.
+		 */
+		if (!parent->power.disable_depth
+		    && !parent->power.ignore_children) {
+			__pm_runtime_resume(parent, false);
+			if (parent->power.runtime_status != RPM_ACTIVE)
+				retval = -EBUSY;
+		}
+		spin_unlock_irq(&parent->power.lock);
+
+		spin_lock_irq(&dev->power.lock);
+		if (retval)
+			goto out;
+		goto repeat;
+	}
+
+	dev->power.runtime_status = RPM_RESUMING;
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_resume) {
+		spin_unlock_irq(&dev->power.lock);
+
+		retval = dev->bus->pm->runtime_resume(dev);
+
+		spin_lock_irq(&dev->power.lock);
+		dev->power.runtime_error = retval;
+	} else {
+		retval = -ENOSYS;
+	}
+
+	if (retval) {
+		dev->power.runtime_status = RPM_SUSPENDED;
+		pm_runtime_cancel_pending(dev);
+	} else {
+		dev->power.runtime_status = RPM_ACTIVE;
+		if (parent)
+			atomic_inc(&parent->power.child_count);
+	}
+	wake_up_all(&dev->power.wait_queue);
+
+	if (!retval)
+		__pm_request_idle(dev);
+
+ out:
+	if (parent) {
+		spin_unlock_irq(&dev->power.lock);
+
+		pm_runtime_put(parent);
+
+		spin_lock_irq(&dev->power.lock);
+	}
+
+	dev_dbg(dev, "__pm_runtime_resume() returns %d!\n", retval);
+
+	return retval;
+}
+
+/**
+ * pm_runtime_resume - Carry out run-time resume of given device.
+ * @dev: Device to resume.
+ */
+int pm_runtime_resume(struct device *dev)
+{
+	int retval;
+
+	spin_lock_irq(&dev->power.lock);
+	retval = __pm_runtime_resume(dev, false);
+	spin_unlock_irq(&dev->power.lock);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(pm_runtime_resume);
+
+/**
+ * pm_runtime_work - Universal run-time PM work function.
+ * @work: Work structure used for scheduling the execution of this function.
+ *
+ * Use @work to get the device object the work is to be done for, determine what
+ * is to be done and execute the appropriate run-time PM function.
+ */
+static void pm_runtime_work(struct work_struct *work)
+{
+	struct device *dev = container_of(work, struct device, power.work);
+	enum rpm_request req;
+
+	spin_lock_irq(&dev->power.lock);
+
+	if (!dev->power.request_pending)
+		goto out;
+
+	req = dev->power.request;
+	dev->power.request = RPM_REQ_NONE;
+	dev->power.request_pending = false;
+
+	switch (req) {
+	case RPM_REQ_NONE:
+		break;
+	case RPM_REQ_IDLE:
+		__pm_runtime_idle(dev);
+		break;
+	case RPM_REQ_SUSPEND:
+		__pm_runtime_suspend(dev, true);
+		break;
+	case RPM_REQ_RESUME:
+		__pm_runtime_resume(dev, true);
+		break;
+	}
+
+ out:
+	spin_unlock_irq(&dev->power.lock);
+}
+
+/**
+ * __pm_request_idle - Submit an idle notification request for given device.
+ * @dev: Device to handle.
+ *
+ * Check if the device's run-time PM status is correct for suspending the device
+ * and queue up a request to run __pm_runtime_idle() for it.
+ *
+ * This function must be called under dev->power.lock with interrupts disabled.
+ */
+static int __pm_request_idle(struct device *dev)
+{
+	int retval = 0;
+
+	if (dev->power.runtime_error)
+		retval = -EINVAL;
+	else if (atomic_read(&dev->power.usage_count) > 0
+	    || dev->power.disable_depth > 0
+	    || dev->power.runtime_status == RPM_SUSPENDED
+	    || dev->power.runtime_status == RPM_SUSPENDING)
+		retval = -EAGAIN;
+	else if (!pm_children_suspended(dev))
+		retval = -EBUSY;
+	if (retval)
+		return retval;
+
+	if (dev->power.request_pending) {
+		/* Any requests other than RPM_REQ_IDLE take precedence. */
+		if (dev->power.request == RPM_REQ_NONE)
+			dev->power.request = RPM_REQ_IDLE;
+		else if (dev->power.request != RPM_REQ_IDLE)
+			retval = -EAGAIN;
+		return retval;
+	}
+
+	dev->power.request = RPM_REQ_IDLE;
+	dev->power.request_pending = true;
+	queue_work(pm_wq, &dev->power.work);
+
+	return retval;
+}
+
+/**
+ * pm_request_idle - Submit an idle notification request for given device.
+ * @dev: Device to handle.
+ */
+int pm_request_idle(struct device *dev)
+{
+	unsigned long flags;
+	int retval;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+	retval = __pm_request_idle(dev);
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(pm_request_idle);
+
+/**
+ * __pm_request_suspend - Submit a suspend request for given device.
+ * @dev: Device to suspend.
+ *
+ * This function must be called under dev->power.lock with interrupts disabled.
+ */
+static int __pm_request_suspend(struct device *dev)
+{
+	int retval = 0;
+
+	if (dev->power.runtime_error)
+		return -EINVAL;
+
+	if (dev->power.runtime_status == RPM_SUSPENDED)
+		retval = 1;
+	else if (atomic_read(&dev->power.usage_count) > 0
+	    || dev->power.disable_depth > 0)
+		retval = -EAGAIN;
+	else if (dev->power.runtime_status == RPM_SUSPENDING)
+		retval = -EINPROGRESS;
+	else if (!pm_children_suspended(dev))
+		retval = -EBUSY;
+	if (retval < 0)
+		return retval;
+
+	pm_runtime_deactivate_timer(dev);
+
+	if (dev->power.request_pending) {
+		/*
+		 * Pending resume requests take precedence over us, but we can
+		 * overtake any other pending request.
+		 */
+		if (dev->power.request == RPM_REQ_RESUME)
+			retval = -EAGAIN;
+		else if (dev->power.request != RPM_REQ_SUSPEND)
+			dev->power.request = retval ?
+						RPM_REQ_NONE : RPM_REQ_SUSPEND;
+		return retval;
+	} else if (retval) {
+		return retval;
+	}
+
+	dev->power.request = RPM_REQ_SUSPEND;
+	dev->power.request_pending = true;
+	queue_work(pm_wq, &dev->power.work);
+
+	return 0;
+}
+
+/**
+ * pm_suspend_timer_fn - Timer function for pm_schedule_suspend().
+ * @data: Device pointer passed by pm_schedule_suspend().
+ *
+ * If the suspend timer has expired, run __pm_request_suspend() for the device.
+ */
+static void pm_suspend_timer_fn(unsigned long data)
+{
+	struct device *dev = (struct device *)data;
+	unsigned long flags;
+	unsigned long expires;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	expires = dev->power.timer_expires;
+	/* If 'expires' is after 'jiffies' we've been called too early. */
+	if (expires > 0 && !time_after(expires, jiffies)) {
+		dev->power.timer_expires = 0;
+		__pm_request_suspend(dev);
+	}
+
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+
+/**
+ * pm_schedule_suspend - Set up a timer to submit a suspend request in the future.
+ * @dev: Device to suspend.
+ * @delay: Time to wait before submitting a suspend request, in milliseconds.
+ */
+int pm_schedule_suspend(struct device *dev, unsigned int delay)
+{
+	unsigned long flags;
+	int retval = 0;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.runtime_error) {
+		retval = -EINVAL;
+		goto out;
+	}
+
+	if (!delay) {
+		retval = __pm_request_suspend(dev);
+		goto out;
+	}
+
+	pm_runtime_deactivate_timer(dev);
+
+	if (dev->power.request_pending) {
+		/*
+		 * Pending resume requests take precedence over us, but any
+		 * other pending requests have to be canceled.
+		 */
+		if (dev->power.request == RPM_REQ_RESUME) {
+			retval = -EAGAIN;
+			goto out;
+		}
+		dev->power.request = RPM_REQ_NONE;
+	}
+
+	if (dev->power.runtime_status == RPM_SUSPENDED)
+		retval = 1;
+	else if (dev->power.runtime_status == RPM_SUSPENDING)
+		retval = -EINPROGRESS;
+	else if (atomic_read(&dev->power.usage_count) > 0
+	    || dev->power.disable_depth > 0)
+		retval = -EAGAIN;
+	else if (!pm_children_suspended(dev))
+		retval = -EBUSY;
+	if (retval)
+		goto out;
+
+	dev->power.timer_expires = jiffies + msecs_to_jiffies(delay);
+	mod_timer(&dev->power.suspend_timer, dev->power.timer_expires);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(pm_schedule_suspend);
+
+/**
+ * pm_request_resume - Submit a resume request for given device.
+ * @dev: Device to resume.
+ *
+ * This function must be called under dev->power.lock with interrupts disabled.
+ */
+static int __pm_request_resume(struct device *dev)
+{
+	int retval = 0;
+
+	if (dev->power.runtime_error)
+		return -EINVAL;
+
+	if (dev->power.runtime_status == RPM_ACTIVE)
+		retval = 1;
+	else if (dev->power.runtime_status == RPM_RESUMING)
+		retval = -EINPROGRESS;
+	else if (dev->power.disable_depth > 0)
+		retval = -EAGAIN;
+	if (retval < 0)
+		return retval;
+
+	pm_runtime_deactivate_timer(dev);
+
+	if (dev->power.request_pending) {
+		/* If a non-resume request is pending, we can overtake it. */
+		dev->power.request = retval ? RPM_REQ_NONE : RPM_REQ_RESUME;
+		return retval;
+	} else if (retval) {
+		return retval;
+	}
+
+	dev->power.request = RPM_REQ_RESUME;
+	dev->power.request_pending = true;
+	queue_work(pm_wq, &dev->power.work);
+
+	return retval;
+}
+
+/**
+ * pm_request_resume - Submit a resume request for given device.
+ * @dev: Device to resume.
+ */
+int pm_request_resume(struct device *dev)
+{
+	unsigned long flags;
+	int retval;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+	retval = __pm_request_resume(dev);
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(pm_request_resume);
+
+/**
+ * __pm_runtime_get - Reference count a device and wake it up, if necessary.
+ * @dev: Device to handle.
+ * @sync: If set and the device is suspended, resume it synchronously.
+ *
+ * Increment the usage count of the device and if it was zero previously,
+ * resume it or submit a resume request for it, depending on the value of @sync.
+ */
+int __pm_runtime_get(struct device *dev, bool sync)
+{
+	int retval = 1;
+
+	if (atomic_add_return(1, &dev->power.usage_count) == 1)
+		retval = sync ? pm_runtime_resume(dev) : pm_request_resume(dev);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(__pm_runtime_get);
+
+/**
+ * __pm_runtime_put - Decrement the device's usage counter and notify its bus.
+ * @dev: Device to handle.
+ * @sync: If the device's bus type is to be notified, do that synchronously.
+ *
+ * Decrement the usage count of the device and if it reaches zero, carry out a
+ * synchronous idle notification or submit an idle notification request for it,
+ * depending on the value of @sync.
+ */
+int __pm_runtime_put(struct device *dev, bool sync)
+{
+	int retval = 0;
+
+	if (atomic_dec_and_test(&dev->power.usage_count))
+		retval = sync ? pm_runtime_idle(dev) : pm_request_idle(dev);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(__pm_runtime_put);
+
+/**
+ * __pm_runtime_set_status - Set run-time PM status of a device.
+ * @dev: Device to handle.
+ * @status: New run-time PM status of the device.
+ *
+ * If run-time PM of the device is disabled or its power.runtime_error field is
+ * different from zero, the status may be changed either to RPM_ACTIVE, or to
+ * RPM_SUSPENDED, as long as that reflects the actual state of the device.
+ * However, if the device has a parent and the parent is not active, and the
+ * parent's power.ignore_children flag is unset, the device's status cannot be
+ * set to RPM_ACTIVE, so -EBUSY is returned in that case.
+ *
+ * If successful, __pm_runtime_set_status() clears the power.runtime_error field
+ * and the device parent's counter of unsuspended children is modified to
+ * reflect the new status.  If the new status is RPM_SUSPENDED and the parent
+ * doesn't ignore children, an idle notification request for it is submitted.
+ */
+int __pm_runtime_set_status(struct device *dev, unsigned int status)
+{
+	struct device *parent = dev->parent;
+	unsigned long flags;
+	bool notify_parent = false;
+	int error = 0;
+
+	if (status != RPM_ACTIVE && status != RPM_SUSPENDED)
+		return -EINVAL;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (!dev->power.runtime_error && !dev->power.disable_depth) {
+		error = -EAGAIN;
+		goto out;
+	}
+
+	if (dev->power.runtime_status == status)
+		goto out_set;
+
+	if (status == RPM_SUSPENDED) {
+		/* It always is possible to set the status to 'suspended'. */
+		if (parent) {
+			atomic_add_unless(&parent->power.child_count, -1, 0);
+			notify_parent = !parent->power.ignore_children;
+		}
+		goto out_set;
+	}
+
+	if (parent) {
+		spin_lock_nested(&parent->power.lock, SINGLE_DEPTH_NESTING);
+
+		/*
+		 * It is invalid to put an active child under a parent that is
+		 * not active, has run-time PM enabled and the
+		 * 'power.ignore_children' flag unset.
+		 */
+		if (!parent->power.disable_depth
+		    && !parent->power.ignore_children
+		    && parent->power.runtime_status != RPM_ACTIVE) {
+			error = -EBUSY;
+		} else {
+			if (dev->power.runtime_status == RPM_SUSPENDED)
+				atomic_inc(&parent->power.child_count);
+		}
+
+		spin_unlock(&parent->power.lock);
+
+		if (error)
+			goto out;
+	}
+
+ out_set:
+	dev->power.runtime_status = status;
+	dev->power.runtime_error = 0;
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	if (notify_parent)
+		pm_request_idle(parent);
+
+	return error;
+}
+EXPORT_SYMBOL_GPL(__pm_runtime_set_status);
+
+/**
+ * __pm_runtime_barrier - Cancel pending requests and wait for completions.
+ * @dev: Device to handle.
+ *
+ * Flush all pending requests for the device from pm_wq and wait for all
+ * run-time PM operations involving the device in progress to complete.
+ *
+ * This function must be called under dev->power.lock with interrupts disabled.
+ */
+static void __pm_runtime_barrier(struct device *dev)
+{
+	pm_runtime_deactivate_timer(dev);
+
+	if (dev->power.request_pending) {
+		dev->power.request = RPM_REQ_NONE;
+		spin_unlock_irq(&dev->power.lock);
+
+		cancel_work_sync(&dev->power.work);
+
+		spin_lock_irq(&dev->power.lock);
+		dev->power.request_pending = false;
+	}
+
+	if (dev->power.runtime_status == RPM_SUSPENDING
+	    || dev->power.runtime_status == RPM_RESUMING
+	    || dev->power.idle_notification) {
+		DEFINE_WAIT(wait);
+
+		/* Suspend, wake-up or idle notification in progress. */
+		for (;;) {
+			prepare_to_wait(&dev->power.wait_queue, &wait,
+					TASK_UNINTERRUPTIBLE);
+			if (dev->power.runtime_status != RPM_SUSPENDING
+			    && dev->power.runtime_status != RPM_RESUMING
+			    && !dev->power.idle_notification)
+				break;
+			spin_unlock_irq(&dev->power.lock);
+
+			schedule();
+
+			spin_lock_irq(&dev->power.lock);
+		}
+		finish_wait(&dev->power.wait_queue, &wait);
+	}
+}
+
+/**
+ * pm_runtime_barrier - Flush pending requests and wait for completions.
+ * @dev: Device to handle.
+ *
+ * Prevent the device from being suspended by incrementing its usage counter and
+ * if there's a pending resume request for the device, wake the device up.
+ * Next, make sure that all pending requests for the device have been flushed
+ * from pm_wq and wait for all run-time PM operations involving the device in
+ * progress to complete.
+ *
+ * Return value:
+ * 1, if there was a resume request pending and the device had to be woken up,
+ * 0, otherwise
+ */
+int pm_runtime_barrier(struct device *dev)
+{
+	int retval = 0;
+
+	pm_runtime_get_noresume(dev);
+	spin_lock_irq(&dev->power.lock);
+
+	if (dev->power.request_pending
+	    && dev->power.request == RPM_REQ_RESUME) {
+		__pm_runtime_resume(dev, false);
+		retval = 1;
+	}
+
+	__pm_runtime_barrier(dev);
+
+	spin_unlock_irq(&dev->power.lock);
+	pm_runtime_put_noidle(dev);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(pm_runtime_barrier);
+
+/**
+ * __pm_runtime_disable - Disable run-time PM of a device.
+ * @dev: Device to handle.
+ * @check_resume: If set, check if there's a resume request for the device.
+ *
+ * Increment power.disable_depth for the device and if it was zero previously,
+ * cancel all pending run-time PM requests for the device and wait for all
+ * operations in progress to complete.  The device can be either active or
+ * suspended after its run-time PM has been disabled.
+ *
+ * If @check_resume is set and there's a resume request pending when
+ * __pm_runtime_disable() is called and power.disable_depth is zero, the
+ * function will wake up the device before disabling its run-time PM.
+ */
+void __pm_runtime_disable(struct device *dev, bool check_resume)
+{
+	spin_lock_irq(&dev->power.lock);
+
+	if (dev->power.disable_depth > 0) {
+		dev->power.disable_depth++;
+		goto out;
+	}
+
+	/*
+	 * Wake up the device if there's a resume request pending, because that
+	 * means there probably is some I/O to process and disabling run-time PM
+	 * shouldn't prevent the device from processing the I/O.
+	 */
+	if (check_resume && dev->power.request_pending
+	    && dev->power.request == RPM_REQ_RESUME) {
+		/*
+		 * Prevent suspends and idle notifications from being carried
+		 * out after we have woken up the device.
+		 */
+		pm_runtime_get_noresume(dev);
+
+		__pm_runtime_resume(dev, false);
+
+		pm_runtime_put_noidle(dev);
+	}
+
+	if (!dev->power.disable_depth++)
+		__pm_runtime_barrier(dev);
+
+ out:
+	spin_unlock_irq(&dev->power.lock);
+}
+EXPORT_SYMBOL_GPL(__pm_runtime_disable);
+
+/**
+ * pm_runtime_enable - Enable run-time PM of a device.
+ * @dev: Device to handle.
+ */
+void pm_runtime_enable(struct device *dev)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.disable_depth > 0)
+		dev->power.disable_depth--;
+	else
+		dev_warn(dev, "Unbalanced %s!\n", __func__);
+
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_enable);
+
+/**
+ * pm_runtime_init - Initialize run-time PM fields in given device object.
+ * @dev: Device object to initialize.
+ */
+void pm_runtime_init(struct device *dev)
+{
+	spin_lock_init(&dev->power.lock);
+
+	dev->power.runtime_status = RPM_SUSPENDED;
+	dev->power.idle_notification = false;
+
+	dev->power.disable_depth = 1;
+	atomic_set(&dev->power.usage_count, 0);
+
+	dev->power.runtime_error = 0;
+
+	atomic_set(&dev->power.child_count, 0);
+	pm_suspend_ignore_children(dev, false);
+
+	dev->power.request_pending = false;
+	dev->power.request = RPM_REQ_NONE;
+	dev->power.deferred_resume = false;
+	INIT_WORK(&dev->power.work, pm_runtime_work);
+
+	dev->power.timer_expires = 0;
+	setup_timer(&dev->power.suspend_timer, pm_suspend_timer_fn,
+			(unsigned long)dev);
+
+	init_waitqueue_head(&dev->power.wait_queue);
+}
+
+/**
+ * pm_runtime_remove - Prepare for removing a device from device hierarchy.
+ * @dev: Device object being removed from device hierarchy.
+ */
+void pm_runtime_remove(struct device *dev)
+{
+	__pm_runtime_disable(dev, false);
+
+	/* Change the status back to 'suspended' to match the initial status. */
+	if (dev->power.runtime_status == RPM_ACTIVE)
+		pm_runtime_set_suspended(dev);
+}
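The rules for the single request slot that __pm_request_idle(), __pm_request_suspend(), __pm_request_resume() and pm_runtime_work() share can be summarized with a small userspace model. This is a sketch under stated assumptions, not kernel code: struct req_slot and the model_* helpers are hypothetical names, the enum only mirrors the kernel constants, there is no locking, and the status, usage-count and children checks the real functions perform first are left out. Only the precedence is modelled: a resume request overtakes anything, a suspend request overtakes an idle request, and an idle request only fills an empty or cancelled slot.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace model of dev->power.request/request_pending. */
enum rpm_request {
	RPM_REQ_NONE = 0,
	RPM_REQ_IDLE,
	RPM_REQ_SUSPEND,
	RPM_REQ_RESUME,
};

struct req_slot {
	bool pending;		/* work item is queued (request_pending) */
	enum rpm_request req;	/* what the work function will do */
	int queued;		/* how many times the work item was queued */
};

/* Idle requests only fill an empty (or cancelled) slot. */
static int model_request_idle(struct req_slot *s)
{
	if (s->pending) {
		if (s->req == RPM_REQ_NONE)
			s->req = RPM_REQ_IDLE;
		else if (s->req != RPM_REQ_IDLE)
			return -EAGAIN;	/* suspend or resume takes precedence */
		return 0;
	}
	s->req = RPM_REQ_IDLE;
	s->pending = true;
	s->queued++;
	return 0;
}

/* Suspend requests overtake idle requests but yield to resume requests. */
static int model_request_suspend(struct req_slot *s)
{
	if (s->pending) {
		if (s->req == RPM_REQ_RESUME)
			return -EAGAIN;
		s->req = RPM_REQ_SUSPEND;
		return 0;
	}
	s->req = RPM_REQ_SUSPEND;
	s->pending = true;
	s->queued++;
	return 0;
}

/* Resume requests overtake any other pending request. */
static int model_request_resume(struct req_slot *s)
{
	s->req = RPM_REQ_RESUME;
	if (!s->pending) {
		s->pending = true;
		s->queued++;
	}
	return 0;
}

/* Like pm_runtime_work(): consume the request and clear the slot. */
static enum rpm_request model_work(struct req_slot *s)
{
	enum rpm_request req = RPM_REQ_NONE;

	if (s->pending) {
		req = s->req;
		s->req = RPM_REQ_NONE;
		s->pending = false;
	}
	return req;
}
```

In this model, as in the patch, the work item is queued at most once per pending request; later requests only rewrite the slot, and a slot reset to RPM_REQ_NONE (as pm_runtime_cancel_pending() does) turns the queued work into a no-op.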
Index: linux-2.6/include/linux/pm_runtime.h
===================================================================
--- /dev/null
+++ linux-2.6/include/linux/pm_runtime.h
@@ -0,0 +1,114 @@
+/*
+ * pm_runtime.h - Device run-time power management helper functions.
+ *
+ * Copyright (C) 2009 Rafael J. Wysocki <rjw@sisk.pl>
+ *
+ * This file is released under the GPLv2.
+ */
+
+#ifndef _LINUX_PM_RUNTIME_H
+#define _LINUX_PM_RUNTIME_H
+
+#include <linux/device.h>
+#include <linux/pm.h>
+
+#ifdef CONFIG_PM_RUNTIME
+
+extern struct workqueue_struct *pm_wq;
+
+extern int pm_runtime_idle(struct device *dev);
+extern int pm_runtime_suspend(struct device *dev);
+extern int pm_runtime_resume(struct device *dev);
+extern int pm_request_idle(struct device *dev);
+extern int pm_schedule_suspend(struct device *dev, unsigned int delay);
+extern int pm_request_resume(struct device *dev);
+extern int __pm_runtime_get(struct device *dev, bool sync);
+extern int __pm_runtime_put(struct device *dev, bool sync);
+extern int __pm_runtime_set_status(struct device *dev, unsigned int status);
+extern int pm_runtime_barrier(struct device *dev);
+extern void pm_runtime_enable(struct device *dev);
+extern void __pm_runtime_disable(struct device *dev, bool check_resume);
+
+static inline bool pm_children_suspended(struct device *dev)
+{
+	return dev->power.ignore_children
+		|| !atomic_read(&dev->power.child_count);
+}
+
+static inline void pm_suspend_ignore_children(struct device *dev, bool enable)
+{
+	dev->power.ignore_children = enable;
+}
+
+static inline void pm_runtime_get_noresume(struct device *dev)
+{
+	atomic_inc(&dev->power.usage_count);
+}
+
+static inline void pm_runtime_put_noidle(struct device *dev)
+{
+	atomic_add_unless(&dev->power.usage_count, -1, 0);
+}
+
+#else /* !CONFIG_PM_RUNTIME */
+
+static inline int pm_runtime_idle(struct device *dev) { return -ENOSYS; }
+static inline int pm_runtime_suspend(struct device *dev) { return -ENOSYS; }
+static inline int pm_runtime_resume(struct device *dev) { return 0; }
+static inline int pm_request_idle(struct device *dev) { return -ENOSYS; }
+static inline int pm_schedule_suspend(struct device *dev, unsigned int delay)
+{
+	return -ENOSYS;
+}
+static inline int pm_request_resume(struct device *dev) { return 0; }
+static inline int __pm_runtime_get(struct device *dev, bool sync) { return 1; }
+static inline int __pm_runtime_put(struct device *dev, bool sync) { return 0; }
+static inline int __pm_runtime_set_status(struct device *dev,
+					    unsigned int status) { return 0; }
+static inline int pm_runtime_barrier(struct device *dev) { return 0; }
+static inline void pm_runtime_enable(struct device *dev) {}
+static inline void __pm_runtime_disable(struct device *dev, bool c) {}
+
+static inline bool pm_children_suspended(struct device *dev) { return false; }
+static inline void pm_suspend_ignore_children(struct device *dev, bool en) {}
+static inline void pm_runtime_get_noresume(struct device *dev) {}
+static inline void pm_runtime_put_noidle(struct device *dev) {}
+
+#endif /* !CONFIG_PM_RUNTIME */
+
+static inline int pm_runtime_get(struct device *dev)
+{
+	return __pm_runtime_get(dev, false);
+}
+
+static inline int pm_runtime_get_sync(struct device *dev)
+{
+	return __pm_runtime_get(dev, true);
+}
+
+static inline int pm_runtime_put(struct device *dev)
+{
+	return __pm_runtime_put(dev, false);
+}
+
+static inline int pm_runtime_put_sync(struct device *dev)
+{
+	return __pm_runtime_put(dev, true);
+}
+
+static inline int pm_runtime_set_active(struct device *dev)
+{
+	return __pm_runtime_set_status(dev, RPM_ACTIVE);
+}
+
+static inline void pm_runtime_set_suspended(struct device *dev)
+{
+	__pm_runtime_set_status(dev, RPM_SUSPENDED);
+}
+
+static inline void pm_runtime_disable(struct device *dev)
+{
+	__pm_runtime_disable(dev, true);
+}
+
+#endif
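pm_runtime_put_noidle() above, like the parent child_count updates in runtime.c, relies on atomic_add_unless(v, -1, 0): a decrement that is skipped when the counter is already zero, so the counter can never go negative. A userspace sketch of that primitive built on C11 compare-and-exchange follows; model_atomic_add_unless() and model_put_noidle() are hypothetical names, and the kernel's own atomic_add_unless() is implemented per architecture rather than like this.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Sketch of atomic_add_unless(v, a, u): add @a to @v unless @v == @u.
 * Returns true if the addition was performed.
 */
static bool model_atomic_add_unless(atomic_int *v, int a, int u)
{
	int old = atomic_load(v);

	while (old != u) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + a))
			return true;
	}
	return false;
}

/* Analogue of pm_runtime_put_noidle(): never drops below zero. */
static void model_put_noidle(atomic_int *usage_count)
{
	model_atomic_add_unless(usage_count, -1, 0);
}
```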
Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -21,6 +21,7 @@
 #include <linux/kallsyms.h>
 #include <linux/mutex.h>
 #include <linux/pm.h>
+#include <linux/pm_runtime.h>
 #include <linux/resume-trace.h>
 #include <linux/rwsem.h>
 #include <linux/interrupt.h>
@@ -49,6 +50,16 @@ static DEFINE_MUTEX(dpm_list_mtx);
 static bool transition_started;
 
 /**
+ * device_pm_init - Initialize the PM-related part of a device object.
+ * @dev: Device object being initialized.
+ */
+void device_pm_init(struct device *dev)
+{
+	dev->power.status = DPM_ON;
+	pm_runtime_init(dev);
+}
+
+/**
  *	device_pm_lock - lock the list of active devices used by the PM core
  */
 void device_pm_lock(void)
@@ -105,6 +116,7 @@ void device_pm_remove(struct device *dev
 	mutex_lock(&dpm_list_mtx);
 	list_del_init(&dev->power.entry);
 	mutex_unlock(&dpm_list_mtx);
+	pm_runtime_remove(dev);
 }
 
 /**
@@ -512,6 +524,7 @@ static void dpm_complete(pm_message_t st
 			mutex_unlock(&dpm_list_mtx);
 
 			device_complete(dev, state);
+			pm_runtime_put_noidle(dev);
 
 			mutex_lock(&dpm_list_mtx);
 		}
@@ -757,7 +770,14 @@ static int dpm_prepare(pm_message_t stat
 		dev->power.status = DPM_PREPARING;
 		mutex_unlock(&dpm_list_mtx);
 
-		error = device_prepare(dev, state);
+		pm_runtime_get_noresume(dev);
+		if (pm_runtime_barrier(dev) && device_may_wakeup(dev)) {
+			/* Wake-up requested during system sleep transition. */
+			pm_runtime_put_noidle(dev);
+			error = -EBUSY;
+		} else {
+			error = device_prepare(dev, state);
+		}
 
 		mutex_lock(&dpm_list_mtx);
 		if (error) {
Index: linux-2.6/drivers/base/dd.c
===================================================================
--- linux-2.6.orig/drivers/base/dd.c
+++ linux-2.6/drivers/base/dd.c
@@ -23,6 +23,7 @@
 #include <linux/kthread.h>
 #include <linux/wait.h>
 #include <linux/async.h>
+#include <linux/pm_runtime.h>
 
 #include "base.h"
 #include "power/power.h"
@@ -202,7 +203,10 @@ int driver_probe_device(struct device_dr
 	pr_debug("bus: '%s': %s: matched device %s with driver %s\n",
 		 drv->bus->name, __func__, dev_name(dev), drv->name);
 
+	pm_runtime_get_noresume(dev);
+	pm_runtime_barrier(dev);
 	ret = really_probe(dev, drv);
+	pm_runtime_put_sync(dev);
 
 	return ret;
 }
@@ -245,7 +249,9 @@ int device_attach(struct device *dev)
 			ret = 0;
 		}
 	} else {
+		pm_runtime_get_noresume(dev);
 		ret = bus_for_each_drv(dev->bus, NULL, dev, __device_attach);
+		pm_runtime_put_sync(dev);
 	}
 	up(&dev->sem);
 	return ret;
@@ -306,6 +312,9 @@ static void __device_release_driver(stru
 
 	drv = dev->driver;
 	if (drv) {
+		pm_runtime_get_noresume(dev);
+		pm_runtime_barrier(dev);
+
 		driver_sysfs_remove(dev);
 
 		if (dev->bus)
@@ -324,6 +333,8 @@ static void __device_release_driver(stru
 			blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
 						     BUS_NOTIFY_UNBOUND_DRIVER,
 						     dev);
+
+		pm_runtime_put_sync(dev);
 	}
 }
 
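The get/put calls added to dd.c depend on the usage counter semantics of __pm_runtime_get() and __pm_runtime_put(): only the 0 to 1 transition triggers a resume, only the 1 to 0 transition triggers an idle notification, and pm_runtime_get_noresume() merely takes a reference. The userspace sketch below counts those transitions; struct model_dev and the model_* functions are hypothetical, and the actual resume and idle actions are replaced by plain counters.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the relevant part of struct dev_pm_info. */
struct model_dev {
	atomic_int usage_count;
	int resumes;		/* resume triggered (0 -> 1) */
	int idle_checks;	/* idle notification triggered (1 -> 0) */
};

/* Like __pm_runtime_get(): resume only if the count was zero. */
static void model_get(struct model_dev *d)
{
	if (atomic_fetch_add(&d->usage_count, 1) == 0)
		d->resumes++;
}

/* Like pm_runtime_get_noresume(): take a reference, nothing else. */
static void model_get_noresume(struct model_dev *d)
{
	atomic_fetch_add(&d->usage_count, 1);
}

/* Like __pm_runtime_put(): notify only when the count drops to zero. */
static void model_put(struct model_dev *d)
{
	if (atomic_fetch_sub(&d->usage_count, 1) == 1)
		d->idle_checks++;
}
```

In this model, a reference taken with model_get_noresume() around a probe turns any inner get/put pair into plain counter updates, and only the final put that closes the bracket triggers the idle check, which is the role pm_runtime_put_sync() plays in driver_probe_device().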
Index: linux-2.6/drivers/base/power/power.h
===================================================================
--- linux-2.6.orig/drivers/base/power/power.h
+++ linux-2.6/drivers/base/power/power.h
@@ -1,7 +1,14 @@
-static inline void device_pm_init(struct device *dev)
-{
-	dev->power.status = DPM_ON;
-}
+#ifdef CONFIG_PM_RUNTIME
+
+extern void pm_runtime_init(struct device *dev);
+extern void pm_runtime_remove(struct device *dev);
+
+#else /* !CONFIG_PM_RUNTIME */
+
+static inline void pm_runtime_init(struct device *dev) {}
+static inline void pm_runtime_remove(struct device *dev) {}
+
+#endif /* !CONFIG_PM_RUNTIME */
 
 #ifdef CONFIG_PM_SLEEP
 
@@ -16,23 +23,33 @@ static inline struct device *to_device(s
 	return container_of(entry, struct device, power.entry);
 }
 
+extern void device_pm_init(struct device *dev);
 extern void device_pm_add(struct device *);
 extern void device_pm_remove(struct device *);
 extern void device_pm_move_before(struct device *, struct device *);
 extern void device_pm_move_after(struct device *, struct device *);
 extern void device_pm_move_last(struct device *);
 
-#else /* CONFIG_PM_SLEEP */
+#else /* !CONFIG_PM_SLEEP */
+
+static inline void device_pm_init(struct device *dev)
+{
+	pm_runtime_init(dev);
+}
+
+static inline void device_pm_remove(struct device *dev)
+{
+	pm_runtime_remove(dev);
+}
 
 static inline void device_pm_add(struct device *dev) {}
-static inline void device_pm_remove(struct device *dev) {}
 static inline void device_pm_move_before(struct device *deva,
 					 struct device *devb) {}
 static inline void device_pm_move_after(struct device *deva,
 					struct device *devb) {}
 static inline void device_pm_move_last(struct device *dev) {}
 
-#endif
+#endif /* !CONFIG_PM_SLEEP */
 
 #ifdef CONFIG_PM
 
Index: linux-2.6/Documentation/power/runtime_pm.txt
===================================================================
--- /dev/null
+++ linux-2.6/Documentation/power/runtime_pm.txt
@@ -0,0 +1,378 @@
+Run-time Power Management Framework for I/O Devices
+
+(C) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
+
+1. Introduction
+
+Support for run-time power management (run-time PM) of I/O devices is provided
+at the power management core (PM core) level by means of:
+
+* The power management workqueue pm_wq in which bus types and device drivers can
+  put their PM-related work items.  It is strongly recommended that pm_wq be
+  used for queuing all work items related to run-time PM, because this allows
+  them to be synchronized with system-wide power transitions (suspend to RAM,
+  hibernation and resume from system sleep states).  pm_wq is declared in
+  include/linux/pm_runtime.h and defined in kernel/power/main.c.
+
+* A number of run-time PM fields in the 'power' member of 'struct device' (which
+  is of the type 'struct dev_pm_info', defined in include/linux/pm.h) that can
+  be used for synchronizing run-time PM operations with one another.
+
+* Three device run-time PM callbacks in 'struct dev_pm_ops' (defined in
+  include/linux/pm.h).
+
+* A set of helper functions defined in drivers/base/power/runtime.c that can be
+  used for carrying out run-time PM operations in such a way that the
+  synchronization between them is taken care of by the PM core.  Bus types and
+  device drivers are encouraged to use these functions.
+
+The run-time PM callbacks present in 'struct dev_pm_ops', the device run-time PM
+fields of 'struct dev_pm_info' and the core helper functions provided for
+run-time PM are described below.
+
+2. Device Run-time PM Callbacks
+
+There are three device run-time PM callbacks defined in 'struct dev_pm_ops':
+
+struct dev_pm_ops {
+	...
+	int (*runtime_suspend)(struct device *dev);
+	int (*runtime_resume)(struct device *dev);
+	void (*runtime_idle)(struct device *dev);
+	...
+};
+
+The ->runtime_suspend() callback is executed by the PM core for the bus type of
+the device being suspended.  The bus type's callback is then _entirely_
+_responsible_ for handling the device as appropriate, which may, but need not,
+include executing the device driver's own ->runtime_suspend() callback (from the
+PM core's point of view it is not necessary to implement a ->runtime_suspend()
+callback in a device driver as long as the bus type's ->runtime_suspend() knows
+what to do to handle the device).
+
+  * Once the bus type's ->runtime_suspend() callback has completed successfully
+    for a given device, the PM core regards the device as suspended, which need
+    not mean that the device has been put into a low power state.  It is
+    supposed to mean, however, that the device will not process data and will
+    not communicate with the CPU(s) and RAM until its bus type's
+    ->runtime_resume() callback is executed for it.  The run-time PM status of
+    a device after successful execution of its bus type's ->runtime_suspend()
+    callback is 'suspended'.
+
+  * If the bus type's ->runtime_suspend() callback returns -EBUSY or -EAGAIN,
+    the device's run-time PM status remains 'active', which means that the
+    device _must_ be fully operational afterwards.
+
+  * If the bus type's ->runtime_suspend() callback returns an error code
+    different from -EBUSY or -EAGAIN, the PM core regards this as a fatal
+    error and will refuse to run the helper functions described in Section 4
+    for the device until its status is directly set either to 'active' or to
+    'suspended' (the PM core provides special helper functions for this
+    purpose).
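The split of responsibilities above, together with the three return-value rules, can be sketched as a small userspace C model. This is an illustration under stated assumptions, not PM core code: every model_* name is hypothetical, only the suspend hook is modelled, and returning -EINVAL once a fatal error has been recorded is an arbitrary choice of the model.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace model; no model_* name is kernel API. */
enum model_rpm_status { MODEL_RPM_ACTIVE, MODEL_RPM_SUSPENDED };

struct model_dev;

/* Cut-down analogue of the suspend hook in 'struct dev_pm_ops'. */
struct model_pm_ops {
	int (*runtime_suspend)(struct model_dev *dev);
};

struct model_dev {
	const struct model_pm_ops *drv_pm;	/* driver hooks, may be NULL */
	enum model_rpm_status runtime_status;
	int runtime_error;			/* nonzero: fatal error recorded */
};

/* Bus-level callback: fully responsible for the device, it calls the
 * driver's hook only if the driver provides one. */
static int model_bus_runtime_suspend(struct model_dev *dev)
{
	if (dev->drv_pm && dev->drv_pm->runtime_suspend)
		return dev->drv_pm->runtime_suspend(dev);
	return 0;	/* the bus handles the device on its own */
}

/* How the model core treats the bus callback's return value. */
static int model_core_suspend(struct model_dev *dev)
{
	int ret;

	if (dev->runtime_error)
		return -EINVAL;	/* model's choice: refuse after a fatal error */
	if (dev->runtime_status == MODEL_RPM_SUSPENDED)
		return 1;	/* already 'suspended' */

	ret = model_bus_runtime_suspend(dev);
	if (ret == 0)
		dev->runtime_status = MODEL_RPM_SUSPENDED;
	else if (ret != -EBUSY && ret != -EAGAIN)
		dev->runtime_error = ret;	/* fatal until status is set directly */
	/* On -EBUSY or -EAGAIN the status stays 'active'. */
	return ret;
}

/* Example driver hooks used to exercise the rules. */
static int model_drv_busy(struct model_dev *dev)
{
	(void)dev;
	return -EBUSY;
}

static int model_drv_broken(struct model_dev *dev)
{
	(void)dev;
	return -EIO;
}

static const struct model_pm_ops model_busy_ops = { model_drv_busy };
static const struct model_pm_ops model_broken_ops = { model_drv_broken };
```

Because the driver hook is optional, a device whose 'drv_pm' is NULL suspends successfully through the bus-level routine alone.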
+
+In particular, if the driver requires remote wakeup capability for proper
+functioning and device_may_wakeup() returns 'false' for the device, then
+->runtime_suspend() should return -EBUSY.  On the other hand, if
+device_may_wakeup() returns 'true' for the device and the device is put
+into a low power state during the execution of its bus type's
+->runtime_suspend(), it is expected that remote wake-up (i.e. a hardware
+mechanism allowing the device to request a change of its power state, such as
+PCI PME) will be enabled for the device.  Generally, remote wake-up should be
+enabled for all input devices put into a low power state at run time.
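A driver-level suspend routine following these wake-up rules might look roughly like the userspace sketch below. The structure and function names are hypothetical, 'may_wakeup' stands in for device_may_wakeup(), and the hardware steps (arming PCI PME, lowering the power state) are reduced to setting flags.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a driver-level runtime suspend routine. */
struct model_wake_dev {
	bool needs_remote_wakeup;	/* driver cannot work without it */
	bool may_wakeup;		/* stands in for device_may_wakeup() */
	bool remote_wakeup_enabled;	/* e.g. PCI PME armed */
	bool low_power;			/* device put into a low power state */
};

static int model_drv_runtime_suspend(struct model_wake_dev *d)
{
	/* Wake-up is required but not allowed: refuse, device stays 'active'. */
	if (d->needs_remote_wakeup && !d->may_wakeup)
		return -EBUSY;

	/* Arm remote wake-up before entering the low power state. */
	if (d->may_wakeup)
		d->remote_wakeup_enabled = true;
	d->low_power = true;
	return 0;
}
```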
+
+The ->runtime_resume() callback is executed by the PM core for the bus type of
+the device being woken up.  The bus type's callback is then _entirely_
+_responsible_ for handling the device as appropriate, which may, but need not,
+include executing the device driver's own ->runtime_resume() callback (from the
+PM core's point of view it is not necessary to implement a ->runtime_resume()
+callback in a device driver as long as the bus type's ->runtime_resume() knows
+what to do to handle the device).
+
+  * Once the bus type's ->runtime_resume() callback has completed successfully,
+    the PM core regards the device as fully operational, which means that the
+    device _must_ be able to complete I/O operations as needed.  The run-time
+    PM status of the device is then 'active'.
+
+  * If the bus type's ->runtime_resume() callback returns an error code, the PM
+    core regards this as a fatal error and will refuse to run the helper
+    functions described in Section 4 for the device, until its status is
+    directly set either to 'active' or to 'suspended' (the PM core provides
+    special helper functions for this purpose).
+
+The ->runtime_idle() callback is executed by the PM core for the bus type of
+a given device whenever the device appears to be idle, which is indicated to
+the PM core by two counters, the device's usage counter and the counter of
+'active' children of the device.
+
+  * If either of these counters is decreased using a helper function provided
+    by the PM core and it becomes equal to zero, the other counter is checked.
+    If that counter is also equal to zero, the PM core executes the device bus
+    type's ->runtime_idle() callback (with the device as an argument).
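The two-counter trigger can be modelled in a few lines of userspace C. This is a hypothetical sketch, not the PM core's implementation: the model_* names are invented, the counters are plain ints instead of atomic_t, and ->runtime_idle() is reduced to a call counter.

```c
#include <assert.h>

/* Hypothetical model of the idle-notification trigger. */
struct model_idle_dev {
	int usage_count;	/* the device's usage counter */
	int child_count;	/* number of 'active' children */
	int idle_calls;		/* times ->runtime_idle() would have run */
};

static void model_runtime_idle(struct model_idle_dev *d)
{
	d->idle_calls++;	/* the bus type decides what to do here */
}

/* Decrease one counter; if it reaches zero, check the other one. */
static void model_dec(struct model_idle_dev *d, int *counter, const int *other)
{
	assert(*counter > 0);
	if (--*counter == 0 && *other == 0)
		model_runtime_idle(d);
}

/* A user of the device drops its reference. */
static void model_put(struct model_idle_dev *d)
{
	model_dec(d, &d->usage_count, &d->child_count);
}

/* One of the device's children stops being 'active'. */
static void model_child_suspended(struct model_idle_dev *d)
{
	model_dec(d, &d->child_count, &d->usage_count);
}
```

Starting from one user and one active child, dropping the user alone does not trigger the callback; it runs only once the child is suspended as well.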
+
+The action performed by a bus type's ->runtime_idle() callback is totally
+dependent on the bus type in question, but the expected and recommended action
+is to check if the device can be suspended (i.e. if all of the conditions
+necessary for suspending the device are satisfied) and to queue up a suspend
+request for the device in that case.
+
+The helper functions provided by the PM core, described in Section 4, guarantee
+that the following constraints are met with respect to the bus type's run-time
+PM callbacks:
+
+(1) The callbacks are mutually exclusive (e.g. it is forbidden to execute
+    ->runtime_suspend() in parallel with ->runtime_resume() or with another
+    instance of ->runtime_suspend() for the same device) with the exception that
+    ->runtime_suspend() or ->runtime_resume() can be executed in parallel with
+    ->runtime_idle() (although ->runtime_idle() will not be started while any
+    of the other callbacks is being executed for the same device).
+
+(2) ->runtime_idle() and ->runtime_suspend() can only be executed for 'active'
+    devices (i.e. the PM core will only execute ->runtime_idle() or
+    ->runtime_suspend() for devices whose run-time PM status is 'active').
+
+(3) ->runtime_idle() and ->runtime_suspend() can only be executed for a device
+    whose usage counter is equal to zero _and_ whose counter of 'active'
+    children is equal to zero or whose 'power.ignore_children' flag is set.
+
+(4) ->runtime_resume() can only be executed for 'suspended' devices (i.e. the
+    PM core will only execute ->runtime_resume() for devices whose run-time PM
+    status is 'suspended').
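Constraints (2) to (4) amount to simple predicates on the run-time PM fields. The following hypothetical userspace model (model_* names invented, plain ints instead of atomic_t) expresses them; constraint (1), mutual exclusion, would additionally need the locking and waiting described in Section 3 and is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of constraints (2)-(4); not the kernel's checks. */
enum model_rpm_status { MODEL_RPM_ACTIVE, MODEL_RPM_SUSPENDED };

struct model_dev {
	enum model_rpm_status runtime_status;
	int usage_count;
	int child_count;	/* 'active' children */
	bool ignore_children;	/* stands in for power.ignore_children */
};

/* (2) and (3): may ->runtime_idle() or ->runtime_suspend() be executed? */
static bool model_may_idle_or_suspend(const struct model_dev *d)
{
	assert(d->usage_count >= 0 && d->child_count >= 0);
	if (d->runtime_status != MODEL_RPM_ACTIVE)
		return false;
	if (d->usage_count != 0)
		return false;
	return d->child_count == 0 || d->ignore_children;
}

/* (4): may ->runtime_resume() be executed? */
static bool model_may_resume(const struct model_dev *d)
{
	return d->runtime_status == MODEL_RPM_SUSPENDED;
}
```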
+
+Additionally, the helper functions provided by the PM core obey the following
+rules:
+
+  * If ->runtime_suspend() is about to be executed or there's a pending request
+    to execute it, ->runtime_idle() will not be executed for the same device.
+
+  * A request to execute or to schedule the execution of ->runtime_suspend()
+    will cancel any pending requests to execute ->runtime_idle() for the same
+    device.
+
+  * If ->runtime_resume() is about to be executed or there's a pending request
+    to execute it, the other callbacks will not be executed for the same device.
+
+  * A request to execute ->runtime_resume() will cancel any pending or
+    scheduled requests to execute the other callbacks for the same device.
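The four precedence rules can be condensed into a single pending-request slot per device, loosely mirroring the 'request_pending' and 'request' fields of Section 3. The sketch below is a hypothetical userspace model: the model_* names are invented, timers and the workqueue are left out, and returning -EAGAIN for a refused request is the model's own choice rather than documented PM core behaviour.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of one pending-request slot per device. */
enum model_req {
	MODEL_REQ_NONE,
	MODEL_REQ_IDLE,
	MODEL_REQ_SUSPEND,
	MODEL_REQ_RESUME
};

/* Try to put 'req' into the slot; returns 0 if it now occupies the slot,
 * or -EAGAIN if a pending request takes precedence over it. */
static int model_queue_request(enum model_req *pending, enum model_req req)
{
	assert(req != MODEL_REQ_NONE);
	switch (req) {
	case MODEL_REQ_IDLE:
		/* No idle while a suspend or a resume is pending. */
		if (*pending == MODEL_REQ_SUSPEND || *pending == MODEL_REQ_RESUME)
			return -EAGAIN;
		break;
	case MODEL_REQ_SUSPEND:
		/* Suspend cancels a pending idle but yields to a pending resume. */
		if (*pending == MODEL_REQ_RESUME)
			return -EAGAIN;
		break;
	case MODEL_REQ_RESUME:
		/* Resume cancels whatever else is pending. */
		break;
	default:
		break;
	}
	*pending = req;
	return 0;
}
```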
+
+3. Run-time PM Device Fields
+
+The following device run-time PM fields are present in 'struct dev_pm_info', as
+defined in include/linux/pm.h:
+
+  struct timer_list suspend_timer;
+    - timer used for scheduling (delayed) suspend requests
+
+  unsigned long timer_expires;
+    - timer expiration time, in jiffies (if this is different from zero, the
+      timer is running and will expire at that time, otherwise the timer is not
+      running)
+
+  struct work_struct work;
+    - work structure used for queuing up requests (i.e. work items in pm_wq)
+
+  wait_queue_head_t wait_queue;
+    - wait queue used if any of the helper functions needs to wait for another
+      one to complete
+
+  spinlock_t lock;
+    - lock used for synchronisation
+
+  atomic_t usage_count;
+    - the usage counter of the device
+
+  atomic_t child_count;
+    - the count of 'active' children of the device
+
+  unsigned int ignore_children;
+    - if set, the value of child_count is ignored (but still updated)
+
+  unsigned int disable_depth;
+    - used for disabling the helper functions (they work normally if this is
+      equal to zero); the initial value of it is 1 (i.e. run-time PM is
+      initially disabled for all devices)
+
+  unsigned int runtime_error;
+    - if set, there was a fatal error (one of the callbacks returned an error
+      code as described in Section 2), so the helper functions will not work
+      until this flag is cleared; this is the error code returned by the
+      failing callback
+
+  unsigned int idle_notification;
+    - if set, ->runtime_idle() is being executed
+
+  unsigned int request_pending;
+    - if set, there's a pending request (i.e. a work item queued up into pm_wq)
+
+  enum rpm_request request;
+    - type of request that's pending (valid if request_pending is set)
+
+  unsigned int deferred_resume;
+    - set if ->runtime_resume() is about to be run while ->runtime_suspend() is
+      being executed for that device and it is not practical to wait for the
+      suspend to complete; means "start a resume as soon as you've suspended"
+
+  enum rpm_status runtime_status;
+    - the run-time PM status of the device; this field's initial value is
+      RPM_SUSPENDED, which means that each device is initially regarded by the
+      PM core as 'suspended', regardless of its real hardware status
+
+All of the above fields are members of the 'power' member of 'struct device'.
+
+4. Run-time PM Device Helper Functions
+
+The following run-time PM helper functions are defined in
+drivers/base/power/runtime.c and include/linux/pm_runtime.h:
+
+  void pm_runtime_init(struct device *dev);
+    - initialize the device run-time PM fields in 'struct dev_pm_info'
+
+  void pm_runtime_remove(struct device *dev);
+    - make sure that the run-time PM of the device will be disabled after
+      removing the device from the device hierarchy
+
+  int pm_runtime_idle(struct device *dev);
+    - execute ->runtime_idle() for the device's bus type; returns 0 on success
+      or error code on failure, where -EINPROGRESS means that ->runtime_idle()
+      is already being executed
+
+  int pm_runtime_suspend(struct device *dev);
+    - execute ->runtime_suspend() for the device's bus type; returns 0 on
+      success, 1 if the device's run-time PM status was already 'suspended', or
+      error code on failure, where -EAGAIN or -EBUSY means it is safe to attempt
+      to suspend the device again in the future
+
+  int pm_runtime_resume(struct device *dev);
+    - execute ->runtime_resume() for the device's bus type; returns 0 on
+      success, 1 if the device's run-time PM status was already 'active', or
+      error code on failure, where -EAGAIN means it may be safe to attempt to
+      resume the device again in the future, but 'power.runtime_error' should
+      be checked additionally
+
+  int pm_request_idle(struct device *dev);
+    - submit a request to execute ->runtime_idle() for the device's bus type
+      (the request is represented by a work item in pm_wq); returns 0 on success
+      or error code if the request has not been queued up
+
+  int pm_schedule_suspend(struct device *dev, unsigned int delay);
+    - schedule the execution of ->runtime_suspend() for the device's bus type
+      in the future, where 'delay' is the time to wait before queuing up a
+      suspend work item in pm_wq, in milliseconds (if 'delay' is zero, the work
+      item is queued up immediately); returns 0 on success, 1 if the device's
+      run-time PM status was already 'suspended', or error code if the request
+      hasn't been scheduled (or queued up if 'delay' is 0); if the execution of
+      ->runtime_suspend() is already scheduled and not yet expired, the new
+      value of 'delay' will be used as the time to wait
+
+  int pm_request_resume(struct device *dev);
+    - submit a request to execute ->runtime_resume() for the device's bus type
+      (the request is represented by a work item in pm_wq); returns 0 on
+      success, 1 if the device's run-time PM status was already 'active', or
+      error code if the request hasn't been queued up
+
+  void pm_runtime_get_noresume(struct device *dev);
+    - increment the device's usage counter
+
+  int pm_runtime_get(struct device *dev);
+    - increment the device's usage counter, run pm_request_resume(dev) and
+      return its result
+
+  int pm_runtime_get_sync(struct device *dev);
+    - increment the device's usage counter, run pm_runtime_resume(dev) and
+      return its result
+
+  void pm_runtime_put_noidle(struct device *dev);
+    - decrement the device's usage counter
+
+  int pm_runtime_put(struct device *dev);
+    - decrement the device's usage counter, run pm_request_idle(dev) and return
+      its result
+
+  int pm_runtime_put_sync(struct device *dev);
+    - decrement the device's usage counter, run pm_runtime_idle(dev) and return
+      its result
+
+  void pm_runtime_enable(struct device *dev);
+    - enable the run-time PM helper functions to run the device bus type's
+      run-time PM callbacks described in Section 2
+
+  int pm_runtime_disable(struct device *dev);
+    - prevent the run-time PM helper functions from running the device bus
+      type's run-time PM callbacks, make sure that all of the pending run-time
+      PM operations on the device are either completed or canceled; returns
+      1 if there was a resume request pending and it was necessary to execute
+      ->runtime_resume() for the device's bus type to satisfy that request,
+      otherwise 0 is returned
+
+  void pm_suspend_ignore_children(struct device *dev, bool enable);
+    - set/unset the power.ignore_children flag of the device
+
+  int pm_runtime_set_active(struct device *dev);
+    - clear the device's 'power.runtime_error' flag, set the device's run-time
+      PM status to 'active' and update its parent's counter of 'active'
+      children as appropriate (it is only valid to use this function if
+      'power.runtime_error' is set or 'power.disable_depth' is greater than
+      zero); it will fail and return an error code if the device has a parent
+      that is not active and whose 'power.ignore_children' flag is unset
+
+  void pm_runtime_set_suspended(struct device *dev);
+    - clear the device's 'power.runtime_error' flag, set the device's run-time
+      PM status to 'suspended' and update its parent's counter of 'active'
+      children as appropriate (it is only valid to use this function if
+      'power.runtime_error' is set or 'power.disable_depth' is greater than
+      zero)
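The get/put helpers combine a usage-counter update with a resume or an idle check. The userspace model below sketches the synchronous pair under simplifying assumptions: the model_* names are hypothetical, pm_runtime_idle() is reduced to the usage-counter check, and -EAGAIN for a device still in use is taken from Section 5's description of calls made during ->probe().

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of the get/put helpers; not the kernel code. */
enum model_rpm_status { MODEL_RPM_ACTIVE, MODEL_RPM_SUSPENDED };

struct model_dev {
	int usage_count;
	enum model_rpm_status status;
	int resume_calls;	/* times ->runtime_resume() would have run */
	int idle_calls;		/* times ->runtime_idle() would have run */
};

/* Like pm_runtime_resume(): 1 if already 'active', otherwise resume. */
static int model_resume(struct model_dev *d)
{
	if (d->status == MODEL_RPM_ACTIVE)
		return 1;
	d->resume_calls++;
	d->status = MODEL_RPM_ACTIVE;
	return 0;
}

/* Like pm_runtime_idle(), reduced to the usage-counter check. */
static int model_idle(struct model_dev *d)
{
	if (d->usage_count > 0)
		return -EAGAIN;
	d->idle_calls++;
	return 0;
}

/* Analogue of pm_runtime_get_sync(): count first, then resume. */
static int model_get_sync(struct model_dev *d)
{
	d->usage_count++;
	return model_resume(d);
}

/* Analogue of pm_runtime_put_sync(): drop the count, then idle check. */
static int model_put_sync(struct model_dev *d)
{
	assert(d->usage_count > 0);
	d->usage_count--;
	return model_idle(d);
}
```

The asynchronous variants, pm_runtime_get() and pm_runtime_put(), would differ in that they submit the resume or idle request to pm_wq instead of running the callback directly.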
+
+It is safe to execute the following helper functions from interrupt context:
+
+pm_request_idle()
+pm_schedule_suspend()
+pm_request_resume()
+pm_runtime_get_noresume()
+pm_runtime_get()
+pm_runtime_put_noidle()
+pm_runtime_put()
+pm_suspend_ignore_children()
+pm_runtime_set_active()
+pm_runtime_set_suspended()
+pm_runtime_enable()
+
+5. Run-time PM Initialization, Device Probing and Removal
+
+Initially, the run-time PM is disabled for all devices, which means that the
+majority of the run-time PM helper functions described in Section 4 will return
+-EAGAIN until pm_runtime_enable() is called for the device.
+
+In addition to that, the initial run-time PM status of all devices is
+'suspended', but it need not reflect the actual physical state of the device.
+Thus, if the device is initially active (i.e. it is able to process I/O), its
+run-time PM status must be changed to 'active', with the help of
+pm_runtime_set_active(), before pm_runtime_enable() is called for the device.
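The required ordering (set the status first, then enable) can be illustrated with a hypothetical userspace model. The model_* names are invented, the 'power.runtime_error' path is omitted, and the model's suspend routine stands in for the helpers that return -EAGAIN while run-time PM is disabled.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of the initial state and of enabling run-time PM. */
enum model_rpm_status { MODEL_RPM_ACTIVE, MODEL_RPM_SUSPENDED };

struct model_dev {
	enum model_rpm_status status;
	unsigned int disable_depth;
};

/* Initial state: regarded as 'suspended', run-time PM disabled. */
static void model_init(struct model_dev *d)
{
	d->status = MODEL_RPM_SUSPENDED;
	d->disable_depth = 1;
}

/* Only valid while run-time PM is disabled (runtime_error case omitted). */
static void model_set_active(struct model_dev *d)
{
	assert(d->disable_depth > 0);
	d->status = MODEL_RPM_ACTIVE;
}

static void model_enable(struct model_dev *d)
{
	if (d->disable_depth > 0)
		d->disable_depth--;
}

/* Like most helpers, suspend refuses with -EAGAIN while disabled. */
static int model_suspend(struct model_dev *d)
{
	if (d->disable_depth > 0)
		return -EAGAIN;
	if (d->status == MODEL_RPM_SUSPENDED)
		return 1;			/* already 'suspended' */
	d->status = MODEL_RPM_SUSPENDED;	/* ->runtime_suspend() succeeded */
	return 0;
}
```

Calling model_enable() before model_set_active() would leave an initially active device regarded as 'suspended', which is the mismatch the paragraph above warns against.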
+
+However, if the device has a parent and the parent's run-time PM is enabled,
+calling pm_runtime_set_active() for the device will affect the parent, unless
+the parent's 'power.ignore_children' flag is set.  Namely, in that case the
+parent won't be able to suspend at run time, using the PM core's helper
+functions, as long as the child's status is 'active', even if the child's
+run-time PM is still disabled (i.e. pm_runtime_enable() hasn't been called for
+the child yet or pm_runtime_disable() has been called for it).  For this reason,
+once pm_runtime_set_active() has been called for the device, pm_runtime_enable()
+should be called for it too as soon as reasonably possible or its run-time PM
+status should be changed back to 'suspended' with the help of
+pm_runtime_set_suspended().
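The effect on the parent can be sketched with the following hypothetical userspace model. The model_* names are invented, and the error code -EBUSY returned when the parent is not active is the model's choice (Section 4 only says that an error code is returned).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of parent/child accounting; not the kernel code. */
enum model_rpm_status { MODEL_RPM_ACTIVE, MODEL_RPM_SUSPENDED };

struct model_dev {
	struct model_dev *parent;
	enum model_rpm_status status;
	int child_count;	/* 'active' children */
	bool ignore_children;
	int usage_count;
};

static int model_set_active(struct model_dev *d)
{
	struct model_dev *p = d->parent;

	if (d->status == MODEL_RPM_ACTIVE)
		return 0;
	if (p) {
		/* Fail if the parent is not active and does not ignore children. */
		if (p->status != MODEL_RPM_ACTIVE && !p->ignore_children)
			return -EBUSY;
		p->child_count++;	/* updated even if ignore_children is set */
	}
	d->status = MODEL_RPM_ACTIVE;
	return 0;
}

static void model_set_suspended(struct model_dev *d)
{
	if (d->status == MODEL_RPM_SUSPENDED)
		return;
	if (d->parent) {
		assert(d->parent->child_count > 0);
		d->parent->child_count--;
	}
	d->status = MODEL_RPM_SUSPENDED;
}

/* Constraints (2) and (3) of Section 2, applied to the parent. */
static bool model_may_suspend(const struct model_dev *p)
{
	return p->status == MODEL_RPM_ACTIVE && p->usage_count == 0 &&
	       (p->child_count == 0 || p->ignore_children);
}
```

While the child is 'active' the parent cannot suspend, even though nothing in the model ever enabled run-time PM for the child.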
+
+If the default initial run-time PM status of the device (i.e. 'suspended')
+reflects the actual state of the device, its bus type's or its driver's
+->probe() callback will likely need to wake it up using one of the PM core's
+helper functions described in Section 4.  In that case, pm_runtime_resume()
+should be used.  Of course, for this purpose the device's run-time PM has to be
+enabled earlier by calling pm_runtime_enable().
+
+If the device bus type's or driver's ->probe() or ->remove() callback runs
+pm_runtime_suspend() or pm_runtime_idle() or their asynchronous counterparts,
+they will fail, returning -EAGAIN, because the device's usage counter is
+incremented by the core before executing ->probe() and ->remove().  Still, it
+may be desirable to suspend the device as soon as ->probe() or ->remove() has
+finished, so the PM core uses pm_runtime_put_sync() to drop the usage counter
+and invoke the device bus type's ->runtime_idle() callback at that time.
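The bracketing that the core applies around ->probe() in the dd.c hunk of this patch can be modelled as follows. This is a hypothetical userspace sketch: the model_* names are invented, pm_runtime_barrier() is omitted, and pm_runtime_idle() is reduced to a usage-counter check.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of how the core brackets ->probe(); not dd.c itself. */
struct model_dev {
	int usage_count;
	int idle_calls;	/* times ->runtime_idle() would have run */
};

/* Reduced analogue of pm_runtime_idle(): fails while the device is in use. */
static int model_idle(struct model_dev *d)
{
	if (d->usage_count > 0)
		return -EAGAIN;
	d->idle_calls++;
	return 0;
}

/* Stand-in for a driver ->probe() that tries to idle the device itself. */
static int model_really_probe(struct model_dev *d, int *idle_ret)
{
	*idle_ret = model_idle(d);	/* -EAGAIN: the core holds a reference */
	return 0;			/* the probe itself succeeds */
}

/* Mirrors the pm_runtime_get_noresume()/pm_runtime_put_sync() pair. */
static int model_driver_probe_device(struct model_dev *d, int *idle_ret)
{
	int ret;

	d->usage_count++;		/* like pm_runtime_get_noresume() */
	ret = model_really_probe(d, idle_ret);
	assert(d->usage_count > 0);
	d->usage_count--;		/* like pm_runtime_put_sync(): drop ... */
	model_idle(d);			/* ... and run the idle check */
	return ret;			/* the probe result is what is returned */
}
```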

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC] PCI: Runtime power management
  2009-08-13  0:29   ` Matthew Garrett
                       ` (4 preceding siblings ...)
  2009-08-14 17:37     ` Jesse Barnes
@ 2009-08-14 17:37     ` Jesse Barnes
  2009-08-14 19:15       ` Rafael J. Wysocki
  2009-08-14 19:15       ` Rafael J. Wysocki
  2009-08-14 21:22     ` Rafael J. Wysocki
  2009-08-14 21:22     ` Rafael J. Wysocki
  7 siblings, 2 replies; 90+ messages in thread
From: Jesse Barnes @ 2009-08-14 17:37 UTC (permalink / raw)
  To: Matthew Garrett
  Cc: Rafael J. Wysocki, Alan Stern, Greg KH, LKML,
	Linux-pm mailing list, linux-pci, linux-usb

On Thu, 13 Aug 2009 01:29:25 +0100
Matthew Garrett <mjg59@srcf.ucam.org> wrote:

> I got a fixed BIOS from Dell and have been able to get this working
> now. It seems entirely happy with USB, but I'd like some sanity
> checks on whether I'm doing this correctly. There's certainly a
> couple of quirks related to setting the ACPI GPE type that would need
> a little bit of work in the ACPI layer, and it breaks ACPI-mediated
> PCI hotplug though that's easy enough to fix by just calling into the
> hotplug code from the core notifier.
> 
> This patch builds on top of Rafael's work on systemwide runtime power
> management. It supports suspending and resuming PCI devices at
> runtime, enabling platform wakeup events that allow the devices to
> automatically resume when appropriate. It currently requires platform
> support, but PCIe setups could be supported natively once native PCIe
> PME code has been added to the kernel.

PCI bits look pretty good to me, though Rafael should take a look too.
Card readers and firewire could benefit from similar treatment, maybe
that would get us to .5W territory on some machines.

-- 
Jesse Barnes, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [linux-pm] [RFC] PCI: Runtime power management
  2009-08-14 17:13               ` [linux-pm] " Rafael J. Wysocki
@ 2009-08-14 19:01                 ` Alan Stern
  2009-08-14 19:01                 ` Alan Stern
  1 sibling, 0 replies; 90+ messages in thread
From: Alan Stern @ 2009-08-14 19:01 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Matthew Garrett, linux-usb, linux-pci, Greg KH, LKML,
	Linux-pm mailing list

On Fri, 14 Aug 2009, Rafael J. Wysocki wrote:

> On Friday 14 August 2009, Rafael J. Wysocki wrote:
> > On Friday 14 August 2009, Alan Stern wrote:
> > > On Thu, 13 Aug 2009, Matthew Garrett wrote:
> > > 
> > > > On Thu, Aug 13, 2009 at 11:22:44AM -0400, Alan Stern wrote:
> ...
> > > > Though perhaps the device level runtime_idle shouldn't be void - that 
> > > > way the bus can ask the driver whether its suspend conditions have been 
> > > > satisfied? Right now there doesn't seem to be any way for the bus to ask 
> > > > that.
> > > 
> > > If you want to get the device-level runtime_idle involved, you can make
> > > _it_ responsible for scheduling the suspend.  Then the bus-level code
> > > simply has to check whether everything is okay at the bus level, and if
> > > it is, call the device-level routine.
> > > 
> > > However changing the return type wouldn't hurt anything, and it would 
> > > allow the pm_schedule_suspend call to be centralized in the bus code.  
> > > You could ask Rafael about it, or just send him a patch.
> > 
> > Well, I'm not against that, but what should pm_runtime_idle() do with the
> > result returned by it?  Just pass it to the caller?
> 
> Hm, perhaps its better to ignore it, though.

That's what I was going to say.  The return value is intended for use 
by bus-level code when calling a driver-level routine.

Alan Stern


^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC] PCI: Runtime power management
  2009-08-14 17:37     ` Jesse Barnes
  2009-08-14 19:15       ` Rafael J. Wysocki
@ 2009-08-14 19:15       ` Rafael J. Wysocki
  1 sibling, 0 replies; 90+ messages in thread
From: Rafael J. Wysocki @ 2009-08-14 19:15 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: Matthew Garrett, Alan Stern, Greg KH, LKML,
	Linux-pm mailing list, linux-pci, linux-usb

On Friday 14 August 2009, Jesse Barnes wrote:
> On Thu, 13 Aug 2009 01:29:25 +0100
> Matthew Garrett <mjg59@srcf.ucam.org> wrote:
> 
> > I got a fixed BIOS from Dell and have been able to get this working
> > now. It seems entirely happy with USB, but I'd like some sanity
> > checks on whether I'm doing this correctly. There's certainly a
> > couple of quirks related to setting the ACPI GPE type that would need
> > a little bit of work in the ACPI layer, and it breaks ACPI-mediated
> > PCI hotplug though that's easy enough to fix by just calling into the
> > hotplug code from the core notifier.
> > 
> > This patch builds on top of Rafael's work on systemwide runtime power
> > management. It supports suspending and resuming PCI devices at
> > runtime, enabling platform wakeup events that allow the devices to
> > automatically resume when appropriate. It currently requires platform
> > support, but PCIe setups could be supported natively once native PCIe
> > PME code has been added to the kernel.
> 
> PCI bits look pretty good to me, though Rafael should take a look too.

I'm going to do that shortly.

> Card readers and firewire could benefit from similar treatment,

As well as network adapters.

> maybe that would get us to .5W territory on some machines.

Hopefully. :-)

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [linux-pm] [RFC] PCI: Runtime power management
  2009-08-14 14:43           ` [linux-pm] " Alan Stern
  2009-08-14 17:05             ` Rafael J. Wysocki
  2009-08-14 17:05             ` [linux-pm] " Rafael J. Wysocki
@ 2009-08-14 20:05             ` Rafael J. Wysocki
  2009-08-14 22:21               ` Matthew Garrett
  2009-08-14 22:21               ` Matthew Garrett
  2009-08-14 20:05             ` Rafael J. Wysocki
  3 siblings, 2 replies; 90+ messages in thread
From: Rafael J. Wysocki @ 2009-08-14 20:05 UTC (permalink / raw)
  To: Alan Stern
  Cc: Matthew Garrett, linux-usb, linux-pci, Greg KH, LKML,
	Linux-pm mailing list

On Friday 14 August 2009, Alan Stern wrote:
> On Thu, 13 Aug 2009, Matthew Garrett wrote:
> 
> > On Thu, Aug 13, 2009 at 11:22:44AM -0400, Alan Stern wrote:
> >
> > > You have to call the HCD's pci_suspend method!  Not to mention calling
> > > synchronize_irq and all the other stuff in hcd_pci_suspend and
> > > hcd_pci_suspend_noirq.
> >
> > The bus level code does this, assuming that the driver-level code
> > doesn't return an error.
> 
> So it does; my mistake.
> 
> 
> On Fri, 14 Aug 2009, Matthew Garrett wrote:
> 
> > On Thu, Aug 13, 2009 at 10:47:01PM +0100, Matthew Garrett wrote:
> > 
> > > Ugh. I'd really prefer us to assume that drivers are able to cope unless 
> > > proven otherwise. Userspace policy makes sense where we don't have any 
> > > idea whether something will work or not, but I'd really expect that most 
> > > PCI drivers will either cope (in which case they'll have enabling code) 
> > > or won't (in which case they won't). Why would we want userspace to 
> > > influence this?
> > 
> > Though, thinking about it, you're right that setting this does override 
> > user policy. I think we need an additional flag to indicate that the 
> > device supports runtime wakeup and test that as well when doing 
> > device_may_wakeup().
> 
> You are suggesting separate flag sets for system-wide wakeup and
> runtime wakeup?  I don't disagree, but implementing them will be
> problematical.
> 
> That's because it's not always possible to change a device's wakeup 
> setting while it is suspended.  Thus if a device was runtime suspended 
> with wakeup enabled, and then we want to do a system sleep and change 
> the device's wakeup setting to disabled, we would have to wake the 
> device back up in order to do it.

Well, sometimes the user may want a device to be power managed at run time
but not to be able to wake up the system from sleep states.  For example,
I'd like the USB controller in my box to be suspended at run time whenever it's
not used, but I surely wouldn't like it to do system-wide wakeup, because it
does that whenever I move the mouse, which is a cordless one.  Simply turning
the mouse on causes the system to wake up. :-)

So, I don't think we have a choice here.  If the user forbids the device to
wake up the system from sleep states, we have to do what he wants, even if
that means the device has to be resumed before putting the system into a
sleep state.

Why don't we add a flag indicating whether or not the device is allowed to
be power managed at run time, something like runtime_forbidden, that
user space will be able to set through sysfs?

If user space sets that flag, we can't power manage the device at run time,
so it can't do runtime wakeup either.  Otherwise, the device is power
manageable at run time, which implies runtime wakeup if supported.
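A rough model of the policy proposed above, written as a userspace C sketch.
The structure, its field names and the policy_* helpers are hypothetical
illustrations, not kernel code; the resume-before-sleep case is the one Alan
describes, where a runtime-suspended device with wakeup armed has to be woken
so its wakeup setting can be changed.

```c
#include <stdbool.h>

/*
 * Hypothetical per-device policy flags, for illustration only; these are
 * not fields of the kernel's struct dev_pm_info.
 */
struct dev_pm_policy {
	bool runtime_forbidden;      /* set by user space, e.g. through sysfs */
	bool runtime_wakeup_capable; /* device/platform can signal runtime wakeup */
	bool should_wakeup;          /* user allows wakeup from system sleep */
	bool runtime_suspended;      /* device is currently runtime suspended */
	bool wake_armed;             /* wakeup is currently enabled in hardware */
};

/* Runtime PM is allowed unless user space has forbidden it. */
bool policy_runtime_pm_allowed(const struct dev_pm_policy *p)
{
	return !p->runtime_forbidden;
}

/* Runtime wakeup is implied by runtime PM, if the device supports it. */
bool policy_runtime_wakeup(const struct dev_pm_policy *p)
{
	return policy_runtime_pm_allowed(p) && p->runtime_wakeup_capable;
}

/*
 * Called before a system sleep.  If the device was runtime suspended with
 * wakeup armed but the user forbids system wakeup, resume it so that the
 * wakeup setting can be changed.  Returns true if a resume was needed.
 */
bool policy_prepare_system_sleep(struct dev_pm_policy *p)
{
	if (p->runtime_suspended && p->wake_armed && !p->should_wakeup) {
		p->runtime_suspended = false; /* resume the device first */
		p->wake_armed = false;        /* then disable its wakeup */
		return true;
	}
	return false;
}
```

With this split, the sysfs flag only gates runtime PM, and runtime wakeup
needs no separate user-visible switch.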

Thanks,
Rafael

* Re: [RFC] PCI: Runtime power management
From: Rafael J. Wysocki @ 2009-08-14 21:22 UTC (permalink / raw)
  To: Matthew Garrett
  Cc: Alan Stern, Greg KH, LKML, Linux-pm mailing list, linux-pci, linux-usb

Hi,

First of all, thanks a lot for looking into this!

On Thursday 13 August 2009, Matthew Garrett wrote:
> I got a fixed BIOS from Dell and have been able to get this working now. 
> It seems entirely happy with USB, but I'd like some sanity checks on 
> whether I'm doing this correctly. There's certainly a couple of quirks 
> related to setting the ACPI GPE type that would need a little bit of 
> work in the ACPI layer, and it breaks ACPI-mediated PCI hotplug though 
> that's easy enough to fix by just calling into the hotplug code from the 
> core notifier.
> 
> This patch builds on top of Rafael's work on systemwide runtime power
> management. It supports suspending and resuming PCI devices at runtime,
> enabling platform wakeup events that allow the devices to automatically
> resume when appropriate. It currently requires platform support, but PCIe
> setups could be supported natively once native PCIe PME code has been added
> to the kernel.

Do you have any prototypes for that?  I started working on it some time ago,
but then I focused on the core runtime PM framework.

> ---
>  drivers/pci/pci-acpi.c   |   55 +++++++++++++++++++++++++
>  drivers/pci/pci-driver.c |  100 ++++++++++++++++++++++++++++++++++++++++++++++
>  drivers/pci/pci.c        |   87 ++++++++++++++++++++++++++++++++++++++++
>  drivers/pci/pci.h        |    3 +
>  include/linux/pci.h      |    3 +
>  5 files changed, 248 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
> index ea15b05..a98a777 100644
> --- a/drivers/pci/pci-acpi.c
> +++ b/drivers/pci/pci-acpi.c
> @@ -12,6 +12,7 @@
>  #include <linux/pci.h>
>  #include <linux/module.h>
>  #include <linux/pci-aspm.h>
> +#include <linux/pm_runtime.h>
>  #include <acpi/acpi.h>
>  #include <acpi/acpi_bus.h>
>  
> @@ -120,14 +121,62 @@ static int acpi_pci_sleep_wake(struct pci_dev *dev, bool enable)
>  	return error;
>  }
>  
> +static int acpi_pci_runtime_wake(struct pci_dev *dev, bool enable)
> +{
> +	acpi_status status;
> +	acpi_handle handle = DEVICE_ACPI_HANDLE(&dev->dev);
> +	struct acpi_device *acpi_dev;
> +

Hm, I'd move that into ACPI as

int acpi_runtime_wake_enable(acpi_handle handle, bool enable)

in which form it could also be useful for non-PCI devices.
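For illustration, a userspace mock of what a bus-agnostic version of the
helper suggested above might look like.  Every type and name here
(mock_gpe, mock_acpi_device, mock_acpi_runtime_wake_enable) is hypothetical
and only mirrors the enable/disable branches of the patch; real code would
go through the ACPICA GPE functions and an acpi_handle instead.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock stand-ins for ACPI objects; these are NOT the real ACPICA types. */
enum mock_gpe_type { MOCK_GPE_TYPE_WAKE, MOCK_GPE_TYPE_WAKE_RUN };

struct mock_gpe {
	enum mock_gpe_type type;
	bool enabled;
};

struct mock_acpi_device {
	struct mock_gpe *wakeup_gpe; /* NULL if firmware lists no wake GPE */
};

/*
 * Hypothetical helper that needs only the ACPI device, not a pci_dev,
 * so PCI and non-PCI devices could share it.
 */
int mock_acpi_runtime_wake_enable(struct mock_acpi_device *adev, bool enable)
{
	if (!adev || !adev->wakeup_gpe)
		return -ENODEV;

	if (enable) {
		/* Runtime wake: the GPE must also fire while the system runs. */
		adev->wakeup_gpe->type = MOCK_GPE_TYPE_WAKE_RUN;
		adev->wakeup_gpe->enabled = true;
	} else {
		/* Back to wake-only (system sleep) and disabled at run time. */
		adev->wakeup_gpe->type = MOCK_GPE_TYPE_WAKE;
		adev->wakeup_gpe->enabled = false;
	}
	return 0;
}
```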

> +	if (!handle)
> +		return -ENODEV;
> +
> +	status = acpi_bus_get_device(handle, &acpi_dev);
> +	if (ACPI_FAILURE(status))
> +		return -ENODEV;
> +
> +	if (enable) {
> +		acpi_set_gpe_type(acpi_dev->wakeup.gpe_device,
> +				  acpi_dev->wakeup.gpe_number,
> +				  ACPI_GPE_TYPE_WAKE_RUN);
> +		acpi_enable_gpe(acpi_dev->wakeup.gpe_device,
> +				acpi_dev->wakeup.gpe_number);
> +	} else {
> +		acpi_set_gpe_type(acpi_dev->wakeup.gpe_device,
> +				  acpi_dev->wakeup.gpe_number,
> +				  ACPI_GPE_TYPE_WAKE);
> +		acpi_disable_gpe(acpi_dev->wakeup.gpe_device,
> +				 acpi_dev->wakeup.gpe_number);
> +	}
> +	return 0;
> +}

Ah, that's the part I've always been missing!

How exactly do we figure out which GPE is a wake-up one for a given device?
IOW, how are the wakeup.gpe_device and wakeup.gpe_number fields populated?

> +
> +
>  static struct pci_platform_pm_ops acpi_pci_platform_pm = {
>  	.is_manageable = acpi_pci_power_manageable,
>  	.set_state = acpi_pci_set_power_state,
>  	.choose_state = acpi_pci_choose_state,
>  	.can_wakeup = acpi_pci_can_wakeup,
>  	.sleep_wake = acpi_pci_sleep_wake,
> +	.runtime_wake = acpi_pci_runtime_wake,
>  };
>  
> +static void pci_device_notify(acpi_handle handle, u32 event, void *data)
> +{
> +	struct device *dev = data;
> +
> +	if (event == ACPI_NOTIFY_DEVICE_WAKE)
> +		pm_runtime_resume(dev);
> +}
> +
> +static void pci_root_bridge_notify(acpi_handle handle, u32 event, void *data)
> +{
> +	struct device *dev = data;
> +	struct pci_dev *pci_dev = to_pci_dev(dev);
> +
> +	if (event == ACPI_NOTIFY_DEVICE_WAKE)
> +		pci_bus_pme_event(pci_dev);
> +}
> +
>  /* ACPI bus type */
>  static int acpi_pci_find_device(struct device *dev, acpi_handle *handle)
>  {
> @@ -140,6 +189,9 @@ static int acpi_pci_find_device(struct device *dev, acpi_handle *handle)
>  	*handle = acpi_get_child(DEVICE_ACPI_HANDLE(dev->parent), addr);
>  	if (!*handle)
>  		return -ENODEV;
> +
> +	acpi_install_notify_handler(*handle, ACPI_SYSTEM_NOTIFY,
> +				    pci_device_notify, dev);
>  	return 0;
>  }
>  
> @@ -158,6 +210,9 @@ static int acpi_pci_find_root_bridge(struct device *dev, acpi_handle *handle)
>  	*handle = acpi_get_pci_rootbridge_handle(seg, bus);
>  	if (!*handle)
>  		return -ENODEV;
> +
> +	acpi_install_notify_handler(*handle, ACPI_SYSTEM_NOTIFY,
> +				    pci_root_bridge_notify, dev);
>  	return 0;
>  }
>  
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index d76c4c8..1f605d8 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -11,12 +11,14 @@
>  #include <linux/pci.h>
>  #include <linux/module.h>
>  #include <linux/init.h>
> +#include <linux/interrupt.h>
>  #include <linux/device.h>
>  #include <linux/mempolicy.h>
>  #include <linux/string.h>
>  #include <linux/slab.h>
>  #include <linux/sched.h>
>  #include <linux/cpu.h>
> +#include <linux/pm_runtime.h>
>  #include "pci.h"
>  
>  /*
> @@ -910,6 +912,101 @@ static int pci_pm_restore(struct device *dev)
>  
>  #endif /* !CONFIG_HIBERNATION */
>  
> +#ifdef CONFIG_PM_RUNTIME
> +
> +static int pci_pm_runtime_suspend(struct device *dev)
> +{
> +	struct pci_dev *pci_dev = to_pci_dev(dev);
> +	struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
> +	int error;
> +
> +	device_set_wakeup_enable(dev, 1);
> +	error = pci_enable_runtime_wake(pci_dev, true);
> +
> +	if (error)
> +		return -EBUSY;
> +
> +	if (pm && pm->runtime_suspend)
> +		error = pm->runtime_suspend(dev);
> +
> +	if (error)
> +		goto out;
> +
> +	error = pci_pm_suspend(dev);

This is likely to be confusing, IMO.  pci_pm_suspend() calls the driver's
->suspend() routine, which is specific to suspend to RAM.  So, this means
that drivers are supposed to implement ->runtime_suspend() only if they
want to do something _in_ _addition_ to the things done by
->suspend() and ->suspend_noirq().

> +
> +	if (error)
> +		goto resume;
> +
> +	disable_irq(pci_dev->irq);

I don't really think it's necessary to disable the interrupt here.  We prevent
drivers from receiving interrupts while pci_pm_suspend_noirq() is being run
during system-wide power transitions to protect them from receiving "alien"
interrupts they might be unable to handle, but in the runtime case I think the
driver should take care of protecting itself from that.

The generic part of pci_pm_suspend_noirq() doesn't do things that require
interrupts to be disabled.  The driver's ->suspend_noirq() may do such things
in principle, but then I'd prefer it not to be called directly from here.

IMO pci_pm_runtime_suspend() should work like a combination of
pci_pm_suspend() and pci_pm_suspend_noirq(), except that instead of executing
two driver callbacks it should call only one, ->runtime_suspend(), without
disabling the device interrupt.

If the driver has to disable its interrupt in ->runtime_suspend(), it should do
that by itself.

Of course, analogous comments apply to pci_pm_runtime_resume() below.
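A userspace sketch of the flow suggested above, with mock steps recorded in a
trace so the ordering is visible.  All names are hypothetical, and the order
of the generic steps (save state, arm wakeup, enter low power) is an
assumption rather than something settled in this thread.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Records the order in which the (mock) steps run. */
static char trace[128];

static void step(const char *name)
{
	if (trace[0])
		strncat(trace, ",", sizeof(trace) - strlen(trace) - 1);
	strncat(trace, name, sizeof(trace) - strlen(trace) - 1);
}

/* Hypothetical driver ops: a single runtime_suspend hook. */
struct mock_pm_ops {
	int (*runtime_suspend)(void);
};

/* Hypothetical stand-ins for the generic PCI work. */
static void mock_save_state(void)    { step("save_state"); }
static void mock_enable_wake(void)   { step("wake_on"); }
static void mock_set_low_power(void) { step("low_power"); }

/*
 * Suggested shape: call one driver callback with the device interrupt
 * left enabled (a driver that needs it masked does that itself), then do
 * the generic part of the suspend and suspend_noirq phases.
 */
int mock_pci_pm_runtime_suspend(const struct mock_pm_ops *pm)
{
	if (pm && pm->runtime_suspend) {
		int error = pm->runtime_suspend();

		if (error)
			return error; /* device stays fully operational */
	}

	mock_save_state();
	mock_enable_wake();
	mock_set_low_power();
	return 0;
}

/* Example driver callbacks. */
int drv_ok(void)   { step("driver"); return 0; }
int drv_busy(void) { step("driver"); return -EBUSY; }
```

Because the driver callback runs first, a driver that refuses to suspend
leaves nothing in the generic path to unwind.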

> +	error = pci_pm_suspend_noirq(dev);
> +	enable_irq(pci_dev->irq);
> +
> +	if (error)
> +		goto resume_noirq;
> +
> +	return 0;
> +
> +resume_noirq:
> +	disable_irq(pci_dev->irq);
> +	pci_pm_resume_noirq(dev);
> +	enable_irq(pci_dev->irq);
> +resume:
> +	pci_pm_resume(dev);
> +out:
> +	pci_enable_runtime_wake(pci_dev, false);
> +	return error;
> +}
> +
> +static int pci_pm_runtime_resume(struct device *dev)
> +{
> +	struct pci_dev *pci_dev = to_pci_dev(dev);
> +	struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
> +	int error = 0;
> +
> +	disable_irq(pci_dev->irq);
> +	error = pci_pm_resume_noirq(dev);
> +	enable_irq(pci_dev->irq);
> +
> +	if (error)
> +		return error;
> +
> +	error = pci_pm_resume(dev);
> +
> +	if (error)
> +		return error;
> +
> +	if (pm->runtime_resume)
> +		error = pm->runtime_resume(dev);
> +
> +	if (error)
> +		return error;
> +
> +	error = pci_enable_runtime_wake(pci_dev, false);
> +
> +	if (error)
> +		return error;
> +
> +	return 0;
> +}
> +
> +static void pci_pm_runtime_idle(struct device *dev)
> +{
> +	struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
> +
> +	if (pm && pm->runtime_idle)
> +		pm->runtime_idle(dev);
> +
> +	pm_schedule_suspend(dev, 0);
> +}

That has already been discussed.

> +
> +#else /* !CONFIG_PM_RUNTIME */
> +
> +#define pci_pm_runtime_suspend	NULL
> +#define pci_pm_runtime_resume	NULL
> +#define pci_pm_runtime_idle	NULL
> +
> +#endif
> +
>  struct dev_pm_ops pci_dev_pm_ops = {
>  	.prepare = pci_pm_prepare,
>  	.complete = pci_pm_complete,
> @@ -925,6 +1022,9 @@ struct dev_pm_ops pci_dev_pm_ops = {
>  	.thaw_noirq = pci_pm_thaw_noirq,
>  	.poweroff_noirq = pci_pm_poweroff_noirq,
>  	.restore_noirq = pci_pm_restore_noirq,
> +	.runtime_suspend = pci_pm_runtime_suspend,
> +	.runtime_resume = pci_pm_runtime_resume,
> +	.runtime_idle = pci_pm_runtime_idle,
>  };
>  
>  #define PCI_PM_OPS_PTR	(&pci_dev_pm_ops)
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index dbd0f94..ab3a116 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -18,6 +18,7 @@
>  #include <linux/log2.h>
>  #include <linux/pci-aspm.h>
>  #include <linux/pm_wakeup.h>
> +#include <linux/pm_runtime.h>
>  #include <linux/interrupt.h>
>  #include <asm/dma.h>	/* isa_dma_bridge_buggy */
>  #include <linux/device.h>
> @@ -428,6 +429,12 @@ static inline int platform_pci_sleep_wake(struct pci_dev *dev, bool enable)
>  			pci_platform_pm->sleep_wake(dev, enable) : -ENODEV;
>  }
>  
> +static inline int platform_pci_runtime_wake(struct pci_dev *dev, bool enable)
> +{
> +	return pci_platform_pm ?
> +			pci_platform_pm->runtime_wake(dev, enable) : -ENODEV;
> +}
> +
>  /**
>   * pci_raw_set_power_state - Use PCI PM registers to set the power state of
>   *                           given PCI device
> @@ -1239,6 +1246,38 @@ int pci_enable_wake(struct pci_dev *dev, pci_power_t state, bool enable)
>  }
>  
>  /**
> + * pci_enable_runtime_wake - enable PCI device as runtime wakeup event source
> + * @dev: PCI device affected
> + * @enable: True to enable event generation; false to disable
> + *
> + * This enables the device as a runtime wakeup event source, or disables it.
> + * This typically requires platform support.
> + *
> + * RETURN VALUE:
> + * 0 is returned on success
> + * -EINVAL is returned if device is not supposed to wake up the system
> + * -ENODEV is returned if platform cannot support runtime PM on the device
> + */
> +int pci_enable_runtime_wake(struct pci_dev *dev, bool enable)
> +{
> +	int error = 0;
> +	bool pme_done = false;
> +
> +	if (!enable && platform_pci_can_wakeup(dev))
> +		error = platform_pci_runtime_wake(dev, false);
> +
> +	if (!enable || pci_pme_capable(dev, PCI_D3hot)) {
> +		pci_pme_active(dev, enable);
> +		pme_done = true;
> +	}

I don't really follow your intention here.  The condition means that PME is
going to be enabled unless 'enable' is set and the device is not capable
of generating PMEs.  However, if 'enable' is unset, we're still going to try
to enable the PME, even if the device can't generate it.  Shouldn't that
be

if (enable && pci_pme_capable(dev, PCI_D3hot)) ?

Also, that assumes the device is going to be put into D3_hot, but do we know
that for sure?
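To make the two conditions easy to compare, here they are as plain boolean
functions deciding whether pci_pme_active() gets called.  The function names
are hypothetical; pme_capable stands for the result of
pci_pme_capable(dev, PCI_D3hot).

```c
#include <stdbool.h>

/* Condition as written in the patch. */
bool patch_touches_pme(bool enable, bool pme_capable)
{
	return !enable || pme_capable;
}

/* Condition suggested in the review. */
bool suggested_touches_pme(bool enable, bool pme_capable)
{
	return enable && pme_capable;
}
```

The two differ only when 'enable' is false, which is the case the question
above is about.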

> +
> +	if (enable && platform_pci_can_wakeup(dev))
> +		error = platform_pci_runtime_wake(dev, true);
> +
> +	return pme_done ? 0 : error;
> +}

I have no comments on the part below.

> +/**
>   * pci_wake_from_d3 - enable/disable device to wake up from D3_hot or D3_cold
>   * @dev: PCI device to prepare
>   * @enable: True to enable wake-up event generation; false to disable
> @@ -1346,6 +1385,54 @@ int pci_back_from_sleep(struct pci_dev *dev)
>  }
>  
>  /**
> + * pci_dev_pme_event - check if a device has a pending pme
> + *
> + * @dev: Device to handle.
> + */
> +
> +int pci_dev_pme_event(struct pci_dev *dev)
> +{
> +	u16 pmcsr;
> +
> +	if (!dev->pm_cap)
> +		return -ENODEV;
> +
> +	pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
> +
> +	if (pmcsr & PCI_PM_CTRL_PME_STATUS) {
> +		pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr);
> +		pm_runtime_get(&dev->dev);
> +		return 0;
> +	}
> +
> +	return -ENODEV;
> +}
> +
> +/**
> + * pci_bus_pme_event - search for subordinate devices with a pending
> + *		   pme and handle them
> + *
> + * @dev: Parent device to handle
> + */
> +int pci_bus_pme_event(struct pci_dev *dev)
> +{
> +	struct pci_bus *bus;
> +	struct pci_dev *pdev;
> +
> +	if (pci_is_root_bus(dev->bus))
> +		bus = dev->bus;
> +	else if (dev->subordinate)
> +		bus = dev->subordinate;
> +	else
> +		return -ENODEV;
> +
> +	list_for_each_entry(pdev, &bus->devices, bus_list)
> +		pci_dev_pme_event(pdev);
> +
> +	return 0;
> +}
> +
> +/**
>   * pci_pm_init - Initialize PM functions of given PCI device
>   * @dev: PCI device to handle.
>   */
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index f73bcbe..a81aff2 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -34,6 +34,8 @@ extern int pci_mmap_fits(struct pci_dev *pdev, int resno,
>   *
>   * @sleep_wake: enables/disables the system wake up capability of given device
>   *
> + * @runtime_wake: enables/disables the runtime wakeup capability of given device
> + *
>   * If given platform is generally capable of power managing PCI devices, all of
>   * these callbacks are mandatory.
>   */
> @@ -43,6 +45,7 @@ struct pci_platform_pm_ops {
>  	pci_power_t (*choose_state)(struct pci_dev *dev);
>  	bool (*can_wakeup)(struct pci_dev *dev);
>  	int (*sleep_wake)(struct pci_dev *dev, bool enable);
> +	int (*runtime_wake)(struct pci_dev *dev, bool enable);
>  };
>  
>  extern int pci_set_platform_pm(struct pci_platform_pm_ops *ops);
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 115fb7b..8a3fea0 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -734,10 +734,13 @@ pci_power_t pci_choose_state(struct pci_dev *dev, pm_message_t state);
>  bool pci_pme_capable(struct pci_dev *dev, pci_power_t state);
>  void pci_pme_active(struct pci_dev *dev, bool enable);
>  int pci_enable_wake(struct pci_dev *dev, pci_power_t state, bool enable);
> +int pci_enable_runtime_wake(struct pci_dev *dev, bool enable);
>  int pci_wake_from_d3(struct pci_dev *dev, bool enable);
>  pci_power_t pci_target_state(struct pci_dev *dev);
>  int pci_prepare_to_sleep(struct pci_dev *dev);
>  int pci_back_from_sleep(struct pci_dev *dev);
> +int pci_dev_pme_event(struct pci_dev *dev);
> +int pci_bus_pme_event(struct pci_dev *dev);
>  
>  /* Functions for PCI Hotplug drivers to use */
>  int pci_bus_find_capability(struct pci_bus *bus, unsigned int devfn, int cap);

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC] PCI: Runtime power management
  2009-08-13  0:29   ` Matthew Garrett
                       ` (6 preceding siblings ...)
  2009-08-14 21:22     ` Rafael J. Wysocki
@ 2009-08-14 21:22     ` Rafael J. Wysocki
  7 siblings, 0 replies; 90+ messages in thread
From: Rafael J. Wysocki @ 2009-08-14 21:22 UTC (permalink / raw)
  To: Matthew Garrett
  Cc: linux-usb, linux-pci, Greg KH, LKML, Linux-pm mailing list

Hi,

First of all, thanks a lot for looking into this!

On Thursday 13 August 2009, Matthew Garrett wrote:
> I got a fixed BIOS from Dell and have been able to get this working now. 
> It seems entirely happy with USB, but I'd like some sanity checks on 
> whether I'm doing this correctly. There's certainly a couple of quirks 
> related to setting the ACPI GPE type that would need a little bit of 
> work in the ACPI layer, and it breaks ACPI-mediated PCI hotplug though 
> that's easy enough to fix by just calling into the hotplug code from the 
> core notifier.
> 
> This patch builds on top of Rafael's work on systemwide runtime power
> management. It supports suspending and resuming PCI devices at runtime,
> enabling platform wakeup events that allow the devices to automatically
> resume when appropriate. It currently requires platform support, but PCIe
> setups could be supported natively once native PCIe PME code has been added
> to the kernel.

Do you have any prototypes for that?  I started working on it some time ago,
but then I focused on the core runtime PM framework.

> ---
>  drivers/pci/pci-acpi.c   |   55 +++++++++++++++++++++++++
>  drivers/pci/pci-driver.c |  100 ++++++++++++++++++++++++++++++++++++++++++++++
>  drivers/pci/pci.c        |   87 ++++++++++++++++++++++++++++++++++++++++
>  drivers/pci/pci.h        |    3 +
>  include/linux/pci.h      |    3 +
>  5 files changed, 248 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
> index ea15b05..a98a777 100644
> --- a/drivers/pci/pci-acpi.c
> +++ b/drivers/pci/pci-acpi.c
> @@ -12,6 +12,7 @@
>  #include <linux/pci.h>
>  #include <linux/module.h>
>  #include <linux/pci-aspm.h>
> +#include <linux/pm_runtime.h>
>  #include <acpi/acpi.h>
>  #include <acpi/acpi_bus.h>
>  
> @@ -120,14 +121,62 @@ static int acpi_pci_sleep_wake(struct pci_dev *dev, bool enable)
>  	return error;
>  }
>  
> +static int acpi_pci_runtime_wake(struct pci_dev *dev, bool enable)
> +{
> +	acpi_status status;
> +	acpi_handle handle = DEVICE_ACPI_HANDLE(&dev->dev);
> +	struct acpi_device *acpi_dev;
> +

Hm, I'd move that into ACPI as

int acp_runtime_wake_enable(acpi_handle handle, bool enable)

in which form it could also be useful to non-PCI devices.

> +	if (!handle)
> +		return -ENODEV;
> +
> +	status = acpi_bus_get_device(handle, &acpi_dev);
> +	if (ACPI_FAILURE(status))
> +		return -ENODEV;
> +
> +	if (enable) {
> +		acpi_set_gpe_type(acpi_dev->wakeup.gpe_device,
> +				  acpi_dev->wakeup.gpe_number,
> +				  ACPI_GPE_TYPE_WAKE_RUN);
> +		acpi_enable_gpe(acpi_dev->wakeup.gpe_device,
> +				acpi_dev->wakeup.gpe_number);
> +	} else {
> +		acpi_set_gpe_type(acpi_dev->wakeup.gpe_device,
> +				  acpi_dev->wakeup.gpe_number,
> +				  ACPI_GPE_TYPE_WAKE);
> +		acpi_disable_gpe(acpi_dev->wakeup.gpe_device,
> +				 acpi_dev->wakeup.gpe_number);
> +	}
> +	return 0;
> +}

Ah, that's the part I've always been missing!

How exactly do we figure out which GPE is a wake-up one for given device?
IOW, how are the wakeup.gpe_device and wakeup.gpe_number fields populated?

> +
> +
>  static struct pci_platform_pm_ops acpi_pci_platform_pm = {
>  	.is_manageable = acpi_pci_power_manageable,
>  	.set_state = acpi_pci_set_power_state,
>  	.choose_state = acpi_pci_choose_state,
>  	.can_wakeup = acpi_pci_can_wakeup,
>  	.sleep_wake = acpi_pci_sleep_wake,
> +	.runtime_wake = acpi_pci_runtime_wake,
>  };
>  
> +static void pci_device_notify(acpi_handle handle, u32 event, void *data)
> +{
> +	struct device *dev = data;
> +
> +	if (event == ACPI_NOTIFY_DEVICE_WAKE)
> +		pm_runtime_resume(dev);
> +}
> +
> +static void pci_root_bridge_notify(acpi_handle handle, u32 event, void *data)
> +{
> +	struct device *dev = data;
> +	struct pci_dev *pci_dev = to_pci_dev(dev);
> +
> +	if (event == ACPI_NOTIFY_DEVICE_WAKE)
> +		pci_bus_pme_event(pci_dev);
> +}
> +
>  /* ACPI bus type */
>  static int acpi_pci_find_device(struct device *dev, acpi_handle *handle)
>  {
> @@ -140,6 +189,9 @@ static int acpi_pci_find_device(struct device *dev, acpi_handle *handle)
>  	*handle = acpi_get_child(DEVICE_ACPI_HANDLE(dev->parent), addr);
>  	if (!*handle)
>  		return -ENODEV;
> +
> +	acpi_install_notify_handler(*handle, ACPI_SYSTEM_NOTIFY,
> +				    pci_device_notify, dev);
>  	return 0;
>  }
>  
> @@ -158,6 +210,9 @@ static int acpi_pci_find_root_bridge(struct device *dev, acpi_handle *handle)
>  	*handle = acpi_get_pci_rootbridge_handle(seg, bus);
>  	if (!*handle)
>  		return -ENODEV;
> +
> +	acpi_install_notify_handler(*handle, ACPI_SYSTEM_NOTIFY,
> +				    pci_root_bridge_notify, dev);
>  	return 0;
>  }
>  
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index d76c4c8..1f605d8 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -11,12 +11,14 @@
>  #include <linux/pci.h>
>  #include <linux/module.h>
>  #include <linux/init.h>
> +#include <linux/interrupt.h>
>  #include <linux/device.h>
>  #include <linux/mempolicy.h>
>  #include <linux/string.h>
>  #include <linux/slab.h>
>  #include <linux/sched.h>
>  #include <linux/cpu.h>
> +#include <linux/pm_runtime.h>
>  #include "pci.h"
>  
>  /*
> @@ -910,6 +912,101 @@ static int pci_pm_restore(struct device *dev)
>  
>  #endif /* !CONFIG_HIBERNATION */
>  
> +#ifdef CONFIG_PM_RUNTIME
> +
> +static int pci_pm_runtime_suspend(struct device *dev)
> +{
> +	struct pci_dev *pci_dev = to_pci_dev(dev);
> +	struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
> +	int error;
> +
> +	device_set_wakeup_enable(dev, 1);
> +	error = pci_enable_runtime_wake(pci_dev, true);
> +
> +	if (error)
> +		return -EBUSY;
> +
> +	if (pm && pm->runtime_suspend)
> +		error = pm->runtime_suspend(dev);
> +
> +	if (error)
> +		goto out;
> +
> +	error = pci_pm_suspend(dev);

This has a chance to be confusing IMO.  pci_pm_suspend() calls the driver's
->suspend() routine, which is specific to suspend to RAM.  So, this means
that drivers are supposed to implement ->runtime_suspend() only if they
want to do something _in_ _addition_ to the things done by
->suspend() and ->suspend_noirq().

> +
> +	if (error)
> +		goto resume;
> +
> +	disable_irq(pci_dev->irq);

I don't really think it's necessary to disable the interrupt here.  We prevent
drivers from receiving interrupts while pci_pm_suspend_noirq() is being run
during system-wide power transitions to protect them from receiving "alien"
interrupts they might be unable to handle, but in the runtime case I think the
driver should take care of protecting itself from that.

The generic part of pci_pm_suspend_noirq() doesn't do things that require
interrupts to be disabled.  The driver's ->suspend_noirq() may do such things
in principle, but then I'd prefer it not to be called directly from here.

IMO pci_pm_runtime_suspend() should work like a combination of
pci_pm_suspend() and pci_pm_suspend_noirq(), except that instead of executing
two driver callbacks it will only call one ->runtime_suspend() callback without
disabling the device interrupt.

If the driver has to disable its interrupt in ->runtime_suspend(), it should do
that by itself.

Of course, analogous comment apply to pci_pm_runtime_resume() below.

> +	error = pci_pm_suspend_noirq(dev);
> +	enable_irq(pci_dev->irq);
> +
> +	if (error)
> +		goto resume_noirq;
> +
> +	return 0;
> +
> +resume_noirq:
> +	disable_irq(pci_dev->irq);
> +	pci_pm_resume_noirq(dev);
> +	enable_irq(pci_dev->irq);
> +resume:
> +	pci_pm_resume(dev);
> +out:
> +	pci_enable_runtime_wake(pci_dev, false);
> +	return error;
> +}
> +
> +static int pci_pm_runtime_resume(struct device *dev)
> +{
> +	struct pci_dev *pci_dev = to_pci_dev(dev);
> +	struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
> +	int error = 0;
> +
> +	disable_irq(pci_dev->irq);
> +	error = pci_pm_resume_noirq(dev);
> +	enable_irq(pci_dev->irq);
> +
> +	if (error)
> +		return error;
> +
> +	error = pci_pm_resume(dev);
> +
> +	if (error)
> +		return error;
> +
> +	if (pm->runtime_resume)
> +		error = pm->runtime_resume(dev);
> +
> +	if (error)
> +		return error;
> +
> +	error = pci_enable_runtime_wake(pci_dev, false);
> +
> +	if (error)
> +		return error;
> +
> +	return 0;
> +}
> +
> +static void pci_pm_runtime_idle(struct device *dev)
> +{
> +	struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
> +
> +	if (pm && pm->runtime_idle)
> +		pm->runtime_idle(dev);
> +
> +	pm_schedule_suspend(dev, 0);
> +}

That has already been discussed.

> +
> +#else /* !CONFIG_PM_RUNTIME */
> +
> +#define pci_pm_runtime_suspend	NULL
> +#define pci_pm_runtime_resume	NULL
> +#define pci_pm_runtime_idle	NULL
> +
> +#endif
> +
>  struct dev_pm_ops pci_dev_pm_ops = {
>  	.prepare = pci_pm_prepare,
>  	.complete = pci_pm_complete,
> @@ -925,6 +1022,9 @@ struct dev_pm_ops pci_dev_pm_ops = {
>  	.thaw_noirq = pci_pm_thaw_noirq,
>  	.poweroff_noirq = pci_pm_poweroff_noirq,
>  	.restore_noirq = pci_pm_restore_noirq,
> +	.runtime_suspend = pci_pm_runtime_suspend,
> +	.runtime_resume = pci_pm_runtime_resume,
> +	.runtime_idle = pci_pm_runtime_idle,
>  };
>  
>  #define PCI_PM_OPS_PTR	(&pci_dev_pm_ops)
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index dbd0f94..ab3a116 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -18,6 +18,7 @@
>  #include <linux/log2.h>
>  #include <linux/pci-aspm.h>
>  #include <linux/pm_wakeup.h>
> +#include <linux/pm_runtime.h>
>  #include <linux/interrupt.h>
>  #include <asm/dma.h>	/* isa_dma_bridge_buggy */
>  #include <linux/device.h>
> @@ -428,6 +429,12 @@ static inline int platform_pci_sleep_wake(struct pci_dev *dev, bool enable)
>  			pci_platform_pm->sleep_wake(dev, enable) : -ENODEV;
>  }
>  
> +static inline int platform_pci_runtime_wake(struct pci_dev *dev, bool enable)
> +{
> +	return pci_platform_pm ?
> +			pci_platform_pm->runtime_wake(dev, enable) : -ENODEV;
> +}
> +
>  /**
>   * pci_raw_set_power_state - Use PCI PM registers to set the power state of
>   *                           given PCI device
> @@ -1239,6 +1246,38 @@ int pci_enable_wake(struct pci_dev *dev, pci_power_t state, bool enable)
>  }
>  
>  /**
> + * pci_enable_runtime_wake - enable PCI device as runtime wakeup event source
> + * @dev: PCI device affected
> + * @enable: True to enable event generation; false to disable
> + *
> + * This enables the device as a runtime wakeup event source, or disables it.
> + * This typically requires platform support.
> + *
> + * RETURN VALUE:
> + * 0 is returned on success
> + * -EINVAL is returned if device is not supposed to wake up the system
> + * -ENODEV is returned if platform cannot support runtime PM on the device
> + */
> +int pci_enable_runtime_wake(struct pci_dev *dev, bool enable)
> +{
> +	int error = 0;
> +	bool pme_done = false;
> +
> +	if (!enable && platform_pci_can_wakeup(dev))
> +		error = platform_pci_runtime_wake(dev, false);
> +
> +	if (!enable || pci_pme_capable(dev, PCI_D3hot)) {
> +		pci_pme_active(dev, enable);
> +		pme_done = true;
> +	}

I don't really follow your intention here.  The condition means that PME is
going to be enabled unless 'enable' is set and the device is not capable
of generating PMEs.  However, if 'enable' is unset, we're still going to try
to enable the PME, even if the device can't generate it.  Shouldn't that
be

if (enable && pci_pme_capable(dev, PCI_D3hot)) ?

Also, that assumes the device is going to be put into D3_hot, but do we know
that for sure?

> +
> +	if (enable && platform_pci_can_wakeup(dev))
> +		error = platform_pci_runtime_wake(dev, true);
> +
> +	return pme_done ? 0 : error;
> +}

I have no comments on the part below.

> +/**
>   * pci_wake_from_d3 - enable/disable device to wake up from D3_hot or D3_cold
>   * @dev: PCI device to prepare
>   * @enable: True to enable wake-up event generation; false to disable
> @@ -1346,6 +1385,54 @@ int pci_back_from_sleep(struct pci_dev *dev)
>  }
>  
>  /**
> + * pci_dev_pme_event - check if a device has a pending pme
> + *
> + * @dev: Device to handle.
> + */
> +
> +int pci_dev_pme_event(struct pci_dev *dev)
> +{
> +	u16 pmcsr;
> +
> +	if (!dev->pm_cap)
> +		return -ENODEV;
> +
> +	pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
> +
> +	if (pmcsr & PCI_PM_CTRL_PME_STATUS) {
> +		pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr);
> +		pm_runtime_get(&dev->dev);
> +		return 0;
> +	}
> +
> +	return -ENODEV;
> +}
> +
> +/**
> + * pci_bus_pme_event - search for subordinate devices with a pending
> + *		   pme and handle them
> + *
> + * @dev: Parent device to handle
> + */
> +int pci_bus_pme_event(struct pci_dev *dev)
> +{
> +	struct pci_bus *bus;
> +	struct pci_dev *pdev;
> +
> +	if (pci_is_root_bus(dev->bus))
> +		bus = dev->bus;
> +	else if (dev->subordinate)
> +		bus = dev->subordinate;
> +	else
> +		return -ENODEV;
> +
> +	list_for_each_entry(pdev, &bus->devices, bus_list)
> +		pci_dev_pme_event(pdev);
> +
> +	return 0;
> +}
> +
> +/**
>   * pci_pm_init - Initialize PM functions of given PCI device
>   * @dev: PCI device to handle.
>   */
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index f73bcbe..a81aff2 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -34,6 +34,8 @@ extern int pci_mmap_fits(struct pci_dev *pdev, int resno,
>   *
>   * @sleep_wake: enables/disables the system wake up capability of given device
>   *
> + * @runtime_wake: enables/disables the runtime wakeup capability of given device
> + *
>   * If given platform is generally capable of power managing PCI devices, all of
>   * these callbacks are mandatory.
>   */
> @@ -43,6 +45,7 @@ struct pci_platform_pm_ops {
>  	pci_power_t (*choose_state)(struct pci_dev *dev);
>  	bool (*can_wakeup)(struct pci_dev *dev);
>  	int (*sleep_wake)(struct pci_dev *dev, bool enable);
> +	int (*runtime_wake)(struct pci_dev *dev, bool enable);
>  };
>  
>  extern int pci_set_platform_pm(struct pci_platform_pm_ops *ops);
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 115fb7b..8a3fea0 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -734,10 +734,13 @@ pci_power_t pci_choose_state(struct pci_dev *dev, pm_message_t state);
>  bool pci_pme_capable(struct pci_dev *dev, pci_power_t state);
>  void pci_pme_active(struct pci_dev *dev, bool enable);
>  int pci_enable_wake(struct pci_dev *dev, pci_power_t state, bool enable);
> +int pci_enable_runtime_wake(struct pci_dev *dev, bool enable);
>  int pci_wake_from_d3(struct pci_dev *dev, bool enable);
>  pci_power_t pci_target_state(struct pci_dev *dev);
>  int pci_prepare_to_sleep(struct pci_dev *dev);
>  int pci_back_from_sleep(struct pci_dev *dev);
> +int pci_dev_pme_event(struct pci_dev *dev);
> +int pci_bus_pme_event(struct pci_dev *dev);
>  
>  /* Functions for PCI Hotplug drivers to use */
>  int pci_bus_find_capability(struct pci_bus *bus, unsigned int devfn, int cap);
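
A note on the write-back in pci_dev_pme_event(): it works because PME_Status is a write-1-to-clear bit, so writing back the PMCSR value just read clears a pending PME while leaving PME_En and the PowerState field as they were.  Below is a minimal userspace model of that behaviour.  It assumes only the PME_Status, PME_En and PowerState bits matter (other PMCSR bits are ignored), the struct and helper names are hypothetical, and only the mask values mirror the PCI_PM_CTRL_* constants:

```c
#include <stdbool.h>
#include <stdint.h>

/* Masks matching PCI_PM_CTRL_STATE_MASK, _PME_ENABLE and _PME_STATUS. */
#define PM_CTRL_STATE_MASK	0x0003u	/* PowerState field, read/write */
#define PM_CTRL_PME_ENABLE	0x0100u	/* PME_En, read/write */
#define PM_CTRL_PME_STATUS	0x8000u	/* PME_Status, write-1-to-clear */

/* Hypothetical model of the PMCSR config-space register. */
struct pmcsr_model {
	uint16_t reg;
};

/* Hardware-like write: read/write bits take the new value, PME_Status is
 * cleared if a 1 is written to it and kept otherwise.  Other bits are not
 * modelled and are dropped. */
static void pmcsr_write(struct pmcsr_model *m, uint16_t val)
{
	uint16_t rw = PM_CTRL_STATE_MASK | PM_CTRL_PME_ENABLE;
	uint16_t status = m->reg & PM_CTRL_PME_STATUS;

	if (val & PM_CTRL_PME_STATUS)
		status = 0;
	m->reg = (uint16_t)((val & rw) | status);
}

/* Same logic as pci_dev_pme_event(): if a PME is pending, clear it by
 * writing back the value just read and report it. */
static bool pme_check_and_clear(struct pmcsr_model *m)
{
	uint16_t pmcsr = m->reg;

	if (!(pmcsr & PM_CTRL_PME_STATUS))
		return false;
	pmcsr_write(m, pmcsr);
	return true;
}
```

Because the value written back carries the PowerState bits just read, the write does not change the device's power state.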

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [linux-pm] [RFC] PCI: Runtime power management
  2009-08-14 20:05             ` [linux-pm] " Rafael J. Wysocki
@ 2009-08-14 22:21               ` Matthew Garrett
  2009-08-15 14:18                 ` Rafael J. Wysocki
  2009-08-15 14:18                 ` [linux-pm] " Rafael J. Wysocki
  2009-08-14 22:21               ` Matthew Garrett
  1 sibling, 2 replies; 90+ messages in thread
From: Matthew Garrett @ 2009-08-14 22:21 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Alan Stern, linux-usb, linux-pci, Greg KH, LKML, Linux-pm mailing list

On Fri, Aug 14, 2009 at 10:05:19PM +0200, Rafael J. Wysocki wrote:

> Well, sometimes the user may want a device to be power managed at run time
> and not to be able to wake up the system from sleep states.  For example,
> I'd like the USB controller in my box to be suspended at run time whenever it's
> not used, but surely I wouldn't like it to do system-wide wakeup, because it
> does that when I move the mouse which is a cordless one.  Simply turning the
> mouse on causes the system to wake up. :-)

Right, so clearly my code is broken right now.

> Why don't we add a flag indicating whether or not the device is allowed to
> be power managed at run time, something like runtime_forbidden, that the
> user space will be able to set through sysfs?

I think even having a runtime_wakeup flag (which defaults to on) would 
be sufficient. If the worst-case scenario is that we have to resume 
devices in order to apply the correct policy when going into a 
systemwide suspend state, I think that's acceptable.

-- 
Matthew Garrett | mjg59@srcf.ucam.org

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC] PCI: Runtime power management
  2009-08-14 21:22     ` Rafael J. Wysocki
  2009-08-14 22:30       ` Matthew Garrett
@ 2009-08-14 22:30       ` Matthew Garrett
  2009-08-15 14:41         ` Rafael J. Wysocki
  2009-08-15 14:41         ` Rafael J. Wysocki
  1 sibling, 2 replies; 90+ messages in thread
From: Matthew Garrett @ 2009-08-14 22:30 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Alan Stern, Greg KH, LKML, Linux-pm mailing list, linux-pci, linux-usb

On Fri, Aug 14, 2009 at 11:22:27PM +0200, Rafael J. Wysocki wrote:

> Do you have any prototypes for that?  I started working on it some time ago,
> but then I focused on the core runtime PM framework.

The native PCIe PME code? There's some in the final patchset at 
http://bugzilla.kernel.org/show_bug.cgi?id=6892 but I haven't had time 
to look into merging that into the current kernel. I also don't have 
anything to test against, which makes life more awkward.

> > +static int acpi_pci_runtime_wake(struct pci_dev *dev, bool enable)
> > +{
> > +	acpi_status status;
> > +	acpi_handle handle = DEVICE_ACPI_HANDLE(&dev->dev);
> > +	struct acpi_device *acpi_dev;
> > +
> 
> Hm, I'd move that into ACPI as
> 
> int acp_runtime_wake_enable(acpi_handle handle, bool enable)
> 
> in which form it could also be useful to non-PCI devices.

Hm. Yeah, that's not too bad an idea.

> > +		acpi_disable_gpe(acpi_dev->wakeup.gpe_device,
> > +				 acpi_dev->wakeup.gpe_number);
> > +	}
> > +	return 0;
> > +}
> 
> Ah, that's the part I've always been missing!
> 
> How exactly do we figure out which GPE is a wake-up one for given device?
> IOW, how are the wakeup.gpe_device and wakeup.gpe_number fields populated?

There's a field in the ACPI device definition in the DSDT (the _PRW 
object) that defines the needed GPE and the deepest sleep state it can 
wake from.
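
To illustrate how the two pieces of _PRW data are used, here is a simplified sketch.  It assumes the GPE is given as a plain number (a real _PRW may instead reference a GPE block device) and ignores the power resource list.  struct prw_info and prw_can_wake_from() are hypothetical names.  Since S0 is the shallowest state, any device with a wake GPE passes the check for runtime (S0) wakeup:

```c
#include <stdbool.h>

/* System sleep states, S0 (working) through S5 (soft off). */
enum acpi_sstate { ACPI_S0, ACPI_S1, ACPI_S2, ACPI_S3, ACPI_S4, ACPI_S5 };

/* Hypothetical, simplified view of a device's _PRW data. */
struct prw_info {
	unsigned int gpe_number;		/* GPE that signals wake events */
	enum acpi_sstate deepest_sleep_state;	/* deepest state it can wake from */
};

/* A wake event can be delivered from state s only if s is no deeper than
 * the limit advertised in _PRW. */
static bool prw_can_wake_from(const struct prw_info *info, enum acpi_sstate s)
{
	return s <= info->deepest_sleep_state;
}
```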

> > +	error = pci_pm_suspend(dev);
> 
> This has a chance to be confusing IMO.  pci_pm_suspend() calls the driver's
> ->suspend() routine, which is specific to suspend to RAM.  So, this means
> that drivers are supposed to implement ->runtime_suspend() only if they
> want to do something _in_ _addition_ to the things done by
> ->suspend() and ->suspend_noirq().

Yes, that was how I'd planned it. An alternative would be for 
runtime_suspend to return a negative value if there's an error, 0 if the 
bus code should continue or a positive value if the runtime_suspend() 
call handles all of it and the bus code should just return immediately?
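
The proposed return-value convention could look roughly like this at the bus level.  All names here are hypothetical and the bus-side work is reduced to a counter; the point is only the three-way dispatch on the callback's result:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical convention: < 0 error (abort), 0 "bus code should
 * continue", > 0 "driver did everything, bus code returns at once". */
typedef int (*runtime_suspend_cb)(void *dev);

/* Stand-in for the generic bus work (saving state, choosing a low-power
 * state, arming wakeup); here it only counts how often it ran. */
static int bus_default_suspend_calls;

static void bus_default_suspend(void *dev)
{
	(void)dev;
	bus_default_suspend_calls++;
}

static int bus_runtime_suspend(void *dev, runtime_suspend_cb cb)
{
	int ret = cb ? cb(dev) : 0;

	if (ret < 0)
		return ret;		/* driver failed or refused: abort */
	if (ret > 0)
		return 0;		/* driver handled everything itself */

	bus_default_suspend(dev);	/* ret == 0: bus code finishes the job */
	return 0;
}

/* Example driver callbacks, one per case. */
static int drv_fails(void *dev)    { (void)dev; return -EBUSY; }
static int drv_partial(void *dev)  { (void)dev; return 0; }
static int drv_complete(void *dev) { (void)dev; return 1; }
```

One cost of this convention is that a positive return value, which most kernel callbacks never produce, now carries meaning, so a driver returning one by accident would silently skip the bus code.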

> > +	disable_irq(pci_dev->irq);
> 
> I don't really think it's necessary to disable the interrupt here.  We prevent
> drivers from receiving interrupts while pci_pm_suspend_noirq() is being run
> during system-wide power transitions to protect them from receiving "alien"
> interrupts they might be unable to handle, but in the runtime case I think the
> driver should take care of protecting itself from that.

That sounds fine. I didn't want to take a risk in that respect, but if 
we should be safe here I can just drop that.

> > +	if (!enable || pci_pme_capable(dev, PCI_D3hot)) {
> > +		pci_pme_active(dev, enable);
> > +		pme_done = true;
> > +	}
> 
> I don't really follow your intention here.  The condition means that PME is
> going to be enabled unless 'enable' is set and the device is not capable
> of generating PMEs.  However, if 'enable' is unset, we're still going to try
> to enable the PME, even if the device can't generate it.  Shouldn't that
> be

Hmm. That was copied from pci_enable_wake() just above, but it does seem 
a little bit odd. I suspect that that needs some clarification as well.

> Also, that assumes the device is going to be put into D3_hot, but do we know
> that for sure?

I'd be surprised if there's any hardware that supports wakeups from D2 
but not D3hot, so I just kept the code simple for now.
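
If hardcoding D3hot turns out to be too optimistic, the target state could instead be derived from the PME_Support bits (15:11) of the PMC register, which is roughly what pci_pme_capable() tests through dev->pme_support.  Here is a sketch with a hypothetical deepest_pme_state() helper, where the register is passed in as a plain value:

```c
#include <stdbool.h>
#include <stdint.h>

/* Device power states, numbered as in the kernel's pci_power_t. */
enum pci_dstate { D0 = 0, D1 = 1, D2 = 2, D3HOT = 3, D3COLD = 4 };

/* PME_Support is bits 15:11 of PMC; bit (11 + n) set means a PME can be
 * generated from state Dn. */
#define PMC_PME_SHIFT	11
#define PMC_PME_MASK	0xF800u

static bool pme_capable(uint16_t pmc, enum pci_dstate s)
{
	unsigned int support = (pmc & PMC_PME_MASK) >> PMC_PME_SHIFT;

	return support & (1u << s);
}

/* Hypothetical helper: deepest state in [D1, limit] that can still signal
 * a PME, or D0 if none can (no runtime suspend with wakeup possible). */
static enum pci_dstate deepest_pme_state(uint16_t pmc, enum pci_dstate limit)
{
	for (int s = limit; s >= D1; s--)
		if (pme_capable(pmc, (enum pci_dstate)s))
			return (enum pci_dstate)s;
	return D0;
}
```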

-- 
Matthew Garrett | mjg59@srcf.ucam.org

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [linux-pm] [RFC] PCI: Runtime power management
  2009-08-14 22:21               ` Matthew Garrett
  2009-08-15 14:18                 ` Rafael J. Wysocki
@ 2009-08-15 14:18                 ` Rafael J. Wysocki
  2009-08-15 15:53                   ` Alan Stern
  2009-08-15 15:53                   ` [linux-pm] " Alan Stern
  1 sibling, 2 replies; 90+ messages in thread
From: Rafael J. Wysocki @ 2009-08-15 14:18 UTC (permalink / raw)
  To: Matthew Garrett
  Cc: Alan Stern, linux-usb, linux-pci, Greg KH, LKML, Linux-pm mailing list

On Saturday 15 August 2009, Matthew Garrett wrote:
> On Fri, Aug 14, 2009 at 10:05:19PM +0200, Rafael J. Wysocki wrote:
> 
> > Well, sometimes the user may want a device to be power managed at run time
> > and not to be able to wake up the system from sleep states.  For example,
> > I'd like the USB controller in my box to be suspended at run time whenever it's
> > not used, but surely I wouldn't like it to do system-wide wakeup, because it
> > does that when I move the mouse which is a cordless one.  Simply turning the
> > mouse on causes the system to wake up. :-)
> 
> Right, so clearly my code is broken right now.
> 
> > Why don't we add a flag indicating whether or not the device is allowed to
> > be power managed at run time, something like runtime_forbidden, that the
> > user space will be able to set through sysfs?
> 
> I think even having a runtime_wakeup flag (which defaults to on) would 
> be sufficient.

Perhaps it would, but then clearing runtime_wakeup would effectively disable
runtime PM for devices that need wakeup in order to be power managed at run time
(probably all input devices).  Also, there may be situations in which user space
really wants to disable runtime PM for some devices (broken hardware, for
example).
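
The difference between the two proposals can be made concrete with a small policy check.  The flags below are hypothetical (neither exists in the patch), and needs_wakeup stands for devices, such as input devices, that are useless while suspended without remote wakeup.  With only runtime_wakeup, there is no way to veto runtime PM for a broken device that does not need wakeup.  A separate runtime_forbidden flag provides that veto:

```c
#include <stdbool.h>

/* Hypothetical per-device policy; the first two would be set via sysfs. */
struct rpm_policy {
	bool runtime_forbidden;	/* user space vetoes runtime PM entirely */
	bool runtime_wakeup;	/* user space allows remote wakeup at run time */
	bool needs_wakeup;	/* device is unusable if suspended without wakeup */
};

/* Runtime suspend is allowed unless user space forbids it, or the device
 * depends on wakeup and wakeup has been turned off. */
static bool rpm_may_suspend(const struct rpm_policy *p)
{
	if (p->runtime_forbidden)
		return false;
	if (p->needs_wakeup && !p->runtime_wakeup)
		return false;
	return true;
}
```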

> If the worst-case scenario is that we have to resume 
> devices in order to apply the correct policy when going into a 
> systemwide suspend state, I think that's acceptable.

I agree.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC] PCI: Runtime power management
  2009-08-14 22:30       ` Matthew Garrett
@ 2009-08-15 14:41         ` Rafael J. Wysocki
  2009-08-15 15:24           ` Rafael J. Wysocki
  2009-08-15 15:24           ` Rafael J. Wysocki
  2009-08-15 14:41         ` Rafael J. Wysocki
  1 sibling, 2 replies; 90+ messages in thread
From: Rafael J. Wysocki @ 2009-08-15 14:41 UTC (permalink / raw)
  To: Matthew Garrett
  Cc: Alan Stern, Greg KH, LKML, Linux-pm mailing list, linux-pci, linux-usb

On Saturday 15 August 2009, Matthew Garrett wrote:
> On Fri, Aug 14, 2009 at 11:22:27PM +0200, Rafael J. Wysocki wrote:
> 
> > Do you have any prototypes for that?  I started working on it some time ago,
> > but then I focused on the core runtime PM framework.
> 
> The native PCIe PME code?

Yes.

> There's some in the final patchset at 
> http://bugzilla.kernel.org/show_bug.cgi?id=6892

Well, I don't like that very much.

> but I haven't had time to look into merging that into the current kernel.
> I also don't have anything to test against, which makes life more awkward.

One of my AMD-based boxes should be suitable for that.

> > > +static int acpi_pci_runtime_wake(struct pci_dev *dev, bool enable)
> > > +{
> > > +	acpi_status status;
> > > +	acpi_handle handle = DEVICE_ACPI_HANDLE(&dev->dev);
> > > +	struct acpi_device *acpi_dev;
> > > +
> > 
> > Hm, I'd move that into ACPI as
> > 
> > int acp_runtime_wake_enable(acpi_handle handle, bool enable)
> > 
> > in which form it could also be useful to non-PCI devices.
> 
> Hm. Yeah, that's not too bad an idea.
> 
> > > +		acpi_disable_gpe(acpi_dev->wakeup.gpe_device,
> > > +				 acpi_dev->wakeup.gpe_number);
> > > +	}
> > > +	return 0;
> > > +}
> > 
> > Ah, that's the part I've always been missing!
> > 
> > How exactly do we figure out which GPE is a wake-up one for given device?
> > IOW, how are the wakeup.gpe_device and wakeup.gpe_number fields populated?
> 
> There's a field in the ACPI device definition in the DSDT (the _PRW 
> object) that defines the needed GPE and the deepest sleep state it can 
> wake from.
> 
> > > +	error = pci_pm_suspend(dev);
> > 
> > This has a chance to be confusing IMO.  pci_pm_suspend() calls the driver's
> > ->suspend() routine, which is specific to suspend to RAM.  So, this means
> > that drivers are supposed to implement ->runtime_suspend() only if they
> > want to do something _in_ _addition_ to the things done by
> > ->suspend() and ->suspend_noirq().
> 
> Yes, that was how I'd planned it. An alternative would be for 
> runtime_suspend to return a negative value if there's an error, 0 if the 
> bus code should continue or a positive value if the runtime_suspend() 
> call handles all of it and the bus code should just return immediately?

I just don't think that the existing suspend-resume callbacks are suitable for
runtime PM.

For example, in the majority of cases the existing suspend callbacks put
devices into D3_hot (or into the deepest low power state allowed by the
platform and hardware), which means a 10 ms transition latency, for resume as
well as for suspend.  Do we want that for runtime PM for all drivers?
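
For reference, the PCI PM specification mandates a 10 ms delay for any transition into or out of D3hot and 200 us for transitions involving D2, while D0 <-> D1 needs none.  The PCI core's pci_raw_set_power_state() applies delays along these lines.  Here is a userspace sketch of that rule (the function name is hypothetical):

```c
/* Device power states relevant to runtime PM (D3cold omitted). */
enum pci_dstate { D0, D1, D2, D3HOT };

/* Minimum delay, in microseconds, before the function may be accessed
 * after moving between two states: 10 ms whenever D3hot is involved,
 * 200 us whenever D2 is, nothing for D0 <-> D1. */
static unsigned int transition_delay_us(enum pci_dstate from, enum pci_dstate to)
{
	if (from == D3HOT || to == D3HOT)
		return 10000;
	if (from == D2 || to == D2)
		return 200;
	return 0;
}
```

So a full suspend/resume cycle through D3hot costs at least 20 ms in mandated delays alone, versus none for D1.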

Perhaps a more suitable model would be to put devices into D1 first, if
available, and then put them into D2 and D3 after specific delays?  Currently
the core framework doesn't provide any tools for that, but it may be worth
extending it for this purpose.
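
Here is one possible shape for such a staged policy, offered only as a sketch.  The thresholds are invented for illustration, the device reports which optional states (D1, D2) it implements through a bitmask, and all names are hypothetical:

```c
#include <stddef.h>

enum pci_dstate { D0, D1, D2, D3HOT };

/* Hypothetical demotion schedule: idle time (ms) after which each deeper
 * state may be entered.  The numbers are illustrative only. */
struct demotion_step {
	enum pci_dstate state;
	unsigned int idle_ms;
};

static const struct demotion_step steps[] = {
	{ D1, 0 },		/* D1 has no mandated transition delay */
	{ D2, 100 },		/* D2 costs 200 us each way */
	{ D3HOT, 2000 },	/* D3hot costs 10 ms each way */
};

/* Deepest state permitted after idle_ms of inactivity, skipping states the
 * device does not implement (bit n of 'supported' set => Dn available).
 * Steps are ordered shallow to deep, so the last match is the deepest. */
static enum pci_dstate staged_target(unsigned int idle_ms, unsigned int supported)
{
	enum pci_dstate target = D0;

	for (size_t i = 0; i < sizeof(steps) / sizeof(steps[0]); i++)
		if (idle_ms >= steps[i].idle_ms &&
		    (supported & (1u << steps[i].state)))
			target = steps[i].state;
	return target;
}
```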

Also, I think it should be impossible to use the "legacy" callbacks for runtime
PM.  They surely are not designed with that in mind.

> > > +	disable_irq(pci_dev->irq);
> > 
> > I don't really think it's necessary to disable the interrupt here.  We prevent
> > drivers from receiving interrupts while pci_pm_suspend_noirq() is being run
> > during system-wide power transitions to protect them from receiving "alien"
> > interrupts they might be unable to handle, but in the runtime case I think the
> > driver should take care of protecting itself from that.
> 
> That sounds fine. I didn't want to take a risk in that respect, but if 
> we should be safe here I can just drop that.

As far as the PCI PM core is concerned, we should.
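
Here is what "the driver taking care of itself" might look like on a shared interrupt line.  The handler checks a driver-private flag set in ->runtime_suspend() and declines the interrupt instead of touching a powered-down device.  This is a userspace model with hypothetical names (struct my_dev; enum irq_result stands in for irqreturn_t).  A real driver would also need a lock, or equivalent ordering, between the flag update and the handler:

```c
#include <stdbool.h>

/* Userspace stand-ins for the kernel's IRQ_NONE / IRQ_HANDLED. */
enum irq_result { IRQ_RESULT_NONE, IRQ_RESULT_HANDLED };

/* Hypothetical driver-private state. */
struct my_dev {
	bool runtime_suspended;	/* set in ->runtime_suspend(), cleared in ->runtime_resume() */
	unsigned int events;	/* interrupts actually serviced */
};

/* Handler for a shared line that stays enabled across runtime suspend.
 * 'device_raised_irq' stands for what reading the device's status register
 * would report; a real handler reads it only after the suspended check. */
static enum irq_result my_irq_handler(struct my_dev *d, bool device_raised_irq)
{
	if (d->runtime_suspended)
		return IRQ_RESULT_NONE;	/* device is powered down: not ours now */
	if (!device_raised_irq)
		return IRQ_RESULT_NONE;	/* another device on the shared line */
	d->events++;
	return IRQ_RESULT_HANDLED;
}
```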

> > > +	if (!enable || pci_pme_capable(dev, PCI_D3hot)) {
> > > +		pci_pme_active(dev, enable);
> > > +		pme_done = true;
> > > +	}
> > 
> > I don't really follow your intention here.  The condition means that PME is
> > going to be enabled unless 'enable' is set and the device is not capable
> > of generating PMEs.  However, if 'enable' is unset, we're still going to try
> > to enable the PME, even if the device can't generate it.  Shouldn't that
> > be
> 
> Hmm. That was copied from pci_enable_wake() just above, but it does seem 
> a little bit odd. I suspect that that needs some clarification as well.

Ah, OK.  If 'enable' is unset, we want to disable the PME for all states, so
we call pci_pme_active(dev, false) unconditionally in that case.  If 'enable' is
set, we only want to enable the PME if it's supported in the given state.  Sorry
for the noise.
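
Spelled out as a function, the rule behind the condition '!enable || pci_pme_capable(dev, PCI_D3hot)' looks like this.  It is a model with booleans in place of the device, pme_update() is a hypothetical name, and its return value plays the role of pme_done:

```c
#include <stdbool.h>

/* Model of the PME_En decision in pci_enable_runtime_wake(): returns true
 * if PME_En is written at all and stores the value written in *pme_en. */
static bool pme_update(bool enable, bool pme_capable_in_target, bool *pme_en)
{
	if (!enable) {
		*pme_en = false;	/* disabling is always safe: do it unconditionally */
		return true;
	}
	if (pme_capable_in_target) {
		*pme_en = true;		/* enable only if the target state supports PME */
		return true;
	}
	return false;			/* leave PME_En alone and rely on the platform */
}
```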

> > Also, that assumes the device is going to be put into D3_hot, but do we know
> > that for sure?
> 
> I'd be surprised if there's any hardware that supports wakeups from D2 
> but not D3hot, so I just kept the code simple for now.

OK

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC] PCI: Runtime power management
  2009-08-15 14:41         ` Rafael J. Wysocki
  2009-08-15 15:24           ` Rafael J. Wysocki
@ 2009-08-15 15:24           ` Rafael J. Wysocki
  1 sibling, 0 replies; 90+ messages in thread
From: Rafael J. Wysocki @ 2009-08-15 15:24 UTC (permalink / raw)
  To: Matthew Garrett
  Cc: Alan Stern, Greg KH, LKML, Linux-pm mailing list, linux-pci, linux-usb

On Saturday 15 August 2009, Rafael J. Wysocki wrote:
> On Saturday 15 August 2009, Matthew Garrett wrote:
> > On Fri, Aug 14, 2009 at 11:22:27PM +0200, Rafael J. Wysocki wrote:
...
> > > > +	error = pci_pm_suspend(dev);
> > > 
> > > This has a chance to be confusing IMO.  pci_pm_suspend() calls the driver's
> > > ->suspend() routine, which is specific to suspend to RAM.  So, this means
> > > that drivers are supposed to implement ->runtime_suspend() only if they
> > > want to do something _in_ _addition_ to the things done by
> > > ->suspend() and ->suspend_noirq().
> > 
> > Yes, that was how I'd planned it. An alternative would be for 
> > runtime_suspend to return a negative value if there's an error, 0 if the 
> > bus code should continue or a positive value if the runtime_suspend() 
> > call handles all of it and the bus code should just return immediately?
> 
> I just don't think that the existing suspend-resume callbacks are suitable for
> runtime PM.
> 
> For example, in the majority of cases the existing suspend callbacks put
> devices into D3_hot (or into the deepest low power state allowed by the
> platform and hardware), but that means 10 ms latency, also for the resume
> part.  Do we want that for runtime PM for all drivers?
> 
> Perhaps a more suitable model would be to put devices into D1 first, if
> available, and then put them into D2 and D3 after specific delays?  Currently
> the core framework doesn't provide any tools for that, but it may be worth
> extending it for this purpose.
> 
> Also, I think it should be impossible to use the "legacy" callbacks for runtime
> PM.  They surely are not designed with that in mind.
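The staged model quoted above (D1 first, then D2 and D3 after specific delays) might be expressed as a small policy function. This is only a hypothetical userspace sketch: `pick_target_state()`, `struct dstate_caps` and the idle thresholds are made up for illustration and are not part of the PCI core or the runtime PM framework. For reference, the PCI core's own constants put the D2->D0 delay at 200 microseconds (PCI_PM_D2_DELAY) and the D3hot->D0 delay at 10 ms (PCI_PM_D3_WAIT), which is why the shallow states come first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of PCI device power states (names are illustrative). */
enum dstate { D0, D1, D2, D3HOT };

/* Which optional low-power states the device advertises. */
struct dstate_caps {
	bool d1_supported;
	bool d2_supported;
};

/* Assumed idle thresholds in milliseconds; real values would be policy. */
#define D1_AFTER_MS 100u
#define D2_AFTER_MS 1000u
#define D3_AFTER_MS 5000u

/*
 * Return the deepest state the device should be in after being idle for
 * idle_ms.  Shallow states are used first because they resume faster;
 * states the device does not support are skipped.
 */
enum dstate pick_target_state(const struct dstate_caps *caps, unsigned idle_ms)
{
	assert(caps != NULL);

	if (idle_ms >= D3_AFTER_MS)
		return D3HOT;
	if (idle_ms >= D2_AFTER_MS && caps->d2_supported)
		return D2;
	/* Also reached when D2 is due but unsupported: fall back to D1. */
	if (idle_ms >= D1_AFTER_MS && caps->d1_supported)
		return D1;
	return D0;
}
```

With this policy a device without D1/D2 support simply stays in D0 until the D3 threshold is reached.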

To be more specific, let's go through pci_pm_suspend() and
pci_pm_suspend_noirq() and see what parts of these are useful for runtime PM.

pci_legacy_suspend() shouldn't be used at run time IMO, as I said above.

For runtime PM we shouldn't even continue if pm is NULL.  There's nothing
like "default runtime PM": either the driver supports it or it doesn't.  We
should also require the callback to be implemented, and IMO that should be
->runtime_suspend().

Of course we need to invoke the callback and call pci_fixup_device() after
that.

pci_legacy_suspend_late() shouldn't be used at run time.

Since I think we should require pm to be non-NULL for runtime PM, the second
block in pci_pm_suspend_noirq() is redundant in that case.

Now, I don't think we need the _noirq callback for runtime PM, because
we've just executed the "regular" callback and I bet there are no devices
requiring additional driver-specific operations after pci_fixup_device().
So ->runtime_suspend() should be sufficient.

The remaining part of pci_pm_suspend_noirq() is useful, so it can be executed
by pci_pm_runtime_suspend() directly.

Thus we get

static int pci_pm_runtime_suspend(struct device *dev)
{
	struct pci_dev *pci_dev = to_pci_dev(dev);
	struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
	pci_power_t prev = pci_dev->current_state;
	int error;

	/* No "default runtime PM": the driver must provide the callback. */
	if (!pm || !pm->runtime_suspend)
		return -ENOSYS;

	pci_dev->state_saved = false;

	error = pm->runtime_suspend(dev);
	suspend_report_result(pm->runtime_suspend, error);
	if (error)
		return error;

	pci_fixup_device(pci_fixup_suspend, pci_dev);

	/*
	 * The device is in a low-power state but its state was not saved:
	 * warn (once) if the callback changed the power state without saving
	 * the state first, and leave the device alone.
	 */
	if (!pci_dev->state_saved && pci_dev->current_state != PCI_D0
	    && pci_dev->current_state != PCI_UNKNOWN) {
		WARN_ONCE(pci_dev->current_state != prev,
				"PCI PM: State of device not saved by %pF\n",
				pm->runtime_suspend);
		return 0;
	}

	/* Otherwise save the state and put the device into a low-power state. */
	if (!pci_dev->state_saved) {
		pci_save_state(pci_dev);
		if (!pci_is_bridge(pci_dev))
			pci_prepare_to_sleep(pci_dev);
	}

	pci_pm_set_unknown_state(pci_dev);

	return 0;
}

Now, if the driver has a universal suspend routine, like for example r8169,
it only needs to point .runtime_suspend to that one and it should work.
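To make the control flow of the proposed pci_pm_runtime_suspend() easier to follow, here is a userspace model of it. Everything in it is hypothetical: `struct mock_dev`, `struct mock_pm_ops` and `mock_runtime_suspend()` stand in for `struct pci_dev`, `struct dev_pm_ops` and the PCI helpers, pci_save_state() is reduced to a counter, pci_prepare_to_sleep() to a state change, and the WARN_ONCE() is omitted. It also shows the r8169-style case where one routine backs both .suspend and .runtime_suspend.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real ones. */
enum pci_state { ST_D0, ST_D3HOT, ST_UNKNOWN };

struct mock_dev {
	enum pci_state current_state;
	bool state_saved;   /* set by a driver that saved the state itself */
	int saves_by_core;  /* how many times the "bus code" saved state */
};

struct mock_pm_ops {
	int (*suspend)(struct mock_dev *dev);
	int (*runtime_suspend)(struct mock_dev *dev);
};

/* A driver's single "universal" suspend routine (r8169-style). */
static int universal_suspend(struct mock_dev *dev)
{
	(void)dev;  /* quiesce the hardware; leave the power state to the bus code */
	return 0;
}

/* The same routine is reused for system suspend and runtime suspend. */
static const struct mock_pm_ops drv_pm = {
	.suspend = universal_suspend,
	.runtime_suspend = universal_suspend,
};

int mock_runtime_suspend(struct mock_dev *dev, const struct mock_pm_ops *pm)
{
	int error;

	assert(dev != NULL);
	/* No ops or no runtime callback means runtime PM is unsupported. */
	if (!pm || !pm->runtime_suspend)
		return -ENOSYS;

	dev->state_saved = false;
	error = pm->runtime_suspend(dev);
	if (error)
		return error;

	/* Already in a low-power state without saved state: leave it alone. */
	if (!dev->state_saved && dev->current_state != ST_D0 &&
	    dev->current_state != ST_UNKNOWN)
		return 0;

	if (!dev->state_saved) {
		dev->saves_by_core++;           /* models pci_save_state() */
		dev->current_state = ST_D3HOT;  /* roughly models pci_prepare_to_sleep() */
	}
	return 0;
}
```

A second call on a device that is already in D3hot takes the "leave it alone" branch and does not save the state again.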

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [linux-pm] [RFC] PCI: Runtime power management
  2009-08-15 14:18                 ` [linux-pm] " Rafael J. Wysocki
  2009-08-15 15:53                   ` Alan Stern
@ 2009-08-15 15:53                   ` Alan Stern
  2009-08-15 20:54                     ` Rafael J. Wysocki
  2009-08-15 20:54                     ` [linux-pm] " Rafael J. Wysocki
  1 sibling, 2 replies; 90+ messages in thread
From: Alan Stern @ 2009-08-15 15:53 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Matthew Garrett, linux-usb, linux-pci, Greg KH, LKML,
	Linux-pm mailing list

On Sat, 15 Aug 2009, Rafael J. Wysocki wrote:

> > > Why don't we add a flag indicating whether or not the device is allowed to
> > > be power managed at run time, something like runtime_forbidden, that the
> > > user space will be able to set through sysfs?
> > 
> > I think even having a runtime_wakeup flag (which defaults to on) would 
> > be sufficient.
> 
> Perhaps it would, but then unsetting runtime_wakeup would effectively disable
> runtime PM for devices that need it to be power managed at run time (probably
> all input devices).  Also there may be situations in which user space may
> really want to disable runtime PM for some devices (think of broken hardware
> for one example).

It sounds like there are really three choices here, and the decision 
should largely be left up to the user:

	1. don't use runtime PM,

	2. allow runtime PM but disable remote wakeup,

	3. allow runtime PM with remote wakeup enabled.

Now, a driver may say "I can't do my job without remote wakeup".  Such
a driver would refuse to do runtime_suspend in case 2.  But otherwise
we should follow the preference of the user.
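The three choices, together with a driver that refuses to suspend without remote wakeup, could be modeled roughly as below. This is a hypothetical sketch: `enum rpm_policy`, `struct drv_caps` and `runtime_suspend_allowed()` are illustrative names, not part of any existing kernel interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The three user-selectable policies (names are hypothetical). */
enum rpm_policy {
	RPM_OFF,          /* 1. don't use runtime PM */
	RPM_NO_WAKEUP,    /* 2. allow runtime PM but disable remote wakeup */
	RPM_WITH_WAKEUP,  /* 3. allow runtime PM with remote wakeup enabled */
};

struct drv_caps {
	bool needs_remote_wakeup;  /* "I can't do my job without remote wakeup" */
};

/* Return true if the driver may go ahead with a runtime suspend. */
bool runtime_suspend_allowed(enum rpm_policy policy, const struct drv_caps *caps)
{
	assert(caps != NULL);

	switch (policy) {
	case RPM_OFF:
		return false;
	case RPM_NO_WAKEUP:
		/* A driver that depends on remote wakeup refuses in case 2. */
		return !caps->needs_remote_wakeup;
	case RPM_WITH_WAKEUP:
		return true;
	}
	return false;
}
```

In every other case the user's preference decides, which is the point of leaving the choice to user space.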

The only remaining question is how to expose this in sysfs in a way 
that won't be confusing and that won't be confused with the "wakeup" 
attribute.  One possibility is to use the "level" attribute introduced 
in USB; possible levels are "on" (no runtime PM) and "auto" (runtime 
PM allowed).  Then a new "runtime_wakeup" attribute could contain 
nothing (if wakeup is not available), "enabled", or "disabled".

Alan Stern


^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [linux-pm] [RFC] PCI: Runtime power management
  2009-08-15 15:53                   ` [linux-pm] " Alan Stern
  2009-08-15 20:54                     ` Rafael J. Wysocki
@ 2009-08-15 20:54                     ` Rafael J. Wysocki
  2009-08-15 20:58                       ` Matthew Garrett
                                         ` (3 more replies)
  1 sibling, 4 replies; 90+ messages in thread
From: Rafael J. Wysocki @ 2009-08-15 20:54 UTC (permalink / raw)
  To: Alan Stern
  Cc: Matthew Garrett, linux-usb, linux-pci, Greg KH, LKML,
	Linux-pm mailing list

On Saturday 15 August 2009, Alan Stern wrote:
> On Sat, 15 Aug 2009, Rafael J. Wysocki wrote:
> 
> > > > Why don't we add a flag indicating whether or not the device is allowed to
> > > > be power managed at run time, something like runtime_forbidden, that the
> > > > user space will be able to set through sysfs?
> > > 
> > > I think even having a runtime_wakeup flag (which defaults to on) would 
> > > be sufficient.
> > 
> > Perhaps it would, but then unsetting runtime_wakeup would effectively disable
> > runtime PM for devices that need it to be power managed at run time (probably
> > all input devices).  Also there may be situations in which user space may
> > really want to disable runtime PM for some devices (think of broken hardware
> > for one example).
> 
> It sounds like there are really three choices here, and the decision 
> should largely be left up to the user:
> 
> 	1. don't use runtime PM,
> 
> 	2. allow runtime PM but disable remote wakeup,
> 
> 	3. allow runtime PM with remote wakeup enabled.
> 
> Now, a driver may say "I can't do my job without remote wakeup".  Such
> a driver would refuse to do runtime_suspend in case 2.  But otherwise
> we should follow the preference of the user.
> 
> The only remaining question is how to expose this in sysfs in a way 
> that won't be confusing and that won't be confused with the "wakeup" 
> attribute.  One possibility is to use the "level" attribute introduced 
> in USB; possible levels are "on" (no runtime PM) and "auto" (runtime 
> PM allowed).  Then a new "runtime_wakeup" attribute could contain 
> nothing (if wakeup is not available), "enabled", or "disabled".

That seems to require two flags.

runtime_forbidden - if unset, the driver decides whether or not to use runtime
  PM; that could be exposed through sysfs as 'runtime' under the 'power'
  subdirectory with the following values:
  * 'disabled' - runtime_forbidden is set by user space
  * 'on' - runtime_forbidden is unset, runtime PM is used (disable_depth == 0)
  * 'off' - runtime_forbidden is unset, runtime PM is not used
  To set/unset the flag, user space writes 'disabled'/'enabled' to the file,
  respectively.
  The default is unset.

runtime_wakeup - if set, the device is allowed to do remote wakeup at run time.
  That could be represented as 'runtime_wakeup' under 'power' with the
  following values:
  * no value (empty file) if 'runtime' is 'disabled'
  * 'enabled'
  * 'disabled'
  To set/unset the flag, user space writes 'enabled'/'disabled' to the file,
  respectively.
  The default is set.
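As a rough illustration of how the two proposed attributes would read, here is a userspace sketch of the show/store logic. It is hypothetical: `struct rpm_info`, `runtime_show()`, `runtime_wakeup_show()` and `runtime_store()` are made-up names, the real sysfs callbacks have different signatures, and it assumes that writing 'disabled' sets runtime_forbidden (matching the 'disabled' value shown while the flag is set).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of the per-device runtime PM state. */
struct rpm_info {
	bool runtime_forbidden;  /* set from user space */
	int disable_depth;       /* 0 means runtime PM is in use */
	bool runtime_wakeup;     /* remote wakeup allowed at run time */
};

/* Value shown by the proposed power/runtime attribute. */
const char *runtime_show(const struct rpm_info *info)
{
	assert(info != NULL);
	if (info->runtime_forbidden)
		return "disabled";
	return info->disable_depth == 0 ? "on" : "off";
}

/* Value shown by the proposed power/runtime_wakeup attribute. */
const char *runtime_wakeup_show(const struct rpm_info *info)
{
	assert(info != NULL);
	if (info->runtime_forbidden)
		return "";  /* empty file while 'runtime' reads 'disabled' */
	return info->runtime_wakeup ? "enabled" : "disabled";
}

/*
 * Assumed mapping: 'disabled' sets runtime_forbidden, 'enabled' clears it.
 * buf is assumed to have its trailing newline stripped.  Returns 0 or -1.
 */
int runtime_store(struct rpm_info *info, const char *buf)
{
	assert(info != NULL && buf != NULL);
	if (strcmp(buf, "disabled") == 0)
		info->runtime_forbidden = true;
	else if (strcmp(buf, "enabled") == 0)
		info->runtime_forbidden = false;
	else
		return -1;
	return 0;
}
```

Note that in this scheme the contents of runtime_wakeup depend on the runtime_forbidden setting.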

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [linux-pm] [RFC] PCI: Runtime power management
  2009-08-15 20:54                     ` [linux-pm] " Rafael J. Wysocki
  2009-08-15 20:58                       ` Matthew Garrett
@ 2009-08-15 20:58                       ` Matthew Garrett
  2009-08-15 21:21                           ` Rafael J. Wysocki
  2009-08-16 15:50                       ` Alan Stern
  2009-08-16 15:50                       ` [linux-pm] " Alan Stern
  3 siblings, 1 reply; 90+ messages in thread
From: Matthew Garrett @ 2009-08-15 20:58 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Alan Stern, linux-usb, linux-pci, Greg KH, LKML, Linux-pm mailing list

On Sat, Aug 15, 2009 at 10:54:23PM +0200, Rafael J. Wysocki wrote:

> runtime_wakeup - if set, the device is allowed to do remote wakeup at run time
>   That could be represented as 'runtime_wakeup' under 'power' with the
>   following values:
>   * no value (empty file) if 'runtime' is 'disabled'
>   * 'enabled'
>   * 'disabled'
>   To set/unset the user space writes 'enabled'/'disabled' to it, respectively.
>   The default is set.

Why would you ever want runtime_wakeup to be false unless 
runtime_forbidden is true? Surely the point of runtime power management 
is to be transparent to the user, in which case remote wakeup is 
required?

-- 
Matthew Garrett | mjg59@srcf.ucam.org

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [linux-pm] [RFC] PCI: Runtime power management
  2009-08-15 20:58                       ` [linux-pm] " Matthew Garrett
@ 2009-08-15 21:21                           ` Rafael J. Wysocki
  0 siblings, 0 replies; 90+ messages in thread
From: Rafael J. Wysocki @ 2009-08-15 21:21 UTC (permalink / raw)
  To: Matthew Garrett
  Cc: Alan Stern, linux-usb, linux-pci, Greg KH, LKML, Linux-pm mailing list

On Saturday 15 August 2009, Matthew Garrett wrote:
> On Sat, Aug 15, 2009 at 10:54:23PM +0200, Rafael J. Wysocki wrote:
> 
> > runtime_wakeup - if set, the device is allowed to do remote wakeup at run time
> >   That could be represented as 'runtime_wakeup' under 'power' with the
> >   following values:
> >   * no value (empty file) if 'runtime' is 'disabled'
> >   * 'enabled'
> >   * 'disabled'
> >   To set/unset the user space writes 'enabled'/'disabled' to it, respectively.
> >   The default is set.
> 
> Why would you ever want runtime_wakeup to be false unless 
> runtime_forbidden is true? Surely the point of runtime power management 
> is to be transparent to the user, in which case remote wakeup is 
> required?

Well, this was exactly my point previously. :-)

Still, although IMO for the majority of devices disabling 'runtime_wakeup'
would mean no runtime PM at all, there may be devices that work without
remote wakeup even though they support it in general.

I can even imagine a scenario where this setting might be useful, like when
we don't want a network adapter to be woken up from the outside.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [linux-pm] [RFC] PCI: Runtime power management
  2009-08-15 21:21                           ` Rafael J. Wysocki
  (?)
@ 2009-08-15 21:27                           ` Matthew Garrett
  2009-08-15 21:44                             ` Rafael J. Wysocki
  2009-08-15 21:44                             ` [linux-pm] " Rafael J. Wysocki
  -1 siblings, 2 replies; 90+ messages in thread
From: Matthew Garrett @ 2009-08-15 21:27 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Alan Stern, linux-usb, linux-pci, Greg KH, LKML, Linux-pm mailing list

On Sat, Aug 15, 2009 at 11:21:53PM +0200, Rafael J. Wysocki wrote:

> I can even imagine a scenario where this setting might be useful, like when
> we don't want a network adapter to be woken up from the outside.

I think in that case we'd probably just want the interface to be downed? 
Some of this is going to require device-specific policy, I think - for 
the network case we probably want something in between IF_RUNNING and 
IF_DOWN (IF_CARRIER, perhaps) that indicates that we want the PHY to be 
powered. Pushing this out to sysfs would mean we'd have a consistent 
interface but varying semantics, and I'm not convinced that's an 
especially helpful interface.

-- 
Matthew Garrett | mjg59@srcf.ucam.org

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [linux-pm] [RFC] PCI: Runtime power management
  2009-08-15 21:27                           ` [linux-pm] " Matthew Garrett
  2009-08-15 21:44                             ` Rafael J. Wysocki
@ 2009-08-15 21:44                             ` Rafael J. Wysocki
  2009-08-16 16:09                               ` Alan Stern
  2009-08-16 16:09                               ` Alan Stern
  1 sibling, 2 replies; 90+ messages in thread
From: Rafael J. Wysocki @ 2009-08-15 21:44 UTC (permalink / raw)
  To: Matthew Garrett
  Cc: Alan Stern, linux-usb, linux-pci, Greg KH, LKML, Linux-pm mailing list

On Saturday 15 August 2009, Matthew Garrett wrote:
> On Sat, Aug 15, 2009 at 11:21:53PM +0200, Rafael J. Wysocki wrote:
> 
> > I can even imagine a scenario where this setting might be useful, like when
> > we don't want a network adapter to be woken up from the outside.
> 
> I think in that case we'd probably just want the interface to be downed? 
> Some of this is going to require device-specific policy, I think - for 
> the network case we probably want something in between IF_RUNNING and 
> IF_DOWN (IF_CARRIER, perhaps) that indicates that we want the PHY to be 
> powered. Pushing this out to sysfs would mean we'd have a consistent 
> interface but varying semantics, and I'm not convinced that's an 
> especially helpful interface.

I'm not disagreeing with that.

At this point I'd like to know Alan's opinion.  I would gladly use the
'runtime_forbidden' flag only, but if we overlook something now, it's going
to be difficult to fix later.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [linux-pm] [RFC] PCI: Runtime power management
  2009-08-15 20:54                     ` [linux-pm] " Rafael J. Wysocki
                                         ` (2 preceding siblings ...)
  2009-08-16 15:50                       ` Alan Stern
@ 2009-08-16 15:50                       ` Alan Stern
  3 siblings, 0 replies; 90+ messages in thread
From: Alan Stern @ 2009-08-16 15:50 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Matthew Garrett, linux-usb, linux-pci, Greg KH, LKML,
	Linux-pm mailing list

On Sat, 15 Aug 2009, Rafael J. Wysocki wrote:

> > The only remaining question is how to expose this in sysfs in a way 
> > that won't be confusing and that won't be confused with the "wakeup" 
> > attribute.  One possibility is to use the "level" attribute introduced 
> > in USB; possible levels are "on" (no runtime PM) and "auto" (runtime 
> > PM allowed).  Then a new "runtime_wakeup" attribute could contain 
> > nothing (if wakeup is not available), "enabled", or "disabled".
> 
> That seems to require two flags.
> 
> runtime_forbidden - if unset, the driver decides whether or not to use runtime
>   PM; that could be exposed through sysfs as 'runtime' under the 'power'
>   subdirectory with the following values:
>   * 'disabled' - runtime_forbidden is set by the user space
>   * 'on' - runtime_forbidden is unset, runtime PM is used (disable_depth == 0)
>   * 'off' - runtime_forbidden is unset, runtime PM is not used
>   To set/unset the user space writes 'enabled'/'disabled' to it, respectively.
>   The default is unset.

Too confusing.  The difference between "disabled" and "off" is so
subtle that even I don't understand it!

All we need for this is a simple on/off switch.  Or to be consistent
with the terms that are already exposed to userspace for USB devices:  
"auto"/"on", where "auto" means the system automatically changes power
levels (runtime PM is enabled) and "on" means the device is always on
(runtime PM is disabled).

And I don't like the "runtime_forbidden" name either.  It's long and
it's a negative, making it harder to understand.  ("off" means that the
device is permanently on!)  What's wrong with "level", as in
"power/level"?  That seems very intuitive.

> runtime_wakeup - if set, the device is allowed to do remote wakeup at run time
>   That could be represented as 'runtime_wakeup' under 'power' with the
>   following values:
>   * no value (empty file) if 'runtime' is 'disabled'
>   * 'enabled'
>   * 'disabled'
>   To set/unset the user space writes 'enabled'/'disabled' to it, respectively.
>   The default is set.

This is okay except for your definition of empty file.  This makes it
dependent on the power/level setting.  The two should be independent.  
Thus, empty file should mean that the device doesn't support remote
wakeup, in which case writes are ignored.

This scheme, like yours, requires two new bitflags.  We could call them
something like "may_runtime_suspend" and "may_runtime_wakeup".
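The scheme above, with power/level and power/runtime_wakeup kept independent, might look like this in a userspace sketch. It is hypothetical: the function names are made up, the real sysfs callbacks have different signatures, and `can_wakeup` merely stands in for the "device supports remote wakeup" capability (roughly what the existing can_wakeup bit in struct dev_pm_info expresses). Writes to runtime_wakeup on a device without wakeup support are assumed to succeed silently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-device flags following the proposal. */
struct rpm_flags {
	unsigned int may_runtime_suspend:1;  /* power/level: "auto" = 1, "on" = 0 */
	unsigned int may_runtime_wakeup:1;   /* power/runtime_wakeup setting */
	unsigned int can_wakeup:1;           /* hardware supports remote wakeup */
};

const char *level_show(const struct rpm_flags *f)
{
	assert(f != NULL);
	return f->may_runtime_suspend ? "auto" : "on";
}

/* buf is assumed to have its trailing newline stripped. */
int level_store(struct rpm_flags *f, const char *buf)
{
	assert(f != NULL && buf != NULL);
	if (strcmp(buf, "auto") == 0)
		f->may_runtime_suspend = 1;
	else if (strcmp(buf, "on") == 0)
		f->may_runtime_suspend = 0;
	else
		return -1;
	return 0;
}

/* Independent of power/level: empty only if wakeup is not supported. */
const char *rt_wakeup_show(const struct rpm_flags *f)
{
	assert(f != NULL);
	if (!f->can_wakeup)
		return "";
	return f->may_runtime_wakeup ? "enabled" : "disabled";
}

/* Writes are ignored when the device cannot do remote wakeup. */
int rt_wakeup_store(struct rpm_flags *f, const char *buf)
{
	assert(f != NULL && buf != NULL);
	if (!f->can_wakeup)
		return 0;
	if (strcmp(buf, "enabled") == 0)
		f->may_runtime_wakeup = 1;
	else if (strcmp(buf, "disabled") == 0)
		f->may_runtime_wakeup = 0;
	else
		return -1;
	return 0;
}
```

Unlike the 'runtime'/'runtime_wakeup' pair proposed earlier, changing power/level here never changes what power/runtime_wakeup shows.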

Alan Stern


^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [linux-pm] [RFC] PCI: Runtime power management
  2009-08-15 21:21                           ` Rafael J. Wysocki
                                             ` (3 preceding siblings ...)
  (?)
@ 2009-08-16 15:57                           ` Alan Stern
  2009-08-16 16:04                             ` Matthew Garrett
  2009-08-16 16:04                             ` [linux-pm] " Matthew Garrett
  -1 siblings, 2 replies; 90+ messages in thread
From: Alan Stern @ 2009-08-16 15:57 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Matthew Garrett, linux-usb, linux-pci, Greg KH, LKML,
	Linux-pm mailing list

On Sat, 15 Aug 2009, Rafael J. Wysocki wrote:

> > Why would you ever want runtime_wakeup to be false unless 
> > runtime_forbidden is true? Surely the point of runtime power management 
> > is to be transparent to the user, in which case remote wakeup is 
> > required?

Matthew, what makes you think remote wakeup is required?  Lots of
power-manageable devices don't support it at all (consider disk drives
or display screens).

> Well, this was exactly my point previously. :-)
> 
> Still, although for the majority of devices 'runtime_wakeup' disabled would
> mean no runtime PM at all IMO, there may be devices that actually work without
> remote wakeup, although they support it in general.

That last part is quite true.  For example, we might suspend the device
whenever no process has opened the device file.  It would be a degraded 
form of power management, but better than nothing.

> I can even imagine a scenario where this setting might be useful, like when
> we don't want a network adapter to be woken up from the outside.

Or if the device's support for remote wakeup is broken.

Alan Stern
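The degraded mode mentioned above, where the device is suspended whenever no process has the device file open, could look roughly like the following userspace model. struct chardev_model, model_open() and model_release() are hypothetical names; a real driver would more likely rely on the core's usage counter than on a private open count. No remote wakeup is involved: the device only comes back when someone opens the file.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a device suspended whenever its file is closed. */
struct chardev_model {
	int open_count;
	bool suspended;
	int suspends;  /* how many times the device was powered down */
	int resumes;   /* how many times it was powered back up */
};

/* First opener resumes the device before it is used. */
static void model_open(struct chardev_model *d)
{
	if (d->open_count++ == 0 && d->suspended) {
		d->suspended = false;
		d->resumes++;
	}
}

/* Last closer suspends it; no remote wakeup is needed to get it back. */
static void model_release(struct chardev_model *d)
{
	assert(d->open_count > 0);
	if (--d->open_count == 0 && !d->suspended) {
		d->suspended = true;
		d->suspends++;
	}
}
```

The trade-off is that the device stays powered for as long as any process holds the file open, even when it is otherwise idle, which is why this is only a fallback for hardware without usable remote wakeup.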


* Re: [linux-pm] [RFC] PCI: Runtime power management
  2009-08-16 15:57                           ` [linux-pm] " Alan Stern
  2009-08-16 16:04                             ` Matthew Garrett
@ 2009-08-16 16:04                             ` Matthew Garrett
  1 sibling, 0 replies; 90+ messages in thread
From: Matthew Garrett @ 2009-08-16 16:04 UTC (permalink / raw)
  To: Alan Stern
  Cc: Rafael J. Wysocki, linux-usb, linux-pci, Greg KH, LKML,
	Linux-pm mailing list

On Sun, Aug 16, 2009 at 11:57:53AM -0400, Alan Stern wrote:
> On Sat, 15 Aug 2009, Rafael J. Wysocki wrote:
> 
> > > Why would you ever want runtime_wakeup to be false unless 
> > > runtime_forbidden is true? Surely the point of runtime power management 
> > > is to be transparent to the user, in which case remote wakeup is 
> > > required?
> 
> Matthew, what makes you think remote wakeup is required?  Lots of
> power-manageable devices don't support it at all (consider disk drives
> or display screens).

Sorry, I meant in cases where remote wakeup is a sensible concept. For 
things like storage there's obviously no reason to require it, but for 
something like USB the absence of remote wakeup would effectively break 
the driver for most users.

-- 
Matthew Garrett | mjg59@srcf.ucam.org

* Re: [linux-pm] [RFC] PCI: Runtime power management
  2009-08-15 21:44                             ` [linux-pm] " Rafael J. Wysocki
@ 2009-08-16 16:09                               ` Alan Stern
  2009-08-16 16:09                               ` Alan Stern
  1 sibling, 0 replies; 90+ messages in thread
From: Alan Stern @ 2009-08-16 16:09 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Matthew Garrett, linux-usb, linux-pci, Greg KH, LKML,
	Linux-pm mailing list

On Sat, 15 Aug 2009, Rafael J. Wysocki wrote:

> On Saturday 15 August 2009, Matthew Garrett wrote:
> > On Sat, Aug 15, 2009 at 11:21:53PM +0200, Rafael J. Wysocki wrote:
> > 
> > > I can even imagine a scenario where this setting might be useful, like when
> > > we don't want a network adapter to be woken up from the outside.
> > 
> > I think in that case we'd probably just want the interface to be downed? 
> > Some of this is going to require device-specific policy, I think - for 
> > the network case we probably want something in between IF_RUNNING and 
> > IF_DOWN (IF_CARRIER, perhaps) that indicates that we want the PHY to be 
> > powered. Pushing this out to sysfs would mean we'd have a consistent 
> > interface but varying semantics, and I'm not convinced that's an 
> > especially helpful interface.
> 
> I'm not disagreeing with that.
> 
> At this point I'd like to know Alan's opinion.  I would gladly use the
> 'runtime_forbidden' flag only, but if we overlook something now, it's going
> to be difficult to fix later.

The notion of "remote wakeup" is more or less the same for all devices,
and it can effectively be abstracted into the core.  Other notions,
like IF_CARRIER, are more device-dependent.  They must be handled at
the bus or driver level, not in the core.

The real question is whether we should export a "may_runtime_wakeup"  
flag.  If we don't, then we prevent the use of a degraded
power-management mode in devices with broken wakeup support.  Perhaps 
that's okay, I don't know.

Which reminds me...  In addition to these flags controlling what
settings should be enabled, maybe we should add a flag to record
whether or not remote wakeup was turned on when the device was last
suspended.  Although the core wouldn't use it, such a flag might be
very convenient for drivers.

Alan Stern
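A sketch of the extra flag suggested above, using hypothetical names (struct wdev, wakeup_armed, hw_wake_enable) and a plain bool standing in for the device's wakeup-enable register. Recording what suspend actually did lets resume undo exactly that, even if the user changes the wakeup policy while the device is suspended.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device state; 'wakeup_armed' is the flag suggested above. */
struct wdev {
	bool can_wakeup;          /* hardware supports remote wakeup */
	bool may_runtime_wakeup;  /* user policy, may change at any time */
	bool wakeup_armed;        /* was remote wakeup enabled at last suspend? */
	bool hw_wake_enable;      /* stand-in for a device register */
	int hw_writes;            /* register accesses, for observation */
};

static void wdev_suspend(struct wdev *d)
{
	d->wakeup_armed = d->can_wakeup && d->may_runtime_wakeup;
	if (d->wakeup_armed) {
		d->hw_wake_enable = true;
		d->hw_writes++;
	}
}

/* Undo only what suspend did, ignoring the current policy setting. */
static void wdev_resume(struct wdev *d)
{
	assert(d->wakeup_armed == d->hw_wake_enable);
	if (d->wakeup_armed) {
		d->hw_wake_enable = false;
		d->hw_writes++;
		d->wakeup_armed = false;
	}
}
```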


* [PATCH] PM: Introduce core framework for run-time PM of I/O devices (rev. 14)
@ 2009-08-08 14:25 Rafael J. Wysocki
  0 siblings, 0 replies; 90+ messages in thread
From: Rafael J. Wysocki @ 2009-08-08 14:25 UTC (permalink / raw)
  To: Alan Stern, Magnus Damm; +Cc: Linux-pm mailing list, Greg KH, LKML

Hi,

Below is rev. 14 of the runtime PM patch.  Hopefully it addresses all of
the remaining reservations.

Thanks,
Rafael

---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: PM: Introduce core framework for run-time PM of I/O devices (rev. 14)

Introduce a core framework for run-time power management of I/O
devices.  Add device run-time PM fields to 'struct dev_pm_info'
and device run-time PM callbacks to 'struct dev_pm_ops'.  Introduce
a run-time PM workqueue and define some device run-time PM helper
functions at the core level.  Document all these things.

Special thanks to Alan Stern for his help with the design and
multiple detailed reviews of the preceding versions of this patch
and to Magnus Damm for testing feedback.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 Documentation/power/runtime_pm.txt |  386 ++++++++++++++
 drivers/base/dd.c                  |   21 
 drivers/base/power/Makefile        |    1 
 drivers/base/power/main.c          |   20 
 drivers/base/power/power.h         |   31 -
 drivers/base/power/runtime.c       |  969 +++++++++++++++++++++++++++++++++++++
 include/linux/pm.h                 |  101 +++
 include/linux/pm_runtime.h         |  115 ++++
 kernel/power/Kconfig               |   14 
 kernel/power/main.c                |   17 
 10 files changed, 1664 insertions(+), 11 deletions(-)

Index: linux-2.6/kernel/power/Kconfig
===================================================================
--- linux-2.6.orig/kernel/power/Kconfig
+++ linux-2.6/kernel/power/Kconfig
@@ -208,3 +208,17 @@ config APM_EMULATION
 	  random kernel OOPSes or reboots that don't seem to be related to
 	  anything, try disabling/enabling this option (or disabling/enabling
 	  APM in your BIOS).
+
+config PM_RUNTIME
+	bool "Run-time PM core functionality"
+	depends on PM
+	---help---
+	  Enable functionality allowing I/O devices to be put into energy-saving
+	  (low power) states at run time (or autosuspended) after a specified
+	  period of inactivity and woken up in response to a hardware-generated
+	  wake-up event or a driver's request.
+
+	  Hardware support is generally required for this functionality to work,
+	  and the bus type drivers for the buses the devices are attached to are
+	  responsible for the actual handling of the autosuspend requests and
+	  wake-up events.
Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -11,6 +11,7 @@
 #include <linux/kobject.h>
 #include <linux/string.h>
 #include <linux/resume-trace.h>
+#include <linux/workqueue.h>
 
 #include "power.h"
 
@@ -217,8 +218,24 @@ static struct attribute_group attr_group
 	.attrs = g,
 };
 
+#ifdef CONFIG_PM_RUNTIME
+struct workqueue_struct *pm_wq;
+
+static int __init pm_start_workqueue(void)
+{
+	pm_wq = create_freezeable_workqueue("pm");
+
+	return pm_wq ? 0 : -ENOMEM;
+}
+#else
+static inline int pm_start_workqueue(void) { return 0; }
+#endif
+
 static int __init pm_init(void)
 {
+	int error = pm_start_workqueue();
+	if (error)
+		return error;
 	power_kobj = kobject_create_and_add("power", NULL);
 	if (!power_kobj)
 		return -ENOMEM;
Index: linux-2.6/include/linux/pm.h
===================================================================
--- linux-2.6.orig/include/linux/pm.h
+++ linux-2.6/include/linux/pm.h
@@ -22,6 +22,10 @@
 #define _LINUX_PM_H
 
 #include <linux/list.h>
+#include <linux/workqueue.h>
+#include <linux/spinlock.h>
+#include <linux/wait.h>
+#include <linux/timer.h>
 
 /*
  * Callbacks for platform drivers to implement.
@@ -165,6 +169,28 @@ typedef struct pm_message {
  * It is allowed to unregister devices while the above callbacks are being
  * executed.  However, it is not allowed to unregister a device from within any
  * of its own callbacks.
+ *
+ * There also are the following callbacks related to run-time power management
+ * of devices:
+ *
+ * @runtime_suspend: Prepare the device for a condition in which it won't be
+ *	able to communicate with the CPU(s) and RAM due to power management.
+ *	This need not mean that the device should be put into a low power state.
+ *	For example, if the device is behind a link which is about to be turned
+ *	off, the device may remain at full power.  If the device does go to low
+ *	power and if device_may_wakeup(dev) is true, remote wake-up (i.e., a
+ *	hardware mechanism allowing the device to request a change of its power
+ *	state, such as PCI PME) should be enabled for it.
+ *
+ * @runtime_resume: Put the device into the fully active state in response to a
+ *	wake-up event generated by hardware or at the request of software.  If
+ *	necessary, put the device into the full power state and restore its
+ *	registers, so that it is fully operational.
+ *
+ * @runtime_idle: Device appears to be inactive and it might be put into a low
+ *	power state if all of the necessary conditions are satisfied.  Check
+ *	these conditions and handle the device as appropriate, possibly queueing
+ *	a suspend request for it.
  */
 
 struct dev_pm_ops {
@@ -182,6 +208,9 @@ struct dev_pm_ops {
 	int (*thaw_noirq)(struct device *dev);
 	int (*poweroff_noirq)(struct device *dev);
 	int (*restore_noirq)(struct device *dev);
+	int (*runtime_suspend)(struct device *dev);
+	int (*runtime_resume)(struct device *dev);
+	void (*runtime_idle)(struct device *dev);
 };
 
 /*
@@ -329,14 +358,80 @@ enum dpm_state {
 	DPM_OFF_IRQ,
 };
 
+/**
+ * Device run-time power management status.
+ *
+ * These status labels are used internally by the PM core to indicate the
+ * current status of a device with respect to the PM core operations.  They do
+ * not reflect the actual power state of the device or its status as seen by the
+ * driver.
+ *
+ * RPM_ACTIVE		Device is fully operational.  Indicates that the device
+ *			bus type's ->runtime_resume() callback has completed
+ *			successfully.
+ *
+ * RPM_SUSPENDED	Device bus type's ->runtime_suspend() callback has
+ *			completed successfully.  The device is regarded as
+ *			suspended.
+ *
+ * RPM_RESUMING		Device bus type's ->runtime_resume() callback is being
+ *			executed.
+ *
+ * RPM_SUSPENDING	Device bus type's ->runtime_suspend() callback is being
+ *			executed.
+ */
+
+enum rpm_status {
+	RPM_ACTIVE = 0,
+	RPM_RESUMING,
+	RPM_SUSPENDED,
+	RPM_SUSPENDING,
+};
+
+/**
+ * Device run-time power management request types.
+ *
+ * RPM_REQ_NONE		Do nothing.
+ *
+ * RPM_REQ_IDLE		Run the device bus type's ->runtime_idle() callback
+ *
+ * RPM_REQ_SUSPEND	Run the device bus type's ->runtime_suspend() callback
+ *
+ * RPM_REQ_RESUME	Run the device bus type's ->runtime_resume() callback
+ */
+
+enum rpm_request {
+	RPM_REQ_NONE = 0,
+	RPM_REQ_IDLE,
+	RPM_REQ_SUSPEND,
+	RPM_REQ_RESUME,
+};
+
 struct dev_pm_info {
 	pm_message_t		power_state;
-	unsigned		can_wakeup:1;
-	unsigned		should_wakeup:1;
+	unsigned int		can_wakeup:1;
+	unsigned int		should_wakeup:1;
 	enum dpm_state		status;		/* Owned by the PM core */
-#ifdef	CONFIG_PM_SLEEP
+#ifdef CONFIG_PM_SLEEP
 	struct list_head	entry;
 #endif
+#ifdef CONFIG_PM_RUNTIME
+	struct timer_list	suspend_timer;
+	unsigned long		timer_expires;
+	struct work_struct	work;
+	wait_queue_head_t	wait_queue;
+	spinlock_t		lock;
+	atomic_t		usage_count;
+	atomic_t		child_count;
+	unsigned int		disable_depth:3;
+	unsigned int		ignore_children:1;
+	unsigned int		idle_notification:1;
+	unsigned int		request_pending:1;
+	unsigned int		deferred_resume:1;
+	enum rpm_request	request;
+	enum rpm_status		runtime_status;
+	int			runtime_error;
+#endif
 };
 
 /*
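To make the status labels and the new struct dev_pm_info fields more concrete, here is a single-threaded userspace model of the checks that __pm_runtime_suspend() in runtime.c performs around the ->runtime_suspend() callback. It is a simplification under stated assumptions: plain ints replace atomic_t, there is no locking, workqueue, timer or parent handling, a concurrent suspend (RPM_SUSPENDING on entry) is not modelled, and the children check assumes pm_children_suspended() tests ignore_children or a zero child_count. All model_* names and the example callbacks are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Same labels as enum rpm_status in include/linux/pm.h above. */
enum rpm_status { RPM_ACTIVE = 0, RPM_RESUMING, RPM_SUSPENDED, RPM_SUSPENDING };

/* Hypothetical stand-in for the run-time PM part of struct dev_pm_info. */
struct model_dev {
	enum rpm_status runtime_status;
	int usage_count;              /* atomic_t in the kernel */
	int child_count;              /* atomic_t in the kernel */
	unsigned int disable_depth;
	bool ignore_children;
	int runtime_error;
	int (*runtime_suspend)(struct model_dev *dev);  /* bus type callback */
};

/* Assumed semantics of pm_children_suspended(). */
static bool model_children_suspended(const struct model_dev *dev)
{
	return dev->ignore_children || dev->child_count == 0;
}

/*
 * Suspend-side checks in the same order as __pm_runtime_suspend():
 * 1 = already suspended, -EAGAIN/-EBUSY = not now, -EINVAL = earlier error.
 */
static int model_suspend(struct model_dev *dev)
{
	int retval;

	/* Waiting for a parallel suspend is not modelled. */
	assert(dev->runtime_status != RPM_SUSPENDING);

	if (dev->runtime_error)
		return -EINVAL;
	if (dev->runtime_status == RPM_SUSPENDED)
		return 1;
	if (dev->runtime_status == RPM_RESUMING || dev->disable_depth > 0
	    || dev->usage_count > 0)
		return -EAGAIN;
	if (!model_children_suspended(dev))
		return -EBUSY;
	if (!dev->runtime_suspend)
		return -ENOSYS;

	dev->runtime_status = RPM_SUSPENDING;
	retval = dev->runtime_suspend(dev);
	if (retval) {
		dev->runtime_status = RPM_ACTIVE;
		/* -EAGAIN and -EBUSY mean "not now"; any other error sticks. */
		dev->runtime_error =
			(retval == -EAGAIN || retval == -EBUSY) ? 0 : retval;
	} else {
		dev->runtime_status = RPM_SUSPENDED;
	}
	return retval;
}

/* Example callbacks for exercising the model. */
static int ok_suspend(struct model_dev *dev) { (void)dev; return 0; }
static int busy_suspend(struct model_dev *dev) { (void)dev; return -EBUSY; }
static int broken_suspend(struct model_dev *dev) { (void)dev; return -EIO; }
```

A callback returning -EBUSY leaves the device RPM_ACTIVE with runtime_error cleared, whereas any other error is latched in runtime_error and makes later calls fail with -EINVAL, as in the runtime.c code of this patch.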
Index: linux-2.6/drivers/base/power/Makefile
===================================================================
--- linux-2.6.orig/drivers/base/power/Makefile
+++ linux-2.6/drivers/base/power/Makefile
@@ -1,5 +1,6 @@
 obj-$(CONFIG_PM)	+= sysfs.o
 obj-$(CONFIG_PM_SLEEP)	+= main.o
+obj-$(CONFIG_PM_RUNTIME)	+= runtime.o
 obj-$(CONFIG_PM_TRACE_RTC)	+= trace.o
 
 ccflags-$(CONFIG_DEBUG_DRIVER) := -DDEBUG
Index: linux-2.6/drivers/base/power/runtime.c
===================================================================
--- /dev/null
+++ linux-2.6/drivers/base/power/runtime.c
@@ -0,0 +1,969 @@
+/*
+ * drivers/base/power/runtime.c - Helper functions for device run-time PM
+ *
+ * Copyright (c) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
+ *
+ * This file is released under the GPLv2.
+ */
+
+#include <linux/sched.h>
+#include <linux/pm_runtime.h>
+#include <linux/jiffies.h>
+
+static int __pm_runtime_resume(struct device *dev, bool from_wq);
+static int __pm_request_idle(struct device *dev);
+static int __pm_request_resume(struct device *dev);
+
+/**
+ * pm_runtime_deactivate_timer - Deactivate given device's suspend timer.
+ * @dev: Device to handle.
+ */
+static void pm_runtime_deactivate_timer(struct device *dev)
+{
+	if (dev->power.timer_expires > 0) {
+		del_timer(&dev->power.suspend_timer);
+		dev->power.timer_expires = 0;
+	}
+}
+
+/**
+ * pm_runtime_cancel_pending - Deactivate suspend timer and cancel requests.
+ * @dev: Device to handle.
+ */
+static void pm_runtime_cancel_pending(struct device *dev)
+{
+	pm_runtime_deactivate_timer(dev);
+	/*
+	 * In case there's a request pending, make sure its work function will
+	 * return without doing anything.
+	 */
+	dev->power.request = RPM_REQ_NONE;
+}
+
+/**
+ * __pm_runtime_idle - Notify device bus type if the device can be suspended.
+ * @dev: Device to notify the bus type about.
+ *
+ * This function must be called under dev->power.lock with interrupts disabled.
+ */
+static int __pm_runtime_idle(struct device *dev)
+	__releases(&dev->power.lock) __acquires(&dev->power.lock)
+{
+	int retval = 0;
+
+	dev_dbg(dev, "__pm_runtime_idle()!\n");
+
+	if (dev->power.runtime_error)
+		retval = -EINVAL;
+	else if (dev->power.idle_notification)
+		retval = -EINPROGRESS;
+	else if (atomic_read(&dev->power.usage_count) > 0
+	    || dev->power.disable_depth > 0
+	    || dev->power.runtime_status != RPM_ACTIVE)
+		retval = -EAGAIN;
+	else if (!pm_children_suspended(dev))
+		retval = -EBUSY;
+	if (retval)
+		goto out;
+
+	if (dev->power.request_pending) {
+		/*
+		 * If an idle notification request is pending, cancel it.  Any
+		 * other pending request takes precedence over us.
+		 */
+		if (dev->power.request == RPM_REQ_IDLE) {
+			dev->power.request = RPM_REQ_NONE;
+		} else if (dev->power.request != RPM_REQ_NONE) {
+			retval = -EAGAIN;
+			goto out;
+		}
+	}
+
+	dev->power.idle_notification = true;
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_idle) {
+		spin_unlock_irq(&dev->power.lock);
+
+		dev->bus->pm->runtime_idle(dev);
+
+		spin_lock_irq(&dev->power.lock);
+	}
+
+	dev->power.idle_notification = false;
+	wake_up_all(&dev->power.wait_queue);
+
+ out:
+	dev_dbg(dev, "__pm_runtime_idle() returns %d!\n", retval);
+
+	return retval;
+}
+
+/**
+ * pm_runtime_idle - Notify device bus type if the device can be suspended.
+ * @dev: Device to notify the bus type about.
+ */
+int pm_runtime_idle(struct device *dev)
+{
+	int retval;
+
+	spin_lock_irq(&dev->power.lock);
+	retval = __pm_runtime_idle(dev);
+	spin_unlock_irq(&dev->power.lock);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(pm_runtime_idle);
+
+/**
+ * __pm_runtime_suspend - Carry out run-time suspend of given device.
+ * @dev: Device to suspend.
+ * @from_wq: If set, the function has been called via pm_wq.
+ *
+ * Check if the device can be suspended and run the ->runtime_suspend() callback
+ * provided by its bus type.  If another suspend has been started earlier, wait
+ * for it to finish.  If an idle notification or suspend request is pending or
+ * scheduled, cancel it.
+ *
+ * This function must be called under dev->power.lock with interrupts disabled.
+ */
+int __pm_runtime_suspend(struct device *dev, bool from_wq)
+	__releases(&dev->power.lock) __acquires(&dev->power.lock)
+{
+	struct device *parent = NULL;
+	bool notify = false;
+	int retval = 0;
+
+	dev_dbg(dev, "__pm_runtime_suspend()%s!\n",
+		from_wq ? " from workqueue" : "");
+
+ repeat:
+	if (dev->power.runtime_error) {
+		retval = -EINVAL;
+		goto out;
+	}
+
+	/* Pending resume requests take precedence over us. */
+	if (dev->power.request_pending
+	    && dev->power.request == RPM_REQ_RESUME) {
+		retval = -EAGAIN;
+		goto out;
+	}
+
+	/* Other scheduled or pending requests need to be canceled. */
+	pm_runtime_cancel_pending(dev);
+
+	if (dev->power.runtime_status == RPM_SUSPENDED)
+		retval = 1;
+	else if (dev->power.runtime_status == RPM_RESUMING
+	    || dev->power.disable_depth > 0
+	    || atomic_read(&dev->power.usage_count) > 0)
+		retval = -EAGAIN;
+	else if (!pm_children_suspended(dev))
+		retval = -EBUSY;
+	if (retval)
+		goto out;
+
+	if (dev->power.runtime_status == RPM_SUSPENDING) {
+		DEFINE_WAIT(wait);
+
+		if (from_wq) {
+			retval = -EINPROGRESS;
+			goto out;
+		}
+
+		/* Wait for the other suspend running in parallel with us. */
+		for (;;) {
+			prepare_to_wait(&dev->power.wait_queue, &wait,
+					TASK_UNINTERRUPTIBLE);
+			if (dev->power.runtime_status != RPM_SUSPENDING)
+				break;
+
+			spin_unlock_irq(&dev->power.lock);
+
+			schedule();
+
+			spin_lock_irq(&dev->power.lock);
+		}
+		finish_wait(&dev->power.wait_queue, &wait);
+		goto repeat;
+	}
+
+	dev->power.runtime_status = RPM_SUSPENDING;
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_suspend) {
+		spin_unlock_irq(&dev->power.lock);
+
+		retval = dev->bus->pm->runtime_suspend(dev);
+
+		spin_lock_irq(&dev->power.lock);
+		dev->power.runtime_error = retval;
+	} else {
+		retval = -ENOSYS;
+	}
+
+	if (retval) {
+		dev->power.runtime_status = RPM_ACTIVE;
+		pm_runtime_cancel_pending(dev);
+		dev->power.deferred_resume = false;
+
+		if (retval == -EAGAIN || retval == -EBUSY) {
+			notify = true;
+			dev->power.runtime_error = 0;
+		}
+	} else {
+		dev->power.runtime_status = RPM_SUSPENDED;
+
+		if (dev->parent) {
+			parent = dev->parent;
+			atomic_add_unless(&parent->power.child_count, -1, 0);
+		}
+	}
+	wake_up_all(&dev->power.wait_queue);
+
+	if (dev->power.deferred_resume) {
+		dev->power.deferred_resume = false;
+		__pm_runtime_resume(dev, false);
+		retval = -EAGAIN;
+		goto out;
+	}
+
+	if (notify)
+		__pm_runtime_idle(dev);
+
+	if (parent && !parent->power.ignore_children) {
+		spin_unlock_irq(&dev->power.lock);
+
+		pm_request_idle(parent);
+
+		spin_lock_irq(&dev->power.lock);
+	}
+
+ out:
+	dev_dbg(dev, "__pm_runtime_suspend() returns %d!\n", retval);
+
+	return retval;
+}
+
+/**
+ * pm_runtime_suspend - Carry out run-time suspend of given device.
+ * @dev: Device to suspend.
+ */
+int pm_runtime_suspend(struct device *dev)
+{
+	int retval;
+
+	spin_lock_irq(&dev->power.lock);
+	retval = __pm_runtime_suspend(dev, false);
+	spin_unlock_irq(&dev->power.lock);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(pm_runtime_suspend);
+
+/**
+ * __pm_runtime_resume - Carry out run-time resume of given device.
+ * @dev: Device to resume.
+ * @from_wq: If set, the function has been called via pm_wq.
+ *
+ * Check if the device can be woken up and run the ->runtime_resume() callback
+ * provided by its bus type.  If another resume has been started earlier, wait
+ * for it to finish.  If there's a suspend running in parallel with this
+ * function, wait for it to finish and resume the device.  Cancel any scheduled
+ * or pending requests.
+ *
+ * This function must be called under dev->power.lock with interrupts disabled.
+ */
+int __pm_runtime_resume(struct device *dev, bool from_wq)
+	__releases(&dev->power.lock) __acquires(&dev->power.lock)
+{
+	struct device *parent = NULL;
+	int retval = 0;
+
+	dev_dbg(dev, "__pm_runtime_resume()%s!\n",
+		from_wq ? " from workqueue" : "");
+
+ repeat:
+	if (dev->power.runtime_error) {
+		retval = -EINVAL;
+		goto out;
+	}
+
+	pm_runtime_cancel_pending(dev);
+
+	if (dev->power.runtime_status == RPM_ACTIVE)
+		retval = 1;
+	else if (dev->power.disable_depth > 0)
+		retval = -EAGAIN;
+	if (retval)
+		goto out;
+
+	if (dev->power.runtime_status == RPM_RESUMING
+	    || dev->power.runtime_status == RPM_SUSPENDING) {
+		DEFINE_WAIT(wait);
+
+		if (from_wq) {
+			if (dev->power.runtime_status == RPM_SUSPENDING)
+				dev->power.deferred_resume = true;
+			retval = -EINPROGRESS;
+			goto out;
+		}
+
+		/* Wait for the operation carried out in parallel with us. */
+		for (;;) {
+			prepare_to_wait(&dev->power.wait_queue, &wait,
+					TASK_UNINTERRUPTIBLE);
+			if (dev->power.runtime_status != RPM_RESUMING
+			    && dev->power.runtime_status != RPM_SUSPENDING)
+				break;
+
+			spin_unlock_irq(&dev->power.lock);
+
+			schedule();
+
+			spin_lock_irq(&dev->power.lock);
+		}
+		finish_wait(&dev->power.wait_queue, &wait);
+		goto repeat;
+	}
+
+	if (!parent && dev->parent) {
+		/*
+		 * Increment the parent's usage counter and resume it if
+		 * necessary.
+		 */
+		parent = dev->parent;
+		spin_unlock_irq(&dev->power.lock);
+
+		pm_runtime_get_noresume(parent);
+
+		spin_lock_irq(&parent->power.lock);
+		/*
+		 * We can resume if the parent's run-time PM is disabled or it
+		 * is set to ignore children.
+		 */
+		if (!parent->power.disable_depth
+		    && !parent->power.ignore_children) {
+			__pm_runtime_resume(parent, false);
+			if (parent->power.runtime_status != RPM_ACTIVE)
+				retval = -EBUSY;
+		}
+		spin_unlock_irq(&parent->power.lock);
+
+		spin_lock_irq(&dev->power.lock);
+		if (retval)
+			goto out;
+		goto repeat;
+	}
+
+	dev->power.runtime_status = RPM_RESUMING;
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_resume) {
+		spin_unlock_irq(&dev->power.lock);
+
+		retval = dev->bus->pm->runtime_resume(dev);
+
+		spin_lock_irq(&dev->power.lock);
+		dev->power.runtime_error = retval;
+	} else {
+		retval = -ENOSYS;
+	}
+
+	if (retval) {
+		dev->power.runtime_status = RPM_SUSPENDED;
+		pm_runtime_cancel_pending(dev);
+	} else {
+		dev->power.runtime_status = RPM_ACTIVE;
+		if (parent)
+			atomic_inc(&parent->power.child_count);
+	}
+	wake_up_all(&dev->power.wait_queue);
+
+	if (!retval)
+		__pm_request_idle(dev);
+
+ out:
+	if (parent) {
+		spin_unlock_irq(&dev->power.lock);
+
+		pm_runtime_put(parent);
+
+		spin_lock_irq(&dev->power.lock);
+	}
+
+	dev_dbg(dev, "__pm_runtime_resume() returns %d!\n", retval);
+
+	return retval;
+}
+
+/**
+ * pm_runtime_resume - Carry out run-time resume of given device.
+ * @dev: Device to resume.
+ */
+int pm_runtime_resume(struct device *dev)
+{
+	int retval;
+
+	spin_lock_irq(&dev->power.lock);
+	retval = __pm_runtime_resume(dev, false);
+	spin_unlock_irq(&dev->power.lock);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(pm_runtime_resume);
+
+/**
+ * pm_runtime_work - Universal run-time PM work function.
+ * @work: Work structure used for scheduling the execution of this function.
+ *
+ * Use @work to get the device object the work is to be done for, determine what
+ * is to be done and execute the appropriate run-time PM function.
+ */
+static void pm_runtime_work(struct work_struct *work)
+{
+	struct device *dev = container_of(work, struct device, power.work);
+	enum rpm_request req;
+
+	spin_lock_irq(&dev->power.lock);
+
+	if (!dev->power.request_pending)
+		goto out;
+
+	req = dev->power.request;
+	dev->power.request = RPM_REQ_NONE;
+	dev->power.request_pending = false;
+
+	switch (req) {
+	case RPM_REQ_NONE:
+		break;
+	case RPM_REQ_IDLE:
+		__pm_runtime_idle(dev);
+		break;
+	case RPM_REQ_SUSPEND:
+		__pm_runtime_suspend(dev, true);
+		break;
+	case RPM_REQ_RESUME:
+		__pm_runtime_resume(dev, true);
+		break;
+	}
+
+ out:
+	spin_unlock_irq(&dev->power.lock);
+}
+
+/**
+ * __pm_request_idle - Submit an idle notification request for given device.
+ * @dev: Device to handle.
+ *
+ * Check if the device's run-time PM status is correct for suspending the device
+ * and queue up a request to run __pm_runtime_idle() for it.
+ *
+ * This function must be called under dev->power.lock with interrupts disabled.
+ */
+static int __pm_request_idle(struct device *dev)
+{
+	int retval = 0;
+
+	if (dev->power.runtime_error)
+		retval = -EINVAL;
+	else if (atomic_read(&dev->power.usage_count) > 0
+	    || dev->power.disable_depth > 0
+	    || dev->power.runtime_status == RPM_SUSPENDED
+	    || dev->power.runtime_status == RPM_SUSPENDING)
+		retval = -EAGAIN;
+	else if (!pm_children_suspended(dev))
+		retval = -EBUSY;
+	if (retval)
+		return retval;
+
+	if (dev->power.request_pending) {
+		/* Any requests other than RPM_REQ_IDLE take precedence. */
+		if (dev->power.request == RPM_REQ_NONE)
+			dev->power.request = RPM_REQ_IDLE;
+		else if (dev->power.request != RPM_REQ_IDLE)
+			retval = -EAGAIN;
+		return retval;
+	}
+
+	dev->power.request = RPM_REQ_IDLE;
+	dev->power.request_pending = true;
+	queue_work(pm_wq, &dev->power.work);
+
+	return retval;
+}
+
+/**
+ * pm_request_idle - Submit an idle notification request for given device.
+ * @dev: Device to handle.
+ */
+int pm_request_idle(struct device *dev)
+{
+	unsigned long flags;
+	int retval;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+	retval = __pm_request_idle(dev);
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(pm_request_idle);
+
+/**
+ * __pm_request_suspend - Submit a suspend request for given device.
+ * @dev: Device to suspend.
+ *
+ * This function must be called under dev->power.lock with interrupts disabled.
+ */
+static int __pm_request_suspend(struct device *dev)
+{
+	int retval = 0;
+
+	if (dev->power.runtime_error)
+		return -EINVAL;
+
+	if (dev->power.runtime_status == RPM_SUSPENDED)
+		retval = 1;
+	else if (atomic_read(&dev->power.usage_count) > 0
+	    || dev->power.disable_depth > 0)
+		retval = -EAGAIN;
+	else if (dev->power.runtime_status == RPM_SUSPENDING)
+		retval = -EINPROGRESS;
+	else if (!pm_children_suspended(dev))
+		retval = -EBUSY;
+	if (retval < 0)
+		return retval;
+
+	pm_runtime_deactivate_timer(dev);
+
+	if (dev->power.request_pending) {
+		/*
+		 * Pending resume requests take precedence over us, but we can
+		 * overtake any other pending request.
+		 */
+		if (dev->power.request == RPM_REQ_RESUME)
+			retval = -EAGAIN;
+		else if (dev->power.request != RPM_REQ_SUSPEND)
+			dev->power.request = retval ?
+						RPM_REQ_NONE : RPM_REQ_SUSPEND;
+		return retval;
+	} else if (retval) {
+		return retval;
+	}
+
+	dev->power.request = RPM_REQ_SUSPEND;
+	dev->power.request_pending = true;
+	queue_work(pm_wq, &dev->power.work);
+
+	return 0;
+}
+
+/**
+ * pm_suspend_timer_fn - Timer function for pm_schedule_suspend().
+ * @data: Device pointer passed by pm_schedule_suspend().
+ *
+ * Check if the time is right and execute __pm_request_suspend() in that case.
+ */
+static void pm_suspend_timer_fn(unsigned long data)
+{
+	struct device *dev = (struct device *)data;
+	unsigned long flags;
+	unsigned long expires;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	expires = dev->power.timer_expires;
+	/* If 'expires' is after 'jiffies', we've been called too early. */
+	if (expires > 0 && !time_after(expires, jiffies)) {
+		dev->power.timer_expires = 0;
+		__pm_request_suspend(dev);
+	}
+
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+
+/**
+ * pm_schedule_suspend - Set up a timer to submit a suspend request in future.
+ * @dev: Device to suspend.
+ * @delay: Time to wait before submitting a suspend request, in milliseconds.
+ */
+int pm_schedule_suspend(struct device *dev, unsigned int delay)
+{
+	unsigned long flags;
+	int retval = 0;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.runtime_error) {
+		retval = -EINVAL;
+		goto out;
+	}
+
+	if (!delay) {
+		retval = __pm_request_suspend(dev);
+		goto out;
+	}
+
+	pm_runtime_deactivate_timer(dev);
+
+	if (dev->power.request_pending) {
+		/*
+		 * Pending resume requests take precedence over us, but any
+		 * other pending requests have to be canceled.
+		 */
+		if (dev->power.request == RPM_REQ_RESUME) {
+			retval = -EAGAIN;
+			goto out;
+		}
+		dev->power.request = RPM_REQ_NONE;
+	}
+
+	if (dev->power.runtime_status == RPM_SUSPENDED)
+		retval = 1;
+	else if (dev->power.runtime_status == RPM_SUSPENDING)
+		retval = -EINPROGRESS;
+	else if (atomic_read(&dev->power.usage_count) > 0
+	    || dev->power.disable_depth > 0)
+		retval = -EAGAIN;
+	else if (!pm_children_suspended(dev))
+		retval = -EBUSY;
+	if (retval)
+		goto out;
+
+	dev->power.timer_expires = jiffies + msecs_to_jiffies(delay);
+	mod_timer(&dev->power.suspend_timer, dev->power.timer_expires);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(pm_schedule_suspend);
+
+/**
+ * __pm_request_resume - Submit a resume request for given device.
+ * @dev: Device to resume.
+ *
+ * This function must be called under dev->power.lock with interrupts disabled.
+ */
+static int __pm_request_resume(struct device *dev)
+{
+	int retval = 0;
+
+	if (dev->power.runtime_error)
+		return -EINVAL;
+
+	if (dev->power.runtime_status == RPM_ACTIVE)
+		retval = 1;
+	else if (dev->power.runtime_status == RPM_RESUMING)
+		retval = -EINPROGRESS;
+	else if (dev->power.disable_depth > 0)
+		retval = -EAGAIN;
+	if (retval < 0)
+		return retval;
+
+	pm_runtime_deactivate_timer(dev);
+
+	if (dev->power.request_pending) {
+		/* If a non-resume request is pending, we can overtake it. */
+		dev->power.request = retval ? RPM_REQ_NONE : RPM_REQ_RESUME;
+		return retval;
+	} else if (retval) {
+		return retval;
+	}
+
+	dev->power.request = RPM_REQ_RESUME;
+	dev->power.request_pending = true;
+	queue_work(pm_wq, &dev->power.work);
+
+	return retval;
+}
+
+/**
+ * pm_request_resume - Submit a resume request for given device.
+ * @dev: Device to resume.
+ */
+int pm_request_resume(struct device *dev)
+{
+	unsigned long flags;
+	int retval;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+	retval = __pm_request_resume(dev);
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(pm_request_resume);
+
+/**
+ * __pm_runtime_get - Reference count a device and wake it up, if necessary.
+ * @dev: Device to handle.
+ * @sync: If set and the device is suspended, resume it synchronously.
+ *
+ * Increment the usage count of the device and if it was zero previously,
+ * resume it or submit a resume request for it, depending on the value of @sync.
+ */
+int __pm_runtime_get(struct device *dev, bool sync)
+{
+	int retval = 1;
+
+	if (atomic_add_return(1, &dev->power.usage_count) == 1)
+		retval = sync ? pm_runtime_resume(dev) : pm_request_resume(dev);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(__pm_runtime_get);
+
+/**
+ * __pm_runtime_put - Decrement the device's usage counter and notify its bus.
+ * @dev: Device to handle.
+ * @sync: If the device's bus type is to be notified, do that synchronously.
+ *
+ * Decrement the usage count of the device and if it reaches zero, carry out a
+ * synchronous idle notification or submit an idle notification request for it,
+ * depending on the value of @sync.
+ */
+int __pm_runtime_put(struct device *dev, bool sync)
+{
+	int retval = 0;
+
+	if (atomic_dec_and_test(&dev->power.usage_count))
+		retval = sync ? pm_runtime_idle(dev) : pm_request_idle(dev);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(__pm_runtime_put);
+
+/**
+ * __pm_runtime_set_status - Set run-time PM status of a device.
+ * @dev: Device to handle.
+ * @status: New run-time PM status of the device.
+ *
+ * If run-time PM of the device is disabled or its power.runtime_error field is
+ * different from zero, the status may be changed either to RPM_ACTIVE, or to
+ * RPM_SUSPENDED, as long as that reflects the actual state of the device.
+ * However, if the device has a parent and the parent is not active, and the
+ * parent's power.ignore_children flag is unset, the device's status cannot be
+ * set to RPM_ACTIVE, so -EBUSY is returned in that case.
+ *
+ * If successful, __pm_runtime_set_status() clears the power.runtime_error field
+ * and the device parent's counter of unsuspended children is modified to
+ * reflect the new status.  If the new status is RPM_SUSPENDED, an idle
+ * notification request for the parent is submitted.
+ */
+int __pm_runtime_set_status(struct device *dev, unsigned int status)
+{
+	struct device *parent = dev->parent;
+	unsigned long flags;
+	bool notify_parent = false;
+	int error = 0;
+
+	if (status != RPM_ACTIVE && status != RPM_SUSPENDED)
+		return -EINVAL;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (!dev->power.runtime_error && !dev->power.disable_depth) {
+		error = -EAGAIN;
+		goto out;
+	}
+
+	if (dev->power.runtime_status == status)
+		goto out_set;
+
+	if (status == RPM_SUSPENDED) {
+		/* It always is possible to set the status to 'suspended'. */
+		if (parent) {
+			atomic_add_unless(&parent->power.child_count, -1, 0);
+			notify_parent = !parent->power.ignore_children;
+		}
+		goto out_set;
+	}
+
+	if (parent) {
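+		/*
+		 * Interrupts are already disabled, because dev->power.lock was
+		 * taken with spin_lock_irqsave(), so the _irq variants must not
+		 * be used here (spin_unlock_irq() would re-enable interrupts
+		 * while dev->power.lock is still held).
+		 */
+		spin_lock_nested(&parent->power.lock, SINGLE_DEPTH_NESTING);
+
+		/*
+		 * It is invalid to put an active child under a parent that is
+		 * not active, has run-time PM enabled and the
+		 * 'power.ignore_children' flag unset.
+		 */
+		if (!parent->power.disable_depth
+		    && !parent->power.ignore_children
+		    && parent->power.runtime_status != RPM_ACTIVE) {
+			error = -EBUSY;
+		} else {
+			if (dev->power.runtime_status == RPM_SUSPENDED)
+				atomic_inc(&parent->power.child_count);
+		}
+
+		spin_unlock(&parent->power.lock);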
+
+		if (error)
+			goto out;
+	}
+
+ out_set:
+	dev->power.runtime_status = status;
+	dev->power.runtime_error = 0;
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	if (notify_parent)
+		pm_request_idle(parent);
+
+	return error;
+}
+EXPORT_SYMBOL_GPL(__pm_runtime_set_status);
+
+/**
+ * pm_runtime_enable - Enable run-time PM of a device.
+ * @dev: Device to handle.
+ */
+void pm_runtime_enable(struct device *dev)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.disable_depth > 0)
+		dev->power.disable_depth--;
+	else
+		dev_warn(dev, "Unbalanced %s!\n", __func__);
+
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_enable);
+
+/**
+ * __pm_runtime_disable - Disable run-time PM of a device.
+ * @dev: Device to handle.
+ * @check_resume: If set, check if there's a resume request for the device.
+ *
+ * Increment power.disable_depth for the device and if it was zero previously,
+ * cancel all pending run-time PM requests for the device and wait for all
+ * operations in progress to complete.  The device can be either active or
+ * suspended after its run-time PM has been disabled.
+ *
+ * If @check_resume is set and there's a resume request pending when
+ * __pm_runtime_disable() is called and power.disable_depth is zero, the
+ * function will wake up the device before disabling its run-time PM and will
+ * return 1.  Otherwise, 0 is returned.
+ */
+int __pm_runtime_disable(struct device *dev, bool check_resume)
+{
+	int retval = 0;
+
+	spin_lock_irq(&dev->power.lock);
+
+	if (dev->power.disable_depth > 0) {
+		dev->power.disable_depth++;
+		goto out;
+	}
+
+	/*
+	 * Wake up the device if there's a resume request pending, because that
+	 * means there probably is some I/O to process and disabling run-time PM
+	 * shouldn't prevent the device from processing the I/O.
+	 */
+	if (check_resume && dev->power.request_pending
+	    && dev->power.request == RPM_REQ_RESUME) {
+		/*
+		 * Prevent suspends and idle notifications from being carried
+		 * out after we have woken up the device.
+		 */
+		pm_runtime_get_noresume(dev);
+
+		__pm_runtime_resume(dev, false);
+
+		pm_runtime_put_noidle(dev);
+		retval = 1;
+	}
+
+	if (dev->power.disable_depth++ > 0)
+		goto out;
+
+	pm_runtime_deactivate_timer(dev);
+
+	if (dev->power.request_pending) {
+		dev->power.request = RPM_REQ_NONE;
+		spin_unlock_irq(&dev->power.lock);
+
+		cancel_work_sync(&dev->power.work);
+
+		spin_lock_irq(&dev->power.lock);
+		dev->power.request_pending = false;
+	}
+
+	if (dev->power.runtime_status == RPM_SUSPENDING
+	    || dev->power.runtime_status == RPM_RESUMING
+	    || dev->power.idle_notification) {
+		DEFINE_WAIT(wait);
+
+		/* Suspend or wake-up in progress. */
+		for (;;) {
+			prepare_to_wait(&dev->power.wait_queue, &wait,
+					TASK_UNINTERRUPTIBLE);
+			if (dev->power.runtime_status != RPM_SUSPENDING
+			    && dev->power.runtime_status != RPM_RESUMING
+			    && !dev->power.idle_notification)
+				break;
+			spin_unlock_irq(&dev->power.lock);
+
+			schedule();
+
+			spin_lock_irq(&dev->power.lock);
+		}
+		finish_wait(&dev->power.wait_queue, &wait);
+	}
+
+ out:
+	spin_unlock_irq(&dev->power.lock);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(__pm_runtime_disable);
+
+/**
+ * pm_runtime_init - Initialize run-time PM fields in given device object.
+ * @dev: Device object to initialize.
+ */
+void pm_runtime_init(struct device *dev)
+{
+	spin_lock_init(&dev->power.lock);
+
+	dev->power.runtime_status = RPM_SUSPENDED;
+	dev->power.idle_notification = false;
+
+	dev->power.disable_depth = 1;
+	atomic_set(&dev->power.usage_count, 0);
+
+	dev->power.runtime_error = 0;
+
+	atomic_set(&dev->power.child_count, 0);
+	pm_suspend_ignore_children(dev, false);
+
+	dev->power.request_pending = false;
+	dev->power.request = RPM_REQ_NONE;
+	dev->power.deferred_resume = false;
+	INIT_WORK(&dev->power.work, pm_runtime_work);
+
+	dev->power.timer_expires = 0;
+	setup_timer(&dev->power.suspend_timer, pm_suspend_timer_fn,
+			(unsigned long)dev);
+
+	init_waitqueue_head(&dev->power.wait_queue);
+}
+
+/**
+ * pm_runtime_remove - Prepare for removing a device from device hierarchy.
+ * @dev: Device object being removed from device hierarchy.
+ */
+void pm_runtime_remove(struct device *dev)
+{
+	__pm_runtime_disable(dev, false);
+
+	/* Change the status back to 'suspended' to match the initial status. */
+	if (dev->power.runtime_status == RPM_ACTIVE)
+		pm_runtime_set_suspended(dev);
+}
Index: linux-2.6/include/linux/pm_runtime.h
===================================================================
--- /dev/null
+++ linux-2.6/include/linux/pm_runtime.h
@@ -0,0 +1,115 @@
+/*
+ * pm_runtime.h - Device run-time power management helper functions.
+ *
+ * Copyright (C) 2009 Rafael J. Wysocki <rjw@sisk.pl>
+ *
+ * This file is released under the GPLv2.
+ */
+
+#ifndef _LINUX_PM_RUNTIME_H
+#define _LINUX_PM_RUNTIME_H
+
+#include <linux/device.h>
+#include <linux/pm.h>
+
+#ifdef CONFIG_PM_RUNTIME
+
+extern struct workqueue_struct *pm_wq;
+
+extern int pm_runtime_idle(struct device *dev);
+extern int pm_runtime_suspend(struct device *dev);
+extern int pm_runtime_resume(struct device *dev);
+extern int pm_request_idle(struct device *dev);
+extern int pm_schedule_suspend(struct device *dev, unsigned int delay);
+extern int pm_request_resume(struct device *dev);
+extern int __pm_runtime_get(struct device *dev, bool sync);
+extern int __pm_runtime_put(struct device *dev, bool sync);
+extern int __pm_runtime_set_status(struct device *dev, unsigned int status);
+extern void pm_runtime_enable(struct device *dev);
+extern int __pm_runtime_disable(struct device *dev, bool check_resume);
+
+static inline bool pm_children_suspended(struct device *dev)
+{
+	return dev->power.ignore_children
+		|| !atomic_read(&dev->power.child_count);
+}
+
+static inline void pm_suspend_ignore_children(struct device *dev, bool enable)
+{
+	dev->power.ignore_children = enable;
+}
+
+static inline void pm_runtime_get_noresume(struct device *dev)
+{
+	atomic_inc(&dev->power.usage_count);
+}
+
+static inline void pm_runtime_put_noidle(struct device *dev)
+{
+	atomic_add_unless(&dev->power.usage_count, -1, 0);
+}
+
+#else /* !CONFIG_PM_RUNTIME */
+
+static inline int pm_runtime_idle(struct device *dev) { return -ENOSYS; }
+static inline int pm_runtime_suspend(struct device *dev) { return -ENOSYS; }
+static inline int pm_runtime_resume(struct device *dev) { return 0; }
+static inline int pm_request_idle(struct device *dev) { return -ENOSYS; }
+static inline int pm_schedule_suspend(struct device *dev, unsigned int delay)
+{
+	return -ENOSYS;
+}
+static inline int pm_request_resume(struct device *dev) { return 0; }
+static inline int __pm_runtime_get(struct device *dev, bool sync) { return 1; }
+static inline int __pm_runtime_put(struct device *dev, bool sync) { return 0; }
+static inline int __pm_runtime_set_status(struct device *dev,
+					    unsigned int status) { return 0; }
+static inline void pm_runtime_enable(struct device *dev) {}
+static inline int __pm_runtime_disable(struct device *dev, bool check_resume)
+{
+	return 0;
+}
+
+static inline bool pm_children_suspended(struct device *dev) { return false; }
+static inline void pm_suspend_ignore_children(struct device *dev, bool en) {}
+static inline void pm_runtime_get_noresume(struct device *dev) {}
+static inline void pm_runtime_put_noidle(struct device *dev) {}
+
+#endif /* !CONFIG_PM_RUNTIME */
+
+static inline int pm_runtime_get(struct device *dev)
+{
+	return __pm_runtime_get(dev, false);
+}
+
+static inline int pm_runtime_get_sync(struct device *dev)
+{
+	return __pm_runtime_get(dev, true);
+}
+
+static inline int pm_runtime_put(struct device *dev)
+{
+	return __pm_runtime_put(dev, false);
+}
+
+static inline int pm_runtime_put_sync(struct device *dev)
+{
+	return __pm_runtime_put(dev, true);
+}
+
+static inline int pm_runtime_set_active(struct device *dev)
+{
+	return __pm_runtime_set_status(dev, RPM_ACTIVE);
+}
+
+static inline void pm_runtime_set_suspended(struct device *dev)
+{
+	__pm_runtime_set_status(dev, RPM_SUSPENDED);
+}
+
+static inline int pm_runtime_disable(struct device *dev)
+{
+	return __pm_runtime_disable(dev, true);
+}
+
+#endif
Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -21,6 +21,7 @@
 #include <linux/kallsyms.h>
 #include <linux/mutex.h>
 #include <linux/pm.h>
+#include <linux/pm_runtime.h>
 #include <linux/resume-trace.h>
 #include <linux/rwsem.h>
 #include <linux/interrupt.h>
@@ -49,6 +50,16 @@ static DEFINE_MUTEX(dpm_list_mtx);
 static bool transition_started;
 
 /**
+ * device_pm_init - Initialize the PM-related part of a device object
+ * @dev: Device object being initialized.
+ */
+void device_pm_init(struct device *dev)
+{
+	dev->power.status = DPM_ON;
+	pm_runtime_init(dev);
+}
+
+/**
  *	device_pm_lock - lock the list of active devices used by the PM core
  */
 void device_pm_lock(void)
@@ -105,6 +116,7 @@ void device_pm_remove(struct device *dev
 	mutex_lock(&dpm_list_mtx);
 	list_del_init(&dev->power.entry);
 	mutex_unlock(&dpm_list_mtx);
+	pm_runtime_remove(dev);
 }
 
 /**
@@ -512,6 +524,7 @@ static void dpm_complete(pm_message_t st
 			mutex_unlock(&dpm_list_mtx);
 
 			device_complete(dev, state);
+			pm_runtime_enable(dev);
 
 			mutex_lock(&dpm_list_mtx);
 		}
@@ -757,11 +770,16 @@ static int dpm_prepare(pm_message_t stat
 		dev->power.status = DPM_PREPARING;
 		mutex_unlock(&dpm_list_mtx);
 
-		error = device_prepare(dev, state);
+		if (pm_runtime_disable(dev) && device_may_wakeup(dev))
+			/* Wake-up requested during system sleep transition. */
+			error = -EBUSY;
+		else
+			error = device_prepare(dev, state);
 
 		mutex_lock(&dpm_list_mtx);
 		if (error) {
 			dev->power.status = DPM_ON;
+			pm_runtime_enable(dev);
 			if (error == -EAGAIN) {
 				put_device(dev);
 				error = 0;
Index: linux-2.6/drivers/base/dd.c
===================================================================
--- linux-2.6.orig/drivers/base/dd.c
+++ linux-2.6/drivers/base/dd.c
@@ -23,6 +23,7 @@
 #include <linux/kthread.h>
 #include <linux/wait.h>
 #include <linux/async.h>
+#include <linux/pm_runtime.h>
 
 #include "base.h"
 #include "power/power.h"
@@ -202,7 +203,17 @@ int driver_probe_device(struct device_dr
 	pr_debug("bus: '%s': %s: matched device %s with driver %s\n",
 		 drv->bus->name, __func__, dev_name(dev), drv->name);
 
+	/*
+	 * Wait for run-time PM calls to complete and prevent new suspend calls
+	 * until the probe is done.
+	 */
+	pm_runtime_disable(dev);
+	pm_runtime_get_noresume(dev);
+	pm_runtime_enable(dev);
 	ret = really_probe(dev, drv);
+	pm_runtime_put_noidle(dev);
+	if (!ret)
+		pm_runtime_idle(dev);
 
 	return ret;
 }
@@ -306,6 +317,14 @@ static void __device_release_driver(stru
 
 	drv = dev->driver;
 	if (drv) {
+		/*
+		 * Wait for run-time PM calls to complete and prevent new
+		 * suspend calls until the remove is done.
+		 */
+		pm_runtime_disable(dev);
+		pm_runtime_get_noresume(dev);
+		pm_runtime_enable(dev);
+
 		driver_sysfs_remove(dev);
 
 		if (dev->bus)
@@ -324,6 +343,8 @@ static void __device_release_driver(stru
 			blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
 						     BUS_NOTIFY_UNBOUND_DRIVER,
 						     dev);
+
+		pm_runtime_put_noidle(dev);
 	}
 }
 
Index: linux-2.6/drivers/base/power/power.h
===================================================================
--- linux-2.6.orig/drivers/base/power/power.h
+++ linux-2.6/drivers/base/power/power.h
@@ -1,7 +1,14 @@
-static inline void device_pm_init(struct device *dev)
-{
-	dev->power.status = DPM_ON;
-}
+#ifdef CONFIG_PM_RUNTIME
+
+extern void pm_runtime_init(struct device *dev);
+extern void pm_runtime_remove(struct device *dev);
+
+#else /* !CONFIG_PM_RUNTIME */
+
+static inline void pm_runtime_init(struct device *dev) {}
+static inline void pm_runtime_remove(struct device *dev) {}
+
+#endif /* !CONFIG_PM_RUNTIME */
 
 #ifdef CONFIG_PM_SLEEP
 
@@ -16,23 +23,33 @@ static inline struct device *to_device(s
 	return container_of(entry, struct device, power.entry);
 }
 
+extern void device_pm_init(struct device *dev);
 extern void device_pm_add(struct device *);
 extern void device_pm_remove(struct device *);
 extern void device_pm_move_before(struct device *, struct device *);
 extern void device_pm_move_after(struct device *, struct device *);
 extern void device_pm_move_last(struct device *);
 
-#else /* CONFIG_PM_SLEEP */
+#else /* !CONFIG_PM_SLEEP */
+
+static inline void device_pm_init(struct device *dev)
+{
+	pm_runtime_init(dev);
+}
+
+static inline void device_pm_remove(struct device *dev)
+{
+	pm_runtime_remove(dev);
+}
 
 static inline void device_pm_add(struct device *dev) {}
-static inline void device_pm_remove(struct device *dev) {}
 static inline void device_pm_move_before(struct device *deva,
 					 struct device *devb) {}
 static inline void device_pm_move_after(struct device *deva,
 					struct device *devb) {}
 static inline void device_pm_move_last(struct device *dev) {}
 
-#endif
+#endif /* !CONFIG_PM_SLEEP */
 
 #ifdef CONFIG_PM
 
Index: linux-2.6/Documentation/power/runtime_pm.txt
===================================================================
--- /dev/null
+++ linux-2.6/Documentation/power/runtime_pm.txt
@@ -0,0 +1,386 @@
+Run-time Power Management Framework for I/O Devices
+
+(C) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
+
+1. Introduction
+
+Support for run-time power management (run-time PM) of I/O devices is provided
+at the power management core (PM core) level by means of:
+
+* The power management workqueue pm_wq in which bus types and device drivers can
+  put their PM-related work items.  It is strongly recommended that pm_wq be
+  used for queuing all work items related to run-time PM, because this allows
+  them to be synchronized with system-wide power transitions (suspend to RAM,
+  hibernation and resume from system sleep states).  pm_wq is declared in
+  include/linux/pm_runtime.h and defined in kernel/power/main.c.
+
+* A number of run-time PM fields in the 'power' member of 'struct device' (which
+  is of the type 'struct dev_pm_info', defined in include/linux/pm.h) that can
+  be used for synchronizing run-time PM operations with one another.
+
+* Three device run-time PM callbacks in 'struct dev_pm_ops' (defined in
+  include/linux/pm.h).
+
+* A set of helper functions defined in drivers/base/power/runtime.c that can be
+  used for carrying out run-time PM operations in such a way that the
+  synchronization between them is taken care of by the PM core.  Bus types and
+  device drivers are encouraged to use these functions.
+
+The run-time PM callbacks present in 'struct dev_pm_ops', the device run-time PM
+fields of 'struct dev_pm_info' and the core helper functions provided for
+run-time PM are described below.
+
+2. Device Run-time PM Callbacks
+
+There are three device run-time PM callbacks defined in 'struct dev_pm_ops':
+
+struct dev_pm_ops {
+	...
+	int (*runtime_suspend)(struct device *dev);
+	int (*runtime_resume)(struct device *dev);
+	void (*runtime_idle)(struct device *dev);
+	...
+};
+
+The ->runtime_suspend() callback is executed by the PM core for the bus type of
+the device being suspended.  The bus type's callback is then _entirely_
+_responsible_ for handling the device as appropriate, which may, but need not,
+include executing the device driver's own ->runtime_suspend() callback (from the
+PM core's point of view it is not necessary to implement a ->runtime_suspend()
+callback in a device driver as long as the bus type's ->runtime_suspend() knows
+what to do to handle the device).
+
+  * Once the bus type's ->runtime_suspend() callback has completed successfully
+    for a given device, the PM core regards the device as suspended, which need
+    not mean that the device has been put into a low power state.  It is
+    supposed to mean, however, that the device will not process data and will
+    not communicate with the CPU(s) and RAM until its bus type's
+    ->runtime_resume() callback is executed for it.  The run-time PM status of
+    a device after successful execution of its bus type's ->runtime_suspend()
+    callback is 'suspended'.
+
+  * If the bus type's ->runtime_suspend() callback returns -EBUSY or -EAGAIN,
+    the device's run-time PM status is supposed to be 'active', which means that
+    the device _must_ be fully operational afterwards.
+
+  * If the bus type's ->runtime_suspend() callback returns an error code
+    different from -EBUSY or -EAGAIN, the PM core regards this as a fatal
+    error and will refuse to run the helper functions described in Section 4
+    for the device until its status is directly set either to 'active'
+    or to 'suspended' (the PM core provides special helper functions for this
+    purpose).
+
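The three possible outcomes of the bus type's ->runtime_suspend() callback listed above can be summarized by a small userspace C model. It is only a sketch: struct model_dev, model_finish_suspend(), model_helpers_usable() and model_set_status() are hypothetical names, not part of the PM core, and the model assumes that the status also reverts to 'active' after a fatal error (the text above only states this for -EBUSY and -EAGAIN).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Userspace model only: these names mirror the patch but are not kernel code. */
enum rpm_status { RPM_ACTIVE, RPM_RESUMING, RPM_SUSPENDED, RPM_SUSPENDING };

struct model_dev {
	enum rpm_status runtime_status;
	int runtime_error;	/* nonzero: fatal error, helpers refuse to run */
};

/* Record the outcome of the bus type's ->runtime_suspend() callback. */
void model_finish_suspend(struct model_dev *dev, int retval)
{
	if (retval == 0) {
		/* Success: regarded as suspended (not necessarily low power). */
		dev->runtime_status = RPM_SUSPENDED;
		return;
	}
	/* Any failure leaves the device 'active', i.e. fully operational. */
	dev->runtime_status = RPM_ACTIVE;
	if (retval != -EBUSY && retval != -EAGAIN)
		dev->runtime_error = retval;	/* fatal: remember the code */
}

/* The helpers of Section 4 only run while no fatal error is recorded. */
bool model_helpers_usable(const struct model_dev *dev)
{
	return dev->runtime_error == 0;
}

/* Setting the status directly clears the error (cf. __pm_runtime_set_status). */
void model_set_status(struct model_dev *dev, enum rpm_status status)
{
	assert(status == RPM_ACTIVE || status == RPM_SUSPENDED);
	dev->runtime_status = status;
	dev->runtime_error = 0;
}
```

Unlike the real __pm_runtime_set_status(), the model does not insist that run-time PM be disabled or that an error be pending before the status is set directly.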
+In particular, if the driver requires remote wake-up capability for proper
+functioning and device_may_wakeup() returns 'false' for the device, then
+->runtime_suspend() should return -EBUSY.  On the other hand, if
+device_may_wakeup() returns 'true' for the device and the device is put
+into a low power state during the execution of its bus type's
+->runtime_suspend(), it is expected that remote wake-up (i.e. a hardware
+mechanism allowing the device to request a change of its power state, such as
+PCI PME) will be enabled for the device.  Generally, remote wake-up should be
+enabled for all input devices put into a low power state at run time.
+
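A hypothetical driver-level ->runtime_suspend() that follows the remote wake-up rules above might look like the following userspace sketch. struct example_dev and its fields are invented for illustration: may_wakeup merely stands in for what device_may_wakeup() would return, and wakeup_enabled stands in for arming a mechanism such as PCI PME.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device state; may_wakeup stands in for device_may_wakeup(). */
struct example_dev {
	bool needs_remote_wakeup;	/* driver cannot work without it */
	bool may_wakeup;		/* what device_may_wakeup() would return */
	bool wakeup_enabled;		/* e.g. PCI PME armed */
	bool low_power;
};

/* Hypothetical ->runtime_suspend() following the rules above. */
int example_runtime_suspend(struct example_dev *dev)
{
	if (dev->needs_remote_wakeup && !dev->may_wakeup)
		return -EBUSY;		/* refuse: device stays active */

	if (dev->may_wakeup)
		dev->wakeup_enabled = true;	/* arm wake-up before powering down */
	dev->low_power = true;
	return 0;
}
```

Returning -EBUSY rather than another error code matters: as described above, -EBUSY leaves the device 'active' without putting the PM core into the fatal-error state.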
+The ->runtime_resume() callback is executed by the PM core for the bus type of
+the device being woken up.  The bus type's callback is then _entirely_
+_responsible_ for handling the device as appropriate, which may, but need not,
+include executing the device driver's own ->runtime_resume() callback (from the
+PM core's point of view it is not necessary to implement a ->runtime_resume()
+callback in a device driver as long as the bus type's ->runtime_resume() knows
+what to do to handle the device).
+
+  * Once the bus type's ->runtime_resume() callback has completed successfully,
+    the PM core regards the device as fully operational, which means that the
+    device _must_ be able to complete I/O operations as needed.  The run-time
+    PM status of the device is then 'active'.
+
+  * If the bus type's ->runtime_resume() callback returns an error code, the PM
+    core regards this as a fatal error and will refuse to run the helper
+    functions described in Section 4 for the device, until its status is
+    directly set either to 'active' or to 'suspended' (the PM core provides
+    special helper functions for this purpose).
+
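The following userspace sketch models how __pm_runtime_resume(), shown earlier in this patch, treats the result of the bus type's ->runtime_resume() callback. On failure the device stays 'suspended' and the error is recorded; on success it becomes 'active' and its parent gains one 'active' child. All names are hypothetical, and locking, waiting for concurrent operations and resuming the parent are deliberately left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum rpm_status { RPM_ACTIVE, RPM_RESUMING, RPM_SUSPENDED, RPM_SUSPENDING };

struct model_dev {
	enum rpm_status runtime_status;
	int runtime_error;
	int child_count;		/* number of 'active' children */
	struct model_dev *parent;
};

typedef int (*resume_cb)(struct model_dev *dev);

/* Sample bus callbacks for experimenting with the model. */
int example_resume_ok(struct model_dev *dev) { (void)dev; return 0; }
int example_resume_fail(struct model_dev *dev) { (void)dev; return -EIO; }

/* Model of how the callback's return value is handled. */
int model_resume(struct model_dev *dev, resume_cb bus_resume)
{
	int retval;

	if (dev->runtime_error)
		return -EINVAL;		/* fatal error recorded earlier */
	if (dev->runtime_status == RPM_ACTIVE)
		return 1;		/* already active: nothing to do */

	dev->runtime_status = RPM_RESUMING;
	if (bus_resume) {
		retval = bus_resume(dev);
		dev->runtime_error = retval;	/* 0 on success */
	} else {
		retval = -ENOSYS;	/* no callback: not recorded as fatal */
	}

	if (retval) {
		dev->runtime_status = RPM_SUSPENDED;
	} else {
		dev->runtime_status = RPM_ACTIVE;
		if (dev->parent)
			dev->parent->child_count++;	/* one more active child */
	}
	return retval;
}
```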
+The ->runtime_idle() callback is executed by the PM core for the bus type of
+a given device whenever the device appears to be idle, which is indicated to the
+PM core by two counters, the device's usage counter and the counter of 'active'
+children of the device.
+
+  * If either of these counters is decreased using a helper function provided by
+    the PM core and it turns out to be equal to zero, the other counter is
+    checked.  If that counter also is equal to zero, the PM core executes the
+    device bus type's ->runtime_idle() callback (with the device as an
+    argument).
+
+The action performed by a bus type's ->runtime_idle() callback is totally
+dependent on the bus type in question, but the expected and recommended action
+is to check if the device can be suspended (i.e. if all of the conditions
+necessary for suspending the device are satisfied) and to queue up a suspend
+request for the device in that case.
+
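The interplay of the two counters can be illustrated with a minimal userspace model. Here example_runtime_idle() is a hypothetical bus callback that does the recommended thing (it requests a suspend), and model_put() and model_child_suspended() are invented stand-ins for the PM core's counter handling. The real core uses atomic counters and may run the callback asynchronously from pm_wq, which the model ignores.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model of the two counters that drive ->runtime_idle(). */
struct model_dev {
	int usage_count;
	int child_count;		/* number of 'active' children */
	bool ignore_children;
	int idle_calls;			/* how often ->runtime_idle() ran */
	bool suspend_requested;		/* set by the sample idle callback */
};

static bool children_suspended(const struct model_dev *d)
{
	return d->ignore_children || d->child_count == 0;
}

/* Hypothetical bus ->runtime_idle(): queue up a suspend request. */
static void example_runtime_idle(struct model_dev *d)
{
	d->idle_calls++;
	d->suspend_requested = true;
}

void model_get(struct model_dev *d)
{
	d->usage_count++;
}

/* Drop a usage reference; when it reaches zero, check the other counter. */
void model_put(struct model_dev *d)
{
	if (--d->usage_count == 0 && children_suspended(d))
		example_runtime_idle(d);
}

/* A child has been suspended; when none is active, check usage_count. */
void model_child_suspended(struct model_dev *d)
{
	if (--d->child_count == 0 && d->usage_count == 0)
		example_runtime_idle(d);
}
```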
+The helper functions provided by the PM core, described in Section 4, guarantee
+that the following constraints are met with respect to the bus type's run-time
+PM callbacks:
+
+(1) The callbacks are mutually exclusive (e.g. it is forbidden to execute
+    ->runtime_suspend() in parallel with ->runtime_resume() or with another
+    instance of ->runtime_suspend() for the same device) with the exception that
+    ->runtime_suspend() or ->runtime_resume() can be executed in parallel with
+    ->runtime_idle() (although ->runtime_idle() will not be started while any
+    of the other callbacks is being executed for the same device).
+
+(2) ->runtime_idle() and ->runtime_suspend() can only be executed for 'active'
+    devices (i.e. the PM core will only execute ->runtime_idle() or
+    ->runtime_suspend() for devices whose run-time PM status is 'active').
+
+(3) ->runtime_idle() and ->runtime_suspend() can only be executed for a device
+    whose usage counter is equal to zero _and_ whose counter of 'active'
+    children is either equal to zero or ignored because the device's
+    'power.ignore_children' flag is set.
+
+(4) ->runtime_resume() can only be executed for 'suspended' devices (i.e. the
+    PM core will only execute ->runtime_resume() for devices whose run-time PM
+    status is 'suspended').
+
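Constraints (2) to (4) boil down to simple predicates on the run-time PM fields described in Section 3. The sketch below is a hypothetical userspace model, not the core's code; it additionally folds in the disable_depth and runtime_error checks performed by helpers such as __pm_request_idle(). Constraint (1) is only partly captured: because the transitional 'suspending' and 'resuming' states satisfy neither predicate, a second ->runtime_suspend() or ->runtime_resume() cannot start while one is in progress.

```c
#include <assert.h>
#include <stdbool.h>

enum rpm_status { RPM_ACTIVE, RPM_RESUMING, RPM_SUSPENDED, RPM_SUSPENDING };

struct model_dev {
	enum rpm_status runtime_status;
	int usage_count;
	int child_count;		/* number of 'active' children */
	bool ignore_children;
	int disable_depth;		/* >0: run-time PM disabled */
	int runtime_error;		/* nonzero: fatal error recorded */
};

/* Gate shared by all helpers: PM enabled and no fatal error. */
static bool helpers_enabled(const struct model_dev *d)
{
	return d->disable_depth == 0 && d->runtime_error == 0;
}

/* Constraints (2) and (3). */
bool may_run_idle_or_suspend(const struct model_dev *d)
{
	return helpers_enabled(d)
	    && d->runtime_status == RPM_ACTIVE
	    && d->usage_count == 0
	    && (d->ignore_children || d->child_count == 0);
}

/* Constraint (4). */
bool may_run_resume(const struct model_dev *d)
{
	return helpers_enabled(d) && d->runtime_status == RPM_SUSPENDED;
}
```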
+Additionally, the helper functions provided by the PM core obey the following
+rules:
+
+  * If ->runtime_suspend() is about to be executed or there's a pending request
+    to execute it, ->runtime_idle() will not be executed for the same device.
+
+  * A request to execute or to schedule the execution of ->runtime_suspend()
+    will cancel any pending requests to execute ->runtime_idle() for the same
+    device.
+
+  * If ->runtime_resume() is about to be executed or there's a pending request
+    to execute it, the other callbacks will not be executed for the same device.
+
+  * A request to execute ->runtime_resume() will cancel any pending or
+    scheduled requests to execute the other callbacks for the same device.
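The four rules above define a precedence among requests: resume over suspend over idle. A minimal userspace sketch of that precedence for a single pending-request slot follows. It is an illustration under stated assumptions, not the kernel's implementation: `enum model_req` and `model_submit()` are hypothetical, and a new idle request arriving while another idle request is pending is simply modelled as replacing it, which the text above does not specify.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the kind of request pending for one device. */
enum model_req { MODEL_REQ_NONE, MODEL_REQ_IDLE, MODEL_REQ_SUSPEND, MODEL_REQ_RESUME };

/* Returns true if 'req' becomes the pending request (canceling any lower
 * priority one), false if it is refused because of the pending request. */
static bool model_submit(enum model_req *pending, enum model_req req)
{
	switch (req) {
	case MODEL_REQ_IDLE:
		/* idle is not run while suspend or resume is pending */
		if (*pending == MODEL_REQ_SUSPEND || *pending == MODEL_REQ_RESUME)
			return false;
		break;
	case MODEL_REQ_SUSPEND:
		/* a pending resume wins; a pending idle is canceled */
		if (*pending == MODEL_REQ_RESUME)
			return false;
		break;
	case MODEL_REQ_RESUME:
		/* resume cancels any pending idle or suspend request */
		break;
	default:
		return false;
	}
	*pending = req;
	return true;
}
```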
+
+3. Run-time PM Device Fields
+
+The following device run-time PM fields are present in 'struct dev_pm_info', as
+defined in include/linux/pm.h:
+
+  struct timer_list suspend_timer;
+    - timer used for scheduling (delayed) suspend requests
+
+  unsigned long timer_expires;
+    - timer expiration time, in jiffies (if this is different from zero, the
+      timer is running and will expire at that time, otherwise the timer is not
+      running)
+
+  struct work_struct work;
+    - work structure used for queuing up requests (i.e. work items in pm_wq)
+
+  wait_queue_head_t wait_queue;
+    - wait queue used if any of the helper functions needs to wait for another
+      one to complete
+
+  spinlock_t lock;
+    - lock used for synchronisation
+
+  atomic_t usage_count;
+    - the usage counter of the device
+
+  atomic_t child_count;
+    - the count of 'active' children of the device
+
+  unsigned int ignore_children;
+    - if set, the value of child_count is ignored (but still updated)
+
+  unsigned int disable_depth;
+    - used for disabling the helper functions (they work normally if this is
+      equal to zero); its initial value is 1 (i.e. run-time PM is initially
+      disabled for all devices)
+
+  unsigned int runtime_error;
+    - if set, there was a fatal error (one of the callbacks returned an error
+      code as described in Section 2), so the helper functions will not work
+      until this flag is cleared; this is the error code returned by the
+      failing callback
+
+  unsigned int idle_notification;
+    - if set, ->runtime_idle() is being executed
+
+  unsigned int request_pending;
+    - if set, there's a pending request (i.e. a work item queued up into pm_wq)
+
+  enum rpm_request request;
+    - type of request that's pending (valid if request_pending is set)
+
+  unsigned int deferred_resume;
+    - set if ->runtime_resume() is about to be run while ->runtime_suspend() is
+      being executed for that device and it is not practical to wait for the
+      suspend to complete; means "start a resume as soon as you've suspended"
+
+  enum rpm_status runtime_status;
+    - the run-time PM status of the device; this field's initial value is
+      RPM_SUSPENDED, which means that each device is initially regarded by the
+      PM core as 'suspended', regardless of its real hardware status
+
+All of the above fields are members of the 'power' member of 'struct device'.
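The 'timer_expires' field encodes "timer running" as "nonzero". A hedged userspace sketch of that encoding, together with the rescheduling behaviour described for pm_schedule_suspend() in Section 4, is shown below. `struct model_timer`, `model_timer_running()` and `model_schedule()` are hypothetical; the delay is expressed directly in ticks (the real helper takes milliseconds and converts to jiffies), and mapping a wrapped-around expiry of 0 to 1 is an assumption of this model that keeps the "not running" value reserved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: timer_expires == 0 means "no suspend timer running". */
struct model_timer {
	unsigned long timer_expires;
};

static bool model_timer_running(const struct model_timer *t)
{
	return t->timer_expires != 0;
}

/* Arm or re-arm the suspend timer 'delay' ticks after 'now'; an already
 * armed timer simply gets the new expiration time. A zero delay means the
 * request is queued immediately, so no timer is left running. Returns true
 * if a timer is armed. */
static bool model_schedule(struct model_timer *t, unsigned long now,
			   unsigned long delay)
{
	if (delay == 0) {
		t->timer_expires = 0;
		return false;
	}
	t->timer_expires = now + delay;   /* unsigned arithmetic may wrap */
	if (t->timer_expires == 0)        /* keep 0 reserved for "not running" */
		t->timer_expires = 1;
	return true;
}
```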
+
+4. Run-time PM Device Helper Functions
+
+The following run-time PM helper functions are defined in
+drivers/base/power/runtime.c and include/linux/pm_runtime.h:
+
+  void pm_runtime_init(struct device *dev);
+    - initialize the device run-time PM fields in 'struct dev_pm_info'
+
+  void pm_runtime_remove(struct device *dev);
+    - make sure that the run-time PM of the device will be disabled after
+      removing the device from the device hierarchy
+
+  int pm_runtime_idle(struct device *dev);
+    - execute ->runtime_idle() for the device's bus type; returns 0 on success
+      or an error code on failure, where -EINPROGRESS means that
+      ->runtime_idle() is already being executed
+
+  int pm_runtime_suspend(struct device *dev);
+    - execute ->runtime_suspend() for the device's bus type; returns 0 on
+      success, 1 if the device's run-time PM status was already 'suspended', or
+      an error code on failure, where -EAGAIN or -EBUSY means it is safe to
+      attempt to suspend the device again in the future
+
+  int pm_runtime_resume(struct device *dev);
+    - execute ->runtime_resume() for the device's bus type; returns 0 on
+      success, 1 if the device's run-time PM status was already 'active', or
+      an error code on failure, where -EAGAIN means it may be safe to attempt to
+      resume the device again in the future, but 'power.runtime_error' should
+      be checked additionally
+
+  int pm_request_idle(struct device *dev);
+    - submit a request to execute ->runtime_idle() for the device's bus type
+      (the request is represented by a work item in pm_wq); returns 0 on success
+      or an error code if the request has not been queued up
+
+  int pm_schedule_suspend(struct device *dev, unsigned int delay);
+    - schedule the execution of ->runtime_suspend() for the device's bus type
+      in the future, where 'delay' is the time to wait before queuing up a
+      suspend work item in pm_wq, in milliseconds (if 'delay' is zero, the work
+      item is queued up immediately); returns 0 on success, 1 if the device's
+      run-time PM status was already 'suspended', or an error code if the
+      request hasn't been scheduled (or queued up if 'delay' is 0); if the
+      execution of ->runtime_suspend() is already scheduled and not yet
+      expired, the new value of 'delay' will be used as the time to wait
+
+  int pm_request_resume(struct device *dev);
+    - submit a request to execute ->runtime_resume() for the device's bus type
+      (the request is represented by a work item in pm_wq); returns 0 on
+      success, 1 if the device's run-time PM status was already 'active', or an
+      error code if the request hasn't been queued up
+
+  void pm_runtime_get_noresume(struct device *dev);
+    - increment the device's usage counter
+
+  int pm_runtime_get(struct device *dev);
+    - increment the device's usage counter, run pm_request_resume(dev) and
+      return its result
+
+  int pm_runtime_get_sync(struct device *dev);
+    - increment the device's usage counter, run pm_runtime_resume(dev) and
+      return its result
+
+  void pm_runtime_put_noidle(struct device *dev);
+    - decrement the device's usage counter
+
+  int pm_runtime_put(struct device *dev);
+    - decrement the device's usage counter, run pm_request_idle(dev) and return
+      its result
+
+  int pm_runtime_put_sync(struct device *dev);
+    - decrement the device's usage counter, run pm_runtime_idle(dev) and return
+      its result
+
+  void pm_runtime_enable(struct device *dev);
+    - enable the run-time PM helper functions to run the device bus type's
+      run-time PM callbacks described in Section 2
+
+  int pm_runtime_disable(struct device *dev);
+    - prevent the run-time PM helper functions from running the device bus
+      type's run-time PM callbacks, make sure that all of the pending run-time
+      PM operations on the device are either completed or canceled; returns
+      1 if there was a resume request pending and it was necessary to execute
+      ->runtime_resume() for the device's bus type to satisfy that request,
+      otherwise 0 is returned
+
+  void pm_suspend_ignore_children(struct device *dev, bool enable);
+    - set/unset the power.ignore_children flag of the device
+
+  int pm_runtime_set_active(struct device *dev);
+    - clear the device's 'power.runtime_error' flag, set the device's run-time
+      PM status to 'active' and update its parent's counter of 'active'
+      children as appropriate (it is only valid to use this function if
+      'power.runtime_error' is set or 'power.disable_depth' is greater than
+      zero); it will fail and return an error code if the device has a parent
+      which is not active and whose 'power.ignore_children' flag is unset
+
+  void pm_runtime_set_suspended(struct device *dev);
+    - clear the device's 'power.runtime_error' flag, set the device's run-time
+      PM status to 'suspended' and update its parent's counter of 'active'
+      children as appropriate (it is only valid to use this function if
+      'power.runtime_error' is set or 'power.disable_depth' is greater than
+      zero)
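The get/put helpers pair a usage-counter update with a resume or idle call. The following userspace sketch models only pm_runtime_get_sync() and pm_runtime_put_sync() as described above. It is not the PM core's code: all `model_*` names are hypothetical, the model ignores locking, disable_depth and runtime_error, and returning -EAGAIN from the idle path while the counter is nonzero follows the statement in Section 5.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace model; not the kernel's implementation. */
struct model_dev {
	int usage_count;   /* models power.usage_count */
	bool active;       /* run-time PM status: true means 'active' */
	int resume_calls;  /* how many times ->runtime_resume() "ran" */
	int idle_calls;    /* how many times ->runtime_idle() "ran" */
};

/* Like pm_runtime_resume(): 1 if already 'active', 0 after resuming. */
static int model_runtime_resume(struct model_dev *d)
{
	if (d->active)
		return 1;
	d->resume_calls++;
	d->active = true;
	return 0;
}

/* Like pm_runtime_idle(): refused while the usage counter is nonzero. */
static int model_runtime_idle(struct model_dev *d)
{
	if (d->usage_count > 0)
		return -EAGAIN;
	d->idle_calls++;
	return 0;
}

/* pm_runtime_get_sync(): increment the counter, then resume synchronously. */
static int model_get_sync(struct model_dev *d)
{
	d->usage_count++;
	return model_runtime_resume(d);
}

/* pm_runtime_put_sync(): decrement the counter, then run the idle path. */
static int model_put_sync(struct model_dev *d)
{
	d->usage_count--;
	return model_runtime_idle(d);
}
```

With two nested get/put pairs, only the outermost put reaches ->runtime_idle(); the inner one is refused because the counter is still nonzero.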
+
+It is safe to execute the following helper functions from interrupt context:
+
+pm_request_idle()
+pm_schedule_suspend()
+pm_request_resume()
+pm_runtime_get_noresume()
+pm_runtime_get()
+pm_runtime_put_noidle()
+pm_runtime_put()
+pm_suspend_ignore_children()
+pm_runtime_set_active()
+pm_runtime_set_suspended()
+pm_runtime_enable()
+
+5. Run-time PM Initialization
+
+Initially, run-time PM is disabled for all devices, which means that the
+majority of the run-time PM helper functions described in Section 4 will return
+-EAGAIN until pm_runtime_enable() is called for the device.
+
+In addition to that, the initial run-time PM status of all devices is
+'suspended', but it need not reflect the actual physical state of the device.
+Thus, if the device is initially active (i.e. it is able to process I/O), its
+run-time PM status must be changed to 'active', with the help of
+pm_runtime_set_active(), before pm_runtime_enable() is called for the device.
+
+However, if the device has a parent and the parent's run-time PM is enabled,
+calling pm_runtime_set_active() for the device will affect the parent, unless
+the parent's 'power.ignore_children' flag is set.  Namely, in that case the
+parent won't be able to suspend at run time, using the PM core's helper
+functions, as long as the child's status is 'active', even if the child's
+run-time PM is still disabled (i.e. pm_runtime_enable() hasn't been called for
+the child yet or pm_runtime_disable() has been called for it).  For this reason,
+once pm_runtime_set_active() has been called for the device, pm_runtime_enable()
+should be called for it too as soon as reasonably possible or its run-time PM
+status should be changed back to 'suspended' with the help of
+pm_runtime_set_suspended().
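The effect of pm_runtime_set_active() on the parent can be sketched with a two-node userspace model. This is an illustration, not the kernel's code: `struct model_node` and the `model_*` functions are hypothetical, the check on whether the parent's run-time PM is enabled is omitted, and -EBUSY is chosen here only as a plausible error code, since the text above just says "error code".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a device and its parent. */
struct model_node {
	struct model_node *parent;
	bool active;            /* run-time PM status: true means 'active' */
	int child_count;        /* models power.child_count */
	bool ignore_children;   /* models power.ignore_children */
};

/* Like pm_runtime_set_active(): marking a child 'active' bumps the parent's
 * counter of 'active' children; refused under a suspended parent that does
 * not ignore its children. */
static int model_set_active(struct model_node *d)
{
	struct model_node *p = d->parent;

	if (d->active)
		return 0;
	if (p && !p->active && !p->ignore_children)
		return -EBUSY;   /* error code chosen for this model */
	if (p)
		p->child_count++;
	d->active = true;
	return 0;
}

/* Like pm_runtime_set_suspended(): undo the parent's bookkeeping. */
static void model_set_suspended(struct model_node *d)
{
	if (!d->active)
		return;
	if (d->parent)
		d->parent->child_count--;
	d->active = false;
}

/* Whether the parent could be suspended, as far as children are concerned. */
static bool model_parent_can_suspend(const struct model_node *p)
{
	return p->active && (p->child_count == 0 || p->ignore_children);
}
```

The model shows why a child left 'active' with run-time PM still disabled pins its parent: the parent's counter of 'active' children stays nonzero until the child is enabled and suspends, or is set back to 'suspended'.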
+
+If the default initial run-time PM status of the device (i.e. 'suspended')
+reflects the actual state of the device, its bus type's or its driver's
+->probe() callback will likely need to wake it up using one of the PM core's
+helper functions described in Section 4.  In that case, pm_runtime_resume()
+should be used.  Of course, for this purpose the device's run-time PM has to be
+enabled earlier by calling pm_runtime_enable().
+
+If ->probe() calls pm_runtime_suspend() or pm_runtime_idle() or their
+asynchronous counterparts, they will fail, returning -EAGAIN, because the
+device's usage counter is incremented by the core before executing ->probe().
+Still, it may be desirable to suspend the device as soon as ->probe() has
+finished, so the core uses pm_runtime_idle() to invoke the device bus type's
+->runtime_idle() callback at that time, but only if ->probe() is successful.
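The probe-time sequence described in the last two paragraphs can be traced with a userspace model. This is a sketch of the documented behaviour, not of drivers/base/dd.c: every `model_*` name is hypothetical, and the driver's ->probe() is assumed to handle a device that really is suspended, so it enables run-time PM and then resumes the device.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace model of the probe-time state. */
struct model_dev {
	int usage_count;    /* models power.usage_count */
	int disable_depth;  /* models power.disable_depth; starts at 1 */
	bool active;        /* run-time PM status; starts 'suspended' (false) */
	int idle_calls;     /* how many times ->runtime_idle() "ran" */
};

static void model_runtime_enable(struct model_dev *d)
{
	if (d->disable_depth > 0)
		d->disable_depth--;
}

static int model_runtime_resume(struct model_dev *d)
{
	if (d->disable_depth > 0)
		return -EAGAIN;        /* run-time PM still disabled */
	if (d->active)
		return 1;
	d->active = true;          /* ->runtime_resume() would power it up */
	return 0;
}

static int model_runtime_idle(struct model_dev *d)
{
	if (d->disable_depth > 0 || d->usage_count > 0)
		return -EAGAIN;        /* e.g. when called from within ->probe() */
	d->idle_calls++;
	return 0;
}

/* Hypothetical driver ->probe(): enable run-time PM, then wake the device. */
static int model_driver_probe(struct model_dev *d)
{
	int ret;

	model_runtime_enable(d);
	ret = model_runtime_resume(d);
	return ret < 0 ? ret : 0;
}

/* The core's behaviour around ->probe(), as described in the text. */
static int model_core_probe(struct model_dev *d, int *idle_ret)
{
	int ret;

	d->usage_count++;              /* incremented before ->probe() */
	ret = model_driver_probe(d);
	d->usage_count--;              /* dropped again afterwards */
	*idle_ret = (ret == 0) ? model_runtime_idle(d) : 0;  /* success only */
	return ret;
}
```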
+
+If the device driver's or bus type's ->remove() callback executes
+pm_runtime_suspend() or pm_runtime_idle() or their asynchronous counterparts
+without preparation, they will fail, returning -EAGAIN, because the device's
+usage counter is incremented by the core before executing ->remove().  However,
+if ->remove() wants to suspend the device, it can safely execute any of the
+pm_runtime_put*() helpers to decrement the device's usage counter, because the
+pm_runtime_put_noidle() called by the core after ->remove() has returned is
+guaranteed not to decrease the usage counter below zero.
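Why the extra put in ->remove() is safe can be seen in a small userspace model of the counter handling. The names are hypothetical, and the model is simplified: the idle path is counted only when the counter reaches zero, and locking and atomicity are ignored. The only property taken from the text is that the core's post-remove put_noidle never takes the counter below zero.

```c
#include <assert.h>

/* Hypothetical userspace model of the usage counter around ->remove(). */
struct model_dev {
	int usage_count;  /* models power.usage_count */
	int idle_calls;   /* how many times the idle path "ran" */
};

/* Like pm_runtime_put_noidle() as described here: never drops below zero. */
static void model_put_noidle(struct model_dev *d)
{
	if (d->usage_count > 0)
		d->usage_count--;
}

/* Simplified pm_runtime_put_sync(): decrement, then run the idle path
 * once the counter has reached zero. */
static int model_put_sync(struct model_dev *d)
{
	d->usage_count--;
	if (d->usage_count == 0)
		d->idle_calls++;
	return 0;
}

/* Hypothetical ->remove() that wants the device suspended. */
static void model_driver_remove(struct model_dev *d)
{
	model_put_sync(d);
}

/* The core's sequence around ->remove(), per the text above. */
static void model_core_remove(struct model_dev *d)
{
	d->usage_count++;          /* incremented before ->remove() */
	model_driver_remove(d);
	model_put_noidle(d);       /* after ->remove(); a no-op at zero */
}
```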

end of thread, other threads:[~2009-08-16 16:09 UTC | newest]

Thread overview: 90+ messages
2009-08-08 14:25 [PATCH] PM: Introduce core framework for run-time PM of I/O devices (rev. 14) Rafael J. Wysocki
2009-08-09 21:13 ` [PATCH update] PM: Introduce core framework for run-time PM of I/O devices (rev. 15) Rafael J. Wysocki
2009-08-09 21:13 ` Rafael J. Wysocki
2009-08-12 10:37   ` Magnus Damm
2009-08-12 15:47     ` Alan Stern
2009-08-12 15:47     ` Alan Stern
2009-08-12 20:13     ` Rafael J. Wysocki
2009-08-12 20:13     ` Rafael J. Wysocki
2009-08-12 10:37   ` Magnus Damm
2009-08-13  0:29   ` [RFC] PCI: Runtime power management Matthew Garrett
2009-08-13  0:29   ` Matthew Garrett
2009-08-13  0:35     ` [RFC] usb: Add support for runtime power management of the hcd Matthew Garrett
2009-08-13 12:16       ` Oliver Neukum
2009-08-13 12:16       ` [linux-pm] " Oliver Neukum
2009-08-13 12:30         ` Matthew Garrett
2009-08-13 12:30         ` [linux-pm] " Matthew Garrett
2009-08-13 14:26           ` Oliver Neukum
2009-08-13 14:26           ` [linux-pm] " Oliver Neukum
2009-08-13 21:42             ` Matthew Garrett
2009-08-13 21:42             ` Matthew Garrett
2009-08-13 15:22       ` Alan Stern
2009-08-13 15:22       ` Alan Stern
2009-08-13 21:47         ` Matthew Garrett
2009-08-13 21:47         ` Matthew Garrett
2009-08-13  0:35     ` Matthew Garrett
2009-08-13 15:17     ` [RFC] PCI: Runtime power management Alan Stern
2009-08-13 21:47       ` Matthew Garrett
2009-08-14 12:30         ` Matthew Garrett
2009-08-14 12:30         ` [linux-pm] " Matthew Garrett
2009-08-14 14:43           ` Alan Stern
2009-08-14 14:43           ` [linux-pm] " Alan Stern
2009-08-14 17:05             ` Rafael J. Wysocki
2009-08-14 17:05             ` [linux-pm] " Rafael J. Wysocki
2009-08-14 17:13               ` Rafael J. Wysocki
2009-08-14 17:13               ` [linux-pm] " Rafael J. Wysocki
2009-08-14 19:01                 ` Alan Stern
2009-08-14 19:01                 ` Alan Stern
2009-08-14 20:05             ` [linux-pm] " Rafael J. Wysocki
2009-08-14 22:21               ` Matthew Garrett
2009-08-15 14:18                 ` Rafael J. Wysocki
2009-08-15 14:18                 ` [linux-pm] " Rafael J. Wysocki
2009-08-15 15:53                   ` Alan Stern
2009-08-15 15:53                   ` [linux-pm] " Alan Stern
2009-08-15 20:54                     ` Rafael J. Wysocki
2009-08-15 20:54                     ` [linux-pm] " Rafael J. Wysocki
2009-08-15 20:58                       ` Matthew Garrett
2009-08-15 20:58                       ` [linux-pm] " Matthew Garrett
2009-08-15 21:21                         ` Rafael J. Wysocki
2009-08-15 21:21                           ` Rafael J. Wysocki
2009-08-15 21:27                           ` [linux-pm] " Matthew Garrett
2009-08-15 21:44                             ` Rafael J. Wysocki
2009-08-15 21:44                             ` [linux-pm] " Rafael J. Wysocki
2009-08-16 16:09                               ` Alan Stern
2009-08-16 16:09                               ` Alan Stern
2009-08-15 21:27                           ` Matthew Garrett
2009-08-16 15:57                           ` Alan Stern
2009-08-16 15:57                           ` [linux-pm] " Alan Stern
2009-08-16 16:04                             ` Matthew Garrett
2009-08-16 16:04                             ` [linux-pm] " Matthew Garrett
2009-08-16 15:50                       ` Alan Stern
2009-08-16 15:50                       ` [linux-pm] " Alan Stern
2009-08-14 22:21               ` Matthew Garrett
2009-08-14 20:05             ` Rafael J. Wysocki
2009-08-13 21:47       ` Matthew Garrett
2009-08-13 15:17     ` Alan Stern
2009-08-14 17:37     ` Jesse Barnes
2009-08-14 17:37     ` Jesse Barnes
2009-08-14 19:15       ` Rafael J. Wysocki
2009-08-14 19:15       ` Rafael J. Wysocki
2009-08-14 21:22     ` Rafael J. Wysocki
2009-08-14 22:30       ` Matthew Garrett
2009-08-14 22:30       ` Matthew Garrett
2009-08-15 14:41         ` Rafael J. Wysocki
2009-08-15 15:24           ` Rafael J. Wysocki
2009-08-15 15:24           ` Rafael J. Wysocki
2009-08-15 14:41         ` Rafael J. Wysocki
2009-08-14 21:22     ` Rafael J. Wysocki
2009-08-13 20:56   ` [PATCH update 2x] PM: Introduce core framework for run-time PM of I/O devices (rev. 16) Rafael J. Wysocki
2009-08-13 21:03     ` Paul Mundt
2009-08-13 21:03     ` Paul Mundt
2009-08-13 21:14       ` Rafael J. Wysocki
2009-08-13 21:14       ` Rafael J. Wysocki
2009-08-14  9:08     ` Magnus Damm
2009-08-14 17:19       ` Rafael J. Wysocki
2009-08-14 17:19       ` Rafael J. Wysocki
2009-08-14  9:08     ` Magnus Damm
2009-08-14 17:25     ` [PATCH update 3x] PM: Introduce core framework for run-time PM of I/O devices (rev. 17) Rafael J. Wysocki
2009-08-14 17:25     ` Rafael J. Wysocki
2009-08-13 20:56   ` [PATCH update 2x] PM: Introduce core framework for run-time PM of I/O devices (rev. 16) Rafael J. Wysocki