All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] PM: Introduce core framework for run-time PM of I/O devices (rev. 3)
@ 2009-06-22 23:21 Rafael J. Wysocki
  2009-06-23 17:00 ` Rafael J. Wysocki
                   ` (3 more replies)
  0 siblings, 4 replies; 102+ messages in thread
From: Rafael J. Wysocki @ 2009-06-22 23:21 UTC (permalink / raw)
  To: Alan Stern, linux-pm
  Cc: Oliver Neukum, Magnus Damm, ACPI Devel Maling List, Ingo Molnar,
	LKML, Greg KH, Arjan van de Ven

Hi,

Below is a new revision of the patch introducing the run-time PM framework.

The most visible changes from the last version:

* I realized that if child_count is atomic, we can drop the parent locking from
  all of the functions, so I did that.

* Introduced pm_runtime_put() that decrements the resume counter and queues
  up an idle notification if the counter went down to 0 (and wasn't 0 previously).
  Using asynchronous notification makes it possible to call pm_runtime_put()
  from interrupt context, if necessary.

* Changed the meaning of the RPM_WAKE bit slightly (it is now also used for
  disabling run-time PM for a device along with the resume counter).

Please let me know if I've overlooked anything. :-)

Best,
Rafael


---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: PM: Introduce core framework for run-time PM of I/O devices (rev. 3)

Introduce a core framework for run-time power management of I/O
devices.  Add device run-time PM fields to 'struct dev_pm_info'
and device run-time PM callbacks to 'struct dev_pm_ops'.  Introduce
a run-time PM workqueue and define some device run-time PM helper
functions at the core level.  Document all these things.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 Documentation/power/runtime_pm.txt |  434 ++++++++++++++++++++++
 drivers/base/dd.c                  |    9 
 drivers/base/power/Makefile        |    1 
 drivers/base/power/main.c          |   16 
 drivers/base/power/power.h         |   11 
 drivers/base/power/runtime.c       |  711 +++++++++++++++++++++++++++++++++++++
 include/linux/pm.h                 |   96 ++++
 include/linux/pm_runtime.h         |  141 +++++++
 kernel/power/Kconfig               |   14 
 kernel/power/main.c                |   17 
 10 files changed, 1440 insertions(+), 10 deletions(-)

Index: linux-2.6/kernel/power/Kconfig
===================================================================
--- linux-2.6.orig/kernel/power/Kconfig
+++ linux-2.6/kernel/power/Kconfig
@@ -208,3 +208,17 @@ config APM_EMULATION
 	  random kernel OOPSes or reboots that don't seem to be related to
 	  anything, try disabling/enabling this option (or disabling/enabling
 	  APM in your BIOS).
+
+config PM_RUNTIME
+	bool "Run-time PM core functionality"
+	depends on PM
+	---help---
+	  Enable functionality allowing I/O devices to be put into energy-saving
+	  (low power) states at run time (or autosuspended) after a specified
+	  period of inactivity and woken up in response to a hardware-generated
+	  wake-up event or a driver's request.
+
+	  Hardware support is generally required for this functionality to work
+	  and the bus type drivers of the buses the devices are on are
+	  responsibile for the actual handling of the autosuspend requests and
+	  wake-up events.
Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -11,6 +11,7 @@
 #include <linux/kobject.h>
 #include <linux/string.h>
 #include <linux/resume-trace.h>
+#include <linux/workqueue.h>
 
 #include "power.h"
 
@@ -217,8 +218,24 @@ static struct attribute_group attr_group
 	.attrs = g,
 };
 
+#ifdef CONFIG_PM_RUNTIME
+struct workqueue_struct *pm_wq;
+
+static int __init pm_start_workqueue(void)
+{
+	pm_wq = create_freezeable_workqueue("pm");
+
+	return pm_wq ? 0 : -ENOMEM;
+}
+#else
+static inline int pm_start_workqueue(void) { return 0; }
+#endif
+
 static int __init pm_init(void)
 {
+	int error = pm_start_workqueue();
+	if (error)
+		return error;
 	power_kobj = kobject_create_and_add("power", NULL);
 	if (!power_kobj)
 		return -ENOMEM;
Index: linux-2.6/include/linux/pm.h
===================================================================
--- linux-2.6.orig/include/linux/pm.h
+++ linux-2.6/include/linux/pm.h
@@ -22,6 +22,9 @@
 #define _LINUX_PM_H
 
 #include <linux/list.h>
+#include <linux/workqueue.h>
+#include <linux/spinlock.h>
+#include <linux/completion.h>
 
 /*
  * Callbacks for platform drivers to implement.
@@ -165,6 +168,28 @@ typedef struct pm_message {
  * It is allowed to unregister devices while the above callbacks are being
  * executed.  However, it is not allowed to unregister a device from within any
  * of its own callbacks.
+ *
+ * There also are the following callbacks related to run-time power management
+ * of devices:
+ *
+ * @runtime_suspend: Prepare the device for a condition in which it won't be
+ *	able to communicate with the CPU(s) and RAM due to power management.
+ *	This need not mean that the device should be put into a low power state.
+ *	For example, if the device is behind a link which is about to be turned
+ *	off, the device may remain at full power.  Still, if the device does go
+ *	to low power and if device_may_wakeup(dev) is true, remote wake-up
+ *	(i.e. hardware mechanism allowing the device to request a change of its
+ *	power state, such as PCI PME) should be enabled for it.
+ *
+ * @runtime_resume: Put the device into the fully active state in response to a
+ *	wake-up event generated by hardware or at a request of software.  If
+ *	necessary, put the device into the full power state and restore its
+ *	registers, so that it is fully operational.
+ *
+ * @runtime_idle: Device appears to be inactive and it might be put into a low
+ *	power state if all of the necessary conditions are satisfied.  Check
+ *	these conditions and handle the device as appropriate, possibly queueing
+ *	a suspend request for it.
  */
 
 struct dev_pm_ops {
@@ -182,6 +207,9 @@ struct dev_pm_ops {
 	int (*thaw_noirq)(struct device *dev);
 	int (*poweroff_noirq)(struct device *dev);
 	int (*restore_noirq)(struct device *dev);
+	int (*runtime_suspend)(struct device *dev);
+	int (*runtime_resume)(struct device *dev);
+	void (*runtime_idle)(struct device *dev);
 };
 
 /**
@@ -315,14 +343,76 @@ enum dpm_state {
 	DPM_OFF_IRQ,
 };
 
+/**
+ * Device run-time power management state.
+ *
+ * These state labels are used internally by the PM core to indicate the current
+ * status of a device with respect to the PM core operations.  They do not
+ * reflect the actual power state of the device or its status as seen by the
+ * driver.
+ *
+ * RPM_ACTIVE		Device is fully operational, no run-time PM requests are
+ *			pending for it.
+ *
+ * RPM_IDLE		It has been requested that the device be suspended.
+ *			Suspend request has been put into the run-time PM
+ *			workqueue and it's pending execution.
+ *
+ * RPM_SUSPENDING	Device bus type's ->runtime_suspend() callback is being
+ *			executed.
+ *
+ * RPM_SUSPENDED	Device bus type's ->runtime_suspend() callback has
+ *			completed successfully.  The device is regarded as
+ *			suspended.
+ *
+ * RPM_WAKE		It has been requested that the device be woken up.
+ *			Resume request has been put into the run-time PM
+ *			workqueue and it's pending execution.
+ *
+ * RPM_RESUMING		Device bus type's ->runtime_resume() callback is being
+ *			executed.
+ *
+ * RPM_ERROR		Represents a condition from which the PM core cannot
+ *			recover by itself.  If the device's run-time PM status
+ *			field has this value, all of the run-time PM operations
+ *			carried out for the device by the core will fail, until
+ *			the status field is changed to either RPM_ACTIVE or
+ *			RPM_SUSPENDED (it is not valid to use the other values
+ *			in such a situation) by the device's driver or bus type.
+ *			This happens when the device bus type's
+ *			->runtime_suspend() or ->runtime_resume() callback
+ *			returns error code different from -EAGAIN or -EBUSY.
+ */
+
+#define RPM_ACTIVE	0
+#define RPM_IDLE	0x01
+#define RPM_SUSPENDING	0x02
+#define RPM_SUSPENDED	0x04
+#define RPM_WAKE	0x08
+#define RPM_RESUMING	0x10
+#define RPM_ERROR	0x1F
+
 struct dev_pm_info {
 	pm_message_t		power_state;
-	unsigned		can_wakeup:1;
-	unsigned		should_wakeup:1;
+	unsigned int		can_wakeup:1;
+	unsigned int		should_wakeup:1;
 	enum dpm_state		status;		/* Owned by the PM core */
-#ifdef	CONFIG_PM_SLEEP
+#ifdef CONFIG_PM_SLEEP
 	struct list_head	entry;
 #endif
+#ifdef CONFIG_PM_RUNTIME
+	struct delayed_work	suspend_work;
+	struct work_struct	resume_work;
+	struct completion	work_done;
+	unsigned int		ignore_children:1;
+	unsigned int		suspend_aborted:1;
+	unsigned int		notify_running:1;
+	unsigned int		runtime_status:5;
+	int			runtime_error;
+	atomic_t		resume_count;
+	atomic_t		child_count;
+	spinlock_t		lock;
+#endif
 };
 
 /*
Index: linux-2.6/drivers/base/power/Makefile
===================================================================
--- linux-2.6.orig/drivers/base/power/Makefile
+++ linux-2.6/drivers/base/power/Makefile
@@ -1,5 +1,6 @@
 obj-$(CONFIG_PM)	+= sysfs.o
 obj-$(CONFIG_PM_SLEEP)	+= main.o
+obj-$(CONFIG_PM_RUNTIME)	+= runtime.o
 obj-$(CONFIG_PM_TRACE_RTC)	+= trace.o
 
 ccflags-$(CONFIG_DEBUG_DRIVER) := -DDEBUG
Index: linux-2.6/drivers/base/power/runtime.c
===================================================================
--- /dev/null
+++ linux-2.6/drivers/base/power/runtime.c
@@ -0,0 +1,711 @@
+/*
+ * drivers/base/power/runtime.c - Helper functions for device run-time PM
+ *
+ * Copyright (c) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
+ *
+ * This file is released under the GPLv2.
+ */
+
+#include <linux/pm_runtime.h>
+#include <linux/jiffies.h>
+
+/**
+ * __pm_get_child - Increment the counter of unsuspended children of a device.
+ * @dev: Device to handle;
+ */
+static void __pm_get_child(struct device *dev)
+{
+	atomic_inc(&dev->power.child_count);
+}
+
+/**
+ * __pm_put_child - Decrement the counter of unsuspended children of a device.
+ * @dev: Device to handle;
+ */
+static void __pm_put_child(struct device *dev)
+{
+	if (!atomic_add_unless(&dev->power.child_count, -1, 0))
+		dev_warn(dev, "Unbalanced %s!\n", __func__);
+}
+
+/**
+ * pm_runtime_notify_idle - Run a device bus type's runtime_idle() callback.
+ * @dev: Device to notify.
+ */
+static void pm_runtime_notify_idle(struct device *dev)
+{
+	if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_idle)
+		dev->bus->pm->runtime_idle(dev);
+}
+
+/**
+ * pm_runtime_notify_work - Run pm_runtime_notify_idle() for a device.
+ *
+ * Use @work to get the device object to run the notification for and execute
+ * pm_runtime_notify_idle().
+ */
+static void pm_runtime_notify_work(struct work_struct *work)
+{
+	struct device *dev = resume_work_to_device(work);
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+	dev->power.runtime_status &= ~RPM_WAKE;
+	dev->power.notify_running = true;
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	pm_runtime_notify_idle(dev);
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+	dev->power.notify_running = false;
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+
+/**
+ * pm_runtime_put - Decrement the resume counter and run idle notification.
+ * @dev: Device to handle.
+ *
+ * Decrement the resume counter of the device, check if it is possible to
+ * suspend it and notify its bus type in that case.
+ */
+void pm_runtime_put(struct device *dev)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (!__pm_runtime_put(dev)) {
+		dev_warn(dev, "Unbalanced %s!\n", __func__);
+		goto out;
+	}
+
+	if (!pm_suspend_possible(dev))
+		goto out;
+
+	/* Do not schedule a notification if one is already in progress. */
+	if ((dev->power.runtime_status & RPM_WAKE) || dev->power.notify_running)
+		goto out;
+
+	/*
+	 * The notification is asynchronous, so that this function can be called
+	 * from interrupt context.  Set the run-time PM status to RPM_WAKE to
+	 * prevent resume_work from being reused for a resume request and to let
+	 * pm_runtime_close() know it has a request to cancel.  It also prevents
+	 * suspends from running or being scheduled until the work function is
+	 * executed.
+	 */
+	dev->power.runtime_status = RPM_WAKE;
+	INIT_WORK(&dev->power.resume_work, pm_runtime_notify_work);
+	queue_work(pm_wq, &dev->power.resume_work);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_put);
+
+/**
+ * pm_runtime_idle - Check if device can be suspended and notify its bus type.
+ * @dev: Device to check.
+ */
+void pm_runtime_idle(struct device *dev)
+{
+	if (pm_suspend_possible(dev))
+		pm_runtime_notify_idle(dev);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_idle);
+
+/**
+ * __pm_runtime_suspend - Run a device bus type's runtime_suspend() callback.
+ * @dev: Device to suspend.
+ * @sync: If unset, the funtion has been called via pm_wq.
+ *
+ * Check if the status of the device is appropriate and run the
+ * ->runtime_suspend() callback provided by the device's bus type driver.
+ * Update the run-time PM flags in the device object to reflect the current
+ * status of the device.
+ */
+int __pm_runtime_suspend(struct device *dev, bool sync)
+{
+	struct device *parent = NULL;
+	unsigned long flags;
+	int error = -EINVAL;
+
+	might_sleep();
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+ repeat:
+	if (dev->power.runtime_status == RPM_ERROR) {
+		goto out;
+	} else if (dev->power.runtime_status & RPM_SUSPENDED) {
+		error = 0;
+		goto out;
+	} else if (atomic_read(&dev->power.resume_count) > 0
+	    || (dev->power.runtime_status & (RPM_WAKE | RPM_RESUMING))
+	    || (!sync && (dev->power.runtime_status & RPM_IDLE)
+	    && dev->power.suspend_aborted)) {
+		/*
+		 * We're forbidden to suspend the device (eg. it may be
+		 * resuming) or a pending suspend request has just been
+		 * cancelled and we're running as a result of that request.
+		 */
+		error = -EAGAIN;
+		goto out;
+	} else if (dev->power.runtime_status & RPM_SUSPENDING) {
+		/*
+		 * Another suspend is running in parallel with us.  Wait for it
+		 * to complete and return.
+		 */
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		wait_for_completion(&dev->power.work_done);
+
+		return dev->power.runtime_error;
+	} else if (sync && dev->power.runtime_status == RPM_IDLE
+	    && !dev->power.suspend_aborted) {
+		/*
+		 * Suspend request is pending, but we're not running as a result
+		 * of that request, so cancel it.  Since we're not clearing the
+		 * RPM_IDLE bit now, no new suspend requests will be queued up
+		 * while the pending one is waited for to finish.
+		 */
+		dev->power.suspend_aborted = true;
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		cancel_delayed_work_sync(&dev->power.suspend_work);
+
+		spin_lock_irqsave(&dev->power.lock, flags);
+
+		/* Repeat if anyone else has cleared the status. */
+		if (dev->power.runtime_status != RPM_IDLE
+		    || !dev->power.suspend_aborted)
+			goto repeat;
+	}
+
+	if (!pm_children_suspended(dev)) {
+		/*
+		 * We can only suspend the device if all of its children have
+		 * been suspended.
+		 */
+		dev->power.runtime_status = RPM_ACTIVE;
+		error = -EBUSY;
+		goto out;
+	}
+
+	dev->power.runtime_status = RPM_SUSPENDING;
+	init_completion(&dev->power.work_done);
+
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_suspend)
+		error = dev->bus->pm->runtime_suspend(dev);
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	switch (error) {
+	case 0:
+		/*
+		 * Resume request might have been queued up in the meantime, in
+		 * which case the RPM_WAKE bit is also set in runtime_status.
+		 */
+		dev->power.runtime_status &= ~RPM_SUSPENDING;
+		dev->power.runtime_status |= RPM_SUSPENDED;
+		break;
+	case -EAGAIN:
+	case -EBUSY:
+		dev->power.runtime_status = RPM_ACTIVE;
+		break;
+	default:
+		dev->power.runtime_status = RPM_ERROR;
+	}
+	dev->power.runtime_error = error;
+	complete_all(&dev->power.work_done);
+
+	if (!error && !(dev->power.runtime_status & RPM_WAKE) && dev->parent)
+		parent = dev->parent;
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	if (parent) {
+		__pm_put_child(parent);
+
+		if (!atomic_read(&parent->power.resume_count)
+		    && !atomic_read(&parent->power.child_count)
+		    && !parent->power.ignore_children)
+			pm_runtime_notify_idle(parent);
+	}
+
+	return error;
+}
+EXPORT_SYMBOL_GPL(__pm_runtime_suspend);
+
+/**
+ * pm_runtime_suspend_work - Run pm_runtime_suspend() for a device.
+ * @work: Work structure used for scheduling the execution of this function.
+ *
+ * Use @work to get the device object the suspend has been scheduled for and
+ * run pm_runtime_suspend() for it.
+ */
+static void pm_runtime_suspend_work(struct work_struct *work)
+{
+	__pm_runtime_suspend(suspend_work_to_device(work), false);
+}
+
+/**
+ * pm_request_suspend - Schedule run-time suspend of given device.
+ * @dev: Device to suspend.
+ * @msec: Time to wait before attempting to suspend the device, in milliseconds.
+ */
+void pm_request_suspend(struct device *dev, unsigned int msec)
+{
+	unsigned long flags;
+	unsigned long delay = msecs_to_jiffies(msec);
+
+	if (atomic_read(&dev->power.resume_count) > 0)
+		return;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.runtime_status != RPM_ACTIVE)
+		goto out;
+
+	dev->power.runtime_status = RPM_IDLE;
+	dev->power.suspend_aborted = false;
+	queue_delayed_work(pm_wq, &dev->power.suspend_work, delay);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+EXPORT_SYMBOL_GPL(pm_request_suspend);
+
+/**
+ * __pm_runtime_resume - Run a device bus type's runtime_resume() callback.
+ * @dev: Device to resume.
+ * @sync: If unset, the funtion has been called via pm_wq.
+ *
+ * Check if the device is really suspended and run the ->runtime_resume()
+ * callback provided by the device's bus type driver.  Update the run-time PM
+ * flags in the device object to reflect the current status of the device.  If
+ * runtime suspend is in progress while this function is being run, wait for it
+ * to finish before resuming the device.  If runtime suspend is scheduled, but
+ * it hasn't started yet, cancel it and we're done.
+ */
+int __pm_runtime_resume(struct device *dev, bool sync)
+{
+	struct device *parent = dev->parent;
+	unsigned long flags;
+	bool put_parent = false;
+	unsigned int status;
+	int error = -EINVAL;
+
+	might_sleep();
+
+	/*
+	 * This makes concurrent __pm_runtime_suspend() and pm_request_suspend()
+	 * started after us, or restarted, return immediately, so only the ones
+	 * started before us can execute ->runtime_suspend().
+	 */
+	__pm_runtime_get(dev);
+
+ repeat:
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+ repeat_locked:
+	if (dev->power.runtime_status == RPM_ERROR) {
+		goto out;
+	} else if (dev->power.runtime_status & ~RPM_WAKE) {
+		/*
+		 * If RPM_WAKE is the only bit set in runtime_status, an idle
+		 * notification is scheduled for the device which is active.
+		 */
+		error = 0;
+		goto out;
+	} else if (dev->power.runtime_status == RPM_IDLE
+	    && !dev->power.suspend_aborted) {
+		/* Suspend request is pending, so cancel it. */
+		dev->power.suspend_aborted = true;
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		cancel_delayed_work_sync(&dev->power.suspend_work);
+
+		spin_lock_irqsave(&dev->power.lock, flags);
+
+		/* Repeat if anyone else has cleared the status. */
+		if (dev->power.runtime_status != RPM_IDLE
+		    || !dev->power.suspend_aborted)
+			goto repeat_locked;
+
+		/*
+		 * Suspend request has been cancelled and there's nothing more
+		 * to do.  Clear the RPM_IDLE bit and return.
+		 */
+		dev->power.runtime_status = RPM_ACTIVE;
+		error = 0;
+		goto out;
+	}
+
+	if (sync && (dev->power.runtime_status & RPM_WAKE)) {
+		/* Resume request is pending, so let it run. */
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		flush_work(&dev->power.resume_work);
+
+		goto repeat;
+	} else if (dev->power.runtime_status & RPM_SUSPENDING) {
+		/*
+		 * Suspend is running in parallel with us.  Wait for it to
+		 * complete and repeat.
+		 */
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		wait_for_completion(&dev->power.work_done);
+
+		goto repeat;
+	} else if (!put_parent && parent
+	    && dev->power.runtime_status == RPM_SUSPENDED) {
+		/*
+		 * Increase the parent's resume counter and request that it be
+		 * woken up if necessary.
+		 */
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		error = pm_runtime_resume(parent);
+		if (error)
+			return error;
+
+		put_parent = true;
+		error = -EINVAL;
+		goto repeat;
+	}
+
+	status = dev->power.runtime_status;
+	if (status == RPM_RESUMING)
+		goto unlock;
+
+	if (dev->power.runtime_status == RPM_SUSPENDED && parent)
+		__pm_get_child(parent);
+
+	dev->power.runtime_status = RPM_RESUMING;
+	init_completion(&dev->power.work_done);
+
+ unlock:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	/*
+	 * We can decrement the parent's resume counter right now, because it
+	 * can't be suspended anyway after the __pm_get_child() above.
+	 */
+	if (put_parent) {
+		__pm_runtime_put(parent);
+		put_parent = false;
+	}
+
+	if (status == RPM_RESUMING) {
+		/*
+		 * There's another resume running in parallel with us. Wait for
+		 * it to complete and return.
+		 */
+		wait_for_completion(&dev->power.work_done);
+
+		error = dev->power.runtime_error;
+		goto out_put;
+	}
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_resume)
+		error = dev->bus->pm->runtime_resume(dev);
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	dev->power.runtime_status = error ? RPM_ERROR : RPM_ACTIVE;
+	dev->power.runtime_error = error;
+	complete_all(&dev->power.work_done);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	if (put_parent)
+		__pm_runtime_put(parent);
+
+ out_put:
+	/*
+	 * If we're running from pm_wq, the resume counter has been incremented
+	 * by pm_request_resume() too, so decrement it.
+	 */
+	if (error || !sync)
+		__pm_runtime_put(dev);
+
+	return error;
+}
+EXPORT_SYMBOL_GPL(pm_runtime_resume);
+
+/**
+ * pm_runtime_resume_work - Run __pm_runtime_resume() for a device.
+ * @work: Work structure used for scheduling the execution of this function.
+ *
+ * Use @work to get the device object the resume has been scheduled for and run
+ * __pm_runtime_resume() for it.
+ */
+static void pm_runtime_resume_work(struct work_struct *work)
+{
+	__pm_runtime_resume(resume_work_to_device(work), false);
+}
+
+/**
+ * pm_cancel_suspend_work - Cancel a pending suspend request.
+ *
+ * Use @work to get the device object the work item has been scheduled for and
+ * cancel a pending suspend request for it.
+ */
+static void pm_cancel_suspend_work(struct work_struct *work)
+{
+	struct device *dev = resume_work_to_device(work);
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	/* Return if someone else has already dealt with the suspend request. */
+	if (dev->power.runtime_status != (RPM_IDLE | RPM_WAKE)
+	    || !dev->power.suspend_aborted)
+		goto out;
+
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	cancel_delayed_work_sync(&dev->power.suspend_work);
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	/* Clear the status if someone else hasn't done it already. */
+	if (dev->power.runtime_status == (RPM_IDLE | RPM_WAKE)
+	    && dev->power.suspend_aborted)
+		dev->power.runtime_status = RPM_ACTIVE;
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+
+/**
+ * __pm_request_resume - Schedule run-time resume of given device.
+ * @dev: Device to resume.
+ * @get: If set, always increment the device's resume counter.
+ *
+ * Schedule run-time resume of given device and increment its resume counter.
+ * If @get is set, the counter is incremented even if error code is going to be
+ * returned, and if it's unset, the counter is only incremented if resume
+ * request has been queued up (0 is returned in such a case).
+ */
+int __pm_request_resume(struct device *dev, bool get)
+{
+	struct device *parent = dev->parent;
+	unsigned long flags;
+	int error = 0;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (get)
+		__pm_runtime_get(dev);
+
+	if (dev->power.runtime_status == RPM_ERROR)
+		error = -EINVAL;
+	else if (dev->power.runtime_status & ~RPM_WAKE)
+		error = -EBUSY;
+	else if (dev->power.runtime_status & (RPM_WAKE | RPM_RESUMING))
+		error = -EINPROGRESS;
+	if (error)
+		goto out;
+
+	if (dev->power.runtime_status == RPM_IDLE) {
+		error = -EBUSY;
+
+		if (dev->power.suspend_aborted)
+			goto out;
+
+		/* Suspend request is pending.  Queue a request to cancel it. */
+		dev->power.suspend_aborted = true;
+		INIT_WORK(&dev->power.resume_work, pm_cancel_suspend_work);
+		goto queue;
+	}
+
+	if (dev->power.runtime_status == RPM_SUSPENDED && parent)
+		__pm_get_child(parent);
+
+	INIT_WORK(&dev->power.resume_work, pm_runtime_resume_work);
+	if (!get)
+		__pm_runtime_get(dev);
+
+ queue:
+	/*
+	 * The device may be suspending at the moment or there may be a resume
+	 * request pending for it and we can't clear the RPM_SUSPENDING and
+	 * RPM_IDLE bits in its runtime_status just yet.
+	 */
+	dev->power.runtime_status |= RPM_WAKE;
+	queue_work(pm_wq, &dev->power.resume_work);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	return error;
+}
+EXPORT_SYMBOL_GPL(pm_request_resume);
+
+/**
+ * __pm_runtime_clear_status - Change the run-time PM status of a device.
+ * @dev: Device to handle.
+ * @status: New value of the device's run-time PM status.
+ *
+ * Change the run-time PM status of the device to @status, which must be
+ * either RPM_ACTIVE or RPM_SUSPENDED, if its current value is equal to
+ * RPM_ERROR.
+ */
+void __pm_runtime_clear_status(struct device *dev, unsigned int status)
+{
+	struct device *parent = dev->parent;
+	unsigned long flags;
+
+	if (status & ~RPM_SUSPENDED)
+		return;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.runtime_status != RPM_ERROR)
+		goto out;
+
+	dev->power.runtime_status = status;
+	if (status == RPM_SUSPENDED && parent)
+		__pm_put_child(parent);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+EXPORT_SYMBOL_GPL(__pm_runtime_clear_status);
+
+/**
+ * pm_runtime_enable - Enable run-time PM of a device.
+ * @dev: Device to handle.
+ */
+void pm_runtime_enable(struct device *dev)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.runtime_status != RPM_ERROR) {
+		dev->power.runtime_status = RPM_ACTIVE;
+		if (dev->parent)
+			__pm_put_child(dev->parent);
+	}
+	__pm_runtime_put(dev);
+
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_enable);
+
+/**
+ * pm_runtime_disable - Disable run-time PM of a device.
+ * @dev: Device to handle.
+ */
+void pm_runtime_disable(struct device *dev)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	do {
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		pm_runtime_resume(dev);
+
+		spin_lock_irqsave(&dev->power.lock, flags);
+
+	} while (!__pm_runtime_put(dev));
+
+	if (dev->power.runtime_status == RPM_ERROR) {
+		dev->power.runtime_status = RPM_WAKE;
+		if (dev->parent)
+			__pm_get_child(dev->parent);
+	}
+	__pm_runtime_get(dev);
+
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_disable);
+
+/**
+ * pm_runtime_init - Initialize run-time PM fields in given device object.
+ * @dev: Device object to initialize.
+ */
+void pm_runtime_init(struct device *dev)
+{
+	spin_lock_init(&dev->power.lock);
+	/*
+	 * Make any attempts to suspend the device or resume it, or to put a
+	 * request for it into pm_wq terminate immediately.
+	 */
+	dev->power.runtime_status = RPM_WAKE;
+	atomic_set(&dev->power.resume_count, 1);
+	atomic_set(&dev->power.child_count, 0);
+	pm_suspend_ignore_children(dev, false);
+}
+
+/**
+ * pm_runtime_add - Update run-time PM fields of a device while adding it.
+ * @dev: Device object being added to device hierarchy.
+ */
+void pm_runtime_add(struct device *dev)
+{
+	dev->power.notify_running = false;
+	INIT_DELAYED_WORK(&dev->power.suspend_work, pm_runtime_suspend_work);
+
+	if (dev->parent)
+		__pm_get_child(dev->parent);
+}
+
+/**
+ * pm_runtime_remove - Prepare for the removal of a device object.
+ * @dev: Device object being removed.
+ */
+void pm_runtime_remove(struct device *dev)
+{
+	unsigned long flags;
+	unsigned int status;
+
+	/* This makes __pm_runtime_suspend() return immediately. */
+	__pm_runtime_get(dev);
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	/* Cancel any pending requests. */
+	if ((dev->power.runtime_status & RPM_WAKE)
+	    || dev->power.notify_running) {
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		cancel_work_sync(&dev->power.resume_work);
+	} else if (dev->power.runtime_status == RPM_IDLE) {
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		cancel_delayed_work_sync(&dev->power.suspend_work);
+	}
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	while (dev->power.runtime_status & (RPM_SUSPENDING | RPM_RESUMING)) {
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		wait_for_completion(&dev->power.work_done);
+
+		spin_lock_irqsave(&dev->power.lock, flags);
+	}
+	status = dev->power.runtime_status;
+
+	/* This makes the run-time PM functions above return immediately. */
+	dev->power.runtime_status = RPM_WAKE;
+
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	if (status != RPM_SUSPENDED && dev->parent)
+		__pm_put_child(dev->parent);
+}
Index: linux-2.6/include/linux/pm_runtime.h
===================================================================
--- /dev/null
+++ linux-2.6/include/linux/pm_runtime.h
@@ -0,0 +1,141 @@
+/*
+ * pm_runtime.h - Device run-time power management helper functions.
+ *
+ * Copyright (C) 2009 Rafael J. Wysocki <rjw@sisk.pl>
+ *
+ * This file is released under the GPLv2.
+ */
+
+#ifndef _LINUX_PM_RUNTIME_H
+#define _LINUX_PM_RUNTIME_H
+
+#include <linux/device.h>
+#include <linux/pm.h>
+
+#ifdef CONFIG_PM_RUNTIME
+
+extern struct workqueue_struct *pm_wq;
+
+extern void pm_runtime_init(struct device *dev);
+extern void pm_runtime_add(struct device *dev);
+extern void pm_runtime_remove(struct device *dev);
+extern void pm_runtime_put(struct device *dev);
+extern void pm_runtime_idle(struct device *dev);
+extern int __pm_runtime_suspend(struct device *dev, bool sync);
+extern void pm_request_suspend(struct device *dev, unsigned int msec);
+extern int __pm_runtime_resume(struct device *dev, bool sync);
+extern int __pm_request_resume(struct device *dev, bool get);
+extern void __pm_runtime_clear_status(struct device *dev, unsigned int status);
+extern void pm_runtime_enable(struct device *dev);
+extern void pm_runtime_disable(struct device *dev);
+
+static inline struct device *suspend_work_to_device(struct work_struct *work)
+{
+	struct delayed_work *dw = to_delayed_work(work);
+	struct dev_pm_info *dpi;
+
+	dpi = container_of(dw, struct dev_pm_info, suspend_work);
+	return container_of(dpi, struct device, power);
+}
+
+static inline struct device *resume_work_to_device(struct work_struct *work)
+{
+	struct dev_pm_info *dpi;
+
+	dpi = container_of(work, struct dev_pm_info, resume_work);
+	return container_of(dpi, struct device, power);
+}
+
+static inline void __pm_runtime_get(struct device *dev)
+{
+	atomic_inc(&dev->power.resume_count);
+}
+
+static inline bool __pm_runtime_put(struct device *dev)
+{
+	return !!atomic_add_unless(&dev->power.resume_count, -1, 0);
+}
+
+static inline bool pm_children_suspended(struct device *dev)
+{
+	return dev->power.ignore_children
+		|| !atomic_read(&dev->power.child_count);
+}
+
+static inline bool pm_suspend_possible(struct device *dev)
+{
+	return pm_children_suspended(dev)
+		&& !atomic_read(&dev->power.resume_count)
+		&& !(dev->power.runtime_status & RPM_WAKE);
+}
+
+static inline void pm_suspend_ignore_children(struct device *dev, bool enable)
+{
+	dev->power.ignore_children = enable;
+}
+
+#else /* !CONFIG_PM_RUNTIME */
+
+static inline void pm_runtime_init(struct device *dev) {}
+static inline void pm_runtime_add(struct device *dev) {}
+static inline void pm_runtime_remove(struct device *dev) {}
+static inline void pm_runtime_put(struct device *dev) {}
+static inline void pm_runtime_idle(struct device *dev) {}
+static inline void pm_runtime_put_notify(struct device *dev) {}
+static inline int __pm_runtime_suspend(struct device *dev, bool sync)
+{
+	return -ENOSYS;
+}
+static inline void pm_request_suspend(struct device *dev, unsigned int msec) {}
+static inline int __pm_runtime_resume(struct device *dev, bool sync)
+{
+	return -ENOSYS;
+}
+static inline int __pm_request_resume(struct device *dev, bool get)
+{
+	return -ENOSYS;
+}
+static inline void __pm_runtime_clear_status(struct device *dev,
+					      unsigned int status) {}
+static inline void pm_runtime_enable(struct device *dev) {}
+static inline void pm_runtime_disable(struct device *dev) {}
+
+static inline void __pm_runtime_get(struct device *dev) {}
+static inline bool __pm_runtime_put(struct device *dev) { return true; }
+static inline bool pm_children_suspended(struct device *dev) { return false; }
+static inline bool pm_suspend_possible(struct device *dev) { return false; }
+static inline void pm_suspend_ignore_children(struct device *dev, bool en) {}
+
+#endif /* !CONFIG_PM_RUNTIME */
+
+static inline int pm_runtime_suspend(struct device *dev)
+{
+	return __pm_runtime_suspend(dev, true);
+}
+
+static inline int pm_runtime_resume(struct device *dev)
+{
+	return __pm_runtime_resume(dev, true);
+}
+
+static inline int pm_request_resume(struct device *dev)
+{
+	return __pm_request_resume(dev, false);
+}
+
+static inline int pm_request_resume_get(struct device *dev)
+{
+	return __pm_request_resume(dev, true);
+}
+
+static inline void pm_runtime_clear_active(struct device *dev)
+{
+	__pm_runtime_clear_status(dev, RPM_ACTIVE);
+}
+
+static inline void pm_runtime_clear_suspended(struct device *dev)
+{
+	__pm_runtime_clear_status(dev, RPM_SUSPENDED);
+}
+
+#endif
Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -21,6 +21,7 @@
 #include <linux/kallsyms.h>
 #include <linux/mutex.h>
 #include <linux/pm.h>
+#include <linux/pm_runtime.h>
 #include <linux/resume-trace.h>
 #include <linux/rwsem.h>
 #include <linux/interrupt.h>
@@ -49,6 +50,16 @@ static DEFINE_MUTEX(dpm_list_mtx);
 static bool transition_started;
 
 /**
+ * device_pm_init - Initialize the PM-related part of a device object
+ * @dev: Device object to initialize.
+ */
+void device_pm_init(struct device *dev)
+{
+	dev->power.status = DPM_ON;
+	pm_runtime_init(dev);
+}
+
+/**
  *	device_pm_lock - lock the list of active devices used by the PM core
  */
 void device_pm_lock(void)
@@ -88,6 +99,7 @@ void device_pm_add(struct device *dev)
 	}
 
 	list_add_tail(&dev->power.entry, &dpm_list);
+	pm_runtime_add(dev);
 	mutex_unlock(&dpm_list_mtx);
 }
 
@@ -104,6 +116,7 @@ void device_pm_remove(struct device *dev
 		 kobject_name(&dev->kobj));
 	mutex_lock(&dpm_list_mtx);
 	list_del_init(&dev->power.entry);
+	pm_runtime_remove(dev);
 	mutex_unlock(&dpm_list_mtx);
 }
 
@@ -507,6 +520,7 @@ static void dpm_complete(pm_message_t st
 		get_device(dev);
 		if (dev->power.status > DPM_ON) {
 			dev->power.status = DPM_ON;
+			pm_runtime_enable(dev);
 			mutex_unlock(&dpm_list_mtx);
 
 			device_complete(dev, state);
@@ -753,6 +767,7 @@ static int dpm_prepare(pm_message_t stat
 
 		get_device(dev);
 		dev->power.status = DPM_PREPARING;
+		pm_runtime_disable(dev);
 		mutex_unlock(&dpm_list_mtx);
 
 		error = device_prepare(dev, state);
@@ -760,6 +775,7 @@ static int dpm_prepare(pm_message_t stat
 		mutex_lock(&dpm_list_mtx);
 		if (error) {
 			dev->power.status = DPM_ON;
+			pm_runtime_enable(dev);
 			if (error == -EAGAIN) {
 				put_device(dev);
 				continue;
Index: linux-2.6/drivers/base/dd.c
===================================================================
--- linux-2.6.orig/drivers/base/dd.c
+++ linux-2.6/drivers/base/dd.c
@@ -23,6 +23,7 @@
 #include <linux/kthread.h>
 #include <linux/wait.h>
 #include <linux/async.h>
+#include <linux/pm_runtime.h>
 
 #include "base.h"
 #include "power/power.h"
@@ -202,8 +203,12 @@ int driver_probe_device(struct device_dr
 	pr_debug("bus: '%s': %s: matched device %s with driver %s\n",
 		 drv->bus->name, __func__, dev_name(dev), drv->name);
 
+	pm_runtime_disable(dev);
+
 	ret = really_probe(dev, drv);
 
+	pm_runtime_enable(dev);
+
 	return ret;
 }
 
@@ -306,6 +311,8 @@ static void __device_release_driver(stru
 
 	drv = dev->driver;
 	if (drv) {
+		pm_runtime_disable(dev);
+
 		driver_sysfs_remove(dev);
 
 		if (dev->bus)
@@ -320,6 +327,8 @@ static void __device_release_driver(stru
 		devres_release_all(dev);
 		dev->driver = NULL;
 		klist_remove(&dev->p->knode_driver);
+
+		pm_runtime_enable(dev);
 	}
 }
 
Index: linux-2.6/Documentation/power/runtime_pm.txt
===================================================================
--- /dev/null
+++ linux-2.6/Documentation/power/runtime_pm.txt
@@ -0,0 +1,434 @@
+Run-time Power Management Framework for I/O Devices
+
+(C) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
+
+1. Introduction
+
+Support for run-time power management (run-time PM) of I/O devices is provided
+at the power management core (PM core) level by means of:
+
+* The power management workqueue pm_wq in which bus types and device drivers can
+  put their PM-related work items.  It is strongly recommended that pm_wq be
+  used for queuing all work items related to run-time PM, because this allows
+  them to be synchronized with system-wide power transitions.  pm_wq is declared
+  in include/linux/pm_runtime.h and defined in kernel/power/main.c.
+
+* A number of run-time PM fields in the 'power' member of 'struct device' (which
+  is of the type 'struct dev_pm_info', defined in include/linux/pm.h) that can
+  be used for synchronizing run-time PM operations with one another.
+
+* Three device run-time PM callbacks in 'struct dev_pm_ops' (defined in
+  include/linux/pm.h).
+
+* A set of helper functions defined in drivers/base/power/runtime.c that can be
+  used for carrying out run-time PM operations in such a way that the
+  synchronization between them is taken care of by the PM core.  Bus types and
+  device drivers are encouraged to use these functions.
+
+The device run-time PM fields of 'struct dev_pm_info', the helper functions
+using them and the run-time PM callbacks present in 'struct dev_pm_ops' are
+described below.
+
+2. Run-time PM Helper Functions and Device Fields
+
+The following helper functions are defined in drivers/base/power/runtime.c
+and include/linux/pm_runtime.h:
+
+* void pm_runtime_init(struct device *dev);
+* void pm_runtime_close(struct device *dev);
+
+* void pm_runtime_put(struct device *dev);
+* void pm_runtime_idle(struct device *dev);
+* int pm_runtime_suspend(struct device *dev);
+* void pm_request_suspend(struct device *dev, unsigned int msec);
+* int pm_runtime_resume(struct device *dev);
+* void pm_request_resume(struct device *dev);
+
+* bool pm_suspend_possible(struct device *dev);
+
+* void pm_runtime_enable(struct device *dev);
+* void pm_runtime_disable(struct device *dev);
+
+* void pm_suspend_ignore_children(struct device *dev, bool enable);
+
+* void pm_runtime_clear_active(struct device *dev) {}
+* void pm_runtime_clear_suspended(struct device *dev) {}
+
+pm_runtime_init() initializes the run-time PM fields in the 'power' member of
+a device object.  It is called during the initialization of the device object,
+in drivers/base/core.c:device_initialize().
+
+pm_runtime_add() updates the run-time PM fields in the 'power' member of a
+device object while the device is being added to the device hierarchy.  It is
+called from drivers/base/power/main.c:device_pm_add().
+
+pm_runtime_remove() disables the run-time PM of a device and updates the 'power'
+member of its parent's device object to take the removal of the device into
+account.  It cancels all of the run-time PM requests pending and waits for all
+of the run-time PM operations to complete.  It is called from
+drivers/base/power/main.c:device_pm_remove().
+
+pm_runtime_suspend(), pm_request_suspend(), pm_runtime_resume(),
+pm_request_resume(), and pm_request_resume_get() use the 'power.runtime_status',
+'power.resume_count', 'power.suspend_aborted', and 'power.child_count' fields of
+'struct device' for mutual cooperation.  In what follows the
+'power.runtime_status', 'power.resume_count', and 'power.child_count' fields are
+referred to as the device's run-time PM status, the device's resume counter, and
+the counter of unsuspended children of the device, respectively.  They are set
+to RPM_WAKE, 1 and 0, respectively, by pm_runtime_init().
+
+pm_runtime_put() decrements the device's resume counter unless it's already 0.
+If the counter was not zero before the decrementation, the function checks if
+the device can be suspended using pm_suspend_possible() and if that returns
+'true', it sets the RPM_WAKE bit in the device's run-time PM status field and
+queues up a request to execute the ->runtime_idle() callback provided by the
+device's bus type.  The work function of this request clears the RPM_WAKE bit
+before executing the bus type's ->runtime_idle() callback.  It is valid to call
+pm_runtime_put() from interrupt context.
+
+It is anticipated that pm_runtime_put() will be called after
+pm_runtime_resume(), pm_request_resume() or pm_request_resume_get(), when all of
+the I/O operations involving the device have been completed, in order to
+decrement the device's resume counter that was previously incremented by one of
+these functions.  Moreover, unbalanced calls to pm_runtime_put() are invalid, so
+drivers should ensure that pm_runtime_put() be only run after a function that
+increments the device's resume counter.
+
+pm_runtime_idle() uses pm_suspend_possible() to check if it is possible to
+suspend a device and if so, it executes the ->runtime_idle() callback provided
+by the device's bus type.
+
+pm_runtime_suspend() is used to carry out a run-time suspend of an active
+device.  It is called directly by a bus type or device driver, but internally
+it calls __pm_runtime_suspend() that is also used for asynchronous suspending of
+devices (i.e. to complete requests queued up by pm_request_suspend()) and works
+as follows.
+
+  * If the device is suspended (i.e. the RPM_SUSPENDED bit is set in the
+    device's run-time PM status field, 'power.runtime_status'), success is
+    returned.
+
+  * If the device's resume counter is greater than 0 or the device is resuming,
+    or it has a resume request pending (i.e. at least one of the RPM_WAKE and
+    RPM_RESUMING bits are set in the device's run-time PM status field), or the
+    function has been called via pm_wq as a result of a cancelled suspend
+    request (the RPM_IDLE bit is set in the device's run-time PM status field
+    and its 'power.suspend_aborted' flag is set), -EAGAIN is returned.
+
+  * If the device is suspending (i.e. the RPM_SUSPENDING bit is set in its
+    run-time PM status field), which means that another instance of
+    __pm_runtime_suspend() is running at the same time for the same device, the
+    function waits for the other instance to complete and returns the result
+    returned by it.
+
+  * If the device has a pending suspend request (i.e. the device's run-time PM
+    status is RPM_IDLE) and the function hasn't been called as a result of that
+    request, it cancels the request (synchronously).  Next, if a concurrent
+    thread changed the device's run-time PM status while the request was being
+    waited for to cancel, the function is restarted.
+
+  * If the children of the device are not suspended and the
+    'power.ignore_children' flag is not set for it, the device's run-time PM
+    status is set to RPM_ACTIVE and -EAGAIN is returned.
+
+If none of the above takes place, or a pending suspend request has been
+successfully cancelled, the device's run-time PM status is set to RPM_SUSPENDING
+and its bus type's ->runtime_suspend() callback is executed.  This callback is
+entirely responsible for handling the device as appropriate (for example, it may
+choose to execute the device driver's ->runtime_suspend() callback or to carry
+out any other suitable action depending on the bus type).
+
+  * If it completes successfully, the RPM_SUSPENDING bit is cleared and the
+    RPM_SUSPENDED bit is set in the device's run-time PM status field.  Once
+    that has happened, the device is regarded by the PM core as suspended, but
+    it _need_ _not_ mean that the device has been put into a low power state.
+    What really occurs to the device at this point entirely depends on its bus
+    type (it may depend on the device's driver if the bus type chooses to call
+    it).  Additionally, if the device bus type's ->runtime_suspend() callback
+    completes successfully and there's no resume request pending for the device
+    (i.e. the RPM_WAKE flag is not set in its run-time PM status field), and the
+    device has a parent, the parent's counter of unsuspended children (i.e. the
+    'power.child_count' field) is decremented.  If that counter turns out to be
+    equal to zero (i.e. the device was the last unsuspended child of its parent)
+    and the parent's 'power.ignore_children' flag is unset, and the parent's
+    resume counter is equal to 0, its bus type's ->runtime_idle() callback is
+    executed for it.
+
+  * If either -EBUSY or -EAGAIN is returned, the device's run-time PM status is
+    set to RPM_ACTIVE.
+
+  * If another error code is returned, the device's run-time PM status is set to
+    RPM_ERROR, which makes the PM core refuse to carry out any run-time PM
+    operations for it until the status is cleared by its bus type or driver with
+    the help of pm_runtime_clear_active() or pm_runtime_clear_suspended().
+
+Finally, pm_runtime_suspend() returns the result returned by the device bus
+type's ->runtime_suspend() callback.  If the device's bus type doesn't implement
+->runtime_suspend(), -EINVAL is returned and the device's run-time PM status is
+set to RPM_ERROR.
+
+pm_request_suspend() is used to queue up a suspend request for an active device.
+If the run-time PM status of the device (i.e. the value of the
+'power.runtime_status' field in 'struct device') is different from RPM_ACTIVE
+or its resume counter is greater than 0 (i.e. the device is not active from the
+PM core standpoint), the function returns immediately.  Otherwise, it changes
+the device's run-time PM status to RPM_IDLE and puts a request to suspend the
+device into pm_wq.  The 'msec' argument is used to specify the time to wait
+before the request will be completed, in milliseconds.  It is valid to call this
+function from interrupt context.
+
+pm_runtime_resume() is used to increment the resume counter of a device and, if
+necessary, to wake the device up (that happens if the device is suspended,
+suspending or has a suspend request pending).  It is called directly by a bus
+type or device driver, but internally it calls __pm_runtime_resume() that is
+also used for asynchronous resuming of devices (i.e. to complete requests queued
+up by pm_request_resume()).
+
+__pm_runtime_resume() first increments the device's resume counter to prevent
+new suspend requests from being queued up and to make subsequent attempts to
+suspend the device fail.  The device's resume counter will be decremented on
+return if error code is about to be returned or the function has been called via
+pm_wq.  After incrementing the device's run-time PM counter the function
+proceeds as follows.
+
+  * If the device is active (i.e. all of the bits in its run-time PM status are
+    unset, possibly except for RPM_WAKE, which means that an idle notification
+    is pending for it), success is returned.
+
+  * If there's a suspend request pending for the device (i.e. the RPM_IDLE bit
+    is set in the device's run-time PM status field), the
+    'power.suspend_aborted' flag is set for the device and the request is
+    cancelled (synchronously).  Then, the function restarts itself if the
+    device's RPM_IDLE bit was cleared or the 'power.suspend_aborted' flag was
+    unset in the meantime by a concurrent thread.  Otherwise, the device's
+    run-time PM status is cleared to RPM_ACTIVE and the function returns
+    success.
+
+  * If the device has a pending resume request (i.e. the RPM_WAKE bit is set in
+    its run-time PM status field), but the function hasn't been called as a
+    result of that request, the request is waited for to complete and the
+    function restarts itself.
+
+  * If the device is suspending (i.e. the RPM_SUSPENDING bit is set in its
+    run-time PM status field), the function waits for the suspend operation to
+    complete and restarts itself.
+
+  * If the device is suspended and doesn't have a pending resume request (i.e.
+    its run-time PM status is RPM_SUSPENDED), and it has a parent,
+    pm_runtime_resume() is called (recursively) for the parent.  If the parent's
+    resume is successful, the function notes that the parent's resume counter
+    will have to be decremented and restarts itself.  Otherwise, it returns the
+    error code returned by the instance of pm_runtime_resume_get() handling the
+    parent.
+
+  * If the device is resuming (i.e. the device's run-time PM status is
+    RPM_RESUMING), which means that another instance of __pm_runtime_resume() is
+    running at the same time for the same device, the function waits for the
+    other instance to complete and returns the result returned by it.
+
+If none of the above happens, the function checks if the device's run-time PM
+status is RPM_SUSPENDED, which means that the device doesn't have a resume
+request pending, and if it has a parent.  If that is the case, the parent's
+counter of unsuspended children is incremented.  Next, the device's run-time PM
+status is set to RPM_RESUMING and its bus type's ->runtime_resume() callback is
+executed.  This callback is entirely responsible for handling the device as
+appropriate (for example, it may choose to execute the device driver's
+->runtime_resume() callback or to carry out any other suitable action depending
+on the bus type).
+
+  * If it completes successfully, the device's run-time PM status is set to
+    RPM_ACTIVE, which means that the device is fully operational.  Thus, the
+    device bus type's ->runtime_resume() callback, when it is about to return
+    success, _must_ _ensure_ that this really is the case (i.e. when it returns
+    success, the device _must_ be able to carry out I/O operations as needed).
+
+  * If an error code is returned, the device's run-time PM status is set to
+    RPM_ERROR, which makes the PM core refuse to carry out any run-time PM
+    operations for the device until the status is cleared by its bus type or
+    driver with the help of either pm_runtime_clear_active(), or
+    pm_runtime_clear_suspended().  Thus, it is strongly recommended that bus
+    types' ->runtime_resume() callbacks only return error codes in fatal error
+    conditions, when it is impossible to bring the device back to the
+    operational state by any available means.  Inability to wake up a suspended
+    device usually means a service loss and it may very well result in a data
+    loss to the user, so it _must_ be regarded as a severe problem and avoided
+    if at all possible.
+
+Finally, __pm_runtime_resume() returns the result returned by the device bus
+type's ->runtime_resume() callback.  If the device's bus type doesn't implement
+->runtime_resume(), -EINVAL is returned and the device's run-time PM status is
+set to RPM_ERROR.  If __pm_runtime_resume() returns success and it hasn't been
+called via pm_wq, it leaves the device's resume counter incremented, so the
+counter has to be decremented, with the help of pm_runtime_put(), so that it's
+possible to suspend the device.  If __pm_runtime_resume() has been called via
+pm_wq, as a result of a resume request queued up by pm_request_resume(), the
+device's resume counter is left incremented regardless of whether or not the
+attempt to wake up the device has been successful.
+
+pm_request_resume_get() is used to increment the resume counter of a device
+and, if necessary, to queue up a resume request for the device (this happens if
+the device is suspended, suspending or has a suspend request pending).
+pm_request_resume() is used the to queue up a resume request for the device
+and it increments the device's resume counter if the request has been queued up
+successfully.  Internally both of them call __pm_request_resume() that first
+increments the device's resume counter in the pm_request_resume_get() case and
+then proceeds as follows.
+
+* If the run-time PM status of the device is RPM_ACTIVE or the only bit set in
+  it is RPM_WAKE (i.e. the idle notification has been queued up for the device
+  by pm_runtime_put()), -EBUSY is returned.
+
+* If the device is resuming or has a resume request pending (i.e. at least one
+  of the RPM_WAKE and RPM_RESUMING bits is set in the device's run-time PM
+  status field, but RPM_WAKE is not the only bit set), -EINPROGRESS is returned.
+
+* If the device's run-time status is RPM_IDLE (i.e. a suspend request is pending
+  for it) and the 'power.suspend_aborted' flag is set (i.e. the pending request
+  is being cancelled), -EBUSY is returned.
+
+* If the device's run-time status is RPM_IDLE (i.e. a suspend request is pending
+  for it) and the 'power.suspend_aborted' flag is not set, the device's
+  'power.suspend_aborted' flag is set, a request to cancel the pending
+  suspend request is queued up and -EBUSY is returned.
+
+If none of the above happens, the function checks if the device's run-time PM
+status is RPM_SUSPENDED and if it has a parent, in which case the parent's
+counter of unsuspended children is incremented.  Next, the RPM_WAKE bit is set
+in the device's run-time PM status field and the request to execute
+__pm_runtime_resume() is put into pm_wq (the device's resume counter is then
+incremented in the pm_request_resume() case).  Finally, the function returns 0,
+which means that the resume request has been successfully queued up.
+
+pm_request_resume_get() leaves the device's resume counter incremented even if
+an error code is returned.  Thus, after pm_request_resume_get() has returned, it
+is necessary to decrement the device's resume counter, with the help of
+pm_runtime_put(), before it's possible to suspend the device again.
+
+It is valid to call pm_request_resume() and pm_request_resume_get() from
+interrupt context.
+
+Note that it usually is _not_ safe to access the device for I/O purposes
+immediately after pm_request_resume() has returned, unless the returned result
+is -EBUSY, which means that it wasn't necessary to resume the device.
+
+Note also that only one suspend request or one resume request may be queued up
+at any given moment.  Moreover, a resume request cannot be queued up along with
+a suspend request.  Still, if it's necessary to queue up a request to cancel a
+pending suspend request, these two requests will be present in pm_wq at the
+same time.  In that case, regardless of which request is attempted to complete
+first, the device's run-time PM status will be set to RPM_ACTIVE as a final
+result.
+
+pm_suspend_possible() is used to check if the device may be suspended at this
+particular moment.  It checks the device's resume counter, the counter of
+unsuspended children, and the run-time PM status.  It returns 'false' if any of
+the counters is greater than 0 or the RPM_WAKE bit is set in the device's
+run-time PM status field.  Otherwise, 'true' is returned.
+
+pm_runtime_enable() and pm_runtime_disable() are used to enable and disable,
+respectively, all of the run-time PM core operations.  For this purpose
+pm_runtime_disable() calls pm_runtime_resume() to put the device into the
+active state, sets the RPM_WAKE bit in the device's run-time PM status field
+and increments the device's resume counter.  In turn, pm_runtime_enable() resets
+the RPM_WAKE bit and decrements the device's resume counter.  Therefore, if
+pm_runtime_disable() is called several times in a row for the same device, it
+has to be balanced by the appropriate number of pm_runtime_enable() calls so
+that the other run-time PM core functions work for that device.  The initial
+values of the device's resume counter and run-time PM status, as set by
+pm_runtime_init(), are 1 and RPM_WAKE, respectively (i.e. the device's run-time
+PM is initially disabled).
+
+pm_runtime_disable() and pm_runtime_enable() are used by the device core to
+disable the run-time power management of devices temporarily during device probe
+and removal as well as during system-wide power transitions (i.e. system-wide
+suspend or hibernation, or resume from a system sleep state).
+
+pm_suspend_ignore_children() is used to set or unset the
+'power.ignore_children' flag in 'struct device'.  If the 'enabled'
+argument is 'true', the field is set to 1, and if 'enable' is 'false', the field
+is set to 0.  The default value of 'power.ignore_children', as set by
+pm_runtime_init(), is 0.
+
+pm_runtime_clear_active() is used to change the device's run-time PM status
+field from RPM_ERROR to RPM_ACTIVE.  It is valid to call this function from
+interrupt context.
+
+pm_runtime_clear_suspended() is used to change the device's run-time PM status
+field from RPM_ERROR to RPM_SUSPENDED.  If the device has a parent, the function
+additionally decrements the parent's counter of unsuspended children, although
+the parent's bus type is not notified if the counter becomes 0.  It is valid to
+call this function from interrupt context.
+
+3. Device Run-time PM Callbacks
+
+There are three device run-time PM callbacks defined in 'struct dev_pm_ops':
+
+struct dev_pm_ops {
+	...
+	int (*runtime_suspend)(struct device *dev);
+	int (*runtime_resume)(struct device *dev);
+	void (*runtime_idle)(struct device *dev);
+	...
+};
+
+The ->runtime_suspend() callback is executed by pm_runtime_suspend() for the bus
+type of the device being suspended.  The bus type's callback is then _entirely_
+_responsible_ for handling the device as appropriate, which may, but need not
+include executing the device driver's ->runtime_suspend() callback (from the PM
+core's point of view it is not necessary to implement a ->runtime_suspend()
+callback in a device driver as long as the bus type's ->runtime_suspend() knows
+what to do to handle the device).
+
+  * Once the bus type's ->runtime_suspend() callback has returned successfully,
+    the PM core regards the device as suspended, which need not mean that the
+    device has been put into a low power state.  It is supposed to mean,
+    however, that the device will not communicate with the CPU(s) and RAM until
+    the bus type's ->runtime_resume() callback is executed for it.
+
+  * If the bus type's ->runtime_suspend() callback returns -EBUSY or -EAGAIN,
+    the device's run-time PM status is set to RPM_ACTIVE, which means that the
+    device _must_ be fully operational one this has happened.
+
+  * If the bus type's ->runtime_suspend() callback returns an error code
+    different from -EBUSY or -EAGAIN, the PM core regards this as an
+    unrecoverable error and will refuse to run the helper functions described in
+    Section 1 until the status is changed with the help of either
+    pm_runtime_clear_active(), or pm_runtime_clear_suspended() by the device's
+    bus type or driver.
+
+In particular, it is recommended that ->runtime_suspend() return -EBUSY or
+-EAGAIN if device_may_wakeup() returns 'false' for the device.  On the other
+hand, if device_may_wakeup() returns 'true' for the device and the device is put
+into a low power state during the execution of ->runtime_suspend(), it is
+expected that remote wake-up (i.e. hardware mechanism allowing the device to
+request a change of its power state, such as PCI PME) will be enabled for the
+device.  Generally, remote wake-up should be enabled whenever the device is put
+into a low power state at run time and is expected to receive input from the
+outside of the system.
+
+The ->runtime_resume() callback is executed by pm_runtime_resume() for the bus
+type of the device being woken up.  The bus type's callback is then _entirely_
+_responsible_ for handling the device as appropriate, which may, but need not
+include executing the device driver's ->runtime_resume() callback (from the PM
+core's point of view it is not necessary to implement a ->runtime_resume()
+callback in a device driver as long as the bus type's ->runtime_resume() knows
+what to do to handle the device).
+
+  * Once the bus type's ->runtime_resume() callback has returned successfully,
+    the PM core regards the device as fully operational, which means that the
+    device _must_ be able to complete I/O operations as needed.
+
+  * If the bus type's ->runtime_resume() callback returns an error code, the PM
+    core regards this as an unrecoverable error and will refuse to run the
+    helper functions described in Section 1 until the status is changed with the
+    help of either pm_runtime_clear_active(), or pm_runtime_clear_suspended() by
+    the device's bus type or driver.
+
+The ->runtime_idle() callback is executed by pm_runtime_suspend() for the bus
+type of a device the children of which are all suspended (and which has the
+'power.ignore_children' flag unset).  It also is executed if a device's resume
+counter is decremented with the help of pm_runtime_put() and it becomes 0.  The
+action carried out by this callback is totally dependent on the bus type in
+question, but the expected and recommended action is to check if the device can
+be suspended (i.e. if all of the conditions necessary for suspending the device
+are met) and to queue up a suspend request for the device if that is the case.
Index: linux-2.6/drivers/base/power/power.h
===================================================================
--- linux-2.6.orig/drivers/base/power/power.h
+++ linux-2.6/drivers/base/power/power.h
@@ -1,8 +1,3 @@
-static inline void device_pm_init(struct device *dev)
-{
-	dev->power.status = DPM_ON;
-}
-
 #ifdef CONFIG_PM_SLEEP
 
 /*
@@ -16,14 +11,16 @@ static inline struct device *to_device(s
 	return container_of(entry, struct device, power.entry);
 }
 
+extern void device_pm_init(struct device *dev);
 extern void device_pm_add(struct device *);
 extern void device_pm_remove(struct device *);
 extern void device_pm_move_before(struct device *, struct device *);
 extern void device_pm_move_after(struct device *, struct device *);
 extern void device_pm_move_last(struct device *);
 
-#else /* CONFIG_PM_SLEEP */
+#else /* !CONFIG_PM_SLEEP */
 
+static inline void device_pm_init(struct device *dev) {}
 static inline void device_pm_add(struct device *dev) {}
 static inline void device_pm_remove(struct device *dev) {}
 static inline void device_pm_move_before(struct device *deva,
@@ -32,7 +29,7 @@ static inline void device_pm_move_after(
 					struct device *devb) {}
 static inline void device_pm_move_last(struct device *dev) {}
 
-#endif
+#endif /* !CONFIG_PM_SLEEP */
 
 #ifdef CONFIG_PM
 

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH] PM: Introduce core framework for run-time PM of I/O devices (rev. 3)
  2009-06-22 23:21 [PATCH] PM: Introduce core framework for run-time PM of I/O devices (rev. 3) Rafael J. Wysocki
  2009-06-23 17:00 ` Rafael J. Wysocki
@ 2009-06-23 17:00 ` Rafael J. Wysocki
  2009-06-23 17:10 ` Alan Stern
  2009-06-23 17:10 ` Alan Stern
  3 siblings, 0 replies; 102+ messages in thread
From: Rafael J. Wysocki @ 2009-06-23 17:00 UTC (permalink / raw)
  To: Alan Stern
  Cc: linux-pm, Oliver Neukum, Magnus Damm, ACPI Devel Maling List,
	Ingo Molnar, LKML, Greg KH, Arjan van de Ven

On Tuesday 23 June 2009, Rafael J. Wysocki wrote:
> Hi,
> 
> Below is a new revision of the patch introducing the run-time PM framework.
> 
> The most visible changes from the last version:
> 
> * I realized that if child_count is atomic, we can drop the parent locking from
>   all of the functions, so I did that.
> 
> * Introduced pm_runtime_put() that decrements the resume counter and queues
>   up an idle notification if the counter went down to 0 (and wasn't 0 previously).
>   Using asynchronous notification makes it possible to call pm_runtime_put()
>   from interrupt context, if necessary.
> 
> * Changed the meaning of the RPM_WAKE bit slightly (it is now also used for
>   disabling run-time PM for a device along with the resume counter).
> 
> Please let me know if I've overlooked anything. :-)

Well, I found quite a few problems myself, mostly related to disabling-enabling
of the run-time PM and to RPM_WAKE.

Updated patch will be sent out later today.

Best,
Rafael

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH] PM: Introduce core framework for run-time PM of I/O devices (rev. 3)
  2009-06-22 23:21 [PATCH] PM: Introduce core framework for run-time PM of I/O devices (rev. 3) Rafael J. Wysocki
@ 2009-06-23 17:00 ` Rafael J. Wysocki
  2009-06-23 17:00 ` Rafael J. Wysocki
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 102+ messages in thread
From: Rafael J. Wysocki @ 2009-06-23 17:00 UTC (permalink / raw)
  To: Alan Stern
  Cc: Greg KH, LKML, ACPI Devel Maling List, linux-pm, Ingo Molnar,
	Arjan van de Ven

On Tuesday 23 June 2009, Rafael J. Wysocki wrote:
> Hi,
> 
> Below is a new revision of the patch introducing the run-time PM framework.
> 
> The most visible changes from the last version:
> 
> * I realized that if child_count is atomic, we can drop the parent locking from
>   all of the functions, so I did that.
> 
> * Introduced pm_runtime_put() that decrements the resume counter and queues
>   up an idle notification if the counter went down to 0 (and wasn't 0 previously).
>   Using asynchronous notification makes it possible to call pm_runtime_put()
>   from interrupt context, if necessary.
> 
> * Changed the meaning of the RPM_WAKE bit slightly (it is now also used for
>   disabling run-time PM for a device along with the resume counter).
> 
> Please let me know if I've overlooked anything. :-)

Well, I found quite a few problems myself, mostly related to disabling-enabling
of the run-time PM and to RPM_WAKE.

Updated patch will be sent out later today.

Best,
Rafael

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH] PM: Introduce core framework for run-time PM of I/O devices (rev. 3)
  2009-06-22 23:21 [PATCH] PM: Introduce core framework for run-time PM of I/O devices (rev. 3) Rafael J. Wysocki
                   ` (2 preceding siblings ...)
  2009-06-23 17:10 ` Alan Stern
@ 2009-06-23 17:10 ` Alan Stern
  2009-06-24  0:08   ` Rafael J. Wysocki
  2009-06-24  0:08   ` Rafael J. Wysocki
  3 siblings, 2 replies; 102+ messages in thread
From: Alan Stern @ 2009-06-23 17:10 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-pm, Oliver Neukum, Magnus Damm, ACPI Devel Maling List,
	Ingo Molnar, LKML, Greg KH, Arjan van de Ven

On Tue, 23 Jun 2009, Rafael J. Wysocki wrote:

> Hi,
> 
> Below is a new revision of the patch introducing the run-time PM framework.
> 
> The most visible changes from the last version:
> 
> * I realized that if child_count is atomic, we can drop the parent locking from
>   all of the functions, so I did that.
> 
> * Introduced pm_runtime_put() that decrements the resume counter and queues
>   up an idle notification if the counter went down to 0 (and wasn't 0 previously).
>   Using asynchronous notification makes it possible to call pm_runtime_put()
>   from interrupt context, if necessary.
> 
> * Changed the meaning of the RPM_WAKE bit slightly (it is now also used for
>   disabling run-time PM for a device along with the resume counter).
> 
> Please let me know if I've overlooked anything. :-)

This first thing to strike me was that you moved the idle notifications 
into the workqueue.

Is that really needed?  Would we be better off just make the idle
callbacks directly from pm_runtime_put?  They would run in whatever
context the driver happened to be in at the time.

It's not clear exactly how much work the idle callbacks will need to 
do, but it seems likely that they won't have to do too much more than 
call pm_request_suspend.  And of course, that can be done in_interrupt.

Alan Stern


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH] PM: Introduce core framework for run-time PM of I/O devices (rev. 3)
  2009-06-22 23:21 [PATCH] PM: Introduce core framework for run-time PM of I/O devices (rev. 3) Rafael J. Wysocki
  2009-06-23 17:00 ` Rafael J. Wysocki
  2009-06-23 17:00 ` Rafael J. Wysocki
@ 2009-06-23 17:10 ` Alan Stern
  2009-06-23 17:10 ` Alan Stern
  3 siblings, 0 replies; 102+ messages in thread
From: Alan Stern @ 2009-06-23 17:10 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg KH, LKML, ACPI Devel Maling List, linux-pm, Ingo Molnar,
	Arjan van de Ven

On Tue, 23 Jun 2009, Rafael J. Wysocki wrote:

> Hi,
> 
> Below is a new revision of the patch introducing the run-time PM framework.
> 
> The most visible changes from the last version:
> 
> * I realized that if child_count is atomic, we can drop the parent locking from
>   all of the functions, so I did that.
> 
> * Introduced pm_runtime_put() that decrements the resume counter and queues
>   up an idle notification if the counter went down to 0 (and wasn't 0 previously).
>   Using asynchronous notification makes it possible to call pm_runtime_put()
>   from interrupt context, if necessary.
> 
> * Changed the meaning of the RPM_WAKE bit slightly (it is now also used for
>   disabling run-time PM for a device along with the resume counter).
> 
> Please let me know if I've overlooked anything. :-)

This first thing to strike me was that you moved the idle notifications 
into the workqueue.

Is that really needed?  Would we be better off just make the idle
callbacks directly from pm_runtime_put?  They would run in whatever
context the driver happened to be in at the time.

It's not clear exactly how much work the idle callbacks will need to 
do, but it seems likely that they won't have to do too much more than 
call pm_request_suspend.  And of course, that can be done in_interrupt.

Alan Stern

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH] PM: Introduce core framework for run-time PM of I/O devices (rev. 3)
  2009-06-23 17:10 ` Alan Stern
  2009-06-24  0:08   ` Rafael J. Wysocki
@ 2009-06-24  0:08   ` Rafael J. Wysocki
  2009-06-24  0:36     ` [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 4) Rafael J. Wysocki
  2009-06-24  0:36     ` [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 4) Rafael J. Wysocki
  1 sibling, 2 replies; 102+ messages in thread
From: Rafael J. Wysocki @ 2009-06-24  0:08 UTC (permalink / raw)
  To: Alan Stern
  Cc: linux-pm, Oliver Neukum, Magnus Damm, ACPI Devel Maling List,
	Ingo Molnar, LKML, Greg KH, Arjan van de Ven

On Tuesday 23 June 2009, Alan Stern wrote:
> On Tue, 23 Jun 2009, Rafael J. Wysocki wrote:
> 
> > Hi,
> > 
> > Below is a new revision of the patch introducing the run-time PM framework.
> > 
> > The most visible changes from the last version:
> > 
> > * I realized that if child_count is atomic, we can drop the parent locking from
> >   all of the functions, so I did that.
> > 
> > * Introduced pm_runtime_put() that decrements the resume counter and queues
> >   up an idle notification if the counter went down to 0 (and wasn't 0 previously).
> >   Using asynchronous notification makes it possible to call pm_runtime_put()
> >   from interrupt context, if necessary.
> > 
> > * Changed the meaning of the RPM_WAKE bit slightly (it is now also used for
> >   disabling run-time PM for a device along with the resume counter).
> > 
> > Please let me know if I've overlooked anything. :-)
> 
> This first thing to strike me was that you moved the idle notifications 
> into the workqueue.

Yes, I did.
 
> Is that really needed?  Would we be better off just make the idle
> callbacks directly from pm_runtime_put?  They would run in whatever
> context the driver happened to be in at the time.
> 
> It's not clear exactly how much work the idle callbacks will need to 
> do, but it seems likely that they won't have to do too much more than 
> call pm_request_suspend.  And of course, that can be done in_interrupt.

I just don't want to put any constraints on the implementation of
->runtime_idle().  The requirement that it be suitable for calling from
interrupt context may be quite inconvenient for some drivers and I'm afraid
they may have problems with meeting it.

Best,
Rafael

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH] PM: Introduce core framework for run-time PM of I/O devices (rev. 3)
  2009-06-23 17:10 ` Alan Stern
@ 2009-06-24  0:08   ` Rafael J. Wysocki
  2009-06-24  0:08   ` Rafael J. Wysocki
  1 sibling, 0 replies; 102+ messages in thread
From: Rafael J. Wysocki @ 2009-06-24  0:08 UTC (permalink / raw)
  To: Alan Stern
  Cc: Greg KH, LKML, ACPI Devel Maling List, linux-pm, Ingo Molnar,
	Arjan van de Ven

On Tuesday 23 June 2009, Alan Stern wrote:
> On Tue, 23 Jun 2009, Rafael J. Wysocki wrote:
> 
> > Hi,
> > 
> > Below is a new revision of the patch introducing the run-time PM framework.
> > 
> > The most visible changes from the last version:
> > 
> > * I realized that if child_count is atomic, we can drop the parent locking from
> >   all of the functions, so I did that.
> > 
> > * Introduced pm_runtime_put() that decrements the resume counter and queues
> >   up an idle notification if the counter went down to 0 (and wasn't 0 previously).
> >   Using asynchronous notification makes it possible to call pm_runtime_put()
> >   from interrupt context, if necessary.
> > 
> > * Changed the meaning of the RPM_WAKE bit slightly (it is now also used for
> >   disabling run-time PM for a device along with the resume counter).
> > 
> > Please let me know if I've overlooked anything. :-)
> 
> This first thing to strike me was that you moved the idle notifications 
> into the workqueue.

Yes, I did.
 
> Is that really needed?  Would we be better off just make the idle
> callbacks directly from pm_runtime_put?  They would run in whatever
> context the driver happened to be in at the time.
> 
> It's not clear exactly how much work the idle callbacks will need to 
> do, but it seems likely that they won't have to do too much more than 
> call pm_request_suspend.  And of course, that can be done in_interrupt.

I just don't want to put any constraints on the implementation of
->runtime_idle().  The requirement that it be suitable for calling from
interrupt context may be quite inconvenient for some drivers and I'm afraid
they may have problems with meeting it.

Best,
Rafael

^ permalink raw reply	[flat|nested] 102+ messages in thread

* [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 4)
  2009-06-24  0:08   ` Rafael J. Wysocki
@ 2009-06-24  0:36     ` Rafael J. Wysocki
  2009-06-24 19:24       ` [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 5) Rafael J. Wysocki
  2009-06-24 19:24       ` Rafael J. Wysocki
  2009-06-24  0:36     ` [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 4) Rafael J. Wysocki
  1 sibling, 2 replies; 102+ messages in thread
From: Rafael J. Wysocki @ 2009-06-24  0:36 UTC (permalink / raw)
  To: Alan Stern
  Cc: linux-pm, Oliver Neukum, Magnus Damm, ACPI Devel Maling List,
	Ingo Molnar, LKML, Greg KH, Arjan van de Ven

On Wednesday 24 June 2009, Rafael J. Wysocki wrote:
> On Tuesday 23 June 2009, Alan Stern wrote:
> > On Tue, 23 Jun 2009, Rafael J. Wysocki wrote:
> > 
> > > Hi,
> > > 
> > > Below is a new revision of the patch introducing the run-time PM framework.
> > > 
> > > The most visible changes from the last version:
> > > 
> > > * I realized that if child_count is atomic, we can drop the parent locking from
> > >   all of the functions, so I did that.
> > > 
> > > * Introduced pm_runtime_put() that decrements the resume counter and queues
> > >   up an idle notification if the counter went down to 0 (and wasn't 0 previously).
> > >   Using asynchronous notification makes it possible to call pm_runtime_put()
> > >   from interrupt context, if necessary.
> > > 
> > > * Changed the meaning of the RPM_WAKE bit slightly (it is now also used for
> > >   disabling run-time PM for a device along with the resume counter).
> > > 
> > > Please let me know if I've overlooked anything. :-)
> > 
> > This first thing to strike me was that you moved the idle notifications 
> > into the workqueue.
> 
> Yes, I did.
>  
> > Is that really needed?  Would we be better off just make the idle
> > callbacks directly from pm_runtime_put?  They would run in whatever
> > context the driver happened to be in at the time.
> > 
> > It's not clear exactly how much work the idle callbacks will need to 
> > do, but it seems likely that they won't have to do too much more than 
> > call pm_request_suspend.  And of course, that can be done in_interrupt.
> 
> I just don't want to put any constraints on the implementation of
> ->runtime_idle().  The requirement that it be suitable for calling from
> interrupt context may be quite inconvenient for some drivers and I'm afraid
> they may have problems with meeting it.

BTW, appended is a new update.  Hopefully, the majority of bugs were found
and fixed this time.

I dropped the documentation for now, until the code settles down.

Also, I removed the automatic incrementing and decrementing of resume_count
in __pm_runtime_resume() and pm_request_resume().

Description of RPM_NOTIFY is missing (sorry for that).  It's set when idle
notification has been scheduled for the device and reset before running
pm_runtime_idle() by the work function.

Comments welcome.

Best,
Rafael

---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: PM: Introduce core framework for run-time PM of I/O devices (rev. 4)

Introduce a core framework for run-time power management of I/O
devices.  Add device run-time PM fields to 'struct dev_pm_info'
and device run-time PM callbacks to 'struct dev_pm_ops'.  Introduce
a run-time PM workqueue and define some device run-time PM helper
functions at the core level.  Document all these things.

Not-yet-signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 drivers/base/dd.c            |    9 
 drivers/base/power/Makefile  |    1 
 drivers/base/power/main.c    |   16 
 drivers/base/power/power.h   |   11 
 drivers/base/power/runtime.c |  709 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/pm.h           |   98 +++++
 include/linux/pm_runtime.h   |  136 ++++++++
 kernel/power/Kconfig         |   14 
 kernel/power/main.c          |   17 +
 9 files changed, 1001 insertions(+), 10 deletions(-)

Index: linux-2.6/kernel/power/Kconfig
===================================================================
--- linux-2.6.orig/kernel/power/Kconfig
+++ linux-2.6/kernel/power/Kconfig
@@ -208,3 +208,17 @@ config APM_EMULATION
 	  random kernel OOPSes or reboots that don't seem to be related to
 	  anything, try disabling/enabling this option (or disabling/enabling
 	  APM in your BIOS).
+
+config PM_RUNTIME
+	bool "Run-time PM core functionality"
+	depends on PM
+	---help---
+	  Enable functionality allowing I/O devices to be put into energy-saving
+	  (low power) states at run time (or autosuspended) after a specified
+	  period of inactivity and woken up in response to a hardware-generated
+	  wake-up event or a driver's request.
+
+	  Hardware support is generally required for this functionality to work
+	  and the bus type drivers of the buses the devices are on are
+	  responsibile for the actual handling of the autosuspend requests and
+	  wake-up events.
Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -11,6 +11,7 @@
 #include <linux/kobject.h>
 #include <linux/string.h>
 #include <linux/resume-trace.h>
+#include <linux/workqueue.h>
 
 #include "power.h"
 
@@ -217,8 +218,24 @@ static struct attribute_group attr_group
 	.attrs = g,
 };
 
+#ifdef CONFIG_PM_RUNTIME
+struct workqueue_struct *pm_wq;
+
+static int __init pm_start_workqueue(void)
+{
+	pm_wq = create_freezeable_workqueue("pm");
+
+	return pm_wq ? 0 : -ENOMEM;
+}
+#else
+static inline int pm_start_workqueue(void) { return 0; }
+#endif
+
 static int __init pm_init(void)
 {
+	int error = pm_start_workqueue();
+	if (error)
+		return error;
 	power_kobj = kobject_create_and_add("power", NULL);
 	if (!power_kobj)
 		return -ENOMEM;
Index: linux-2.6/include/linux/pm.h
===================================================================
--- linux-2.6.orig/include/linux/pm.h
+++ linux-2.6/include/linux/pm.h
@@ -22,6 +22,9 @@
 #define _LINUX_PM_H
 
 #include <linux/list.h>
+#include <linux/workqueue.h>
+#include <linux/spinlock.h>
+#include <linux/completion.h>
 
 /*
  * Callbacks for platform drivers to implement.
@@ -165,6 +168,28 @@ typedef struct pm_message {
  * It is allowed to unregister devices while the above callbacks are being
  * executed.  However, it is not allowed to unregister a device from within any
  * of its own callbacks.
+ *
+ * There also are the following callbacks related to run-time power management
+ * of devices:
+ *
+ * @runtime_suspend: Prepare the device for a condition in which it won't be
+ *	able to communicate with the CPU(s) and RAM due to power management.
+ *	This need not mean that the device should be put into a low power state.
+ *	For example, if the device is behind a link which is about to be turned
+ *	off, the device may remain at full power.  Still, if the device does go
+ *	to low power and if device_may_wakeup(dev) is true, remote wake-up
+ *	(i.e. hardware mechanism allowing the device to request a change of its
+ *	power state, such as PCI PME) should be enabled for it.
+ *
+ * @runtime_resume: Put the device into the fully active state in response to a
+ *	wake-up event generated by hardware or at a request of software.  If
+ *	necessary, put the device into the full power state and restore its
+ *	registers, so that it is fully operational.
+ *
+ * @runtime_idle: Device appears to be inactive and it might be put into a low
+ *	power state if all of the necessary conditions are satisfied.  Check
+ *	these conditions and handle the device as appropriate, possibly queueing
+ *	a suspend request for it.
  */
 
 struct dev_pm_ops {
@@ -182,6 +207,9 @@ struct dev_pm_ops {
 	int (*thaw_noirq)(struct device *dev);
 	int (*poweroff_noirq)(struct device *dev);
 	int (*restore_noirq)(struct device *dev);
+	int (*runtime_suspend)(struct device *dev);
+	int (*runtime_resume)(struct device *dev);
+	void (*runtime_idle)(struct device *dev);
 };
 
 /**
@@ -315,14 +343,78 @@ enum dpm_state {
 	DPM_OFF_IRQ,
 };
 
+/**
+ * Device run-time power management state.
+ *
+ * These state labels are used internally by the PM core to indicate the current
+ * status of a device with respect to the PM core operations.  They do not
+ * reflect the actual power state of the device or its status as seen by the
+ * driver.
+ *
+ * RPM_ACTIVE		Device is fully operational, no run-time PM requests are
+ *			pending for it.
+ *
+ * RPM_IDLE		It has been requested that the device be suspended.
+ *			Suspend request has been put into the run-time PM
+ *			workqueue and it's pending execution.
+ *
+ * RPM_SUSPENDING	Device bus type's ->runtime_suspend() callback is being
+ *			executed.
+ *
+ * RPM_SUSPENDED	Device bus type's ->runtime_suspend() callback has
+ *			completed successfully.  The device is regarded as
+ *			suspended.
+ *
+ * RPM_WAKE		It has been requested that the device be woken up.
+ *			Resume request has been put into the run-time PM
+ *			workqueue and it's pending execution.
+ *
+ * RPM_RESUMING		Device bus type's ->runtime_resume() callback is being
+ *			executed.
+ *
+ * RPM_ERROR		Represents a condition from which the PM core cannot
+ *			recover by itself.  If the device's run-time PM status
+ *			field has this value, all of the run-time PM operations
+ *			carried out for the device by the core will fail, until
+ *			the status field is changed to either RPM_ACTIVE or
+ *			RPM_SUSPENDED (it is not valid to use the other values
+ *			in such a situation) by the device's driver or bus type.
+ *			This happens when the device bus type's
+ *			->runtime_suspend() or ->runtime_resume() callback
+ *			returns error code different from -EAGAIN or -EBUSY.
+ */
+
+#define RPM_ACTIVE	0
+#define RPM_IDLE	0x01
+#define RPM_SUSPENDING	0x02
+#define RPM_SUSPENDED	0x04
+#define RPM_WAKE	0x08
+#define RPM_RESUMING	0x10
+#define RPM_NOTIFY	0x20
+#define RPM_ERROR	0x3F
+
 struct dev_pm_info {
 	pm_message_t		power_state;
-	unsigned		can_wakeup:1;
-	unsigned		should_wakeup:1;
+	unsigned int		can_wakeup:1;
+	unsigned int		should_wakeup:1;
 	enum dpm_state		status;		/* Owned by the PM core */
-#ifdef	CONFIG_PM_SLEEP
+#ifdef CONFIG_PM_SLEEP
 	struct list_head	entry;
 #endif
+#ifdef CONFIG_PM_RUNTIME
+	struct delayed_work	suspend_work;
+	struct work_struct	work;
+	struct completion	work_done;
+	unsigned int		ignore_children:1;
+	unsigned int		runtime_break:1;
+	unsigned int		runtime_busy:1;
+	unsigned int		runtime_disabled:1;
+	unsigned int		runtime_status:6;
+	int			runtime_error;
+	atomic_t		resume_count;
+	atomic_t		child_count;
+	spinlock_t		lock;
+#endif
 };
 
 /*
Index: linux-2.6/drivers/base/power/Makefile
===================================================================
--- linux-2.6.orig/drivers/base/power/Makefile
+++ linux-2.6/drivers/base/power/Makefile
@@ -1,5 +1,6 @@
 obj-$(CONFIG_PM)	+= sysfs.o
 obj-$(CONFIG_PM_SLEEP)	+= main.o
+obj-$(CONFIG_PM_RUNTIME)	+= runtime.o
 obj-$(CONFIG_PM_TRACE_RTC)	+= trace.o
 
 ccflags-$(CONFIG_DEBUG_DRIVER) := -DDEBUG
Index: linux-2.6/drivers/base/power/runtime.c
===================================================================
--- /dev/null
+++ linux-2.6/drivers/base/power/runtime.c
@@ -0,0 +1,709 @@
+/*
+ * drivers/base/power/runtime.c - Helper functions for device run-time PM
+ *
+ * Copyright (c) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
+ *
+ * This file is released under the GPLv2.
+ */
+
+#include <linux/pm_runtime.h>
+#include <linux/jiffies.h>
+
+/**
+ * pm_runtime_idle - Check if device can be suspended and notify its bus type.
+ * @dev: Device to notify.
+ */
+void pm_runtime_idle(struct device *dev)
+{
+	if (!pm_suspend_possible(dev))
+		return;
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_idle)
+		dev->bus->pm->runtime_idle(dev);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_idle);
+
+/**
+ * __pm_get_child - Increment the counter of unsuspended children of a device.
+ * @dev: Device to handle;
+ */
+static void __pm_get_child(struct device *dev)
+{
+	atomic_inc(&dev->power.child_count);
+}
+
+/**
+ * __pm_put_child - Decrement the counter of unsuspended children of a device.
+ * @dev: Device to handle;
+ */
+static void __pm_put_child(struct device *dev)
+{
+	if (!atomic_add_unless(&dev->power.child_count, -1, 0))
+		dev_WARN(dev, "Unbalanced counter decrementation");
+}
+
+/**
+ * __pm_runtime_suspend - Run a device bus type's runtime_suspend() callback.
+ * @dev: Device to suspend.
+ * @sync: If unset, the funtion has been called via pm_wq.
+ *
+ * Check if the run-time PM status of the device is appropriate and run the
+ * ->runtime_suspend() callback provided by the device's bus type.  Update the
+ * run-time PM flags in the device object to reflect the current status of the
+ * device.
+ */
+int __pm_runtime_suspend(struct device *dev, bool sync)
+{
+	struct device *parent = NULL;
+	unsigned long flags;
+	int error = -EINVAL;
+
+	might_sleep();
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+ repeat:
+	if (dev->power.runtime_status == RPM_ERROR) {
+		goto out;
+	} else if (dev->power.runtime_status & RPM_SUSPENDED) {
+		error = 0;
+		goto out;
+	} else if (atomic_read(&dev->power.resume_count) > 0
+	    || dev->power.runtime_disabled
+	    || (dev->power.runtime_status & (RPM_WAKE|RPM_RESUMING))
+	    || (!sync && dev->power.runtime_status == RPM_IDLE
+	    && dev->power.runtime_break)) {
+		/*
+		 * We're forbidden to suspend the device, it is resuming or has
+		 * a resume request pending, or a pending suspend request has
+		 * just been cancelled and we're running as a result of that
+		 * request.
+		 */
+		error = -EAGAIN;
+		goto out;
+	} else if (dev->power.runtime_status & RPM_SUSPENDING) {
+		/* Another suspend is running in parallel with us. */
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		wait_for_completion(&dev->power.work_done);
+
+		return dev->power.runtime_error;
+	} else if (dev->power.runtime_status & RPM_NOTIFY) {
+		/*
+		 * Idle notification is pending for the device, so preempt it.
+		 * There also may be a suspend request pending, but the idle
+		 * notification work function will run earlier, so make it
+		 * cancel that request for us.
+		 */
+		if (dev->power.runtime_status & RPM_IDLE)
+			dev->power.runtime_status |= RPM_WAKE;
+		dev->power.runtime_break = true;
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		flush_work(&dev->power.work);
+
+		goto repeat;
+	} else if (sync && dev->power.runtime_status == RPM_IDLE
+	    && !dev->power.runtime_break) {
+		/*
+		 * Suspend request is pending, but we're not running as a result
+		 * of that request, so cancel it.  Since we're not clearing the
+		 * RPM_IDLE bit now, no new suspend requests will be queued up
+		 * while the cancelled pending one is waited for.
+		 */
+		dev->power.runtime_break = true;
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		cancel_delayed_work_sync(&dev->power.suspend_work);
+
+		spin_lock_irqsave(&dev->power.lock, flags);
+
+		/* Repeat if anyone else has already cleared the status. */
+		if (dev->power.runtime_status != RPM_IDLE
+		    || !dev->power.runtime_break)
+			goto repeat;
+
+		dev->power.runtime_break = false;
+	}
+
+	if (!pm_children_suspended(dev)) {
+		/*
+		 * We can only suspend the device if all of its children have
+		 * been suspended.
+		 */
+		dev->power.runtime_status = RPM_ACTIVE;
+		error = -EBUSY;
+		goto out;
+	}
+
+	dev->power.runtime_status = RPM_SUSPENDING;
+	init_completion(&dev->power.work_done);
+
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_suspend)
+		error = dev->bus->pm->runtime_suspend(dev);
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	switch (error) {
+	case 0:
+		/*
+		 * Resume request might have been queued up in the meantime, in
+		 * which case the RPM_WAKE bit is also set in runtime_status.
+		 */
+		dev->power.runtime_status &= ~RPM_SUSPENDING;
+		dev->power.runtime_status |= RPM_SUSPENDED;
+		break;
+	case -EAGAIN:
+	case -EBUSY:
+		dev->power.runtime_status = RPM_ACTIVE;
+		break;
+	default:
+		dev->power.runtime_status = RPM_ERROR;
+	}
+	dev->power.runtime_error = error;
+	complete_all(&dev->power.work_done);
+
+	if (!error && !(dev->power.runtime_status & RPM_WAKE) && dev->parent)
+		parent = dev->parent;
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	if (parent) {
+		__pm_put_child(parent);
+
+		if (!parent->power.ignore_children)
+			pm_runtime_idle(parent);
+	}
+
+	return error;
+}
+EXPORT_SYMBOL_GPL(__pm_runtime_suspend);
+
+/**
+ * pm_runtime_suspend_work - Run pm_runtime_suspend() for a device.
+ * @work: Work structure used for scheduling the execution of this function.
+ *
+ * Use @work to get the device object the suspend has been scheduled for and
+ * run pm_runtime_suspend() for it.
+ */
+static void pm_runtime_suspend_work(struct work_struct *work)
+{
+	__pm_runtime_suspend(suspend_work_to_device(work), false);
+}
+
+/**
+ * pm_request_suspend - Schedule run-time suspend of given device.
+ * @dev: Device to suspend.
+ * @msec: Time to wait before attempting to suspend the device, in milliseconds.
+ */
+void pm_request_suspend(struct device *dev, unsigned int msec)
+{
+	unsigned long flags;
+	unsigned long delay = msecs_to_jiffies(msec);
+
+	if (atomic_read(&dev->power.resume_count) > 0)
+		return;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	/* There may be an idle notification in progress, so be careful. */
+	if (!(dev->power.runtime_status & ~RPM_NOTIFY)
+	    || dev->power.runtime_disabled)
+		goto out;
+
+	dev->power.runtime_status |= RPM_IDLE;
+	dev->power.runtime_break = false;
+	queue_delayed_work(pm_wq, &dev->power.suspend_work, delay);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+EXPORT_SYMBOL_GPL(pm_request_suspend);
+
+/**
+ * __pm_runtime_resume - Run a device bus type's runtime_resume() callback.
+ * @dev: Device to resume.
+ * @sync: If unset, the funtion has been called via pm_wq.
+ *
+ * Check if the device is really suspended and run the ->runtime_resume()
+ * callback provided by the device's bus type driver.  Update the run-time PM
+ * flags in the device object to reflect the current status of the device.  If
+ * runtime suspend is in progress while this function is being run, wait for it
+ * to finish before resuming the device.  If runtime suspend is scheduled, but
+ * it hasn't started yet, cancel it and we're done.
+ */
+int __pm_runtime_resume(struct device *dev, bool sync)
+{
+	struct device *parent = dev->parent;
+	unsigned long flags;
+	bool put_parent = false;
+	int error = -EINVAL;
+
+	might_sleep();
+
+ repeat:
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+ repeat_locked:
+	if (dev->power.runtime_status == RPM_ERROR) {
+		goto out;
+	} else if (dev->power.runtime_disabled) {
+		error = -EAGAIN;
+		goto out;
+	} else if (dev->power.runtime_status == RPM_ACTIVE) {
+		error = 0;
+		goto out;
+	} else if (dev->power.runtime_status & RPM_NOTIFY) {
+		/*
+		 * Device has an idle notification pending, preempt it.
+		 * There also may be a suspend request pending, but the idle
+		 * notification function will run earlier, so make it cancel
+		 * that request for us.
+		 */
+		if (dev->power.runtime_status & RPM_IDLE)
+			dev->power.runtime_status |= RPM_WAKE;
+		dev->power.runtime_break = true;
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		flush_work(&dev->power.work);
+		goto repeat;
+	} else if (dev->power.runtime_status == RPM_IDLE
+	    && !dev->power.runtime_break) {
+		/* Suspend request is pending, not yet aborted, so cancel it. */
+		dev->power.runtime_break = true;
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		cancel_delayed_work_sync(&dev->power.suspend_work);
+
+		spin_lock_irqsave(&dev->power.lock, flags);
+
+		/* Repeat if anyone else has already changed the status. */
+		if (dev->power.runtime_status != RPM_IDLE
+		    || !dev->power.runtime_break)
+			goto repeat_locked;
+
+		/* The RPM_IDLE bit is still set, so clear it and return. */
+		dev->power.runtime_status = RPM_ACTIVE;
+		error = 0;
+		goto out;
+	} else if (sync && (dev->power.runtime_status & RPM_WAKE)) {
+		/* Resume request is pending, so let it run. */
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		flush_work(&dev->power.work);
+		goto repeat;
+	} else if (dev->power.runtime_status & RPM_SUSPENDING) {
+		/*
+		 * Suspend is running in parallel with us.  Wait for it to
+		 * complete and repeat.
+		 */
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		wait_for_completion(&dev->power.work_done);
+		goto repeat;
+	} else if (!put_parent && parent
+	    && dev->power.runtime_status == RPM_SUSPENDED) {
+		/*
+		 * Increase the parent's resume counter and request that it be
+		 * woken up if necessary.
+		 */
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		__pm_runtime_get(parent);
+		error = pm_runtime_resume(parent);
+		if (error) {
+			__pm_runtime_put(parent);
+			return error;
+		}
+
+		put_parent = true;
+		error = -EINVAL;
+		goto repeat;
+	} else if (dev->power.runtime_status == RPM_RESUMING) {
+		/*
+		 * There's another resume running in parallel with us. Wait for
+		 * it to complete and return.
+		 */
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		wait_for_completion(&dev->power.work_done);
+		error = dev->power.runtime_error;
+		goto out_parent;
+	}
+
+	if (dev->power.runtime_status == RPM_SUSPENDED && parent)
+		__pm_get_child(parent);
+
+	dev->power.runtime_status = RPM_RESUMING;
+	init_completion(&dev->power.work_done);
+
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	/*
+	 * We can decrement the parent's resume counter right now, because it
+	 * can't be suspended anyway after the __pm_get_child() above.
+	 */
+	if (put_parent) {
+		__pm_runtime_put(parent);
+		put_parent = false;
+	}
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_resume)
+		error = dev->bus->pm->runtime_resume(dev);
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	dev->power.runtime_status = error ? RPM_ERROR : RPM_ACTIVE;
+	dev->power.runtime_error = error;
+	complete_all(&dev->power.work_done);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+ out_parent:
+	if (put_parent)
+		__pm_runtime_put(parent);
+
+	if (!sync && !error)
+		pm_runtime_idle(dev);
+
+	return error;
+}
+EXPORT_SYMBOL_GPL(pm_runtime_resume);
+
+/**
+ * pm_runtime_work - Run __pm_runtime_resume() for a device.
+ * @work: Work structure used for scheduling the execution of this function.
+ *
+ * Use @work to get the device object the resume has been scheduled for and run
+ * __pm_runtime_resume() for it.
+ */
+static void pm_runtime_work(struct work_struct *work)
+{
+	__pm_runtime_resume(work_to_device(work), false);
+}
+
+/**
+ * pm_notify_or_cancel_work - Run pm_runtime_idle() or cancel a suspend request.
+ * @work: Work structure used for scheduling the execution of this function.
+ *
+ * Use @work to find a device and either execute pm_runtime_idle() for that
+ * device, or cancel a pending suspend request for it depending on the device's
+ * run-time PM status.
+ */
+static void pm_notify_or_cancel_work(struct work_struct *work)
+{
+	struct device *dev = work_to_device(work);
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	/*
+	 * There are three situations in which this function is run.  First, if
+	 * there's a request to notify the device's bus type that the device is
+	 * idle.  Second, if there's a request to cancel a pending suspend
+	 * request.  Finally, if the previous two happen at the same time.
+	 * However, we only need to run pm_runtime_idle() in the first
+	 * situation, because in the last one the request to suspend being
+	 * cancelled must have happened after the request to run idle
+	 * notification, which means that runtime_break is set.  In addition to
+	 * that, runtime_break will be set if synchronous suspend or resume has
+	 * run before us.
+	 */
+	dev->power.runtime_status &= ~RPM_NOTIFY;
+	if (!dev->power.runtime_break)
+		goto notify;
+
+	if (dev->power.runtime_status == (RPM_IDLE|RPM_WAKE)) {
+		/* We have a suspend request to cancel. */
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		cancel_delayed_work_sync(&dev->power.suspend_work);
+
+		spin_lock_irqsave(&dev->power.lock, flags);
+
+		/* Clear the status if someone else hasn't done it yet. */
+		if (dev->power.runtime_status != (RPM_IDLE|RPM_WAKE)
+		    || !dev->power.runtime_break)
+			goto out;
+	}
+
+	dev->power.runtime_status = RPM_ACTIVE;
+	dev->power.runtime_break = false;
+	goto out;
+
+ notify:
+	dev->power.runtime_busy = true;
+
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	pm_runtime_idle(dev);
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	dev->power.runtime_busy = false;
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+
+/**
+ * pm_request_resume - Schedule run-time resume of given device.
+ * @dev: Device to resume.
+ */
+int pm_request_resume(struct device *dev)
+{
+	struct device *parent = dev->parent;
+	unsigned long flags;
+	int error = 0;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.runtime_status == RPM_ERROR) {
+		error = -EINVAL;
+	} else if (dev->power.runtime_disabled) {
+		error = -EAGAIN;
+	} else if (dev->power.runtime_status == RPM_ACTIVE) {
+		error = -EBUSY;
+	} else if (dev->power.runtime_status & RPM_NOTIFY) {
+		/*
+		 * Device has an idle notification pending, so make it fail.
+		 * It may also have a suspend request pending, but the idle
+		 * notification work function will run before it and can cancel
+		 * it for us just fine.
+		 */
+		dev->power.runtime_status |= RPM_WAKE;
+		dev->power.runtime_break = true;
+		error = -EBUSY;
+	} else if (dev->power.runtime_status & (RPM_WAKE|RPM_RESUMING)) {
+		error = -EINPROGRESS;
+	}
+	if (error)
+		goto out;
+
+	if (dev->power.runtime_status == RPM_IDLE) {
+		error = -EBUSY;
+
+		/* Check if the suspend is being cancelled already. */
+		if (dev->power.runtime_break)
+			goto out;
+
+		/* Suspend request is pending.  Queue a request to cancel it. */
+		dev->power.runtime_break = true;
+		INIT_WORK(&dev->power.work, pm_notify_or_cancel_work);
+		goto queue;
+	}
+
+	if (dev->power.runtime_status == RPM_SUSPENDED && parent)
+		__pm_get_child(parent);
+
+	INIT_WORK(&dev->power.work, pm_runtime_work);
+
+ queue:
+	/*
+	 * The device may be suspending at the moment or there may be a resume
+	 * request pending for it and we can't clear the RPM_SUSPENDING and
+	 * RPM_IDLE bits in its runtime_status just yet.
+	 */
+	dev->power.runtime_status |= RPM_WAKE;
+	queue_work(pm_wq, &dev->power.work);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	return error;
+}
+EXPORT_SYMBOL_GPL(pm_request_resume);
+
+/**
+ * pm_runtime_put - Decrement the resume counter and run idle notification.
+ * @dev: Device to handle.
+ *
+ * Decrement the device's resume counter, check if it is possible to suspend the
+ * device and notify its bus type in that case.
+ */
+void pm_runtime_put(struct device *dev)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (!__pm_runtime_put(dev)) {
+		dev_WARN(dev, "Unbalanced counter decrementation");
+		goto out;
+ 	}
+
+	if (!pm_suspend_possible(dev))
+		goto out;
+
+	/* Do not queue up a notification if one is already in progress. */
+	if ((dev->power.runtime_status & RPM_NOTIFY) || dev->power.runtime_busy)
+		goto out;
+
+	/*
+	 * The notification is asynchronous so that this function can be called
+	 * from interrupt context.
+	 */
+	dev->power.runtime_status = RPM_NOTIFY;
+	dev->power.runtime_break = false;
+	INIT_WORK(&dev->power.work, pm_notify_or_cancel_work);
+	queue_work(pm_wq, &dev->power.work);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_put);
+
+/**
+ * __pm_runtime_clear_status - Change the run-time PM status of a device.
+ * @dev: Device to handle.
+ * @status: New value of the device's run-time PM status.
+ *
+ * Change the run-time PM status of the device to @status, which must be
+ * either RPM_ACTIVE or RPM_SUSPENDED, if its current value is equal to
+ * RPM_ERROR.
+ */
+void __pm_runtime_clear_status(struct device *dev, unsigned int status)
+{
+	struct device *parent = dev->parent;
+	unsigned long flags;
+
+	if (status & ~RPM_SUSPENDED)
+		return;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.runtime_status != RPM_ERROR)
+		goto out;
+
+	dev->power.runtime_status = status;
+	if (status == RPM_SUSPENDED && parent)
+		__pm_put_child(parent);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+EXPORT_SYMBOL_GPL(__pm_runtime_clear_status);
+
+/**
+ * pm_runtime_enable - Enable run-time PM of a device.
+ * @dev: Device to handle.
+ */
+void pm_runtime_enable(struct device *dev)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (!dev->power.runtime_disabled)
+		goto out;
+
+	if (!__pm_runtime_put(dev))
+		dev_WARN(dev, "Unbalanced counter decrementation");
+
+	if (!atomic_read(&dev->power.resume_count))
+		dev->power.runtime_disabled = false;
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_enable);
+
+/**
+ * pm_runtime_disable - Disable run-time PM of a device.
+ * @dev: Device to handle.
+ *
+ * Set the power.runtime_disabled flag for the device, cancel all pending
+ * run-time PM requests for it and wait for operations in progress to complete.
+ * The device can be either active or suspended after its run-time PM has been
+ * disabled.
+ */
+void pm_runtime_disable(struct device *dev)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	__pm_runtime_get(dev);
+
+	if (dev->power.runtime_disabled)
+		goto out;
+
+	dev->power.runtime_disabled = true;
+
+	if ((dev->power.runtime_status & (RPM_WAKE|RPM_NOTIFY))
+	    || dev->power.runtime_busy) {
+		/* Resume request or idle notification pending. */
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		cancel_work_sync(&dev->power.work);
+
+		spin_lock_irqsave(&dev->power.lock, flags);
+
+		dev->power.runtime_status &= ~(RPM_WAKE|RPM_NOTIFY);
+		dev->power.runtime_busy = false;
+	}
+
+	if (dev->power.runtime_status & RPM_IDLE) {
+		/* Suspend request pending. */
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		cancel_delayed_work_sync(&dev->power.suspend_work);
+
+		spin_lock_irqsave(&dev->power.lock, flags);
+
+		dev->power.runtime_status &= ~RPM_IDLE;
+	} else if (dev->power.runtime_status & (RPM_SUSPENDING|RPM_RESUMING)) {
+		/* Suspend or wake-up in progress. */
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		wait_for_completion(&dev->power.work_done);
+		return;
+	}
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_disable);
+
+/**
+ * pm_runtime_init - Initialize run-time PM fields in given device object.
+ * @dev: Device object to initialize.
+ */
+void pm_runtime_init(struct device *dev)
+{
+	spin_lock_init(&dev->power.lock);
+
+	dev->power.runtime_status = RPM_ACTIVE;
+	dev->power.runtime_disabled = true;
+	atomic_set(&dev->power.resume_count, 1);
+
+	atomic_set(&dev->power.child_count, 0);
+	pm_suspend_ignore_children(dev, false);
+}
+
+/**
+ * pm_runtime_add - Update run-time PM fields of a device while adding it.
+ * @dev: Device object being added to device hierarchy.
+ */
+void pm_runtime_add(struct device *dev)
+{
+	dev->power.runtime_busy = false;
+	INIT_DELAYED_WORK(&dev->power.suspend_work, pm_runtime_suspend_work);
+
+	if (dev->parent)
+		__pm_get_child(dev->parent);
+}
Index: linux-2.6/include/linux/pm_runtime.h
===================================================================
--- /dev/null
+++ linux-2.6/include/linux/pm_runtime.h
@@ -0,0 +1,136 @@
+/*
+ * pm_runtime.h - Device run-time power management helper functions.
+ *
+ * Copyright (C) 2009 Rafael J. Wysocki <rjw@sisk.pl>
+ *
+ * This file is released under the GPLv2.
+ */
+
+#ifndef _LINUX_PM_RUNTIME_H
+#define _LINUX_PM_RUNTIME_H
+
+#include <linux/device.h>
+#include <linux/pm.h>
+
+#ifdef CONFIG_PM_RUNTIME
+
+extern struct workqueue_struct *pm_wq;
+
+extern void pm_runtime_init(struct device *dev);
+extern void pm_runtime_add(struct device *dev);
+extern void pm_runtime_put(struct device *dev);
+extern void pm_runtime_idle(struct device *dev);
+extern int __pm_runtime_suspend(struct device *dev, bool sync);
+extern void pm_request_suspend(struct device *dev, unsigned int msec);
+extern int __pm_runtime_resume(struct device *dev, bool sync);
+extern int pm_request_resume(struct device *dev);
+extern void __pm_runtime_clear_status(struct device *dev, unsigned int status);
+extern void pm_runtime_enable(struct device *dev);
+extern void pm_runtime_disable(struct device *dev);
+
+static inline struct device *suspend_work_to_device(struct work_struct *work)
+{
+	struct delayed_work *dw = to_delayed_work(work);
+	struct dev_pm_info *dpi;
+
+	dpi = container_of(dw, struct dev_pm_info, suspend_work);
+	return container_of(dpi, struct device, power);
+}
+
+static inline struct device *work_to_device(struct work_struct *work)
+{
+	struct dev_pm_info *dpi;
+
+	dpi = container_of(work, struct dev_pm_info, work);
+	return container_of(dpi, struct device, power);
+}
+
+static inline void __pm_runtime_get(struct device *dev)
+{
+	atomic_inc(&dev->power.resume_count);
+}
+
+static inline bool __pm_runtime_put(struct device *dev)
+{
+	return !!atomic_add_unless(&dev->power.resume_count, -1, 0);
+}
+
+static inline bool pm_children_suspended(struct device *dev)
+{
+	return dev->power.ignore_children
+		|| !atomic_read(&dev->power.child_count);
+}
+
+static inline bool pm_suspend_possible(struct device *dev)
+{
+	return pm_children_suspended(dev)
+		&& !atomic_read(&dev->power.resume_count)
+		&& !(dev->power.runtime_status & ~RPM_NOTIFY)
+		&& !dev->power.runtime_disabled;
+}
+
+static inline void pm_suspend_ignore_children(struct device *dev, bool enable)
+{
+	dev->power.ignore_children = enable;
+}
+
+#else /* !CONFIG_PM_RUNTIME */
+
+static inline void pm_runtime_init(struct device *dev) {}
+static inline void pm_runtime_add(struct device *dev) {}
+static inline void pm_runtime_put(struct device *dev) {}
+static inline void pm_runtime_idle(struct device *dev) {}
+static inline int __pm_runtime_suspend(struct device *dev, bool sync)
+{
+	return -ENOSYS;
+}
+static inline void pm_request_suspend(struct device *dev, unsigned int msec) {}
+static inline int __pm_runtime_resume(struct device *dev, bool sync)
+{
+	return -ENOSYS;
+}
+static inline int pm_request_resume(struct device *dev) { return -ENOSYS; }
+static inline void __pm_runtime_clear_status(struct device *dev,
+					      unsigned int status) {}
+static inline void pm_runtime_enable(struct device *dev) {}
+static inline void pm_runtime_disable(struct device *dev) {}
+
+static inline void __pm_runtime_get(struct device *dev) {}
+static inline bool __pm_runtime_put(struct device *dev) { return true; }
+static inline bool pm_children_suspended(struct device *dev) { return false; }
+static inline bool pm_suspend_possible(struct device *dev) { return false; }
+static inline void pm_suspend_ignore_children(struct device *dev, bool en) {}
+
+#endif /* !CONFIG_PM_RUNTIME */
+
+static inline void pm_runtime_get(struct device *dev)
+{
+	__pm_runtime_get(dev);
+}
+
+static inline int pm_runtime_suspend(struct device *dev)
+{
+	return __pm_runtime_suspend(dev, true);
+}
+
+static inline int pm_runtime_resume(struct device *dev)
+{
+	return __pm_runtime_resume(dev, true);
+}
+
+static inline void pm_runtime_clear_active(struct device *dev)
+{
+	__pm_runtime_clear_status(dev, RPM_ACTIVE);
+}
+
+static inline void pm_runtime_clear_suspended(struct device *dev)
+{
+	__pm_runtime_clear_status(dev, RPM_SUSPENDED);
+}
+
+static inline void pm_runtime_remove(struct device *dev)
+{
+	pm_runtime_disable(dev);
+}
+
+#endif
Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -21,6 +21,7 @@
 #include <linux/kallsyms.h>
 #include <linux/mutex.h>
 #include <linux/pm.h>
+#include <linux/pm_runtime.h>
 #include <linux/resume-trace.h>
 #include <linux/rwsem.h>
 #include <linux/interrupt.h>
@@ -49,6 +50,16 @@ static DEFINE_MUTEX(dpm_list_mtx);
 static bool transition_started;
 
 /**
+ * device_pm_init - Initialize the PM-related part of a device object
+ * @dev: Device object to initialize.
+ */
+void device_pm_init(struct device *dev)
+{
+	dev->power.status = DPM_ON;
+	pm_runtime_init(dev);
+}
+
+/**
  *	device_pm_lock - lock the list of active devices used by the PM core
  */
 void device_pm_lock(void)
@@ -88,6 +99,7 @@ void device_pm_add(struct device *dev)
 	}
 
 	list_add_tail(&dev->power.entry, &dpm_list);
+	pm_runtime_add(dev);
 	mutex_unlock(&dpm_list_mtx);
 }
 
@@ -104,6 +116,7 @@ void device_pm_remove(struct device *dev
 		 kobject_name(&dev->kobj));
 	mutex_lock(&dpm_list_mtx);
 	list_del_init(&dev->power.entry);
+	pm_runtime_remove(dev);
 	mutex_unlock(&dpm_list_mtx);
 }
 
@@ -507,6 +520,7 @@ static void dpm_complete(pm_message_t st
 		get_device(dev);
 		if (dev->power.status > DPM_ON) {
 			dev->power.status = DPM_ON;
+			pm_runtime_enable(dev);
 			mutex_unlock(&dpm_list_mtx);
 
 			device_complete(dev, state);
@@ -753,6 +767,7 @@ static int dpm_prepare(pm_message_t stat
 
 		get_device(dev);
 		dev->power.status = DPM_PREPARING;
+		pm_runtime_disable(dev);
 		mutex_unlock(&dpm_list_mtx);
 
 		error = device_prepare(dev, state);
@@ -760,6 +775,7 @@ static int dpm_prepare(pm_message_t stat
 		mutex_lock(&dpm_list_mtx);
 		if (error) {
 			dev->power.status = DPM_ON;
+			pm_runtime_enable(dev);
 			if (error == -EAGAIN) {
 				put_device(dev);
 				continue;
Index: linux-2.6/drivers/base/dd.c
===================================================================
--- linux-2.6.orig/drivers/base/dd.c
+++ linux-2.6/drivers/base/dd.c
@@ -23,6 +23,7 @@
 #include <linux/kthread.h>
 #include <linux/wait.h>
 #include <linux/async.h>
+#include <linux/pm_runtime.h>
 
 #include "base.h"
 #include "power/power.h"
@@ -202,8 +203,12 @@ int driver_probe_device(struct device_dr
 	pr_debug("bus: '%s': %s: matched device %s with driver %s\n",
 		 drv->bus->name, __func__, dev_name(dev), drv->name);
 
+	pm_runtime_disable(dev);
+
 	ret = really_probe(dev, drv);
 
+	pm_runtime_enable(dev);
+
 	return ret;
 }
 
@@ -306,6 +311,8 @@ static void __device_release_driver(stru
 
 	drv = dev->driver;
 	if (drv) {
+		pm_runtime_disable(dev);
+
 		driver_sysfs_remove(dev);
 
 		if (dev->bus)
@@ -320,6 +327,8 @@ static void __device_release_driver(stru
 		devres_release_all(dev);
 		dev->driver = NULL;
 		klist_remove(&dev->p->knode_driver);
+
+		pm_runtime_enable(dev);
 	}
 }
 
Index: linux-2.6/drivers/base/power/power.h
===================================================================
--- linux-2.6.orig/drivers/base/power/power.h
+++ linux-2.6/drivers/base/power/power.h
@@ -1,8 +1,3 @@
-static inline void device_pm_init(struct device *dev)
-{
-	dev->power.status = DPM_ON;
-}
-
 #ifdef CONFIG_PM_SLEEP
 
 /*
@@ -16,14 +11,16 @@ static inline struct device *to_device(s
 	return container_of(entry, struct device, power.entry);
 }
 
+extern void device_pm_init(struct device *dev);
 extern void device_pm_add(struct device *);
 extern void device_pm_remove(struct device *);
 extern void device_pm_move_before(struct device *, struct device *);
 extern void device_pm_move_after(struct device *, struct device *);
 extern void device_pm_move_last(struct device *);
 
-#else /* CONFIG_PM_SLEEP */
+#else /* !CONFIG_PM_SLEEP */
 
+static inline void device_pm_init(struct device *dev) {}
 static inline void device_pm_add(struct device *dev) {}
 static inline void device_pm_remove(struct device *dev) {}
 static inline void device_pm_move_before(struct device *deva,
@@ -32,7 +29,7 @@ static inline void device_pm_move_after(
 					struct device *devb) {}
 static inline void device_pm_move_last(struct device *dev) {}
 
-#endif
+#endif /* !CONFIG_PM_SLEEP */
 
 #ifdef CONFIG_PM
 

^ permalink raw reply	[flat|nested] 102+ messages in thread

* [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 4)
  2009-06-24  0:08   ` Rafael J. Wysocki
  2009-06-24  0:36     ` [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 4) Rafael J. Wysocki
@ 2009-06-24  0:36     ` Rafael J. Wysocki
  1 sibling, 0 replies; 102+ messages in thread
From: Rafael J. Wysocki @ 2009-06-24  0:36 UTC (permalink / raw)
  To: Alan Stern
  Cc: Greg KH, LKML, ACPI Devel Maling List, linux-pm, Ingo Molnar,
	Arjan van de Ven

On Wednesday 24 June 2009, Rafael J. Wysocki wrote:
> On Tuesday 23 June 2009, Alan Stern wrote:
> > On Tue, 23 Jun 2009, Rafael J. Wysocki wrote:
> > 
> > > Hi,
> > > 
> > > Below is a new revision of the patch introducing the run-time PM framework.
> > > 
> > > The most visible changes from the last version:
> > > 
> > > * I realized that if child_count is atomic, we can drop the parent locking from
> > >   all of the functions, so I did that.
> > > 
> > > * Introduced pm_runtime_put() that decrements the resume counter and queues
> > >   up an idle notification if the counter went down to 0 (and wasn't 0 previously).
> > >   Using asynchronous notification makes it possible to call pm_runtime_put()
> > >   from interrupt context, if necessary.
> > > 
> > > * Changed the meaning of the RPM_WAKE bit slightly (it is now also used for
> > >   disabling run-time PM for a device along with the resume counter).
> > > 
> > > Please let me know if I've overlooked anything. :-)
> > 
> > This first thing to strike me was that you moved the idle notifications 
> > into the workqueue.
> 
> Yes, I did.
>  
> > Is that really needed?  Would we be better off just make the idle
> > callbacks directly from pm_runtime_put?  They would run in whatever
> > context the driver happened to be in at the time.
> > 
> > It's not clear exactly how much work the idle callbacks will need to 
> > do, but it seems likely that they won't have to do too much more than 
> > call pm_request_suspend.  And of course, that can be done in_interrupt.
> 
> I just don't want to put any constraints on the implementation of
> ->runtime_idle().  The requirement that it be suitable for calling from
> interrupt context may be quite inconvenient for some drivers and I'm afraid
> they may have problems with meeting it.

BTW, appended is a new update.  Hopefully, the majority of bugs were found
and fixed this time.

I dropped the documentation for now, until the code settles down.

Also, I removed the automatic incrementing and decrementing of resume_count
in __pm_runtime_resume() and pm_request_resume().

Description of RPM_NOTIFY is missing (sorry for that).  It's set when idle
notification has been scheduled for the device and reset before running
pm_runtime_idle() by the work function.

Comments welcome.

Best,
Rafael

---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: PM: Introduce core framework for run-time PM of I/O devices (rev. 4)

Introduce a core framework for run-time power management of I/O
devices.  Add device run-time PM fields to 'struct dev_pm_info'
and device run-time PM callbacks to 'struct dev_pm_ops'.  Introduce
a run-time PM workqueue and define some device run-time PM helper
functions at the core level.  Document all these things.

Not-yet-signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 drivers/base/dd.c            |    9 
 drivers/base/power/Makefile  |    1 
 drivers/base/power/main.c    |   16 
 drivers/base/power/power.h   |   11 
 drivers/base/power/runtime.c |  709 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/pm.h           |   98 +++++
 include/linux/pm_runtime.h   |  136 ++++++++
 kernel/power/Kconfig         |   14 
 kernel/power/main.c          |   17 +
 9 files changed, 1001 insertions(+), 10 deletions(-)

Index: linux-2.6/kernel/power/Kconfig
===================================================================
--- linux-2.6.orig/kernel/power/Kconfig
+++ linux-2.6/kernel/power/Kconfig
@@ -208,3 +208,17 @@ config APM_EMULATION
 	  random kernel OOPSes or reboots that don't seem to be related to
 	  anything, try disabling/enabling this option (or disabling/enabling
 	  APM in your BIOS).
+
+config PM_RUNTIME
+	bool "Run-time PM core functionality"
+	depends on PM
+	---help---
+	  Enable functionality allowing I/O devices to be put into energy-saving
+	  (low power) states at run time (or autosuspended) after a specified
+	  period of inactivity and woken up in response to a hardware-generated
+	  wake-up event or a driver's request.
+
+	  Hardware support is generally required for this functionality to work
+	  and the bus type drivers of the buses the devices are on are
+	  responsibile for the actual handling of the autosuspend requests and
+	  wake-up events.
Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -11,6 +11,7 @@
 #include <linux/kobject.h>
 #include <linux/string.h>
 #include <linux/resume-trace.h>
+#include <linux/workqueue.h>
 
 #include "power.h"
 
@@ -217,8 +218,24 @@ static struct attribute_group attr_group
 	.attrs = g,
 };
 
+#ifdef CONFIG_PM_RUNTIME
+struct workqueue_struct *pm_wq;
+
+static int __init pm_start_workqueue(void)
+{
+	pm_wq = create_freezeable_workqueue("pm");
+
+	return pm_wq ? 0 : -ENOMEM;
+}
+#else
+static inline int pm_start_workqueue(void) { return 0; }
+#endif
+
 static int __init pm_init(void)
 {
+	int error = pm_start_workqueue();
+	if (error)
+		return error;
 	power_kobj = kobject_create_and_add("power", NULL);
 	if (!power_kobj)
 		return -ENOMEM;
Index: linux-2.6/include/linux/pm.h
===================================================================
--- linux-2.6.orig/include/linux/pm.h
+++ linux-2.6/include/linux/pm.h
@@ -22,6 +22,9 @@
 #define _LINUX_PM_H
 
 #include <linux/list.h>
+#include <linux/workqueue.h>
+#include <linux/spinlock.h>
+#include <linux/completion.h>
 
 /*
  * Callbacks for platform drivers to implement.
@@ -165,6 +168,28 @@ typedef struct pm_message {
  * It is allowed to unregister devices while the above callbacks are being
  * executed.  However, it is not allowed to unregister a device from within any
  * of its own callbacks.
+ *
+ * There also are the following callbacks related to run-time power management
+ * of devices:
+ *
+ * @runtime_suspend: Prepare the device for a condition in which it won't be
+ *	able to communicate with the CPU(s) and RAM due to power management.
+ *	This need not mean that the device should be put into a low power state.
+ *	For example, if the device is behind a link which is about to be turned
+ *	off, the device may remain at full power.  Still, if the device does go
+ *	to low power and if device_may_wakeup(dev) is true, remote wake-up
+ *	(i.e. hardware mechanism allowing the device to request a change of its
+ *	power state, such as PCI PME) should be enabled for it.
+ *
+ * @runtime_resume: Put the device into the fully active state in response to a
+ *	wake-up event generated by hardware or at a request of software.  If
+ *	necessary, put the device into the full power state and restore its
+ *	registers, so that it is fully operational.
+ *
+ * @runtime_idle: Device appears to be inactive and it might be put into a low
+ *	power state if all of the necessary conditions are satisfied.  Check
+ *	these conditions and handle the device as appropriate, possibly queueing
+ *	a suspend request for it.
  */
 
 struct dev_pm_ops {
@@ -182,6 +207,9 @@ struct dev_pm_ops {
 	int (*thaw_noirq)(struct device *dev);
 	int (*poweroff_noirq)(struct device *dev);
 	int (*restore_noirq)(struct device *dev);
+	int (*runtime_suspend)(struct device *dev);
+	int (*runtime_resume)(struct device *dev);
+	void (*runtime_idle)(struct device *dev);
 };
 
 /**
@@ -315,14 +343,78 @@ enum dpm_state {
 	DPM_OFF_IRQ,
 };
 
+/**
+ * Device run-time power management state.
+ *
+ * These state labels are used internally by the PM core to indicate the current
+ * status of a device with respect to the PM core operations.  They do not
+ * reflect the actual power state of the device or its status as seen by the
+ * driver.
+ *
+ * RPM_ACTIVE		Device is fully operational, no run-time PM requests are
+ *			pending for it.
+ *
+ * RPM_IDLE		It has been requested that the device be suspended.
+ *			Suspend request has been put into the run-time PM
+ *			workqueue and it's pending execution.
+ *
+ * RPM_SUSPENDING	Device bus type's ->runtime_suspend() callback is being
+ *			executed.
+ *
+ * RPM_SUSPENDED	Device bus type's ->runtime_suspend() callback has
+ *			completed successfully.  The device is regarded as
+ *			suspended.
+ *
+ * RPM_WAKE		It has been requested that the device be woken up.
+ *			Resume request has been put into the run-time PM
+ *			workqueue and it's pending execution.
+ *
+ * RPM_RESUMING		Device bus type's ->runtime_resume() callback is being
+ *			executed.
+ *
+ * RPM_ERROR		Represents a condition from which the PM core cannot
+ *			recover by itself.  If the device's run-time PM status
+ *			field has this value, all of the run-time PM operations
+ *			carried out for the device by the core will fail, until
+ *			the status field is changed to either RPM_ACTIVE or
+ *			RPM_SUSPENDED (it is not valid to use the other values
+ *			in such a situation) by the device's driver or bus type.
+ *			This happens when the device bus type's
+ *			->runtime_suspend() or ->runtime_resume() callback
+ *			returns error code different from -EAGAIN or -EBUSY.
+ */
+
+#define RPM_ACTIVE	0
+#define RPM_IDLE	0x01
+#define RPM_SUSPENDING	0x02
+#define RPM_SUSPENDED	0x04
+#define RPM_WAKE	0x08
+#define RPM_RESUMING	0x10
+#define RPM_NOTIFY	0x20
+#define RPM_ERROR	0x3F
+
 struct dev_pm_info {
 	pm_message_t		power_state;
-	unsigned		can_wakeup:1;
-	unsigned		should_wakeup:1;
+	unsigned int		can_wakeup:1;
+	unsigned int		should_wakeup:1;
 	enum dpm_state		status;		/* Owned by the PM core */
-#ifdef	CONFIG_PM_SLEEP
+#ifdef CONFIG_PM_SLEEP
 	struct list_head	entry;
 #endif
+#ifdef CONFIG_PM_RUNTIME
+	struct delayed_work	suspend_work;
+	struct work_struct	work;
+	struct completion	work_done;
+	unsigned int		ignore_children:1;
+	unsigned int		runtime_break:1;
+	unsigned int		runtime_busy:1;
+	unsigned int		runtime_disabled:1;
+	unsigned int		runtime_status:6;
+	int			runtime_error;
+	atomic_t		resume_count;
+	atomic_t		child_count;
+	spinlock_t		lock;
+#endif
 };
 
 /*
Index: linux-2.6/drivers/base/power/Makefile
===================================================================
--- linux-2.6.orig/drivers/base/power/Makefile
+++ linux-2.6/drivers/base/power/Makefile
@@ -1,5 +1,6 @@
 obj-$(CONFIG_PM)	+= sysfs.o
 obj-$(CONFIG_PM_SLEEP)	+= main.o
+obj-$(CONFIG_PM_RUNTIME)	+= runtime.o
 obj-$(CONFIG_PM_TRACE_RTC)	+= trace.o
 
 ccflags-$(CONFIG_DEBUG_DRIVER) := -DDEBUG
Index: linux-2.6/drivers/base/power/runtime.c
===================================================================
--- /dev/null
+++ linux-2.6/drivers/base/power/runtime.c
@@ -0,0 +1,709 @@
+/*
+ * drivers/base/power/runtime.c - Helper functions for device run-time PM
+ *
+ * Copyright (c) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
+ *
+ * This file is released under the GPLv2.
+ */
+
+#include <linux/pm_runtime.h>
+#include <linux/jiffies.h>
+
+/**
+ * pm_runtime_idle - Check if device can be suspended and notify its bus type.
+ * @dev: Device to notify.
+ */
+void pm_runtime_idle(struct device *dev)
+{
+	if (!pm_suspend_possible(dev))
+		return;
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_idle)
+		dev->bus->pm->runtime_idle(dev);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_idle);
+
+/**
+ * __pm_get_child - Increment the counter of unsuspended children of a device.
+ * @dev: Device to handle;
+ */
+static void __pm_get_child(struct device *dev)
+{
+	atomic_inc(&dev->power.child_count);
+}
+
+/**
+ * __pm_put_child - Decrement the counter of unsuspended children of a device.
+ * @dev: Device to handle;
+ */
+static void __pm_put_child(struct device *dev)
+{
+	if (!atomic_add_unless(&dev->power.child_count, -1, 0))
+		dev_WARN(dev, "Unbalanced counter decrementation");
+}
+
+/**
+ * __pm_runtime_suspend - Run a device bus type's runtime_suspend() callback.
+ * @dev: Device to suspend.
+ * @sync: If unset, the funtion has been called via pm_wq.
+ *
+ * Check if the run-time PM status of the device is appropriate and run the
+ * ->runtime_suspend() callback provided by the device's bus type.  Update the
+ * run-time PM flags in the device object to reflect the current status of the
+ * device.
+ */
+int __pm_runtime_suspend(struct device *dev, bool sync)
+{
+	struct device *parent = NULL;
+	unsigned long flags;
+	int error = -EINVAL;
+
+	might_sleep();
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+ repeat:
+	if (dev->power.runtime_status == RPM_ERROR) {
+		goto out;
+	} else if (dev->power.runtime_status & RPM_SUSPENDED) {
+		error = 0;
+		goto out;
+	} else if (atomic_read(&dev->power.resume_count) > 0
+	    || dev->power.runtime_disabled
+	    || (dev->power.runtime_status & (RPM_WAKE|RPM_RESUMING))
+	    || (!sync && dev->power.runtime_status == RPM_IDLE
+	    && dev->power.runtime_break)) {
+		/*
+		 * We're forbidden to suspend the device, it is resuming or has
+		 * a resume request pending, or a pending suspend request has
+		 * just been cancelled and we're running as a result of that
+		 * request.
+		 */
+		error = -EAGAIN;
+		goto out;
+	} else if (dev->power.runtime_status & RPM_SUSPENDING) {
+		/* Another suspend is running in parallel with us. */
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		wait_for_completion(&dev->power.work_done);
+
+		return dev->power.runtime_error;
+	} else if (dev->power.runtime_status & RPM_NOTIFY) {
+		/*
+		 * Idle notification is pending for the device, so preempt it.
+		 * There also may be a suspend request pending, but the idle
+		 * notification work function will run earlier, so make it
+		 * cancel that request for us.
+		 */
+		if (dev->power.runtime_status & RPM_IDLE)
+			dev->power.runtime_status |= RPM_WAKE;
+		dev->power.runtime_break = true;
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		flush_work(&dev->power.work);
+
+		goto repeat;
+	} else if (sync && dev->power.runtime_status == RPM_IDLE
+	    && !dev->power.runtime_break) {
+		/*
+		 * Suspend request is pending, but we're not running as a result
+		 * of that request, so cancel it.  Since we're not clearing the
+		 * RPM_IDLE bit now, no new suspend requests will be queued up
+		 * while the cancelled pending one is waited for.
+		 */
+		dev->power.runtime_break = true;
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		cancel_delayed_work_sync(&dev->power.suspend_work);
+
+		spin_lock_irqsave(&dev->power.lock, flags);
+
+		/* Repeat if anyone else has already cleared the status. */
+		if (dev->power.runtime_status != RPM_IDLE
+		    || !dev->power.runtime_break)
+			goto repeat;
+
+		dev->power.runtime_break = false;
+	}
+
+	if (!pm_children_suspended(dev)) {
+		/*
+		 * We can only suspend the device if all of its children have
+		 * been suspended.
+		 */
+		dev->power.runtime_status = RPM_ACTIVE;
+		error = -EBUSY;
+		goto out;
+	}
+
+	dev->power.runtime_status = RPM_SUSPENDING;
+	init_completion(&dev->power.work_done);
+
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_suspend)
+		error = dev->bus->pm->runtime_suspend(dev);
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	switch (error) {
+	case 0:
+		/*
+		 * Resume request might have been queued up in the meantime, in
+		 * which case the RPM_WAKE bit is also set in runtime_status.
+		 */
+		dev->power.runtime_status &= ~RPM_SUSPENDING;
+		dev->power.runtime_status |= RPM_SUSPENDED;
+		break;
+	case -EAGAIN:
+	case -EBUSY:
+		dev->power.runtime_status = RPM_ACTIVE;
+		break;
+	default:
+		dev->power.runtime_status = RPM_ERROR;
+	}
+	dev->power.runtime_error = error;
+	complete_all(&dev->power.work_done);
+
+	if (!error && !(dev->power.runtime_status & RPM_WAKE) && dev->parent)
+		parent = dev->parent;
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	if (parent) {
+		__pm_put_child(parent);
+
+		if (!parent->power.ignore_children)
+			pm_runtime_idle(parent);
+	}
+
+	return error;
+}
+EXPORT_SYMBOL_GPL(__pm_runtime_suspend);
+
+/**
+ * pm_runtime_suspend_work - Run pm_runtime_suspend() for a device.
+ * @work: Work structure used for scheduling the execution of this function.
+ *
+ * Use @work to get the device object the suspend has been scheduled for and
+ * run pm_runtime_suspend() for it.
+ */
+static void pm_runtime_suspend_work(struct work_struct *work)
+{
+	__pm_runtime_suspend(suspend_work_to_device(work), false);
+}
+
+/**
+ * pm_request_suspend - Schedule run-time suspend of given device.
+ * @dev: Device to suspend.
+ * @msec: Time to wait before attempting to suspend the device, in milliseconds.
+ */
+void pm_request_suspend(struct device *dev, unsigned int msec)
+{
+	unsigned long flags;
+	unsigned long delay = msecs_to_jiffies(msec);
+
+	if (atomic_read(&dev->power.resume_count) > 0)
+		return;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	/* There may be an idle notification in progress, so be careful. */
+	if (!(dev->power.runtime_status & ~RPM_NOTIFY)
+	    || dev->power.runtime_disabled)
+		goto out;
+
+	dev->power.runtime_status |= RPM_IDLE;
+	dev->power.runtime_break = false;
+	queue_delayed_work(pm_wq, &dev->power.suspend_work, delay);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+EXPORT_SYMBOL_GPL(pm_request_suspend);
+
+/**
+ * __pm_runtime_resume - Run a device bus type's runtime_resume() callback.
+ * @dev: Device to resume.
+ * @sync: If unset, the funtion has been called via pm_wq.
+ *
+ * Check if the device is really suspended and run the ->runtime_resume()
+ * callback provided by the device's bus type driver.  Update the run-time PM
+ * flags in the device object to reflect the current status of the device.  If
+ * runtime suspend is in progress while this function is being run, wait for it
+ * to finish before resuming the device.  If runtime suspend is scheduled, but
+ * it hasn't started yet, cancel it and we're done.
+ */
+int __pm_runtime_resume(struct device *dev, bool sync)
+{
+	struct device *parent = dev->parent;
+	unsigned long flags;
+	bool put_parent = false;
+	int error = -EINVAL;
+
+	might_sleep();
+
+ repeat:
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+ repeat_locked:
+	if (dev->power.runtime_status == RPM_ERROR) {
+		goto out;
+	} else if (dev->power.runtime_disabled) {
+		error = -EAGAIN;
+		goto out;
+	} else if (dev->power.runtime_status == RPM_ACTIVE) {
+		error = 0;
+		goto out;
+	} else if (dev->power.runtime_status & RPM_NOTIFY) {
+		/*
+		 * Device has an idle notification pending, preempt it.
+		 * There also may be a suspend request pending, but the idle
+		 * notification function will run earlier, so make it cancel
+		 * that request for us.
+		 */
+		if (dev->power.runtime_status & RPM_IDLE)
+			dev->power.runtime_status |= RPM_WAKE;
+		dev->power.runtime_break = true;
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		flush_work(&dev->power.work);
+		goto repeat;
+	} else if (dev->power.runtime_status == RPM_IDLE
+	    && !dev->power.runtime_break) {
+		/* Suspend request is pending, not yet aborted, so cancel it. */
+		dev->power.runtime_break = true;
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		cancel_delayed_work_sync(&dev->power.suspend_work);
+
+		spin_lock_irqsave(&dev->power.lock, flags);
+
+		/* Repeat if anyone else has already changed the status. */
+		if (dev->power.runtime_status != RPM_IDLE
+		    || !dev->power.runtime_break)
+			goto repeat_locked;
+
+		/* The RPM_IDLE bit is still set, so clear it and return. */
+		dev->power.runtime_status = RPM_ACTIVE;
+		error = 0;
+		goto out;
+	} else if (sync && (dev->power.runtime_status & RPM_WAKE)) {
+		/* Resume request is pending, so let it run. */
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		flush_work(&dev->power.work);
+		goto repeat;
+	} else if (dev->power.runtime_status & RPM_SUSPENDING) {
+		/*
+		 * Suspend is running in parallel with us.  Wait for it to
+		 * complete and repeat.
+		 */
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		wait_for_completion(&dev->power.work_done);
+		goto repeat;
+	} else if (!put_parent && parent
+	    && dev->power.runtime_status == RPM_SUSPENDED) {
+		/*
+		 * Increase the parent's resume counter and request that it be
+		 * woken up if necessary.
+		 */
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		__pm_runtime_get(parent);
+		error = pm_runtime_resume(parent);
+		if (error) {
+			__pm_runtime_put(parent);
+			return error;
+		}
+
+		put_parent = true;
+		error = -EINVAL;
+		goto repeat;
+	} else if (dev->power.runtime_status == RPM_RESUMING) {
+		/*
+		 * There's another resume running in parallel with us. Wait for
+		 * it to complete and return.
+		 */
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		wait_for_completion(&dev->power.work_done);
+		error = dev->power.runtime_error;
+		goto out_parent;
+	}
+
+	if (dev->power.runtime_status == RPM_SUSPENDED && parent)
+		__pm_get_child(parent);
+
+	dev->power.runtime_status = RPM_RESUMING;
+	init_completion(&dev->power.work_done);
+
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	/*
+	 * We can decrement the parent's resume counter right now, because it
+	 * can't be suspended anyway after the __pm_get_child() above.
+	 */
+	if (put_parent) {
+		__pm_runtime_put(parent);
+		put_parent = false;
+	}
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_resume)
+		error = dev->bus->pm->runtime_resume(dev);
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	dev->power.runtime_status = error ? RPM_ERROR : RPM_ACTIVE;
+	dev->power.runtime_error = error;
+	complete_all(&dev->power.work_done);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+ out_parent:
+	if (put_parent)
+		__pm_runtime_put(parent);
+
+	if (!sync && !error)
+		pm_runtime_idle(dev);
+
+	return error;
+}
+EXPORT_SYMBOL_GPL(pm_runtime_resume);
+
+/**
+ * pm_runtime_work - Run __pm_runtime_resume() for a device.
+ * @work: Work structure used for scheduling the execution of this function.
+ *
+ * Use @work to get the device object the resume has been scheduled for and run
+ * __pm_runtime_resume() for it.
+ */
+static void pm_runtime_work(struct work_struct *work)
+{
+	__pm_runtime_resume(work_to_device(work), false);
+}
+
+/**
+ * pm_notify_or_cancel_work - Run pm_runtime_idle() or cancel a suspend request.
+ * @work: Work structure used for scheduling the execution of this function.
+ *
+ * Use @work to find a device and either execute pm_runtime_idle() for that
+ * device, or cancel a pending suspend request for it depending on the device's
+ * run-time PM status.
+ */
+static void pm_notify_or_cancel_work(struct work_struct *work)
+{
+	struct device *dev = work_to_device(work);
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	/*
+	 * There are three situations in which this function is run.  First, if
+	 * there's a request to notify the device's bus type that the device is
+	 * idle.  Second, if there's a request to cancel a pending suspend
+	 * request.  Finally, if the previous two happen at the same time.
+	 * However, we only need to run pm_runtime_idle() in the first
+	 * situation, because in the last one the request to suspend being
+	 * cancelled must have happened after the request to run idle
+	 * notification, which means that runtime_break is set.  In addition to
+	 * that, runtime_break will be set if synchronous suspend or resume has
+	 * run before us.
+	 */
+	dev->power.runtime_status &= ~RPM_NOTIFY;
+	if (!dev->power.runtime_break)
+		goto notify;
+
+	if (dev->power.runtime_status == (RPM_IDLE|RPM_WAKE)) {
+		/* We have a suspend request to cancel. */
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		cancel_delayed_work_sync(&dev->power.suspend_work);
+
+		spin_lock_irqsave(&dev->power.lock, flags);
+
+		/* Clear the status if someone else hasn't done it yet. */
+		if (dev->power.runtime_status != (RPM_IDLE|RPM_WAKE)
+		    || !dev->power.runtime_break)
+			goto out;
+	}
+
+	dev->power.runtime_status = RPM_ACTIVE;
+	dev->power.runtime_break = false;
+	goto out;
+
+ notify:
+	dev->power.runtime_busy = true;
+
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	pm_runtime_idle(dev);
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	dev->power.runtime_busy = false;
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+
+/**
+ * pm_request_resume - Schedule run-time resume of given device.
+ * @dev: Device to resume.
+ */
+int pm_request_resume(struct device *dev)
+{
+	struct device *parent = dev->parent;
+	unsigned long flags;
+	int error = 0;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.runtime_status == RPM_ERROR) {
+		error = -EINVAL;
+	} else if (dev->power.runtime_disabled) {
+		error = -EAGAIN;
+	} else if (dev->power.runtime_status == RPM_ACTIVE) {
+		error = -EBUSY;
+	} else if (dev->power.runtime_status & RPM_NOTIFY) {
+		/*
+		 * Device has an idle notification pending, so make it fail.
+		 * It may also have a suspend request pending, but the idle
+		 * notification work function will run before it and can cancel
+		 * it for us just fine.
+		 */
+		dev->power.runtime_status |= RPM_WAKE;
+		dev->power.runtime_break = true;
+		error = -EBUSY;
+	} else if (dev->power.runtime_status & (RPM_WAKE|RPM_RESUMING)) {
+		error = -EINPROGRESS;
+	}
+	if (error)
+		goto out;
+
+	if (dev->power.runtime_status == RPM_IDLE) {
+		error = -EBUSY;
+
+		/* Check if the suspend is being cancelled already. */
+		if (dev->power.runtime_break)
+			goto out;
+
+		/* Suspend request is pending.  Queue a request to cancel it. */
+		dev->power.runtime_break = true;
+		INIT_WORK(&dev->power.work, pm_notify_or_cancel_work);
+		goto queue;
+	}
+
+	if (dev->power.runtime_status == RPM_SUSPENDED && parent)
+		__pm_get_child(parent);
+
+	INIT_WORK(&dev->power.work, pm_runtime_work);
+
+ queue:
+	/*
+	 * The device may be suspending at the moment or there may be a resume
+	 * request pending for it and we can't clear the RPM_SUSPENDING and
+	 * RPM_IDLE bits in its runtime_status just yet.
+	 */
+	dev->power.runtime_status |= RPM_WAKE;
+	queue_work(pm_wq, &dev->power.work);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	return error;
+}
+EXPORT_SYMBOL_GPL(pm_request_resume);
+
+/**
+ * pm_runtime_put - Decrement the resume counter and run idle notification.
+ * @dev: Device to handle.
+ *
+ * Decrement the device's resume counter, check if it is possible to suspend the
+ * device and notify its bus type in that case.
+ */
+void pm_runtime_put(struct device *dev)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (!__pm_runtime_put(dev)) {
+		dev_WARN(dev, "Unbalanced counter decrementation");
+		goto out;
+ 	}
+
+	if (!pm_suspend_possible(dev))
+		goto out;
+
+	/* Do not queue up a notification if one is already in progress. */
+	if ((dev->power.runtime_status & RPM_NOTIFY) || dev->power.runtime_busy)
+		goto out;
+
+	/*
+	 * The notification is asynchronous so that this function can be called
+	 * from interrupt context.
+	 */
+	dev->power.runtime_status = RPM_NOTIFY;
+	dev->power.runtime_break = false;
+	INIT_WORK(&dev->power.work, pm_notify_or_cancel_work);
+	queue_work(pm_wq, &dev->power.work);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_put);
+
+/**
+ * __pm_runtime_clear_status - Change the run-time PM status of a device.
+ * @dev: Device to handle.
+ * @status: New value of the device's run-time PM status.
+ *
+ * Change the run-time PM status of the device to @status, which must be
+ * either RPM_ACTIVE or RPM_SUSPENDED, if its current value is equal to
+ * RPM_ERROR.
+ */
+void __pm_runtime_clear_status(struct device *dev, unsigned int status)
+{
+	struct device *parent = dev->parent;
+	unsigned long flags;
+
+	if (status & ~RPM_SUSPENDED)
+		return;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.runtime_status != RPM_ERROR)
+		goto out;
+
+	dev->power.runtime_status = status;
+	if (status == RPM_SUSPENDED && parent)
+		__pm_put_child(parent);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+EXPORT_SYMBOL_GPL(__pm_runtime_clear_status);
+
+/**
+ * pm_runtime_enable - Enable run-time PM of a device.
+ * @dev: Device to handle.
+ */
+void pm_runtime_enable(struct device *dev)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (!dev->power.runtime_disabled)
+		goto out;
+
+	if (!__pm_runtime_put(dev))
+		dev_WARN(dev, "Unbalanced counter decrementation");
+
+	if (!atomic_read(&dev->power.resume_count))
+		dev->power.runtime_disabled = false;
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_enable);
+
+/**
+ * pm_runtime_disable - Disable run-time PM of a device.
+ * @dev: Device to handle.
+ *
+ * Set the power.runtime_disabled flag for the device, cancel all pending
+ * run-time PM requests for it and wait for operations in progress to complete.
+ * The device can be either active or suspended after its run-time PM has been
+ * disabled.
+ */
+void pm_runtime_disable(struct device *dev)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	__pm_runtime_get(dev);
+
+	if (dev->power.runtime_disabled)
+		goto out;
+
+	dev->power.runtime_disabled = true;
+
+	if ((dev->power.runtime_status & (RPM_WAKE|RPM_NOTIFY))
+	    || dev->power.runtime_busy) {
+		/* Resume request or idle notification pending. */
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		cancel_work_sync(&dev->power.work);
+
+		spin_lock_irqsave(&dev->power.lock, flags);
+
+		dev->power.runtime_status &= ~(RPM_WAKE|RPM_NOTIFY);
+		dev->power.runtime_busy = false;
+	}
+
+	if (dev->power.runtime_status & RPM_IDLE) {
+		/* Suspend request pending. */
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		cancel_delayed_work_sync(&dev->power.suspend_work);
+
+		spin_lock_irqsave(&dev->power.lock, flags);
+
+		dev->power.runtime_status &= ~RPM_IDLE;
+	} else if (dev->power.runtime_status & (RPM_SUSPENDING|RPM_RESUMING)) {
+		/* Suspend or wake-up in progress. */
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		wait_for_completion(&dev->power.work_done);
+		return;
+	}
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_disable);
+
+/**
+ * pm_runtime_init - Initialize run-time PM fields in given device object.
+ * @dev: Device object to initialize.
+ */
+void pm_runtime_init(struct device *dev)
+{
+	spin_lock_init(&dev->power.lock);
+
+	dev->power.runtime_status = RPM_ACTIVE;
+	dev->power.runtime_disabled = true;
+	atomic_set(&dev->power.resume_count, 1);
+
+	atomic_set(&dev->power.child_count, 0);
+	pm_suspend_ignore_children(dev, false);
+}
+
+/**
+ * pm_runtime_add - Update run-time PM fields of a device while adding it.
+ * @dev: Device object being added to device hierarchy.
+ */
+void pm_runtime_add(struct device *dev)
+{
+	dev->power.runtime_busy = false;
+	INIT_DELAYED_WORK(&dev->power.suspend_work, pm_runtime_suspend_work);
+
+	if (dev->parent)
+		__pm_get_child(dev->parent);
+}
Index: linux-2.6/include/linux/pm_runtime.h
===================================================================
--- /dev/null
+++ linux-2.6/include/linux/pm_runtime.h
@@ -0,0 +1,136 @@
+/*
+ * pm_runtime.h - Device run-time power management helper functions.
+ *
+ * Copyright (C) 2009 Rafael J. Wysocki <rjw@sisk.pl>
+ *
+ * This file is released under the GPLv2.
+ */
+
+#ifndef _LINUX_PM_RUNTIME_H
+#define _LINUX_PM_RUNTIME_H
+
+#include <linux/device.h>
+#include <linux/pm.h>
+
+#ifdef CONFIG_PM_RUNTIME
+
+extern struct workqueue_struct *pm_wq;
+
+extern void pm_runtime_init(struct device *dev);
+extern void pm_runtime_add(struct device *dev);
+extern void pm_runtime_put(struct device *dev);
+extern void pm_runtime_idle(struct device *dev);
+extern int __pm_runtime_suspend(struct device *dev, bool sync);
+extern void pm_request_suspend(struct device *dev, unsigned int msec);
+extern int __pm_runtime_resume(struct device *dev, bool sync);
+extern int pm_request_resume(struct device *dev);
+extern void __pm_runtime_clear_status(struct device *dev, unsigned int status);
+extern void pm_runtime_enable(struct device *dev);
+extern void pm_runtime_disable(struct device *dev);
+
+static inline struct device *suspend_work_to_device(struct work_struct *work)
+{
+	struct delayed_work *dw = to_delayed_work(work);
+	struct dev_pm_info *dpi;
+
+	dpi = container_of(dw, struct dev_pm_info, suspend_work);
+	return container_of(dpi, struct device, power);
+}
+
+static inline struct device *work_to_device(struct work_struct *work)
+{
+	struct dev_pm_info *dpi;
+
+	dpi = container_of(work, struct dev_pm_info, work);
+	return container_of(dpi, struct device, power);
+}
+
+static inline void __pm_runtime_get(struct device *dev)
+{
+	atomic_inc(&dev->power.resume_count);
+}
+
+static inline bool __pm_runtime_put(struct device *dev)
+{
+	return !!atomic_add_unless(&dev->power.resume_count, -1, 0);
+}
+
+static inline bool pm_children_suspended(struct device *dev)
+{
+	return dev->power.ignore_children
+		|| !atomic_read(&dev->power.child_count);
+}
+
+static inline bool pm_suspend_possible(struct device *dev)
+{
+	return pm_children_suspended(dev)
+		&& !atomic_read(&dev->power.resume_count)
+		&& !(dev->power.runtime_status & ~RPM_NOTIFY)
+		&& !dev->power.runtime_disabled;
+}
+
+static inline void pm_suspend_ignore_children(struct device *dev, bool enable)
+{
+	dev->power.ignore_children = enable;
+}
+
+#else /* !CONFIG_PM_RUNTIME */
+
+static inline void pm_runtime_init(struct device *dev) {}
+static inline void pm_runtime_add(struct device *dev) {}
+static inline void pm_runtime_put(struct device *dev) {}
+static inline void pm_runtime_idle(struct device *dev) {}
+static inline int __pm_runtime_suspend(struct device *dev, bool sync)
+{
+	return -ENOSYS;
+}
+static inline void pm_request_suspend(struct device *dev, unsigned int msec) {}
+static inline int __pm_runtime_resume(struct device *dev, bool sync)
+{
+	return -ENOSYS;
+}
+static inline int pm_request_resume(struct device *dev) { return -ENOSYS; }
+static inline void __pm_runtime_clear_status(struct device *dev,
+					      unsigned int status) {}
+static inline void pm_runtime_enable(struct device *dev) {}
+static inline void pm_runtime_disable(struct device *dev) {}
+
+static inline void __pm_runtime_get(struct device *dev) {}
+static inline bool __pm_runtime_put(struct device *dev) { return true; }
+static inline bool pm_children_suspended(struct device *dev) { return false; }
+static inline bool pm_suspend_possible(struct device *dev) { return false; }
+static inline void pm_suspend_ignore_children(struct device *dev, bool en) {}
+
+#endif /* !CONFIG_PM_RUNTIME */
+
+static inline void pm_runtime_get(struct device *dev)
+{
+	__pm_runtime_get(dev);
+}
+
+static inline int pm_runtime_suspend(struct device *dev)
+{
+	return __pm_runtime_suspend(dev, true);
+}
+
+static inline int pm_runtime_resume(struct device *dev)
+{
+	return __pm_runtime_resume(dev, true);
+}
+
+static inline void pm_runtime_clear_active(struct device *dev)
+{
+	__pm_runtime_clear_status(dev, RPM_ACTIVE);
+}
+
+static inline void pm_runtime_clear_suspended(struct device *dev)
+{
+	__pm_runtime_clear_status(dev, RPM_SUSPENDED);
+}
+
+static inline void pm_runtime_remove(struct device *dev)
+{
+	pm_runtime_disable(dev);
+}
+
+#endif
Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -21,6 +21,7 @@
 #include <linux/kallsyms.h>
 #include <linux/mutex.h>
 #include <linux/pm.h>
+#include <linux/pm_runtime.h>
 #include <linux/resume-trace.h>
 #include <linux/rwsem.h>
 #include <linux/interrupt.h>
@@ -49,6 +50,16 @@ static DEFINE_MUTEX(dpm_list_mtx);
 static bool transition_started;
 
 /**
+ * device_pm_init - Initialize the PM-related part of a device object
+ * @dev: Device object to initialize.
+ */
+void device_pm_init(struct device *dev)
+{
+	dev->power.status = DPM_ON;
+	pm_runtime_init(dev);
+}
+
+/**
  *	device_pm_lock - lock the list of active devices used by the PM core
  */
 void device_pm_lock(void)
@@ -88,6 +99,7 @@ void device_pm_add(struct device *dev)
 	}
 
 	list_add_tail(&dev->power.entry, &dpm_list);
+	pm_runtime_add(dev);
 	mutex_unlock(&dpm_list_mtx);
 }
 
@@ -104,6 +116,7 @@ void device_pm_remove(struct device *dev
 		 kobject_name(&dev->kobj));
 	mutex_lock(&dpm_list_mtx);
 	list_del_init(&dev->power.entry);
+	pm_runtime_remove(dev);
 	mutex_unlock(&dpm_list_mtx);
 }
 
@@ -507,6 +520,7 @@ static void dpm_complete(pm_message_t st
 		get_device(dev);
 		if (dev->power.status > DPM_ON) {
 			dev->power.status = DPM_ON;
+			pm_runtime_enable(dev);
 			mutex_unlock(&dpm_list_mtx);
 
 			device_complete(dev, state);
@@ -753,6 +767,7 @@ static int dpm_prepare(pm_message_t stat
 
 		get_device(dev);
 		dev->power.status = DPM_PREPARING;
+		pm_runtime_disable(dev);
 		mutex_unlock(&dpm_list_mtx);
 
 		error = device_prepare(dev, state);
@@ -760,6 +775,7 @@ static int dpm_prepare(pm_message_t stat
 		mutex_lock(&dpm_list_mtx);
 		if (error) {
 			dev->power.status = DPM_ON;
+			pm_runtime_enable(dev);
 			if (error == -EAGAIN) {
 				put_device(dev);
 				continue;
Index: linux-2.6/drivers/base/dd.c
===================================================================
--- linux-2.6.orig/drivers/base/dd.c
+++ linux-2.6/drivers/base/dd.c
@@ -23,6 +23,7 @@
 #include <linux/kthread.h>
 #include <linux/wait.h>
 #include <linux/async.h>
+#include <linux/pm_runtime.h>
 
 #include "base.h"
 #include "power/power.h"
@@ -202,8 +203,12 @@ int driver_probe_device(struct device_dr
 	pr_debug("bus: '%s': %s: matched device %s with driver %s\n",
 		 drv->bus->name, __func__, dev_name(dev), drv->name);
 
+	pm_runtime_disable(dev);
+
 	ret = really_probe(dev, drv);
 
+	pm_runtime_enable(dev);
+
 	return ret;
 }
 
@@ -306,6 +311,8 @@ static void __device_release_driver(stru
 
 	drv = dev->driver;
 	if (drv) {
+		pm_runtime_disable(dev);
+
 		driver_sysfs_remove(dev);
 
 		if (dev->bus)
@@ -320,6 +327,8 @@ static void __device_release_driver(stru
 		devres_release_all(dev);
 		dev->driver = NULL;
 		klist_remove(&dev->p->knode_driver);
+
+		pm_runtime_enable(dev);
 	}
 }
 
Index: linux-2.6/drivers/base/power/power.h
===================================================================
--- linux-2.6.orig/drivers/base/power/power.h
+++ linux-2.6/drivers/base/power/power.h
@@ -1,8 +1,3 @@
-static inline void device_pm_init(struct device *dev)
-{
-	dev->power.status = DPM_ON;
-}
-
 #ifdef CONFIG_PM_SLEEP
 
 /*
@@ -16,14 +11,16 @@ static inline struct device *to_device(s
 	return container_of(entry, struct device, power.entry);
 }
 
+extern void device_pm_init(struct device *dev);
 extern void device_pm_add(struct device *);
 extern void device_pm_remove(struct device *);
 extern void device_pm_move_before(struct device *, struct device *);
 extern void device_pm_move_after(struct device *, struct device *);
 extern void device_pm_move_last(struct device *);
 
-#else /* CONFIG_PM_SLEEP */
+#else /* !CONFIG_PM_SLEEP */
 
+static inline void device_pm_init(struct device *dev) {}
 static inline void device_pm_add(struct device *dev) {}
 static inline void device_pm_remove(struct device *dev) {}
 static inline void device_pm_move_before(struct device *deva,
@@ -32,7 +29,7 @@ static inline void device_pm_move_after(
 					struct device *devb) {}
 static inline void device_pm_move_last(struct device *dev) {}
 
-#endif
+#endif /* !CONFIG_PM_SLEEP */
 
 #ifdef CONFIG_PM
 

^ permalink raw reply	[flat|nested] 102+ messages in thread

* [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 5)
  2009-06-24  0:36     ` [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 4) Rafael J. Wysocki
@ 2009-06-24 19:24       ` Rafael J. Wysocki
  2009-06-24 21:30         ` Alan Stern
                           ` (3 more replies)
  2009-06-24 19:24       ` Rafael J. Wysocki
  1 sibling, 4 replies; 102+ messages in thread
From: Rafael J. Wysocki @ 2009-06-24 19:24 UTC (permalink / raw)
  To: Alan Stern
  Cc: linux-pm, Oliver Neukum, Magnus Damm, ACPI Devel Maling List,
	Ingo Molnar, LKML, Greg KH, Arjan van de Ven

On Wednesday 24 June 2009, Rafael J. Wysocki wrote:
> On Wednesday 24 June 2009, Rafael J. Wysocki wrote:
> > On Tuesday 23 June 2009, Alan Stern wrote:
> > > On Tue, 23 Jun 2009, Rafael J. Wysocki wrote:
> > > 
> > > > Hi,
> > > > 
> > > > Below is a new revision of the patch introducing the run-time PM framework.
> > > > 
> > > > The most visible changes from the last version:
> > > > 
> > > > * I realized that if child_count is atomic, we can drop the parent locking from
> > > >   all of the functions, so I did that.
> > > > 
> > > > * Introduced pm_runtime_put() that decrements the resume counter and queues
> > > >   up an idle notification if the counter went down to 0 (and wasn't 0 previously).
> > > >   Using asynchronous notification makes it possible to call pm_runtime_put()
> > > >   from interrupt context, if necessary.
> > > > 
> > > > * Changed the meaning of the RPM_WAKE bit slightly (it is now also used for
> > > >   disabling run-time PM for a device along with the resume counter).
> > > > 
> > > > Please let me know if I've overlooked anything. :-)
> > > 
> > > This first thing to strike me was that you moved the idle notifications 
> > > into the workqueue.
> > 
> > Yes, I did.
> >  
> > > Is that really needed?  Would we be better off just make the idle
> > > callbacks directly from pm_runtime_put?  They would run in whatever
> > > context the driver happened to be in at the time.
> > > 
> > > It's not clear exactly how much work the idle callbacks will need to 
> > > do, but it seems likely that they won't have to do too much more than 
> > > call pm_request_suspend.  And of course, that can be done in_interrupt.
> > 
> > I just don't want to put any constraints on the implementation of
> > ->runtime_idle().  The requirement that it be suitable for calling from
> > interrupt context may be quite inconvenient for some drivers and I'm afraid
> > they may have problems with meeting it.
> 
> BTW, appended is a new update.  Hopefully, the majority of bugs were found
> and fixed this time.
> 
> I dropped the documentation for now, until the code settles down.
> 
> Also, I removed the automatic incrementing and decrementing of resume_count
> in __pm_runtime_resume() and pm_request_resume().
> 
> Description of RPM_NOTIFY is missing (sorry for that).  It's set when idle
> notification has been scheduled for the device and reset before running
> pm_runtime_idle() by the work function.

One more update:

* __pm_runtime_suspend() now calls pm_runtime_idle() on return if
  error is -EBUSY or -EAGAIN (i.e. the status was set to RPM_ACTIVE).

* pm_request_suspend() returns error codes (I thought it might be useful to
  know if the suspend had been scheduled without looking into
  runtime_status:-)).

* The status check in pm_request_suspend() is fixed (was inversed), thanks
  to Alan.

* __pm_runtime_resume() increments resume_counter at the beginning and
  decrements it on return (the rationale is explained in a comment).

* __pm_runtime_resume() now calls pm_runtime_idle() even if sync is set.

* The code in pm_notify_or_cancel_work() has been rearranged.

Comments welcome.

Best,
Rafael

---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: PM: Introduce core framework for run-time PM of I/O devices (rev. 5)

Introduce a core framework for run-time power management of I/O
devices.  Add device run-time PM fields to 'struct dev_pm_info'
and device run-time PM callbacks to 'struct dev_pm_ops'.  Introduce
a run-time PM workqueue and define some device run-time PM helper
functions at the core level.  Document all these things.

Not-yet-signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 drivers/base/dd.c            |    9 
 drivers/base/power/Makefile  |    1 
 drivers/base/power/main.c    |   16 
 drivers/base/power/power.h   |   11 
 drivers/base/power/runtime.c |  729 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/pm.h           |   98 +++++
 include/linux/pm_runtime.h   |  142 ++++++++
 kernel/power/Kconfig         |   14 
 kernel/power/main.c          |   17 +
 9 files changed, 1027 insertions(+), 10 deletions(-)

Index: linux-2.6/kernel/power/Kconfig
===================================================================
--- linux-2.6.orig/kernel/power/Kconfig
+++ linux-2.6/kernel/power/Kconfig
@@ -208,3 +208,17 @@ config APM_EMULATION
 	  random kernel OOPSes or reboots that don't seem to be related to
 	  anything, try disabling/enabling this option (or disabling/enabling
 	  APM in your BIOS).
+
+config PM_RUNTIME
+	bool "Run-time PM core functionality"
+	depends on PM
+	---help---
+	  Enable functionality allowing I/O devices to be put into energy-saving
+	  (low power) states at run time (or autosuspended) after a specified
+	  period of inactivity and woken up in response to a hardware-generated
+	  wake-up event or a driver's request.
+
+	  Hardware support is generally required for this functionality to work
+	  and the bus type drivers of the buses the devices are on are
+	  responsibile for the actual handling of the autosuspend requests and
+	  wake-up events.
Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -11,6 +11,7 @@
 #include <linux/kobject.h>
 #include <linux/string.h>
 #include <linux/resume-trace.h>
+#include <linux/workqueue.h>
 
 #include "power.h"
 
@@ -217,8 +218,24 @@ static struct attribute_group attr_group
 	.attrs = g,
 };
 
+#ifdef CONFIG_PM_RUNTIME
+struct workqueue_struct *pm_wq;
+
+static int __init pm_start_workqueue(void)
+{
+	pm_wq = create_freezeable_workqueue("pm");
+
+	return pm_wq ? 0 : -ENOMEM;
+}
+#else
+static inline int pm_start_workqueue(void) { return 0; }
+#endif
+
 static int __init pm_init(void)
 {
+	int error = pm_start_workqueue();
+	if (error)
+		return error;
 	power_kobj = kobject_create_and_add("power", NULL);
 	if (!power_kobj)
 		return -ENOMEM;
Index: linux-2.6/include/linux/pm.h
===================================================================
--- linux-2.6.orig/include/linux/pm.h
+++ linux-2.6/include/linux/pm.h
@@ -22,6 +22,9 @@
 #define _LINUX_PM_H
 
 #include <linux/list.h>
+#include <linux/workqueue.h>
+#include <linux/spinlock.h>
+#include <linux/completion.h>
 
 /*
  * Callbacks for platform drivers to implement.
@@ -165,6 +168,28 @@ typedef struct pm_message {
  * It is allowed to unregister devices while the above callbacks are being
  * executed.  However, it is not allowed to unregister a device from within any
  * of its own callbacks.
+ *
+ * There also are the following callbacks related to run-time power management
+ * of devices:
+ *
+ * @runtime_suspend: Prepare the device for a condition in which it won't be
+ *	able to communicate with the CPU(s) and RAM due to power management.
+ *	This need not mean that the device should be put into a low power state.
+ *	For example, if the device is behind a link which is about to be turned
+ *	off, the device may remain at full power.  Still, if the device does go
+ *	to low power and if device_may_wakeup(dev) is true, remote wake-up
+ *	(i.e. hardware mechanism allowing the device to request a change of its
+ *	power state, such as PCI PME) should be enabled for it.
+ *
+ * @runtime_resume: Put the device into the fully active state in response to a
+ *	wake-up event generated by hardware or at a request of software.  If
+ *	necessary, put the device into the full power state and restore its
+ *	registers, so that it is fully operational.
+ *
+ * @runtime_idle: Device appears to be inactive and it might be put into a low
+ *	power state if all of the necessary conditions are satisfied.  Check
+ *	these conditions and handle the device as appropriate, possibly queueing
+ *	a suspend request for it.
  */
 
 struct dev_pm_ops {
@@ -182,6 +207,9 @@ struct dev_pm_ops {
 	int (*thaw_noirq)(struct device *dev);
 	int (*poweroff_noirq)(struct device *dev);
 	int (*restore_noirq)(struct device *dev);
+	int (*runtime_suspend)(struct device *dev);
+	int (*runtime_resume)(struct device *dev);
+	void (*runtime_idle)(struct device *dev);
 };
 
 /**
@@ -315,14 +343,78 @@ enum dpm_state {
 	DPM_OFF_IRQ,
 };
 
+/**
+ * Device run-time power management state.
+ *
+ * These state labels are used internally by the PM core to indicate the current
+ * status of a device with respect to the PM core operations.  They do not
+ * reflect the actual power state of the device or its status as seen by the
+ * driver.
+ *
+ * RPM_ACTIVE		Device is fully operational, no run-time PM requests are
+ *			pending for it.
+ *
+ * RPM_IDLE		It has been requested that the device be suspended.
+ *			Suspend request has been put into the run-time PM
+ *			workqueue and it's pending execution.
+ *
+ * RPM_SUSPENDING	Device bus type's ->runtime_suspend() callback is being
+ *			executed.
+ *
+ * RPM_SUSPENDED	Device bus type's ->runtime_suspend() callback has
+ *			completed successfully.  The device is regarded as
+ *			suspended.
+ *
+ * RPM_WAKE		It has been requested that the device be woken up.
+ *			Resume request has been put into the run-time PM
+ *			workqueue and it's pending execution.
+ *
+ * RPM_RESUMING		Device bus type's ->runtime_resume() callback is being
+ *			executed.
+ *
+ * RPM_ERROR		Represents a condition from which the PM core cannot
+ *			recover by itself.  If the device's run-time PM status
+ *			field has this value, all of the run-time PM operations
+ *			carried out for the device by the core will fail, until
+ *			the status field is changed to either RPM_ACTIVE or
+ *			RPM_SUSPENDED (it is not valid to use the other values
+ *			in such a situation) by the device's driver or bus type.
+ *			This happens when the device bus type's
+ *			->runtime_suspend() or ->runtime_resume() callback
+ *			returns error code different from -EAGAIN or -EBUSY.
+ */
+
+#define RPM_ACTIVE	0
+#define RPM_IDLE	0x01
+#define RPM_SUSPENDING	0x02
+#define RPM_SUSPENDED	0x04
+#define RPM_WAKE	0x08
+#define RPM_RESUMING	0x10
+#define RPM_NOTIFY	0x20
+#define RPM_ERROR	0x3F
+
 struct dev_pm_info {
 	pm_message_t		power_state;
-	unsigned		can_wakeup:1;
-	unsigned		should_wakeup:1;
+	unsigned int		can_wakeup:1;
+	unsigned int		should_wakeup:1;
 	enum dpm_state		status;		/* Owned by the PM core */
-#ifdef	CONFIG_PM_SLEEP
+#ifdef CONFIG_PM_SLEEP
 	struct list_head	entry;
 #endif
+#ifdef CONFIG_PM_RUNTIME
+	struct delayed_work	suspend_work;
+	struct work_struct	work;
+	struct completion	work_done;
+	unsigned int		ignore_children:1;
+	unsigned int		runtime_break:1;
+	unsigned int		runtime_notify:1;
+	unsigned int		runtime_disabled:1;
+	unsigned int		runtime_status:6;
+	int			runtime_error;
+	atomic_t		resume_count;
+	atomic_t		child_count;
+	spinlock_t		lock;
+#endif
 };
 
 /*
Index: linux-2.6/drivers/base/power/Makefile
===================================================================
--- linux-2.6.orig/drivers/base/power/Makefile
+++ linux-2.6/drivers/base/power/Makefile
@@ -1,5 +1,6 @@
 obj-$(CONFIG_PM)	+= sysfs.o
 obj-$(CONFIG_PM_SLEEP)	+= main.o
+obj-$(CONFIG_PM_RUNTIME)	+= runtime.o
 obj-$(CONFIG_PM_TRACE_RTC)	+= trace.o
 
 ccflags-$(CONFIG_DEBUG_DRIVER) := -DDEBUG
Index: linux-2.6/drivers/base/power/runtime.c
===================================================================
--- /dev/null
+++ linux-2.6/drivers/base/power/runtime.c
@@ -0,0 +1,729 @@
+/*
+ * drivers/base/power/runtime.c - Helper functions for device run-time PM
+ *
+ * Copyright (c) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
+ *
+ * This file is released under the GPLv2.
+ */
+
+#include <linux/pm_runtime.h>
+#include <linux/jiffies.h>
+
+/**
+ * pm_runtime_idle - Check if device can be suspended and notify its bus type.
+ * @dev: Device to notify.
+ */
+void pm_runtime_idle(struct device *dev)
+{
+	if (!pm_suspend_possible(dev))
+		return;
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_idle)
+		dev->bus->pm->runtime_idle(dev);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_idle);
+
+/**
+ * __pm_get_child - Increment the counter of unsuspended children of a device.
+ * @dev: Device to handle;
+ */
+static void __pm_get_child(struct device *dev)
+{
+	atomic_inc(&dev->power.child_count);
+}
+
+/**
+ * __pm_put_child - Decrement the counter of unsuspended children of a device.
+ * @dev: Device to handle;
+ */
+static void __pm_put_child(struct device *dev)
+{
+	if (!atomic_add_unless(&dev->power.child_count, -1, 0))
+		dev_WARN(dev, "Unbalanced counter decrementation");
+}
+
+/**
+ * __pm_runtime_suspend - Run a device bus type's runtime_suspend() callback.
+ * @dev: Device to suspend.
+ * @sync: If unset, the funtion has been called via pm_wq.
+ *
+ * Check if the run-time PM status of the device is appropriate and run the
+ * ->runtime_suspend() callback provided by the device's bus type.  Update the
+ * run-time PM flags in the device object to reflect the current status of the
+ * device.
+ */
+int __pm_runtime_suspend(struct device *dev, bool sync)
+{
+	struct device *parent = NULL;
+	unsigned long flags;
+	int error = -EINVAL;
+
+	might_sleep();
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+ repeat:
+	if (dev->power.runtime_status == RPM_ERROR) {
+		goto out;
+	} else if (dev->power.runtime_status & RPM_SUSPENDED) {
+		error = 0;
+		goto out;
+	} else if (atomic_read(&dev->power.resume_count) > 0
+	    || dev->power.runtime_disabled
+	    || (dev->power.runtime_status & (RPM_WAKE|RPM_RESUMING))
+	    || (!sync && dev->power.runtime_status == RPM_IDLE
+	    && dev->power.runtime_break)) {
+		/*
+		 * We're forbidden to suspend the device, it is resuming or has
+		 * a resume request pending, or a pending suspend request has
+		 * just been cancelled and we're running as a result of that
+		 * request.
+		 */
+		error = -EAGAIN;
+		goto out;
+	} else if (dev->power.runtime_status & RPM_SUSPENDING) {
+		/* Another suspend is running in parallel with us. */
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		wait_for_completion(&dev->power.work_done);
+
+		return dev->power.runtime_error;
+	} else if (dev->power.runtime_status & RPM_NOTIFY) {
+		/*
+		 * Idle notification is pending for the device, so preempt it.
+		 * There also may be a suspend request pending, but the idle
+		 * notification work function will run earlier, so make it
+		 * cancel that request for us.
+		 */
+		if (dev->power.runtime_status & RPM_IDLE)
+			dev->power.runtime_status |= RPM_WAKE;
+		dev->power.runtime_break = true;
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		flush_work(&dev->power.work);
+
+		goto repeat;
+	} else if (sync && dev->power.runtime_status == RPM_IDLE
+	    && !dev->power.runtime_break) {
+		/*
+		 * Suspend request is pending, but we're not running as a result
+		 * of that request, so cancel it.  Since we're not clearing the
+		 * RPM_IDLE bit now, no new suspend requests will be queued up
+		 * while the cancelled pending one is waited for.
+		 */
+		dev->power.runtime_break = true;
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		cancel_delayed_work_sync(&dev->power.suspend_work);
+
+		spin_lock_irqsave(&dev->power.lock, flags);
+
+		/* Repeat if anyone else has already cleared the status. */
+		if (dev->power.runtime_status != RPM_IDLE
+		    || !dev->power.runtime_break)
+			goto repeat;
+
+		dev->power.runtime_break = false;
+	}
+
+	if (!pm_children_suspended(dev)) {
+		/*
+		 * We can only suspend the device if all of its children have
+		 * been suspended.
+		 */
+		dev->power.runtime_status = RPM_ACTIVE;
+		error = -EBUSY;
+		goto out;
+	}
+
+	dev->power.runtime_status = RPM_SUSPENDING;
+	init_completion(&dev->power.work_done);
+
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_suspend)
+		error = dev->bus->pm->runtime_suspend(dev);
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	switch (error) {
+	case 0:
+		/*
+		 * Resume request might have been queued up in the meantime, in
+		 * which case the RPM_WAKE bit is also set in runtime_status.
+		 */
+		dev->power.runtime_status &= ~RPM_SUSPENDING;
+		dev->power.runtime_status |= RPM_SUSPENDED;
+		break;
+	case -EAGAIN:
+	case -EBUSY:
+		dev->power.runtime_status = RPM_ACTIVE;
+		break;
+	default:
+		dev->power.runtime_status = RPM_ERROR;
+	}
+	dev->power.runtime_error = error;
+	complete_all(&dev->power.work_done);
+
+	if (!error && !(dev->power.runtime_status & RPM_WAKE) && dev->parent)
+		parent = dev->parent;
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	if (parent) {
+		__pm_put_child(parent);
+
+		if (!parent->power.ignore_children)
+			pm_runtime_idle(parent);
+	}
+
+	if (error == -EBUSY || error == -EAGAIN)
+		pm_runtime_idle(dev);
+
+	return error;
+}
+EXPORT_SYMBOL_GPL(__pm_runtime_suspend);
+
+/**
+ * pm_runtime_suspend_work - Run pm_runtime_suspend() for a device.
+ * @work: Work structure used for scheduling the execution of this function.
+ *
+ * Use @work to get the device object the suspend has been scheduled for and
+ * run pm_runtime_suspend() for it.
+ */
+static void pm_runtime_suspend_work(struct work_struct *work)
+{
+	__pm_runtime_suspend(suspend_work_to_device(work), false);
+}
+
+/**
+ * pm_request_suspend - Schedule run-time suspend of given device.
+ * @dev: Device to suspend.
+ * @msec: Time to wait before attempting to suspend the device, in milliseconds.
+ */
+int pm_request_suspend(struct device *dev, unsigned int msec)
+{
+	unsigned long flags;
+	unsigned long delay;
+	int error = 0;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	/* There may be an idle notification in progress, so be careful. */
+	if (dev->power.runtime_status == RPM_ERROR)
+		error = -EINVAL;
+	else if (atomic_read(&dev->power.resume_count) > 0
+	    || (dev->power.runtime_status & ~RPM_NOTIFY)
+	    || dev->power.runtime_disabled)
+		error = -EAGAIN;
+	else if (!pm_children_suspended(dev))
+		error = -EBUSY;
+	if (error)
+		goto out;
+
+	dev->power.runtime_status |= RPM_IDLE;
+	dev->power.runtime_break = false;
+	delay = msecs_to_jiffies(msec);
+	queue_delayed_work(pm_wq, &dev->power.suspend_work, delay);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	return error;
+}
+EXPORT_SYMBOL_GPL(pm_request_suspend);
+
+/**
+ * __pm_runtime_resume - Run a device bus type's runtime_resume() callback.
+ * @dev: Device to resume.
+ * @sync: If unset, the funtion has been called via pm_wq.
+ *
+ * Check if the device is really suspended and run the ->runtime_resume()
+ * callback provided by the device's bus type driver.  Update the run-time PM
+ * flags in the device object to reflect the current status of the device.  If
+ * runtime suspend is in progress while this function is being run, wait for it
+ * to finish before resuming the device.  If runtime suspend is scheduled, but
+ * it hasn't started yet, cancel it and we're done.
+ */
+int __pm_runtime_resume(struct device *dev, bool sync)
+{
+	struct device *parent = dev->parent;
+	unsigned long flags;
+	bool put_parent = false;
+	int error = -EINVAL;
+
+	might_sleep();
+
+	/*
+	 * If we didn't increment the resume counter here and we needed to start
+	 * over releasing the lock, suspend could happen before we had a chance
+	 * to acquire the lock again.  In that case the device would be
+	 * suspended and then immediately woken up by us which would be a loss
+	 * of time.
+	 */
+	__pm_runtime_get(dev);
+
+ repeat:
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+ repeat_locked:
+	if (dev->power.runtime_status == RPM_ERROR) {
+		goto out;
+	} else if (dev->power.runtime_disabled) {
+		error = -EAGAIN;
+		goto out;
+	} else if (dev->power.runtime_status == RPM_ACTIVE) {
+		error = 0;
+		goto out;
+	} else if (dev->power.runtime_status & RPM_NOTIFY) {
+		/*
+		 * Device has an idle notification pending, so make it fail.
+		 * There also may be a suspend request pending, but the idle
+		 * notification function will run earlier, so make it cancel
+		 * that request for us.
+		 */
+		if (dev->power.runtime_status & RPM_IDLE)
+			dev->power.runtime_status |= RPM_WAKE;
+		dev->power.runtime_break = true;
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		flush_work(&dev->power.work);
+		goto repeat;
+	} else if (dev->power.runtime_status == RPM_IDLE
+	    && !dev->power.runtime_break) {
+		/* Suspend request is pending, not yet aborted, so cancel it. */
+		dev->power.runtime_break = true;
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		cancel_delayed_work_sync(&dev->power.suspend_work);
+
+		spin_lock_irqsave(&dev->power.lock, flags);
+
+		/* Repeat if anyone else has already changed the status. */
+		if (dev->power.runtime_status != RPM_IDLE
+		    || !dev->power.runtime_break)
+			goto repeat_locked;
+
+		/* The RPM_IDLE bit is still set, so clear it and return. */
+		dev->power.runtime_status = RPM_ACTIVE;
+		error = 0;
+		goto out;
+	} else if (sync && (dev->power.runtime_status & RPM_WAKE)) {
+		/* Resume request is pending, so let it run. */
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		flush_work(&dev->power.work);
+		goto repeat;
+	} else if (dev->power.runtime_status & RPM_SUSPENDING) {
+		/*
+		 * Suspend is running in parallel with us.  Wait for it to
+		 * complete and repeat.
+		 */
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		wait_for_completion(&dev->power.work_done);
+		goto repeat;
+	} else if (!put_parent && parent
+	    && dev->power.runtime_status == RPM_SUSPENDED) {
+		/*
+		 * Increase the parent's resume counter and request that it be
+		 * woken up if necessary.
+		 */
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		__pm_runtime_get(parent);
+		put_parent = true;
+		error = pm_runtime_resume(parent);
+		if (error)
+			goto out_parent;
+
+		error = -EINVAL;
+		goto repeat;
+	} else if (dev->power.runtime_status == RPM_RESUMING) {
+		/*
+		 * There's another resume running in parallel with us. Wait for
+		 * it to complete and return.
+		 */
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		wait_for_completion(&dev->power.work_done);
+		error = dev->power.runtime_error;
+		goto out_parent;
+	}
+
+	if (dev->power.runtime_status == RPM_SUSPENDED && parent)
+		__pm_get_child(parent);
+
+	dev->power.runtime_status = RPM_RESUMING;
+	init_completion(&dev->power.work_done);
+
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	/*
+	 * We can decrement the parent's resume counter right now, because it
+	 * can't be suspended anyway after the __pm_get_child() above.
+	 */
+	if (put_parent) {
+		__pm_runtime_put(parent);
+		put_parent = false;
+	}
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_resume)
+		error = dev->bus->pm->runtime_resume(dev);
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	dev->power.runtime_status = error ? RPM_ERROR : RPM_ACTIVE;
+	dev->power.runtime_error = error;
+	complete_all(&dev->power.work_done);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+ out_parent:
+	if (put_parent)
+		__pm_runtime_put(parent);
+
+	__pm_runtime_put(dev);
+
+	if (!error)
+		pm_runtime_idle(dev);
+
+	return error;
+}
+EXPORT_SYMBOL_GPL(pm_runtime_resume);
+
+/**
+ * pm_runtime_work - Run __pm_runtime_resume() for a device.
+ * @work: Work structure used for scheduling the execution of this function.
+ *
+ * Use @work to get the device object the resume has been scheduled for and run
+ * __pm_runtime_resume() for it.
+ */
+static void pm_runtime_work(struct work_struct *work)
+{
+	__pm_runtime_resume(work_to_device(work), false);
+}
+
+/**
+ * pm_notify_or_cancel_work - Run pm_runtime_idle() or cancel a suspend request.
+ * @work: Work structure used for scheduling the execution of this function.
+ *
+ * Use @work to find a device and either execute pm_runtime_idle() for that
+ * device, or cancel a pending suspend request for it depending on the device's
+ * run-time PM status.
+ */
+static void pm_notify_or_cancel_work(struct work_struct *work)
+{
+	struct device *dev = work_to_device(work);
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	/*
+	 * There are three situations in which this function is run.  First, if
+	 * there's a request to notify the device's bus type that the device is
+	 * idle.  Second, if there's a request to cancel a pending suspend
+	 * request.  Finally, if the previous two happen at the same time.
+	 * However, we only need to run pm_runtime_idle() in the first
+	 * situation, because in the last one the request to suspend being
+	 * cancelled must have happened after the request to run idle
+	 * notification, which means that runtime_break is set.  In addition to
+	 * that, runtime_break will be set if synchronous suspend or resume has
+	 * run before us.
+	 */
+	dev->power.runtime_status &= ~RPM_NOTIFY;
+
+	if (!dev->power.runtime_break) {
+		/* Idle notification should be carried out. */
+		dev->power.runtime_notify = true;
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		pm_runtime_idle(dev);
+
+		spin_lock_irqsave(&dev->power.lock, flags);
+
+		dev->power.runtime_notify = false;
+		goto out;
+	}
+
+	if (dev->power.runtime_status == (RPM_IDLE|RPM_WAKE)) {
+		/* We have a suspend request to cancel. */
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		cancel_delayed_work_sync(&dev->power.suspend_work);
+
+		spin_lock_irqsave(&dev->power.lock, flags);
+
+		/* Clear the status if someone else hasn't done it yet. */
+		if (dev->power.runtime_status != (RPM_IDLE|RPM_WAKE)
+		    || !dev->power.runtime_break)
+			goto out;
+	}
+
+	dev->power.runtime_status = RPM_ACTIVE;
+	dev->power.runtime_break = false;
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+
+/**
+ * pm_request_resume - Schedule run-time resume of given device.
+ * @dev: Device to resume.
+ */
+int pm_request_resume(struct device *dev)
+{
+	struct device *parent = dev->parent;
+	unsigned long flags;
+	int error = 0;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.runtime_status == RPM_ERROR) {
+		error = -EINVAL;
+	} else if (dev->power.runtime_disabled) {
+		error = -EAGAIN;
+	} else if (dev->power.runtime_status == RPM_ACTIVE) {
+		error = -EBUSY;
+	} else if (dev->power.runtime_status & RPM_NOTIFY) {
+		/*
+		 * Device has an idle notification pending, so make it fail.
+		 * It may also have a suspend request pending, but the idle
+		 * notification work function will run before it and can cancel
+		 * it for us just fine.
+		 */
+		dev->power.runtime_status |= RPM_WAKE;
+		dev->power.runtime_break = true;
+		error = -EBUSY;
+	} else if (dev->power.runtime_status & (RPM_WAKE|RPM_RESUMING)) {
+		error = -EINPROGRESS;
+	}
+	if (error)
+		goto out;
+
+	if (dev->power.runtime_status == RPM_IDLE) {
+		error = -EBUSY;
+
+		/* Check if the suspend is being cancelled already. */
+		if (dev->power.runtime_break)
+			goto out;
+
+		/* Suspend request is pending.  Queue a request to cancel it. */
+		dev->power.runtime_break = true;
+		INIT_WORK(&dev->power.work, pm_notify_or_cancel_work);
+		goto queue;
+	}
+
+	if (dev->power.runtime_status == RPM_SUSPENDED && parent)
+		__pm_get_child(parent);
+
+	INIT_WORK(&dev->power.work, pm_runtime_work);
+
+ queue:
+	/*
+	 * The device may be suspending at the moment or there may be a resume
+	 * request pending for it and we can't clear the RPM_SUSPENDING and
+	 * RPM_IDLE bits in its runtime_status just yet.
+	 */
+	dev->power.runtime_status |= RPM_WAKE;
+	queue_work(pm_wq, &dev->power.work);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	return error;
+}
+EXPORT_SYMBOL_GPL(pm_request_resume);
+
+/**
+ * pm_runtime_put - Decrement the resume counter and run idle notification.
+ * @dev: Device to handle.
+ *
+ * Decrement the device's resume counter, check if it is possible to suspend the
+ * device and notify its bus type in that case.
+ */
+void pm_runtime_put(struct device *dev)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (!__pm_runtime_put(dev)) {
+		dev_WARN(dev, "Unbalanced counter decrementation");
+		goto out;
+ 	}
+
+	if (!pm_suspend_possible(dev))
+		goto out;
+
+	/* Do not queue up a notification if one is already in progress. */
+	if ((dev->power.runtime_status & RPM_NOTIFY) || dev->power.runtime_notify)
+		goto out;
+
+	/*
+	 * The notification is asynchronous so that this function can be called
+	 * from interrupt context.
+	 */
+	dev->power.runtime_status = RPM_NOTIFY;
+	dev->power.runtime_break = false;
+	INIT_WORK(&dev->power.work, pm_notify_or_cancel_work);
+	queue_work(pm_wq, &dev->power.work);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_put);
+
+/**
+ * __pm_runtime_clear_status - Change the run-time PM status of a device.
+ * @dev: Device to handle.
+ * @status: New value of the device's run-time PM status.
+ *
+ * Change the run-time PM status of the device to @status, which must be
+ * either RPM_ACTIVE or RPM_SUSPENDED, if its current value is equal to
+ * RPM_ERROR.
+ */
+void __pm_runtime_clear_status(struct device *dev, unsigned int status)
+{
+	struct device *parent = dev->parent;
+	unsigned long flags;
+
+	if (status & ~RPM_SUSPENDED)
+		return;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.runtime_status != RPM_ERROR)
+		goto out;
+
+	dev->power.runtime_status = status;
+	if (status == RPM_SUSPENDED && parent)
+		__pm_put_child(parent);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+EXPORT_SYMBOL_GPL(__pm_runtime_clear_status);
+
+/**
+ * pm_runtime_enable - Enable run-time PM of a device.
+ * @dev: Device to handle.
+ */
+void pm_runtime_enable(struct device *dev)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (!dev->power.runtime_disabled)
+		goto out;
+
+	if (!__pm_runtime_put(dev))
+		dev_WARN(dev, "Unbalanced counter decrementation");
+
+	if (!atomic_read(&dev->power.resume_count))
+		dev->power.runtime_disabled = false;
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_enable);
+
+/**
+ * pm_runtime_disable - Disable run-time PM of a device.
+ * @dev: Device to handle.
+ *
+ * Set the power.runtime_disabled flag for the device, cancel all pending
+ * run-time PM requests for it and wait for operations in progress to complete.
+ * The device can be either active or suspended after its run-time PM has been
+ * disabled.
+ */
+void pm_runtime_disable(struct device *dev)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	__pm_runtime_get(dev);
+
+	if (dev->power.runtime_disabled)
+		goto out;
+
+	dev->power.runtime_disabled = true;
+
+	if ((dev->power.runtime_status & (RPM_WAKE|RPM_NOTIFY))
+	    || dev->power.runtime_notify) {
+		/* Resume request or idle notification pending. */
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		cancel_work_sync(&dev->power.work);
+
+		spin_lock_irqsave(&dev->power.lock, flags);
+
+		dev->power.runtime_status &= ~(RPM_WAKE|RPM_NOTIFY);
+		dev->power.runtime_notify = false;
+	}
+
+	if (dev->power.runtime_status & RPM_IDLE) {
+		/* Suspend request pending. */
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		cancel_delayed_work_sync(&dev->power.suspend_work);
+
+		spin_lock_irqsave(&dev->power.lock, flags);
+
+		dev->power.runtime_status &= ~RPM_IDLE;
+	} else if (dev->power.runtime_status & (RPM_SUSPENDING|RPM_RESUMING)) {
+		/* Suspend or wake-up in progress. */
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		wait_for_completion(&dev->power.work_done);
+		return;
+	}
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_disable);
+
+/**
+ * pm_runtime_init - Initialize run-time PM fields in given device object.
+ * @dev: Device object to initialize.
+ */
+void pm_runtime_init(struct device *dev)
+{
+	spin_lock_init(&dev->power.lock);
+
+	dev->power.runtime_status = RPM_ACTIVE;
+	dev->power.runtime_disabled = true;
+	atomic_set(&dev->power.resume_count, 1);
+
+	atomic_set(&dev->power.child_count, 0);
+	pm_suspend_ignore_children(dev, false);
+}
+
+/**
+ * pm_runtime_add - Update run-time PM fields of a device while adding it.
+ * @dev: Device object being added to device hierarchy.
+ */
+void pm_runtime_add(struct device *dev)
+{
+	dev->power.runtime_notify = false;
+	INIT_DELAYED_WORK(&dev->power.suspend_work, pm_runtime_suspend_work);
+
+	if (dev->parent)
+		__pm_get_child(dev->parent);
+}
Index: linux-2.6/include/linux/pm_runtime.h
===================================================================
--- /dev/null
+++ linux-2.6/include/linux/pm_runtime.h
@@ -0,0 +1,142 @@
+/*
+ * pm_runtime.h - Device run-time power management helper functions.
+ *
+ * Copyright (C) 2009 Rafael J. Wysocki <rjw@sisk.pl>
+ *
+ * This file is released under the GPLv2.
+ */
+
+#ifndef _LINUX_PM_RUNTIME_H
+#define _LINUX_PM_RUNTIME_H
+
+#include <linux/device.h>
+#include <linux/pm.h>
+
+#ifdef CONFIG_PM_RUNTIME
+
+extern struct workqueue_struct *pm_wq;
+
+extern void pm_runtime_init(struct device *dev);
+extern void pm_runtime_add(struct device *dev);
+extern void pm_runtime_put(struct device *dev);
+extern void pm_runtime_idle(struct device *dev);
+extern int __pm_runtime_suspend(struct device *dev, bool sync);
+extern int pm_request_suspend(struct device *dev, unsigned int msec);
+extern int __pm_runtime_resume(struct device *dev, bool sync);
+extern int pm_request_resume(struct device *dev);
+extern void __pm_runtime_clear_status(struct device *dev, unsigned int status);
+extern void pm_runtime_enable(struct device *dev);
+extern void pm_runtime_disable(struct device *dev);
+
+static inline struct device *suspend_work_to_device(struct work_struct *work)
+{
+	struct delayed_work *dw = to_delayed_work(work);
+	struct dev_pm_info *dpi;
+
+	dpi = container_of(dw, struct dev_pm_info, suspend_work);
+	return container_of(dpi, struct device, power);
+}
+
+static inline struct device *work_to_device(struct work_struct *work)
+{
+	struct dev_pm_info *dpi;
+
+	dpi = container_of(work, struct dev_pm_info, work);
+	return container_of(dpi, struct device, power);
+}
+
+static inline void __pm_runtime_get(struct device *dev)
+{
+	atomic_inc(&dev->power.resume_count);
+}
+
+static inline bool __pm_runtime_put(struct device *dev)
+{
+	return !!atomic_add_unless(&dev->power.resume_count, -1, 0);
+}
+
+static inline bool pm_children_suspended(struct device *dev)
+{
+	return dev->power.ignore_children
+		|| !atomic_read(&dev->power.child_count);
+}
+
+static inline bool pm_suspend_possible(struct device *dev)
+{
+	return pm_children_suspended(dev)
+		&& !atomic_read(&dev->power.resume_count)
+		&& !(dev->power.runtime_status & ~RPM_NOTIFY)
+		&& !dev->power.runtime_disabled;
+}
+
+static inline void pm_suspend_ignore_children(struct device *dev, bool enable)
+{
+	dev->power.ignore_children = enable;
+}
+
+#else /* !CONFIG_PM_RUNTIME */
+
+static inline void pm_runtime_init(struct device *dev) {}
+static inline void pm_runtime_add(struct device *dev) {}
+static inline void pm_runtime_put(struct device *dev) {}
+static inline void pm_runtime_idle(struct device *dev) {}
+static inline int __pm_runtime_suspend(struct device *dev, bool sync)
+{
+	return -ENOSYS;
+}
+static inline int pm_request_suspend(struct device *dev, unsigned int msec)
+{
+	return -ENOSYS;
+}
+static inline int __pm_runtime_resume(struct device *dev, bool sync)
+{
+	return -ENOSYS;
+}
+static inline int pm_request_resume(struct device *dev)
+{
+	return -ENOSYS;
+}
+static inline void __pm_runtime_clear_status(struct device *dev,
+					      unsigned int status) {}
+static inline void pm_runtime_enable(struct device *dev) {}
+static inline void pm_runtime_disable(struct device *dev) {}
+
+static inline void __pm_runtime_get(struct device *dev) {}
+static inline bool __pm_runtime_put(struct device *dev) { return true; }
+static inline bool pm_children_suspended(struct device *dev) { return false; }
+static inline bool pm_suspend_possible(struct device *dev) { return false; }
+static inline void pm_suspend_ignore_children(struct device *dev, bool en) {}
+
+#endif /* !CONFIG_PM_RUNTIME */
+
+static inline void pm_runtime_get(struct device *dev)
+{
+	__pm_runtime_get(dev);
+}
+
+static inline int pm_runtime_suspend(struct device *dev)
+{
+	return __pm_runtime_suspend(dev, true);
+}
+
+static inline int pm_runtime_resume(struct device *dev)
+{
+	return __pm_runtime_resume(dev, true);
+}
+
+static inline void pm_runtime_clear_active(struct device *dev)
+{
+	__pm_runtime_clear_status(dev, RPM_ACTIVE);
+}
+
+static inline void pm_runtime_clear_suspended(struct device *dev)
+{
+	__pm_runtime_clear_status(dev, RPM_SUSPENDED);
+}
+
+static inline void pm_runtime_remove(struct device *dev)
+{
+	pm_runtime_disable(dev);
+}
+
+#endif
Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -21,6 +21,7 @@
 #include <linux/kallsyms.h>
 #include <linux/mutex.h>
 #include <linux/pm.h>
+#include <linux/pm_runtime.h>
 #include <linux/resume-trace.h>
 #include <linux/rwsem.h>
 #include <linux/interrupt.h>
@@ -49,6 +50,16 @@ static DEFINE_MUTEX(dpm_list_mtx);
 static bool transition_started;
 
 /**
+ * device_pm_init - Initialize the PM-related part of a device object
+ * @dev: Device object to initialize.
+ */
+void device_pm_init(struct device *dev)
+{
+	dev->power.status = DPM_ON;
+	pm_runtime_init(dev);
+}
+
+/**
  *	device_pm_lock - lock the list of active devices used by the PM core
  */
 void device_pm_lock(void)
@@ -88,6 +99,7 @@ void device_pm_add(struct device *dev)
 	}
 
 	list_add_tail(&dev->power.entry, &dpm_list);
+	pm_runtime_add(dev);
 	mutex_unlock(&dpm_list_mtx);
 }
 
@@ -104,6 +116,7 @@ void device_pm_remove(struct device *dev
 		 kobject_name(&dev->kobj));
 	mutex_lock(&dpm_list_mtx);
 	list_del_init(&dev->power.entry);
+	pm_runtime_remove(dev);
 	mutex_unlock(&dpm_list_mtx);
 }
 
@@ -507,6 +520,7 @@ static void dpm_complete(pm_message_t st
 		get_device(dev);
 		if (dev->power.status > DPM_ON) {
 			dev->power.status = DPM_ON;
+			pm_runtime_enable(dev);
 			mutex_unlock(&dpm_list_mtx);
 
 			device_complete(dev, state);
@@ -753,6 +767,7 @@ static int dpm_prepare(pm_message_t stat
 
 		get_device(dev);
 		dev->power.status = DPM_PREPARING;
+		pm_runtime_disable(dev);
 		mutex_unlock(&dpm_list_mtx);
 
 		error = device_prepare(dev, state);
@@ -760,6 +775,7 @@ static int dpm_prepare(pm_message_t stat
 		mutex_lock(&dpm_list_mtx);
 		if (error) {
 			dev->power.status = DPM_ON;
+			pm_runtime_enable(dev);
 			if (error == -EAGAIN) {
 				put_device(dev);
 				continue;
Index: linux-2.6/drivers/base/dd.c
===================================================================
--- linux-2.6.orig/drivers/base/dd.c
+++ linux-2.6/drivers/base/dd.c
@@ -23,6 +23,7 @@
 #include <linux/kthread.h>
 #include <linux/wait.h>
 #include <linux/async.h>
+#include <linux/pm_runtime.h>
 
 #include "base.h"
 #include "power/power.h"
@@ -202,8 +203,12 @@ int driver_probe_device(struct device_dr
 	pr_debug("bus: '%s': %s: matched device %s with driver %s\n",
 		 drv->bus->name, __func__, dev_name(dev), drv->name);
 
+	pm_runtime_disable(dev);
+
 	ret = really_probe(dev, drv);
 
+	pm_runtime_enable(dev);
+
 	return ret;
 }
 
@@ -306,6 +311,8 @@ static void __device_release_driver(stru
 
 	drv = dev->driver;
 	if (drv) {
+		pm_runtime_disable(dev);
+
 		driver_sysfs_remove(dev);
 
 		if (dev->bus)
@@ -320,6 +327,8 @@ static void __device_release_driver(stru
 		devres_release_all(dev);
 		dev->driver = NULL;
 		klist_remove(&dev->p->knode_driver);
+
+		pm_runtime_enable(dev);
 	}
 }
 
Index: linux-2.6/drivers/base/power/power.h
===================================================================
--- linux-2.6.orig/drivers/base/power/power.h
+++ linux-2.6/drivers/base/power/power.h
@@ -1,8 +1,3 @@
-static inline void device_pm_init(struct device *dev)
-{
-	dev->power.status = DPM_ON;
-}
-
 #ifdef CONFIG_PM_SLEEP
 
 /*
@@ -16,14 +11,16 @@ static inline struct device *to_device(s
 	return container_of(entry, struct device, power.entry);
 }
 
+extern void device_pm_init(struct device *dev);
 extern void device_pm_add(struct device *);
 extern void device_pm_remove(struct device *);
 extern void device_pm_move_before(struct device *, struct device *);
 extern void device_pm_move_after(struct device *, struct device *);
 extern void device_pm_move_last(struct device *);
 
-#else /* CONFIG_PM_SLEEP */
+#else /* !CONFIG_PM_SLEEP */
 
+static inline void device_pm_init(struct device *dev) {}
 static inline void device_pm_add(struct device *dev) {}
 static inline void device_pm_remove(struct device *dev) {}
 static inline void device_pm_move_before(struct device *deva,
@@ -32,7 +29,7 @@ static inline void device_pm_move_after(
 					struct device *devb) {}
 static inline void device_pm_move_last(struct device *dev) {}
 
-#endif
+#endif /* !CONFIG_PM_SLEEP */
 
 #ifdef CONFIG_PM
 

^ permalink raw reply	[flat|nested] 102+ messages in thread

* [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 5)
  2009-06-24  0:36     ` [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 4) Rafael J. Wysocki
  2009-06-24 19:24       ` [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 5) Rafael J. Wysocki
@ 2009-06-24 19:24       ` Rafael J. Wysocki
  1 sibling, 0 replies; 102+ messages in thread
From: Rafael J. Wysocki @ 2009-06-24 19:24 UTC (permalink / raw)
  To: Alan Stern
  Cc: Greg KH, LKML, ACPI Devel Maling List, linux-pm, Ingo Molnar,
	Arjan van de Ven

On Wednesday 24 June 2009, Rafael J. Wysocki wrote:
> On Wednesday 24 June 2009, Rafael J. Wysocki wrote:
> > On Tuesday 23 June 2009, Alan Stern wrote:
> > > On Tue, 23 Jun 2009, Rafael J. Wysocki wrote:
> > > 
> > > > Hi,
> > > > 
> > > > Below is a new revision of the patch introducing the run-time PM framework.
> > > > 
> > > > The most visible changes from the last version:
> > > > 
> > > > * I realized that if child_count is atomic, we can drop the parent locking from
> > > >   all of the functions, so I did that.
> > > > 
> > > > * Introduced pm_runtime_put() that decrements the resume counter and queues
> > > >   up an idle notification if the counter went down to 0 (and wasn't 0 previously).
> > > >   Using asynchronous notification makes it possible to call pm_runtime_put()
> > > >   from interrupt context, if necessary.
> > > > 
> > > > * Changed the meaning of the RPM_WAKE bit slightly (it is now also used for
> > > >   disabling run-time PM for a device along with the resume counter).
> > > > 
> > > > Please let me know if I've overlooked anything. :-)
> > > 
> > > This first thing to strike me was that you moved the idle notifications 
> > > into the workqueue.
> > 
> > Yes, I did.
> >  
> > > Is that really needed?  Would we be better off just make the idle
> > > callbacks directly from pm_runtime_put?  They would run in whatever
> > > context the driver happened to be in at the time.
> > > 
> > > It's not clear exactly how much work the idle callbacks will need to 
> > > do, but it seems likely that they won't have to do too much more than 
> > > call pm_request_suspend.  And of course, that can be done in_interrupt.
> > 
> > I just don't want to put any constraints on the implementation of
> > ->runtime_idle().  The requirement that it be suitable for calling from
> > interrupt context may be quite inconvenient for some drivers and I'm afraid
> > they may have problems with meeting it.
> 
> BTW, appended is a new update.  Hopefully, the majority of bugs were found
> and fixed this time.
> 
> I dropped the documentation for now, until the code settles down.
> 
> Also, I removed the automatic incrementing and decrementing of resume_count
> in __pm_runtime_resume() and pm_request_resume().
> 
> Description of RPM_NOTIFY is missing (sorry for that).  It's set when idle
> notification has been scheduled for the device and reset before running
> pm_runtime_idle() by the work function.

One more update:

* __pm_runtime_suspend() now calls pm_runtime_idle() on return if
  error is -EBUSY or -EAGAIN (i.e. the status was set to RPM_ACTIVE).

* pm_request_suspend() returns error codes (I thought it might be useful to
  know if the suspend had been scheduled without looking into
  runtime_status:-)).

* The status check in pm_request_suspend() is fixed (was inversed), thanks
  to Alan.

* __pm_runtime_resume() increments resume_counter at the beginning and
  decrements it on return (the rationale is explained in a comment).

* __pm_runtime_resume() now calls pm_runtime_idle() even if sync is set.

* The code in pm_notify_or_cancel_work() has been rearranged.

Comments welcome.

Best,
Rafael

---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: PM: Introduce core framework for run-time PM of I/O devices (rev. 5)

Introduce a core framework for run-time power management of I/O
devices.  Add device run-time PM fields to 'struct dev_pm_info'
and device run-time PM callbacks to 'struct dev_pm_ops'.  Introduce
a run-time PM workqueue and define some device run-time PM helper
functions at the core level.  Document all these things.

Not-yet-signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 drivers/base/dd.c            |    9 
 drivers/base/power/Makefile  |    1 
 drivers/base/power/main.c    |   16 
 drivers/base/power/power.h   |   11 
 drivers/base/power/runtime.c |  729 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/pm.h           |   98 +++++
 include/linux/pm_runtime.h   |  142 ++++++++
 kernel/power/Kconfig         |   14 
 kernel/power/main.c          |   17 +
 9 files changed, 1027 insertions(+), 10 deletions(-)

Index: linux-2.6/kernel/power/Kconfig
===================================================================
--- linux-2.6.orig/kernel/power/Kconfig
+++ linux-2.6/kernel/power/Kconfig
@@ -208,3 +208,17 @@ config APM_EMULATION
 	  random kernel OOPSes or reboots that don't seem to be related to
 	  anything, try disabling/enabling this option (or disabling/enabling
 	  APM in your BIOS).
+
+config PM_RUNTIME
+	bool "Run-time PM core functionality"
+	depends on PM
+	---help---
+	  Enable functionality allowing I/O devices to be put into energy-saving
+	  (low power) states at run time (or autosuspended) after a specified
+	  period of inactivity and woken up in response to a hardware-generated
+	  wake-up event or a driver's request.
+
+	  Hardware support is generally required for this functionality to work
+	  and the bus type drivers of the buses the devices are on are
+	  responsibile for the actual handling of the autosuspend requests and
+	  wake-up events.
Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -11,6 +11,7 @@
 #include <linux/kobject.h>
 #include <linux/string.h>
 #include <linux/resume-trace.h>
+#include <linux/workqueue.h>
 
 #include "power.h"
 
@@ -217,8 +218,24 @@ static struct attribute_group attr_group
 	.attrs = g,
 };
 
+#ifdef CONFIG_PM_RUNTIME
+struct workqueue_struct *pm_wq;
+
+static int __init pm_start_workqueue(void)
+{
+	pm_wq = create_freezeable_workqueue("pm");
+
+	return pm_wq ? 0 : -ENOMEM;
+}
+#else
+static inline int pm_start_workqueue(void) { return 0; }
+#endif
+
 static int __init pm_init(void)
 {
+	int error = pm_start_workqueue();
+	if (error)
+		return error;
 	power_kobj = kobject_create_and_add("power", NULL);
 	if (!power_kobj)
 		return -ENOMEM;
Index: linux-2.6/include/linux/pm.h
===================================================================
--- linux-2.6.orig/include/linux/pm.h
+++ linux-2.6/include/linux/pm.h
@@ -22,6 +22,9 @@
 #define _LINUX_PM_H
 
 #include <linux/list.h>
+#include <linux/workqueue.h>
+#include <linux/spinlock.h>
+#include <linux/completion.h>
 
 /*
  * Callbacks for platform drivers to implement.
@@ -165,6 +168,28 @@ typedef struct pm_message {
  * It is allowed to unregister devices while the above callbacks are being
  * executed.  However, it is not allowed to unregister a device from within any
  * of its own callbacks.
+ *
+ * There also are the following callbacks related to run-time power management
+ * of devices:
+ *
+ * @runtime_suspend: Prepare the device for a condition in which it won't be
+ *	able to communicate with the CPU(s) and RAM due to power management.
+ *	This need not mean that the device should be put into a low power state.
+ *	For example, if the device is behind a link which is about to be turned
+ *	off, the device may remain at full power.  Still, if the device does go
+ *	to low power and if device_may_wakeup(dev) is true, remote wake-up
+ *	(i.e. hardware mechanism allowing the device to request a change of its
+ *	power state, such as PCI PME) should be enabled for it.
+ *
+ * @runtime_resume: Put the device into the fully active state in response to a
+ *	wake-up event generated by hardware or at a request of software.  If
+ *	necessary, put the device into the full power state and restore its
+ *	registers, so that it is fully operational.
+ *
+ * @runtime_idle: Device appears to be inactive and it might be put into a low
+ *	power state if all of the necessary conditions are satisfied.  Check
+ *	these conditions and handle the device as appropriate, possibly queueing
+ *	a suspend request for it.
  */
 
 struct dev_pm_ops {
@@ -182,6 +207,9 @@ struct dev_pm_ops {
 	int (*thaw_noirq)(struct device *dev);
 	int (*poweroff_noirq)(struct device *dev);
 	int (*restore_noirq)(struct device *dev);
+	int (*runtime_suspend)(struct device *dev);
+	int (*runtime_resume)(struct device *dev);
+	void (*runtime_idle)(struct device *dev);
 };
 
 /**
@@ -315,14 +343,78 @@ enum dpm_state {
 	DPM_OFF_IRQ,
 };
 
+/**
+ * Device run-time power management state.
+ *
+ * These state labels are used internally by the PM core to indicate the current
+ * status of a device with respect to the PM core operations.  They do not
+ * reflect the actual power state of the device or its status as seen by the
+ * driver.
+ *
+ * RPM_ACTIVE		Device is fully operational, no run-time PM requests are
+ *			pending for it.
+ *
+ * RPM_IDLE		It has been requested that the device be suspended.
+ *			Suspend request has been put into the run-time PM
+ *			workqueue and it's pending execution.
+ *
+ * RPM_SUSPENDING	Device bus type's ->runtime_suspend() callback is being
+ *			executed.
+ *
+ * RPM_SUSPENDED	Device bus type's ->runtime_suspend() callback has
+ *			completed successfully.  The device is regarded as
+ *			suspended.
+ *
+ * RPM_WAKE		It has been requested that the device be woken up.
+ *			Resume request has been put into the run-time PM
+ *			workqueue and it's pending execution.
+ *
+ * RPM_RESUMING		Device bus type's ->runtime_resume() callback is being
+ *			executed.
+ *
+ * RPM_ERROR		Represents a condition from which the PM core cannot
+ *			recover by itself.  If the device's run-time PM status
+ *			field has this value, all of the run-time PM operations
+ *			carried out for the device by the core will fail, until
+ *			the status field is changed to either RPM_ACTIVE or
+ *			RPM_SUSPENDED (it is not valid to use the other values
+ *			in such a situation) by the device's driver or bus type.
+ *			This happens when the device bus type's
+ *			->runtime_suspend() or ->runtime_resume() callback
+ *			returns error code different from -EAGAIN or -EBUSY.
+ */
+
+#define RPM_ACTIVE	0
+#define RPM_IDLE	0x01
+#define RPM_SUSPENDING	0x02
+#define RPM_SUSPENDED	0x04
+#define RPM_WAKE	0x08
+#define RPM_RESUMING	0x10
+#define RPM_NOTIFY	0x20
+#define RPM_ERROR	0x3F
+
 struct dev_pm_info {
 	pm_message_t		power_state;
-	unsigned		can_wakeup:1;
-	unsigned		should_wakeup:1;
+	unsigned int		can_wakeup:1;
+	unsigned int		should_wakeup:1;
 	enum dpm_state		status;		/* Owned by the PM core */
-#ifdef	CONFIG_PM_SLEEP
+#ifdef CONFIG_PM_SLEEP
 	struct list_head	entry;
 #endif
+#ifdef CONFIG_PM_RUNTIME
+	struct delayed_work	suspend_work;
+	struct work_struct	work;
+	struct completion	work_done;
+	unsigned int		ignore_children:1;
+	unsigned int		runtime_break:1;
+	unsigned int		runtime_notify:1;
+	unsigned int		runtime_disabled:1;
+	unsigned int		runtime_status:6;
+	int			runtime_error;
+	atomic_t		resume_count;
+	atomic_t		child_count;
+	spinlock_t		lock;
+#endif
 };
 
 /*
Index: linux-2.6/drivers/base/power/Makefile
===================================================================
--- linux-2.6.orig/drivers/base/power/Makefile
+++ linux-2.6/drivers/base/power/Makefile
@@ -1,5 +1,6 @@
 obj-$(CONFIG_PM)	+= sysfs.o
 obj-$(CONFIG_PM_SLEEP)	+= main.o
+obj-$(CONFIG_PM_RUNTIME)	+= runtime.o
 obj-$(CONFIG_PM_TRACE_RTC)	+= trace.o
 
 ccflags-$(CONFIG_DEBUG_DRIVER) := -DDEBUG
Index: linux-2.6/drivers/base/power/runtime.c
===================================================================
--- /dev/null
+++ linux-2.6/drivers/base/power/runtime.c
@@ -0,0 +1,729 @@
+/*
+ * drivers/base/power/runtime.c - Helper functions for device run-time PM
+ *
+ * Copyright (c) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
+ *
+ * This file is released under the GPLv2.
+ */
+
+#include <linux/pm_runtime.h>
+#include <linux/jiffies.h>
+
+/**
+ * pm_runtime_idle - Check if device can be suspended and notify its bus type.
+ * @dev: Device to notify.
+ */
+void pm_runtime_idle(struct device *dev)
+{
+	if (!pm_suspend_possible(dev))
+		return;
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_idle)
+		dev->bus->pm->runtime_idle(dev);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_idle);
+
+/**
+ * __pm_get_child - Increment the counter of unsuspended children of a device.
+ * @dev: Device to handle;
+ */
+static void __pm_get_child(struct device *dev)
+{
+	atomic_inc(&dev->power.child_count);
+}
+
+/**
+ * __pm_put_child - Decrement the counter of unsuspended children of a device.
+ * @dev: Device to handle;
+ */
+static void __pm_put_child(struct device *dev)
+{
+	if (!atomic_add_unless(&dev->power.child_count, -1, 0))
+		dev_WARN(dev, "Unbalanced counter decrementation");
+}
+
+/**
+ * __pm_runtime_suspend - Run a device bus type's runtime_suspend() callback.
+ * @dev: Device to suspend.
+ * @sync: If unset, the funtion has been called via pm_wq.
+ *
+ * Check if the run-time PM status of the device is appropriate and run the
+ * ->runtime_suspend() callback provided by the device's bus type.  Update the
+ * run-time PM flags in the device object to reflect the current status of the
+ * device.
+ */
+int __pm_runtime_suspend(struct device *dev, bool sync)
+{
+	struct device *parent = NULL;
+	unsigned long flags;
+	int error = -EINVAL;
+
+	might_sleep();
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+ repeat:
+	if (dev->power.runtime_status == RPM_ERROR) {
+		goto out;
+	} else if (dev->power.runtime_status & RPM_SUSPENDED) {
+		error = 0;
+		goto out;
+	} else if (atomic_read(&dev->power.resume_count) > 0
+	    || dev->power.runtime_disabled
+	    || (dev->power.runtime_status & (RPM_WAKE|RPM_RESUMING))
+	    || (!sync && dev->power.runtime_status == RPM_IDLE
+	    && dev->power.runtime_break)) {
+		/*
+		 * We're forbidden to suspend the device, it is resuming or has
+		 * a resume request pending, or a pending suspend request has
+		 * just been cancelled and we're running as a result of that
+		 * request.
+		 */
+		error = -EAGAIN;
+		goto out;
+	} else if (dev->power.runtime_status & RPM_SUSPENDING) {
+		/* Another suspend is running in parallel with us. */
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		wait_for_completion(&dev->power.work_done);
+
+		return dev->power.runtime_error;
+	} else if (dev->power.runtime_status & RPM_NOTIFY) {
+		/*
+		 * Idle notification is pending for the device, so preempt it.
+		 * There also may be a suspend request pending, but the idle
+		 * notification work function will run earlier, so make it
+		 * cancel that request for us.
+		 */
+		if (dev->power.runtime_status & RPM_IDLE)
+			dev->power.runtime_status |= RPM_WAKE;
+		dev->power.runtime_break = true;
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		flush_work(&dev->power.work);
+
+		goto repeat;
+	} else if (sync && dev->power.runtime_status == RPM_IDLE
+	    && !dev->power.runtime_break) {
+		/*
+		 * Suspend request is pending, but we're not running as a result
+		 * of that request, so cancel it.  Since we're not clearing the
+		 * RPM_IDLE bit now, no new suspend requests will be queued up
+		 * while the cancelled pending one is waited for.
+		 */
+		dev->power.runtime_break = true;
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		cancel_delayed_work_sync(&dev->power.suspend_work);
+
+		spin_lock_irqsave(&dev->power.lock, flags);
+
+		/* Repeat if anyone else has already cleared the status. */
+		if (dev->power.runtime_status != RPM_IDLE
+		    || !dev->power.runtime_break)
+			goto repeat;
+
+		dev->power.runtime_break = false;
+	}
+
+	if (!pm_children_suspended(dev)) {
+		/*
+		 * We can only suspend the device if all of its children have
+		 * been suspended.
+		 */
+		dev->power.runtime_status = RPM_ACTIVE;
+		error = -EBUSY;
+		goto out;
+	}
+
+	dev->power.runtime_status = RPM_SUSPENDING;
+	init_completion(&dev->power.work_done);
+
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_suspend)
+		error = dev->bus->pm->runtime_suspend(dev);
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	switch (error) {
+	case 0:
+		/*
+		 * Resume request might have been queued up in the meantime, in
+		 * which case the RPM_WAKE bit is also set in runtime_status.
+		 */
+		dev->power.runtime_status &= ~RPM_SUSPENDING;
+		dev->power.runtime_status |= RPM_SUSPENDED;
+		break;
+	case -EAGAIN:
+	case -EBUSY:
+		dev->power.runtime_status = RPM_ACTIVE;
+		break;
+	default:
+		dev->power.runtime_status = RPM_ERROR;
+	}
+	dev->power.runtime_error = error;
+	complete_all(&dev->power.work_done);
+
+	if (!error && !(dev->power.runtime_status & RPM_WAKE) && dev->parent)
+		parent = dev->parent;
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	if (parent) {
+		__pm_put_child(parent);
+
+		if (!parent->power.ignore_children)
+			pm_runtime_idle(parent);
+	}
+
+	if (error == -EBUSY || error == -EAGAIN)
+		pm_runtime_idle(dev);
+
+	return error;
+}
+EXPORT_SYMBOL_GPL(__pm_runtime_suspend);
+
+/**
+ * pm_runtime_suspend_work - Run pm_runtime_suspend() for a device.
+ * @work: Work structure used for scheduling the execution of this function.
+ *
+ * Use @work to get the device object the suspend has been scheduled for and
+ * run pm_runtime_suspend() for it.
+ */
+static void pm_runtime_suspend_work(struct work_struct *work)
+{
+	__pm_runtime_suspend(suspend_work_to_device(work), false);
+}
+
+/**
+ * pm_request_suspend - Schedule run-time suspend of given device.
+ * @dev: Device to suspend.
+ * @msec: Time to wait before attempting to suspend the device, in milliseconds.
+ */
+int pm_request_suspend(struct device *dev, unsigned int msec)
+{
+	unsigned long flags;
+	unsigned long delay;
+	int error = 0;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	/* There may be an idle notification in progress, so be careful. */
+	if (dev->power.runtime_status == RPM_ERROR)
+		error = -EINVAL;
+	else if (atomic_read(&dev->power.resume_count) > 0
+	    || (dev->power.runtime_status & ~RPM_NOTIFY)
+	    || dev->power.runtime_disabled)
+		error = -EAGAIN;
+	else if (!pm_children_suspended(dev))
+		error = -EBUSY;
+	if (error)
+		goto out;
+
+	dev->power.runtime_status |= RPM_IDLE;
+	dev->power.runtime_break = false;
+	delay = msecs_to_jiffies(msec);
+	queue_delayed_work(pm_wq, &dev->power.suspend_work, delay);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	return error;
+}
+EXPORT_SYMBOL_GPL(pm_request_suspend);
+
+/**
+ * __pm_runtime_resume - Run a device bus type's runtime_resume() callback.
+ * @dev: Device to resume.
+ * @sync: If unset, the funtion has been called via pm_wq.
+ *
+ * Check if the device is really suspended and run the ->runtime_resume()
+ * callback provided by the device's bus type driver.  Update the run-time PM
+ * flags in the device object to reflect the current status of the device.  If
+ * runtime suspend is in progress while this function is being run, wait for it
+ * to finish before resuming the device.  If runtime suspend is scheduled, but
+ * it hasn't started yet, cancel it and we're done.
+ */
+int __pm_runtime_resume(struct device *dev, bool sync)
+{
+	struct device *parent = dev->parent;
+	unsigned long flags;
+	bool put_parent = false;
+	int error = -EINVAL;
+
+	might_sleep();
+
+	/*
+	 * If we didn't increment the resume counter here and we needed to start
+	 * over releasing the lock, suspend could happen before we had a chance
+	 * to acquire the lock again.  In that case the device would be
+	 * suspended and then immediately woken up by us which would be a loss
+	 * of time.
+	 */
+	__pm_runtime_get(dev);
+
+ repeat:
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+ repeat_locked:
+	if (dev->power.runtime_status == RPM_ERROR) {
+		goto out;
+	} else if (dev->power.runtime_disabled) {
+		error = -EAGAIN;
+		goto out;
+	} else if (dev->power.runtime_status == RPM_ACTIVE) {
+		error = 0;
+		goto out;
+	} else if (dev->power.runtime_status & RPM_NOTIFY) {
+		/*
+		 * Device has an idle notification pending, so make it fail.
+		 * There also may be a suspend request pending, but the idle
+		 * notification function will run earlier, so make it cancel
+		 * that request for us.
+		 */
+		if (dev->power.runtime_status & RPM_IDLE)
+			dev->power.runtime_status |= RPM_WAKE;
+		dev->power.runtime_break = true;
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		flush_work(&dev->power.work);
+		goto repeat;
+	} else if (dev->power.runtime_status == RPM_IDLE
+	    && !dev->power.runtime_break) {
+		/* Suspend request is pending, not yet aborted, so cancel it. */
+		dev->power.runtime_break = true;
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		cancel_delayed_work_sync(&dev->power.suspend_work);
+
+		spin_lock_irqsave(&dev->power.lock, flags);
+
+		/* Repeat if anyone else has already changed the status. */
+		if (dev->power.runtime_status != RPM_IDLE
+		    || !dev->power.runtime_break)
+			goto repeat_locked;
+
+		/* The RPM_IDLE bit is still set, so clear it and return. */
+		dev->power.runtime_status = RPM_ACTIVE;
+		error = 0;
+		goto out;
+	} else if (sync && (dev->power.runtime_status & RPM_WAKE)) {
+		/* Resume request is pending, so let it run. */
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		flush_work(&dev->power.work);
+		goto repeat;
+	} else if (dev->power.runtime_status & RPM_SUSPENDING) {
+		/*
+		 * Suspend is running in parallel with us.  Wait for it to
+		 * complete and repeat.
+		 */
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		wait_for_completion(&dev->power.work_done);
+		goto repeat;
+	} else if (!put_parent && parent
+	    && dev->power.runtime_status == RPM_SUSPENDED) {
+		/*
+		 * Increase the parent's resume counter and request that it be
+		 * woken up if necessary.
+		 */
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		__pm_runtime_get(parent);
+		put_parent = true;
+		error = pm_runtime_resume(parent);
+		if (error)
+			goto out_parent;
+
+		error = -EINVAL;
+		goto repeat;
+	} else if (dev->power.runtime_status == RPM_RESUMING) {
+		/*
+		 * There's another resume running in parallel with us. Wait for
+		 * it to complete and return.
+		 */
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		wait_for_completion(&dev->power.work_done);
+		error = dev->power.runtime_error;
+		goto out_parent;
+	}
+
+	if (dev->power.runtime_status == RPM_SUSPENDED && parent)
+		__pm_get_child(parent);
+
+	dev->power.runtime_status = RPM_RESUMING;
+	init_completion(&dev->power.work_done);
+
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	/*
+	 * We can decrement the parent's resume counter right now, because it
+	 * can't be suspended anyway after the __pm_get_child() above.
+	 */
+	if (put_parent) {
+		__pm_runtime_put(parent);
+		put_parent = false;
+	}
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_resume)
+		error = dev->bus->pm->runtime_resume(dev);
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	dev->power.runtime_status = error ? RPM_ERROR : RPM_ACTIVE;
+	dev->power.runtime_error = error;
+	complete_all(&dev->power.work_done);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+ out_parent:
+	if (put_parent)
+		__pm_runtime_put(parent);
+
+	__pm_runtime_put(dev);
+
+	if (!error)
+		pm_runtime_idle(dev);
+
+	return error;
+}
+EXPORT_SYMBOL_GPL(pm_runtime_resume);
+
+/**
+ * pm_runtime_work - Run __pm_runtime_resume() for a device.
+ * @work: Work structure used for scheduling the execution of this function.
+ *
+ * Use @work to get the device object the resume has been scheduled for and run
+ * __pm_runtime_resume() for it.
+ */
+static void pm_runtime_work(struct work_struct *work)
+{
+	__pm_runtime_resume(work_to_device(work), false);
+}
+
+/**
+ * pm_notify_or_cancel_work - Run pm_runtime_idle() or cancel a suspend request.
+ * @work: Work structure used for scheduling the execution of this function.
+ *
+ * Use @work to find a device and either execute pm_runtime_idle() for that
+ * device, or cancel a pending suspend request for it depending on the device's
+ * run-time PM status.
+ */
+static void pm_notify_or_cancel_work(struct work_struct *work)
+{
+	struct device *dev = work_to_device(work);
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	/*
+	 * There are three situations in which this function is run.  First, if
+	 * there's a request to notify the device's bus type that the device is
+	 * idle.  Second, if there's a request to cancel a pending suspend
+	 * request.  Finally, if the previous two happen at the same time.
+	 * However, we only need to run pm_runtime_idle() in the first
+	 * situation, because in the last one the request to suspend being
+	 * cancelled must have happened after the request to run idle
+	 * notification, which means that runtime_break is set.  In addition to
+	 * that, runtime_break will be set if synchronous suspend or resume has
+	 * run before us.
+	 */
+	dev->power.runtime_status &= ~RPM_NOTIFY;
+
+	if (!dev->power.runtime_break) {
+		/* Idle notification should be carried out. */
+		dev->power.runtime_notify = true;
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		pm_runtime_idle(dev);
+
+		spin_lock_irqsave(&dev->power.lock, flags);
+
+		dev->power.runtime_notify = false;
+		goto out;
+	}
+
+	if (dev->power.runtime_status == (RPM_IDLE|RPM_WAKE)) {
+		/* We have a suspend request to cancel. */
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		cancel_delayed_work_sync(&dev->power.suspend_work);
+
+		spin_lock_irqsave(&dev->power.lock, flags);
+
+		/* Clear the status if someone else hasn't done it yet. */
+		if (dev->power.runtime_status != (RPM_IDLE|RPM_WAKE)
+		    || !dev->power.runtime_break)
+			goto out;
+	}
+
+	dev->power.runtime_status = RPM_ACTIVE;
+	dev->power.runtime_break = false;
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+
+/**
+ * pm_request_resume - Schedule run-time resume of given device.
+ * @dev: Device to resume.
+ */
+int pm_request_resume(struct device *dev)
+{
+	struct device *parent = dev->parent;
+	unsigned long flags;
+	int error = 0;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.runtime_status == RPM_ERROR) {
+		error = -EINVAL;
+	} else if (dev->power.runtime_disabled) {
+		error = -EAGAIN;
+	} else if (dev->power.runtime_status == RPM_ACTIVE) {
+		error = -EBUSY;
+	} else if (dev->power.runtime_status & RPM_NOTIFY) {
+		/*
+		 * Device has an idle notification pending, so make it fail.
+		 * It may also have a suspend request pending, but the idle
+		 * notification work function will run before it and can cancel
+		 * it for us just fine.
+		 */
+		dev->power.runtime_status |= RPM_WAKE;
+		dev->power.runtime_break = true;
+		error = -EBUSY;
+	} else if (dev->power.runtime_status & (RPM_WAKE|RPM_RESUMING)) {
+		error = -EINPROGRESS;
+	}
+	if (error)
+		goto out;
+
+	if (dev->power.runtime_status == RPM_IDLE) {
+		error = -EBUSY;
+
+		/* Check if the suspend is being cancelled already. */
+		if (dev->power.runtime_break)
+			goto out;
+
+		/* Suspend request is pending.  Queue a request to cancel it. */
+		dev->power.runtime_break = true;
+		INIT_WORK(&dev->power.work, pm_notify_or_cancel_work);
+		goto queue;
+	}
+
+	if (dev->power.runtime_status == RPM_SUSPENDED && parent)
+		__pm_get_child(parent);
+
+	INIT_WORK(&dev->power.work, pm_runtime_work);
+
+ queue:
+	/*
+	 * The device may be suspending at the moment or there may be a resume
+	 * request pending for it and we can't clear the RPM_SUSPENDING and
+	 * RPM_IDLE bits in its runtime_status just yet.
+	 */
+	dev->power.runtime_status |= RPM_WAKE;
+	queue_work(pm_wq, &dev->power.work);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	return error;
+}
+EXPORT_SYMBOL_GPL(pm_request_resume);
+
+/**
+ * pm_runtime_put - Decrement the resume counter and run idle notification.
+ * @dev: Device to handle.
+ *
+ * Decrement the device's resume counter, check if it is possible to suspend the
+ * device and notify its bus type in that case.
+ */
+void pm_runtime_put(struct device *dev)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (!__pm_runtime_put(dev)) {
+		dev_WARN(dev, "Unbalanced counter decrementation");
+		goto out;
+ 	}
+
+	if (!pm_suspend_possible(dev))
+		goto out;
+
+	/* Do not queue up a notification if one is already in progress. */
+	if ((dev->power.runtime_status & RPM_NOTIFY) || dev->power.runtime_notify)
+		goto out;
+
+	/*
+	 * The notification is asynchronous so that this function can be called
+	 * from interrupt context.
+	 */
+	dev->power.runtime_status = RPM_NOTIFY;
+	dev->power.runtime_break = false;
+	INIT_WORK(&dev->power.work, pm_notify_or_cancel_work);
+	queue_work(pm_wq, &dev->power.work);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_put);
+
+/**
+ * __pm_runtime_clear_status - Change the run-time PM status of a device.
+ * @dev: Device to handle.
+ * @status: New value of the device's run-time PM status.
+ *
+ * Change the run-time PM status of the device to @status, which must be
+ * either RPM_ACTIVE or RPM_SUSPENDED, if its current value is equal to
+ * RPM_ERROR.
+ */
+void __pm_runtime_clear_status(struct device *dev, unsigned int status)
+{
+	struct device *parent = dev->parent;
+	unsigned long flags;
+
+	if (status & ~RPM_SUSPENDED)
+		return;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.runtime_status != RPM_ERROR)
+		goto out;
+
+	dev->power.runtime_status = status;
+	if (status == RPM_SUSPENDED && parent)
+		__pm_put_child(parent);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+EXPORT_SYMBOL_GPL(__pm_runtime_clear_status);
+
+/**
+ * pm_runtime_enable - Enable run-time PM of a device.
+ * @dev: Device to handle.
+ */
+void pm_runtime_enable(struct device *dev)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (!dev->power.runtime_disabled)
+		goto out;
+
+	if (!__pm_runtime_put(dev))
+		dev_WARN(dev, "Unbalanced counter decrementation");
+
+	if (!atomic_read(&dev->power.resume_count))
+		dev->power.runtime_disabled = false;
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_enable);
+
+/**
+ * pm_runtime_disable - Disable run-time PM of a device.
+ * @dev: Device to handle.
+ *
+ * Set the power.runtime_disabled flag for the device, cancel all pending
+ * run-time PM requests for it and wait for operations in progress to complete.
+ * The device can be either active or suspended after its run-time PM has been
+ * disabled.
+ */
+void pm_runtime_disable(struct device *dev)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	__pm_runtime_get(dev);
+
+	if (dev->power.runtime_disabled)
+		goto out;
+
+	dev->power.runtime_disabled = true;
+
+	if ((dev->power.runtime_status & (RPM_WAKE|RPM_NOTIFY))
+	    || dev->power.runtime_notify) {
+		/* Resume request or idle notification pending. */
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		cancel_work_sync(&dev->power.work);
+
+		spin_lock_irqsave(&dev->power.lock, flags);
+
+		dev->power.runtime_status &= ~(RPM_WAKE|RPM_NOTIFY);
+		dev->power.runtime_notify = false;
+	}
+
+	if (dev->power.runtime_status & RPM_IDLE) {
+		/* Suspend request pending. */
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		cancel_delayed_work_sync(&dev->power.suspend_work);
+
+		spin_lock_irqsave(&dev->power.lock, flags);
+
+		dev->power.runtime_status &= ~RPM_IDLE;
+	} else if (dev->power.runtime_status & (RPM_SUSPENDING|RPM_RESUMING)) {
+		/* Suspend or wake-up in progress. */
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		wait_for_completion(&dev->power.work_done);
+		return;
+	}
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_disable);
+
+/**
+ * pm_runtime_init - Initialize run-time PM fields in given device object.
+ * @dev: Device object to initialize.
+ */
+void pm_runtime_init(struct device *dev)
+{
+	spin_lock_init(&dev->power.lock);
+
+	dev->power.runtime_status = RPM_ACTIVE;
+	dev->power.runtime_disabled = true;
+	atomic_set(&dev->power.resume_count, 1);
+
+	atomic_set(&dev->power.child_count, 0);
+	pm_suspend_ignore_children(dev, false);
+}
+
+/**
+ * pm_runtime_add - Update run-time PM fields of a device while adding it.
+ * @dev: Device object being added to device hierarchy.
+ */
+void pm_runtime_add(struct device *dev)
+{
+	dev->power.runtime_notify = false;
+	INIT_DELAYED_WORK(&dev->power.suspend_work, pm_runtime_suspend_work);
+
+	if (dev->parent)
+		__pm_get_child(dev->parent);
+}
Index: linux-2.6/include/linux/pm_runtime.h
===================================================================
--- /dev/null
+++ linux-2.6/include/linux/pm_runtime.h
@@ -0,0 +1,142 @@
+/*
+ * pm_runtime.h - Device run-time power management helper functions.
+ *
+ * Copyright (C) 2009 Rafael J. Wysocki <rjw@sisk.pl>
+ *
+ * This file is released under the GPLv2.
+ */
+
+#ifndef _LINUX_PM_RUNTIME_H
+#define _LINUX_PM_RUNTIME_H
+
+#include <linux/device.h>
+#include <linux/pm.h>
+
+#ifdef CONFIG_PM_RUNTIME
+
+extern struct workqueue_struct *pm_wq;
+
+extern void pm_runtime_init(struct device *dev);
+extern void pm_runtime_add(struct device *dev);
+extern void pm_runtime_put(struct device *dev);
+extern void pm_runtime_idle(struct device *dev);
+extern int __pm_runtime_suspend(struct device *dev, bool sync);
+extern int pm_request_suspend(struct device *dev, unsigned int msec);
+extern int __pm_runtime_resume(struct device *dev, bool sync);
+extern int pm_request_resume(struct device *dev);
+extern void __pm_runtime_clear_status(struct device *dev, unsigned int status);
+extern void pm_runtime_enable(struct device *dev);
+extern void pm_runtime_disable(struct device *dev);
+
+static inline struct device *suspend_work_to_device(struct work_struct *work)
+{
+	struct delayed_work *dw = to_delayed_work(work);
+	struct dev_pm_info *dpi;
+
+	dpi = container_of(dw, struct dev_pm_info, suspend_work);
+	return container_of(dpi, struct device, power);
+}
+
+static inline struct device *work_to_device(struct work_struct *work)
+{
+	struct dev_pm_info *dpi;
+
+	dpi = container_of(work, struct dev_pm_info, work);
+	return container_of(dpi, struct device, power);
+}
+
+static inline void __pm_runtime_get(struct device *dev)
+{
+	atomic_inc(&dev->power.resume_count);
+}
+
+static inline bool __pm_runtime_put(struct device *dev)
+{
+	return !!atomic_add_unless(&dev->power.resume_count, -1, 0);
+}
+
+static inline bool pm_children_suspended(struct device *dev)
+{
+	return dev->power.ignore_children
+		|| !atomic_read(&dev->power.child_count);
+}
+
+static inline bool pm_suspend_possible(struct device *dev)
+{
+	return pm_children_suspended(dev)
+		&& !atomic_read(&dev->power.resume_count)
+		&& !(dev->power.runtime_status & ~RPM_NOTIFY)
+		&& !dev->power.runtime_disabled;
+}
+
+static inline void pm_suspend_ignore_children(struct device *dev, bool enable)
+{
+	dev->power.ignore_children = enable;
+}
+
+#else /* !CONFIG_PM_RUNTIME */
+
+static inline void pm_runtime_init(struct device *dev) {}
+static inline void pm_runtime_add(struct device *dev) {}
+static inline void pm_runtime_put(struct device *dev) {}
+static inline void pm_runtime_idle(struct device *dev) {}
+static inline int __pm_runtime_suspend(struct device *dev, bool sync)
+{
+	return -ENOSYS;
+}
+static inline int pm_request_suspend(struct device *dev, unsigned int msec)
+{
+	return -ENOSYS;
+}
+static inline int __pm_runtime_resume(struct device *dev, bool sync)
+{
+	return -ENOSYS;
+}
+static inline int pm_request_resume(struct device *dev)
+{
+	return -ENOSYS;
+}
+static inline void __pm_runtime_clear_status(struct device *dev,
+					      unsigned int status) {}
+static inline void pm_runtime_enable(struct device *dev) {}
+static inline void pm_runtime_disable(struct device *dev) {}
+
+static inline void __pm_runtime_get(struct device *dev) {}
+static inline bool __pm_runtime_put(struct device *dev) { return true; }
+static inline bool pm_children_suspended(struct device *dev) { return false; }
+static inline bool pm_suspend_possible(struct device *dev) { return false; }
+static inline void pm_suspend_ignore_children(struct device *dev, bool en) {}
+
+#endif /* !CONFIG_PM_RUNTIME */
+
+static inline void pm_runtime_get(struct device *dev)
+{
+	__pm_runtime_get(dev);
+}
+
+static inline int pm_runtime_suspend(struct device *dev)
+{
+	return __pm_runtime_suspend(dev, true);
+}
+
+static inline int pm_runtime_resume(struct device *dev)
+{
+	return __pm_runtime_resume(dev, true);
+}
+
+static inline void pm_runtime_clear_active(struct device *dev)
+{
+	__pm_runtime_clear_status(dev, RPM_ACTIVE);
+}
+
+static inline void pm_runtime_clear_suspended(struct device *dev)
+{
+	__pm_runtime_clear_status(dev, RPM_SUSPENDED);
+}
+
+static inline void pm_runtime_remove(struct device *dev)
+{
+	pm_runtime_disable(dev);
+}
+
+#endif
Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -21,6 +21,7 @@
 #include <linux/kallsyms.h>
 #include <linux/mutex.h>
 #include <linux/pm.h>
+#include <linux/pm_runtime.h>
 #include <linux/resume-trace.h>
 #include <linux/rwsem.h>
 #include <linux/interrupt.h>
@@ -49,6 +50,16 @@ static DEFINE_MUTEX(dpm_list_mtx);
 static bool transition_started;
 
 /**
+ * device_pm_init - Initialize the PM-related part of a device object
+ * @dev: Device object to initialize.
+ */
+void device_pm_init(struct device *dev)
+{
+	dev->power.status = DPM_ON;
+	pm_runtime_init(dev);
+}
+
+/**
  *	device_pm_lock - lock the list of active devices used by the PM core
  */
 void device_pm_lock(void)
@@ -88,6 +99,7 @@ void device_pm_add(struct device *dev)
 	}
 
 	list_add_tail(&dev->power.entry, &dpm_list);
+	pm_runtime_add(dev);
 	mutex_unlock(&dpm_list_mtx);
 }
 
@@ -104,6 +116,7 @@ void device_pm_remove(struct device *dev
 		 kobject_name(&dev->kobj));
 	mutex_lock(&dpm_list_mtx);
 	list_del_init(&dev->power.entry);
+	pm_runtime_remove(dev);
 	mutex_unlock(&dpm_list_mtx);
 }
 
@@ -507,6 +520,7 @@ static void dpm_complete(pm_message_t st
 		get_device(dev);
 		if (dev->power.status > DPM_ON) {
 			dev->power.status = DPM_ON;
+			pm_runtime_enable(dev);
 			mutex_unlock(&dpm_list_mtx);
 
 			device_complete(dev, state);
@@ -753,6 +767,7 @@ static int dpm_prepare(pm_message_t stat
 
 		get_device(dev);
 		dev->power.status = DPM_PREPARING;
+		pm_runtime_disable(dev);
 		mutex_unlock(&dpm_list_mtx);
 
 		error = device_prepare(dev, state);
@@ -760,6 +775,7 @@ static int dpm_prepare(pm_message_t stat
 		mutex_lock(&dpm_list_mtx);
 		if (error) {
 			dev->power.status = DPM_ON;
+			pm_runtime_enable(dev);
 			if (error == -EAGAIN) {
 				put_device(dev);
 				continue;
Index: linux-2.6/drivers/base/dd.c
===================================================================
--- linux-2.6.orig/drivers/base/dd.c
+++ linux-2.6/drivers/base/dd.c
@@ -23,6 +23,7 @@
 #include <linux/kthread.h>
 #include <linux/wait.h>
 #include <linux/async.h>
+#include <linux/pm_runtime.h>
 
 #include "base.h"
 #include "power/power.h"
@@ -202,8 +203,12 @@ int driver_probe_device(struct device_dr
 	pr_debug("bus: '%s': %s: matched device %s with driver %s\n",
 		 drv->bus->name, __func__, dev_name(dev), drv->name);
 
+	pm_runtime_disable(dev);
+
 	ret = really_probe(dev, drv);
 
+	pm_runtime_enable(dev);
+
 	return ret;
 }
 
@@ -306,6 +311,8 @@ static void __device_release_driver(stru
 
 	drv = dev->driver;
 	if (drv) {
+		pm_runtime_disable(dev);
+
 		driver_sysfs_remove(dev);
 
 		if (dev->bus)
@@ -320,6 +327,8 @@ static void __device_release_driver(stru
 		devres_release_all(dev);
 		dev->driver = NULL;
 		klist_remove(&dev->p->knode_driver);
+
+		pm_runtime_enable(dev);
 	}
 }
 
Index: linux-2.6/drivers/base/power/power.h
===================================================================
--- linux-2.6.orig/drivers/base/power/power.h
+++ linux-2.6/drivers/base/power/power.h
@@ -1,8 +1,3 @@
-static inline void device_pm_init(struct device *dev)
-{
-	dev->power.status = DPM_ON;
-}
-
 #ifdef CONFIG_PM_SLEEP
 
 /*
@@ -16,14 +11,16 @@ static inline struct device *to_device(s
 	return container_of(entry, struct device, power.entry);
 }
 
+extern void device_pm_init(struct device *dev);
 extern void device_pm_add(struct device *);
 extern void device_pm_remove(struct device *);
 extern void device_pm_move_before(struct device *, struct device *);
 extern void device_pm_move_after(struct device *, struct device *);
 extern void device_pm_move_last(struct device *);
 
-#else /* CONFIG_PM_SLEEP */
+#else /* !CONFIG_PM_SLEEP */
 
+static inline void device_pm_init(struct device *dev) {}
 static inline void device_pm_add(struct device *dev) {}
 static inline void device_pm_remove(struct device *dev) {}
 static inline void device_pm_move_before(struct device *deva,
@@ -32,7 +29,7 @@ static inline void device_pm_move_after(
 					struct device *devb) {}
 static inline void device_pm_move_last(struct device *dev) {}
 
-#endif
+#endif /* !CONFIG_PM_SLEEP */
 
 #ifdef CONFIG_PM
 

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 5)
  2009-06-24 19:24       ` [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 5) Rafael J. Wysocki
  2009-06-24 21:30         ` Alan Stern
@ 2009-06-24 21:30         ` Alan Stern
  2009-06-25 16:49           ` Alan Stern
                             ` (3 more replies)
  2009-06-25 14:57           ` Magnus Damm
  2009-06-25 14:57         ` Magnus Damm
  3 siblings, 4 replies; 102+ messages in thread
From: Alan Stern @ 2009-06-24 21:30 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux-pm mailing list, Oliver Neukum, Magnus Damm,
	ACPI Devel Maling List, Ingo Molnar, LKML, Greg KH,
	Arjan van de Ven

On Wed, 24 Jun 2009, Rafael J. Wysocki wrote:

> +config PM_RUNTIME
> +	bool "Run-time PM core functionality"
> +	depends on PM
> +	---help---
> +	  Enable functionality allowing I/O devices to be put into energy-saving
> +	  (low power) states at run time (or autosuspended) after a specified
> +	  period of inactivity and woken up in response to a hardware-generated
> +	  wake-up event or a driver's request.
> +
> +	  Hardware support is generally required for this functionality to work
> +	  and the bus type drivers of the buses the devices are on are
> +	  responsibile for the actual handling of the autosuspend requests and

s/ibile/ible/

> @@ -165,6 +168,28 @@ typedef struct pm_message {
>   * It is allowed to unregister devices while the above callbacks are being
>   * executed.  However, it is not allowed to unregister a device from within any
>   * of its own callbacks.
> + *
> + * There also are the following callbacks related to run-time power management
> + * of devices:
> + *
> + * @runtime_suspend: Prepare the device for a condition in which it won't be
> + *	able to communicate with the CPU(s) and RAM due to power management.
> + *	This need not mean that the device should be put into a low power state.
> + *	For example, if the device is behind a link which is about to be turned
> + *	off, the device may remain at full power.  Still, if the device does go

s/Still, if/If/ -- the word "Still" seems a little odd in this context.

> + *	to low power and if device_may_wakeup(dev) is true, remote wake-up
> + *	(i.e. hardware mechanism allowing the device to request a change of its

s/i.e. /i.e., a /

> + *	power state, such as PCI PME) should be enabled for it.
> + *
> + * @runtime_resume: Put the device into the fully active state in response to a
> + *	wake-up event generated by hardware or at a request of software.  If

s/at a request/at the request/

> + *	necessary, put the device into the full power state and restore its
> + *	registers, so that it is fully operational.


> + * RPM_ACTIVE		Device is fully operational, no run-time PM requests are
> + *			pending for it.
> + *
> + * RPM_IDLE		It has been requested that the device be suspended.
> + *			Suspend request has been put into the run-time PM
> + *			workqueue and it's pending execution.
> + *
> + * RPM_SUSPENDING	Device bus type's ->runtime_suspend() callback is being
> + *			executed.
> + *
> + * RPM_SUSPENDED	Device bus type's ->runtime_suspend() callback has
> + *			completed successfully.  The device is regarded as
> + *			suspended.
> + *
> + * RPM_WAKE		It has been requested that the device be woken up.
> + *			Resume request has been put into the run-time PM
> + *			workqueue and it's pending execution.
> + *
> + * RPM_RESUMING		Device bus type's ->runtime_resume() callback is being
> + *			executed.

Remember to add RPM_NOTIFY.


> +/**
> + * __pm_get_child - Increment the counter of unsuspended children of a device.
> + * @dev: Device to handle;
> + */
> +static void __pm_get_child(struct device *dev)
> +{
> +	atomic_inc(&dev->power.child_count);
> +}
> +
> +/**
> + * __pm_put_child - Decrement the counter of unsuspended children of a device.
> + * @dev: Device to handle;
> + */
> +static void __pm_put_child(struct device *dev)
> +{
> +	if (!atomic_add_unless(&dev->power.child_count, -1, 0))
> +		dev_WARN(dev, "Unbalanced counter decrementation");
> +}

I think we don't need this dev_WARN.  It should be straightforward to
verify that the increments and decrements balance correctly, and the
child_count field isn't manipulated by drivers.

In fact, these don't need to be separate routines at all.  Just call
atomic_inc or atomic_dec directly.

> +
> +/**
> + * __pm_runtime_suspend - Run a device bus type's runtime_suspend() callback.
> + * @dev: Device to suspend.
> + * @sync: If unset, the funtion has been called via pm_wq.
> + *
> + * Check if the run-time PM status of the device is appropriate and run the
> + * ->runtime_suspend() callback provided by the device's bus type.  Update the
> + * run-time PM flags in the device object to reflect the current status of the
> + * device.
> + */
> +int __pm_runtime_suspend(struct device *dev, bool sync)
> +{
> +	struct device *parent = NULL;
> +	unsigned long flags;
> +	int error = -EINVAL;

Remove the initializer.

> +
> +	might_sleep();
> +
> +	spin_lock_irqsave(&dev->power.lock, flags);
> +
> + repeat:
> +	if (dev->power.runtime_status == RPM_ERROR) {

Insert:		error = -EINVAL;

> +		goto out;
> +	} else if (dev->power.runtime_status & RPM_SUSPENDED) {

...


> +void pm_runtime_put(struct device *dev)
> +{
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&dev->power.lock, flags);
> +
> +	if (!__pm_runtime_put(dev)) {
> +		dev_WARN(dev, "Unbalanced counter decrementation");

"decrementation" isn't a word -- or if it is, it shouldn't be.  :-)  
Just use "decrement".  Similarly in other places.

> +/**
> + * pm_runtime_add - Update run-time PM fields of a device while adding it.
> + * @dev: Device object being added to device hierarchy.
> + */
> +void pm_runtime_add(struct device *dev)
> +{
> +	dev->power.runtime_notify = false;
> +	INIT_DELAYED_WORK(&dev->power.suspend_work, pm_runtime_suspend_work);

Doesn't INIT_DELAYED_WORK belong in pm_runtime_init?
Do we want the bus subsystem to be responsible for doing:

	dev->power.runtime_disabled = false;
	pm_runtime_put(dev);

after calling device_add?  Or should device_add do it?


> Index: linux-2.6/include/linux/pm_runtime.h
> ===================================================================
> --- /dev/null
> +++ linux-2.6/include/linux/pm_runtime.h

> +static inline struct device *suspend_work_to_device(struct work_struct *work)
> +{
> +	struct delayed_work *dw = to_delayed_work(work);
> +	struct dev_pm_info *dpi;
> +
> +	dpi = container_of(dw, struct dev_pm_info, suspend_work);
> +	return container_of(dpi, struct device, power);
> +}

You don't need to iterate container_of like this.  You can do:

	return container_of(dw, struct device, power.suspend_work);

> +
> +static inline struct device *work_to_device(struct work_struct *work)
> +{
> +	struct dev_pm_info *dpi;
> +
> +	dpi = container_of(work, struct dev_pm_info, work);
> +	return container_of(dpi, struct device, power);
> +}

Similarly here.

These two routines aren't used outside of runtime.c.  They should be
moved into that file.  The same goes for pm_children_suspended and
pm_suspend_possible.

> +
> +static inline void __pm_runtime_get(struct device *dev)
> +{
> +	atomic_inc(&dev->power.resume_count);
> +}

Why introduce __pm_runtime_get?  Just make this pm_runtime_get.

> +static inline void pm_runtime_remove(struct device *dev)
> +{
> +	pm_runtime_disable(dev);
> +}

You forgot to decrement the parent's child_count if dev isn't
suspended (and then do a idle_notify on the parent).  Because of this 
additional complexity, don't inline the routine.

> Index: linux-2.6/drivers/base/dd.c
> ===================================================================
> --- linux-2.6.orig/drivers/base/dd.c
> +++ linux-2.6/drivers/base/dd.c
> @@ -23,6 +23,7 @@
>  #include <linux/kthread.h>
>  #include <linux/wait.h>
>  #include <linux/async.h>
> +#include <linux/pm_runtime.h>
>  
>  #include "base.h"
>  #include "power/power.h"
> @@ -202,8 +203,12 @@ int driver_probe_device(struct device_dr
>  	pr_debug("bus: '%s': %s: matched device %s with driver %s\n",
>  		 drv->bus->name, __func__, dev_name(dev), drv->name);
>  
> +	pm_runtime_disable(dev);
> +
>  	ret = really_probe(dev, drv);
>  
> +	pm_runtime_enable(dev);
> +

Shouldn't we guarantee that a device isn't probed while it is in a
suspended state?  So this should be

	pm_runtime_get(dev);
	ret = pm_runtime_resume(dev);
	if (ret == 0)
		ret = really_probe(dev, drv);
	pm_runtime_put(dev);	

It might be nice to have a simple combined pm_runtime_get_and_resume
for this sort of situation.


More comments to follow when I get time to review more of the code...

Alan Stern


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 5)
  2009-06-24 19:24       ` [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 5) Rafael J. Wysocki
@ 2009-06-24 21:30         ` Alan Stern
  2009-06-24 21:30         ` Alan Stern
                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 102+ messages in thread
From: Alan Stern @ 2009-06-24 21:30 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Wed, 24 Jun 2009, Rafael J. Wysocki wrote:

> +config PM_RUNTIME
> +	bool "Run-time PM core functionality"
> +	depends on PM
> +	---help---
> +	  Enable functionality allowing I/O devices to be put into energy-saving
> +	  (low power) states at run time (or autosuspended) after a specified
> +	  period of inactivity and woken up in response to a hardware-generated
> +	  wake-up event or a driver's request.
> +
> +	  Hardware support is generally required for this functionality to work
> +	  and the bus type drivers of the buses the devices are on are
> +	  responsibile for the actual handling of the autosuspend requests and

s/ibile/ible/

> @@ -165,6 +168,28 @@ typedef struct pm_message {
>   * It is allowed to unregister devices while the above callbacks are being
>   * executed.  However, it is not allowed to unregister a device from within any
>   * of its own callbacks.
> + *
> + * There also are the following callbacks related to run-time power management
> + * of devices:
> + *
> + * @runtime_suspend: Prepare the device for a condition in which it won't be
> + *	able to communicate with the CPU(s) and RAM due to power management.
> + *	This need not mean that the device should be put into a low power state.
> + *	For example, if the device is behind a link which is about to be turned
> + *	off, the device may remain at full power.  Still, if the device does go

s/Still, if/If/ -- the word "Still" seems a little odd in this context.

> + *	to low power and if device_may_wakeup(dev) is true, remote wake-up
> + *	(i.e. hardware mechanism allowing the device to request a change of its

s/i.e. /i.e., a /

> + *	power state, such as PCI PME) should be enabled for it.
> + *
> + * @runtime_resume: Put the device into the fully active state in response to a
> + *	wake-up event generated by hardware or at a request of software.  If

s/at a request/at the request/

> + *	necessary, put the device into the full power state and restore its
> + *	registers, so that it is fully operational.


> + * RPM_ACTIVE		Device is fully operational, no run-time PM requests are
> + *			pending for it.
> + *
> + * RPM_IDLE		It has been requested that the device be suspended.
> + *			Suspend request has been put into the run-time PM
> + *			workqueue and it's pending execution.
> + *
> + * RPM_SUSPENDING	Device bus type's ->runtime_suspend() callback is being
> + *			executed.
> + *
> + * RPM_SUSPENDED	Device bus type's ->runtime_suspend() callback has
> + *			completed successfully.  The device is regarded as
> + *			suspended.
> + *
> + * RPM_WAKE		It has been requested that the device be woken up.
> + *			Resume request has been put into the run-time PM
> + *			workqueue and it's pending execution.
> + *
> + * RPM_RESUMING		Device bus type's ->runtime_resume() callback is being
> + *			executed.

Remember to add RPM_NOTIFY.


> +/**
> + * __pm_get_child - Increment the counter of unsuspended children of a device.
> + * @dev: Device to handle;
> + */
> +static void __pm_get_child(struct device *dev)
> +{
> +	atomic_inc(&dev->power.child_count);
> +}
> +
> +/**
> + * __pm_put_child - Decrement the counter of unsuspended children of a device.
> + * @dev: Device to handle;
> + */
> +static void __pm_put_child(struct device *dev)
> +{
> +	if (!atomic_add_unless(&dev->power.child_count, -1, 0))
> +		dev_WARN(dev, "Unbalanced counter decrementation");
> +}

I think we don't need this dev_WARN.  It should be straightforward to
verify that the increments and decrements balance correctly, and the
child_count field isn't manipulated by drivers.

In fact, these don't need to be separate routines at all.  Just call
atomic_inc or atomic_dec directly.

> +
> +/**
> + * __pm_runtime_suspend - Run a device bus type's runtime_suspend() callback.
> + * @dev: Device to suspend.
> + * @sync: If unset, the funtion has been called via pm_wq.
> + *
> + * Check if the run-time PM status of the device is appropriate and run the
> + * ->runtime_suspend() callback provided by the device's bus type.  Update the
> + * run-time PM flags in the device object to reflect the current status of the
> + * device.
> + */
> +int __pm_runtime_suspend(struct device *dev, bool sync)
> +{
> +	struct device *parent = NULL;
> +	unsigned long flags;
> +	int error = -EINVAL;

Remove the initializer.

> +
> +	might_sleep();
> +
> +	spin_lock_irqsave(&dev->power.lock, flags);
> +
> + repeat:
> +	if (dev->power.runtime_status == RPM_ERROR) {

Insert:		error = -EINVAL;

> +		goto out;
> +	} else if (dev->power.runtime_status & RPM_SUSPENDED) {

...


> +void pm_runtime_put(struct device *dev)
> +{
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&dev->power.lock, flags);
> +
> +	if (!__pm_runtime_put(dev)) {
> +		dev_WARN(dev, "Unbalanced counter decrementation");

"decrementation" isn't a word -- or if it is, it shouldn't be.  :-)  
Just use "decrement".  Similarly in other places.

> +/**
> + * pm_runtime_add - Update run-time PM fields of a device while adding it.
> + * @dev: Device object being added to device hierarchy.
> + */
> +void pm_runtime_add(struct device *dev)
> +{
> +	dev->power.runtime_notify = false;
> +	INIT_DELAYED_WORK(&dev->power.suspend_work, pm_runtime_suspend_work);

Doesn't INIT_DELAYED_WORK belong in pm_runtime_init?
Do we want the bus subsystem to be responsible for doing:

	dev->power.runtime_disabled = false;
	pm_runtime_put(dev);

after calling device_add?  Or should device_add do it?


> Index: linux-2.6/include/linux/pm_runtime.h
> ===================================================================
> --- /dev/null
> +++ linux-2.6/include/linux/pm_runtime.h

> +static inline struct device *suspend_work_to_device(struct work_struct *work)
> +{
> +	struct delayed_work *dw = to_delayed_work(work);
> +	struct dev_pm_info *dpi;
> +
> +	dpi = container_of(dw, struct dev_pm_info, suspend_work);
> +	return container_of(dpi, struct device, power);
> +}

You don't need to iterate container_of like this.  You can do:

	return container_of(dw, struct device, power.suspend_work);

> +
> +static inline struct device *work_to_device(struct work_struct *work)
> +{
> +	struct dev_pm_info *dpi;
> +
> +	dpi = container_of(work, struct dev_pm_info, work);
> +	return container_of(dpi, struct device, power);
> +}

Similarly here.

These two routines aren't used outside of runtime.c.  They should be
moved into that file.  The same goes for pm_children_suspended and
pm_suspend_possible.

> +
> +static inline void __pm_runtime_get(struct device *dev)
> +{
> +	atomic_inc(&dev->power.resume_count);
> +}

Why introduce __pm_runtime_get?  Just make this pm_runtime_get.

> +static inline void pm_runtime_remove(struct device *dev)
> +{
> +	pm_runtime_disable(dev);
> +}

You forgot to decrement the parent's child_count if dev isn't
suspended (and then do a idle_notify on the parent).  Because of this 
additional complexity, don't inline the routine.

> Index: linux-2.6/drivers/base/dd.c
> ===================================================================
> --- linux-2.6.orig/drivers/base/dd.c
> +++ linux-2.6/drivers/base/dd.c
> @@ -23,6 +23,7 @@
>  #include <linux/kthread.h>
>  #include <linux/wait.h>
>  #include <linux/async.h>
> +#include <linux/pm_runtime.h>
>  
>  #include "base.h"
>  #include "power/power.h"
> @@ -202,8 +203,12 @@ int driver_probe_device(struct device_dr
>  	pr_debug("bus: '%s': %s: matched device %s with driver %s\n",
>  		 drv->bus->name, __func__, dev_name(dev), drv->name);
>  
> +	pm_runtime_disable(dev);
> +
>  	ret = really_probe(dev, drv);
>  
> +	pm_runtime_enable(dev);
> +

Shouldn't we guarantee that a device isn't probed while it is in a
suspended state?  So this should be

	pm_runtime_get(dev);
	ret = pm_runtime_resume(dev);
	if (ret == 0)
		ret = really_probe(dev, drv);
	pm_runtime_put(dev);	

It might be nice to have a simple combined pm_runtime_get_and_resume
for this sort of situation.


More comments to follow when I get time to review more of the code...

Alan Stern

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 5)
  2009-06-24 19:24       ` [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 5) Rafael J. Wysocki
@ 2009-06-25 14:57           ` Magnus Damm
  2009-06-24 21:30         ` Alan Stern
                             ` (2 subsequent siblings)
  3 siblings, 0 replies; 102+ messages in thread
From: Magnus Damm @ 2009-06-25 14:57 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Alan Stern, linux-pm, Oliver Neukum, ACPI Devel Maling List,
	Ingo Molnar, LKML, Greg KH, Arjan van de Ven

On Thu, Jun 25, 2009 at 4:24 AM, Rafael J. Wysocki<rjw@sisk.pl> wrote:
> From: Rafael J. Wysocki <rjw@sisk.pl>
> Subject: PM: Introduce core framework for run-time PM of I/O devices (rev. 5)
>
> Introduce a core framework for run-time power management of I/O
> devices.  Add device run-time PM fields to 'struct dev_pm_info'
> and device run-time PM callbacks to 'struct dev_pm_ops'.  Introduce
> a run-time PM workqueue and define some device run-time PM helper
> functions at the core level.  Document all these things.
>
> Not-yet-signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

Hi Rafael,

Thanks for your work on this. I've built some code for SuperH on top
of this today, and with that behind me I have a few questions and a
little bit of code feedback.

Questions:

1) Which functions are device drivers supposed to use?

I simply added pm_runtime_resume() and pm_runtime_suspend() where
clk_enable() and clk_disable() normally are used. In interrupt
handlers I used pm_request_suspend() instead of pm_runtime_suspend().

I'm not sure if the v5 patch does the right thing around
really_probe() like Alan pointed out. Basically, I'd like to be able
to call my bus callback for ->runtime_resume() from the driver
probe(), but power.resume_count seems stuck at 1 which leads to
pm_runtime_resume() returning -EAGAIN before invoking the bus
callback.

This leads to question number two...

2) What's the default state in probe()?

We touched this subject briefly before. I'd like to compare the
Runtime PM default state with the clock framework default state. The
clock framework requires you to use clk_enable() to enable the clock
to a hardware block before it is allowed to access the hardware
registers. At least that's how we handle stop bits on SuperH. So
clocks come up disabled from boot and should be enabled and disabled
by the device driver to save power.

I'd like to change our Module Stop Bits code on SuperH (once again)
from being handled by the clock framework to being managed by the
Runtime PM framework. Having the clock framework deal with the stop
bits works fine today, they are off by default after boot, and the
driver often enables the clock with clk_enable() in probe() or
hopefully in some more finegrained fashion.

I'm not sure how the Module Stop Bits should fit with the Runtime PM
code though. The default state for a device at probe() time seems to
be RPM_ACTIVE. Should drivers call pm_runtime_enable() to enable
Runtime PM?

One part of me likes the idea that Runtime PM-enabled drivers start in
RPM_SUSPENDED so they are forced to put pm_runtime_resume() before
actually using the hardware. This makes the Runtime PM behaviour
pretty close to the clock framework.

If you dislike starting from RPM_SUSPENDED (most likely) then I wonder
how I should set the state to RPM_SUSPENDED in the driver. I'd like to
make sure that pm_runtime_resume() can invoke the bus callback so the
hardware can be turned on for the first time somehow. Should I do a
dummy suspend?

3) Should drivers use pm_suspend_ignore_children(dev, true)?

It turns out that I can't suspend my I2C master driver out of the box
since it becomes the parent of all slaves on the I2C bus. The I2C
master driver is just a platform driver, and the children are I2C
devices (90% sure). I want to do Runtime PM regardless if the child
devices are suspended or not, I guess I should use
pm_suspend_ignore_children(dev, true) then?

> +/**
> + * __pm_runtime_resume - Run a device bus type's runtime_resume() callback.
> + * @dev: Device to resume.
> + * @sync: If unset, the funtion has been called via pm_wq.
> + *
> + * Check if the device is really suspended and run the ->runtime_resume()
> + * callback provided by the device's bus type driver.  Update the run-time PM
> + * flags in the device object to reflect the current status of the device.  If
> + * runtime suspend is in progress while this function is being run, wait for it
> + * to finish before resuming the device.  If runtime suspend is scheduled, but
> + * it hasn't started yet, cancel it and we're done.
> + */
> +int __pm_runtime_resume(struct device *dev, bool sync)
> +{
[snip]
> +}
> +EXPORT_SYMBOL_GPL(pm_runtime_resume);

You're missing "__" here unless you're aiming for something very exotic. =)

> +/**
> + * pm_runtime_work - Run __pm_runtime_resume() for a device.
> + * @work: Work structure used for scheduling the execution of this function.
> + *
> + * Use @work to get the device object the resume has been scheduled for and run
> + * __pm_runtime_resume() for it.
> + */
> +static void pm_runtime_work(struct work_struct *work)
> +{
> +       __pm_runtime_resume(work_to_device(work), false);
> +}

Anything wrong with the name pm_runtime_resume_work()?

Looking forward to v6, I'll switch task now, will be back to this late Monday.

Cheers,

/ magnus

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [patch update] PM: Introduce core framework for run-time PM of  I/O devices (rev. 5)
@ 2009-06-25 14:57           ` Magnus Damm
  0 siblings, 0 replies; 102+ messages in thread
From: Magnus Damm @ 2009-06-25 14:57 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Alan Stern, linux-pm, Oliver Neukum, ACPI Devel Maling List,
	Ingo Molnar, LKML, Greg KH, Arjan van de Ven

On Thu, Jun 25, 2009 at 4:24 AM, Rafael J. Wysocki<rjw@sisk.pl> wrote:
> From: Rafael J. Wysocki <rjw@sisk.pl>
> Subject: PM: Introduce core framework for run-time PM of I/O devices (rev. 5)
>
> Introduce a core framework for run-time power management of I/O
> devices.  Add device run-time PM fields to 'struct dev_pm_info'
> and device run-time PM callbacks to 'struct dev_pm_ops'.  Introduce
> a run-time PM workqueue and define some device run-time PM helper
> functions at the core level.  Document all these things.
>
> Not-yet-signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

Hi Rafael,

Thanks for your work on this. I've built some code for SuperH on top
of this today, and with that behind me I have a few questions and a
little bit of code feedback.

Questions:

1) Which functions are device drivers supposed to use?

I simply added pm_runtime_resume() and pm_runtime_suspend() where
clk_enable() and clk_disable() normally are used. In interrupt
handlers I used pm_request_suspend() instead of pm_runtime_suspend().

I'm not sure if the v5 patch does the right thing around
really_probe() like Alan pointed out. Basically, I'd like to be able
to call my bus callback for ->runtime_resume() from the driver
probe(), but power.resume_count seems stuck at 1 which leads to
pm_runtime_resume() returning -EAGAIN before invoking the bus
callback.

This leads to question number two...

2) What's the default state in probe()?

We touched this subject briefly before. I'd like to compare the
Runtime PM default state with the clock framework default state. The
clock framework requires you to use clk_enable() to enable the clock
to a hardware block before it is allowed to access the hardware
registers. At least that's how we handle stop bits on SuperH. So
clocks come up disabled from boot and should be enabled and disabled
by the device driver to save power.

I'd like to change our Module Stop Bits code on SuperH (once again)
from being handled by the clock framework to being managed by the
Runtime PM framework. Having the clock framework deal with the stop
bits works fine today, they are off by default after boot, and the
driver often enables the clock with clk_enable() in probe() or
hopefully in some more finegrained fashion.

I'm not sure how the Module Stop Bits should fit with the Runtime PM
code though. The default state for a device at probe() time seems to
be RPM_ACTIVE. Should drivers call pm_runtime_enable() to enable
Runtime PM?

One part of me likes the idea that Runtime PM-enabled drivers start in
RPM_SUSPENDED so they are forced to put pm_runtime_resume() before
actually using the hardware. This makes the Runtime PM behaviour
pretty close to the clock framework.

If you dislike starting from RPM_SUSPENDED (most likely) then I wonder
how I should set the state to RPM_SUSPENDED in the driver. I'd like to
make sure that pm_runtime_resume() can invoke the bus callback so the
hardware can be turned on for the first time somehow. Should I do a
dummy suspend?

3) Should drivers use pm_suspend_ignore_children(dev, true)?

It turns out that I can't suspend my I2C master driver out of the box
since it becomes the parent of all slaves on the I2C bus. The I2C
master driver is just a platform driver, and the children are I2C
devices (90% sure). I want to do Runtime PM regardless if the child
devices are suspended or not, I guess I should use
pm_suspend_ignore_children(dev, true) then?

> +/**
> + * __pm_runtime_resume - Run a device bus type's runtime_resume() callback.
> + * @dev: Device to resume.
> + * @sync: If unset, the funtion has been called via pm_wq.
> + *
> + * Check if the device is really suspended and run the ->runtime_resume()
> + * callback provided by the device's bus type driver.  Update the run-time PM
> + * flags in the device object to reflect the current status of the device.  If
> + * runtime suspend is in progress while this function is being run, wait for it
> + * to finish before resuming the device.  If runtime suspend is scheduled, but
> + * it hasn't started yet, cancel it and we're done.
> + */
> +int __pm_runtime_resume(struct device *dev, bool sync)
> +{
[snip]
> +}
> +EXPORT_SYMBOL_GPL(pm_runtime_resume);

You're missing "__" here unless you're aiming for something very exotic. =)

> +/**
> + * pm_runtime_work - Run __pm_runtime_resume() for a device.
> + * @work: Work structure used for scheduling the execution of this function.
> + *
> + * Use @work to get the device object the resume has been scheduled for and run
> + * __pm_runtime_resume() for it.
> + */
> +static void pm_runtime_work(struct work_struct *work)
> +{
> +       __pm_runtime_resume(work_to_device(work), false);
> +}

Anything wrong with the name pm_runtime_resume_work()?

Looking forward to v6, I'll switch task now, will be back to this late Monday.

Cheers,

/ magnus

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 5)
  2009-06-24 19:24       ` [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 5) Rafael J. Wysocki
                           ` (2 preceding siblings ...)
  2009-06-25 14:57           ` Magnus Damm
@ 2009-06-25 14:57         ` Magnus Damm
  3 siblings, 0 replies; 102+ messages in thread
From: Magnus Damm @ 2009-06-25 14:57 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg KH, LKML, ACPI Devel Maling List, linux-pm, Ingo Molnar,
	Arjan van de Ven

On Thu, Jun 25, 2009 at 4:24 AM, Rafael J. Wysocki<rjw@sisk.pl> wrote:
> From: Rafael J. Wysocki <rjw@sisk.pl>
> Subject: PM: Introduce core framework for run-time PM of I/O devices (rev. 5)
>
> Introduce a core framework for run-time power management of I/O
> devices.  Add device run-time PM fields to 'struct dev_pm_info'
> and device run-time PM callbacks to 'struct dev_pm_ops'.  Introduce
> a run-time PM workqueue and define some device run-time PM helper
> functions at the core level.  Document all these things.
>
> Not-yet-signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

Hi Rafael,

Thanks for your work on this. I've built some code for SuperH on top
of this today, and with that behind me I have a few questions and a
little bit of code feedback.

Questions:

1) Which functions are device drivers supposed to use?

I simply added pm_runtime_resume() and pm_runtime_suspend() where
clk_enable() and clk_disable() normally are used. In interrupt
handlers I used pm_request_suspend() instead of pm_runtime_suspend().

I'm not sure if the v5 patch does the right thing around
really_probe() like Alan pointed out. Basically, I'd like to be able
to call my bus callback for ->runtime_resume() from the driver
probe(), but power.resume_count seems stuck at 1 which leads to
pm_runtime_resume() returning -EAGAIN before invoking the bus
callback.

This leads to question number two...

2) What's the default state in probe()?

We touched this subject briefly before. I'd like to compare the
Runtime PM default state with the clock framework default state. The
clock framework requires you to use clk_enable() to enable the clock
to a hardware block before it is allowed to access the hardware
registers. At least that's how we handle stop bits on SuperH. So
clocks come up disabled from boot and should be enabled and disabled
by the device driver to save power.

I'd like to change our Module Stop Bits code on SuperH (once again)
from being handled by the clock framework to being managed by the
Runtime PM framework. Having the clock framework deal with the stop
bits works fine today, they are off by default after boot, and the
driver often enables the clock with clk_enable() in probe() or
hopefully in some more finegrained fashion.

I'm not sure how the Module Stop Bits should fit with the Runtime PM
code though. The default state for a device at probe() time seems to
be RPM_ACTIVE. Should drivers call pm_runtime_enable() to enable
Runtime PM?

One part of me likes the idea that Runtime PM-enabled drivers start in
RPM_SUSPENDED so they are forced to put pm_runtime_resume() before
actually using the hardware. This makes the Runtime PM behaviour
pretty close to the clock framework.

If you dislike starting from RPM_SUSPENDED (most likely) then I wonder
how I should set the state to RPM_SUSPENDED in the driver. I'd like to
make sure that pm_runtime_resume() can invoke the bus callback so the
hardware can be turned on for the first time somehow. Should I do a
dummy suspend?

3) Should drivers use pm_suspend_ignore_children(dev, true)?

It turns out that I can't suspend my I2C master driver out of the box
since it becomes the parent of all slaves on the I2C bus. The I2C
master driver is just a platform driver, and the children are I2C
devices (90% sure). I want to do Runtime PM regardless if the child
devices are suspended or not, I guess I should use
pm_suspend_ignore_children(dev, true) then?

> +/**
> + * __pm_runtime_resume - Run a device bus type's runtime_resume() callback.
> + * @dev: Device to resume.
> + * @sync: If unset, the funtion has been called via pm_wq.
> + *
> + * Check if the device is really suspended and run the ->runtime_resume()
> + * callback provided by the device's bus type driver.  Update the run-time PM
> + * flags in the device object to reflect the current status of the device.  If
> + * runtime suspend is in progress while this function is being run, wait for it
> + * to finish before resuming the device.  If runtime suspend is scheduled, but
> + * it hasn't started yet, cancel it and we're done.
> + */
> +int __pm_runtime_resume(struct device *dev, bool sync)
> +{
[snip]
> +}
> +EXPORT_SYMBOL_GPL(pm_runtime_resume);

You're missing "__" here unless you're aiming for something very exotic. =)

> +/**
> + * pm_runtime_work - Run __pm_runtime_resume() for a device.
> + * @work: Work structure used for scheduling the execution of this function.
> + *
> + * Use @work to get the device object the resume has been scheduled for and run
> + * __pm_runtime_resume() for it.
> + */
> +static void pm_runtime_work(struct work_struct *work)
> +{
> +       __pm_runtime_resume(work_to_device(work), false);
> +}

Anything wrong with the name pm_runtime_resume_work()?

Looking forward to v6, I'll switch task now, will be back to this late Monday.

Cheers,

/ magnus

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [linux-pm] [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 5)
  2009-06-24 21:30         ` Alan Stern
  2009-06-25 16:49           ` Alan Stern
@ 2009-06-25 16:49           ` Alan Stern
  2009-06-25 21:58             ` Rafael J. Wysocki
  2009-06-25 21:58             ` [linux-pm] " Rafael J. Wysocki
  2009-06-26 21:49           ` [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 5) Rafael J. Wysocki
  2009-06-26 21:49           ` Rafael J. Wysocki
  3 siblings, 2 replies; 102+ messages in thread
From: Alan Stern @ 2009-06-25 16:49 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Wed, 24 Jun 2009, Alan Stern wrote:
> More comments to follow when I get time to review more of the code...

Here we go.  This isn't so detailed, because I wasn't able to do a 
detailed review.  Frankly, the code is kind of a mess.

The whole business about the runtime_notify and RPM_NOTIFY flags is 
impenetrable.  My suggestion: Rename runtime_notify to notify_pending 
and eliminate RPM_NOTIFY.  Then make sure that notify_pending is set 
whenever a notify work item is queued.

The pm_notify_or_cancel_work routine should just be pm_notify_work.  
It's silly to submit a workqueue item just to cancel a delayed
workqueue item!  Do all the cancellations in the __pm_runtime_resume
and __pm_runtime_suspend routines, where you're already in process
context.  If this means a work item occasionally runs at the wrong time
then let it -- it will quickly find out that it has nothing to do.  
And while you're at it, get rid of the runtime_break flag.

The logic in __pm_runtime_resume and __pm_runtime_suspend is too
complicated to check.  This is probably because of the interactions
with RPM_NOTIFY and runtime_break.  Once they are gone, the logic
should be much more straightforward: test the flags, then do whatever 
is needed based on the status.

I think once these cleanups are made, the code will be a lot more 
transparent.

In __pm_runtime_resume, don't assume that incrementing the parent's
child_count will prevent the parent from suspending; also increment the
resume_count.  And don't forget to decrement the parent's child_count
again if the resume fails.

In __pm_runtime_suspend, you should decrement the parent's child_count
before releasing the child's lock.  The pm_runtime_idle call should 
stay where it is, of course.

One more thing: Don't use flush_work or its relatives -- it tends to
cause deadlocks.  Use cancel_work_sync instead.

Alan Stern


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 5)
  2009-06-24 21:30         ` Alan Stern
@ 2009-06-25 16:49           ` Alan Stern
  2009-06-25 16:49           ` [linux-pm] " Alan Stern
                             ` (2 subsequent siblings)
  3 siblings, 0 replies; 102+ messages in thread
From: Alan Stern @ 2009-06-25 16:49 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Wed, 24 Jun 2009, Alan Stern wrote:
> More comments to follow when I get time to review more of the code...

Here we go.  This isn't so detailed, because I wasn't able to do a 
detailed review.  Frankly, the code is kind of a mess.

The whole business about the runtime_notify and RPM_NOTIFY flags is 
impenetrable.  My suggestion: Rename runtime_notify to notify_pending 
and eliminate RPM_NOTIFY.  Then make sure that notify_pending is set 
whenever a notify work item is queued.

The pm_notify_or_cancel_work routine should just be pm_notify_work.  
It's silly to submit a workqueue item just to cancel a delayed
workqueue item!  Do all the cancellations in the __pm_runtime_resume
and __pm_runtime_suspend routines, where you're already in process
context.  If this means a work item occasionally runs at the wrong time
then let it -- it will quickly find out that it has nothing to do.  
And while you're at it, get rid of the runtime_break flag.

The logic in __pm_runtime_resume and __pm_runtime_suspend is too
complicated to check.  This is probably because of the interactions
with RPM_NOTIFY and runtime_break.  Once they are gone, the logic
should be much more straightforward: test the flags, then do whatever 
is needed based on the status.

I think once these cleanups are made, the code will be a lot more 
transparent.

In __pm_runtime_resume, don't assume that incrementing the parent's
child_count will prevent the parent from suspending; also increment the
resume_count.  And don't forget to decrement the parent's child_count
again if the resume fails.

In __pm_runtime_suspend, you should decrement the parent's child_count
before releasing the child's lock.  The pm_runtime_idle call should 
stay where it is, of course.

One more thing: Don't use flush_work or its relatives -- it tends to
cause deadlocks.  Use cancel_work_sync instead.

Alan Stern

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [linux-pm] [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 5)
  2009-06-25 16:49           ` [linux-pm] " Alan Stern
  2009-06-25 21:58             ` Rafael J. Wysocki
@ 2009-06-25 21:58             ` Rafael J. Wysocki
  2009-06-25 23:17               ` Rafael J. Wysocki
                                 ` (3 more replies)
  1 sibling, 4 replies; 102+ messages in thread
From: Rafael J. Wysocki @ 2009-06-25 21:58 UTC (permalink / raw)
  To: Alan Stern
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Thursday 25 June 2009, Alan Stern wrote:
> On Wed, 24 Jun 2009, Alan Stern wrote:
> > More comments to follow when I get time to review more of the code...
> 
> Here we go.  This isn't so detailed, because I wasn't able to do a 
> detailed review.  Frankly, the code is kind of a mess.
> 
> The whole business about the runtime_notify and RPM_NOTIFY flags is 
> impenetrable.  My suggestion: Rename runtime_notify to notify_pending 
> and eliminate RPM_NOTIFY.  Then make sure that notify_pending is set 
> whenever a notify work item is queued.

I was going to do exactly that, but I realized it wouldn't work in general,
because ->runtime_idle() could run __pm_runtime_suspend() in theory.

The runtime_notify bit is only needed for pm_runtime_disable() so that it
knows there's a work item to cancel.

> The pm_notify_or_cancel_work routine should just be pm_notify_work.  
> It's silly to submit a workqueue item just to cancel a delayed
> workqueue item!

Maybe, but how do you think we should cancel it?  cancel_delayed_work()
doesn't guarantee that the work structure used for queuing the work will
not be accessed after it's returned and we can't schedule the next suspend
request until we know it's safe.  So, we have to use cancel_delayed_work_sync()
for that, which can't be done from interrupt context, so we need to do it in a
work function.

> Do all the cancellations in the __pm_runtime_resume and
> __pm_runtime_suspend routines, where you're already in process
> context.

I tried that, but it looked even worse. :-)

> If this means a work item occasionally runs at the wrong time
> then let it -- it will quickly find out that it has nothing to do.

It's not so easy, because it doesn't make sense to let suspend run
if there's already a resume being scheduled or running.
 
> And while you're at it, get rid of the runtime_break flag.

I think it's necessary.  Otherwise I wouldn't have put it in there.

> The logic in __pm_runtime_resume and __pm_runtime_suspend is too
> complicated to check.  This is probably because of the interactions
> with RPM_NOTIFY and runtime_break.  Once they are gone, the logic
> should be much more straightforward: test the flags, then do whatever 
> is needed based on the status.

I tried that, but it turned out to be insufficient, unless there are more
flags.  Well, perhaps adding more flags is the way to go.

> I think once these cleanups are made, the code will be a lot more 
> transparent.
> 
> In __pm_runtime_resume, don't assume that incrementing the parent's
> child_count will prevent the parent from suspending; also increment the
> resume_count.

It's incremented, but dropped too early.

> And don't forget to decrement the parent's child_count again if the resume
> fails.

I didn't _forget_it, because the device can't be RPM_SUSPENDED after
__pm_runtime_resume().

> In __pm_runtime_suspend, you should decrement the parent's child_count
> before releasing the child's lock.

Why exactly is that necessary?

> The pm_runtime_idle call should stay where it is, of course.
>
> One more thing: Don't use flush_work or its relatives -- it tends to
> cause deadlocks.

Oh, well.

> Use cancel_work_sync instead.

OK

Thanks for your comments, but I'm really afraid I won't be able to simplify
the code very much.  It's complicated, because the problem is complicated.

Best,
Rafael

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 5)
  2009-06-25 16:49           ` [linux-pm] " Alan Stern
@ 2009-06-25 21:58             ` Rafael J. Wysocki
  2009-06-25 21:58             ` [linux-pm] " Rafael J. Wysocki
  1 sibling, 0 replies; 102+ messages in thread
From: Rafael J. Wysocki @ 2009-06-25 21:58 UTC (permalink / raw)
  To: Alan Stern
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Thursday 25 June 2009, Alan Stern wrote:
> On Wed, 24 Jun 2009, Alan Stern wrote:
> > More comments to follow when I get time to review more of the code...
> 
> Here we go.  This isn't so detailed, because I wasn't able to do a 
> detailed review.  Frankly, the code is kind of a mess.
> 
> The whole business about the runtime_notify and RPM_NOTIFY flags is 
> impenetrable.  My suggestion: Rename runtime_notify to notify_pending 
> and eliminate RPM_NOTIFY.  Then make sure that notify_pending is set 
> whenever a notify work item is queued.

I was going to do exactly that, but I realized it wouldn't work in general,
because ->runtime_idle() could run __pm_runtime_suspend() in theory.

The runtime_notify bit is only needed for pm_runtime_disable() so that it
knows there's a work item to cancel.

> The pm_notify_or_cancel_work routine should just be pm_notify_work.  
> It's silly to submit a workqueue item just to cancel a delayed
> workqueue item!

Maybe, but how do you think we should cancel it?  cancel_delayed_work()
doesn't guarantee that the work structure used for queuing the work will
not be accessed after it's returned and we can't schedule the next suspend
request until we know it's safe.  So, we have to use cancel_delayed_work_sync()
for that, which can't be done from interrupt context, so we need to do it in a
work function.

> Do all the cancellations in the __pm_runtime_resume and
> __pm_runtime_suspend routines, where you're already in process
> context.

I tried that, but it looked even worse. :-)

> If this means a work item occasionally runs at the wrong time
> then let it -- it will quickly find out that it has nothing to do.

It's not so easy, because it doesn't make sense to let suspend run
if there's already a resume being scheduled or running.
 
> And while you're at it, get rid of the runtime_break flag.

I think it's necessary.  Otherwise I wouldn't have put it in there.

> The logic in __pm_runtime_resume and __pm_runtime_suspend is too
> complicated to check.  This is probably because of the interactions
> with RPM_NOTIFY and runtime_break.  Once they are gone, the logic
> should be much more straightforward: test the flags, then do whatever 
> is needed based on the status.

I tried that, but it turned out to be insufficient, unless there are more
flags.  Well, perhaps adding more flags is the way to go.

> I think once these cleanups are made, the code will be a lot more 
> transparent.
> 
> In __pm_runtime_resume, don't assume that incrementing the parent's
> child_count will prevent the parent from suspending; also increment the
> resume_count.

It's incremented, but dropped too early.

> And don't forget to decrement the parent's child_count again if the resume
> fails.

I didn't _forget_it, because the device can't be RPM_SUSPENDED after
__pm_runtime_resume().

> In __pm_runtime_suspend, you should decrement the parent's child_count
> before releasing the child's lock.

Why exactly is that necessary?

> The pm_runtime_idle call should stay where it is, of course.
>
> One more thing: Don't use flush_work or its relatives -- it tends to
> cause deadlocks.

Oh, well.

> Use cancel_work_sync instead.

OK

Thanks for your comments, but I'm really afraid I won't be able to simplify
the code very much.  It's complicated, because the problem is complicated.

Best,
Rafael

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [linux-pm] [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 5)
  2009-06-25 21:58             ` [linux-pm] " Rafael J. Wysocki
@ 2009-06-25 23:17               ` Rafael J. Wysocki
  2009-06-25 23:17               ` Rafael J. Wysocki
                                 ` (2 subsequent siblings)
  3 siblings, 0 replies; 102+ messages in thread
From: Rafael J. Wysocki @ 2009-06-25 23:17 UTC (permalink / raw)
  To: Alan Stern
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Thursday 25 June 2009, Rafael J. Wysocki wrote:
> On Thursday 25 June 2009, Alan Stern wrote:
> > On Wed, 24 Jun 2009, Alan Stern wrote:
> > > More comments to follow when I get time to review more of the code...
> > 
> > Here we go.  This isn't so detailed, because I wasn't able to do a 
> > detailed review.  Frankly, the code is kind of a mess.
> > 
> > The whole business about the runtime_notify and RPM_NOTIFY flags is 
> > impenetrable.  My suggestion: Rename runtime_notify to notify_pending 
> > and eliminate RPM_NOTIFY.  Then make sure that notify_pending is set 
> > whenever a notify work item is queued.
> 
> I was going to do exactly that, but I realized it wouldn't work in general,
> because ->runtime_idle() could run __pm_runtime_suspend() in theory.
> 
> The runtime_notify bit is only needed for pm_runtime_disable() so that it
> knows there's a work item to cancel.
> 
> > The pm_notify_or_cancel_work routine should just be pm_notify_work.  
> > It's silly to submit a workqueue item just to cancel a delayed
> > workqueue item!
> 
> Maybe, but how do you think we should cancel it?  cancel_delayed_work()
> doesn't guarantee that the work structure used for queuing the work will
> not be accessed after it's returned and we can't schedule the next suspend
> request until we know it's safe.  So, we have to use cancel_delayed_work_sync()
> for that, which can't be done from interrupt context, so we need to do it in a
> work function.

BTW, the problem is this.

Say we queue an idle notification, so power.work is used for this purpose.
Then, suspend is requested, but we cannot cancel the notification
asynchronously, so we can only queue the suspend.  power.suspend_work is used
for that.

Now, assume a resume is requested, so seemingly we need to queue a resume
(the suspend request is pending and we have to cancel it with
cancel_delayed_work_sync()), but power.work is already in use.  There are
two ways to handle this IMO: (1) use yet another 'struct work' variable
(suboptimal) and (2) make the idle notification work function cancel the
suspend instead of running the notification (that's what I did and I don't see
how I can avoid it without doing (1)).

Best,
Rafael

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 5)
  2009-06-25 21:58             ` [linux-pm] " Rafael J. Wysocki
  2009-06-25 23:17               ` Rafael J. Wysocki
@ 2009-06-25 23:17               ` Rafael J. Wysocki
  2009-06-26 18:06               ` Alan Stern
  2009-06-26 18:06               ` [linux-pm] " Alan Stern
  3 siblings, 0 replies; 102+ messages in thread
From: Rafael J. Wysocki @ 2009-06-25 23:17 UTC (permalink / raw)
  To: Alan Stern
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Thursday 25 June 2009, Rafael J. Wysocki wrote:
> On Thursday 25 June 2009, Alan Stern wrote:
> > On Wed, 24 Jun 2009, Alan Stern wrote:
> > > More comments to follow when I get time to review more of the code...
> > 
> > Here we go.  This isn't so detailed, because I wasn't able to do a 
> > detailed review.  Frankly, the code is kind of a mess.
> > 
> > The whole business about the runtime_notify and RPM_NOTIFY flags is 
> > impenetrable.  My suggestion: Rename runtime_notify to notify_pending 
> > and eliminate RPM_NOTIFY.  Then make sure that notify_pending is set 
> > whenever a notify work item is queued.
> 
> I was going to do exactly that, but I realized it wouldn't work in general,
> because ->runtime_idle() could run __pm_runtime_suspend() in theory.
> 
> The runtime_notify bit is only needed for pm_runtime_disable() so that it
> knows there's a work item to cancel.
> 
> > The pm_notify_or_cancel_work routine should just be pm_notify_work.  
> > It's silly to submit a workqueue item just to cancel a delayed
> > workqueue item!
> 
> Maybe, but how do you think we should cancel it?  cancel_delayed_work()
> doesn't guarantee that the work structure used for queuing the work will
> not be accessed after it's returned and we can't schedule the next suspend
> request until we know it's safe.  So, we have to use cancel_delayed_work_sync()
> for that, which can't be done from interrupt context, so we need to do it in a
> work function.

BTW, the problem is this.

Say we queue an idle notification, so power.work is used for this purpose.
Then, suspend is requested, but we cannot cancel the notification
asynchronously, so we can only queue the suspend.  power.suspend_work is used
for that.

Now, assume a resume is requested, so seemingly we need to queue a resume
(the suspend request is pending and we have to cancel it with
cancel_delayed_work_sync()), but power.work is already in use.  There are
two ways to handle this IMO: (1) use yet another 'struct work' variable
(suboptimal) and (2) make the idle notification work function cancel the
suspend instead of running the notification (that's what I did and I don't see
how I can avoid it without doing (1)).

Best,
Rafael

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [linux-pm] [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 5)
  2009-06-25 21:58             ` [linux-pm] " Rafael J. Wysocki
                                 ` (2 preceding siblings ...)
  2009-06-26 18:06               ` Alan Stern
@ 2009-06-26 18:06               ` Alan Stern
  2009-06-26 20:46                 ` [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6) Rafael J. Wysocki
  2009-06-26 20:46                 ` Rafael J. Wysocki
  3 siblings, 2 replies; 102+ messages in thread
From: Alan Stern @ 2009-06-26 18:06 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Thu, 25 Jun 2009, Rafael J. Wysocki wrote:

> > The whole business about the runtime_notify and RPM_NOTIFY flags is 
> > impenetrable.  My suggestion: Rename runtime_notify to notify_pending 
> > and eliminate RPM_NOTIFY.  Then make sure that notify_pending is set 
> > whenever a notify work item is queued.
> 
> I was going to do exactly that, but I realized it wouldn't work in general,
> because ->runtime_idle() could run __pm_runtime_suspend() in theory.

I'll cut this short by noting the dilemma.  If the runtime_idle 
callback does a synchronous suspend, and __pm_runtime_suspend sees the 
status is already RPM_SUSPENDING, then it will wait for the suspend to 
finish.  Hence it's not safe to do cancel_work_sync from within 
__pm_runtime_suspend; it might deadlock.

It occurs to me that the problem would be solved if were a cancel_work
routine.  In the same vein, it ought to be possible for
cancel_delayed_work to run in interrupt context.  I'll see what can be
done.

What do you think about adding a version of pm_runtime_put that would 
call pm_runtime_idle directly when the counter reaches 0, instead of 
queuing an idle request?  I feel that drivers should have a choice 
about which sort of notification to use.


> > And don't forget to decrement the parent's child_count again if the resume
> > fails.
> 
> I didn't _forget_it, because the device can't be RPM_SUSPENDED after
> __pm_runtime_resume().

You're right; that fact escaped me.

> > In __pm_runtime_suspend, you should decrement the parent's child_count
> > before releasing the child's lock.
> 
> Why exactly is that necessary?

I guess it isn't.  But it won't hurt to keep the parent's counter
synchronized with the child's state as closely as possible.

Alan Stern


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 5)
  2009-06-25 21:58             ` [linux-pm] " Rafael J. Wysocki
  2009-06-25 23:17               ` Rafael J. Wysocki
  2009-06-25 23:17               ` Rafael J. Wysocki
@ 2009-06-26 18:06               ` Alan Stern
  2009-06-26 18:06               ` [linux-pm] " Alan Stern
  3 siblings, 0 replies; 102+ messages in thread
From: Alan Stern @ 2009-06-26 18:06 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Thu, 25 Jun 2009, Rafael J. Wysocki wrote:

> > The whole business about the runtime_notify and RPM_NOTIFY flags is 
> > impenetrable.  My suggestion: Rename runtime_notify to notify_pending 
> > and eliminate RPM_NOTIFY.  Then make sure that notify_pending is set 
> > whenever a notify work item is queued.
> 
> I was going to do exactly that, but I realized it wouldn't work in general,
> because ->runtime_idle() could run __pm_runtime_suspend() in theory.

I'll cut this short by noting the dilemma.  If the runtime_idle 
callback does a synchronous suspend, and __pm_runtime_suspend sees the 
status is already RPM_SUSPENDING, then it will wait for the suspend to 
finish.  Hence it's not safe to do cancel_work_sync from within 
__pm_runtime_suspend; it might deadlock.

It occurs to me that the problem would be solved if were a cancel_work
routine.  In the same vein, it ought to be possible for
cancel_delayed_work to run in interrupt context.  I'll see what can be
done.

What do you think about adding a version of pm_runtime_put that would 
call pm_runtime_idle directly when the counter reaches 0, instead of 
queuing an idle request?  I feel that drivers should have a choice 
about which sort of notification to use.


> > And don't forget to decrement the parent's child_count again if the resume
> > fails.
> 
> I didn't _forget_it, because the device can't be RPM_SUSPENDED after
> __pm_runtime_resume().

You're right; that fact escaped me.

> > In __pm_runtime_suspend, you should decrement the parent's child_count
> > before releasing the child's lock.
> 
> Why exactly is that necessary?

I guess it isn't.  But it won't hurt to keep the parent's counter
synchronized with the child's state as closely as possible.

Alan Stern

^ permalink raw reply	[flat|nested] 102+ messages in thread

* [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)
  2009-06-26 18:06               ` [linux-pm] " Alan Stern
  2009-06-26 20:46                 ` [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6) Rafael J. Wysocki
@ 2009-06-26 20:46                 ` Rafael J. Wysocki
  2009-06-26 21:13                   ` Alan Stern
  2009-06-26 21:13                   ` Alan Stern
  1 sibling, 2 replies; 102+ messages in thread
From: Rafael J. Wysocki @ 2009-06-26 20:46 UTC (permalink / raw)
  To: Alan Stern
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Friday 26 June 2009, Alan Stern wrote:
> On Thu, 25 Jun 2009, Rafael J. Wysocki wrote:
> 
> > > The whole business about the runtime_notify and RPM_NOTIFY flags is 
> > > impenetrable.  My suggestion: Rename runtime_notify to notify_pending 
> > > and eliminate RPM_NOTIFY.  Then make sure that notify_pending is set 
> > > whenever a notify work item is queued.
> > 
> > I was going to do exactly that, but I realized it wouldn't work in general,
> > because ->runtime_idle() could run __pm_runtime_suspend() in theory.
> 
> I'll cut this short by noting the dilemma.  If the runtime_idle 
> callback does a synchronous suspend, and __pm_runtime_suspend sees the 
> status is already RPM_SUSPENDING, then it will wait for the suspend to 
> finish.  Hence it's not safe to do cancel_work_sync from within 
> __pm_runtime_suspend; it might deadlock.

Exactly. 

> It occurs to me that the problem would be solved if were a cancel_work
> routine.  In the same vein, it ought to be possible for
> cancel_delayed_work to run in interrupt context.  I'll see what can be
> done.

Having looked at the workqueue code I'm not sure if there's a way to implement
that in a non-racy way.  Which may be the reason why there are no such
functions already. :-)

> What do you think about adding a version of pm_runtime_put that would 
> call pm_runtime_idle directly when the counter reaches 0, instead of 
> queuing an idle request?  I feel that drivers should have a choice 
> about which sort of notification to use.

There can be pm_runtime_put_atomic() that will queue the idle request
and pm_runtime_put() that will call pm_runtime_idle() directly, why not.

> > > And don't forget to decrement the parent's child_count again if the resume
> > > fails.
> > 
> > I didn't _forget_it, because the device can't be RPM_SUSPENDED after
> > __pm_runtime_resume().
> 
> You're right; that fact escaped me.
> 
> > > In __pm_runtime_suspend, you should decrement the parent's child_count
> > > before releasing the child's lock.
> > 
> > Why exactly is that necessary?
> 
> I guess it isn't.  But it won't hurt to keep the parent's counter
> synchronized with the child's state as closely as possible.

OK

In the meantime I reworked the patch (below) to use more RPM_* flags and I
removed the runtime_break and runtime_notify bits from it.  Also added some
comments to explain some non-obvious steps (hope that helps).

I also added the pm_runtime_put_atomic() and pm_runtime_put() as per the
comment above.

It seems to be a bit cleaner this way, but that's my personal view. :-)

Best,
Rafael

---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: PM: Introduce core framework for run-time PM of I/O devices (rev. 6)

Introduce a core framework for run-time power management of I/O
devices.  Add device run-time PM fields to 'struct dev_pm_info'
and device run-time PM callbacks to 'struct dev_pm_ops'.  Introduce
a run-time PM workqueue and define some device run-time PM helper
functions at the core level.  Document all these things.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 drivers/base/dd.c            |    9 
 drivers/base/power/Makefile  |    1 
 drivers/base/power/main.c    |   16 
 drivers/base/power/power.h   |   11 
 drivers/base/power/runtime.c |  734 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/pm.h           |  117 ++++++
 include/linux/pm_runtime.h   |  133 +++++++
 kernel/power/Kconfig         |   14 
 kernel/power/main.c          |   17 
 9 files changed, 1042 insertions(+), 10 deletions(-)

Index: linux-2.6/kernel/power/Kconfig
===================================================================
--- linux-2.6.orig/kernel/power/Kconfig
+++ linux-2.6/kernel/power/Kconfig
@@ -208,3 +208,17 @@ config APM_EMULATION
 	  random kernel OOPSes or reboots that don't seem to be related to
 	  anything, try disabling/enabling this option (or disabling/enabling
 	  APM in your BIOS).
+
+config PM_RUNTIME
+	bool "Run-time PM core functionality"
+	depends on PM
+	---help---
+	  Enable functionality allowing I/O devices to be put into energy-saving
+	  (low power) states at run time (or autosuspended) after a specified
+	  period of inactivity and woken up in response to a hardware-generated
+	  wake-up event or a driver's request.
+
+	  Hardware support is generally required for this functionality to work
+	  and the bus type drivers of the buses the devices are on are
+	  responsibile for the actual handling of the autosuspend requests and
+	  wake-up events.
Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -11,6 +11,7 @@
 #include <linux/kobject.h>
 #include <linux/string.h>
 #include <linux/resume-trace.h>
+#include <linux/workqueue.h>
 
 #include "power.h"
 
@@ -217,8 +218,24 @@ static struct attribute_group attr_group
 	.attrs = g,
 };
 
+#ifdef CONFIG_PM_RUNTIME
+struct workqueue_struct *pm_wq;
+
+static int __init pm_start_workqueue(void)
+{
+	pm_wq = create_freezeable_workqueue("pm");
+
+	return pm_wq ? 0 : -ENOMEM;
+}
+#else
+static inline int pm_start_workqueue(void) { return 0; }
+#endif
+
 static int __init pm_init(void)
 {
+	int error = pm_start_workqueue();
+	if (error)
+		return error;
 	power_kobj = kobject_create_and_add("power", NULL);
 	if (!power_kobj)
 		return -ENOMEM;
Index: linux-2.6/include/linux/pm.h
===================================================================
--- linux-2.6.orig/include/linux/pm.h
+++ linux-2.6/include/linux/pm.h
@@ -22,6 +22,9 @@
 #define _LINUX_PM_H
 
 #include <linux/list.h>
+#include <linux/workqueue.h>
+#include <linux/spinlock.h>
+#include <linux/completion.h>
 
 /*
  * Callbacks for platform drivers to implement.
@@ -165,6 +168,28 @@ typedef struct pm_message {
  * It is allowed to unregister devices while the above callbacks are being
  * executed.  However, it is not allowed to unregister a device from within any
  * of its own callbacks.
+ *
+ * There also are the following callbacks related to run-time power management
+ * of devices:
+ *
+ * @runtime_suspend: Prepare the device for a condition in which it won't be
+ *	able to communicate with the CPU(s) and RAM due to power management.
+ *	This need not mean that the device should be put into a low power state.
+ *	For example, if the device is behind a link which is about to be turned
+ *	off, the device may remain at full power.  Still, if the device does go
+ *	to low power and if device_may_wakeup(dev) is true, remote wake-up
+ *	(i.e. hardware mechanism allowing the device to request a change of its
+ *	power state, such as PCI PME) should be enabled for it.
+ *
+ * @runtime_resume: Put the device into the fully active state in response to a
+ *	wake-up event generated by hardware or at a request of software.  If
+ *	necessary, put the device into the full power state and restore its
+ *	registers, so that it is fully operational.
+ *
+ * @runtime_idle: Device appears to be inactive and it might be put into a low
+ *	power state if all of the necessary conditions are satisfied.  Check
+ *	these conditions and handle the device as appropriate, possibly queueing
+ *	a suspend request for it.
  */
 
 struct dev_pm_ops {
@@ -182,6 +207,9 @@ struct dev_pm_ops {
 	int (*thaw_noirq)(struct device *dev);
 	int (*poweroff_noirq)(struct device *dev);
 	int (*restore_noirq)(struct device *dev);
+	int (*runtime_suspend)(struct device *dev);
+	int (*runtime_resume)(struct device *dev);
+	void (*runtime_idle)(struct device *dev);
 };
 
 /**
@@ -315,14 +343,97 @@ enum dpm_state {
 	DPM_OFF_IRQ,
 };
 
+/**
+ * Device run-time power management state.
+ *
+ * These state labels are used internally by the PM core to indicate the current
+ * status of a device with respect to the PM core operations.  They do not
+ * reflect the actual power state of the device or its status as seen by the
+ * driver.
+ *
+ * RPM_ACTIVE		Device is fully operational, no run-time PM requests are
+ *			pending for it.
+ *
+ * RPM_NOTIFY		Idle notification has been scheduled for the device.
+ *
+ * RPM_NOTIFYING	Device bus type's ->runtime_idle() callback is being
+ *			executed (as a result of a scheduled idle notification
+ *			request).
+ *
+ * RPM_IDLE		It has been requested that the device be suspended.
+ *			Suspend request has been put into the run-time PM
+ *			workqueue and it's pending execution.
+ *
+ * RPM_SUSPEND		Attempt to suspend the device has started (as a result
+ *			of a scheduled request or synchronously), but the device
+ *			bus type's ->runtime_suspend() callback has not been
+ *			executed yet.
+ *
+ * RPM_SUSPENDING	Device bus type's ->runtime_suspend() callback is being
+ *			executed.
+ *
+ * RPM_SUSPENDED	Device bus type's ->runtime_suspend() callback has
+ *			completed successfully.  The device is regarded as
+ *			suspended.
+ *
+ * RPM_WAKE		It has been requested that the device be woken up.
+ *			Resume request has been put into the run-time PM
+ *			workqueue and it's pending execution.
+ *
+ * RPM_RESUME		Attempt to wake up the device has started (as a result
+ *			of a scheduled request or synchronously), but the device
+ *			bus type's ->runtime_resume() callback has not been
+ *			executed yet.
+ *
+ * RPM_RESUMING		Device bus type's ->runtime_resume() callback is being
+ *			executed.
+ *
+ * RPM_ERROR		Represents a condition from which the PM core cannot
+ *			recover by itself.  If the device's run-time PM status
+ *			field has this value, all of the run-time PM operations
+ *			carried out for the device by the core will fail, until
+ *			the status field is changed to either RPM_ACTIVE or
+ *			RPM_SUSPENDED (it is not valid to use the other values
+ *			in such a situation) by the device's driver or bus type.
+ *			This happens when the device bus type's
+ *			->runtime_suspend() or ->runtime_resume() callback
+ *			returns error code different from -EAGAIN or -EBUSY.
+ */
+
+#define RPM_ACTIVE	0
+
+#define RPM_NOTIFY	0x001
+#define RPM_NOTIFYING	0x002
+#define RPM_IDLE	0x004
+#define RPM_SUSPEND	0x008
+#define RPM_SUSPENDING	0x010
+#define RPM_SUSPENDED	0x020
+#define RPM_WAKE	0x040
+#define RPM_RESUME	0x080
+#define RPM_RESUMING	0x100
+
+#define RPM_ERROR	0x1FF
+
 struct dev_pm_info {
 	pm_message_t		power_state;
-	unsigned		can_wakeup:1;
-	unsigned		should_wakeup:1;
+	unsigned int		can_wakeup:1;
+	unsigned int		should_wakeup:1;
 	enum dpm_state		status;		/* Owned by the PM core */
-#ifdef	CONFIG_PM_SLEEP
+#ifdef CONFIG_PM_SLEEP
 	struct list_head	entry;
 #endif
+#ifdef CONFIG_PM_RUNTIME
+	struct delayed_work	suspend_work;
+	struct work_struct	work;
+	struct completion	work_done;
+	unsigned int		ignore_children:1;
+	unsigned int		runtime_disabled:1;
+	unsigned int		runtime_status;
+	int			runtime_error;
+	atomic_t		resume_count;
+	atomic_t		child_count;
+	spinlock_t		lock;
+#endif
 };
 
 /*
Index: linux-2.6/drivers/base/power/Makefile
===================================================================
--- linux-2.6.orig/drivers/base/power/Makefile
+++ linux-2.6/drivers/base/power/Makefile
@@ -1,5 +1,6 @@
 obj-$(CONFIG_PM)	+= sysfs.o
 obj-$(CONFIG_PM_SLEEP)	+= main.o
+obj-$(CONFIG_PM_RUNTIME)	+= runtime.o
 obj-$(CONFIG_PM_TRACE_RTC)	+= trace.o
 
 ccflags-$(CONFIG_DEBUG_DRIVER) := -DDEBUG
Index: linux-2.6/drivers/base/power/runtime.c
===================================================================
--- /dev/null
+++ linux-2.6/drivers/base/power/runtime.c
@@ -0,0 +1,734 @@
+/*
+ * drivers/base/power/runtime.c - Helper functions for device run-time PM
+ *
+ * Copyright (c) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
+ *
+ * This file is released under the GPLv2.
+ */
+
+#include <linux/pm_runtime.h>
+#include <linux/jiffies.h>
+
+/**
+ * pm_runtime_idle - Notify device bus type if the device can be suspended.
+ * @dev: Device to notify the bus type about.
+ *
+ * It is possible that suspend request was scheduled and resume was requested
+ * before this function has a chance to run.  If there's a suspend request
+ * pending only, return doing nothing, but if resume was requested in addition
+ * to it, cancel the suspend request.
+ */
+void pm_runtime_idle(struct device *dev)
+{
+	unsigned long flags;
+
+	might_sleep();
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.runtime_status == RPM_ERROR)
+		goto out;
+
+	if (dev->power.runtime_status & ~(RPM_NOTIFY|RPM_WAKE))
+		/*
+		 * Device suspended or run-time PM operation in progress. The
+		 * RPM_NOTIFY bit should have been cleared in that case.
+		 */
+		goto out;
+
+	dev->power.runtime_status &= ~RPM_NOTIFY;
+
+	if (dev->power.runtime_status == RPM_WAKE) {
+		/*
+		 * Resume has been requested, and because all of the suspend
+		 * status bits are clear, there must be a suspend request
+		 * pending.  We have to cancel that request.
+		 */
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		cancel_delayed_work_sync(&dev->power.suspend_work);
+
+		spin_lock_irqsave(&dev->power.lock, flags);
+
+		/*
+		 * Return if someone else has changed the status.  Otherwise,
+		 * the idle notification may still be worth running.
+		 */
+		if (dev->power.runtime_status != RPM_WAKE)
+			goto out;
+	}
+
+	if (!pm_suspend_possible(dev))
+		goto out;
+
+	dev->power.runtime_status = RPM_NOTIFYING;
+
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_idle)
+		dev->bus->pm->runtime_idle(dev);
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	/* The status might have been changed while executing runtime_idle(). */
+	dev->power.runtime_status &= ~RPM_NOTIFYING;
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_idle);
+
+/**
+ * pm_runtime_idle_work - Run pm_runtime_idle() via pm_wq.
+ * @work: Work structure used for scheduling the execution of this function.
+ *
+ * Use @work to get the device object the idle notification has been scheduled
+ * for and run pm_runtime_idle() for it.
+ */
+static void pm_runtime_idle_work(struct work_struct *work)
+{
+	pm_runtime_idle(pm_work_to_device(work));
+}
+
+/**
+ * pm_runtime_put_atomic - Decrement resume counter and queue idle notification.
+ * @dev: Device to handle.
+ *
+ * Decrement the device's resume counter, check if the device's run-time PM
+ * status is right for suspending and queue up a request to run
+ * pm_runtime_idle() for it.
+ */
+void pm_runtime_put_atomic(struct device *dev)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (!__pm_runtime_put(dev)) {
+		dev_WARN(dev, "Unbalanced %s", __func__);
+		goto out;
+	}
+
+	if (dev->power.runtime_status != RPM_ACTIVE)
+		goto out;
+
+	/*
+	 * The notification is asynchronous so that this function can be called
+	 * from interrupt context.
+	 */
+	dev->power.runtime_status = RPM_NOTIFY;
+	INIT_WORK(&dev->power.work, pm_runtime_idle_work);
+	queue_work(pm_wq, &dev->power.work);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_put_atomic);
+
+/**
+ * pm_runtime_put - Decrement resume counter and run idle notification.
+ * @dev: Device to handle.
+ *
+ * Decrement the device's resume counter and run pm_runtime_idle() for it.
+ */
+void pm_runtime_put(struct device *dev)
+{
+	if (!__pm_runtime_put(dev)) {
+		dev_WARN(dev, "Unbalanced %s", __func__);
+		return;
+	}
+
+	pm_runtime_idle(dev);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_put);
+
+/**
+ * __pm_runtime_suspend - Carry out run-time suspend of given device.
+ * @dev: Device to suspend.
+ * @sync: If unset, the funtion has been called via pm_wq.
+ *
+ * Check if the device can be suspended and run the ->runtime_suspend() callback
+ * provided by its bus type.  If another suspend has been started earlier, wait
+ * for it to finish.  If there's an idle notification pending, cancel it.  If
+ * there's a suspend request scheduled while this function is running and @sync
+ * is 'true', cancel that request.
+ */
+int __pm_runtime_suspend(struct device *dev, bool sync)
+{
+	struct device *parent = NULL;
+	unsigned long flags;
+	bool cancel_pending = false;
+	int error = -EINVAL;
+
+	might_sleep();
+
+ repeat:
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.runtime_status == RPM_ERROR)
+		goto out;
+
+	if (dev->power.runtime_status & RPM_SUSPENDED) {
+		/* Device suspended, nothing to do. */
+		error = 0;
+		goto out;
+	}
+
+	if (dev->power.runtime_status & RPM_SUSPENDING) {
+		/* Another suspend is running in parallel with us. */
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		wait_for_completion(&dev->power.work_done);
+		return dev->power.runtime_error;
+	}
+
+	if (dev->power.runtime_status & (RPM_WAKE|RPM_RESUME|RPM_RESUMING)) {
+		/* Resume is scheduled or in progress. */
+		error = -EAGAIN;
+		goto out;
+	}
+
+	/*
+	 * If there's a suspend request pending and we're not running as a
+	 * result of it, the request has to be cancelled, because it may be
+	 * scheduled in the future and we can't leave it behind us.
+	 */
+	if (sync && (dev->power.runtime_status & RPM_IDLE))
+		cancel_pending = true;
+
+	/* Clear the suspend status bits in case we have to return. */
+	dev->power.runtime_status &= ~(RPM_IDLE|RPM_SUSPEND);
+
+	if (atomic_read(&dev->power.resume_count) > 0
+	    || dev->power.runtime_disabled) {
+		/* We are forbidden to suspend. */
+		error = -EAGAIN;
+		goto out;
+	}
+
+	if (!pm_children_suspended(dev)) {
+		/*
+		 * We can only suspend the device if all of its children have
+		 * been suspended.
+		 */
+		error = -EBUSY;
+		goto out;
+	}
+
+	/*
+	 * Set RPM_SUSPEND in case we have to start over, to prevent idle
+	 * notifications from happening and new suspend requests from being
+	 * scheduled.
+	 */
+	dev->power.runtime_status |= RPM_SUSPEND;
+
+	if (cancel_pending) {
+		/* Cancel the concurrent pending suspend request. */
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		cancel_delayed_work_sync(&dev->power.suspend_work);
+		goto repeat;
+	}
+
+	if (dev->power.runtime_status & RPM_NOTIFY) {
+		/* Idle notification is pending, cancel it. */
+		dev->power.runtime_status &= ~RPM_NOTIFY;
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		cancel_work_sync(&dev->power.work);
+		goto repeat;
+	}
+
+	dev->power.runtime_status &= ~RPM_SUSPEND;
+	dev->power.runtime_status |= RPM_SUSPENDING;
+	init_completion(&dev->power.work_done);
+
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_suspend)
+		error = dev->bus->pm->runtime_suspend(dev);
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	switch (error) {
+	case 0:
+		dev->power.runtime_status &= ~RPM_SUSPENDING;
+		dev->power.runtime_status |= RPM_SUSPENDED;
+		break;
+	case -EAGAIN:
+	case -EBUSY:
+		dev->power.runtime_status &= RPM_NOTIFYING;
+		break;
+	default:
+		dev->power.runtime_status = RPM_ERROR;
+	}
+	dev->power.runtime_error = error;
+	complete_all(&dev->power.work_done);
+
+	if (!error && !(dev->power.runtime_status & RPM_WAKE) && dev->parent) {
+		parent = dev->parent;
+		atomic_dec(&parent->power.child_count);
+	}
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	if (parent && !parent->power.ignore_children)
+		pm_runtime_idle(parent);
+
+	if (error == -EBUSY || error == -EAGAIN)
+		pm_runtime_idle(dev);
+
+	return error;
+}
+EXPORT_SYMBOL_GPL(__pm_runtime_suspend);
+
+/**
+ * pm_runtime_suspend_work - Run __pm_runtime_suspend() for a device.
+ * @work: Work structure used for scheduling the execution of this function.
+ *
+ * Use @work to get the device object the work has been scheduled for and run
+ * __pm_runtime_suspend() for it.
+ */
+static void pm_runtime_suspend_work(struct work_struct *work)
+{
+	__pm_runtime_suspend(suspend_work_to_device(work), false);
+}
+
+/**
+ * pm_request_suspend - Schedule run-time suspend of given device.
+ * @dev: Device to suspend.
+ * @msec: Time to wait before attempting to suspend the device, in milliseconds.
+ */
+int pm_request_suspend(struct device *dev, unsigned int msec)
+{
+	unsigned long flags;
+	unsigned long delay = msecs_to_jiffies(msec);
+	int error = 0;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.runtime_status == RPM_ERROR)
+		error = -EINVAL;
+	else if (dev->power.runtime_status & RPM_SUSPENDED)
+		/* Device is suspended, nothing to do. */
+		error = -ECANCELED;
+	else if (atomic_read(&dev->power.resume_count) > 0
+	    || dev->power.runtime_disabled
+	    || (dev->power.runtime_status & (RPM_WAKE|RPM_RESUME|RPM_RESUMING)))
+		/* Can't suspend now. */
+		error = -EAGAIN;
+	else if (dev->power.runtime_status &
+				(RPM_IDLE|RPM_SUSPEND|RPM_SUSPENDING))
+		/* Already suspending or suspend request pending. */
+		error = -EINPROGRESS;
+	else if (!pm_children_suspended(dev))
+		error = -EBUSY;
+	if (error)
+		goto out;
+
+	dev->power.runtime_status |= RPM_IDLE;
+	queue_delayed_work(pm_wq, &dev->power.suspend_work, delay);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	return error;
+}
+EXPORT_SYMBOL_GPL(pm_request_suspend);
+
+/**
+ * __pm_runtime_resume - Carry out run-time resume of given device.
+ * @dev: Device to resume.
+ * @sync: If unset, the funtion has been called via pm_wq.
+ *
+ * Check if the device can be woken up and run the ->runtime_resume() callback
+ * provided by its bus type.  If another resume has been started earlier, wait
+ * for it to finish.  If there's a suspend running in parallel with this
+ * function, wait for it to finish and resume the device.  If there's a suspend
+ * request or idle notification pending, cancel it.  If there's a resume request
+ * scheduled while this function is running and @sync is 'true', cancel that
+ * request.
+ */
+int __pm_runtime_resume(struct device *dev, bool sync)
+{
+	struct device *parent = dev->parent;
+	unsigned long flags;
+	bool put_parent = false;
+	int error = -EINVAL;
+
+	might_sleep();
+
+ repeat:
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.runtime_status == RPM_ERROR)
+		goto out;
+
+	if (!(dev->power.runtime_status & ~RPM_NOTIFYING)) {
+		/* Device is operational, nothing to do. */
+		error = 0;
+		goto out;
+	}
+
+	if (dev->power.runtime_status & RPM_RESUMING) {
+		/*
+		 * There's another resume running in parallel with us. Wait for
+		 * it to complete and return.
+		 */
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		wait_for_completion(&dev->power.work_done);
+		error = dev->power.runtime_error;
+		goto out_parent;
+	}
+
+	if (dev->power.runtime_disabled) {
+		/* Clear the resume flags before returning. */
+		dev->power.runtime_status &= ~(RPM_WAKE|RPM_RESUME);
+		error = -EAGAIN;
+		goto out;
+	}
+
+	/*
+	 * Set RPM_RESUME in case we have to start over, to prevent suspends and
+	 * idle notifications from happening and new resume requests from being
+	 * queued up.
+	 */
+	dev->power.runtime_status |= RPM_RESUME;
+
+	if (dev->power.runtime_status & RPM_SUSPENDING) {
+		/*
+		 * Suspend is running in parallel with us.  Wait for it to
+		 * complete and repeat.
+		 */
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		wait_for_completion(&dev->power.work_done);
+		goto repeat;
+	}
+
+	if ((dev->power.runtime_status & (RPM_IDLE|RPM_WAKE))
+	    && !(dev->power.runtime_status &
+				(RPM_SUSPEND|RPM_SUSPENDING|RPM_SUSPENDED))) {
+		/* Suspend request is pending that we're supposed to cancel. */
+		dev->power.runtime_status &= ~RPM_IDLE;
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		cancel_delayed_work_sync(&dev->power.suspend_work);
+		goto repeat;
+	}
+
+	/*
+	 * Clear RPM_SUSPEND in case we've been running in parallel with
+	 * __pm_runtime_suspend().
+	 */
+	dev->power.runtime_status &= ~RPM_SUSPEND;
+
+	if ((sync && (dev->power.runtime_status & RPM_WAKE))
+	    || (dev->power.runtime_status & RPM_NOTIFY)) {
+		/*
+		 * Idle notification is pending and since we're running the
+		 * device is not idle, or there's a resume request pending and
+		 * we're not running as a result of it.  In both cases it's
+		 * better to cancel the request.
+		 */
+		dev->power.runtime_status &= ~(RPM_NOTIFY|RPM_WAKE);
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		cancel_work_sync(&dev->power.work);
+		goto repeat;
+	}
+
+	/* Clear the resume status flags in case we have to return. */
+	dev->power.runtime_status &= ~(RPM_WAKE|RPM_RESUME);
+
+	if (!(dev->power.runtime_status & RPM_SUSPENDED)) {
+		/*
+		 * If the device is not suspended at this point, we have
+		 * nothing to do.
+		 */
+		error = 0;
+		goto out;
+	}
+
+	if (!put_parent && parent) {
+		/*
+		 * Increase the parent's resume counter and request that it be
+		 * woken up if necessary.
+		 */
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		put_parent = true;
+		pm_runtime_get(parent);
+		error = pm_runtime_resume(parent);
+		if (error)
+			goto out_parent;
+
+		error = -EINVAL;
+		dev->power.runtime_status |= RPM_RESUME;
+		goto repeat;
+	}
+
+	dev->power.runtime_status &= ~RPM_SUSPENDED;
+
+	if (parent)
+		atomic_inc(&parent->power.child_count);
+
+	dev->power.runtime_status |= RPM_RESUMING;
+	init_completion(&dev->power.work_done);
+
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_resume)
+		error = dev->bus->pm->runtime_resume(dev);
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	dev->power.runtime_status &= ~RPM_RESUMING;
+	if (error)
+		dev->power.runtime_status = RPM_ERROR;
+	dev->power.runtime_error = error;
+	complete_all(&dev->power.work_done);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+ out_parent:
+	if (put_parent)
+		pm_runtime_put(parent);
+
+	if (!error)
+		pm_runtime_idle(dev);
+
+	return error;
+}
+EXPORT_SYMBOL_GPL(pm_runtime_resume);
+
+/**
+ * pm_runtime_work - Run __pm_runtime_resume() for a device.
+ * @work: Work structure used for scheduling the execution of this function.
+ *
+ * Use @work to get the device object the work has been scheduled for and run
+ * __pm_runtime_resume() for it.
+ */
+static void pm_runtime_work(struct work_struct *work)
+{
+	__pm_runtime_resume(pm_work_to_device(work), false);
+}
+
+/**
+ * pm_request_resume - Schedule run-time resume of given device.
+ * @dev: Device to resume.
+ */
+int pm_request_resume(struct device *dev)
+{
+	struct device *parent = dev->parent;
+	unsigned long flags;
+	int error = 0;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.runtime_status == RPM_ERROR) {
+		error = -EINVAL;
+	} else if (dev->power.runtime_disabled) {
+		error = -EAGAIN;
+	} else if (!(dev->power.runtime_status & ~RPM_NOTIFYING)) {
+		/* Device is operational, nothing to do. */
+		error = -ECANCELED;
+	} else if (dev->power.runtime_status & RPM_NOTIFY) {
+		/*
+		 * Device has an idle notification pending, which is not a
+		 * problem unless there's a suspend request pending in addition
+		 * to it.  In that case, ask the idle notification work function
+		 * to cancel the suspend request.
+		 */
+		if (dev->power.runtime_status & RPM_IDLE) {
+			dev->power.runtime_status &= ~RPM_IDLE;
+			dev->power.runtime_status |= RPM_WAKE;
+			error = -EALREADY;
+		} else {
+			error = -ECANCELED;
+		}
+	} else if (dev->power.runtime_status &
+				(RPM_WAKE|RPM_RESUME|RPM_RESUMING)) {
+		error = -EINPROGRESS;
+	}
+	if (error)
+		goto out;
+
+	if (dev->power.runtime_status & RPM_IDLE) {
+		/* Suspend request is pending.  Make sure it won't run. */
+		dev->power.runtime_status &= ~RPM_IDLE;
+		INIT_WORK(&dev->power.work, pm_runtime_idle_work);
+		error = -EALREADY;
+		goto queue;
+	}
+
+	if ((dev->power.runtime_status & RPM_SUSPENDED) && parent)
+		atomic_inc(&parent->power.child_count);
+
+	INIT_WORK(&dev->power.work, pm_runtime_work);
+
+ queue:
+	dev->power.runtime_status |= RPM_WAKE;
+	queue_work(pm_wq, &dev->power.work);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	return error;
+}
+EXPORT_SYMBOL_GPL(pm_request_resume);
+
+/**
+ * __pm_runtime_clear_status - Change the run-time PM status of a device.
+ * @dev: Device to handle.
+ * @status: New value of the device's run-time PM status.
+ *
+ * Change the run-time PM status of the device to @status, which must be
+ * either RPM_ACTIVE or RPM_SUSPENDED, if its current value is equal to
+ * RPM_ERROR.
+ */
+void __pm_runtime_clear_status(struct device *dev, unsigned int status)
+{
+	struct device *parent = dev->parent;
+	unsigned long flags;
+
+	if (status & ~RPM_SUSPENDED)
+		return;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.runtime_status != RPM_ERROR)
+		goto out;
+
+	dev->power.runtime_status = status;
+	if (status == RPM_SUSPENDED && parent)
+		atomic_dec(&parent->power.child_count);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+EXPORT_SYMBOL_GPL(__pm_runtime_clear_status);
+
+/**
+ * pm_runtime_enable - Enable run-time PM of a device.
+ * @dev: Device to handle.
+ */
+void pm_runtime_enable(struct device *dev)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (!dev->power.runtime_disabled)
+		goto out;
+
+	if (!__pm_runtime_put(dev))
+		dev_WARN(dev, "Unbalanced counter decrementation");
+
+	if (!atomic_read(&dev->power.resume_count))
+		dev->power.runtime_disabled = false;
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_enable);
+
+/**
+ * pm_runtime_disable - Disable run-time PM of a device.
+ * @dev: Device to handle.
+ *
+ * Set the power.runtime_disabled flag for the device, cancel all pending
+ * run-time PM requests for it and wait for operations in progress to complete.
+ * The device can be either active or suspended after its run-time PM has been
+ * disabled.
+ */
+void pm_runtime_disable(struct device *dev)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	pm_runtime_get(dev);
+
+	if (dev->power.runtime_disabled)
+		goto out;
+
+	dev->power.runtime_disabled = true;
+
+	if (dev->power.runtime_status & (RPM_IDLE|RPM_WAKE)
+	    && !(dev->power.runtime_status &
+				(RPM_SUSPEND|RPM_SUSPENDING|RPM_SUSPENDED))) {
+		/* Suspend request pending. */
+		dev->power.runtime_status &= ~RPM_IDLE;
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		cancel_delayed_work_sync(&dev->power.suspend_work);
+
+		spin_lock_irqsave(&dev->power.lock, flags);
+	}
+
+	if (dev->power.runtime_status & (RPM_WAKE|RPM_NOTIFY)) {
+		/* Resume request pending or idle notification. */
+		dev->power.runtime_status &= ~(RPM_WAKE|RPM_NOTIFY);
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		cancel_work_sync(&dev->power.work);
+
+		spin_lock_irqsave(&dev->power.lock, flags);
+	}
+
+	dev->power.runtime_status &= ~RPM_NOTIFYING;
+
+	if (dev->power.runtime_status & (RPM_SUSPENDING|RPM_RESUMING)) {
+		/* Suspend or wake-up in progress. */
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		wait_for_completion(&dev->power.work_done);
+		return;
+	}
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_disable);
+
+/**
+ * pm_runtime_init - Initialize run-time PM fields in given device object.
+ * @dev: Device object to initialize.
+ */
+void pm_runtime_init(struct device *dev)
+{
+	spin_lock_init(&dev->power.lock);
+
+	dev->power.runtime_status = RPM_ACTIVE;
+	dev->power.runtime_disabled = true;
+	atomic_set(&dev->power.resume_count, 1);
+
+	atomic_set(&dev->power.child_count, 0);
+	pm_suspend_ignore_children(dev, false);
+
+	INIT_DELAYED_WORK(&dev->power.suspend_work, pm_runtime_suspend_work);
+}
+
+/**
+ * pm_runtime_add - Update run-time PM fields of a device while adding it.
+ * @dev: Device object being added to device hierarchy.
+ */
+void pm_runtime_add(struct device *dev)
+{
+	if (dev->parent)
+		atomic_inc(&dev->parent->power.child_count);
+}
Index: linux-2.6/include/linux/pm_runtime.h
===================================================================
--- /dev/null
+++ linux-2.6/include/linux/pm_runtime.h
@@ -0,0 +1,133 @@
+/*
+ * pm_runtime.h - Device run-time power management helper functions.
+ *
+ * Copyright (C) 2009 Rafael J. Wysocki <rjw@sisk.pl>
+ *
+ * This file is released under the GPLv2.
+ */
+
+#ifndef _LINUX_PM_RUNTIME_H
+#define _LINUX_PM_RUNTIME_H
+
+#include <linux/device.h>
+#include <linux/pm.h>
+
+#ifdef CONFIG_PM_RUNTIME
+
+extern struct workqueue_struct *pm_wq;
+
+extern void pm_runtime_init(struct device *dev);
+extern void pm_runtime_add(struct device *dev);
+extern void pm_runtime_idle(struct device *dev);
+extern void pm_runtime_put_atomic(struct device *dev);
+extern void pm_runtime_put(struct device *dev);
+extern int __pm_runtime_suspend(struct device *dev, bool sync);
+extern int pm_request_suspend(struct device *dev, unsigned int msec);
+extern int __pm_runtime_resume(struct device *dev, bool sync);
+extern int pm_request_resume(struct device *dev);
+extern void __pm_runtime_clear_status(struct device *dev, unsigned int status);
+extern void pm_runtime_enable(struct device *dev);
+extern void pm_runtime_disable(struct device *dev);
+
+static inline struct device *suspend_work_to_device(struct work_struct *work)
+{
+	struct delayed_work *dw = to_delayed_work(work);
+
+	return container_of(dw, struct device, power.suspend_work);
+}
+
+static inline struct device *pm_work_to_device(struct work_struct *work)
+{
+	return container_of(work, struct device, power.work);
+}
+
+static inline void pm_runtime_get(struct device *dev)
+{
+	atomic_inc(&dev->power.resume_count);
+}
+
+static inline bool __pm_runtime_put(struct device *dev)
+{
+	return !!atomic_add_unless(&dev->power.resume_count, -1, 0);
+}
+
+static inline bool pm_children_suspended(struct device *dev)
+{
+	return dev->power.ignore_children
+		|| !atomic_read(&dev->power.child_count);
+}
+
+static inline bool pm_suspend_possible(struct device *dev)
+{
+	return pm_children_suspended(dev)
+		&& !atomic_read(&dev->power.resume_count)
+		&& !dev->power.runtime_disabled;
+}
+
+static inline void pm_suspend_ignore_children(struct device *dev, bool enable)
+{
+	dev->power.ignore_children = enable;
+}
+
+#else /* !CONFIG_PM_RUNTIME */
+
+static inline void pm_runtime_init(struct device *dev) {}
+static inline void pm_runtime_add(struct device *dev) {}
+static inline void pm_runtime_idle(struct device *dev) {}
+static inline void pm_runtime_put_atomic(struct device *dev) {}
+static inline void pm_runtime_put(struct device *dev) {}
+static inline int __pm_runtime_suspend(struct device *dev, bool sync)
+{
+	return -ENOSYS;
+}
+static inline int pm_request_suspend(struct device *dev, unsigned int msec)
+{
+	return -ENOSYS;
+}
+static inline int __pm_runtime_resume(struct device *dev, bool sync)
+{
+	return -ENOSYS;
+}
+static inline int pm_request_resume(struct device *dev)
+{
+	return -ENOSYS;
+}
+static inline void __pm_runtime_clear_status(struct device *dev,
+					      unsigned int status) {}
+static inline void pm_runtime_enable(struct device *dev) {}
+static inline void pm_runtime_disable(struct device *dev) {}
+
+static inline void pm_runtime_get(struct device *dev) {}
+static inline bool __pm_runtime_put(struct device *dev) { return true; }
+static inline bool pm_children_suspended(struct device *dev) { return false; }
+static inline bool pm_suspend_possible(struct device *dev) { return false; }
+static inline void pm_suspend_ignore_children(struct device *dev, bool en) {}
+
+#endif /* !CONFIG_PM_RUNTIME */
+
+static inline int pm_runtime_suspend(struct device *dev)
+{
+	return __pm_runtime_suspend(dev, true);
+}
+
+static inline int pm_runtime_resume(struct device *dev)
+{
+	return __pm_runtime_resume(dev, true);
+}
+
+static inline void pm_runtime_clear_active(struct device *dev)
+{
+	__pm_runtime_clear_status(dev, RPM_ACTIVE);
+}
+
+static inline void pm_runtime_clear_suspended(struct device *dev)
+{
+	__pm_runtime_clear_status(dev, RPM_SUSPENDED);
+}
+
+static inline void pm_runtime_remove(struct device *dev)
+{
+	pm_runtime_disable(dev);
+}
+
+#endif
Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -21,6 +21,7 @@
 #include <linux/kallsyms.h>
 #include <linux/mutex.h>
 #include <linux/pm.h>
+#include <linux/pm_runtime.h>
 #include <linux/resume-trace.h>
 #include <linux/rwsem.h>
 #include <linux/interrupt.h>
@@ -49,6 +50,16 @@ static DEFINE_MUTEX(dpm_list_mtx);
 static bool transition_started;
 
 /**
+ * device_pm_init - Initialize the PM-related part of a device object
+ * @dev: Device object to initialize.
+ */
+void device_pm_init(struct device *dev)
+{
+	dev->power.status = DPM_ON;
+	pm_runtime_init(dev);
+}
+
+/**
  *	device_pm_lock - lock the list of active devices used by the PM core
  */
 void device_pm_lock(void)
@@ -88,6 +99,7 @@ void device_pm_add(struct device *dev)
 	}
 
 	list_add_tail(&dev->power.entry, &dpm_list);
+	pm_runtime_add(dev);
 	mutex_unlock(&dpm_list_mtx);
 }
 
@@ -104,6 +116,7 @@ void device_pm_remove(struct device *dev
 		 kobject_name(&dev->kobj));
 	mutex_lock(&dpm_list_mtx);
 	list_del_init(&dev->power.entry);
+	pm_runtime_remove(dev);
 	mutex_unlock(&dpm_list_mtx);
 }
 
@@ -507,6 +520,7 @@ static void dpm_complete(pm_message_t st
 		get_device(dev);
 		if (dev->power.status > DPM_ON) {
 			dev->power.status = DPM_ON;
+			pm_runtime_enable(dev);
 			mutex_unlock(&dpm_list_mtx);
 
 			device_complete(dev, state);
@@ -753,6 +767,7 @@ static int dpm_prepare(pm_message_t stat
 
 		get_device(dev);
 		dev->power.status = DPM_PREPARING;
+		pm_runtime_disable(dev);
 		mutex_unlock(&dpm_list_mtx);
 
 		error = device_prepare(dev, state);
@@ -760,6 +775,7 @@ static int dpm_prepare(pm_message_t stat
 		mutex_lock(&dpm_list_mtx);
 		if (error) {
 			dev->power.status = DPM_ON;
+			pm_runtime_enable(dev);
 			if (error == -EAGAIN) {
 				put_device(dev);
 				continue;
Index: linux-2.6/drivers/base/dd.c
===================================================================
--- linux-2.6.orig/drivers/base/dd.c
+++ linux-2.6/drivers/base/dd.c
@@ -23,6 +23,7 @@
 #include <linux/kthread.h>
 #include <linux/wait.h>
 #include <linux/async.h>
+#include <linux/pm_runtime.h>
 
 #include "base.h"
 #include "power/power.h"
@@ -202,8 +203,12 @@ int driver_probe_device(struct device_dr
 	pr_debug("bus: '%s': %s: matched device %s with driver %s\n",
 		 drv->bus->name, __func__, dev_name(dev), drv->name);
 
+	pm_runtime_disable(dev);
+
 	ret = really_probe(dev, drv);
 
+	pm_runtime_enable(dev);
+
 	return ret;
 }
 
@@ -306,6 +311,8 @@ static void __device_release_driver(stru
 
 	drv = dev->driver;
 	if (drv) {
+		pm_runtime_disable(dev);
+
 		driver_sysfs_remove(dev);
 
 		if (dev->bus)
@@ -324,6 +331,8 @@ static void __device_release_driver(stru
 			blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
 						     BUS_NOTIFY_UNBOUND_DRIVER,
 						     dev);
+
+		pm_runtime_enable(dev);
 	}
 }
 
Index: linux-2.6/drivers/base/power/power.h
===================================================================
--- linux-2.6.orig/drivers/base/power/power.h
+++ linux-2.6/drivers/base/power/power.h
@@ -1,8 +1,3 @@
-static inline void device_pm_init(struct device *dev)
-{
-	dev->power.status = DPM_ON;
-}
-
 #ifdef CONFIG_PM_SLEEP
 
 /*
@@ -16,14 +11,16 @@ static inline struct device *to_device(s
 	return container_of(entry, struct device, power.entry);
 }
 
+extern void device_pm_init(struct device *dev);
 extern void device_pm_add(struct device *);
 extern void device_pm_remove(struct device *);
 extern void device_pm_move_before(struct device *, struct device *);
 extern void device_pm_move_after(struct device *, struct device *);
 extern void device_pm_move_last(struct device *);
 
-#else /* CONFIG_PM_SLEEP */
+#else /* !CONFIG_PM_SLEEP */
 
+static inline void device_pm_init(struct device *dev) {}
 static inline void device_pm_add(struct device *dev) {}
 static inline void device_pm_remove(struct device *dev) {}
 static inline void device_pm_move_before(struct device *deva,
@@ -32,7 +29,7 @@ static inline void device_pm_move_after(
 					struct device *devb) {}
 static inline void device_pm_move_last(struct device *dev) {}
 
-#endif
+#endif /* !CONFIG_PM_SLEEP */
 
 #ifdef CONFIG_PM
 

^ permalink raw reply	[flat|nested] 102+ messages in thread

* [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)
  2009-06-26 18:06               ` [linux-pm] " Alan Stern
@ 2009-06-26 20:46                 ` Rafael J. Wysocki
  2009-06-26 20:46                 ` Rafael J. Wysocki
  1 sibling, 0 replies; 102+ messages in thread
From: Rafael J. Wysocki @ 2009-06-26 20:46 UTC (permalink / raw)
  To: Alan Stern
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Friday 26 June 2009, Alan Stern wrote:
> On Thu, 25 Jun 2009, Rafael J. Wysocki wrote:
> 
> > > The whole business about the runtime_notify and RPM_NOTIFY flags is 
> > > impenetrable.  My suggestion: Rename runtime_notify to notify_pending 
> > > and eliminate RPM_NOTIFY.  Then make sure that notify_pending is set 
> > > whenever a notify work item is queued.
> > 
> > I was going to do exactly that, but I realized it wouldn't work in general,
> > because ->runtime_idle() could run __pm_runtime_suspend() in theory.
> 
> I'll cut this short by noting the dilemma.  If the runtime_idle 
> callback does a synchronous suspend, and __pm_runtime_suspend sees the 
> status is already RPM_SUSPENDING, then it will wait for the suspend to 
> finish.  Hence it's not safe to do cancel_work_sync from within 
> __pm_runtime_suspend; it might deadlock.

Exactly. 

> It occurs to me that the problem would be solved if were a cancel_work
> routine.  In the same vein, it ought to be possible for
> cancel_delayed_work to run in interrupt context.  I'll see what can be
> done.

Having looked at the workqueue code I'm not sure if there's a way to implement
that in a non-racy way.  Which may be the reason why there are no such
functions already. :-)

> What do you think about adding a version of pm_runtime_put that would 
> call pm_runtime_idle directly when the counter reaches 0, instead of 
> queuing an idle request?  I feel that drivers should have a choice 
> about which sort of notification to use.

There can be pm_runtime_put_atomic() that will queue the idle request
and pm_runtime_put() that will call pm_runtime_idle() directly, why not.

> > > And don't forget to decrement the parent's child_count again if the resume
> > > fails.
> > 
> > I didn't _forget_it, because the device can't be RPM_SUSPENDED after
> > __pm_runtime_resume().
> 
> You're right; that fact escaped me.
> 
> > > In __pm_runtime_suspend, you should decrement the parent's child_count
> > > before releasing the child's lock.
> > 
> > Why exactly is that necessary?
> 
> I guess it isn't.  But it won't hurt to keep the parent's counter
> synchronized with the child's state as closely as possible.

OK

In the meantime I reworked the patch (below) to use more RPM_* flags and I
removed the runtime_break and runtime_notify bits from it.  Also added some
comments to explain some non-obvious steps (hope that helps).

I also added the pm_runtime_put_atomic() and pm_runtime_put() as per the
comment above.

It seems to be a bit cleaner this way, but that's my personal view. :-)

Best,
Rafael

---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: PM: Introduce core framework for run-time PM of I/O devices (rev. 6)

Introduce a core framework for run-time power management of I/O
devices.  Add device run-time PM fields to 'struct dev_pm_info'
and device run-time PM callbacks to 'struct dev_pm_ops'.  Introduce
a run-time PM workqueue and define some device run-time PM helper
functions at the core level.  Document all these things.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 drivers/base/dd.c            |    9 
 drivers/base/power/Makefile  |    1 
 drivers/base/power/main.c    |   16 
 drivers/base/power/power.h   |   11 
 drivers/base/power/runtime.c |  734 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/pm.h           |  117 ++++++
 include/linux/pm_runtime.h   |  133 +++++++
 kernel/power/Kconfig         |   14 
 kernel/power/main.c          |   17 
 9 files changed, 1042 insertions(+), 10 deletions(-)

Index: linux-2.6/kernel/power/Kconfig
===================================================================
--- linux-2.6.orig/kernel/power/Kconfig
+++ linux-2.6/kernel/power/Kconfig
@@ -208,3 +208,17 @@ config APM_EMULATION
 	  random kernel OOPSes or reboots that don't seem to be related to
 	  anything, try disabling/enabling this option (or disabling/enabling
 	  APM in your BIOS).
+
+config PM_RUNTIME
+	bool "Run-time PM core functionality"
+	depends on PM
+	---help---
+	  Enable functionality allowing I/O devices to be put into energy-saving
+	  (low power) states at run time (or autosuspended) after a specified
+	  period of inactivity and woken up in response to a hardware-generated
+	  wake-up event or a driver's request.
+
+	  Hardware support is generally required for this functionality to work
+	  and the bus type drivers of the buses the devices are on are
+	  responsibile for the actual handling of the autosuspend requests and
+	  wake-up events.
Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -11,6 +11,7 @@
 #include <linux/kobject.h>
 #include <linux/string.h>
 #include <linux/resume-trace.h>
+#include <linux/workqueue.h>
 
 #include "power.h"
 
@@ -217,8 +218,24 @@ static struct attribute_group attr_group
 	.attrs = g,
 };
 
+#ifdef CONFIG_PM_RUNTIME
+struct workqueue_struct *pm_wq;
+
+static int __init pm_start_workqueue(void)
+{
+	pm_wq = create_freezeable_workqueue("pm");
+
+	return pm_wq ? 0 : -ENOMEM;
+}
+#else
+static inline int pm_start_workqueue(void) { return 0; }
+#endif
+
 static int __init pm_init(void)
 {
+	int error = pm_start_workqueue();
+	if (error)
+		return error;
 	power_kobj = kobject_create_and_add("power", NULL);
 	if (!power_kobj)
 		return -ENOMEM;
Index: linux-2.6/include/linux/pm.h
===================================================================
--- linux-2.6.orig/include/linux/pm.h
+++ linux-2.6/include/linux/pm.h
@@ -22,6 +22,9 @@
 #define _LINUX_PM_H
 
 #include <linux/list.h>
+#include <linux/workqueue.h>
+#include <linux/spinlock.h>
+#include <linux/completion.h>
 
 /*
  * Callbacks for platform drivers to implement.
@@ -165,6 +168,28 @@ typedef struct pm_message {
  * It is allowed to unregister devices while the above callbacks are being
  * executed.  However, it is not allowed to unregister a device from within any
  * of its own callbacks.
+ *
+ * There also are the following callbacks related to run-time power management
+ * of devices:
+ *
+ * @runtime_suspend: Prepare the device for a condition in which it won't be
+ *	able to communicate with the CPU(s) and RAM due to power management.
+ *	This need not mean that the device should be put into a low power state.
+ *	For example, if the device is behind a link which is about to be turned
+ *	off, the device may remain at full power.  Still, if the device does go
+ *	to low power and if device_may_wakeup(dev) is true, remote wake-up
+ *	(i.e. hardware mechanism allowing the device to request a change of its
+ *	power state, such as PCI PME) should be enabled for it.
+ *
+ * @runtime_resume: Put the device into the fully active state in response to a
+ *	wake-up event generated by hardware or at a request of software.  If
+ *	necessary, put the device into the full power state and restore its
+ *	registers, so that it is fully operational.
+ *
+ * @runtime_idle: Device appears to be inactive and it might be put into a low
+ *	power state if all of the necessary conditions are satisfied.  Check
+ *	these conditions and handle the device as appropriate, possibly queueing
+ *	a suspend request for it.
  */
 
 struct dev_pm_ops {
@@ -182,6 +207,9 @@ struct dev_pm_ops {
 	int (*thaw_noirq)(struct device *dev);
 	int (*poweroff_noirq)(struct device *dev);
 	int (*restore_noirq)(struct device *dev);
+	int (*runtime_suspend)(struct device *dev);
+	int (*runtime_resume)(struct device *dev);
+	void (*runtime_idle)(struct device *dev);
 };
 
 /**
@@ -315,14 +343,97 @@ enum dpm_state {
 	DPM_OFF_IRQ,
 };
 
+/**
+ * Device run-time power management state.
+ *
+ * These state labels are used internally by the PM core to indicate the current
+ * status of a device with respect to the PM core operations.  They do not
+ * reflect the actual power state of the device or its status as seen by the
+ * driver.
+ *
+ * RPM_ACTIVE		Device is fully operational, no run-time PM requests are
+ *			pending for it.
+ *
+ * RPM_NOTIFY		Idle notification has been scheduled for the device.
+ *
+ * RPM_NOTIFYING	Device bus type's ->runtime_idle() callback is being
+ *			executed (as a result of a scheduled idle notification
+ *			request).
+ *
+ * RPM_IDLE		It has been requested that the device be suspended.
+ *			Suspend request has been put into the run-time PM
+ *			workqueue and it's pending execution.
+ *
+ * RPM_SUSPEND		Attempt to suspend the device has started (as a result
+ *			of a scheduled request or synchronously), but the device
+ *			bus type's ->runtime_suspend() callback has not been
+ *			executed yet.
+ *
+ * RPM_SUSPENDING	Device bus type's ->runtime_suspend() callback is being
+ *			executed.
+ *
+ * RPM_SUSPENDED	Device bus type's ->runtime_suspend() callback has
+ *			completed successfully.  The device is regarded as
+ *			suspended.
+ *
+ * RPM_WAKE		It has been requested that the device be woken up.
+ *			Resume request has been put into the run-time PM
+ *			workqueue and it's pending execution.
+ *
+ * RPM_RESUME		Attempt to wake up the device has started (as a result
+ *			of a scheduled request or synchronously), but the device
+ *			bus type's ->runtime_resume() callback has not been
+ *			executed yet.
+ *
+ * RPM_RESUMING		Device bus type's ->runtime_resume() callback is being
+ *			executed.
+ *
+ * RPM_ERROR		Represents a condition from which the PM core cannot
+ *			recover by itself.  If the device's run-time PM status
+ *			field has this value, all of the run-time PM operations
+ *			carried out for the device by the core will fail, until
+ *			the status field is changed to either RPM_ACTIVE or
+ *			RPM_SUSPENDED (it is not valid to use the other values
+ *			in such a situation) by the device's driver or bus type.
+ *			This happens when the device bus type's
+ *			->runtime_suspend() or ->runtime_resume() callback
+ *			returns error code different from -EAGAIN or -EBUSY.
+ */
+
+#define RPM_ACTIVE	0
+
+#define RPM_NOTIFY	0x001
+#define RPM_NOTIFYING	0x002
+#define RPM_IDLE	0x004
+#define RPM_SUSPEND	0x008
+#define RPM_SUSPENDING	0x010
+#define RPM_SUSPENDED	0x020
+#define RPM_WAKE	0x040
+#define RPM_RESUME	0x080
+#define RPM_RESUMING	0x100
+
+#define RPM_ERROR	0x1FF
+
 struct dev_pm_info {
 	pm_message_t		power_state;
-	unsigned		can_wakeup:1;
-	unsigned		should_wakeup:1;
+	unsigned int		can_wakeup:1;
+	unsigned int		should_wakeup:1;
 	enum dpm_state		status;		/* Owned by the PM core */
-#ifdef	CONFIG_PM_SLEEP
+#ifdef CONFIG_PM_SLEEP
 	struct list_head	entry;
 #endif
+#ifdef CONFIG_PM_RUNTIME
+	struct delayed_work	suspend_work;
+	struct work_struct	work;
+	struct completion	work_done;
+	unsigned int		ignore_children:1;
+	unsigned int		runtime_disabled:1;
+	unsigned int		runtime_status;
+	int			runtime_error;
+	atomic_t		resume_count;
+	atomic_t		child_count;
+	spinlock_t		lock;
+#endif
 };
 
 /*
Index: linux-2.6/drivers/base/power/Makefile
===================================================================
--- linux-2.6.orig/drivers/base/power/Makefile
+++ linux-2.6/drivers/base/power/Makefile
@@ -1,5 +1,6 @@
 obj-$(CONFIG_PM)	+= sysfs.o
 obj-$(CONFIG_PM_SLEEP)	+= main.o
+obj-$(CONFIG_PM_RUNTIME)	+= runtime.o
 obj-$(CONFIG_PM_TRACE_RTC)	+= trace.o
 
 ccflags-$(CONFIG_DEBUG_DRIVER) := -DDEBUG
Index: linux-2.6/drivers/base/power/runtime.c
===================================================================
--- /dev/null
+++ linux-2.6/drivers/base/power/runtime.c
@@ -0,0 +1,734 @@
+/*
+ * drivers/base/power/runtime.c - Helper functions for device run-time PM
+ *
+ * Copyright (c) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
+ *
+ * This file is released under the GPLv2.
+ */
+
+#include <linux/pm_runtime.h>
+#include <linux/jiffies.h>
+
+/**
+ * pm_runtime_idle - Notify device bus type if the device can be suspended.
+ * @dev: Device to notify the bus type about.
+ *
+ * It is possible that suspend request was scheduled and resume was requested
+ * before this function has a chance to run.  If there's a suspend request
+ * pending only, return doing nothing, but if resume was requested in addition
+ * to it, cancel the suspend request.
+ */
+void pm_runtime_idle(struct device *dev)
+{
+	unsigned long flags;
+
+	might_sleep();
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.runtime_status == RPM_ERROR)
+		goto out;
+
+	if (dev->power.runtime_status & ~(RPM_NOTIFY|RPM_WAKE))
+		/*
+		 * Device suspended or run-time PM operation in progress. The
+		 * RPM_NOTIFY bit should have been cleared in that case.
+		 */
+		goto out;
+
+	dev->power.runtime_status &= ~RPM_NOTIFY;
+
+	if (dev->power.runtime_status == RPM_WAKE) {
+		/*
+		 * Resume has been requested, and because all of the suspend
+		 * status bits are clear, there must be a suspend request
+		 * pending.  We have to cancel that request.
+		 */
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		cancel_delayed_work_sync(&dev->power.suspend_work);
+
+		spin_lock_irqsave(&dev->power.lock, flags);
+
+		/*
+		 * Return if someone else has changed the status.  Otherwise,
+		 * the idle notification may still be worth running.
+		 */
+		if (dev->power.runtime_status != RPM_WAKE)
+			goto out;
+	}
+
+	if (!pm_suspend_possible(dev))
+		goto out;
+
+	dev->power.runtime_status = RPM_NOTIFYING;
+
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_idle)
+		dev->bus->pm->runtime_idle(dev);
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	/* The status might have been changed while executing runtime_idle(). */
+	dev->power.runtime_status &= ~RPM_NOTIFYING;
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_idle);
+
+/**
+ * pm_runtime_idle_work - Run pm_runtime_idle() via pm_wq.
+ * @work: Work structure used for scheduling the execution of this function.
+ *
+ * Use @work to get the device object the idle notification has been scheduled
+ * for and run pm_runtime_idle() for it.
+ */
+static void pm_runtime_idle_work(struct work_struct *work)
+{
+	pm_runtime_idle(pm_work_to_device(work));
+}
+
+/**
+ * pm_runtime_put_atomic - Decrement resume counter and queue idle notification.
+ * @dev: Device to handle.
+ *
+ * Decrement the device's resume counter, check if the device's run-time PM
+ * status is right for suspending and queue up a request to run
+ * pm_runtime_idle() for it.
+ */
+void pm_runtime_put_atomic(struct device *dev)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (!__pm_runtime_put(dev)) {
+		dev_WARN(dev, "Unbalanced %s", __func__);
+		goto out;
+	}
+
+	if (dev->power.runtime_status != RPM_ACTIVE)
+		goto out;
+
+	/*
+	 * The notification is asynchronous so that this function can be called
+	 * from interrupt context.
+	 */
+	dev->power.runtime_status = RPM_NOTIFY;
+	INIT_WORK(&dev->power.work, pm_runtime_idle_work);
+	queue_work(pm_wq, &dev->power.work);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_put_atomic);
+
+/**
+ * pm_runtime_put - Decrement resume counter and run idle notification.
+ * @dev: Device to handle.
+ *
+ * Decrement the device's resume counter and run pm_runtime_idle() for it.
+ */
+void pm_runtime_put(struct device *dev)
+{
+	if (!__pm_runtime_put(dev)) {
+		dev_WARN(dev, "Unbalanced %s", __func__);
+		return;
+	}
+
+	pm_runtime_idle(dev);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_put);
+
+/**
+ * __pm_runtime_suspend - Carry out run-time suspend of given device.
+ * @dev: Device to suspend.
+ * @sync: If unset, the funtion has been called via pm_wq.
+ *
+ * Check if the device can be suspended and run the ->runtime_suspend() callback
+ * provided by its bus type.  If another suspend has been started earlier, wait
+ * for it to finish.  If there's an idle notification pending, cancel it.  If
+ * there's a suspend request scheduled while this function is running and @sync
+ * is 'true', cancel that request.
+ */
+int __pm_runtime_suspend(struct device *dev, bool sync)
+{
+	struct device *parent = NULL;
+	unsigned long flags;
+	bool cancel_pending = false;
+	int error = -EINVAL;
+
+	might_sleep();
+
+ repeat:
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.runtime_status == RPM_ERROR)
+		goto out;
+
+	if (dev->power.runtime_status & RPM_SUSPENDED) {
+		/* Device suspended, nothing to do. */
+		error = 0;
+		goto out;
+	}
+
+	if (dev->power.runtime_status & RPM_SUSPENDING) {
+		/* Another suspend is running in parallel with us. */
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		wait_for_completion(&dev->power.work_done);
+		return dev->power.runtime_error;
+	}
+
+	if (dev->power.runtime_status & (RPM_WAKE|RPM_RESUME|RPM_RESUMING)) {
+		/* Resume is scheduled or in progress. */
+		error = -EAGAIN;
+		goto out;
+	}
+
+	/*
+	 * If there's a suspend request pending and we're not running as a
+	 * result of it, the request has to be cancelled, because it may be
+	 * scheduled in the future and we can't leave it behind us.
+	 */
+	if (sync && (dev->power.runtime_status & RPM_IDLE))
+		cancel_pending = true;
+
+	/* Clear the suspend status bits in case we have to return. */
+	dev->power.runtime_status &= ~(RPM_IDLE|RPM_SUSPEND);
+
+	if (atomic_read(&dev->power.resume_count) > 0
+	    || dev->power.runtime_disabled) {
+		/* We are forbidden to suspend. */
+		error = -EAGAIN;
+		goto out;
+	}
+
+	if (!pm_children_suspended(dev)) {
+		/*
+		 * We can only suspend the device if all of its children have
+		 * been suspended.
+		 */
+		error = -EBUSY;
+		goto out;
+	}
+
+	/*
+	 * Set RPM_SUSPEND in case we have to start over, to prevent idle
+	 * notifications from happening and new suspend requests from being
+	 * scheduled.
+	 */
+	dev->power.runtime_status |= RPM_SUSPEND;
+
+	if (cancel_pending) {
+		/* Cancel the concurrent pending suspend request. */
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		cancel_delayed_work_sync(&dev->power.suspend_work);
+		goto repeat;
+	}
+
+	if (dev->power.runtime_status & RPM_NOTIFY) {
+		/* Idle notification is pending, cancel it. */
+		dev->power.runtime_status &= ~RPM_NOTIFY;
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		cancel_work_sync(&dev->power.work);
+		goto repeat;
+	}
+
+	dev->power.runtime_status &= ~RPM_SUSPEND;
+	dev->power.runtime_status |= RPM_SUSPENDING;
+	init_completion(&dev->power.work_done);
+
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_suspend)
+		error = dev->bus->pm->runtime_suspend(dev);
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	switch (error) {
+	case 0:
+		dev->power.runtime_status &= ~RPM_SUSPENDING;
+		dev->power.runtime_status |= RPM_SUSPENDED;
+		break;
+	case -EAGAIN:
+	case -EBUSY:
+		dev->power.runtime_status &= RPM_NOTIFYING;
+		break;
+	default:
+		dev->power.runtime_status = RPM_ERROR;
+	}
+	dev->power.runtime_error = error;
+	complete_all(&dev->power.work_done);
+
+	if (!error && !(dev->power.runtime_status & RPM_WAKE) && dev->parent) {
+		parent = dev->parent;
+		atomic_dec(&parent->power.child_count);
+	}
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	if (parent && !parent->power.ignore_children)
+		pm_runtime_idle(parent);
+
+	if (error == -EBUSY || error == -EAGAIN)
+		pm_runtime_idle(dev);
+
+	return error;
+}
+EXPORT_SYMBOL_GPL(__pm_runtime_suspend);
+
+/**
+ * pm_runtime_suspend_work - Run __pm_runtime_suspend() for a device.
+ * @work: Work structure used for scheduling the execution of this function.
+ *
+ * Use @work to get the device object the work has been scheduled for and run
+ * __pm_runtime_suspend() for it.
+ */
+static void pm_runtime_suspend_work(struct work_struct *work)
+{
+	__pm_runtime_suspend(suspend_work_to_device(work), false);
+}
+
+/**
+ * pm_request_suspend - Schedule run-time suspend of given device.
+ * @dev: Device to suspend.
+ * @msec: Time to wait before attempting to suspend the device, in milliseconds.
+ */
+int pm_request_suspend(struct device *dev, unsigned int msec)
+{
+	unsigned long flags;
+	unsigned long delay = msecs_to_jiffies(msec);
+	int error = 0;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.runtime_status == RPM_ERROR)
+		error = -EINVAL;
+	else if (dev->power.runtime_status & RPM_SUSPENDED)
+		/* Device is suspended, nothing to do. */
+		error = -ECANCELED;
+	else if (atomic_read(&dev->power.resume_count) > 0
+	    || dev->power.runtime_disabled
+	    || (dev->power.runtime_status & (RPM_WAKE|RPM_RESUME|RPM_RESUMING)))
+		/* Can't suspend now. */
+		error = -EAGAIN;
+	else if (dev->power.runtime_status &
+				(RPM_IDLE|RPM_SUSPEND|RPM_SUSPENDING))
+		/* Already suspending or suspend request pending. */
+		error = -EINPROGRESS;
+	else if (!pm_children_suspended(dev))
+		error = -EBUSY;
+	if (error)
+		goto out;
+
+	dev->power.runtime_status |= RPM_IDLE;
+	queue_delayed_work(pm_wq, &dev->power.suspend_work, delay);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	return error;
+}
+EXPORT_SYMBOL_GPL(pm_request_suspend);
+
+/**
+ * __pm_runtime_resume - Carry out run-time resume of given device.
+ * @dev: Device to resume.
+ * @sync: If unset, the funtion has been called via pm_wq.
+ *
+ * Check if the device can be woken up and run the ->runtime_resume() callback
+ * provided by its bus type.  If another resume has been started earlier, wait
+ * for it to finish.  If there's a suspend running in parallel with this
+ * function, wait for it to finish and resume the device.  If there's a suspend
+ * request or idle notification pending, cancel it.  If there's a resume request
+ * scheduled while this function is running and @sync is 'true', cancel that
+ * request.
+ */
+int __pm_runtime_resume(struct device *dev, bool sync)
+{
+	struct device *parent = dev->parent;
+	unsigned long flags;
+	bool put_parent = false;
+	int error = -EINVAL;
+
+	might_sleep();
+
+ repeat:
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.runtime_status == RPM_ERROR)
+		goto out;
+
+	if (!(dev->power.runtime_status & ~RPM_NOTIFYING)) {
+		/* Device is operational, nothing to do. */
+		error = 0;
+		goto out;
+	}
+
+	if (dev->power.runtime_status & RPM_RESUMING) {
+		/*
+		 * There's another resume running in parallel with us. Wait for
+		 * it to complete and return.
+		 */
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		wait_for_completion(&dev->power.work_done);
+		error = dev->power.runtime_error;
+		goto out_parent;
+	}
+
+	if (dev->power.runtime_disabled) {
+		/* Clear the resume flags before returning. */
+		dev->power.runtime_status &= ~(RPM_WAKE|RPM_RESUME);
+		error = -EAGAIN;
+		goto out;
+	}
+
+	/*
+	 * Set RPM_RESUME in case we have to start over, to prevent suspends and
+	 * idle notifications from happening and new resume requests from being
+	 * queued up.
+	 */
+	dev->power.runtime_status |= RPM_RESUME;
+
+	if (dev->power.runtime_status & RPM_SUSPENDING) {
+		/*
+		 * Suspend is running in parallel with us.  Wait for it to
+		 * complete and repeat.
+		 */
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		wait_for_completion(&dev->power.work_done);
+		goto repeat;
+	}
+
+	if ((dev->power.runtime_status & (RPM_IDLE|RPM_WAKE))
+	    && !(dev->power.runtime_status &
+				(RPM_SUSPEND|RPM_SUSPENDING|RPM_SUSPENDED))) {
+		/* Suspend request is pending that we're supposed to cancel. */
+		dev->power.runtime_status &= ~RPM_IDLE;
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		cancel_delayed_work_sync(&dev->power.suspend_work);
+		goto repeat;
+	}
+
+	/*
+	 * Clear RPM_SUSPEND in case we've been running in parallel with
+	 * __pm_runtime_suspend().
+	 */
+	dev->power.runtime_status &= ~RPM_SUSPEND;
+
+	if ((sync && (dev->power.runtime_status & RPM_WAKE))
+	    || (dev->power.runtime_status & RPM_NOTIFY)) {
+		/*
+		 * Idle notification is pending and since we're running the
+		 * device is not idle, or there's a resume request pending and
+		 * we're not running as a result of it.  In both cases it's
+		 * better to cancel the request.
+		 */
+		dev->power.runtime_status &= ~(RPM_NOTIFY|RPM_WAKE);
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		cancel_work_sync(&dev->power.work);
+		goto repeat;
+	}
+
+	/* Clear the resume status flags in case we have to return. */
+	dev->power.runtime_status &= ~(RPM_WAKE|RPM_RESUME);
+
+	if (!(dev->power.runtime_status & RPM_SUSPENDED)) {
+		/*
+		 * If the device is not suspended at this point, we have
+		 * nothing to do.
+		 */
+		error = 0;
+		goto out;
+	}
+
+	if (!put_parent && parent) {
+		/*
+		 * Increase the parent's resume counter and request that it be
+		 * woken up if necessary.
+		 */
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		put_parent = true;
+		pm_runtime_get(parent);
+		error = pm_runtime_resume(parent);
+		if (error)
+			goto out_parent;
+
+		error = -EINVAL;
+		dev->power.runtime_status |= RPM_RESUME;
+		goto repeat;
+	}
+
+	dev->power.runtime_status &= ~RPM_SUSPENDED;
+
+	if (parent)
+		atomic_inc(&parent->power.child_count);
+
+	dev->power.runtime_status |= RPM_RESUMING;
+	init_completion(&dev->power.work_done);
+
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_resume)
+		error = dev->bus->pm->runtime_resume(dev);
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	dev->power.runtime_status &= ~RPM_RESUMING;
+	if (error)
+		dev->power.runtime_status = RPM_ERROR;
+	dev->power.runtime_error = error;
+	complete_all(&dev->power.work_done);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+ out_parent:
+	if (put_parent)
+		pm_runtime_put(parent);
+
+	if (!error)
+		pm_runtime_idle(dev);
+
+	return error;
+}
+EXPORT_SYMBOL_GPL(pm_runtime_resume);
+
+/**
+ * pm_runtime_work - Run __pm_runtime_resume() for a device.
+ * @work: Work structure used for scheduling the execution of this function.
+ *
+ * Use @work to get the device object the work has been scheduled for and run
+ * __pm_runtime_resume() for it.
+ */
+static void pm_runtime_work(struct work_struct *work)
+{
+	__pm_runtime_resume(pm_work_to_device(work), false);
+}
+
+/**
+ * pm_request_resume - Schedule run-time resume of given device.
+ * @dev: Device to resume.
+ */
+int pm_request_resume(struct device *dev)
+{
+	struct device *parent = dev->parent;
+	unsigned long flags;
+	int error = 0;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.runtime_status == RPM_ERROR) {
+		error = -EINVAL;
+	} else if (dev->power.runtime_disabled) {
+		error = -EAGAIN;
+	} else if (!(dev->power.runtime_status & ~RPM_NOTIFYING)) {
+		/* Device is operational, nothing to do. */
+		error = -ECANCELED;
+	} else if (dev->power.runtime_status & RPM_NOTIFY) {
+		/*
+		 * Device has an idle notification pending, which is not a
+		 * problem unless there's a suspend request pending in addition
+		 * to it.  In that case, ask the idle notification work function
+		 * to cancel the suspend request.
+		 */
+		if (dev->power.runtime_status & RPM_IDLE) {
+			dev->power.runtime_status &= ~RPM_IDLE;
+			dev->power.runtime_status |= RPM_WAKE;
+			error = -EALREADY;
+		} else {
+			error = -ECANCELED;
+		}
+	} else if (dev->power.runtime_status &
+				(RPM_WAKE|RPM_RESUME|RPM_RESUMING)) {
+		error = -EINPROGRESS;
+	}
+	if (error)
+		goto out;
+
+	if (dev->power.runtime_status & RPM_IDLE) {
+		/* Suspend request is pending.  Make sure it won't run. */
+		dev->power.runtime_status &= ~RPM_IDLE;
+		INIT_WORK(&dev->power.work, pm_runtime_idle_work);
+		error = -EALREADY;
+		goto queue;
+	}
+
+	if ((dev->power.runtime_status & RPM_SUSPENDED) && parent)
+		atomic_inc(&parent->power.child_count);
+
+	INIT_WORK(&dev->power.work, pm_runtime_work);
+
+ queue:
+	dev->power.runtime_status |= RPM_WAKE;
+	queue_work(pm_wq, &dev->power.work);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	return error;
+}
+EXPORT_SYMBOL_GPL(pm_request_resume);
+
+/**
+ * __pm_runtime_clear_status - Change the run-time PM status of a device.
+ * @dev: Device to handle.
+ * @status: New value of the device's run-time PM status.
+ *
+ * Change the run-time PM status of the device to @status, which must be
+ * either RPM_ACTIVE or RPM_SUSPENDED, if its current value is equal to
+ * RPM_ERROR.
+ */
+void __pm_runtime_clear_status(struct device *dev, unsigned int status)
+{
+	struct device *parent = dev->parent;
+	unsigned long flags;
+
+	if (status & ~RPM_SUSPENDED)
+		return;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.runtime_status != RPM_ERROR)
+		goto out;
+
+	dev->power.runtime_status = status;
+	if (status == RPM_SUSPENDED && parent)
+		atomic_dec(&parent->power.child_count);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+EXPORT_SYMBOL_GPL(__pm_runtime_clear_status);
+
+/**
+ * pm_runtime_enable - Enable run-time PM of a device.
+ * @dev: Device to handle.
+ */
+void pm_runtime_enable(struct device *dev)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (!dev->power.runtime_disabled)
+		goto out;
+
+	if (!__pm_runtime_put(dev))
+		dev_WARN(dev, "Unbalanced counter decrementation");
+
+	if (!atomic_read(&dev->power.resume_count))
+		dev->power.runtime_disabled = false;
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_enable);
+
+/**
+ * pm_runtime_disable - Disable run-time PM of a device.
+ * @dev: Device to handle.
+ *
+ * Set the power.runtime_disabled flag for the device, cancel all pending
+ * run-time PM requests for it and wait for operations in progress to complete.
+ * The device can be either active or suspended after its run-time PM has been
+ * disabled.
+ */
+void pm_runtime_disable(struct device *dev)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	pm_runtime_get(dev);
+
+	if (dev->power.runtime_disabled)
+		goto out;
+
+	dev->power.runtime_disabled = true;
+
+	if (dev->power.runtime_status & (RPM_IDLE|RPM_WAKE)
+	    && !(dev->power.runtime_status &
+				(RPM_SUSPEND|RPM_SUSPENDING|RPM_SUSPENDED))) {
+		/* Suspend request pending. */
+		dev->power.runtime_status &= ~RPM_IDLE;
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		cancel_delayed_work_sync(&dev->power.suspend_work);
+
+		spin_lock_irqsave(&dev->power.lock, flags);
+	}
+
+	if (dev->power.runtime_status & (RPM_WAKE|RPM_NOTIFY)) {
+		/* Resume request pending or idle notification. */
+		dev->power.runtime_status &= ~(RPM_WAKE|RPM_NOTIFY);
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		cancel_work_sync(&dev->power.work);
+
+		spin_lock_irqsave(&dev->power.lock, flags);
+	}
+
+	dev->power.runtime_status &= ~RPM_NOTIFYING;
+
+	if (dev->power.runtime_status & (RPM_SUSPENDING|RPM_RESUMING)) {
+		/* Suspend or wake-up in progress. */
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		wait_for_completion(&dev->power.work_done);
+		return;
+	}
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_disable);
+
+/**
+ * pm_runtime_init - Initialize run-time PM fields in given device object.
+ * @dev: Device object to initialize.
+ */
+void pm_runtime_init(struct device *dev)
+{
+	spin_lock_init(&dev->power.lock);
+
+	dev->power.runtime_status = RPM_ACTIVE;
+	dev->power.runtime_disabled = true;
+	atomic_set(&dev->power.resume_count, 1);
+
+	atomic_set(&dev->power.child_count, 0);
+	pm_suspend_ignore_children(dev, false);
+
+	INIT_DELAYED_WORK(&dev->power.suspend_work, pm_runtime_suspend_work);
+}
+
+/**
+ * pm_runtime_add - Update run-time PM fields of a device while adding it.
+ * @dev: Device object being added to device hierarchy.
+ */
+void pm_runtime_add(struct device *dev)
+{
+	if (dev->parent)
+		atomic_inc(&dev->parent->power.child_count);
+}
Index: linux-2.6/include/linux/pm_runtime.h
===================================================================
--- /dev/null
+++ linux-2.6/include/linux/pm_runtime.h
@@ -0,0 +1,133 @@
+/*
+ * pm_runtime.h - Device run-time power management helper functions.
+ *
+ * Copyright (C) 2009 Rafael J. Wysocki <rjw@sisk.pl>
+ *
+ * This file is released under the GPLv2.
+ */
+
+#ifndef _LINUX_PM_RUNTIME_H
+#define _LINUX_PM_RUNTIME_H
+
+#include <linux/device.h>
+#include <linux/pm.h>
+
+#ifdef CONFIG_PM_RUNTIME
+
+extern struct workqueue_struct *pm_wq;
+
+extern void pm_runtime_init(struct device *dev);
+extern void pm_runtime_add(struct device *dev);
+extern void pm_runtime_idle(struct device *dev);
+extern void pm_runtime_put_atomic(struct device *dev);
+extern void pm_runtime_put(struct device *dev);
+extern int __pm_runtime_suspend(struct device *dev, bool sync);
+extern int pm_request_suspend(struct device *dev, unsigned int msec);
+extern int __pm_runtime_resume(struct device *dev, bool sync);
+extern int pm_request_resume(struct device *dev);
+extern void __pm_runtime_clear_status(struct device *dev, unsigned int status);
+extern void pm_runtime_enable(struct device *dev);
+extern void pm_runtime_disable(struct device *dev);
+
+static inline struct device *suspend_work_to_device(struct work_struct *work)
+{
+	struct delayed_work *dw = to_delayed_work(work);
+
+	return container_of(dw, struct device, power.suspend_work);
+}
+
+static inline struct device *pm_work_to_device(struct work_struct *work)
+{
+	return container_of(work, struct device, power.work);
+}
+
+static inline void pm_runtime_get(struct device *dev)
+{
+	atomic_inc(&dev->power.resume_count);
+}
+
+static inline bool __pm_runtime_put(struct device *dev)
+{
+	return !!atomic_add_unless(&dev->power.resume_count, -1, 0);
+}
+
+static inline bool pm_children_suspended(struct device *dev)
+{
+	return dev->power.ignore_children
+		|| !atomic_read(&dev->power.child_count);
+}
+
+static inline bool pm_suspend_possible(struct device *dev)
+{
+	return pm_children_suspended(dev)
+		&& !atomic_read(&dev->power.resume_count)
+		&& !dev->power.runtime_disabled;
+}
+
+static inline void pm_suspend_ignore_children(struct device *dev, bool enable)
+{
+	dev->power.ignore_children = enable;
+}
+
+#else /* !CONFIG_PM_RUNTIME */
+
+static inline void pm_runtime_init(struct device *dev) {}
+static inline void pm_runtime_add(struct device *dev) {}
+static inline void pm_runtime_idle(struct device *dev) {}
+static inline void pm_runtime_put_atomic(struct device *dev) {}
+static inline void pm_runtime_put(struct device *dev) {}
+static inline int __pm_runtime_suspend(struct device *dev, bool sync)
+{
+	return -ENOSYS;
+}
+static inline int pm_request_suspend(struct device *dev, unsigned int msec)
+{
+	return -ENOSYS;
+}
+static inline int __pm_runtime_resume(struct device *dev, bool sync)
+{
+	return -ENOSYS;
+}
+static inline int pm_request_resume(struct device *dev)
+{
+	return -ENOSYS;
+}
+static inline void __pm_runtime_clear_status(struct device *dev,
+					      unsigned int status) {}
+static inline void pm_runtime_enable(struct device *dev) {}
+static inline void pm_runtime_disable(struct device *dev) {}
+
+static inline void pm_runtime_get(struct device *dev) {}
+static inline bool __pm_runtime_put(struct device *dev) { return true; }
+static inline bool pm_children_suspended(struct device *dev) { return false; }
+static inline bool pm_suspend_possible(struct device *dev) { return false; }
+static inline void pm_suspend_ignore_children(struct device *dev, bool en) {}
+
+#endif /* !CONFIG_PM_RUNTIME */
+
+static inline int pm_runtime_suspend(struct device *dev)
+{
+	return __pm_runtime_suspend(dev, true);
+}
+
+static inline int pm_runtime_resume(struct device *dev)
+{
+	return __pm_runtime_resume(dev, true);
+}
+
+static inline void pm_runtime_clear_active(struct device *dev)
+{
+	__pm_runtime_clear_status(dev, RPM_ACTIVE);
+}
+
+static inline void pm_runtime_clear_suspended(struct device *dev)
+{
+	__pm_runtime_clear_status(dev, RPM_SUSPENDED);
+}
+
+static inline void pm_runtime_remove(struct device *dev)
+{
+	pm_runtime_disable(dev);
+}
+
+#endif
Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -21,6 +21,7 @@
 #include <linux/kallsyms.h>
 #include <linux/mutex.h>
 #include <linux/pm.h>
+#include <linux/pm_runtime.h>
 #include <linux/resume-trace.h>
 #include <linux/rwsem.h>
 #include <linux/interrupt.h>
@@ -49,6 +50,16 @@ static DEFINE_MUTEX(dpm_list_mtx);
 static bool transition_started;
 
 /**
+ * device_pm_init - Initialize the PM-related part of a device object
+ * @dev: Device object to initialize.
+ */
+void device_pm_init(struct device *dev)
+{
+	dev->power.status = DPM_ON;
+	pm_runtime_init(dev);
+}
+
+/**
  *	device_pm_lock - lock the list of active devices used by the PM core
  */
 void device_pm_lock(void)
@@ -88,6 +99,7 @@ void device_pm_add(struct device *dev)
 	}
 
 	list_add_tail(&dev->power.entry, &dpm_list);
+	pm_runtime_add(dev);
 	mutex_unlock(&dpm_list_mtx);
 }
 
@@ -104,6 +116,7 @@ void device_pm_remove(struct device *dev
 		 kobject_name(&dev->kobj));
 	mutex_lock(&dpm_list_mtx);
 	list_del_init(&dev->power.entry);
+	pm_runtime_remove(dev);
 	mutex_unlock(&dpm_list_mtx);
 }
 
@@ -507,6 +520,7 @@ static void dpm_complete(pm_message_t st
 		get_device(dev);
 		if (dev->power.status > DPM_ON) {
 			dev->power.status = DPM_ON;
+			pm_runtime_enable(dev);
 			mutex_unlock(&dpm_list_mtx);
 
 			device_complete(dev, state);
@@ -753,6 +767,7 @@ static int dpm_prepare(pm_message_t stat
 
 		get_device(dev);
 		dev->power.status = DPM_PREPARING;
+		pm_runtime_disable(dev);
 		mutex_unlock(&dpm_list_mtx);
 
 		error = device_prepare(dev, state);
@@ -760,6 +775,7 @@ static int dpm_prepare(pm_message_t stat
 		mutex_lock(&dpm_list_mtx);
 		if (error) {
 			dev->power.status = DPM_ON;
+			pm_runtime_enable(dev);
 			if (error == -EAGAIN) {
 				put_device(dev);
 				continue;
Index: linux-2.6/drivers/base/dd.c
===================================================================
--- linux-2.6.orig/drivers/base/dd.c
+++ linux-2.6/drivers/base/dd.c
@@ -23,6 +23,7 @@
 #include <linux/kthread.h>
 #include <linux/wait.h>
 #include <linux/async.h>
+#include <linux/pm_runtime.h>
 
 #include "base.h"
 #include "power/power.h"
@@ -202,8 +203,12 @@ int driver_probe_device(struct device_dr
 	pr_debug("bus: '%s': %s: matched device %s with driver %s\n",
 		 drv->bus->name, __func__, dev_name(dev), drv->name);
 
+	pm_runtime_disable(dev);
+
 	ret = really_probe(dev, drv);
 
+	pm_runtime_enable(dev);
+
 	return ret;
 }
 
@@ -306,6 +311,8 @@ static void __device_release_driver(stru
 
 	drv = dev->driver;
 	if (drv) {
+		pm_runtime_disable(dev);
+
 		driver_sysfs_remove(dev);
 
 		if (dev->bus)
@@ -324,6 +331,8 @@ static void __device_release_driver(stru
 			blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
 						     BUS_NOTIFY_UNBOUND_DRIVER,
 						     dev);
+
+		pm_runtime_enable(dev);
 	}
 }
 
Index: linux-2.6/drivers/base/power/power.h
===================================================================
--- linux-2.6.orig/drivers/base/power/power.h
+++ linux-2.6/drivers/base/power/power.h
@@ -1,8 +1,3 @@
-static inline void device_pm_init(struct device *dev)
-{
-	dev->power.status = DPM_ON;
-}
-
 #ifdef CONFIG_PM_SLEEP
 
 /*
@@ -16,14 +11,16 @@ static inline struct device *to_device(s
 	return container_of(entry, struct device, power.entry);
 }
 
+extern void device_pm_init(struct device *dev);
 extern void device_pm_add(struct device *);
 extern void device_pm_remove(struct device *);
 extern void device_pm_move_before(struct device *, struct device *);
 extern void device_pm_move_after(struct device *, struct device *);
 extern void device_pm_move_last(struct device *);
 
-#else /* CONFIG_PM_SLEEP */
+#else /* !CONFIG_PM_SLEEP */
 
+static inline void device_pm_init(struct device *dev) {}
 static inline void device_pm_add(struct device *dev) {}
 static inline void device_pm_remove(struct device *dev) {}
 static inline void device_pm_move_before(struct device *deva,
@@ -32,7 +29,7 @@ static inline void device_pm_move_after(
 					struct device *devb) {}
 static inline void device_pm_move_last(struct device *dev) {}
 
-#endif
+#endif /* !CONFIG_PM_SLEEP */
 
 #ifdef CONFIG_PM
 

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)
  2009-06-26 20:46                 ` Rafael J. Wysocki
@ 2009-06-26 21:13                   ` Alan Stern
  2009-06-26 22:32                     ` Rafael J. Wysocki
                                       ` (3 more replies)
  2009-06-26 21:13                   ` Alan Stern
  1 sibling, 4 replies; 102+ messages in thread
From: Alan Stern @ 2009-06-26 21:13 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Fri, 26 Jun 2009, Rafael J. Wysocki wrote:

> > It occurs to me that the problem would be solved if were a cancel_work
> > routine.  In the same vein, it ought to be possible for
> > cancel_delayed_work to run in interrupt context.  I'll see what can be
> > done.
> 
> Having looked at the workqueue code I'm not sure if there's a way to implement
> that in a non-racy way.  Which may be the reason why there are no such
> functions already. :-)

Well, I'll give it a try.

Speaking of races, have you noticed that the way power.work_done gets 
used is racy?  You can't wait for the completion before releasing the 
lock, but then anything could happen.

A safer approach would be to use a wait_queue.

> In the meantime I reworked the patch (below) to use more RPM_* flags and I
> removed the runtime_break and runtime_notify bits from it.  Also added some
> comments to explain some non-obvious steps (hope that helps).
> 
> I also added the pm_runtime_put_atomic() and pm_runtime_put() as per the
> comment above.
> 
> It seems to be a bit cleaner this way, but that's my personal view. :-)

I'll look at it over the weekend.  And I'll try to see if proper 
cancel_work and cancel_delayed_work functions can help clean it up.

Alan Stern


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)
  2009-06-26 20:46                 ` Rafael J. Wysocki
  2009-06-26 21:13                   ` Alan Stern
@ 2009-06-26 21:13                   ` Alan Stern
  1 sibling, 0 replies; 102+ messages in thread
From: Alan Stern @ 2009-06-26 21:13 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Fri, 26 Jun 2009, Rafael J. Wysocki wrote:

> > It occurs to me that the problem would be solved if were a cancel_work
> > routine.  In the same vein, it ought to be possible for
> > cancel_delayed_work to run in interrupt context.  I'll see what can be
> > done.
> 
> Having looked at the workqueue code I'm not sure if there's a way to implement
> that in a non-racy way.  Which may be the reason why there are no such
> functions already. :-)

Well, I'll give it a try.

Speaking of races, have you noticed that the way power.work_done gets 
used is racy?  You can't wait for the completion before releasing the 
lock, but then anything could happen.

A safer approach would be to use a wait_queue.

> In the meantime I reworked the patch (below) to use more RPM_* flags and I
> removed the runtime_break and runtime_notify bits from it.  Also added some
> comments to explain some non-obvious steps (hope that helps).
> 
> I also added the pm_runtime_put_atomic() and pm_runtime_put() as per the
> comment above.
> 
> It seems to be a bit cleaner this way, but that's my personal view. :-)

I'll look at it over the weekend.  And I'll try to see if proper 
cancel_work and cancel_delayed_work functions can help clean it up.

Alan Stern

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 5)
  2009-06-24 21:30         ` Alan Stern
                             ` (2 preceding siblings ...)
  2009-06-26 21:49           ` [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 5) Rafael J. Wysocki
@ 2009-06-26 21:49           ` Rafael J. Wysocki
  3 siblings, 0 replies; 102+ messages in thread
From: Rafael J. Wysocki @ 2009-06-26 21:49 UTC (permalink / raw)
  To: Alan Stern
  Cc: Linux-pm mailing list, Oliver Neukum, Magnus Damm,
	ACPI Devel Maling List, Ingo Molnar, LKML, Greg KH,
	Arjan van de Ven

On Wednesday 24 June 2009, Alan Stern wrote:
> On Wed, 24 Jun 2009, Rafael J. Wysocki wrote:
> 
> > +config PM_RUNTIME
> > +	bool "Run-time PM core functionality"
> > +	depends on PM
> > +	---help---
> > +	  Enable functionality allowing I/O devices to be put into energy-saving
> > +	  (low power) states at run time (or autosuspended) after a specified
> > +	  period of inactivity and woken up in response to a hardware-generated
> > +	  wake-up event or a driver's request.
> > +
> > +	  Hardware support is generally required for this functionality to work
> > +	  and the bus type drivers of the buses the devices are on are
> > +	  responsibile for the actual handling of the autosuspend requests and
> 
> s/ibile/ible/
> 
> > @@ -165,6 +168,28 @@ typedef struct pm_message {
> >   * It is allowed to unregister devices while the above callbacks are being
> >   * executed.  However, it is not allowed to unregister a device from within any
> >   * of its own callbacks.
> > + *
> > + * There also are the following callbacks related to run-time power management
> > + * of devices:
> > + *
> > + * @runtime_suspend: Prepare the device for a condition in which it won't be
> > + *	able to communicate with the CPU(s) and RAM due to power management.
> > + *	This need not mean that the device should be put into a low power state.
> > + *	For example, if the device is behind a link which is about to be turned
> > + *	off, the device may remain at full power.  Still, if the device does go
> 
> s/Still, if/If/ -- the word "Still" seems a little odd in this context.
> 
> > + *	to low power and if device_may_wakeup(dev) is true, remote wake-up
> > + *	(i.e. hardware mechanism allowing the device to request a change of its
> 
> s/i.e. /i.e., a /
> 
> > + *	power state, such as PCI PME) should be enabled for it.
> > + *
> > + * @runtime_resume: Put the device into the fully active state in response to a
> > + *	wake-up event generated by hardware or at a request of software.  If
> 
> s/at a request/at the request/
> 
> > + *	necessary, put the device into the full power state and restore its
> > + *	registers, so that it is fully operational.
> 
> 
> > + * RPM_ACTIVE		Device is fully operational, no run-time PM requests are
> > + *			pending for it.
> > + *
> > + * RPM_IDLE		It has been requested that the device be suspended.
> > + *			Suspend request has been put into the run-time PM
> > + *			workqueue and it's pending execution.
> > + *
> > + * RPM_SUSPENDING	Device bus type's ->runtime_suspend() callback is being
> > + *			executed.
> > + *
> > + * RPM_SUSPENDED	Device bus type's ->runtime_suspend() callback has
> > + *			completed successfully.  The device is regarded as
> > + *			suspended.
> > + *
> > + * RPM_WAKE		It has been requested that the device be woken up.
> > + *			Resume request has been put into the run-time PM
> > + *			workqueue and it's pending execution.
> > + *
> > + * RPM_RESUMING		Device bus type's ->runtime_resume() callback is being
> > + *			executed.
> 
> Remember to add RPM_NOTIFY.
> 
> 
> > +/**
> > + * __pm_get_child - Increment the counter of unsuspended children of a device.
> > + * @dev: Device to handle;
> > + */
> > +static void __pm_get_child(struct device *dev)
> > +{
> > +	atomic_inc(&dev->power.child_count);
> > +}
> > +
> > +/**
> > + * __pm_put_child - Decrement the counter of unsuspended children of a device.
> > + * @dev: Device to handle;
> > + */
> > +static void __pm_put_child(struct device *dev)
> > +{
> > +	if (!atomic_add_unless(&dev->power.child_count, -1, 0))
> > +		dev_WARN(dev, "Unbalanced counter decrementation");
> > +}
> 
> I think we don't need this dev_WARN.  It should be straightforward to
> verify that the increments and decrements balance correctly, and the
> child_count field isn't manipulated by drivers.
> 
> In fact, these don't need to be separate routines at all.  Just call
> atomic_inc or atomic_dec directly.
> 
> > +
> > +/**
> > + * __pm_runtime_suspend - Run a device bus type's runtime_suspend() callback.
> > + * @dev: Device to suspend.
> > + * @sync: If unset, the funtion has been called via pm_wq.
> > + *
> > + * Check if the run-time PM status of the device is appropriate and run the
> > + * ->runtime_suspend() callback provided by the device's bus type.  Update the
> > + * run-time PM flags in the device object to reflect the current status of the
> > + * device.
> > + */
> > +int __pm_runtime_suspend(struct device *dev, bool sync)
> > +{
> > +	struct device *parent = NULL;
> > +	unsigned long flags;
> > +	int error = -EINVAL;
> 
> Remove the initializer.
> 
> > +
> > +	might_sleep();
> > +
> > +	spin_lock_irqsave(&dev->power.lock, flags);
> > +
> > + repeat:
> > +	if (dev->power.runtime_status == RPM_ERROR) {
> 
> Insert:		error = -EINVAL;
> 
> > +		goto out;
> > +	} else if (dev->power.runtime_status & RPM_SUSPENDED) {
> 
> ...
> 
> 
> > +void pm_runtime_put(struct device *dev)
> > +{
> > +	unsigned long flags;
> > +
> > +	spin_lock_irqsave(&dev->power.lock, flags);
> > +
> > +	if (!__pm_runtime_put(dev)) {
> > +		dev_WARN(dev, "Unbalanced counter decrementation");
> 
> "decrementation" isn't a word -- or if it is, it shouldn't be.  :-)  
> Just use "decrement".  Similarly in other places.
> 
> > +/**
> > + * pm_runtime_add - Update run-time PM fields of a device while adding it.
> > + * @dev: Device object being added to device hierarchy.
> > + */
> > +void pm_runtime_add(struct device *dev)
> > +{
> > +	dev->power.runtime_notify = false;
> > +	INIT_DELAYED_WORK(&dev->power.suspend_work, pm_runtime_suspend_work);
> 
> Doesn't INIT_DELAYED_WORK belong in pm_runtime_init?
> Do we want the bus subsystem to be responsible for doing:
> 
> 	dev->power.runtime_disabled = false;
> 	pm_runtime_put(dev);
> 
> after calling device_add?  Or should device_add do it?
> 
> 
> > Index: linux-2.6/include/linux/pm_runtime.h
> > ===================================================================
> > --- /dev/null
> > +++ linux-2.6/include/linux/pm_runtime.h
> 
> > +static inline struct device *suspend_work_to_device(struct work_struct *work)
> > +{
> > +	struct delayed_work *dw = to_delayed_work(work);
> > +	struct dev_pm_info *dpi;
> > +
> > +	dpi = container_of(dw, struct dev_pm_info, suspend_work);
> > +	return container_of(dpi, struct device, power);
> > +}
> 
> You don't need to iterate container_of like this.  You can do:
> 
> 	return container_of(dw, struct device, power.suspend_work);
> 
> > +
> > +static inline struct device *work_to_device(struct work_struct *work)
> > +{
> > +	struct dev_pm_info *dpi;
> > +
> > +	dpi = container_of(work, struct dev_pm_info, work);
> > +	return container_of(dpi, struct device, power);
> > +}
> 
> Similarly here.
> 
> These two routines aren't used outside of runtime.c.  They should be
> moved into that file.  The same goes for pm_children_suspended and
> pm_suspend_possible.
> 
> > +
> > +static inline void __pm_runtime_get(struct device *dev)
> > +{
> > +	atomic_inc(&dev->power.resume_count);
> > +}
> 
> Why introduce __pm_runtime_get?  Just make this pm_runtime_get.
> 
> > +static inline void pm_runtime_remove(struct device *dev)
> > +{
> > +	pm_runtime_disable(dev);
> > +}
> 
> You forgot to decrement the parent's child_count if dev isn't
> suspended (and then do a idle_notify on the parent).  Because of this 
> additional complexity, don't inline the routine.
> 
> > Index: linux-2.6/drivers/base/dd.c
> > ===================================================================
> > --- linux-2.6.orig/drivers/base/dd.c
> > +++ linux-2.6/drivers/base/dd.c
> > @@ -23,6 +23,7 @@
> >  #include <linux/kthread.h>
> >  #include <linux/wait.h>
> >  #include <linux/async.h>
> > +#include <linux/pm_runtime.h>
> >  
> >  #include "base.h"
> >  #include "power/power.h"
> > @@ -202,8 +203,12 @@ int driver_probe_device(struct device_dr
> >  	pr_debug("bus: '%s': %s: matched device %s with driver %s\n",
> >  		 drv->bus->name, __func__, dev_name(dev), drv->name);
> >  
> > +	pm_runtime_disable(dev);
> > +
> >  	ret = really_probe(dev, drv);
> >  
> > +	pm_runtime_enable(dev);
> > +
> 
> Shouldn't we guarantee that a device isn't probed while it is in a
> suspended state?  So this should be
> 
> 	pm_runtime_get(dev);
> 	ret = pm_runtime_resume(dev);
> 	if (ret == 0)
> 		ret = really_probe(dev, drv);
> 	pm_runtime_put(dev);	
> 
> It might be nice to have a simple combined pm_runtime_get_and_resume
> for this sort of situation.

Just to clarify, the last version of the patch I sent didn't address the
comments from this, but this was not because I didn't agree with them, but
I was just focusing on simplifying drivers/base/power/resume.c .

I'll address them in the next version.

Best,
Rafael

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 5)
  2009-06-24 21:30         ` Alan Stern
  2009-06-25 16:49           ` Alan Stern
  2009-06-25 16:49           ` [linux-pm] " Alan Stern
@ 2009-06-26 21:49           ` Rafael J. Wysocki
  2009-06-26 21:49           ` Rafael J. Wysocki
  3 siblings, 0 replies; 102+ messages in thread
From: Rafael J. Wysocki @ 2009-06-26 21:49 UTC (permalink / raw)
  To: Alan Stern
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Wednesday 24 June 2009, Alan Stern wrote:
> On Wed, 24 Jun 2009, Rafael J. Wysocki wrote:
> 
> > +config PM_RUNTIME
> > +	bool "Run-time PM core functionality"
> > +	depends on PM
> > +	---help---
> > +	  Enable functionality allowing I/O devices to be put into energy-saving
> > +	  (low power) states at run time (or autosuspended) after a specified
> > +	  period of inactivity and woken up in response to a hardware-generated
> > +	  wake-up event or a driver's request.
> > +
> > +	  Hardware support is generally required for this functionality to work
> > +	  and the bus type drivers of the buses the devices are on are
> > +	  responsibile for the actual handling of the autosuspend requests and
> 
> s/ibile/ible/
> 
> > @@ -165,6 +168,28 @@ typedef struct pm_message {
> >   * It is allowed to unregister devices while the above callbacks are being
> >   * executed.  However, it is not allowed to unregister a device from within any
> >   * of its own callbacks.
> > + *
> > + * There also are the following callbacks related to run-time power management
> > + * of devices:
> > + *
> > + * @runtime_suspend: Prepare the device for a condition in which it won't be
> > + *	able to communicate with the CPU(s) and RAM due to power management.
> > + *	This need not mean that the device should be put into a low power state.
> > + *	For example, if the device is behind a link which is about to be turned
> > + *	off, the device may remain at full power.  Still, if the device does go
> 
> s/Still, if/If/ -- the word "Still" seems a little odd in this context.
> 
> > + *	to low power and if device_may_wakeup(dev) is true, remote wake-up
> > + *	(i.e. hardware mechanism allowing the device to request a change of its
> 
> s/i.e. /i.e., a /
> 
> > + *	power state, such as PCI PME) should be enabled for it.
> > + *
> > + * @runtime_resume: Put the device into the fully active state in response to a
> > + *	wake-up event generated by hardware or at a request of software.  If
> 
> s/at a request/at the request/
> 
> > + *	necessary, put the device into the full power state and restore its
> > + *	registers, so that it is fully operational.
> 
> 
> > + * RPM_ACTIVE		Device is fully operational, no run-time PM requests are
> > + *			pending for it.
> > + *
> > + * RPM_IDLE		It has been requested that the device be suspended.
> > + *			Suspend request has been put into the run-time PM
> > + *			workqueue and it's pending execution.
> > + *
> > + * RPM_SUSPENDING	Device bus type's ->runtime_suspend() callback is being
> > + *			executed.
> > + *
> > + * RPM_SUSPENDED	Device bus type's ->runtime_suspend() callback has
> > + *			completed successfully.  The device is regarded as
> > + *			suspended.
> > + *
> > + * RPM_WAKE		It has been requested that the device be woken up.
> > + *			Resume request has been put into the run-time PM
> > + *			workqueue and it's pending execution.
> > + *
> > + * RPM_RESUMING		Device bus type's ->runtime_resume() callback is being
> > + *			executed.
> 
> Remember to add RPM_NOTIFY.
> 
> 
> > +/**
> > + * __pm_get_child - Increment the counter of unsuspended children of a device.
> > + * @dev: Device to handle;
> > + */
> > +static void __pm_get_child(struct device *dev)
> > +{
> > +	atomic_inc(&dev->power.child_count);
> > +}
> > +
> > +/**
> > + * __pm_put_child - Decrement the counter of unsuspended children of a device.
> > + * @dev: Device to handle;
> > + */
> > +static void __pm_put_child(struct device *dev)
> > +{
> > +	if (!atomic_add_unless(&dev->power.child_count, -1, 0))
> > +		dev_WARN(dev, "Unbalanced counter decrementation");
> > +}
> 
> I think we don't need this dev_WARN.  It should be straightforward to
> verify that the increments and decrements balance correctly, and the
> child_count field isn't manipulated by drivers.
> 
> In fact, these don't need to be separate routines at all.  Just call
> atomic_inc or atomic_dec directly.
> 
> > +
> > +/**
> > + * __pm_runtime_suspend - Run a device bus type's runtime_suspend() callback.
> > + * @dev: Device to suspend.
> > + * @sync: If unset, the funtion has been called via pm_wq.
> > + *
> > + * Check if the run-time PM status of the device is appropriate and run the
> > + * ->runtime_suspend() callback provided by the device's bus type.  Update the
> > + * run-time PM flags in the device object to reflect the current status of the
> > + * device.
> > + */
> > +int __pm_runtime_suspend(struct device *dev, bool sync)
> > +{
> > +	struct device *parent = NULL;
> > +	unsigned long flags;
> > +	int error = -EINVAL;
> 
> Remove the initializer.
> 
> > +
> > +	might_sleep();
> > +
> > +	spin_lock_irqsave(&dev->power.lock, flags);
> > +
> > + repeat:
> > +	if (dev->power.runtime_status == RPM_ERROR) {
> 
> Insert:		error = -EINVAL;
> 
> > +		goto out;
> > +	} else if (dev->power.runtime_status & RPM_SUSPENDED) {
> 
> ...
> 
> 
> > +void pm_runtime_put(struct device *dev)
> > +{
> > +	unsigned long flags;
> > +
> > +	spin_lock_irqsave(&dev->power.lock, flags);
> > +
> > +	if (!__pm_runtime_put(dev)) {
> > +		dev_WARN(dev, "Unbalanced counter decrementation");
> 
> "decrementation" isn't a word -- or if it is, it shouldn't be.  :-)  
> Just use "decrement".  Similarly in other places.
> 
> > +/**
> > + * pm_runtime_add - Update run-time PM fields of a device while adding it.
> > + * @dev: Device object being added to device hierarchy.
> > + */
> > +void pm_runtime_add(struct device *dev)
> > +{
> > +	dev->power.runtime_notify = false;
> > +	INIT_DELAYED_WORK(&dev->power.suspend_work, pm_runtime_suspend_work);
> 
> Doesn't INIT_DELAYED_WORK belong in pm_runtime_init?
> Do we want the bus subsystem to be responsible for doing:
> 
> 	dev->power.runtime_disabled = false;
> 	pm_runtime_put(dev);
> 
> after calling device_add?  Or should device_add do it?
> 
> 
> > Index: linux-2.6/include/linux/pm_runtime.h
> > ===================================================================
> > --- /dev/null
> > +++ linux-2.6/include/linux/pm_runtime.h
> 
> > +static inline struct device *suspend_work_to_device(struct work_struct *work)
> > +{
> > +	struct delayed_work *dw = to_delayed_work(work);
> > +	struct dev_pm_info *dpi;
> > +
> > +	dpi = container_of(dw, struct dev_pm_info, suspend_work);
> > +	return container_of(dpi, struct device, power);
> > +}
> 
> You don't need to iterate container_of like this.  You can do:
> 
> 	return container_of(dw, struct device, power.suspend_work);
> 
> > +
> > +static inline struct device *work_to_device(struct work_struct *work)
> > +{
> > +	struct dev_pm_info *dpi;
> > +
> > +	dpi = container_of(work, struct dev_pm_info, work);
> > +	return container_of(dpi, struct device, power);
> > +}
> 
> Similarly here.
> 
> These two routines aren't used outside of runtime.c.  They should be
> moved into that file.  The same goes for pm_children_suspended and
> pm_suspend_possible.
> 
> > +
> > +static inline void __pm_runtime_get(struct device *dev)
> > +{
> > +	atomic_inc(&dev->power.resume_count);
> > +}
> 
> Why introduce __pm_runtime_get?  Just make this pm_runtime_get.
> 
> > +static inline void pm_runtime_remove(struct device *dev)
> > +{
> > +	pm_runtime_disable(dev);
> > +}
> 
> You forgot to decrement the parent's child_count if dev isn't
> suspended (and then do a idle_notify on the parent).  Because of this 
> additional complexity, don't inline the routine.
> 
> > Index: linux-2.6/drivers/base/dd.c
> > ===================================================================
> > --- linux-2.6.orig/drivers/base/dd.c
> > +++ linux-2.6/drivers/base/dd.c
> > @@ -23,6 +23,7 @@
> >  #include <linux/kthread.h>
> >  #include <linux/wait.h>
> >  #include <linux/async.h>
> > +#include <linux/pm_runtime.h>
> >  
> >  #include "base.h"
> >  #include "power/power.h"
> > @@ -202,8 +203,12 @@ int driver_probe_device(struct device_dr
> >  	pr_debug("bus: '%s': %s: matched device %s with driver %s\n",
> >  		 drv->bus->name, __func__, dev_name(dev), drv->name);
> >  
> > +	pm_runtime_disable(dev);
> > +
> >  	ret = really_probe(dev, drv);
> >  
> > +	pm_runtime_enable(dev);
> > +
> 
> Shouldn't we guarantee that a device isn't probed while it is in a
> suspended state?  So this should be
> 
> 	pm_runtime_get(dev);
> 	ret = pm_runtime_resume(dev);
> 	if (ret == 0)
> 		ret = really_probe(dev, drv);
> 	pm_runtime_put(dev);	
> 
> It might be nice to have a simple combined pm_runtime_get_and_resume
> for this sort of situation.

Just to clarify, the last version of the patch I sent didn't address the
comments from this, but this was not because I didn't agree with them, but
I was just focusing on simplifying drivers/base/power/resume.c .

I'll address them in the next version.

Best,
Rafael

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 5)
  2009-06-25 14:57           ` Magnus Damm
  (?)
  (?)
@ 2009-06-26 22:02           ` Rafael J. Wysocki
  -1 siblings, 0 replies; 102+ messages in thread
From: Rafael J. Wysocki @ 2009-06-26 22:02 UTC (permalink / raw)
  To: Magnus Damm
  Cc: Alan Stern, linux-pm, Oliver Neukum, ACPI Devel Maling List,
	Ingo Molnar, LKML, Greg KH, Arjan van de Ven

On Thursday 25 June 2009, Magnus Damm wrote:
> On Thu, Jun 25, 2009 at 4:24 AM, Rafael J. Wysocki<rjw@sisk.pl> wrote:
> > From: Rafael J. Wysocki <rjw@sisk.pl>
> > Subject: PM: Introduce core framework for run-time PM of I/O devices (rev. 5)
> >
> > Introduce a core framework for run-time power management of I/O
> > devices.  Add device run-time PM fields to 'struct dev_pm_info'
> > and device run-time PM callbacks to 'struct dev_pm_ops'.  Introduce
> > a run-time PM workqueue and define some device run-time PM helper
> > functions at the core level.  Document all these things.
> >
> > Not-yet-signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> 
> Hi Rafael,
> 
> Thanks for your work on this. I've built some code for SuperH on top
> of this today, and with that behind me I have a few questions and a
> little bit of code feedback.
> 
> Questions:
> 
> 1) Which functions are device drivers supposed to use?
> 
> I simply added pm_runtime_resume() and pm_runtime_suspend() where
> clk_enable() and clk_disable() normally are used. In interrupt
> handlers I used pm_request_suspend() instead of pm_runtime_suspend().
> 
> I'm not sure if the v5 patch does the right thing around
> really_probe() like Alan pointed out.

Yes, Alan was right.  V6 doesn't address that, but V7 will. :-)

> Basically, I'd like to be able to call my bus callback for
> ->runtime_resume() from the driver probe(), but power.resume_count seems
> stuck at 1 which leads to pm_runtime_resume() returning -EAGAIN before
> invoking the bus callback.
> 
> This leads to question number two...
> 
> 2) What's the default state in probe()?

Currently, it's 'active', but runtime_disabled is set.  I'm open to
suggestions, though.

> We touched this subject briefly before. I'd like to compare the
> Runtime PM default state with the clock framework default state. The
> clock framework requires you to use clk_enable() to enable the clock
> to a hardware block before it is allowed to access the hardware
> registers. At least that's how we handle stop bits on SuperH. So
> clocks come up disabled from boot and should be enabled and disabled
> by the device driver to save power.
> 
> I'd like to change our Module Stop Bits code on SuperH (once again)
> from being handled by the clock framework to being managed by the
> Runtime PM framework. Having the clock framework deal with the stop
> bits works fine today, they are off by default after boot, and the
> driver often enables the clock with clk_enable() in probe() or
> hopefully in some more finegrained fashion.
> 
> I'm not sure how the Module Stop Bits should fit with the Runtime PM
> code though. The default state for a device at probe() time seems to
> be RPM_ACTIVE. Should drivers call pm_runtime_enable() to enable
> Runtime PM?

That's the idea.

> One part of me likes the idea that Runtime PM-enabled drivers start in
> RPM_SUSPENDED so they are forced to put pm_runtime_resume() before
> actually using the hardware. This makes the Runtime PM behaviour
> pretty close to the clock framework.
> 
> If you dislike starting from RPM_SUSPENDED (most likely) then I wonder
> how I should set the state to RPM_SUSPENDED in the driver. I'd like to
> make sure that pm_runtime_resume() can invoke the bus callback so the
> hardware can be turned on for the first time somehow. Should I do a
> dummy suspend?

Hmm, good question.

While run-time PM is disabled (power.runtime_disabled is set), which is the
case in the initial state, you can just change power.runtime_status to
RPM_SUSPENDED and nothing wrong happens as long as that reflects the actual
status of the device.

I'll add a helper function for that.

> 3) Should drivers use pm_suspend_ignore_children(dev, true)?
> 
> It turns out that I can't suspend my I2C master driver out of the box
> since it becomes the parent of all slaves on the I2C bus. The I2C
> master driver is just a platform driver, and the children are I2C
> devices (90% sure). I want to do Runtime PM regardless if the child
> devices are suspended or not, I guess I should use
> pm_suspend_ignore_children(dev, true) then?

Yes, that's the idea.

> > +/**
> > + * __pm_runtime_resume - Run a device bus type's runtime_resume() callback.
> > + * @dev: Device to resume.
> > + * @sync: If unset, the funtion has been called via pm_wq.
> > + *
> > + * Check if the device is really suspended and run the ->runtime_resume()
> > + * callback provided by the device's bus type driver.  Update the run-time PM
> > + * flags in the device object to reflect the current status of the device.  If
> > + * runtime suspend is in progress while this function is being run, wait for it
> > + * to finish before resuming the device.  If runtime suspend is scheduled, but
> > + * it hasn't started yet, cancel it and we're done.
> > + */
> > +int __pm_runtime_resume(struct device *dev, bool sync)
> > +{
> [snip]
> > +}
> > +EXPORT_SYMBOL_GPL(pm_runtime_resume);
> 
> You're missing "__" here unless you're aiming for something very exotic. =)

Ah, thanks!

> > +/**
> > + * pm_runtime_work - Run __pm_runtime_resume() for a device.
> > + * @work: Work structure used for scheduling the execution of this function.
> > + *
> > + * Use @work to get the device object the resume has been scheduled for and run
> > + * __pm_runtime_resume() for it.
> > + */
> > +static void pm_runtime_work(struct work_struct *work)
> > +{
> > +       __pm_runtime_resume(work_to_device(work), false);
> > +}
> 
> Anything wrong with the name pm_runtime_resume_work()?

Not at all. :-)  Just switched to that.

> Looking forward to v6, I'll switch task now, will be back to this late Monday.

Well, V6 was already sent, but unfortunately it didn't address your comments.
I'll send V7 during the weekend.

Best,
Rafael

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 5)
  2009-06-25 14:57           ` Magnus Damm
  (?)
@ 2009-06-26 22:02           ` Rafael J. Wysocki
  -1 siblings, 0 replies; 102+ messages in thread
From: Rafael J. Wysocki @ 2009-06-26 22:02 UTC (permalink / raw)
  To: Magnus Damm
  Cc: Greg KH, LKML, ACPI Devel Maling List, linux-pm, Ingo Molnar,
	Arjan van de Ven

On Thursday 25 June 2009, Magnus Damm wrote:
> On Thu, Jun 25, 2009 at 4:24 AM, Rafael J. Wysocki<rjw@sisk.pl> wrote:
> > From: Rafael J. Wysocki <rjw@sisk.pl>
> > Subject: PM: Introduce core framework for run-time PM of I/O devices (rev. 5)
> >
> > Introduce a core framework for run-time power management of I/O
> > devices.  Add device run-time PM fields to 'struct dev_pm_info'
> > and device run-time PM callbacks to 'struct dev_pm_ops'.  Introduce
> > a run-time PM workqueue and define some device run-time PM helper
> > functions at the core level.  Document all these things.
> >
> > Not-yet-signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> 
> Hi Rafael,
> 
> Thanks for your work on this. I've built some code for SuperH on top
> of this today, and with that behind me I have a few questions and a
> little bit of code feedback.
> 
> Questions:
> 
> 1) Which functions are device drivers supposed to use?
> 
> I simply added pm_runtime_resume() and pm_runtime_suspend() where
> clk_enable() and clk_disable() normally are used. In interrupt
> handlers I used pm_request_suspend() instead of pm_runtime_suspend().
> 
> I'm not sure if the v5 patch does the right thing around
> really_probe() like Alan pointed out.

Yes, Alan was right.  V6 doesn't address that, but V7 will. :-)

> Basically, I'd like to be able to call my bus callback for
> ->runtime_resume() from the driver probe(), but power.resume_count seems
> stuck at 1 which leads to pm_runtime_resume() returning -EAGAIN before
> invoking the bus callback.
> 
> This leads to question number two...
> 
> 2) What's the default state in probe()?

Currently, it's 'active', but runtime_disabled is set.  I'm open to
suggestions, though.

> We touched this subject briefly before. I'd like to compare the
> Runtime PM default state with the clock framework default state. The
> clock framework requires you to use clk_enable() to enable the clock
> to a hardware block before it is allowed to access the hardware
> registers. At least that's how we handle stop bits on SuperH. So
> clocks come up disabled from boot and should be enabled and disabled
> by the device driver to save power.
> 
> I'd like to change our Module Stop Bits code on SuperH (once again)
> from being handled by the clock framework to being managed by the
> Runtime PM framework. Having the clock framework deal with the stop
> bits works fine today, they are off by default after boot, and the
> driver often enables the clock with clk_enable() in probe() or
> hopefully in some more finegrained fashion.
> 
> I'm not sure how the Module Stop Bits should fit with the Runtime PM
> code though. The default state for a device at probe() time seems to
> be RPM_ACTIVE. Should drivers call pm_runtime_enable() to enable
> Runtime PM?

That's the idea.

> One part of me likes the idea that Runtime PM-enabled drivers start in
> RPM_SUSPENDED so they are forced to put pm_runtime_resume() before
> actually using the hardware. This makes the Runtime PM behaviour
> pretty close to the clock framework.
> 
> If you dislike starting from RPM_SUSPENDED (most likely) then I wonder
> how I should set the state to RPM_SUSPENDED in the driver. I'd like to
> make sure that pm_runtime_resume() can invoke the bus callback so the
> hardware can be turned on for the first time somehow. Should I do a
> dummy suspend?

Hmm, good question.

While run-time PM is disabled (power.runtime_disabled is set), which is the
case in the initial state, you can just change power.runtime_status to
RPM_SUSPENDED and nothing wrong happens as long as that reflects the actual
status of the device.

I'll add a helper function for that.

> 3) Should drivers use pm_suspend_ignore_children(dev, true)?
> 
> It turns out that I can't suspend my I2C master driver out of the box
> since it becomes the parent of all slaves on the I2C bus. The I2C
> master driver is just a platform driver, and the children are I2C
> devices (90% sure). I want to do Runtime PM regardless if the child
> devices are suspended or not, I guess I should use
> pm_suspend_ignore_children(dev, true) then?

Yes, that's the idea.

> > +/**
> > + * __pm_runtime_resume - Run a device bus type's runtime_resume() callback.
> > + * @dev: Device to resume.
> > + * @sync: If unset, the funtion has been called via pm_wq.
> > + *
> > + * Check if the device is really suspended and run the ->runtime_resume()
> > + * callback provided by the device's bus type driver.  Update the run-time PM
> > + * flags in the device object to reflect the current status of the device.  If
> > + * runtime suspend is in progress while this function is being run, wait for it
> > + * to finish before resuming the device.  If runtime suspend is scheduled, but
> > + * it hasn't started yet, cancel it and we're done.
> > + */
> > +int __pm_runtime_resume(struct device *dev, bool sync)
> > +{
> [snip]
> > +}
> > +EXPORT_SYMBOL_GPL(pm_runtime_resume);
> 
> You're missing "__" here unless you're aiming for something very exotic. =)

Ah, thanks!

> > +/**
> > + * pm_runtime_work - Run __pm_runtime_resume() for a device.
> > + * @work: Work structure used for scheduling the execution of this function.
> > + *
> > + * Use @work to get the device object the resume has been scheduled for and run
> > + * __pm_runtime_resume() for it.
> > + */
> > +static void pm_runtime_work(struct work_struct *work)
> > +{
> > +       __pm_runtime_resume(work_to_device(work), false);
> > +}
> 
> Anything wrong with the name pm_runtime_resume_work()?

Not at all. :-)  Just switched to that.

> Looking forward to v6, I'll switch task now, will be back to this late Monday.

Well, V6 was already sent, but unfortunately it didn't address your comments.
I'll send V7 during the weekend.

Best,
Rafael

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)
  2009-06-26 21:13                   ` Alan Stern
@ 2009-06-26 22:32                     ` Rafael J. Wysocki
  2009-06-27  1:25                       ` Alan Stern
                                         ` (3 more replies)
  2009-06-26 22:32                     ` Rafael J. Wysocki
                                       ` (2 subsequent siblings)
  3 siblings, 4 replies; 102+ messages in thread
From: Rafael J. Wysocki @ 2009-06-26 22:32 UTC (permalink / raw)
  To: Alan Stern
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Friday 26 June 2009, Alan Stern wrote:
> On Fri, 26 Jun 2009, Rafael J. Wysocki wrote:
> 
> > > It occurs to me that the problem would be solved if were a cancel_work
> > > routine.  In the same vein, it ought to be possible for
> > > cancel_delayed_work to run in interrupt context.  I'll see what can be
> > > done.
> > 
> > Having looked at the workqueue code I'm not sure if there's a way to implement
> > that in a non-racy way.  Which may be the reason why there are no such
> > functions already. :-)
> 
> Well, I'll give it a try.
> 
> Speaking of races, have you noticed that the way power.work_done gets 
> used is racy?

Not really. :-)

> You can't wait for the completion before releasing the 
> lock, but then anything could happen.
> 
> A safer approach would be to use a wait_queue.

I'm not sure what you mean exactly.  What's the race?

> > In the meantime I reworked the patch (below) to use more RPM_* flags and I
> > removed the runtime_break and runtime_notify bits from it.  Also added some
> > comments to explain some non-obvious steps (hope that helps).
> > 
> > I also added the pm_runtime_put_atomic() and pm_runtime_put() as per the
> > comment above.
> > 
> > It seems to be a bit cleaner this way, but that's my personal view. :-)
> 
> I'll look at it over the weekend.  And I'll try to see if proper 
> cancel_work and cancel_delayed_work functions can help clean it up.

Great, thanks!

Best,
Rafael

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)
  2009-06-26 21:13                   ` Alan Stern
  2009-06-26 22:32                     ` Rafael J. Wysocki
@ 2009-06-26 22:32                     ` Rafael J. Wysocki
  2009-06-28 10:25                     ` Rafael J. Wysocki
  2009-06-28 10:25                     ` Rafael J. Wysocki
  3 siblings, 0 replies; 102+ messages in thread
From: Rafael J. Wysocki @ 2009-06-26 22:32 UTC (permalink / raw)
  To: Alan Stern
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Friday 26 June 2009, Alan Stern wrote:
> On Fri, 26 Jun 2009, Rafael J. Wysocki wrote:
> 
> > > It occurs to me that the problem would be solved if were a cancel_work
> > > routine.  In the same vein, it ought to be possible for
> > > cancel_delayed_work to run in interrupt context.  I'll see what can be
> > > done.
> > 
> > Having looked at the workqueue code I'm not sure if there's a way to implement
> > that in a non-racy way.  Which may be the reason why there are no such
> > functions already. :-)
> 
> Well, I'll give it a try.
> 
> Speaking of races, have you noticed that the way power.work_done gets 
> used is racy?

Not really. :-)

> You can't wait for the completion before releasing the 
> lock, but then anything could happen.
> 
> A safer approach would be to use a wait_queue.

I'm not sure what you mean exactly.  What's the race?

> > In the meantime I reworked the patch (below) to use more RPM_* flags and I
> > removed the runtime_break and runtime_notify bits from it.  Also added some
> > comments to explain some non-obvious steps (hope that helps).
> > 
> > I also added the pm_runtime_put_atomic() and pm_runtime_put() as per the
> > comment above.
> > 
> > It seems to be a bit cleaner this way, but that's my personal view. :-)
> 
> I'll look at it over the weekend.  And I'll try to see if proper 
> cancel_work and cancel_delayed_work functions can help clean it up.

Great, thanks!

Best,
Rafael

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)
  2009-06-26 22:32                     ` Rafael J. Wysocki
  2009-06-27  1:25                       ` Alan Stern
@ 2009-06-27  1:25                       ` Alan Stern
  2009-06-27 14:51                       ` Alan Stern
  2009-06-27 14:51                       ` [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6) Alan Stern
  3 siblings, 0 replies; 102+ messages in thread
From: Alan Stern @ 2009-06-27  1:25 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Sat, 27 Jun 2009, Rafael J. Wysocki wrote:

> > Speaking of races, have you noticed that the way power.work_done gets 
> > used is racy?
> 
> Not really. :-)
> 
> > You can't wait for the completion before releasing the 
> > lock, but then anything could happen.
> > 
> > A safer approach would be to use a wait_queue.
> 
> I'm not sure what you mean exactly.  What's the race?

Somebody calls pm_runtime_suspend when a suspend is already in 
progress.  The routine sees that the status is RPM_SUSPENDING, so it 
prepares to wait until the suspend is finished.  It drops the lock and 
calls wait_for_completion.

But in between those last two steps, the suspend could finish and 
a resume could start up.  Then the wait_for_completion wouldn't return 
until the device was fully resumed!

Now I admit this isn't as bad as it sounds.  The same sort of thing
could happen even if there weren't two suspends going on at the same
time; a resume could occur between when the routine drops the lock and
when it returns.  So okay, forget I mentioned it.

Alan Stern


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)
  2009-06-26 22:32                     ` Rafael J. Wysocki
@ 2009-06-27  1:25                       ` Alan Stern
  2009-06-27  1:25                       ` Alan Stern
                                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 102+ messages in thread
From: Alan Stern @ 2009-06-27  1:25 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Sat, 27 Jun 2009, Rafael J. Wysocki wrote:

> > Speaking of races, have you noticed that the way power.work_done gets 
> > used is racy?
> 
> Not really. :-)
> 
> > You can't wait for the completion before releasing the 
> > lock, but then anything could happen.
> > 
> > A safer approach would be to use a wait_queue.
> 
> I'm not sure what you mean exactly.  What's the race?

Somebody calls pm_runtime_suspend when a suspend is already in 
progress.  The routine sees that the status is RPM_SUSPENDING, so it 
prepares to wait until the suspend is finished.  It drops the lock and 
calls wait_for_completion.

But in between those last two steps, the suspend could finish and 
a resume could start up.  Then the wait_for_completion wouldn't return 
until the device was fully resumed!

Now I admit this isn't as bad as it sounds.  The same sort of thing
could happen even if there weren't two suspends going on at the same
time; a resume could occur between when the routine drops the lock and
when it returns.  So okay, forget I mentioned it.

Alan Stern

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)
  2009-06-26 22:32                     ` Rafael J. Wysocki
  2009-06-27  1:25                       ` Alan Stern
  2009-06-27  1:25                       ` Alan Stern
@ 2009-06-27 14:51                       ` Alan Stern
  2009-06-27 21:51                         ` [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 7) Rafael J. Wysocki
  2009-06-27 21:51                         ` Rafael J. Wysocki
  2009-06-27 14:51                       ` [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6) Alan Stern
  3 siblings, 2 replies; 102+ messages in thread
From: Alan Stern @ 2009-06-27 14:51 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Sat, 27 Jun 2009, Rafael J. Wysocki wrote:

> > Speaking of races, have you noticed that the way power.work_done gets 
> > used is racy?
> 
> Not really. :-)
> 
> > You can't wait for the completion before releasing the 
> > lock, but then anything could happen.
> > 
> > A safer approach would be to use a wait_queue.
> 
> I'm not sure what you mean exactly.  What's the race?

Come to think of it, there really is a problem here.  Because the
wait_for_completion call occurs outside the spinlock, it can race with
the init_completion call.  It's not good for both of them to run at the
same time; the completion's internal spinlock and list pointers could 
get corrupted.

Therefore I stand by my original assertion: The struct completion 
should be replaced with a wait_queue.  Set the runtime_error field to 
-EINPROGRESS initially, and make other threads wait until the value 
changes.

Alan Stern


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)
  2009-06-26 22:32                     ` Rafael J. Wysocki
                                         ` (2 preceding siblings ...)
  2009-06-27 14:51                       ` Alan Stern
@ 2009-06-27 14:51                       ` Alan Stern
  3 siblings, 0 replies; 102+ messages in thread
From: Alan Stern @ 2009-06-27 14:51 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Sat, 27 Jun 2009, Rafael J. Wysocki wrote:

> > Speaking of races, have you noticed that the way power.work_done gets 
> > used is racy?
> 
> Not really. :-)
> 
> > You can't wait for the completion before releasing the 
> > lock, but then anything could happen.
> > 
> > A safer approach would be to use a wait_queue.
> 
> I'm not sure what you mean exactly.  What's the race?

Come to think of it, there really is a problem here.  Because the
wait_for_completion call occurs outside the spinlock, it can race with
the init_completion call.  It's not good for both of them to run at the
same time; the completion's internal spinlock and list pointers could 
get corrupted.

Therefore I stand by my original assertion: The struct completion 
should be replaced with a wait_queue.  Set the runtime_error field to 
-EINPROGRESS initially, and make other threads wait until the value 
changes.

Alan Stern

^ permalink raw reply	[flat|nested] 102+ messages in thread

* [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 7)
  2009-06-27 14:51                       ` Alan Stern
  2009-06-27 21:51                         ` [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 7) Rafael J. Wysocki
@ 2009-06-27 21:51                         ` Rafael J. Wysocki
  1 sibling, 0 replies; 102+ messages in thread
From: Rafael J. Wysocki @ 2009-06-27 21:51 UTC (permalink / raw)
  To: Alan Stern
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Saturday 27 June 2009, Alan Stern wrote:
> On Sat, 27 Jun 2009, Rafael J. Wysocki wrote:
> 
> > > Speaking of races, have you noticed that the way power.work_done gets 
> > > used is racy?
> > 
> > Not really. :-)
> > 
> > > You can't wait for the completion before releasing the 
> > > lock, but then anything could happen.
> > > 
> > > A safer approach would be to use a wait_queue.
> > 
> > I'm not sure what you mean exactly.  What's the race?
> 
> Come to think of it, there really is a problem here.  Because the
> wait_for_completion call occurs outside the spinlock, it can race with
> the init_completion call.

I don't really think it can, because if either __pm_runtime_suspend(), or
__pm_runtime_resume() finds RPM_SUSPENDING set in the status, it will wait for
the completion and won't reinitialize it until it's been completed.

> It's not good for both of them to run at the same time; the completion's
> internal spinlock and list pointers could get corrupted.

Nevertheless, I reworked the patch to use a wait queue instead of the
completion.  This also helps pm_runtime_disable() to ensure that
->runtime_idle() won't be running after it returns.

> Therefore I stand by my original assertion: The struct completion 
> should be replaced with a wait_queue.  Set the runtime_error field to 
> -EINPROGRESS initially, and make other threads wait until the value 
> changes.

Since runtime_error only changes along with the status, I think it's sufficient
to wait for the status to change.

The updated patch below also addresses some other comments from your previous
messages and from Magnus.

Thanks,
Rafael

---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: PM: Introduce core framework for run-time PM of I/O devices (rev. 7)

Introduce a core framework for run-time power management of I/O
devices.  Add device run-time PM fields to 'struct dev_pm_info'
and device run-time PM callbacks to 'struct dev_pm_ops'.  Introduce
a run-time PM workqueue and define some device run-time PM helper
functions at the core level.  Document all these things.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 drivers/base/dd.c            |   10 
 drivers/base/power/Makefile  |    1 
 drivers/base/power/main.c    |   16 
 drivers/base/power/power.h   |   11 
 drivers/base/power/runtime.c |  846 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/pm.h           |  117 +++++
 include/linux/pm_runtime.h   |  124 ++++++
 kernel/power/Kconfig         |   14 
 kernel/power/main.c          |   17 
 9 files changed, 1145 insertions(+), 11 deletions(-)

Index: linux-2.6/kernel/power/Kconfig
===================================================================
--- linux-2.6.orig/kernel/power/Kconfig
+++ linux-2.6/kernel/power/Kconfig
@@ -208,3 +208,17 @@ config APM_EMULATION
 	  random kernel OOPSes or reboots that don't seem to be related to
 	  anything, try disabling/enabling this option (or disabling/enabling
 	  APM in your BIOS).
+
+config PM_RUNTIME
+	bool "Run-time PM core functionality"
+	depends on PM
+	---help---
+	  Enable functionality allowing I/O devices to be put into energy-saving
+	  (low power) states at run time (or autosuspended) after a specified
+	  period of inactivity and woken up in response to a hardware-generated
+	  wake-up event or a driver's request.
+
+	  Hardware support is generally required for this functionality to work
+	  and the bus type drivers of the buses the devices are on are
+	  responsible for the actual handling of the autosuspend requests and
+	  wake-up events.
Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -11,6 +11,7 @@
 #include <linux/kobject.h>
 #include <linux/string.h>
 #include <linux/resume-trace.h>
+#include <linux/workqueue.h>
 
 #include "power.h"
 
@@ -217,8 +218,24 @@ static struct attribute_group attr_group
 	.attrs = g,
 };
 
+#ifdef CONFIG_PM_RUNTIME
+struct workqueue_struct *pm_wq;
+
+static int __init pm_start_workqueue(void)
+{
+	pm_wq = create_freezeable_workqueue("pm");
+
+	return pm_wq ? 0 : -ENOMEM;
+}
+#else
+static inline int pm_start_workqueue(void) { return 0; }
+#endif
+
 static int __init pm_init(void)
 {
+	int error = pm_start_workqueue();
+	if (error)
+		return error;
 	power_kobj = kobject_create_and_add("power", NULL);
 	if (!power_kobj)
 		return -ENOMEM;
Index: linux-2.6/include/linux/pm.h
===================================================================
--- linux-2.6.orig/include/linux/pm.h
+++ linux-2.6/include/linux/pm.h
@@ -22,6 +22,9 @@
 #define _LINUX_PM_H
 
 #include <linux/list.h>
+#include <linux/workqueue.h>
+#include <linux/spinlock.h>
+#include <linux/wait.h>
 
 /*
  * Callbacks for platform drivers to implement.
@@ -165,6 +168,28 @@ typedef struct pm_message {
  * It is allowed to unregister devices while the above callbacks are being
  * executed.  However, it is not allowed to unregister a device from within any
  * of its own callbacks.
+ *
+ * There also are the following callbacks related to run-time power management
+ * of devices:
+ *
+ * @runtime_suspend: Prepare the device for a condition in which it won't be
+ *	able to communicate with the CPU(s) and RAM due to power management.
+ *	This need not mean that the device should be put into a low power state.
+ *	For example, if the device is behind a link which is about to be turned
+ *	off, the device may remain at full power.  If the device does go to low
+ *	power and if device_may_wakeup(dev) is true, remote wake-up (i.e., a
+ *	hardware mechanism allowing the device to request a change of its power
+ *	state, such as PCI PME) should be enabled for it.
+ *
+ * @runtime_resume: Put the device into the fully active state in response to a
+ *	wake-up event generated by hardware or at the request of software.  If
+ *	necessary, put the device into the full power state and restore its
+ *	registers, so that it is fully operational.
+ *
+ * @runtime_idle: Device appears to be inactive and it might be put into a low
+ *	power state if all of the necessary conditions are satisfied.  Check
+ *	these conditions and handle the device as appropriate, possibly queueing
+ *	a suspend request for it.
  */
 
 struct dev_pm_ops {
@@ -182,6 +207,9 @@ struct dev_pm_ops {
 	int (*thaw_noirq)(struct device *dev);
 	int (*poweroff_noirq)(struct device *dev);
 	int (*restore_noirq)(struct device *dev);
+	int (*runtime_suspend)(struct device *dev);
+	int (*runtime_resume)(struct device *dev);
+	void (*runtime_idle)(struct device *dev);
 };
 
 /**
@@ -315,14 +343,97 @@ enum dpm_state {
 	DPM_OFF_IRQ,
 };
 
+/**
+ * Device run-time power management state.
+ *
+ * These state labels are used internally by the PM core to indicate the current
+ * status of a device with respect to the PM core operations.  They do not
+ * reflect the actual power state of the device or its status as seen by the
+ * driver.
+ *
+ * RPM_ACTIVE		Device is fully operational, no run-time PM requests are
+ *			pending for it.
+ *
+ * RPM_NOTIFY		Idle notification has been scheduled for the device.
+ *
+ * RPM_NOTIFYING	Device bus type's ->runtime_idle() callback is being
+ *			executed (as a result of a scheduled idle notification
+ *			request).
+ *
+ * RPM_IDLE		It has been requested that the device be suspended.
+ *			Suspend request has been put into the run-time PM
+ *			workqueue and it's pending execution.
+ *
+ * RPM_SUSPEND		Attempt to suspend the device has started (as a result
+ *			of a scheduled request or synchronously), but the device
+ *			bus type's ->runtime_suspend() callback has not been
+ *			executed yet.
+ *
+ * RPM_SUSPENDING	Device bus type's ->runtime_suspend() callback is being
+ *			executed.
+ *
+ * RPM_SUSPENDED	Device bus type's ->runtime_suspend() callback has
+ *			completed successfully.  The device is regarded as
+ *			suspended.
+ *
+ * RPM_WAKE		It has been requested that the device be woken up.
+ *			Resume request has been put into the run-time PM
+ *			workqueue and it's pending execution.
+ *
+ * RPM_RESUME		Attempt to wake up the device has started (as a result
+ *			of a scheduled request or synchronously), but the device
+ *			bus type's ->runtime_resume() callback has not been
+ *			executed yet.
+ *
+ * RPM_RESUMING		Device bus type's ->runtime_resume() callback is being
+ *			executed.
+ *
+ * RPM_ERROR		Represents a condition from which the PM core cannot
+ *			recover by itself.  If the device's run-time PM status
+ *			field has this value, all of the run-time PM operations
+ *			carried out for the device by the core will fail, until
+ *			the status field is changed to either RPM_ACTIVE or
+ *			RPM_SUSPENDED (it is not valid to use the other values
+ *			in such a situation) by the device's driver or bus type.
+ *			This happens when the device bus type's
+ *			->runtime_suspend() or ->runtime_resume() callback
+ *			returns error code different from -EAGAIN or -EBUSY.
+ */
+
+#define RPM_ACTIVE	0
+
+#define RPM_NOTIFY	0x001
+#define RPM_NOTIFYING	0x002
+#define RPM_IDLE	0x004
+#define RPM_SUSPEND	0x008
+#define RPM_SUSPENDING	0x010
+#define RPM_SUSPENDED	0x020
+#define RPM_WAKE	0x040
+#define RPM_RESUME	0x080
+#define RPM_RESUMING	0x100
+
+#define RPM_ERROR	0x1FF
+
 struct dev_pm_info {
 	pm_message_t		power_state;
-	unsigned		can_wakeup:1;
-	unsigned		should_wakeup:1;
+	unsigned int		can_wakeup:1;
+	unsigned int		should_wakeup:1;
 	enum dpm_state		status;		/* Owned by the PM core */
-#ifdef	CONFIG_PM_SLEEP
+#ifdef CONFIG_PM_SLEEP
 	struct list_head	entry;
 #endif
+#ifdef CONFIG_PM_RUNTIME
+	struct delayed_work	suspend_work;
+	struct work_struct	work;
+	wait_queue_head_t	wait_queue;
+	unsigned int		ignore_children:1;
+	unsigned int		runtime_disabled:1;
+	unsigned int		runtime_status;
+	int			runtime_error;
+	atomic_t		resume_count;
+	atomic_t		child_count;
+	spinlock_t		lock;
+#endif
 };
 
 /*
Index: linux-2.6/drivers/base/power/Makefile
===================================================================
--- linux-2.6.orig/drivers/base/power/Makefile
+++ linux-2.6/drivers/base/power/Makefile
@@ -1,5 +1,6 @@
 obj-$(CONFIG_PM)	+= sysfs.o
 obj-$(CONFIG_PM_SLEEP)	+= main.o
+obj-$(CONFIG_PM_RUNTIME)	+= runtime.o
 obj-$(CONFIG_PM_TRACE_RTC)	+= trace.o
 
 ccflags-$(CONFIG_DEBUG_DRIVER) := -DDEBUG
Index: linux-2.6/drivers/base/power/runtime.c
===================================================================
--- /dev/null
+++ linux-2.6/drivers/base/power/runtime.c
@@ -0,0 +1,846 @@
+/*
+ * drivers/base/power/runtime.c - Helper functions for device run-time PM
+ *
+ * Copyright (c) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
+ *
+ * This file is released under the GPLv2.
+ */
+
+#include <linux/sched.h>
+#include <linux/pm_runtime.h>
+#include <linux/jiffies.h>
+
+static struct device *suspend_work_to_device(struct work_struct *work)
+{
+	struct delayed_work *dw = to_delayed_work(work);
+
+	return container_of(dw, struct device, power.suspend_work);
+}
+
+static struct device *pm_work_to_device(struct work_struct *work)
+{
+	return container_of(work, struct device, power.work);
+}
+
+/**
+ * pm_runtime_idle - Notify device bus type if the device can be suspended.
+ * @dev: Device to notify the bus type about.
+ *
+ * It is possible that suspend request was scheduled and resume was requested
+ * before this function has a chance to run.  If there's a suspend request
+ * pending only, return doing nothing, but if resume was requested in addition
+ * to it, cancel the suspend request.
+ */
+void pm_runtime_idle(struct device *dev)
+{
+	unsigned long flags;
+
+	might_sleep();
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.runtime_status == RPM_ERROR)
+		goto out;
+
+	if (dev->power.runtime_status & ~(RPM_NOTIFY|RPM_WAKE))
+		/*
+		 * Device suspended or run-time PM operation in progress. The
+		 * RPM_NOTIFY bit should have been cleared in that case.
+		 */
+		goto out;
+
+	dev->power.runtime_status &= ~RPM_NOTIFY;
+
+	if (dev->power.runtime_status == RPM_WAKE) {
+		/*
+		 * Resume has been requested, and because all of the suspend
+		 * status bits are clear, there must be a suspend request
+		 * pending (otherwise, the resume request would have been
+		 * rejected).  We have to cancel that request.
+		 */
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		cancel_delayed_work_sync(&dev->power.suspend_work);
+
+		spin_lock_irqsave(&dev->power.lock, flags);
+
+		/*
+		 * Return if someone else has changed the status.  Otherwise,
+		 * the idle notification may still be worth running.
+		 */
+		if (dev->power.runtime_status != RPM_WAKE)
+			goto out;
+	}
+
+	if (!pm_suspend_possible(dev))
+		goto out;
+
+	/*
+	 * The role of the RPM_NOTIFYING bit is to prevent ->runtime_idle() from
+	 * running in parallel with itself and to help pm_runtime_disable() make
+	 * sure that the ->runtime_idle() callback will not be running after it
+	 * returns.
+	 */
+	dev->power.runtime_status = RPM_NOTIFYING;
+
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_idle)
+		dev->bus->pm->runtime_idle(dev);
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	/* The status might have been changed while executing runtime_idle(). */
+	dev->power.runtime_status &= ~RPM_NOTIFYING;
+	wake_up_all(&dev->power.wait_queue);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_idle);
+
+/**
+ * pm_runtime_idle_work - Run pm_runtime_idle() via pm_wq.
+ * @work: Work structure used for scheduling the execution of this function.
+ *
+ * Use @work to get the device object the idle notification has been scheduled
+ * for and run pm_runtime_idle() for it.
+ */
+static void pm_runtime_idle_work(struct work_struct *work)
+{
+	pm_runtime_idle(pm_work_to_device(work));
+}
+
+/**
+ * pm_runtime_put_atomic - Decrement resume counter and queue idle notification.
+ * @dev: Device to handle.
+ *
+ * Decrement the device's resume counter, check if the device's run-time PM
+ * status is right for suspending and queue up a request to run
+ * pm_runtime_idle() for it.
+ */
+void pm_runtime_put_atomic(struct device *dev)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (!__pm_runtime_put(dev)) {
+		dev_WARN(dev, "Unbalanced %s", __func__);
+		goto out;
+	}
+
+	if (dev->power.runtime_status != RPM_ACTIVE)
+		goto out;
+
+	/*
+	 * The notification is asynchronous so that this function can be called
+	 * from interrupt context.
+	 */
+	dev->power.runtime_status = RPM_NOTIFY;
+	INIT_WORK(&dev->power.work, pm_runtime_idle_work);
+	queue_work(pm_wq, &dev->power.work);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_put_atomic);
+
+/**
+ * pm_runtime_put - Decrement resume counter and run idle notification.
+ * @dev: Device to handle.
+ *
+ * Decrement the device's resume counter and run pm_runtime_idle() for it.
+ */
+void pm_runtime_put(struct device *dev)
+{
+	if (!__pm_runtime_put(dev)) {
+		dev_WARN(dev, "Unbalanced %s", __func__);
+		return;
+	}
+
+	pm_runtime_idle(dev);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_put);
+
+/**
+ * __pm_runtime_suspend - Carry out run-time suspend of given device.
+ * @dev: Device to suspend.
+ * @sync: If unset, the funtion has been called via pm_wq.
+ *
+ * Check if the device can be suspended and run the ->runtime_suspend() callback
+ * provided by its bus type.  If another suspend has been started earlier, wait
+ * for it to finish.  If there's an idle notification pending, cancel it.  If
+ * there's a suspend request scheduled while this function is running and @sync
+ * is 'true', cancel that request.
+ */
+int __pm_runtime_suspend(struct device *dev, bool sync)
+{
+	struct device *parent = NULL;
+	unsigned long flags;
+	bool cancel_pending = false;
+	int error = -EINVAL;
+
+	might_sleep();
+
+ repeat:
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.runtime_status == RPM_ERROR)
+		goto out;
+
+	if (dev->power.runtime_status & RPM_SUSPENDED) {
+		/* Device suspended, nothing to do. */
+		error = 0;
+		goto out;
+	}
+
+	if (dev->power.runtime_status & RPM_SUSPENDING) {
+		DEFINE_WAIT(wait);
+
+		/* Another suspend is running in parallel with us. */
+		for (;;) {
+			prepare_to_wait(&dev->power.wait_queue, &wait,
+					TASK_UNINTERRUPTIBLE);
+			if (!(dev->power.runtime_status & RPM_SUSPENDING))
+				break;
+
+			spin_unlock_irqrestore(&dev->power.lock, flags);
+
+			schedule();
+
+			spin_lock_irqsave(&dev->power.lock, flags);
+		}
+		finish_wait(&dev->power.wait_queue, &wait);
+		error = dev->power.runtime_error;
+		goto out;
+	}
+
+	if (dev->power.runtime_status & (RPM_WAKE|RPM_RESUME|RPM_RESUMING)) {
+		/* Resume is scheduled or in progress. */
+		error = -EAGAIN;
+		goto out;
+	}
+
+	/*
+	 * If there's a suspend request pending and we're not running as a
+	 * result of it, the request has to be cancelled, because it may be
+	 * scheduled in the future and we can't leave it behind us.
+	 */
+	if (sync && (dev->power.runtime_status & RPM_IDLE))
+		cancel_pending = true;
+
+	/* Clear the suspend status bits in case we have to return. */
+	dev->power.runtime_status &= ~(RPM_IDLE|RPM_SUSPEND);
+
+	if (atomic_read(&dev->power.resume_count) > 0
+	    || dev->power.runtime_disabled) {
+		/* We are forbidden to suspend. */
+		error = -EAGAIN;
+		goto out;
+	}
+
+	if (!pm_children_suspended(dev)) {
+		/*
+		 * We can only suspend the device if all of its children have
+		 * been suspended.
+		 */
+		error = -EBUSY;
+		goto out;
+	}
+
+	/*
+	 * Set RPM_SUSPEND in case we have to start over, to prevent idle
+	 * notifications from happening and new suspend requests from being
+	 * scheduled.
+	 */
+	dev->power.runtime_status |= RPM_SUSPEND;
+
+	if (cancel_pending) {
+		/* Cancel the concurrent pending suspend request. */
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		cancel_delayed_work_sync(&dev->power.suspend_work);
+		goto repeat;
+	}
+
+	if (dev->power.runtime_status & RPM_NOTIFY) {
+		/* Idle notification is pending, cancel it. */
+		dev->power.runtime_status &= ~RPM_NOTIFY;
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		cancel_work_sync(&dev->power.work);
+		goto repeat;
+	}
+
+	dev->power.runtime_status &= ~RPM_SUSPEND;
+	dev->power.runtime_status |= RPM_SUSPENDING;
+
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_suspend)
+		error = dev->bus->pm->runtime_suspend(dev);
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	switch (error) {
+	case 0:
+		dev->power.runtime_status &= ~RPM_SUSPENDING;
+		dev->power.runtime_status |= RPM_SUSPENDED;
+		break;
+	case -EAGAIN:
+	case -EBUSY:
+		dev->power.runtime_status &= RPM_NOTIFYING;
+		break;
+	default:
+		dev->power.runtime_status = RPM_ERROR;
+	}
+	dev->power.runtime_error = error;
+	wake_up_all(&dev->power.wait_queue);
+
+	if (!error && !(dev->power.runtime_status & RPM_WAKE) && dev->parent) {
+		parent = dev->parent;
+		atomic_dec(&parent->power.child_count);
+	}
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	if (parent && !parent->power.ignore_children)
+		pm_runtime_idle(parent);
+
+	if (error == -EBUSY || error == -EAGAIN)
+		pm_runtime_idle(dev);
+
+	return error;
+}
+EXPORT_SYMBOL_GPL(__pm_runtime_suspend);
+
+/**
+ * pm_runtime_suspend_work - Run __pm_runtime_suspend() for a device.
+ * @work: Work structure used for scheduling the execution of this function.
+ *
+ * Use @work to get the device object the work has been scheduled for and run
+ * __pm_runtime_suspend() for it.
+ */
+static void pm_runtime_suspend_work(struct work_struct *work)
+{
+	__pm_runtime_suspend(suspend_work_to_device(work), false);
+}
+
+/**
+ * pm_request_suspend - Schedule run-time suspend of given device.
+ * @dev: Device to suspend.
+ * @msec: Time to wait before attempting to suspend the device, in milliseconds.
+ */
+int pm_request_suspend(struct device *dev, unsigned int msec)
+{
+	unsigned long flags;
+	unsigned long delay = msecs_to_jiffies(msec);
+	int error = 0;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.runtime_status == RPM_ERROR)
+		error = -EINVAL;
+	else if (dev->power.runtime_status & RPM_SUSPENDED)
+		/* Device is suspended, nothing to do. */
+		error = -ECANCELED;
+	else if (atomic_read(&dev->power.resume_count) > 0
+	    || dev->power.runtime_disabled
+	    || (dev->power.runtime_status & (RPM_WAKE|RPM_RESUME|RPM_RESUMING)))
+		/* Can't suspend now. */
+		error = -EAGAIN;
+	else if (dev->power.runtime_status &
+				(RPM_IDLE|RPM_SUSPEND|RPM_SUSPENDING))
+		/* Already suspending or suspend request pending. */
+		error = -EINPROGRESS;
+	else if (!pm_children_suspended(dev))
+		error = -EBUSY;
+	if (error)
+		goto out;
+
+	dev->power.runtime_status |= RPM_IDLE;
+	queue_delayed_work(pm_wq, &dev->power.suspend_work, delay);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	return error;
+}
+EXPORT_SYMBOL_GPL(pm_request_suspend);
+
+/**
+ * __pm_runtime_resume - Carry out run-time resume of given device.
+ * @dev: Device to resume.
+ * @sync: If unset, the funtion has been called via pm_wq.
+ *
+ * Check if the device can be woken up and run the ->runtime_resume() callback
+ * provided by its bus type.  If another resume has been started earlier, wait
+ * for it to finish.  If there's a suspend running in parallel with this
+ * function, wait for it to finish and resume the device.  If there's a suspend
+ * request or idle notification pending, cancel it.  If there's a resume request
+ * scheduled while this function is running and @sync is 'true', cancel that
+ * request.
+ */
+int __pm_runtime_resume(struct device *dev, bool sync)
+{
+	struct device *parent = dev->parent;
+	unsigned long flags;
+	bool put_parent = false;
+	int error = -EINVAL;
+
+	might_sleep();
+
+ repeat:
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+ repeat_locked:
+	if (dev->power.runtime_status == RPM_ERROR)
+		goto out;
+
+	if (!(dev->power.runtime_status & ~RPM_NOTIFYING)) {
+		/* Device is operational, nothing to do. */
+		error = 0;
+		goto out;
+	}
+
+	if (dev->power.runtime_status & RPM_RESUMING) {
+		DEFINE_WAIT(wait);
+
+		/*
+		 * There's another resume running in parallel with us. Wait for
+		 * it to complete and return.
+		 */
+		for (;;) {
+			prepare_to_wait(&dev->power.wait_queue, &wait,
+					TASK_UNINTERRUPTIBLE);
+			if (!(dev->power.runtime_status & RPM_RESUMING))
+				break;
+
+			spin_unlock_irqrestore(&dev->power.lock, flags);
+
+			schedule();
+
+			spin_lock_irqsave(&dev->power.lock, flags);
+		}
+		finish_wait(&dev->power.wait_queue, &wait);
+		error = dev->power.runtime_error;
+		goto out;
+	}
+
+	if (dev->power.runtime_disabled) {
+		/* Clear the resume flags before returning. */
+		dev->power.runtime_status &= ~(RPM_WAKE|RPM_RESUME);
+		error = -EAGAIN;
+		goto out;
+	}
+
+	/*
+	 * Set RPM_RESUME in case we have to start over, to prevent suspends and
+	 * idle notifications from happening and new resume requests from being
+	 * queued up.
+	 */
+	dev->power.runtime_status |= RPM_RESUME;
+
+	if (dev->power.runtime_status & RPM_SUSPENDING) {
+		DEFINE_WAIT(wait);
+
+		/*
+		 * Suspend is running in parallel with us.  Wait for it to
+		 * complete and repeat.
+		 */
+		for (;;) {
+			prepare_to_wait(&dev->power.wait_queue, &wait,
+					TASK_UNINTERRUPTIBLE);
+			if (!(dev->power.runtime_status & RPM_SUSPENDING))
+				break;
+
+			spin_unlock_irqrestore(&dev->power.lock, flags);
+
+			schedule();
+
+			spin_lock_irqsave(&dev->power.lock, flags);
+		}
+		finish_wait(&dev->power.wait_queue, &wait);
+		goto repeat_locked;
+	}
+
+	if ((dev->power.runtime_status & (RPM_IDLE|RPM_WAKE))
+	    && !(dev->power.runtime_status &
+				(RPM_SUSPEND|RPM_SUSPENDING|RPM_SUSPENDED))) {
+		/* Suspend request is pending that we're supposed to cancel. */
+		dev->power.runtime_status &= ~RPM_IDLE;
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		cancel_delayed_work_sync(&dev->power.suspend_work);
+		goto repeat;
+	}
+
+	/*
+	 * Clear RPM_SUSPEND in case we've been running in parallel with
+	 * __pm_runtime_suspend().
+	 */
+	dev->power.runtime_status &= ~RPM_SUSPEND;
+
+	if ((sync && (dev->power.runtime_status & RPM_WAKE))
+	    || (dev->power.runtime_status & RPM_NOTIFY)) {
+		/*
+		 * Idle notification is pending and since we're running the
+		 * device is not idle, or there's a resume request pending and
+		 * we're not running as a result of it.  In both cases it's
+		 * better to cancel the request.
+		 */
+		dev->power.runtime_status &= ~(RPM_NOTIFY|RPM_WAKE);
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		cancel_work_sync(&dev->power.work);
+		goto repeat;
+	}
+
+	/* Clear the resume status flags in case we have to return. */
+	dev->power.runtime_status &= ~(RPM_WAKE|RPM_RESUME);
+
+	if (!(dev->power.runtime_status & RPM_SUSPENDED)) {
+		/*
+		 * If the device is not suspended at this point, we have
+		 * nothing to do.
+		 */
+		error = 0;
+		goto out;
+	}
+
+	if (!put_parent && parent) {
+		/*
+		 * Increase the parent's resume counter and request that it be
+		 * woken up if necessary.
+		 */
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		put_parent = true;
+		error = pm_runtime_get_and_resume(parent);
+		if (error)
+			goto out_parent;
+
+		error = -EINVAL;
+		dev->power.runtime_status |= RPM_RESUME;
+		goto repeat;
+	}
+
+	dev->power.runtime_status &= ~RPM_SUSPENDED;
+
+	if (parent)
+		atomic_inc(&parent->power.child_count);
+
+	dev->power.runtime_status |= RPM_RESUMING;
+
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_resume)
+		error = dev->bus->pm->runtime_resume(dev);
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	dev->power.runtime_status &= ~RPM_RESUMING;
+	if (error)
+		dev->power.runtime_status = RPM_ERROR;
+	dev->power.runtime_error = error;
+	wake_up_all(&dev->power.wait_queue);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+ out_parent:
+	if (put_parent)
+		pm_runtime_put(parent);
+
+	if (!error)
+		pm_runtime_idle(dev);
+
+	return error;
+}
+EXPORT_SYMBOL_GPL(__pm_runtime_resume);
+
+/**
+ * pm_runtime_resume_work - Run __pm_runtime_resume() for a device.
+ * @work: Work structure used for scheduling the execution of this function.
+ *
+ * Use @work to get the device object the work has been scheduled for and run
+ * __pm_runtime_resume() for it.
+ */
+static void pm_runtime_resume_work(struct work_struct *work)
+{
+	__pm_runtime_resume(pm_work_to_device(work), false);
+}
+
+/**
+ * pm_request_resume - Schedule run-time resume of given device.
+ * @dev: Device to resume.
+ */
+int pm_request_resume(struct device *dev)
+{
+	struct device *parent = dev->parent;
+	unsigned long flags;
+	int error = 0;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.runtime_status == RPM_ERROR) {
+		error = -EINVAL;
+	} else if (dev->power.runtime_disabled) {
+		error = -EAGAIN;
+	} else if (!(dev->power.runtime_status & ~RPM_NOTIFYING)) {
+		/* Device is operational, nothing to do. */
+		error = -ECANCELED;
+	} else if (dev->power.runtime_status & RPM_NOTIFY) {
+		/*
+		 * Device has an idle notification pending, which is not a
+		 * problem unless there's a suspend request pending in addition
+		 * to it.  In that case, ask the idle notification work function
+		 * to cancel the suspend request.
+		 */
+		if (dev->power.runtime_status & RPM_IDLE) {
+			dev->power.runtime_status &= ~RPM_IDLE;
+			dev->power.runtime_status |= RPM_WAKE;
+			error = -EALREADY;
+		} else {
+			error = -ECANCELED;
+		}
+	} else if (dev->power.runtime_status &
+				(RPM_WAKE|RPM_RESUME|RPM_RESUMING)) {
+		error = -EINPROGRESS;
+	}
+	if (error)
+		goto out;
+
+	if (dev->power.runtime_status & RPM_IDLE) {
+		/* Suspend request is pending.  Make sure it won't run. */
+		dev->power.runtime_status &= ~RPM_IDLE;
+		INIT_WORK(&dev->power.work, pm_runtime_idle_work);
+		error = -EALREADY;
+		goto queue;
+	}
+
+	if ((dev->power.runtime_status & RPM_SUSPENDED) && parent)
+		atomic_inc(&parent->power.child_count);
+
+	INIT_WORK(&dev->power.work, pm_runtime_resume_work);
+
+ queue:
+	dev->power.runtime_status |= RPM_WAKE;
+	queue_work(pm_wq, &dev->power.work);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	return error;
+}
+EXPORT_SYMBOL_GPL(pm_request_resume);
+
+/**
+ * __pm_runtime_set_status - Set run-time PM status of a device.
+ * @dev: Device to handle.
+ * @status: New run-time PM status of the device.
+ *
+ * If run-time PM of the device is disabled or its run-time PM status is
+ * RPM_ERROR, the status may be set either to RPM_ACTIVE, or to RPM_SUSPENDED,
+ * as long as that reflects the actual state of the device.
+ */
+void __pm_runtime_set_status(struct device *dev, unsigned int status)
+{
+	struct device *parent = dev->parent;
+	unsigned long flags;
+
+	if (status & ~RPM_SUSPENDED)
+		return;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.runtime_status == status)
+		goto out;
+
+	if (dev->power.runtime_status != RPM_ERROR
+	    && !dev->power.runtime_disabled)
+		goto out;
+
+	if (parent) {
+		if (status == RPM_SUSPENDED)
+			atomic_dec(&parent->power.child_count);
+		else if (dev->power.runtime_status == RPM_SUSPENDED)
+			atomic_inc(&parent->power.child_count);
+	}
+	dev->power.runtime_status = status;
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+EXPORT_SYMBOL_GPL(__pm_runtime_set_status);
+
+/**
+ * pm_runtime_enable - Enable run-time PM of a device.
+ * @dev: Device to handle.
+ */
+void pm_runtime_enable(struct device *dev)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (!dev->power.runtime_disabled)
+		goto out;
+
+	if (!__pm_runtime_put(dev))
+		dev_WARN(dev, "Unbalanced %s", __func__);
+
+	if (!atomic_read(&dev->power.resume_count))
+		dev->power.runtime_disabled = false;
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_enable);
+
+/**
+ * pm_runtime_disable - Disable run-time PM of a device.
+ * @dev: Device to handle.
+ *
+ * Set the power.runtime_disabled flag for the device, cancel all pending
+ * run-time PM requests for it and wait for operations in progress to complete.
+ * The device can be either active or suspended after its run-time PM has been
+ * disabled.
+ */
+void pm_runtime_disable(struct device *dev)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	pm_runtime_get(dev);
+
+	if (dev->power.runtime_disabled)
+		goto out;
+
+	dev->power.runtime_disabled = true;
+
+	if (dev->power.runtime_status != RPM_ERROR
+	    && (dev->power.runtime_status & (RPM_IDLE|RPM_WAKE))
+	    && !(dev->power.runtime_status &
+				(RPM_SUSPEND|RPM_SUSPENDING|RPM_SUSPENDED))) {
+		/* Suspend request pending. */
+		dev->power.runtime_status &= ~RPM_IDLE;
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		cancel_delayed_work_sync(&dev->power.suspend_work);
+
+		spin_lock_irqsave(&dev->power.lock, flags);
+	}
+
+	if (dev->power.runtime_status != RPM_ERROR
+	    && (dev->power.runtime_status & (RPM_WAKE|RPM_NOTIFY))) {
+		/* Resume request or idle notification pending. */
+		dev->power.runtime_status &= ~(RPM_WAKE|RPM_NOTIFY);
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		cancel_work_sync(&dev->power.work);
+
+		spin_lock_irqsave(&dev->power.lock, flags);
+	}
+
+	if (dev->power.runtime_status != RPM_ERROR
+	    && (dev->power.runtime_status & (RPM_SUSPENDING|RPM_RESUMING))) {
+		DEFINE_WAIT(wait);
+
+		/* Suspend or wake-up in progress. */
+		for (;;) {
+			prepare_to_wait(&dev->power.wait_queue, &wait,
+					TASK_UNINTERRUPTIBLE);
+			if (!(dev->power.runtime_status &
+					(RPM_SUSPENDING|RPM_RESUMING)))
+				break;
+
+			spin_unlock_irqrestore(&dev->power.lock, flags);
+
+			schedule();
+
+			spin_lock_irqsave(&dev->power.lock, flags);
+		}
+		finish_wait(&dev->power.wait_queue, &wait);
+	}
+
+	if (dev->power.runtime_status != RPM_ERROR
+	    && (dev->power.runtime_status & RPM_NOTIFYING)) {
+		DEFINE_WAIT(wait);
+
+		/* Idle notification in progress. */
+		for (;;) {
+			prepare_to_wait(&dev->power.wait_queue, &wait,
+					TASK_UNINTERRUPTIBLE);
+			if (!(dev->power.runtime_status & RPM_NOTIFYING))
+				break;
+
+			spin_unlock_irqrestore(&dev->power.lock, flags);
+
+			schedule();
+
+			spin_lock_irqsave(&dev->power.lock, flags);
+		}
+		finish_wait(&dev->power.wait_queue, &wait);
+	}
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_disable);
+
+/**
+ * pm_runtime_init - Initialize run-time PM fields in given device object.
+ * @dev: Device object to initialize.
+ */
+void pm_runtime_init(struct device *dev)
+{
+	spin_lock_init(&dev->power.lock);
+
+	dev->power.runtime_status = RPM_ACTIVE;
+	dev->power.runtime_disabled = true;
+	atomic_set(&dev->power.resume_count, 1);
+
+	atomic_set(&dev->power.child_count, 0);
+	pm_suspend_ignore_children(dev, false);
+
+	INIT_DELAYED_WORK(&dev->power.suspend_work, pm_runtime_suspend_work);
+	init_waitqueue_head(&dev->power.wait_queue);
+}
+
+/**
+ * pm_runtime_add - Update run-time PM fields of a device while adding it.
+ * @dev: Device object being added to device hierarchy.
+ */
+void pm_runtime_add(struct device *dev)
+{
+	if (dev->parent)
+		atomic_inc(&dev->parent->power.child_count);
+}
+
+/**
+ * pm_runtime_remove - Prepare for removing a device from device hierarchy.
+ * @dev: Device object being removed from device hierarchy.
+ */
+void pm_runtime_remove(struct device *dev)
+{
+	struct device *parent = dev->parent;
+
+	pm_runtime_disable(dev);
+
+	if (dev->power.runtime_status != RPM_SUSPENDED && parent) {
+		atomic_dec(&parent->power.child_count);
+		if (!parent->power.ignore_children)
+			pm_runtime_idle(parent);
+	}
+}
Index: linux-2.6/include/linux/pm_runtime.h
===================================================================
--- /dev/null
+++ linux-2.6/include/linux/pm_runtime.h
@@ -0,0 +1,124 @@
+/*
+ * pm_runtime.h - Device run-time power management helper functions.
+ *
+ * Copyright (C) 2009 Rafael J. Wysocki <rjw@sisk.pl>
+ *
+ * This file is released under the GPLv2.
+ */
+
+#ifndef _LINUX_PM_RUNTIME_H
+#define _LINUX_PM_RUNTIME_H
+
+#include <linux/device.h>
+#include <linux/pm.h>
+
+#ifdef CONFIG_PM_RUNTIME
+
+extern struct workqueue_struct *pm_wq;
+
+extern void pm_runtime_init(struct device *dev);
+extern void pm_runtime_add(struct device *dev);
+extern void pm_runtime_remove(struct device *dev);
+extern void pm_runtime_idle(struct device *dev);
+extern void pm_runtime_put_atomic(struct device *dev);
+extern void pm_runtime_put(struct device *dev);
+extern int __pm_runtime_suspend(struct device *dev, bool sync);
+extern int pm_request_suspend(struct device *dev, unsigned int msec);
+extern int __pm_runtime_resume(struct device *dev, bool sync);
+extern int pm_request_resume(struct device *dev);
+extern void __pm_runtime_set_status(struct device *dev, unsigned int status);
+extern void pm_runtime_enable(struct device *dev);
+extern void pm_runtime_disable(struct device *dev);
+
+static inline void pm_runtime_get(struct device *dev)
+{
+	atomic_inc(&dev->power.resume_count);
+}
+
+static inline bool __pm_runtime_put(struct device *dev)
+{
+	return !!atomic_add_unless(&dev->power.resume_count, -1, 0);
+}
+
+static inline bool pm_children_suspended(struct device *dev)
+{
+	return dev->power.ignore_children
+		|| !atomic_read(&dev->power.child_count);
+}
+
+static inline bool pm_suspend_possible(struct device *dev)
+{
+	return pm_children_suspended(dev)
+		&& !atomic_read(&dev->power.resume_count)
+		&& !dev->power.runtime_disabled;
+}
+
+static inline void pm_suspend_ignore_children(struct device *dev, bool enable)
+{
+	dev->power.ignore_children = enable;
+}
+
+#else /* !CONFIG_PM_RUNTIME */
+
+static inline void pm_runtime_init(struct device *dev) {}
+static inline void pm_runtime_add(struct device *dev) {}
+static inline void pm_runtime_remove(struct device *dev) {}
+static inline void pm_runtime_idle(struct device *dev) {}
+static inline void pm_runtime_put_atomic(struct device *dev) {}
+static inline void pm_runtime_put(struct device *dev) {}
+static inline int __pm_runtime_suspend(struct device *dev, bool sync)
+{
+	return -ENOSYS;
+}
+static inline int pm_request_suspend(struct device *dev, unsigned int msec)
+{
+	return -ENOSYS;
+}
+static inline int __pm_runtime_resume(struct device *dev, bool sync)
+{
+	return -ENOSYS;
+}
+static inline int pm_request_resume(struct device *dev)
+{
+	return -ENOSYS;
+}
+static inline void __pm_runtime_set_status(struct device *dev,
+					    unsigned int status) {}
+static inline void pm_runtime_enable(struct device *dev) {}
+static inline void pm_runtime_disable(struct device *dev) {}
+
+static inline void pm_runtime_get(struct device *dev) {}
+static inline bool __pm_runtime_put(struct device *dev) { return true; }
+static inline bool pm_children_suspended(struct device *dev) { return false; }
+static inline bool pm_suspend_possible(struct device *dev) { return false; }
+static inline void pm_suspend_ignore_children(struct device *dev, bool en) {}
+
+#endif /* !CONFIG_PM_RUNTIME */
+
+static inline int pm_runtime_suspend(struct device *dev)
+{
+	return __pm_runtime_suspend(dev, true);
+}
+
+static inline int pm_runtime_resume(struct device *dev)
+{
+	return __pm_runtime_resume(dev, true);
+}
+
+static inline int pm_runtime_get_and_resume(struct device *dev)
+{
+	pm_runtime_get(dev);
+	return __pm_runtime_resume(dev, true);
+}
+
+static inline void pm_runtime_set_active(struct device *dev)
+{
+	__pm_runtime_set_status(dev, RPM_ACTIVE);
+}
+
+static inline void pm_runtime_set_suspended(struct device *dev)
+{
+	__pm_runtime_set_status(dev, RPM_SUSPENDED);
+}
+
+#endif
Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -21,6 +21,7 @@
 #include <linux/kallsyms.h>
 #include <linux/mutex.h>
 #include <linux/pm.h>
+#include <linux/pm_runtime.h>
 #include <linux/resume-trace.h>
 #include <linux/rwsem.h>
 #include <linux/interrupt.h>
@@ -49,6 +50,16 @@ static DEFINE_MUTEX(dpm_list_mtx);
 static bool transition_started;
 
 /**
+ * device_pm_init - Initialize the PM-related part of a device object
+ * @dev: Device object to initialize.
+ */
+void device_pm_init(struct device *dev)
+{
+	dev->power.status = DPM_ON;
+	pm_runtime_init(dev);
+}
+
+/**
  *	device_pm_lock - lock the list of active devices used by the PM core
  */
 void device_pm_lock(void)
@@ -88,6 +99,7 @@ void device_pm_add(struct device *dev)
 	}
 
 	list_add_tail(&dev->power.entry, &dpm_list);
+	pm_runtime_add(dev);
 	mutex_unlock(&dpm_list_mtx);
 }
 
@@ -104,6 +116,7 @@ void device_pm_remove(struct device *dev
 		 kobject_name(&dev->kobj));
 	mutex_lock(&dpm_list_mtx);
 	list_del_init(&dev->power.entry);
+	pm_runtime_remove(dev);
 	mutex_unlock(&dpm_list_mtx);
 }
 
@@ -507,6 +520,7 @@ static void dpm_complete(pm_message_t st
 		get_device(dev);
 		if (dev->power.status > DPM_ON) {
 			dev->power.status = DPM_ON;
+			pm_runtime_enable(dev);
 			mutex_unlock(&dpm_list_mtx);
 
 			device_complete(dev, state);
@@ -753,6 +767,7 @@ static int dpm_prepare(pm_message_t stat
 
 		get_device(dev);
 		dev->power.status = DPM_PREPARING;
+		pm_runtime_disable(dev);
 		mutex_unlock(&dpm_list_mtx);
 
 		error = device_prepare(dev, state);
@@ -760,6 +775,7 @@ static int dpm_prepare(pm_message_t stat
 		mutex_lock(&dpm_list_mtx);
 		if (error) {
 			dev->power.status = DPM_ON;
+			pm_runtime_enable(dev);
 			if (error == -EAGAIN) {
 				put_device(dev);
 				continue;
Index: linux-2.6/drivers/base/dd.c
===================================================================
--- linux-2.6.orig/drivers/base/dd.c
+++ linux-2.6/drivers/base/dd.c
@@ -23,6 +23,7 @@
 #include <linux/kthread.h>
 #include <linux/wait.h>
 #include <linux/async.h>
+#include <linux/pm_runtime.h>
 
 #include "base.h"
 #include "power/power.h"
@@ -202,7 +203,10 @@ int driver_probe_device(struct device_dr
 	pr_debug("bus: '%s': %s: matched device %s with driver %s\n",
 		 drv->bus->name, __func__, dev_name(dev), drv->name);
 
-	ret = really_probe(dev, drv);
+	ret = pm_runtime_get_and_resume(dev);
+	if (!ret)
+		ret = really_probe(dev, drv);
+	__pm_runtime_put(dev);
 
 	return ret;
 }
@@ -306,6 +310,8 @@ static void __device_release_driver(stru
 
 	drv = dev->driver;
 	if (drv) {
+		pm_runtime_disable(dev);
+
 		driver_sysfs_remove(dev);
 
 		if (dev->bus)
@@ -324,6 +330,8 @@ static void __device_release_driver(stru
 			blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
 						     BUS_NOTIFY_UNBOUND_DRIVER,
 						     dev);
+
+		pm_runtime_enable(dev);
 	}
 }
 
Index: linux-2.6/drivers/base/power/power.h
===================================================================
--- linux-2.6.orig/drivers/base/power/power.h
+++ linux-2.6/drivers/base/power/power.h
@@ -1,8 +1,3 @@
-static inline void device_pm_init(struct device *dev)
-{
-	dev->power.status = DPM_ON;
-}
-
 #ifdef CONFIG_PM_SLEEP
 
 /*
@@ -16,14 +11,16 @@ static inline struct device *to_device(s
 	return container_of(entry, struct device, power.entry);
 }
 
+extern void device_pm_init(struct device *dev);
 extern void device_pm_add(struct device *);
 extern void device_pm_remove(struct device *);
 extern void device_pm_move_before(struct device *, struct device *);
 extern void device_pm_move_after(struct device *, struct device *);
 extern void device_pm_move_last(struct device *);
 
-#else /* CONFIG_PM_SLEEP */
+#else /* !CONFIG_PM_SLEEP */
 
+static inline void device_pm_init(struct device *dev) {}
 static inline void device_pm_add(struct device *dev) {}
 static inline void device_pm_remove(struct device *dev) {}
 static inline void device_pm_move_before(struct device *deva,
@@ -32,7 +29,7 @@ static inline void device_pm_move_after(
 					struct device *devb) {}
 static inline void device_pm_move_last(struct device *dev) {}
 
-#endif
+#endif /* !CONFIG_PM_SLEEP */
 
 #ifdef CONFIG_PM
 

^ permalink raw reply	[flat|nested] 102+ messages in thread

* [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 7)
  2009-06-27 14:51                       ` Alan Stern
@ 2009-06-27 21:51                         ` Rafael J. Wysocki
  2009-06-27 21:51                         ` Rafael J. Wysocki
  1 sibling, 0 replies; 102+ messages in thread
From: Rafael J. Wysocki @ 2009-06-27 21:51 UTC (permalink / raw)
  To: Alan Stern
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Saturday 27 June 2009, Alan Stern wrote:
> On Sat, 27 Jun 2009, Rafael J. Wysocki wrote:
> 
> > > Speaking of races, have you noticed that the way power.work_done gets 
> > > used is racy?
> > 
> > Not really. :-)
> > 
> > > You can't wait for the completion before releasing the 
> > > lock, but then anything could happen.
> > > 
> > > A safer approach would be to use a wait_queue.
> > 
> > I'm not sure what you mean exactly.  What's the race?
> 
> Come to think of it, there really is a problem here.  Because the
> wait_for_completion call occurs outside the spinlock, it can race with
> the init_completion call.

I don't really think it can, because if either __pm_runtime_suspend(), or
__pm_runtime_resume() finds RPM_SUSPENDING set in the status, it will wait for
the completion and won't reinitialize it until it's been completed.

> It's not good for both of them to run at the same time; the completion's
> internal spinlock and list pointers could get corrupted.

Nevertheless, I reworked the patch to use a wait queue instead of the
completion.  This also helps pm_runtime_disable() to ensure that
->runtime_idle() won't be running after it returns.

> Therefore I stand by my original assertion: The struct completion 
> should be replaced with a wait_queue.  Set the runtime_error field to 
> -EINPROGRESS initially, and make other threads wait until the value 
> changes.

Since runtime_error only changes along with the status, I think it's sufficient
to wait for the status to change.

The updated patch below also addresses some other comments from your previous
messages and from Magnus.

Thanks,
Rafael

---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: PM: Introduce core framework for run-time PM of I/O devices (rev. 7)

Introduce a core framework for run-time power management of I/O
devices.  Add device run-time PM fields to 'struct dev_pm_info'
and device run-time PM callbacks to 'struct dev_pm_ops'.  Introduce
a run-time PM workqueue and define some device run-time PM helper
functions at the core level.  Document all these things.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 drivers/base/dd.c            |   10 
 drivers/base/power/Makefile  |    1 
 drivers/base/power/main.c    |   16 
 drivers/base/power/power.h   |   11 
 drivers/base/power/runtime.c |  846 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/pm.h           |  117 +++++
 include/linux/pm_runtime.h   |  124 ++++++
 kernel/power/Kconfig         |   14 
 kernel/power/main.c          |   17 
 9 files changed, 1145 insertions(+), 11 deletions(-)

Index: linux-2.6/kernel/power/Kconfig
===================================================================
--- linux-2.6.orig/kernel/power/Kconfig
+++ linux-2.6/kernel/power/Kconfig
@@ -208,3 +208,17 @@ config APM_EMULATION
 	  random kernel OOPSes or reboots that don't seem to be related to
 	  anything, try disabling/enabling this option (or disabling/enabling
 	  APM in your BIOS).
+
+config PM_RUNTIME
+	bool "Run-time PM core functionality"
+	depends on PM
+	---help---
+	  Enable functionality allowing I/O devices to be put into energy-saving
+	  (low power) states at run time (or autosuspended) after a specified
+	  period of inactivity and woken up in response to a hardware-generated
+	  wake-up event or a driver's request.
+
+	  Hardware support is generally required for this functionality to work
+	  and the bus type drivers of the buses the devices are on are
+	  responsible for the actual handling of the autosuspend requests and
+	  wake-up events.
Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -11,6 +11,7 @@
 #include <linux/kobject.h>
 #include <linux/string.h>
 #include <linux/resume-trace.h>
+#include <linux/workqueue.h>
 
 #include "power.h"
 
@@ -217,8 +218,24 @@ static struct attribute_group attr_group
 	.attrs = g,
 };
 
+#ifdef CONFIG_PM_RUNTIME
+struct workqueue_struct *pm_wq;
+
+static int __init pm_start_workqueue(void)
+{
+	pm_wq = create_freezeable_workqueue("pm");
+
+	return pm_wq ? 0 : -ENOMEM;
+}
+#else
+static inline int pm_start_workqueue(void) { return 0; }
+#endif
+
 static int __init pm_init(void)
 {
+	int error = pm_start_workqueue();
+	if (error)
+		return error;
 	power_kobj = kobject_create_and_add("power", NULL);
 	if (!power_kobj)
 		return -ENOMEM;
Index: linux-2.6/include/linux/pm.h
===================================================================
--- linux-2.6.orig/include/linux/pm.h
+++ linux-2.6/include/linux/pm.h
@@ -22,6 +22,9 @@
 #define _LINUX_PM_H
 
 #include <linux/list.h>
+#include <linux/workqueue.h>
+#include <linux/spinlock.h>
+#include <linux/wait.h>
 
 /*
  * Callbacks for platform drivers to implement.
@@ -165,6 +168,28 @@ typedef struct pm_message {
  * It is allowed to unregister devices while the above callbacks are being
  * executed.  However, it is not allowed to unregister a device from within any
  * of its own callbacks.
+ *
+ * There also are the following callbacks related to run-time power management
+ * of devices:
+ *
+ * @runtime_suspend: Prepare the device for a condition in which it won't be
+ *	able to communicate with the CPU(s) and RAM due to power management.
+ *	This need not mean that the device should be put into a low power state.
+ *	For example, if the device is behind a link which is about to be turned
+ *	off, the device may remain at full power.  If the device does go to low
+ *	power and if device_may_wakeup(dev) is true, remote wake-up (i.e., a
+ *	hardware mechanism allowing the device to request a change of its power
+ *	state, such as PCI PME) should be enabled for it.
+ *
+ * @runtime_resume: Put the device into the fully active state in response to a
+ *	wake-up event generated by hardware or at the request of software.  If
+ *	necessary, put the device into the full power state and restore its
+ *	registers, so that it is fully operational.
+ *
+ * @runtime_idle: Device appears to be inactive and it might be put into a low
+ *	power state if all of the necessary conditions are satisfied.  Check
+ *	these conditions and handle the device as appropriate, possibly queueing
+ *	a suspend request for it.
  */
 
 struct dev_pm_ops {
@@ -182,6 +207,9 @@ struct dev_pm_ops {
 	int (*thaw_noirq)(struct device *dev);
 	int (*poweroff_noirq)(struct device *dev);
 	int (*restore_noirq)(struct device *dev);
+	int (*runtime_suspend)(struct device *dev);
+	int (*runtime_resume)(struct device *dev);
+	void (*runtime_idle)(struct device *dev);
 };
 
 /**
@@ -315,14 +343,97 @@ enum dpm_state {
 	DPM_OFF_IRQ,
 };
 
+/**
+ * Device run-time power management state.
+ *
+ * These state labels are used internally by the PM core to indicate the current
+ * status of a device with respect to the PM core operations.  They do not
+ * reflect the actual power state of the device or its status as seen by the
+ * driver.
+ *
+ * RPM_ACTIVE		Device is fully operational, no run-time PM requests are
+ *			pending for it.
+ *
+ * RPM_NOTIFY		Idle notification has been scheduled for the device.
+ *
+ * RPM_NOTIFYING	Device bus type's ->runtime_idle() callback is being
+ *			executed (as a result of a scheduled idle notification
+ *			request).
+ *
+ * RPM_IDLE		It has been requested that the device be suspended.
+ *			Suspend request has been put into the run-time PM
+ *			workqueue and it's pending execution.
+ *
+ * RPM_SUSPEND		Attempt to suspend the device has started (as a result
+ *			of a scheduled request or synchronously), but the device
+ *			bus type's ->runtime_suspend() callback has not been
+ *			executed yet.
+ *
+ * RPM_SUSPENDING	Device bus type's ->runtime_suspend() callback is being
+ *			executed.
+ *
+ * RPM_SUSPENDED	Device bus type's ->runtime_suspend() callback has
+ *			completed successfully.  The device is regarded as
+ *			suspended.
+ *
+ * RPM_WAKE		It has been requested that the device be woken up.
+ *			Resume request has been put into the run-time PM
+ *			workqueue and it's pending execution.
+ *
+ * RPM_RESUME		Attempt to wake up the device has started (as a result
+ *			of a scheduled request or synchronously), but the device
+ *			bus type's ->runtime_resume() callback has not been
+ *			executed yet.
+ *
+ * RPM_RESUMING		Device bus type's ->runtime_resume() callback is being
+ *			executed.
+ *
+ * RPM_ERROR		Represents a condition from which the PM core cannot
+ *			recover by itself.  If the device's run-time PM status
+ *			field has this value, all of the run-time PM operations
+ *			carried out for the device by the core will fail, until
+ *			the status field is changed to either RPM_ACTIVE or
+ *			RPM_SUSPENDED (it is not valid to use the other values
+ *			in such a situation) by the device's driver or bus type.
+ *			This happens when the device bus type's
+ *			->runtime_suspend() or ->runtime_resume() callback
+ *			returns error code different from -EAGAIN or -EBUSY.
+ */
+
+#define RPM_ACTIVE	0
+
+#define RPM_NOTIFY	0x001
+#define RPM_NOTIFYING	0x002
+#define RPM_IDLE	0x004
+#define RPM_SUSPEND	0x008
+#define RPM_SUSPENDING	0x010
+#define RPM_SUSPENDED	0x020
+#define RPM_WAKE	0x040
+#define RPM_RESUME	0x080
+#define RPM_RESUMING	0x100
+
+#define RPM_ERROR	0x1FF
+
 struct dev_pm_info {
 	pm_message_t		power_state;
-	unsigned		can_wakeup:1;
-	unsigned		should_wakeup:1;
+	unsigned int		can_wakeup:1;
+	unsigned int		should_wakeup:1;
 	enum dpm_state		status;		/* Owned by the PM core */
-#ifdef	CONFIG_PM_SLEEP
+#ifdef CONFIG_PM_SLEEP
 	struct list_head	entry;
 #endif
+#ifdef CONFIG_PM_RUNTIME
+	struct delayed_work	suspend_work;
+	struct work_struct	work;
+	wait_queue_head_t	wait_queue;
+	unsigned int		ignore_children:1;
+	unsigned int		runtime_disabled:1;
+	unsigned int		runtime_status;
+	int			runtime_error;
+	atomic_t		resume_count;
+	atomic_t		child_count;
+	spinlock_t		lock;
+#endif
 };
 
 /*
Index: linux-2.6/drivers/base/power/Makefile
===================================================================
--- linux-2.6.orig/drivers/base/power/Makefile
+++ linux-2.6/drivers/base/power/Makefile
@@ -1,5 +1,6 @@
 obj-$(CONFIG_PM)	+= sysfs.o
 obj-$(CONFIG_PM_SLEEP)	+= main.o
+obj-$(CONFIG_PM_RUNTIME)	+= runtime.o
 obj-$(CONFIG_PM_TRACE_RTC)	+= trace.o
 
 ccflags-$(CONFIG_DEBUG_DRIVER) := -DDEBUG
Index: linux-2.6/drivers/base/power/runtime.c
===================================================================
--- /dev/null
+++ linux-2.6/drivers/base/power/runtime.c
@@ -0,0 +1,846 @@
+/*
+ * drivers/base/power/runtime.c - Helper functions for device run-time PM
+ *
+ * Copyright (c) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
+ *
+ * This file is released under the GPLv2.
+ */
+
+#include <linux/sched.h>
+#include <linux/pm_runtime.h>
+#include <linux/jiffies.h>
+
+static struct device *suspend_work_to_device(struct work_struct *work)
+{
+	struct delayed_work *dw = to_delayed_work(work);
+
+	return container_of(dw, struct device, power.suspend_work);
+}
+
+static struct device *pm_work_to_device(struct work_struct *work)
+{
+	return container_of(work, struct device, power.work);
+}
+
+/**
+ * pm_runtime_idle - Notify device bus type if the device can be suspended.
+ * @dev: Device to notify the bus type about.
+ *
+ * It is possible that suspend request was scheduled and resume was requested
+ * before this function has a chance to run.  If there's a suspend request
+ * pending only, return doing nothing, but if resume was requested in addition
+ * to it, cancel the suspend request.
+ */
+void pm_runtime_idle(struct device *dev)
+{
+	unsigned long flags;
+
+	might_sleep();
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.runtime_status == RPM_ERROR)
+		goto out;
+
+	if (dev->power.runtime_status & ~(RPM_NOTIFY|RPM_WAKE))
+		/*
+		 * Device suspended or run-time PM operation in progress. The
+		 * RPM_NOTIFY bit should have been cleared in that case.
+		 */
+		goto out;
+
+	dev->power.runtime_status &= ~RPM_NOTIFY;
+
+	if (dev->power.runtime_status == RPM_WAKE) {
+		/*
+		 * Resume has been requested, and because all of the suspend
+		 * status bits are clear, there must be a suspend request
+		 * pending (otherwise, the resume request would have been
+		 * rejected).  We have to cancel that request.
+		 */
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		cancel_delayed_work_sync(&dev->power.suspend_work);
+
+		spin_lock_irqsave(&dev->power.lock, flags);
+
+		/*
+		 * Return if someone else has changed the status.  Otherwise,
+		 * the idle notification may still be worth running.
+		 */
+		if (dev->power.runtime_status != RPM_WAKE)
+			goto out;
+	}
+
+	if (!pm_suspend_possible(dev))
+		goto out;
+
+	/*
+	 * The role of the RPM_NOTIFYING bit is to prevent ->runtime_idle() from
+	 * running in parallel with itself and to help pm_runtime_disable() make
+	 * sure that the ->runtime_idle() callback will not be running after it
+	 * returns.
+	 */
+	dev->power.runtime_status = RPM_NOTIFYING;
+
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_idle)
+		dev->bus->pm->runtime_idle(dev);
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	/* The status might have been changed while executing runtime_idle(). */
+	dev->power.runtime_status &= ~RPM_NOTIFYING;
+	wake_up_all(&dev->power.wait_queue);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_idle);
+
+/**
+ * pm_runtime_idle_work - Run pm_runtime_idle() via pm_wq.
+ * @work: Work structure used for scheduling the execution of this function.
+ *
+ * Use @work to get the device object the idle notification has been scheduled
+ * for and run pm_runtime_idle() for it.
+ */
+static void pm_runtime_idle_work(struct work_struct *work)
+{
+	pm_runtime_idle(pm_work_to_device(work));
+}
+
+/**
+ * pm_runtime_put_atomic - Decrement resume counter and queue idle notification.
+ * @dev: Device to handle.
+ *
+ * Decrement the device's resume counter, check if the device's run-time PM
+ * status is right for suspending and queue up a request to run
+ * pm_runtime_idle() for it.
+ */
+void pm_runtime_put_atomic(struct device *dev)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (!__pm_runtime_put(dev)) {
+		dev_WARN(dev, "Unbalanced %s", __func__);
+		goto out;
+	}
+
+	if (dev->power.runtime_status != RPM_ACTIVE)
+		goto out;
+
+	/*
+	 * The notification is asynchronous so that this function can be called
+	 * from interrupt context.
+	 */
+	dev->power.runtime_status = RPM_NOTIFY;
+	INIT_WORK(&dev->power.work, pm_runtime_idle_work);
+	queue_work(pm_wq, &dev->power.work);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_put_atomic);
+
+/**
+ * pm_runtime_put - Decrement resume counter and run idle notification.
+ * @dev: Device to handle.
+ *
+ * Decrement the device's resume counter and run pm_runtime_idle() for it.
+ */
+void pm_runtime_put(struct device *dev)
+{
+	if (!__pm_runtime_put(dev)) {
+		dev_WARN(dev, "Unbalanced %s", __func__);
+		return;
+	}
+
+	pm_runtime_idle(dev);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_put);
+
+/**
+ * __pm_runtime_suspend - Carry out run-time suspend of given device.
+ * @dev: Device to suspend.
+ * @sync: If unset, the funtion has been called via pm_wq.
+ *
+ * Check if the device can be suspended and run the ->runtime_suspend() callback
+ * provided by its bus type.  If another suspend has been started earlier, wait
+ * for it to finish.  If there's an idle notification pending, cancel it.  If
+ * there's a suspend request scheduled while this function is running and @sync
+ * is 'true', cancel that request.
+ */
+int __pm_runtime_suspend(struct device *dev, bool sync)
+{
+	struct device *parent = NULL;
+	unsigned long flags;
+	bool cancel_pending = false;
+	int error = -EINVAL;
+
+	might_sleep();
+
+ repeat:
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.runtime_status == RPM_ERROR)
+		goto out;
+
+	if (dev->power.runtime_status & RPM_SUSPENDED) {
+		/* Device suspended, nothing to do. */
+		error = 0;
+		goto out;
+	}
+
+	if (dev->power.runtime_status & RPM_SUSPENDING) {
+		DEFINE_WAIT(wait);
+
+		/* Another suspend is running in parallel with us. */
+		for (;;) {
+			prepare_to_wait(&dev->power.wait_queue, &wait,
+					TASK_UNINTERRUPTIBLE);
+			if (!(dev->power.runtime_status & RPM_SUSPENDING))
+				break;
+
+			spin_unlock_irqrestore(&dev->power.lock, flags);
+
+			schedule();
+
+			spin_lock_irqsave(&dev->power.lock, flags);
+		}
+		finish_wait(&dev->power.wait_queue, &wait);
+		error = dev->power.runtime_error;
+		goto out;
+	}
+
+	if (dev->power.runtime_status & (RPM_WAKE|RPM_RESUME|RPM_RESUMING)) {
+		/* Resume is scheduled or in progress. */
+		error = -EAGAIN;
+		goto out;
+	}
+
+	/*
+	 * If there's a suspend request pending and we're not running as a
+	 * result of it, the request has to be cancelled, because it may be
+	 * scheduled in the future and we can't leave it behind us.
+	 */
+	if (sync && (dev->power.runtime_status & RPM_IDLE))
+		cancel_pending = true;
+
+	/* Clear the suspend status bits in case we have to return. */
+	dev->power.runtime_status &= ~(RPM_IDLE|RPM_SUSPEND);
+
+	if (atomic_read(&dev->power.resume_count) > 0
+	    || dev->power.runtime_disabled) {
+		/* We are forbidden to suspend. */
+		error = -EAGAIN;
+		goto out;
+	}
+
+	if (!pm_children_suspended(dev)) {
+		/*
+		 * We can only suspend the device if all of its children have
+		 * been suspended.
+		 */
+		error = -EBUSY;
+		goto out;
+	}
+
+	/*
+	 * Set RPM_SUSPEND in case we have to start over, to prevent idle
+	 * notifications from happening and new suspend requests from being
+	 * scheduled.
+	 */
+	dev->power.runtime_status |= RPM_SUSPEND;
+
+	if (cancel_pending) {
+		/* Cancel the concurrent pending suspend request. */
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		cancel_delayed_work_sync(&dev->power.suspend_work);
+		goto repeat;
+	}
+
+	if (dev->power.runtime_status & RPM_NOTIFY) {
+		/* Idle notification is pending, cancel it. */
+		dev->power.runtime_status &= ~RPM_NOTIFY;
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		cancel_work_sync(&dev->power.work);
+		goto repeat;
+	}
+
+	dev->power.runtime_status &= ~RPM_SUSPEND;
+	dev->power.runtime_status |= RPM_SUSPENDING;
+
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_suspend)
+		error = dev->bus->pm->runtime_suspend(dev);
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	switch (error) {
+	case 0:
+		dev->power.runtime_status &= ~RPM_SUSPENDING;
+		dev->power.runtime_status |= RPM_SUSPENDED;
+		break;
+	case -EAGAIN:
+	case -EBUSY:
+		dev->power.runtime_status &= RPM_NOTIFYING;
+		break;
+	default:
+		dev->power.runtime_status = RPM_ERROR;
+	}
+	dev->power.runtime_error = error;
+	wake_up_all(&dev->power.wait_queue);
+
+	if (!error && !(dev->power.runtime_status & RPM_WAKE) && dev->parent) {
+		parent = dev->parent;
+		atomic_dec(&parent->power.child_count);
+	}
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	if (parent && !parent->power.ignore_children)
+		pm_runtime_idle(parent);
+
+	if (error == -EBUSY || error == -EAGAIN)
+		pm_runtime_idle(dev);
+
+	return error;
+}
+EXPORT_SYMBOL_GPL(__pm_runtime_suspend);
+
+/**
+ * pm_runtime_suspend_work - Run __pm_runtime_suspend() for a device.
+ * @work: Work structure used for scheduling the execution of this function.
+ *
+ * Use @work to get the device object the work has been scheduled for and run
+ * __pm_runtime_suspend() for it.
+ */
+static void pm_runtime_suspend_work(struct work_struct *work)
+{
+	__pm_runtime_suspend(suspend_work_to_device(work), false);
+}
+
+/**
+ * pm_request_suspend - Schedule run-time suspend of given device.
+ * @dev: Device to suspend.
+ * @msec: Time to wait before attempting to suspend the device, in milliseconds.
+ */
+int pm_request_suspend(struct device *dev, unsigned int msec)
+{
+	unsigned long flags;
+	unsigned long delay = msecs_to_jiffies(msec);
+	int error = 0;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.runtime_status == RPM_ERROR)
+		error = -EINVAL;
+	else if (dev->power.runtime_status & RPM_SUSPENDED)
+		/* Device is suspended, nothing to do. */
+		error = -ECANCELED;
+	else if (atomic_read(&dev->power.resume_count) > 0
+	    || dev->power.runtime_disabled
+	    || (dev->power.runtime_status & (RPM_WAKE|RPM_RESUME|RPM_RESUMING)))
+		/* Can't suspend now. */
+		error = -EAGAIN;
+	else if (dev->power.runtime_status &
+				(RPM_IDLE|RPM_SUSPEND|RPM_SUSPENDING))
+		/* Already suspending or suspend request pending. */
+		error = -EINPROGRESS;
+	else if (!pm_children_suspended(dev))
+		error = -EBUSY;
+	if (error)
+		goto out;
+
+	dev->power.runtime_status |= RPM_IDLE;
+	queue_delayed_work(pm_wq, &dev->power.suspend_work, delay);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	return error;
+}
+EXPORT_SYMBOL_GPL(pm_request_suspend);
+
+/**
+ * __pm_runtime_resume - Carry out run-time resume of given device.
+ * @dev: Device to resume.
+ * @sync: If unset, the funtion has been called via pm_wq.
+ *
+ * Check if the device can be woken up and run the ->runtime_resume() callback
+ * provided by its bus type.  If another resume has been started earlier, wait
+ * for it to finish.  If there's a suspend running in parallel with this
+ * function, wait for it to finish and resume the device.  If there's a suspend
+ * request or idle notification pending, cancel it.  If there's a resume request
+ * scheduled while this function is running and @sync is 'true', cancel that
+ * request.
+ */
+int __pm_runtime_resume(struct device *dev, bool sync)
+{
+	struct device *parent = dev->parent;
+	unsigned long flags;
+	bool put_parent = false;
+	int error = -EINVAL;
+
+	might_sleep();
+
+ repeat:
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+ repeat_locked:
+	if (dev->power.runtime_status == RPM_ERROR)
+		goto out;
+
+	if (!(dev->power.runtime_status & ~RPM_NOTIFYING)) {
+		/* Device is operational, nothing to do. */
+		error = 0;
+		goto out;
+	}
+
+	if (dev->power.runtime_status & RPM_RESUMING) {
+		DEFINE_WAIT(wait);
+
+		/*
+		 * There's another resume running in parallel with us. Wait for
+		 * it to complete and return.
+		 */
+		for (;;) {
+			prepare_to_wait(&dev->power.wait_queue, &wait,
+					TASK_UNINTERRUPTIBLE);
+			if (!(dev->power.runtime_status & RPM_RESUMING))
+				break;
+
+			spin_unlock_irqrestore(&dev->power.lock, flags);
+
+			schedule();
+
+			spin_lock_irqsave(&dev->power.lock, flags);
+		}
+		finish_wait(&dev->power.wait_queue, &wait);
+		error = dev->power.runtime_error;
+		goto out;
+	}
+
+	if (dev->power.runtime_disabled) {
+		/* Clear the resume flags before returning. */
+		dev->power.runtime_status &= ~(RPM_WAKE|RPM_RESUME);
+		error = -EAGAIN;
+		goto out;
+	}
+
+	/*
+	 * Set RPM_RESUME in case we have to start over, to prevent suspends and
+	 * idle notifications from happening and new resume requests from being
+	 * queued up.
+	 */
+	dev->power.runtime_status |= RPM_RESUME;
+
+	if (dev->power.runtime_status & RPM_SUSPENDING) {
+		DEFINE_WAIT(wait);
+
+		/*
+		 * Suspend is running in parallel with us.  Wait for it to
+		 * complete and repeat.
+		 */
+		for (;;) {
+			prepare_to_wait(&dev->power.wait_queue, &wait,
+					TASK_UNINTERRUPTIBLE);
+			if (!(dev->power.runtime_status & RPM_SUSPENDING))
+				break;
+
+			spin_unlock_irqrestore(&dev->power.lock, flags);
+
+			schedule();
+
+			spin_lock_irqsave(&dev->power.lock, flags);
+		}
+		finish_wait(&dev->power.wait_queue, &wait);
+		goto repeat_locked;
+	}
+
+	if ((dev->power.runtime_status & (RPM_IDLE|RPM_WAKE))
+	    && !(dev->power.runtime_status &
+				(RPM_SUSPEND|RPM_SUSPENDING|RPM_SUSPENDED))) {
+		/* Suspend request is pending that we're supposed to cancel. */
+		dev->power.runtime_status &= ~RPM_IDLE;
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		cancel_delayed_work_sync(&dev->power.suspend_work);
+		goto repeat;
+	}
+
+	/*
+	 * Clear RPM_SUSPEND in case we've been running in parallel with
+	 * __pm_runtime_suspend().
+	 */
+	dev->power.runtime_status &= ~RPM_SUSPEND;
+
+	if ((sync && (dev->power.runtime_status & RPM_WAKE))
+	    || (dev->power.runtime_status & RPM_NOTIFY)) {
+		/*
+		 * Idle notification is pending and since we're running the
+		 * device is not idle, or there's a resume request pending and
+		 * we're not running as a result of it.  In both cases it's
+		 * better to cancel the request.
+		 */
+		dev->power.runtime_status &= ~(RPM_NOTIFY|RPM_WAKE);
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		cancel_work_sync(&dev->power.work);
+		goto repeat;
+	}
+
+	/* Clear the resume status flags in case we have to return. */
+	dev->power.runtime_status &= ~(RPM_WAKE|RPM_RESUME);
+
+	if (!(dev->power.runtime_status & RPM_SUSPENDED)) {
+		/*
+		 * If the device is not suspended at this point, we have
+		 * nothing to do.
+		 */
+		error = 0;
+		goto out;
+	}
+
+	if (!put_parent && parent) {
+		/*
+		 * Increase the parent's resume counter and request that it be
+		 * woken up if necessary.
+		 */
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		put_parent = true;
+		error = pm_runtime_get_and_resume(parent);
+		if (error)
+			goto out_parent;
+
+		error = -EINVAL;
+		dev->power.runtime_status |= RPM_RESUME;
+		goto repeat;
+	}
+
+	dev->power.runtime_status &= ~RPM_SUSPENDED;
+
+	if (parent)
+		atomic_inc(&parent->power.child_count);
+
+	dev->power.runtime_status |= RPM_RESUMING;
+
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_resume)
+		error = dev->bus->pm->runtime_resume(dev);
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	dev->power.runtime_status &= ~RPM_RESUMING;
+	if (error)
+		dev->power.runtime_status = RPM_ERROR;
+	dev->power.runtime_error = error;
+	wake_up_all(&dev->power.wait_queue);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+ out_parent:
+	if (put_parent)
+		pm_runtime_put(parent);
+
+	if (!error)
+		pm_runtime_idle(dev);
+
+	return error;
+}
+EXPORT_SYMBOL_GPL(__pm_runtime_resume);
+
+/**
+ * pm_runtime_resume_work - Run __pm_runtime_resume() for a device.
+ * @work: Work structure used for scheduling the execution of this function.
+ *
+ * Use @work to get the device object the work has been scheduled for and run
+ * __pm_runtime_resume() for it.
+ */
+static void pm_runtime_resume_work(struct work_struct *work)
+{
+	__pm_runtime_resume(pm_work_to_device(work), false);
+}
+
+/**
+ * pm_request_resume - Schedule run-time resume of given device.
+ * @dev: Device to resume.
+ */
+int pm_request_resume(struct device *dev)
+{
+	struct device *parent = dev->parent;
+	unsigned long flags;
+	int error = 0;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.runtime_status == RPM_ERROR) {
+		error = -EINVAL;
+	} else if (dev->power.runtime_disabled) {
+		error = -EAGAIN;
+	} else if (!(dev->power.runtime_status & ~RPM_NOTIFYING)) {
+		/* Device is operational, nothing to do. */
+		error = -ECANCELED;
+	} else if (dev->power.runtime_status & RPM_NOTIFY) {
+		/*
+		 * Device has an idle notification pending, which is not a
+		 * problem unless there's a suspend request pending in addition
+		 * to it.  In that case, ask the idle notification work function
+		 * to cancel the suspend request.
+		 */
+		if (dev->power.runtime_status & RPM_IDLE) {
+			dev->power.runtime_status &= ~RPM_IDLE;
+			dev->power.runtime_status |= RPM_WAKE;
+			error = -EALREADY;
+		} else {
+			error = -ECANCELED;
+		}
+	} else if (dev->power.runtime_status &
+				(RPM_WAKE|RPM_RESUME|RPM_RESUMING)) {
+		error = -EINPROGRESS;
+	}
+	if (error)
+		goto out;
+
+	if (dev->power.runtime_status & RPM_IDLE) {
+		/* Suspend request is pending.  Make sure it won't run. */
+		dev->power.runtime_status &= ~RPM_IDLE;
+		INIT_WORK(&dev->power.work, pm_runtime_idle_work);
+		error = -EALREADY;
+		goto queue;
+	}
+
+	if ((dev->power.runtime_status & RPM_SUSPENDED) && parent)
+		atomic_inc(&parent->power.child_count);
+
+	INIT_WORK(&dev->power.work, pm_runtime_resume_work);
+
+ queue:
+	dev->power.runtime_status |= RPM_WAKE;
+	queue_work(pm_wq, &dev->power.work);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	return error;
+}
+EXPORT_SYMBOL_GPL(pm_request_resume);
+
+/**
+ * __pm_runtime_set_status - Set run-time PM status of a device.
+ * @dev: Device to handle.
+ * @status: New run-time PM status of the device.
+ *
+ * If run-time PM of the device is disabled or its run-time PM status is
+ * RPM_ERROR, the status may be set either to RPM_ACTIVE, or to RPM_SUSPENDED,
+ * as long as that reflects the actual state of the device.
+ */
+void __pm_runtime_set_status(struct device *dev, unsigned int status)
+{
+	struct device *parent = dev->parent;
+	unsigned long flags;
+
+	if (status & ~RPM_SUSPENDED)
+		return;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.runtime_status == status)
+		goto out;
+
+	if (dev->power.runtime_status != RPM_ERROR
+	    && !dev->power.runtime_disabled)
+		goto out;
+
+	if (parent) {
+		if (status == RPM_SUSPENDED)
+			atomic_dec(&parent->power.child_count);
+		else if (dev->power.runtime_status == RPM_SUSPENDED)
+			atomic_inc(&parent->power.child_count);
+	}
+	dev->power.runtime_status = status;
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+EXPORT_SYMBOL_GPL(__pm_runtime_set_status);
+
+/**
+ * pm_runtime_enable - Enable run-time PM of a device.
+ * @dev: Device to handle.
+ */
+void pm_runtime_enable(struct device *dev)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (!dev->power.runtime_disabled)
+		goto out;
+
+	if (!__pm_runtime_put(dev))
+		dev_WARN(dev, "Unbalanced %s", __func__);
+
+	if (!atomic_read(&dev->power.resume_count))
+		dev->power.runtime_disabled = false;
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_enable);
+
+/**
+ * pm_runtime_disable - Disable run-time PM of a device.
+ * @dev: Device to handle.
+ *
+ * Set the power.runtime_disabled flag for the device, cancel all pending
+ * run-time PM requests for it and wait for operations in progress to complete.
+ * The device can be either active or suspended after its run-time PM has been
+ * disabled.
+ */
+void pm_runtime_disable(struct device *dev)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	pm_runtime_get(dev);
+
+	if (dev->power.runtime_disabled)
+		goto out;
+
+	dev->power.runtime_disabled = true;
+
+	if (dev->power.runtime_status != RPM_ERROR
+	    && (dev->power.runtime_status & (RPM_IDLE|RPM_WAKE))
+	    && !(dev->power.runtime_status &
+				(RPM_SUSPEND|RPM_SUSPENDING|RPM_SUSPENDED))) {
+		/* Suspend request pending. */
+		dev->power.runtime_status &= ~RPM_IDLE;
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		cancel_delayed_work_sync(&dev->power.suspend_work);
+
+		spin_lock_irqsave(&dev->power.lock, flags);
+	}
+
+	if (dev->power.runtime_status != RPM_ERROR
+	    && (dev->power.runtime_status & (RPM_WAKE|RPM_NOTIFY))) {
+		/* Resume request or idle notification pending. */
+		dev->power.runtime_status &= ~(RPM_WAKE|RPM_NOTIFY);
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		cancel_work_sync(&dev->power.work);
+
+		spin_lock_irqsave(&dev->power.lock, flags);
+	}
+
+	if (dev->power.runtime_status != RPM_ERROR
+	    && (dev->power.runtime_status & (RPM_SUSPENDING|RPM_RESUMING))) {
+		DEFINE_WAIT(wait);
+
+		/* Suspend or wake-up in progress. */
+		for (;;) {
+			prepare_to_wait(&dev->power.wait_queue, &wait,
+					TASK_UNINTERRUPTIBLE);
+			if (!(dev->power.runtime_status &
+					(RPM_SUSPENDING|RPM_RESUMING)))
+				break;
+
+			spin_unlock_irqrestore(&dev->power.lock, flags);
+
+			schedule();
+
+			spin_lock_irqsave(&dev->power.lock, flags);
+		}
+		finish_wait(&dev->power.wait_queue, &wait);
+	}
+
+	if (dev->power.runtime_status != RPM_ERROR
+	    && (dev->power.runtime_status & RPM_NOTIFYING)) {
+		DEFINE_WAIT(wait);
+
+		/* Idle notification in progress. */
+		for (;;) {
+			prepare_to_wait(&dev->power.wait_queue, &wait,
+					TASK_UNINTERRUPTIBLE);
+			if (!(dev->power.runtime_status & RPM_NOTIFYING))
+				break;
+
+			spin_unlock_irqrestore(&dev->power.lock, flags);
+
+			schedule();
+
+			spin_lock_irqsave(&dev->power.lock, flags);
+		}
+		finish_wait(&dev->power.wait_queue, &wait);
+	}
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_disable);
+
+/**
+ * pm_runtime_init - Initialize run-time PM fields in given device object.
+ * @dev: Device object to initialize.
+ */
+void pm_runtime_init(struct device *dev)
+{
+	spin_lock_init(&dev->power.lock);
+
+	dev->power.runtime_status = RPM_ACTIVE;
+	dev->power.runtime_disabled = true;
+	atomic_set(&dev->power.resume_count, 1);
+
+	atomic_set(&dev->power.child_count, 0);
+	pm_suspend_ignore_children(dev, false);
+
+	INIT_DELAYED_WORK(&dev->power.suspend_work, pm_runtime_suspend_work);
+	init_waitqueue_head(&dev->power.wait_queue);
+}
+
+/**
+ * pm_runtime_add - Update run-time PM fields of a device while adding it.
+ * @dev: Device object being added to device hierarchy.
+ */
+void pm_runtime_add(struct device *dev)
+{
+	if (dev->parent)
+		atomic_inc(&dev->parent->power.child_count);
+}
+
+/**
+ * pm_runtime_remove - Prepare for removing a device from device hierarchy.
+ * @dev: Device object being removed from device hierarchy.
+ */
+void pm_runtime_remove(struct device *dev)
+{
+	struct device *parent = dev->parent;
+
+	pm_runtime_disable(dev);
+
+	if (dev->power.runtime_status != RPM_SUSPENDED && parent) {
+		atomic_dec(&parent->power.child_count);
+		if (!parent->power.ignore_children)
+			pm_runtime_idle(parent);
+	}
+}
Index: linux-2.6/include/linux/pm_runtime.h
===================================================================
--- /dev/null
+++ linux-2.6/include/linux/pm_runtime.h
@@ -0,0 +1,124 @@
+/*
+ * pm_runtime.h - Device run-time power management helper functions.
+ *
+ * Copyright (C) 2009 Rafael J. Wysocki <rjw@sisk.pl>
+ *
+ * This file is released under the GPLv2.
+ */
+
+#ifndef _LINUX_PM_RUNTIME_H
+#define _LINUX_PM_RUNTIME_H
+
+#include <linux/device.h>
+#include <linux/pm.h>
+
+#ifdef CONFIG_PM_RUNTIME
+
+extern struct workqueue_struct *pm_wq;
+
+extern void pm_runtime_init(struct device *dev);
+extern void pm_runtime_add(struct device *dev);
+extern void pm_runtime_remove(struct device *dev);
+extern void pm_runtime_idle(struct device *dev);
+extern void pm_runtime_put_atomic(struct device *dev);
+extern void pm_runtime_put(struct device *dev);
+extern int __pm_runtime_suspend(struct device *dev, bool sync);
+extern int pm_request_suspend(struct device *dev, unsigned int msec);
+extern int __pm_runtime_resume(struct device *dev, bool sync);
+extern int pm_request_resume(struct device *dev);
+extern void __pm_runtime_set_status(struct device *dev, unsigned int status);
+extern void pm_runtime_enable(struct device *dev);
+extern void pm_runtime_disable(struct device *dev);
+
+static inline void pm_runtime_get(struct device *dev)
+{
+	atomic_inc(&dev->power.resume_count);
+}
+
+static inline bool __pm_runtime_put(struct device *dev)
+{
+	return !!atomic_add_unless(&dev->power.resume_count, -1, 0);
+}
+
+static inline bool pm_children_suspended(struct device *dev)
+{
+	return dev->power.ignore_children
+		|| !atomic_read(&dev->power.child_count);
+}
+
+static inline bool pm_suspend_possible(struct device *dev)
+{
+	return pm_children_suspended(dev)
+		&& !atomic_read(&dev->power.resume_count)
+		&& !dev->power.runtime_disabled;
+}
+
+static inline void pm_suspend_ignore_children(struct device *dev, bool enable)
+{
+	dev->power.ignore_children = enable;
+}
+
+#else /* !CONFIG_PM_RUNTIME */
+
+static inline void pm_runtime_init(struct device *dev) {}
+static inline void pm_runtime_add(struct device *dev) {}
+static inline void pm_runtime_remove(struct device *dev) {}
+static inline void pm_runtime_idle(struct device *dev) {}
+static inline void pm_runtime_put_atomic(struct device *dev) {}
+static inline void pm_runtime_put(struct device *dev) {}
+static inline int __pm_runtime_suspend(struct device *dev, bool sync)
+{
+	return -ENOSYS;
+}
+static inline int pm_request_suspend(struct device *dev, unsigned int msec)
+{
+	return -ENOSYS;
+}
+static inline int __pm_runtime_resume(struct device *dev, bool sync)
+{
+	return -ENOSYS;
+}
+static inline int pm_request_resume(struct device *dev)
+{
+	return -ENOSYS;
+}
+static inline void __pm_runtime_set_status(struct device *dev,
+					    unsigned int status) {}
+static inline void pm_runtime_enable(struct device *dev) {}
+static inline void pm_runtime_disable(struct device *dev) {}
+
+static inline void pm_runtime_get(struct device *dev) {}
+static inline bool __pm_runtime_put(struct device *dev) { return true; }
+static inline bool pm_children_suspended(struct device *dev) { return false; }
+static inline bool pm_suspend_possible(struct device *dev) { return false; }
+static inline void pm_suspend_ignore_children(struct device *dev, bool en) {}
+
+#endif /* !CONFIG_PM_RUNTIME */
+
+static inline int pm_runtime_suspend(struct device *dev)
+{
+	return __pm_runtime_suspend(dev, true);
+}
+
+static inline int pm_runtime_resume(struct device *dev)
+{
+	return __pm_runtime_resume(dev, true);
+}
+
+static inline int pm_runtime_get_and_resume(struct device *dev)
+{
+	pm_runtime_get(dev);
+	return __pm_runtime_resume(dev, true);
+}
+
+static inline void pm_runtime_set_active(struct device *dev)
+{
+	__pm_runtime_set_status(dev, RPM_ACTIVE);
+}
+
+static inline void pm_runtime_set_suspended(struct device *dev)
+{
+	__pm_runtime_set_status(dev, RPM_SUSPENDED);
+}
+
+#endif
Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -21,6 +21,7 @@
 #include <linux/kallsyms.h>
 #include <linux/mutex.h>
 #include <linux/pm.h>
+#include <linux/pm_runtime.h>
 #include <linux/resume-trace.h>
 #include <linux/rwsem.h>
 #include <linux/interrupt.h>
@@ -49,6 +50,16 @@ static DEFINE_MUTEX(dpm_list_mtx);
 static bool transition_started;
 
 /**
+ * device_pm_init - Initialize the PM-related part of a device object
+ * @dev: Device object to initialize.
+ */
+void device_pm_init(struct device *dev)
+{
+	dev->power.status = DPM_ON;
+	pm_runtime_init(dev);
+}
+
+/**
  *	device_pm_lock - lock the list of active devices used by the PM core
  */
 void device_pm_lock(void)
@@ -88,6 +99,7 @@ void device_pm_add(struct device *dev)
 	}
 
 	list_add_tail(&dev->power.entry, &dpm_list);
+	pm_runtime_add(dev);
 	mutex_unlock(&dpm_list_mtx);
 }
 
@@ -104,6 +116,7 @@ void device_pm_remove(struct device *dev
 		 kobject_name(&dev->kobj));
 	mutex_lock(&dpm_list_mtx);
 	list_del_init(&dev->power.entry);
+	pm_runtime_remove(dev);
 	mutex_unlock(&dpm_list_mtx);
 }
 
@@ -507,6 +520,7 @@ static void dpm_complete(pm_message_t st
 		get_device(dev);
 		if (dev->power.status > DPM_ON) {
 			dev->power.status = DPM_ON;
+			pm_runtime_enable(dev);
 			mutex_unlock(&dpm_list_mtx);
 
 			device_complete(dev, state);
@@ -753,6 +767,7 @@ static int dpm_prepare(pm_message_t stat
 
 		get_device(dev);
 		dev->power.status = DPM_PREPARING;
+		pm_runtime_disable(dev);
 		mutex_unlock(&dpm_list_mtx);
 
 		error = device_prepare(dev, state);
@@ -760,6 +775,7 @@ static int dpm_prepare(pm_message_t stat
 		mutex_lock(&dpm_list_mtx);
 		if (error) {
 			dev->power.status = DPM_ON;
+			pm_runtime_enable(dev);
 			if (error == -EAGAIN) {
 				put_device(dev);
 				continue;
Index: linux-2.6/drivers/base/dd.c
===================================================================
--- linux-2.6.orig/drivers/base/dd.c
+++ linux-2.6/drivers/base/dd.c
@@ -23,6 +23,7 @@
 #include <linux/kthread.h>
 #include <linux/wait.h>
 #include <linux/async.h>
+#include <linux/pm_runtime.h>
 
 #include "base.h"
 #include "power/power.h"
@@ -202,7 +203,10 @@ int driver_probe_device(struct device_dr
 	pr_debug("bus: '%s': %s: matched device %s with driver %s\n",
 		 drv->bus->name, __func__, dev_name(dev), drv->name);
 
-	ret = really_probe(dev, drv);
+	ret = pm_runtime_get_and_resume(dev);
+	if (!ret)
+		ret = really_probe(dev, drv);
+	__pm_runtime_put(dev);
 
 	return ret;
 }
@@ -306,6 +310,8 @@ static void __device_release_driver(stru
 
 	drv = dev->driver;
 	if (drv) {
+		pm_runtime_disable(dev);
+
 		driver_sysfs_remove(dev);
 
 		if (dev->bus)
@@ -324,6 +330,8 @@ static void __device_release_driver(stru
 			blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
 						     BUS_NOTIFY_UNBOUND_DRIVER,
 						     dev);
+
+		pm_runtime_enable(dev);
 	}
 }
 
Index: linux-2.6/drivers/base/power/power.h
===================================================================
--- linux-2.6.orig/drivers/base/power/power.h
+++ linux-2.6/drivers/base/power/power.h
@@ -1,8 +1,3 @@
-static inline void device_pm_init(struct device *dev)
-{
-	dev->power.status = DPM_ON;
-}
-
 #ifdef CONFIG_PM_SLEEP
 
 /*
@@ -16,14 +11,16 @@ static inline struct device *to_device(s
 	return container_of(entry, struct device, power.entry);
 }
 
+extern void device_pm_init(struct device *dev);
 extern void device_pm_add(struct device *);
 extern void device_pm_remove(struct device *);
 extern void device_pm_move_before(struct device *, struct device *);
 extern void device_pm_move_after(struct device *, struct device *);
 extern void device_pm_move_last(struct device *);
 
-#else /* CONFIG_PM_SLEEP */
+#else /* !CONFIG_PM_SLEEP */
 
+static inline void device_pm_init(struct device *dev) {}
 static inline void device_pm_add(struct device *dev) {}
 static inline void device_pm_remove(struct device *dev) {}
 static inline void device_pm_move_before(struct device *deva,
@@ -32,7 +29,7 @@ static inline void device_pm_move_after(
 					struct device *devb) {}
 static inline void device_pm_move_last(struct device *dev) {}
 
-#endif
+#endif /* !CONFIG_PM_SLEEP */
 
 #ifdef CONFIG_PM
 

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)
  2009-06-26 21:13                   ` Alan Stern
  2009-06-26 22:32                     ` Rafael J. Wysocki
  2009-06-26 22:32                     ` Rafael J. Wysocki
@ 2009-06-28 10:25                     ` Rafael J. Wysocki
  2009-06-28 21:07                       ` Alan Stern
  2009-06-28 21:07                       ` Alan Stern
  2009-06-28 10:25                     ` Rafael J. Wysocki
  3 siblings, 2 replies; 102+ messages in thread
From: Rafael J. Wysocki @ 2009-06-28 10:25 UTC (permalink / raw)
  To: Alan Stern
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Friday 26 June 2009, Alan Stern wrote:
> On Fri, 26 Jun 2009, Rafael J. Wysocki wrote:
> 
> > > It occurs to me that the problem would be solved if were a cancel_work
> > > routine.  In the same vein, it ought to be possible for
> > > cancel_delayed_work to run in interrupt context.  I'll see what can be
> > > done.
> > 
> > Having looked at the workqueue code I'm not sure if there's a way to implement
> > that in a non-racy way.  Which may be the reason why there are no such
> > functions already. :-)
> 
> Well, I'll give it a try.

I did that too. :-)

It seems that if we do something like in the appended patch, then
cancel_work() and cancel_delayed_work_dequeue() can be used to simplify the
$subject patch slightly.

Best,
Rafael

---
 include/linux/workqueue.h |    2 ++
 kernel/workqueue.c        |   40 ++++++++++++++++++++++++++++++++++++----
 2 files changed, 38 insertions(+), 4 deletions(-)

Index: linux-2.6/include/linux/workqueue.h
===================================================================
--- linux-2.6.orig/include/linux/workqueue.h
+++ linux-2.6/include/linux/workqueue.h
@@ -223,6 +223,7 @@ int execute_in_process_context(work_func
 extern int flush_work(struct work_struct *work);
 
 extern int cancel_work_sync(struct work_struct *work);
+extern int cancel_work(struct work_struct *work);
 
 /*
  * Kill off a pending schedule_delayed_work().  Note that the work callback
@@ -241,6 +242,7 @@ static inline int cancel_delayed_work(st
 }
 
 extern int cancel_delayed_work_sync(struct delayed_work *work);
+extern int cancel_delayed_work_dequeue(struct delayed_work *dwork);
 
 /* Obsolete. use cancel_delayed_work_sync() */
 static inline
Index: linux-2.6/kernel/workqueue.c
===================================================================
--- linux-2.6.orig/kernel/workqueue.c
+++ linux-2.6/kernel/workqueue.c
@@ -536,7 +536,7 @@ static void wait_on_work(struct work_str
 		wait_on_cpu_work(per_cpu_ptr(wq->cpu_wq, cpu), work);
 }
 
-static int __cancel_work_timer(struct work_struct *work,
+static int __cancel_work_timer(struct work_struct *work, bool wait,
 				struct timer_list* timer)
 {
 	int ret;
@@ -545,7 +545,8 @@ static int __cancel_work_timer(struct wo
 		ret = (timer && likely(del_timer(timer)));
 		if (!ret)
 			ret = try_to_grab_pending(work);
-		wait_on_work(work);
+		if (wait)
+			wait_on_work(work);
 	} while (unlikely(ret < 0));
 
 	work_clear_pending(work);
@@ -575,11 +576,27 @@ static int __cancel_work_timer(struct wo
  */
 int cancel_work_sync(struct work_struct *work)
 {
-	return __cancel_work_timer(work, NULL);
+	return __cancel_work_timer(work, true, NULL);
 }
 EXPORT_SYMBOL_GPL(cancel_work_sync);
 
 /**
+ * cancel_work - kill off a work without waiting for its callback to terminate
+ * @work: the work which is to be canceled
+ *
+ * Returns true if @work was pending.
+ *
+ * cancel_work() will cancel the work if it is queued, but it will not block
+ * until the works callback completes.  Apart from this, it works like
+ * cancel_work_sync().
+ */
+int cancel_work(struct work_struct *work)
+{
+	return __cancel_work_timer(work, false, NULL);
+}
+EXPORT_SYMBOL_GPL(cancel_work);
+
+/**
  * cancel_delayed_work_sync - reliably kill off a delayed work.
  * @dwork: the delayed work struct
  *
@@ -590,10 +607,25 @@ EXPORT_SYMBOL_GPL(cancel_work_sync);
  */
 int cancel_delayed_work_sync(struct delayed_work *dwork)
 {
-	return __cancel_work_timer(&dwork->work, &dwork->timer);
+	return __cancel_work_timer(&dwork->work, true, &dwork->timer);
 }
 EXPORT_SYMBOL(cancel_delayed_work_sync);
 
+/**
+ * cancel_delayed_work_dequeue - kill off a delayed work.
+ * @dwork: the delayed work struct
+ *
+ * Returns true if @dwork was pending.
+ *
+ * cancel_delayed_work_dequeue() will not wait for the work's callback to
+ * terminate.  Apart from this it works like cancel_delayed_work_sync().
+ */
+int cancel_delayed_work_dequeue(struct delayed_work *dwork)
+{
+	return __cancel_work_timer(&dwork->work, false, &dwork->timer);
+}
+EXPORT_SYMBOL(cancel_delayed_work_dequeue);
+
 static struct workqueue_struct *keventd_wq __read_mostly;
 
 /**

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)
  2009-06-26 21:13                   ` Alan Stern
                                       ` (2 preceding siblings ...)
  2009-06-28 10:25                     ` Rafael J. Wysocki
@ 2009-06-28 10:25                     ` Rafael J. Wysocki
  3 siblings, 0 replies; 102+ messages in thread
From: Rafael J. Wysocki @ 2009-06-28 10:25 UTC (permalink / raw)
  To: Alan Stern
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Friday 26 June 2009, Alan Stern wrote:
> On Fri, 26 Jun 2009, Rafael J. Wysocki wrote:
> 
> > > It occurs to me that the problem would be solved if were a cancel_work
> > > routine.  In the same vein, it ought to be possible for
> > > cancel_delayed_work to run in interrupt context.  I'll see what can be
> > > done.
> > 
> > Having looked at the workqueue code I'm not sure if there's a way to implement
> > that in a non-racy way.  Which may be the reason why there are no such
> > functions already. :-)
> 
> Well, I'll give it a try.

I did that too. :-)

It seems that if we do something like in the appended patch, then
cancel_work() and cancel_delayed_work_dequeue() can be used to simplify the
$subject patch slightly.

Best,
Rafael

---
 include/linux/workqueue.h |    2 ++
 kernel/workqueue.c        |   40 ++++++++++++++++++++++++++++++++++++----
 2 files changed, 38 insertions(+), 4 deletions(-)

Index: linux-2.6/include/linux/workqueue.h
===================================================================
--- linux-2.6.orig/include/linux/workqueue.h
+++ linux-2.6/include/linux/workqueue.h
@@ -223,6 +223,7 @@ int execute_in_process_context(work_func
 extern int flush_work(struct work_struct *work);
 
 extern int cancel_work_sync(struct work_struct *work);
+extern int cancel_work(struct work_struct *work);
 
 /*
  * Kill off a pending schedule_delayed_work().  Note that the work callback
@@ -241,6 +242,7 @@ static inline int cancel_delayed_work(st
 }
 
 extern int cancel_delayed_work_sync(struct delayed_work *work);
+extern int cancel_delayed_work_dequeue(struct delayed_work *dwork);
 
 /* Obsolete. use cancel_delayed_work_sync() */
 static inline
Index: linux-2.6/kernel/workqueue.c
===================================================================
--- linux-2.6.orig/kernel/workqueue.c
+++ linux-2.6/kernel/workqueue.c
@@ -536,7 +536,7 @@ static void wait_on_work(struct work_str
 		wait_on_cpu_work(per_cpu_ptr(wq->cpu_wq, cpu), work);
 }
 
-static int __cancel_work_timer(struct work_struct *work,
+static int __cancel_work_timer(struct work_struct *work, bool wait,
 				struct timer_list* timer)
 {
 	int ret;
@@ -545,7 +545,8 @@ static int __cancel_work_timer(struct wo
 		ret = (timer && likely(del_timer(timer)));
 		if (!ret)
 			ret = try_to_grab_pending(work);
-		wait_on_work(work);
+		if (wait)
+			wait_on_work(work);
 	} while (unlikely(ret < 0));
 
 	work_clear_pending(work);
@@ -575,11 +576,27 @@ static int __cancel_work_timer(struct wo
  */
 int cancel_work_sync(struct work_struct *work)
 {
-	return __cancel_work_timer(work, NULL);
+	return __cancel_work_timer(work, true, NULL);
 }
 EXPORT_SYMBOL_GPL(cancel_work_sync);
 
 /**
+ * cancel_work - kill off a work without waiting for its callback to terminate
+ * @work: the work which is to be canceled
+ *
+ * Returns true if @work was pending.
+ *
+ * cancel_work() will cancel the work if it is queued, but it will not block
+ * until the works callback completes.  Apart from this, it works like
+ * cancel_work_sync().
+ */
+int cancel_work(struct work_struct *work)
+{
+	return __cancel_work_timer(work, false, NULL);
+}
+EXPORT_SYMBOL_GPL(cancel_work);
+
+/**
  * cancel_delayed_work_sync - reliably kill off a delayed work.
  * @dwork: the delayed work struct
  *
@@ -590,10 +607,25 @@ EXPORT_SYMBOL_GPL(cancel_work_sync);
  */
 int cancel_delayed_work_sync(struct delayed_work *dwork)
 {
-	return __cancel_work_timer(&dwork->work, &dwork->timer);
+	return __cancel_work_timer(&dwork->work, true, &dwork->timer);
 }
 EXPORT_SYMBOL(cancel_delayed_work_sync);
 
+/**
+ * cancel_delayed_work_dequeue - kill off a delayed work.
+ * @dwork: the delayed work struct
+ *
+ * Returns true if @dwork was pending.
+ *
+ * cancel_delayed_work_dequeue() will not wait for the work's callback to
+ * terminate.  Apart from this it works like cancel_delayed_work_sync().
+ */
+int cancel_delayed_work_dequeue(struct delayed_work *dwork)
+{
+	return __cancel_work_timer(&dwork->work, false, &dwork->timer);
+}
+EXPORT_SYMBOL(cancel_delayed_work_dequeue);
+
 static struct workqueue_struct *keventd_wq __read_mostly;
 
 /**

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)
  2009-06-28 10:25                     ` Rafael J. Wysocki
  2009-06-28 21:07                       ` Alan Stern
@ 2009-06-28 21:07                       ` Alan Stern
  2009-06-29  0:15                         ` Rafael J. Wysocki
  2009-06-29  0:15                         ` Rafael J. Wysocki
  1 sibling, 2 replies; 102+ messages in thread
From: Alan Stern @ 2009-06-28 21:07 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Sun, 28 Jun 2009, Rafael J. Wysocki wrote:

> It seems that if we do something like in the appended patch, then
> cancel_work() and cancel_delayed_work_dequeue() can be used to simplify the
> $subject patch slightly.

I merged your patch with my own work, leading to the patch below.

There were a bunch of things I didn't like about the existing code,
particularly cancel_delayed_work.  To start with, it seems like a large
enough routine that it shouldn't be inlined.  More importantly, it
foolishly calls del_timer_sync, resulting in the unnecessary
restriction that it cannot be used in_interrupt.  Finally, although it
will deactivate a delayed_work's timer, it doesn't even try to remove
the item from the workqueue if the timer has already expired.

Your cancel_delayed_work_dequeue is better -- so much better that I
don't see any reason to keep the original cancel_delayed_work at all.  
I got rid of it and used your routine instead.

I also changed the comments you wrote for cancel_work.  You can see 
that now they are much more explicit and complete.

The original version of __cancel_work_timer is not safe to use
in_interrupt.  If it is called from a handler whose IRQ interrupted
delayed_work_timer_fn, it can loop indefinitely.  Therefore I added a
check; if it finds that the work_struct is currently being enqueued and
it is running in_interrupt, it gives up right away.  There are a few
other improvements too.

Consequently it is now safe to call cancel_work and cancel_delayed_work
in_interrupt or while holding a spinlock.  This means you can use these
functions to cancel the various PM runtime work items whenever needed.  
As a result, you don't need two work_structs in dev_pm_info; a single
delayed_work will be enough.

Tell me what you think.

Alan Stern



Index: usb-2.6/include/linux/workqueue.h
===================================================================
--- usb-2.6.orig/include/linux/workqueue.h
+++ usb-2.6/include/linux/workqueue.h
@@ -223,24 +223,10 @@ int execute_in_process_context(work_func
 extern int flush_work(struct work_struct *work);
 
 extern int cancel_work_sync(struct work_struct *work);
-
-/*
- * Kill off a pending schedule_delayed_work().  Note that the work callback
- * function may still be running on return from cancel_delayed_work(), unless
- * it returns 1 and the work doesn't re-arm itself. Run flush_workqueue() or
- * cancel_work_sync() to wait on it.
- */
-static inline int cancel_delayed_work(struct delayed_work *work)
-{
-	int ret;
-
-	ret = del_timer_sync(&work->timer);
-	if (ret)
-		work_clear_pending(&work->work);
-	return ret;
-}
+extern int cancel_work(struct work_struct *work);
 
 extern int cancel_delayed_work_sync(struct delayed_work *work);
+extern int cancel_delayed_work(struct delayed_work *dwork);
 
 /* Obsolete. use cancel_delayed_work_sync() */
 static inline
Index: usb-2.6/kernel/workqueue.c
===================================================================
--- usb-2.6.orig/kernel/workqueue.c
+++ usb-2.6/kernel/workqueue.c
@@ -465,6 +465,7 @@ static int try_to_grab_pending(struct wo
 {
 	struct cpu_workqueue_struct *cwq;
 	int ret = -1;
+	unsigned long flags;
 
 	if (!test_and_set_bit(WORK_STRUCT_PENDING, work_data_bits(work)))
 		return 0;
@@ -478,7 +479,7 @@ static int try_to_grab_pending(struct wo
 	if (!cwq)
 		return ret;
 
-	spin_lock_irq(&cwq->lock);
+	spin_lock_irqsave(&cwq->lock, flags);
 	if (!list_empty(&work->entry)) {
 		/*
 		 * This work is queued, but perhaps we locked the wrong cwq.
@@ -491,7 +492,7 @@ static int try_to_grab_pending(struct wo
 			ret = 1;
 		}
 	}
-	spin_unlock_irq(&cwq->lock);
+	spin_unlock_irqrestore(&cwq->lock, flags);
 
 	return ret;
 }
@@ -536,18 +537,26 @@ static void wait_on_work(struct work_str
 		wait_on_cpu_work(per_cpu_ptr(wq->cpu_wq, cpu), work);
 }
 
-static int __cancel_work_timer(struct work_struct *work,
+static int __cancel_work_timer(struct work_struct *work, bool wait,
 				struct timer_list* timer)
 {
 	int ret;
 
-	do {
-		ret = (timer && likely(del_timer(timer)));
-		if (!ret)
-			ret = try_to_grab_pending(work);
-		wait_on_work(work);
-	} while (unlikely(ret < 0));
+	if (timer && likely(del_timer(timer))) {
+		ret = 1;
+		goto done;
+	}
 
+	for (;;) {
+		ret = try_to_grab_pending(work);
+		if (likely(ret >= 0))
+			break;
+		if (in_interrupt())
+			return ret;
+	}
+	if (ret == 0 && wait)
+		wait_on_work(work);
+ done:
 	work_clear_pending(work);
 	return ret;
 }
@@ -575,11 +584,43 @@ static int __cancel_work_timer(struct wo
  */
 int cancel_work_sync(struct work_struct *work)
 {
-	return __cancel_work_timer(work, NULL);
+	return __cancel_work_timer(work, true, NULL);
 }
 EXPORT_SYMBOL_GPL(cancel_work_sync);
 
 /**
+ * cancel_work - try to cancel a pending work_struct.
+ * @work: the work_struct to cancel
+ *
+ * Try to cancel a pending work_struct before it starts running.
+ * Upon return, @work may safely be reused if the return value
+ * is 1 or the return value is 0 and the work callback function
+ * doesn't resubmit @work.
+ *
+ * The callback function may be running upon return if the return value
+ * is <= 0; use cancel_work_sync() to wait for the callback function
+ * to finish.
+ *
+ * There's not much point using this routine unless you can guarantee
+ * that neither the callback function nor anything else is in the
+ * process of submitting @work (or is about to do so).  The only good
+ * reason might be that optimistically trying to cancel @work has less
+ * overhead than letting it go ahead and run.
+ *
+ * This routine may be called from interrupt context.
+ *
+ * Returns: 1 if @work was removed from its workqueue,
+ *	    0 if @work was not pending (may be running),
+ *	   -1 if @work was concurrently being enqueued and we were
+ *		called in_interrupt.
+ */
+int cancel_work(struct work_struct *work)
+{
+	return __cancel_work_timer(work, false, NULL);
+}
+EXPORT_SYMBOL_GPL(cancel_work);
+
+/**
  * cancel_delayed_work_sync - reliably kill off a delayed work.
  * @dwork: the delayed work struct
  *
@@ -590,10 +631,24 @@ EXPORT_SYMBOL_GPL(cancel_work_sync);
  */
 int cancel_delayed_work_sync(struct delayed_work *dwork)
 {
-	return __cancel_work_timer(&dwork->work, &dwork->timer);
+	return __cancel_work_timer(&dwork->work, true, &dwork->timer);
 }
 EXPORT_SYMBOL(cancel_delayed_work_sync);
 
+/**
+ * cancel_delayed_work - try to cancel a delayed_work_struct.
+ * @dwork: the delayed_work_struct to cancel
+ *
+ * Try to cancel a pending delayed_work, either by deactivating its
+ * timer or by removing it from its workqueue.  This routine is just
+ * like cancel_work() except that it handles a delayed_work.
+ */
+int cancel_delayed_work(struct delayed_work *dwork)
+{
+	return __cancel_work_timer(&dwork->work, false, &dwork->timer);
+}
+EXPORT_SYMBOL(cancel_delayed_work);
+
 static struct workqueue_struct *keventd_wq __read_mostly;
 
 /**


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)
  2009-06-28 10:25                     ` Rafael J. Wysocki
@ 2009-06-28 21:07                       ` Alan Stern
  2009-06-28 21:07                       ` Alan Stern
  1 sibling, 0 replies; 102+ messages in thread
From: Alan Stern @ 2009-06-28 21:07 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Sun, 28 Jun 2009, Rafael J. Wysocki wrote:

> It seems that if we do something like in the appended patch, then
> cancel_work() and cancel_delayed_work_dequeue() can be used to simplify the
> $subject patch slightly.

I merged your patch with my own work, leading to the patch below.

There were a bunch of things I didn't like about the existing code,
particularly cancel_delayed_work.  To start with, it seems like a large
enough routine that it shouldn't be inlined.  More importantly, it
foolishly calls del_timer_sync, resulting in the unnecessary
restriction that it cannot be used in_interrupt.  Finally, although it
will deactivate a delayed_work's timer, it doesn't even try to remove
the item from the workqueue if the timer has already expired.

Your cancel_delayed_work_dequeue is better -- so much better that I
don't see any reason to keep the original cancel_delayed_work at all.  
I got rid of it and used your routine instead.

I also changed the comments you wrote for cancel_work.  You can see 
that now they are much more explicit and complete.

The original version of __cancel_work_timer is not safe to use
in_interrupt.  If it is called from a handler whose IRQ interrupted
delayed_work_timer_fn, it can loop indefinitely.  Therefore I added a
check; if it finds that the work_struct is currently being enqueued and
it is running in_interrupt, it gives up right away.  There are a few
other improvements too.

Consequently it is now safe to call cancel_work and cancel_delayed_work
in_interrupt or while holding a spinlock.  This means you can use these
functions to cancel the various PM runtime work items whenever needed.  
As a result, you don't need two work_structs in dev_pm_info; a single
delayed_work will be enough.

Tell me what you think.

Alan Stern



Index: usb-2.6/include/linux/workqueue.h
===================================================================
--- usb-2.6.orig/include/linux/workqueue.h
+++ usb-2.6/include/linux/workqueue.h
@@ -223,24 +223,10 @@ int execute_in_process_context(work_func
 extern int flush_work(struct work_struct *work);
 
 extern int cancel_work_sync(struct work_struct *work);
-
-/*
- * Kill off a pending schedule_delayed_work().  Note that the work callback
- * function may still be running on return from cancel_delayed_work(), unless
- * it returns 1 and the work doesn't re-arm itself. Run flush_workqueue() or
- * cancel_work_sync() to wait on it.
- */
-static inline int cancel_delayed_work(struct delayed_work *work)
-{
-	int ret;
-
-	ret = del_timer_sync(&work->timer);
-	if (ret)
-		work_clear_pending(&work->work);
-	return ret;
-}
+extern int cancel_work(struct work_struct *work);
 
 extern int cancel_delayed_work_sync(struct delayed_work *work);
+extern int cancel_delayed_work(struct delayed_work *dwork);
 
 /* Obsolete. use cancel_delayed_work_sync() */
 static inline
Index: usb-2.6/kernel/workqueue.c
===================================================================
--- usb-2.6.orig/kernel/workqueue.c
+++ usb-2.6/kernel/workqueue.c
@@ -465,6 +465,7 @@ static int try_to_grab_pending(struct wo
 {
 	struct cpu_workqueue_struct *cwq;
 	int ret = -1;
+	unsigned long flags;
 
 	if (!test_and_set_bit(WORK_STRUCT_PENDING, work_data_bits(work)))
 		return 0;
@@ -478,7 +479,7 @@ static int try_to_grab_pending(struct wo
 	if (!cwq)
 		return ret;
 
-	spin_lock_irq(&cwq->lock);
+	spin_lock_irqsave(&cwq->lock, flags);
 	if (!list_empty(&work->entry)) {
 		/*
 		 * This work is queued, but perhaps we locked the wrong cwq.
@@ -491,7 +492,7 @@ static int try_to_grab_pending(struct wo
 			ret = 1;
 		}
 	}
-	spin_unlock_irq(&cwq->lock);
+	spin_unlock_irqrestore(&cwq->lock, flags);
 
 	return ret;
 }
@@ -536,18 +537,26 @@ static void wait_on_work(struct work_str
 		wait_on_cpu_work(per_cpu_ptr(wq->cpu_wq, cpu), work);
 }
 
-static int __cancel_work_timer(struct work_struct *work,
+static int __cancel_work_timer(struct work_struct *work, bool wait,
 				struct timer_list* timer)
 {
 	int ret;
 
-	do {
-		ret = (timer && likely(del_timer(timer)));
-		if (!ret)
-			ret = try_to_grab_pending(work);
-		wait_on_work(work);
-	} while (unlikely(ret < 0));
+	if (timer && likely(del_timer(timer))) {
+		ret = 1;
+		goto done;
+	}
 
+	for (;;) {
+		ret = try_to_grab_pending(work);
+		if (likely(ret >= 0))
+			break;
+		if (in_interrupt())
+			return ret;
+	}
+	if (ret == 0 && wait)
+		wait_on_work(work);
+ done:
 	work_clear_pending(work);
 	return ret;
 }
@@ -575,11 +584,43 @@ static int __cancel_work_timer(struct wo
  */
 int cancel_work_sync(struct work_struct *work)
 {
-	return __cancel_work_timer(work, NULL);
+	return __cancel_work_timer(work, true, NULL);
 }
 EXPORT_SYMBOL_GPL(cancel_work_sync);
 
 /**
+ * cancel_work - try to cancel a pending work_struct.
+ * @work: the work_struct to cancel
+ *
+ * Try to cancel a pending work_struct before it starts running.
+ * Upon return, @work may safely be reused if the return value
+ * is 1 or the return value is 0 and the work callback function
+ * doesn't resubmit @work.
+ *
+ * The callback function may be running upon return if the return value
+ * is <= 0; use cancel_work_sync() to wait for the callback function
+ * to finish.
+ *
+ * There's not much point using this routine unless you can guarantee
+ * that neither the callback function nor anything else is in the
+ * process of submitting @work (or is about to do so).  The only good
+ * reason might be that optimistically trying to cancel @work has less
+ * overhead than letting it go ahead and run.
+ *
+ * This routine may be called from interrupt context.
+ *
+ * Returns: 1 if @work was removed from its workqueue,
+ *	    0 if @work was not pending (may be running),
+ *	   -1 if @work was concurrently being enqueued and we were
+ *		called in_interrupt.
+ */
+int cancel_work(struct work_struct *work)
+{
+	return __cancel_work_timer(work, false, NULL);
+}
+EXPORT_SYMBOL_GPL(cancel_work);
+
+/**
  * cancel_delayed_work_sync - reliably kill off a delayed work.
  * @dwork: the delayed work struct
  *
@@ -590,10 +631,24 @@ EXPORT_SYMBOL_GPL(cancel_work_sync);
  */
 int cancel_delayed_work_sync(struct delayed_work *dwork)
 {
-	return __cancel_work_timer(&dwork->work, &dwork->timer);
+	return __cancel_work_timer(&dwork->work, true, &dwork->timer);
 }
 EXPORT_SYMBOL(cancel_delayed_work_sync);
 
+/**
+ * cancel_delayed_work - try to cancel a delayed_work_struct.
+ * @dwork: the delayed_work_struct to cancel
+ *
+ * Try to cancel a pending delayed_work, either by deactivating its
+ * timer or by removing it from its workqueue.  This routine is just
+ * like cancel_work() except that it handles a delayed_work.
+ */
+int cancel_delayed_work(struct delayed_work *dwork)
+{
+	return __cancel_work_timer(&dwork->work, false, &dwork->timer);
+}
+EXPORT_SYMBOL(cancel_delayed_work);
+
 static struct workqueue_struct *keventd_wq __read_mostly;
 
 /**

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)
  2009-06-28 21:07                       ` Alan Stern
  2009-06-29  0:15                         ` Rafael J. Wysocki
@ 2009-06-29  0:15                         ` Rafael J. Wysocki
  2009-06-29  3:05                           ` Alan Stern
  2009-06-29  3:05                           ` Alan Stern
  1 sibling, 2 replies; 102+ messages in thread
From: Rafael J. Wysocki @ 2009-06-29  0:15 UTC (permalink / raw)
  To: Alan Stern
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Sunday 28 June 2009, Alan Stern wrote:
> On Sun, 28 Jun 2009, Rafael J. Wysocki wrote:
> 
> > It seems that if we do something like in the appended patch, then
> > cancel_work() and cancel_delayed_work_dequeue() can be used to simplify the
> > $subject patch slightly.
> 
> I merged your patch with my own work, leading to the patch below.
> 
> There were a bunch of things I didn't like about the existing code,
> particularly cancel_delayed_work.  To start with, it seems like a large
> enough routine that it shouldn't be inlined.

Agreed.

> More importantly, it foolishly calls del_timer_sync, resulting in the
> unnecessary restriction that it cannot be used in_interrupt.  Finally,
> although it will deactivate a delayed_work's timer, it doesn't even try to
> remove the item from the workqueue if the timer has already expired.
> 
> Your cancel_delayed_work_dequeue is better -- so much better that I
> don't see any reason to keep the original cancel_delayed_work at all.  
> I got rid of it and used your routine instead.
> 
> I also changed the comments you wrote for cancel_work.  You can see 
> that now they are much more explicit and complete.
> 
> The original version of __cancel_work_timer is not safe to use
> in_interrupt.  If it is called from a handler whose IRQ interrupted
> delayed_work_timer_fn, it can loop indefinitely.

Right, I overlooked that.

> Therefore I added a check; if it finds that the work_struct is currently
> being enqueued and it is running in_interrupt, it gives up right away.

Hmm, it doesn't do the work_clear_pending(work) in that case, so we allow
the work to be queued and run?  Out of couriosity, what the caller is supposed
to do then?

> There are a few other improvements too.
> 
> Consequently it is now safe to call cancel_work and cancel_delayed_work
> in_interrupt or while holding a spinlock.  This means you can use these
> functions to cancel the various PM runtime work items whenever needed.  
> As a result, you don't need two work_structs in dev_pm_info; a single
> delayed_work will be enough.

Yes, I'm going to rebase the framework patch on top of this one.

> Tell me what you think.

I like the patch. :-)

Best,
Rafael

 
> Index: usb-2.6/include/linux/workqueue.h
> ===================================================================
> --- usb-2.6.orig/include/linux/workqueue.h
> +++ usb-2.6/include/linux/workqueue.h
> @@ -223,24 +223,10 @@ int execute_in_process_context(work_func
>  extern int flush_work(struct work_struct *work);
>  
>  extern int cancel_work_sync(struct work_struct *work);
> -
> -/*
> - * Kill off a pending schedule_delayed_work().  Note that the work callback
> - * function may still be running on return from cancel_delayed_work(), unless
> - * it returns 1 and the work doesn't re-arm itself. Run flush_workqueue() or
> - * cancel_work_sync() to wait on it.
> - */
> -static inline int cancel_delayed_work(struct delayed_work *work)
> -{
> -	int ret;
> -
> -	ret = del_timer_sync(&work->timer);
> -	if (ret)
> -		work_clear_pending(&work->work);
> -	return ret;
> -}
> +extern int cancel_work(struct work_struct *work);
>  
>  extern int cancel_delayed_work_sync(struct delayed_work *work);
> +extern int cancel_delayed_work(struct delayed_work *dwork);
>  
>  /* Obsolete. use cancel_delayed_work_sync() */
>  static inline
> Index: usb-2.6/kernel/workqueue.c
> ===================================================================
> --- usb-2.6.orig/kernel/workqueue.c
> +++ usb-2.6/kernel/workqueue.c
> @@ -465,6 +465,7 @@ static int try_to_grab_pending(struct wo
>  {
>  	struct cpu_workqueue_struct *cwq;
>  	int ret = -1;
> +	unsigned long flags;
>  
>  	if (!test_and_set_bit(WORK_STRUCT_PENDING, work_data_bits(work)))
>  		return 0;
> @@ -478,7 +479,7 @@ static int try_to_grab_pending(struct wo
>  	if (!cwq)
>  		return ret;
>  
> -	spin_lock_irq(&cwq->lock);
> +	spin_lock_irqsave(&cwq->lock, flags);
>  	if (!list_empty(&work->entry)) {
>  		/*
>  		 * This work is queued, but perhaps we locked the wrong cwq.
> @@ -491,7 +492,7 @@ static int try_to_grab_pending(struct wo
>  			ret = 1;
>  		}
>  	}
> -	spin_unlock_irq(&cwq->lock);
> +	spin_unlock_irqrestore(&cwq->lock, flags);
>  
>  	return ret;
>  }
> @@ -536,18 +537,26 @@ static void wait_on_work(struct work_str
>  		wait_on_cpu_work(per_cpu_ptr(wq->cpu_wq, cpu), work);
>  }
>  
> -static int __cancel_work_timer(struct work_struct *work,
> +static int __cancel_work_timer(struct work_struct *work, bool wait,
>  				struct timer_list* timer)
>  {
>  	int ret;
>  
> -	do {
> -		ret = (timer && likely(del_timer(timer)));
> -		if (!ret)
> -			ret = try_to_grab_pending(work);
> -		wait_on_work(work);
> -	} while (unlikely(ret < 0));
> +	if (timer && likely(del_timer(timer))) {
> +		ret = 1;
> +		goto done;
> +	}
>  
> +	for (;;) {
> +		ret = try_to_grab_pending(work);
> +		if (likely(ret >= 0))
> +			break;
> +		if (in_interrupt())
> +			return ret;
> +	}
> +	if (ret == 0 && wait)
> +		wait_on_work(work);
> + done:
>  	work_clear_pending(work);
>  	return ret;
>  }
> @@ -575,11 +584,43 @@ static int __cancel_work_timer(struct wo
>   */
>  int cancel_work_sync(struct work_struct *work)
>  {
> -	return __cancel_work_timer(work, NULL);
> +	return __cancel_work_timer(work, true, NULL);
>  }
>  EXPORT_SYMBOL_GPL(cancel_work_sync);
>  
>  /**
> + * cancel_work - try to cancel a pending work_struct.
> + * @work: the work_struct to cancel
> + *
> + * Try to cancel a pending work_struct before it starts running.
> + * Upon return, @work may safely be reused if the return value
> + * is 1 or the return value is 0 and the work callback function
> + * doesn't resubmit @work.
> + *
> + * The callback function may be running upon return if the return value
> + * is <= 0; use cancel_work_sync() to wait for the callback function
> + * to finish.
> + *
> + * There's not much point using this routine unless you can guarantee
> + * that neither the callback function nor anything else is in the
> + * process of submitting @work (or is about to do so).  The only good
> + * reason might be that optimistically trying to cancel @work has less
> + * overhead than letting it go ahead and run.
> + *
> + * This routine may be called from interrupt context.
> + *
> + * Returns: 1 if @work was removed from its workqueue,
> + *	    0 if @work was not pending (may be running),
> + *	   -1 if @work was concurrently being enqueued and we were
> + *		called in_interrupt.
> + */
> +int cancel_work(struct work_struct *work)
> +{
> +	return __cancel_work_timer(work, false, NULL);
> +}
> +EXPORT_SYMBOL_GPL(cancel_work);
> +
> +/**
>   * cancel_delayed_work_sync - reliably kill off a delayed work.
>   * @dwork: the delayed work struct
>   *
> @@ -590,10 +631,24 @@ EXPORT_SYMBOL_GPL(cancel_work_sync);
>   */
>  int cancel_delayed_work_sync(struct delayed_work *dwork)
>  {
> -	return __cancel_work_timer(&dwork->work, &dwork->timer);
> +	return __cancel_work_timer(&dwork->work, true, &dwork->timer);
>  }
>  EXPORT_SYMBOL(cancel_delayed_work_sync);
>  
> +/**
> + * cancel_delayed_work - try to cancel a delayed_work_struct.
> + * @dwork: the delayed_work_struct to cancel
> + *
> + * Try to cancel a pending delayed_work, either by deactivating its
> + * timer or by removing it from its workqueue.  This routine is just
> + * like cancel_work() except that it handles a delayed_work.
> + */
> +int cancel_delayed_work(struct delayed_work *dwork)
> +{
> +	return __cancel_work_timer(&dwork->work, false, &dwork->timer);
> +}
> +EXPORT_SYMBOL(cancel_delayed_work);
> +
>  static struct workqueue_struct *keventd_wq __read_mostly;
>  
>  /**

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)
  2009-06-28 21:07                       ` Alan Stern
@ 2009-06-29  0:15                         ` Rafael J. Wysocki
  2009-06-29  0:15                         ` Rafael J. Wysocki
  1 sibling, 0 replies; 102+ messages in thread
From: Rafael J. Wysocki @ 2009-06-29  0:15 UTC (permalink / raw)
  To: Alan Stern
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Sunday 28 June 2009, Alan Stern wrote:
> On Sun, 28 Jun 2009, Rafael J. Wysocki wrote:
> 
> > It seems that if we do something like in the appended patch, then
> > cancel_work() and cancel_delayed_work_dequeue() can be used to simplify the
> > $subject patch slightly.
> 
> I merged your patch with my own work, leading to the patch below.
> 
> There were a bunch of things I didn't like about the existing code,
> particularly cancel_delayed_work.  To start with, it seems like a large
> enough routine that it shouldn't be inlined.

Agreed.

> More importantly, it foolishly calls del_timer_sync, resulting in the
> unnecessary restriction that it cannot be used in_interrupt.  Finally,
> although it will deactivate a delayed_work's timer, it doesn't even try to
> remove the item from the workqueue if the timer has already expired.
> 
> Your cancel_delayed_work_dequeue is better -- so much better that I
> don't see any reason to keep the original cancel_delayed_work at all.  
> I got rid of it and used your routine instead.
> 
> I also changed the comments you wrote for cancel_work.  You can see 
> that now they are much more explicit and complete.
> 
> The original version of __cancel_work_timer is not safe to use
> in_interrupt.  If it is called from a handler whose IRQ interrupted
> delayed_work_timer_fn, it can loop indefinitely.

Right, I overlooked that.

> Therefore I added a check; if it finds that the work_struct is currently
> being enqueued and it is running in_interrupt, it gives up right away.

Hmm, it doesn't do the work_clear_pending(work) in that case, so we allow
the work to be queued and run?  Out of couriosity, what the caller is supposed
to do then?

> There are a few other improvements too.
> 
> Consequently it is now safe to call cancel_work and cancel_delayed_work
> in_interrupt or while holding a spinlock.  This means you can use these
> functions to cancel the various PM runtime work items whenever needed.  
> As a result, you don't need two work_structs in dev_pm_info; a single
> delayed_work will be enough.

Yes, I'm going to rebase the framework patch on top of this one.

> Tell me what you think.

I like the patch. :-)

Best,
Rafael

 
> Index: usb-2.6/include/linux/workqueue.h
> ===================================================================
> --- usb-2.6.orig/include/linux/workqueue.h
> +++ usb-2.6/include/linux/workqueue.h
> @@ -223,24 +223,10 @@ int execute_in_process_context(work_func
>  extern int flush_work(struct work_struct *work);
>  
>  extern int cancel_work_sync(struct work_struct *work);
> -
> -/*
> - * Kill off a pending schedule_delayed_work().  Note that the work callback
> - * function may still be running on return from cancel_delayed_work(), unless
> - * it returns 1 and the work doesn't re-arm itself. Run flush_workqueue() or
> - * cancel_work_sync() to wait on it.
> - */
> -static inline int cancel_delayed_work(struct delayed_work *work)
> -{
> -	int ret;
> -
> -	ret = del_timer_sync(&work->timer);
> -	if (ret)
> -		work_clear_pending(&work->work);
> -	return ret;
> -}
> +extern int cancel_work(struct work_struct *work);
>  
>  extern int cancel_delayed_work_sync(struct delayed_work *work);
> +extern int cancel_delayed_work(struct delayed_work *dwork);
>  
>  /* Obsolete. use cancel_delayed_work_sync() */
>  static inline
> Index: usb-2.6/kernel/workqueue.c
> ===================================================================
> --- usb-2.6.orig/kernel/workqueue.c
> +++ usb-2.6/kernel/workqueue.c
> @@ -465,6 +465,7 @@ static int try_to_grab_pending(struct wo
>  {
>  	struct cpu_workqueue_struct *cwq;
>  	int ret = -1;
> +	unsigned long flags;
>  
>  	if (!test_and_set_bit(WORK_STRUCT_PENDING, work_data_bits(work)))
>  		return 0;
> @@ -478,7 +479,7 @@ static int try_to_grab_pending(struct wo
>  	if (!cwq)
>  		return ret;
>  
> -	spin_lock_irq(&cwq->lock);
> +	spin_lock_irqsave(&cwq->lock, flags);
>  	if (!list_empty(&work->entry)) {
>  		/*
>  		 * This work is queued, but perhaps we locked the wrong cwq.
> @@ -491,7 +492,7 @@ static int try_to_grab_pending(struct wo
>  			ret = 1;
>  		}
>  	}
> -	spin_unlock_irq(&cwq->lock);
> +	spin_unlock_irqrestore(&cwq->lock, flags);
>  
>  	return ret;
>  }
> @@ -536,18 +537,26 @@ static void wait_on_work(struct work_str
>  		wait_on_cpu_work(per_cpu_ptr(wq->cpu_wq, cpu), work);
>  }
>  
> -static int __cancel_work_timer(struct work_struct *work,
> +static int __cancel_work_timer(struct work_struct *work, bool wait,
>  				struct timer_list* timer)
>  {
>  	int ret;
>  
> -	do {
> -		ret = (timer && likely(del_timer(timer)));
> -		if (!ret)
> -			ret = try_to_grab_pending(work);
> -		wait_on_work(work);
> -	} while (unlikely(ret < 0));
> +	if (timer && likely(del_timer(timer))) {
> +		ret = 1;
> +		goto done;
> +	}
>  
> +	for (;;) {
> +		ret = try_to_grab_pending(work);
> +		if (likely(ret >= 0))
> +			break;
> +		if (in_interrupt())
> +			return ret;
> +	}
> +	if (ret == 0 && wait)
> +		wait_on_work(work);
> + done:
>  	work_clear_pending(work);
>  	return ret;
>  }
> @@ -575,11 +584,43 @@ static int __cancel_work_timer(struct wo
>   */
>  int cancel_work_sync(struct work_struct *work)
>  {
> -	return __cancel_work_timer(work, NULL);
> +	return __cancel_work_timer(work, true, NULL);
>  }
>  EXPORT_SYMBOL_GPL(cancel_work_sync);
>  
>  /**
> + * cancel_work - try to cancel a pending work_struct.
> + * @work: the work_struct to cancel
> + *
> + * Try to cancel a pending work_struct before it starts running.
> + * Upon return, @work may safely be reused if the return value
> + * is 1 or the return value is 0 and the work callback function
> + * doesn't resubmit @work.
> + *
> + * The callback function may be running upon return if the return value
> + * is <= 0; use cancel_work_sync() to wait for the callback function
> + * to finish.
> + *
> + * There's not much point using this routine unless you can guarantee
> + * that neither the callback function nor anything else is in the
> + * process of submitting @work (or is about to do so).  The only good
> + * reason might be that optimistically trying to cancel @work has less
> + * overhead than letting it go ahead and run.
> + *
> + * This routine may be called from interrupt context.
> + *
> + * Returns: 1 if @work was removed from its workqueue,
> + *	    0 if @work was not pending (may be running),
> + *	   -1 if @work was concurrently being enqueued and we were
> + *		called in_interrupt.
> + */
> +int cancel_work(struct work_struct *work)
> +{
> +	return __cancel_work_timer(work, false, NULL);
> +}
> +EXPORT_SYMBOL_GPL(cancel_work);
> +
> +/**
>   * cancel_delayed_work_sync - reliably kill off a delayed work.
>   * @dwork: the delayed work struct
>   *
> @@ -590,10 +631,24 @@ EXPORT_SYMBOL_GPL(cancel_work_sync);
>   */
>  int cancel_delayed_work_sync(struct delayed_work *dwork)
>  {
> -	return __cancel_work_timer(&dwork->work, &dwork->timer);
> +	return __cancel_work_timer(&dwork->work, true, &dwork->timer);
>  }
>  EXPORT_SYMBOL(cancel_delayed_work_sync);
>  
> +/**
> + * cancel_delayed_work - try to cancel a delayed_work_struct.
> + * @dwork: the delayed_work_struct to cancel
> + *
> + * Try to cancel a pending delayed_work, either by deactivating its
> + * timer or by removing it from its workqueue.  This routine is just
> + * like cancel_work() except that it handles a delayed_work.
> + */
> +int cancel_delayed_work(struct delayed_work *dwork)
> +{
> +	return __cancel_work_timer(&dwork->work, false, &dwork->timer);
> +}
> +EXPORT_SYMBOL(cancel_delayed_work);
> +
>  static struct workqueue_struct *keventd_wq __read_mostly;
>  
>  /**

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)
  2009-06-29  0:15                         ` Rafael J. Wysocki
@ 2009-06-29  3:05                           ` Alan Stern
  2009-06-29 14:09                             ` Rafael J. Wysocki
  2009-06-29 14:09                             ` Rafael J. Wysocki
  2009-06-29  3:05                           ` Alan Stern
  1 sibling, 2 replies; 102+ messages in thread
From: Alan Stern @ 2009-06-29  3:05 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Mon, 29 Jun 2009, Rafael J. Wysocki wrote:

> > The original version of __cancel_work_timer is not safe to use
> > in_interrupt.  If it is called from a handler whose IRQ interrupted
> > delayed_work_timer_fn, it can loop indefinitely.
> 
> Right, I overlooked that.
> 
> > Therefore I added a check; if it finds that the work_struct is currently
> > being enqueued and it is running in_interrupt, it gives up right away.
> 
> Hmm, it doesn't do the work_clear_pending(work) in that case, so we allow
> the work to be queued and run?

Yes.  That's better than leaving the work queued but with the "pending" 
flag cleared.  :-)

>  Out of couriosity, what the caller is supposed
> to do then?

In the case of cancel_work, this is a simple race.  The work_struct was
being submitted at the same time as the cancellation occurred.  The end
result is the same as if the submission had been slightly later: The
work is on the queue and it will run.  If the caller can guarantee that 
the work is not in the process of being submitted (as described in the 
kerneldoc) then this situation will never arise.

In the case of cancel_delayed_work, things are more complicated.  If
the cancellation had occurred a little earlier, it would have
deactivated the timer.  If it had occurred a little later, it would
have removed the item from the workqueue.  But since it arrived at
exactly the wrong time -- while the timer routine is enqueuing the work
-- there's nothing it can do.  The caller has to cope as best he can.

For runtime PM this isn't a big issue.  The only delayed work we have
is a delayed autosuspend request.  These things get cancelled when:

	pm_runtime_suspend runs synchronously.  That happens in
	process context so we're okay.

	pm_runtime_suspend_atomic (not yet written!) is called.
	If the cancellation fails, we'll have to return an error.
	The suspend will happen later, when the work item runs.
	Ultimately, the best we can do is recommend that people
	don't mix pm_request_suspend with pm_runtime_suspend_atomic.


Which reminds me...  The way you've got things set up, 
pm_runtime_put_atomic queues an idle notification, right?  That's 
a little inconsistent with the naming of the other routines.

Instead, pm_runtime_put_atomic should be a version of pm_runtime_put
that can safely be called in an atomic context -- it implies that it
will call the runtime_notify callback while holding the spinlock.  The
routine to queue an idle-notify request should be called something like
pm_request_put -- although that name isn't so great because it sounds 
like the put gets deferred instead of the notification.

Alan Stern



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)
  2009-06-29  0:15                         ` Rafael J. Wysocki
  2009-06-29  3:05                           ` Alan Stern
@ 2009-06-29  3:05                           ` Alan Stern
  1 sibling, 0 replies; 102+ messages in thread
From: Alan Stern @ 2009-06-29  3:05 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Mon, 29 Jun 2009, Rafael J. Wysocki wrote:

> > The original version of __cancel_work_timer is not safe to use
> > in_interrupt.  If it is called from a handler whose IRQ interrupted
> > delayed_work_timer_fn, it can loop indefinitely.
> 
> Right, I overlooked that.
> 
> > Therefore I added a check; if it finds that the work_struct is currently
> > being enqueued and it is running in_interrupt, it gives up right away.
> 
> Hmm, it doesn't do the work_clear_pending(work) in that case, so we allow
> the work to be queued and run?

Yes.  That's better than leaving the work queued but with the "pending" 
flag cleared.  :-)

>  Out of couriosity, what the caller is supposed
> to do then?

In the case of cancel_work, this is a simple race.  The work_struct was
being submitted at the same time as the cancellation occurred.  The end
result is the same as if the submission had been slightly later: The
work is on the queue and it will run.  If the caller can guarantee that 
the work is not in the process of being submitted (as described in the 
kerneldoc) then this situation will never arise.

In the case of cancel_delayed_work, things are more complicated.  If
the cancellation had occurred a little earlier, it would have
deactivated the timer.  If it had occurred a little later, it would
have removed the item from the workqueue.  But since it arrived at
exactly the wrong time -- while the timer routine is enqueuing the work
-- there's nothing it can do.  The caller has to cope as best he can.

For runtime PM this isn't a big issue.  The only delayed work we have
is a delayed autosuspend request.  These things get cancelled when:

	pm_runtime_suspend runs synchronously.  That happens in
	process context so we're okay.

	pm_runtime_suspend_atomic (not yet written!) is called.
	If the cancellation fails, we'll have to return an error.
	The suspend will happen later, when the work item runs.
	Ultimately, the best we can do is recommend that people
	don't mix pm_request_suspend with pm_runtime_suspend_atomic.


Which reminds me...  The way you've got things set up, 
pm_runtime_put_atomic queues an idle notification, right?  That's 
a little inconsistent with the naming of the other routines.

Instead, pm_runtime_put_atomic should be a version of pm_runtime_put
that can safely be called in an atomic context -- it implies that it
will call the runtime_notify callback while holding the spinlock.  The
routine to queue an idle-notify request should be called something like
pm_request_put -- although that name isn't so great because it sounds 
like the put gets deferred instead of the notification.

Alan Stern

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)
  2009-06-29  3:05                           ` Alan Stern
  2009-06-29 14:09                             ` Rafael J. Wysocki
@ 2009-06-29 14:09                             ` Rafael J. Wysocki
  2009-06-29 14:29                               ` Alan Stern
  2009-06-29 14:29                               ` Alan Stern
  1 sibling, 2 replies; 102+ messages in thread
From: Rafael J. Wysocki @ 2009-06-29 14:09 UTC (permalink / raw)
  To: Alan Stern
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Monday 29 June 2009, Alan Stern wrote:
> On Mon, 29 Jun 2009, Rafael J. Wysocki wrote:
> 
> > > The original version of __cancel_work_timer is not safe to use
> > > in_interrupt.  If it is called from a handler whose IRQ interrupted
> > > delayed_work_timer_fn, it can loop indefinitely.
> > 
> > Right, I overlooked that.
> > 
> > > Therefore I added a check; if it finds that the work_struct is currently
> > > being enqueued and it is running in_interrupt, it gives up right away.
> > 
> > Hmm, it doesn't do the work_clear_pending(work) in that case, so we allow
> > the work to be queued and run?
> 
> Yes.  That's better than leaving the work queued but with the "pending" 
> flag cleared.  :-)

So there still is a problem, I'm afraid (details below).

> >  Out of couriosity, what the caller is supposed
> > to do then?
> 
> In the case of cancel_work, this is a simple race.  The work_struct was
> being submitted at the same time as the cancellation occurred.  The end
> result is the same as if the submission had been slightly later: The
> work is on the queue and it will run.  If the caller can guarantee that 
> the work is not in the process of being submitted (as described in the 
> kerneldoc) then this situation will never arise.
> 
> In the case of cancel_delayed_work, things are more complicated.  If
> the cancellation had occurred a little earlier, it would have
> deactivated the timer.  If it had occurred a little later, it would
> have removed the item from the workqueue.  But since it arrived at
> exactly the wrong time -- while the timer routine is enqueuing the work
> -- there's nothing it can do.  The caller has to cope as best he can.
> 
> For runtime PM this isn't a big issue.  The only delayed work we have
> is a delayed autosuspend request.  These things get cancelled when:
> 
> 	pm_runtime_suspend runs synchronously.  That happens in
> 	process context so we're okay.
> 
> 	pm_runtime_suspend_atomic (not yet written!) is called.

This is going to be added in a separate patch.

> 	If the cancellation fails, we'll have to return an error.
> 	The suspend will happen later, when the work item runs.
> 	Ultimately, the best we can do is recommend that people
> 	don't mix pm_request_suspend with pm_runtime_suspend_atomic.

Well, not only in that cases and in fact this is where the actual problem is.

Namely, pm_request_suspend() and pm_request_resume() have to cancel any
pending requests in a reliable way so that the work struct can be used safely
after they've returned.

Assume for example that there's a suspend request pending while
pm_request_resume() is being called.  pm_request_resume() uses
cancel_delayed_work() to kill off the request, but that's in interrupt and it
happens to return -1.  Now, there's pm_runtime_put_atomic() right after that
which attempts to queue up an idle notification request before the
delayed suspend request has a chance to run and bad things happen.

So, it seems, pm_request_resume() can't kill suspend requests by itself
and instead it has to queue up resume requests for this purpose, which
brings us right back to the problem of two requests queued up at a time
(a delayed suspend request and a resume request that is supposed to cancel it).

Nevertheless, using your workqueue patch we can still simplify things quite a
bit, so I think it's worth doing anyway.

> Which reminds me...  The way you've got things set up, 
> pm_runtime_put_atomic queues an idle notification, right?  That's 
> a little inconsistent with the naming of the other routines.
> 
> Instead, pm_runtime_put_atomic should be a version of pm_runtime_put
> that can safely be called in an atomic context -- it implies that it
> will call the runtime_notify callback while holding the spinlock.  The
> routine to queue an idle-notify request should be called something like
> pm_request_put -- although that name isn't so great because it sounds 
> like the put gets deferred instead of the notification.

There can be pm_request_put() and pm_request_put_sync(), for example.
Or pm_request_put_async() and pm_request_put(), depending on which version is
going to be used more often.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)
  2009-06-29  3:05                           ` Alan Stern
@ 2009-06-29 14:09                             ` Rafael J. Wysocki
  2009-06-29 14:09                             ` Rafael J. Wysocki
  1 sibling, 0 replies; 102+ messages in thread
From: Rafael J. Wysocki @ 2009-06-29 14:09 UTC (permalink / raw)
  To: Alan Stern
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Monday 29 June 2009, Alan Stern wrote:
> On Mon, 29 Jun 2009, Rafael J. Wysocki wrote:
> 
> > > The original version of __cancel_work_timer is not safe to use
> > > in_interrupt.  If it is called from a handler whose IRQ interrupted
> > > delayed_work_timer_fn, it can loop indefinitely.
> > 
> > Right, I overlooked that.
> > 
> > > Therefore I added a check; if it finds that the work_struct is currently
> > > being enqueued and it is running in_interrupt, it gives up right away.
> > 
> > Hmm, it doesn't do the work_clear_pending(work) in that case, so we allow
> > the work to be queued and run?
> 
> Yes.  That's better than leaving the work queued but with the "pending" 
> flag cleared.  :-)

So there still is a problem, I'm afraid (details below).

> >  Out of couriosity, what the caller is supposed
> > to do then?
> 
> In the case of cancel_work, this is a simple race.  The work_struct was
> being submitted at the same time as the cancellation occurred.  The end
> result is the same as if the submission had been slightly later: The
> work is on the queue and it will run.  If the caller can guarantee that 
> the work is not in the process of being submitted (as described in the 
> kerneldoc) then this situation will never arise.
> 
> In the case of cancel_delayed_work, things are more complicated.  If
> the cancellation had occurred a little earlier, it would have
> deactivated the timer.  If it had occurred a little later, it would
> have removed the item from the workqueue.  But since it arrived at
> exactly the wrong time -- while the timer routine is enqueuing the work
> -- there's nothing it can do.  The caller has to cope as best he can.
> 
> For runtime PM this isn't a big issue.  The only delayed work we have
> is a delayed autosuspend request.  These things get cancelled when:
> 
> 	pm_runtime_suspend runs synchronously.  That happens in
> 	process context so we're okay.
> 
> 	pm_runtime_suspend_atomic (not yet written!) is called.

This is going to be added in a separate patch.

> 	If the cancellation fails, we'll have to return an error.
> 	The suspend will happen later, when the work item runs.
> 	Ultimately, the best we can do is recommend that people
> 	don't mix pm_request_suspend with pm_runtime_suspend_atomic.

Well, not only in that cases and in fact this is where the actual problem is.

Namely, pm_request_suspend() and pm_request_resume() have to cancel any
pending requests in a reliable way so that the work struct can be used safely
after they've returned.

Assume for example that there's a suspend request pending while
pm_request_resume() is being called.  pm_request_resume() uses
cancel_delayed_work() to kill off the request, but that's in interrupt and it
happens to return -1.  Now, there's pm_runtime_put_atomic() right after that
which attempts to queue up an idle notification request before the
delayed suspend request has a chance to run and bad things happen.

So, it seems, pm_request_resume() can't kill suspend requests by itself
and instead it has to queue up resume requests for this purpose, which
brings us right back to the problem of two requests queued up at a time
(a delayed suspend request and a resume request that is supposed to cancel it).

Nevertheless, using your workqueue patch we can still simplify things quite a
bit, so I think it's worth doing anyway.

> Which reminds me...  The way you've got things set up, 
> pm_runtime_put_atomic queues an idle notification, right?  That's 
> a little inconsistent with the naming of the other routines.
> 
> Instead, pm_runtime_put_atomic should be a version of pm_runtime_put
> that can safely be called in an atomic context -- it implies that it
> will call the runtime_notify callback while holding the spinlock.  The
> routine to queue an idle-notify request should be called something like
> pm_request_put -- although that name isn't so great because it sounds 
> like the put gets deferred instead of the notification.

There can be pm_request_put() and pm_request_put_sync(), for example.
Or pm_request_put_async() and pm_request_put(), depending on which version is
going to be used more often.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)
  2009-06-29 14:09                             ` Rafael J. Wysocki
@ 2009-06-29 14:29                               ` Alan Stern
  2009-06-29 14:54                                 ` Rafael J. Wysocki
  2009-06-29 14:54                                 ` Rafael J. Wysocki
  2009-06-29 14:29                               ` Alan Stern
  1 sibling, 2 replies; 102+ messages in thread
From: Alan Stern @ 2009-06-29 14:29 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Mon, 29 Jun 2009, Rafael J. Wysocki wrote:

> Well, not only in that cases and in fact this is where the actual problem is.
> 
> Namely, pm_request_suspend() and pm_request_resume() have to cancel any
> pending requests in a reliable way so that the work struct can be used safely
> after they've returned.

Right.

> Assume for example that there's a suspend request pending while
> pm_request_resume() is being called.  pm_request_resume() uses
> cancel_delayed_work() to kill off the request, but that's in interrupt and it
> happens to return -1.  Now, there's pm_runtime_put_atomic() right after that
> which attempts to queue up an idle notification request before the
> delayed suspend request has a chance to run and bad things happen.
> 
> So, it seems, pm_request_resume() can't kill suspend requests by itself
> and instead it has to queue up resume requests for this purpose, which
> brings us right back to the problem of two requests queued up at a time
> (a delayed suspend request and a resume request that is supposed to cancel it).

No, you're trying to do too much.  If the state is RPM_IDLE (i.e., a 
suspend request is pending) then rpm_request_resume doesn't need to do 
anything.  The device is already resumed!  Sure, it can try to kill the 
request and change the state to RPM_ACTIVE, but it doesn't need to.

Think about it.  Even if the suspend request were killed off, there's 
always the possibility that someone could call rpm_runtime_suspend 
right afterward.  If the driver really wants to resume the device and 
prevent it from suspending again, then the driver should call 
pm_runtime_get before pm_request_resume.  Then it won't matter if the 
suspend request runs.

> Nevertheless, using your workqueue patch we can still simplify things quite a
> bit, so I think it's worth doing anyway.

Me too.  :-)

> > Which reminds me...  The way you've got things set up, 
> > pm_runtime_put_atomic queues an idle notification, right?  That's 
> > a little inconsistent with the naming of the other routines.
> > 
> > Instead, pm_runtime_put_atomic should be a version of pm_runtime_put
> > that can safely be called in an atomic context -- it implies that it
> > will call the runtime_notify callback while holding the spinlock.  The
> > routine to queue an idle-notify request should be called something like
> > pm_request_put -- although that name isn't so great because it sounds 
> > like the put gets deferred instead of the notification.
> 
> There can be pm_request_put() and pm_request_put_sync(), for example.
> Or pm_request_put_async() and pm_request_put(), depending on which version is
> going to be used more often.

I don't follow you.  We only need one version of pm_request_put.  Did 
you mean "pm_runtime_put" and "pm_runtime_put_async"?  That would make 
sense.

If you use that (instead of pm_request_put) then would you want to
similarly rename pm_request_resume and pm_request_suspend to
pm_runtime_resume_async and pm_runtime_suspend_async?

Alan Stern


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)
  2009-06-29 14:09                             ` Rafael J. Wysocki
  2009-06-29 14:29                               ` Alan Stern
@ 2009-06-29 14:29                               ` Alan Stern
  1 sibling, 0 replies; 102+ messages in thread
From: Alan Stern @ 2009-06-29 14:29 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Mon, 29 Jun 2009, Rafael J. Wysocki wrote:

> Well, not only in that cases and in fact this is where the actual problem is.
> 
> Namely, pm_request_suspend() and pm_request_resume() have to cancel any
> pending requests in a reliable way so that the work struct can be used safely
> after they've returned.

Right.

> Assume for example that there's a suspend request pending while
> pm_request_resume() is being called.  pm_request_resume() uses
> cancel_delayed_work() to kill off the request, but that's in interrupt and it
> happens to return -1.  Now, there's pm_runtime_put_atomic() right after that
> which attempts to queue up an idle notification request before the
> delayed suspend request has a chance to run and bad things happen.
> 
> So, it seems, pm_request_resume() can't kill suspend requests by itself
> and instead it has to queue up resume requests for this purpose, which
> brings us right back to the problem of two requests queued up at a time
> (a delayed suspend request and a resume request that is supposed to cancel it).

No, you're trying to do too much.  If the state is RPM_IDLE (i.e., a 
suspend request is pending) then rpm_request_resume doesn't need to do 
anything.  The device is already resumed!  Sure, it can try to kill the 
request and change the state to RPM_ACTIVE, but it doesn't need to.

Think about it.  Even if the suspend request were killed off, there's 
always the possibility that someone could call rpm_runtime_suspend 
right afterward.  If the driver really wants to resume the device and 
prevent it from suspending again, then the driver should call 
pm_runtime_get before pm_request_resume.  Then it won't matter if the 
suspend request runs.

> Nevertheless, using your workqueue patch we can still simplify things quite a
> bit, so I think it's worth doing anyway.

Me too.  :-)

> > Which reminds me...  The way you've got things set up, 
> > pm_runtime_put_atomic queues an idle notification, right?  That's 
> > a little inconsistent with the naming of the other routines.
> > 
> > Instead, pm_runtime_put_atomic should be a version of pm_runtime_put
> > that can safely be called in an atomic context -- it implies that it
> > will call the runtime_notify callback while holding the spinlock.  The
> > routine to queue an idle-notify request should be called something like
> > pm_request_put -- although that name isn't so great because it sounds 
> > like the put gets deferred instead of the notification.
> 
> There can be pm_request_put() and pm_request_put_sync(), for example.
> Or pm_request_put_async() and pm_request_put(), depending on which version is
> going to be used more often.

I don't follow you.  We only need one version of pm_request_put.  Did 
you mean "pm_runtime_put" and "pm_runtime_put_async"?  That would make 
sense.

If you use that (instead of pm_request_put) then would you want to
similarly rename pm_request_resume and pm_request_suspend to
pm_runtime_resume_async and pm_runtime_suspend_async?

Alan Stern

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)
  2009-06-29 14:29                               ` Alan Stern
  2009-06-29 14:54                                 ` Rafael J. Wysocki
@ 2009-06-29 14:54                                 ` Rafael J. Wysocki
  2009-06-29 15:27                                   ` Alan Stern
  2009-06-29 15:27                                   ` Alan Stern
  1 sibling, 2 replies; 102+ messages in thread
From: Rafael J. Wysocki @ 2009-06-29 14:54 UTC (permalink / raw)
  To: Alan Stern
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Monday 29 June 2009, Alan Stern wrote:
> On Mon, 29 Jun 2009, Rafael J. Wysocki wrote:
> 
> > Well, not only in that cases and in fact this is where the actual problem is.
> > 
> > Namely, pm_request_suspend() and pm_request_resume() have to cancel any
> > pending requests in a reliable way so that the work struct can be used safely
> > after they've returned.
> 
> Right.
> 
> > Assume for example that there's a suspend request pending while
> > pm_request_resume() is being called.  pm_request_resume() uses
> > cancel_delayed_work() to kill off the request, but that's in interrupt and it
> > happens to return -1.  Now, there's pm_runtime_put_atomic() right after that
> > which attempts to queue up an idle notification request before the
> > delayed suspend request has a chance to run and bad things happen.
> > 
> > So, it seems, pm_request_resume() can't kill suspend requests by itself
> > and instead it has to queue up resume requests for this purpose, which
> > brings us right back to the problem of two requests queued up at a time
> > (a delayed suspend request and a resume request that is supposed to cancel it).
> 
> No, you're trying to do too much.  If the state is RPM_IDLE (i.e., a 
> suspend request is pending) then rpm_request_resume doesn't need to do 
> anything.  The device is already resumed!  Sure, it can try to kill the 
> request and change the state to RPM_ACTIVE, but it doesn't need to.

I think it does need to do that, because the reuqest may be scheduled way
in the future and we can't preserve its work structure until it runs.
pm_request_resume() doesn't know in advance when the suspend work function is
going to be queued up and run.

> Think about it.  Even if the suspend request were killed off, there's 
> always the possibility that someone could call rpm_runtime_suspend 
> right afterward.  If the driver really wants to resume the device and 
> prevent it from suspending again, then the driver should call 
> pm_runtime_get before pm_request_resume.  Then it won't matter if the 
> suspend request runs.

No, it doesn't matter if the request runs, but it does matter if the work
structure used for queuing it up may be used for another purpose. :-)

> > Nevertheless, using your workqueue patch we can still simplify things quite a
> > bit, so I think it's worth doing anyway.
> 
> Me too.  :-)
> 
> > > Which reminds me...  The way you've got things set up, 
> > > pm_runtime_put_atomic queues an idle notification, right?  That's 
> > > a little inconsistent with the naming of the other routines.
> > > 
> > > Instead, pm_runtime_put_atomic should be a version of pm_runtime_put
> > > that can safely be called in an atomic context -- it implies that it
> > > will call the runtime_notify callback while holding the spinlock.  The
> > > routine to queue an idle-notify request should be called something like
> > > pm_request_put -- although that name isn't so great because it sounds 
> > > like the put gets deferred instead of the notification.
> > 
> > There can be pm_request_put() and pm_request_put_sync(), for example.
> > Or pm_request_put_async() and pm_request_put(), depending on which version is
> > going to be used more often.
> 
> I don't follow you.  We only need one version of pm_request_put.  Did 
> you mean "pm_runtime_put" and "pm_runtime_put_async"?  That would make 
> sense.

Yes, I did, sorry.

> If you use that (instead of pm_request_put) then would you want to
> similarly rename pm_request_resume and pm_request_suspend to
> pm_runtime_resume_async and pm_runtime_suspend_async?

Well, I think the pm_request_[suspend|resume] names are better. :-)

The problem with pm_<something>_put is that it does two things at a time,
decrements the resume counter and runs or queues up an idle notification.
Perhaps it's a good idea to call it after the second thing and change
pm_runtime_get() to pm_runtime_inuse(), so that we have:

* pm_runtime_inuse() - increment the resume counter
* pm_runtime_idle() - decrement the resume counter and run idle notification
* pm_request_idle()  - decrement the resume counter and queue idle notification

and __pm_runtime_idle() as the "bare" idle notification function?

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)
  2009-06-29 14:29                               ` Alan Stern
@ 2009-06-29 14:54                                 ` Rafael J. Wysocki
  2009-06-29 14:54                                 ` Rafael J. Wysocki
  1 sibling, 0 replies; 102+ messages in thread
From: Rafael J. Wysocki @ 2009-06-29 14:54 UTC (permalink / raw)
  To: Alan Stern
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Monday 29 June 2009, Alan Stern wrote:
> On Mon, 29 Jun 2009, Rafael J. Wysocki wrote:
> 
> > Well, not only in that cases and in fact this is where the actual problem is.
> > 
> > Namely, pm_request_suspend() and pm_request_resume() have to cancel any
> > pending requests in a reliable way so that the work struct can be used safely
> > after they've returned.
> 
> Right.
> 
> > Assume for example that there's a suspend request pending while
> > pm_request_resume() is being called.  pm_request_resume() uses
> > cancel_delayed_work() to kill off the request, but that's in interrupt and it
> > happens to return -1.  Now, there's pm_runtime_put_atomic() right after that
> > which attempts to queue up an idle notification request before the
> > delayed suspend request has a chance to run and bad things happen.
> > 
> > So, it seems, pm_request_resume() can't kill suspend requests by itself
> > and instead it has to queue up resume requests for this purpose, which
> > brings us right back to the problem of two requests queued up at a time
> > (a delayed suspend request and a resume request that is supposed to cancel it).
> 
> No, you're trying to do too much.  If the state is RPM_IDLE (i.e., a 
> suspend request is pending) then rpm_request_resume doesn't need to do 
> anything.  The device is already resumed!  Sure, it can try to kill the 
> request and change the state to RPM_ACTIVE, but it doesn't need to.

I think it does need to do that, because the reuqest may be scheduled way
in the future and we can't preserve its work structure until it runs.
pm_request_resume() doesn't know in advance when the suspend work function is
going to be queued up and run.

> Think about it.  Even if the suspend request were killed off, there's 
> always the possibility that someone could call rpm_runtime_suspend 
> right afterward.  If the driver really wants to resume the device and 
> prevent it from suspending again, then the driver should call 
> pm_runtime_get before pm_request_resume.  Then it won't matter if the 
> suspend request runs.

No, it doesn't matter if the request runs, but it does matter if the work
structure used for queuing it up may be used for another purpose. :-)

> > Nevertheless, using your workqueue patch we can still simplify things quite a
> > bit, so I think it's worth doing anyway.
> 
> Me too.  :-)
> 
> > > Which reminds me...  The way you've got things set up, 
> > > pm_runtime_put_atomic queues an idle notification, right?  That's 
> > > a little inconsistent with the naming of the other routines.
> > > 
> > > Instead, pm_runtime_put_atomic should be a version of pm_runtime_put
> > > that can safely be called in an atomic context -- it implies that it
> > > will call the runtime_notify callback while holding the spinlock.  The
> > > routine to queue an idle-notify request should be called something like
> > > pm_request_put -- although that name isn't so great because it sounds 
> > > like the put gets deferred instead of the notification.
> > 
> > There can be pm_request_put() and pm_request_put_sync(), for example.
> > Or pm_request_put_async() and pm_request_put(), depending on which version is
> > going to be used more often.
> 
> I don't follow you.  We only need one version of pm_request_put.  Did 
> you mean "pm_runtime_put" and "pm_runtime_put_async"?  That would make 
> sense.

Yes, I did, sorry.

> If you use that (instead of pm_request_put) then would you want to
> similarly rename pm_request_resume and pm_request_suspend to
> pm_runtime_resume_async and pm_runtime_suspend_async?

Well, I think the pm_request_[suspend|resume] names are better. :-)

The problem with pm_<something>_put is that it does two things at a time,
decrements the resume counter and runs or queues up an idle notification.
Perhaps it's a good idea to call it after the second thing and change
pm_runtime_get() to pm_runtime_inuse(), so that we have:

* pm_runtime_inuse() - increment the resume counter
* pm_runtime_idle() - decrement the resume counter and run idle notification
* pm_request_idle()  - decrement the resume counter and queue idle notification

and __pm_runtime_idle() as the "bare" idle notification function?

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)
  2009-06-29 14:54                                 ` Rafael J. Wysocki
  2009-06-29 15:27                                   ` Alan Stern
@ 2009-06-29 15:27                                   ` Alan Stern
  2009-06-29 15:55                                     ` Rafael J. Wysocki
  2009-06-29 15:55                                     ` Rafael J. Wysocki
  1 sibling, 2 replies; 102+ messages in thread
From: Alan Stern @ 2009-06-29 15:27 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Mon, 29 Jun 2009, Rafael J. Wysocki wrote:

> > > So, it seems, pm_request_resume() can't kill suspend requests by itself
> > > and instead it has to queue up resume requests for this purpose, which
> > > brings us right back to the problem of two requests queued up at a time
> > > (a delayed suspend request and a resume request that is supposed to cancel it).
> > 
> > No, you're trying to do too much.  If the state is RPM_IDLE (i.e., a 
> > suspend request is pending) then rpm_request_resume doesn't need to do 
> > anything.  The device is already resumed!  Sure, it can try to kill the 
> > request and change the state to RPM_ACTIVE, but it doesn't need to.
> 
> I think it does need to do that, because the reuqest may be scheduled way
> in the future and we can't preserve its work structure until it runs.
> pm_request_resume() doesn't know in advance when the suspend work function is
> going to be queued up and run.

It doesn't need to know.  All it needs to do is guarantee that the
device will be in a resumed state some time not long after the function
returns.  Thus calling rpm_request_resume while the status is RPM_IDLE
is like calling it while the status is RPM_ACTIVE.  In neither case
does it have to do anything, because the device will already be resumed
when it returns.

Perhaps instead we should provide a way to kill a pending suspend
request?  It's not clear that anyone would need this.  The only reason
I can think of is if you wanted to change the timeout duration.  But it
wouldn't be able to run in interrupt context.

> > Think about it.  Even if the suspend request were killed off, there's 
> > always the possibility that someone could call rpm_runtime_suspend 
> > right afterward.  If the driver really wants to resume the device and 
> > prevent it from suspending again, then the driver should call 
> > pm_runtime_get before pm_request_resume.  Then it won't matter if the 
> > suspend request runs.
> 
> No, it doesn't matter if the request runs, but it does matter if the work
> structure used for queuing it up may be used for another purpose. :-)

What else would it be used for?  If rpm_request_resume returns without 
doing anything and leaves the status set to RPM_IDLE, then the work 
structure won't be reused until the status changes.


> The problem with pm_<something>_put is that it does two things at a time,
> decrements the resume counter and runs or queues up an idle notification.
> Perhaps it's a good idea to call it after the second thing and change
> pm_runtime_get() to pm_runtime_inuse(), so that we have:
> 
> * pm_runtime_inuse() - increment the resume counter
> * pm_runtime_idle() - decrement the resume counter and run idle notification
> * pm_request_idle()  - decrement the resume counter and queue idle notification
> 
> and __pm_runtime_idle() as the "bare" idle notification function?

I could live with that, but the nice thing about "get" and "put" is
that they directly suggest a counter is being maintained and therefore
the calls have to balance.  Maybe we should just call it 
rpm_request_put and not worry that the put happens immediately.

Alan Stern


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)
  2009-06-29 14:54                                 ` Rafael J. Wysocki
@ 2009-06-29 15:27                                   ` Alan Stern
  2009-06-29 15:27                                   ` Alan Stern
  1 sibling, 0 replies; 102+ messages in thread
From: Alan Stern @ 2009-06-29 15:27 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Mon, 29 Jun 2009, Rafael J. Wysocki wrote:

> > > So, it seems, pm_request_resume() can't kill suspend requests by itself
> > > and instead it has to queue up resume requests for this purpose, which
> > > brings us right back to the problem of two requests queued up at a time
> > > (a delayed suspend request and a resume request that is supposed to cancel it).
> > 
> > No, you're trying to do too much.  If the state is RPM_IDLE (i.e., a 
> > suspend request is pending) then rpm_request_resume doesn't need to do 
> > anything.  The device is already resumed!  Sure, it can try to kill the 
> > request and change the state to RPM_ACTIVE, but it doesn't need to.
> 
> I think it does need to do that, because the reuqest may be scheduled way
> in the future and we can't preserve its work structure until it runs.
> pm_request_resume() doesn't know in advance when the suspend work function is
> going to be queued up and run.

It doesn't need to know.  All it needs to do is guarantee that the
device will be in a resumed state some time not long after the function
returns.  Thus calling rpm_request_resume while the status is RPM_IDLE
is like calling it while the status is RPM_ACTIVE.  In neither case
does it have to do anything, because the device will already be resumed
when it returns.

Perhaps instead we should provide a way to kill a pending suspend
request?  It's not clear that anyone would need this.  The only reason
I can think of is if you wanted to change the timeout duration.  But it
wouldn't be able to run in interrupt context.

> > Think about it.  Even if the suspend request were killed off, there's 
> > always the possibility that someone could call rpm_runtime_suspend 
> > right afterward.  If the driver really wants to resume the device and 
> > prevent it from suspending again, then the driver should call 
> > pm_runtime_get before pm_request_resume.  Then it won't matter if the 
> > suspend request runs.
> 
> No, it doesn't matter if the request runs, but it does matter if the work
> structure used for queuing it up may be used for another purpose. :-)

What else would it be used for?  If rpm_request_resume returns without 
doing anything and leaves the status set to RPM_IDLE, then the work 
structure won't be reused until the status changes.


> The problem with pm_<something>_put is that it does two things at a time,
> decrements the resume counter and runs or queues up an idle notification.
> Perhaps it's a good idea to call it after the second thing and change
> pm_runtime_get() to pm_runtime_inuse(), so that we have:
> 
> * pm_runtime_inuse() - increment the resume counter
> * pm_runtime_idle() - decrement the resume counter and run idle notification
> * pm_request_idle()  - decrement the resume counter and queue idle notification
> 
> and __pm_runtime_idle() as the "bare" idle notification function?

I could live with that, but the nice thing about "get" and "put" is
that they directly suggest a counter is being maintained and therefore
the calls have to balance.  Maybe we should just call it 
rpm_request_put and not worry that the put happens immediately.

Alan Stern

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)
  2009-06-29 15:27                                   ` Alan Stern
  2009-06-29 15:55                                     ` Rafael J. Wysocki
@ 2009-06-29 15:55                                     ` Rafael J. Wysocki
  2009-06-29 16:10                                       ` Alan Stern
  1 sibling, 1 reply; 102+ messages in thread
From: Rafael J. Wysocki @ 2009-06-29 15:55 UTC (permalink / raw)
  To: Alan Stern
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Monday 29 June 2009, Alan Stern wrote:
> On Mon, 29 Jun 2009, Rafael J. Wysocki wrote:
> 
> > > > So, it seems, pm_request_resume() can't kill suspend requests by itself
> > > > and instead it has to queue up resume requests for this purpose, which
> > > > brings us right back to the problem of two requests queued up at a time
> > > > (a delayed suspend request and a resume request that is supposed to cancel it).
> > > 
> > > No, you're trying to do too much.  If the state is RPM_IDLE (i.e., a 
> > > suspend request is pending) then rpm_request_resume doesn't need to do 
> > > anything.  The device is already resumed!  Sure, it can try to kill the 
> > > request and change the state to RPM_ACTIVE, but it doesn't need to.
> > 
> > I think it does need to do that, because the reuqest may be scheduled way
> > in the future and we can't preserve its work structure until it runs.
> > pm_request_resume() doesn't know in advance when the suspend work function is
> > going to be queued up and run.
> 
> It doesn't need to know.  All it needs to do is guarantee that the
> device will be in a resumed state some time not long after the function
> returns.  Thus calling rpm_request_resume while the status is RPM_IDLE
> is like calling it while the status is RPM_ACTIVE.  In neither case
> does it have to do anything, because the device will already be resumed
> when it returns.

Not exactly, because RPM_IDLE prevents idle notifications from being run,
as it means a suspend has already been requested, which is not really the
case after pm_request_resume().

> Perhaps instead we should provide a way to kill a pending suspend
> request?  It's not clear that anyone would need this.  The only reason
> I can think of is if you wanted to change the timeout duration.  But it
> wouldn't be able to run in interrupt context.
> 
> > > Think about it.  Even if the suspend request were killed off, there's 
> > > always the possibility that someone could call rpm_runtime_suspend 
> > > right afterward.  If the driver really wants to resume the device and 
> > > prevent it from suspending again, then the driver should call 
> > > pm_runtime_get before pm_request_resume.  Then it won't matter if the 
> > > suspend request runs.
> > 
> > No, it doesn't matter if the request runs, but it does matter if the work
> > structure used for queuing it up may be used for another purpose. :-)
> 
> What else would it be used for?  If rpm_request_resume returns without 
> doing anything and leaves the status set to RPM_IDLE, then the work 
> structure won't be reused until the status changes.

Which is not right, because we may want to run ->runtime_idle() before
the status is changed.

That's why I think pm_request_resume() should queue up a resume request if
a suspend request is pending.

> > The problem with pm_<something>_put is that it does two things at a time,
> > decrements the resume counter and runs or queues up an idle notification.
> > Perhaps it's a good idea to call it after the second thing and change
> > pm_runtime_get() to pm_runtime_inuse(), so that we have:
> > 
> > * pm_runtime_inuse() - increment the resume counter
> > * pm_runtime_idle() - decrement the resume counter and run idle notification
> > * pm_request_idle()  - decrement the resume counter and queue idle notification
> > 
> > and __pm_runtime_idle() as the "bare" idle notification function?
> 
> I could live with that, but the nice thing about "get" and "put" is
> that they directly suggest a counter is being maintained and therefore
> the calls have to balance.  Maybe we should just call it 
> rpm_request_put and not worry that the put happens immediately.

OK

Best,
Rafael

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)
  2009-06-29 15:27                                   ` Alan Stern
@ 2009-06-29 15:55                                     ` Rafael J. Wysocki
  2009-06-29 15:55                                     ` Rafael J. Wysocki
  1 sibling, 0 replies; 102+ messages in thread
From: Rafael J. Wysocki @ 2009-06-29 15:55 UTC (permalink / raw)
  To: Alan Stern
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Monday 29 June 2009, Alan Stern wrote:
> On Mon, 29 Jun 2009, Rafael J. Wysocki wrote:
> 
> > > > So, it seems, pm_request_resume() can't kill suspend requests by itself
> > > > and instead it has to queue up resume requests for this purpose, which
> > > > brings us right back to the problem of two requests queued up at a time
> > > > (a delayed suspend request and a resume request that is supposed to cancel it).
> > > 
> > > No, you're trying to do too much.  If the state is RPM_IDLE (i.e., a 
> > > suspend request is pending) then rpm_request_resume doesn't need to do 
> > > anything.  The device is already resumed!  Sure, it can try to kill the 
> > > request and change the state to RPM_ACTIVE, but it doesn't need to.
> > 
> > I think it does need to do that, because the reuqest may be scheduled way
> > in the future and we can't preserve its work structure until it runs.
> > pm_request_resume() doesn't know in advance when the suspend work function is
> > going to be queued up and run.
> 
> It doesn't need to know.  All it needs to do is guarantee that the
> device will be in a resumed state some time not long after the function
> returns.  Thus calling rpm_request_resume while the status is RPM_IDLE
> is like calling it while the status is RPM_ACTIVE.  In neither case
> does it have to do anything, because the device will already be resumed
> when it returns.

Not exactly, because RPM_IDLE prevents idle notifications from being run,
as it means a suspend has already been requested, which is not really the
case after pm_request_resume().

> Perhaps instead we should provide a way to kill a pending suspend
> request?  It's not clear that anyone would need this.  The only reason
> I can think of is if you wanted to change the timeout duration.  But it
> wouldn't be able to run in interrupt context.
> 
> > > Think about it.  Even if the suspend request were killed off, there's 
> > > always the possibility that someone could call rpm_runtime_suspend 
> > > right afterward.  If the driver really wants to resume the device and 
> > > prevent it from suspending again, then the driver should call 
> > > pm_runtime_get before pm_request_resume.  Then it won't matter if the 
> > > suspend request runs.
> > 
> > No, it doesn't matter if the request runs, but it does matter if the work
> > structure used for queuing it up may be used for another purpose. :-)
> 
> What else would it be used for?  If rpm_request_resume returns without 
> doing anything and leaves the status set to RPM_IDLE, then the work 
> structure won't be reused until the status changes.

Which is not right, because we may want to run ->runtime_idle() before
the status is changed.

That's why I think pm_request_resume() should queue up a resume request if
a suspend request is pending.

> > The problem with pm_<something>_put is that it does two things at a time,
> > decrements the resume counter and runs or queues up an idle notification.
> > Perhaps it's a good idea to call it after the second thing and change
> > pm_runtime_get() to pm_runtime_inuse(), so that we have:
> > 
> > * pm_runtime_inuse() - increment the resume counter
> > * pm_runtime_idle() - decrement the resume counter and run idle notification
> > * pm_request_idle()  - decrement the resume counter and queue idle notification
> > 
> > and __pm_runtime_idle() as the "bare" idle notification function?
> 
> I could live with that, but the nice thing about "get" and "put" is
> that they directly suggest a counter is being maintained and therefore
> the calls have to balance.  Maybe we should just call it 
> rpm_request_put and not worry that the put happens immediately.

OK

Best,
Rafael

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)
  2009-06-29 15:55                                     ` Rafael J. Wysocki
@ 2009-06-29 16:10                                       ` Alan Stern
  2009-06-29 16:39                                         ` Rafael J. Wysocki
  2009-06-29 16:39                                         ` Rafael J. Wysocki
  0 siblings, 2 replies; 102+ messages in thread
From: Alan Stern @ 2009-06-29 16:10 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Mon, 29 Jun 2009, Rafael J. Wysocki wrote:

> > It doesn't need to know.  All it needs to do is guarantee that the
> > device will be in a resumed state some time not long after the function
> > returns.  Thus calling rpm_request_resume while the status is RPM_IDLE
> > is like calling it while the status is RPM_ACTIVE.  In neither case
> > does it have to do anything, because the device will already be resumed
> > when it returns.
> 
> Not exactly, because RPM_IDLE prevents idle notifications from being run,
> as it means a suspend has already been requested, which is not really the
> case after pm_request_resume().

Yes it is.  A delayed suspend and an immediate resume have _both_ been
requested.  We are within our rights to say that the resume request 
gets fulfilled immediately (by doing nothing) and the suspend request 
will be fulfilled later.

> > > No, it doesn't matter if the request runs, but it does matter if the work
> > > structure used for queuing it up may be used for another purpose. :-)
> > 
> > What else would it be used for?  If rpm_request_resume returns without 
> > doing anything and leaves the status set to RPM_IDLE, then the work 
> > structure won't be reused until the status changes.
> 
> Which is not right, because we may want to run ->runtime_idle() before
> the status is changed.

If the status is RPM_IDLE then there's already a suspend request
queued.  So what reason is there for sending idle notifications?  The 
whole point of idle notifications is to let the driver know that it 
might want to initiate a suspend -- but one has already been initiated.

> That's why I think pm_request_resume() should queue up a resume request if
> a suspend request is pending.

Surely you don't mean we should suspend the device and then resume it
immediately afterward?  What would be the point?  Just leave the device 
active throughout.

As long as the behavior is documented, I think it will be okay for
pm_request_resume not to cancel a pending suspend request.

Alan Stern

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)
  2009-06-29 16:10                                       ` Alan Stern
  2009-06-29 16:39                                         ` Rafael J. Wysocki
@ 2009-06-29 16:39                                         ` Rafael J. Wysocki
  2009-06-29 17:29                                           ` Alan Stern
  2009-06-29 17:29                                           ` Alan Stern
  1 sibling, 2 replies; 102+ messages in thread
From: Rafael J. Wysocki @ 2009-06-29 16:39 UTC (permalink / raw)
  To: Alan Stern
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Monday 29 June 2009, Alan Stern wrote:
> On Mon, 29 Jun 2009, Rafael J. Wysocki wrote:
> 
> > > It doesn't need to know.  All it needs to do is guarantee that the
> > > device will be in a resumed state some time not long after the function
> > > returns.  Thus calling rpm_request_resume while the status is RPM_IDLE
> > > is like calling it while the status is RPM_ACTIVE.  In neither case
> > > does it have to do anything, because the device will already be resumed
> > > when it returns.
> > 
> > Not exactly, because RPM_IDLE prevents idle notifications from being run,
> > as it means a suspend has already been requested, which is not really the
> > case after pm_request_resume().
> 
> Yes it is.  A delayed suspend and an immediate resume have _both_ been
> requested.  We are within our rights to say that the resume request 
> gets fulfilled immediately (by doing nothing) and the suspend request 
> will be fulfilled later.

Theoretically, we are, but practically we want to be able to use
pm_runtime_put() (the asynchronous version) after a pm_runtime_resume()
that found the device operational, but that would result in queuing a request
using the same work structure that is used by the pending suspend request.
Don't you see a problem here?

> > > > No, it doesn't matter if the request runs, but it does matter if the work
> > > > structure used for queuing it up may be used for another purpose. :-)
> > > 
> > > What else would it be used for?  If rpm_request_resume returns without 
> > > doing anything and leaves the status set to RPM_IDLE, then the work 
> > > structure won't be reused until the status changes.
> > 
> > Which is not right, because we may want to run ->runtime_idle() before
> > the status is changed.
> 
> If the status is RPM_IDLE then there's already a suspend request
> queued.  So what reason is there for sending idle notifications?  The 
> whole point of idle notifications is to let the driver know that it 
> might want to initiate a suspend -- but one has already been initiated.
> 
> > That's why I think pm_request_resume() should queue up a resume request if
> > a suspend request is pending.
> 
> Surely you don't mean we should suspend the device and then resume it
> immediately afterward?

No, I don't.

> What would be the point?  Just leave the device active throughout.

Absolutely.

> As long as the behavior is documented, I think it will be okay for
> pm_request_resume not to cancel a pending suspend request.

I could agree with that, but what about pm_runtime_resume() happening after
a suspend request has been scheduled?  Should it also ignore the pending
suspend request?

In which case it would be consistent to allow to schedule suspends even though
the resume counter is greater than 0.

Best,
Rafael

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)
  2009-06-29 16:10                                       ` Alan Stern
@ 2009-06-29 16:39                                         ` Rafael J. Wysocki
  2009-06-29 16:39                                         ` Rafael J. Wysocki
  1 sibling, 0 replies; 102+ messages in thread
From: Rafael J. Wysocki @ 2009-06-29 16:39 UTC (permalink / raw)
  To: Alan Stern
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Monday 29 June 2009, Alan Stern wrote:
> On Mon, 29 Jun 2009, Rafael J. Wysocki wrote:
> 
> > > It doesn't need to know.  All it needs to do is guarantee that the
> > > device will be in a resumed state some time not long after the function
> > > returns.  Thus calling rpm_request_resume while the status is RPM_IDLE
> > > is like calling it while the status is RPM_ACTIVE.  In neither case
> > > does it have to do anything, because the device will already be resumed
> > > when it returns.
> > 
> > Not exactly, because RPM_IDLE prevents idle notifications from being run,
> > as it means a suspend has already been requested, which is not really the
> > case after pm_request_resume().
> 
> Yes it is.  A delayed suspend and an immediate resume have _both_ been
> requested.  We are within our rights to say that the resume request 
> gets fulfilled immediately (by doing nothing) and the suspend request 
> will be fulfilled later.

Theoretically, we are, but practically we want to be able to use
pm_runtime_put() (the asynchronous version) after a pm_runtime_resume()
that found the device operational, but that would result in queuing a request
using the same work structure that is used by the pending suspend request.
Don't you see a problem here?

> > > > No, it doesn't matter if the request runs, but it does matter if the work
> > > > structure used for queuing it up may be used for another purpose. :-)
> > > 
> > > What else would it be used for?  If rpm_request_resume returns without 
> > > doing anything and leaves the status set to RPM_IDLE, then the work 
> > > structure won't be reused until the status changes.
> > 
> > Which is not right, because we may want to run ->runtime_idle() before
> > the status is changed.
> 
> If the status is RPM_IDLE then there's already a suspend request
> queued.  So what reason is there for sending idle notifications?  The 
> whole point of idle notifications is to let the driver know that it 
> might want to initiate a suspend -- but one has already been initiated.
> 
> > That's why I think pm_request_resume() should queue up a resume request if
> > a suspend request is pending.
> 
> Surely you don't mean we should suspend the device and then resume it
> immediately afterward?

No, I don't.

> What would be the point?  Just leave the device active throughout.

Absolutely.

> As long as the behavior is documented, I think it will be okay for
> pm_request_resume not to cancel a pending suspend request.

I could agree with that, but what about pm_runtime_resume() happening after
a suspend request has been scheduled?  Should it also ignore the pending
suspend request?

In which case it would be consistent to allow to schedule suspends even though
the resume counter is greater than 0.

Best,
Rafael

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)
  2009-06-29 16:39                                         ` Rafael J. Wysocki
  2009-06-29 17:29                                           ` Alan Stern
@ 2009-06-29 17:29                                           ` Alan Stern
  2009-06-29 18:25                                             ` Rafael J. Wysocki
  2009-06-29 18:25                                             ` Rafael J. Wysocki
  1 sibling, 2 replies; 102+ messages in thread
From: Alan Stern @ 2009-06-29 17:29 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Mon, 29 Jun 2009, Rafael J. Wysocki wrote:

> Theoretically, we are, but practically we want to be able to use
> pm_runtime_put() (the asynchronous version) after a pm_runtime_resume()
> that found the device operational, but that would result in queuing a request
> using the same work structure that is used by the pending suspend request.
> Don't you see a problem here?

This is a different situation.  pm_runtime_resume does have the luxury 
of killing the suspend request, and it should do so.

Let's think about it this way.  Why does a driver call
pm_request_resume in the first place?  Because an interrupt handler or
spinlocked region wants to do some I/O, so the device has to
be active.

But when will it do the I/O?  If the device is currently suspended, the
driver can do the I/O at the end of its runtime_resume callback.  But
if the status is RPM_ACTIVE, the callback won't be invoked, so the
interrupt handler will have to do the I/O directly.  The same is true
for RPM_IDLE.

Except for one problem: In RPM_IDLE, a suspend might occur at any time.  
(In theory the same thing could happen in RPM_ACTIVE.)  To prevent
this, the driver can call pm_runtime_get before pm_request_resume.  
When the I/O is all finished, it calls pm_request_put.

If the work routine starts running before the pm_request_put, it will 
see that the counter is positive so it will set the status back to 
RPM_ACTIVE.  Then the put will queue an idle notification.  If the work 
routine hasn't started running before the pm_request_put then the 
status will remain RPM_IDLE all along.

Regardless, it's not necessary for pm_request_resume to kill the 
suspend request.  And even if it did, the driver would still need to 
implement both pathways for doing the I/O.


> > As long as the behavior is documented, I think it will be okay for
> > pm_request_resume not to cancel a pending suspend request.
> 
> I could agree with that, but what about pm_runtime_resume() happening after
> a suspend request has been scheduled?  Should it also ignore the pending
> suspend request?

It could, but probably it shouldn't.

> In which case it would be consistent to allow to schedule suspends even though
> the resume counter is greater than 0.

True enough, although I'm not sure there's a good reason for it.  You 
certainly can increment the resume counter after scheduling a suspend 
request -- the effect would be the same.

Alan Stern


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)
  2009-06-29 16:39                                         ` Rafael J. Wysocki
@ 2009-06-29 17:29                                           ` Alan Stern
  2009-06-29 17:29                                           ` Alan Stern
  1 sibling, 0 replies; 102+ messages in thread
From: Alan Stern @ 2009-06-29 17:29 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Mon, 29 Jun 2009, Rafael J. Wysocki wrote:

> Theoretically, we are, but practically we want to be able to use
> pm_runtime_put() (the asynchronous version) after a pm_runtime_resume()
> that found the device operational, but that would result in queuing a request
> using the same work structure that is used by the pending suspend request.
> Don't you see a problem here?

This is a different situation.  pm_runtime_resume does have the luxury 
of killing the suspend request, and it should do so.

Let's think about it this way.  Why does a driver call
pm_request_resume in the first place?  Because an interrupt handler or
spinlocked region wants to do some I/O, so the device has to
be active.

But when will it do the I/O?  If the device is currently suspended, the
driver can do the I/O at the end of its runtime_resume callback.  But
if the status is RPM_ACTIVE, the callback won't be invoked, so the
interrupt handler will have to do the I/O directly.  The same is true
for RPM_IDLE.

Except for one problem: In RPM_IDLE, a suspend might occur at any time.  
(In theory the same thing could happen in RPM_ACTIVE.)  To prevent
this, the driver can call pm_runtime_get before pm_request_resume.  
When the I/O is all finished, it calls pm_request_put.

If the work routine starts running before the pm_request_put, it will 
see that the counter is positive so it will set the status back to 
RPM_ACTIVE.  Then the put will queue an idle notification.  If the work 
routine hasn't started running before the pm_request_put then the 
status will remain RPM_IDLE all along.

Regardless, it's not necessary for pm_request_resume to kill the 
suspend request.  And even if it did, the driver would still need to 
implement both pathways for doing the I/O.


> > As long as the behavior is documented, I think it will be okay for
> > pm_request_resume not to cancel a pending suspend request.
> 
> I could agree with that, but what about pm_runtime_resume() happening after
> a suspend request has been scheduled?  Should it also ignore the pending
> suspend request?

It could, but probably it shouldn't.

> In which case it would be consistent to allow to schedule suspends even though
> the resume counter is greater than 0.

True enough, although I'm not sure there's a good reason for it.  You 
certainly can increment the resume counter after scheduling a suspend 
request -- the effect would be the same.

Alan Stern

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)
  2009-06-29 17:29                                           ` Alan Stern
  2009-06-29 18:25                                             ` Rafael J. Wysocki
@ 2009-06-29 18:25                                             ` Rafael J. Wysocki
  2009-06-29 19:25                                               ` Alan Stern
  1 sibling, 1 reply; 102+ messages in thread
From: Rafael J. Wysocki @ 2009-06-29 18:25 UTC (permalink / raw)
  To: Alan Stern
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Monday 29 June 2009, Alan Stern wrote:
> On Mon, 29 Jun 2009, Rafael J. Wysocki wrote:
> 
> > Theoretically, we are, but practically we want to be able to use
> > pm_runtime_put() (the asynchronous version) after a pm_runtime_resume()
> > that found the device operational, but that would result in queuing a request
> > using the same work structure that is used by the pending suspend request.
> > Don't you see a problem here?
> 
> This is a different situation.  pm_runtime_resume does have the luxury 
> of killing the suspend request, and it should do so.

I should have said pm_request_resume(), sorry.

> Let's think about it this way.  Why does a driver call
> pm_request_resume in the first place?  Because an interrupt handler or
> spinlocked region wants to do some I/O, so the device has to
> be active.

Right.

> But when will it do the I/O?  If the device is currently suspended, the
> driver can do the I/O at the end of its runtime_resume callback.  But
> if the status is RPM_ACTIVE, the callback won't be invoked, so the
> interrupt handler will have to do the I/O directly.  The same is true
> for RPM_IDLE.

I still agree.

> Except for one problem: In RPM_IDLE, a suspend might occur at any time.  
> (In theory the same thing could happen in RPM_ACTIVE.)  To prevent
> this, the driver can call pm_runtime_get before pm_request_resume.  
> When the I/O is all finished, it calls pm_request_put.

At which point pm_request_put() tries to queue up an idle notification that
uses the same work_struct as the pending suspend request.  Not good.

> If the work routine starts running before the pm_request_put, it will 
> see that the counter is positive so it will set the status back to 
> RPM_ACTIVE.  Then the put will queue an idle notification.  If the work 
> routine hasn't started running before the pm_request_put then the 
> status will remain RPM_IDLE all along.
> 
> Regardless, it's not necessary for pm_request_resume to kill the 
> suspend request.  And even if it did, the driver would still need to 
> implement both pathways for doing the I/O.

IMO one can think of pm_request_resume() as a top half of pm_runtime_resume().
Thus, it should either queue up a request to run pm_runtime_resume() or leave
the status as though pm_runtime_resume() ran.  Anything else would be
internally inconsistent.  So, if pm_runtime_resume() cancels pending suspend
requests, pm_request_resume() should do the same or the other way around.

Now, arguably, ignoring pending suspend requests is somewhat easier from
the core's point of view, but it may not be so for drivers.

> > > As long as the behavior is documented, I think it will be okay for
> > > pm_request_resume not to cancel a pending suspend request.
> > 
> > I could agree with that, but what about pm_runtime_resume() happening after
> > a suspend request has been scheduled?  Should it also ignore the pending
> > suspend request?
> 
> It could, but probably it shouldn't.

So, IMO, pm_request_resume() shouldn't as well.

> > In which case it would be consistent to allow to schedule suspends even though
> > the resume counter is greater than 0.
> 
> True enough, although I'm not sure there's a good reason for it.  You 
> certainly can increment the resume counter after scheduling a suspend 
> request -- the effect would be the same.

Yes, it would.

My point is that the core should always treat pending suspend requests in the
same way.  If they are canceled by pm_runtime_resume(), then
pm_request_resume() should also cancel them and it shouldn't be possible
to schedule a suspend request when the resume counter is greater than 0.
In turn, if they are ignored by pm_runtime_resume(), then pm_request_resume()
should also ignore them and there's no point to prevent pm_request_suspend()
from scheduling a suspend request if the resume counter is greater than 0.

Any other type of behavior has a potential to confuse driver writers.

Best,
Rafael

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)
  2009-06-29 17:29                                           ` Alan Stern
@ 2009-06-29 18:25                                             ` Rafael J. Wysocki
  2009-06-29 18:25                                             ` Rafael J. Wysocki
  1 sibling, 0 replies; 102+ messages in thread
From: Rafael J. Wysocki @ 2009-06-29 18:25 UTC (permalink / raw)
  To: Alan Stern
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Monday 29 June 2009, Alan Stern wrote:
> On Mon, 29 Jun 2009, Rafael J. Wysocki wrote:
> 
> > Theoretically, we are, but practically we want to be able to use
> > pm_runtime_put() (the asynchronous version) after a pm_runtime_resume()
> > that found the device operational, but that would result in queuing a request
> > using the same work structure that is used by the pending suspend request.
> > Don't you see a problem here?
> 
> This is a different situation.  pm_runtime_resume does have the luxury 
> of killing the suspend request, and it should do so.

I should have said pm_request_resume(), sorry.

> Let's think about it this way.  Why does a driver call
> pm_request_resume in the first place?  Because an interrupt handler or
> spinlocked region wants to do some I/O, so the device has to
> be active.

Right.

> But when will it do the I/O?  If the device is currently suspended, the
> driver can do the I/O at the end of its runtime_resume callback.  But
> if the status is RPM_ACTIVE, the callback won't be invoked, so the
> interrupt handler will have to do the I/O directly.  The same is true
> for RPM_IDLE.

I still agree.

> Except for one problem: In RPM_IDLE, a suspend might occur at any time.  
> (In theory the same thing could happen in RPM_ACTIVE.)  To prevent
> this, the driver can call pm_runtime_get before pm_request_resume.  
> When the I/O is all finished, it calls pm_request_put.

At which point pm_request_put() tries to queue up an idle notification that
uses the same work_struct as the pending suspend request.  Not good.

> If the work routine starts running before the pm_request_put, it will 
> see that the counter is positive so it will set the status back to 
> RPM_ACTIVE.  Then the put will queue an idle notification.  If the work 
> routine hasn't started running before the pm_request_put then the 
> status will remain RPM_IDLE all along.
> 
> Regardless, it's not necessary for pm_request_resume to kill the 
> suspend request.  And even if it did, the driver would still need to 
> implement both pathways for doing the I/O.

IMO one can think of pm_request_resume() as a top half of pm_runtime_resume().
Thus, it should either queue up a request to run pm_runtime_resume() or leave
the status as though pm_runtime_resume() ran.  Anything else would be
internally inconsistent.  So, if pm_runtime_resume() cancels pending suspend
requests, pm_request_resume() should do the same or the other way around.

Now, arguably, ignoring pending suspend requests is somewhat easier from
the core's point of view, but it may not be so for drivers.

> > > As long as the behavior is documented, I think it will be okay for
> > > pm_request_resume not to cancel a pending suspend request.
> > 
> > I could agree with that, but what about pm_runtime_resume() happening after
> > a suspend request has been scheduled?  Should it also ignore the pending
> > suspend request?
> 
> It could, but probably it shouldn't.

So, IMO, pm_request_resume() shouldn't as well.

> > In which case it would be consistent to allow to schedule suspends even though
> > the resume counter is greater than 0.
> 
> True enough, although I'm not sure there's a good reason for it.  You 
> certainly can increment the resume counter after scheduling a suspend 
> request -- the effect would be the same.

Yes, it would.

My point is that the core should always treat pending suspend requests in the
same way.  If they are canceled by pm_runtime_resume(), then
pm_request_resume() should also cancel them and it shouldn't be possible
to schedule a suspend request when the resume counter is greater than 0.
In turn, if they are ignored by pm_runtime_resume(), then pm_request_resume()
should also ignore them and there's no point to prevent pm_request_suspend()
from scheduling a suspend request if the resume counter is greater than 0.

Any other type of behavior has a potential to confuse driver writers.

Best,
Rafael

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)
  2009-06-29 18:25                                             ` Rafael J. Wysocki
@ 2009-06-29 19:25                                               ` Alan Stern
  2009-06-29 21:04                                                 ` Rafael J. Wysocki
  2009-06-29 21:04                                                 ` [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6) Rafael J. Wysocki
  0 siblings, 2 replies; 102+ messages in thread
From: Alan Stern @ 2009-06-29 19:25 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Mon, 29 Jun 2009, Rafael J. Wysocki wrote:

> IMO one can think of pm_request_resume() as a top half of pm_runtime_resume().

Normal top halves don't trigger before the circumstances are
appropriate.  For example, if you enable remote wakeup on a USB device,
it won't send a wakeup signal before it has been powered down.  A
driver calling pm_request_resume while the device is still resumed is
like a USB device sending a wakeup request while it is still powered
up.  So IMO the analogy with top halves isn't a good one.

> Thus, it should either queue up a request to run pm_runtime_resume() or leave
> the status as though pm_runtime_resume() ran.  Anything else would be
> internally inconsistent.  So, if pm_runtime_resume() cancels pending suspend
> requests, pm_request_resume() should do the same or the other way around.
> 
> Now, arguably, ignoring pending suspend requests is somewhat easier from
> the core's point of view, but it may not be so for drivers.

The argument I gave in the previous email demonstrates that it doesn't
make any difference to drivers.  Either way, they have to use two I/O
pathways, they have to do a pm_runtime_get before pm_request_resume,
and they have to do a pm_request_put after the I/O is done.

Of course, this is all somewhat theoretical.  I still don't know of any 
actual drivers that do the equivalent of pm_request_resume.

> My point is that the core should always treat pending suspend requests in the
> same way.  If they are canceled by pm_runtime_resume(), then
> pm_request_resume() should also cancel them and it shouldn't be possible
> to schedule a suspend request when the resume counter is greater than 0.
> In turn, if they are ignored by pm_runtime_resume(), then pm_request_resume()
> should also ignore them and there's no point to prevent pm_request_suspend()
> from scheduling a suspend request if the resume counter is greater than 0.
> 
> Any other type of behavior has a potential to confuse driver writers.

Another possible approach you could take when the call to
cancel_delayed_work fails (which should be rare) is to turn on RPM_WAKE
in addition to RPM_IDLE and leave the suspend request queued.  When
__pm_runtime_suspend sees both flags are set, it should abort and set
the status directly back to RPM_ACTIVE.  At that time the idle
notifications can start up again.

Is this any better?  I can't see how drivers would care, though.

Alan Stern

P.S.: What do you think should happen if there's a delayed suspend
request pending, then pm_request_resume is called (and it leaves the
request queued), and then someone calls pm_runtime_suspend?  You've got
two pending requests and a synchronous call all active at the same
time!

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)
  2009-06-29 19:25                                               ` Alan Stern
@ 2009-06-29 21:04                                                 ` Rafael J. Wysocki
  2009-06-29 22:00                                                   ` Alan Stern
  2009-06-29 21:04                                                 ` [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6) Rafael J. Wysocki
  1 sibling, 1 reply; 102+ messages in thread
From: Rafael J. Wysocki @ 2009-06-29 21:04 UTC (permalink / raw)
  To: Alan Stern
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Monday 29 June 2009, Alan Stern wrote:
> On Mon, 29 Jun 2009, Rafael J. Wysocki wrote:
> 
> > IMO one can think of pm_request_resume() as a top half of pm_runtime_resume().
> 
> Normal top halves don't trigger before the circumstances are
> appropriate.  For example, if you enable remote wakeup on a USB device,
> it won't send a wakeup signal before it has been powered down.  A
> driver calling pm_request_resume while the device is still resumed is
> like a USB device sending a wakeup request while it is still powered
> up.  So IMO the analogy with top halves isn't a good one.
> 
> > Thus, it should either queue up a request to run pm_runtime_resume() or leave
> > the status as though pm_runtime_resume() ran.  Anything else would be
> > internally inconsistent.  So, if pm_runtime_resume() cancels pending suspend
> > requests, pm_request_resume() should do the same or the other way around.
> > 
> > Now, arguably, ignoring pending suspend requests is somewhat easier from
> > the core's point of view, but it may not be so for drivers.
> 
> The argument I gave in the previous email demonstrates that it doesn't
> make any difference to drivers.  Either way, they have to use two I/O
> pathways, they have to do a pm_runtime_get before pm_request_resume,
> and they have to do a pm_request_put after the I/O is done.
> 
> Of course, this is all somewhat theoretical.  I still don't know of any 
> actual drivers that do the equivalent of pm_request_resume.
> 
> > My point is that the core should always treat pending suspend requests in the
> > same way.  If they are canceled by pm_runtime_resume(), then
> > pm_request_resume() should also cancel them and it shouldn't be possible
> > to schedule a suspend request when the resume counter is greater than 0.
> > In turn, if they are ignored by pm_runtime_resume(), then pm_request_resume()
> > should also ignore them and there's no point to prevent pm_request_suspend()
> > from scheduling a suspend request if the resume counter is greater than 0.
> > 
> > Any other type of behavior has a potential to confuse driver writers.
> 
> Another possible approach you could take when the call to
> cancel_delayed_work fails (which should be rare) is to turn on RPM_WAKE
> in addition to RPM_IDLE and leave the suspend request queued.  When
> __pm_runtime_suspend sees both flags are set, it should abort and set
> the status directly back to RPM_ACTIVE.  At that time the idle
> notifications can start up again.
> 
> Is this any better?  I can't see how drivers would care, though.

There still is the problem that the suspend request is occupying the
work_struct which cannot be used for any other purpose.  I don't think this
is avoidable, though.  This way or another it is possible to have two requests
pending at a time.

Perhaps the simplest thing to do would be to simply ignore pending suspend
requests in both pm_request_resume() and pm_runtime_resume() and to allow
them to be scheduled at any time.  That shouldn't hurt anything as long as
pm_runtime_suspend() is smart enough, but it has to be anyway, because it
can be run synchronously at any time.

The only question is what pm_runtime_suspend() should do when it sees a pending
suspend request and quite frankly I think it can just ignore it as well,
leaving the RPM_IDLE bit set.  In which case the name RPM_IDLE will not really
be adequate, so perhaps it can be renamed to RPM_REQUEST or something like
this.

Then, we'll need a separate work structure for suspend requests, but I have no
problem with that.

> P.S.: What do you think should happen if there's a delayed suspend
> request pending, then pm_request_resume is called (and it leaves the
> request queued), and then someone calls pm_runtime_suspend?  You've got
> two pending requests and a synchronous call all active at the same
> time!

That's easy, pm_runtime_suspend() sees a pending resume, so it quits and the
other things work out as usual.

Best,
Rafael

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)
  2009-06-29 19:25                                               ` Alan Stern
  2009-06-29 21:04                                                 ` Rafael J. Wysocki
@ 2009-06-29 21:04                                                 ` Rafael J. Wysocki
  1 sibling, 0 replies; 102+ messages in thread
From: Rafael J. Wysocki @ 2009-06-29 21:04 UTC (permalink / raw)
  To: Alan Stern
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Monday 29 June 2009, Alan Stern wrote:
> On Mon, 29 Jun 2009, Rafael J. Wysocki wrote:
> 
> > IMO one can think of pm_request_resume() as a top half of pm_runtime_resume().
> 
> Normal top halves don't trigger before the circumstances are
> appropriate.  For example, if you enable remote wakeup on a USB device,
> it won't send a wakeup signal before it has been powered down.  A
> driver calling pm_request_resume while the device is still resumed is
> like a USB device sending a wakeup request while it is still powered
> up.  So IMO the analogy with top halves isn't a good one.
> 
> > Thus, it should either queue up a request to run pm_runtime_resume() or leave
> > the status as though pm_runtime_resume() ran.  Anything else would be
> > internally inconsistent.  So, if pm_runtime_resume() cancels pending suspend
> > requests, pm_request_resume() should do the same or the other way around.
> > 
> > Now, arguably, ignoring pending suspend requests is somewhat easier from
> > the core's point of view, but it may not be so for drivers.
> 
> The argument I gave in the previous email demonstrates that it doesn't
> make any difference to drivers.  Either way, they have to use two I/O
> pathways, they have to do a pm_runtime_get before pm_request_resume,
> and they have to do a pm_request_put after the I/O is done.
> 
> Of course, this is all somewhat theoretical.  I still don't know of any 
> actual drivers that do the equivalent of pm_request_resume.
> 
> > My point is that the core should always treat pending suspend requests in the
> > same way.  If they are canceled by pm_runtime_resume(), then
> > pm_request_resume() should also cancel them and it shouldn't be possible
> > to schedule a suspend request when the resume counter is greater than 0.
> > In turn, if they are ignored by pm_runtime_resume(), then pm_request_resume()
> > should also ignore them and there's no point to prevent pm_request_suspend()
> > from scheduling a suspend request if the resume counter is greater than 0.
> > 
> > Any other type of behavior has a potential to confuse driver writers.
> 
> Another possible approach you could take when the call to
> cancel_delayed_work fails (which should be rare) is to turn on RPM_WAKE
> in addition to RPM_IDLE and leave the suspend request queued.  When
> __pm_runtime_suspend sees both flags are set, it should abort and set
> the status directly back to RPM_ACTIVE.  At that time the idle
> notifications can start up again.
> 
> Is this any better?  I can't see how drivers would care, though.

There still is the problem that the suspend request is occupying the
work_struct which cannot be used for any other purpose.  I don't think this
is avoidable, though.  This way or another it is possible to have two requests
pending at a time.

Perhaps the simplest thing to do would be to simply ignore pending suspend
requests in both pm_request_resume() and pm_runtime_resume() and to allow
them to be scheduled at any time.  That shouldn't hurt anything as long as
pm_runtime_suspend() is smart enough, but it has to be anyway, because it
can be run synchronously at any time.

The only question is what pm_runtime_suspend() should do when it sees a pending
suspend request and quite frankly I think it can just ignore it as well,
leaving the RPM_IDLE bit set.  In which case the name RPM_IDLE will not really
be adequate, so perhaps it can be renamed to RPM_REQUEST or something like
this.

Then, we'll need a separate work structure for suspend requests, but I have no
problem with that.

> P.S.: What do you think should happen if there's a delayed suspend
> request pending, then pm_request_resume is called (and it leaves the
> request queued), and then someone calls pm_runtime_suspend?  You've got
> two pending requests and a synchronous call all active at the same
> time!

That's easy, pm_runtime_suspend() sees a pending resume, so it quits and the
other things work out as usual.

Best,
Rafael

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)
  2009-06-29 21:04                                                 ` Rafael J. Wysocki
@ 2009-06-29 22:00                                                   ` Alan Stern
  2009-06-29 22:50                                                     ` Rafael J. Wysocki
  2009-06-29 22:50                                                     ` Rafael J. Wysocki
  0 siblings, 2 replies; 102+ messages in thread
From: Alan Stern @ 2009-06-29 22:00 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Mon, 29 Jun 2009, Rafael J. Wysocki wrote:

> > Another possible approach you could take when the call to
> > cancel_delayed_work fails (which should be rare) is to turn on RPM_WAKE
> > in addition to RPM_IDLE and leave the suspend request queued.  When
> > __pm_runtime_suspend sees both flags are set, it should abort and set
> > the status directly back to RPM_ACTIVE.  At that time the idle
> > notifications can start up again.
> > 
> > Is this any better?  I can't see how drivers would care, though.
> 
> There still is the problem that the suspend request is occupying the
> work_struct which cannot be used for any other purpose.

What other purpose?  We don't send idle notifications in RPM_IDLE and
resume requests don't need to be stored since (as described above) they
just set the RPM_WAKE flag.  Hence nothing else needs to use the
work_struct.

>  I don't think this
> is avoidable, though.  This way or another it is possible to have two requests
> pending at a time.
> 
> Perhaps the simplest thing to do would be to simply ignore pending suspend
> requests in both pm_request_resume() and pm_runtime_resume() and to allow
> them to be scheduled at any time.  That shouldn't hurt anything as long as
> pm_runtime_suspend() is smart enough, but it has to be anyway, because it
> can be run synchronously at any time.
> 
> The only question is what pm_runtime_suspend() should do when it sees a pending
> suspend request and quite frankly I think it can just ignore it as well,
> leaving the RPM_IDLE bit set.  In which case the name RPM_IDLE will not really
> be adequate, so perhaps it can be renamed to RPM_REQUEST or something like
> this.
> 
> Then, we'll need a separate work structure for suspend requests, but I have no
> problem with that.

You seem to be thinking about these requests in a very different way
from me.  They don't form a queue or anything like that.  Instead they
mean "Change the device's power state to this value as soon as
possible" -- and they are needed only because sometimes (in atomic or
interrupt contexts) the change can't be made right away.

That's why it doesn't make any sense to have both a suspend and a 
resume request pending at the same time.  It would mean the driver is 
telling us "Change the device's power state to both low-power and 
full-power as soon as possible"!

We should settle on a general policy for how to handle it when a 
driver makes the mistake of telling us to do contradictory things.  
There are three natural policies:

	The first request takes precedence over the second;

	The second request takes precedence over the first;

	Resumes take precedence over suspends.

Any one of those would be acceptable.

Alan Stern

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)
  2009-06-29 22:00                                                   ` Alan Stern
  2009-06-29 22:50                                                     ` Rafael J. Wysocki
@ 2009-06-29 22:50                                                     ` Rafael J. Wysocki
  2009-06-30 15:10                                                       ` Alan Stern
  2009-06-30 15:10                                                       ` Alan Stern
  1 sibling, 2 replies; 102+ messages in thread
From: Rafael J. Wysocki @ 2009-06-29 22:50 UTC (permalink / raw)
  To: Alan Stern
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Tuesday 30 June 2009, Alan Stern wrote:
> On Mon, 29 Jun 2009, Rafael J. Wysocki wrote:
> 
> > > Another possible approach you could take when the call to
> > > cancel_delayed_work fails (which should be rare) is to turn on RPM_WAKE
> > > in addition to RPM_IDLE and leave the suspend request queued.  When
> > > __pm_runtime_suspend sees both flags are set, it should abort and set
> > > the status directly back to RPM_ACTIVE.  At that time the idle
> > > notifications can start up again.
> > > 
> > > Is this any better?  I can't see how drivers would care, though.
> > 
> > There still is the problem that the suspend request is occupying the
> > work_struct which cannot be used for any other purpose.
> 
> What other purpose?  We don't send idle notifications in RPM_IDLE

OK

> and resume requests don't need to be stored since (as described above) they
> just set the RPM_WAKE flag.  Hence nothing else needs to use the
> work_struct.

Good.  I'd go for it, then.  OK?

> >  I don't think this
> > is avoidable, though.  This way or another it is possible to have two requests
> > pending at a time.
> > 
> > Perhaps the simplest thing to do would be to simply ignore pending suspend
> > requests in both pm_request_resume() and pm_runtime_resume() and to allow
> > them to be scheduled at any time.  That shouldn't hurt anything as long as
> > pm_runtime_suspend() is smart enough, but it has to be anyway, because it
> > can be run synchronously at any time.
> > 
> > The only question is what pm_runtime_suspend() should do when it sees a pending
> > suspend request and quite frankly I think it can just ignore it as well,
> > leaving the RPM_IDLE bit set.  In which case the name RPM_IDLE will not really
> > be adequate, so perhaps it can be renamed to RPM_REQUEST or something like
> > this.
> > 
> > Then, we'll need a separate work structure for suspend requests, but I have no
> > problem with that.
> 
> You seem to be thinking about these requests in a very different way
> from me.  They don't form a queue or anything like that.  Instead they
> mean "Change the device's power state to this value as soon as
> possible" -- and they are needed only because sometimes (in atomic or
> interrupt contexts) the change can't be made right away.
> 
> That's why it doesn't make any sense to have both a suspend and a 
> resume request pending at the same time.  It would mean the driver is 
> telling us "Change the device's power state to both low-power and 
> full-power as soon as possible"!
> 
> We should settle on a general policy for how to handle it when a 
> driver makes the mistake of telling us to do contradictory things.  
> There are three natural policies:
> 
> 	The first request takes precedence over the second;
> 
> 	The second request takes precedence over the first;
> 
> 	Resumes take precedence over suspends.
> 
> Any one of those would be acceptable.

IMO resumes should take precedence over suspends, because resume usually means
"there's I/O to process" and we usually we want the I/O to be processed as soon
as possible (deferred wake-up will usually mean deferred I/O and that would
hurt user experience).

Best,
Rafael

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)
  2009-06-29 22:00                                                   ` Alan Stern
@ 2009-06-29 22:50                                                     ` Rafael J. Wysocki
  2009-06-29 22:50                                                     ` Rafael J. Wysocki
  1 sibling, 0 replies; 102+ messages in thread
From: Rafael J. Wysocki @ 2009-06-29 22:50 UTC (permalink / raw)
  To: Alan Stern
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Tuesday 30 June 2009, Alan Stern wrote:
> On Mon, 29 Jun 2009, Rafael J. Wysocki wrote:
> 
> > > Another possible approach you could take when the call to
> > > cancel_delayed_work fails (which should be rare) is to turn on RPM_WAKE
> > > in addition to RPM_IDLE and leave the suspend request queued.  When
> > > __pm_runtime_suspend sees both flags are set, it should abort and set
> > > the status directly back to RPM_ACTIVE.  At that time the idle
> > > notifications can start up again.
> > > 
> > > Is this any better?  I can't see how drivers would care, though.
> > 
> > There still is the problem that the suspend request is occupying the
> > work_struct which cannot be used for any other purpose.
> 
> What other purpose?  We don't send idle notifications in RPM_IDLE

OK

> and resume requests don't need to be stored since (as described above) they
> just set the RPM_WAKE flag.  Hence nothing else needs to use the
> work_struct.

Good.  I'd go for it, then.  OK?

> >  I don't think this
> > is avoidable, though.  This way or another it is possible to have two requests
> > pending at a time.
> > 
> > Perhaps the simplest thing to do would be to simply ignore pending suspend
> > requests in both pm_request_resume() and pm_runtime_resume() and to allow
> > them to be scheduled at any time.  That shouldn't hurt anything as long as
> > pm_runtime_suspend() is smart enough, but it has to be anyway, because it
> > can be run synchronously at any time.
> > 
> > The only question is what pm_runtime_suspend() should do when it sees a pending
> > suspend request and quite frankly I think it can just ignore it as well,
> > leaving the RPM_IDLE bit set.  In which case the name RPM_IDLE will not really
> > be adequate, so perhaps it can be renamed to RPM_REQUEST or something like
> > this.
> > 
> > Then, we'll need a separate work structure for suspend requests, but I have no
> > problem with that.
> 
> You seem to be thinking about these requests in a very different way
> from me.  They don't form a queue or anything like that.  Instead they
> mean "Change the device's power state to this value as soon as
> possible" -- and they are needed only because sometimes (in atomic or
> interrupt contexts) the change can't be made right away.
> 
> That's why it doesn't make any sense to have both a suspend and a 
> resume request pending at the same time.  It would mean the driver is 
> telling us "Change the device's power state to both low-power and 
> full-power as soon as possible"!
> 
> We should settle on a general policy for how to handle it when a 
> driver makes the mistake of telling us to do contradictory things.  
> There are three natural policies:
> 
> 	The first request takes precedence over the second;
> 
> 	The second request takes precedence over the first;
> 
> 	Resumes take precedence over suspends.
> 
> Any one of those would be acceptable.

IMO resumes should take precedence over suspends, because resume usually means
"there's I/O to process" and we usually we want the I/O to be processed as soon
as possible (deferred wake-up will usually mean deferred I/O and that would
hurt user experience).

Best,
Rafael

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)
  2009-06-29 22:50                                                     ` Rafael J. Wysocki
  2009-06-30 15:10                                                       ` Alan Stern
@ 2009-06-30 15:10                                                       ` Alan Stern
  2009-06-30 22:30                                                         ` [RFC] Run-time PM framework (was: Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)) Rafael J. Wysocki
  2009-06-30 22:30                                                         ` Rafael J. Wysocki
  1 sibling, 2 replies; 102+ messages in thread
From: Alan Stern @ 2009-06-30 15:10 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Tue, 30 Jun 2009, Rafael J. Wysocki wrote:

> > There are three natural policies:
> > 
> > 	The first request takes precedence over the second;
> > 
> > 	The second request takes precedence over the first;
> > 
> > 	Resumes take precedence over suspends.
> > 
> > Any one of those would be acceptable.
> 
> IMO resumes should take precedence over suspends, because resume usually means
> "there's I/O to process" and we usually we want the I/O to be processed as soon
> as possible (deferred wake-up will usually mean deferred I/O and that would
> hurt user experience).

I don't know.  I gave this a lot of thought last night, and it seems
that the best approach would be to always obey the most recent request.  
(In the case of delayed suspend requests, this is ambiguous because
there are two times involved: when the request was originally submitted
and when the delay expires.  We should use the time of original
submission.)  The only exception is pm_request_put; it shouldn't
override an existing suspend request just to send an idle notification.

If this seems difficult to implement, it's an indication that the
overall design needs to be reworked.  Here's what I came up with:

The possible statuses are reduced to: RPM_ACTIVE, RPM_SUSPENDING, 
RPM_SUSPENDED, RPM_RESUMING, and RPM_ERROR.  These most directly 
correspond to the core's view of the device state.  Transitions are:

	RPM_ACTIVE <-> RPM_SUSPENDING -> RPM_SUSPENDED ->
		RPM_RESUMING -> RPM_ACTIVE ...

Failure of a suspend causes the backward transition from RPM_SUSPENDING 
to RPM_ACTIVE, and errors cause a transition to RPM_ERROR.  Otherwise 
we always go forward.

Instead of a delayed_work_struct, we'll have a regular work_struct plus 
a separate timer_list structure.  That way we get more control over 
what happens when the timer expires.  The timer callback routine will 
submit the work_struct, but it will do so under the spinlock and only 
after making the proper checks.

There will be only one work callback routine.  It decides what to do
based on the status and a new field: async_action.  The possible values
for async_action are 0 (do nothing), ASYNC_SUSPEND, ASYNC_RESUME, and
ASYNC_NOTIFY.

There will be a few extra fields: a work_pending flag, the timer 
expiration value (which doubles as a timer_pending flag by being set
to 0 when the timer isn't pending), and maybe some others.

There are restrictions on what can happen in each state.  The timer is
allowed to run only in RPM_RESUMING and RPM_ACTIVE, and ASYNC_NOTIFY is
allowed only in those states.  ASYNC_RESUME isn't allowed in
RPM_RESUMING or RPM_ACTIVE, and ASYNC_SUSPEND isn't allowed in
RPM_SUSPENDING or RPM_SUSPENDED.  Pending work isn't allowed in
RPM_RESUMING or RPM_SUSPENDING; if a request is submitted at such times
is merely sets async_action, and the work will be scheduled when the
resume or suspend finishes.  This is to avoid forcing the workqueue
thread to wait unnecessarily.

__pm_runtime_suspend and __pm_runtime_resume start out by cancelling 
the timer and the work_struct (if they are pending) and by setting 
async_action to 0.  The cancellations don't have to wait; if a callback 
routine is already running then when it gets the spinlock it will see 
that it has nothing to do.

If __pm_runtime_suspend was called asynchronously and the status is
already RPM_SUSPENDING, it can return after taking these actions.  If
the status is already RPM_RESUMING, it should set async_action to
ASYNC_SUSPEND and then return.  __pm_runtime_resume behaves similarly.

The pm_request_* routines similarly cancel a pending timer and clear
async_action.  They should cancel any pending work unless they're going
to submit new work anyway.

That's enough to give you the general idea.  I think this design is 
a lot cleaner than the current one.

Alan Stern

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)
  2009-06-29 22:50                                                     ` Rafael J. Wysocki
@ 2009-06-30 15:10                                                       ` Alan Stern
  2009-06-30 15:10                                                       ` Alan Stern
  1 sibling, 0 replies; 102+ messages in thread
From: Alan Stern @ 2009-06-30 15:10 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Tue, 30 Jun 2009, Rafael J. Wysocki wrote:

> > There are three natural policies:
> > 
> > 	The first request takes precedence over the second;
> > 
> > 	The second request takes precedence over the first;
> > 
> > 	Resumes take precedence over suspends.
> > 
> > Any one of those would be acceptable.
> 
> IMO resumes should take precedence over suspends, because resume usually means
> "there's I/O to process" and we usually we want the I/O to be processed as soon
> as possible (deferred wake-up will usually mean deferred I/O and that would
> hurt user experience).

I don't know.  I gave this a lot of thought last night, and it seems
that the best approach would be to always obey the most recent request.  
(In the case of delayed suspend requests, this is ambiguous because
there are two times involved: when the request was originally submitted
and when the delay expires.  We should use the time of original
submission.)  The only exception is pm_request_put; it shouldn't
override an existing suspend request just to send an idle notification.

If this seems difficult to implement, it's an indication that the
overall design needs to be reworked.  Here's what I came up with:

The possible statuses are reduced to: RPM_ACTIVE, RPM_SUSPENDING, 
RPM_SUSPENDED, RPM_RESUMING, and RPM_ERROR.  These most directly 
correspond to the core's view of the device state.  Transitions are:

	RPM_ACTIVE <-> RPM_SUSPENDING -> RPM_SUSPENDED ->
		RPM_RESUMING -> RPM_ACTIVE ...

Failure of a suspend causes the backward transition from RPM_SUSPENDING 
to RPM_ACTIVE, and errors cause a transition to RPM_ERROR.  Otherwise 
we always go forward.

Instead of a delayed_work_struct, we'll have a regular work_struct plus 
a separate timer_list structure.  That way we get more control over 
what happens when the timer expires.  The timer callback routine will 
submit the work_struct, but it will do so under the spinlock and only 
after making the proper checks.

There will be only one work callback routine.  It decides what to do
based on the status and a new field: async_action.  The possible values
for async_action are 0 (do nothing), ASYNC_SUSPEND, ASYNC_RESUME, and
ASYNC_NOTIFY.

There will be a few extra fields: a work_pending flag, the timer 
expiration value (which doubles as a timer_pending flag by being set
to 0 when the timer isn't pending), and maybe some others.

There are restrictions on what can happen in each state.  The timer is
allowed to run only in RPM_RESUMING and RPM_ACTIVE, and ASYNC_NOTIFY is
allowed only in those states.  ASYNC_RESUME isn't allowed in
RPM_RESUMING or RPM_ACTIVE, and ASYNC_SUSPEND isn't allowed in
RPM_SUSPENDING or RPM_SUSPENDED.  Pending work isn't allowed in
RPM_RESUMING or RPM_SUSPENDING; if a request is submitted at such times
is merely sets async_action, and the work will be scheduled when the
resume or suspend finishes.  This is to avoid forcing the workqueue
thread to wait unnecessarily.

__pm_runtime_suspend and __pm_runtime_resume start out by cancelling 
the timer and the work_struct (if they are pending) and by setting 
async_action to 0.  The cancellations don't have to wait; if a callback 
routine is already running then when it gets the spinlock it will see 
that it has nothing to do.

If __pm_runtime_suspend was called asynchronously and the status is
already RPM_SUSPENDING, it can return after taking these actions.  If
the status is already RPM_RESUMING, it should set async_action to
ASYNC_SUSPEND and then return.  __pm_runtime_resume behaves similarly.

The pm_request_* routines similarly cancel a pending timer and clear
async_action.  They should cancel any pending work unless they're going
to submit new work anyway.

That's enough to give you the general idea.  I think this design is 
a lot cleaner than the current one.

Alan Stern

^ permalink raw reply	[flat|nested] 102+ messages in thread

* [RFC] Run-time PM framework (was: Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6))
  2009-06-30 15:10                                                       ` Alan Stern
  2009-06-30 22:30                                                         ` [RFC] Run-time PM framework (was: Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)) Rafael J. Wysocki
@ 2009-06-30 22:30                                                         ` Rafael J. Wysocki
  2009-07-01 15:35                                                           ` Alan Stern
  2009-07-01 15:35                                                           ` Alan Stern
  1 sibling, 2 replies; 102+ messages in thread
From: Rafael J. Wysocki @ 2009-06-30 22:30 UTC (permalink / raw)
  To: Alan Stern
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Tuesday 30 June 2009, Alan Stern wrote:
... 
> That's enough to give you the general idea.  I think this design is 
> a lot cleaner than the current one.

Well, I'm not really happy with starting over, but if you think we should do
that, then let's do it.

I think we both agree that the callbacks, ->runtime_idle(), ->runtime_suspend()
and ->runtime_resume() make sense.  Now, the role of the framework, IMO, is to
provide a mechanism by which it is possible:
(1) to schedule a delayed execution of ->runtime_suspend(), possibly from
    interrupt context,
(2) to schedule execution of ->runtime_resume() or ->runtime_idle(), possibly
    from interrupt context,
(3) to execute ->runtime_suspend() or ->runtime_resume() directly in a
    synchronous way (I'm not sure about ->runtime_idle())
_and_ to ensure that these callbacks will be executed when it makes sense.
There's no other point, because the core has no information to make choices,
it can only prevent wrong things from happening, if possible.

I think you will agree that the users of the framework should be able to
prevent ->runtime_suspend() from being called and that's what the usage counter
is for.  Also, IMO it should be impossible to execute ->runtime_idle(), via the
framework, when the usage counter is nonzero.

BTW, I don't think resume_count is the best name; it used to be in the version
of my patch where it was automatically incremented when ->runtime_resume() was
about to called.  usage_count is probably better.

Next, I think that the framework should refuse to call ->runtime_suspend() and
->runtime_idle() if the children of the device are not suspended and the
"ignore children" flag is unset.  The counter of unsuspended children is used
for that.  I think the rule should be that it is decremented for the parent
whenever ->runtime_suspend() is called for a child and it is incremented
for the parent whenever ->runtime_resume() is called for a child.

Now, the question is what rules should apply to the ordering and possible
simultaneous execution of ->runtime_idle(), ->runtime_suspend() and
->runtime_resume().  I think the following rules make sense:

  * It is forbidden to run ->runtime_suspend() twice in a row.

  * It is forbidden to run ->runtime_suspend() in parallel with another instance
    of ->runtime_suspend().

  * It is forbidden to run ->runtime_resume() twice in a row.

  * It is forbidden to run ->runtime_resume() in parallel with another instance
    of ->runtime_resume().

  * It is allowed to run ->runtime_suspend() after ->runtime_resume() or after
    ->runtime_idle(), but the latter case is preferred. 

  * It is allowed to run ->runtime_resume() after ->runtime_suspend().

  * It is forbidden to run ->runtime_resume() after ->runtime_idle().

  * It is forbidden to run ->runtime_suspend() and ->runtime_resume() in
    parallel with each other.

  * It is forbidden to run ->runtime_idle() twice in a row.

  * It is forbidden to run ->runtime_idle() in parallel with another instance
    of ->runtime_idle().

  * It is forbidden to run ->runtime_idle() after ->runtime_suspend().

  * It is allowed to run ->runtime_idle() after ->runtime_resume().

  * It is allowed to execute ->runtime_suspend() or ->runtime_resume() when
    ->runtime_idle() is running.  In particular, it is allowed to (indirectly)
    call ->runtime_suspend() from within ->runtime_idle().

  * It is forbidden to execute ->runtime_idle() when ->runtime_resume() or
    ->runtime_suspend() is running.

  * If ->runtime_resume() is about to be called immediately after
    ->runtime_suspend(), the execution of ->runtime_suspend() should be
    prevented from happening, if possible, in which case the execution of
    ->runtime_resume() shouldn't happen.

  * If ->runtime_suspend() is about to be called immediately after
    ->runtime_resume(), the execution of ->runtime_resume() should be
    prevented from happening, if possible, in which case the execution of
    ->runtime_suspend() shouldn't happen.

[Are there any more rules related to these callbacks we should take into
account?]

Next, if we agree about the rules above, the question is what helper functions
should be provided by the core allowing these rules to be followed
automatically and what error codes should be returned by them in case it
wasn't possible to proceed without breaking the rules.

IMO, it is reasonable to provide:

  * pm_schedule_suspend(dev, delay) - schedule the execution of
    ->runtime_suspend(dev) after delay.

  * pm_runtime_suspend(dev) - execute ->runtime_suspend(dev) right now.

  * pm_runtime_resume(dev) - execute ->runtime_resume(dev) right now.

  * pm_request_resume(dev) - put a request to execute ->runtime_resume(dev)
    into the run-time PM workqueue.

  * pm_runtime_get(dev) - increment the device's usage counter.

  * pm_runtime_put(dev) - decrement the device's usage counter.

  * pm_runtime_idle(dev) - execute ->runtime_idle(dev) right now if the usage
    counter is zero and all of the device's children are suspended (or the
    "ignore children" flag is set).

  * pm_request_idle(dev) - put a request to execute ->runtime_idle(dev)
    into the run-time PM workqueue.  The usage counter and children will be
    checked immediately before executing ->runtime_idle(dev).

I'm not sure if it is really necessary to combine pm_runtime_idle() or
pm_request_idle() with pm_runtime_put().  At least right now I don't see any
real value of that.

I also am not sure what error codes should be returned by the above helper
functions and in what conditions.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 102+ messages in thread

* [RFC] Run-time PM framework (was: Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6))
  2009-06-30 15:10                                                       ` Alan Stern
@ 2009-06-30 22:30                                                         ` Rafael J. Wysocki
  2009-06-30 22:30                                                         ` Rafael J. Wysocki
  1 sibling, 0 replies; 102+ messages in thread
From: Rafael J. Wysocki @ 2009-06-30 22:30 UTC (permalink / raw)
  To: Alan Stern
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Tuesday 30 June 2009, Alan Stern wrote:
... 
> That's enough to give you the general idea.  I think this design is 
> a lot cleaner than the current one.

Well, I'm not really happy with starting over, but if you think we should do
that, then let's do it.

I think we both agree that the callbacks, ->runtime_idle(), ->runtime_suspend()
and ->runtime_resume() make sense.  Now, the role of the framework, IMO, is to
provide a mechanism by which it is possible:
(1) to schedule a delayed execution of ->runtime_suspend(), possibly from
    interrupt context,
(2) to schedule execution of ->runtime_resume() or ->runtime_idle(), possibly
    from interrupt context,
(3) to execute ->runtime_suspend() or ->runtime_resume() directly in a
    synchronous way (I'm not sure about ->runtime_idle())
_and_ to ensure that these callbacks will be executed when it makes sense.
There's no other point, because the core has no information to make choices,
it can only prevent wrong things from happening, if possible.

I think you will agree that the users of the framework should be able to
prevent ->runtime_suspend() from being called and that's what the usage counter
is for.  Also, IMO it should be impossible to execute ->runtime_idle(), via the
framework, when the usage counter is nonzero.

BTW, I don't think resume_count is the best name; it used to be in the version
of my patch where it was automatically incremented when ->runtime_resume() was
about to called.  usage_count is probably better.

Next, I think that the framework should refuse to call ->runtime_suspend() and
->runtime_idle() if the children of the device are not suspended and the
"ignore children" flag is unset.  The counter of unsuspended children is used
for that.  I think the rule should be that it is decremented for the parent
whenever ->runtime_suspend() is called for a child and it is incremented
for the parent whenever ->runtime_resume() is called for a child.

Now, the question is what rules should apply to the ordering and possible
simultaneous execution of ->runtime_idle(), ->runtime_suspend() and
->runtime_resume().  I think the following rules make sense:

  * It is forbidden to run ->runtime_suspend() twice in a row.

  * It is forbidden to run ->runtime_suspend() in parallel with another instance
    of ->runtime_suspend().

  * It is forbidden to run ->runtime_resume() twice in a row.

  * It is forbidden to run ->runtime_resume() in parallel with another instance
    of ->runtime_resume().

  * It is allowed to run ->runtime_suspend() after ->runtime_resume() or after
    ->runtime_idle(), but the latter case is preferred. 

  * It is allowed to run ->runtime_resume() after ->runtime_suspend().

  * It is forbidden to run ->runtime_resume() after ->runtime_idle().

  * It is forbidden to run ->runtime_suspend() and ->runtime_resume() in
    parallel with each other.

  * It is forbidden to run ->runtime_idle() twice in a row.

  * It is forbidden to run ->runtime_idle() in parallel with another instance
    of ->runtime_idle().

  * It is forbidden to run ->runtime_idle() after ->runtime_suspend().

  * It is allowed to run ->runtime_idle() after ->runtime_resume().

  * It is allowed to execute ->runtime_suspend() or ->runtime_resume() when
    ->runtime_idle() is running.  In particular, it is allowed to (indirectly)
    call ->runtime_suspend() from within ->runtime_idle().

  * It is forbidden to execute ->runtime_idle() when ->runtime_resume() or
    ->runtime_suspend() is running.

  * If ->runtime_resume() is about to be called immediately after
    ->runtime_suspend(), the execution of ->runtime_suspend() should be
    prevented from happening, if possible, in which case the execution of
    ->runtime_resume() shouldn't happen.

  * If ->runtime_suspend() is about to be called immediately after
    ->runtime_resume(), the execution of ->runtime_resume() should be
    prevented from happening, if possible, in which case the execution of
    ->runtime_suspend() shouldn't happen.

[Are there any more rules related to these callbacks we should take into
account?]

Next, if we agree about the rules above, the question is what helper functions
should be provided by the core allowing these rules to be followed
automatically and what error codes should be returned by them in case it
wasn't possible to proceed without breaking the rules.

IMO, it is reasonable to provide:

  * pm_schedule_suspend(dev, delay) - schedule the execution of
    ->runtime_suspend(dev) after delay.

  * pm_runtime_suspend(dev) - execute ->runtime_suspend(dev) right now.

  * pm_runtime_resume(dev) - execute ->runtime_resume(dev) right now.

  * pm_request_resume(dev) - put a request to execute ->runtime_resume(dev)
    into the run-time PM workqueue.

  * pm_runtime_get(dev) - increment the device's usage counter.

  * pm_runtime_put(dev) - decrement the device's usage counter.

  * pm_runtime_idle(dev) - execute ->runtime_idle(dev) right now if the usage
    counter is zero and all of the device's children are suspended (or the
    "ignore children" flag is set).

  * pm_request_idle(dev) - put a request to execute ->runtime_idle(dev)
    into the run-time PM workqueue.  The usage counter and children will be
    checked immediately before executing ->runtime_idle(dev).

I'm not sure if it is really necessary to combine pm_runtime_idle() or
pm_request_idle() with pm_runtime_put().  At least right now I don't see any
real value of that.

I also am not sure what error codes should be returned by the above helper
functions and in what conditions.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [RFC] Run-time PM framework (was: Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6))
  2009-06-30 22:30                                                         ` Rafael J. Wysocki
@ 2009-07-01 15:35                                                           ` Alan Stern
  2009-07-01 22:19                                                             ` Rafael J. Wysocki
  2009-07-01 22:19                                                             ` Rafael J. Wysocki
  2009-07-01 15:35                                                           ` Alan Stern
  1 sibling, 2 replies; 102+ messages in thread
From: Alan Stern @ 2009-07-01 15:35 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Wed, 1 Jul 2009, Rafael J. Wysocki wrote:

> On Tuesday 30 June 2009, Alan Stern wrote:
> ... 
> > That's enough to give you the general idea.  I think this design is 
> > a lot cleaner than the current one.
> 
> Well, I'm not really happy with starting over, but if you think we should do
> that, then let's do it.

It's not a complete restart.  Much of the existing interface and quite
a bit of code would remain the same.

> I think we both agree that the callbacks, ->runtime_idle(), ->runtime_suspend()
> and ->runtime_resume() make sense.  Now, the role of the framework, IMO, is to
> provide a mechanism by which it is possible:
> (1) to schedule a delayed execution of ->runtime_suspend(), possibly from
>     interrupt context,
> (2) to schedule execution of ->runtime_resume() or ->runtime_idle(), possibly
>     from interrupt context,
> (3) to execute ->runtime_suspend() or ->runtime_resume() directly in a
>     synchronous way (I'm not sure about ->runtime_idle())

Yes, runtime_idle also, for drivers that require minimal overhead.

> _and_ to ensure that these callbacks will be executed when it makes sense.

Thus if the situation changes before the callback can be made, so that
it no longer makes sense, the framework should cancel the callback.

> There's no other point, because the core has no information to make choices,
> it can only prevent wrong things from happening, if possible.

Exactly.

> I think you will agree that the users of the framework should be able to
> prevent ->runtime_suspend() from being called and that's what the usage counter
> is for.  Also, IMO it should be impossible to execute ->runtime_idle(), via the
> framework, when the usage counter is nonzero.

Right, because then by definition the device is in use so it can't be 
idle.

> BTW, I don't think resume_count is the best name; it used to be in the version
> of my patch where it was automatically incremented when ->runtime_resume() was
> about to called.  usage_count is probably better.

Fine.

> Next, I think that the framework should refuse to call ->runtime_suspend() and
> ->runtime_idle() if the children of the device are not suspended and the
> "ignore children" flag is unset.

Yes; this is part of the "makes sense" requirement.

>  The counter of unsuspended children is used
> for that.  I think the rule should be that it is decremented for the parent
> whenever ->runtime_suspend() is called for a child and it is incremented
> for the parent whenever ->runtime_resume() is called for a child.

Of course.  (Minor change: decremented when runtime_suspend _succeeds_ 
for a child.)

> Now, the question is what rules should apply to the ordering and possible
> simultaneous execution of ->runtime_idle(), ->runtime_suspend() and
> ->runtime_resume().  I think the following rules make sense:

Oh dear.  I wouldn't attempt to make a complete list of all possible 
interactions.  It's too hard to know whether you have really covered 
all the cases.

>   * It is forbidden to run ->runtime_suspend() twice in a row.
> 
>   * It is forbidden to run ->runtime_suspend() in parallel with another instance
>     of ->runtime_suspend().
> 
>   * It is forbidden to run ->runtime_resume() twice in a row.
> 
>   * It is forbidden to run ->runtime_resume() in parallel with another instance
>     of ->runtime_resume().
> 
>   * It is allowed to run ->runtime_suspend() after ->runtime_resume() or after
>     ->runtime_idle(), but the latter case is preferred. 
> 
>   * It is allowed to run ->runtime_resume() after ->runtime_suspend().
> 
>   * It is forbidden to run ->runtime_resume() after ->runtime_idle().
> 
>   * It is forbidden to run ->runtime_suspend() and ->runtime_resume() in
>     parallel with each other.
> 
>   * It is forbidden to run ->runtime_idle() twice in a row.
> 
>   * It is forbidden to run ->runtime_idle() in parallel with another instance
>     of ->runtime_idle().
> 
>   * It is forbidden to run ->runtime_idle() after ->runtime_suspend().
> 
>   * It is allowed to run ->runtime_idle() after ->runtime_resume().
> 
>   * It is allowed to execute ->runtime_suspend() or ->runtime_resume() when
>     ->runtime_idle() is running.  In particular, it is allowed to (indirectly)
>     call ->runtime_suspend() from within ->runtime_idle().
> 
>   * It is forbidden to execute ->runtime_idle() when ->runtime_resume() or
>     ->runtime_suspend() is running.

We can summarize these rules as follows:

	Never allow more than one callback at a time, except that
	runtime_suspend may be invoked while runtime_idle is running.

	Don't call runtime_resume while the device is active.

	Don't call runtime_suspend or runtime_idle while the device
	is suspended.

	Don't invoke any callbacks if the device state is unknown
	(RPM_ERROR).

Implicit is the notion that the device is suspended when 
runtime_suspend returns successfully, it is active when runtime_resume 
returns successfully, and it is unknown when either returns an error.

>   * If ->runtime_resume() is about to be called immediately after
>     ->runtime_suspend(), the execution of ->runtime_suspend() should be
>     prevented from happening, if possible, in which case the execution of
>     ->runtime_resume() shouldn't happen.
> 
>   * If ->runtime_suspend() is about to be called immediately after
>     ->runtime_resume(), the execution of ->runtime_resume() should be
>     prevented from happening, if possible, in which case the execution of
>     ->runtime_suspend() shouldn't happen.

These could be considered optional optimizations.  Or if you prefer, 
they could be covered by a "New requests override previous requests" 
rule.

> [Are there any more rules related to these callbacks we should take into
> account?]

	Runtime PM callbacks are mutually exclusive with other driver
	core callbacks (probe, remove, dev_pm_ops, etc.).

	If a callback occurs asynchronously then it will be invoked
	in process context.  If it occurs as part of a synchronous
	request then it is invoked in the caller's context.

Related to this is the requirement that pm_runtime_idle,
pm_runtime_suspend, and pm_runtime_resume must always be called in
process context whereas pm_runtime_idle_atomic,
pm_runtime_suspend_atomic, and pm_runtime_resume_atomic may be called
in any context.

> Next, if we agree about the rules above, the question is what helper functions
> should be provided by the core allowing these rules to be followed
> automatically and what error codes should be returned by them in case it
> wasn't possible to proceed without breaking the rules.
> 
> IMO, it is reasonable to provide:
> 
>   * pm_schedule_suspend(dev, delay) - schedule the execution of
>     ->runtime_suspend(dev) after delay.
> 
>   * pm_runtime_suspend(dev) - execute ->runtime_suspend(dev) right now.
> 
>   * pm_runtime_resume(dev) - execute ->runtime_resume(dev) right now.
> 
>   * pm_request_resume(dev) - put a request to execute ->runtime_resume(dev)
>     into the run-time PM workqueue.
> 
>   * pm_runtime_get(dev) - increment the device's usage counter.
> 
>   * pm_runtime_put(dev) - decrement the device's usage counter.
> 
>   * pm_runtime_idle(dev) - execute ->runtime_idle(dev) right now if the usage
>     counter is zero and all of the device's children are suspended (or the
>     "ignore children" flag is set).
> 
>   * pm_request_idle(dev) - put a request to execute ->runtime_idle(dev)
>     into the run-time PM workqueue.  The usage counter and children will be
>     checked immediately before executing ->runtime_idle(dev).

Should the counters also be checked when the request is submitted?  
And should the same go for pm_schedule_suspend?  These are nontrivial
questions; good arguments can be made both ways.

> I'm not sure if it is really necessary to combine pm_runtime_idle() or
> pm_request_idle() with pm_runtime_put().  At least right now I don't see any
> real value of that.

Likewise combining pm_runtime_get with pm_runtime_resume.  The only
value is to make things easier for drivers, because these will be very
common idioms.

> I also am not sure what error codes should be returned by the above helper
> functions and in what conditions.

The error codes you have been using seem okay to me, in general.

However, some of those requests would violate the rules in a trivial 
way.  For these we might return a positive value rather than a negative 
error code.  For example, calling pm_runtime_resume while the device is 
already active shouldn't be considered an error.  But it can't be 
considered a complete success either, because it won't invoke the 
runtime_resume method.

To be determined: How runtime PM will interact with system sleep.


About all I can add is the "New requests override previous requests"  
policy.  This would apply to all the non-synchronous requests, whether
they are delayed or added directly to the workqueue.  If a new request
(synchronous or not) is received before the old one has started to run,
the old one will be cancelled.  This holds even if the new request is
redundant, like a resume request received while the device is active.

There is one exception to this rule: An idle_notify request does not 
cancel a delayed or queued suspend request.

Alan Stern


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [RFC] Run-time PM framework (was: Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6))
  2009-06-30 22:30                                                         ` Rafael J. Wysocki
  2009-07-01 15:35                                                           ` Alan Stern
@ 2009-07-01 15:35                                                           ` Alan Stern
  1 sibling, 0 replies; 102+ messages in thread
From: Alan Stern @ 2009-07-01 15:35 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Wed, 1 Jul 2009, Rafael J. Wysocki wrote:

> On Tuesday 30 June 2009, Alan Stern wrote:
> ... 
> > That's enough to give you the general idea.  I think this design is 
> > a lot cleaner than the current one.
> 
> Well, I'm not really happy with starting over, but if you think we should do
> that, then let's do it.

It's not a complete restart.  Much of the existing interface and quite
a bit of code would remain the same.

> I think we both agree that the callbacks, ->runtime_idle(), ->runtime_suspend()
> and ->runtime_resume() make sense.  Now, the role of the framework, IMO, is to
> provide a mechanism by which it is possible:
> (1) to schedule a delayed execution of ->runtime_suspend(), possibly from
>     interrupt context,
> (2) to schedule execution of ->runtime_resume() or ->runtime_idle(), possibly
>     from interrupt context,
> (3) to execute ->runtime_suspend() or ->runtime_resume() directly in a
>     synchronous way (I'm not sure about ->runtime_idle())

Yes, runtime_idle also, for drivers that require minimal overhead.

> _and_ to ensure that these callbacks will be executed when it makes sense.

Thus if the situation changes before the callback can be made, so that
it no longer makes sense, the framework should cancel the callback.

> There's no other point, because the core has no information to make choices,
> it can only prevent wrong things from happening, if possible.

Exactly.

> I think you will agree that the users of the framework should be able to
> prevent ->runtime_suspend() from being called and that's what the usage counter
> is for.  Also, IMO it should be impossible to execute ->runtime_idle(), via the
> framework, when the usage counter is nonzero.

Right, because then by definition the device is in use so it can't be 
idle.

> BTW, I don't think resume_count is the best name; it used to be in the version
> of my patch where it was automatically incremented when ->runtime_resume() was
> about to called.  usage_count is probably better.

Fine.

> Next, I think that the framework should refuse to call ->runtime_suspend() and
> ->runtime_idle() if the children of the device are not suspended and the
> "ignore children" flag is unset.

Yes; this is part of the "makes sense" requirement.

>  The counter of unsuspended children is used
> for that.  I think the rule should be that it is decremented for the parent
> whenever ->runtime_suspend() is called for a child and it is incremented
> for the parent whenever ->runtime_resume() is called for a child.

Of course.  (Minor change: decremented when runtime_suspend _succeeds_ 
for a child.)

> Now, the question is what rules should apply to the ordering and possible
> simultaneous execution of ->runtime_idle(), ->runtime_suspend() and
> ->runtime_resume().  I think the following rules make sense:

Oh dear.  I wouldn't attempt to make a complete list of all possible 
interactions.  It's too hard to know whether you have really covered 
all the cases.

>   * It is forbidden to run ->runtime_suspend() twice in a row.
> 
>   * It is forbidden to run ->runtime_suspend() in parallel with another instance
>     of ->runtime_suspend().
> 
>   * It is forbidden to run ->runtime_resume() twice in a row.
> 
>   * It is forbidden to run ->runtime_resume() in parallel with another instance
>     of ->runtime_resume().
> 
>   * It is allowed to run ->runtime_suspend() after ->runtime_resume() or after
>     ->runtime_idle(), but the latter case is preferred. 
> 
>   * It is allowed to run ->runtime_resume() after ->runtime_suspend().
> 
>   * It is forbidden to run ->runtime_resume() after ->runtime_idle().
> 
>   * It is forbidden to run ->runtime_suspend() and ->runtime_resume() in
>     parallel with each other.
> 
>   * It is forbidden to run ->runtime_idle() twice in a row.
> 
>   * It is forbidden to run ->runtime_idle() in parallel with another instance
>     of ->runtime_idle().
> 
>   * It is forbidden to run ->runtime_idle() after ->runtime_suspend().
> 
>   * It is allowed to run ->runtime_idle() after ->runtime_resume().
> 
>   * It is allowed to execute ->runtime_suspend() or ->runtime_resume() when
>     ->runtime_idle() is running.  In particular, it is allowed to (indirectly)
>     call ->runtime_suspend() from within ->runtime_idle().
> 
>   * It is forbidden to execute ->runtime_idle() when ->runtime_resume() or
>     ->runtime_suspend() is running.

We can summarize these rules as follows:

	Never allow more than one callback at a time, except that
	runtime_suspend may be invoked while runtime_idle is running.

	Don't call runtime_resume while the device is active.

	Don't call runtime_suspend or runtime_idle while the device
	is suspended.

	Don't invoke any callbacks if the device state is unknown
	(RPM_ERROR).

Implicit is the notion that the device is suspended when 
runtime_suspend returns successfully, it is active when runtime_resume 
returns successfully, and it is unknown when either returns an error.

>   * If ->runtime_resume() is about to be called immediately after
>     ->runtime_suspend(), the execution of ->runtime_suspend() should be
>     prevented from happening, if possible, in which case the execution of
>     ->runtime_resume() shouldn't happen.
> 
>   * If ->runtime_suspend() is about to be called immediately after
>     ->runtime_resume(), the execution of ->runtime_resume() should be
>     prevented from happening, if possible, in which case the execution of
>     ->runtime_suspend() shouldn't happen.

These could be considered optional optimizations.  Or if you prefer, 
they could be covered by a "New requests override previous requests" 
rule.

> [Are there any more rules related to these callbacks we should take into
> account?]

	Runtime PM callbacks are mutually exclusive with other driver
	core callbacks (probe, remove, dev_pm_ops, etc.).

	If a callback occurs asynchronously then it will be invoked
	in process context.  If it occurs as part of a synchronous
	request then it is invoked in the caller's context.

Related to this is the requirement that pm_runtime_idle,
pm_runtime_suspend, and pm_runtime_resume must always be called in
process context whereas pm_runtime_idle_atomic,
pm_runtime_suspend_atomic, and pm_runtime_resume_atomic may be called
in any context.

> Next, if we agree about the rules above, the question is what helper functions
> should be provided by the core allowing these rules to be followed
> automatically and what error codes should be returned by them in case it
> wasn't possible to proceed without breaking the rules.
> 
> IMO, it is reasonable to provide:
> 
>   * pm_schedule_suspend(dev, delay) - schedule the execution of
>     ->runtime_suspend(dev) after delay.
> 
>   * pm_runtime_suspend(dev) - execute ->runtime_suspend(dev) right now.
> 
>   * pm_runtime_resume(dev) - execute ->runtime_resume(dev) right now.
> 
>   * pm_request_resume(dev) - put a request to execute ->runtime_resume(dev)
>     into the run-time PM workqueue.
> 
>   * pm_runtime_get(dev) - increment the device's usage counter.
> 
>   * pm_runtime_put(dev) - decrement the device's usage counter.
> 
>   * pm_runtime_idle(dev) - execute ->runtime_idle(dev) right now if the usage
>     counter is zero and all of the device's children are suspended (or the
>     "ignore children" flag is set).
> 
>   * pm_request_idle(dev) - put a request to execute ->runtime_idle(dev)
>     into the run-time PM workqueue.  The usage counter and children will be
>     checked immediately before executing ->runtime_idle(dev).

Should the counters also be checked when the request is submitted?  
And should the same go for pm_schedule_suspend?  These are nontrivial
questions; good arguments can be made both ways.

> I'm not sure if it is really necessary to combine pm_runtime_idle() or
> pm_request_idle() with pm_runtime_put().  At least right now I don't see any
> real value of that.

Likewise combining pm_runtime_get with pm_runtime_resume.  The only
value is to make things easier for drivers, because these will be very
common idioms.

> I also am not sure what error codes should be returned by the above helper
> functions and in what conditions.

The error codes you have been using seem okay to me, in general.

However, some of those requests would violate the rules in a trivial 
way.  For these we might return a positive value rather than a negative 
error code.  For example, calling pm_runtime_resume while the device is 
already active shouldn't be considered an error.  But it can't be 
considered a complete success either, because it won't invoke the 
runtime_resume method.

To be determined: How runtime PM will interact with system sleep.


About all I can add is the "New requests override previous requests"  
policy.  This would apply to all the non-synchronous requests, whether
they are delayed or added directly to the workqueue.  If a new request
(synchronous or not) is received before the old one has started to run,
the old one will be cancelled.  This holds even if the new request is
redundant, like a resume request received while the device is active.

There is one exception to this rule: An idle_notify request does not 
cancel a delayed or queued suspend request.

Alan Stern

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [RFC] Run-time PM framework (was: Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6))
  2009-07-01 15:35                                                           ` Alan Stern
@ 2009-07-01 22:19                                                             ` Rafael J. Wysocki
  2009-07-02 15:42                                                               ` Rafael J. Wysocki
                                                                                 ` (3 more replies)
  2009-07-01 22:19                                                             ` Rafael J. Wysocki
  1 sibling, 4 replies; 102+ messages in thread
From: Rafael J. Wysocki @ 2009-07-01 22:19 UTC (permalink / raw)
  To: Alan Stern
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Wednesday 01 July 2009, Alan Stern wrote:
> On Wed, 1 Jul 2009, Rafael J. Wysocki wrote:
> 
> > On Tuesday 30 June 2009, Alan Stern wrote:
> > ... 
> > > That's enough to give you the general idea.  I think this design is 
> > > a lot cleaner than the current one.
> > 
> > Well, I'm not really happy with starting over, but if you think we should do
> > that, then let's do it.
> 
> It's not a complete restart.  Much of the existing interface and quite
> a bit of code would remain the same.
> 
> > I think we both agree that the callbacks, ->runtime_idle(), ->runtime_suspend()
> > and ->runtime_resume() make sense.  Now, the role of the framework, IMO, is to
> > provide a mechanism by which it is possible:
> > (1) to schedule a delayed execution of ->runtime_suspend(), possibly from
> >     interrupt context,
> > (2) to schedule execution of ->runtime_resume() or ->runtime_idle(), possibly
> >     from interrupt context,
> > (3) to execute ->runtime_suspend() or ->runtime_resume() directly in a
> >     synchronous way (I'm not sure about ->runtime_idle())
> 
> Yes, runtime_idle also, for drivers that require minimal overhead.
> 
> > _and_ to ensure that these callbacks will be executed when it makes sense.
> 
> Thus if the situation changes before the callback can be made, so that
> it no longer makes sense, the framework should cancel the callback.

Yes, but there's one thing to consider.  Suppose a remote wake-up causes a
resume request to be queued up and pm_runtime_resume() is called synchronously
exactly at the time the request's work function is started.  There are two
attempts to resume in progress, but only one of them can call
->runtime_resume(), so what's the other one supposed to do?  The asynchronous
one can just return error code, but the the caller of the synchronous
pm_runtime_resume() must know whether or not the resume was successful.
So, perhaps, if the synchronous resume happens to lose the race, it should
wait for the other one to complete, check the device's status and return 0 if
it's active?  That wouldn't cause the workqueue thread to wait.

> > There's no other point, because the core has no information to make choices,
> > it can only prevent wrong things from happening, if possible.
> 
> Exactly.
> 
> > I think you will agree that the users of the framework should be able to
> > prevent ->runtime_suspend() from being called and that's what the usage counter
> > is for.  Also, IMO it should be impossible to execute ->runtime_idle(), via the
> > framework, when the usage counter is nonzero.
> 
> Right, because then by definition the device is in use so it can't be 
> idle.
> 
> > BTW, I don't think resume_count is the best name; it used to be in the version
> > of my patch where it was automatically incremented when ->runtime_resume() was
> > about to called.  usage_count is probably better.
> 
> Fine.
> 
> > Next, I think that the framework should refuse to call ->runtime_suspend() and
> > ->runtime_idle() if the children of the device are not suspended and the
> > "ignore children" flag is unset.
> 
> Yes; this is part of the "makes sense" requirement.
> 
> >  The counter of unsuspended children is used
> > for that.  I think the rule should be that it is decremented for the parent
> > whenever ->runtime_suspend() is called for a child and it is incremented
> > for the parent whenever ->runtime_resume() is called for a child.
> 
> Of course.  (Minor change: decremented when runtime_suspend _succeeds_ 
> for a child.)
> 
> > Now, the question is what rules should apply to the ordering and possible
> > simultaneous execution of ->runtime_idle(), ->runtime_suspend() and
> > ->runtime_resume().  I think the following rules make sense:
> 
> Oh dear.  I wouldn't attempt to make a complete list of all possible 
> interactions.  It's too hard to know whether you have really covered 
> all the cases.
> 
> >   * It is forbidden to run ->runtime_suspend() twice in a row.
> > 
> >   * It is forbidden to run ->runtime_suspend() in parallel with another instance
> >     of ->runtime_suspend().
> > 
> >   * It is forbidden to run ->runtime_resume() twice in a row.
> > 
> >   * It is forbidden to run ->runtime_resume() in parallel with another instance
> >     of ->runtime_resume().
> > 
> >   * It is allowed to run ->runtime_suspend() after ->runtime_resume() or after
> >     ->runtime_idle(), but the latter case is preferred. 
> > 
> >   * It is allowed to run ->runtime_resume() after ->runtime_suspend().
> > 
> >   * It is forbidden to run ->runtime_resume() after ->runtime_idle().
> > 
> >   * It is forbidden to run ->runtime_suspend() and ->runtime_resume() in
> >     parallel with each other.
> > 
> >   * It is forbidden to run ->runtime_idle() twice in a row.
> > 
> >   * It is forbidden to run ->runtime_idle() in parallel with another instance
> >     of ->runtime_idle().
> > 
> >   * It is forbidden to run ->runtime_idle() after ->runtime_suspend().
> > 
> >   * It is allowed to run ->runtime_idle() after ->runtime_resume().
> > 
> >   * It is allowed to execute ->runtime_suspend() or ->runtime_resume() when
> >     ->runtime_idle() is running.  In particular, it is allowed to (indirectly)
> >     call ->runtime_suspend() from within ->runtime_idle().
> > 
> >   * It is forbidden to execute ->runtime_idle() when ->runtime_resume() or
> >     ->runtime_suspend() is running.
> 
> We can summarize these rules as follows:
> 
> 	Never allow more than one callback at a time, except that
> 	runtime_suspend may be invoked while runtime_idle is running.

Caution here.  If ->runtime_idle() runs ->runtime_suspend() and immediately
after that resume is requested by remote wake-up, ->runtime_resume() may also
be run while ->runtime_idle() is still running.

OTOH, we need to know when ->runtime_idle() has completed, because we have to
ensure it won't still be running after run-time PM has been disabled for the
device.

IMO, we need two flags, one indicating that either ->runtime_suspend(), or
->runtime_resume() is being executed (they are mutually exclusive) and the
the other one indicating that ->runtime_idle() is being executed.  For the
purpose of further discussion below I'll call them RPM_IDLE_RUNNING and
RPM_IN_TRANSITION.

With this notation, the above rule may be translated as:

    Don't run any of the callbacks if RPM_IN_TRANSITION is set.  Don't run
    ->runtime_idle() if RPM_IDLE_RUNNING is set.

Which implies that RPM_IDLE_RUNNING cannot be set when RPM_IN_TRANSITION is
set, but it is valid to set RPM_IN_TRANSITION when RPM_IDLE_RUNNING is set.

> 	Don't call runtime_resume while the device is active.
> 
> 	Don't call runtime_suspend or runtime_idle while the device
> 	is suspended.
> 
> 	Don't invoke any callbacks if the device state is unknown
> 	(RPM_ERROR).
> 
> Implicit is the notion that the device is suspended when 
> runtime_suspend returns successfully, it is active when runtime_resume 
> returns successfully, and it is unknown when either returns an error.

Yes.

There are two possible "final" states, so I'd use one flag to indicate the
current status.  Let's call it RPM_SUSPENDED for now (which means that the
device is suspended when it's set and active otherwise) and I think we can make
the rule that this flag is only changed after successful execution of
->runtime_suspend() or ->runtime_resume().

Whether the device is suspending or resuming follows from the values of
RPM_SUSPENDED and RPM_IN_TRANSITION.

> >   * If ->runtime_resume() is about to be called immediately after
> >     ->runtime_suspend(), the execution of ->runtime_suspend() should be
> >     prevented from happening, if possible, in which case the execution of
> >     ->runtime_resume() shouldn't happen.
> > 
> >   * If ->runtime_suspend() is about to be called immediately after
> >     ->runtime_resume(), the execution of ->runtime_resume() should be
> >     prevented from happening, if possible, in which case the execution of
> >     ->runtime_suspend() shouldn't happen.
> 
> These could be considered optional optimizations.  Or if you prefer, 
> they could be covered by a "New requests override previous requests" 
> rule.

I'm not sure if I agree with this rule yet.

> > [Are there any more rules related to these callbacks we should take into
> > account?]
> 
> 	Runtime PM callbacks are mutually exclusive with other driver
> 	core callbacks (probe, remove, dev_pm_ops, etc.).

OK
 
> 	If a callback occurs asynchronously then it will be invoked
> 	in process context.  If it occurs as part of a synchronous
> 	request then it is invoked in the caller's context.
> 
> Related to this is the requirement that pm_runtime_idle,
> pm_runtime_suspend, and pm_runtime_resume must always be called in
> process context whereas pm_runtime_idle_atomic,
> pm_runtime_suspend_atomic, and pm_runtime_resume_atomic may be called
> in any context.

OK

> > Next, if we agree about the rules above, the question is what helper functions
> > should be provided by the core allowing these rules to be followed
> > automatically and what error codes should be returned by them in case it
> > wasn't possible to proceed without breaking the rules.
> > 
> > IMO, it is reasonable to provide:
> > 
> >   * pm_schedule_suspend(dev, delay) - schedule the execution of
> >     ->runtime_suspend(dev) after delay.
> > 
> >   * pm_runtime_suspend(dev) - execute ->runtime_suspend(dev) right now.
> > 
> >   * pm_runtime_resume(dev) - execute ->runtime_resume(dev) right now.
> > 
> >   * pm_request_resume(dev) - put a request to execute ->runtime_resume(dev)
> >     into the run-time PM workqueue.
> > 
> >   * pm_runtime_get(dev) - increment the device's usage counter.
> > 
> >   * pm_runtime_put(dev) - decrement the device's usage counter.
> > 
> >   * pm_runtime_idle(dev) - execute ->runtime_idle(dev) right now if the usage
> >     counter is zero and all of the device's children are suspended (or the
> >     "ignore children" flag is set).
> > 
> >   * pm_request_idle(dev) - put a request to execute ->runtime_idle(dev)
> >     into the run-time PM workqueue.  The usage counter and children will be
> >     checked immediately before executing ->runtime_idle(dev).
> 
> Should the counters also be checked when the request is submitted?  
> And should the same go for pm_schedule_suspend?  These are nontrivial
> questions; good arguments can be made both ways.

That's the difficult part. :-)

First, I think a delayed suspend should be treated in a special way, because
it's not really a request to suspend.  Namely, as long as the timer hasn't
triggered yet, nothing happens and there's nothing against the rules above.
A request to suspend is queued up after the timer has triggered and the timer
function is where the rules come into play.  IOW, it consists of two
operations, setting up a timer and queuing up a request to suspend when the
timer triggers.  IMO the first of them can be done at any time, while the other
one may be affected by the rules.

It implies that we should really introduce a timer and a timer function that
will queue up suspend requests, instead of using struct delayed_work.

Second, I think it may be a good idea to use the usage counter to block further
requests while submitting a resume request.

Namely, suppose that pm_request_resume() increments usage_count and returns 0,
if the resume was not necessary and the caller can do the I/O by itself, or
error code, which means that it was necessary to queue up a resume request.
If 0 is returned, the caller is supposed to do the I/O and call
pm_runtime_put() when done.  Otherwise it just quits and ->runtime_resume() is
supposed to take care of the I/O, in which case the request's work function
should call pm_runtime_put() when done.  [If it was impossible to queue up a
request, error code is returned, but the usage counter is decremented by
pm_request_resume(), so that the caller need not handle that special case,
hopefully rare.]

This implies that it may be a good idea to check usage_count when submitting
idle notification and suspend requests (where in case of suspend a request is
submitted by the timer function, when the timer has already triggered, so
there's no need to check the counter while setting up the timer).

The counter of unsuspended children may change after a request has been
submitted and before its work function has a chance to run, so I don't see much
point checking it when submitting requests.

So, if the above idea is adopted, idle notification and suspend requests
won't be queued up when a resume request is pending (there's the question what
the timer function attempting to queue up a suspend request is supposed to do
in such a case) and in the other cases we can use the following rules:

    Any pending request takes precedence over a new idle notification request.

    If a new request is not an idle notification request, it takes precedence
    over the pending one, so it cancels it with the help of cancel_work().

[In the latter case, if a suspend request is canceled, we may want to set up the
timer for another one.]  For that, we're going to need a single flag, say
RPM_PENDING, which is set whenever a request is queued up.

> > I'm not sure if it is really necessary to combine pm_runtime_idle() or
> > pm_request_idle() with pm_runtime_put().  At least right now I don't see any
> > real value of that.
> 
> Likewise combining pm_runtime_get with pm_runtime_resume.  The only
> value is to make things easier for drivers, because these will be very
> common idioms.
> 
> > I also am not sure what error codes should be returned by the above helper
> > functions and in what conditions.
> 
> The error codes you have been using seem okay to me, in general.
> 
> However, some of those requests would violate the rules in a trivial 
> way.  For these we might return a positive value rather than a negative 
> error code.  For example, calling pm_runtime_resume while the device is 
> already active shouldn't be considered an error.  But it can't be 
> considered a complete success either, because it won't invoke the 
> runtime_resume method.

That need not matter from the caller's point of view, though.  In the case of
pm_runtime_resume() the caller will probably be mostly interested whether or
not it can do I/O after the function has returned.

> To be determined: How runtime PM will interact with system sleep.

Yes.  My first idea was to disable run-time PM before entering a system sleep
state, but that would involve canceling all of the pending requests.

> About all I can add is the "New requests override previous requests"  
> policy.  This would apply to all the non-synchronous requests, whether
> they are delayed or added directly to the workqueue.  If a new request
> (synchronous or not) is received before the old one has started to run,
> the old one will be cancelled.  This holds even if the new request is
> redundant, like a resume request received while the device is active.
> 
> There is one exception to this rule: An idle_notify request does not 
> cancel a delayed or queued suspend request.

I'm not sure if such a rigid rule will be really useful.

Also, as I said above, I think we shouldn't regard setting up the suspend
timer as queuing up a request, but as a totally separate operation.

Best,
Rafael

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [RFC] Run-time PM framework (was: Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6))
  2009-07-01 15:35                                                           ` Alan Stern
  2009-07-01 22:19                                                             ` Rafael J. Wysocki
@ 2009-07-01 22:19                                                             ` Rafael J. Wysocki
  1 sibling, 0 replies; 102+ messages in thread
From: Rafael J. Wysocki @ 2009-07-01 22:19 UTC (permalink / raw)
  To: Alan Stern
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Wednesday 01 July 2009, Alan Stern wrote:
> On Wed, 1 Jul 2009, Rafael J. Wysocki wrote:
> 
> > On Tuesday 30 June 2009, Alan Stern wrote:
> > ... 
> > > That's enough to give you the general idea.  I think this design is 
> > > a lot cleaner than the current one.
> > 
> > Well, I'm not really happy with starting over, but if you think we should do
> > that, then let's do it.
> 
> It's not a complete restart.  Much of the existing interface and quite
> a bit of code would remain the same.
> 
> > I think we both agree that the callbacks, ->runtime_idle(), ->runtime_suspend()
> > and ->runtime_resume() make sense.  Now, the role of the framework, IMO, is to
> > provide a mechanism by which it is possible:
> > (1) to schedule a delayed execution of ->runtime_suspend(), possibly from
> >     interrupt context,
> > (2) to schedule execution of ->runtime_resume() or ->runtime_idle(), possibly
> >     from interrupt context,
> > (3) to execute ->runtime_suspend() or ->runtime_resume() directly in a
> >     synchronous way (I'm not sure about ->runtime_idle())
> 
> Yes, runtime_idle also, for drivers that require minimal overhead.
> 
> > _and_ to ensure that these callbacks will be executed when it makes sense.
> 
> Thus if the situation changes before the callback can be made, so that
> it no longer makes sense, the framework should cancel the callback.

Yes, but there's one thing to consider.  Suppose a remote wake-up causes a
resume request to be queued up and pm_runtime_resume() is called synchronously
exactly at the time the request's work function is started.  There are two
attempts to resume in progress, but only one of them can call
->runtime_resume(), so what's the other one supposed to do?  The asynchronous
one can just return error code, but the the caller of the synchronous
pm_runtime_resume() must know whether or not the resume was successful.
So, perhaps, if the synchronous resume happens to lose the race, it should
wait for the other one to complete, check the device's status and return 0 if
it's active?  That wouldn't cause the workqueue thread to wait.

> > There's no other point, because the core has no information to make choices,
> > it can only prevent wrong things from happening, if possible.
> 
> Exactly.
> 
> > I think you will agree that the users of the framework should be able to
> > prevent ->runtime_suspend() from being called and that's what the usage counter
> > is for.  Also, IMO it should be impossible to execute ->runtime_idle(), via the
> > framework, when the usage counter is nonzero.
> 
> Right, because then by definition the device is in use so it can't be 
> idle.
> 
> > BTW, I don't think resume_count is the best name; it used to be in the version
> > of my patch where it was automatically incremented when ->runtime_resume() was
> > about to called.  usage_count is probably better.
> 
> Fine.
> 
> > Next, I think that the framework should refuse to call ->runtime_suspend() and
> > ->runtime_idle() if the children of the device are not suspended and the
> > "ignore children" flag is unset.
> 
> Yes; this is part of the "makes sense" requirement.
> 
> >  The counter of unsuspended children is used
> > for that.  I think the rule should be that it is decremented for the parent
> > whenever ->runtime_suspend() is called for a child and it is incremented
> > for the parent whenever ->runtime_resume() is called for a child.
> 
> Of course.  (Minor change: decremented when runtime_suspend _succeeds_ 
> for a child.)
> 
> > Now, the question is what rules should apply to the ordering and possible
> > simultaneous execution of ->runtime_idle(), ->runtime_suspend() and
> > ->runtime_resume().  I think the following rules make sense:
> 
> Oh dear.  I wouldn't attempt to make a complete list of all possible 
> interactions.  It's too hard to know whether you have really covered 
> all the cases.
> 
> >   * It is forbidden to run ->runtime_suspend() twice in a row.
> > 
> >   * It is forbidden to run ->runtime_suspend() in parallel with another instance
> >     of ->runtime_suspend().
> > 
> >   * It is forbidden to run ->runtime_resume() twice in a row.
> > 
> >   * It is forbidden to run ->runtime_resume() in parallel with another instance
> >     of ->runtime_resume().
> > 
> >   * It is allowed to run ->runtime_suspend() after ->runtime_resume() or after
> >     ->runtime_idle(), but the latter case is preferred. 
> > 
> >   * It is allowed to run ->runtime_resume() after ->runtime_suspend().
> > 
> >   * It is forbidden to run ->runtime_resume() after ->runtime_idle().
> > 
> >   * It is forbidden to run ->runtime_suspend() and ->runtime_resume() in
> >     parallel with each other.
> > 
> >   * It is forbidden to run ->runtime_idle() twice in a row.
> > 
> >   * It is forbidden to run ->runtime_idle() in parallel with another instance
> >     of ->runtime_idle().
> > 
> >   * It is forbidden to run ->runtime_idle() after ->runtime_suspend().
> > 
> >   * It is allowed to run ->runtime_idle() after ->runtime_resume().
> > 
> >   * It is allowed to execute ->runtime_suspend() or ->runtime_resume() when
> >     ->runtime_idle() is running.  In particular, it is allowed to (indirectly)
> >     call ->runtime_suspend() from within ->runtime_idle().
> > 
> >   * It is forbidden to execute ->runtime_idle() when ->runtime_resume() or
> >     ->runtime_suspend() is running.
> 
> We can summarize these rules as follows:
> 
> 	Never allow more than one callback at a time, except that
> 	runtime_suspend may be invoked while runtime_idle is running.

Caution here.  If ->runtime_idle() runs ->runtime_suspend() and immediately
after that resume is requested by remote wake-up, ->runtime_resume() may also
be run while ->runtime_idle() is still running.

OTOH, we need to know when ->runtime_idle() has completed, because we have to
ensure it won't still be running after run-time PM has been disabled for the
device.

IMO, we need two flags, one indicating that either ->runtime_suspend(), or
->runtime_resume() is being executed (they are mutually exclusive) and the
the other one indicating that ->runtime_idle() is being executed.  For the
purpose of further discussion below I'll call them RPM_IDLE_RUNNING and
RPM_IN_TRANSITION.

With this notation, the above rule may be translated as:

    Don't run any of the callbacks if RPM_IN_TRANSITION is set.  Don't run
    ->runtime_idle() if RPM_IDLE_RUNNING is set.

Which implies that RPM_IDLE_RUNNING cannot be set when RPM_IN_TRANSITION is
set, but it is valid to set RPM_IN_TRANSITION when RPM_IDLE_RUNNING is set.

> 	Don't call runtime_resume while the device is active.
> 
> 	Don't call runtime_suspend or runtime_idle while the device
> 	is suspended.
> 
> 	Don't invoke any callbacks if the device state is unknown
> 	(RPM_ERROR).
> 
> Implicit is the notion that the device is suspended when 
> runtime_suspend returns successfully, it is active when runtime_resume 
> returns successfully, and it is unknown when either returns an error.

Yes.

There are two possible "final" states, so I'd use one flag to indicate the
current status.  Let's call it RPM_SUSPENDED for now (which means that the
device is suspended when it's set and active otherwise) and I think we can make
the rule that this flag is only changed after successful execution of
->runtime_suspend() or ->runtime_resume().

Whether the device is suspending or resuming follows from the values of
RPM_SUSPENDED and RPM_IN_TRANSITION.

> >   * If ->runtime_resume() is about to be called immediately after
> >     ->runtime_suspend(), the execution of ->runtime_suspend() should be
> >     prevented from happening, if possible, in which case the execution of
> >     ->runtime_resume() shouldn't happen.
> > 
> >   * If ->runtime_suspend() is about to be called immediately after
> >     ->runtime_resume(), the execution of ->runtime_resume() should be
> >     prevented from happening, if possible, in which case the execution of
> >     ->runtime_suspend() shouldn't happen.
> 
> These could be considered optional optimizations.  Or if you prefer, 
> they could be covered by a "New requests override previous requests" 
> rule.

I'm not sure if I agree with this rule yet.

> > [Are there any more rules related to these callbacks we should take into
> > account?]
> 
> 	Runtime PM callbacks are mutually exclusive with other driver
> 	core callbacks (probe, remove, dev_pm_ops, etc.).

OK
 
> 	If a callback occurs asynchronously then it will be invoked
> 	in process context.  If it occurs as part of a synchronous
> 	request then it is invoked in the caller's context.
> 
> Related to this is the requirement that pm_runtime_idle,
> pm_runtime_suspend, and pm_runtime_resume must always be called in
> process context whereas pm_runtime_idle_atomic,
> pm_runtime_suspend_atomic, and pm_runtime_resume_atomic may be called
> in any context.

OK

> > Next, if we agree about the rules above, the question is what helper functions
> > should be provided by the core allowing these rules to be followed
> > automatically and what error codes should be returned by them in case it
> > wasn't possible to proceed without breaking the rules.
> > 
> > IMO, it is reasonable to provide:
> > 
> >   * pm_schedule_suspend(dev, delay) - schedule the execution of
> >     ->runtime_suspend(dev) after delay.
> > 
> >   * pm_runtime_suspend(dev) - execute ->runtime_suspend(dev) right now.
> > 
> >   * pm_runtime_resume(dev) - execute ->runtime_resume(dev) right now.
> > 
> >   * pm_request_resume(dev) - put a request to execute ->runtime_resume(dev)
> >     into the run-time PM workqueue.
> > 
> >   * pm_runtime_get(dev) - increment the device's usage counter.
> > 
> >   * pm_runtime_put(dev) - decrement the device's usage counter.
> > 
> >   * pm_runtime_idle(dev) - execute ->runtime_idle(dev) right now if the usage
> >     counter is zero and all of the device's children are suspended (or the
> >     "ignore children" flag is set).
> > 
> >   * pm_request_idle(dev) - put a request to execute ->runtime_idle(dev)
> >     into the run-time PM workqueue.  The usage counter and children will be
> >     checked immediately before executing ->runtime_idle(dev).
> 
> Should the counters also be checked when the request is submitted?  
> And should the same go for pm_schedule_suspend?  These are nontrivial
> questions; good arguments can be made both ways.

That's the difficult part. :-)

First, I think a delayed suspend should be treated in a special way, because
it's not really a request to suspend.  Namely, as long as the timer hasn't
triggered yet, nothing happens and there's nothing against the rules above.
A request to suspend is queued up after the timer has triggered and the timer
function is where the rules come into play.  IOW, it consists of two
operations, setting up a timer and queuing up a request to suspend when the
timer triggers.  IMO the first of them can be done at any time, while the other
one may be affected by the rules.

It implies that we should really introduce a timer and a timer function that
will queue up suspend requests, instead of using struct delayed_work.

Second, I think it may be a good idea to use the usage counter to block further
requests while submitting a resume request.

Namely, suppose that pm_request_resume() increments usage_count and returns 0,
if the resume was not necessary and the caller can do the I/O by itself, or
error code, which means that it was necessary to queue up a resume request.
If 0 is returned, the caller is supposed to do the I/O and call
pm_runtime_put() when done.  Otherwise it just quits and ->runtime_resume() is
supposed to take care of the I/O, in which case the request's work function
should call pm_runtime_put() when done.  [If it was impossible to queue up a
request, error code is returned, but the usage counter is decremented by
pm_request_resume(), so that the caller need not handle that special case,
hopefully rare.]

This implies that it may be a good idea to check usage_count when submitting
idle notification and suspend requests (where in case of suspend a request is
submitted by the timer function, when the timer has already triggered, so
there's no need to check the counter while setting up the timer).

The counter of unsuspended children may change after a request has been
submitted and before its work function has a chance to run, so I don't see much
point checking it when submitting requests.

So, if the above idea is adopted, idle notification and suspend requests
won't be queued up when a resume request is pending (there's the question what
the timer function attempting to queue up a suspend request is supposed to do
in such a case) and in the other cases we can use the following rules:

    Any pending request takes precedence over a new idle notification request.

    If a new request is not an idle notification request, it takes precedence
    over the pending one, so it cancels it with the help of cancel_work().

[In the latter case, if a suspend request is canceled, we may want to set up the
timer for another one.]  For that, we're going to need a single flag, say
RPM_PENDING, which is set whenever a request is queued up.

> > I'm not sure if it is really necessary to combine pm_runtime_idle() or
> > pm_request_idle() with pm_runtime_put().  At least right now I don't see any
> > real value of that.
> 
> Likewise combining pm_runtime_get with pm_runtime_resume.  The only
> value is to make things easier for drivers, because these will be very
> common idioms.
> 
> > I also am not sure what error codes should be returned by the above helper
> > functions and in what conditions.
> 
> The error codes you have been using seem okay to me, in general.
> 
> However, some of those requests would violate the rules in a trivial 
> way.  For these we might return a positive value rather than a negative 
> error code.  For example, calling pm_runtime_resume while the device is 
> already active shouldn't be considered an error.  But it can't be 
> considered a complete success either, because it won't invoke the 
> runtime_resume method.

That need not matter from the caller's point of view, though.  In the case of
pm_runtime_resume() the caller will probably be mostly interested whether or
not it can do I/O after the function has returned.

> To be determined: How runtime PM will interact with system sleep.

Yes.  My first idea was to disable run-time PM before entering a system sleep
state, but that would involve canceling all of the pending requests.

> About all I can add is the "New requests override previous requests"  
> policy.  This would apply to all the non-synchronous requests, whether
> they are delayed or added directly to the workqueue.  If a new request
> (synchronous or not) is received before the old one has started to run,
> the old one will be cancelled.  This holds even if the new request is
> redundant, like a resume request received while the device is active.
> 
> There is one exception to this rule: An idle_notify request does not 
> cancel a delayed or queued suspend request.

I'm not sure if such a rigid rule will be really useful.

Also, as I said above, I think we shouldn't regard setting up the suspend
timer as queuing up a request, but as a totally separate operation.

Best,
Rafael

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [RFC] Run-time PM framework (was: Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6))
  2009-07-01 22:19                                                             ` Rafael J. Wysocki
  2009-07-02 15:42                                                               ` Rafael J. Wysocki
@ 2009-07-02 15:42                                                               ` Rafael J. Wysocki
  2009-07-02 15:55                                                               ` Alan Stern
  2009-07-02 15:55                                                               ` Alan Stern
  3 siblings, 0 replies; 102+ messages in thread
From: Rafael J. Wysocki @ 2009-07-02 15:42 UTC (permalink / raw)
  To: Alan Stern
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Thursday 02 July 2009, Rafael J. Wysocki wrote:
> On Wednesday 01 July 2009, Alan Stern wrote:
> > On Wed, 1 Jul 2009, Rafael J. Wysocki wrote:
...
> > Should the counters also be checked when the request is submitted?  
> > And should the same go for pm_schedule_suspend?  These are nontrivial
> > questions; good arguments can be made both ways.
> 
> That's the difficult part. :-)
> 
> First, I think a delayed suspend should be treated in a special way, because
> it's not really a request to suspend.  Namely, as long as the timer hasn't
> triggered yet, nothing happens and there's nothing against the rules above.
> A request to suspend is queued up after the timer has triggered and the timer
> function is where the rules come into play.  IOW, it consists of two
> operations, setting up a timer and queuing up a request to suspend when the
> timer triggers.  IMO the first of them can be done at any time, while the other
> one may be affected by the rules.
> 
> It implies that we should really introduce a timer and a timer function that
> will queue up suspend requests, instead of using struct delayed_work.
> 
> Second, I think it may be a good idea to use the usage counter to block further
> requests while submitting a resume request.
> 
> Namely, suppose that pm_request_resume() increments usage_count and returns 0,
> if the resume was not necessary and the caller can do the I/O by itself, or
> error code, which means that it was necessary to queue up a resume request.
> If 0 is returned, the caller is supposed to do the I/O and call
> pm_runtime_put() when done.  Otherwise it just quits and ->runtime_resume() is
> supposed to take care of the I/O, in which case the request's work function
> should call pm_runtime_put() when done.  [If it was impossible to queue up a
> request, error code is returned, but the usage counter is decremented by
> pm_request_resume(), so that the caller need not handle that special case,
> hopefully rare.]
> 
> This implies that it may be a good idea to check usage_count when submitting
> idle notification and suspend requests (where in case of suspend a request is
> submitted by the timer function, when the timer has already triggered, so
> there's no need to check the counter while setting up the timer).
> 
> The counter of unsuspended children may change after a request has been
> submitted and before its work function has a chance to run, so I don't see much
> point checking it when submitting requests.
> 
> So, if the above idea is adopted, idle notification and suspend requests
> won't be queued up when a resume request is pending (there's the question what
> the timer function attempting to queue up a suspend request is supposed to do
> in such a case) and in the other cases we can use the following rules:
>
>     Any pending request takes precedence over a new idle notification request.
> 
>     If a new request is not an idle notification request, it takes precedence
>     over the pending one, so it cancels it with the help of cancel_work().
> 
> [In the latter case, if a suspend request is canceled, we may want to set up the
> timer for another one.]  For that, we're going to need a single flag, say
> RPM_PENDING, which is set whenever a request is queued up.

After some reconsideration I'd like to change that in the following way:

     Any pending request takes precedence over a new idle notification request.

     A pending request takes precedence over a new request of the same type.

    If the new request is not an idle notification request and is not of the
    same type as the pending one, it takes precedence over the pending one, so
    it cancels the pending request with the help of cancel_work().

So, instead of a single flag, I'd like to use a 2-bit field to store
information about pending requests, where the 4 values are RPM_REQ_NONE,
RPM_REQ_IDLE, RPM_REQ_SUSPEND, RPM_REQ_RESUME.

Also, IMO it makes sense to queue up an idle notification or suspend request
regardless of the current status of the device, as long as the usage counter is
greater than 0, because the status can always change after the request has been
submitted and before its work function is executed.

So, I think we can use something like this:

struct dev_pm_info {
	pm_message_t		power_state;
	unsigned			can_wakeup:1;
	unsigned			should_wakeup:1;
	enum dpm_state		status;		/* Owned by the PM core */
#ifdef CONFIG_PM_SLEEP
	struct list_head	entry;
#endif
#ifdef CONFIG_PM_RUNTIME
	struct timer_list	suspend_timer;
	wait_queue_head_t	wait_queue;
	struct work_struct	work;
	spinlock_t		lock;
	atomic_t		usage_count;
	atomic_t		child_count;
	unsigned int		ignore_children:1;
	unsigned int		enabled:1; /* 'true' if run-time PM is enabled */
	unsigned int		idle_notification:1; /* 'true' if ->runtime_idle() is running */
	unsigned int		in_transition:1; /* 'true' if ->runtime_[suspend|resume]() is running */
	unsigned int		suspended:1; /* 'true' if current status is 'suspended' */
	unsigned int		pending_request:2; /* RPM_REQ_NONE, RPM_REQ_IDLE, etc. */
	unsigned int		runtime_error:1; /* 'ture' if the last transition failed */
	int			error_code; /* Error code returned by the last executed callback */
#endif
};

with the following rules regarding the (most important) helper functions:

  pm_schedule_suspend(dev, delay) is always successful.  It adds a new timer
  with pm_request_suspend() as the timer function, dev as the data and
  jiffies + delay as the expiration time.  If the timer is pending when this
  function is called, the timer is deactivated using del_timer() and replaced
  by the new timer.

  pm_request_suspend() checks if 'usage_count' is zero and returns -EAGAIN
  if not.  Next, it checks if 'pending_request' is RPM_REQ_SUSPEND and returns
  -EALREADY in that case.  Next, if 'pending_request' is RPM_REQ_IDLE,
  the request is cancelled.  Finally, a new suspend request is submitted.

  pm_runtime_suspend() checks if 'usage_count' is zero and returns -EAGAIN
  if not.  Next, it checks 'in_transition' and 'suspended' and returns 0 if the
  former is unset and the latter is set.  If 'in_transition' is set and 'suspended'
  is not set (the device is currently suspending), the behavior depends on
  whether or not the function was called synchronously, in which case it waits
  for the other suspend to finish.  If it was called via the workqueue,
  -EINPROGRESS is returned.  Next, 'in_transition' is set, ->runtime_suspend()
  is executed amd 'in_transition' is unset.  If ->runtime_suspend() returned 0,
  'suspended' is set and 0 is returned.  Otherwise, if the error code was
  -EAGAIN or -EBUSY, 'suspended' is not set and the error code is returned.
  Otherwise, 'runtime_error' is set and the error code is returned ('suspended'
  is not set).

  pm_request_resume() increments 'usage_count' and checks 'suspended' and
  'in_transition'.  If both 'suspended' and 'in_transition" are not set, 0 is
  returned and the caller is supposed to decrement 'usage_count', with the
  help of pm_runtime_put().  Otherwise, the function checks if
  'pending_request' is different from zero, in which case the pending request
  is canceled.  Finally, a new resume request is submitted and -EBUSY is
  returned.  In that case, 'usage_count' will be decremented by the request's
  work function (not by pm_runtime_resume(), but by the wrapper function that
  calls it).

  pm_runtime_resume() increments 'usage_count' and checks 'in_transition' and
  'suspended'.  If both are unset, 0 is returned.  If both are set (the device
  is resuming) the behavior depends on whether or not the function was called
  synchronously, in which case it waits for the concurrent resume to complete,
  while it immediately returns -EINPROGRESS in the other case.  If 'suspended'
  is not set, but 'in_transision' is set (the device is suspending), the
  function waits for the suspend to complete and starts over.  Next,
  'in_transition' is set, ->runtime_resume() is executed and 'in_transition'
  is unset.  If ->runtime_resume() returned 0, 'suspended' is unset and 0 is
  returned.  Otherwise, 'runtime_error' is set and the error code from
  ->runtime_resume() is returned ('suspended' is not unset).  'usage_count' is
  always decremented before return, regardless of the return value.

  pm_request_idle() checks 'usage_count' and returns -EAGAIN if it's greater
  than 0.  Next, it checks 'pending_request' and immediately returns -EBUSY, if
  it's different from RPM_REQ_NONE and RPM_REQ_IDLE, or -EALREADY, if it's
  equal to RPM_REQ_IDLE.  Finally, new idle notification request is submitted.

  pm_runtime_idle() checks 'usage_count' and returns -EAGAIN if it's greater
  than 0.  Next, it checks 'suspended' and 'in_transition' and returns -EBUSY
  if any of them is set.  Next, it checks 'idle_notification' and returns
  -EINPROGRESS is it's set.  Finally, 'idle_notification' is set,
  ->runtime_idle()  is executed and 'idle_notification' is unset.

Additionally, all of the helper functions return -EINVAL immediately if
'runtime_error' is set.

Best,
Rafael

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [RFC] Run-time PM framework (was: Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6))
  2009-07-01 22:19                                                             ` Rafael J. Wysocki
@ 2009-07-02 15:42                                                               ` Rafael J. Wysocki
  2009-07-02 15:42                                                               ` Rafael J. Wysocki
                                                                                 ` (2 subsequent siblings)
  3 siblings, 0 replies; 102+ messages in thread
From: Rafael J. Wysocki @ 2009-07-02 15:42 UTC (permalink / raw)
  To: Alan Stern
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Thursday 02 July 2009, Rafael J. Wysocki wrote:
> On Wednesday 01 July 2009, Alan Stern wrote:
> > On Wed, 1 Jul 2009, Rafael J. Wysocki wrote:
...
> > Should the counters also be checked when the request is submitted?  
> > And should the same go for pm_schedule_suspend?  These are nontrivial
> > questions; good arguments can be made both ways.
> 
> That's the difficult part. :-)
> 
> First, I think a delayed suspend should be treated in a special way, because
> it's not really a request to suspend.  Namely, as long as the timer hasn't
> triggered yet, nothing happens and there's nothing against the rules above.
> A request to suspend is queued up after the timer has triggered and the timer
> function is where the rules come into play.  IOW, it consists of two
> operations, setting up a timer and queuing up a request to suspend when the
> timer triggers.  IMO the first of them can be done at any time, while the other
> one may be affected by the rules.
> 
> It implies that we should really introduce a timer and a timer function that
> will queue up suspend requests, instead of using struct delayed_work.
> 
> Second, I think it may be a good idea to use the usage counter to block further
> requests while submitting a resume request.
> 
> Namely, suppose that pm_request_resume() increments usage_count and returns 0,
> if the resume was not necessary and the caller can do the I/O by itself, or
> error code, which means that it was necessary to queue up a resume request.
> If 0 is returned, the caller is supposed to do the I/O and call
> pm_runtime_put() when done.  Otherwise it just quits and ->runtime_resume() is
> supposed to take care of the I/O, in which case the request's work function
> should call pm_runtime_put() when done.  [If it was impossible to queue up a
> request, error code is returned, but the usage counter is decremented by
> pm_request_resume(), so that the caller need not handle that special case,
> hopefully rare.]
> 
> This implies that it may be a good idea to check usage_count when submitting
> idle notification and suspend requests (where in case of suspend a request is
> submitted by the timer function, when the timer has already triggered, so
> there's no need to check the counter while setting up the timer).
> 
> The counter of unsuspended children may change after a request has been
> submitted and before its work function has a chance to run, so I don't see much
> point checking it when submitting requests.
> 
> So, if the above idea is adopted, idle notification and suspend requests
> won't be queued up when a resume request is pending (there's the question what
> the timer function attempting to queue up a suspend request is supposed to do
> in such a case) and in the other cases we can use the following rules:
>
>     Any pending request takes precedence over a new idle notification request.
> 
>     If a new request is not an idle notification request, it takes precedence
>     over the pending one, so it cancels it with the help of cancel_work().
> 
> [In the latter case, if a suspend request is canceled, we may want to set up the
> timer for another one.]  For that, we're going to need a single flag, say
> RPM_PENDING, which is set whenever a request is queued up.

After some reconsideration I'd like to change that in the following way:

     Any pending request takes precedence over a new idle notification request.

     A pending request takes precedence over a new request of the same type.

    If the new request is not an idle notification request and is not of the
    same type as the pending one, it takes precedence over the pending one, so
    it cancels the pending request with the help of cancel_work().

So, instead of a single flag, I'd like to use a 2-bit field to store
information about pending requests, where the 4 values are RPM_REQ_NONE,
RPM_REQ_IDLE, RPM_REQ_SUSPEND, RPM_REQ_RESUME.

Also, IMO it makes sense to queue up an idle notification or suspend request
regardless of the current status of the device, as long as the usage counter is
greater than 0, because the status can always change after the request has been
submitted and before its work function is executed.

So, I think we can use something like this:

struct dev_pm_info {
	pm_message_t		power_state;
	unsigned			can_wakeup:1;
	unsigned			should_wakeup:1;
	enum dpm_state		status;		/* Owned by the PM core */
#ifdef CONFIG_PM_SLEEP
	struct list_head	entry;
#endif
#ifdef CONFIG_PM_RUNTIME
	struct timer_list	suspend_timer;
	wait_queue_head_t	wait_queue;
	struct work_struct	work;
	spinlock_t		lock;
	atomic_t		usage_count;
	atomic_t		child_count;
	unsigned int		ignore_children:1;
	unsigned int		enabled:1; /* 'true' if run-time PM is enabled */
	unsigned int		idle_notification:1; /* 'true' if ->runtime_idle() is running */
	unsigned int		in_transition:1; /* 'true' if ->runtime_[suspend|resume]() is running */
	unsigned int		suspended:1; /* 'true' if current status is 'suspended' */
	unsigned int		pending_request:2; /* RPM_REQ_NONE, RPM_REQ_IDLE, etc. */
	unsigned int		runtime_error:1; /* 'ture' if the last transition failed */
	int			error_code; /* Error code returned by the last executed callback */
#endif
};

with the following rules regarding the (most important) helper functions:

  pm_schedule_suspend(dev, delay) is always successful.  It adds a new timer
  with pm_request_suspend() as the timer function, dev as the data and
  jiffies + delay as the expiration time.  If the timer is pending when this
  function is called, the timer is deactivated using del_timer() and replaced
  by the new timer.

  pm_request_suspend() checks if 'usage_count' is zero and returns -EAGAIN
  if not.  Next, it checks if 'pending_request' is RPM_REQ_SUSPEND and returns
  -EALREADY in that case.  Next, if 'pending_request' is RPM_REQ_IDLE,
  the request is cancelled.  Finally, a new suspend request is submitted.

  pm_runtime_suspend() checks if 'usage_count' is zero and returns -EAGAIN
  if not.  Next, it checks 'in_transition' and 'suspended' and returns 0 if the
  former is unset and the latter is set.  If 'in_transition' is set and 'suspended'
  is not set (the device is currently suspending), the behavior depends on
  whether or not the function was called synchronously, in which case it waits
  for the other suspend to finish.  If it was called via the workqueue,
  -EINPROGRESS is returned.  Next, 'in_transition' is set, ->runtime_suspend()
  is executed amd 'in_transition' is unset.  If ->runtime_suspend() returned 0,
  'suspended' is set and 0 is returned.  Otherwise, if the error code was
  -EAGAIN or -EBUSY, 'suspended' is not set and the error code is returned.
  Otherwise, 'runtime_error' is set and the error code is returned ('suspended'
  is not set).

  pm_request_resume() increments 'usage_count' and checks 'suspended' and
  'in_transition'.  If both 'suspended' and 'in_transition" are not set, 0 is
  returned and the caller is supposed to decrement 'usage_count', with the
  help of pm_runtime_put().  Otherwise, the function checks if
  'pending_request' is different from zero, in which case the pending request
  is canceled.  Finally, a new resume request is submitted and -EBUSY is
  returned.  In that case, 'usage_count' will be decremented by the request's
  work function (not by pm_runtime_resume(), but by the wrapper function that
  calls it).

  pm_runtime_resume() increments 'usage_count' and checks 'in_transition' and
  'suspended'.  If both are unset, 0 is returned.  If both are set (the device
  is resuming) the behavior depends on whether or not the function was called
  synchronously, in which case it waits for the concurrent resume to complete,
  while it immediately returns -EINPROGRESS in the other case.  If 'suspended'
  is not set, but 'in_transision' is set (the device is suspending), the
  function waits for the suspend to complete and starts over.  Next,
  'in_transition' is set, ->runtime_resume() is executed and 'in_transition'
  is unset.  If ->runtime_resume() returned 0, 'suspended' is unset and 0 is
  returned.  Otherwise, 'runtime_error' is set and the error code from
  ->runtime_resume() is returned ('suspended' is not unset).  'usage_count' is
  always decremented before return, regardless of the return value.

  pm_request_idle() checks 'usage_count' and returns -EAGAIN if it's greater
  than 0.  Next, it checks 'pending_request' and immediately returns -EBUSY, if
  it's different from RPM_REQ_NONE and RPM_REQ_IDLE, or -EALREADY, if it's
  equal to RPM_REQ_IDLE.  Finally, new idle notification request is submitted.

  pm_runtime_idle() checks 'usage_count' and returns -EAGAIN if it's greater
  than 0.  Next, it checks 'suspended' and 'in_transition' and returns -EBUSY
  if any of them is set.  Next, it checks 'idle_notification' and returns
  -EINPROGRESS is it's set.  Finally, 'idle_notification' is set,
  ->runtime_idle()  is executed and 'idle_notification' is unset.

Additionally, all of the helper functions return -EINVAL immediately if
'runtime_error' is set.

Best,
Rafael

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [RFC] Run-time PM framework (was: Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6))
  2009-07-01 22:19                                                             ` Rafael J. Wysocki
                                                                                 ` (2 preceding siblings ...)
  2009-07-02 15:55                                                               ` Alan Stern
@ 2009-07-02 15:55                                                               ` Alan Stern
  2009-07-02 17:50                                                                 ` Rafael J. Wysocki
  2009-07-02 17:50                                                                 ` Rafael J. Wysocki
  3 siblings, 2 replies; 102+ messages in thread
From: Alan Stern @ 2009-07-02 15:55 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Thu, 2 Jul 2009, Rafael J. Wysocki wrote:

> > > _and_ to ensure that these callbacks will be executed when it makes sense.
> > 
> > Thus if the situation changes before the callback can be made, so that
> > it no longer makes sense, the framework should cancel the callback.
> 
> Yes, but there's one thing to consider.  Suppose a remote wake-up causes a
> resume request to be queued up and pm_runtime_resume() is called synchronously
> exactly at the time the request's work function is started.  There are two
> attempts to resume in progress, but only one of them can call
> ->runtime_resume(), so what's the other one supposed to do?  The asynchronous
> one can just return error code, but the the caller of the synchronous
> pm_runtime_resume() must know whether or not the resume was successful.
> So, perhaps, if the synchronous resume happens to lose the race, it should
> wait for the other one to complete, check the device's status and return 0 if
> it's active?  That wouldn't cause the workqueue thread to wait.

I didn't address this explicitly in the previous message, but yes.  
This is no different from the way your current version works.

Similarly, if a synchronous resume call occurs while a suspend is in 
progress, it should wait until the suspend finishes and then carry out 
a resume.

> > We can summarize these rules as follows:
> > 
> > 	Never allow more than one callback at a time, except that
> > 	runtime_suspend may be invoked while runtime_idle is running.
> 
> Caution here.  If ->runtime_idle() runs ->runtime_suspend() and immediately
> after that resume is requested by remote wake-up, ->runtime_resume() may also
> be run while ->runtime_idle() is still running.

Yes, I didn't think of that case.  We have to allow either of the other 
two to be invoked while runtime_idle is running.  But we can rule out 
calling runtime_idle recursively.

> OTOH, we need to know when ->runtime_idle() has completed, because we have to
> ensure it won't still be running after run-time PM has been disabled for the
> device.
> 
> IMO, we need two flags, one indicating that either ->runtime_suspend(), or
> ->runtime_resume() is being executed (they are mutually exclusive) and the
> the other one indicating that ->runtime_idle() is being executed.  For the
> purpose of further discussion below I'll call them RPM_IDLE_RUNNING and
> RPM_IN_TRANSITION.

The RPM_IN_TRANSITION flag is unnecessary.  It would always be equal to
(status == RPM_SUSPENDING || status == RPM_RESUMING).

> With this notation, the above rule may be translated as:
> 
>     Don't run any of the callbacks if RPM_IN_TRANSITION is set.  Don't run
>     ->runtime_idle() if RPM_IDLE_RUNNING is set.
> 
> Which implies that RPM_IDLE_RUNNING cannot be set when RPM_IN_TRANSITION is
> set, but it is valid to set RPM_IN_TRANSITION when RPM_IDLE_RUNNING is set.

That is equivalent to my conclusion above.

> There are two possible "final" states, so I'd use one flag to indicate the
> current status.  Let's call it RPM_SUSPENDED for now (which means that the
> device is suspended when it's set and active otherwise) and I think we can make
> the rule that this flag is only changed after successful execution of
> ->runtime_suspend() or ->runtime_resume().
> 
> Whether the device is suspending or resuming follows from the values of
> RPM_SUSPENDED and RPM_IN_TRANSITION.

You can use two single-bit flags (SUSPEND and IN_TRANSITION) or a 
single two-bit state value (ACTIVE, SUSPENDING, SUSPENDED, RESUMING).  
It doesn't make much difference which you choose.


> > Should the counters also be checked when the request is submitted?  
> > And should the same go for pm_schedule_suspend?  These are nontrivial
> > questions; good arguments can be made both ways.
> 
> That's the difficult part. :-)
> 
> First, I think a delayed suspend should be treated in a special way, because
> it's not really a request to suspend.  Namely, as long as the timer hasn't
> triggered yet, nothing happens and there's nothing against the rules above.
> A request to suspend is queued up after the timer has triggered and the timer
> function is where the rules come into play.  IOW, it consists of two
> operations, setting up a timer and queuing up a request to suspend when the
> timer triggers.  IMO the first of them can be done at any time, while the other
> one may be affected by the rules.

I don't agree.  For example, suppose the device has an active child
when the driver says: Suspend it in 30 seconds.  If the child is then
removed after only 10 seconds, does it make sense to go ahead with
suspending the parent 20 seconds later?  No -- if the parent is going
to be suspended, the decision as to when should be made at the time the
child is removed, not beforehand.

(Even more concretely, suppose there is a 30-second inactivity timeout
for autosuspend.  Removing the child counts as activity and so should
restart the timer.)

To put it another way, suppose you accept a delayed request under
inappropriate conditions.  If the conditions don't change, the whole
thing was a waste of effort.  And if the conditions do change, then the
whole delayed request should be reconsidered anyhow.  So why accept it?

> It implies that we should really introduce a timer and a timer function that
> will queue up suspend requests, instead of using struct delayed_work.

Yes, this was part of my proposal.

> Second, I think it may be a good idea to use the usage counter to block further
> requests while submitting a resume request.
> 
> Namely, suppose that pm_request_resume() increments usage_count and returns 0,
> if the resume was not necessary and the caller can do the I/O by itself, or
> error code, which means that it was necessary to queue up a resume request.
> If 0 is returned, the caller is supposed to do the I/O and call
> pm_runtime_put() when done.  Otherwise it just quits and ->runtime_resume() is
> supposed to take care of the I/O, in which case the request's work function
> should call pm_runtime_put() when done.  [If it was impossible to queue up a
> request, error code is returned, but the usage counter is decremented by
> pm_request_resume(), so that the caller need not handle that special case,
> hopefully rare.]

Trying to keep track of reasons for incrementing and decrementing 
usage_count is very difficult to do in the core.  What happens if 
pm_request_resume increments the count but then the driver calls 
pm_runtime_get, pm_runtime_resume, pm_runtime_put all before the work 
routine can run?

It's better to make the driver responsible for maintaining the counter
value.  Forcing the driver to do pm_runtime_get, pm_request_resume is
better than having the core automatically change the counter.

> This implies that it may be a good idea to check usage_count when submitting
> idle notification and suspend requests (where in case of suspend a request is
> submitted by the timer function, when the timer has already triggered, so
> there's no need to check the counter while setting up the timer).
> 
> The counter of unsuspended children may change after a request has been
> submitted and before its work function has a chance to run, so I don't see much
> point checking it when submitting requests.

As I said above, if the counters don't change then the submission was 
unnecessary, and if they do change then the submission should be 
reconsidered.  Therefore they _should_ be checked in submissions.

> So, if the above idea is adopted, idle notification and suspend requests
> won't be queued up when a resume request is pending (there's the question what
> the timer function attempting to queue up a suspend request is supposed to do
> in such a case) and in the other cases we can use the following rules:
> 
>     Any pending request takes precedence over a new idle notification request.

For pending resume requests this rule is unnecessary; it's invalid to
submit an idle notification request while a resume request is pending
(since resume requests can be pending only in the RPM_SUSPENDING and
RPM_SUSPENDED states while idle notification requests are accepted only
in RPM_RESUMING and RPM_ACTIVE).

For pending suspends, I think we should allow synchronous idle
notifications while the suspend is pending.  The runtime_idle callback
might then start its own suspend before the workqueue can get around to
it.  You're right about async idle requests though; that was the 
exception I noted below.

>     If a new request is not an idle notification request, it takes precedence
>     over the pending one, so it cancels it with the help of cancel_work().
> 
> [In the latter case, if a suspend request is canceled, we may want to set up the
> timer for another one.]  For that, we're going to need a single flag, say
> RPM_PENDING, which is set whenever a request is queued up.

That's what I called work_pending in my proposal.

> > The error codes you have been using seem okay to me, in general.
> > 
> > However, some of those requests would violate the rules in a trivial 
> > way.  For these we might return a positive value rather than a negative 
> > error code.  For example, calling pm_runtime_resume while the device is 
> > already active shouldn't be considered an error.  But it can't be 
> > considered a complete success either, because it won't invoke the 
> > runtime_resume method.
> 
> That need not matter from the caller's point of view, though.  In the case of
> pm_runtime_resume() the caller will probably be mostly interested whether or
> not it can do I/O after the function has returned.

Yes.  But the driver might depend on something happening inside the
runtime_resume method, so it would need to know if a successful
pm_runtime_resume wasn't going to invoke the callback.

> > To be determined: How runtime PM will interact with system sleep.
> 
> Yes.  My first idea was to disable run-time PM before entering a system sleep
> state, but that would involve canceling all of the pending requests.

Or simply freezing the workqueue.

> > About all I can add is the "New requests override previous requests"  
> > policy.  This would apply to all the non-synchronous requests, whether
> > they are delayed or added directly to the workqueue.  If a new request
> > (synchronous or not) is received before the old one has started to run,
> > the old one will be cancelled.  This holds even if the new request is
> > redundant, like a resume request received while the device is active.
> > 
> > There is one exception to this rule: An idle_notify request does not 
> > cancel a delayed or queued suspend request.
> 
> I'm not sure if such a rigid rule will be really useful.

A rigid rule is easier to understand and apply than one with a large
number of special cases.  However, in the statement of the rule above,
I forgot to mention that this applies only if the new request is valid,
i.e., if it's not forbidden by the current status or the counter
values.

> Also, as I said above, I think we shouldn't regard setting up the suspend
> timer as queuing up a request, but as a totally separate operation.

Well, there can't be any pending resume requests when the suspend timer
is set up, so we have to consider only pending idle notifications or
pending suspends.  I agree, we would want to allow an idle notification
to remain pending when the suspend timer is set up.  As for pending
suspends, we _should_ allow the new request to override the old one.  
This will come up whenever the timeout value is changed.

Alan Stern

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [RFC] Run-time PM framework (was: Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6))
  2009-07-01 22:19                                                             ` Rafael J. Wysocki
  2009-07-02 15:42                                                               ` Rafael J. Wysocki
  2009-07-02 15:42                                                               ` Rafael J. Wysocki
@ 2009-07-02 15:55                                                               ` Alan Stern
  2009-07-02 15:55                                                               ` Alan Stern
  3 siblings, 0 replies; 102+ messages in thread
From: Alan Stern @ 2009-07-02 15:55 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Thu, 2 Jul 2009, Rafael J. Wysocki wrote:

> > > _and_ to ensure that these callbacks will be executed when it makes sense.
> > 
> > Thus if the situation changes before the callback can be made, so that
> > it no longer makes sense, the framework should cancel the callback.
> 
> Yes, but there's one thing to consider.  Suppose a remote wake-up causes a
> resume request to be queued up and pm_runtime_resume() is called synchronously
> exactly at the time the request's work function is started.  There are two
> attempts to resume in progress, but only one of them can call
> ->runtime_resume(), so what's the other one supposed to do?  The asynchronous
> one can just return error code, but the the caller of the synchronous
> pm_runtime_resume() must know whether or not the resume was successful.
> So, perhaps, if the synchronous resume happens to lose the race, it should
> wait for the other one to complete, check the device's status and return 0 if
> it's active?  That wouldn't cause the workqueue thread to wait.

I didn't address this explicitly in the previous message, but yes.  
This is no different from the way your current version works.

Similarly, if a synchronous resume call occurs while a suspend is in 
progress, it should wait until the suspend finishes and then carry out 
a resume.

> > We can summarize these rules as follows:
> > 
> > 	Never allow more than one callback at a time, except that
> > 	runtime_suspend may be invoked while runtime_idle is running.
> 
> Caution here.  If ->runtime_idle() runs ->runtime_suspend() and immediately
> after that resume is requested by remote wake-up, ->runtime_resume() may also
> be run while ->runtime_idle() is still running.

Yes, I didn't think of that case.  We have to allow either of the other 
two to be invoked while runtime_idle is running.  But we can rule out 
calling runtime_idle recursively.

> OTOH, we need to know when ->runtime_idle() has completed, because we have to
> ensure it won't still be running after run-time PM has been disabled for the
> device.
> 
> IMO, we need two flags, one indicating that either ->runtime_suspend(), or
> ->runtime_resume() is being executed (they are mutually exclusive) and the
> the other one indicating that ->runtime_idle() is being executed.  For the
> purpose of further discussion below I'll call them RPM_IDLE_RUNNING and
> RPM_IN_TRANSITION.

The RPM_IN_TRANSITION flag is unnecessary.  It would always be equal to
(status == RPM_SUSPENDING || status == RPM_RESUMING).

> With this notation, the above rule may be translated as:
> 
>     Don't run any of the callbacks if RPM_IN_TRANSITION is set.  Don't run
>     ->runtime_idle() if RPM_IDLE_RUNNING is set.
> 
> Which implies that RPM_IDLE_RUNNING cannot be set when RPM_IN_TRANSITION is
> set, but it is valid to set RPM_IN_TRANSITION when RPM_IDLE_RUNNING is set.

That is equivalent to my conclusion above.

> There are two possible "final" states, so I'd use one flag to indicate the
> current status.  Let's call it RPM_SUSPENDED for now (which means that the
> device is suspended when it's set and active otherwise) and I think we can make
> the rule that this flag is only changed after successful execution of
> ->runtime_suspend() or ->runtime_resume().
> 
> Whether the device is suspending or resuming follows from the values of
> RPM_SUSPENDED and RPM_IN_TRANSITION.

You can use two single-bit flags (SUSPEND and IN_TRANSITION) or a 
single two-bit state value (ACTIVE, SUSPENDING, SUSPENDED, RESUMING).  
It doesn't make much difference which you choose.


> > Should the counters also be checked when the request is submitted?  
> > And should the same go for pm_schedule_suspend?  These are nontrivial
> > questions; good arguments can be made both ways.
> 
> That's the difficult part. :-)
> 
> First, I think a delayed suspend should be treated in a special way, because
> it's not really a request to suspend.  Namely, as long as the timer hasn't
> triggered yet, nothing happens and there's nothing against the rules above.
> A request to suspend is queued up after the timer has triggered and the timer
> function is where the rules come into play.  IOW, it consists of two
> operations, setting up a timer and queuing up a request to suspend when the
> timer triggers.  IMO the first of them can be done at any time, while the other
> one may be affected by the rules.

I don't agree.  For example, suppose the device has an active child
when the driver says: Suspend it in 30 seconds.  If the child is then
removed after only 10 seconds, does it make sense to go ahead with
suspending the parent 20 seconds later?  No -- if the parent is going
to be suspended, the decision as to when should be made at the time the
child is removed, not beforehand.

(Even more concretely, suppose there is a 30-second inactivity timeout
for autosuspend.  Removing the child counts as activity and so should
restart the timer.)

To put it another way, suppose you accept a delayed request under
inappropriate conditions.  If the conditions don't change, the whole
thing was a waste of effort.  And if the conditions do change, then the
whole delayed request should be reconsidered anyhow.  So why accept it?

> It implies that we should really introduce a timer and a timer function that
> will queue up suspend requests, instead of using struct delayed_work.

Yes, this was part of my proposal.

> Second, I think it may be a good idea to use the usage counter to block further
> requests while submitting a resume request.
> 
> Namely, suppose that pm_request_resume() increments usage_count and returns 0,
> if the resume was not necessary and the caller can do the I/O by itself, or
> error code, which means that it was necessary to queue up a resume request.
> If 0 is returned, the caller is supposed to do the I/O and call
> pm_runtime_put() when done.  Otherwise it just quits and ->runtime_resume() is
> supposed to take care of the I/O, in which case the request's work function
> should call pm_runtime_put() when done.  [If it was impossible to queue up a
> request, error code is returned, but the usage counter is decremented by
> pm_request_resume(), so that the caller need not handle that special case,
> hopefully rare.]

Trying to keep track of reasons for incrementing and decrementing 
usage_count is very difficult to do in the core.  What happens if 
pm_request_resume increments the count but then the driver calls 
pm_runtime_get, pm_runtime_resume, pm_runtime_put all before the work 
routine can run?

It's better to make the driver responsible for maintaining the counter
value.  Forcing the driver to do pm_runtime_get, pm_request_resume is
better than having the core automatically change the counter.

> This implies that it may be a good idea to check usage_count when submitting
> idle notification and suspend requests (where in case of suspend a request is
> submitted by the timer function, when the timer has already triggered, so
> there's no need to check the counter while setting up the timer).
> 
> The counter of unsuspended children may change after a request has been
> submitted and before its work function has a chance to run, so I don't see much
> point checking it when submitting requests.

As I said above, if the counters don't change then the submission was 
unnecessary, and if they do change then the submission should be 
reconsidered.  Therefore they _should_ be checked in submissions.

> So, if the above idea is adopted, idle notification and suspend requests
> won't be queued up when a resume request is pending (there's the question what
> the timer function attempting to queue up a suspend request is supposed to do
> in such a case) and in the other cases we can use the following rules:
> 
>     Any pending request takes precedence over a new idle notification request.

For pending resume requests this rule is unnecessary; it's invalid to
submit an idle notification request while a resume request is pending
(since resume requests can be pending only in the RPM_SUSPENDING and
RPM_SUSPENDED states while idle notification requests are accepted only
in RPM_RESUMING and RPM_ACTIVE).

For pending suspends, I think we should allow synchronous idle
notifications while the suspend is pending.  The runtime_idle callback
might then start its own suspend before the workqueue can get around to
it.  You're right about async idle requests though; that was the 
exception I noted below.

>     If a new request is not an idle notification request, it takes precedence
>     over the pending one, so it cancels it with the help of cancel_work().
> 
> [In the latter case, if a suspend request is canceled, we may want to set up the
> timer for another one.]  For that, we're going to need a single flag, say
> RPM_PENDING, which is set whenever a request is queued up.

That's what I called work_pending in my proposal.

> > The error codes you have been using seem okay to me, in general.
> > 
> > However, some of those requests would violate the rules in a trivial 
> > way.  For these we might return a positive value rather than a negative 
> > error code.  For example, calling pm_runtime_resume while the device is 
> > already active shouldn't be considered an error.  But it can't be 
> > considered a complete success either, because it won't invoke the 
> > runtime_resume method.
> 
> That need not matter from the caller's point of view, though.  In the case of
> pm_runtime_resume() the caller will probably be mostly interested whether or
> not it can do I/O after the function has returned.

Yes.  But the driver might depend on something happening inside the
runtime_resume method, so it would need to know if a successful
pm_runtime_resume wasn't going to invoke the callback.

> > To be determined: How runtime PM will interact with system sleep.
> 
> Yes.  My first idea was to disable run-time PM before entering a system sleep
> state, but that would involve canceling all of the pending requests.

Or simply freezing the workqueue.

> > About all I can add is the "New requests override previous requests"  
> > policy.  This would apply to all the non-synchronous requests, whether
> > they are delayed or added directly to the workqueue.  If a new request
> > (synchronous or not) is received before the old one has started to run,
> > the old one will be cancelled.  This holds even if the new request is
> > redundant, like a resume request received while the device is active.
> > 
> > There is one exception to this rule: An idle_notify request does not 
> > cancel a delayed or queued suspend request.
> 
> I'm not sure if such a rigid rule will be really useful.

A rigid rule is easier to understand and apply than one with a large
number of special cases.  However, in the statement of the rule above,
I forgot to mention that this applies only if the new request is valid,
i.e., if it's not forbidden by the current status or the counter
values.

> Also, as I said above, I think we shouldn't regard setting up the suspend
> timer as queuing up a request, but as a totally separate operation.

Well, there can't be any pending resume requests when the suspend timer
is set up, so we have to consider only pending idle notifications or
pending suspends.  I agree, we would want to allow an idle notification
to remain pending when the suspend timer is set up.  As for pending
suspends, we _should_ allow the new request to override the old one.  
This will come up whenever the timeout value is changed.

Alan Stern

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [RFC] Run-time PM framework (was: Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6))
  2009-07-02 15:55                                                               ` Alan Stern
@ 2009-07-02 17:50                                                                 ` Rafael J. Wysocki
  2009-07-02 19:53                                                                   ` Alan Stern
  2009-07-02 19:53                                                                   ` Alan Stern
  2009-07-02 17:50                                                                 ` Rafael J. Wysocki
  1 sibling, 2 replies; 102+ messages in thread
From: Rafael J. Wysocki @ 2009-07-02 17:50 UTC (permalink / raw)
  To: Alan Stern
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Thursday 02 July 2009, Alan Stern wrote:
> On Thu, 2 Jul 2009, Rafael J. Wysocki wrote:
> 
> > > > _and_ to ensure that these callbacks will be executed when it makes sense.
> > > 
> > > Thus if the situation changes before the callback can be made, so that
> > > it no longer makes sense, the framework should cancel the callback.
> > 
> > Yes, but there's one thing to consider.  Suppose a remote wake-up causes a
> > resume request to be queued up and pm_runtime_resume() is called synchronously
> > exactly at the time the request's work function is started.  There are two
> > attempts to resume in progress, but only one of them can call
> > ->runtime_resume(), so what's the other one supposed to do?  The asynchronous
> > one can just return error code, but the the caller of the synchronous
> > pm_runtime_resume() must know whether or not the resume was successful.
> > So, perhaps, if the synchronous resume happens to lose the race, it should
> > wait for the other one to complete, check the device's status and return 0 if
> > it's active?  That wouldn't cause the workqueue thread to wait.
> 
> I didn't address this explicitly in the previous message, but yes.  
> This is no different from the way your current version works.
> 
> Similarly, if a synchronous resume call occurs while a suspend is in 
> progress, it should wait until the suspend finishes and then carry out 
> a resume.

Agreed.

> > > We can summarize these rules as follows:
> > > 
> > > 	Never allow more than one callback at a time, except that
> > > 	runtime_suspend may be invoked while runtime_idle is running.
> > 
> > Caution here.  If ->runtime_idle() runs ->runtime_suspend() and immediately
> > after that resume is requested by remote wake-up, ->runtime_resume() may also
> > be run while ->runtime_idle() is still running.
> 
> Yes, I didn't think of that case.  We have to allow either of the other 
> two to be invoked while runtime_idle is running.  But we can rule out 
> calling runtime_idle recursively.
> 
> > OTOH, we need to know when ->runtime_idle() has completed, because we have to
> > ensure it won't still be running after run-time PM has been disabled for the
> > device.
> > 
> > IMO, we need two flags, one indicating that either ->runtime_suspend(), or
> > ->runtime_resume() is being executed (they are mutually exclusive) and the
> > the other one indicating that ->runtime_idle() is being executed.  For the
> > purpose of further discussion below I'll call them RPM_IDLE_RUNNING and
> > RPM_IN_TRANSITION.
> 
> The RPM_IN_TRANSITION flag is unnecessary.  It would always be equal to
> (status == RPM_SUSPENDING || status == RPM_RESUMING).

I thought of replacing the old flags with RPM_IN_TRANSITION, actually.

> > With this notation, the above rule may be translated as:
> > 
> >     Don't run any of the callbacks if RPM_IN_TRANSITION is set.  Don't run
> >     ->runtime_idle() if RPM_IDLE_RUNNING is set.
> > 
> > Which implies that RPM_IDLE_RUNNING cannot be set when RPM_IN_TRANSITION is
> > set, but it is valid to set RPM_IN_TRANSITION when RPM_IDLE_RUNNING is set.
> 
> That is equivalent to my conclusion above.
> 
> > There are two possible "final" states, so I'd use one flag to indicate the
> > current status.  Let's call it RPM_SUSPENDED for now (which means that the
> > device is suspended when it's set and active otherwise) and I think we can make
> > the rule that this flag is only changed after successful execution of
> > ->runtime_suspend() or ->runtime_resume().
> > 
> > Whether the device is suspending or resuming follows from the values of
> > RPM_SUSPENDED and RPM_IN_TRANSITION.
> 
> You can use two single-bit flags (SUSPEND and IN_TRANSITION) or a 
> single two-bit state value (ACTIVE, SUSPENDING, SUSPENDED, RESUMING).  
> It doesn't make much difference which you choose.

No, it doesn't.

Still, the additional flag for 'idle notification is in progress' is still
necessary for the following two reasons:

(1) Idle notifications cannot be run (synchronously) when one is already in
    progress, so we need a means to determine whether or not this is the case.

(2) If run-time PM is to be disabled, the function doing that must guarantee
    that ->runtime_idle() won't be running after it's returned, so it needs to
    know how to check that.

> > > Should the counters also be checked when the request is submitted?  
> > > And should the same go for pm_schedule_suspend?  These are nontrivial
> > > questions; good arguments can be made both ways.
> > 
> > That's the difficult part. :-)
> > 
> > First, I think a delayed suspend should be treated in a special way, because
> > it's not really a request to suspend.  Namely, as long as the timer hasn't
> > triggered yet, nothing happens and there's nothing against the rules above.
> > A request to suspend is queued up after the timer has triggered and the timer
> > function is where the rules come into play.  IOW, it consists of two
> > operations, setting up a timer and queuing up a request to suspend when the
> > timer triggers.  IMO the first of them can be done at any time, while the other
> > one may be affected by the rules.
> 
> I don't agree.  For example, suppose the device has an active child
> when the driver says: Suspend it in 30 seconds.  If the child is then
> removed after only 10 seconds, does it make sense to go ahead with
> suspending the parent 20 seconds later?  No -- if the parent is going
> to be suspended, the decision as to when should be made at the time the
> child is removed, not beforehand.

There are two functions, on that sets up the timer and the other that queues
up the request.  This is the second one that makes the decision if the request
is still worth queuing up.

> (Even more concretely, suppose there is a 30-second inactivity timeout
> for autosuspend.  Removing the child counts as activity and so should
> restart the timer.)
> 
> To put it another way, suppose you accept a delayed request under
> inappropriate conditions.  If the conditions don't change, the whole
> thing was a waste of effort.  And if the conditions do change, then the
> whole delayed request should be reconsidered anyhow.

The problem is, even if you always accept a delayed request under appropriate
conditions, you still have to reconsider it before putting it into the work
queue, because the conditions might have changed.  So, you'd like to do this:

(1) Check if the conditions are appropriate, set up a timer.
(2) Check if the conditions are appropriate, queue up a suspend request.

while I think it will be simpler to do this:

(1) Set up a timer.
(2) Check if the conditions are appropriate, queue up a suspend request.

In any case you can have a pm_runtime_suspend() / pm_runtime_resume() cycle
between (1) and (2), so I don't really see a practical difference.

> So why accept it?

Beacuse that simplifies things?

For example, suppose ->runtime_resume() has been called as
a result of a remote wake-up (ie. after pm_request_resume()) and it has some
I/O to process, but it is known beforehand that the device will most likely be
inactive after the I/O is done.  So, it's tempting to call
pm_schedule_suspend() from within ->runtime_resume(), but the conditions are
inappropriate (the device is not regarded as suspended).  However, calling
pm_schedule_suspend() with a long enough delay doesn't break any rules related
to the ->runtime_*() callbacks, so why should it be forbidden?

Next, suppose pm_schedule_suspend() is called, but it fails because the
conditions are inappropriate.  What's the caller supposed to do?  Wait for the
conditions to change and repeat?  But why should it bother if the conditions
may still change before ->runtime_suspend() is actually called?

IMO, it's the caller's problem whether or not what it does is useful or
efficient.  The core's problem is to ensure that it doesn't break things.

> > It implies that we should really introduce a timer and a timer function that
> > will queue up suspend requests, instead of using struct delayed_work.
> 
> Yes, this was part of my proposal.
> 
> > Second, I think it may be a good idea to use the usage counter to block further
> > requests while submitting a resume request.
> > 
> > Namely, suppose that pm_request_resume() increments usage_count and returns 0,
> > if the resume was not necessary and the caller can do the I/O by itself, or
> > error code, which means that it was necessary to queue up a resume request.
> > If 0 is returned, the caller is supposed to do the I/O and call
> > pm_runtime_put() when done.  Otherwise it just quits and ->runtime_resume() is
> > supposed to take care of the I/O, in which case the request's work function
> > should call pm_runtime_put() when done.  [If it was impossible to queue up a
> > request, error code is returned, but the usage counter is decremented by
> > pm_request_resume(), so that the caller need not handle that special case,
> > hopefully rare.]
> 
> Trying to keep track of reasons for incrementing and decrementing 
> usage_count is very difficult to do in the core.  What happens if 
> pm_request_resume increments the count but then the driver calls 
> pm_runtime_get, pm_runtime_resume, pm_runtime_put all before the work 
> routine can run?

Nothing wrong, as long as the increments and decrements are balanced (if they
aren't balanced, there is a bug in the driver anyway).  In fact, for this to
work we need the rule that a new request of the same type doesn't replace an
existing one.  Then, the submitted resume request cannot be canceled, so the
work function will run and drop the usage counter.

> It's better to make the driver responsible for maintaining the counter
> value.  Forcing the driver to do pm_runtime_get, pm_request_resume is
> better than having the core automatically change the counter.

So the caller will do:

pm_runtime_get(dev);
error = pm_request_resume(dev);
if (error)
    goto out;
<process I/O>
pm_runtime_put();

but how is it supposed to ensure that pm_runtime_put() will be called after
executing the 'goto out' thing?

Anyway, we don't need to use the usage counter for that (although it's cheap).
Instead, we can make pm_request_suspend() and pm_request_idle() check if a
resume request is pending and fail if that's the case.

> > This implies that it may be a good idea to check usage_count when submitting
> > idle notification and suspend requests (where in case of suspend a request is
> > submitted by the timer function, when the timer has already triggered, so
> > there's no need to check the counter while setting up the timer).
> > 
> > The counter of unsuspended children may change after a request has been
> > submitted and before its work function has a chance to run, so I don't see much
> > point checking it when submitting requests.
> 
> As I said above, if the counters don't change then the submission was 
> unnecessary, and if they do change then the submission should be 
> reconsidered.  Therefore they _should_ be checked in submissions.

Let's put it another way.  What's the practical benefit to the caller if we
always check the counters in submissions?

> > So, if the above idea is adopted, idle notification and suspend requests
> > won't be queued up when a resume request is pending (there's the question what
> > the timer function attempting to queue up a suspend request is supposed to do
> > in such a case) and in the other cases we can use the following rules:
> > 
> >     Any pending request takes precedence over a new idle notification request.
> 
> For pending resume requests this rule is unnecessary; it's invalid to
> submit an idle notification request while a resume request is pending
> (since resume requests can be pending only in the RPM_SUSPENDING and
> RPM_SUSPENDED states while idle notification requests are accepted only
> in RPM_RESUMING and RPM_ACTIVE).

It is correct nevertheless. :-)

> For pending suspends, I think we should allow synchronous idle
> notifications while the suspend is pending.

Sure, I was talking only about requests here, where by 'request' I understood
a work item put into the workqueue.

> The runtime_idle callback might then start its own suspend before the
> workqueue can get around to it.  You're right about async idle requests
> though; that was the exception I noted below.
> 
> >     If a new request is not an idle notification request, it takes precedence
> >     over the pending one, so it cancels it with the help of cancel_work().
> > 
> > [In the latter case, if a suspend request is canceled, we may want to set up the
> > timer for another one.]  For that, we're going to need a single flag, say
> > RPM_PENDING, which is set whenever a request is queued up.
> 
> That's what I called work_pending in my proposal.

Well, after some reconsideration I think it's not enough (as I wrote in my last
message), becuase it generally makes sense to make the following rule:

    A pending request always takes precedence over a new request of the same
    type.

So, for example, if pm_request_resume() is called and there's a resume request
pending already, the new pm_request_resume() should just let the pending
request alone and quit.

Thus, it seems reasonable to remember what type of a request is pending
(i don't think we can figure it out from the status fields in 100% of the
cases).

> > > The error codes you have been using seem okay to me, in general.
> > > 
> > > However, some of those requests would violate the rules in a trivial 
> > > way.  For these we might return a positive value rather than a negative 
> > > error code.  For example, calling pm_runtime_resume while the device is 
> > > already active shouldn't be considered an error.  But it can't be 
> > > considered a complete success either, because it won't invoke the 
> > > runtime_resume method.
> > 
> > That need not matter from the caller's point of view, though.  In the case of
> > pm_runtime_resume() the caller will probably be mostly interested whether or
> > not it can do I/O after the function has returned.
> 
> Yes.  But the driver might depend on something happening inside the
> runtime_resume method, so it would need to know if a successful
> pm_runtime_resume wasn't going to invoke the callback.

Hmm.  That would require the driver to know that the device was suspended,
but in that case pm_runtime_resume() returning 0 would mean that _someone_
ran ->runtime_resume() for it in any case.

If the driver doesn't know if the device was suspended beforehand, it cannot
depend on the execution of ->runtime_resume().

> > > To be determined: How runtime PM will interact with system sleep.
> > 
> > Yes.  My first idea was to disable run-time PM before entering a system sleep
> > state, but that would involve canceling all of the pending requests.
> 
> Or simply freezing the workqueue.

Well, what about the synchronous calls?  How are we going to prevent them
from happening after freezing the workqueue?

> > > About all I can add is the "New requests override previous requests"  
> > > policy.  This would apply to all the non-synchronous requests, whether
> > > they are delayed or added directly to the workqueue.  If a new request
> > > (synchronous or not) is received before the old one has started to run,
> > > the old one will be cancelled.  This holds even if the new request is
> > > redundant, like a resume request received while the device is active.
> > > 
> > > There is one exception to this rule: An idle_notify request does not 
> > > cancel a delayed or queued suspend request.
> > 
> > I'm not sure if such a rigid rule will be really useful.
> 
> A rigid rule is easier to understand and apply than one with a large
> number of special cases.  However, in the statement of the rule above,
> I forgot to mention that this applies only if the new request is valid,
> i.e., if it's not forbidden by the current status or the counter
> values.

Ah, OK.  I'd also like to add the rule about requests of the same type
(if there's one pending already, the new one is discareded).

> > Also, as I said above, I think we shouldn't regard setting up the suspend
> > timer as queuing up a request, but as a totally separate operation.
> 
> Well, there can't be any pending resume requests when the suspend timer
> is set up, so we have to consider only pending idle notifications or
> pending suspends.  I agree, we would want to allow an idle notification
> to remain pending when the suspend timer is set up.  As for pending
> suspends, we _should_ allow the new request to override the old one.  
> This will come up whenever the timeout value is changed.

Now there's a point in which allowing to set up the suspend timer at any time
simplifies things quite a bit.  Namely, in that case, if pm_schedule_suspend()
is called and it sees a timer pending, it deactivates the timer with
del_timer() and sets up a new one with add_timer().  It doesn't need to worry
about whether the suspend request has been queued up already or
pm_runtime_suspend() is running or something.  Things will work themselves out
anyway eventually.

Otherwise, after calling del_timer() we'll need to check if the timer was pending
and if it wasn't, then if the suspend requests has been queued up already, and
if it has, then if pm_runtime_suspend() is running (the current status is
RPM_SUSPENDING) etc.  That doesn't look particularly clean.

Best,
Rafael

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [RFC] Run-time PM framework (was: Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6))
  2009-07-02 15:55                                                               ` Alan Stern
  2009-07-02 17:50                                                                 ` Rafael J. Wysocki
@ 2009-07-02 17:50                                                                 ` Rafael J. Wysocki
  1 sibling, 0 replies; 102+ messages in thread
From: Rafael J. Wysocki @ 2009-07-02 17:50 UTC (permalink / raw)
  To: Alan Stern
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Thursday 02 July 2009, Alan Stern wrote:
> On Thu, 2 Jul 2009, Rafael J. Wysocki wrote:
> 
> > > > _and_ to ensure that these callbacks will be executed when it makes sense.
> > > 
> > > Thus if the situation changes before the callback can be made, so that
> > > it no longer makes sense, the framework should cancel the callback.
> > 
> > Yes, but there's one thing to consider.  Suppose a remote wake-up causes a
> > resume request to be queued up and pm_runtime_resume() is called synchronously
> > exactly at the time the request's work function is started.  There are two
> > attempts to resume in progress, but only one of them can call
> > ->runtime_resume(), so what's the other one supposed to do?  The asynchronous
> > one can just return error code, but the the caller of the synchronous
> > pm_runtime_resume() must know whether or not the resume was successful.
> > So, perhaps, if the synchronous resume happens to lose the race, it should
> > wait for the other one to complete, check the device's status and return 0 if
> > it's active?  That wouldn't cause the workqueue thread to wait.
> 
> I didn't address this explicitly in the previous message, but yes.  
> This is no different from the way your current version works.
> 
> Similarly, if a synchronous resume call occurs while a suspend is in 
> progress, it should wait until the suspend finishes and then carry out 
> a resume.

Agreed.

> > > We can summarize these rules as follows:
> > > 
> > > 	Never allow more than one callback at a time, except that
> > > 	runtime_suspend may be invoked while runtime_idle is running.
> > 
> > Caution here.  If ->runtime_idle() runs ->runtime_suspend() and immediately
> > after that resume is requested by remote wake-up, ->runtime_resume() may also
> > be run while ->runtime_idle() is still running.
> 
> Yes, I didn't think of that case.  We have to allow either of the other 
> two to be invoked while runtime_idle is running.  But we can rule out 
> calling runtime_idle recursively.
> 
> > OTOH, we need to know when ->runtime_idle() has completed, because we have to
> > ensure it won't still be running after run-time PM has been disabled for the
> > device.
> > 
> > IMO, we need two flags, one indicating that either ->runtime_suspend(), or
> > ->runtime_resume() is being executed (they are mutually exclusive) and the
> > the other one indicating that ->runtime_idle() is being executed.  For the
> > purpose of further discussion below I'll call them RPM_IDLE_RUNNING and
> > RPM_IN_TRANSITION.
> 
> The RPM_IN_TRANSITION flag is unnecessary.  It would always be equal to
> (status == RPM_SUSPENDING || status == RPM_RESUMING).

I thought of replacing the old flags with RPM_IN_TRANSITION, actually.

> > With this notation, the above rule may be translated as:
> > 
> >     Don't run any of the callbacks if RPM_IN_TRANSITION is set.  Don't run
> >     ->runtime_idle() if RPM_IDLE_RUNNING is set.
> > 
> > Which implies that RPM_IDLE_RUNNING cannot be set when RPM_IN_TRANSITION is
> > set, but it is valid to set RPM_IN_TRANSITION when RPM_IDLE_RUNNING is set.
> 
> That is equivalent to my conclusion above.
> 
> > There are two possible "final" states, so I'd use one flag to indicate the
> > current status.  Let's call it RPM_SUSPENDED for now (which means that the
> > device is suspended when it's set and active otherwise) and I think we can make
> > the rule that this flag is only changed after successful execution of
> > ->runtime_suspend() or ->runtime_resume().
> > 
> > Whether the device is suspending or resuming follows from the values of
> > RPM_SUSPENDED and RPM_IN_TRANSITION.
> 
> You can use two single-bit flags (SUSPEND and IN_TRANSITION) or a 
> single two-bit state value (ACTIVE, SUSPENDING, SUSPENDED, RESUMING).  
> It doesn't make much difference which you choose.

No, it doesn't.

Still, the additional flag for 'idle notification is in progress' is still
necessary for the following two reasons:

(1) Idle notifications cannot be run (synchronously) when one is already in
    progress, so we need a means to determine whether or not this is the case.

(2) If run-time PM is to be disabled, the function doing that must guarantee
    that ->runtime_idle() won't be running after it's returned, so it needs to
    know how to check that.

> > > Should the counters also be checked when the request is submitted?  
> > > And should the same go for pm_schedule_suspend?  These are nontrivial
> > > questions; good arguments can be made both ways.
> > 
> > That's the difficult part. :-)
> > 
> > First, I think a delayed suspend should be treated in a special way, because
> > it's not really a request to suspend.  Namely, as long as the timer hasn't
> > triggered yet, nothing happens and there's nothing against the rules above.
> > A request to suspend is queued up after the timer has triggered and the timer
> > function is where the rules come into play.  IOW, it consists of two
> > operations, setting up a timer and queuing up a request to suspend when the
> > timer triggers.  IMO the first of them can be done at any time, while the other
> > one may be affected by the rules.
> 
> I don't agree.  For example, suppose the device has an active child
> when the driver says: Suspend it in 30 seconds.  If the child is then
> removed after only 10 seconds, does it make sense to go ahead with
> suspending the parent 20 seconds later?  No -- if the parent is going
> to be suspended, the decision as to when should be made at the time the
> child is removed, not beforehand.

There are two functions, on that sets up the timer and the other that queues
up the request.  This is the second one that makes the decision if the request
is still worth queuing up.

> (Even more concretely, suppose there is a 30-second inactivity timeout
> for autosuspend.  Removing the child counts as activity and so should
> restart the timer.)
> 
> To put it another way, suppose you accept a delayed request under
> inappropriate conditions.  If the conditions don't change, the whole
> thing was a waste of effort.  And if the conditions do change, then the
> whole delayed request should be reconsidered anyhow.

The problem is, even if you always accept a delayed request under appropriate
conditions, you still have to reconsider it before putting it into the work
queue, because the conditions might have changed.  So, you'd like to do this:

(1) Check if the conditions are appropriate, set up a timer.
(2) Check if the conditions are appropriate, queue up a suspend request.

while I think it will be simpler to do this:

(1) Set up a timer.
(2) Check if the conditions are appropriate, queue up a suspend request.

In any case you can have a pm_runtime_suspend() / pm_runtime_resume() cycle
between (1) and (2), so I don't really see a practical difference.

> So why accept it?

Beacuse that simplifies things?

For example, suppose ->runtime_resume() has been called as
a result of a remote wake-up (ie. after pm_request_resume()) and it has some
I/O to process, but it is known beforehand that the device will most likely be
inactive after the I/O is done.  So, it's tempting to call
pm_schedule_suspend() from within ->runtime_resume(), but the conditions are
inappropriate (the device is not regarded as suspended).  However, calling
pm_schedule_suspend() with a long enough delay doesn't break any rules related
to the ->runtime_*() callbacks, so why should it be forbidden?

Next, suppose pm_schedule_suspend() is called, but it fails because the
conditions are inappropriate.  What's the caller supposed to do?  Wait for the
conditions to change and repeat?  But why should it bother if the conditions
may still change before ->runtime_suspend() is actually called?

IMO, it's the caller's problem whether or not what it does is useful or
efficient.  The core's problem is to ensure that it doesn't break things.

> > It implies that we should really introduce a timer and a timer function that
> > will queue up suspend requests, instead of using struct delayed_work.
> 
> Yes, this was part of my proposal.
> 
> > Second, I think it may be a good idea to use the usage counter to block further
> > requests while submitting a resume request.
> > 
> > Namely, suppose that pm_request_resume() increments usage_count and returns 0,
> > if the resume was not necessary and the caller can do the I/O by itself, or
> > error code, which means that it was necessary to queue up a resume request.
> > If 0 is returned, the caller is supposed to do the I/O and call
> > pm_runtime_put() when done.  Otherwise it just quits and ->runtime_resume() is
> > supposed to take care of the I/O, in which case the request's work function
> > should call pm_runtime_put() when done.  [If it was impossible to queue up a
> > request, error code is returned, but the usage counter is decremented by
> > pm_request_resume(), so that the caller need not handle that special case,
> > hopefully rare.]
> 
> Trying to keep track of reasons for incrementing and decrementing 
> usage_count is very difficult to do in the core.  What happens if 
> pm_request_resume increments the count but then the driver calls 
> pm_runtime_get, pm_runtime_resume, pm_runtime_put all before the work 
> routine can run?

Nothing wrong, as long as the increments and decrements are balanced (if they
aren't balanced, there is a bug in the driver anyway).  In fact, for this to
work we need the rule that a new request of the same type doesn't replace an
existing one.  Then, the submitted resume request cannot be canceled, so the
work function will run and drop the usage counter.

> It's better to make the driver responsible for maintaining the counter
> value.  Forcing the driver to do pm_runtime_get, pm_request_resume is
> better than having the core automatically change the counter.

So the caller will do:

pm_runtime_get(dev);
error = pm_request_resume(dev);
if (error)
    goto out;
<process I/O>
pm_runtime_put();

but how is it supposed to ensure that pm_runtime_put() will be called after
executing the 'goto out' thing?

Anyway, we don't need to use the usage counter for that (although it's cheap).
Instead, we can make pm_request_suspend() and pm_request_idle() check if a
resume request is pending and fail if that's the case.

> > This implies that it may be a good idea to check usage_count when submitting
> > idle notification and suspend requests (where in case of suspend a request is
> > submitted by the timer function, when the timer has already triggered, so
> > there's no need to check the counter while setting up the timer).
> > 
> > The counter of unsuspended children may change after a request has been
> > submitted and before its work function has a chance to run, so I don't see much
> > point checking it when submitting requests.
> 
> As I said above, if the counters don't change then the submission was 
> unnecessary, and if they do change then the submission should be 
> reconsidered.  Therefore they _should_ be checked in submissions.

Let's put it another way.  What's the practical benefit to the caller if we
always check the counters in submissions?

> > So, if the above idea is adopted, idle notification and suspend requests
> > won't be queued up when a resume request is pending (there's the question what
> > the timer function attempting to queue up a suspend request is supposed to do
> > in such a case) and in the other cases we can use the following rules:
> > 
> >     Any pending request takes precedence over a new idle notification request.
> 
> For pending resume requests this rule is unnecessary; it's invalid to
> submit an idle notification request while a resume request is pending
> (since resume requests can be pending only in the RPM_SUSPENDING and
> RPM_SUSPENDED states while idle notification requests are accepted only
> in RPM_RESUMING and RPM_ACTIVE).

It is correct nevertheless. :-)

> For pending suspends, I think we should allow synchronous idle
> notifications while the suspend is pending.

Sure, I was talking only about requests here, where by 'request' I understood
a work item put into the workqueue.

> The runtime_idle callback might then start its own suspend before the
> workqueue can get around to it.  You're right about async idle requests
> though; that was the exception I noted below.
> 
> >     If a new request is not an idle notification request, it takes precedence
> >     over the pending one, so it cancels it with the help of cancel_work().
> > 
> > [In the latter case, if a suspend request is canceled, we may want to set up the
> > timer for another one.]  For that, we're going to need a single flag, say
> > RPM_PENDING, which is set whenever a request is queued up.
> 
> That's what I called work_pending in my proposal.

Well, after some reconsideration I think it's not enough (as I wrote in my last
message), becuase it generally makes sense to make the following rule:

    A pending request always takes precedence over a new request of the same
    type.

So, for example, if pm_request_resume() is called and there's a resume request
pending already, the new pm_request_resume() should just let the pending
request alone and quit.

Thus, it seems reasonable to remember what type of a request is pending
(i don't think we can figure it out from the status fields in 100% of the
cases).

> > > The error codes you have been using seem okay to me, in general.
> > > 
> > > However, some of those requests would violate the rules in a trivial 
> > > way.  For these we might return a positive value rather than a negative 
> > > error code.  For example, calling pm_runtime_resume while the device is 
> > > already active shouldn't be considered an error.  But it can't be 
> > > considered a complete success either, because it won't invoke the 
> > > runtime_resume method.
> > 
> > That need not matter from the caller's point of view, though.  In the case of
> > pm_runtime_resume() the caller will probably be mostly interested whether or
> > not it can do I/O after the function has returned.
> 
> Yes.  But the driver might depend on something happening inside the
> runtime_resume method, so it would need to know if a successful
> pm_runtime_resume wasn't going to invoke the callback.

Hmm.  That would require the driver to know that the device was suspended,
but in that case pm_runtime_resume() returning 0 would mean that _someone_
ran ->runtime_resume() for it in any case.

If the driver doesn't know if the device was suspended beforehand, it cannot
depend on the execution of ->runtime_resume().

> > > To be determined: How runtime PM will interact with system sleep.
> > 
> > Yes.  My first idea was to disable run-time PM before entering a system sleep
> > state, but that would involve canceling all of the pending requests.
> 
> Or simply freezing the workqueue.

Well, what about the synchronous calls?  How are we going to prevent them
from happening after freezing the workqueue?

> > > About all I can add is the "New requests override previous requests"  
> > > policy.  This would apply to all the non-synchronous requests, whether
> > > they are delayed or added directly to the workqueue.  If a new request
> > > (synchronous or not) is received before the old one has started to run,
> > > the old one will be cancelled.  This holds even if the new request is
> > > redundant, like a resume request received while the device is active.
> > > 
> > > There is one exception to this rule: An idle_notify request does not 
> > > cancel a delayed or queued suspend request.
> > 
> > I'm not sure if such a rigid rule will be really useful.
> 
> A rigid rule is easier to understand and apply than one with a large
> number of special cases.  However, in the statement of the rule above,
> I forgot to mention that this applies only if the new request is valid,
> i.e., if it's not forbidden by the current status or the counter
> values.

Ah, OK.  I'd also like to add the rule about requests of the same type
(if there's one pending already, the new one is discareded).

> > Also, as I said above, I think we shouldn't regard setting up the suspend
> > timer as queuing up a request, but as a totally separate operation.
> 
> Well, there can't be any pending resume requests when the suspend timer
> is set up, so we have to consider only pending idle notifications or
> pending suspends.  I agree, we would want to allow an idle notification
> to remain pending when the suspend timer is set up.  As for pending
> suspends, we _should_ allow the new request to override the old one.  
> This will come up whenever the timeout value is changed.

Now there's a point in which allowing to set up the suspend timer at any time
simplifies things quite a bit.  Namely, in that case, if pm_schedule_suspend()
is called and it sees a timer pending, it deactivates the timer with
del_timer() and sets up a new one with add_timer().  It doesn't need to worry
about whether the suspend request has been queued up already or
pm_runtime_suspend() is running or something.  Things will work themselves out
anyway eventually.

Otherwise, after calling del_timer() we'll need to check if the timer was pending
and if it wasn't, then if the suspend requests has been queued up already, and
if it has, then if pm_runtime_suspend() is running (the current status is
RPM_SUSPENDING) etc.  That doesn't look particularly clean.

Best,
Rafael

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [RFC] Run-time PM framework (was: Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6))
  2009-07-02 17:50                                                                 ` Rafael J. Wysocki
  2009-07-02 19:53                                                                   ` Alan Stern
@ 2009-07-02 19:53                                                                   ` Alan Stern
  2009-07-02 23:05                                                                     ` Rafael J. Wysocki
  2009-07-02 23:05                                                                     ` Rafael J. Wysocki
  1 sibling, 2 replies; 102+ messages in thread
From: Alan Stern @ 2009-07-02 19:53 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Thu, 2 Jul 2009, Rafael J. Wysocki wrote:

> > The RPM_IN_TRANSITION flag is unnecessary.  It would always be equal to
> > (status == RPM_SUSPENDING || status == RPM_RESUMING).
> 
> I thought of replacing the old flags with RPM_IN_TRANSITION, actually.

Okay, but hopefully you won't mind if I continue to use the old state 
names in conversation.

> Still, the additional flag for 'idle notification is in progress' is still
> necessary for the following two reasons:
> 
> (1) Idle notifications cannot be run (synchronously) when one is already in
>     progress, so we need a means to determine whether or not this is the case.
> 
> (2) If run-time PM is to be disabled, the function doing that must guarantee
>     that ->runtime_idle() won't be running after it's returned, so it needs to
>     know how to check that.

Agreed.


> > I don't agree.  For example, suppose the device has an active child
> > when the driver says: Suspend it in 30 seconds.  If the child is then
> > removed after only 10 seconds, does it make sense to go ahead with
> > suspending the parent 20 seconds later?  No -- if the parent is going
> > to be suspended, the decision as to when should be made at the time the
> > child is removed, not beforehand.
> 
> There are two functions, on that sets up the timer and the other that queues
> up the request.  This is the second one that makes the decision if the request
> is still worth queuing up.
> 
> > (Even more concretely, suppose there is a 30-second inactivity timeout
> > for autosuspend.  Removing the child counts as activity and so should
> > restart the timer.)
> > 
> > To put it another way, suppose you accept a delayed request under
> > inappropriate conditions.  If the conditions don't change, the whole
> > thing was a waste of effort.  And if the conditions do change, then the
> > whole delayed request should be reconsidered anyhow.
> 
> The problem is, even if you always accept a delayed request under appropriate
> conditions, you still have to reconsider it before putting it into the work
> queue, because the conditions might have changed.  So, you'd like to do this:
> 
> (1) Check if the conditions are appropriate, set up a timer.
> (2) Check if the conditions are appropriate, queue up a suspend request.
> 
> while I think it will be simpler to do this:
> 
> (1) Set up a timer.
> (2) Check if the conditions are appropriate, queue up a suspend request.
> 
> In any case you can have a pm_runtime_suspend() / pm_runtime_resume() cycle
> between (1) and (2), so I don't really see a practical difference.

A cycle like that would cancel the timer anyway.  Maybe that's what you 
meant...

Hmm.  What sort of conditions are we talking about?  One possiblity is 
that we are in the wrong state, i.e., in SUSPENDING or SUSPENDED.  It's 
completely useless to start a timer then; if the state changes the 
timer will be cancelled, and if it doesn't change then the request 
won't be queued when the timer expires.

The other possiblity is that either the children or usage counter is 
positive.  If the counter decrements to 0 so that a suspend is feasible 
then we would send an idle notification.  At that point the driver 
could decide what to do; the most likely response would be to 
reschedule the suspend.  In fact, it's hard to think of a situation 
where the driver would want to just let the timer keep on running.

> For example, suppose ->runtime_resume() has been called as
> a result of a remote wake-up (ie. after pm_request_resume()) and it has some
> I/O to process, but it is known beforehand that the device will most likely be
> inactive after the I/O is done.  So, it's tempting to call
> pm_schedule_suspend() from within ->runtime_resume(), but the conditions are
> inappropriate (the device is not regarded as suspended).

??  Conditions are perfectly appropriate, since suspend requests are 
allowed in the RESUMING state.

Unless the driver also did a pm_runtime_get, of course.  But in that 
case it would have to do a pm_runtime_put eventually, at which point it 
could schedule the suspend.

>  However, calling
> pm_schedule_suspend() with a long enough delay doesn't break any rules related
> to the ->runtime_*() callbacks, so why should it be forbidden?

It isn't.

> Next, suppose pm_schedule_suspend() is called, but it fails because the
> conditions are inappropriate.  What's the caller supposed to do?  Wait for the
> conditions to change and repeat?

In a manner of speaking.  More precisely, whatever code is responsible 
for changing the conditions should call pm_schedule_suspend.  Or set up 
an idle notification, leading indirectly to pm_schedule_suspend.

>  But why should it bother if the conditions
> may still change before ->runtime_suspend() is actually called?

It should bother because conditions might _not_ change, in which case
the suspend would occur.  But for what you are proposing, if the
conditions don't change then the suspend will not occur.

> IMO, it's the caller's problem whether or not what it does is useful or
> efficient.  The core's problem is to ensure that it doesn't break things.

But what's the drawback?  The extra overhead of checking whether two
counters are positive is minuscule compared to the effort of setting up
a timer.  And it's even better when you consider that the mostly likely
outcome of letting the timer run is that the timer handler would fail
to queue a suspend request (because the counters are unchanged).


> > Trying to keep track of reasons for incrementing and decrementing 
> > usage_count is very difficult to do in the core.  What happens if 
> > pm_request_resume increments the count but then the driver calls 
> > pm_runtime_get, pm_runtime_resume, pm_runtime_put all before the work 
> > routine can run?
> 
> Nothing wrong, as long as the increments and decrements are balanced (if they
> aren't balanced, there is a bug in the driver anyway).

That's my point -- in this situation it's very difficult for the driver
to balance them.  There would be no decrement to balance
pm_request_resume's automatic increment, because the work routine would
never run.

>  In fact, for this to
> work we need the rule that a new request of the same type doesn't replace an
> existing one.  Then, the submitted resume request cannot be canceled, so the
> work function will run and drop the usage counter.

A new pm_schedule_suspend _should_ replace an existing one.  For 
idle_notify and resume requests, this rule is more or less a no-op.

> > It's better to make the driver responsible for maintaining the counter
> > value.  Forcing the driver to do pm_runtime_get, pm_request_resume is
> > better than having the core automatically change the counter.
> 
> So the caller will do:
> 
> pm_runtime_get(dev);
> error = pm_request_resume(dev);
> if (error)
>     goto out;
> <process I/O>
> pm_runtime_put();

["error" isn't a good name.  The return value would be 0 to indicate 
the request was accepted and queued, or 1 to indicate the device is 
already active.  Or perhaps vice versa.]

> but how is it supposed to ensure that pm_runtime_put() will be called after
> executing the 'goto out' thing?

The same way it knows that the runtime_resume method has to process the
pending I/O.  That is, the presence of I/O to process means that once
the processing is over, the driver should call pm_runtime_put.

> Anyway, we don't need to use the usage counter for that (although it's cheap).
> Instead, we can make pm_request_suspend() and pm_request_idle() check if a
> resume request is pending and fail if that's the case.

But what about pm_runtime_suspend?  I think we need to use the counter.
Besides, the states in which suspend requests and idle requests are 
valid are disjoint from the states in which resume requests are valid.

> Let's put it another way.  What's the practical benefit to the caller if we
> always check the counters in submissions?

It saves the overhead of setting up and running a useless timer.  It 
avoids a race between the timer routine and pm_runtime_put.


> > >     Any pending request takes precedence over a new idle notification request.
> > 
> > For pending resume requests this rule is unnecessary; it's invalid to
> > submit an idle notification request while a resume request is pending
> > (since resume requests can be pending only in the RPM_SUSPENDING and
> > RPM_SUSPENDED states while idle notification requests are accepted only
> > in RPM_RESUMING and RPM_ACTIVE).
> 
> It is correct nevertheless. :-)

Okay, if you want.  Provided you agree that "pending request" doesn't 
include unexpired suspend timers.

> Well, after some reconsideration I think it's not enough (as I wrote in my last
> message), becuase it generally makes sense to make the following rule:
> 
>     A pending request always takes precedence over a new request of the same
>     type.
> 
> So, for example, if pm_request_resume() is called and there's a resume request
> pending already, the new pm_request_resume() should just let the pending
> request alone and quit.

Do you mean we shouldn't cancel the work item and then requeue it?  I
agree.  In fact I'd go even farther: If the timer routine find an idle
request pending, it shouldn't cancel it -- instead it should simply
change async_action to ASYNC_SUSPEND.  That's a simple optimization.  
Regardless, the effect isn't visible to drivers.

> Thus, it seems reasonable to remember what type of a request is pending
> (i don't think we can figure it out from the status fields in 100% of the
> cases).

That's what the async_action field in my proposal is for.


> > Yes.  But the driver might depend on something happening inside the
> > runtime_resume method, so it would need to know if a successful
> > pm_runtime_resume wasn't going to invoke the callback.
> 
> Hmm.  That would require the driver to know that the device was suspended,
> but in that case pm_runtime_resume() returning 0 would mean that _someone_
> ran ->runtime_resume() for it in any case.
> 
> If the driver doesn't know if the device was suspended beforehand, it cannot
> depend on the execution of ->runtime_resume().

Exactly.  Therefore it needs to be told if pm_runtime_resume isn't 
going to call the runtime_resume method, so that it can take 
appropriate remedial action.


> > > > To be determined: How runtime PM will interact with system sleep.
> > > 
> > > Yes.  My first idea was to disable run-time PM before entering a system sleep
> > > state, but that would involve canceling all of the pending requests.
> > 
> > Or simply freezing the workqueue.
> 
> Well, what about the synchronous calls?  How are we going to prevent them
> from happening after freezing the workqueue?

How about your "rpm_disabled" flag?


> Now there's a point in which allowing to set up the suspend timer at any time
> simplifies things quite a bit.  Namely, in that case, if pm_schedule_suspend()
> is called and it sees a timer pending, it deactivates the timer with
> del_timer() and sets up a new one with add_timer().  It doesn't need to worry
> about whether the suspend request has been queued up already or
> pm_runtime_suspend() is running or something.  Things will work themselves out
> anyway eventually.
> 
> Otherwise, after calling del_timer() we'll need to check if the timer was pending
> and if it wasn't, then if the suspend requests has been queued up already, and
> if it has, then if pm_runtime_suspend() is running (the current status is
> RPM_SUSPENDING) etc.  That doesn't look particularly clean.

It's not as bad as you think.  In pseudo code:

	ret = suspend_allowed(dev);
	if (ret)
		return ret;
	if (dev->power.timer_expiration) {
		del_timer(&dev->power.timer);
		dev->power.timer_expiration = 0;
	}
	if (dev->power.work_pending) {
		cancel_work(&dev->power.work);
		dev->power.work_pending = 0;
		dev->power.async_action = 0;
	}
	dev->power.timer_expiration = max(jiffies + delay, 1UL);
	mod_timer(&dev->power.timer, delay);

The middle section could usefully be put in a subroutine.

Alan Stern


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [RFC] Run-time PM framework (was: Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6))
  2009-07-02 17:50                                                                 ` Rafael J. Wysocki
@ 2009-07-02 19:53                                                                   ` Alan Stern
  2009-07-02 19:53                                                                   ` Alan Stern
  1 sibling, 0 replies; 102+ messages in thread
From: Alan Stern @ 2009-07-02 19:53 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Thu, 2 Jul 2009, Rafael J. Wysocki wrote:

> > The RPM_IN_TRANSITION flag is unnecessary.  It would always be equal to
> > (status == RPM_SUSPENDING || status == RPM_RESUMING).
> 
> I thought of replacing the old flags with RPM_IN_TRANSITION, actually.

Okay, but hopefully you won't mind if I continue to use the old state 
names in conversation.

> Still, the additional flag for 'idle notification is in progress' is still
> necessary for the following two reasons:
> 
> (1) Idle notifications cannot be run (synchronously) when one is already in
>     progress, so we need a means to determine whether or not this is the case.
> 
> (2) If run-time PM is to be disabled, the function doing that must guarantee
>     that ->runtime_idle() won't be running after it's returned, so it needs to
>     know how to check that.

Agreed.


> > I don't agree.  For example, suppose the device has an active child
> > when the driver says: Suspend it in 30 seconds.  If the child is then
> > removed after only 10 seconds, does it make sense to go ahead with
> > suspending the parent 20 seconds later?  No -- if the parent is going
> > to be suspended, the decision as to when should be made at the time the
> > child is removed, not beforehand.
> 
> There are two functions, on that sets up the timer and the other that queues
> up the request.  This is the second one that makes the decision if the request
> is still worth queuing up.
> 
> > (Even more concretely, suppose there is a 30-second inactivity timeout
> > for autosuspend.  Removing the child counts as activity and so should
> > restart the timer.)
> > 
> > To put it another way, suppose you accept a delayed request under
> > inappropriate conditions.  If the conditions don't change, the whole
> > thing was a waste of effort.  And if the conditions do change, then the
> > whole delayed request should be reconsidered anyhow.
> 
> The problem is, even if you always accept a delayed request under appropriate
> conditions, you still have to reconsider it before putting it into the work
> queue, because the conditions might have changed.  So, you'd like to do this:
> 
> (1) Check if the conditions are appropriate, set up a timer.
> (2) Check if the conditions are appropriate, queue up a suspend request.
> 
> while I think it will be simpler to do this:
> 
> (1) Set up a timer.
> (2) Check if the conditions are appropriate, queue up a suspend request.
> 
> In any case you can have a pm_runtime_suspend() / pm_runtime_resume() cycle
> between (1) and (2), so I don't really see a practical difference.

A cycle like that would cancel the timer anyway.  Maybe that's what you 
meant...

Hmm.  What sort of conditions are we talking about?  One possiblity is 
that we are in the wrong state, i.e., in SUSPENDING or SUSPENDED.  It's 
completely useless to start a timer then; if the state changes the 
timer will be cancelled, and if it doesn't change then the request 
won't be queued when the timer expires.

The other possiblity is that either the children or usage counter is 
positive.  If the counter decrements to 0 so that a suspend is feasible 
then we would send an idle notification.  At that point the driver 
could decide what to do; the most likely response would be to 
reschedule the suspend.  In fact, it's hard to think of a situation 
where the driver would want to just let the timer keep on running.

> For example, suppose ->runtime_resume() has been called as
> a result of a remote wake-up (ie. after pm_request_resume()) and it has some
> I/O to process, but it is known beforehand that the device will most likely be
> inactive after the I/O is done.  So, it's tempting to call
> pm_schedule_suspend() from within ->runtime_resume(), but the conditions are
> inappropriate (the device is not regarded as suspended).

??  Conditions are perfectly appropriate, since suspend requests are 
allowed in the RESUMING state.

Unless the driver also did a pm_runtime_get, of course.  But in that 
case it would have to do a pm_runtime_put eventually, at which point it 
could schedule the suspend.

>  However, calling
> pm_schedule_suspend() with a long enough delay doesn't break any rules related
> to the ->runtime_*() callbacks, so why should it be forbidden?

It isn't.

> Next, suppose pm_schedule_suspend() is called, but it fails because the
> conditions are inappropriate.  What's the caller supposed to do?  Wait for the
> conditions to change and repeat?

In a manner of speaking.  More precisely, whatever code is responsible 
for changing the conditions should call pm_schedule_suspend.  Or set up 
an idle notification, leading indirectly to pm_schedule_suspend.

>  But why should it bother if the conditions
> may still change before ->runtime_suspend() is actually called?

It should bother because conditions might _not_ change, in which case
the suspend would occur.  But for what you are proposing, if the
conditions don't change then the suspend will not occur.

> IMO, it's the caller's problem whether or not what it does is useful or
> efficient.  The core's problem is to ensure that it doesn't break things.

But what's the drawback?  The extra overhead of checking whether two
counters are positive is minuscule compared to the effort of setting up
a timer.  And it's even better when you consider that the mostly likely
outcome of letting the timer run is that the timer handler would fail
to queue a suspend request (because the counters are unchanged).


> > Trying to keep track of reasons for incrementing and decrementing 
> > usage_count is very difficult to do in the core.  What happens if 
> > pm_request_resume increments the count but then the driver calls 
> > pm_runtime_get, pm_runtime_resume, pm_runtime_put all before the work 
> > routine can run?
> 
> Nothing wrong, as long as the increments and decrements are balanced (if they
> aren't balanced, there is a bug in the driver anyway).

That's my point -- in this situation it's very difficult for the driver
to balance them.  There would be no decrement to balance
pm_request_resume's automatic increment, because the work routine would
never run.

>  In fact, for this to
> work we need the rule that a new request of the same type doesn't replace an
> existing one.  Then, the submitted resume request cannot be canceled, so the
> work function will run and drop the usage counter.

A new pm_schedule_suspend _should_ replace an existing one.  For 
idle_notify and resume requests, this rule is more or less a no-op.

> > It's better to make the driver responsible for maintaining the counter
> > value.  Forcing the driver to do pm_runtime_get, pm_request_resume is
> > better than having the core automatically change the counter.
> 
> So the caller will do:
> 
> pm_runtime_get(dev);
> error = pm_request_resume(dev);
> if (error)
>     goto out;
> <process I/O>
> pm_runtime_put();

["error" isn't a good name.  The return value would be 0 to indicate 
the request was accepted and queued, or 1 to indicate the device is 
already active.  Or perhaps vice versa.]

> but how is it supposed to ensure that pm_runtime_put() will be called after
> executing the 'goto out' thing?

The same way it knows that the runtime_resume method has to process the
pending I/O.  That is, the presence of I/O to process means that once
the processing is over, the driver should call pm_runtime_put.

> Anyway, we don't need to use the usage counter for that (although it's cheap).
> Instead, we can make pm_request_suspend() and pm_request_idle() check if a
> resume request is pending and fail if that's the case.

But what about pm_runtime_suspend?  I think we need to use the counter.
Besides, the states in which suspend requests and idle requests are 
valid are disjoint from the states in which resume requests are valid.

> Let's put it another way.  What's the practical benefit to the caller if we
> always check the counters in submissions?

It saves the overhead of setting up and running a useless timer.  It 
avoids a race between the timer routine and pm_runtime_put.


> > >     Any pending request takes precedence over a new idle notification request.
> > 
> > For pending resume requests this rule is unnecessary; it's invalid to
> > submit an idle notification request while a resume request is pending
> > (since resume requests can be pending only in the RPM_SUSPENDING and
> > RPM_SUSPENDED states while idle notification requests are accepted only
> > in RPM_RESUMING and RPM_ACTIVE).
> 
> It is correct nevertheless. :-)

Okay, if you want.  Provided you agree that "pending request" doesn't 
include unexpired suspend timers.

> Well, after some reconsideration I think it's not enough (as I wrote in my last
> message), becuase it generally makes sense to make the following rule:
> 
>     A pending request always takes precedence over a new request of the same
>     type.
> 
> So, for example, if pm_request_resume() is called and there's a resume request
> pending already, the new pm_request_resume() should just let the pending
> request alone and quit.

Do you mean we shouldn't cancel the work item and then requeue it?  I
agree.  In fact I'd go even farther: If the timer routine find an idle
request pending, it shouldn't cancel it -- instead it should simply
change async_action to ASYNC_SUSPEND.  That's a simple optimization.  
Regardless, the effect isn't visible to drivers.

> Thus, it seems reasonable to remember what type of a request is pending
> (i don't think we can figure it out from the status fields in 100% of the
> cases).

That's what the async_action field in my proposal is for.


> > Yes.  But the driver might depend on something happening inside the
> > runtime_resume method, so it would need to know if a successful
> > pm_runtime_resume wasn't going to invoke the callback.
> 
> Hmm.  That would require the driver to know that the device was suspended,
> but in that case pm_runtime_resume() returning 0 would mean that _someone_
> ran ->runtime_resume() for it in any case.
> 
> If the driver doesn't know if the device was suspended beforehand, it cannot
> depend on the execution of ->runtime_resume().

Exactly.  Therefore it needs to be told if pm_runtime_resume isn't 
going to call the runtime_resume method, so that it can take 
appropriate remedial action.


> > > > To be determined: How runtime PM will interact with system sleep.
> > > 
> > > Yes.  My first idea was to disable run-time PM before entering a system sleep
> > > state, but that would involve canceling all of the pending requests.
> > 
> > Or simply freezing the workqueue.
> 
> Well, what about the synchronous calls?  How are we going to prevent them
> from happening after freezing the workqueue?

How about your "rpm_disabled" flag?


> Now there's a point in which allowing to set up the suspend timer at any time
> simplifies things quite a bit.  Namely, in that case, if pm_schedule_suspend()
> is called and it sees a timer pending, it deactivates the timer with
> del_timer() and sets up a new one with add_timer().  It doesn't need to worry
> about whether the suspend request has been queued up already or
> pm_runtime_suspend() is running or something.  Things will work themselves out
> anyway eventually.
> 
> Otherwise, after calling del_timer() we'll need to check if the timer was pending
> and if it wasn't, then if the suspend requests has been queued up already, and
> if it has, then if pm_runtime_suspend() is running (the current status is
> RPM_SUSPENDING) etc.  That doesn't look particularly clean.

It's not as bad as you think.  In pseudo code:

	ret = suspend_allowed(dev);
	if (ret)
		return ret;
	if (dev->power.timer_expiration) {
		del_timer(&dev->power.timer);
		dev->power.timer_expiration = 0;
	}
	if (dev->power.work_pending) {
		cancel_work(&dev->power.work);
		dev->power.work_pending = 0;
		dev->power.async_action = 0;
	}
	dev->power.timer_expiration = max(jiffies + delay, 1UL);
	mod_timer(&dev->power.timer, delay);

The middle section could usefully be put in a subroutine.

Alan Stern

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [RFC] Run-time PM framework (was: Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6))
  2009-07-02 19:53                                                                   ` Alan Stern
  2009-07-02 23:05                                                                     ` Rafael J. Wysocki
@ 2009-07-02 23:05                                                                     ` Rafael J. Wysocki
  2009-07-03 20:58                                                                       ` Alan Stern
  2009-07-03 20:58                                                                       ` Alan Stern
  1 sibling, 2 replies; 102+ messages in thread
From: Rafael J. Wysocki @ 2009-07-02 23:05 UTC (permalink / raw)
  To: Alan Stern
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Thursday 02 July 2009, Alan Stern wrote:
> On Thu, 2 Jul 2009, Rafael J. Wysocki wrote:
> 
> > > The RPM_IN_TRANSITION flag is unnecessary.  It would always be equal to
> > > (status == RPM_SUSPENDING || status == RPM_RESUMING).
> > 
> > I thought of replacing the old flags with RPM_IN_TRANSITION, actually.
> 
> Okay, but hopefully you won't mind if I continue to use the old state 
> names in conversation.

Sure.

> > Still, the additional flag for 'idle notification is in progress' is still
> > necessary for the following two reasons:
> > 
> > (1) Idle notifications cannot be run (synchronously) when one is already in
> >     progress, so we need a means to determine whether or not this is the case.
> > 
> > (2) If run-time PM is to be disabled, the function doing that must guarantee
> >     that ->runtime_idle() won't be running after it's returned, so it needs to
> >     know how to check that.
> 
> Agreed.
> 
> 
> > > I don't agree.  For example, suppose the device has an active child
> > > when the driver says: Suspend it in 30 seconds.  If the child is then
> > > removed after only 10 seconds, does it make sense to go ahead with
> > > suspending the parent 20 seconds later?  No -- if the parent is going
> > > to be suspended, the decision as to when should be made at the time the
> > > child is removed, not beforehand.
> > 
> > There are two functions, on that sets up the timer and the other that queues
> > up the request.  This is the second one that makes the decision if the request
> > is still worth queuing up.
> > 
> > > (Even more concretely, suppose there is a 30-second inactivity timeout
> > > for autosuspend.  Removing the child counts as activity and so should
> > > restart the timer.)
> > > 
> > > To put it another way, suppose you accept a delayed request under
> > > inappropriate conditions.  If the conditions don't change, the whole
> > > thing was a waste of effort.  And if the conditions do change, then the
> > > whole delayed request should be reconsidered anyhow.
> > 
> > The problem is, even if you always accept a delayed request under appropriate
> > conditions, you still have to reconsider it before putting it into the work
> > queue, because the conditions might have changed.  So, you'd like to do this:
> > 
> > (1) Check if the conditions are appropriate, set up a timer.
> > (2) Check if the conditions are appropriate, queue up a suspend request.
> > 
> > while I think it will be simpler to do this:
> > 
> > (1) Set up a timer.
> > (2) Check if the conditions are appropriate, queue up a suspend request.
> > 
> > In any case you can have a pm_runtime_suspend() / pm_runtime_resume() cycle
> > between (1) and (2), so I don't really see a practical difference.
> 
> A cycle like that would cancel the timer anyway.  Maybe that's what you 
> meant...

Yes.

> Hmm.  What sort of conditions are we talking about?  One possiblity is 
> that we are in the wrong state, i.e., in SUSPENDING or SUSPENDED.  It's 
> completely useless to start a timer then; if the state changes the 
> timer will be cancelled, and if it doesn't change then the request 
> won't be queued when the timer expires.

OK

> The other possiblity is that either the children or usage counter is 
> positive.  If the counter decrements to 0 so that a suspend is feasible 
> then we would send an idle notification.  At that point the driver 
> could decide what to do; the most likely response would be to 
> reschedule the suspend.  In fact, it's hard to think of a situation 
> where the driver would want to just let the timer keep on running.

OK

> > For example, suppose ->runtime_resume() has been called as
> > a result of a remote wake-up (ie. after pm_request_resume()) and it has some
> > I/O to process, but it is known beforehand that the device will most likely be
> > inactive after the I/O is done.  So, it's tempting to call
> > pm_schedule_suspend() from within ->runtime_resume(), but the conditions are
> > inappropriate (the device is not regarded as suspended).
> 
> ??  Conditions are perfectly appropriate, since suspend requests are 
> allowed in the RESUMING state.

OK

> Unless the driver also did a pm_runtime_get, of course.  But in that 
> case it would have to do a pm_runtime_put eventually, at which point it 
> could schedule the suspend.
> 
> >  However, calling
> > pm_schedule_suspend() with a long enough delay doesn't break any rules related
> > to the ->runtime_*() callbacks, so why should it be forbidden?
> 
> It isn't.
> 
> > Next, suppose pm_schedule_suspend() is called, but it fails because the
> > conditions are inappropriate.  What's the caller supposed to do?  Wait for the
> > conditions to change and repeat?
> 
> In a manner of speaking.  More precisely, whatever code is responsible 
> for changing the conditions should call pm_schedule_suspend.  Or set up 
> an idle notification, leading indirectly to pm_schedule_suspend.
> 
> >  But why should it bother if the conditions
> > may still change before ->runtime_suspend() is actually called?
> 
> It should bother because conditions might _not_ change, in which case
> the suspend would occur.  But for what you are proposing, if the
> conditions don't change then the suspend will not occur.
> 
> > IMO, it's the caller's problem whether or not what it does is useful or
> > efficient.  The core's problem is to ensure that it doesn't break things.
> 
> But what's the drawback?  The extra overhead of checking whether two
> counters are positive is minuscule compared to the effort of setting up
> a timer.  And it's even better when you consider that the mostly likely
> outcome of letting the timer run is that the timer handler would fail
> to queue a suspend request (because the counters are unchanged).
> 
> 
> > > Trying to keep track of reasons for incrementing and decrementing 
> > > usage_count is very difficult to do in the core.  What happens if 
> > > pm_request_resume increments the count but then the driver calls 
> > > pm_runtime_get, pm_runtime_resume, pm_runtime_put all before the work 
> > > routine can run?
> > 
> > Nothing wrong, as long as the increments and decrements are balanced (if they
> > aren't balanced, there is a bug in the driver anyway).
> 
> That's my point -- in this situation it's very difficult for the driver
> to balance them.  There would be no decrement to balance
> pm_request_resume's automatic increment, because the work routine would
> never run.
> 
> >  In fact, for this to
> > work we need the rule that a new request of the same type doesn't replace an
> > existing one.  Then, the submitted resume request cannot be canceled, so the
> > work function will run and drop the usage counter.
> 
> A new pm_schedule_suspend _should_ replace an existing one.  For 
> idle_notify and resume requests, this rule is more or less a no-op.
> 
> > > It's better to make the driver responsible for maintaining the counter
> > > value.  Forcing the driver to do pm_runtime_get, pm_request_resume is
> > > better than having the core automatically change the counter.
> > 
> > So the caller will do:
> > 
> > pm_runtime_get(dev);
> > error = pm_request_resume(dev);
> > if (error)
> >     goto out;
> > <process I/O>
> > pm_runtime_put();
> 
> ["error" isn't a good name.  The return value would be 0 to indicate 
> the request was accepted and queued, or 1 to indicate the device is 
> already active.  Or perhaps vice versa.]

Why do you insist on using positive values?  Also, there are other situations
possible (like run-time PM is disabled etc.).

> > but how is it supposed to ensure that pm_runtime_put() will be called after
> > executing the 'goto out' thing?
> 
> The same way it knows that the runtime_resume method has to process the
> pending I/O.  That is, the presence of I/O to process means that once
> the processing is over, the driver should call pm_runtime_put.

I overlooked the fact that if pm_request_resume() returns a value indicating
that the request has been queued up, the status is such that it won't allow any
other requests to be queued up and only pm_runtime_resume() can.  The status
will still remain this way until ->runtime_resume() has returned, so the caller
can just call pm_runtime_put() right after pm_request_resume() in that case
(unless it wants to process I/O after ->runtime_resume() has returned, but
then it can increment the usage counter in ->runtime_resume()).

> > Anyway, we don't need to use the usage counter for that (although it's cheap).
> > Instead, we can make pm_request_suspend() and pm_request_idle() check if a
> > resume request is pending and fail if that's the case.
> 
> But what about pm_runtime_suspend?  I think we need to use the counter.
> Besides, the states in which suspend requests and idle requests are 
> valid are disjoint from the states in which resume requests are valid.

That's correct.  pm_runtime_suspend() should check the counter IMO, but it
shouldn't change it.

Also, it looks like the status bits are sufficient to prevent suspend requests
or synchronous suspends from happening at wrong times, from the core's point
of view, so scratch the idea of using the usage counter to block them.

> > Let's put it another way.  What's the practical benefit to the caller if we
> > always check the counters in submissions?
> 
> It saves the overhead of setting up and running a useless timer.  It 
> avoids a race between the timer routine and pm_runtime_put.

OK

> > > >     Any pending request takes precedence over a new idle notification request.
> > > 
> > > For pending resume requests this rule is unnecessary; it's invalid to
> > > submit an idle notification request while a resume request is pending
> > > (since resume requests can be pending only in the RPM_SUSPENDING and
> > > RPM_SUSPENDED states while idle notification requests are accepted only
> > > in RPM_RESUMING and RPM_ACTIVE).
> > 
> > It is correct nevertheless. :-)
> 
> Okay, if you want.  Provided you agree that "pending request" doesn't 
> include unexpired suspend timers.

Sure.

> > Well, after some reconsideration I think it's not enough (as I wrote in my last
> > message), becuase it generally makes sense to make the following rule:
> > 
> >     A pending request always takes precedence over a new request of the same
> >     type.
> > 
> > So, for example, if pm_request_resume() is called and there's a resume request
> > pending already, the new pm_request_resume() should just let the pending
> > request alone and quit.
> 
> Do you mean we shouldn't cancel the work item and then requeue it?  I
> agree.  In fact I'd go even farther: If the timer routine find an idle
> request pending, it shouldn't cancel it -- instead it should simply
> change async_action to ASYNC_SUSPEND.  That's a simple optimization.  
> Regardless, the effect isn't visible to drivers.

I don't really like the async_action idea, as you might have noticed.

> > Thus, it seems reasonable to remember what type of a request is pending
> > (i don't think we can figure it out from the status fields in 100% of the
> > cases).
> 
> That's what the async_action field in my proposal is for.

Ah.  Why don't we just use a request type field instead?

In fact, we can use a 2-bit status field (RPM_ACTIVE, RPM_SUSPENDING,
RPM_SUSPENDED, RPM_RESUMING) and a 2-bit request type field
(RPM_REQ_NONE, RPM_REQ_IDLE, RPM_REQ_SUSPEND, RPM_REQ_RESUME).

Additionally, we'll need an "idle notification is running" flag as we've aleady
agreed, but that's independent on the status and request type (except that, I
think, it should be forbidden to set the request type to RPM_REQ_IDLE if
this flag is set).

That would pretty much suffice to represent all of the possibilities.

I'd also add a "disabled" flag indicating that run-time PM of the device is
disabled, an "error" flag indicating that one of the
->runtime_[suspend/resume]() callbacks has failed to do its job and
and an int field to store the error code returned by the failing callback (in
case the failure happened in an asynchronous routine).

> > > Yes.  But the driver might depend on something happening inside the
> > > runtime_resume method, so it would need to know if a successful
> > > pm_runtime_resume wasn't going to invoke the callback.
> > 
> > Hmm.  That would require the driver to know that the device was suspended,
> > but in that case pm_runtime_resume() returning 0 would mean that _someone_
> > ran ->runtime_resume() for it in any case.
> > 
> > If the driver doesn't know if the device was suspended beforehand, it cannot
> > depend on the execution of ->runtime_resume().
> 
> Exactly.  Therefore it needs to be told if pm_runtime_resume isn't 
> going to call the runtime_resume method, so that it can take 
> appropriate remedial action.

OK, it can return 1 if the status was already RPM_ACTIVE.

> > > > > To be determined: How runtime PM will interact with system sleep.
> > > > 
> > > > Yes.  My first idea was to disable run-time PM before entering a system sleep
> > > > state, but that would involve canceling all of the pending requests.
> > > 
> > > Or simply freezing the workqueue.
> > 
> > Well, what about the synchronous calls?  How are we going to prevent them
> > from happening after freezing the workqueue?
> 
> How about your "rpm_disabled" flag?

That's fine, we'd also need to wait for running callbacks to finish too.  And
I'm still not convinced if we should preserve requests queued up before the
system sleep.  Or keep the suspend timer running for that matter.

> > Now there's a point in which allowing to set up the suspend timer at any time
> > simplifies things quite a bit.  Namely, in that case, if pm_schedule_suspend()
> > is called and it sees a timer pending, it deactivates the timer with
> > del_timer() and sets up a new one with add_timer().  It doesn't need to worry
> > about whether the suspend request has been queued up already or
> > pm_runtime_suspend() is running or something.  Things will work themselves out
> > anyway eventually.
> > 
> > Otherwise, after calling del_timer() we'll need to check if the timer was pending
> > and if it wasn't, then if the suspend requests has been queued up already, and
> > if it has, then if pm_runtime_suspend() is running (the current status is
> > RPM_SUSPENDING) etc.  That doesn't look particularly clean.
> 
> It's not as bad as you think.  In pseudo code:
> 
> 	ret = suspend_allowed(dev);
> 	if (ret)
> 		return ret;
> 	if (dev->power.timer_expiration) {
> 		del_timer(&dev->power.timer);
> 		dev->power.timer_expiration = 0;
> 	}
> 	if (dev->power.work_pending) {
> 		cancel_work(&dev->power.work);
> 		dev->power.work_pending = 0;
> 		dev->power.async_action = 0;
> 	}
> 	dev->power.timer_expiration = max(jiffies + delay, 1UL);
> 	mod_timer(&dev->power.timer, delay);
> 
> The middle section could usefully be put in a subroutine.

Could you please remind me what timer_expiration is for?

So, at a high level, the pm_request_* and pm_schedule_* functions would work
like this (I'm omitting acquiring and releasing locks):

pm_request_idle()
  * return -EINVAL if 'disabled' is set or 'runtime_error' is set
  * return -EAGAIN if 'runtime status' is not RPM_ACTIVE or 'request type' is
    RPM_REQ_SUSPEND or 'usage_count' > 0 or 'child_count' > 0
  * return -EALREADY if 'request type' is RPM_REQ_IDLE
  * return -EINPROGRESS if 'idle notification in progress' is set
  * change 'request type' to RPM_REQ_IDLE and queue up a request to execute
    ->runtime_idle() or ->runtime_suspend() (which one will be executed depends
    on 'request type' at the time when the work function is run)
  * return 0

pm_schedule_suspend()
  * return -EINVAL if 'disabled' is set or 'runtime_error' is set
  * return -EAGAIN if 'usage_count' > 0 or 'child_count' > 0
  * return -EALREADY if 'runtime status' is RPM_SUSPENDED
  * return -EINPROGRESS if 'runtime status' is RPM_SUSPENDING
  * if suspend timer is pending, deactivate it
  * if 'request type' is not RPM_REQ_NONE, cancel the work
  * set up a timer to execute pm_request_suspend()
  * return 0

pm_request_suspend()
  * return if 'disabled' is set or 'runtime_error' is set
  * return if 'usage_count' > 0 or 'child_count' > 0
  * return if 'runtime status' is RPM_SUSPENDING or RPM_SUSPENDED
  * if 'request type' is RPM_REQ_IDLE, change it to RPM_REQ_SUSPEND and return
  * change 'request type' to RPM_REQ_SUSPEND and queue up a request to
    execute ->runtime_suspend()

pm_request_resume()
  * return -EINVAL if 'disabled' is set or 'runtime_error' is set
  * return -EINPROGRESS if 'runtime status' is RPM_RESUMING
  * return -EALREADY if 'request type' is RPM_REQ_RESUME
  * if suspend timer is pending, deactivate it
  * if 'request type' is not RPM_REQ_NONE, cancel the work
  * return 1 if 'runtime status' is RPM_ACTIVE
  * change 'request type' to RPM_REQ_RESUME and queue up a request to
    execute ->runtime_resume()
  * return 0

Or did I miss anything?

Rafael

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [RFC] Run-time PM framework (was: Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6))
  2009-07-02 19:53                                                                   ` Alan Stern
@ 2009-07-02 23:05                                                                     ` Rafael J. Wysocki
  2009-07-02 23:05                                                                     ` Rafael J. Wysocki
  1 sibling, 0 replies; 102+ messages in thread
From: Rafael J. Wysocki @ 2009-07-02 23:05 UTC (permalink / raw)
  To: Alan Stern
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Thursday 02 July 2009, Alan Stern wrote:
> On Thu, 2 Jul 2009, Rafael J. Wysocki wrote:
> 
> > > The RPM_IN_TRANSITION flag is unnecessary.  It would always be equal to
> > > (status == RPM_SUSPENDING || status == RPM_RESUMING).
> > 
> > I thought of replacing the old flags with RPM_IN_TRANSITION, actually.
> 
> Okay, but hopefully you won't mind if I continue to use the old state 
> names in conversation.

Sure.

> > Still, the additional flag for 'idle notification is in progress' is still
> > necessary for the following two reasons:
> > 
> > (1) Idle notifications cannot be run (synchronously) when one is already in
> >     progress, so we need a means to determine whether or not this is the case.
> > 
> > (2) If run-time PM is to be disabled, the function doing that must guarantee
> >     that ->runtime_idle() won't be running after it's returned, so it needs to
> >     know how to check that.
> 
> Agreed.
> 
> 
> > > I don't agree.  For example, suppose the device has an active child
> > > when the driver says: Suspend it in 30 seconds.  If the child is then
> > > removed after only 10 seconds, does it make sense to go ahead with
> > > suspending the parent 20 seconds later?  No -- if the parent is going
> > > to be suspended, the decision as to when should be made at the time the
> > > child is removed, not beforehand.
> > 
> > There are two functions, on that sets up the timer and the other that queues
> > up the request.  This is the second one that makes the decision if the request
> > is still worth queuing up.
> > 
> > > (Even more concretely, suppose there is a 30-second inactivity timeout
> > > for autosuspend.  Removing the child counts as activity and so should
> > > restart the timer.)
> > > 
> > > To put it another way, suppose you accept a delayed request under
> > > inappropriate conditions.  If the conditions don't change, the whole
> > > thing was a waste of effort.  And if the conditions do change, then the
> > > whole delayed request should be reconsidered anyhow.
> > 
> > The problem is, even if you always accept a delayed request under appropriate
> > conditions, you still have to reconsider it before putting it into the work
> > queue, because the conditions might have changed.  So, you'd like to do this:
> > 
> > (1) Check if the conditions are appropriate, set up a timer.
> > (2) Check if the conditions are appropriate, queue up a suspend request.
> > 
> > while I think it will be simpler to do this:
> > 
> > (1) Set up a timer.
> > (2) Check if the conditions are appropriate, queue up a suspend request.
> > 
> > In any case you can have a pm_runtime_suspend() / pm_runtime_resume() cycle
> > between (1) and (2), so I don't really see a practical difference.
> 
> A cycle like that would cancel the timer anyway.  Maybe that's what you 
> meant...

Yes.

> Hmm.  What sort of conditions are we talking about?  One possiblity is 
> that we are in the wrong state, i.e., in SUSPENDING or SUSPENDED.  It's 
> completely useless to start a timer then; if the state changes the 
> timer will be cancelled, and if it doesn't change then the request 
> won't be queued when the timer expires.

OK

> The other possiblity is that either the children or usage counter is 
> positive.  If the counter decrements to 0 so that a suspend is feasible 
> then we would send an idle notification.  At that point the driver 
> could decide what to do; the most likely response would be to 
> reschedule the suspend.  In fact, it's hard to think of a situation 
> where the driver would want to just let the timer keep on running.

OK

> > For example, suppose ->runtime_resume() has been called as
> > a result of a remote wake-up (ie. after pm_request_resume()) and it has some
> > I/O to process, but it is known beforehand that the device will most likely be
> > inactive after the I/O is done.  So, it's tempting to call
> > pm_schedule_suspend() from within ->runtime_resume(), but the conditions are
> > inappropriate (the device is not regarded as suspended).
> 
> ??  Conditions are perfectly appropriate, since suspend requests are 
> allowed in the RESUMING state.

OK

> Unless the driver also did a pm_runtime_get, of course.  But in that 
> case it would have to do a pm_runtime_put eventually, at which point it 
> could schedule the suspend.
> 
> >  However, calling
> > pm_schedule_suspend() with a long enough delay doesn't break any rules related
> > to the ->runtime_*() callbacks, so why should it be forbidden?
> 
> It isn't.
> 
> > Next, suppose pm_schedule_suspend() is called, but it fails because the
> > conditions are inappropriate.  What's the caller supposed to do?  Wait for the
> > conditions to change and repeat?
> 
> In a manner of speaking.  More precisely, whatever code is responsible 
> for changing the conditions should call pm_schedule_suspend.  Or set up 
> an idle notification, leading indirectly to pm_schedule_suspend.
> 
> >  But why should it bother if the conditions
> > may still change before ->runtime_suspend() is actually called?
> 
> It should bother because conditions might _not_ change, in which case
> the suspend would occur.  But for what you are proposing, if the
> conditions don't change then the suspend will not occur.
> 
> > IMO, it's the caller's problem whether or not what it does is useful or
> > efficient.  The core's problem is to ensure that it doesn't break things.
> 
> But what's the drawback?  The extra overhead of checking whether two
> counters are positive is minuscule compared to the effort of setting up
> a timer.  And it's even better when you consider that the mostly likely
> outcome of letting the timer run is that the timer handler would fail
> to queue a suspend request (because the counters are unchanged).
> 
> 
> > > Trying to keep track of reasons for incrementing and decrementing 
> > > usage_count is very difficult to do in the core.  What happens if 
> > > pm_request_resume increments the count but then the driver calls 
> > > pm_runtime_get, pm_runtime_resume, pm_runtime_put all before the work 
> > > routine can run?
> > 
> > Nothing wrong, as long as the increments and decrements are balanced (if they
> > aren't balanced, there is a bug in the driver anyway).
> 
> That's my point -- in this situation it's very difficult for the driver
> to balance them.  There would be no decrement to balance
> pm_request_resume's automatic increment, because the work routine would
> never run.
> 
> >  In fact, for this to
> > work we need the rule that a new request of the same type doesn't replace an
> > existing one.  Then, the submitted resume request cannot be canceled, so the
> > work function will run and drop the usage counter.
> 
> A new pm_schedule_suspend _should_ replace an existing one.  For 
> idle_notify and resume requests, this rule is more or less a no-op.
> 
> > > It's better to make the driver responsible for maintaining the counter
> > > value.  Forcing the driver to do pm_runtime_get, pm_request_resume is
> > > better than having the core automatically change the counter.
> > 
> > So the caller will do:
> > 
> > pm_runtime_get(dev);
> > error = pm_request_resume(dev);
> > if (error)
> >     goto out;
> > <process I/O>
> > pm_runtime_put();
> 
> ["error" isn't a good name.  The return value would be 0 to indicate 
> the request was accepted and queued, or 1 to indicate the device is 
> already active.  Or perhaps vice versa.]

Why do you insist on using positive values?  Also, there are other situations
possible (like run-time PM is disabled etc.).

> > but how is it supposed to ensure that pm_runtime_put() will be called after
> > executing the 'goto out' thing?
> 
> The same way it knows that the runtime_resume method has to process the
> pending I/O.  That is, the presence of I/O to process means that once
> the processing is over, the driver should call pm_runtime_put.

I overlooked the fact that if pm_request_resume() returns a value indicating
that the request has been queued up, the status is such that it won't allow any
other requests to be queued up and only pm_runtime_resume() can.  The status
will still remain this way until ->runtime_resume() has returned, so the caller
can just call pm_runtime_put() right after pm_request_resume() in that case
(unless it wants to process I/O after ->runtime_resume() has returned, but
then it can increment the usage counter in ->runtime_resume()).

> > Anyway, we don't need to use the usage counter for that (although it's cheap).
> > Instead, we can make pm_request_suspend() and pm_request_idle() check if a
> > resume request is pending and fail if that's the case.
> 
> But what about pm_runtime_suspend?  I think we need to use the counter.
> Besides, the states in which suspend requests and idle requests are 
> valid are disjoint from the states in which resume requests are valid.

That's correct.  pm_runtime_suspend() should check the counter IMO, but it
shouldn't change it.

Also, it looks like the status bits are sufficient to prevent suspend requests
or synchronous suspends from happening at wrong times, from the core's point
of view, so scratch the idea of using the usage counter to block them.

> > Let's put it another way.  What's the practical benefit to the caller if we
> > always check the counters in submissions?
> 
> It saves the overhead of setting up and running a useless timer.  It 
> avoids a race between the timer routine and pm_runtime_put.

OK

> > > >     Any pending request takes precedence over a new idle notification request.
> > > 
> > > For pending resume requests this rule is unnecessary; it's invalid to
> > > submit an idle notification request while a resume request is pending
> > > (since resume requests can be pending only in the RPM_SUSPENDING and
> > > RPM_SUSPENDED states while idle notification requests are accepted only
> > > in RPM_RESUMING and RPM_ACTIVE).
> > 
> > It is correct nevertheless. :-)
> 
> Okay, if you want.  Provided you agree that "pending request" doesn't 
> include unexpired suspend timers.

Sure.

> > Well, after some reconsideration I think it's not enough (as I wrote in my last
> > message), becuase it generally makes sense to make the following rule:
> > 
> >     A pending request always takes precedence over a new request of the same
> >     type.
> > 
> > So, for example, if pm_request_resume() is called and there's a resume request
> > pending already, the new pm_request_resume() should just let the pending
> > request alone and quit.
> 
> Do you mean we shouldn't cancel the work item and then requeue it?  I
> agree.  In fact I'd go even farther: If the timer routine find an idle
> request pending, it shouldn't cancel it -- instead it should simply
> change async_action to ASYNC_SUSPEND.  That's a simple optimization.  
> Regardless, the effect isn't visible to drivers.

I don't really like the async_action idea, as you might have noticed.

> > Thus, it seems reasonable to remember what type of a request is pending
> > (i don't think we can figure it out from the status fields in 100% of the
> > cases).
> 
> That's what the async_action field in my proposal is for.

Ah.  Why don't we just use a request type field instead?

In fact, we can use a 2-bit status field (RPM_ACTIVE, RPM_SUSPENDING,
RPM_SUSPENDED, RPM_RESUMING) and a 2-bit request type field
(RPM_REQ_NONE, RPM_REQ_IDLE, RPM_REQ_SUSPEND, RPM_REQ_RESUME).

Additionally, we'll need an "idle notification is running" flag as we've aleady
agreed, but that's independent on the status and request type (except that, I
think, it should be forbidden to set the request type to RPM_REQ_IDLE if
this flag is set).

That would pretty much suffice to represent all of the possibilities.

I'd also add a "disabled" flag indicating that run-time PM of the device is
disabled, an "error" flag indicating that one of the
->runtime_[suspend/resume]() callbacks has failed to do its job and
and an int field to store the error code returned by the failing callback (in
case the failure happened in an asynchronous routine).

> > > Yes.  But the driver might depend on something happening inside the
> > > runtime_resume method, so it would need to know if a successful
> > > pm_runtime_resume wasn't going to invoke the callback.
> > 
> > Hmm.  That would require the driver to know that the device was suspended,
> > but in that case pm_runtime_resume() returning 0 would mean that _someone_
> > ran ->runtime_resume() for it in any case.
> > 
> > If the driver doesn't know if the device was suspended beforehand, it cannot
> > depend on the execution of ->runtime_resume().
> 
> Exactly.  Therefore it needs to be told if pm_runtime_resume isn't 
> going to call the runtime_resume method, so that it can take 
> appropriate remedial action.

OK, it can return 1 if the status was already RPM_ACTIVE.

> > > > > To be determined: How runtime PM will interact with system sleep.
> > > > 
> > > > Yes.  My first idea was to disable run-time PM before entering a system sleep
> > > > state, but that would involve canceling all of the pending requests.
> > > 
> > > Or simply freezing the workqueue.
> > 
> > Well, what about the synchronous calls?  How are we going to prevent them
> > from happening after freezing the workqueue?
> 
> How about your "rpm_disabled" flag?

That's fine, we'd also need to wait for running callbacks to finish too.  And
I'm still not convinced if we should preserve requests queued up before the
system sleep.  Or keep the suspend timer running for that matter.

> > Now there's a point in which allowing to set up the suspend timer at any time
> > simplifies things quite a bit.  Namely, in that case, if pm_schedule_suspend()
> > is called and it sees a timer pending, it deactivates the timer with
> > del_timer() and sets up a new one with add_timer().  It doesn't need to worry
> > about whether the suspend request has been queued up already or
> > pm_runtime_suspend() is running or something.  Things will work themselves out
> > anyway eventually.
> > 
> > Otherwise, after calling del_timer() we'll need to check if the timer was pending
> > and if it wasn't, then if the suspend requests has been queued up already, and
> > if it has, then if pm_runtime_suspend() is running (the current status is
> > RPM_SUSPENDING) etc.  That doesn't look particularly clean.
> 
> It's not as bad as you think.  In pseudo code:
> 
> 	ret = suspend_allowed(dev);
> 	if (ret)
> 		return ret;
> 	if (dev->power.timer_expiration) {
> 		del_timer(&dev->power.timer);
> 		dev->power.timer_expiration = 0;
> 	}
> 	if (dev->power.work_pending) {
> 		cancel_work(&dev->power.work);
> 		dev->power.work_pending = 0;
> 		dev->power.async_action = 0;
> 	}
> 	dev->power.timer_expiration = max(jiffies + delay, 1UL);
> 	mod_timer(&dev->power.timer, delay);
> 
> The middle section could usefully be put in a subroutine.

Could you please remind me what timer_expiration is for?

So, at a high level, the pm_request_* and pm_schedule_* functions would work
like this (I'm omitting acquiring and releasing locks):

pm_request_idle()
  * return -EINVAL if 'disabled' is set or 'runtime_error' is set
  * return -EAGAIN if 'runtime status' is not RPM_ACTIVE or 'request type' is
    RPM_REQ_SUSPEND or 'usage_count' > 0 or 'child_count' > 0
  * return -EALREADY if 'request type' is RPM_REQ_IDLE
  * return -EINPROGRESS if 'idle notification in progress' is set
  * change 'request type' to RPM_REQ_IDLE and queue up a request to execute
    ->runtime_idle() or ->runtime_suspend() (which one will be executed depends
    on 'request type' at the time when the work function is run)
  * return 0

pm_schedule_suspend()
  * return -EINVAL if 'disabled' is set or 'runtime_error' is set
  * return -EAGAIN if 'usage_count' > 0 or 'child_count' > 0
  * return -EALREADY if 'runtime status' is RPM_SUSPENDED
  * return -EINPROGRESS if 'runtime status' is RPM_SUSPENDING
  * if suspend timer is pending, deactivate it
  * if 'request type' is not RPM_REQ_NONE, cancel the work
  * set up a timer to execute pm_request_suspend()
  * return 0

pm_request_suspend()
  * return if 'disabled' is set or 'runtime_error' is set
  * return if 'usage_count' > 0 or 'child_count' > 0
  * return if 'runtime status' is RPM_SUSPENDING or RPM_SUSPENDED
  * if 'request type' is RPM_REQ_IDLE, change it to RPM_REQ_SUSPEND and return
  * change 'request type' to RPM_REQ_SUSPEND and queue up a request to
    execute ->runtime_suspend()

pm_request_resume()
  * return -EINVAL if 'disabled' is set or 'runtime_error' is set
  * return -EINPROGRESS if 'runtime status' is RPM_RESUMING
  * return -EALREADY if 'request type' is RPM_REQ_RESUME
  * if suspend timer is pending, deactivate it
  * if 'request type' is not RPM_REQ_NONE, cancel the work
  * return 1 if 'runtime status' is RPM_ACTIVE
  * change 'request type' to RPM_REQ_RESUME and queue up a request to
    execute ->runtime_resume()
  * return 0

Or did I miss anything?

Rafael

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [RFC] Run-time PM framework (was: Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6))
  2009-07-02 23:05                                                                     ` Rafael J. Wysocki
  2009-07-03 20:58                                                                       ` Alan Stern
@ 2009-07-03 20:58                                                                       ` Alan Stern
  2009-07-03 23:57                                                                         ` Rafael J. Wysocki
  2009-07-03 23:57                                                                         ` Rafael J. Wysocki
  1 sibling, 2 replies; 102+ messages in thread
From: Alan Stern @ 2009-07-03 20:58 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Fri, 3 Jul 2009, Rafael J. Wysocki wrote:

> > ["error" isn't a good name.  The return value would be 0 to indicate 
> > the request was accepted and queued, or 1 to indicate the device is 
> > already active.  Or perhaps vice versa.]
> 
> Why do you insist on using positive values?  Also, there are other situations
> possible (like run-time PM is disabled etc.).

I think we should use positive values to indicate situations that
aren't the "nominal" case but also aren't errors.  This simplifies
error checking in drivers.  For example, you wouldn't want to print a
debugging or warning message just because the device happened to be
active already when you called pm_runtime_resume.


> I don't really like the async_action idea, as you might have noticed.

Do you mean that you don't like the field, or that you don't like its name?

> > > Thus, it seems reasonable to remember what type of a request is pending
> > > (i don't think we can figure it out from the status fields in 100% of the
> > > cases).
> > 
> > That's what the async_action field in my proposal is for.
> 
> Ah.  Why don't we just use a request type field instead?

"A rose by any other name..."

> In fact, we can use a 2-bit status field (RPM_ACTIVE, RPM_SUSPENDING,
> RPM_SUSPENDED, RPM_RESUMING) and a 2-bit request type field
> (RPM_REQ_NONE, RPM_REQ_IDLE, RPM_REQ_SUSPEND, RPM_REQ_RESUME).

That's the same as my 0, ASYNC_IDLE, ASYNC_SUSPEND, ASYNC_RESUME.

> Additionally, we'll need an "idle notification is running" flag as we've aleady
> agreed, but that's independent on the status and request type (except that, I
> think, it should be forbidden to set the request type to RPM_REQ_IDLE if
> this flag is set).

I don't see why; we can allow drivers to queue an idle notification
from within their runtime_idle routine (even though it might seem
pointless).  What we should forbid is calling pm_runtime_idle when the
flag is set.

> That would pretty much suffice to represent all of the possibilities.
> 
> I'd also add a "disabled" flag indicating that run-time PM of the device is
> disabled, an "error" flag indicating that one of the
> ->runtime_[suspend/resume]() callbacks has failed to do its job and
> and an int field to store the error code returned by the failing callback (in
> case the failure happened in an asynchronous routine).

Sure -- those are all things in the current design which should remain.  
As well as the wait_queue.


> That's fine, we'd also need to wait for running callbacks to finish too.  And
> I'm still not convinced if we should preserve requests queued up before the
> system sleep.  Or keep the suspend timer running for that matter.

This all does into the "to-be-determined" category.  :-)


> Could you please remind me what timer_expiration is for?

It is the jiffies value for the next timer expiration, or 0 if the
timer isn't pending.  Its purpose is to allow us to correctly
reschedule suspend requests.

Suppose the timer expires at about the same time as a new
pm_schedule_suspend call occurs.  If the timer routine hasn't queued
the work item yet then there's nothing to cancel, so how do we prevent
a suspend request from being added to the workqueue?  Answer: The timer
routine checks timer_expiration.  If the value stored there is in the
future, then the routine knows it was triggered early and it shouldn't
submit the work item.

Also (a minor benefit), before calling del_timer we can check whether
timer_expiration is nonzero.

> So, at a high level, the pm_request_* and pm_schedule_* functions would work
> like this (I'm omitting acquiring and releasing locks):
> 
> pm_request_idle()
>   * return -EINVAL if 'disabled' is set or 'runtime_error' is set
>   * return -EAGAIN if 'runtime status' is not RPM_ACTIVE or 'request type' is
>     RPM_REQ_SUSPEND or 'usage_count' > 0 or 'child_count' > 0

We should allow the status to be RPM_RESUMING.

>   * return -EALREADY if 'request type' is RPM_REQ_IDLE

No, return 0.

>   * return -EINPROGRESS if 'idle notification in progress' is set

No, go ahead and schedule another idle notification.

>   * change 'request type' to RPM_REQ_IDLE and queue up a request to execute
>     ->runtime_idle() or ->runtime_suspend() (which one will be executed depends
>     on 'request type' at the time when the work function is run)

More simply, just queue the work item.

>   * return 0
> 
> pm_schedule_suspend()
>   * return -EINVAL if 'disabled' is set or 'runtime_error' is set
>   * return -EAGAIN if 'usage_count' > 0 or 'child_count' > 0
>   * return -EALREADY if 'runtime status' is RPM_SUSPENDED
>   * return -EINPROGRESS if 'runtime status' is RPM_SUSPENDING

The last two aren't right.  If the status is RPM_SUSPENDED or
RPM_SUSPENDING, cancel any pending work and set the type to
RPM_REQ_NONE before returning.  In other words, cancel a possible 
pending resume request.

>   * if suspend timer is pending, deactivate it

This step isn't needed here, since you're going to restart the timer 
anyway.

>   * if 'request type' is not RPM_REQ_NONE, cancel the work

Set timer_expiration = jiffies + delay.

>   * set up a timer to execute pm_request_suspend()
>   * return 0
> 
> pm_request_suspend()
>   * return if 'disabled' is set or 'runtime_error' is set
>   * return if 'usage_count' > 0 or 'child_count' > 0
>   * return if 'runtime status' is RPM_SUSPENDING or RPM_SUSPENDED

First cancel a possible pending resume request.

If the status is RPM_RESUMING or RPM_ACTIVE, cancel a possible pending 
timer (and set time_expiration to 0).

>   * if 'request type' is RPM_REQ_IDLE, change it to RPM_REQ_SUSPEND and return
>   * change 'request type' to RPM_REQ_SUSPEND and queue up a request to
>     execute ->runtime_suspend()
> 
> pm_request_resume()
>   * return -EINVAL if 'disabled' is set or 'runtime_error' is set
>   * return -EINPROGRESS if 'runtime status' is RPM_RESUMING

Or RPM_ACTIVE.

>   * return -EALREADY if 'request type' is RPM_REQ_RESUME

For these last two, first cancel a possible pending suspend request
and a possible timer.  Should we leave a pending idle request in place?  
And return 1, not an error code.

>   * if suspend timer is pending, deactivate it

The timer can't be pending at this point.

>   * if 'request type' is not RPM_REQ_NONE, cancel the work

At this point, 'request type' can only be RPM_REQ_NONE or 
RPM_REQ_RESUME.  In neither case do we want to cancel it.

>   * return 1 if 'runtime status' is RPM_ACTIVE

See above.

>   * change 'request type' to RPM_REQ_RESUME and queue up a request to
>     execute ->runtime_resume()

Queue the request only if the state is RPM_SUSPENDED.

>   * return 0
> 
> Or did I miss anything?

I think this is pretty close.  It'll be necessary to go back and reread 
the old email messages to make sure this really does everything we 
eventually agreed on.  :-)

Similar outlines apply for pm_runtime_suspend, pm_runtime_resume, and
pm_runtime_idle.  There is an extra requirement: When a suspend or
resume is over, if 'request type' is set then schedule the work item.  
Doing things this way allows the workqueue thread to avoid waiting
around for the suspend or resume to finish.

Also, when a resume is over we should schedule an idle notification 
even if 'request type' is clear, provided the counters are 0.

Alan Stern


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [RFC] Run-time PM framework (was: Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6))
  2009-07-02 23:05                                                                     ` Rafael J. Wysocki
@ 2009-07-03 20:58                                                                       ` Alan Stern
  2009-07-03 20:58                                                                       ` Alan Stern
  1 sibling, 0 replies; 102+ messages in thread
From: Alan Stern @ 2009-07-03 20:58 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Fri, 3 Jul 2009, Rafael J. Wysocki wrote:

> > ["error" isn't a good name.  The return value would be 0 to indicate 
> > the request was accepted and queued, or 1 to indicate the device is 
> > already active.  Or perhaps vice versa.]
> 
> Why do you insist on using positive values?  Also, there are other situations
> possible (like run-time PM is disabled etc.).

I think we should use positive values to indicate situations that
aren't the "nominal" case but also aren't errors.  This simplifies
error checking in drivers.  For example, you wouldn't want to print a
debugging or warning message just because the device happened to be
active already when you called pm_runtime_resume.


> I don't really like the async_action idea, as you might have noticed.

Do you mean that you don't like the field, or that you don't like its name?

> > > Thus, it seems reasonable to remember what type of a request is pending
> > > (i don't think we can figure it out from the status fields in 100% of the
> > > cases).
> > 
> > That's what the async_action field in my proposal is for.
> 
> Ah.  Why don't we just use a request type field instead?

"A rose by any other name..."

> In fact, we can use a 2-bit status field (RPM_ACTIVE, RPM_SUSPENDING,
> RPM_SUSPENDED, RPM_RESUMING) and a 2-bit request type field
> (RPM_REQ_NONE, RPM_REQ_IDLE, RPM_REQ_SUSPEND, RPM_REQ_RESUME).

That's the same as my 0, ASYNC_IDLE, ASYNC_SUSPEND, ASYNC_RESUME.

> Additionally, we'll need an "idle notification is running" flag as we've aleady
> agreed, but that's independent on the status and request type (except that, I
> think, it should be forbidden to set the request type to RPM_REQ_IDLE if
> this flag is set).

I don't see why; we can allow drivers to queue an idle notification
from within their runtime_idle routine (even though it might seem
pointless).  What we should forbid is calling pm_runtime_idle when the
flag is set.

> That would pretty much suffice to represent all of the possibilities.
> 
> I'd also add a "disabled" flag indicating that run-time PM of the device is
> disabled, an "error" flag indicating that one of the
> ->runtime_[suspend/resume]() callbacks has failed to do its job and
> and an int field to store the error code returned by the failing callback (in
> case the failure happened in an asynchronous routine).

Sure -- those are all things in the current design which should remain.  
As well as the wait_queue.


> That's fine, we'd also need to wait for running callbacks to finish too.  And
> I'm still not convinced if we should preserve requests queued up before the
> system sleep.  Or keep the suspend timer running for that matter.

This all does into the "to-be-determined" category.  :-)


> Could you please remind me what timer_expiration is for?

It is the jiffies value for the next timer expiration, or 0 if the
timer isn't pending.  Its purpose is to allow us to correctly
reschedule suspend requests.

Suppose the timer expires at about the same time as a new
pm_schedule_suspend call occurs.  If the timer routine hasn't queued
the work item yet then there's nothing to cancel, so how do we prevent
a suspend request from being added to the workqueue?  Answer: The timer
routine checks timer_expiration.  If the value stored there is in the
future, then the routine knows it was triggered early and it shouldn't
submit the work item.

Also (a minor benefit), before calling del_timer we can check whether
timer_expiration is nonzero.

> So, at a high level, the pm_request_* and pm_schedule_* functions would work
> like this (I'm omitting acquiring and releasing locks):
> 
> pm_request_idle()
>   * return -EINVAL if 'disabled' is set or 'runtime_error' is set
>   * return -EAGAIN if 'runtime status' is not RPM_ACTIVE or 'request type' is
>     RPM_REQ_SUSPEND or 'usage_count' > 0 or 'child_count' > 0

We should allow the status to be RPM_RESUMING.

>   * return -EALREADY if 'request type' is RPM_REQ_IDLE

No, return 0.

>   * return -EINPROGRESS if 'idle notification in progress' is set

No, go ahead and schedule another idle notification.

>   * change 'request type' to RPM_REQ_IDLE and queue up a request to execute
>     ->runtime_idle() or ->runtime_suspend() (which one will be executed depends
>     on 'request type' at the time when the work function is run)

More simply, just queue the work item.

>   * return 0
> 
> pm_schedule_suspend()
>   * return -EINVAL if 'disabled' is set or 'runtime_error' is set
>   * return -EAGAIN if 'usage_count' > 0 or 'child_count' > 0
>   * return -EALREADY if 'runtime status' is RPM_SUSPENDED
>   * return -EINPROGRESS if 'runtime status' is RPM_SUSPENDING

The last two aren't right.  If the status is RPM_SUSPENDED or
RPM_SUSPENDING, cancel any pending work and set the type to
RPM_REQ_NONE before returning.  In other words, cancel a possible 
pending resume request.

>   * if suspend timer is pending, deactivate it

This step isn't needed here, since you're going to restart the timer 
anyway.

>   * if 'request type' is not RPM_REQ_NONE, cancel the work

Set timer_expiration = jiffies + delay.

>   * set up a timer to execute pm_request_suspend()
>   * return 0
> 
> pm_request_suspend()
>   * return if 'disabled' is set or 'runtime_error' is set
>   * return if 'usage_count' > 0 or 'child_count' > 0
>   * return if 'runtime status' is RPM_SUSPENDING or RPM_SUSPENDED

First cancel a possible pending resume request.

If the status is RPM_RESUMING or RPM_ACTIVE, cancel a possible pending 
timer (and set time_expiration to 0).

>   * if 'request type' is RPM_REQ_IDLE, change it to RPM_REQ_SUSPEND and return
>   * change 'request type' to RPM_REQ_SUSPEND and queue up a request to
>     execute ->runtime_suspend()
> 
> pm_request_resume()
>   * return -EINVAL if 'disabled' is set or 'runtime_error' is set
>   * return -EINPROGRESS if 'runtime status' is RPM_RESUMING

Or RPM_ACTIVE.

>   * return -EALREADY if 'request type' is RPM_REQ_RESUME

For these last two, first cancel a possible pending suspend request
and a possible timer.  Should we leave a pending idle request in place?  
And return 1, not an error code.

>   * if suspend timer is pending, deactivate it

The timer can't be pending at this point.

>   * if 'request type' is not RPM_REQ_NONE, cancel the work

At this point, 'request type' can only be RPM_REQ_NONE or 
RPM_REQ_RESUME.  In neither case do we want to cancel it.

>   * return 1 if 'runtime status' is RPM_ACTIVE

See above.

>   * change 'request type' to RPM_REQ_RESUME and queue up a request to
>     execute ->runtime_resume()

Queue the request only if the state is RPM_SUSPENDED.

>   * return 0
> 
> Or did I miss anything?

I think this is pretty close.  It'll be necessary to go back and reread 
the old email messages to make sure this really does everything we 
eventually agreed on.  :-)

Similar outlines apply for pm_runtime_suspend, pm_runtime_resume, and
pm_runtime_idle.  There is an extra requirement: When a suspend or
resume is over, if 'request type' is set then schedule the work item.  
Doing things this way allows the workqueue thread to avoid waiting
around for the suspend or resume to finish.

Also, when a resume is over we should schedule an idle notification 
even if 'request type' is clear, provided the counters are 0.

Alan Stern

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [RFC] Run-time PM framework (was: Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6))
  2009-07-03 20:58                                                                       ` Alan Stern
@ 2009-07-03 23:57                                                                         ` Rafael J. Wysocki
  2009-07-04  3:12                                                                           ` Alan Stern
  2009-07-04  3:12                                                                           ` Alan Stern
  2009-07-03 23:57                                                                         ` Rafael J. Wysocki
  1 sibling, 2 replies; 102+ messages in thread
From: Rafael J. Wysocki @ 2009-07-03 23:57 UTC (permalink / raw)
  To: Alan Stern
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Friday 03 July 2009, Alan Stern wrote:
> On Fri, 3 Jul 2009, Rafael J. Wysocki wrote:
> 
> > > ["error" isn't a good name.  The return value would be 0 to indicate 
> > > the request was accepted and queued, or 1 to indicate the device is 
> > > already active.  Or perhaps vice versa.]
> > 
> > Why do you insist on using positive values?  Also, there are other situations
> > possible (like run-time PM is disabled etc.).
> 
> I think we should use positive values to indicate situations that
> aren't the "nominal" case but also aren't errors.  This simplifies
> error checking in drivers.  For example, you wouldn't want to print a
> debugging or warning message just because the device happened to be
> active already when you called pm_runtime_resume.

OK

> > I don't really like the async_action idea, as you might have noticed.
> 
> Do you mean that you don't like the field, or that you don't like its name?

The name, actually.  That's because I'd like to use the values for something
that's not 'async' in substance (more on that later).
 
> > > > Thus, it seems reasonable to remember what type of a request is pending
> > > > (i don't think we can figure it out from the status fields in 100% of the
> > > > cases).
> > > 
> > > That's what the async_action field in my proposal is for.
> > 
> > Ah.  Why don't we just use a request type field instead?
> 
> "A rose by any other name..."
> 
> > In fact, we can use a 2-bit status field (RPM_ACTIVE, RPM_SUSPENDING,
> > RPM_SUSPENDED, RPM_RESUMING) and a 2-bit request type field
> > (RPM_REQ_NONE, RPM_REQ_IDLE, RPM_REQ_SUSPEND, RPM_REQ_RESUME).
> 
> That's the same as my 0, ASYNC_IDLE, ASYNC_SUSPEND, ASYNC_RESUME.
> 
> > Additionally, we'll need an "idle notification is running" flag as we've aleady
> > agreed, but that's independent on the status and request type (except that, I
> > think, it should be forbidden to set the request type to RPM_REQ_IDLE if
> > this flag is set).
> 
> I don't see why; we can allow drivers to queue an idle notification
> from within their runtime_idle routine (even though it might seem
> pointless).  What we should forbid is calling pm_runtime_idle when the
> flag is set.

OK

> > That would pretty much suffice to represent all of the possibilities.
> > 
> > I'd also add a "disabled" flag indicating that run-time PM of the device is
> > disabled, an "error" flag indicating that one of the
> > ->runtime_[suspend/resume]() callbacks has failed to do its job and
> > and an int field to store the error code returned by the failing callback (in
> > case the failure happened in an asynchronous routine).
> 
> Sure -- those are all things in the current design which should remain.  
> As well as the wait_queue.

It occured to me in the meantime that if we added a 'request_pending' (or
'work_pending' or whatever similar) flag to the above, we could avoid using
cancel_work().  Namely, if 'request_pending' indicates that there's a work item
queued up, we could change 'request type' to NONE in case we didn't want the
work function to do anything.  Then, the work function would just unset
'request_pending' and quit if 'request type' is NONE.

I generally like the idea of changing 'request type' on the fly once we've
noticed that the currently pending request should be replaced by another one.
That would require us to introduce a big idle-suspend-resume function
choosing the callback to run based on 'request type', which would be quite
complicated.  But that function could also be used for the 'synchronous'
operations, so perhaps it's worth trying?

Such a function can take two arguments, dev and request, where the second
one determines the callback to run.  It can take the same values as 'request
type', where NONE means "you've been called from the workqueue, use 'request
type' from dev to check what to do", but your ASYNC_* names are not really
suitable here. :-)

> > That's fine, we'd also need to wait for running callbacks to finish too.  And
> > I'm still not convinced if we should preserve requests queued up before the
> > system sleep.  Or keep the suspend timer running for that matter.
> 
> This all does into the "to-be-determined" category.  :-)

Well, I'd like to choose something to start with.

> > Could you please remind me what timer_expiration is for?
> 
> It is the jiffies value for the next timer expiration, or 0 if the
> timer isn't pending.  Its purpose is to allow us to correctly
> reschedule suspend requests.
> 
> Suppose the timer expires at about the same time as a new
> pm_schedule_suspend call occurs.  If the timer routine hasn't queued
> the work item yet then there's nothing to cancel, so how do we prevent
> a suspend request from being added to the workqueue?  Answer: The timer
> routine checks timer_expiration.  If the value stored there is in the
> future, then the routine knows it was triggered early and it shouldn't
> submit the work item.
> 
> Also (a minor benefit), before calling del_timer we can check whether
> timer_expiration is nonzero.

OK, thanks.

> > So, at a high level, the pm_request_* and pm_schedule_* functions would work
> > like this (I'm omitting acquiring and releasing locks):
> > 
> > pm_request_idle()
> >   * return -EINVAL if 'disabled' is set or 'runtime_error' is set
> >   * return -EAGAIN if 'runtime status' is not RPM_ACTIVE or 'request type' is
> >     RPM_REQ_SUSPEND or 'usage_count' > 0 or 'child_count' > 0
> 
> We should allow the status to be RPM_RESUMING.

OK

> >   * return -EALREADY if 'request type' is RPM_REQ_IDLE
> 
> No, return 0.

OK

> >   * return -EINPROGRESS if 'idle notification in progress' is set
> 
> No, go ahead and schedule another idle notification.

OK

> >   * change 'request type' to RPM_REQ_IDLE and queue up a request to execute
> >     ->runtime_idle() or ->runtime_suspend() (which one will be executed depends
> >     on 'request type' at the time when the work function is run)
> 
> More simply, just queue the work item.
> 
> >   * return 0
> > 
> > pm_schedule_suspend()
> >   * return -EINVAL if 'disabled' is set or 'runtime_error' is set
> >   * return -EAGAIN if 'usage_count' > 0 or 'child_count' > 0
> >   * return -EALREADY if 'runtime status' is RPM_SUSPENDED
> >   * return -EINPROGRESS if 'runtime status' is RPM_SUSPENDING
> 
> The last two aren't right.  If the status is RPM_SUSPENDED or
> RPM_SUSPENDING, cancel any pending work and set the type to
> RPM_REQ_NONE before returning.  In other words, cancel a possible 
> pending resume request.

Wy do you think the possible pending resume request should be canceled?

I don't really agree here.  Resume request really means there's data to
process, so we shouldn't cancel pending resume requests IMO.

The driver should be given a chance to process data in ->runtime_resume()
even if it doesn't use the usage counter.  Otherwise, the usage counter would
always have to be used along with resume requests, so having
pm_request_resume() that doesn't increment the usage counter would really be
pointless.

> >   * if suspend timer is pending, deactivate it
> 
> This step isn't needed here, since you're going to restart the timer 
> anyway.

OK, restart the timer.

> >   * if 'request type' is not RPM_REQ_NONE, cancel the work
> 
> Set timer_expiration = jiffies + delay.

OK

> >   * set up a timer to execute pm_request_suspend()
> >   * return 0
> > 
> > pm_request_suspend()
> >   * return if 'disabled' is set or 'runtime_error' is set
> >   * return if 'usage_count' > 0 or 'child_count' > 0
> >   * return if 'runtime status' is RPM_SUSPENDING or RPM_SUSPENDED
> 
> First cancel a possible pending resume request.

I disagree.

> If the status is RPM_RESUMING or RPM_ACTIVE, cancel a possible pending 
> timer (and set time_expiration to 0).

We're the timer function, so either the timer is not pending, or we've been
executed too early.

> >   * if 'request type' is RPM_REQ_IDLE, change it to RPM_REQ_SUSPEND and return
> >   * change 'request type' to RPM_REQ_SUSPEND and queue up a request to
> >     execute ->runtime_suspend()
> > 
> > pm_request_resume()
> >   * return -EINVAL if 'disabled' is set or 'runtime_error' is set
> >   * return -EINPROGRESS if 'runtime status' is RPM_RESUMING
> 
> Or RPM_ACTIVE.

Maybe return 1 in that case?

> >   * return -EALREADY if 'request type' is RPM_REQ_RESUME
> 
> For these last two, first cancel a possible pending suspend request
> and a possible timer.

Possible timer only, I think.  if 'request type' is RESUME, there can't be
suspend request pending.

> Should we leave a pending idle request in place?  

Probably not.  It's likely going to result in a suspend request that we would
cancel.

> And return 1, not an error code.
> 
> >   * if suspend timer is pending, deactivate it
> 
> The timer can't be pending at this point.

That's if we deactivated it earlier.  OK

> >   * if 'request type' is not RPM_REQ_NONE, cancel the work
> 
> At this point, 'request type' can only be RPM_REQ_NONE or 
> RPM_REQ_RESUME.  In neither case do we want to cancel it.
> 
> >   * return 1 if 'runtime status' is RPM_ACTIVE
> 
> See above.
> 
> >   * change 'request type' to RPM_REQ_RESUME and queue up a request to
> >     execute ->runtime_resume()
> 
> Queue the request only if the state is RPM_SUSPENDED.
> 
> >   * return 0
> > 
> > Or did I miss anything?
> 
> I think this is pretty close.  It'll be necessary to go back and reread 
> the old email messages to make sure this really does everything we 
> eventually agreed on.  :-)

I think it's sufficient if we agree on the final version. :-)

> Similar outlines apply for pm_runtime_suspend, pm_runtime_resume, and
> pm_runtime_idle.  There is an extra requirement: When a suspend or
> resume is over, if 'request type' is set then schedule the work item.  
> Doing things this way allows the workqueue thread to avoid waiting
> around for the suspend or resume to finish.

I agree except that I would like suspends to just fail when the status is
RPM_RESUMING.  The reason is that a sloppily written driver could enter a
busy-loop of suspending-resuming the device, without the possibility to process
data, if there's full symmetry between suspend and resume.  So, I'd like to
break that symmetry and make resume operations privileged with respect to
suspend and idle notifications.  

> Also, when a resume is over we should schedule an idle notification 
> even if 'request type' is clear, provided the counters are 0.

Agreed.

Best,
Rafael

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [RFC] Run-time PM framework (was: Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6))
  2009-07-03 20:58                                                                       ` Alan Stern
  2009-07-03 23:57                                                                         ` Rafael J. Wysocki
@ 2009-07-03 23:57                                                                         ` Rafael J. Wysocki
  1 sibling, 0 replies; 102+ messages in thread
From: Rafael J. Wysocki @ 2009-07-03 23:57 UTC (permalink / raw)
  To: Alan Stern
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Friday 03 July 2009, Alan Stern wrote:
> On Fri, 3 Jul 2009, Rafael J. Wysocki wrote:
> 
> > > ["error" isn't a good name.  The return value would be 0 to indicate 
> > > the request was accepted and queued, or 1 to indicate the device is 
> > > already active.  Or perhaps vice versa.]
> > 
> > Why do you insist on using positive values?  Also, there are other situations
> > possible (like run-time PM is disabled etc.).
> 
> I think we should use positive values to indicate situations that
> aren't the "nominal" case but also aren't errors.  This simplifies
> error checking in drivers.  For example, you wouldn't want to print a
> debugging or warning message just because the device happened to be
> active already when you called pm_runtime_resume.

OK

> > I don't really like the async_action idea, as you might have noticed.
> 
> Do you mean that you don't like the field, or that you don't like its name?

The name, actually.  That's because I'd like to use the values for something
that's not 'async' in substance (more on that later).
 
> > > > Thus, it seems reasonable to remember what type of a request is pending
> > > > (i don't think we can figure it out from the status fields in 100% of the
> > > > cases).
> > > 
> > > That's what the async_action field in my proposal is for.
> > 
> > Ah.  Why don't we just use a request type field instead?
> 
> "A rose by any other name..."
> 
> > In fact, we can use a 2-bit status field (RPM_ACTIVE, RPM_SUSPENDING,
> > RPM_SUSPENDED, RPM_RESUMING) and a 2-bit request type field
> > (RPM_REQ_NONE, RPM_REQ_IDLE, RPM_REQ_SUSPEND, RPM_REQ_RESUME).
> 
> That's the same as my 0, ASYNC_IDLE, ASYNC_SUSPEND, ASYNC_RESUME.
> 
> > Additionally, we'll need an "idle notification is running" flag as we've aleady
> > agreed, but that's independent on the status and request type (except that, I
> > think, it should be forbidden to set the request type to RPM_REQ_IDLE if
> > this flag is set).
> 
> I don't see why; we can allow drivers to queue an idle notification
> from within their runtime_idle routine (even though it might seem
> pointless).  What we should forbid is calling pm_runtime_idle when the
> flag is set.

OK

> > That would pretty much suffice to represent all of the possibilities.
> > 
> > I'd also add a "disabled" flag indicating that run-time PM of the device is
> > disabled, an "error" flag indicating that one of the
> > ->runtime_[suspend/resume]() callbacks has failed to do its job and
> > and an int field to store the error code returned by the failing callback (in
> > case the failure happened in an asynchronous routine).
> 
> Sure -- those are all things in the current design which should remain.  
> As well as the wait_queue.

It occured to me in the meantime that if we added a 'request_pending' (or
'work_pending' or whatever similar) flag to the above, we could avoid using
cancel_work().  Namely, if 'request_pending' indicates that there's a work item
queued up, we could change 'request type' to NONE in case we didn't want the
work function to do anything.  Then, the work function would just unset
'request_pending' and quit if 'request type' is NONE.

I generally like the idea of changing 'request type' on the fly once we've
noticed that the currently pending request should be replaced by another one.
That would require us to introduce a big idle-suspend-resume function
choosing the callback to run based on 'request type', which would be quite
complicated.  But that function could also be used for the 'synchronous'
operations, so perhaps it's worth trying?

Such a function can take two arguments, dev and request, where the second
one determines the callback to run.  It can take the same values as 'request
type', where NONE means "you've been called from the workqueue, use 'request
type' from dev to check what to do", but your ASYNC_* names are not really
suitable here. :-)

> > That's fine, we'd also need to wait for running callbacks to finish too.  And
> > I'm still not convinced if we should preserve requests queued up before the
> > system sleep.  Or keep the suspend timer running for that matter.
> 
> This all does into the "to-be-determined" category.  :-)

Well, I'd like to choose something to start with.

> > Could you please remind me what timer_expiration is for?
> 
> It is the jiffies value for the next timer expiration, or 0 if the
> timer isn't pending.  Its purpose is to allow us to correctly
> reschedule suspend requests.
> 
> Suppose the timer expires at about the same time as a new
> pm_schedule_suspend call occurs.  If the timer routine hasn't queued
> the work item yet then there's nothing to cancel, so how do we prevent
> a suspend request from being added to the workqueue?  Answer: The timer
> routine checks timer_expiration.  If the value stored there is in the
> future, then the routine knows it was triggered early and it shouldn't
> submit the work item.
> 
> Also (a minor benefit), before calling del_timer we can check whether
> timer_expiration is nonzero.

OK, thanks.

> > So, at a high level, the pm_request_* and pm_schedule_* functions would work
> > like this (I'm omitting acquiring and releasing locks):
> > 
> > pm_request_idle()
> >   * return -EINVAL if 'disabled' is set or 'runtime_error' is set
> >   * return -EAGAIN if 'runtime status' is not RPM_ACTIVE or 'request type' is
> >     RPM_REQ_SUSPEND or 'usage_count' > 0 or 'child_count' > 0
> 
> We should allow the status to be RPM_RESUMING.

OK

> >   * return -EALREADY if 'request type' is RPM_REQ_IDLE
> 
> No, return 0.

OK

> >   * return -EINPROGRESS if 'idle notification in progress' is set
> 
> No, go ahead and schedule another idle notification.

OK

> >   * change 'request type' to RPM_REQ_IDLE and queue up a request to execute
> >     ->runtime_idle() or ->runtime_suspend() (which one will be executed depends
> >     on 'request type' at the time when the work function is run)
> 
> More simply, just queue the work item.
> 
> >   * return 0
> > 
> > pm_schedule_suspend()
> >   * return -EINVAL if 'disabled' is set or 'runtime_error' is set
> >   * return -EAGAIN if 'usage_count' > 0 or 'child_count' > 0
> >   * return -EALREADY if 'runtime status' is RPM_SUSPENDED
> >   * return -EINPROGRESS if 'runtime status' is RPM_SUSPENDING
> 
> The last two aren't right.  If the status is RPM_SUSPENDED or
> RPM_SUSPENDING, cancel any pending work and set the type to
> RPM_REQ_NONE before returning.  In other words, cancel a possible 
> pending resume request.

Wy do you think the possible pending resume request should be canceled?

I don't really agree here.  Resume request really means there's data to
process, so we shouldn't cancel pending resume requests IMO.

The driver should be given a chance to process data in ->runtime_resume()
even if it doesn't use the usage counter.  Otherwise, the usage counter would
always have to be used along with resume requests, so having
pm_request_resume() that doesn't increment the usage counter would really be
pointless.

> >   * if suspend timer is pending, deactivate it
> 
> This step isn't needed here, since you're going to restart the timer 
> anyway.

OK, restart the timer.

> >   * if 'request type' is not RPM_REQ_NONE, cancel the work
> 
> Set timer_expiration = jiffies + delay.

OK

> >   * set up a timer to execute pm_request_suspend()
> >   * return 0
> > 
> > pm_request_suspend()
> >   * return if 'disabled' is set or 'runtime_error' is set
> >   * return if 'usage_count' > 0 or 'child_count' > 0
> >   * return if 'runtime status' is RPM_SUSPENDING or RPM_SUSPENDED
> 
> First cancel a possible pending resume request.

I disagree.

> If the status is RPM_RESUMING or RPM_ACTIVE, cancel a possible pending 
> timer (and set time_expiration to 0).

We're the timer function, so either the timer is not pending, or we've been
executed too early.

> >   * if 'request type' is RPM_REQ_IDLE, change it to RPM_REQ_SUSPEND and return
> >   * change 'request type' to RPM_REQ_SUSPEND and queue up a request to
> >     execute ->runtime_suspend()
> > 
> > pm_request_resume()
> >   * return -EINVAL if 'disabled' is set or 'runtime_error' is set
> >   * return -EINPROGRESS if 'runtime status' is RPM_RESUMING
> 
> Or RPM_ACTIVE.

Maybe return 1 in that case?

> >   * return -EALREADY if 'request type' is RPM_REQ_RESUME
> 
> For these last two, first cancel a possible pending suspend request
> and a possible timer.

Possible timer only, I think.  if 'request type' is RESUME, there can't be
suspend request pending.

> Should we leave a pending idle request in place?  

Probably not.  It's likely going to result in a suspend request that we would
cancel.

> And return 1, not an error code.
> 
> >   * if suspend timer is pending, deactivate it
> 
> The timer can't be pending at this point.

That's if we deactivated it earlier.  OK

> >   * if 'request type' is not RPM_REQ_NONE, cancel the work
> 
> At this point, 'request type' can only be RPM_REQ_NONE or 
> RPM_REQ_RESUME.  In neither case do we want to cancel it.
> 
> >   * return 1 if 'runtime status' is RPM_ACTIVE
> 
> See above.
> 
> >   * change 'request type' to RPM_REQ_RESUME and queue up a request to
> >     execute ->runtime_resume()
> 
> Queue the request only if the state is RPM_SUSPENDED.
> 
> >   * return 0
> > 
> > Or did I miss anything?
> 
> I think this is pretty close.  It'll be necessary to go back and reread 
> the old email messages to make sure this really does everything we 
> eventually agreed on.  :-)

I think it's sufficient if we agree on the final version. :-)

> Similar outlines apply for pm_runtime_suspend, pm_runtime_resume, and
> pm_runtime_idle.  There is an extra requirement: When a suspend or
> resume is over, if 'request type' is set then schedule the work item.  
> Doing things this way allows the workqueue thread to avoid waiting
> around for the suspend or resume to finish.

I agree except that I would like suspends to just fail when the status is
RPM_RESUMING.  The reason is that a sloppily written driver could enter a
busy-loop of suspending-resuming the device, without the possibility to process
data, if there's full symmetry between suspend and resume.  So, I'd like to
break that symmetry and make resume operations privileged with respect to
suspend and idle notifications.  

> Also, when a resume is over we should schedule an idle notification 
> even if 'request type' is clear, provided the counters are 0.

Agreed.

Best,
Rafael

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [RFC] Run-time PM framework (was: Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6))
  2009-07-03 23:57                                                                         ` Rafael J. Wysocki
  2009-07-04  3:12                                                                           ` Alan Stern
@ 2009-07-04  3:12                                                                           ` Alan Stern
  2009-07-04 21:27                                                                             ` Rafael J. Wysocki
  2009-07-04 21:27                                                                             ` Rafael J. Wysocki
  1 sibling, 2 replies; 102+ messages in thread
From: Alan Stern @ 2009-07-04  3:12 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Sat, 4 Jul 2009, Rafael J. Wysocki wrote:

> > > I don't really like the async_action idea, as you might have noticed.
> > 
> > Do you mean that you don't like the field, or that you don't like its name?
> 
> The name, actually.  That's because I'd like to use the values for something
> that's not 'async' in substance (more on that later).

Okay.  I don't care about the name.

> It occured to me in the meantime that if we added a 'request_pending' (or
> 'work_pending' or whatever similar) flag to the above, we could avoid using
> cancel_work().  Namely, if 'request_pending' indicates that there's a work item
> queued up, we could change 'request type' to NONE in case we didn't want the
> work function to do anything.  Then, the work function would just unset
> 'request_pending' and quit if 'request type' is NONE.

You mean use request_pending to decide whether to call cancel_work, 
instead of looking at request_type?  That's right.

As for whether or not we should actually call cancel_work...  Which is 
more expensive: Calling cancel_work when no work is pending, or letting 
the work item run when it doesn't have anything to do?  Probably the 
latter.

> I generally like the idea of changing 'request type' on the fly once we've
> noticed that the currently pending request should be replaced by another one.

Me too.

> That would require us to introduce a big idle-suspend-resume function
> choosing the callback to run based on 'request type', which would be quite
> complicated.

It doesn't have to be very big or complicated:

	spin_lock_irq(&dev->power.lock);
	switch (dev->power.request_type) {
	case RPM_REQ_SUSPEND:
		__pm_runtime_suspend(dev, false);
		break;
	case RPM_REQ_RESUME:
		__pm_runtime_resume(dev, false);
		break;
	case RPM_REQ_IDLE:
		__pm_runtime_idle(dev, false);
		break;
	default:
	}
	spin_unlock_irq(&dev->power.lock);

It would be necessary to change the __pm_runtime_* routines, since they
would now have to be called with the lock held.

>  But that function could also be used for the 'synchronous'
> operations, so perhaps it's worth trying?
> 
> Such a function can take two arguments, dev and request, where the second
> one determines the callback to run.  It can take the same values as 'request
> type', where NONE means "you've been called from the workqueue, use 'request
> type' from dev to check what to do", but your ASYNC_* names are not really
> suitable here. :-)

I don't see any advantage in that approach.  The pm_runtime_* functions
already know what they want to do.  Why encode it in a request argument
only to decode it again?


> > > That's fine, we'd also need to wait for running callbacks to
> > > finish too.  And I'm still not convinced if we should preserve
> > > requests queued up before the system sleep.  Or keep the suspend
> > > timer running for that matter.
> > 
> > This all does into the "to-be-determined" category.  :-)
> 
> Well, I'd like to choose something to start with.

Pending suspends and the suspend timer don't matter much; we can cancel
them because they ought to get resubmitted after the system wakes up.  
Pending resumes are more difficult; depending on how they are treated
they could morph into immediate wakeup requests.

Perhaps even more tricky is how to handle things like the ACPI suspend
calls when the device is already runtime-suspended.  I don't know what 
we should do about that.


> > > pm_schedule_suspend()
> > >   * return -EINVAL if 'disabled' is set or 'runtime_error' is set
> > >   * return -EAGAIN if 'usage_count' > 0 or 'child_count' > 0
> > >   * return -EALREADY if 'runtime status' is RPM_SUSPENDED
> > >   * return -EINPROGRESS if 'runtime status' is RPM_SUSPENDING
> > 
> > The last two aren't right.  If the status is RPM_SUSPENDED or
> > RPM_SUSPENDING, cancel any pending work and set the type to
> > RPM_REQ_NONE before returning.  In other words, cancel a possible 
> > pending resume request.
> 
> Wy do you think the possible pending resume request should be canceled?

It's part of the "most recent request wins" approach.

> I don't really agree here.  Resume request really means there's data to
> process, so we shouldn't cancel pending resume requests IMO.
> 
> The driver should be given a chance to process data in ->runtime_resume()
> even if it doesn't use the usage counter.  Otherwise, the usage counter would
> always have to be used along with resume requests, so having
> pm_request_resume() that doesn't increment the usage counter would really be
> pointless.

All right, I'll go along with this.  So instead of "most recent request 
wins", we have something like this:

	Resume requests (queued or in progress) override suspend and 
	idle requests (sync or async).

	Suspend requests (queued or in progress, but not unexpired)
	override idle requests (sync or async).

But this statement might not be precise enough, and I'm too tired to
think through all the ramifications right now.


> > > pm_request_suspend()
> > >   * return if 'disabled' is set or 'runtime_error' is set
> > >   * return if 'usage_count' > 0 or 'child_count' > 0
> > >   * return if 'runtime status' is RPM_SUSPENDING or RPM_SUSPENDED
> > 
> > First cancel a possible pending resume request.
> 
> I disagree.

This is the same as the above, right?

> > If the status is RPM_RESUMING or RPM_ACTIVE, cancel a possible pending 
> > timer (and set time_expiration to 0).
> 
> We're the timer function, so either the timer is not pending, or we've been
> executed too early.

Oh, okay.  I thought perhaps you might have wanted to export
pm_request_suspend.  But this is really supposed to be just the timer 
handler.


> > > pm_request_resume()
> > >   * return -EINVAL if 'disabled' is set or 'runtime_error' is set
> > >   * return -EINPROGRESS if 'runtime status' is RPM_RESUMING
> > 
> > Or RPM_ACTIVE.
> 
> Maybe return 1 in that case?

Yes.

> > >   * return -EALREADY if 'request type' is RPM_REQ_RESUME
> > 
> > For these last two, first cancel a possible pending suspend request
> > and a possible timer.
> 
> Possible timer only, I think.  if 'request type' is RESUME, there can't be
> suspend request pending.

But there can be if the status is RPM_ACTIVE.  So okay, if the status 
isn't RPM_RESUMING or RPM_ACTIVE and request_type is RPM_REQ_RESUME, 
then return 0.


> > Similar outlines apply for pm_runtime_suspend, pm_runtime_resume, and
> > pm_runtime_idle.  There is an extra requirement: When a suspend or
> > resume is over, if 'request type' is set then schedule the work item.  
> > Doing things this way allows the workqueue thread to avoid waiting
> > around for the suspend or resume to finish.
> 
> I agree except that I would like suspends to just fail when the status is
> RPM_RESUMING.  The reason is that a sloppily written driver could enter a
> busy-loop of suspending-resuming the device, without the possibility to process
> data, if there's full symmetry between suspend and resume.  So, I'd like to
> break that symmetry and make resume operations privileged with respect to
> suspend and idle notifications.  

This follows from the new precedence rule.

Alan Stern


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [RFC] Run-time PM framework (was: Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6))
  2009-07-03 23:57                                                                         ` Rafael J. Wysocki
@ 2009-07-04  3:12                                                                           ` Alan Stern
  2009-07-04  3:12                                                                           ` Alan Stern
  1 sibling, 0 replies; 102+ messages in thread
From: Alan Stern @ 2009-07-04  3:12 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Sat, 4 Jul 2009, Rafael J. Wysocki wrote:

> > > I don't really like the async_action idea, as you might have noticed.
> > 
> > Do you mean that you don't like the field, or that you don't like its name?
> 
> The name, actually.  That's because I'd like to use the values for something
> that's not 'async' in substance (more on that later).

Okay.  I don't care about the name.

> It occured to me in the meantime that if we added a 'request_pending' (or
> 'work_pending' or whatever similar) flag to the above, we could avoid using
> cancel_work().  Namely, if 'request_pending' indicates that there's a work item
> queued up, we could change 'request type' to NONE in case we didn't want the
> work function to do anything.  Then, the work function would just unset
> 'request_pending' and quit if 'request type' is NONE.

You mean use request_pending to decide whether to call cancel_work, 
instead of looking at request_type?  That's right.

As for whether or not we should actually call cancel_work...  Which is 
more expensive: Calling cancel_work when no work is pending, or letting 
the work item run when it doesn't have anything to do?  Probably the 
latter.

> I generally like the idea of changing 'request type' on the fly once we've
> noticed that the currently pending request should be replaced by another one.

Me too.

> That would require us to introduce a big idle-suspend-resume function
> choosing the callback to run based on 'request type', which would be quite
> complicated.

It doesn't have to be very big or complicated:

	spin_lock_irq(&dev->power.lock);
	switch (dev->power.request_type) {
	case RPM_REQ_SUSPEND:
		__pm_runtime_suspend(dev, false);
		break;
	case RPM_REQ_RESUME:
		__pm_runtime_resume(dev, false);
		break;
	case RPM_REQ_IDLE:
		__pm_runtime_idle(dev, false);
		break;
	default:
	}
	spin_unlock_irq(&dev->power.lock);

It would be necessary to change the __pm_runtime_* routines, since they
would now have to be called with the lock held.

>  But that function could also be used for the 'synchronous'
> operations, so perhaps it's worth trying?
> 
> Such a function can take two arguments, dev and request, where the second
> one determines the callback to run.  It can take the same values as 'request
> type', where NONE means "you've been called from the workqueue, use 'request
> type' from dev to check what to do", but your ASYNC_* names are not really
> suitable here. :-)

I don't see any advantage in that approach.  The pm_runtime_* functions
already know what they want to do.  Why encode it in a request argument
only to decode it again?


> > > That's fine, we'd also need to wait for running callbacks to
> > > finish too.  And I'm still not convinced if we should preserve
> > > requests queued up before the system sleep.  Or keep the suspend
> > > timer running for that matter.
> > 
> > This all does into the "to-be-determined" category.  :-)
> 
> Well, I'd like to choose something to start with.

Pending suspends and the suspend timer don't matter much; we can cancel
them because they ought to get resubmitted after the system wakes up.  
Pending resumes are more difficult; depending on how they are treated
they could morph into immediate wakeup requests.

Perhaps even more tricky is how to handle things like the ACPI suspend
calls when the device is already runtime-suspended.  I don't know what 
we should do about that.


> > > pm_schedule_suspend()
> > >   * return -EINVAL if 'disabled' is set or 'runtime_error' is set
> > >   * return -EAGAIN if 'usage_count' > 0 or 'child_count' > 0
> > >   * return -EALREADY if 'runtime status' is RPM_SUSPENDED
> > >   * return -EINPROGRESS if 'runtime status' is RPM_SUSPENDING
> > 
> > The last two aren't right.  If the status is RPM_SUSPENDED or
> > RPM_SUSPENDING, cancel any pending work and set the type to
> > RPM_REQ_NONE before returning.  In other words, cancel a possible 
> > pending resume request.
> 
> Wy do you think the possible pending resume request should be canceled?

It's part of the "most recent request wins" approach.

> I don't really agree here.  Resume request really means there's data to
> process, so we shouldn't cancel pending resume requests IMO.
> 
> The driver should be given a chance to process data in ->runtime_resume()
> even if it doesn't use the usage counter.  Otherwise, the usage counter would
> always have to be used along with resume requests, so having
> pm_request_resume() that doesn't increment the usage counter would really be
> pointless.

All right, I'll go along with this.  So instead of "most recent request 
wins", we have something like this:

	Resume requests (queued or in progress) override suspend and 
	idle requests (sync or async).

	Suspend requests (queued or in progress, but not unexpired)
	override idle requests (sync or async).

But this statement might not be precise enough, and I'm too tired to
think through all the ramifications right now.


> > > pm_request_suspend()
> > >   * return if 'disabled' is set or 'runtime_error' is set
> > >   * return if 'usage_count' > 0 or 'child_count' > 0
> > >   * return if 'runtime status' is RPM_SUSPENDING or RPM_SUSPENDED
> > 
> > First cancel a possible pending resume request.
> 
> I disagree.

This is the same as the above, right?

> > If the status is RPM_RESUMING or RPM_ACTIVE, cancel a possible pending 
> > timer (and set time_expiration to 0).
> 
> We're the timer function, so either the timer is not pending, or we've been
> executed too early.

Oh, okay.  I thought perhaps you might have wanted to export
pm_request_suspend.  But this is really supposed to be just the timer 
handler.


> > > pm_request_resume()
> > >   * return -EINVAL if 'disabled' is set or 'runtime_error' is set
> > >   * return -EINPROGRESS if 'runtime status' is RPM_RESUMING
> > 
> > Or RPM_ACTIVE.
> 
> Maybe return 1 in that case?

Yes.

> > >   * return -EALREADY if 'request type' is RPM_REQ_RESUME
> > 
> > For these last two, first cancel a possible pending suspend request
> > and a possible timer.
> 
> Possible timer only, I think.  if 'request type' is RESUME, there can't be
> suspend request pending.

But there can be if the status is RPM_ACTIVE.  So okay, if the status 
isn't RPM_RESUMING or RPM_ACTIVE and request_type is RPM_REQ_RESUME, 
then return 0.


> > Similar outlines apply for pm_runtime_suspend, pm_runtime_resume, and
> > pm_runtime_idle.  There is an extra requirement: When a suspend or
> > resume is over, if 'request type' is set then schedule the work item.  
> > Doing things this way allows the workqueue thread to avoid waiting
> > around for the suspend or resume to finish.
> 
> I agree except that I would like suspends to just fail when the status is
> RPM_RESUMING.  The reason is that a sloppily written driver could enter a
> busy-loop of suspending-resuming the device, without the possibility to process
> data, if there's full symmetry between suspend and resume.  So, I'd like to
> break that symmetry and make resume operations privileged with respect to
> suspend and idle notifications.  

This follows from the new precedence rule.

Alan Stern

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [RFC] Run-time PM framework (was: Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6))
  2009-07-04  3:12                                                                           ` Alan Stern
  2009-07-04 21:27                                                                             ` Rafael J. Wysocki
@ 2009-07-04 21:27                                                                             ` Rafael J. Wysocki
  2009-07-05 14:50                                                                               ` Alan Stern
  2009-07-05 14:50                                                                               ` Alan Stern
  1 sibling, 2 replies; 102+ messages in thread
From: Rafael J. Wysocki @ 2009-07-04 21:27 UTC (permalink / raw)
  To: Alan Stern
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Saturday 04 July 2009, Alan Stern wrote:
> On Sat, 4 Jul 2009, Rafael J. Wysocki wrote:
> 
> > > > I don't really like the async_action idea, as you might have noticed.
> > > 
> > > Do you mean that you don't like the field, or that you don't like its name?
> > 
> > The name, actually.  That's because I'd like to use the values for something
> > that's not 'async' in substance (more on that later).
> 
> Okay.  I don't care about the name.
> 
> > It occured to me in the meantime that if we added a 'request_pending' (or
> > 'work_pending' or whatever similar) flag to the above, we could avoid using
> > cancel_work().  Namely, if 'request_pending' indicates that there's a work item
> > queued up, we could change 'request type' to NONE in case we didn't want the
> > work function to do anything.  Then, the work function would just unset
> > 'request_pending' and quit if 'request type' is NONE.
> 
> You mean use request_pending to decide whether to call cancel_work, 
> instead of looking at request_type?  That's right.
> 
> As for whether or not we should actually call cancel_work...  Which is 
> more expensive: Calling cancel_work when no work is pending, or letting 
> the work item run when it doesn't have anything to do?  Probably the 
> latter.

Agreed, but that doesn't affect functionality.  We can get the desired
functionality without the cancel_work() patch and then optimize things along
with that patch.  This way it'll be easier to demontrate the benefit of it.

> > I generally like the idea of changing 'request type' on the fly once we've
> > noticed that the currently pending request should be replaced by another one.
> 
> Me too.
> 
> > That would require us to introduce a big idle-suspend-resume function
> > choosing the callback to run based on 'request type', which would be quite
> > complicated.
> 
> It doesn't have to be very big or complicated:
> 
> 	spin_lock_irq(&dev->power.lock);

Ah, that is the detail I overlooked, we don't need to use
spin_lock_irqsave(), because these function will always be called with
interrupts enabled.

> 	switch (dev->power.request_type) {
> 	case RPM_REQ_SUSPEND:
> 		__pm_runtime_suspend(dev, false);
> 		break;
> 	case RPM_REQ_RESUME:
> 		__pm_runtime_resume(dev, false);
> 		break;
> 	case RPM_REQ_IDLE:
> 		__pm_runtime_idle(dev, false);
> 		break;
> 	default:
> 	}
> 	spin_unlock_irq(&dev->power.lock);
> 
> It would be necessary to change the __pm_runtime_* routines, since they
> would now have to be called with the lock held.
> 
> >  But that function could also be used for the 'synchronous'
> > operations, so perhaps it's worth trying?
> > 
> > Such a function can take two arguments, dev and request, where the second
> > one determines the callback to run.  It can take the same values as 'request
> > type', where NONE means "you've been called from the workqueue, use 'request
> > type' from dev to check what to do", but your ASYNC_* names are not really
> > suitable here. :-)
> 
> I don't see any advantage in that approach.  The pm_runtime_* functions
> already know what they want to do.  Why encode it in a request argument
> only to decode it again?

Well, scratch that anyway, I thought it would be necessary because of the
'irqsave' locking.

> > > > That's fine, we'd also need to wait for running callbacks to
> > > > finish too.  And I'm still not convinced if we should preserve
> > > > requests queued up before the system sleep.  Or keep the suspend
> > > > timer running for that matter.
> > > 
> > > This all does into the "to-be-determined" category.  :-)
> > 
> > Well, I'd like to choose something to start with.
> 
> Pending suspends and the suspend timer don't matter much; we can cancel
> them because they ought to get resubmitted after the system wakes up.  
> Pending resumes are more difficult; depending on how they are treated
> they could morph into immediate wakeup requests.
> 
> Perhaps even more tricky is how to handle things like the ACPI suspend
> calls when the device is already runtime-suspended.  I don't know what 
> we should do about that.

That almost entirely depends on the bus type.  For PCI and probably PNP as well
there's a notion of ACPI low power states and there are AML methods to put the
devices into these states.  Unfortunately, the ACPI low power state to put the
device into depends on the target sleep state of the system, so these devices
will probably have to be put into D0 before system suspend anyway.

I think that the bus type can handle this as long as it knows the state the
device is in before system suspend.  So, the only thing the core should do is
to block the execution of run-time PM framework functions during system
sleep and resume.  The state it leaves the device in shouldn't matter.

So, I think we can simply freeze the workqueue, set the 'disabled' bit for each
device and wait for all run-time PM operations on it in progress to complete.

In the 'disabled' state the bus type or driver can modify the run-time PM
status to whatever they like anyway.  Perhaps we can provide a helper to
change 'request type' to RPM_REQ_NONE.

> > > > pm_schedule_suspend()
> > > >   * return -EINVAL if 'disabled' is set or 'runtime_error' is set
> > > >   * return -EAGAIN if 'usage_count' > 0 or 'child_count' > 0
> > > >   * return -EALREADY if 'runtime status' is RPM_SUSPENDED
> > > >   * return -EINPROGRESS if 'runtime status' is RPM_SUSPENDING
> > > 
> > > The last two aren't right.  If the status is RPM_SUSPENDED or
> > > RPM_SUSPENDING, cancel any pending work and set the type to
> > > RPM_REQ_NONE before returning.  In other words, cancel a possible 
> > > pending resume request.
> > 
> > Wy do you think the possible pending resume request should be canceled?
> 
> It's part of the "most recent request wins" approach.
>
> > I don't really agree here.  Resume request really means there's data to
> > process, so we shouldn't cancel pending resume requests IMO.
> > 
> > The driver should be given a chance to process data in ->runtime_resume()
> > even if it doesn't use the usage counter.  Otherwise, the usage counter would
> > always have to be used along with resume requests, so having
> > pm_request_resume() that doesn't increment the usage counter would really be
> > pointless.
> 
> All right, I'll go along with this.  So instead of "most recent request 
> wins", we have something like this:
> 
> 	Resume requests (queued or in progress) override suspend and 
> 	idle requests (sync or async).
> 
> 	Suspend requests (queued or in progress, but not unexpired)
> 	override idle requests (sync or async).
> 
> But this statement might not be precise enough, and I'm too tired to
> think through all the ramifications right now.

Fair enough. :-)

> > > > pm_request_suspend()
> > > >   * return if 'disabled' is set or 'runtime_error' is set
> > > >   * return if 'usage_count' > 0 or 'child_count' > 0
> > > >   * return if 'runtime status' is RPM_SUSPENDING or RPM_SUSPENDED
> > > 
> > > First cancel a possible pending resume request.
> > 
> > I disagree.
> 
> This is the same as the above, right?

Yes.

> > > If the status is RPM_RESUMING or RPM_ACTIVE, cancel a possible pending 
> > > timer (and set time_expiration to 0).
> > 
> > We're the timer function, so either the timer is not pending, or we've been
> > executed too early.
> 
> Oh, okay.  I thought perhaps you might have wanted to export
> pm_request_suspend.  But this is really supposed to be just the timer 
> handler.

Yes, it is.

> > > > pm_request_resume()
> > > >   * return -EINVAL if 'disabled' is set or 'runtime_error' is set
> > > >   * return -EINPROGRESS if 'runtime status' is RPM_RESUMING
> > > 
> > > Or RPM_ACTIVE.
> > 
> > Maybe return 1 in that case?
> 
> Yes.
> 
> > > >   * return -EALREADY if 'request type' is RPM_REQ_RESUME
> > > 
> > > For these last two, first cancel a possible pending suspend request
> > > and a possible timer.
> > 
> > Possible timer only, I think.  if 'request type' is RESUME, there can't be
> > suspend request pending.
> 
> But there can be if the status is RPM_ACTIVE.  So okay, if the status 
> isn't RPM_RESUMING or RPM_ACTIVE and request_type is RPM_REQ_RESUME, 
> then return 0.
> 
> 
> > > Similar outlines apply for pm_runtime_suspend, pm_runtime_resume, and
> > > pm_runtime_idle.  There is an extra requirement: When a suspend or
> > > resume is over, if 'request type' is set then schedule the work item.  
> > > Doing things this way allows the workqueue thread to avoid waiting
> > > around for the suspend or resume to finish.
> > 
> > I agree except that I would like suspends to just fail when the status is
> > RPM_RESUMING.  The reason is that a sloppily written driver could enter a
> > busy-loop of suspending-resuming the device, without the possibility to process
> > data, if there's full symmetry between suspend and resume.  So, I'd like to
> > break that symmetry and make resume operations privileged with respect to
> > suspend and idle notifications.  
> 
> This follows from the new precedence rule.

Yes.

So, I guess we have the majority of things clarified and perhaps its time to
write a patch for further discussion. :-)

Best,
Rafael

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [RFC] Run-time PM framework (was: Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6))
  2009-07-04  3:12                                                                           ` Alan Stern
@ 2009-07-04 21:27                                                                             ` Rafael J. Wysocki
  2009-07-04 21:27                                                                             ` Rafael J. Wysocki
  1 sibling, 0 replies; 102+ messages in thread
From: Rafael J. Wysocki @ 2009-07-04 21:27 UTC (permalink / raw)
  To: Alan Stern
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Saturday 04 July 2009, Alan Stern wrote:
> On Sat, 4 Jul 2009, Rafael J. Wysocki wrote:
> 
> > > > I don't really like the async_action idea, as you might have noticed.
> > > 
> > > Do you mean that you don't like the field, or that you don't like its name?
> > 
> > The name, actually.  That's because I'd like to use the values for something
> > that's not 'async' in substance (more on that later).
> 
> Okay.  I don't care about the name.
> 
> > It occured to me in the meantime that if we added a 'request_pending' (or
> > 'work_pending' or whatever similar) flag to the above, we could avoid using
> > cancel_work().  Namely, if 'request_pending' indicates that there's a work item
> > queued up, we could change 'request type' to NONE in case we didn't want the
> > work function to do anything.  Then, the work function would just unset
> > 'request_pending' and quit if 'request type' is NONE.
> 
> You mean use request_pending to decide whether to call cancel_work, 
> instead of looking at request_type?  That's right.
> 
> As for whether or not we should actually call cancel_work...  Which is 
> more expensive: Calling cancel_work when no work is pending, or letting 
> the work item run when it doesn't have anything to do?  Probably the 
> latter.

Agreed, but that doesn't affect functionality.  We can get the desired
functionality without the cancel_work() patch and then optimize things along
with that patch.  This way it'll be easier to demontrate the benefit of it.

> > I generally like the idea of changing 'request type' on the fly once we've
> > noticed that the currently pending request should be replaced by another one.
> 
> Me too.
> 
> > That would require us to introduce a big idle-suspend-resume function
> > choosing the callback to run based on 'request type', which would be quite
> > complicated.
> 
> It doesn't have to be very big or complicated:
> 
> 	spin_lock_irq(&dev->power.lock);

Ah, that is the detail I overlooked, we don't need to use
spin_lock_irqsave(), because these function will always be called with
interrupts enabled.

> 	switch (dev->power.request_type) {
> 	case RPM_REQ_SUSPEND:
> 		__pm_runtime_suspend(dev, false);
> 		break;
> 	case RPM_REQ_RESUME:
> 		__pm_runtime_resume(dev, false);
> 		break;
> 	case RPM_REQ_IDLE:
> 		__pm_runtime_idle(dev, false);
> 		break;
> 	default:
> 	}
> 	spin_unlock_irq(&dev->power.lock);
> 
> It would be necessary to change the __pm_runtime_* routines, since they
> would now have to be called with the lock held.
> 
> >  But that function could also be used for the 'synchronous'
> > operations, so perhaps it's worth trying?
> > 
> > Such a function can take two arguments, dev and request, where the second
> > one determines the callback to run.  It can take the same values as 'request
> > type', where NONE means "you've been called from the workqueue, use 'request
> > type' from dev to check what to do", but your ASYNC_* names are not really
> > suitable here. :-)
> 
> I don't see any advantage in that approach.  The pm_runtime_* functions
> already know what they want to do.  Why encode it in a request argument
> only to decode it again?

Well, scratch that anyway, I thought it would be necessary because of the
'irqsave' locking.

> > > > That's fine, we'd also need to wait for running callbacks to
> > > > finish too.  And I'm still not convinced if we should preserve
> > > > requests queued up before the system sleep.  Or keep the suspend
> > > > timer running for that matter.
> > > 
> > > This all does into the "to-be-determined" category.  :-)
> > 
> > Well, I'd like to choose something to start with.
> 
> Pending suspends and the suspend timer don't matter much; we can cancel
> them because they ought to get resubmitted after the system wakes up.  
> Pending resumes are more difficult; depending on how they are treated
> they could morph into immediate wakeup requests.
> 
> Perhaps even more tricky is how to handle things like the ACPI suspend
> calls when the device is already runtime-suspended.  I don't know what 
> we should do about that.

That almost entirely depends on the bus type.  For PCI and probably PNP as well
there's a notion of ACPI low power states and there are AML methods to put the
devices into these states.  Unfortunately, the ACPI low power state to put the
device into depends on the target sleep state of the system, so these devices
will probably have to be put into D0 before system suspend anyway.

I think that the bus type can handle this as long as it knows the state the
device is in before system suspend.  So, the only thing the core should do is
to block the execution of run-time PM framework functions during system
sleep and resume.  The state it leaves the device in shouldn't matter.

So, I think we can simply freeze the workqueue, set the 'disabled' bit for each
device and wait for all run-time PM operations on it in progress to complete.

In the 'disabled' state the bus type or driver can modify the run-time PM
status to whatever they like anyway.  Perhaps we can provide a helper to
change 'request type' to RPM_REQ_NONE.

> > > > pm_schedule_suspend()
> > > >   * return -EINVAL if 'disabled' is set or 'runtime_error' is set
> > > >   * return -EAGAIN if 'usage_count' > 0 or 'child_count' > 0
> > > >   * return -EALREADY if 'runtime status' is RPM_SUSPENDED
> > > >   * return -EINPROGRESS if 'runtime status' is RPM_SUSPENDING
> > > 
> > > The last two aren't right.  If the status is RPM_SUSPENDED or
> > > RPM_SUSPENDING, cancel any pending work and set the type to
> > > RPM_REQ_NONE before returning.  In other words, cancel a possible 
> > > pending resume request.
> > 
> > Wy do you think the possible pending resume request should be canceled?
> 
> It's part of the "most recent request wins" approach.
>
> > I don't really agree here.  Resume request really means there's data to
> > process, so we shouldn't cancel pending resume requests IMO.
> > 
> > The driver should be given a chance to process data in ->runtime_resume()
> > even if it doesn't use the usage counter.  Otherwise, the usage counter would
> > always have to be used along with resume requests, so having
> > pm_request_resume() that doesn't increment the usage counter would really be
> > pointless.
> 
> All right, I'll go along with this.  So instead of "most recent request 
> wins", we have something like this:
> 
> 	Resume requests (queued or in progress) override suspend and 
> 	idle requests (sync or async).
> 
> 	Suspend requests (queued or in progress, but not unexpired)
> 	override idle requests (sync or async).
> 
> But this statement might not be precise enough, and I'm too tired to
> think through all the ramifications right now.

Fair enough. :-)

> > > > pm_request_suspend()
> > > >   * return if 'disabled' is set or 'runtime_error' is set
> > > >   * return if 'usage_count' > 0 or 'child_count' > 0
> > > >   * return if 'runtime status' is RPM_SUSPENDING or RPM_SUSPENDED
> > > 
> > > First cancel a possible pending resume request.
> > 
> > I disagree.
> 
> This is the same as the above, right?

Yes.

> > > If the status is RPM_RESUMING or RPM_ACTIVE, cancel a possible pending 
> > > timer (and set time_expiration to 0).
> > 
> > We're the timer function, so either the timer is not pending, or we've been
> > executed too early.
> 
> Oh, okay.  I thought perhaps you might have wanted to export
> pm_request_suspend.  But this is really supposed to be just the timer 
> handler.

Yes, it is.

> > > > pm_request_resume()
> > > >   * return -EINVAL if 'disabled' is set or 'runtime_error' is set
> > > >   * return -EINPROGRESS if 'runtime status' is RPM_RESUMING
> > > 
> > > Or RPM_ACTIVE.
> > 
> > Maybe return 1 in that case?
> 
> Yes.
> 
> > > >   * return -EALREADY if 'request type' is RPM_REQ_RESUME
> > > 
> > > For these last two, first cancel a possible pending suspend request
> > > and a possible timer.
> > 
> > Possible timer only, I think.  if 'request type' is RESUME, there can't be
> > suspend request pending.
> 
> But there can be if the status is RPM_ACTIVE.  So okay, if the status 
> isn't RPM_RESUMING or RPM_ACTIVE and request_type is RPM_REQ_RESUME, 
> then return 0.
> 
> 
> > > Similar outlines apply for pm_runtime_suspend, pm_runtime_resume, and
> > > pm_runtime_idle.  There is an extra requirement: When a suspend or
> > > resume is over, if 'request type' is set then schedule the work item.  
> > > Doing things this way allows the workqueue thread to avoid waiting
> > > around for the suspend or resume to finish.
> > 
> > I agree except that I would like suspends to just fail when the status is
> > RPM_RESUMING.  The reason is that a sloppily written driver could enter a
> > busy-loop of suspending-resuming the device, without the possibility to process
> > data, if there's full symmetry between suspend and resume.  So, I'd like to
> > break that symmetry and make resume operations privileged with respect to
> > suspend and idle notifications.  
> 
> This follows from the new precedence rule.

Yes.

So, I guess we have the majority of things clarified and perhaps its time to
write a patch for further discussion. :-)

Best,
Rafael

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [RFC] Run-time PM framework (was: Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6))
  2009-07-04 21:27                                                                             ` Rafael J. Wysocki
  2009-07-05 14:50                                                                               ` Alan Stern
@ 2009-07-05 14:50                                                                               ` Alan Stern
  2009-07-05 21:47                                                                                 ` Rafael J. Wysocki
  2009-07-05 21:47                                                                                 ` Rafael J. Wysocki
  1 sibling, 2 replies; 102+ messages in thread
From: Alan Stern @ 2009-07-05 14:50 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Sat, 4 Jul 2009, Rafael J. Wysocki wrote:

> > As for whether or not we should actually call cancel_work...  Which is 
> > more expensive: Calling cancel_work when no work is pending, or letting 
> > the work item run when it doesn't have anything to do?  Probably the 
> > latter.
> 
> Agreed, but that doesn't affect functionality.  We can get the desired
> functionality without the cancel_work() patch and then optimize things along
> with that patch.  This way it'll be easier to demontrate the benefit of it.

Good idea.

> That almost entirely depends on the bus type.  For PCI and probably PNP as well
> there's a notion of ACPI low power states and there are AML methods to put the
> devices into these states.  Unfortunately, the ACPI low power state to put the
> device into depends on the target sleep state of the system, so these devices
> will probably have to be put into D0 before system suspend anyway.
> 
> I think that the bus type can handle this as long as it knows the state the
> device is in before system suspend.  So, the only thing the core should do is
> to block the execution of run-time PM framework functions during system
> sleep and resume.  The state it leaves the device in shouldn't matter.
> 
> So, I think we can simply freeze the workqueue, set the 'disabled' bit for each
> device and wait for all run-time PM operations on it in progress to complete.
> 
> In the 'disabled' state the bus type or driver can modify the run-time PM
> status to whatever they like anyway.  Perhaps we can provide a helper to
> change 'request type' to RPM_REQ_NONE.

The only modification that really makes sense is like you said, going
back to full power in preparation for the platform suspend operation.  
Therefore perhaps we should allow pm_runtime_resume to work even when
rpm_disabled is set.  And if we're going to cancel pending suspend and
idle requests, then rpm_request would normally be RPM_REQ_NONE anyway.

Which leaves only the question of what to do when a resume request is 
pending...

> So, I guess we have the majority of things clarified and perhaps its time to
> write a patch for further discussion. :-)

Go ahead!

Alan Stern


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [RFC] Run-time PM framework (was: Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6))
  2009-07-04 21:27                                                                             ` Rafael J. Wysocki
@ 2009-07-05 14:50                                                                               ` Alan Stern
  2009-07-05 14:50                                                                               ` Alan Stern
  1 sibling, 0 replies; 102+ messages in thread
From: Alan Stern @ 2009-07-05 14:50 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Sat, 4 Jul 2009, Rafael J. Wysocki wrote:

> > As for whether or not we should actually call cancel_work...  Which is 
> > more expensive: Calling cancel_work when no work is pending, or letting 
> > the work item run when it doesn't have anything to do?  Probably the 
> > latter.
> 
> Agreed, but that doesn't affect functionality.  We can get the desired
> functionality without the cancel_work() patch and then optimize things along
> with that patch.  This way it'll be easier to demontrate the benefit of it.

Good idea.

> That almost entirely depends on the bus type.  For PCI and probably PNP as well
> there's a notion of ACPI low power states and there are AML methods to put the
> devices into these states.  Unfortunately, the ACPI low power state to put the
> device into depends on the target sleep state of the system, so these devices
> will probably have to be put into D0 before system suspend anyway.
> 
> I think that the bus type can handle this as long as it knows the state the
> device is in before system suspend.  So, the only thing the core should do is
> to block the execution of run-time PM framework functions during system
> sleep and resume.  The state it leaves the device in shouldn't matter.
> 
> So, I think we can simply freeze the workqueue, set the 'disabled' bit for each
> device and wait for all run-time PM operations on it in progress to complete.
> 
> In the 'disabled' state the bus type or driver can modify the run-time PM
> status to whatever they like anyway.  Perhaps we can provide a helper to
> change 'request type' to RPM_REQ_NONE.

The only modification that really makes sense is like you said, going
back to full power in preparation for the platform suspend operation.  
Therefore perhaps we should allow pm_runtime_resume to work even when
rpm_disabled is set.  And if we're going to cancel pending suspend and
idle requests, then rpm_request would normally be RPM_REQ_NONE anyway.

Which leaves only the question of what to do when a resume request is 
pending...

> So, I guess we have the majority of things clarified and perhaps its time to
> write a patch for further discussion. :-)

Go ahead!

Alan Stern

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [RFC] Run-time PM framework (was: Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6))
  2009-07-05 14:50                                                                               ` Alan Stern
  2009-07-05 21:47                                                                                 ` Rafael J. Wysocki
@ 2009-07-05 21:47                                                                                 ` Rafael J. Wysocki
  1 sibling, 0 replies; 102+ messages in thread
From: Rafael J. Wysocki @ 2009-07-05 21:47 UTC (permalink / raw)
  To: Alan Stern
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Sunday 05 July 2009, Alan Stern wrote:
> On Sat, 4 Jul 2009, Rafael J. Wysocki wrote:
> 
> > > As for whether or not we should actually call cancel_work...  Which is 
> > > more expensive: Calling cancel_work when no work is pending, or letting 
> > > the work item run when it doesn't have anything to do?  Probably the 
> > > latter.
> > 
> > Agreed, but that doesn't affect functionality.  We can get the desired
> > functionality without the cancel_work() patch and then optimize things along
> > with that patch.  This way it'll be easier to demontrate the benefit of it.
> 
> Good idea.
> 
> > That almost entirely depends on the bus type.  For PCI and probably PNP as well
> > there's a notion of ACPI low power states and there are AML methods to put the
> > devices into these states.  Unfortunately, the ACPI low power state to put the
> > device into depends on the target sleep state of the system, so these devices
> > will probably have to be put into D0 before system suspend anyway.
> > 
> > I think that the bus type can handle this as long as it knows the state the
> > device is in before system suspend.  So, the only thing the core should do is
> > to block the execution of run-time PM framework functions during system
> > sleep and resume.  The state it leaves the device in shouldn't matter.
> > 
> > So, I think we can simply freeze the workqueue, set the 'disabled' bit for each
> > device and wait for all run-time PM operations on it in progress to complete.
> > 
> > In the 'disabled' state the bus type or driver can modify the run-time PM
> > status to whatever they like anyway.  Perhaps we can provide a helper to
> > change 'request type' to RPM_REQ_NONE.
> 
> The only modification that really makes sense is like you said, going
> back to full power in preparation for the platform suspend operation.  
> Therefore perhaps we should allow pm_runtime_resume to work even when
> rpm_disabled is set.  And if we're going to cancel pending suspend and
> idle requests, then rpm_request would normally be RPM_REQ_NONE anyway.

After we've disabled run-time PM with pm_runtime_disable(), the bus type
and driver can do whatever they like with the device, we don't care.  However,
they need to make sure that the state of the device will match its run-time PM
status when its run-time PM is enabled again.

> Which leaves only the question of what to do when a resume request is 
> pending...

I think the pm_runtime_disable() can carry out a synchronous wake-up if it
sees a pending resume request.  That would make sense in general, IMO, becuase
having a resume request pending usually means there's I/O to process and it's
better to allow the device to process that I/O before disabling the run-time
PM of it.

To put it differently, if there's a resume request pending, the run-time PM of
the device should be disabled while in the 'active' state rather than while in
the 'suspended' state.

Now, if we do that, the problem of run-time resume requests pending while
entering a system sleep state can be solved.  Namely, we can make
pm_runtime_disable() return a result that will be -EBUSY if a pending resume
request is found by it and 0 otherwise.  Then, that result can be used by
dpm_prepare() to decide whether to continue suspend or to terminate it if
the device is a wake-up one.

> > So, I guess we have the majority of things clarified and perhaps its time to
> > write a patch for further discussion. :-)
> 
> Go ahead!

In fact I've already done that, but I need to have a final look at it to check
if there are no obvious mistakes in there.

Best,
Rafael

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [RFC] Run-time PM framework (was: Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6))
  2009-07-05 14:50                                                                               ` Alan Stern
@ 2009-07-05 21:47                                                                                 ` Rafael J. Wysocki
  2009-07-05 21:47                                                                                 ` Rafael J. Wysocki
  1 sibling, 0 replies; 102+ messages in thread
From: Rafael J. Wysocki @ 2009-07-05 21:47 UTC (permalink / raw)
  To: Alan Stern
  Cc: Greg KH, LKML, ACPI Devel Maling List, Linux-pm mailing list,
	Ingo Molnar, Arjan van de Ven

On Sunday 05 July 2009, Alan Stern wrote:
> On Sat, 4 Jul 2009, Rafael J. Wysocki wrote:
> 
> > > As for whether or not we should actually call cancel_work...  Which is 
> > > more expensive: Calling cancel_work when no work is pending, or letting 
> > > the work item run when it doesn't have anything to do?  Probably the 
> > > latter.
> > 
> > Agreed, but that doesn't affect functionality.  We can get the desired
> > functionality without the cancel_work() patch and then optimize things along
> > with that patch.  This way it'll be easier to demontrate the benefit of it.
> 
> Good idea.
> 
> > That almost entirely depends on the bus type.  For PCI and probably PNP as well
> > there's a notion of ACPI low power states and there are AML methods to put the
> > devices into these states.  Unfortunately, the ACPI low power state to put the
> > device into depends on the target sleep state of the system, so these devices
> > will probably have to be put into D0 before system suspend anyway.
> > 
> > I think that the bus type can handle this as long as it knows the state the
> > device is in before system suspend.  So, the only thing the core should do is
> > to block the execution of run-time PM framework functions during system
> > sleep and resume.  The state it leaves the device in shouldn't matter.
> > 
> > So, I think we can simply freeze the workqueue, set the 'disabled' bit for each
> > device and wait for all run-time PM operations on it in progress to complete.
> > 
> > In the 'disabled' state the bus type or driver can modify the run-time PM
> > status to whatever they like anyway.  Perhaps we can provide a helper to
> > change 'request type' to RPM_REQ_NONE.
> 
> The only modification that really makes sense is like you said, going
> back to full power in preparation for the platform suspend operation.  
> Therefore perhaps we should allow pm_runtime_resume to work even when
> rpm_disabled is set.  And if we're going to cancel pending suspend and
> idle requests, then rpm_request would normally be RPM_REQ_NONE anyway.

After we've disabled run-time PM with pm_runtime_disable(), the bus type
and driver can do whatever they like with the device, we don't care.  However,
they need to make sure that the state of the device will match its run-time PM
status when its run-time PM is enabled again.

> Which leaves only the question of what to do when a resume request is 
> pending...

I think the pm_runtime_disable() can carry out a synchronous wake-up if it
sees a pending resume request.  That would make sense in general, IMO, becuase
having a resume request pending usually means there's I/O to process and it's
better to allow the device to process that I/O before disabling the run-time
PM of it.

To put it differently, if there's a resume request pending, the run-time PM of
the device should be disabled while in the 'active' state rather than while in
the 'suspended' state.

Now, if we do that, the problem of run-time resume requests pending while
entering a system sleep state can be solved.  Namely, we can make
pm_runtime_disable() return a result that will be -EBUSY if a pending resume
request is found by it and 0 otherwise.  Then, that result can be used by
dpm_prepare() to decide whether to continue suspend or to terminate it if
the device is a wake-up one.

> > So, I guess we have the majority of things clarified and perhaps its time to
> > write a patch for further discussion. :-)
> 
> Go ahead!

In fact I've already done that, but I need to have a final look at it to check
if there are no obvious mistakes in there.

Best,
Rafael

^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH] PM: Introduce core framework for run-time PM of I/O devices (rev. 3)
@ 2009-06-22 23:21 Rafael J. Wysocki
  0 siblings, 0 replies; 102+ messages in thread
From: Rafael J. Wysocki @ 2009-06-22 23:21 UTC (permalink / raw)
  To: Alan Stern, linux-pm
  Cc: Greg KH, LKML, ACPI Devel Maling List, Ingo Molnar, Arjan van de Ven

Hi,

Below is a new revision of the patch introducing the run-time PM framework.

The most visible changes from the last version:

* I realized that if child_count is atomic, we can drop the parent locking from
  all of the functions, so I did that.

* Introduced pm_runtime_put() that decrements the resume counter and queues
  up an idle notification if the counter went down to 0 (and wasn't 0 previously).
  Using asynchronous notification makes it possible to call pm_runtime_put()
  from interrupt context, if necessary.

* Changed the meaning of the RPM_WAKE bit slightly (it is now also used for
  disabling run-time PM for a device along with the resume counter).

Please let me know if I've overlooked anything. :-)

Best,
Rafael


---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: PM: Introduce core framework for run-time PM of I/O devices (rev. 3)

Introduce a core framework for run-time power management of I/O
devices.  Add device run-time PM fields to 'struct dev_pm_info'
and device run-time PM callbacks to 'struct dev_pm_ops'.  Introduce
a run-time PM workqueue and define some device run-time PM helper
functions at the core level.  Document all these things.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 Documentation/power/runtime_pm.txt |  434 ++++++++++++++++++++++
 drivers/base/dd.c                  |    9 
 drivers/base/power/Makefile        |    1 
 drivers/base/power/main.c          |   16 
 drivers/base/power/power.h         |   11 
 drivers/base/power/runtime.c       |  711 +++++++++++++++++++++++++++++++++++++
 include/linux/pm.h                 |   96 ++++
 include/linux/pm_runtime.h         |  141 +++++++
 kernel/power/Kconfig               |   14 
 kernel/power/main.c                |   17 
 10 files changed, 1440 insertions(+), 10 deletions(-)

Index: linux-2.6/kernel/power/Kconfig
===================================================================
--- linux-2.6.orig/kernel/power/Kconfig
+++ linux-2.6/kernel/power/Kconfig
@@ -208,3 +208,17 @@ config APM_EMULATION
 	  random kernel OOPSes or reboots that don't seem to be related to
 	  anything, try disabling/enabling this option (or disabling/enabling
 	  APM in your BIOS).
+
+config PM_RUNTIME
+	bool "Run-time PM core functionality"
+	depends on PM
+	---help---
+	  Enable functionality allowing I/O devices to be put into energy-saving
+	  (low power) states at run time (or autosuspended) after a specified
+	  period of inactivity and woken up in response to a hardware-generated
+	  wake-up event or a driver's request.
+
+	  Hardware support is generally required for this functionality to work
+	  and the bus type drivers of the buses the devices are on are
+	  responsibile for the actual handling of the autosuspend requests and
+	  wake-up events.
Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -11,6 +11,7 @@
 #include <linux/kobject.h>
 #include <linux/string.h>
 #include <linux/resume-trace.h>
+#include <linux/workqueue.h>
 
 #include "power.h"
 
@@ -217,8 +218,24 @@ static struct attribute_group attr_group
 	.attrs = g,
 };
 
+#ifdef CONFIG_PM_RUNTIME
+struct workqueue_struct *pm_wq;
+
+static int __init pm_start_workqueue(void)
+{
+	pm_wq = create_freezeable_workqueue("pm");
+
+	return pm_wq ? 0 : -ENOMEM;
+}
+#else
+static inline int pm_start_workqueue(void) { return 0; }
+#endif
+
 static int __init pm_init(void)
 {
+	int error = pm_start_workqueue();
+	if (error)
+		return error;
 	power_kobj = kobject_create_and_add("power", NULL);
 	if (!power_kobj)
 		return -ENOMEM;
Index: linux-2.6/include/linux/pm.h
===================================================================
--- linux-2.6.orig/include/linux/pm.h
+++ linux-2.6/include/linux/pm.h
@@ -22,6 +22,9 @@
 #define _LINUX_PM_H
 
 #include <linux/list.h>
+#include <linux/workqueue.h>
+#include <linux/spinlock.h>
+#include <linux/completion.h>
 
 /*
  * Callbacks for platform drivers to implement.
@@ -165,6 +168,28 @@ typedef struct pm_message {
  * It is allowed to unregister devices while the above callbacks are being
  * executed.  However, it is not allowed to unregister a device from within any
  * of its own callbacks.
+ *
+ * There also are the following callbacks related to run-time power management
+ * of devices:
+ *
+ * @runtime_suspend: Prepare the device for a condition in which it won't be
+ *	able to communicate with the CPU(s) and RAM due to power management.
+ *	This need not mean that the device should be put into a low power state.
+ *	For example, if the device is behind a link which is about to be turned
+ *	off, the device may remain at full power.  Still, if the device does go
+ *	to low power and if device_may_wakeup(dev) is true, remote wake-up
+ *	(i.e. hardware mechanism allowing the device to request a change of its
+ *	power state, such as PCI PME) should be enabled for it.
+ *
+ * @runtime_resume: Put the device into the fully active state in response to a
+ *	wake-up event generated by hardware or at a request of software.  If
+ *	necessary, put the device into the full power state and restore its
+ *	registers, so that it is fully operational.
+ *
+ * @runtime_idle: Device appears to be inactive and it might be put into a low
+ *	power state if all of the necessary conditions are satisfied.  Check
+ *	these conditions and handle the device as appropriate, possibly queueing
+ *	a suspend request for it.
  */
 
 struct dev_pm_ops {
@@ -182,6 +207,9 @@ struct dev_pm_ops {
 	int (*thaw_noirq)(struct device *dev);
 	int (*poweroff_noirq)(struct device *dev);
 	int (*restore_noirq)(struct device *dev);
+	int (*runtime_suspend)(struct device *dev);
+	int (*runtime_resume)(struct device *dev);
+	void (*runtime_idle)(struct device *dev);
 };
 
 /**
@@ -315,14 +343,76 @@ enum dpm_state {
 	DPM_OFF_IRQ,
 };
 
+/**
+ * Device run-time power management state.
+ *
+ * These state labels are used internally by the PM core to indicate the current
+ * status of a device with respect to the PM core operations.  They do not
+ * reflect the actual power state of the device or its status as seen by the
+ * driver.
+ *
+ * RPM_ACTIVE		Device is fully operational, no run-time PM requests are
+ *			pending for it.
+ *
+ * RPM_IDLE		It has been requested that the device be suspended.
+ *			Suspend request has been put into the run-time PM
+ *			workqueue and it's pending execution.
+ *
+ * RPM_SUSPENDING	Device bus type's ->runtime_suspend() callback is being
+ *			executed.
+ *
+ * RPM_SUSPENDED	Device bus type's ->runtime_suspend() callback has
+ *			completed successfully.  The device is regarded as
+ *			suspended.
+ *
+ * RPM_WAKE		It has been requested that the device be woken up.
+ *			Resume request has been put into the run-time PM
+ *			workqueue and it's pending execution.
+ *
+ * RPM_RESUMING		Device bus type's ->runtime_resume() callback is being
+ *			executed.
+ *
+ * RPM_ERROR		Represents a condition from which the PM core cannot
+ *			recover by itself.  If the device's run-time PM status
+ *			field has this value, all of the run-time PM operations
+ *			carried out for the device by the core will fail, until
+ *			the status field is changed to either RPM_ACTIVE or
+ *			RPM_SUSPENDED (it is not valid to use the other values
+ *			in such a situation) by the device's driver or bus type.
+ *			This happens when the device bus type's
+ *			->runtime_suspend() or ->runtime_resume() callback
+ *			returns error code different from -EAGAIN or -EBUSY.
+ */
+
+#define RPM_ACTIVE	0
+#define RPM_IDLE	0x01
+#define RPM_SUSPENDING	0x02
+#define RPM_SUSPENDED	0x04
+#define RPM_WAKE	0x08
+#define RPM_RESUMING	0x10
+#define RPM_ERROR	0x1F
+
 struct dev_pm_info {
 	pm_message_t		power_state;
-	unsigned		can_wakeup:1;
-	unsigned		should_wakeup:1;
+	unsigned int		can_wakeup:1;
+	unsigned int		should_wakeup:1;
 	enum dpm_state		status;		/* Owned by the PM core */
-#ifdef	CONFIG_PM_SLEEP
+#ifdef CONFIG_PM_SLEEP
 	struct list_head	entry;
 #endif
+#ifdef CONFIG_PM_RUNTIME
+	struct delayed_work	suspend_work;
+	struct work_struct	resume_work;
+	struct completion	work_done;
+	unsigned int		ignore_children:1;
+	unsigned int		suspend_aborted:1;
+	unsigned int		notify_running:1;
+	unsigned int		runtime_status:5;
+	int			runtime_error;
+	atomic_t		resume_count;
+	atomic_t		child_count;
+	spinlock_t		lock;
+#endif
 };
 
 /*
Index: linux-2.6/drivers/base/power/Makefile
===================================================================
--- linux-2.6.orig/drivers/base/power/Makefile
+++ linux-2.6/drivers/base/power/Makefile
@@ -1,5 +1,6 @@
 obj-$(CONFIG_PM)	+= sysfs.o
 obj-$(CONFIG_PM_SLEEP)	+= main.o
+obj-$(CONFIG_PM_RUNTIME)	+= runtime.o
 obj-$(CONFIG_PM_TRACE_RTC)	+= trace.o
 
 ccflags-$(CONFIG_DEBUG_DRIVER) := -DDEBUG
Index: linux-2.6/drivers/base/power/runtime.c
===================================================================
--- /dev/null
+++ linux-2.6/drivers/base/power/runtime.c
@@ -0,0 +1,711 @@
+/*
+ * drivers/base/power/runtime.c - Helper functions for device run-time PM
+ *
+ * Copyright (c) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
+ *
+ * This file is released under the GPLv2.
+ */
+
+#include <linux/pm_runtime.h>
+#include <linux/jiffies.h>
+
+/**
+ * __pm_get_child - Increment the counter of unsuspended children of a device.
+ * @dev: Device to handle;
+ */
+static void __pm_get_child(struct device *dev)
+{
+	atomic_inc(&dev->power.child_count);
+}
+
+/**
+ * __pm_put_child - Decrement the counter of unsuspended children of a device.
+ * @dev: Device to handle;
+ */
+static void __pm_put_child(struct device *dev)
+{
+	if (!atomic_add_unless(&dev->power.child_count, -1, 0))
+		dev_warn(dev, "Unbalanced %s!\n", __func__);
+}
+
+/**
+ * pm_runtime_notify_idle - Run a device bus type's runtime_idle() callback.
+ * @dev: Device to notify.
+ */
+static void pm_runtime_notify_idle(struct device *dev)
+{
+	if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_idle)
+		dev->bus->pm->runtime_idle(dev);
+}
+
+/**
+ * pm_runtime_notify_work - Run pm_runtime_notify_idle() for a device.
+ *
+ * Use @work to get the device object to run the notification for and execute
+ * pm_runtime_notify_idle().
+ */
+static void pm_runtime_notify_work(struct work_struct *work)
+{
+	struct device *dev = resume_work_to_device(work);
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+	dev->power.runtime_status &= ~RPM_WAKE;
+	dev->power.notify_running = true;
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	pm_runtime_notify_idle(dev);
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+	dev->power.notify_running = false;
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+
+/**
+ * pm_runtime_put - Decrement the resume counter and run idle notification.
+ * @dev: Device to handle.
+ *
+ * Decrement the resume counter of the device, check if it is possible to
+ * suspend it and notify its bus type in that case.
+ */
+void pm_runtime_put(struct device *dev)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (!__pm_runtime_put(dev)) {
+		dev_warn(dev, "Unbalanced %s!\n", __func__);
+		goto out;
+	}
+
+	if (!pm_suspend_possible(dev))
+		goto out;
+
+	/* Do not schedule a notification if one is already in progress. */
+	if ((dev->power.runtime_status & RPM_WAKE) || dev->power.notify_running)
+		goto out;
+
+	/*
+	 * The notification is asynchronous, so that this function can be called
+	 * from interrupt context.  Set the run-time PM status to RPM_WAKE to
+	 * prevent resume_work from being reused for a resume request and to let
+	 * pm_runtime_close() know it has a request to cancel.  It also prevents
+	 * suspends from running or being scheduled until the work function is
+	 * executed.
+	 */
+	dev->power.runtime_status = RPM_WAKE;
+	INIT_WORK(&dev->power.resume_work, pm_runtime_notify_work);
+	queue_work(pm_wq, &dev->power.resume_work);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_put);
+
+/**
+ * pm_runtime_idle - Check if device can be suspended and notify its bus type.
+ * @dev: Device to check.
+ */
+void pm_runtime_idle(struct device *dev)
+{
+	if (pm_suspend_possible(dev))
+		pm_runtime_notify_idle(dev);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_idle);
+
+/**
+ * __pm_runtime_suspend - Run a device bus type's runtime_suspend() callback.
+ * @dev: Device to suspend.
+ * @sync: If unset, the funtion has been called via pm_wq.
+ *
+ * Check if the status of the device is appropriate and run the
+ * ->runtime_suspend() callback provided by the device's bus type driver.
+ * Update the run-time PM flags in the device object to reflect the current
+ * status of the device.
+ */
+int __pm_runtime_suspend(struct device *dev, bool sync)
+{
+	struct device *parent = NULL;
+	unsigned long flags;
+	int error = -EINVAL;
+
+	might_sleep();
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+ repeat:
+	if (dev->power.runtime_status == RPM_ERROR) {
+		goto out;
+	} else if (dev->power.runtime_status & RPM_SUSPENDED) {
+		error = 0;
+		goto out;
+	} else if (atomic_read(&dev->power.resume_count) > 0
+	    || (dev->power.runtime_status & (RPM_WAKE | RPM_RESUMING))
+	    || (!sync && (dev->power.runtime_status & RPM_IDLE)
+	    && dev->power.suspend_aborted)) {
+		/*
+		 * We're forbidden to suspend the device (eg. it may be
+		 * resuming) or a pending suspend request has just been
+		 * cancelled and we're running as a result of that request.
+		 */
+		error = -EAGAIN;
+		goto out;
+	} else if (dev->power.runtime_status & RPM_SUSPENDING) {
+		/*
+		 * Another suspend is running in parallel with us.  Wait for it
+		 * to complete and return.
+		 */
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		wait_for_completion(&dev->power.work_done);
+
+		return dev->power.runtime_error;
+	} else if (sync && dev->power.runtime_status == RPM_IDLE
+	    && !dev->power.suspend_aborted) {
+		/*
+		 * Suspend request is pending, but we're not running as a result
+		 * of that request, so cancel it.  Since we're not clearing the
+		 * RPM_IDLE bit now, no new suspend requests will be queued up
+		 * while the pending one is waited for to finish.
+		 */
+		dev->power.suspend_aborted = true;
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		cancel_delayed_work_sync(&dev->power.suspend_work);
+
+		spin_lock_irqsave(&dev->power.lock, flags);
+
+		/* Repeat if anyone else has cleared the status. */
+		if (dev->power.runtime_status != RPM_IDLE
+		    || !dev->power.suspend_aborted)
+			goto repeat;
+	}
+
+	if (!pm_children_suspended(dev)) {
+		/*
+		 * We can only suspend the device if all of its children have
+		 * been suspended.
+		 */
+		dev->power.runtime_status = RPM_ACTIVE;
+		error = -EBUSY;
+		goto out;
+	}
+
+	dev->power.runtime_status = RPM_SUSPENDING;
+	init_completion(&dev->power.work_done);
+
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_suspend)
+		error = dev->bus->pm->runtime_suspend(dev);
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	switch (error) {
+	case 0:
+		/*
+		 * Resume request might have been queued up in the meantime, in
+		 * which case the RPM_WAKE bit is also set in runtime_status.
+		 */
+		dev->power.runtime_status &= ~RPM_SUSPENDING;
+		dev->power.runtime_status |= RPM_SUSPENDED;
+		break;
+	case -EAGAIN:
+	case -EBUSY:
+		dev->power.runtime_status = RPM_ACTIVE;
+		break;
+	default:
+		dev->power.runtime_status = RPM_ERROR;
+	}
+	dev->power.runtime_error = error;
+	complete_all(&dev->power.work_done);
+
+	if (!error && !(dev->power.runtime_status & RPM_WAKE) && dev->parent)
+		parent = dev->parent;
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	if (parent) {
+		__pm_put_child(parent);
+
+		if (!atomic_read(&parent->power.resume_count)
+		    && !atomic_read(&parent->power.child_count)
+		    && !parent->power.ignore_children)
+			pm_runtime_notify_idle(parent);
+	}
+
+	return error;
+}
+EXPORT_SYMBOL_GPL(__pm_runtime_suspend);
+
+/**
+ * pm_runtime_suspend_work - Run pm_runtime_suspend() for a device.
+ * @work: Work structure used for scheduling the execution of this function.
+ *
+ * Use @work to get the device object the suspend has been scheduled for and
+ * run pm_runtime_suspend() for it.
+ */
+static void pm_runtime_suspend_work(struct work_struct *work)
+{
+	__pm_runtime_suspend(suspend_work_to_device(work), false);
+}
+
+/**
+ * pm_request_suspend - Schedule run-time suspend of given device.
+ * @dev: Device to suspend.
+ * @msec: Time to wait before attempting to suspend the device, in milliseconds.
+ */
+void pm_request_suspend(struct device *dev, unsigned int msec)
+{
+	unsigned long flags;
+	unsigned long delay = msecs_to_jiffies(msec);
+
+	if (atomic_read(&dev->power.resume_count) > 0)
+		return;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.runtime_status != RPM_ACTIVE)
+		goto out;
+
+	dev->power.runtime_status = RPM_IDLE;
+	dev->power.suspend_aborted = false;
+	queue_delayed_work(pm_wq, &dev->power.suspend_work, delay);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+EXPORT_SYMBOL_GPL(pm_request_suspend);
+
+/**
+ * __pm_runtime_resume - Run a device bus type's runtime_resume() callback.
+ * @dev: Device to resume.
+ * @sync: If unset, the funtion has been called via pm_wq.
+ *
+ * Check if the device is really suspended and run the ->runtime_resume()
+ * callback provided by the device's bus type driver.  Update the run-time PM
+ * flags in the device object to reflect the current status of the device.  If
+ * runtime suspend is in progress while this function is being run, wait for it
+ * to finish before resuming the device.  If runtime suspend is scheduled, but
+ * it hasn't started yet, cancel it and we're done.
+ */
+int __pm_runtime_resume(struct device *dev, bool sync)
+{
+	struct device *parent = dev->parent;
+	unsigned long flags;
+	bool put_parent = false;
+	unsigned int status;
+	int error = -EINVAL;
+
+	might_sleep();
+
+	/*
+	 * This makes concurrent __pm_runtime_suspend() and pm_request_suspend()
+	 * started after us, or restarted, return immediately, so only the ones
+	 * started before us can execute ->runtime_suspend().
+	 */
+	__pm_runtime_get(dev);
+
+ repeat:
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+ repeat_locked:
+	if (dev->power.runtime_status == RPM_ERROR) {
+		goto out;
+	} else if (dev->power.runtime_status & ~RPM_WAKE) {
+		/*
+		 * If RPM_WAKE is the only bit set in runtime_status, an idle
+		 * notification is scheduled for the device which is active.
+		 */
+		error = 0;
+		goto out;
+	} else if (dev->power.runtime_status == RPM_IDLE
+	    && !dev->power.suspend_aborted) {
+		/* Suspend request is pending, so cancel it. */
+		dev->power.suspend_aborted = true;
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		cancel_delayed_work_sync(&dev->power.suspend_work);
+
+		spin_lock_irqsave(&dev->power.lock, flags);
+
+		/* Repeat if anyone else has cleared the status. */
+		if (dev->power.runtime_status != RPM_IDLE
+		    || !dev->power.suspend_aborted)
+			goto repeat_locked;
+
+		/*
+		 * Suspend request has been cancelled and there's nothing more
+		 * to do.  Clear the RPM_IDLE bit and return.
+		 */
+		dev->power.runtime_status = RPM_ACTIVE;
+		error = 0;
+		goto out;
+	}
+
+	if (sync && (dev->power.runtime_status & RPM_WAKE)) {
+		/* Resume request is pending, so let it run. */
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		flush_work(&dev->power.resume_work);
+
+		goto repeat;
+	} else if (dev->power.runtime_status & RPM_SUSPENDING) {
+		/*
+		 * Suspend is running in parallel with us.  Wait for it to
+		 * complete and repeat.
+		 */
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		wait_for_completion(&dev->power.work_done);
+
+		goto repeat;
+	} else if (!put_parent && parent
+	    && dev->power.runtime_status == RPM_SUSPENDED) {
+		/*
+		 * Increase the parent's resume counter and request that it be
+		 * woken up if necessary.
+		 */
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		error = pm_runtime_resume(parent);
+		if (error)
+			return error;
+
+		put_parent = true;
+		error = -EINVAL;
+		goto repeat;
+	}
+
+	status = dev->power.runtime_status;
+	if (status == RPM_RESUMING)
+		goto unlock;
+
+	if (dev->power.runtime_status == RPM_SUSPENDED && parent)
+		__pm_get_child(parent);
+
+	dev->power.runtime_status = RPM_RESUMING;
+	init_completion(&dev->power.work_done);
+
+ unlock:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	/*
+	 * We can decrement the parent's resume counter right now, because it
+	 * can't be suspended anyway after the __pm_get_child() above.
+	 */
+	if (put_parent) {
+		__pm_runtime_put(parent);
+		put_parent = false;
+	}
+
+	if (status == RPM_RESUMING) {
+		/*
+		 * There's another resume running in parallel with us. Wait for
+		 * it to complete and return.
+		 */
+		wait_for_completion(&dev->power.work_done);
+
+		error = dev->power.runtime_error;
+		goto out_put;
+	}
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_resume)
+		error = dev->bus->pm->runtime_resume(dev);
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	dev->power.runtime_status = error ? RPM_ERROR : RPM_ACTIVE;
+	dev->power.runtime_error = error;
+	complete_all(&dev->power.work_done);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	if (put_parent)
+		__pm_runtime_put(parent);
+
+ out_put:
+	/*
+	 * If we're running from pm_wq, the resume counter has been incremented
+	 * by pm_request_resume() too, so decrement it.
+	 */
+	if (error || !sync)
+		__pm_runtime_put(dev);
+
+	return error;
+}
+EXPORT_SYMBOL_GPL(pm_runtime_resume);
+
+/**
+ * pm_runtime_resume_work - Run __pm_runtime_resume() for a device.
+ * @work: Work structure used for scheduling the execution of this function.
+ *
+ * Use @work to get the device object the resume has been scheduled for and run
+ * __pm_runtime_resume() for it.
+ */
+static void pm_runtime_resume_work(struct work_struct *work)
+{
+	__pm_runtime_resume(resume_work_to_device(work), false);
+}
+
+/**
+ * pm_cancel_suspend_work - Cancel a pending suspend request.
+ *
+ * Use @work to get the device object the work item has been scheduled for and
+ * cancel a pending suspend request for it.
+ */
+static void pm_cancel_suspend_work(struct work_struct *work)
+{
+	struct device *dev = resume_work_to_device(work);
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	/* Return if someone else has already dealt with the suspend request. */
+	if (dev->power.runtime_status != (RPM_IDLE | RPM_WAKE)
+	    || !dev->power.suspend_aborted)
+		goto out;
+
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	cancel_delayed_work_sync(&dev->power.suspend_work);
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	/* Clear the status if someone else hasn't done it already. */
+	if (dev->power.runtime_status == (RPM_IDLE | RPM_WAKE)
+	    && dev->power.suspend_aborted)
+		dev->power.runtime_status = RPM_ACTIVE;
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+
+/**
+ * __pm_request_resume - Schedule run-time resume of given device.
+ * @dev: Device to resume.
+ * @get: If set, always increment the device's resume counter.
+ *
+ * Schedule run-time resume of given device and increment its resume counter.
+ * If @get is set, the counter is incremented even if error code is going to be
+ * returned, and if it's unset, the counter is only incremented if resume
+ * request has been queued up (0 is returned in such a case).
+ */
+int __pm_request_resume(struct device *dev, bool get)
+{
+	struct device *parent = dev->parent;
+	unsigned long flags;
+	int error = 0;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (get)
+		__pm_runtime_get(dev);
+
+	if (dev->power.runtime_status == RPM_ERROR)
+		error = -EINVAL;
+	else if (dev->power.runtime_status & ~RPM_WAKE)
+		error = -EBUSY;
+	else if (dev->power.runtime_status & (RPM_WAKE | RPM_RESUMING))
+		error = -EINPROGRESS;
+	if (error)
+		goto out;
+
+	if (dev->power.runtime_status == RPM_IDLE) {
+		error = -EBUSY;
+
+		if (dev->power.suspend_aborted)
+			goto out;
+
+		/* Suspend request is pending.  Queue a request to cancel it. */
+		dev->power.suspend_aborted = true;
+		INIT_WORK(&dev->power.resume_work, pm_cancel_suspend_work);
+		goto queue;
+	}
+
+	if (dev->power.runtime_status == RPM_SUSPENDED && parent)
+		__pm_get_child(parent);
+
+	INIT_WORK(&dev->power.resume_work, pm_runtime_resume_work);
+	if (!get)
+		__pm_runtime_get(dev);
+
+ queue:
+	/*
+	 * The device may be suspending at the moment or there may be a resume
+	 * request pending for it and we can't clear the RPM_SUSPENDING and
+	 * RPM_IDLE bits in its runtime_status just yet.
+	 */
+	dev->power.runtime_status |= RPM_WAKE;
+	queue_work(pm_wq, &dev->power.resume_work);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	return error;
+}
+EXPORT_SYMBOL_GPL(pm_request_resume);
+
+/**
+ * __pm_runtime_clear_status - Change the run-time PM status of a device.
+ * @dev: Device to handle.
+ * @status: New value of the device's run-time PM status.
+ *
+ * Change the run-time PM status of the device to @status, which must be
+ * either RPM_ACTIVE or RPM_SUSPENDED, if its current value is equal to
+ * RPM_ERROR.
+ */
+void __pm_runtime_clear_status(struct device *dev, unsigned int status)
+{
+	struct device *parent = dev->parent;
+	unsigned long flags;
+
+	if (status & ~RPM_SUSPENDED)
+		return;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.runtime_status != RPM_ERROR)
+		goto out;
+
+	dev->power.runtime_status = status;
+	if (status == RPM_SUSPENDED && parent)
+		__pm_put_child(parent);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+EXPORT_SYMBOL_GPL(__pm_runtime_clear_status);
+
+/**
+ * pm_runtime_enable - Enable run-time PM of a device.
+ * @dev: Device to handle.
+ */
+void pm_runtime_enable(struct device *dev)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	if (dev->power.runtime_status != RPM_ERROR) {
+		dev->power.runtime_status = RPM_ACTIVE;
+		if (dev->parent)
+			__pm_put_child(dev->parent);
+	}
+	__pm_runtime_put(dev);
+
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_enable);
+
+/**
+ * pm_runtime_disable - Disable run-time PM of a device.
+ * @dev: Device to handle.
+ */
+void pm_runtime_disable(struct device *dev)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	do {
+
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		pm_runtime_resume(dev);
+
+		spin_lock_irqsave(&dev->power.lock, flags);
+
+	} while (!__pm_runtime_put(dev));
+
+	if (dev->power.runtime_status == RPM_ERROR) {
+		dev->power.runtime_status = RPM_WAKE;
+		if (dev->parent)
+			__pm_get_child(dev->parent);
+	}
+	__pm_runtime_get(dev);
+
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_disable);
+
+/**
+ * pm_runtime_init - Initialize run-time PM fields in given device object.
+ * @dev: Device object to initialize.
+ */
+void pm_runtime_init(struct device *dev)
+{
+	spin_lock_init(&dev->power.lock);
+	/*
+	 * Make any attempts to suspend the device or resume it, or to put a
+	 * request for it into pm_wq terminate immediately.
+	 */
+	dev->power.runtime_status = RPM_WAKE;
+	atomic_set(&dev->power.resume_count, 1);
+	atomic_set(&dev->power.child_count, 0);
+	pm_suspend_ignore_children(dev, false);
+}
+
+/**
+ * pm_runtime_add - Update run-time PM fields of a device while adding it.
+ * @dev: Device object being added to device hierarchy.
+ */
+void pm_runtime_add(struct device *dev)
+{
+	dev->power.notify_running = false;
+	INIT_DELAYED_WORK(&dev->power.suspend_work, pm_runtime_suspend_work);
+
+	if (dev->parent)
+		__pm_get_child(dev->parent);
+}
+
+/**
+ * pm_runtime_remove - Prepare for the removal of a device object.
+ * @dev: Device object being removed.
+ */
+void pm_runtime_remove(struct device *dev)
+{
+	unsigned long flags;
+	unsigned int status;
+
+	/* This makes __pm_runtime_suspend() return immediately. */
+	__pm_runtime_get(dev);
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	/* Cancel any pending requests. */
+	if ((dev->power.runtime_status & RPM_WAKE)
+	    || dev->power.notify_running) {
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		cancel_work_sync(&dev->power.resume_work);
+	} else if (dev->power.runtime_status == RPM_IDLE) {
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		cancel_delayed_work_sync(&dev->power.suspend_work);
+	}
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	while (dev->power.runtime_status & (RPM_SUSPENDING | RPM_RESUMING)) {
+		spin_unlock_irqrestore(&dev->power.lock, flags);
+
+		wait_for_completion(&dev->power.work_done);
+
+		spin_lock_irqsave(&dev->power.lock, flags);
+	}
+	status = dev->power.runtime_status;
+
+	/* This makes the run-time PM functions above return immediately. */
+	dev->power.runtime_status = RPM_WAKE;
+
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	if (status != RPM_SUSPENDED && dev->parent)
+		__pm_put_child(dev->parent);
+}
Index: linux-2.6/include/linux/pm_runtime.h
===================================================================
--- /dev/null
+++ linux-2.6/include/linux/pm_runtime.h
@@ -0,0 +1,141 @@
+/*
+ * pm_runtime.h - Device run-time power management helper functions.
+ *
+ * Copyright (C) 2009 Rafael J. Wysocki <rjw@sisk.pl>
+ *
+ * This file is released under the GPLv2.
+ */
+
+#ifndef _LINUX_PM_RUNTIME_H
+#define _LINUX_PM_RUNTIME_H
+
+#include <linux/device.h>
+#include <linux/pm.h>
+
+#ifdef CONFIG_PM_RUNTIME
+
+extern struct workqueue_struct *pm_wq;
+
+extern void pm_runtime_init(struct device *dev);
+extern void pm_runtime_add(struct device *dev);
+extern void pm_runtime_remove(struct device *dev);
+extern void pm_runtime_put(struct device *dev);
+extern void pm_runtime_idle(struct device *dev);
+extern int __pm_runtime_suspend(struct device *dev, bool sync);
+extern void pm_request_suspend(struct device *dev, unsigned int msec);
+extern int __pm_runtime_resume(struct device *dev, bool sync);
+extern int __pm_request_resume(struct device *dev, bool get);
+extern void __pm_runtime_clear_status(struct device *dev, unsigned int status);
+extern void pm_runtime_enable(struct device *dev);
+extern void pm_runtime_disable(struct device *dev);
+
+static inline struct device *suspend_work_to_device(struct work_struct *work)
+{
+	struct delayed_work *dw = to_delayed_work(work);
+	struct dev_pm_info *dpi;
+
+	dpi = container_of(dw, struct dev_pm_info, suspend_work);
+	return container_of(dpi, struct device, power);
+}
+
+static inline struct device *resume_work_to_device(struct work_struct *work)
+{
+	struct dev_pm_info *dpi;
+
+	dpi = container_of(work, struct dev_pm_info, resume_work);
+	return container_of(dpi, struct device, power);
+}
+
+static inline void __pm_runtime_get(struct device *dev)
+{
+	atomic_inc(&dev->power.resume_count);
+}
+
+static inline bool __pm_runtime_put(struct device *dev)
+{
+	return !!atomic_add_unless(&dev->power.resume_count, -1, 0);
+}
+
+static inline bool pm_children_suspended(struct device *dev)
+{
+	return dev->power.ignore_children
+		|| !atomic_read(&dev->power.child_count);
+}
+
+static inline bool pm_suspend_possible(struct device *dev)
+{
+	return pm_children_suspended(dev)
+		&& !atomic_read(&dev->power.resume_count)
+		&& !(dev->power.runtime_status & RPM_WAKE);
+}
+
+static inline void pm_suspend_ignore_children(struct device *dev, bool enable)
+{
+	dev->power.ignore_children = enable;
+}
+
+#else /* !CONFIG_PM_RUNTIME */
+
+static inline void pm_runtime_init(struct device *dev) {}
+static inline void pm_runtime_add(struct device *dev) {}
+static inline void pm_runtime_remove(struct device *dev) {}
+static inline void pm_runtime_put(struct device *dev) {}
+static inline void pm_runtime_idle(struct device *dev) {}
+static inline void pm_runtime_put_notify(struct device *dev) {}
+static inline int __pm_runtime_suspend(struct device *dev, bool sync)
+{
+	return -ENOSYS;
+}
+static inline void pm_request_suspend(struct device *dev, unsigned int msec) {}
+static inline int __pm_runtime_resume(struct device *dev, bool sync)
+{
+	return -ENOSYS;
+}
+static inline int __pm_request_resume(struct device *dev, bool get)
+{
+	return -ENOSYS;
+}
+static inline void __pm_runtime_clear_status(struct device *dev,
+					      unsigned int status) {}
+static inline void pm_runtime_enable(struct device *dev) {}
+static inline void pm_runtime_disable(struct device *dev) {}
+
+static inline void __pm_runtime_get(struct device *dev) {}
+static inline bool __pm_runtime_put(struct device *dev) { return true; }
+static inline bool pm_children_suspended(struct device *dev) { return false; }
+static inline bool pm_suspend_possible(struct device *dev) { return false; }
+static inline void pm_suspend_ignore_children(struct device *dev, bool en) {}
+
+#endif /* !CONFIG_PM_RUNTIME */
+
+static inline int pm_runtime_suspend(struct device *dev)
+{
+	return __pm_runtime_suspend(dev, true);
+}
+
+static inline int pm_runtime_resume(struct device *dev)
+{
+	return __pm_runtime_resume(dev, true);
+}
+
+static inline int pm_request_resume(struct device *dev)
+{
+	return __pm_request_resume(dev, false);
+}
+
+static inline int pm_request_resume_get(struct device *dev)
+{
+	return __pm_request_resume(dev, true);
+}
+
+static inline void pm_runtime_clear_active(struct device *dev)
+{
+	__pm_runtime_clear_status(dev, RPM_ACTIVE);
+}
+
+static inline void pm_runtime_clear_suspended(struct device *dev)
+{
+	__pm_runtime_clear_status(dev, RPM_SUSPENDED);
+}
+
+#endif
Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -21,6 +21,7 @@
 #include <linux/kallsyms.h>
 #include <linux/mutex.h>
 #include <linux/pm.h>
+#include <linux/pm_runtime.h>
 #include <linux/resume-trace.h>
 #include <linux/rwsem.h>
 #include <linux/interrupt.h>
@@ -49,6 +50,16 @@ static DEFINE_MUTEX(dpm_list_mtx);
 static bool transition_started;
 
 /**
+ * device_pm_init - Initialize the PM-related part of a device object
+ * @dev: Device object to initialize.
+ */
+void device_pm_init(struct device *dev)
+{
+	dev->power.status = DPM_ON;
+	pm_runtime_init(dev);
+}
+
+/**
  *	device_pm_lock - lock the list of active devices used by the PM core
  */
 void device_pm_lock(void)
@@ -88,6 +99,7 @@ void device_pm_add(struct device *dev)
 	}
 
 	list_add_tail(&dev->power.entry, &dpm_list);
+	pm_runtime_add(dev);
 	mutex_unlock(&dpm_list_mtx);
 }
 
@@ -104,6 +116,7 @@ void device_pm_remove(struct device *dev
 		 kobject_name(&dev->kobj));
 	mutex_lock(&dpm_list_mtx);
 	list_del_init(&dev->power.entry);
+	pm_runtime_remove(dev);
 	mutex_unlock(&dpm_list_mtx);
 }
 
@@ -507,6 +520,7 @@ static void dpm_complete(pm_message_t st
 		get_device(dev);
 		if (dev->power.status > DPM_ON) {
 			dev->power.status = DPM_ON;
+			pm_runtime_enable(dev);
 			mutex_unlock(&dpm_list_mtx);
 
 			device_complete(dev, state);
@@ -753,6 +767,7 @@ static int dpm_prepare(pm_message_t stat
 
 		get_device(dev);
 		dev->power.status = DPM_PREPARING;
+		pm_runtime_disable(dev);
 		mutex_unlock(&dpm_list_mtx);
 
 		error = device_prepare(dev, state);
@@ -760,6 +775,7 @@ static int dpm_prepare(pm_message_t stat
 		mutex_lock(&dpm_list_mtx);
 		if (error) {
 			dev->power.status = DPM_ON;
+			pm_runtime_enable(dev);
 			if (error == -EAGAIN) {
 				put_device(dev);
 				continue;
Index: linux-2.6/drivers/base/dd.c
===================================================================
--- linux-2.6.orig/drivers/base/dd.c
+++ linux-2.6/drivers/base/dd.c
@@ -23,6 +23,7 @@
 #include <linux/kthread.h>
 #include <linux/wait.h>
 #include <linux/async.h>
+#include <linux/pm_runtime.h>
 
 #include "base.h"
 #include "power/power.h"
@@ -202,8 +203,12 @@ int driver_probe_device(struct device_dr
 	pr_debug("bus: '%s': %s: matched device %s with driver %s\n",
 		 drv->bus->name, __func__, dev_name(dev), drv->name);
 
+	pm_runtime_disable(dev);
+
 	ret = really_probe(dev, drv);
 
+	pm_runtime_enable(dev);
+
 	return ret;
 }
 
@@ -306,6 +311,8 @@ static void __device_release_driver(stru
 
 	drv = dev->driver;
 	if (drv) {
+		pm_runtime_disable(dev);
+
 		driver_sysfs_remove(dev);
 
 		if (dev->bus)
@@ -320,6 +327,8 @@ static void __device_release_driver(stru
 		devres_release_all(dev);
 		dev->driver = NULL;
 		klist_remove(&dev->p->knode_driver);
+
+		pm_runtime_enable(dev);
 	}
 }
 
Index: linux-2.6/Documentation/power/runtime_pm.txt
===================================================================
--- /dev/null
+++ linux-2.6/Documentation/power/runtime_pm.txt
@@ -0,0 +1,434 @@
+Run-time Power Management Framework for I/O Devices
+
+(C) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
+
+1. Introduction
+
+Support for run-time power management (run-time PM) of I/O devices is provided
+at the power management core (PM core) level by means of:
+
+* The power management workqueue pm_wq in which bus types and device drivers can
+  put their PM-related work items.  It is strongly recommended that pm_wq be
+  used for queuing all work items related to run-time PM, because this allows
+  them to be synchronized with system-wide power transitions.  pm_wq is declared
+  in include/linux/pm_runtime.h and defined in kernel/power/main.c.
+
+* A number of run-time PM fields in the 'power' member of 'struct device' (which
+  is of the type 'struct dev_pm_info', defined in include/linux/pm.h) that can
+  be used for synchronizing run-time PM operations with one another.
+
+* Three device run-time PM callbacks in 'struct dev_pm_ops' (defined in
+  include/linux/pm.h).
+
+* A set of helper functions defined in drivers/base/power/runtime.c that can be
+  used for carrying out run-time PM operations in such a way that the
+  synchronization between them is taken care of by the PM core.  Bus types and
+  device drivers are encouraged to use these functions.
+
+The device run-time PM fields of 'struct dev_pm_info', the helper functions
+using them and the run-time PM callbacks present in 'struct dev_pm_ops' are
+described below.
+
+2. Run-time PM Helper Functions and Device Fields
+
+The following helper functions are defined in drivers/base/power/runtime.c
+and include/linux/pm_runtime.h:
+
+* void pm_runtime_init(struct device *dev);
+* void pm_runtime_close(struct device *dev);
+
+* void pm_runtime_put(struct device *dev);
+* void pm_runtime_idle(struct device *dev);
+* int pm_runtime_suspend(struct device *dev);
+* void pm_request_suspend(struct device *dev, unsigned int msec);
+* int pm_runtime_resume(struct device *dev);
+* void pm_request_resume(struct device *dev);
+
+* bool pm_suspend_possible(struct device *dev);
+
+* void pm_runtime_enable(struct device *dev);
+* void pm_runtime_disable(struct device *dev);
+
+* void pm_suspend_ignore_children(struct device *dev, bool enable);
+
+* void pm_runtime_clear_active(struct device *dev) {}
+* void pm_runtime_clear_suspended(struct device *dev) {}
+
+pm_runtime_init() initializes the run-time PM fields in the 'power' member of
+a device object.  It is called during the initialization of the device object,
+in drivers/base/core.c:device_initialize().
+
+pm_runtime_add() updates the run-time PM fields in the 'power' member of a
+device object while the device is being added to the device hierarchy.  It is
+called from drivers/base/power/main.c:device_pm_add().
+
+pm_runtime_remove() disables the run-time PM of a device and updates the 'power'
+member of its parent's device object to take the removal of the device into
+account.  It cancels all of the run-time PM requests pending and waits for all
+of the run-time PM operations to complete.  It is called from
+drivers/base/power/main.c:device_pm_remove().
+
+pm_runtime_suspend(), pm_request_suspend(), pm_runtime_resume(),
+pm_request_resume(), and pm_request_resume_get() use the 'power.runtime_status',
+'power.resume_count', 'power.suspend_aborted', and 'power.child_count' fields of
+'struct device' for mutual cooperation.  In what follows the
+'power.runtime_status', 'power.resume_count', and 'power.child_count' fields are
+referred to as the device's run-time PM status, the device's resume counter, and
+the counter of unsuspended children of the device, respectively.  They are set
+to RPM_WAKE, 1 and 0, respectively, by pm_runtime_init().
+
+pm_runtime_put() decrements the device's resume counter unless it's already 0.
+If the counter was not zero before the decrementation, the function checks if
+the device can be suspended using pm_suspend_possible() and if that returns
+'true', it sets the RPM_WAKE bit in the device's run-time PM status field and
+queues up a request to execute the ->runtime_idle() callback provided by the
+device's bus type.  The work function of this request clears the RPM_WAKE bit
+before executing the bus type's ->runtime_idle() callback.  It is valid to call
+pm_runtime_put() from interrupt context.
+
+It is anticipated that pm_runtime_put() will be called after
+pm_runtime_resume(), pm_request_resume() or pm_request_resume_get(), when all of
+the I/O operations involving the device have been completed, in order to
+decrement the device's resume counter that was previously incremented by one of
+these functions.  Moreover, unbalanced calls to pm_runtime_put() are invalid, so
+drivers should ensure that pm_runtime_put() be only run after a function that
+increments the device's resume counter.
+
+pm_runtime_idle() uses pm_suspend_possible() to check if it is possible to
+suspend a device and if so, it executes the ->runtime_idle() callback provided
+by the device's bus type.
+
+pm_runtime_suspend() is used to carry out a run-time suspend of an active
+device.  It is called directly by a bus type or device driver, but internally
+it calls __pm_runtime_suspend() that is also used for asynchronous suspending of
+devices (i.e. to complete requests queued up by pm_request_suspend()) and works
+as follows.
+
+  * If the device is suspended (i.e. the RPM_SUSPENDED bit is set in the
+    device's run-time PM status field, 'power.runtime_status'), success is
+    returned.
+
+  * If the device's resume counter is greater than 0 or the device is resuming,
+    or it has a resume request pending (i.e. at least one of the RPM_WAKE and
+    RPM_RESUMING bits are set in the device's run-time PM status field), or the
+    function has been called via pm_wq as a result of a cancelled suspend
+    request (the RPM_IDLE bit is set in the device's run-time PM status field
+    and its 'power.suspend_aborted' flag is set), -EAGAIN is returned.
+
+  * If the device is suspending (i.e. the RPM_SUSPENDING bit is set in its
+    run-time PM status field), which means that another instance of
+    __pm_runtime_suspend() is running at the same time for the same device, the
+    function waits for the other instance to complete and returns the result
+    returned by it.
+
+  * If the device has a pending suspend request (i.e. the device's run-time PM
+    status is RPM_IDLE) and the function hasn't been called as a result of that
+    request, it cancels the request (synchronously).  Next, if a concurrent
+    thread changed the device's run-time PM status while the request was being
+    waited for to cancel, the function is restarted.
+
+  * If the children of the device are not suspended and the
+    'power.ignore_children' flag is not set for it, the device's run-time PM
+    status is set to RPM_ACTIVE and -EAGAIN is returned.
+
+If none of the above takes place, or a pending suspend request has been
+successfully cancelled, the device's run-time PM status is set to RPM_SUSPENDING
+and its bus type's ->runtime_suspend() callback is executed.  This callback is
+entirely responsible for handling the device as appropriate (for example, it may
+choose to execute the device driver's ->runtime_suspend() callback or to carry
+out any other suitable action depending on the bus type).
+
+  * If it completes successfully, the RPM_SUSPENDING bit is cleared and the
+    RPM_SUSPENDED bit is set in the device's run-time PM status field.  Once
+    that has happened, the device is regarded by the PM core as suspended, but
+    it _need_ _not_ mean that the device has been put into a low power state.
+    What really occurs to the device at this point entirely depends on its bus
+    type (it may depend on the device's driver if the bus type chooses to call
+    it).  Additionally, if the device bus type's ->runtime_suspend() callback
+    completes successfully and there's no resume request pending for the device
+    (i.e. the RPM_WAKE flag is not set in its run-time PM status field), and the
+    device has a parent, the parent's counter of unsuspended children (i.e. the
+    'power.child_count' field) is decremented.  If that counter turns out to be
+    equal to zero (i.e. the device was the last unsuspended child of its parent)
+    and the parent's 'power.ignore_children' flag is unset, and the parent's
+    resume counter is equal to 0, its bus type's ->runtime_idle() callback is
+    executed for it.
+
+  * If either -EBUSY or -EAGAIN is returned, the device's run-time PM status is
+    set to RPM_ACTIVE.
+
+  * If another error code is returned, the device's run-time PM status is set to
+    RPM_ERROR, which makes the PM core refuse to carry out any run-time PM
+    operations for it until the status is cleared by its bus type or driver with
+    the help of pm_runtime_clear_active() or pm_runtime_clear_suspended().
+
+Finally, pm_runtime_suspend() returns the result returned by the device bus
+type's ->runtime_suspend() callback.  If the device's bus type doesn't implement
+->runtime_suspend(), -EINVAL is returned and the device's run-time PM status is
+set to RPM_ERROR.
+
+pm_request_suspend() is used to queue up a suspend request for an active device.
+If the run-time PM status of the device (i.e. the value of the
+'power.runtime_status' field in 'struct device') is different from RPM_ACTIVE
+or its resume counter is greater than 0 (i.e. the device is not active from the
+PM core standpoint), the function returns immediately.  Otherwise, it changes
+the device's run-time PM status to RPM_IDLE and puts a request to suspend the
+device into pm_wq.  The 'msec' argument is used to specify the time to wait
+before the request will be completed, in milliseconds.  It is valid to call this
+function from interrupt context.
+
+pm_runtime_resume() is used to increment the resume counter of a device and, if
+necessary, to wake the device up (that happens if the device is suspended,
+suspending or has a suspend request pending).  It is called directly by a bus
+type or device driver, but internally it calls __pm_runtime_resume() that is
+also used for asynchronous resuming of devices (i.e. to complete requests queued
+up by pm_request_resume()).
+
+__pm_runtime_resume() first increments the device's resume counter to prevent
+new suspend requests from being queued up and to make subsequent attempts to
+suspend the device fail.  The device's resume counter will be decremented on
+return if error code is about to be returned or the function has been called via
+pm_wq.  After incrementing the device's run-time PM counter the function
+proceeds as follows.
+
+  * If the device is active (i.e. all of the bits in its run-time PM status are
+    unset, possibly except for RPM_WAKE, which means that an idle notification
+    is pending for it), success is returned.
+
+  * If there's a suspend request pending for the device (i.e. the RPM_IDLE bit
+    is set in the device's run-time PM status field), the
+    'power.suspend_aborted' flag is set for the device and the request is
+    cancelled (synchronously).  Then, the function restarts itself if the
+    device's RPM_IDLE bit was cleared or the 'power.suspend_aborted' flag was
+    unset in the meantime by a concurrent thread.  Otherwise, the device's
+    run-time PM status is cleared to RPM_ACTIVE and the function returns
+    success.
+
+  * If the device has a pending resume request (i.e. the RPM_WAKE bit is set in
+    its run-time PM status field), but the function hasn't been called as a
+    result of that request, the request is waited for to complete and the
+    function restarts itself.
+
+  * If the device is suspending (i.e. the RPM_SUSPENDING bit is set in its
+    run-time PM status field), the function waits for the suspend operation to
+    complete and restarts itself.
+
+  * If the device is suspended and doesn't have a pending resume request (i.e.
+    its run-time PM status is RPM_SUSPENDED), and it has a parent,
+    pm_runtime_resume() is called (recursively) for the parent.  If the parent's
+    resume is successful, the function notes that the parent's resume counter
+    will have to be decremented and restarts itself.  Otherwise, it returns the
+    error code returned by the instance of pm_runtime_resume_get() handling the
+    parent.
+
+  * If the device is resuming (i.e. the device's run-time PM status is
+    RPM_RESUMING), which means that another instance of __pm_runtime_resume() is
+    running at the same time for the same device, the function waits for the
+    other instance to complete and returns the result returned by it.
+
+If none of the above happens, the function checks if the device's run-time PM
+status is RPM_SUSPENDED, which means that the device doesn't have a resume
+request pending, and if it has a parent.  If that is the case, the parent's
+counter of unsuspended children is incremented.  Next, the device's run-time PM
+status is set to RPM_RESUMING and its bus type's ->runtime_resume() callback is
+executed.  This callback is entirely responsible for handling the device as
+appropriate (for example, it may choose to execute the device driver's
+->runtime_resume() callback or to carry out any other suitable action depending
+on the bus type).
+
+  * If it completes successfully, the device's run-time PM status is set to
+    RPM_ACTIVE, which means that the device is fully operational.  Thus, the
+    device bus type's ->runtime_resume() callback, when it is about to return
+    success, _must_ _ensure_ that this really is the case (i.e. when it returns
+    success, the device _must_ be able to carry out I/O operations as needed).
+
+  * If an error code is returned, the device's run-time PM status is set to
+    RPM_ERROR, which makes the PM core refuse to carry out any run-time PM
+    operations for the device until the status is cleared by its bus type or
+    driver with the help of either pm_runtime_clear_active(), or
+    pm_runtime_clear_suspended().  Thus, it is strongly recommended that bus
+    types' ->runtime_resume() callbacks only return error codes in fatal error
+    conditions, when it is impossible to bring the device back to the
+    operational state by any available means.  Inability to wake up a suspended
+    device usually means a service loss and it may very well result in a data
+    loss to the user, so it _must_ be regarded as a severe problem and avoided
+    if at all possible.
+
+Finally, __pm_runtime_resume() returns the result returned by the device bus
+type's ->runtime_resume() callback.  If the device's bus type doesn't implement
+->runtime_resume(), -EINVAL is returned and the device's run-time PM status is
+set to RPM_ERROR.  If __pm_runtime_resume() returns success and it hasn't been
+called via pm_wq, it leaves the device's resume counter incremented, so the
+counter has to be decremented, with the help of pm_runtime_put(), so that it's
+possible to suspend the device.  If __pm_runtime_resume() has been called via
+pm_wq, as a result of a resume request queued up by pm_request_resume(), the
+device's resume counter is left incremented regardless of whether or not the
+attempt to wake up the device has been successful.
+
+pm_request_resume_get() is used to increment the resume counter of a device
+and, if necessary, to queue up a resume request for the device (this happens if
+the device is suspended, suspending or has a suspend request pending).
+pm_request_resume() is used the to queue up a resume request for the device
+and it increments the device's resume counter if the request has been queued up
+successfully.  Internally both of them call __pm_request_resume() that first
+increments the device's resume counter in the pm_request_resume_get() case and
+then proceeds as follows.
+
+* If the run-time PM status of the device is RPM_ACTIVE or the only bit set in
+  it is RPM_WAKE (i.e. the idle notification has been queued up for the device
+  by pm_runtime_put()), -EBUSY is returned.
+
+* If the device is resuming or has a resume request pending (i.e. at least one
+  of the RPM_WAKE and RPM_RESUMING bits is set in the device's run-time PM
+  status field, but RPM_WAKE is not the only bit set), -EINPROGRESS is returned.
+
+* If the device's run-time status is RPM_IDLE (i.e. a suspend request is pending
+  for it) and the 'power.suspend_aborted' flag is set (i.e. the pending request
+  is being cancelled), -EBUSY is returned.
+
+* If the device's run-time status is RPM_IDLE (i.e. a suspend request is pending
+  for it) and the 'power.suspend_aborted' flag is not set, the device's
+  'power.suspend_aborted' flag is set, a request to cancel the pending
+  suspend request is queued up and -EBUSY is returned.
+
+If none of the above happens, the function checks if the device's run-time PM
+status is RPM_SUSPENDED and if it has a parent, in which case the parent's
+counter of unsuspended children is incremented.  Next, the RPM_WAKE bit is set
+in the device's run-time PM status field and the request to execute
+__pm_runtime_resume() is put into pm_wq (the device's resume counter is then
+incremented in the pm_request_resume() case).  Finally, the function returns 0,
+which means that the resume request has been successfully queued up.
+
+pm_request_resume_get() leaves the device's resume counter incremented even if
+an error code is returned.  Thus, after pm_request_resume_get() has returned, it
+is necessary to decrement the device's resume counter, with the help of
+pm_runtime_put(), before it's possible to suspend the device again.
+
+It is valid to call pm_request_resume() and pm_request_resume_get() from
+interrupt context.
+
+Note that it usually is _not_ safe to access the device for I/O purposes
+immediately after pm_request_resume() has returned, unless the returned result
+is -EBUSY, which means that it wasn't necessary to resume the device.
+
+Note also that only one suspend request or one resume request may be queued up
+at any given moment.  Moreover, a resume request cannot be queued up along with
+a suspend request.  Still, if it's necessary to queue up a request to cancel a
+pending suspend request, these two requests will be present in pm_wq at the
+same time.  In that case, regardless of which request is attempted to complete
+first, the device's run-time PM status will be set to RPM_ACTIVE as a final
+result.
+
+pm_suspend_possible() is used to check if the device may be suspended at this
+particular moment.  It checks the device's resume counter, the counter of
+unsuspended children, and the run-time PM status.  It returns 'false' if any of
+the counters is greater than 0 or the RPM_WAKE bit is set in the device's
+run-time PM status field.  Otherwise, 'true' is returned.
+
+pm_runtime_enable() and pm_runtime_disable() are used to enable and disable,
+respectively, all of the run-time PM core operations.  For this purpose
+pm_runtime_disable() calls pm_runtime_resume() to put the device into the
+active state, sets the RPM_WAKE bit in the device's run-time PM status field
+and increments the device's resume counter.  In turn, pm_runtime_enable() resets
+the RPM_WAKE bit and decrements the device's resume counter.  Therefore, if
+pm_runtime_disable() is called several times in a row for the same device, it
+has to be balanced by the appropriate number of pm_runtime_enable() calls so
+that the other run-time PM core functions work for that device.  The initial
+values of the device's resume counter and run-time PM status, as set by
+pm_runtime_init(), are 1 and RPM_WAKE, respectively (i.e. the device's run-time
+PM is initially disabled).
+
+pm_runtime_disable() and pm_runtime_enable() are used by the device core to
+disable the run-time power management of devices temporarily during device probe
+and removal as well as during system-wide power transitions (i.e. system-wide
+suspend or hibernation, or resume from a system sleep state).
+
+pm_suspend_ignore_children() is used to set or unset the
+'power.ignore_children' flag in 'struct device'.  If the 'enabled'
+argument is 'true', the field is set to 1, and if 'enable' is 'false', the field
+is set to 0.  The default value of 'power.ignore_children', as set by
+pm_runtime_init(), is 0.
+
+pm_runtime_clear_active() is used to change the device's run-time PM status
+field from RPM_ERROR to RPM_ACTIVE.  It is valid to call this function from
+interrupt context.
+
+pm_runtime_clear_suspended() is used to change the device's run-time PM status
+field from RPM_ERROR to RPM_SUSPENDED.  If the device has a parent, the function
+additionally decrements the parent's counter of unsuspended children, although
+the parent's bus type is not notified if the counter becomes 0.  It is valid to
+call this function from interrupt context.
+
+3. Device Run-time PM Callbacks
+
+There are three device run-time PM callbacks defined in 'struct dev_pm_ops':
+
+struct dev_pm_ops {
+	...
+	int (*runtime_suspend)(struct device *dev);
+	int (*runtime_resume)(struct device *dev);
+	void (*runtime_idle)(struct device *dev);
+	...
+};
+
+The ->runtime_suspend() callback is executed by pm_runtime_suspend() for the bus
+type of the device being suspended.  The bus type's callback is then _entirely_
+_responsible_ for handling the device as appropriate, which may, but need not
+include executing the device driver's ->runtime_suspend() callback (from the PM
+core's point of view it is not necessary to implement a ->runtime_suspend()
+callback in a device driver as long as the bus type's ->runtime_suspend() knows
+what to do to handle the device).
+
+  * Once the bus type's ->runtime_suspend() callback has returned successfully,
+    the PM core regards the device as suspended, which need not mean that the
+    device has been put into a low power state.  It is supposed to mean,
+    however, that the device will not communicate with the CPU(s) and RAM until
+    the bus type's ->runtime_resume() callback is executed for it.
+
+  * If the bus type's ->runtime_suspend() callback returns -EBUSY or -EAGAIN,
+    the device's run-time PM status is set to RPM_ACTIVE, which means that the
+    device _must_ be fully operational one this has happened.
+
+  * If the bus type's ->runtime_suspend() callback returns an error code
+    different from -EBUSY or -EAGAIN, the PM core regards this as an
+    unrecoverable error and will refuse to run the helper functions described in
+    Section 1 until the status is changed with the help of either
+    pm_runtime_clear_active(), or pm_runtime_clear_suspended() by the device's
+    bus type or driver.
+
+In particular, it is recommended that ->runtime_suspend() return -EBUSY or
+-EAGAIN if device_may_wakeup() returns 'false' for the device.  On the other
+hand, if device_may_wakeup() returns 'true' for the device and the device is put
+into a low power state during the execution of ->runtime_suspend(), it is
+expected that remote wake-up (i.e. hardware mechanism allowing the device to
+request a change of its power state, such as PCI PME) will be enabled for the
+device.  Generally, remote wake-up should be enabled whenever the device is put
+into a low power state at run time and is expected to receive input from the
+outside of the system.
+
+The ->runtime_resume() callback is executed by pm_runtime_resume() for the bus
+type of the device being woken up.  The bus type's callback is then _entirely_
+_responsible_ for handling the device as appropriate, which may, but need not
+include executing the device driver's ->runtime_resume() callback (from the PM
+core's point of view it is not necessary to implement a ->runtime_resume()
+callback in a device driver as long as the bus type's ->runtime_resume() knows
+what to do to handle the device).
+
+  * Once the bus type's ->runtime_resume() callback has returned successfully,
+    the PM core regards the device as fully operational, which means that the
+    device _must_ be able to complete I/O operations as needed.
+
+  * If the bus type's ->runtime_resume() callback returns an error code, the PM
+    core regards this as an unrecoverable error and will refuse to run the
+    helper functions described in Section 1 until the status is changed with the
+    help of either pm_runtime_clear_active(), or pm_runtime_clear_suspended() by
+    the device's bus type or driver.
+
+The ->runtime_idle() callback is executed by pm_runtime_suspend() for the bus
+type of a device the children of which are all suspended (and which has the
+'power.ignore_children' flag unset).  It also is executed if a device's resume
+counter is decremented with the help of pm_runtime_put() and it becomes 0.  The
+action carried out by this callback is totally dependent on the bus type in
+question, but the expected and recommended action is to check if the device can
+be suspended (i.e. if all of the conditions necessary for suspending the device
+are met) and to queue up a suspend request for the device if that is the case.
Index: linux-2.6/drivers/base/power/power.h
===================================================================
--- linux-2.6.orig/drivers/base/power/power.h
+++ linux-2.6/drivers/base/power/power.h
@@ -1,8 +1,3 @@
-static inline void device_pm_init(struct device *dev)
-{
-	dev->power.status = DPM_ON;
-}
-
 #ifdef CONFIG_PM_SLEEP
 
 /*
@@ -16,14 +11,16 @@ static inline struct device *to_device(s
 	return container_of(entry, struct device, power.entry);
 }
 
+extern void device_pm_init(struct device *dev);
 extern void device_pm_add(struct device *);
 extern void device_pm_remove(struct device *);
 extern void device_pm_move_before(struct device *, struct device *);
 extern void device_pm_move_after(struct device *, struct device *);
 extern void device_pm_move_last(struct device *);
 
-#else /* CONFIG_PM_SLEEP */
+#else /* !CONFIG_PM_SLEEP */
 
+static inline void device_pm_init(struct device *dev) {}
 static inline void device_pm_add(struct device *dev) {}
 static inline void device_pm_remove(struct device *dev) {}
 static inline void device_pm_move_before(struct device *deva,
@@ -32,7 +29,7 @@ static inline void device_pm_move_after(
 					struct device *devb) {}
 static inline void device_pm_move_last(struct device *dev) {}
 
-#endif
+#endif /* !CONFIG_PM_SLEEP */
 
 #ifdef CONFIG_PM
 

^ permalink raw reply	[flat|nested] 102+ messages in thread

end of thread, other threads:[~2009-07-05 21:47 UTC | newest]

Thread overview: 102+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-06-22 23:21 [PATCH] PM: Introduce core framework for run-time PM of I/O devices (rev. 3) Rafael J. Wysocki
2009-06-23 17:00 ` Rafael J. Wysocki
2009-06-23 17:00 ` Rafael J. Wysocki
2009-06-23 17:10 ` Alan Stern
2009-06-23 17:10 ` Alan Stern
2009-06-24  0:08   ` Rafael J. Wysocki
2009-06-24  0:08   ` Rafael J. Wysocki
2009-06-24  0:36     ` [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 4) Rafael J. Wysocki
2009-06-24 19:24       ` [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 5) Rafael J. Wysocki
2009-06-24 21:30         ` Alan Stern
2009-06-24 21:30         ` Alan Stern
2009-06-25 16:49           ` Alan Stern
2009-06-25 16:49           ` [linux-pm] " Alan Stern
2009-06-25 21:58             ` Rafael J. Wysocki
2009-06-25 21:58             ` [linux-pm] " Rafael J. Wysocki
2009-06-25 23:17               ` Rafael J. Wysocki
2009-06-25 23:17               ` Rafael J. Wysocki
2009-06-26 18:06               ` Alan Stern
2009-06-26 18:06               ` [linux-pm] " Alan Stern
2009-06-26 20:46                 ` [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6) Rafael J. Wysocki
2009-06-26 20:46                 ` Rafael J. Wysocki
2009-06-26 21:13                   ` Alan Stern
2009-06-26 22:32                     ` Rafael J. Wysocki
2009-06-27  1:25                       ` Alan Stern
2009-06-27  1:25                       ` Alan Stern
2009-06-27 14:51                       ` Alan Stern
2009-06-27 21:51                         ` [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 7) Rafael J. Wysocki
2009-06-27 21:51                         ` Rafael J. Wysocki
2009-06-27 14:51                       ` [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6) Alan Stern
2009-06-26 22:32                     ` Rafael J. Wysocki
2009-06-28 10:25                     ` Rafael J. Wysocki
2009-06-28 21:07                       ` Alan Stern
2009-06-28 21:07                       ` Alan Stern
2009-06-29  0:15                         ` Rafael J. Wysocki
2009-06-29  0:15                         ` Rafael J. Wysocki
2009-06-29  3:05                           ` Alan Stern
2009-06-29 14:09                             ` Rafael J. Wysocki
2009-06-29 14:09                             ` Rafael J. Wysocki
2009-06-29 14:29                               ` Alan Stern
2009-06-29 14:54                                 ` Rafael J. Wysocki
2009-06-29 14:54                                 ` Rafael J. Wysocki
2009-06-29 15:27                                   ` Alan Stern
2009-06-29 15:27                                   ` Alan Stern
2009-06-29 15:55                                     ` Rafael J. Wysocki
2009-06-29 15:55                                     ` Rafael J. Wysocki
2009-06-29 16:10                                       ` Alan Stern
2009-06-29 16:39                                         ` Rafael J. Wysocki
2009-06-29 16:39                                         ` Rafael J. Wysocki
2009-06-29 17:29                                           ` Alan Stern
2009-06-29 17:29                                           ` Alan Stern
2009-06-29 18:25                                             ` Rafael J. Wysocki
2009-06-29 18:25                                             ` Rafael J. Wysocki
2009-06-29 19:25                                               ` Alan Stern
2009-06-29 21:04                                                 ` Rafael J. Wysocki
2009-06-29 22:00                                                   ` Alan Stern
2009-06-29 22:50                                                     ` Rafael J. Wysocki
2009-06-29 22:50                                                     ` Rafael J. Wysocki
2009-06-30 15:10                                                       ` Alan Stern
2009-06-30 15:10                                                       ` Alan Stern
2009-06-30 22:30                                                         ` [RFC] Run-time PM framework (was: Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)) Rafael J. Wysocki
2009-06-30 22:30                                                         ` Rafael J. Wysocki
2009-07-01 15:35                                                           ` Alan Stern
2009-07-01 22:19                                                             ` Rafael J. Wysocki
2009-07-02 15:42                                                               ` Rafael J. Wysocki
2009-07-02 15:42                                                               ` Rafael J. Wysocki
2009-07-02 15:55                                                               ` Alan Stern
2009-07-02 15:55                                                               ` Alan Stern
2009-07-02 17:50                                                                 ` Rafael J. Wysocki
2009-07-02 19:53                                                                   ` Alan Stern
2009-07-02 19:53                                                                   ` Alan Stern
2009-07-02 23:05                                                                     ` Rafael J. Wysocki
2009-07-02 23:05                                                                     ` Rafael J. Wysocki
2009-07-03 20:58                                                                       ` Alan Stern
2009-07-03 20:58                                                                       ` Alan Stern
2009-07-03 23:57                                                                         ` Rafael J. Wysocki
2009-07-04  3:12                                                                           ` Alan Stern
2009-07-04  3:12                                                                           ` Alan Stern
2009-07-04 21:27                                                                             ` Rafael J. Wysocki
2009-07-04 21:27                                                                             ` Rafael J. Wysocki
2009-07-05 14:50                                                                               ` Alan Stern
2009-07-05 14:50                                                                               ` Alan Stern
2009-07-05 21:47                                                                                 ` Rafael J. Wysocki
2009-07-05 21:47                                                                                 ` Rafael J. Wysocki
2009-07-03 23:57                                                                         ` Rafael J. Wysocki
2009-07-02 17:50                                                                 ` Rafael J. Wysocki
2009-07-01 22:19                                                             ` Rafael J. Wysocki
2009-07-01 15:35                                                           ` Alan Stern
2009-06-29 21:04                                                 ` [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6) Rafael J. Wysocki
2009-06-29 14:29                               ` Alan Stern
2009-06-29  3:05                           ` Alan Stern
2009-06-28 10:25                     ` Rafael J. Wysocki
2009-06-26 21:13                   ` Alan Stern
2009-06-26 21:49           ` [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 5) Rafael J. Wysocki
2009-06-26 21:49           ` Rafael J. Wysocki
2009-06-25 14:57         ` Magnus Damm
2009-06-25 14:57           ` Magnus Damm
2009-06-26 22:02           ` Rafael J. Wysocki
2009-06-26 22:02           ` Rafael J. Wysocki
2009-06-25 14:57         ` Magnus Damm
2009-06-24 19:24       ` Rafael J. Wysocki
2009-06-24  0:36     ` [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 4) Rafael J. Wysocki
  -- strict thread matches above, loose matches on Subject: below --
2009-06-22 23:21 [PATCH] PM: Introduce core framework for run-time PM of I/O devices (rev. 3) Rafael J. Wysocki

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.