All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC] Fix Device Power Management States
@ 2004-08-09 10:43 Patrick Mochel
  2004-08-09 11:38 ` Pavel Machek
                   ` (4 more replies)
  0 siblings, 5 replies; 42+ messages in thread
From: Patrick Mochel @ 2004-08-09 10:43 UTC (permalink / raw)
  To: linux-kernel; +Cc: pavel, benh, david-b


Ok, the patch below attempts to fix up the device power management
handling, taking into account (hopefully) everything that has been said
over the last week+, and lessons learned over the years. It's only been
compile-tested, and is meant just to prove that the framework is possible.
There are likely to be some missing pieces, mainly because it's late. :)

The highlights are that it adds:

- To struct class:

	int     (*dev_start)(struct class_device * dev);
	int     (*dev_stop)(struct class_device * dev);

- ->dev_stop() and ->dev_start() to struct class
  This provides the framework to shutdown a device from a functional
  level, rather than at a hardware level, as well as the entry points
  to stop/start ALL devices in the system.

  This is implemented by iterating through the list of struct classes,
  then through each of their struct class_device's. The class_device is
  the only argument to those functions.


- To struct bus_type:
	int             (*pm_save)(struct device *, struct pm_state *);
	int             (*pm_restore)(struct device *);

  These methods provide the interface for saving and restoring low-
  level device state only. The intent is to remove the callbacks from
  struct device_driver, and unconditionally call through the bus. (That's
  what all buses with drivers that implement those functions do today.)

  The methods are called for each device in the global device list, in
  a reverse order (so children are saved before parents).

- To struct bus_type:

	int             (*pm_power)(struct device *, struct pm_state *);

  This method is used to do all power transitions, including both
  shutdown/reset and device power management states.


The 2nd argument to ->pm_save() and ->pm_power() is determined by mapping
an enumerated system power state to a static array of 'struct pm_state'
pointers, that is in dev->power.pm_system. The pointers in that array
point to entries in dev->power.pm_supports, which is an array of all the
device power states that device supports. Please see include/linux/pm.h in
the patch below. These arrays should be initialized by the bus, though
they can be fixed up by the individual drivers, should they have a
different set of states they support, or given user policy.

The actual value of the 'struct pm_state' instances is driver-specific,
though again, the bus drivers should provide the set of power states valid
for that bus. For example:


struct pm_state {
        char    * name;
        u32     state;
};

#define decl_state(n, s) \
        struct pm_state pm_state_##n = {        \
                .name   = __stringify(n),       \
                .state  = s,                    \
        }

drivers/pci/ should do:

decl_state(d0, PCI_PM_CAP_PME_D0);
EXPORT_SYMBOL(pm_state_d0);

decl_state(d1, PCI_PM_CAP_PME_D1);
EXPORT_SYMBOL(pm_state_d1);

decl_state(d2, PCI_PM_CAP_PME_D2);
EXPORT_SYMBOL(pm_state_d2);

decl_state(d3, PCI_PM_CAP_PME_D3);
EXPORT_SYMBOL(pm_state_d3);


This provides a meaningful tag to each state that is completely local to
the bus, but can be handled easily by the core.

To handle runtime power management, a set of sysfs files will be created,
inclding 'current' and 'supports'. The latter will display all the
possible states the device supports, as specified its ->power.pm_supports
array, in an attractive string format. 'current' will display the current
power state, as an ASCII string. Writing to this file will look up the
state requested in ->pm_supports, and if found, translate that into the
device-specific power state and suspend the device.

The patches to implement that, as well as initial PCI support and system
power support, will hopefuly eek out in the next week..

Thanks,


	Pat

===== drivers/base/class.c 1.51 vs edited =====
--- 1.51/drivers/base/class.c	2004-06-10 02:35:56 -07:00
+++ edited/drivers/base/class.c	2004-08-09 01:04:23 -07:00
@@ -69,7 +69,7 @@
 };

 /* Hotplug events for classes go to the class_obj subsys */
-static decl_subsys(class, &ktype_class, NULL);
+decl_subsys(class, &ktype_class, NULL);


 int class_create_file(struct class * cls, const struct class_attribute * attr)
===== drivers/base/core.c 1.86 vs edited =====
--- 1.86/drivers/base/core.c	2004-07-07 17:55:49 -07:00
+++ edited/drivers/base/core.c	2004-08-09 02:35:03 -07:00
@@ -19,7 +19,6 @@
 #include <asm/semaphore.h>

 #include "base.h"
-#include "power/power.h"

 int (*platform_notify)(struct device * dev) = NULL;
 int (*platform_notify_remove)(struct device * dev) = NULL;
@@ -227,8 +226,6 @@

 	if ((error = kobject_add(&dev->kobj)))
 		goto Error;
-	if ((error = device_pm_add(dev)))
-		goto PMError;
 	if ((error = bus_add_device(dev)))
 		goto BusError;
 	down_write(&devices_subsys.rwsem);
@@ -243,8 +240,6 @@
 	put_device(dev);
 	return error;
  BusError:
-	device_pm_remove(dev);
- PMError:
 	kobject_del(&dev->kobj);
  Error:
 	if (parent)
@@ -326,7 +321,6 @@
 	if (platform_notify_remove)
 		platform_notify_remove(dev);
 	bus_remove_device(dev);
-	device_pm_remove(dev);
 	kobject_del(&dev->kobj);
 	if (parent)
 		put_device(parent);
===== drivers/base/power/Makefile 1.6 vs edited =====
--- 1.6/drivers/base/power/Makefile	2004-03-09 14:25:37 -08:00
+++ edited/drivers/base/power/Makefile	2004-08-09 02:33:17 -07:00
@@ -1,5 +1,5 @@
 obj-y			:= shutdown.o
-obj-$(CONFIG_PM)	+= main.o suspend.o resume.o runtime.o sysfs.o
+obj-$(CONFIG_PM)	+= class.o state.o

 ifeq ($(CONFIG_DEBUG_DRIVER),y)
 EXTRA_CFLAGS += -DDEBUG
===== drivers/base/power/class.c 1.2 vs edited =====
--- 1.2/drivers/base/power/class.c	2004-08-09 01:00:35 -07:00
+++ edited/drivers/base/power/class.c	2004-08-09 01:37:10 -07:00
@@ -0,0 +1,160 @@
+/*
+ *	drivers/base/power/class.c - Class-based device power management
+ *
+ *
+ * Class power management provides the ability to stop/start a device from a
+ * functional point of view. By doing this at a high level, we can be sure that
+ * no processes are touching the hardware, so we can be free to save its state
+ * and power it down (or vice versa).
+ *
+ *
+ * Class power management relies on the class implementing two methods:
+ *
+ * int (*dev_stop)(struct class_device *);
+ *
+ *	Quiesce the device. Finish, stop and/or prevent any future transactions
+ *	from being passed to the device. Prevent it from obtaining any new
+ *	connections, etc.
+ *
+ * int (*dev_start)(struct class_device *);
+ *
+ * 	Start the device. Restart any stopped transactions. Add it back to
+ *	various queues. Make it available for future use.
+ *
+ * Both methods are called with interrupts enabled, and the class is free to
+ * sleep. It should be noted, however, that during a system suspend transition,
+ * the order in which devices are suspended is not guaranteed. So, all swap
+ * devices could be stopped before any given device is, making it impossible
+ * to allocate large amounts of memory.
+ *
+ *
+ * These methods may be called either during a suspend transition or during a
+ * runtime device power transition for a particular device. The implementation
+ * of these methods should not depend on which it is, and is not passed that
+ * information.
+ */
+
+#define PM_DEBUG
+#ifdef PM_DEBUG
+#define DEBUG
+#endif
+
+#include <linux/device.h>
+
+
+extern struct subsystem class_subsys;
+
+int class_dev_stop(struct class_device * dev)
+{
+	int error = 0;
+
+	pr_debug("\t\t%s\n",dev->class_id);
+	if (dev->running) {
+		error = dev->class->dev_stop(dev);
+		if (!error)
+			dev->running = 0;
+	}
+	return error;
+}
+
+
+int class_dev_start(struct class_device * dev)
+{
+	int error = 0;
+
+	pr_debug("\t\t%s\n",dev->class_id);
+	if (!dev->running) {
+		error = dev->class->dev_start(dev);
+		if (!error)
+			dev->running = 1;
+	}
+	return 0;
+}
+
+
+static int __class_start(struct class * cls, struct class_device * cd)
+{
+	struct list_head * head = &cls->children;
+	int error = 0;
+
+	pr_debug("\t%s\n",cls->name);
+	cd = list_prepare_entry(cd, head, kobj.entry);
+	list_for_each_entry_continue(cd, head, kobj.entry) {
+		error = class_dev_start(cd);
+		if (error)
+			break;
+	}
+	return error;
+}
+
+static int class_start(struct class * cls)
+{
+	int error;
+
+	down_read(&cls->subsys.rwsem);
+	error = __class_start(cls, NULL);
+	up_read(&cls->subsys.rwsem);
+	return error;
+}
+
+static int class_stop(struct class * cls)
+{
+	struct class_device * cd;
+	int error = 0;
+
+	pr_debug("\t%s\n",cls->name);
+
+	down_read(&cls->subsys.rwsem);
+	list_for_each_entry(cd, &cls->children, kobj.entry) {
+		error = class_dev_stop(cd);
+		if (error) {
+			__class_start(cls, cd);
+			break;
+		}
+	}
+	up_read(&cls->subsys.rwsem);
+	return error;
+}
+
+
+static int __device_start(struct class * cls)
+{
+	int error = 0;
+	struct list_head * head = &class_subsys.kset.list;
+
+	cls = list_prepare_entry(cls, head, subsys.kset.kobj.entry);
+	list_for_each_entry_continue(cls, head, subsys.kset.kobj.entry) {
+		if ((error = class_start(cls)))
+			break;
+	}
+	return error;
+}
+
+int device_start(void)
+{
+	int error;
+
+	pr_debug("Starting Classes:\n");
+	down_read(&class_subsys.rwsem);
+	error = __device_start(NULL);
+	down_write(&class_subsys.rwsem);
+	return error;
+}
+
+int device_stop(void)
+{
+	int error = 0;
+	struct class * cls;
+
+	pr_debug("Stopping Classes:\n");
+	down_read(&class_subsys.rwsem);
+	list_for_each_entry_reverse(cls, &class_subsys.kset.list, subsys.kset.kobj.entry) {
+		error = class_stop(cls);
+		if (error) {
+			__device_start(cls);
+			break;
+		}
+	}
+	up_read(&class_subsys.rwsem);
+	return error;
+}
===== drivers/base/power/power.c 1.1 vs edited =====
--- 1.1/drivers/base/power/power.c	2004-08-09 02:33:57 -07:00
+++ edited/drivers/base/power/power.c	2004-08-09 02:59:30 -07:00
@@ -0,0 +1,75 @@
+/*
+ *	drivers/base/power.c - low-level power management
+ *
+ *
+ */
+
+#include <linux/device.h>
+
+extern struct subsystem devices_subsys;
+
+
+
+int device_power_dev(struct device * dev, struct pm_state * state)
+{
+	int error;
+
+	if (!state)
+		return 0;
+	if (dev->power.pm_current == state)
+		return 0;
+	error = dev->bus->pm_power(dev, state);
+	if (!error)
+		dev->power.pm_current = state;
+	return error;
+}
+
+int device_power_default(struct device * dev)
+{
+	return device_power_dev(dev, dev->power.pm_default);
+}
+
+static int __device_power_up(struct device * dev)
+{
+	struct list_head * head = &devices_subsys.kset.list;
+	struct pm_state * state;
+	int error = 0;
+
+	dev = list_prepare_entry(dev, head, kobj.entry);
+
+	list_for_each_entry_continue(dev, head, kobj.entry) {
+		state = dev->power.pm_resume;
+		if ((error = device_power_dev(dev, state)))
+			break;
+	}
+	return error;
+}
+
+
+int device_power_up(void)
+{
+	int error;
+	down_read(&devices_subsys.rwsem);
+	error = __device_power_up(NULL);
+	up_read(&devices_subsys.rwsem);
+	return error;
+}
+
+int device_power_down(u32 sys_state)
+{
+	struct device * dev;
+	struct pm_state * state;
+	int error = 0;
+
+	down_read(&devices_subsys.rwsem);
+	list_for_each_entry_reverse(dev, &devices_subsys.kset.list, entry) {
+		state = dev->power.pm_system[sys_state];
+		error = device_power_dev(dev, state);
+		if (error) {
+			__device_power_up(dev);
+			break;
+		}
+	}
+	up_read(&devices_subsys.rwsem);
+	return error;
+}
===== drivers/base/power/power.h 1.7 vs edited =====
--- 1.7/drivers/base/power/power.h	2004-06-09 23:34:24 -07:00
+++ edited/drivers/base/power/power.h	2004-08-09 02:42:53 -07:00
@@ -34,17 +34,6 @@
 extern struct list_head dpm_off;
 extern struct list_head dpm_off_irq;

-
-static inline struct dev_pm_info * to_pm_info(struct list_head * entry)
-{
-	return container_of(entry, struct dev_pm_info, entry);
-}
-
-static inline struct device * to_device(struct list_head * entry)
-{
-	return container_of(to_pm_info(entry), struct device, power);
-}
-
 extern int device_pm_add(struct device *);
 extern void device_pm_remove(struct device *);

@@ -99,3 +88,13 @@
 }

 #endif
+
+extern int class_dev_stop(struct class_device *);
+extern int class_dev_start(struct class_device *);
+
+extern int device_save_dev(struct device *, struct pm_state *);
+extern int device_restore_dev(struct device *, struct pm_state *);
+
+extern int device_power_dev(struct device *, struct pm_state *);
+extern int device_power_default(struct device *);
+
===== drivers/base/power/shutdown.c 1.32 vs edited =====
--- 1.32/drivers/base/power/shutdown.c	2004-06-09 23:34:24 -07:00
+++ edited/drivers/base/power/shutdown.c	2004-08-09 02:51:17 -07:00
@@ -21,15 +21,7 @@

 int device_detach_shutdown(struct device * dev)
 {
-	if (!dev->detach_state)
-		return 0;
-
-	if (dev->detach_state == DEVICE_PM_OFF) {
-		if (dev->driver && dev->driver->shutdown)
-			dev->driver->shutdown(dev);
-		return 0;
-	}
-	return dpm_runtime_suspend(dev, dev->detach_state);
+	return device_power_default(dev);
 }


@@ -49,19 +41,7 @@
  */
 void device_shutdown(void)
 {
-	struct device * dev;
-
-	down_write(&devices_subsys.rwsem);
-	list_for_each_entry_reverse(dev, &devices_subsys.kset.list, kobj.entry) {
-		pr_debug("shutting down %s: ", dev->bus_id);
-		if (dev->driver && dev->driver->shutdown) {
-			pr_debug("Ok\n");
-			dev->driver->shutdown(dev);
-		} else
-			pr_debug("Ignored.\n");
-	}
-	up_write(&devices_subsys.rwsem);
-
+	device_power_down(pm_sys_SHUTDOWN);
 	sysdev_shutdown();
 }

===== drivers/base/power/state.c 1.1 vs edited =====
--- 1.1/drivers/base/power/state.c	2004-08-09 01:38:01 -07:00
+++ edited/drivers/base/power/state.c	2004-08-09 02:55:03 -07:00
@@ -0,0 +1,100 @@
+/*
+ *	drivers/base/power/state.c - Device state management
+ *
+ *
+ * When devices are suspended, either during runtime or during a system
+ * state transition, its driver must save a certain amount of state to be
+ * restored at a later time. This file implements the top-level functions
+ * to provide that service.
+ *
+ * We rely on the buses implementing two methods to accomplish this:
+ *
+ * int (*pm_save)(struct device * dev, u32 state);
+ *
+ *	Allocate memory and copy register state to it. The second parameter
+ *	specifies what state the device is going to enter, so the drivers
+ *	know how much state to save.
+ *
+ *
+ * int (*pm_restore)(struct device * dev);
+ *
+ *	Restore the device to the state saved with ->pm_save() and free memory
+ *	previously allocated. The drivers should check dev->power.prev_state
+ *	to determine what state the device was previously in.
+ *
+ *
+ * These functions are called with interrupts enabled, and the drivers are free
+ * to allocate memory to save device state. However, it's important to note that
+ * in the case of a system power state transition, no devices will be active
+ * during these calls, meaning no swap devices will be able to reclaim memory
+ * if there is little left.
+ *
+ */
+
+#include <linux/device.h>
+
+extern struct subsystem devices_subsys;
+
+int device_save_dev(struct device * dev, struct pm_state * state)
+{
+	return dev->bus->pm_save(dev, state);
+}
+
+
+int device_restore_dev(struct device * dev)
+{
+	return dev->bus->pm_restore(dev);
+}
+
+
+
+static int __device_restore(struct device * dev, u32 sys_state)
+{
+	struct list_head * head = &devices_subsys.kset.list;
+	int error = 0;
+
+	dev = list_prepare_entry(dev, head, kobj.entry);
+
+	list_for_each_entry_continue(dev, head, kobj.entry) {
+		if ((error = device_restore_dev(dev)))
+			break;
+	}
+	return error;
+}
+
+int device_restore(u32 sys_state)
+{
+	int error;
+
+	down_read(&devices_subsys.rwsem);
+	error = __device_restore(NULL, sys_state);
+	up_read(&devices_subsys.rwsem);
+	return error;
+}
+
+
+int device_save(u32 sys_state)
+{
+	int error = 0;
+	struct device * dev;
+	struct pm_state * state;
+
+	if (sys_state >= pm_sys_NUM)
+		return -EINVAL;
+
+	down_read(&devices_subsys.rwsem);
+	list_for_each_entry_reverse(dev, &devices_subsys.kset.list, kobj.entry) {
+		state = dev->power.pm_system[sys_state];
+		if (state && state != dev->power.pm_current) {
+			error = device_save_dev(dev, state);
+			if (!error)
+				dev->power.pm_resume = dev->power.pm_current;
+			else {
+				__device_restore(dev, sys_state);
+				break;
+			}
+		}
+	}
+	up_read(&devices_subsys.rwsem);
+	return error;
+}
===== include/linux/device.h 1.128 vs edited =====
--- 1.128/include/linux/device.h	2004-07-07 17:56:21 -07:00
+++ edited/include/linux/device.h	2004-08-09 02:44:19 -07:00
@@ -64,6 +64,10 @@
 				    int num_envp, char *buffer, int buffer_size);
 	int		(*suspend)(struct device * dev, u32 state);
 	int		(*resume)(struct device * dev);
+
+	int		(*pm_save)(struct device *, struct pm_state *);
+	int		(*pm_restore)(struct device *);
+	int		(*pm_power)(struct device *, struct pm_state *);
 };

 extern int bus_register(struct bus_type * bus);
@@ -156,6 +160,9 @@

 	void	(*release)(struct class_device *dev);
 	void	(*class_release)(struct class *class);
+
+	int	(*dev_start)(struct class_device * dev);
+	int	(*dev_stop)(struct class_device * dev);
 };

 extern int class_register(struct class *);
@@ -187,6 +194,7 @@
 	void			* class_data;	/* class-specific data */

 	char	class_id[BUS_ID_SIZE];	/* unique to this class */
+	int	running;		/* device is active or not */
 };

 static inline void *
===== include/linux/pm.h 1.16 vs edited =====
--- 1.16/include/linux/pm.h	2004-07-01 22:23:53 -07:00
+++ edited/include/linux/pm.h	2004-08-09 02:58:59 -07:00
@@ -222,6 +222,28 @@
 extern int pm_suspend(u32 state);


+enum {
+	pm_sys_ON = 1,
+	pm_sys_SHUTDOWN,
+	pm_sys_RESET,
+	pm_sys_STANDBY,
+	pm_sys_SUSPEND,
+	pm_sys_HIBERNATE,
+	pm_sys_NUM,
+};
+
+
+struct pm_state {
+	char	* name;
+	u32	state;
+};
+
+#define decl_state(n, s) \
+	struct pm_state pm_state_##n = {	\
+		.name	= __stringify(n),	\
+		.state	= s,			\
+	}
+
 /*
  * Device power management
  */
@@ -229,22 +251,29 @@
 struct device;

 struct dev_pm_info {
-	u32			power_state;
-#ifdef	CONFIG_PM
-	u32			prev_state;
-	u8			* saved_state;
-	atomic_t		pm_users;
-	struct device		* pm_parent;
-	struct list_head	entry;
-#endif
+	struct pm_state	* pm_current;
+	struct pm_state	** pm_supports;
+	struct pm_state	* pm_default;
+	struct pm_state	* pm_resume;
+	struct pm_state	* pm_system[pm_sys_NUM];
 };

+
 extern void device_pm_set_parent(struct device * dev, struct device * parent);

 extern int device_suspend(u32 state);
-extern int device_power_down(u32 state);
-extern void device_power_up(void);
 extern void device_resume(void);
+
+
+extern int device_stop(void);
+extern int device_start(void);
+
+extern int device_save(u32 sys_state);
+extern int device_restore(u32 sys_state);
+
+extern int device_power_up(void);
+extern int device_power_down(u32 sys_state);
+


 #endif /* __KERNEL__ */

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC] Fix Device Power Management States
  2004-08-09 10:43 [RFC] Fix Device Power Management States Patrick Mochel
@ 2004-08-09 11:38 ` Pavel Machek
  2004-08-09 16:02   ` Patrick Mochel
  2004-08-10  0:40 ` Benjamin Herrenschmidt
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 42+ messages in thread
From: Pavel Machek @ 2004-08-09 11:38 UTC (permalink / raw)
  To: Patrick Mochel; +Cc: linux-kernel, benh, david-b

Hi!

> Ok, the patch below attempts to fix up the device power management
> handling, taking into account (hopefully) everything that has been said
> over the last week+, and lessons learned over the years. It's only been
> compile-tested, and is meant just to prove that the framework is possible.
> There are likely to be some missing pieces, mainly because it's late. :)
> 
> The highlights are that it adds:
> 
> - To struct class:
> 
> 	int     (*dev_start)(struct class_device * dev);
> 	int     (*dev_stop)(struct class_device * dev);
> 
> - ->dev_stop() and ->dev_start() to struct class
>   This provides the framework to shutdown a device from a functional
>   level, rather than at a hardware level, as well as the entry points
>   to stop/start ALL devices in the system.

Hmm, that will need lots of new code to call these, right? Like
device_suspend() will now get device_stop() equivalent, etc....

>   This is implemented by iterating through the list of struct classes,
>   then through each of their struct class_device's. The class_device is
>   the only argument to those functions.

Aha, so you are saying these do not need to be done in hardware order?

Semantics of dev_stop is "may not do DMA and interrupts when stopped",
right?

> - To struct bus_type:
> 
> 	int             (*pm_power)(struct device *, struct pm_state *);
> 
>   This method is used to do all power transitions, including both
>   shutdown/reset and device power management states.
> 
> 
> The 2nd argument to ->pm_save() and ->pm_power() is determined by mapping
> an enumerated system power state to a static array of 'struct
> pm_state'

I do not think we want to see "u32 state" any more. This really needs
to be specified as "enum something" otherwise everyone will get it
wrong... again.

> pointers, that is in dev->power.pm_system. The pointers in that array
> point to entries in dev->power.pm_supports, which is an array of all the
> device power states that device supports. Please see include/linux/pm.h in
> the patch below. These arrays should be initialized by the bus, though
> they can be fixed up by the individual drivers, should they have a
> different set of states they support, or given user policy.

I'd simply pass system state down to the driver, and let them map as
they see fit... This seems to be way too complex. OTOH you are solving
'runtime suspend', too...


> ===== include/linux/pm.h 1.16 vs edited =====
> --- 1.16/include/linux/pm.h	2004-07-01 22:23:53 -07:00
> +++ edited/include/linux/pm.h	2004-08-09 02:58:59 -07:00
> @@ -222,6 +222,28 @@
>  extern int pm_suspend(u32 state);
> 
> 
> +enum {
> +	pm_sys_ON = 1,
> +	pm_sys_SHUTDOWN,
> +	pm_sys_RESET,
> +	pm_sys_STANDBY,
> +	pm_sys_SUSPEND,
> +	pm_sys_HIBERNATE,
> +	pm_sys_NUM,
> +};

I believe different state is needed for "quiesce for atomic copy" and
for "we are really going down to S4 now".


> +struct pm_state {
> +	char	* name;
> +	u32	state;
> +};

Could we get something more specific than u32 here?
> +
>  extern void device_pm_set_parent(struct device * dev, struct device * parent);
> 
>  extern int device_suspend(u32 state);
> -extern int device_power_down(u32 state);
> -extern void device_power_up(void);

Hmm, how we do "tell devices to shut off with interrupts disable", now?

>  extern void device_resume(void);
> +
> +
> +extern int device_stop(void);
> +extern int device_start(void);
> +
> +extern int device_save(u32 sys_state);
> +extern int device_restore(u32 sys_state);
> +
> +extern int device_power_up(void);
> +extern int device_power_down(u32 sys_state);

Make these u32's enums, please...
									Pavel
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!





^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC] Fix Device Power Management States
  2004-08-09 11:38 ` Pavel Machek
@ 2004-08-09 16:02   ` Patrick Mochel
  2004-08-09 21:29     ` Pavel Machek
                       ` (2 more replies)
  0 siblings, 3 replies; 42+ messages in thread
From: Patrick Mochel @ 2004-08-09 16:02 UTC (permalink / raw)
  To: Pavel Machek; +Cc: linux-kernel, benh, david-b


On Mon, 9 Aug 2004, Pavel Machek wrote:

> > - ->dev_stop() and ->dev_start() to struct class
> >   This provides the framework to shutdown a device from a functional
> >   level, rather than at a hardware level, as well as the entry points
> >   to stop/start ALL devices in the system.
>
> Hmm, that will need lots of new code to call these, right? Like
> device_suspend() will now get device_stop() equivalent, etc....

Uh, not really. During suspend-to-disk, you would call

	device_stop();
	device_save();
	<snapshot system>
	class_device_start(suspend_device);
	<write snapshot>
	class_device_stop(suspend_device);
	device_power_down(state);

> >   This is implemented by iterating through the list of struct classes,
> >   then through each of their struct class_device's. The class_device is
> >   the only argument to those functions.
>
> Aha, so you are saying these do not need to be done in hardware order?

AFAICT, no.

> Semantics of dev_stop is "may not do DMA and interrupts when stopped",
> right?

To be more precise, "device is not processing any transactions and will
not be used to submit more to". It's up to the class to remove it from any
queues, etc, so DMA never has a chance to begin.

> I do not think we want to see "u32 state" any more. This really needs
> to be specified as "enum something" otherwise everyone will get it
> wrong... again.

That's not the point, at least at this stage. Eventually, the u32 could be
replaced with an enum. But, remember why there is confusion - the value of
the system suspend level was also passed to the drivers as the device
suspend level. If you read the email again and look at the code, you will
see that drivers always get passed a 'struct pm_state', and the 'u32' are
only used between the suspend imeplementations and the driver core (which
is the re-mapper). I think compile-time checks are good, but we're in
bigger trouble if we can't get a half-dozen calls using the right value..

> > pointers, that is in dev->power.pm_system. The pointers in that array
> > point to entries in dev->power.pm_supports, which is an array of all the
> > device power states that device supports. Please see include/linux/pm.h in
> > the patch below. These arrays should be initialized by the bus, though
> > they can be fixed up by the individual drivers, should they have a
> > different set of states they support, or given user policy.
>
> I'd simply pass system state down to the driver, and let them map as
> they see fit... This seems to be way too complex. OTOH you are solving
> 'runtime suspend', too...

It's not too complex; it's simply that the driver core is the one
responsible between mapping a system suspend state to the device suspend
state, based on values that the driver knows a priori. If you push that
responsibility down to the drivers, you require all of them to implement
the same thing, causing code duplication, which means more copy-n-pasting,
and more of a chance to get it wrong. If we choose to do it once and right
in the driver core, the resulting drivers become simpler, since they only
have to respond to something that says "enter this power state, damnit"

On the other hand, if it's too complicated for you, I encourage you to
modify the patch or create a new one that solves all of the problems in a
simpler manner.

> > +enum {
> > +	pm_sys_ON = 1,
> > +	pm_sys_SHUTDOWN,
> > +	pm_sys_RESET,
> > +	pm_sys_STANDBY,
> > +	pm_sys_SUSPEND,
> > +	pm_sys_HIBERNATE,
> > +	pm_sys_NUM,
> > +};
>
> I believe different state is needed for "quiesce for atomic copy" and
> for "we are really going down to S4 now".

There is nothing fundamentally different at the functional level - you
don't want any devices fulfilling any request. Besides, by the time the
system is actually ready to be placed in S4, the devices have long-since
been stopped, and the class devices do not need another notification
beyond "stop"

Thanks,


	Pat


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC] Fix Device Power Management States
  2004-08-09 16:02   ` Patrick Mochel
@ 2004-08-09 21:29     ` Pavel Machek
  2004-08-10  5:03       ` Patrick Mochel
  2004-08-09 22:15     ` Nigel Cunningham
  2004-08-10  0:43     ` Benjamin Herrenschmidt
  2 siblings, 1 reply; 42+ messages in thread
From: Pavel Machek @ 2004-08-09 21:29 UTC (permalink / raw)
  To: Patrick Mochel; +Cc: linux-kernel, benh, david-b

Hi!

> Uh, not really. During suspend-to-disk, you would call
> 
> 	device_stop();
> 	device_save();
> 	<snapshot system>
> 	class_device_start(suspend_device);
> 	<write snapshot>
> 	class_device_stop(suspend_device);
> 	device_power_down(state);

Ok, I can live with that.

> > Semantics of dev_stop is "may not do DMA and interrupts when stopped",
> > right?
> 
> To be more precise, "device is not processing any transactions and will
> not be used to submit more to". It's up to the class to remove it from any
> queues, etc, so DMA never has a chance to begin.

Well, "no DMA" needs to be part of definition, too, because some
devices (USB) do DMA only if they have nothing to do.

> It's not too complex; it's simply that the driver core is the one
> responsible between mapping a system suspend state to the device suspend
> state, based on values that the driver knows a priori. If you push that
> responsibility down to the drivers, you require all of them to implement
> the same thing, causing code duplication, which means more
> copy-n-pasting,

...well, no, if device wants (for example) PCI state, it can call
to_pci_state() function from core. That should avoid code
duplication. OTOH if driver wants to do something more advanced, it
still can be done.

> and more of a chance to get it wrong. If we choose to do it once and right
> in the driver core, the resulting drivers become simpler, since they only
> have to respond to something that says "enter this power state, damnit"
> 
> On the other hand, if it's too complicated for you, I encourage you to
> modify the patch or create a new one that solves all of the problems in a
> simpler manner.

Well, if runtime suspend is the goal, your patch reflects that
complexity.

> > I believe different state is needed for "quiesce for atomic copy" and
> > for "we are really going down to S4 now".
> 
> There is nothing fundamentally different at the functional level - you
> don't want any devices fulfilling any request. Besides, by the time the
> system is actually ready to be placed in S4, the devices have long-since
> been stopped, and the class devices do not need another notification
> beyond "stop"

You are right, I was not reading carefully enough.

Anyway, there's still one problem:

if something like this gets merged, it will immediately break swsusp
because initially no drivers will have "stop" methods.

Passing system state down to drivers and having special "quiesce"
(as discussed in rather long thread) state has advantage of
automagicaly working on drivers that ignore u32 parameter of suspend
callback (and that's most of them). David's patches do not bring us
runtime suspend capabilities, but do not force us to go through all
the drivers, either...

You could perhaps make the code call suspend when no stop callback is
present... That should add a simple migration path... and would
probably work for me.
								Pavel
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC] Fix Device Power Management States
  2004-08-09 16:02   ` Patrick Mochel
  2004-08-09 21:29     ` Pavel Machek
@ 2004-08-09 22:15     ` Nigel Cunningham
  2004-08-10  0:43     ` Benjamin Herrenschmidt
  2 siblings, 0 replies; 42+ messages in thread
From: Nigel Cunningham @ 2004-08-09 22:15 UTC (permalink / raw)
  To: Patrick Mochel
  Cc: Pavel Machek, Linux Kernel Mailing List, Benjamin Herrenschmidt, david-b

Hi.

On Tue, 2004-08-10 at 02:02, Patrick Mochel wrote:
> On Mon, 9 Aug 2004, Pavel Machek wrote:
> 
> > > - ->dev_stop() and ->dev_start() to struct class
> > >   This provides the framework to shutdown a device from a functional
> > >   level, rather than at a hardware level, as well as the entry points
> > >   to stop/start ALL devices in the system.

Looks good to me. That said, I'll keep my 'device_tree' patch while I
wait for this to get read for use.

Nigel


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC] Fix Device Power Management States
  2004-08-09 10:43 [RFC] Fix Device Power Management States Patrick Mochel
  2004-08-09 11:38 ` Pavel Machek
@ 2004-08-10  0:40 ` Benjamin Herrenschmidt
  2004-08-10  4:55   ` Patrick Mochel
  2004-08-10 10:33 ` Matthew Garrett
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 42+ messages in thread
From: Benjamin Herrenschmidt @ 2004-08-10  0:40 UTC (permalink / raw)
  To: Patrick Mochel; +Cc: Linux Kernel list, Pavel Machek, David Brownell


> The highlights are that it adds:
> 
> - To struct class:
> 
> 	int     (*dev_start)(struct class_device * dev);
> 	int     (*dev_stop)(struct class_device * dev);
> 
> - ->dev_stop() and ->dev_start() to struct class
>   This provides the framework to shutdown a device from a functional
>   level, rather than at a hardware level, as well as the entry points
>   to stop/start ALL devices in the system.
> 
>   This is implemented by iterating through the list of struct classes,
>   then through each of their struct class_device's. The class_device is
>   the only argument to those functions.

Hrm... I don't agree, that iteration should be done in bus ordering too.

For example, if you stop operation of a USB host controller, you have to
do that after you have stopped operation of child devices. Same goes
with the ATA disk vs. controller. The ordering requirements for stopping
operations are the same as for PM
> 
> - To struct bus_type:
> 	int             (*pm_save)(struct device *, struct pm_state *);
> 	int             (*pm_restore)(struct device *);
> 
>   These methods provide the interface for saving and restoring low-
>   level device state only. The intent is to remove the callbacks from
>   struct device_driver, and unconditionally call through the bus. (That's
>   what all buses with drivers that implement those functions do today.)
> 
>   The methods are called for each device in the global device list, in
>   a reverse order (so children are saved before parents).

What about passing the previous state to restore ? could be useful...

> - To struct bus_type:
> 
> 	int             (*pm_power)(struct device *, struct pm_state *);
> 
>   This method is used to do all power transitions, including both
>   shutdown/reset and device power management states.

Who calls it ? It's the driver calling it's bus or what ? It make no
sense to power manage a device before suspending activity... I agree it
may be worth splitting dev_start/stop from PM transitions proper, that
would help dealing with various policies, however, there are still some
dependencies between those, and they all need to be tied to the bus
topology. 

> The 2nd argument to ->pm_save() and ->pm_power() is determined by mapping
> an enumerated system power state to a static array of 'struct pm_state'
> pointers, that is in dev->power.pm_system. The pointers in that array
> point to entries in dev->power.pm_supports, which is an array of all the
> device power states that device supports. Please see include/linux/pm.h in
> the patch below. These arrays should be initialized by the bus, though
> they can be fixed up by the individual drivers, should they have a
> different set of states they support, or given user policy.
> 
> The actual value of the 'struct pm_state' instances is driver-specific,
> though again, the bus drivers should provide the set of power states valid
> for that bus. For example:

I have to ponder that a bit more, but it looks like it's going to the
right direction. 

> struct pm_state {
>         char    * name;
>         u32     state;
> };
> 
> #define decl_state(n, s) \
>         struct pm_state pm_state_##n = {        \
>                 .name   = __stringify(n),       \
>                 .state  = s,                    \
>         }
> 
> drivers/pci/ should do:
> 
> decl_state(d0, PCI_PM_CAP_PME_D0);
> EXPORT_SYMBOL(pm_state_d0);
> 
> decl_state(d1, PCI_PM_CAP_PME_D1);
> EXPORT_SYMBOL(pm_state_d1);
> 
> decl_state(d2, PCI_PM_CAP_PME_D2);
> EXPORT_SYMBOL(pm_state_d2);
> 
> decl_state(d3, PCI_PM_CAP_PME_D3);
> EXPORT_SYMBOL(pm_state_d3);
> 
> 
> This provides a meaningful tag to each state that is completely local to
> the bus, but can be handled easily by the core.
> 
> To handle runtime power management, a set of sysfs files will be created,
> inclding 'current' and 'supports'. The latter will display all the
> possible states the device supports, as specified its ->power.pm_supports
> array, in an attractive string format. 'current' will display the current
> power state, as an ASCII string. Writing to this file will look up the
> state requested in ->pm_supports, and if found, translate that into the
> device-specific power state and suspend the device.
> 
> The patches to implement that, as well as initial PCI support and system
> power support, will hopefuly eek out in the next week..

What about partial tree ? We need to suspend childs first and we need to
tied PM transition with dev_start/stop (or have some way to indicate the
device we want it to auto-resume when it gets a request, or something).
We need to work out policy a bit more here I suppose...



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC] Fix Device Power Management States
  2004-08-09 16:02   ` Patrick Mochel
  2004-08-09 21:29     ` Pavel Machek
  2004-08-09 22:15     ` Nigel Cunningham
@ 2004-08-10  0:43     ` Benjamin Herrenschmidt
  2004-08-10  9:00       ` Russell King
  2004-08-10 10:08       ` Pavel Machek
  2 siblings, 2 replies; 42+ messages in thread
From: Benjamin Herrenschmidt @ 2004-08-10  0:43 UTC (permalink / raw)
  To: Patrick Mochel; +Cc: Pavel Machek, Linux Kernel list, David Brownell


> > Aha, so you are saying these do not need to be done in hardware order?
> 
> AFAICT, no.

As stated in my previous mail, I don't agree here... there are
dependencies that cannot be dealt otherwise. USB was an example
(ieee1394 is another), IDE is one, SCSI, i2c, whatever ... 

Of course, if we consider those "bus" drivers not to have class
and thus not to be stopped and only the "leaf" devices to get stopped,
that may work... I'm not sure we are not missing something there
though...

> > I believe different state is needed for "quiesce for atomic copy" and
> > for "we are really going down to S4 now".
> 
> There is nothing fundamentally different at the functional level - you
> don't want any devices fulfilling any request. Besides, by the time the
> system is actually ready to be placed in S4, the devices have long-since
> been stopped, and the class devices do not need another notification
> beyond "stop"
> 
> Thanks,
> 
> 
> 	Pat
-- 
Benjamin Herrenschmidt <benh@kernel.crashing.org>


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC] Fix Device Power Management States
  2004-08-10  0:40 ` Benjamin Herrenschmidt
@ 2004-08-10  4:55   ` Patrick Mochel
  2004-08-10  6:52     ` Benjamin Herrenschmidt
                       ` (2 more replies)
  0 siblings, 3 replies; 42+ messages in thread
From: Patrick Mochel @ 2004-08-10  4:55 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Linux Kernel list, Pavel Machek, David Brownell


On Tue, 10 Aug 2004, Benjamin Herrenschmidt wrote:

> >   This is implemented by iterating through the list of struct classes,
> >   then through each of their struct class_device's. The class_device is
> >   the only argument to those functions.
>
> Hrm... I don't agree, that iteration should be done in bus ordering too.
>
> For example, if you stop operation of a USB host controller, you have to
> do that after you have stopped operation of child devices. Same goes
> with the ATA disk vs. controller. The ordering requirements for stopping
> operations are the same as for PM

It's easy enough to change which order things get stopped/started in. What
matters more is the conceptual shift in responsibility for who
stops/starts the devices, or rather their interfaces.

It also requires a mapping from struct device -> struct class_device that
the drivers will have to initialize.

> What about passing the previous state to restore ? could be useful...

It's saved in dev->power.pm_resume, so drivers can check it.

> > - To struct bus_type:
> >
> > 	int             (*pm_power)(struct device *, struct pm_state *);
> >
> >   This method is used to do all power transitions, including both
> >   shutdown/reset and device power management states.
>
> Who calls it ? It's the driver calling it's bus or what ? It make no
> sense to power manage a device before suspending activity... I agree it
> may be worth splitting dev_start/stop from PM transitions proper, that
> would help dealing with various policies, however, there are still some
> dependencies between those, and they all need to be tied to the bus
> topology.

The driver core calls it in device_power_down() (as was in the patch ;),
in physical topological order. The ordering of the calls is up the power
management core, but it just wouldn't make sense to power down a device
that wasn't stopped. Would be easy enough to add a check for it..

Note it would make sense to power down a device without stopping, if the
device had no device driver bound to it (e.g. unclaimed devices that are
in D0 unnecessarily; or unclaimed devices that need to be powered down
during a suspend transition).

> What about partial tree ? We need to suspend childs first and we need to
> tied PM transition with dev_start/stop (or have some way to indicate the
> device we want it to auto-resume when it gets a request, or something).
> We need to work out policy a bit more here I suppose...

Policy can come later; we have to have a working model first.

As far as partial trees go, it can be done using the posted patch.  Think
about why you want to suspend/resume a partial tree - to use a particular
leaf device. You know what device it is, and by virtue of the driver
model, you know each of its ancestors. So, you walk the tree up to the
root, and restart all the way down. Then, you re-stop it all the way back
up. Should be ~10 lines of code that is left as an exercise against the
posted patch. :)


	Pat

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC] Fix Device Power Management States
  2004-08-09 21:29     ` Pavel Machek
@ 2004-08-10  5:03       ` Patrick Mochel
  2004-08-10  9:43         ` Nigel Cunningham
  2004-08-10 10:13         ` [RFC] Fix Device Power Management States Pavel Machek
  0 siblings, 2 replies; 42+ messages in thread
From: Patrick Mochel @ 2004-08-10  5:03 UTC (permalink / raw)
  To: Pavel Machek; +Cc: linux-kernel, benh, david-b


On Mon, 9 Aug 2004, Pavel Machek wrote:

> Well, "no DMA" needs to be part of definition, too, because some
> devices (USB) do DMA only if they have nothing to do.

I don't understand; that doesn't sound healthy.

> if something like this gets merged, it will immediately break swsusp
> because initially no drivers will have "stop" methods.
>
> Passing system state down to drivers and having special "quiesce"
> (as discussed in rather long thread) state has advantage of
> automagicaly working on drivers that ignore u32 parameter of suspend
> callback (and that's most of them). David's patches do not bring us
> runtime suspend capabilities, but do not force us to go through all
> the drivers, either...

Nothing is free. ;)

We've been talking about creating and merging a sane power management
model for 3+ years now. It's always been known that the drivers will have
to be modified to support a sane model. It's a fact of life. At some
point, we have to bite the bullet and do the work. I see that time rapidly
approaching.

I do not intend to merge a patch that will break swsusp in a stable
kernel. However, we do have this wonderful thing called the -mm tree in
which we can a) evolve the model, b) get large testing coverage and c)
solicit driver fixes.

Once the swsusp consolidation is merged upstream, I will merge a new
device power model in -mm, and we can start working on the drivers. How
does that sound?


	Pat

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC] Fix Device Power Management States
  2004-08-10  4:55   ` Patrick Mochel
@ 2004-08-10  6:52     ` Benjamin Herrenschmidt
  2004-08-10 10:07     ` Pavel Machek
  2004-08-10 19:41     ` David Brownell
  2 siblings, 0 replies; 42+ messages in thread
From: Benjamin Herrenschmidt @ 2004-08-10  6:52 UTC (permalink / raw)
  To: Patrick Mochel; +Cc: Linux Kernel list, Pavel Machek, David Brownell


> It's easy enough to change which order things get stopped/started in. What
> matters more is the conceptual shift in responsibility for who
> stops/starts the devices, or rather their interfaces.
> 
> It also requires a mapping from struct device -> struct class_device that
> the drivers will have to initialize.

Yup, but class devices don't follow the bus topology, do they ?

> > What about passing the previous state to restore ? could be useful...
> 
> It's saved in dev->power.pm_resume, so drivers can check it.
> 
> > Who calls it ? It's the driver calling it's bus or what ? It make no
> > sense to power manage a device before suspending activity... I agree it
> > may be worth splitting dev_start/stop from PM transitions proper, that
> > would help dealing with various policies, however, there are still some
> > dependencies between those, and they all need to be tied to the bus
> > topology.
> 
> The driver core calls it in device_power_down() (as was in the patch ;),
> in physical topological order. The ordering of the calls is up the power
> management core, but it just wouldn't make sense to power down a device
> that wasn't stopped. Would be easy enough to add a check for it..
> 
> Note it would make sense to power down a device without stopping, if the
> device had no device driver bound to it (e.g. unclaimed devices that are
> in D0 unnecessarily; or unclaimed devices that need to be powered down
> during a suspend transition).

Ok, just be careful with that as some "platform" devices may not have a
driver bound and still don't want to be powered down... but we could
create fake drivers...

> > What about partial tree ? We need to suspend childs first and we need to
> > tied PM transition with dev_start/stop (or have some way to indicate the
> > device we want it to auto-resume when it gets a request, or something).
> > We need to work out policy a bit more here I suppose...
> 
> Policy can come later; we have to have a working model first.
> 
> As far as partial trees go, it can be done using the posted patch.  Think
> about why you want to suspend/resume a partial tree - to use a particular
> leaf device. You know what device it is, and by virtue of the driver
> model, you know each of its ancestors. So, you walk the tree up to the
> root, and restart all the way down. Then, you re-stop it all the way back
> up. Should be ~10 lines of code that is left as an exercise against the
> posted patch. :)
> 
> 
> 	Pat
-- 
Benjamin Herrenschmidt <benh@kernel.crashing.org>


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC] Fix Device Power Management States
  2004-08-10  0:43     ` Benjamin Herrenschmidt
@ 2004-08-10  9:00       ` Russell King
  2004-08-10 10:08       ` Pavel Machek
  1 sibling, 0 replies; 42+ messages in thread
From: Russell King @ 2004-08-10  9:00 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Patrick Mochel, Pavel Machek, Linux Kernel list, David Brownell

On Tue, Aug 10, 2004 at 10:43:50AM +1000, Benjamin Herrenschmidt wrote:
> > > Aha, so you are saying these do not need to be done in hardware order?
> > 
> > AFAICT, no.
> 
> As stated in my previous mail, I don't agree here... there are
> dependencies that cannot be dealt otherwise. USB was an example
> (ieee1394 is another), IDE is one, SCSI, i2c, whatever ... 
> 
> Of course, if we consider those "bus" drivers not to have class
> and thus not to be stopped and only the "leaf" devices to get stopped,
> that may work... I'm not sure we are not missing something there
> though...

Would a PCMCIA bridge be a "leaf" device in that definition, despite
having child devices?

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 PCMCIA      - http://pcmcia.arm.linux.org.uk/
                 2.6 Serial core

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC] Fix Device Power Management States
  2004-08-10  5:03       ` Patrick Mochel
@ 2004-08-10  9:43         ` Nigel Cunningham
  2004-08-10 10:20           ` Pavel Machek
  2004-08-10 13:58           ` Patrick Mochel
  2004-08-10 10:13         ` [RFC] Fix Device Power Management States Pavel Machek
  1 sibling, 2 replies; 42+ messages in thread
From: Nigel Cunningham @ 2004-08-10  9:43 UTC (permalink / raw)
  To: Patrick Mochel
  Cc: Pavel Machek, Linux Kernel Mailing List, Benjamin Herrenschmidt, david-b

Hi.

On Tue, 2004-08-10 at 15:03, Patrick Mochel wrote:
> Once the swsusp consolidation is merged upstream, I will merge a new
> device power model in -mm, and we can start working on the drivers. How
> does that sound?

Do you want me to merge before or after all this is done; I'm a bit
concerned that you guys are expending effort (well, Pavel is), getting
SMP and Highmem going when I already have a working version that -
unless the plans have changed - we were intending to merge too.

Nigel


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC] Fix Device Power Management States
  2004-08-10  4:55   ` Patrick Mochel
  2004-08-10  6:52     ` Benjamin Herrenschmidt
@ 2004-08-10 10:07     ` Pavel Machek
  2004-08-10 14:28       ` Patrick Mochel
  2004-08-10 19:41     ` David Brownell
  2 siblings, 1 reply; 42+ messages in thread
From: Pavel Machek @ 2004-08-10 10:07 UTC (permalink / raw)
  To: Patrick Mochel; +Cc: Benjamin Herrenschmidt, Linux Kernel list, David Brownell

Hi!

> > >   This is implemented by iterating through the list of struct classes,
> > >   then through each of their struct class_device's. The class_device is
> > >   the only argument to those functions.
> >
> > Hrm... I don't agree, that iteration should be done in bus ordering too.
> >
> > For example, if you stop operation of a USB host controller, you have to
> > do that after you have stopped operation of child devices. Same goes
> > with the ATA disk vs. controller. The ordering requirements for stopping
> > operations are the same as for PM
> 
> It's easy enough to change which order things get stopped/started in. What
> matters more is the conceptual shift in responsibility for who
> stops/starts the devices, or rather their interfaces.

Can you explain why this class-based quiescing is good idea? It seems
to me that "quiesce this tree" is pretty much same as "suspend this
tree", and can be handled in the same way.

Nigel wanted to do class-based quiescing, but if we make quiescing
fast enough, it should be okay to do whole tree, always. (And I
believe quiescing *can* be fast enough).

> The driver core calls it in device_power_down() (as was in the patch ;),
> in physical topological order. The ordering of the calls is up the power
> management core, but it just wouldn't make sense to power down a device
> that wasn't stopped. Would be easy enough to add a check for it..

BUG_ON() would be welcome, otherwise someone will get it wrong.
								Pavel

-- 
Horseback riding is like software...
...vgf orggre jura vgf serr.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC] Fix Device Power Management States
  2004-08-10  0:43     ` Benjamin Herrenschmidt
  2004-08-10  9:00       ` Russell King
@ 2004-08-10 10:08       ` Pavel Machek
  1 sibling, 0 replies; 42+ messages in thread
From: Pavel Machek @ 2004-08-10 10:08 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Patrick Mochel, Linux Kernel list, David Brownell

Hi!

> > > Aha, so you are saying these do not need to be done in hardware order?
> > 
> > AFAICT, no.
> 
> As stated in my previous mail, I don't agree here... there are
> dependencies that cannot be dealt otherwise. USB was an example
> (ieee1394 is another), IDE is one, SCSI, i2c, whatever ... 
> 
> Of course, if we consider those "bus" drivers not to have class
> and thus not to be stopped and only the "leaf" devices to get stopped,
> that may work... I'm not sure we are not missing something there
> though...

Stopping only leafs is *not* enough as usb host controllers do DMA.

									Pavel
-- 
Horseback riding is like software...
...vgf orggre jura vgf serr.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC] Fix Device Power Management States
  2004-08-10  5:03       ` Patrick Mochel
  2004-08-10  9:43         ` Nigel Cunningham
@ 2004-08-10 10:13         ` Pavel Machek
  2004-08-10 18:36           ` David Brownell
  1 sibling, 1 reply; 42+ messages in thread
From: Pavel Machek @ 2004-08-10 10:13 UTC (permalink / raw)
  To: Patrick Mochel; +Cc: linux-kernel, benh, david-b

Hi!

> > Well, "no DMA" needs to be part of definition, too, because some
> > devices (USB) do DMA only if they have nothing to do.
> 
> I don't understand; that doesn't sound healthy.

It is not healthy. It is basicaly misdesigned piece of hardware called
UHCI. It simply does DMA all the time :-(.

> > if something like this gets merged, it will immediately break swsusp
> > because initially no drivers will have "stop" methods.
> >
> > Passing system state down to drivers and having special "quiesce"
> > (as discussed in rather long thread) state has advantage of
> > automagicaly working on drivers that ignore u32 parameter of suspend
> > callback (and that's most of them). David's patches do not bring us
> > runtime suspend capabilities, but do not force us to go through all
> > the drivers, either...
> 
> Nothing is free. ;)
> 
> We've been talking about creating and merging a sane power management
> model for 3+ years now. It's always been known that the drivers will have
> to be modified to support a sane model. It's a fact of life. At some
> point, we have to bite the bullet and do the work. I see that time rapidly
> approaching.
> 
> I do not intend to merge a patch that will break swsusp in a stable
> kernel. However, we do have this wonderful thing called the -mm tree in
> which we can a) evolve the model, b) get large testing coverage and c)
> solicit driver fixes.
> 
> Once the swsusp consolidation is merged upstream, I will merge a new
> device power model in -mm, and we can start working on the drivers. How
> does that sound?

It sounds like an acceptable plan. I see no real disadvantage in
"suspend with parameter X means quiesce", and I think we'd get smaller
patch that way, but if you go through -mm like this, we can do it.

								Pavel
-- 
Horseback riding is like software...
...vgf orggre jura vgf serr.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC] Fix Device Power Management States
  2004-08-10  9:43         ` Nigel Cunningham
@ 2004-08-10 10:20           ` Pavel Machek
  2004-08-10 22:33             ` Nigel Cunningham
  2004-08-10 13:58           ` Patrick Mochel
  1 sibling, 1 reply; 42+ messages in thread
From: Pavel Machek @ 2004-08-10 10:20 UTC (permalink / raw)
  To: Nigel Cunningham
  Cc: Patrick Mochel, Linux Kernel Mailing List,
	Benjamin Herrenschmidt, david-b

Hi!

> > Once the swsusp consolidation is merged upstream, I will merge a new
> > device power model in -mm, and we can start working on the drivers. How
> > does that sound?
> 
> Do you want me to merge before or after all this is done; I'm a bit
> concerned that you guys are expending effort (well, Pavel is), getting
> SMP and Highmem going when I already have a working version that -
> unless the plans have changed - we were intending to merge too.

I needed to do highmem anyway (for SUSE kernel)... SMP... I'm not 100%
convinced that SMP support in suspend2 is correct. (Notice: swsusp/SMP
support in current mainline is *not* correct, either). In particular,
if you sleep the CPU, it needs to loop somewhere, right? Can you quote
the piece of code where sleeping CPUs are spinning? ...that one needs
to be in assembly :-(.

Anyway, I believe that refrigerator merge can be done in paralel with
device tree changes, as it will be always very clear what is broken.

								Pavel
-- 
Horseback riding is like software...
...vgf orggre jura vgf serr.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC] Fix Device Power Management States
  2004-08-09 10:43 [RFC] Fix Device Power Management States Patrick Mochel
  2004-08-09 11:38 ` Pavel Machek
  2004-08-10  0:40 ` Benjamin Herrenschmidt
@ 2004-08-10 10:33 ` Matthew Garrett
  2004-08-10 14:36   ` Patrick Mochel
  2004-08-10 19:18 ` David Brownell
  2004-08-11  1:47 ` Todd Poynor
  4 siblings, 1 reply; 42+ messages in thread
From: Matthew Garrett @ 2004-08-10 10:33 UTC (permalink / raw)
  To: Patrick Mochel; +Cc: linux-kernel

Patrick Mochel <mochel@digitalimplant.org> wrote:

> Ok, the patch below attempts to fix up the device power management
> handling, taking into account (hopefully) everything that has been said
> over the last week+, and lessons learned over the years. It's only been
> compile-tested, and is meant just to prove that the framework is possible.
> There are likely to be some missing pieces, mainly because it's late. :)

At the moment I'm struggling with the fact that the order of resumption
of system devices appears significant (ie, I get hangs on resume with
the stock kernel, but changing the list_for_each_entry in sysdev_resume
to list_for_each_entry_reverse makes things work) but there doesn't seem
to be any mechanism for providing proper ordering when there isn't a
tree structure. This also crops up with resuming my wireless hardware -
it tries to do a hotplug firmware load, but the IDE bus hasn't been
woken up yet.

Do we need a more fine-grained dependency structure than the current
tree?

-- 
Matthew Garrett | mjg59-chiark.mail.linux-rutgers.kernel@srcf.ucam.org

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC] Fix Device Power Management States
  2004-08-10  9:43         ` Nigel Cunningham
  2004-08-10 10:20           ` Pavel Machek
@ 2004-08-10 13:58           ` Patrick Mochel
  2004-08-10 22:29             ` Nigel Cunningham
  1 sibling, 1 reply; 42+ messages in thread
From: Patrick Mochel @ 2004-08-10 13:58 UTC (permalink / raw)
  To: Nigel Cunningham
  Cc: Pavel Machek, Linux Kernel Mailing List, Benjamin Herrenschmidt, david-b


On Tue, 10 Aug 2004, Nigel Cunningham wrote:

> Do you want me to merge before or after all this is done; I'm a bit
> concerned that you guys are expending effort (well, Pavel is), getting
> SMP and Highmem going when I already have a working version that -
> unless the plans have changed - we were intending to merge too.

It would be nice if you posted small easily-consumable patches that
gradually merged the two. Even if you post them all at once, it provides
something to review and an understanding of how one evolves into the
other.


	Pat

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC] Fix Device Power Management States
  2004-08-10 10:07     ` Pavel Machek
@ 2004-08-10 14:28       ` Patrick Mochel
  2004-08-10 17:56         ` Pavel Machek
  0 siblings, 1 reply; 42+ messages in thread
From: Patrick Mochel @ 2004-08-10 14:28 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Benjamin Herrenschmidt, Linux Kernel list, David Brownell


On Tue, 10 Aug 2004, Pavel Machek wrote:

> Can you explain why this class-based quiescing is good idea? It seems
> to me that "quiesce this tree" is pretty much same as "suspend this
> tree", and can be handled in the same way.
>
> Nigel wanted to do class-based quiescing, but if we make quiescing
> fast enough, it should be okay to do whole tree, always. (And I
> believe quiescing *can* be fast enough).

It has nothing to do with speed and everything to do with domains of
responsibility. Each device belongs to 1 device class.  That class and the
assoicated class device represent the functional interface(s) to the
physical device. While the physical device controls physical attributes,
like actual I/O and power states, the class controls the logical
interface.

When we quiesce/stop a device, we need to make sure that the device is
going to either finish or cancel its current I/O transactions. If we do
that at a hardware, the higher levels may think the device has gone awry
and completely disable it. If we do it at a higher level, we can make
sure that the request is gracefully delayed. We can also make sure that
there are no new connnections to the device in a saner manner (by simply
removing the class device from the list of advertised interfaces), rather
than adding logic to every low-level driver to stop/restart I/O.

Plus, and not that I'm a feature-monger, the code is useful in other
cases. To throw a buzzword, it's the same as effectively 'taking the
device offline', since it implies the hardware is claimed by a driver but
not accepting connections and effectively idle.

AFAICT, it should also provide a necessary piece of infrastructure for
some upcoming resource management patches (from Adam Belay) that rely on
userspace being able to disable/enable a device, change its resources, and
restart it (ideally with transparency).

And, if it turns out to be too hard, then we can rethink it and start
over. :)


	Pat

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC] Fix Device Power Management States
  2004-08-10 10:33 ` Matthew Garrett
@ 2004-08-10 14:36   ` Patrick Mochel
  0 siblings, 0 replies; 42+ messages in thread
From: Patrick Mochel @ 2004-08-10 14:36 UTC (permalink / raw)
  To: Matthew Garrett; +Cc: linux-kernel


On Tue, 10 Aug 2004, Matthew Garrett wrote:

> At the moment I'm struggling with the fact that the order of resumption
> of system devices appears significant (ie, I get hangs on resume with
> the stock kernel, but changing the list_for_each_entry in sysdev_resume
> to list_for_each_entry_reverse makes things work) but there doesn't seem
> to be any mechanism for providing proper ordering when there isn't a
> tree structure. This also crops up with resuming my wireless hardware -
> it tries to do a hotplug firmware load, but the IDE bus hasn't been
> woken up yet.

It's not that we need a tree; it's just that we need to take dependency
information into account with system devices, and those aren't necessarily
related to ancestry. That relates to the fact that not enough system
devices have been converted, and not enough time has been spent, to really
understand the dependency information enough to fix the model.

As for the wireless device, I would suggest modifying the driver so that
it retries the hotplug event. Or, perhaps it could be converted to use a
mechanism in which events are not lost (kevents maybe?).


	Pat

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC] Fix Device Power Management States
  2004-08-10 14:28       ` Patrick Mochel
@ 2004-08-10 17:56         ` Pavel Machek
  2004-08-10 22:41           ` Patrick Mochel
  0 siblings, 1 reply; 42+ messages in thread
From: Pavel Machek @ 2004-08-10 17:56 UTC (permalink / raw)
  To: Patrick Mochel; +Cc: Benjamin Herrenschmidt, Linux Kernel list, David Brownell

HI!

> > Can you explain why this class-based quiescing is good idea? It seems
> > to me that "quiesce this tree" is pretty much same as "suspend this
> > tree", and can be handled in the same way.
> >
> > Nigel wanted to do class-based quiescing, but if we make quiescing
> > fast enough, it should be okay to do whole tree, always. (And I
> > believe quiescing *can* be fast enough).
> 
> It has nothing to do with speed and everything to do with domains of
> responsibility. Each device belongs to 1 device class.  That class and the
> assoicated class device represent the functional interface(s) to the
> physical device. While the physical device controls physical attributes,
> like actual I/O and power states, the class controls the logical
> interface.

I still do not see it... swsusp does not care about logical state of
device. (Actually manipulating logical state of device might make
swsusp less transparent). It cares about device not doing DMA (I also
said "no interrupts", but that is not strictly neccessary: we disable
interrupts for atomic copy. Device should do no NMIs, through).

								Pavel
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC] Fix Device Power Management States
  2004-08-10 10:13         ` [RFC] Fix Device Power Management States Pavel Machek
@ 2004-08-10 18:36           ` David Brownell
  2004-08-10 20:36             ` Pavel Machek
  2004-08-10 22:42             ` Patrick Mochel
  0 siblings, 2 replies; 42+ messages in thread
From: David Brownell @ 2004-08-10 18:36 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Patrick Mochel, linux-kernel, benh

On Tuesday 10 August 2004 3:13 am, Pavel Machek wrote:
> Hi!
> 
> > > Well, "no DMA" needs to be part of definition, too, because some
> > > devices (USB) do DMA only if they have nothing to do.

I think that should read "even if they have ...", not "only if ...".


> > I don't understand; that doesn't sound healthy.
> 
> It is not healthy. It is basicaly misdesigned piece of hardware called
> UHCI. It simply does DMA all the time :-(.

Unless it's suspended ... and I confess I've only had time to
make sure that OHCI and EHCI suspend properly using the
newish CONFIG_USB_SUSPEND patch.  UHCI is different
from those other mainstream controllers, it doesn't split
out its periodic schedule processing.

Keep in mind that to properly quiesce a USB controller, you've
got to quiesce every driver for every device hooked up to
that USB bus.  There's no escaping the bottom-up suspend
or top-down-resume processes, which makes me wonder
how Patrick's proposed patch can work for it...

- Dave
 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC] Fix Device Power Management States
  2004-08-09 10:43 [RFC] Fix Device Power Management States Patrick Mochel
                   ` (2 preceding siblings ...)
  2004-08-10 10:33 ` Matthew Garrett
@ 2004-08-10 19:18 ` David Brownell
  2004-08-10 20:50   ` Pavel Machek
  2004-08-11  1:47 ` Todd Poynor
  4 siblings, 1 reply; 42+ messages in thread
From: David Brownell @ 2004-08-10 19:18 UTC (permalink / raw)
  To: Patrick Mochel; +Cc: linux-kernel, pavel, benh

Hmm, some of these proposals seem to relate to stuff OTHER than the
device power management states.  While I suspect it might be a very
handy thing to add methods to quiesce drivers (drivers have to have
that anyway) the semantics need to be properly nailed down.


On Monday 09 August 2004 3:43 am, Patrick Mochel wrote:

> The 2nd argument to ->pm_save() and ->pm_power() is determined by mapping
> an enumerated system power state to a static array of 'struct pm_state'
> pointers, that is in dev->power.pm_system. The pointers in that array
> point to entries in dev->power.pm_supports, which is an array of all the
> device power states that device supports. Please see include/linux/pm.h in
> the patch below. These arrays should be initialized by the bus, though
> they can be fixed up by the individual drivers, should they have a
> different set of states they support, or given user policy.

This sounds exactly like what I suggested a week or two ago... though
in the context of suspend/resume calls.  I don't remember whether I
mentioned the notion of having "new style" suspend calls with those
parameters, and having the glue code smart enough to call "old style"
ones, but I think Pavel mentioned that in this thread.  API migration is
a significant issue ... unless this is planning for a temporary 2.7 fork?  :)


> The actual value of the 'struct pm_state' instances is driver-specific,
> though again, the bus drivers should provide the set of power states valid
> for that bus. For example:
> 
> 
> struct pm_state {
>         char    * name;
>         u32     state;
> };

Even the same struct.  But the point of the "u32" would be for
bus-specific information, and I'd expected that these would
be used with suspend states specific to busses, devices,
and maybe even drivers.   I might even agree that the u32
shouldn't be included (it's certainly error prone).

And as a definitional thing:  two states are equal iff they are
the same pointer, NOT if the string names are the same.

So instead of:
 
> #define decl_state(n, s) \
>         struct pm_state pm_state_##n = {        \
>                 .name   = __stringify(n),       \
>                 .state  = s,                    \
>         }

Defining a macro for something this simple is overkill...

> drivers/pci/ should do:
> 
> decl_state(d0, PCI_PM_CAP_PME_D0);
> EXPORT_SYMBOL(pm_state_d0);
> 
> ...

While that might actually work OK for PCI, since it's only
got four power states (D3hot != D3cold), a single u32
is NOT enough to provide additional device power state
Instead there could be:

struct mybus_pm_state {
	struct pm_state	standard
	... custom PM state associated with it
};

  
> 
> This provides a meaningful tag to each state that is completely local to
> the bus, but can be handled easily by the core.

The core should only be passing these around, not even looking at
the tags.  The "system suspend" logic would know about a handful
of standard states ... architecture, board, bus, and driver-specific
code should be able to map those (along the lines Pavel suggested).


> To handle runtime power management, a set of sysfs files will be created,
> inclding 'current' and 'supports'. The latter will display all the
> possible states the device supports, as specified its ->power.pm_supports
> array, in an attractive string format. 'current' will display the current
> power state, as an ASCII string. Writing to this file will look up the
> state requested in ->pm_supports, and if found, translate that into the
> device-specific power state and suspend the device.

I like that idea.  Though I also catch myself wanting to distinguish
two different notions of "current":  (a)  current policy applied to
the device from external sources, not unlike the cpufreq governor;
(b) current state produced by that policy.

So for example a power-aware driver might have a default governor
that enters and exit low power states based on usage patterns ... but
if some external policy were applied -- maybe like STR -- it would no
longer be OK to exit the low power state automatically. Unless some
wakeup events were enabled, and hmm that's not yet part of the
device PM framework either ... :)

- Dave

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC] Fix Device Power Management States
  2004-08-10  4:55   ` Patrick Mochel
  2004-08-10  6:52     ` Benjamin Herrenschmidt
  2004-08-10 10:07     ` Pavel Machek
@ 2004-08-10 19:41     ` David Brownell
  2004-08-10 22:44       ` Patrick Mochel
  2 siblings, 1 reply; 42+ messages in thread
From: David Brownell @ 2004-08-10 19:41 UTC (permalink / raw)
  To: Patrick Mochel; +Cc: Benjamin Herrenschmidt, Linux Kernel list, Pavel Machek

On Monday 09 August 2004 9:55 pm, Patrick Mochel wrote:
> 
> On Tue, 10 Aug 2004, Benjamin Herrenschmidt wrote:
> 
> 
> > What about partial tree ? We need to suspend childs first and we need to
> > tied PM transition with dev_start/stop (or have some way to indicate the
> > device we want it to auto-resume when it gets a request, or something).
> > We need to work out policy a bit more here I suppose...
> 
> Policy can come later; we have to have a working model first.

Suspending a partial tree is part of the "device power management"
problem ... it's not a policy, and deferring it would punt on one of
the more basic problems.

A policy would be IMO more like "how far should I suspend this",
"can this device wake out of its suspend state", or more interestingly
"how should these devices interact".

So:  "suspend everything using the 48 MHz clock, so we can
enter a deeper system sleep state" would be a kind of policy.
At least so long as those clock relationships don't appear
explicitly in the driver model ... they could be recorded in
device-specific power state descriptors.
 
- Dave

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC] Fix Device Power Management States
  2004-08-10 18:36           ` David Brownell
@ 2004-08-10 20:36             ` Pavel Machek
  2004-08-10 22:42             ` Patrick Mochel
  1 sibling, 0 replies; 42+ messages in thread
From: Pavel Machek @ 2004-08-10 20:36 UTC (permalink / raw)
  To: David Brownell; +Cc: Patrick Mochel, linux-kernel, benh

Hi!

> > > > Well, "no DMA" needs to be part of definition, too, because some
> > > > devices (USB) do DMA only if they have nothing to do.
> 
> I think that should read "even if they have ...", not "only if ...".

Yes, sorry.
								Pavel
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC] Fix Device Power Management States
  2004-08-10 19:18 ` David Brownell
@ 2004-08-10 20:50   ` Pavel Machek
  0 siblings, 0 replies; 42+ messages in thread
From: Pavel Machek @ 2004-08-10 20:50 UTC (permalink / raw)
  To: David Brownell; +Cc: Patrick Mochel, linux-kernel, benh

Hi!

> Hmm, some of these proposals seem to relate to stuff OTHER than the
> device power management states.  While I suspect it might be a very
> handy thing to add methods to quiesce drivers (drivers have to have
> that anyway) the semantics need to be properly nailed down.

Okay, what about:

* switch system state to suspend_state_t (initialy u32)

* pass system state down to the drivers

* create helpers to convert from system state to PCI state (etc)

* create some well-known suspend_state_t's
	(like suspend_state_t prepare_for_powerdown = 3). Drivers may
	either compare state passed to them with one of well-known
	states, or call a helper

This should solve current mess. If we'll want to do runtime power
managment (etc), we can change suspend_state_t into structure /
whatever, but we will not break all the drivers this time. (=> this
approach does not solve runtime suspend, but will probably make it
easier).

I'd like to do this in 2.6.9 timeframe...
								Pavel
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC] Fix Device Power Management States
  2004-08-10 13:58           ` Patrick Mochel
@ 2004-08-10 22:29             ` Nigel Cunningham
  2004-08-10 22:56               ` Patrick Mochel
  0 siblings, 1 reply; 42+ messages in thread
From: Nigel Cunningham @ 2004-08-10 22:29 UTC (permalink / raw)
  To: Patrick Mochel
  Cc: Pavel Machek, Linux Kernel Mailing List, Benjamin Herrenschmidt, david-b

Hi.

On Tue, 2004-08-10 at 23:58, Patrick Mochel wrote:
> On Tue, 10 Aug 2004, Nigel Cunningham wrote:
> 
> > Do you want me to merge before or after all this is done; I'm a bit
> > concerned that you guys are expending effort (well, Pavel is), getting
> > SMP and Highmem going when I already have a working version that -
> > unless the plans have changed - we were intending to merge too.
> 
> It would be nice if you posted small easily-consumable patches that
> gradually merged the two. Even if you post them all at once, it provides
> something to review and an understanding of how one evolves into the
> other.

I'm not intending to patch the current implementation into the new
version; there are so many changes that it would make the process
extremely painful (as evolution would have been if it were really true).
Instead, I proposed, as Andrew requested to post a number of patches
simply adding the new version along side the old. When you're satisfied
that the new does everything the old does, I'm hoping we'll simply drop
the old version.

I'll start producing patches shortly, then.

Nigel
-- 
Nigel Cunningham
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

Many today claim to be tolerant. But true tolerance can cope with others
being intolerant.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC] Fix Device Power Management States
  2004-08-10 10:20           ` Pavel Machek
@ 2004-08-10 22:33             ` Nigel Cunningham
  0 siblings, 0 replies; 42+ messages in thread
From: Nigel Cunningham @ 2004-08-10 22:33 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Patrick Mochel, Linux Kernel Mailing List,
	Benjamin Herrenschmidt, david-b

Hi.

On Tue, 2004-08-10 at 20:20, Pavel Machek wrote:
> I needed to do highmem anyway (for SUSE kernel)... SMP... I'm not 100%
> convinced that SMP support in suspend2 is correct. (Notice: swsusp/SMP
> support in current mainline is *not* correct, either). In particular,
> if you sleep the CPU, it needs to loop somewhere, right? Can you quote
> the piece of code where sleeping CPUs are spinning? ...that one needs
> to be in assembly :-(.

In suspend2, it's in arch/i386/power/suspend2.c, involved via code near
the start of kernel/power/process.c. The bulk of it is not in assembly,
and doesn't need to be. If you get the latest version from
softwaresuspend.berlios.de, you'll see that I've separated the secondary
processors support out and tidied it a bit. It works perfectly for me.

> Anyway, I believe that refrigerator merge can be done in paralel with
> device tree changes, as it will be always very clear what is broken.

I'll try to start producing patches asap. (Within the constraints I have
by having a real job as well :>).

Nigel
-- 
Nigel Cunningham
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

Many today claim to be tolerant. But true tolerance can cope with others
being intolerant.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC] Fix Device Power Management States
  2004-08-10 17:56         ` Pavel Machek
@ 2004-08-10 22:41           ` Patrick Mochel
  2004-08-10 23:10             ` Pavel Machek
                               ` (2 more replies)
  0 siblings, 3 replies; 42+ messages in thread
From: Patrick Mochel @ 2004-08-10 22:41 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Benjamin Herrenschmidt, Linux Kernel list, David Brownell


On Tue, 10 Aug 2004, Pavel Machek wrote:

> I still do not see it... swsusp does not care about logical state of
> device. (Actually manipulating logical state of device might make
> swsusp less transparent). It cares about device not doing DMA (I also
> said "no interrupts", but that is not strictly neccessary: we disable
> interrupts for atomic copy. Device should do no NMIs, through).

Perhaps it is unncessary to do at a class level, at least at this point.
I think we all agree that we need some sort of stop/start methods for
devices, though. In which, we can add to struct bus_type:

	int (*dev_stop)(struct device *);
	int (*dev_start)(struct device *);

Sound good?


	Pat

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC] Fix Device Power Management States
  2004-08-10 18:36           ` David Brownell
  2004-08-10 20:36             ` Pavel Machek
@ 2004-08-10 22:42             ` Patrick Mochel
  1 sibling, 0 replies; 42+ messages in thread
From: Patrick Mochel @ 2004-08-10 22:42 UTC (permalink / raw)
  To: David Brownell; +Cc: Pavel Machek, linux-kernel, benh


On Tue, 10 Aug 2004, David Brownell wrote:

> Keep in mind that to properly quiesce a USB controller, you've
> got to quiesce every driver for every device hooked up to
> that USB bus.  There's no escaping the bottom-up suspend
> or top-down-resume processes, which makes me wonder
> how Patrick's proposed patch can work for it...

Hey, I'm willing to capitulate. :)

I will change the ordering, as well as the proposed change in the last
email.


	Pat

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC] Fix Device Power Management States
  2004-08-10 19:41     ` David Brownell
@ 2004-08-10 22:44       ` Patrick Mochel
  0 siblings, 0 replies; 42+ messages in thread
From: Patrick Mochel @ 2004-08-10 22:44 UTC (permalink / raw)
  To: David Brownell; +Cc: Benjamin Herrenschmidt, Linux Kernel list, Pavel Machek


On Tue, 10 Aug 2004, David Brownell wrote:

> Suspending a partial tree is part of the "device power management"
> problem ... it's not a policy, and deferring it would punt on one of
> the more basic problems.

I suppose I should ask whether or not its worth doing in the kernel. Can
you simply suspend the subtree using a properly implemented sysfs
interface and the runtime power management hooks?


	Pat

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC] Fix Device Power Management States
  2004-08-10 22:29             ` Nigel Cunningham
@ 2004-08-10 22:56               ` Patrick Mochel
  2004-08-10 23:09                 ` Nigel Cunningham
  0 siblings, 1 reply; 42+ messages in thread
From: Patrick Mochel @ 2004-08-10 22:56 UTC (permalink / raw)
  To: Nigel Cunningham
  Cc: Pavel Machek, Linux Kernel Mailing List, Benjamin Herrenschmidt, david-b


On Wed, 11 Aug 2004, Nigel Cunningham wrote:

> I'm not intending to patch the current implementation into the new
> version; there are so many changes that it would make the process
> extremely painful (as evolution would have been if it were really true).
> Instead, I proposed, as Andrew requested to post a number of patches
> simply adding the new version along side the old. When you're satisfied
> that the new does everything the old does, I'm hoping we'll simply drop
> the old version.

Then there is something seriously wrong with your development process.

I'm sorry, but that's not how it works. There is no "flag day" for
switching from one feature to the next, whether they are drivers, APIs, or
suspend-to-disk implementations. And, I will definitely say, from
experience, that adding a second competing implementation of anything to
the kernel tree does little more than confuse users and developers.

Making large changes to any piece of code is a long and tedious process,
and takes a lot of practice. But, it is imperative for the other people
working on it to be able to see improvements happen in steps, even if the
steps number in the 100s.

For example, see the way in which Al Viro works. One day, some driver will
be broken and unloved. Then Al will post a kajillion patches, each one
modifying 1-10 lines and making complete, obvious sense. When it's all
done, the result looks nothing like the original and the driver is
race-free and beautiful.

Rome was not built in a day, and neither is good kernel code.


	Pat

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC] Fix Device Power Management States
  2004-08-10 22:56               ` Patrick Mochel
@ 2004-08-10 23:09                 ` Nigel Cunningham
  2004-08-10 23:36                   ` suspend2 merge [was Re: [RFC] Fix Device Power Management States] Pavel Machek
  0 siblings, 1 reply; 42+ messages in thread
From: Nigel Cunningham @ 2004-08-10 23:09 UTC (permalink / raw)
  To: Patrick Mochel
  Cc: Pavel Machek, Linux Kernel Mailing List, Benjamin Herrenschmidt, david-b

Hi.

On Wed, 2004-08-11 at 08:56, Patrick Mochel wrote:
> On Wed, 11 Aug 2004, Nigel Cunningham wrote:
> 
> > I'm not intending to patch the current implementation into the new
> > version; there are so many changes that it would make the process
> > extremely painful (as evolution would have been if it were really true).
> > Instead, I proposed, as Andrew requested to post a number of patches
> > simply adding the new version along side the old. When you're satisfied
> > that the new does everything the old does, I'm hoping we'll simply drop
> > the old version.
> 
> Then there is something seriously wrong with your development process.

No. It's just that the changes are so fundamental (how data is stored,
how we prepare the image etc) that to talk about evolution is simply a
misnomer. It's a redesign. The steps did occur one at a time, and there
are basic similarities, but some of the steps were fundamental rewrites.

> I'm sorry, but that's not how it works. There is no "flag day" for
> switching from one feature to the next, whether they are drivers, APIs, or
> suspend-to-disk implementations. And, I will definitely say, from
> experience, that adding a second competing implementation of anything to
> the kernel tree does little more than confuse users and developers.

I agree with the last bit, and I don't want a second competing design.
But there are flag days for some things, and multiple versions of some
code are found in the kernel (suspend until recently, e100/eepro100
driver...). This should be one of them.

Once it is merged, the period of co-existence should be short anyway.
Suspend 2 has had a ton of testing and is - dare I say it - virtually
bugfree. There should be no reason to delay its including in Linus'
tree.

> Making large changes to any piece of code is a long and tedious process,
> and takes a lot of practice. But, it is imperative for the other people
> working on it to be able to see improvements happen in steps, even if the
> steps number in the 100s.
> 
> For example, see the way in which Al Viro works. One day, some driver will
> be broken and unloved. Then Al will post a kajillion patches, each one
> modifying 1-10 lines and making complete, obvious sense. When it's all
> done, the result looks nothing like the original and the driver is
> race-free and beautiful.
> 
> Rome was not built in a day, and neither is good kernel code.

I'm not suggested it should be or that suspend2 was (it has been nearly
two years since I posted my first patch). But I am suggesting that the
process of working out bugs and thinking through design improvements has
already been done and tested. I'm perfectly willing to submit 16 or 30
patches, breaking up the changes into their main parts, but it is a
waste of your time and mine for me to break it up further than that.

That said, I'm sure there will also be suggestions that I can do some
things better. I look forward to them. But I don't look forward to
trying to break some of the irreducible changes down into artificially
distinguished component parts.

Nigel
-- 
Nigel Cunningham
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

Many today claim to be tolerant. But true tolerance can cope with others
being intolerant.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC] Fix Device Power Management States
  2004-08-10 22:41           ` Patrick Mochel
@ 2004-08-10 23:10             ` Pavel Machek
  2004-08-10 23:14             ` [patch] Smaller goal first: fix confusion [was Re: [RFC] Fix Device Power Management States] Pavel Machek
  2004-08-11  1:02             ` [RFC] Fix Device Power Management States Benjamin Herrenschmidt
  2 siblings, 0 replies; 42+ messages in thread
From: Pavel Machek @ 2004-08-10 23:10 UTC (permalink / raw)
  To: Patrick Mochel; +Cc: Benjamin Herrenschmidt, Linux Kernel list, David Brownell

Hi!

> > I still do not see it... swsusp does not care about logical state of
> > device. (Actually manipulating logical state of device might make
> > swsusp less transparent). It cares about device not doing DMA (I also
> > said "no interrupts", but that is not strictly neccessary: we disable
> > interrupts for atomic copy. Device should do no NMIs, through).
> 
> Perhaps it is unncessary to do at a class level, at least at this point.
> I think we all agree that we need some sort of stop/start methods for
> devices, though. In which, we can add to struct bus_type:
> 
> 	int (*dev_stop)(struct device *);
> 	int (*dev_start)(struct device *);
> 
> Sound good?

I fail to see why dev_stop(device) is better than suspend(device,
PM_PLEASE_QUIESCE_OR_WHATEVER). It has one big advantage: drivers that
ignore second argument (most do) will automagically work.

There is no fundamental problem with dev_stop/dev_start, I just fail
to see why they need to be separate from suspend/resume.
								Pavel
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch] Smaller goal first: fix confusion [was Re: [RFC] Fix Device Power Management States]
  2004-08-10 22:41           ` Patrick Mochel
  2004-08-10 23:10             ` Pavel Machek
@ 2004-08-10 23:14             ` Pavel Machek
  2004-08-11  1:02             ` [RFC] Fix Device Power Management States Benjamin Herrenschmidt
  2 siblings, 0 replies; 42+ messages in thread
From: Pavel Machek @ 2004-08-10 23:14 UTC (permalink / raw)
  To: Patrick Mochel; +Cc: Benjamin Herrenschmidt, Linux Kernel list, David Brownell

Hi!

> > I still do not see it... swsusp does not care about logical state of
> > device. (Actually manipulating logical state of device might make
> > swsusp less transparent). It cares about device not doing DMA (I also
> > said "no interrupts", but that is not strictly neccessary: we disable
> > interrupts for atomic copy. Device should do no NMIs, through).
> 
> Perhaps it is unncessary to do at a class level, at least at this point.
> I think we all agree that we need some sort of stop/start methods for
> devices, though. In which, we can add to struct bus_type:
> 
> 	int (*dev_stop)(struct device *);
> 	int (*dev_start)(struct device *);
> 
> Sound good?

What about this one... it should solve u32 confusion _now_, and
prepare ground for adding runtime power managment
later... suspend_state_t can be redefined to be structure later.

In future, pci_set_power_state should be modified to take "enum
pci_state" so that confusion is harder...

								Pavel

--- tmp/linux/drivers/base/power/power.h	2003-09-28 22:05:43.000000000 +0200
+++ linux/drivers/base/power/power.h	2004-08-11 00:18:58.000000000 +0200
@@ -1,5 +1,4 @@
-
-
+/* FIXME: This needs explanation... */
 enum {
 	DEVICE_PM_ON,
 	DEVICE_PM1,
@@ -66,14 +65,14 @@
 /*
  * suspend.c
  */
-extern int suspend_device(struct device *, u32);
+extern int suspend_device(struct device *, enum system_state);
 
 
 /*
  * runtime.c
  */
 
-extern int dpm_runtime_suspend(struct device *, u32);
+extern int dpm_runtime_suspend(struct device *, enum system_state);
 extern void dpm_runtime_resume(struct device *);
 
 #else /* CONFIG_PM */
@@ -88,7 +87,7 @@
 
 }
 
-static inline int dpm_runtime_suspend(struct device * dev, u32 state)
+static inline int dpm_runtime_suspend(struct device * dev, enum system_state state)
 {
 	return 0;
 }
--- tmp/linux/drivers/base/power/shutdown.c	2004-06-22 12:36:02.000000000 +0200
+++ linux/drivers/base/power/shutdown.c	2004-08-11 00:20:12.000000000 +0200
@@ -29,7 +30,8 @@
 			dev->driver->shutdown(dev);
 		return 0;
 	}
-	return dpm_runtime_suspend(dev,dev->detach_state);
+	/* FIXME: It probably should not be cast like this */
+	return dpm_runtime_suspend(dev, (enum system_state) dev->detach_state);
 }
 
 
--- tmp/linux/drivers/base/power/suspend.c	2004-06-22 12:36:02.000000000 +0200
+++ linux/drivers/base/power/suspend.c	2004-08-11 00:21:16.000000000 +0200
@@ -28,6 +28,7 @@
  * lists. This way, the ancestors will be accessed before their descendents.
  */
 
+/* FIXME: Having both suspend_device and device_suspend is evil */
 
 /**
  *	suspend_device - Save state of one device.
@@ -35,7 +36,7 @@
  *	@state:	Power state device is entering.
  */
 
-int suspend_device(struct device * dev, u32 state)
+int suspend_device(struct device * dev, enum system_state state)
 {
 	int error = 0;
 
@@ -70,7 +71,7 @@
  *
  */
 
-int device_suspend(u32 state)
+int device_suspend(enum system_state state)
 {
 	int error = 0;
 
@@ -112,15 +113,15 @@
  *	done, power down system devices. 
  */
 
-int device_power_down(u32 state)
+int device_power_down(enum system_state state)
 {
 	int error = 0;
 	struct device * dev;
 
 	list_for_each_entry_reverse(dev,&dpm_off_irq,power.entry) {
 		if ((error = suspend_device(dev,state)))
 			break;
 	} 
 	if (error)
 		goto Error;
 	if ((error = sysdev_suspend(state)))
--- tmp/linux/drivers/net/e100.c	2004-06-22 12:36:10.000000000 +0200
+++ linux/drivers/net/e100.c	2004-08-11 00:27:01.000000000 +0200
@@ -2269,7 +2269,7 @@
 }
 
 #ifdef CONFIG_PM
-static int e100_suspend(struct pci_dev *pdev, u32 state)
+static int e100_suspend(struct pci_dev *pdev, suspend_state_t state)
 {
 	struct net_device *netdev = pci_get_drvdata(pdev);
 	struct nic *nic = netdev_priv(netdev);
@@ -2282,7 +2282,7 @@
 	pci_save_state(pdev, nic->pm_state);
 	pci_enable_wake(pdev, state, nic->flags & (wol_magic | e100_asf(nic)));
 	pci_disable_device(pdev);
-	pci_set_power_state(pdev, state);
+	pci_set_power_state(pdev, to_pci_state(state));
 
 	return 0;
 }
--- tmp/linux/include/linux/pci.h	2004-06-22 12:36:45.000000000 +0200
+++ linux/include/linux/pci.h	2004-08-10 23:56:56.000000000 +0200
@@ -18,6 +18,7 @@
 #define LINUX_PCI_H
 
 #include <linux/mod_devicetable.h>
+#include <linux/pci.h>
 
 /*
  * Under PCI, each device has 256 bytes of configuration address space,
@@ -593,7 +594,7 @@
 	const struct pci_device_id *id_table;	/* must be non-NULL for probe to be called */
 	int  (*probe)  (struct pci_dev *dev, const struct pci_device_id *id);	/* New device inserted */
 	void (*remove) (struct pci_dev *dev);	/* Device removed (NULL if not a hot-plug capable driver) */
-	int  (*suspend) (struct pci_dev *dev, u32 state);	/* Device suspended */
+	int  (*suspend) (struct pci_dev *dev, suspend_state_t reason);	/* Device suspended */
 	int  (*resume) (struct pci_dev *dev);	                /* Device woken up */
 	int  (*enable_wake) (struct pci_dev *dev, u32 state, int enable);   /* Enable wake event */
 
@@ -965,5 +966,26 @@
 #define PCIPCI_VSFX		16
 #define PCIPCI_ALIMAGIK		32
 
+enum pci_state {
+	D0 = 0,
+	D1 = 1,
+	D2 = 2,
+	D3hot = 3,
+	D3cold = 4
+};
+
+static inline enum pci_state to_pci_state(suspend_state_t state)
+{
+	if (SUSPEND_EQ(state, PM_SUSPEND_ON))
+		return D0;
+	if (SUSPEND_EQ(state, PM_SUSPEND_STANDBY))
+		return D1;
+	if (SUSPEND_EQ(state, PM_SUSPEND_MEM))
+		return D3hot;
+	if (SUSPEND_EQ(state, PM_SUSPEND_DISK))
+		return D3cold;
+	BUG();
+}
+
 #endif /* __KERNEL__ */
 #endif /* LINUX_PCI_H */
--- tmp/linux/include/linux/pm.h	2004-06-22 12:36:45.000000000 +0200
+++ linux/include/linux/pm.h	2004-08-10 23:53:02.000000000 +0200
@@ -193,14 +193,22 @@
 extern void (*pm_idle)(void);
 extern void (*pm_power_off)(void);
 
-enum {
-	PM_SUSPEND_ON,
-	PM_SUSPEND_STANDBY,
-	PM_SUSPEND_MEM,
-	PM_SUSPEND_DISK,
+enum system_state {
+	PM_SUSPEND_ON = 0,
+	PM_SUSPEND_STANDBY = 1,
+	PM_SUSPEND_MEM = 2,
+	PM_SUSPEND_DISK = 3,
 	PM_SUSPEND_MAX,
 };
 
+/*
+ * For now, drivers only get system state. Later, this is going to become
+ * structure or something to enable runtime power managment.
+ */
+typedef enum system_state suspend_state_t;
+
+#define SUSPEND_EQ(a, b) (a == b)
+
 enum {
 	PM_DISK_FIRMWARE = 1,
 	PM_DISK_PLATFORM,
@@ -209,7 +217,6 @@
 	PM_DISK_MAX,
 };
 
-
 struct pm_ops {
 	u32	pm_disk_mode;
 	int (*prepare)(u32 state);
@@ -241,8 +248,11 @@
 
 extern void device_pm_set_parent(struct device * dev, struct device * parent);
 
-extern int device_suspend(u32 state);
-extern int device_power_down(u32 state);
+/*
+ * apply system suspend policy to all devices
+ */
+extern int device_suspend(enum system_state reason);
+extern int device_power_down(enum system_state reason);
 extern void device_power_up(void);
 extern void device_resume(void);
 
--- tmp/linux/kernel/power/disk.c	2004-05-20 23:08:36.000000000 +0200
+++ linux/kernel/power/disk.c	2004-08-11 00:11:53.000000000 +0200
@@ -46,20 +46,25 @@
 	int error = 0;
 
 	local_irq_save(flags);
-	device_power_down(PM_SUSPEND_DISK);
 	switch(mode) {
 	case PM_DISK_PLATFORM:
+		device_power_down(PM_SUSPEND_DISK);
 		error = pm_ops->enter(PM_SUSPEND_DISK);
 		break;
 	case PM_DISK_SHUTDOWN:
+		device_power_down(PM_SUSPEND_DISK);
 		printk("Powering off system\n");
 		machine_power_off();
 		break;
 	case PM_DISK_REBOOT:
+		device_power_down(PM_SUSPEND_DISK);
 		machine_restart(NULL);
 		break;
 	}
 	machine_halt();
+	/* Valid image is on the disk, if we continue we risk serious data corruption
+	   after resume. */
+	while(1);
 	device_power_up();
 	local_irq_restore(flags);
 	return 0;


-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

^ permalink raw reply	[flat|nested] 42+ messages in thread

* suspend2 merge [was Re: [RFC] Fix Device Power Management States]
  2004-08-10 23:09                 ` Nigel Cunningham
@ 2004-08-10 23:36                   ` Pavel Machek
  2004-08-11  0:04                     ` Arkadiusz Miskiewicz
  2004-08-11  5:05                     ` Nigel Cunningham
  0 siblings, 2 replies; 42+ messages in thread
From: Pavel Machek @ 2004-08-10 23:36 UTC (permalink / raw)
  To: Nigel Cunningham
  Cc: Patrick Mochel, Linux Kernel Mailing List,
	Benjamin Herrenschmidt, david-b

Hi!

> > > I'm not intending to patch the current implementation into the new
> > > version; there are so many changes that it would make the process
> > > extremely painful (as evolution would have been if it were really true).
> > > Instead, I proposed, as Andrew requested to post a number of patches
> > > simply adding the new version along side the old. When you're satisfied
> > > that the new does everything the old does, I'm hoping we'll simply drop
> > > the old version.
> > 
> > Then there is something seriously wrong with your development process.
> 
> No. It's just that the changes are so fundamental (how data is stored,
> how we prepare the image etc) that to talk about evolution is simply a
> misnomer. It's a redesign. The steps did occur one at a time, and there
> are basic similarities, but some of the steps were fundamental
> rewrites.

At one point I was pretty unhappy with swsusp state (that was just
before I wrote trivial highmem support), and was willing to merge
suspend2 pretty much at all costs. (Okay, I was never willing to let
LZO-suspend2 sneak into 2.6). I'm now way happier with merged
pmdisk+swsusp works.

I feared big problems with highmem support, but surprisingly, trivial
support thats in current code does not cause problems for
people. People seem to like pmdisk+swsusp, too...

Now, people like suspend2 even more, and for good reasons: it is
extremely fast, it provides nice feedback and its refrigerator is
superior.

I also realized that suspend2 is fundamentally more complex than
swsusp: it introduces additional time period where page cache must not
be touched. I did not realize this sooner.

I do longer think that "merge at all costs" is good idea now: in
kernel code is small, simple and nice by now, and I'd like to keep it
that way.

Now, there are some parts of swsusp that are not quite okay. One of
them is refrigerator -- it fails (in non-critical way but still) in
some cases where it should not fail. suspend2 seems to have this
solved, and I'd like to merge its refrigerator.

Other parts... I'm not so sure. Incremental patches would certainly
help a lot here.

Best regards,
								Pavel
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: suspend2 merge [was Re: [RFC] Fix Device Power Management States]
  2004-08-10 23:36                   ` suspend2 merge [was Re: [RFC] Fix Device Power Management States] Pavel Machek
@ 2004-08-11  0:04                     ` Arkadiusz Miskiewicz
  2004-08-11  5:05                     ` Nigel Cunningham
  1 sibling, 0 replies; 42+ messages in thread
From: Arkadiusz Miskiewicz @ 2004-08-11  0:04 UTC (permalink / raw)
  To: Pavel Machek; +Cc: linux-kernel

On Wednesday 11 of August 2004 01:36, Pavel Machek wrote:
> Now, people like suspend2 even more, and for good reasons: it is
> extremely fast, it provides nice feedback and its refrigerator is
> superior.
... and it works with modular IDE. It resumes nicely when called from initrd. 
It is modular itself (well, parts of it) - I'm loading suspend modules from 
initrd, too.

pmdisk+swsusp unfortunately wasn't able to do it last time I've checked.

> Best regards,
> 								Pavel

-- 
Arkadiusz Miśkiewicz     CS at FoE, Wroclaw University of Technology
arekm.pld-linux.org, 1024/3DB19BBD, JID: arekm.jabber.org, PLD/Linux

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC] Fix Device Power Management States
  2004-08-10 22:41           ` Patrick Mochel
  2004-08-10 23:10             ` Pavel Machek
  2004-08-10 23:14             ` [patch] Smaller goal first: fix confusion [was Re: [RFC] Fix Device Power Management States] Pavel Machek
@ 2004-08-11  1:02             ` Benjamin Herrenschmidt
  2 siblings, 0 replies; 42+ messages in thread
From: Benjamin Herrenschmidt @ 2004-08-11  1:02 UTC (permalink / raw)
  To: Patrick Mochel; +Cc: Pavel Machek, Linux Kernel list, David Brownell

On Wed, 2004-08-11 at 08:41, Patrick Mochel wrote:
> On Tue, 10 Aug 2004, Pavel Machek wrote:
> 
> > I still do not see it... swsusp does not care about logical state of
> > device. (Actually manipulating logical state of device might make
> > swsusp less transparent). It cares about device not doing DMA (I also
> > said "no interrupts", but that is not strictly neccessary: we disable
> > interrupts for atomic copy. Device should do no NMIs, through).
> 
> Perhaps it is unncessary to do at a class level, at least at this point.
> I think we all agree that we need some sort of stop/start methods for
> devices, though. In which, we can add to struct bus_type:
> 
> 	int (*dev_stop)(struct device *);
> 	int (*dev_start)(struct device *);
> 
> Sound good?

Well, darwin has those, though I'm not sure if it's that necessary to
have them at all, well, maybe...

One thing is sure, having those at the class device level, if I
understand correctly, means we would have broken the ordering
requirement of always following the bus tree.

Honestly, I would stick to not having those at first, the PM callbacks
are enough for most cases, and just have "helpers" that the PM callbacks
call provided by their "class". For example, we could have a
netdev_quiesce_device() call that wuold wrap queue stopping and whatever
else (not much) has to be done at the network level.

IDE sort-of already has that via the PM request infrastructure, low
drivers only implement the few state machine steps they need for things
like cache flush or standby command. It would make sense to do a similar
mecanism for SCSI.

Honestly, there is really not that much to do related to class-side
start/stop, most of the work is usually at the driver level anyway from
experience (I implemented working PM for a bunch of drivers), there are
few things to take care with the "upper level", but not really that
much, and I would rather keep that as a helper function.

Let's keep things simple,

Ben.



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC] Fix Device Power Management States
  2004-08-09 10:43 [RFC] Fix Device Power Management States Patrick Mochel
                   ` (3 preceding siblings ...)
  2004-08-10 19:18 ` David Brownell
@ 2004-08-11  1:47 ` Todd Poynor
  2004-08-12 22:03   ` Russell King
  4 siblings, 1 reply; 42+ messages in thread
From: Todd Poynor @ 2004-08-11  1:47 UTC (permalink / raw)
  To: Patrick Mochel; +Cc: linux-kernel, pavel, benh, david-b

Divorcing device power states from ACPI-defined integers would be very 
nice for embedded platforms.  Usually the platform bus would be 
involved.  Since the proposal tends to place more responsibility on the 
bus driver, I'm interested in the intended usage for platform devices. 
For example, are platform_device callbacks for 
suspend/resume/save/restore of the particular device still needed?  How 
does the platform bus driver map (platform-specific?) system states to 
device states?

Customizing the available system power states based on the platform is 
even more desirable.  But the generic sys_pm_* ids might still be the 
best way to hook up the platform-specific code to the generic PM layer 
(and the system-state-to-device-state mapping would still be based on 
the generic ids).  The device suspend/resume decisions we've had to make 
in response to system states have usually been related to properties of 
the system state that could be represented in platform-independent 
fashion, for example, whether the suspend state involves power cycling 
the particular device.

I may be misunderstanding the intent or delving too far into details, 
but I'd be happy to help with embedded platform stuff if there's 
interest, thanks,

-- 
Todd Poynor
MontaVista Software


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: suspend2 merge [was Re: [RFC] Fix Device Power Management States]
  2004-08-10 23:36                   ` suspend2 merge [was Re: [RFC] Fix Device Power Management States] Pavel Machek
  2004-08-11  0:04                     ` Arkadiusz Miskiewicz
@ 2004-08-11  5:05                     ` Nigel Cunningham
  2004-08-11  9:13                       ` Pavel Machek
  1 sibling, 1 reply; 42+ messages in thread
From: Nigel Cunningham @ 2004-08-11  5:05 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Patrick Mochel, Linux Kernel Mailing List,
	Benjamin Herrenschmidt, david-b

Hi.

On Wed, 2004-08-11 at 09:36, Pavel Machek wrote:
> At one point I was pretty unhappy with swsusp state (that was just
> before I wrote trivial highmem support), and was willing to merge
> suspend2 pretty much at all costs. (Okay, I was never willing to let
> LZO-suspend2 sneak into 2.6). I'm now way happier with merged
> pmdisk+swsusp works.

The LZF licensing has been corrected so it's not a problem.

> I feared big problems with highmem support, but surprisingly, trivial
> support thats in current code does not cause problems for
> people. People seem to like pmdisk+swsusp, too...

That will be because you're eating most of the memory anyway; there's
not problem finding enough memory to copy the Highmem too. I guess that
once we start seeing people trying to suspend 2GB to disk or you try to
eat less memory, Highmem will become more of a pain.

> Now, people like suspend2 even more, and for good reasons: it is
> extremely fast, it provides nice feedback and its refrigerator is
> superior.
> 
> I also realized that suspend2 is fundamentally more complex than
> swsusp: it introduces additional time period where page cache must not
> be touched. I did not realize this sooner.

Sorry. I said it in so many ways! It's not really an issue though;
processes are stopped and suspend's own I/O doesn't touch page cache.

> I do longer think that "merge at all costs" is good idea now: in
> kernel code is small, simple and nice by now, and I'd like to keep it
> that way.

I want it to be simple and clear too. I'm getting there. As part of the
cleanup, I'd just made suspend modular, which has further untangled some
of the interdependancies between sections of code. I still have more
work to do, but I'm increasingly feeling like removing the debugging
code.

> Now, there are some parts of swsusp that are not quite okay. One of
> them is refrigerator -- it fails (in non-critical way but still) in
> some cases where it should not fail. suspend2 seems to have this
> solved, and I'd like to merge its refrigerator.

I'll submit a patch. I need to look at how you use the refrigerator
first. I refrigerate processes prior to resuming as well, and might need
to adjust things if you don't do that. I also need to check it will work
okay with S3.

> Other parts... I'm not so sure. Incremental patches would certainly
> help a lot here.

Well, if you insist, I'll do what I can. I looked at doing this before,
but was like trying to describe how to transform a mini into a station
wagon while using it every day! It will, of course, make the merge
_much_ slower too.

Nigel
-- 
Nigel Cunningham
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

Many today claim to be tolerant. But true tolerance can cope with others
being intolerant.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: suspend2 merge [was Re: [RFC] Fix Device Power Management States]
  2004-08-11  5:05                     ` Nigel Cunningham
@ 2004-08-11  9:13                       ` Pavel Machek
  0 siblings, 0 replies; 42+ messages in thread
From: Pavel Machek @ 2004-08-11  9:13 UTC (permalink / raw)
  To: Nigel Cunningham
  Cc: Patrick Mochel, Linux Kernel Mailing List,
	Benjamin Herrenschmidt, david-b

Hi!

> > I feared big problems with highmem support, but surprisingly, trivial
> > support thats in current code does not cause problems for
> > people. People seem to like pmdisk+swsusp, too...
> 
> That will be because you're eating most of the memory anyway; there's
> not problem finding enough memory to copy the Highmem too. I guess that
> once we start seeing people trying to suspend 2GB to disk or you try to
> eat less memory, Highmem will become more of a pain.

Yes, I know. I was just pleased that people do not complain as much as
I expected them to.

> > Now, people like suspend2 even more, and for good reasons: it is
> > extremely fast, it provides nice feedback and its refrigerator is
> > superior.
> > 
> > I also realized that suspend2 is fundamentally more complex than
> > swsusp: it introduces additional time period where page cache must not
> > be touched. I did not realize this sooner.
> 
> Sorry. I said it in so many ways! It's not really an issue though;
> processes are stopped and suspend's own I/O doesn't touch page
> cache.

Yes, I know it should work, it is just more things that need to be
verified.

> > Now, there are some parts of swsusp that are not quite okay. One of
> > them is refrigerator -- it fails (in non-critical way but still) in
> > some cases where it should not fail. suspend2 seems to have this
> > solved, and I'd like to merge its refrigerator.
> 
> I'll submit a patch. I need to look at how you use the refrigerator
> first. I refrigerate processes prior to resuming as well, and might need
> to adjust things if you don't do that. I also need to check it will work
> okay with S3.

Actually, we are not stopping processes prior to resuming, but I'd
call it a bug to be fixed.
								Pavel
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC] Fix Device Power Management States
  2004-08-11  1:47 ` Todd Poynor
@ 2004-08-12 22:03   ` Russell King
  0 siblings, 0 replies; 42+ messages in thread
From: Russell King @ 2004-08-12 22:03 UTC (permalink / raw)
  To: Todd Poynor; +Cc: Patrick Mochel, linux-kernel, pavel, benh, david-b

On Tue, Aug 10, 2004 at 06:47:38PM -0700, Todd Poynor wrote:
> Divorcing device power states from ACPI-defined integers would be very 
> nice for embedded platforms.  Usually the platform bus would be 
> involved.  Since the proposal tends to place more responsibility on the 
> bus driver, I'm interested in the intended usage for platform devices. 
> For example, are platform_device callbacks for 
> suspend/resume/save/restore of the particular device still needed?  How 
> does the platform bus driver map (platform-specific?) system states to 
> device states?

platform devices are still a reminance of the previous PM model
incarnation, and where merely patched over to make them work again
in todays model.

It would be nice to have a proper platform_device_driver structure
so we could finally kill the otherwise unused PM methods in struct
device_driver and finally get rid of the old PM model.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 PCMCIA      - http://pcmcia.arm.linux.org.uk/
                 2.6 Serial core

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2004-08-12 22:06 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-08-09 10:43 [RFC] Fix Device Power Management States Patrick Mochel
2004-08-09 11:38 ` Pavel Machek
2004-08-09 16:02   ` Patrick Mochel
2004-08-09 21:29     ` Pavel Machek
2004-08-10  5:03       ` Patrick Mochel
2004-08-10  9:43         ` Nigel Cunningham
2004-08-10 10:20           ` Pavel Machek
2004-08-10 22:33             ` Nigel Cunningham
2004-08-10 13:58           ` Patrick Mochel
2004-08-10 22:29             ` Nigel Cunningham
2004-08-10 22:56               ` Patrick Mochel
2004-08-10 23:09                 ` Nigel Cunningham
2004-08-10 23:36                   ` suspend2 merge [was Re: [RFC] Fix Device Power Management States] Pavel Machek
2004-08-11  0:04                     ` Arkadiusz Miskiewicz
2004-08-11  5:05                     ` Nigel Cunningham
2004-08-11  9:13                       ` Pavel Machek
2004-08-10 10:13         ` [RFC] Fix Device Power Management States Pavel Machek
2004-08-10 18:36           ` David Brownell
2004-08-10 20:36             ` Pavel Machek
2004-08-10 22:42             ` Patrick Mochel
2004-08-09 22:15     ` Nigel Cunningham
2004-08-10  0:43     ` Benjamin Herrenschmidt
2004-08-10  9:00       ` Russell King
2004-08-10 10:08       ` Pavel Machek
2004-08-10  0:40 ` Benjamin Herrenschmidt
2004-08-10  4:55   ` Patrick Mochel
2004-08-10  6:52     ` Benjamin Herrenschmidt
2004-08-10 10:07     ` Pavel Machek
2004-08-10 14:28       ` Patrick Mochel
2004-08-10 17:56         ` Pavel Machek
2004-08-10 22:41           ` Patrick Mochel
2004-08-10 23:10             ` Pavel Machek
2004-08-10 23:14             ` [patch] Smaller goal first: fix confusion [was Re: [RFC] Fix Device Power Management States] Pavel Machek
2004-08-11  1:02             ` [RFC] Fix Device Power Management States Benjamin Herrenschmidt
2004-08-10 19:41     ` David Brownell
2004-08-10 22:44       ` Patrick Mochel
2004-08-10 10:33 ` Matthew Garrett
2004-08-10 14:36   ` Patrick Mochel
2004-08-10 19:18 ` David Brownell
2004-08-10 20:50   ` Pavel Machek
2004-08-11  1:47 ` Todd Poynor
2004-08-12 22:03   ` Russell King

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.