* Re: So, what's the status on the recent patches here?
@ 2006-08-25 20:05 Woodruff, Richard
  2006-08-25 20:08 ` Pavel Machek
  0 siblings, 1 reply; 136+ messages in thread
From: Woodruff, Richard @ 2006-08-25 20:05 UTC (permalink / raw)
  To: Pavel Machek; +Cc: linux-pm

> > > For notebooks, devices *are* islands. powerop tries to push
> > > everything-depends-on-everything model that may be good for some SoC,
> > > but sucks for notebooks. We need some middle ground.
> >
> > USB being enabled and causing your laptop battery to dry up is a case
> > where laptop device dependency has been shown.  There are likely many
> > more cases.  I would expect BIOS/chip set developers are all too aware
> > of these in their sub-domains.
> 
> No, it is because USB enabled prevents cpu from sleeping; it is
> actually well known.

I vaguely recall hearing why.  It has some DMA transfers going on,
and I suppose the processor must service the completions.

Now, if you coordinated with the USB device somehow, you could try to
place the USB bus into suspend mode and only wake it up on USB remote
wakeup or when there is data to be sent; then you could likely spend a
lot more time in a lower P state.

How are you to know when it is OK to shut off the USB bus?  Is that
something which could be coordinated with the processor and the active
use case?  If I don't need high performance, I could go in and out of
suspend to save power.  Knowing whether we want high or low performance
helps in this case :)
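Richard's suggestion can be made concrete as a policy check. The sketch below is hypothetical: `enum perf_level`, `struct usb_bus_activity` and `may_suspend_usb_bus()` are illustrative names, not existing kernel or USB core APIs. It only models the decision itself: suspend the bus when the system has asked for low performance, nothing is queued, and the bus has been idle past a threshold, leaving resume to remote wakeup or new outgoing data.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical system-wide performance hint (assumption, not a real API). */
enum perf_level { PERF_LOW, PERF_HIGH };

/* Hypothetical snapshot of a USB host controller's activity. */
struct usb_bus_activity {
	unsigned int pending_transfers; /* queued transfers not yet completed */
	unsigned int idle_ms;           /* time since the last completed transfer */
};

/*
 * Decide whether the bus may be put into USB suspend. In high-performance
 * mode the bus is left running to keep latency low; in low-performance mode
 * it may be suspended once it has been idle long enough. Resuming (remote
 * wakeup, new outgoing data) is outside the scope of this sketch.
 */
static bool may_suspend_usb_bus(const struct usb_bus_activity *bus,
				enum perf_level perf,
				unsigned int idle_threshold_ms)
{
	assert(bus);
	if (perf == PERF_HIGH)
		return false;
	if (bus->pending_transfers > 0)
		return false;
	return bus->idle_ms >= idle_threshold_ms;
}
```

The performance hint is the only input that comes from outside the USB stack; everything else is state the host controller driver already has.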

Regards,
Richard W.

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: So, what's the status on the recent patches here?
  2006-08-25 20:05 So, what's the status on the recent patches here? Woodruff, Richard
@ 2006-08-25 20:08 ` Pavel Machek
  0 siblings, 0 replies; 136+ messages in thread
From: Pavel Machek @ 2006-08-25 20:08 UTC (permalink / raw)
  To: Woodruff, Richard; +Cc: linux-pm

Hi!

> > No, it is because USB enabled prevents cpu from sleeping; it is
> > actually well known.
> 
> I vaguely recall hearing the why.  It has some DMAs which are going on
> and I suppose the processor must service the completions.
> Now, if you coordinated with the USB device somehow, you could try and

If I coordinated with the USB device somehow, I'd know when it is possible
to shut off the USB bus. This can be done locally in the USB driver, no need
for a big framework. Someone just needs to write that code.
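Pavel's point is that the decision can live entirely inside the driver. A minimal sketch of such a local policy, assuming a driver-owned periodic tick; `struct local_autosuspend`, `autosuspend_tick()` and `autosuspend_note_activity()` are hypothetical names, and no system-wide performance hint is consulted:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-driver state; nothing outside the driver touches it. */
struct local_autosuspend {
	unsigned int idle_ticks;     /* consecutive ticks with no I/O */
	unsigned int timeout_ticks;  /* suspend after this many idle ticks */
	bool suspended;
};

/* Called by the driver whenever it submits or completes I/O. */
static void autosuspend_note_activity(struct local_autosuspend *as)
{
	as->idle_ticks = 0;
	as->suspended = false;  /* I/O implies the bus has been resumed */
}

/*
 * Called from a periodic timer owned by the driver itself. Returns true
 * exactly once per idle period: on the tick where the driver should
 * suspend the bus.
 */
static bool autosuspend_tick(struct local_autosuspend *as)
{
	assert(as->timeout_ticks > 0);
	if (as->suspended)
		return false;
	if (++as->idle_ticks < as->timeout_ticks)
		return false;
	as->suspended = true;
	return true;
}
```

Nothing here depends on other devices or on the CPU operating point; the trade-off is that the driver alone chooses the timeout.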
								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

* Re: So, what's the status on the recent patches here?
  2006-09-11 15:11                                           ` David Singleton
  2006-09-11 17:14                                             ` Pavel Machek
@ 2006-09-11 18:58                                             ` Matthew Locke
  1 sibling, 0 replies; 136+ messages in thread
From: Matthew Locke @ 2006-09-11 18:58 UTC (permalink / raw)
  To: David Singleton; +Cc: Preece Scott-PREECE, linux-pm, Pavel Machek


On Sep 11, 2006, at 8:11 AM, David Singleton wrote:

> On 9/9/06, Pavel Machek <pavel@ucw.cz> wrote:
>> On Fri 08-09-06 17:39:52, David Singleton wrote:
>>> On 9/3/06, Pavel Machek <pavel@ucw.cz> wrote:
>>>>> And those same steps are the same steps required to transition the
>>>>> system to a new operating point, whether it's suspend or change
>>>>> from 1.4GHz to 600MHz.
>>>>
>>>> No, processes are not frozen for simple cpu frequency change -- on
>>>> non-broken cpus.
>>>
>>> I didn't say cpu frequency changes freeze processes.  I said a suspend
>>> does a prepare to suspend step (which freezes processes) and a cpu frequency
>>> change does a prepare to change frequency step (where it will run the driver
>>> notifier list to get drivers set to scale).
>>
>> Yep, and switching consoles is also the same. It is prepare to switch, do
>> a switch, notify people you switched. Shall we use the same code?
>
> No, because switching consoles does not change the operating
> state of the system.  It just changes which device is the console.
>
>>
>>> They both do the same three steps:
>>>
>>> 1) prepare to transition
>>>
>>> 2) transition
>>>
>>> 3) finish transition
>>>
>>> That's one of my arguments as to why suspend states should be treated
>>> just like frequency states.
>>
>> A cat and a horse are the same animal. They both have 4 legs, one head
>> and a tail.
>>
>> Anyway, as a software suspend maintainer, I do not want you to add
>> non-sleeping states to /sys/power/state. I will NAK any attempt to do
>> so. Please find more suitable interface.
>
> You are right.  The power/state files for devices are for suspend
> states as well.  I'll find a different interface for operating states and
> leave suspend states in /sys/power/state.

You are just duplicating work that is already done in PowerOP.  I still
don't understand why you are redoing all the work Eugeny and I have
done.  Nothing in our PowerOP interface prevents you from building on
top of it to achieve your goals.

>
> David
>>
>>                                                         Pavel
>> --
>> Thanks for all the (sleeping) penguins.
>>
>

* Re: So, what's the status on the recent patches here?
  2006-09-11 15:11                                           ` David Singleton
@ 2006-09-11 17:14                                             ` Pavel Machek
  2006-09-11 18:58                                             ` Matthew Locke
  1 sibling, 0 replies; 136+ messages in thread
From: Pavel Machek @ 2006-09-11 17:14 UTC (permalink / raw)
  To: David Singleton; +Cc: linux-pm, Matthew Locke, Preece Scott-PREECE

Hi!

> >A cat and a horse are the same animal. They both have 4 legs, one head
> >and a tail.
> >
> >Anyway, as a software suspend maintainer, I do not want you to add
> >non-sleeping states to /sys/power/state. I will NAK any attempt to do
> >so. Please find more suitable interface.
> 
> You are right.  The power/state files for devices are for suspend
> states as well.  I'll find a different interface for operating states and
> leave suspend states in /sys/power/state.

Thanks!
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

* Re: So, what's the status on the recent patches here?
  2006-09-09 12:17                                         ` Pavel Machek
@ 2006-09-11 15:11                                           ` David Singleton
  2006-09-11 17:14                                             ` Pavel Machek
  2006-09-11 18:58                                             ` Matthew Locke
  0 siblings, 2 replies; 136+ messages in thread
From: David Singleton @ 2006-09-11 15:11 UTC (permalink / raw)
  To: Pavel Machek; +Cc: linux-pm, Matthew Locke, Preece Scott-PREECE

On 9/9/06, Pavel Machek <pavel@ucw.cz> wrote:
> On Fri 08-09-06 17:39:52, David Singleton wrote:
> > On 9/3/06, Pavel Machek <pavel@ucw.cz> wrote:
> > > > And those same steps are the same steps required to transition the
> > > > system to a new operating point, whether it's suspend or change
> > > > from 1.4GHz to 600MHz.
> > >
> > > No, processes are not frozen for simple cpu frequency change -- on
> > > non-broken cpus.
> >
> > I didn't say cpu frequency changes freeze processes.  I said a suspend
> > does a prepare to suspend step (which freezes processes) and a cpu frequency
> > change does a prepare to change frequency step (where it will run the driver
> > notifier list to get drivers set to scale).
>
> Yep, and switching consoles is also the same. It is prepare to switch, do
> a switch, notify people you switched. Shall we use the same code?

No, because switching consoles does not change the operating
state of the system.  It just changes which device is the console.

>
> > They both do the same three steps:
> >
> > 1) prepare to transition
> >
> > 2) transition
> >
> > 3) finish transition
> >
> > That's one of my arguments as to why suspend states should be treated
> > just like frequency states.
>
> A cat and a horse are the same animal. They both have 4 legs, one head
> and a tail.
>
> Anyway, as a software suspend maintainer, I do not want you to add
> non-sleeping states to /sys/power/state. I will NAK any attempt to do
> so. Please find more suitable interface.

You are right.  The power/state files for devices are for suspend
states as well.  I'll find a different interface for operating states and
leave suspend states in /sys/power/state.

David
>
>                                                         Pavel
> --
> Thanks for all the (sleeping) penguins.
>

* Re: So, what's the status on the recent patches here?
  2006-09-09  0:48                                         ` David Singleton
@ 2006-09-09 16:13                                           ` Pavel Machek
  0 siblings, 0 replies; 136+ messages in thread
From: Pavel Machek @ 2006-09-09 16:13 UTC (permalink / raw)
  To: David Singleton; +Cc: linux-pm

Hi!

On Fri 08-09-06 17:48:13, David Singleton wrote:
>        I've been off doing real work for a while but 
>        I've finally got some time
>        to work on SMP and device constraints for the 
>        oppoint patch set.

Could you teach your mail client not to insert tabs at the beginning of
lines, and inline your patches?

>        Once I get SMP working well I'm going to spend 
>        some time thinking
>        about suspend operating states.  The pm_ops 
>        structure looks
>        suspiciously like a subset of the oppoint 
>        structure.  If I can

Just because a is a subset of b does not mean that merging a and b is a
good idea.
							Pavel
-- 
Thanks for all the (sleeping) penguins.

* Re: So, what's the status on the recent patches here?
  2006-09-09  0:39                                       ` David Singleton
  2006-09-09  0:48                                         ` David Singleton
@ 2006-09-09 12:17                                         ` Pavel Machek
  2006-09-11 15:11                                           ` David Singleton
  1 sibling, 1 reply; 136+ messages in thread
From: Pavel Machek @ 2006-09-09 12:17 UTC (permalink / raw)
  To: David Singleton; +Cc: linux-pm, Matthew Locke, Preece Scott-PREECE

On Fri 08-09-06 17:39:52, David Singleton wrote:
> On 9/3/06, Pavel Machek <pavel@ucw.cz> wrote:
> > > And those same steps are the same steps required to transition the
> > > system to a new operating point, whether it's suspend or change
> > > from 1.4GHz to 600MHz.
> >
> > No, processes are not frozen for simple cpu frequency change -- on
> > non-broken cpus.
> 
> I didn't say cpu frequency changes freeze processes.  I said a suspend
> does a prepare to suspend step (which freezes processes) and a cpu frequency
> change does a prepare to change frequency step (where it will run the driver
> notifier list to get drivers set to scale).

Yep, and switching consoles is also the same. It is prepare to switch, do
a switch, notify people you switched. Shall we use the same code?

> They both do the same three steps:
> 
> 1) prepare to transition
> 
> 2) transition
> 
> 3) finish transition
> 
> That's one of my arguments as to why suspend states should be treated
> just like frequency states.

A cat and a horse are the same animal. They both have 4 legs, one head
and a tail.

Anyway, as a software suspend maintainer, I do not want you to add
non-sleeping states to /sys/power/state. I will NAK any attempt to do
so. Please find more suitable interface.

							Pavel
-- 
Thanks for all the (sleeping) penguins.

* Re: So, what's the status on the recent patches here?
  2006-09-09  0:39                                       ` David Singleton
@ 2006-09-09  0:48                                         ` David Singleton
  2006-09-09 16:13                                           ` Pavel Machek
  2006-09-09 12:17                                         ` Pavel Machek
  1 sibling, 1 reply; 136+ messages in thread
From: David Singleton @ 2006-09-09  0:48 UTC (permalink / raw)
  To: linux-pm

[-- Attachment #1: Type: text/plain, Size: 3098 bytes --]

        I've been off doing real work for a while but I've finally got some time
        to work on SMP and device constraints for the oppoint patch set.

        I hope to have SMP tested in the next few days and I have
        device constraints working on the arm-pxa27x platform.


        The latest patch set is running MPEG movies while scaling
        frequencies, either manually or with the oppointd power manager
        daemon, and the video runs correctly during frequency scaling
        operations.

        The device constraints design has also simplified the core patch.
        I no longer have to touch device.h, driver.c, resume.c, or suspend.c.

        The only files in the core framework patch now are:

                kernel/power/main.c
                include/linux/pm.h
                kernel/power/power.h
                drivers/base/power/Makefile
                drivers/base/power/oppoint.c


        Constraints Design:

        Constraints are a device attribute.  A device must determine if
        the new target operating point will cause the device to fail
        to operate correctly.  An example is an LCD device whose driving
        clock would be set too low to operate correctly.

        In the OpPoint design device constraints are checked in the
        prepare_transition routines of each device.  When the transition
        notifier list is run for driver scaling functions the PRECHANGE
        operation will check for constraint conditions and return -EINVAL
        to signify the operating point would put the device into an illegal
        or non-functional state.

        The PRECHANGE function will also check whether the device has
        been suspended, in which case constraint checks are not needed.

        If the prepare_transition routine returns -EINVAL the new target
        operating point will not be set.
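The constraint check described above can be modelled outside the kernel as a simple walk over devices. This is a sketch under stated assumptions: `struct dev_constraint` and `check_constraints()` are hypothetical, and comparing the target CPU frequency directly against a per-device minimum is a simplification (a device clock is usually derived from, not equal to, the CPU clock). Like the PRECHANGE step, it returns -EINVAL to veto the operating point and skips suspended devices:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device constraint, e.g. an LCD whose clock must stay
 * at or above min_khz to keep driving the panel. */
struct dev_constraint {
	const char *name;
	unsigned int min_khz;
	bool suspended;
};

/*
 * PRECHANGE-style check: reject the target operating point if any active
 * device would be clocked below its minimum. Suspended devices are skipped,
 * so a vetoed point becomes legal once the constrained device is suspended.
 */
static int check_constraints(const struct dev_constraint *devs, size_t n,
			     unsigned int target_khz)
{
	assert(devs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (devs[i].suspended)
			continue;
		if (target_khz < devs[i].min_khz)
			return -EINVAL;
	}
	return 0;
}
```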

        Once I get SMP working well I'm going to spend some time thinking
        about suspend operating states.  The pm_ops structure looks
        suspiciously like a subset of the oppoint structure.  If I can
        figure out a way to truly make all suspend states normal
        operating points I believe the enter_state and pm_change_state
        functions can be merged into a single function that handles both
        changes to suspend states and changes to new frequency or
        new voltage states.
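A merged enter_state/pm_change_state would rely on the same three steps (prepare, transition, finish) for both kinds of change. A generic sketch follows; `struct op`, `change_op()` and `reject_step()` are hypothetical names. The rollback rule follows pm_change_state() in the attached patch (on a failed transition, finish runs toward the current point), while treating a NULL step as a no-op is an assumption of this sketch, since the patch calls the callbacks unconditionally:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct op;
typedef int (*op_step_fn)(struct op *cur, struct op *next);

/* Hypothetical operating point: a name plus the three transition steps. */
struct op {
	const char *name;
	op_step_fn prepare;  /* e.g. freeze tasks, or run PRECHANGE notifiers */
	op_step_fn enter;    /* perform the actual state or clock change */
	op_step_fn finish;   /* e.g. thaw tasks, or run POSTCHANGE notifiers */
};

/* A missing step is treated as a successful no-op. */
static int run_step(op_step_fn fn, struct op *cur, struct op *next)
{
	return fn ? fn(cur, next) : 0;
}

/* Example step that always refuses, standing in for a failed constraint
 * check or a failed clock change. */
static int reject_step(struct op *cur, struct op *next)
{
	(void)cur;
	(void)next;
	return -EINVAL;
}

/*
 * Generic prepare/transition/finish sequence. If the transition fails,
 * finish is run with the current point as the target so the system is
 * restored; *cur is updated only when every step succeeds.
 */
static int change_op(struct op **cur, struct op *next)
{
	struct op *target = next;
	int error, fin;

	assert(cur && *cur && next);
	if (*cur == next)
		return 0;

	error = run_step(next->prepare, *cur, next);
	if (error)
		return error;

	error = run_step(next->enter, *cur, next);
	if (error)
		target = *cur;  /* roll back: finish toward the old point */

	fin = run_step(target->finish, *cur, target);
	if (!error)
		error = fin;
	if (!error)
		*cur = next;
	return error;
}
```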

        I also want to order the suspend states according to their
        power consumption levels to give the power manager better
        control over which suspend states will save the most power.
        I think it will be something along the lines of the suspend
        state names signifying an ordering, the same way I do
        for frequency and voltage operating points.
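One way to order suspend states by power consumption is to sort them by an estimated power draw. A sketch assuming a hypothetical `power_uw` field, which the oppoint structure in the patch does not have; `struct sleep_state` and `order_sleep_states()` are illustrative names:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical suspend-state descriptor; power_uw is an estimated power
 * draw in microwatts (an assumption, not a field of struct oppoint). */
struct sleep_state {
	const char *name;
	unsigned long power_uw;
	unsigned int latency_us;  /* resume latency, used as a tie-breaker */
};

static int cmp_sleep_state(const void *pa, const void *pb)
{
	const struct sleep_state *a = pa, *b = pb;

	if (a->power_uw != b->power_uw)
		return (a->power_uw > b->power_uw) - (a->power_uw < b->power_uw);
	/* Equal power: prefer the state that wakes up faster. */
	return (a->latency_us > b->latency_us) - (a->latency_us < b->latency_us);
}

/* Sort from lowest to highest power draw, so index 0 is the state that
 * saves the most power. */
static void order_sleep_states(struct sleep_state *states, size_t n)
{
	assert(states || n == 0);
	qsort(states, n, sizeof(*states), cmp_sleep_state);
}
```

Resume latency only breaks ties here; a power manager might also weigh latency against the expected idle time before picking the deepest state.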


        Again the full patch set is at:

        http://source.mvista.com/~dsingleton/2.6.18-rc6

        The power manager is at:

        http://source.mvista.com/~dsingleton/oppointd

        Comments and flamages that improve the design and code
        are always welcome.

        Attached is the core patch.

David

[-- Attachment #2: oppoint-core.patch --]
[-- Type: application/octet-stream, Size: 17733 bytes --]


Signed-Off-by: David Singleton <dsingleton@mvista.com>

 drivers/base/power/Makefile  |    2 
 drivers/base/power/oppoint.c |   74 +++++++++
 include/linux/pm.h           |   34 ++++
 kernel/power/main.c          |  328 +++++++++++++++++++++++++++++++++++++------
 kernel/power/power.h         |    2 
 5 files changed, 396 insertions(+), 44 deletions(-)

Index: linux-2.6.17/kernel/power/main.c
===================================================================
--- linux-2.6.17.orig/kernel/power/main.c
+++ linux-2.6.17/kernel/power/main.c
@@ -16,6 +16,7 @@
 #include <linux/init.h>
 #include <linux/pm.h>
 #include <linux/console.h>
+#include <linux/module.h>
 
 #include "power.h"
 
@@ -49,7 +50,7 @@ void pm_set_ops(struct pm_ops * ops)
  *	the platform can enter the requested state.
  */
 
-static int suspend_prepare(suspend_state_t state)
+static int suspend_prepare(struct oppoint * state)
 {
 	int error = 0;
 	unsigned int free_pages;
@@ -82,7 +83,7 @@ static int suspend_prepare(suspend_state
 	}
 
 	if (pm_ops->prepare) {
-		if ((error = pm_ops->prepare(state)))
+		if ((error = pm_ops->prepare(state->type)))
 			goto Thaw;
 	}
 
@@ -94,7 +95,7 @@ static int suspend_prepare(suspend_state
 	return 0;
  Finish:
 	if (pm_ops->finish)
-		pm_ops->finish(state);
+		pm_ops->finish(state->type);
  Thaw:
 	thaw_processes();
  Enable_cpu:
@@ -104,7 +105,7 @@ static int suspend_prepare(suspend_state
 }
 
 
-int suspend_enter(suspend_state_t state)
+int suspend_enter(struct oppoint * state)
 {
 	int error = 0;
 	unsigned long flags;
@@ -115,7 +116,7 @@ int suspend_enter(suspend_state_t state)
 		printk(KERN_ERR "Some devices failed to power down\n");
 		goto Done;
 	}
-	error = pm_ops->enter(state);
+	error = pm_ops->enter(state->type);
 	device_power_up();
  Done:
 	local_irq_restore(flags);
@@ -131,36 +132,98 @@ int suspend_enter(suspend_state_t state)
  *	console that we've allocated. This is not called for suspend-to-disk.
  */
 
-static void suspend_finish(suspend_state_t state)
+static void suspend_finish(struct oppoint * state)
 {
 	device_resume();
 	resume_console();
 	thaw_processes();
 	enable_nonboot_cpus();
 	if (pm_ops && pm_ops->finish)
-		pm_ops->finish(state);
+		pm_ops->finish(state->type);
 	pm_restore_console();
 }
 
 
+struct oppoint *current_state;
+struct oppoint pm_states = {
+	.name = "default",
+	.type = PM_SUSPEND_ON,
+};
+
+static struct oppoint standby = {
+	.name = "standby",
+	.type = PM_SUSPEND_STANDBY,
+};
+struct oppoint *oppoint_standby;
 
+static struct oppoint mem = {
+	.name = "mem",
+	.type = PM_SUSPEND_MEM,
+	.frequency = 0,
+	.voltage = 0,
+	.latency = 150,
+};
+struct oppoint *oppoint_mem;
 
-static const char * const pm_states[PM_SUSPEND_MAX] = {
-	[PM_SUSPEND_STANDBY]	= "standby",
-	[PM_SUSPEND_MEM]	= "mem",
 #ifdef CONFIG_SOFTWARE_SUSPEND
-	[PM_SUSPEND_DISK]	= "disk",
-#endif
+struct oppoint disk = {
+	.name = "disk",
+	.type = PM_SUSPEND_DISK,
 };
+#endif
 
-static inline int valid_state(suspend_state_t state)
+/*
+ * pm_change_state - move the system to a new frequency or voltage operating point.
+ */
+static int pm_change_state(struct oppoint *state)
+{
+	int error = 0;
+
+	printk("OpPoint: changing from %s to %s\n", current_state->name,
+	     state->name);
+	/*
+	 * compare to current operating point.
+	 * if different change to new operating point.
+	 */
+	if (current_state == state)
+		goto out;
+
+	/*
+	 * prepare_transition does device constraint checking.  If
+	 * a new operating point will put a device in an unsupported
+	 * state, lcd clock too low, NIC bus too low, etc.  the new state
+	 * cannot be entered (until the constrained device is suspended).
+	 * If prepare_transition fails we don't go to the new operating
+	 * point.
+	 */
+	if ((error = state->prepare_transition(current_state, state)))
+		goto out;
+
+	/*
+	 * if the transition fails we call the finish transition
+	 * with the current state as the new state, causing
+	 * the finish to return to the current_state.
+	 */
+
+	if ((error = state->transition(current_state, state)))
+		state = current_state;
+
+	if ((state->finish_transition(current_state, state)) == 0)
+		current_state = state;
+
+out:
+	printk("OpPoint: State change returned %d\n", error);
+	return error;
+}
+
+static inline int valid_state(struct oppoint * state)
 {
 	/* Suspend-to-disk does not really need low-level support.
 	 * It can work with reboot if needed. */
-	if (state == PM_SUSPEND_DISK)
+	if (state->type == PM_SUSPEND_DISK)
 		return 1;
 
-	if (pm_ops && pm_ops->valid && !pm_ops->valid(state))
+	if (pm_ops && pm_ops->valid && !pm_ops->valid(state->type))
 		return 0;
 	return 1;
 }
@@ -168,7 +231,7 @@ static inline int valid_state(suspend_st
 
 /**
  *	enter_state - Do common work of entering low-power state.
- *	@state:		pm_state structure for state we're entering.
+ *	@state:		oppoint structure for state we're entering.
  *
  *	Make sure we're the only ones trying to enter a sleep state. Fail
  *	if someone has beat us to it, since we don't want anything weird to
@@ -177,7 +240,7 @@ static inline int valid_state(suspend_st
  *	we've woken up).
  */
 
-static int enter_state(suspend_state_t state)
+static int enter_state(struct oppoint *state)
 {
 	int error;
 
@@ -186,16 +249,21 @@ static int enter_state(suspend_state_t s
 	if (down_trylock(&pm_sem))
 		return -EBUSY;
 
-	if (state == PM_SUSPEND_DISK) {
+	if (state->type == PM_SUSPEND_DISK) {
 		error = pm_suspend_disk();
 		goto Unlock;
 	}
 
-	pr_debug("PM: Preparing system for %s sleep\n", pm_states[state]);
+	if (state->type == PM_FREQ_CHANGE || state->type == PM_VOLT_CHANGE) {
+		error = pm_change_state(state);
+		goto Unlock;
+	}
+
+	pr_debug("PM: Preparing system for %s sleep\n", state->name);
 	if ((error = suspend_prepare(state)))
 		goto Unlock;
 
-	pr_debug("PM: Entering %s sleep\n", pm_states[state]);
+	pr_debug("PM: Entering %s sleep\n", state->name);
 	error = suspend_enter(state);
 
 	pr_debug("PM: Finishing wakeup.\n");
@@ -211,7 +279,15 @@ static int enter_state(suspend_state_t s
  */
 int software_suspend(void)
 {
-	return enter_state(PM_SUSPEND_DISK);
+	struct oppoint *this, *next;
+	struct list_head *head = &pm_states.list;
+	int error = 0;
+
+	list_for_each_entry_safe(this, next, head, list) {
+		if (this->type == PM_SUSPEND_DISK)
+			error = enter_state(this);
+	}
+	return error;
 }
 
 
@@ -223,9 +299,9 @@ int software_suspend(void)
  *	structure, and enter (above).
  */
 
-int pm_suspend(suspend_state_t state)
+int pm_suspend(struct oppoint * state)
 {
-	if (state > PM_SUSPEND_ON && state <= PM_SUSPEND_MAX)
+	if (state->type > PM_SUSPEND_ON && state->type <= PM_SUSPEND_MAX)
 		return enter_state(state);
 	return -EINVAL;
 }
@@ -248,36 +324,29 @@ decl_subsys(power,NULL,NULL);
 
 static ssize_t state_show(struct subsystem * subsys, char * buf)
 {
-	int i;
 	char * s = buf;
 
-	for (i = 0; i < PM_SUSPEND_MAX; i++) {
-		if (pm_states[i] && valid_state(i))
-			s += sprintf(s,"%s ", pm_states[i]);
-	}
-	s += sprintf(s,"\n");
+	s += sprintf(s,"%s\n", current_state->name);
 	return (s - buf);
 }
 
 static ssize_t state_store(struct subsystem * subsys, const char * buf, size_t n)
 {
-	suspend_state_t state = PM_SUSPEND_STANDBY;
-	const char * const *s;
+	struct oppoint *this, *next;
+	struct list_head *head = &pm_states.list;
 	char *p;
-	int error;
+	int error = -EINVAL;
 	int len;
 
 	p = memchr(buf, '\n', n);
 	len = p ? p - buf : n;
-
-	for (s = &pm_states[state]; state < PM_SUSPEND_MAX; s++, state++) {
-		if (*s && !strncmp(buf, *s, len))
+	list_for_each_entry_safe(this, next, head, list) {
+		if ((strlen(this->name) == len) &&
+		   (!strncmp(this->name, buf, len))) {
+			error = enter_state(this);
 			break;
+		}
 	}
-	if (state < PM_SUSPEND_MAX && *s)
-		error = enter_state(state);
-	else
-		error = -EINVAL;
 	return error ? error : n;
 }
 
@@ -292,12 +361,191 @@ static struct attribute_group attr_group
 	.attrs = g,
 };
 
+static struct kobject oppoint_kobj = {
+        .kset = &power_subsys.kset,
+};
+
+struct oppoint_attribute {
+        struct attribute        attr;
+        ssize_t (*show)(struct kobject * kobj, char * buf);
+        ssize_t (*store)(struct kobject * kobj, const char * buf, size_t count);
+};
+
+#define to_oppoint(obj) container_of(obj,struct oppoint,kobj)
+#define to_oppoint_attr(_attr) container_of(_attr,struct oppoint_attribute,attr)
+/*
+ * the frequency, voltage and latency files are readonly
+ */
+
+static ssize_t oppoint_voltage_show(struct kobject * kobj, char * buf)
+{
+	ssize_t len;
+	struct oppoint *opt = to_oppoint(kobj);
+
+	len = sprintf(buf, "%8d\n", opt->voltage);
+
+	return len;
+}
+
+static ssize_t oppoint_voltage_store(struct kobject * kobj, const char * buf,
+	size_t n)
+{
+        return -EINVAL;
+
+}
+
+static ssize_t oppoint_frequency_show(struct kobject * kobj, char * buf)
+{
+	ssize_t len;
+	struct oppoint *opt = to_oppoint(kobj);
+
+	len = sprintf(buf, "%8d\n", opt->frequency);
+
+	return len;
+}
+
+static ssize_t oppoint_frequency_store(struct kobject * kobj,
+	 const char * buf, size_t n)
+{
+        return -EINVAL;
+
+}
+
+static ssize_t oppoint_latency_show(struct kobject * kobj, char * buf)
+{
+	ssize_t len;
+	struct oppoint *opt = to_oppoint(kobj);
+
+	len = sprintf(buf, "%8d\n", opt->latency);
+
+	return len;
+}
+
+static ssize_t oppoint_latency_store(struct kobject * kobj,
+	 const char * buf, size_t n)
+{
+        return -EINVAL;
+
+}
+
+static struct oppoint_attribute frequency_attr = {
+        .attr   = {
+                .name = "frequency",
+                .mode = 0400,
+        },
+        .show   = oppoint_frequency_show,
+        .store  = oppoint_frequency_store,
+};
+
+static struct oppoint_attribute voltage_attr = {
+        .attr   = {
+                .name = "voltage",
+                .mode = 0400,
+        },
+        .show   = oppoint_voltage_show,
+        .store  = oppoint_voltage_store,
+};
+
+static struct oppoint_attribute latency_attr = {
+        .attr   = {
+                .name = "latency",
+                .mode = 0400,
+        },
+        .show   = oppoint_latency_show,
+        .store  = oppoint_latency_store,
+};
+
+static ssize_t
+oppoint_attr_show(struct kobject * kobj, struct attribute * attr, char * buf)
+{
+	struct oppoint_attribute * opt_attr = to_oppoint_attr(attr);
+	ssize_t ret = 0;
+
+	if (opt_attr->show)
+		ret = opt_attr->show(kobj,buf);
+	return ret;
+}
+
+static ssize_t
+oppoint_attr_store(struct kobject * kobj, struct attribute * attr,
+	      const char * buf, size_t count)
+{
+	return -EINVAL;
+}
+
+static void oppoint_kobj_release(struct kobject *kobj)
+{
+	return;
+}
+
+static struct sysfs_ops oppoint_sysfs_ops = {
+	.show	= oppoint_attr_show,
+	.store	= oppoint_attr_store,
+};
+
+static struct attribute * oppoint_default_attrs[] = {
+	&frequency_attr.attr,
+	&voltage_attr.attr,
+	&latency_attr.attr,
+	NULL,
+};
+
+static struct kobj_type ktype_operating_point = {
+        .release        = oppoint_kobj_release,
+        .sysfs_ops      = &oppoint_sysfs_ops,
+        .default_attrs  = oppoint_default_attrs,
+};
+
+int unregister_operating_point(struct oppoint *opt)
+{
+	down(&pm_sem);
+	list_del_init(&opt->list);
+	sysfs_remove_file(&opt->kobj, &frequency_attr.attr);
+	sysfs_remove_file(&opt->kobj, &voltage_attr.attr);
+	sysfs_remove_file(&opt->kobj, &latency_attr.attr);
+	up(&pm_sem);
+}
+EXPORT_SYMBOL(unregister_operating_point);
+
+int register_operating_point(struct oppoint *opt)
+{
+	down(&pm_sem);
+	kobject_set_name(&opt->kobj, opt->name);
+	opt->kobj.kset = &power_subsys.kset;
+	opt->kobj.parent = &oppoint_kobj;
+	opt->kobj.ktype = &ktype_operating_point;
+	kobject_register(&opt->kobj);
+
+	sysfs_create_file(&opt->kobj, &frequency_attr.attr);
+	sysfs_create_file(&opt->kobj, &voltage_attr.attr);
+	sysfs_create_file(&opt->kobj, &latency_attr.attr);
+
+	list_add_tail(&opt->list, &pm_states.list);
+	up(&pm_sem);
+	return 0;
+}
+EXPORT_SYMBOL(register_operating_point);
 
 static int __init pm_init(void)
 {
+
 	int error = subsystem_register(&power_subsys);
-	if (!error)
+	if (!error) {
 		error = sysfs_create_group(&power_subsys.kset.kobj,&attr_group);
+		kobject_set_name(&oppoint_kobj, "operating_points");
+		kobject_register(&oppoint_kobj);
+	}
+
+
+	INIT_LIST_HEAD(&pm_states.list);
+
+#ifdef CONFIG_SOFTWARE_SUSPEND
+	register_operating_point(&disk);
+#endif
+	register_operating_point(&mem);
+	register_operating_point(&standby);
+	current_state = &pm_states;
+
 	return error;
 }
 
Index: linux-2.6.17/include/linux/pm.h
===================================================================
--- linux-2.6.17.orig/include/linux/pm.h
+++ linux-2.6.17/include/linux/pm.h
@@ -24,6 +24,7 @@
 #ifdef __KERNEL__
 
 #include <linux/list.h>
+#include <linux/kobject.h>
 #include <asm/atomic.h>
 
 /*
@@ -108,7 +109,36 @@ typedef int __bitwise suspend_state_t;
 #define PM_SUSPEND_STANDBY	((__force suspend_state_t) 1)
 #define PM_SUSPEND_MEM		((__force suspend_state_t) 3)
 #define PM_SUSPEND_DISK		((__force suspend_state_t) 4)
-#define PM_SUSPEND_MAX		((__force suspend_state_t) 5)
+#define PM_FREQ_CHANGE		((__force suspend_state_t) 5)
+#define PM_VOLT_CHANGE		((__force suspend_state_t) 6)
+#define PM_SUSPEND_MAX		((__force suspend_state_t) 7)
+
+struct oppoint {
+	struct list_head list;
+	suspend_state_t type;
+	unsigned int flags;
+	char *name;
+	unsigned int frequency;		/* in KHz */
+	unsigned int voltage;		/* mV */
+	unsigned int latency;		/* transition latency in us */
+	int     (*prepare_transition)(struct oppoint *cur, struct oppoint *new);
+	int     (*transition)(struct oppoint *cur, struct oppoint *new);
+	int     (*finish_transition)(struct oppoint *cur, struct oppoint *new);
+
+	void *md_data;			/* arch dependent data */
+	struct kobject kobj;
+};
+
+
+extern struct oppoint pm_states;
+extern struct oppoint *current_state;
+extern unsigned long oppoint_compute_lpj(unsigned long ref, u_int div, u_int mult);
+extern int register_operating_point(struct oppoint *opt);
+extern int unregister_operating_point(struct oppoint *opt);
+struct notifier_block;
+extern void oppoint_register_scale(struct notifier_block *nb, int level);
+extern void oppoint_unregister_scale(struct notifier_block *nb, int level);
+extern int oppoint_driver_scale(int level, struct oppoint *new);
 
 typedef int __bitwise suspend_disk_method_t;
 
@@ -128,7 +158,7 @@ struct pm_ops {
 
 extern void pm_set_ops(struct pm_ops *);
 extern struct pm_ops *pm_ops;
-extern int pm_suspend(suspend_state_t state);
+extern int pm_suspend(struct oppoint *state);
 
 
 /*
Index: linux-2.6.17/kernel/power/power.h
===================================================================
--- linux-2.6.17.orig/kernel/power/power.h
+++ linux-2.6.17/kernel/power/power.h
@@ -113,4 +113,4 @@ extern int swsusp_resume(void);
 extern int swsusp_read(void);
 extern int swsusp_write(void);
 extern void swsusp_close(void);
-extern int suspend_enter(suspend_state_t state);
+extern int suspend_enter(struct oppoint * state);
Index: linux-2.6.17/drivers/base/power/Makefile
===================================================================
--- linux-2.6.17.orig/drivers/base/power/Makefile
+++ linux-2.6.17/drivers/base/power/Makefile
@@ -1,4 +1,4 @@
-obj-y			:= shutdown.o
+obj-y			:= shutdown.o oppoint.o
 obj-$(CONFIG_PM)	+= main.o suspend.o resume.o runtime.o sysfs.o
 obj-$(CONFIG_PM_TRACE)	+= trace.o
 
Index: linux-2.6.17/drivers/base/power/oppoint.c
===================================================================
--- /dev/null
+++ linux-2.6.17/drivers/base/power/oppoint.c
@@ -0,0 +1,74 @@
+/*
+ * oppoint.c -- OpPoint Power Management support (hotplug events and device
+ * scaling).
+ *
+ * (c) 2006 MontaVista Software, Inc. This file is licensed under the
+ * terms of the GNU General Public License version 2. This program is
+ * licensed "as is" without any warranty of any kind, whether express or
+ * implied.
+ */
+
+#include <linux/device.h>
+#include <linux/pm.h>
+#include <linux/sched.h>
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/slab.h>
+#include <linux/notifier.h>
+
+#include "power.h"
+static RAW_NOTIFIER_HEAD(oppoint_scale_notifier);
+static DECLARE_MUTEX(oppoint_scale_sem);
+
+/* This function may be called by the platform frequency scaler before
+   or after a frequency change, in order to let drivers adjust any
+   clocks or calculations for the new frequency. */
+
+int oppoint_driver_scale(int level, struct oppoint *newop)
+{
+	if (down_trylock(&oppoint_scale_sem))
+		return 1;
+
+	raw_notifier_call_chain(&oppoint_scale_notifier, level, newop);
+	up(&oppoint_scale_sem);
+	return 0;
+}
+
+void oppoint_register_scale(struct notifier_block *nb, int level)
+{
+	down(&oppoint_scale_sem);
+	raw_notifier_chain_register(&oppoint_scale_notifier, nb);
+	up(&oppoint_scale_sem);
+}
+
+void oppoint_unregister_scale(struct notifier_block *nb, int level)
+{
+	down(&oppoint_scale_sem);
+	raw_notifier_chain_unregister(&oppoint_scale_notifier, nb);
+	up(&oppoint_scale_sem);
+}
+
+EXPORT_SYMBOL(oppoint_driver_scale);
+EXPORT_SYMBOL(oppoint_register_scale);
+EXPORT_SYMBOL(oppoint_unregister_scale);
+
+unsigned long oppoint_compute_lpj(unsigned long ref, u_int div, u_int mult)
+{
+	unsigned long new_jiffy_l, new_jiffy_h;
+
+	/*
+	 * Recalculate loops_per_jiffy.  We do it this way to
+	 * avoid math overflow on 32-bit machines.  Maybe we
+	 * should make this architecture dependent?  If you have
+	 * a better way of doing this, please replace!
+	 *
+	 *    new = old * mult / div
+	 */
+	new_jiffy_h = ref / div;
+	new_jiffy_l = (ref % div) / 100;
+	new_jiffy_h *= mult;
+	new_jiffy_l = new_jiffy_l * mult / div;
+
+	return new_jiffy_h + new_jiffy_l * 100;
+}
+EXPORT_SYMBOL(oppoint_compute_lpj);

[-- Attachment #3: Type: text/plain, Size: 0 bytes --]
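A userspace check of the arithmetic in oppoint_compute_lpj() from the patch above, offered as a sketch. lpj_split() repeats the split computation with uint32_t standing in for a 32-bit unsigned long, and lpj_exact() computes new = ref * mult / div with a 64-bit intermediate for comparison. Both helper names are hypothetical. The split avoids ever forming ref * mult, at the cost of some precision in the remainder term.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors oppoint_compute_lpj(); uint32_t stands in for a 32-bit unsigned long. */
static uint32_t lpj_split(uint32_t ref, uint32_t div, uint32_t mult)
{
	uint32_t h, l;

	assert(div != 0);
	h = ref / div;             /* whole multiples of div */
	l = (ref % div) / 100;     /* remainder, scaled down by 100 */
	h *= mult;
	l = l * mult / div;
	return h + l * 100;        /* scale the remainder term back up */
}

/* Reference: exact ref * mult / div using a 64-bit intermediate product. */
static uint32_t lpj_exact(uint32_t ref, uint32_t div, uint32_t mult)
{
	assert(div != 0);
	return (uint32_t)((uint64_t)ref * mult / div);
}
```

For ref = 4000000, div = 600, mult = 1400 the product ref * mult (5.6e9) does not fit in 32 bits, while the split gives 9333300 against the exact 9333333. With kHz-sized arguments (say div = 600000, mult = 1400000) the l * mult product can itself exceed 32 bits, so the trick appears to help only while div and mult stay small.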




* Re: So, what's the status on the recent patches here?
  2006-09-03 21:33                                     ` Pavel Machek
@ 2006-09-09  0:39                                       ` David Singleton
  2006-09-09  0:48                                         ` David Singleton
  2006-09-09 12:17                                         ` Pavel Machek
  0 siblings, 2 replies; 136+ messages in thread
From: David Singleton @ 2006-09-09  0:39 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Preece Scott-PREECE, Matthew Locke, linux-pm

On 9/3/06, Pavel Machek <pavel@ucw.cz> wrote:
> > >That depends on the definition, but I think of suspend states as the ones
> > >that require processes to be frozen before they can be entered.  IMHO it is
> > >quite clear that such states cannot be handled in the same way as those
> > >that do not require the freezing of processes, so they are not the same.
> >
> > You are correct, processes do need to be frozen before a suspend.
> > That's the prepare to suspend part of the suspend process, and
> > the transition is the suspending and finish is the un-freezing
> > of the processes to resume execution.
> >
> > And those same steps are the same steps required to transition the
> > system to a new operating point, whether it's suspend or change
> > from 1.4GHz to 600MHz.
>
> No, processes are not frozen for simple cpu frequency change -- on
> non-broken cpus.

I didn't say cpu frequency changes freeze processes.  I said a suspend
does a prepare to suspend step (which freezes processes) and a cpu frequency
change does a prepare to change frequency step (where it will run the driver
notifier list to get drivers set to scale).
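To make the "driver notifier list" step concrete, here is a userspace model of the scale notifier from the oppoint.c patch. It keeps the same contract as oppoint_driver_scale(): return 1 if another scale is already in progress, otherwise call every registered block with the level and the new operating point and return 0. The struct layouts, the SCALE_PRECHANGE/SCALE_POSTCHANGE levels, the bus_khz field and the UART client are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins, not the kernel's own definitions. */
struct oppoint { unsigned long bus_khz; };    /* hypothetical field */

struct notifier_block {
	int (*notifier_call)(struct notifier_block *nb,
			     unsigned long level, void *data);
	struct notifier_block *next;
};

enum { SCALE_PRECHANGE, SCALE_POSTCHANGE };   /* hypothetical levels */

static struct notifier_block *scale_chain;
static int scale_busy;                        /* models oppoint_scale_sem */

static void register_scale(struct notifier_block *nb)
{
	assert(nb != NULL && nb->notifier_call != NULL);
	nb->next = scale_chain;                   /* kernel orders by priority */
	scale_chain = nb;
}

/* Same contract as oppoint_driver_scale(): 1 if busy, 0 after notifying. */
static int driver_scale(int level, struct oppoint *newop)
{
	struct notifier_block *nb;

	if (scale_busy)                           /* like down_trylock() failing */
		return 1;
	scale_busy = 1;
	for (nb = scale_chain; nb != NULL; nb = nb->next)
		nb->notifier_call(nb, (unsigned long)level, newop);
	scale_busy = 0;
	return 0;
}

/* Example client: recompute a UART baud divisor once the bus clock changed. */
static unsigned long uart_div;

static int uart_scale(struct notifier_block *nb, unsigned long level, void *data)
{
	const struct oppoint *op = data;

	(void)nb;
	if (level == SCALE_POSTCHANGE)
		uart_div = op->bus_khz * 1000UL / (16UL * 115200UL);
	return 0;
}
```

A driver would call driver_scale(SCALE_PRECHANGE, op) before the clock change and driver_scale(SCALE_POSTCHANGE, op) after it; here only the post-change call touches the divisor.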

They both do the same three steps:

1) prepare to transition

2) transition

3) finish transition

That's one of my arguments as to why suspend states should be treated
just like frequency states.
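A minimal sketch of the three-step model, assuming each operating point is described by optional prepare/enter/finish hooks. struct op_transition, run_transition() and the example callbacks are hypothetical and not part of either patch. Note that only the suspend entry needs a freeze step, which is the distinction Rafael draws in the quoted text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor for the three steps: prepare, transition, finish. */
struct op_transition {
	int  (*prepare)(void);   /* e.g. freeze tasks, or run pre-change notifiers */
	int  (*enter)(void);     /* the transition itself */
	void (*finish)(void);    /* e.g. thaw tasks, or run post-change notifiers */
};

static int run_transition(const struct op_transition *t)
{
	int err = 0;

	assert(t != NULL);
	if (t->prepare && (err = t->prepare()) != 0)
		return err;              /* prepare failed: nothing to undo */
	if (t->enter)
		err = t->enter();
	if (t->finish)
		t->finish();             /* undo prepare even if enter failed */
	return err;
}

/* Example callbacks that only record what happened. */
static int freezes, tasks_frozen, cpu_mhz = 1400;

static int freeze_tasks(void) { freezes++; tasks_frozen = 1; return 0; }
static void thaw_tasks(void) { tasks_frozen = 0; }
static int enter_sleep(void) { assert(tasks_frozen); return 0; }
static int set_600mhz(void) { cpu_mhz = 600; return 0; }

static const struct op_transition suspend_op = { freeze_tasks, enter_sleep, thaw_tasks };
static const struct op_transition slow_op = { NULL, set_600mhz, NULL };
```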

David



>                                                                 Pavel
> --
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
>


* Re: So, what's the status on the recent patches here?
  2006-09-05 16:45   ` Mark Gross
@ 2006-09-06 10:59     ` Pavel Machek
  0 siblings, 0 replies; 136+ messages in thread
From: Pavel Machek @ 2006-09-06 10:59 UTC (permalink / raw)
  To: Mark Gross; +Cc: matthew.a.locke, scott.preece, linux-pm


> > > I think I'm listening to arguments just as much as you guys are! We just
> > > disagree. What are your criteria for "a clean interface"? Why do you
> > > think that n separate set-parameter() interfaces, with no consistency
> > > relationship between them, are cleaner than one define-op() and one
> > > set-op() interface?
> > 
> > Because we already have cpufreq-set-parameter() interface and
> > enter-suspend-state() interface. We can't really get rid of them.
> >
> This is true.  Yet today's cpufreq interface is not up to the job of
> providing power management for many embedded platforms.

> > If you add set-op() replacing both cpufreq-set-parameter() and
> > enter-suspend-state(), we'll end up with two different interfaces for
> > each interface; that's considered "mess".
> 
> Why can't they coexist?
> 
> Are you arguing that the cpufreq interface be morphed to support power
> op applications?

No. I'm arguing that

* cpufreq interface should be used for changing cpu frequency

* additional interfaces should be created for changing memory clock
etc.

* existing interfaces should be used for turning devices on/off (and
new ones created when old ones do not exist)

* powerop should look at what userspace wants, and just choose the
closest point to that.
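The first bullet relies on an interface that already exists. As a sketch (not part of any patch here), a user-space policy manager could cap a CPU through cpufreq's sysfs attribute scaling_max_freq, which takes a value in kHz. The helper names write_knob() and cap_cpu_khz() are hypothetical, and writing the real file needs root. The memory-clock knob has no existing interface, so only the cpufreq side is shown.

```c
#include <assert.h>
#include <stdio.h>

/* Write one decimal value to a sysfs-style knob file; 0 on success, -1 on error. */
static int write_knob(const char *path, unsigned long value)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fprintf(f, "%lu\n", value) > 0;
	if (fclose(f) != 0)      /* sysfs reports write errors at close time too */
		ok = 0;
	return ok ? 0 : -1;
}

/* Cap CPU 'cpu' at max_khz through the existing cpufreq sysfs interface. */
int cap_cpu_khz(unsigned cpu, unsigned long max_khz)
{
	char path[96];

	snprintf(path, sizeof(path),
		 "/sys/devices/system/cpu/cpu%u/cpufreq/scaling_max_freq", cpu);
	return write_knob(path, max_khz);
}
```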
							Pavel

-- 
Thanks, Sharp!


* Re: So, what's the status on the recent patches here?
  2006-09-05 16:03 Scott E. Preece
  2006-09-05 20:42 ` Rafael J. Wysocki
@ 2006-09-06 10:56 ` Pavel Machek
  1 sibling, 0 replies; 136+ messages in thread
From: Pavel Machek @ 2006-09-06 10:56 UTC (permalink / raw)
  Cc: scott.preece, matthew.a.locke, linux-pm

Hi!

> We have thought about merging the decisions into a single range of
> operating points, but the added plumbing to get idle information back to
> the policy manager seemed unappealing.
> 
> We don't use cpufreq, so Pavel's arguments about not changing the kernel
> interface weren't a concern, for us. We're using a kernel interface that
> is all our own (and unappealingly ioctl-based).

Well, but you can see that my arguments are quite important for
mainline merge? ;-).

Of course, powerop/oppoint patches are okay for your own use.
							Pavel
-- 
Thanks, Sharp!


* Re: So, what's the status on the recent patches here?
  2006-09-05 16:03 Scott E. Preece
@ 2006-09-05 20:42 ` Rafael J. Wysocki
  2006-09-06 10:56 ` Pavel Machek
  1 sibling, 0 replies; 136+ messages in thread
From: Rafael J. Wysocki @ 2006-09-05 20:42 UTC (permalink / raw)
  Cc: scott.preece, matthew.a.locke, linux-pm, pavel

On Tuesday, 5 September 2006 18:03, Scott E. Preece wrote:
> 
> | From: "Rafael J. Wysocki" <rjw@sisk.pl>
> | 
> | On Monday, 4 September 2006 01:00, Scott E. Preece wrote:
> | > policy decision to suspend is based on factors that are wholly different
> | > than the factors that drive frequency/voltage changes. If that were the
> | > case, then there would be no point to making the decisions in the same
> | > place.  Honestly, I'm not sure of the answer to that...
> | 
> | I think the decision to suspend is made
> | a) by the user,
> | b) by a policy manager in case when, for example, the battery is running
> | critical (ie. on emergency).
> | and the decision to change a frequency/voltage is usually based on some
> | efficiency factors.
> | 
> | Also, the suspend "transitions" are never transparent to the user and the
> | changes of frequency/voltage usually are, at least as far as CPUs are
> | concerned.
> ---
> 
> Your scope is too narrow. In our domain (mobile phones) the user has no
> control at all over power management and decisions to suspend are always
> transparent.
> 
> In our own implementation, the user-space policy manager initiates
> frequency and voltage changes and enables suspends, but doesn't actually
> initiate them. That is, the policy manager says "based on current user
> activity, it would be OK to suspend now", and a kernel component then
> looks for a good time to do it, based on the system being idle.

Okay, but it's not like that on a PC.

IMHO there are architectures on which suspend states are distinct and
therefore they should be treated as such in general.

> We have thought about merging the decisions into a single range of
> operating points, but the added plumbing to get idle information back to
> the policy manager seemed unappealing.
> 
> We don't use cpufreq, so Pavel's arguments about not changing the kernel
> interface weren't a concern, for us. We're using a kernel interface that
> is all our own (and unappealingly ioctl-based).

Fine, as far as I'm concerned.  ;-)

Greetings,
Rafael


-- 
You never change things by fighting the existing reality.
		R. Buckminster Fuller


* Re: So, what's the status on the recent patches here?
  2006-09-04  9:06 ` Pavel Machek
@ 2006-09-05 16:45   ` Mark Gross
  2006-09-06 10:59     ` Pavel Machek
  0 siblings, 1 reply; 136+ messages in thread
From: Mark Gross @ 2006-09-05 16:45 UTC (permalink / raw)
  To: Pavel Machek; +Cc: matthew.a.locke, scott.preece, linux-pm

On Mon, Sep 04, 2006 at 11:06:45AM +0200, Pavel Machek wrote:
> On Sun 2006-09-03 17:40:27, Scott E. Preece wrote:
> > 
> > | From: Pavel Machek<pavel@ucw.cz>
> > | 
> > | On Sun 2006-09-03 17:12:22, Scott E. Preece wrote:
> > | > | From: Pavel Machek<pavel@ucw.cz>
> 
> > | > Not speaking to either of the current code submissions, I would say that
> > | > having a kernel interface for defining OPs and a kernel interface for
> > | > setting the OP, was a reasonably clean interface.
> > | 
> > | Well, me and Rafael disagree, and you do not really listen to
> > | arguments. Now you can either fix the interface, or try to submit code
> > | to lkml despite our NAKs. Go ahead and prepare for some flaming...
> > ---
> > 
> > I think I'm listening to arguments just as much as you guys are! We just
> > disagree. What are your criteria for "a clean interface"? Why do you
> > think that n separate set-parameter() interfaces, with no consistency
> > relationship between them, are cleaner than one define-op() and one
> > set-op() interface?
> 
> Because we already have cpufreq-set-parameter() interface and
> enter-suspend-state() interface. We can't really get rid of them.
>
This is true.  Yet today's cpufreq interface is not up to the job of
providing power management for many embedded platforms.

> If you add set-op() replacing both cpufreq-set-parameter() and
> enter-suspend-state(), we'll end up with two different interfaces for
> each interface; that's considered "mess".

Why can't they coexist?

Are you arguing that the cpufreq interface be morphed to support power
op applications?


--mgross
> 								Pavel
> -- 
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: So, what's the status on the recent patches here?
@ 2006-09-05 16:03 Scott E. Preece
  2006-09-05 20:42 ` Rafael J. Wysocki
  2006-09-06 10:56 ` Pavel Machek
  0 siblings, 2 replies; 136+ messages in thread
From: Scott E. Preece @ 2006-09-05 16:03 UTC (permalink / raw)
  To: rjw; +Cc: scott.preece, matthew.a.locke, linux-pm, pavel


| From: "Rafael J. Wysocki" <rjw@sisk.pl>
| 
| On Monday, 4 September 2006 01:00, Scott E. Preece wrote:
| > policy decision to suspend is based on factors that are wholly different
| > than the factors that drive frequency/voltage changes. If that were the
| > case, then there would be no point to making the decisions in the same
| > place.  Honestly, I'm not sure of the answer to that...
| 
| I think the decision to suspend is made
| a) by the user,
| b) by a policy manager in case when, for example, the battery is running
| critical (ie. on emergency).
| and the decision to change a frequency/voltage is usually based on some
| efficiency factors.
| 
| Also, the suspend "transitions" are never transparent to the user and the
| changes of frequency/voltage usually are, at least as far as CPUs are
| concerned.
---

Your scope is too narrow. In our domain (mobile phones) the user has no
control at all over power management and decisions to suspend are always
transparent.

In our own implementation, the user-space policy manager initiates
frequency and voltage changes and enables suspends, but doesn't actually
initiate them. That is, the policy manager says "based on current user
activity, it would be OK to suspend now", and a kernel component then
looks for a good time to do it, based on the system being idle.
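A minimal model of that split, in which user space only permits suspend and a kernel-side idle hook picks the moment. The flag, the 500 ms threshold and the function names are hypothetical; a real implementation would hook the idle path and enter an actual sleep state instead of counting.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: user space permits suspend, the idle hook decides when. */
#define IDLE_THRESHOLD_MS 500u      /* hypothetical tuning value */

static bool suspend_permitted;     /* set by the policy manager */
static unsigned suspends;          /* counts simulated suspend entries */

static void policy_allow_suspend(bool ok)
{
	suspend_permitted = ok;
}

/* Called from the idle path with the time since the last activity. */
static void idle_hook(unsigned idle_ms)
{
	if (suspend_permitted && idle_ms >= IDLE_THRESHOLD_MS)
		suspends++;            /* stand-in for entering the suspend state */
}
```

In this model the policy manager never initiates a suspend itself; it only flips suspend_permitted, and a long enough idle period does the rest.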

We have thought about merging the decisions into a single range of
operating points, but the added plumbing to get idle information back to
the policy manager seemed unappealing.

We don't use cpufreq, so Pavel's arguments about not changing the kernel
interface weren't a concern, for us. We're using a kernel interface that
is all our own (and unappealingly ioctl-based).

scott
-- 
scott preece
motorola mobile devices, il67, 1800 s. oak st., champaign, il  61820  
e-mail:	preece@motorola.com	fax:	+1-217-384-8550
phone:	+1-217-384-8589	cell: +1-217-433-6114	pager: 2174336114@vtext.com


* Re: So, what's the status on the recent patches here?
  2006-09-03 23:00 Scott E. Preece
  2006-09-04  9:12 ` Pavel Machek
@ 2006-09-05 10:31 ` Rafael J. Wysocki
  1 sibling, 0 replies; 136+ messages in thread
From: Rafael J. Wysocki @ 2006-09-05 10:31 UTC (permalink / raw)
  Cc: scott.preece, matthew.a.locke, linux-pm, pavel

On Monday, 4 September 2006 01:00, Scott E. Preece wrote:
> 
> | From: "Rafael J. Wysocki" <rjw@sisk.pl>
> | 
> | On Sunday, 3 September 2006 23:34, Scott E. Preece wrote:
> | > 
> | > | From: "Rafael J. Wysocki" <rjw@sisk.pl>
> | > | 
> | > | On Sunday, 3 September 2006 18:25, David Singleton wrote:
> | > | > On 9/2/06, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> | > | > >
> | > | > > That depends on the definition, but I think of suspend states as the ones
> | > | > > that require processes to be frozen before they can be entered.  IMHO it is
> | > | > > quite clear that such states cannot be handled in the same way as those
> | > | > > that do not require the freezing of processes, so they are not the same.
> | > | > 
> | > | > You are correct, processes do need to be frozen before a suspend.
> | > | > That's the prepare to suspend part of the suspend process, and
> | > | > the transition is the suspending and finish is the un-freezing
> | > | > of the processes to resume execution.
> | > | > 
> | > | > And those same steps are the same steps required to transition the
> | > | > system to a new operating point, whether it's suspend or change
> | > | > from 1.4GHz to 600MHz.
> | > | 
> | > | There are only a few states that require the processes to be frozen and I
> | > | think that's a good enough reason to handle them separately.
> | > 
> | > ---
> | > 
> | > But, surely that distinction can be handled in the implementation behind
> | > the interface, rather than exposed in the interface.
> | 
> | I don't think you can handle that behind the interface in a satisfactory way.
> | For example during a suspend to disk we carry out several transitions of
> | devices within the suspend-resume cycle.
> | 
> | > Does that distinction matter to the policy manager?
> | 
> | I think so.
> | 
> | > I would argue that it 
> | > increases the latency, which would be important to the policy manager,
> | > but that the nature of the latency isn't important to making a policy
> | > decision,  and the proposed interface already exposes the latency as
> | > something that can be used in making transition decisions.
> | 
> | From the policy manager perspective it may be just a latency factor,
> | but for all of the things _outside_ of the policy manager it's much more
> | than that.
> | 
> | For example transitions like a CPU frequency change are transparent for kernel
> | threads, but the suspend "transitions" are not, because the kernel threads need
> | to be informed that the system is suspending and they are expected to freeze
> | themselves voluntarily.
> | 
> | Really, I think that the "states" which are entered only after tasks are
> | frozen should be considered as special and handled separately.
> ---
> 
> My point is that if the only kernel interface is set-op(), then the code
> in the kernel that implements set-op() is the thing that's going to
> drive the details of suspending the system, just as it does today.

It's not exactly correct in the case of the userland suspend, where we have
a userland process that drives the suspend (e.g. it writes the suspend image
to storage).  In that case the kernel is only asked to perform some
well-defined atomic actions and not the entire transition.

> The abstraction at the kernel interface is about as simple as it can be and
> all the policy issues are moved outside the kernel.
> 
> My question is whether there are aspects of suspending, other than
> latency, that the policy manager would need to consider in deciding
> whether to suspend or not.
> 
> Look at it this way. In one scheme the policy manager code is:
> 
>    new_OP = select_transition(current_OP, decision_factors);
>    set_OP(new_OP);
> 
> in the other the policy manager code is:
> 
>    new_OP = select_transition(current_OP, decision_factors);
>    if (new_OP == SUSPEND)
>       suspend();
>    else
>       set_OP(new_OP);
> 
> The only practical difference is whether the kernel has one interface or
> two; in the one-interface case, there's code in the kernel's
> implementation of set_OP() that does the same conditional and calls the
> same implementation of suspend. In Pavel's preferred idiom, the calls
> to set_OP() are replaced by a sequence of
> 
>    set_power_parameter(PARM, VALUE) calls
> 
> All dreadfully oversimplified, of course, but I know that the general
> approach is possible, because our PM subsystem works in a vaguely
> similar manner. The simplification isn't completely ignorable, though,
> because the mechanisms driving the transitions involve input from the
> kernel (entry to idle, interrupts, clock events, load information, etc.).
> The interaction between the kernel and the policy manager may actually
> be too complex to support doing all of policy management in user space
> (our implementation actually has some kernel bits and some user-space
> bits). Not sure that affects the question of whether suspend is an
> operating point, though - that seems (to me) to work the same whether
> the policy decision is in the kernel or in user space.
> 
> The one question that I see as interesting on that score is whether the
> policy decision to suspend is based on factors that are wholly different
> than the factors that drive frequency/voltage changes. If that were the
> case, then there would be no point to making the decisions in the same
> place.  Honestly, I'm not sure of the answer to that...

I think the decision to suspend is made
a) by the user,
b) by a policy manager in case when, for example, the battery is running
critical (ie. on emergency).
and the decision to change a frequency/voltage is usually based on some
efficiency factors.

Also, the suspend "transitions" are never transparent to the user and the
changes of frequency/voltage usually are, at least as far as CPUs are
concerned.

Greetings,
Rafael


-- 
You never change things by fighting the existing reality.
		R. Buckminster Fuller


* Re: So, what's the status on the recent patches here?
@ 2006-09-04 15:43 Scott E. Preece
  0 siblings, 0 replies; 136+ messages in thread
From: Scott E. Preece @ 2006-09-04 15:43 UTC (permalink / raw)
  To: pavel; +Cc: scott.preece, matthew.a.locke, linux-pm

| From: Pavel Machek<pavel@ucw.cz>
| >... 
| > My question is whether there are aspects of suspending, other than
| > latency, that the policy manager would need to consider in deciding
| > whether to suspend or not.
| > 
| > Look at it this way. In one scheme the policy manager code is:
| > 
| >    new_OP = select_transition(current_OP, decision_factors);
| >    set_OP(new_OP);
| 
| No, it would be 
| 
|     new_OP = select_transition(current_OP, decision_factors);
|     if (new_OP == SUSPEND) {
|         setup wakeup events ...
|     }
|     set_OP(new_OP);
---

Sorry; in our implementation, the devices are responsible for
configuring the wakeup events, either globally or in their suspend
routines, so it would look like my example code.  However, I would have
expected the setup of wakeup events to happen in the kernel's set_OP
implementation, rather than in the policy manager, anyway.

Again, this model puts the decision to change in user space and the
implementation of the decision in the kernel.
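A sketch of that single-entry-point variant, with the SUSPEND case (wakeup setup included) handled inside the kernel's set_OP() rather than in the policy manager. The op identifiers, sys_state and the helper names are hypothetical stand-ins, not taken from the oppoint or powerop patches.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical operating points; values are illustrative only. */
enum op_id { OP_1400MHZ, OP_600MHZ, OP_SUSPEND };

struct sys_state {
	int cpu_mhz;
	bool wakeup_armed;
	int suspends;
};

static struct sys_state sys = { 1400, false, 0 };

static void setup_wakeup_events(void) { sys.wakeup_armed = true; }

/* Stand-in for the real suspend path; "returns" after a wakeup event. */
static void enter_suspend(void)
{
	assert(sys.wakeup_armed);    /* never sleep without a way to wake */
	sys.suspends++;
	sys.wakeup_armed = false;
}

/* One kernel entry point; the SUSPEND special case lives behind it. */
static int set_OP(enum op_id op)
{
	switch (op) {
	case OP_1400MHZ:
		sys.cpu_mhz = 1400;
		return 0;
	case OP_600MHZ:
		sys.cpu_mhz = 600;
		return 0;
	case OP_SUSPEND:
		setup_wakeup_events();   /* kernel's job here, not the policy manager's */
		enter_suspend();
		return 0;
	}
	return -1;                   /* unknown operating point */
}
```

The policy manager then makes the same single call, set_OP(new_OP), whatever the target; the extra work for suspend stays invisible to it.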

scott
-- 
scott preece
motorola mobile devices, il67, 1800 s. oak st., champaign, il  61820  
e-mail:	preece@motorola.com	fax:	+1-217-384-8550
phone:	+1-217-384-8589	cell: +1-217-433-6114	pager: 2174336114@vtext.com


* Re: So, what's the status on the recent patches here?
  2006-09-03 23:00 Scott E. Preece
@ 2006-09-04  9:12 ` Pavel Machek
  2006-09-05 10:31 ` Rafael J. Wysocki
  1 sibling, 0 replies; 136+ messages in thread
From: Pavel Machek @ 2006-09-04  9:12 UTC (permalink / raw)
  Cc: scott.preece, matthew.a.locke, linux-pm

Hi!

> | Really, I think that the "states" which are entered only after tasks are
> | frozen should be considered as special and handled separately.
> ---
> 
> My point is that if the only kernel interface is set-op(), then the code
> in the kernel that implements set-op() is the thing that's going to
> drive the details of suspending the system, just as it does today. The
> abstraction at the kernel interface is about as simple as it can be and
> all the policy issues are moved outside the kernel.
> 
> My question is whether there are aspects of suspending, other than
> latency, that the policy manager would need to consider in deciding
> whether to suspend or not.
> 
> Look at it this way. In one scheme the policy manager code is:
> 
>    new_OP = select_transition(current_OP, decision_factors);
>    set_OP(new_OP);

No, it would be 

    new_OP = select_transition(current_OP, decision_factors);
    if (new_OP == SUSPEND) {
        setup wakeup events ...
    }
    set_OP(new_OP);


> in the other the policy manager code is:
> 
>    new_OP = select_transition(current_OP, decision_factors);
>    if (new_OP == SUSPEND)
>       suspend();
>    else
>       set_OP(new_OP);
...
> The one question that I see as interesting on that score is whether the
> policy decision to suspend is based on factors that are wholly different
> than the factors that drive frequency/voltage changes. If that were the
> case, then there would be no point to making the decisions in the same
> place.  Honestly, I'm not sure of the answer to that...

I'm pretty sure the decision to suspend is based on other factors. Remember
that most machines are non-functional during suspend.
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: So, what's the status on the recent patches here?
  2006-09-03 23:05 Scott E. Preece
@ 2006-09-04  9:09 ` Pavel Machek
  0 siblings, 0 replies; 136+ messages in thread
From: Pavel Machek @ 2006-09-04  9:09 UTC (permalink / raw)
  To: Scott E. Preece; +Cc: linux-pm

Hi!

> | > In the implementation we use, the initial OP is set by the boot
> | > loader. Our kernel power management driver is a module. The assumption
> | > is that booting involves a lot of processing and it makes sense to just
> | > use the highest OP anyway.  That makes sense in our environment, but I
> | > wouldn't recommend our version as a general solution, anyway.
> | 
> | That's actually a regression; cpufreq saves power even while booting.
> ---
> 
> Well, the kernel boot phase is relatively brief. The module could be
> loaded and managing the OP before you start bringing up the rest of user
> space. However, I'm not arguing that our approach to that is generally
> applicable - I was just giving an existence proof.

powernow-k8 machines can't even run in highest OP point on battery
power.

> | > Hmm. I need to think about that. I guess the OP abstraction *could* be
> | > entirely inside the user-space policy manager, with the kernel exposing
> | > individual interfaces for every parameter that the policy manager would
> | > need to adjust. However, that means that the whole mess has to be at
> | > user level (kernel just implements simple knobs for individual
> | > parameters), because the dependency management between the parameters 
> | > would only be known at user level, and means a relatively bulky kernel
> | > interface, since it would expose more things at the interface. 
> | 
> | That would work for me.
> | 
> | But my idea was actually opposite: expose individual knobs to
> | userspace, and then select some operating point (inside kernel) that
> | satisfies given knobs.
> ---
> 
> But then the kernel needs to know about the operating points, so aren't
> you back where we were before?

Yes, it will be similar to your solution, but

a) I need not use oppoints on PCs where parameters are independent

and

b) I'll still get reasonable user<->kernel interface.
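A sketch of the direction described here: user space sets independent minimum knobs, and the kernel picks the lowest-power valid operating point that satisfies all of them (the "next higher" point). The table values, struct names and select_op() are hypothetical. Returning NULL when no point satisfies the request is one possible policy; clamping to the highest point would be another.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table of valid combinations, sorted from lowest to highest power. */
struct op_point { unsigned cpu_mhz, mem_mhz, mv; };

static const struct op_point op_table[] = {
	{  200,  66,  950 },
	{  400, 133, 1100 },
	{  600, 133, 1200 },
	{ 1400, 266, 1450 },
};

/* Requested minimums, one knob per parameter; 0 means "don't care". */
struct knobs { unsigned min_cpu_mhz, min_mem_mhz; };

/* First (lowest-power) valid point meeting every knob, or NULL if none does. */
static const struct op_point *select_op(const struct knobs *k)
{
	size_t i;

	assert(k != NULL);
	for (i = 0; i < sizeof(op_table) / sizeof(op_table[0]); i++) {
		const struct op_point *op = &op_table[i];

		if (op->cpu_mhz >= k->min_cpu_mhz && op->mem_mhz >= k->min_mem_mhz)
			return op;
	}
	return NULL;
}
```

A request for at least 500 MHz CPU selects the 600 MHz point, while asking for 200 MHz of memory clock forces the 1400 MHz point, because that is the only valid combination with enough memory bandwidth.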
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: So, what's the status on the recent patches here?
  2006-09-03 22:40 Scott E. Preece
@ 2006-09-04  9:06 ` Pavel Machek
  2006-09-05 16:45   ` Mark Gross
  0 siblings, 1 reply; 136+ messages in thread
From: Pavel Machek @ 2006-09-04  9:06 UTC (permalink / raw)
  Cc: scott.preece, matthew.a.locke, linux-pm

On Sun 2006-09-03 17:40:27, Scott E. Preece wrote:
> 
> | From: Pavel Machek<pavel@ucw.cz>
> | 
> | On Sun 2006-09-03 17:12:22, Scott E. Preece wrote:
> | > | From: Pavel Machek<pavel@ucw.cz>

> | > Not speaking to either of the current code submissions, I would say that
> | > having a kernel interface for defining OPs and a kernel interface for
> | > setting the OP, was a reasonably clean interface.
> | 
> | Well, me and Rafael disagree, and you do not really listen to
> | arguments. Now you can either fix the interface, or try to submit code
> | to lkml despite our NAKs. Go ahead and prepare for some flaming...
> ---
> 
> I think I'm listening to arguments just as much as you guys are! We just
> disagree. What are your criteria for "a clean interface"? Why do you
> think that n separate set-parameter() interfaces, with no consistency
> relationship between them, are cleaner than one define-op() and one
> set-op() interface?

Because we already have cpufreq-set-parameter() interface and
enter-suspend-state() interface. We can't really get rid of them.

If you add set-op() replacing both cpufreq-set-parameter() and
enter-suspend-state(), we'll end up with two different interfaces for
each interface; that's considered "mess".
								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: So, what's the status on the recent patches here?
@ 2006-09-03 23:05 Scott E. Preece
  2006-09-04  9:09 ` Pavel Machek
  0 siblings, 1 reply; 136+ messages in thread
From: Scott E. Preece @ 2006-09-03 23:05 UTC (permalink / raw)
  To: pavel; +Cc: linux-pm


| From: Pavel Machek<pavel@ucw.cz>
| 
| On Sun 2006-09-03 17:31:02, Scott E. Preece wrote:
| > | From: Pavel Machek<pavel@ucw.cz>
| > | 
| > | On Sun 2006-09-03 16:21:34, Scott E. Preece wrote:
| > | > | From: Pavel Machek<pavel@ucw.cz>
| > | > some reason, in your perception, why definition of operating points
| > | > really needs to be in the kernel?  Definition of the operating points,
| > | > as opposed to changing from one OP to another, shouldn't have any timing
| > | > issues, so why isn't a privileged user-space manager a reasonable
| > | > approach?
| > | 
| > | For one thing, is not powerop needed for boot? You need to boot in
| > | some operating point after all :-).
| > ---
| > 
| > In the implementation we use, the initial OP is set by the boot
| > loader. Our kernel power management driver is a module. The assumption
| > is that booting involves a lot of processing and it makes sense to just
| > use the highest OP anyway.  That makes sense in our environment, but I
| > wouldn't recommend our version as a general solution, anyway.
| 
| That's actually a regression; cpufreq saves power even while booting.
---

Well, the kernel boot phase is relatively brief. The module could be
loaded and managing the OP before you start bringing up the rest of user
space. However, I'm not arguing that our approach to that is generally
applicable - I was just giving an existence proof.

scott

| 
| > | > As noted previously, OPs bundle together more than just the
| > | > frequency. Those of us supporting the OP model believe that you can't
| > | > intelligently change CPU frequency in isolation and you can't change
| > | > some of those parameters independently, because only certain
| > | > combinations work.
| > | 
| > | That's okay. User gives you combination he wants, and you select "next
| > | higher" working operating point.
| > ---
| > 
| > Hmm. I need to think about that. I guess the OP abstraction *could* be
| > entirely inside the user-space policy manager, with the kernel exposing
| > individual interfaces for every parameter that the policy manager would
| > need to adjust. However, that means that the whole mess has to be at
| > user level (kernel just implements simple knobs for individual
| > parameters), because the dependency management between the parameters 
| > would only be known at user level, and means a relatively bulky kernel
| > interface, since it would expose more things at the interface. 
| 
| That would work for me.
| 
| But my idea was actually opposite: expose individual knobs to
| userspace, and then select some operating point (inside kernel) that
| satisfies given knobs.
---

But then the kernel needs to know about the operating points, so aren't
you back where we were before?

scott
-- 
scott preece
motorola mobile devices, il67, 1800 s. oak st., champaign, il  61820  
e-mail:	preece@motorola.com	fax:	+1-217-384-8550
phone:	+1-217-384-8589	cell: +1-217-433-6114	pager: 2174336114@vtext.com


* Re: So, what's the status on the recent patches here?
@ 2006-09-03 23:00 Scott E. Preece
  2006-09-04  9:12 ` Pavel Machek
  2006-09-05 10:31 ` Rafael J. Wysocki
  0 siblings, 2 replies; 136+ messages in thread
From: Scott E. Preece @ 2006-09-03 23:00 UTC (permalink / raw)
  To: rjw; +Cc: scott.preece, matthew.a.locke, linux-pm, pavel


| From: "Rafael J. Wysocki" <rjw@sisk.pl>
| 
| On Sunday, 3 September 2006 23:34, Scott E. Preece wrote:
| > 
| > | From: "Rafael J. Wysocki" <rjw@sisk.pl>
| > | 
| > | On Sunday, 3 September 2006 18:25, David Singleton wrote:
| > | > On 9/2/06, Rafael J. Wysocki <rjw@sisk.pl> wrote:
| > | > >
| > | > > That depends on the definition, but I think of suspend states as the ones
| > | > > that require processes to be frozen before they can be entered.  IMHO it is
| > | > > quite clear that such states cannot be handled in the same way as those
| > | > > that do not require the freezing of processes, so they are not the same.
| > | > 
| > | > You are correct, processes do need to be frozen before a suspend.
| > | > That's the prepare to suspend part of the suspend process, and
| > | > the transition is the suspending and finish is the un-freezing
| > | > of the processes to resume execution.
| > | > 
| > | > And those same steps are the same steps required to transition the
| > | > system to a new operating point, whether it's suspend or change
| > | > from 1.4GHz to 600MHz.
| > | 
| > | There are only a few states that require the processes to be frozen and I
| > | think that's a good enough reason to handle them separately.
| > 
| > ---
| > 
| > But, surely that distinction can be handled in the implementation behind
| > the interface, rather than exposed in the interface.
| 
| I don't think you can handle that behind the interface in a satisfactory way.
| For example during a suspend to disk we carry out several transitions of
| devices within the suspend-resume cycle.
| 
| > Does that distinction matter to the policy manager?
| 
| I think so.
| 
| > I would argue that it 
| > increases the latency, which would be important to the policy manager,
| > but that the nature of the latency isn't important to making a policy
| > decision,  and the proposed interface already exposes the latency as
| > something that can be used in making transition decisions.
| 
| From the policy manager's perspective it may be just a latency factor,
| but for all of the things _outside_ of the policy manager it's much more
| than that.
| 
| For example, transitions like a CPU frequency change are transparent to kernel
| threads, but the suspend "transitions" are not, because the kernel threads need
| to be informed that the system is suspending and they are expected to freeze
| themselves voluntarily.
| 
| Really, I think that the "states" which are entered only after tasks are
| frozen should be considered as special and handled separately.
---

My point is that if the only kernel interface is set-op(), then the code
in the kernel that implements set-op() is the thing that's going to
drive the details of suspending the system, just as it does today. The
abstraction at the kernel interface is about as simple as it can be and
all the policy issues are moved outside the kernel.

My question is whether there are aspects of suspending, other than
latency, that the policy manager would need to consider in deciding
whether to suspend or not.

Look at it this way. In one scheme the policy manager code is:

   new_OP = select_transition(current_OP, decision_factors);
   set_OP(new_OP);

in the other the policy manager code is:

   new_OP = select_transition(current_OP, decision_factors);
   if (new_OP == SUSPEND)
      suspend();
   else
      set_OP(new_OP);

The only practical difference is whether the kernel has one interface or
two; in the one-interface case, there's code in the kernel's
implementation of set_OP() that does the same conditional and calls the
same implementation of suspend. In Pavel's preferred idiom, the calls
to set_OP() are replaced by a sequence of

   set_power_parameter(PARM, VALUE) calls

All dreadfully oversimplified, of course, but I know that the general
approach is possible, because our PM subsystem works in a vaguely
similar manner. The simplification isn't completely ignorable, though,
because the mechanisms driving the transitions involve input from the
kernel (entry to idle, interrupts, clock events, load information, etc.).
The interaction between the kernel and the policy manager may actually
be too complex to support doing all of policy management in user space
(our implementation actually has some kernel bits and some user-space
bits). Not sure that affects the question of whether suspend is an
operating point, though - that seems (to me) to work the same whether
the policy decision is in the kernel or in user space.
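The one-interface scheme above can be sketched in C. This is a minimal illustration under stated assumptions: the struct op layout, the requires_freeze flag, and the helpers enter_suspend() and change_freq() are hypothetical stand-ins, not code from either powerop submission. The policy manager calls only set_op(); the SUSPEND conditional from the two-interface variant moves behind the interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operating-point descriptor; field names are illustrative. */
struct op {
    const char *name;
    unsigned int cpu_khz;   /* 0 where the CPU is not running */
    int requires_freeze;    /* nonzero for suspend-like states */
};

static const struct op op_table[] = {
    { "high",    1400000, 0 },
    { "low",      600000, 0 },
    { "suspend",       0, 1 },
};

static const struct op *current_op = &op_table[0];
static int suspend_count;   /* how many times the suspend path ran */

/*
 * Stand-in for the existing suspend path (freeze tasks, suspend devices,
 * sleep, resume, thaw). On return the system is running again at the
 * operating point it had before, so current_op is left unchanged.
 */
static int enter_suspend(const struct op *target)
{
    (void)target;
    suspend_count++;
    return 0;
}

/* Stand-in for a frequency/voltage change; no tasks are frozen. */
static int change_freq(const struct op *target)
{
    current_op = target;
    return 0;
}

/*
 * One-interface variant: the policy manager only ever calls set_op().
 * The "if (new_OP == SUSPEND)" test from the two-interface variant
 * lives here instead.
 */
int set_op(const struct op *target)
{
    assert(target != NULL);
    if (target->requires_freeze)
        return enter_suspend(target);
    return change_freq(target);
}
```

Either way the conditional exists somewhere; the sketch only moves it from the policy manager into the kernel-side implementation.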

The one question that I see as interesting on that score is whether the
policy decision to suspend is based on factors that are wholly different
than the factors that drive frequency/voltage changes. If that were the
case, then there would be no point to making the decisions in the same
place.  Honestly, I'm not sure of the answer to that...

scott
-- 
scott preece
motorola mobile devices, il67, 1800 s. oak st., champaign, il  61820  
e-mail:	preece@motorola.com	fax:	+1-217-384-8550
phone:	+1-217-384-8589	cell: +1-217-433-6114	pager: 2174336114@vtext.com


* Re: So, what's the status on the recent patches here?
  2006-09-03 22:31 Scott E. Preece
@ 2006-09-03 22:41 ` Pavel Machek
  0 siblings, 0 replies; 136+ messages in thread
From: Pavel Machek @ 2006-09-03 22:41 UTC (permalink / raw)
  To: Scott E. Preece; +Cc: linux-pm

On Sun 2006-09-03 17:31:02, Scott E. Preece wrote:
> | From: Pavel Machek<pavel@ucw.cz>
> | Cc: <amit.kucheria@nokia.com>, <linux-pm@lists.osdl.org>
> | User-Agent: Mutt/1.5.11+cvs20060126
> | 
> | On Sun 2006-09-03 16:21:34, Scott E. Preece wrote:
> | > | From: Pavel Machek<pavel@ucw.cz>
> | > some reason, in your perception, why definition of operating points
> | > really needs to be in the kernel?  Definition of the operating points,
> | > as opposed to changing from one OP to another, shouldn't have any timing
> | > issues, so why isn't a privileged user-space manager a reasonable
> | > approach?
> | 
> | For one thing, isn't powerop needed for boot? You need to boot in
> | some operating point after all :-).
> ---
> 
> In the implementation we use, the initial OP is set by the boot
> loader. Our kernel power management driver is a module. The assumption
> is that booting involves a lot of processing and it makes sense to just
> use the highest OP anyway.  That makes sense in our environment, but I
> wouldn't recommend our version as a general solution, anyway.

That's actually a regression; cpufreq saves power even while booting.

> | > As noted previously, OPs bundle together more than just the
> | > frequency. Those of us supporting the OP model believe that you can't
> | > intelligently change CPU frequency in isolation and you can't change
> | > some of those parameters independently, because only certain
> | > combinations work.
> | 
> | That's okay. The user gives you the combination he wants, and you select
> | the "next higher" working operating point.
> ---
> 
> Hmm. I need to think about that. I guess the OP abstraction *could* be
> entirely inside the user-space policy manager, with the kernel exposing
> individual interfaces for every parameter that the policy manager would
> need to adjust. However, that means that the whole mess has to be at
> user level (kernel just implements simple knobs for individual
> parameters), because the dependency management between the parameters 
> would only be known at user level, and means a relatively bulky kernel
> interface, since it would expose more things at the interface. 

That would work for me.

But my idea was actually the opposite: expose individual knobs to
userspace, and then select some operating point (inside kernel) that
satisfies given knobs.
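The kernel-side "next higher" selection can be sketched as a table search, assuming the kernel keeps a table of platform-validated operating points ordered from lowest to highest power. The knob names (cpu_mhz, dsp_mhz, usb_on), the table contents and select_next_higher() are hypothetical illustrations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical knobs that userspace may set independently. */
struct knobs {
    unsigned int cpu_mhz;
    unsigned int dsp_mhz;
    bool usb_on;
};

/* Hypothetical platform-validated OPs, ordered from lowest to highest power. */
static const struct knobs op_table[] = {
    { 200,   0, false },
    { 400, 100, false },
    { 400, 200, true  },
    { 800, 300, true  },
};

#define N_OPS (sizeof op_table / sizeof op_table[0])

/* An OP satisfies a request if it meets or exceeds every knob. */
static bool satisfies(const struct knobs *op, const struct knobs *req)
{
    return op->cpu_mhz >= req->cpu_mhz &&
           op->dsp_mhz >= req->dsp_mhz &&
           (op->usb_on || !req->usb_on);
}

/* Index of the lowest-power OP that satisfies req ("next higher"), or -1. */
static int select_next_higher(const struct knobs *req)
{
    assert(req != NULL);
    for (size_t i = 0; i < N_OPS; i++)
        if (satisfies(&op_table[i], req))
            return (int)i;
    return -1;
}
```

Userspace never sees the table, only the knobs. A request that no OP can meet returns -1; a real implementation would still have to decide whether to fail or fall back to the highest OP.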

> On the other hand, that complexity has to be somewhere, so maybe that
> would be OK. As I said, I need to think about it...

Looking forward to a new interface proposal ;-).

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: So, what's the status on the recent patches here?
@ 2006-09-03 22:40 Scott E. Preece
  2006-09-04  9:06 ` Pavel Machek
  0 siblings, 1 reply; 136+ messages in thread
From: Scott E. Preece @ 2006-09-03 22:40 UTC (permalink / raw)
  To: pavel; +Cc: scott.preece, matthew.a.locke, linux-pm


| From: Pavel Machek<pavel@ucw.cz>
| 
| On Sun 2006-09-03 17:12:22, Scott E. Preece wrote:
| > | From: Pavel Machek<pavel@ucw.cz>
| 
| > | > But, surely that distinction can be handled in the implementation behind
| > | > the interface, rather than exposed in the interface.  Does that
| > | > distinction matter to the policy manager?  I would argue that it
| > | > increases the latency, which would be important to the policy manager,
| > | > but that the nature of the latency isn't important to making a policy
| > | > decision,  and the proposed interface already exposes the latency as
| > | > something that can be used in making transition decisions.
| > | 
| > | Are we talking about the same thing?
| > | 
| > | If the policy manager decides to suspend-to-RAM, it will freeze
| > | itself. Puff, it is not running any more.
| > ---
| > 
| > Well, I assume the policy manager is telling something in the kernel to
| > actually set the operating point. Once it has made that request, it
| > doesn't need to run any longer.
| 
| And how will it tell the kernel to get back to some _operating_
| point? (As opposed to "off-suspended-to-disk"?)
| 
| You see, that interface even causes problems in our (human!)
| communication. Some of the operating points are not really operating!
---

You mean, how will it initiate the transition out of "suspended"? Well,
obviously, it wouldn't be able to do that until the machine
resumed. But, from the perspective of the policy manager, that doesn't
really matter - no time passes (from its perspective), it just starts
running again, receives some kind of wakeup event from the kernel, and
decides what transition should happen.
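From the policy manager's side, the suspend request can be modelled as a call that blocks across the whole suspend-resume cycle and returns with the wakeup reason. This is a minimal sketch under that assumption; request_op(), the wake_event values and the selection rule are hypothetical, and the sleep itself is only simulated.

```c
#include <assert.h>

enum op_id { OP_HIGH, OP_LOW, OP_SUSPEND };
enum wake_event { WAKE_NONE, WAKE_KEYPRESS, WAKE_TIMER };

static enum op_id current_op = OP_HIGH;
/* Simulated source of the wakeup reason the kernel would report. */
static enum wake_event simulated_wake = WAKE_KEYPRESS;

/*
 * Hypothetical blocking request. For OP_SUSPEND the caller is frozen
 * inside this call; from its point of view no time passes and the call
 * simply returns after resume, reporting what woke the system.
 */
static enum wake_event request_op(enum op_id op)
{
    if (op != OP_SUSPEND) {
        current_op = op;
        return WAKE_NONE;
    }
    /* ...tasks frozen, devices suspended, system asleep, then resumed... */
    return simulated_wake;
}

/* Hypothetical policy: a user action means "be fast", a timer means "stay slow". */
static enum op_id choose_after_wake(enum wake_event ev)
{
    return ev == WAKE_KEYPRESS ? OP_HIGH : OP_LOW;
}

/* Suspend, then pick a running operating point once execution resumes. */
static void policy_suspend_cycle(void)
{
    enum wake_event ev = request_op(OP_SUSPEND);
    assert(ev != WAKE_NONE);   /* resume must report some wakeup reason */
    request_op(choose_after_wake(ev));
}
```

In this model the answer to "how does it get back to an operating point" is that the policy manager does nothing special: it issues an ordinary request after the blocking call returns.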

---
| 
| > | Of course, we could use the same interface for both. No, it is not a
| > | good idea. We want a reasonably clean interface. If it means rewriting
| > | powerop two or three times... we'll need to do it.
| > ---
| > 
| > Not speaking to either of the current code submissions, I would say that
| > having a kernel interface for defining OPs and a kernel interface for
| > setting the OP, was a reasonably clean interface.
| 
| Well, Rafael and I disagree, and you do not really listen to
| arguments. Now you can either fix the interface, or try to submit code
| to lkml despite our NAKs. Go ahead and prepare for some flaming...
---

I think I'm listening to arguments just as much as you guys are! We just
disagree. What are your criteria for "a clean interface"? Why do you
think that n separate set-parameter() interfaces, with no consistency
relationship between them, are cleaner than one define-op() and one
set-op() interface?
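For comparison with n independent set-parameter() calls, the define-op()/set-op() pair can be sketched as follows. The sketch is about where consistency is checked: a combination is validated once, when the OP is defined, and set_op() only switches between already-valid combinations. The parameters, the voltage rule in platform_valid() and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_OPS 8

/* Hypothetical OP parameters; a real platform would have more. */
struct op_params {
    unsigned int cpu_mhz;
    unsigned int core_mv;   /* core voltage in millivolts */
};

struct named_op {
    char name[16];
    struct op_params p;
};

static struct named_op ops[MAX_OPS];
static size_t n_ops;
static const struct named_op *active;

/* Made-up platform rule: higher frequencies need a higher core voltage. */
static bool platform_valid(const struct op_params *p)
{
    return p->core_mv >= 1000 + p->cpu_mhz / 4;
}

/* define-op: the whole combination is checked once, at definition time. */
int define_op(const char *name, const struct op_params *p)
{
    assert(name != NULL && p != NULL);
    if (n_ops == MAX_OPS || strlen(name) >= sizeof ops[0].name ||
        !platform_valid(p))
        return -1;
    strcpy(ops[n_ops].name, name);
    ops[n_ops].p = *p;
    n_ops++;
    return 0;
}

/* set-op: switch to a previously defined OP; all parameters change together. */
int set_op(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < n_ops; i++) {
        if (strcmp(ops[i].name, name) == 0) {
            active = &ops[i];
            return 0;
        }
    }
    return -1;
}
```

With separate per-parameter knobs, the same platform_valid() check would have to run on every individual change, or intermediate states could be invalid.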

scott
-- 
scott preece
motorola mobile devices, il67, 1800 s. oak st., champaign, il  61820  
e-mail:	preece@motorola.com	fax:	+1-217-384-8550
phone:	+1-217-384-8589	cell: +1-217-433-6114	pager: 2174336114@vtext.com


* Re: So, what's the status on the recent patches here?
@ 2006-09-03 22:31 Scott E. Preece
  2006-09-03 22:41 ` Pavel Machek
  0 siblings, 1 reply; 136+ messages in thread
From: Scott E. Preece @ 2006-09-03 22:31 UTC (permalink / raw)
  To: pavel; +Cc: linux-pm

| From: Pavel Machek<pavel@ucw.cz>
| Cc: <amit.kucheria@nokia.com>, <linux-pm@lists.osdl.org>
| User-Agent: Mutt/1.5.11+cvs20060126
| 
| On Sun 2006-09-03 16:21:34, Scott E. Preece wrote:
| > | From: Pavel Machek<pavel@ucw.cz>
| > some reason, in your perception, why definition of operating points
| > really needs to be in the kernel?  Definition of the operating points,
| > as opposed to changing from one OP to another, shouldn't have any timing
| > issues, so why isn't a privileged user-space manager a reasonable
| > approach?
| 
| For one thing, isn't powerop needed for boot? You need to boot in
| some operating point after all :-).
---

In the implementation we use, the initial OP is set by the boot
loader. Our kernel power management driver is a module. The assumption
is that booting involves a lot of processing and it makes sense to just
use the highest OP anyway.  That makes sense in our environment, but I
wouldn't recommend our version as a general solution, anyway.

---
| 
| Yes, I see having points defined in userspace is useful for debugging,
| but having the kernel depend on an external daemon for its proper
| operation is not nice.
| 
---

Again, as long as the kernel comes up in some OP it should run
properly. If there's any goal of moving policy out of the kernel, the
kernel is going to have to depend on user-space to support optimal
operation, but the kernel should operate correctly, if non-optimally,
without it.

---
| > | > The only other interface is the actual setting of a (named) operating
| > | > point and that is _required_ to do anything useful.
| > | 
| > | No, they are not.
| > | 
| > | We already have an interface for selecting cpu frequency. Let's keep it.
| > 
| > As noted previously, OPs bundle together more than just the
| > frequency. Those of us supporting the OP model believe that you can't
| > intelligently change CPU frequency in isolation and you can't change
| > some of those parameters independently, because only certain
| > combinations work.
| 
| That's okay. The user gives you the combination he wants, and you select
| the "next higher" working operating point.
---

Hmm. I need to think about that. I guess the OP abstraction *could* be
entirely inside the user-space policy manager, with the kernel exposing
individual interfaces for every parameter that the policy manager would
need to adjust. However, that means that the whole mess has to be at
user level (kernel just implements simple knobs for individual
parameters), because the dependency management between the parameters 
would only be known at user level, and means a relatively bulky kernel
interface, since it would expose more things at the interface. 

On the other hand, that complexity has to be somewhere, so maybe that
would be OK. As I said, I need to think about it...

scott
-- 
scott preece
motorola mobile devices, il67, 1800 s. oak st., champaign, il  61820  
e-mail:	preece@motorola.com	fax:	+1-217-384-8550
phone:	+1-217-384-8589	cell: +1-217-433-6114	pager: 2174336114@vtext.com


* Re: So, what's the status on the recent patches here?
  2006-09-03 22:12 Scott E. Preece
@ 2006-09-03 22:25 ` Pavel Machek
  0 siblings, 0 replies; 136+ messages in thread
From: Pavel Machek @ 2006-09-03 22:25 UTC (permalink / raw)
  Cc: scott.preece, matthew.a.locke, linux-pm

On Sun 2006-09-03 17:12:22, Scott E. Preece wrote:
> | From: Pavel Machek<pavel@ucw.cz>

> | > But, surely that distinction can be handled in the implementation behind
> | > the interface, rather than exposed in the interface.  Does that
> | > distinction matter to the policy manager?  I would argue that it
> | > increases the latency, which would be important to the policy manager,
> | > but that the nature of the latency isn't important to making a policy
> | > decision,  and the proposed interface already exposes the latency as
> | > something that can be used in making transition decisions.
> | 
> | Are we talking about the same thing?
> | 
> | If the policy manager decides to suspend-to-RAM, it will freeze
> | itself. Puff, it is not running any more.
> ---
> 
> Well, I assume the policy manager is telling something in the kernel to
> actually set the operating point. Once it has made that request, it
> doesn't need to run any longer.

And how will it tell the kernel to get back to some _operating_
point? (As opposed to "off-suspended-to-disk"?)

You see, that interface even causes problems in our (human!)
communication. Some of the operating points are not really operating!

> | Of course, we could use the same interface for both. No, it is not a
> | good idea. We want a reasonably clean interface. If it means rewriting
> | powerop two or three times... we'll need to do it.
> ---
> 
> Not speaking to either of the current code submissions, I would say that
> having a kernel interface for defining OPs and a kernel interface for
> setting the OP, was a reasonably clean interface.

Well, Rafael and I disagree, and you do not really listen to
arguments. Now you can either fix the interface, or try to submit code
to lkml despite our NAKs. Go ahead and prepare for some flaming...

(But I'd rather have you fix the interface.)

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: So, what's the status on the recent patches here?
@ 2006-09-03 22:12 Scott E. Preece
  2006-09-03 22:25 ` Pavel Machek
  0 siblings, 1 reply; 136+ messages in thread
From: Scott E. Preece @ 2006-09-03 22:12 UTC (permalink / raw)
  To: pavel; +Cc: scott.preece, matthew.a.locke, linux-pm



| From: Pavel Machek<pavel@ucw.cz>
| 
| Hi!
| 
| > | > > That depends on the definition, but I think of suspend states as the ones
| > | > > that require processes to be frozen before they can be entered.  IMHO it is
| > | > > quite clear that such states cannot be handled in the same way as those
| > | > > that do not require the freezing of processes, so they are not the same.
| > | > 
| > | > You are correct, processes do need to be frozen before a suspend.
| > | > That's the prepare to suspend part of the suspend process, and
| > | > the transition is the suspending and finish is the un-freezing
| > | > of the processes to resume execution.
| > | > 
| > | > And those same steps are the same steps required to transition the
| > | > system to a new operating point, whether it's suspend or change
| > | > from 1.4GHz to 600MHz.
| > | 
| > | There are only a few states that require the processes to be frozen and I
| > | think that's a good enough reason to handle them separately.
| > 
| > ---
| > 
| > But, surely that distinction can be handled in the implementation behind
| > the interface, rather than exposed in the interface.  Does that
| > distinction matter to the policy manager?  I would argue that it
| > increases the latency, which would be important to the policy manager,
| > but that the nature of the latency isn't important to making a policy
| > decision,  and the proposed interface already exposes the latency as
| > something that can be used in making transition decisions.
| 
| Are we talking about the same thing?
| 
| If the policy manager decides to suspend-to-RAM, it will freeze
| itself. Puff, it is not running any more.
---

Well, I assume the policy manager is telling something in the kernel to
actually set the operating point. Once it has made that request, it
doesn't need to run any longer.

---
| 
| Yes, it is important that the interfaces are different. Would you argue
| for using the same interface for slowing down the machine and for
| turning the machine off?
| 
| And suspend-to-disk *is* turning machine off.
---

An interesting question. While it's turning the machine off, it's not
turning it off in the same sense as shutdown, because otherwise you
wouldn't come back via resume.

In any case, I could imagine OFF being another point in the operating
point continuum, except that it's not something I would expect to be
part of the range available to a policy manager (probably; I guess there
are emergency situations where the policy manager might want to shut the
machine down).

---
| 
| Of course, we could use the same interface for both. No, it is not a
| good idea. We want a reasonably clean interface. If it means rewriting
| powerop two or three times... we'll need to do it.
---

Not speaking to either of the current code submissions, I would say that
having a kernel interface for defining OPs and a kernel interface for
setting the OP, was a reasonably clean interface.

scott

-- 
scott preece
motorola mobile devices, il67, 1800 s. oak st., champaign, il  61820  
e-mail:	preece@motorola.com	fax:	+1-217-384-8550
phone:	+1-217-384-8589	cell: +1-217-433-6114	pager: 2174336114@vtext.com


* Re: So, what's the status on the recent patches here?
  2006-09-03 21:34 Scott E. Preece
  2006-09-03 21:43 ` Pavel Machek
@ 2006-09-03 22:10 ` Rafael J. Wysocki
  1 sibling, 0 replies; 136+ messages in thread
From: Rafael J. Wysocki @ 2006-09-03 22:10 UTC (permalink / raw)
  Cc: scott.preece, matthew.a.locke, linux-pm, pavel

On Sunday, 3 September 2006 23:34, Scott E. Preece wrote:
> 
> | From: "Rafael J. Wysocki" <rjw@sisk.pl>
> | 
> | On Sunday, 3 September 2006 18:25, David Singleton wrote:
> | > On 9/2/06, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> | > >
> | > > That depends on the definition, but I think of suspend states as the ones
> | > > that require processes to be frozen before they can be entered.  IMHO it is
> | > > quite clear that such states cannot be handled in the same way as those
> | > > that do not require the freezing of processes, so they are not the same.
> | > 
> | > You are correct, processes do need to be frozen before a suspend.
> | > That's the prepare to suspend part of the suspend process, and
> | > the transition is the suspending and finish is the un-freezing
> | > of the processes to resume execution.
> | > 
> | > And those same steps are the same steps required to transition the
> | > system to a new operating point, whether it's suspend or change
> | > from 1.4GHz to 600MHz.
> | 
> | There are only a few states that require the processes to be frozen and I
> | think that's a good enough reason to handle them separately.
> 
> ---
> 
> But, surely that distinction can be handled in the implementation behind
> the interface, rather than exposed in the interface.

I don't think you can handle that behind the interface in a satisfactory way.
For example during a suspend to disk we carry out several transitions of
devices within the suspend-resume cycle.

> Does that distinction matter to the policy manager?

I think so.

> I would argue that it 
> increases the latency, which would be important to the policy manager,
> but that the nature of the latency isn't important to making a policy
> decision,  and the proposed interface already exposes the latency as
> something that can be used in making transition decisions.

From the policy manager's perspective it may be just a latency factor,
but for all of the things _outside_ of the policy manager it's much more
than that.

For example, transitions like a CPU frequency change are transparent to kernel
threads, but the suspend "transitions" are not, because the kernel threads need
to be informed that the system is suspending and they are expected to freeze
themselves voluntarily.

Really, I think that the "states" which are entered only after tasks are
frozen should be considered as special and handled separately.
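The distinction between transparent and freezing transitions can be sketched with simulated threads: a frequency change leaves them running, while a suspend sets a freeze request that each thread honours at its own safe point, loosely modelled on the kernel's try_to_freeze() convention. Everything here (the thread structure, do_transition(), the single-pass simulation) is a hypothetical user-space model, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

#define NTHREADS 3

enum transition_kind { TRANSITION_FREQ, TRANSITION_SUSPEND };

/* Hypothetical per-thread state of a simulated kernel thread. */
struct sim_thread {
    bool frozen;
};

static struct sim_thread threads[NTHREADS];
static bool freeze_requested;

/* Called by each thread at a point where it is safe to stop. */
static void at_safe_point(struct sim_thread *t)
{
    if (freeze_requested)
        t->frozen = true;   /* the thread parks itself voluntarily */
}

static int count_frozen(void)
{
    int n = 0;
    for (int i = 0; i < NTHREADS; i++)
        n += threads[i].frozen;
    return n;
}

/* Returns how many threads had to be frozen for this transition. */
static int do_transition(enum transition_kind kind)
{
    if (kind == TRANSITION_FREQ)
        return 0;   /* transparent: threads never notice */

    freeze_requested = true;
    for (int i = 0; i < NTHREADS; i++)
        at_safe_point(&threads[i]);   /* simulate each thread reaching one */
    int frozen = count_frozen();
    assert(frozen == NTHREADS);   /* must not proceed with running threads */

    /* ...devices suspended, system asleep, then resumed... */

    freeze_requested = false;   /* thaw */
    for (int i = 0; i < NTHREADS; i++)
        threads[i].frozen = false;
    return frozen;
}
```

The frequency path touches no thread state at all, while the suspend path depends on the cooperation of every thread; that asymmetry is what the argument for a separate interface rests on.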

Greetings,
Rafael


-- 
You never change things by fighting the existing reality.
		R. Buckminster Fuller


* Re: So, what's the status on the recent patches here?
  2006-09-03 21:21 Scott E. Preece
@ 2006-09-03 21:54 ` Pavel Machek
  0 siblings, 0 replies; 136+ messages in thread
From: Pavel Machek @ 2006-09-03 21:54 UTC (permalink / raw)
  To: Scott E. Preece; +Cc: linux-pm

On Sun 2006-09-03 16:21:34, Scott E. Preece wrote:
> | From: Pavel Machek<pavel@ucw.cz>
> | On Thu 2006-08-31 16:44:12, Amit Kucheria wrote:
> | > On Thu, 2006-08-31 at 00:36 +0200, ext Pavel Machek wrote:
> | > > On Wed 2006-08-30 14:00:53, Amit Kucheria wrote:
> | > > > But PowerOP would allow SoC-based systems to tune the operating points
> | > > > to get the most out of their top-10 use-cases and sleep modes.
> | > > 
> | > > Question is: can we get similar savings without the ugly interface powerop
> | > > presents?
> | > 
> | > If I have understood correctly, your main objection is to defining new
> | > operating points from userspace?
> | 
> | Well, that is a big objection, but not my main one. I believe that "new
> | operating points from userspace" are a non-starter. "So obviously wrong
> | that no one would merge that".
> ---
> 
> Why? Are you interpreting "from user space" as "under user control"? A
> lot of us have been taught for some time that it's a good thing to move
> stuff out of the kernel, unless it really needs to be there. Is there

Moving stuff out of the kernel is one important design principle. Keeping
the user<->kernel interface reasonably clean is another one.

> some reason, in your perception, why definition of operating points
> really needs to be in the kernel?  Definition of the operating points,
> as opposed to changing from one OP to another, shouldn't have any timing
> issues, so why isn't a privileged user-space manager a reasonable
> approach?

For one thing, isn't powerop needed for boot? You need to boot in
some operating point after all :-).

Yes, I see having points defined in userspace is useful for debugging,
but having the kernel depend on an external daemon for its proper
operation is not nice.

> | > The only other interface is the actual setting of a (named) operating
> | > point and that is _required_ to do anything useful.
> | 
> | No, they are not.
> | 
> | We already have an interface for selecting cpu frequency. Let's keep it.
> 
> As noted previously, OPs bundle together more than just the
> frequency. Those of us supporting the OP model believe that you can't
> intelligently change CPU frequency in isolation and you can't change
> some of those parameters independently, because only certain
> combinations work.

That's okay. The user gives you the combination he wants, and you select
the "next higher" working operating point.

> | Now, it should be up to the powerop framework to select the best operating
> | point given "cpu speed, dsp speed, usb on/off" state. But I argue that
> | this should be done in-kernel and hidden from user.
> 
> Well, I agree with hiding it from the user, but there's no particular
> reason that means it needs to be done in the kernel. Again, we like to
> have it run from user-space, so we can replace it easily (without
> recompiling/restarting the kernel) in development.

Do whatever you want for development (that includes patching your
kernel). For production, a nice interface is more important.
								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: So, what's the status on the recent patches here?
  2006-09-03 21:34 Scott E. Preece
@ 2006-09-03 21:43 ` Pavel Machek
  2006-09-03 22:10 ` Rafael J. Wysocki
  1 sibling, 0 replies; 136+ messages in thread
From: Pavel Machek @ 2006-09-03 21:43 UTC (permalink / raw)
  Cc: scott.preece, matthew.a.locke, linux-pm

Hi!

> | > > That depends on the definition, but I think of suspend states as the ones
> | > > that require processes to be frozen before they can be entered.  IMHO it is
> | > > quite clear that such states cannot be handled in the same way as those
> | > > that do not require the freezing of processes, so they are not the same.
> | > 
> | > You are correct, processes do need to be frozen before a suspend.
> | > That's the prepare to suspend part of the suspend process, and
> | > the transition is the suspending and finish is the un-freezing
> | > of the processes to resume execution.
> | > 
> | > And those same steps are the same steps required to transition the
> | > system to a new operating point, whether it's suspend or change
> | > from 1.4GHz to 600MHz.
> | 
> | There are only a few states that require the processes to be frozen and I
> | think that's a good enough reason to handle them separately.
> 
> ---
> 
> But, surely that distinction can be handled in the implementation behind
> the interface, rather than exposed in the interface.  Does that
> distinction matter to the policy manager?  I would argue that it
> increases the latency, which would be important to the policy manager,
> but that the nature of the latency isn't important to making a policy
> decision,  and the proposed interface already exposes the latency as
> something that can be used in making transition decisions.

Are we talking about the same thing?

If the policy manager decides to suspend-to-RAM, it will freeze
itself. Puff, it is not running any more.

Yes, it is important that the interfaces are different. Would you argue
for using the same interface for slowing down the machine and for
turning the machine off?

And suspend-to-disk *is* turning machine off.

Of course, we could use the same interface for both. No, it is not a
good idea. We want a reasonably clean interface. If it means rewriting
powerop two or three times... we'll need to do it.
								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: So, what's the status on the recent patches here?
@ 2006-09-03 21:34 Scott E. Preece
  2006-09-03 21:43 ` Pavel Machek
  2006-09-03 22:10 ` Rafael J. Wysocki
  0 siblings, 2 replies; 136+ messages in thread
From: Scott E. Preece @ 2006-09-03 21:34 UTC (permalink / raw)
  To: rjw; +Cc: scott.preece, matthew.a.locke, linux-pm, pavel


| From: "Rafael J. Wysocki" <rjw@sisk.pl>
| 
| On Sunday, 3 September 2006 18:25, David Singleton wrote:
| > On 9/2/06, Rafael J. Wysocki <rjw@sisk.pl> wrote:
| > >
| > > That depends on the definition, but I think of suspend states as the ones
| > > that require processes to be frozen before they can be entered.  IMHO it is
| > > quite clear that such states cannot be handled in the same way as those
| > > that do not require the freezing of processes, so they are not the same.
| > 
| > You are correct, processes do need to be frozen before a suspend.
| > That's the prepare to suspend part of the suspend process, and
| > the transition is the suspending and finish is the un-freezing
| > of the processes to resume execution.
| > 
| > And those same steps are the same steps required to transition the
| > system to a new operating point, whether it's suspend or change
| > from 1.4GHz to 600MHz.
| 
| There are only a few states that require the processes to be frozen and I
| think that's a good enough reason to handle them separately.

---

But, surely that distinction can be handled in the implementation behind
the interface, rather than exposed in the interface.  Does that
distinction matter to the policy manager?  I would argue that it
increases the latency, which would be important to the policy manager,
but that the nature of the latency isn't important to making a policy
decision,  and the proposed interface already exposes the latency as
something that can be used in making transition decisions.

scott

-- 
scott preece
motorola mobile devices, il67, 1800 s. oak st., champaign, il  61820  
e-mail:	preece@motorola.com	fax:	+1-217-384-8550
phone:	+1-217-384-8589	cell: +1-217-433-6114	pager: 2174336114@vtext.com


* Re: So, what's the status on the recent patches here?
  2006-09-03 16:25                                   ` David Singleton
  2006-09-03 20:57                                     ` Rafael J. Wysocki
@ 2006-09-03 21:33                                     ` Pavel Machek
  2006-09-09  0:39                                       ` David Singleton
  1 sibling, 1 reply; 136+ messages in thread
From: Pavel Machek @ 2006-09-03 21:33 UTC (permalink / raw)
  To: David Singleton; +Cc: Preece Scott-PREECE, Matthew Locke, linux-pm

> >That depends on the definition, but I think of suspend states as the ones
> >that require processes to be frozen before they can be entered.  IMHO it is
> >quite clear that such states cannot be handled in the same way as those
> >that do not require the freezing of processes, so they are not the same.
> 
> You are correct, processes do need to be frozen before a suspend.
> That's the prepare to suspend part of the suspend process, and
> the transtition is the suspending and finish is the un-freezing
> of the processes to resume execution.
> 
> And those same steps are the same steps required to transition the
> system to a new operating point, whether it's suspend or change
> from 1.4GHz to 600MHz.

No, processes are not frozen for simple cpu frequency change -- on
non-broken cpus.
								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

* Re: So, what's the status on the recent patches here?
@ 2006-09-03 21:21 Scott E. Preece
  2006-09-03 21:54 ` Pavel Machek
  0 siblings, 1 reply; 136+ messages in thread
From: Scott E. Preece @ 2006-09-03 21:21 UTC (permalink / raw)
  To: pavel; +Cc: linux-pm



| From: Pavel Machek<pavel@ucw.cz>
| 
| Hi!
| 
| On Thu 2006-08-31 16:44:12, Amit Kucheria wrote:
| > On Thu, 2006-08-31 at 00:36 +0200, ext Pavel Machek wrote:
| > > On Wed 2006-08-30 14:00:53, Amit Kucheria wrote:
| > > > But PowerOP would allow SoC-based systems to tune the operating points
| > > > to get the most out of their top-10 use-cases and sleep modes.
| > > 
| > > Question is: can we get similar savings without ugly interface powerop
| > > presents?
| > 
| > If I have understood correctly, your main objection is to defining new
| > operating points from userspace?
| 
| Well, that is big objection, but not my main one. I believe that "new
| operating points from userspace" are non-starter. "So obviously wrong
| that noone would merge that".
---

Why? Are you interpreting "from user space" as "under user control"? A
lot of us have been taught for some time that it's a good thing to move
stuff out of the kernel, unless it really needs to be there. Is there
some reason, in your perception, why definition of operating points
really needs to be in the kernel?  Definition of the operating points,
as opposed to changing from one OP to another, shouldn't have any timing
issues, so why isn't a privileged user-space manager a reasonable
approach?

---
| 
| > The only other interface is the actually setting of a (named) operating
| > point and that is _required_ to do anything useful.
| 
| No, they are not.
| 
| We already have interface for selecting cpu frequency. Lets keep it.
---

As noted previously, OPs bundle together more than just the
frequency. Those of us supporting the OP model believe that you can't
intelligently change CPU frequency in isolation, and that you can't
change some of those parameters independently, because only certain
combinations work.
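This minimal sketch shows what "only certain combinations work" means in practice. It assumes a hypothetical table of valid (CPU clock, bus clock, core voltage) triples supplied by the system designer. The values are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical SoC operating point: parameters that must change together. */
struct soc_op {
	unsigned int cpu_mhz;
	unsigned int bus_mhz;
	unsigned int core_mv;
};

/* Illustrative table; a real one would come from the system designer. */
static const struct soc_op valid_ops[] = {
	{ 600, 100, 1000 },
	{ 400, 100,  950 },
	{ 200,  50,  900 },
};

#define N_VALID_OPS (sizeof(valid_ops) / sizeof(valid_ops[0]))

/* A requested combination is accepted only if it matches a whole row. */
bool soc_op_is_valid(const struct soc_op *req)
{
	assert(req != NULL);
	for (size_t i = 0; i < N_VALID_OPS; i++)
		if (valid_ops[i].cpu_mhz == req->cpu_mhz &&
		    valid_ops[i].bus_mhz == req->bus_mhz &&
		    valid_ops[i].core_mv == req->core_mv)
			return true;
	return false;
}
```

Suppose you take the 400 MHz row and raise only `cpu_mhz` to 600 MHz, leaving the voltage at 950 mV. That triple is not in the table, so it is rejected.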

---
| ...
| Now, it should be up-to the powerop framework to select best operating
| point given "cpu speed, dsp speed, usb on/off" state. But I argue that
| this should be done in-kernel and hidden from user.
---

Well, I agree with hiding it from the user, but that doesn't mean it
needs to be done in the kernel. Again, we like to have it run from
user space, so we can replace it easily (without recompiling or
restarting the kernel) during development.

scott

-- 
scott preece
motorola mobile devices, il67, 1800 s. oak st., champaign, il  61820  
e-mail:	preece@motorola.com	fax:	+1-217-384-8550
phone:	+1-217-384-8589	cell: +1-217-433-6114	pager: 2174336114@vtext.com

* Re: So, what's the status on the recent patches here?
  2006-09-03 16:25                                   ` David Singleton
@ 2006-09-03 20:57                                     ` Rafael J. Wysocki
  2006-09-03 21:33                                     ` Pavel Machek
  1 sibling, 0 replies; 136+ messages in thread
From: Rafael J. Wysocki @ 2006-09-03 20:57 UTC (permalink / raw)
  To: David Singleton
  Cc: Preece Scott-PREECE, Matthew Locke, linux-pm, Pavel Machek

On Sunday, 3 September 2006 18:25, David Singleton wrote:
> On 9/2/06, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > On Saturday 02 September 2006 20:05, David Singleton wrote:
> > > On 8/29/06, Matthew Locke <matthew.a.locke@comcast.net> wrote:
> > > >
> > > > On Aug 29, 2006, at 10:49 AM, Preece Scott-PREECE wrote:
> > > >
> > > > >
> > > > >> From: linux-pm-bounces@lists.osdl.org
> > > > >> [mailto:linux-pm-bounces@lists.osdl.org] On Behalf Of Pavel Machek
> > > > >> Sent: Tuesday, August 29, 2006 11:35 AM
> > > > >> To: David Singleton
> > > > >> Cc: linux-pm@lists.osdl.org
> > > > >> Subject: Re: [linux-pm] So, what's the status on the recent
> > > > >> patches here?
> > > > >>
> > > > >> Hi!
> > > > >>>>>         point, by name. There is a new
> > > > >>>> /sys/power/operating_points directory
> > > > >>>>>         that shows all the operating points the
> > > > >>>> system supports. An
> > > > >>>>>         exampled from my centrino laptop shows:
> > > > >>>>>
> > > > >>>>>         /sys/power/operating_points/high
> > > > >>>>>         /sys/power/operating_points/highest
> > > > >>>>>         /sys/power/operating_points/low
> > > > >>>>>         /sys/power/operating_points/lowest
> > > > >>>>>         /sys/power/operating_points/medium
> > > > >>>>>         /sys/power/operating_points/mem
> > > > >>>>>         /sys/power/operating_points/standby
> > > > >>>>
> > > > >>>> What makes you think that mixing operating and sleep
> > > > >> states is good
> > > > >>>> idea?
> > > > >>>
> > > > >>> They are all power states managed by the kernel and in the
> > > > >> operating
> > > > >>> point concept they are all operating points the system supports.
> > > > >>
> > > > >> That does not make mixing them right.
> > > > > ---
> > > > >
> > > > > Could you say why you think they shouldn't be mixed? Absent argument to
> > > > > the contrary,
> > > > > making it a single continuum seems appealing. Why have separate
> > > > > policies?
> > > >
> > > > I know this questions is directed at Pavel but I have similar concerns.
> > > >    I agree that making sleep states into operating points is appealing.
> > > > However,  if the implementation is just going to special case the sleep
> > > > state operating points then they should be handled separately.  As
> > > > Pavel points out, you can see from Dave's implementation that the
> > > > operating point definition doesn't quite work for both.   Voltage and
> > > > frequency don't have meaning for the sleep points.
> > >
> > > Actually what I was trying, unsuccessfully, to explain was that
> > > suspend states are valid, supported operating states the
> > > system can be in for power management.    And that they are the
> > > same as an operating point for a processor frequency.
> >
> > That depends on the definition, but I think of suspend states as the ones
> > that require processes to be frozen before they can be entered.  IMHO it is
> > quite clear that such states cannot be handled in the same way as those
> > that do not require the freezing of processes, so they are not the same.
> 
> You are correct, processes do need to be frozen before a suspend.
> That's the prepare to suspend part of the suspend process, and
> the transtition is the suspending and finish is the un-freezing
> of the processes to resume execution.
> 
> And those same steps are the same steps required to transition the
> system to a new operating point, whether it's suspend or change
> from 1.4GHz to 600MHz.

There are only a few states that require the processes to be frozen and I
think that's a good enough reason to handle them separately.

Greetings,
Rafael


-- 
You never change things by fighting the existing reality.
		R. Buckminster Fuller

* Re: So, what's the status on the recent patches here?
  2006-09-02 19:30                                 ` Rafael J. Wysocki
@ 2006-09-03 16:25                                   ` David Singleton
  2006-09-03 20:57                                     ` Rafael J. Wysocki
  2006-09-03 21:33                                     ` Pavel Machek
  0 siblings, 2 replies; 136+ messages in thread
From: David Singleton @ 2006-09-03 16:25 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Preece Scott-PREECE, Matthew Locke, linux-pm, Pavel Machek

On 9/2/06, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> On Saturday 02 September 2006 20:05, David Singleton wrote:
> > On 8/29/06, Matthew Locke <matthew.a.locke@comcast.net> wrote:
> > >
> > > On Aug 29, 2006, at 10:49 AM, Preece Scott-PREECE wrote:
> > >
> > > >
> > > >> From: linux-pm-bounces@lists.osdl.org
> > > >> [mailto:linux-pm-bounces@lists.osdl.org] On Behalf Of Pavel Machek
> > > >> Sent: Tuesday, August 29, 2006 11:35 AM
> > > >> To: David Singleton
> > > >> Cc: linux-pm@lists.osdl.org
> > > >> Subject: Re: [linux-pm] So, what's the status on the recent
> > > >> patches here?
> > > >>
> > > >> Hi!
> > > >>>>>         point, by name. There is a new
> > > >>>> /sys/power/operating_points directory
> > > >>>>>         that shows all the operating points the
> > > >>>> system supports. An
> > > >>>>>         exampled from my centrino laptop shows:
> > > >>>>>
> > > >>>>>         /sys/power/operating_points/high
> > > >>>>>         /sys/power/operating_points/highest
> > > >>>>>         /sys/power/operating_points/low
> > > >>>>>         /sys/power/operating_points/lowest
> > > >>>>>         /sys/power/operating_points/medium
> > > >>>>>         /sys/power/operating_points/mem
> > > >>>>>         /sys/power/operating_points/standby
> > > >>>>
> > > >>>> What makes you think that mixing operating and sleep
> > > >> states is good
> > > >>>> idea?
> > > >>>
> > > >>> They are all power states managed by the kernel and in the
> > > >> operating
> > > >>> point concept they are all operating points the system supports.
> > > >>
> > > >> That does not make mixing them right.
> > > > ---
> > > >
> > > > Could you say why you think they shouldn't be mixed? Absent argument to
> > > > the contrary,
> > > > making it a single continuum seems appealing. Why have separate
> > > > policies?
> > >
> > > I know this questions is directed at Pavel but I have similar concerns.
> > >    I agree that making sleep states into operating points is appealing.
> > > However,  if the implementation is just going to special case the sleep
> > > state operating points then they should be handled separately.  As
> > > Pavel points out, you can see from Dave's implementation that the
> > > operating point definition doesn't quite work for both.   Voltage and
> > > frequency don't have meaning for the sleep points.
> >
> > Actually what I was trying, unsuccessfully, to explain was that
> > suspend states are valid, supported operating states the
> > system can be in for power management.    And that they are the
> > same as an operating point for a processor frequency.
>
> That depends on the definition, but I think of suspend states as the ones
> that require processes to be frozen before they can be entered.  IMHO it is
> quite clear that such states cannot be handled in the same way as those
> that do not require the freezing of processes, so they are not the same.

You are correct, processes do need to be frozen before a suspend.
That's the prepare-to-suspend part of the suspend process; the
transition is the suspending, and finish is the unfreezing of the
processes to resume execution.

And those same steps are required to transition the system to a new
operating point, whether it's a suspend or a change from 1.4GHz to
600MHz.

>
> Greetings,
> Rafael
>
>
> --
> You never change things by fighting the existing reality.
>                 R. Buckminster Fuller
>

* Re: So, what's the status on the recent patches here?
  2006-09-02 18:05                               ` David Singleton
@ 2006-09-02 19:30                                 ` Rafael J. Wysocki
  2006-09-03 16:25                                   ` David Singleton
  0 siblings, 1 reply; 136+ messages in thread
From: Rafael J. Wysocki @ 2006-09-02 19:30 UTC (permalink / raw)
  To: David Singleton
  Cc: Preece Scott-PREECE, Matthew Locke, linux-pm, Pavel Machek

On Saturday 02 September 2006 20:05, David Singleton wrote:
> On 8/29/06, Matthew Locke <matthew.a.locke@comcast.net> wrote:
> >
> > On Aug 29, 2006, at 10:49 AM, Preece Scott-PREECE wrote:
> >
> > >
> > >> From: linux-pm-bounces@lists.osdl.org
> > >> [mailto:linux-pm-bounces@lists.osdl.org] On Behalf Of Pavel Machek
> > >> Sent: Tuesday, August 29, 2006 11:35 AM
> > >> To: David Singleton
> > >> Cc: linux-pm@lists.osdl.org
> > >> Subject: Re: [linux-pm] So, what's the status on the recent
> > >> patches here?
> > >>
> > >> Hi!
> > >>>>>         point, by name. There is a new
> > >>>> /sys/power/operating_points directory
> > >>>>>         that shows all the operating points the
> > >>>> system supports. An
> > >>>>>         exampled from my centrino laptop shows:
> > >>>>>
> > >>>>>         /sys/power/operating_points/high
> > >>>>>         /sys/power/operating_points/highest
> > >>>>>         /sys/power/operating_points/low
> > >>>>>         /sys/power/operating_points/lowest
> > >>>>>         /sys/power/operating_points/medium
> > >>>>>         /sys/power/operating_points/mem
> > >>>>>         /sys/power/operating_points/standby
> > >>>>
> > >>>> What makes you think that mixing operating and sleep
> > >> states is good
> > >>>> idea?
> > >>>
> > >>> They are all power states managed by the kernel and in the
> > >> operating
> > >>> point concept they are all operating points the system supports.
> > >>
> > >> That does not make mixing them right.
> > > ---
> > >
> > > Could you say why you think they shouldn't be mixed? Absent argument to
> > > the contrary,
> > > making it a single continuum seems appealing. Why have separate
> > > policies?
> >
> > I know this questions is directed at Pavel but I have similar concerns.
> >    I agree that making sleep states into operating points is appealing.
> > However,  if the implementation is just going to special case the sleep
> > state operating points then they should be handled separately.  As
> > Pavel points out, you can see from Dave's implementation that the
> > operating point definition doesn't quite work for both.   Voltage and
> > frequency don't have meaning for the sleep points.
> 
> Actually what I was trying, unsuccessfully, to explain was that
> suspend states are valid, supported operating states the
> system can be in for power management.    And that they are the
> same as an operating point for a processor frequency.

That depends on the definition, but I think of suspend states as the ones
that require processes to be frozen before they can be entered.  IMHO it is
quite clear that such states cannot be handled in the same way as those
that do not require the freezing of processes, so they are not the same.

Greetings,
Rafael


-- 
You never change things by fighting the existing reality.
		R. Buckminster Fuller

* Re: So, what's the status on the recent patches here?
  2006-08-30  6:20                             ` Matthew Locke
  2006-08-30 13:26                               ` Preece Scott-PREECE
@ 2006-09-02 18:05                               ` David Singleton
  2006-09-02 19:30                                 ` Rafael J. Wysocki
  1 sibling, 1 reply; 136+ messages in thread
From: David Singleton @ 2006-09-02 18:05 UTC (permalink / raw)
  To: Matthew Locke; +Cc: linux-pm, Preece Scott-PREECE, Pavel Machek

On 8/29/06, Matthew Locke <matthew.a.locke@comcast.net> wrote:
>
> On Aug 29, 2006, at 10:49 AM, Preece Scott-PREECE wrote:
>
> >
> >> From: linux-pm-bounces@lists.osdl.org
> >> [mailto:linux-pm-bounces@lists.osdl.org] On Behalf Of Pavel Machek
> >> Sent: Tuesday, August 29, 2006 11:35 AM
> >> To: David Singleton
> >> Cc: linux-pm@lists.osdl.org
> >> Subject: Re: [linux-pm] So, what's the status on the recent
> >> patches here?
> >>
> >> Hi!
> >>>>>         point, by name. There is a new
> >>>> /sys/power/operating_points directory
> >>>>>         that shows all the operating points the
> >>>> system supports. An
> >>>>>         exampled from my centrino laptop shows:
> >>>>>
> >>>>>         /sys/power/operating_points/high
> >>>>>         /sys/power/operating_points/highest
> >>>>>         /sys/power/operating_points/low
> >>>>>         /sys/power/operating_points/lowest
> >>>>>         /sys/power/operating_points/medium
> >>>>>         /sys/power/operating_points/mem
> >>>>>         /sys/power/operating_points/standby
> >>>>
> >>>> What makes you think that mixing operating and sleep
> >> states is good
> >>>> idea?
> >>>
> >>> They are all power states managed by the kernel and in the
> >> operating
> >>> point concept they are all operating points the system supports.
> >>
> >> That does not make mixing them right.
> > ---
> >
> > Could you say why you think they shouldn't be mixed? Absent argument to
> > the contrary,
> > making it a single continuum seems appealing. Why have separate
> > policies?
>
> I know this questions is directed at Pavel but I have similar concerns.
>    I agree that making sleep states into operating points is appealing.
> However,  if the implementation is just going to special case the sleep
> state operating points then they should be handled separately.  As
> Pavel points out, you can see from Dave's implementation that the
> operating point definition doesn't quite work for both.   Voltage and
> frequency don't have meaning for the sleep points.

Actually, what I was trying, unsuccessfully, to explain was that
suspend states are valid, supported operating states the system can
be in for power management, and that they are the same as an
operating point for a processor frequency.

Nothing is being 'mixed', because they are all the same thing: an
operating point the system can be in from a power management
perspective.

And since they are valid operating points, they should be in the
same power management framework and operate through the same
interfaces.

And voltage and frequency (and latency) do matter to suspend states.
Systems that support multiple suspend states have different
power consumption levels (hence the voltage requirement) associated
with the different suspend states.

Different suspend states also have different latencies to enter
them, and the power manager needs to know all of this information.

The power manager must know the voltage values to determine which
state consumes the most or least power.

And the power manager must know the latencies for different states
so it can decide when best to use them. If a suspend state takes too
long to enter for the current operation of the system, then the power
manager may choose to just move to a lower-power operating point.


>The per point
> transition callbacks are needed just to handle the sleep points.

Not true. The callbacks are also needed when transitioning to a new
processor frequency operating point that will affect device state.

The prepare-to-transition and finish-transition callbacks are needed
for transitions between all operating points, whether a suspend or a
processor frequency point. That is why I'm using the existing cpufreq
scaling callbacks, and transitioning between both supported processor
frequencies and suspend states is working on different architectures
for OpPoint now.
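This user-space sketch uses a single prepare/finish callback path for every transition. The two phases are modeled loosely on cpufreq's transition notifier phases (CPUFREQ_PRECHANGE and CPUFREQ_POSTCHANGE). All function and type names below are hypothetical; this is not the OpPoint code.

```c
#include <assert.h>
#include <stddef.h>

/* Phases modeled loosely on cpufreq's PRECHANGE/POSTCHANGE
 * notifications; the names here are hypothetical. */
enum op_phase { OP_PREPARE, OP_FINISH };

typedef void (*op_callback)(enum op_phase phase, int from, int to);

#define MAX_CALLBACKS 8
static op_callback callbacks[MAX_CALLBACKS];
static size_t n_callbacks;
static int current_op;

int op_register_callback(op_callback cb)
{
	if (cb == NULL || n_callbacks == MAX_CALLBACKS)
		return -1;
	callbacks[n_callbacks++] = cb;
	return 0;
}

static void op_notify(enum op_phase phase, int from, int to)
{
	for (size_t i = 0; i < n_callbacks; i++)
		callbacks[i](phase, from, to);
}

/* The same prepare/finish sequence runs whether 'to' names a
 * frequency point or a suspend state. */
void op_set(int to)
{
	int from = current_op;

	assert(to >= 0);
	op_notify(OP_PREPARE, from, to);
	/* ... hardware-specific switch, or suspend and resume, goes here ... */
	current_op = to;
	op_notify(OP_FINISH, from, to);
}

int op_get(void)
{
	return current_op;
}

/* Example driver hook: records the phases it sees, in order. */
#define MAX_SEEN 16
static enum op_phase seen[MAX_SEEN];
static size_t n_seen;

void example_driver_cb(enum op_phase phase, int from, int to)
{
	(void)from;
	(void)to;
	if (n_seen < MAX_SEEN)
		seen[n_seen++] = phase;
}
```

A driver registers once and sees the same two calls for every transition. It can inspect `from` and `to` if it needs to treat a suspend differently.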

> Latency has a very different meaning between the two types.


No, latency means the same thing for all operating points. It's the time
it takes to transition into the state (and out of it as well).

The power manager needs to know the time it takes to transition into
both a new cpu frequency and a suspend state. If the suspend state
latency is too long, the power manager may choose to just go to
a lower frequency operating point.

And when devices can export their latencies to transition into an idle
or suspended state, the design and model still work.

In fact, the power manager can then better create policies, each of
which contains an operating point for the entire system and a set of
device states; that is the definition of a policy, and it is where the
huge number of possible policy combinations comes from.

When the power manager understands the frequency, voltage and
latency for each supported operating point and the latencies for
idling devices, it can operate in the most efficient manner
and create the best power management policies and classes of
policies for the system.




>  Also,  the
> type field is required to identify which points are sleep points and
> which ones are operating points.

They are all operating points.  They are all valid, supported states the system
can be in.  And of course they should be in the same power management
framework and work through the same interfaces.

>  I think the concept of using
> operating points to define sleep states could be a valid one but the
> implementation provided isn't quite right (yet).
>
> However, this should not detract from mainlining PowerOP.  Integrating
> sleep and operating points is not required to use PowerOP.


>
> >
> > ---
> >>
> >>> The system can be set to any of the supported states by
> >> setting their
> >>> name in the /sys/power/state file.  I find simplicity is usually a
> >>> good thing.
> >>
> >> I believe the quote is 'make it as simple as possible but not
> >> simpler'.
> > ---
> >
> > So, why don't you think this simplification is possible?
> >
> > ---
> >>
> >>>> And '600MHz' makes lot more sense than 'lowest' on centrino.
> >>>
> >>> Perhaps, but the common name space makes it easy for the
> >> power manager
> >>> daemon to perform the same functions without having to know
> >> that the
> >>> lowest speed on my laptop is 600Mhz.
> >>
> >> And enumerate english strings in power daemon? Limiting the
> >> numver of states?
> > ---
> >
> > Are you saying that on your laptop, all possible CPU and bus
> > frequencies
> >
> > can be used independently? So, it would be unnecessarily limiting to
> > have
> > the system designer provide a list of combinations that work? Remember
> > that
> > the scope of this is a limited set of parameters, not all the devices
> > in
> > the system.
> >
> > ---
> >>
> >>>>
> >>>>>         /sys/power/operating_points/high/frequency
> >>>>>         /sys/power/operating_points/high/voltage
> >>>>>         /sys/power/operating_points/high/latency
> >>>>
> >>>> What is voltage for 'mem'?
> >>>
> >>> I don't know what the voltage or latency is for mem.
> >>> Perhaps Intel could better
> >>> say what the voltage is in the suspend state and what the
> >> latency was
> >>> for transistion to that state.  I didn't have the data
> >> available when
> >>> I wrote the code.
> >>
> >> And you will not have data available even if intel helps you.
> >> What is _frequency_ for mem? These fields are meaningless for
> >> sleep states; that should tell you that mixing sleep and
> >> operating states is bad idea.
> > ---
> >
> > Why isn't 0 a meaningful value for frequency? And I can imagine
> > that some hardware might have different voltage options for sleep
> > States.  Additionally, these sys entries could represent the frequency,
> > voltage, etc., that the system would go to upon resuming from sleep...
>
> I think its more an issue with the implementation rather than the
> concept.  We can't just have frequency and voltage as shown in oppoint
> patches because we don't know which frequency and voltage they refer
> to.  The number of frequency and voltage parameters are completely
> dependent on the hardware platform.
>
> You bring up a more interesting area to look at integrating sleep
> states and operating points.  Can we add some hooks that allow us to
> define which operating point is selected at resume?  Need to think
> about that a bit:)
>
> >
> > scott
> >

* Re: So, what's the status on the recent patches here?
  2006-08-31 13:44                           ` Amit Kucheria
@ 2006-09-02 11:17                             ` Pavel Machek
  0 siblings, 0 replies; 136+ messages in thread
From: Pavel Machek @ 2006-09-02 11:17 UTC (permalink / raw)
  To: Amit Kucheria; +Cc: linux-pm

Hi!

On Thu 2006-08-31 16:44:12, Amit Kucheria wrote:
> On Thu, 2006-08-31 at 00:36 +0200, ext Pavel Machek wrote:
> > On Wed 2006-08-30 14:00:53, Amit Kucheria wrote:
> > > But PowerOP would allow SoC-based systems to tune the operating points
> > > to get the most out of their top-10 use-cases and sleep modes.
> > 
> > Question is: can we get similar savings without ugly interface powerop
> > presents?
> 
> If I have understood correctly, your main objection is to defining new
> operating points from userspace?

Well, that is a big objection, but not my main one. I believe that "new
operating points from userspace" are a non-starter. "So obviously wrong
that no one would merge that".

> The only other interface is the actually setting of a (named) operating
> point and that is _required_ to do anything useful.

No, they are not.

We already have an interface for selecting cpu frequency. Let's keep it.

We may need a new interface for selecting DSP frequency. If that is
needed, let's *add* that interface.

We may need a new interface to say whether usb needs to be enabled or
not. How to do that interface right is an open question, but let's say
we *add* that interface.

Now, it should be up to the powerop framework to select the best
operating point given the "cpu speed, dsp speed, usb on/off" state. But
I argue that this should be done in-kernel and hidden from the user.
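This sketch shows the selection Pavel describes. It assumes hypothetical separate knobs (minimum CPU speed, minimum DSP speed, USB on/off) and an in-kernel OP table that user space never sees. The table contents and power figures are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-visible requirements, set through separate knobs. */
struct pm_request {
	unsigned int min_cpu_mhz;
	unsigned int min_dsp_mhz;
	bool usb_on;
};

/* Hypothetical in-kernel OP table; power figures are illustrative. */
struct hw_op {
	const char *name;
	unsigned int cpu_mhz;
	unsigned int dsp_mhz;
	bool usb_capable;      /* can USB stay powered at this OP? */
	unsigned int power_mw;
};

static const struct hw_op hw_ops[] = {
	{ "op_a", 600, 400, true,  900 },
	{ "op_b", 400, 200, true,  500 },
	{ "op_c", 400, 200, false, 350 },
	{ "op_d", 200,   0, false, 150 },
};

/* Return the cheapest OP that satisfies every requirement, or NULL. */
const struct hw_op *select_op(const struct pm_request *req)
{
	const struct hw_op *best = NULL;

	assert(req != NULL);
	for (size_t i = 0; i < sizeof(hw_ops) / sizeof(hw_ops[0]); i++) {
		const struct hw_op *op = &hw_ops[i];

		if (op->cpu_mhz < req->min_cpu_mhz ||
		    op->dsp_mhz < req->min_dsp_mhz ||
		    (req->usb_on && !op->usb_capable))
			continue;
		if (best == NULL || op->power_mw < best->power_mw)
			best = op;
	}
	return best;
}
```

User space only states its requirements. Which row satisfies them most cheaply stays an in-kernel decision, which is the split argued for above.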

> > Is there anything cpufreq can't do?
> 
> - Embedded systems _want_ to deal with performance/power management in
> terms of Operating Points that encapsulate the complete state of the SoC
> (core speeds, voltages, buses speeds, etc.) instead of only CPU
> frequency.

I'm not saying we won't have complete-SoC-state at some layer. But I
do not think we want it at the userspace-kernelspace interface.

> - There is not much in cpufreq for handling rate propagation and
> dependency tracking for clocks and voltages. This is what the clock
> framework and the upcoming voltage framework handle quite well.

Good.
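This minimal sketch illustrates the rate-propagation point. In the clock tree below, each child runs at its parent's rate divided by a fixed divider. Changing one root clock therefore changes every downstream rate. The `struct clk` layout and functions are hypothetical; they are not the API of the clock framework mentioned above.

```c
#include <assert.h>
#include <stddef.h>

#define CLK_MAX_CHILDREN 4

/* Hypothetical clock node: a child's rate is its parent's rate
 * divided by a fixed divider. */
struct clk {
	const char *name;
	struct clk *parent;
	unsigned int div;   /* unused for the root clock */
	unsigned long rate_hz;
	struct clk *children[CLK_MAX_CHILDREN];
	size_t n_children;
};

/* Push a rate change down the whole subtree below 'c'. */
static void clk_propagate(struct clk *c)
{
	for (size_t i = 0; i < c->n_children; i++) {
		struct clk *child = c->children[i];

		child->rate_hz = c->rate_hz / child->div;
		clk_propagate(child);
	}
}

void clk_add_child(struct clk *parent, struct clk *child)
{
	assert(parent->n_children < CLK_MAX_CHILDREN);
	assert(child->div > 0);
	child->parent = parent;
	parent->children[parent->n_children++] = child;
	child->rate_hz = parent->rate_hz / child->div;
	clk_propagate(child);
}

void clk_set_root_rate(struct clk *root, unsigned long rate_hz)
{
	assert(root->parent == NULL);
	root->rate_hz = rate_hz;
	clk_propagate(root);
}
```

A fixed (freq, voltage) table cannot capture this. The rate a peripheral gets depends on what its ancestors are currently running at.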

> - Most implementations of cpufreq drivers have a fixed rate table (freq,
> voltage). With rate propagation and dependencies in SoCs, available
> rates can vary dynamically based on states of various cores,
> peripherals, etc.

But that's cpufreq-implementation-detail, right?
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

* Re: So, what's the status on the recent patches here?
@ 2006-09-01 14:49 Scott E. Preece
  0 siblings, 0 replies; 136+ messages in thread
From: Scott E. Preece @ 2006-09-01 14:49 UTC (permalink / raw)
  To: amit.kucheria; +Cc: linux-pm, scott.preece

| From Amit.Kucheria@nokia.com Fri Sep  1 03:14:57 2006
| 
| On Thu, 2006-08-31 at 15:22 -0400, ext Preece Scott-PREECE wrote:
| ...
| > So, if a driver had set acceptable_latency to 300ms, the
| > Power-Management policy manager could look at the range of
| > available Ops and pick the lowest-power OP that met the
| > expected load and would also meet the required latency
| > guarantee. [And note that the acceptable latency has to include
| > both the resume time and whatever part of suspend happens with
| > interrupts blocked and can't be aborted.]
| 
| Thinking of it that way, latency is possibly useful. Needs more
| thinking. But what latency values are associated with the OP? The values
| from the spec sheet provided by the silicon vendor do not take into
| account the other operations necessary before you can safely switch to a
| new OP. Some of these operations require indeterminate amount of time.
---

That's something the system designer would have to work out and provide
as part of the information associated with each possible OP transition
(that is, it would potentially be different for each (currentOP, newOP)
pair).
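This sketch stores latency per (currentOP, newOP) pair rather than per OP. It assumes three hypothetical OPs. The numbers are placeholders that a system designer would replace with measurements, and the matrix need not be symmetric.

```c
#include <assert.h>

/* Hypothetical OP indices. */
enum { OP_HIGH, OP_LOW, OP_SLEEP, N_OPS };

/* Illustrative worst-case latencies in microseconds,
 * indexed [current][next]. */
static const unsigned int transition_latency_us[N_OPS][N_OPS] = {
	/*             HIGH   LOW  SLEEP */
	/* HIGH  */ {     0,  150,  8000 },
	/* LOW   */ {   200,    0,  7000 },
	/* SLEEP */ {  9000, 8500,     0 },
};

unsigned int op_pair_latency_us(int from, int to)
{
	assert(from >= 0 && from < N_OPS);
	assert(to >= 0 && to < N_OPS);
	return transition_latency_us[from][to];
}
```

A round trip through a sleep state costs two lookups, one into the state and one back out. This matches the "into the state (and out of it)" definition used earlier in the thread.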

The system designer would also need to decide whether the latencies had
to be worst-case guarantees or whether the system could tolerate
occasionally missing a latency deadline. This would vary depending on
the system (a heart pacemaker would find deadlines more important
than a PDA would).

scott

-- 
scott preece
motorola mobile devices, il67, 1800 s. oak st., champaign, il  61820  
e-mail:	preece@motorola.com	fax:	+1-217-384-8550
phone:	+1-217-384-8589	cell: +1-217-433-6114	pager: 2174336114@vtext.com

* Re: So, what's the status on the recent patches here?
  2006-08-31 19:22                           ` Preece Scott-PREECE
@ 2006-09-01  8:11                             ` Amit Kucheria
  0 siblings, 0 replies; 136+ messages in thread
From: Amit Kucheria @ 2006-09-01  8:11 UTC (permalink / raw)
  To: ext Preece Scott-PREECE; +Cc: linux-pm

On Thu, 2006-08-31 at 15:22 -0400, ext Preece Scott-PREECE wrote:
> > [mailto:linux-pm-bounces@lists.osdl.org] On Behalf Of Amit Kucheria
> > 
> > > >
> > > > - latency is not an attribute of a certain operating point but a
> > > function of
> > > > two arguments - current operating point and a point we 
> > are goint to 
> > > > switch to. Therefor latency just does not belong to 
> > 'struct powerop'
> > > 
> > > I disagree.
> > 
> > Problem is that you disagree without giving your reasons. 
> > Here is another reason putting latency into your operating 
> > point definition isn't going to fly:
> > 
> > http://lwn.net/Articles/196900/ <--- An API for specifying 
> > latency constraints
> > 
> > http://lwn.net/Articles/197282/
> ---

<snip>

> So, if a driver had set acceptable_latency to 300ms, the
> Power-Management policy manager could look at the range of
> available Ops and pick the lowest-power OP that met the
> expected load and would also meet the required latency
> guarantee. [And note that the acceptable latency has to include
> both the resume time and whatever part of suspend happens with
> interrupts blocked and can't be aborted.]

Thinking of it that way, latency is possibly useful. Needs more
thinking. But what latency values are associated with the OP? The values
from the spec sheet provided by the silicon vendor do not take into
account the other operations necessary before you can safely switch to a
new OP. Some of these operations require an indeterminate amount of time.

> I like this new facility a lot. Now we just need something similar
> for expressing anticipated required processing capacity ("I need
> n thousand instructions executed in the next s seconds") in a nice
> platform-independent way...

bogomips? :-D

-- 
Amit Kucheria <amit.kucheria@nokia.com>
Nokia

* Re: So, what's the status on the recent patches here?
  2006-08-31 13:27                         ` Amit Kucheria
@ 2006-08-31 19:22                           ` Preece Scott-PREECE
  2006-09-01  8:11                             ` Amit Kucheria
  0 siblings, 1 reply; 136+ messages in thread
From: Preece Scott-PREECE @ 2006-08-31 19:22 UTC (permalink / raw)
  To: Amit Kucheria, ext David Singleton; +Cc: linux-pm

> [mailto:linux-pm-bounces@lists.osdl.org] On Behalf Of Amit Kucheria
> 
> > >
> > > - latency is not an attribute of a certain operating point but a
> > function of
> > > two arguments - current operating point and a point we 
> are goint to 
> > > switch to. Therefor latency just does not belong to 
> 'struct powerop'
> > 
> > I disagree.
> 
> Problem is that you disagree without giving your reasons. 
> Here is another reason putting latency into your operating 
> point definition isn't going to fly:
> 
> http://lwn.net/Articles/196900/ <--- An API for specifying 
> latency constraints
> 
> http://lwn.net/Articles/197282/
---

Actually, the proposed acceptable_latency functions just make it 
more useful to add latency information to the operating point 
definitions. The new interfaces set a [global] acceptable 
latency; the operating point attribute would define the maximum 
latency that could occur for a particular OP or sleep mode. 
You need both sides - you need to know what's an acceptable 
latency and you need to know what latency a particular 
operation (like transitioning to a different OP) will impose; 
then you can decide whether a particular transition can be 
made while still meeting the required latency.

So, if a driver had set acceptable_latency to 300ms, the
Power-Management policy manager could look at the range of
available OPs and pick the lowest-power OP that met the
expected load and would also meet the required latency
guarantee. [And note that the acceptable latency has to include
both the resume time and whatever part of suspend happens with
interrupts blocked and can't be aborted.]
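Scott's selection rule can be sketched in C. This is a minimal sketch under stated assumptions: struct op_desc, its fields and pick_op() are hypothetical and are not part of PowerOP or the acceptable_latency patches; each OP carries a single worst-case latency figure, as Scott proposes; and the expected load is expressed as a capacity in MIPS (for "n thousand instructions in the next s seconds" that would be n * 1000 / s instructions per second).

```c
#include <stddef.h>

/* Hypothetical operating-point descriptor; field names are illustrative
 * and NOT taken from the PowerOP patches. */
struct op_desc {
    const char *name;
    unsigned int power_mw;       /* estimated power draw at this OP */
    unsigned int capacity_mips;  /* processing capacity this OP delivers */
    unsigned int max_latency_us; /* worst-case latency this OP can impose:
                                    resume time plus any non-abortable,
                                    interrupts-blocked part of suspend */
};

/* Return the index of the lowest-power OP that delivers at least
 * required_mips and whose worst-case latency fits within
 * acceptable_latency_us, or -1 if no OP qualifies. */
int pick_op(const struct op_desc *ops, size_t n,
            unsigned int required_mips, unsigned int acceptable_latency_us)
{
    int best = -1;

    for (size_t i = 0; i < n; i++) {
        if (ops[i].capacity_mips < required_mips)
            continue;   /* cannot carry the expected load */
        if (ops[i].max_latency_us > acceptable_latency_us)
            continue;   /* would break the latency guarantee */
        if (best < 0 || ops[i].power_mw < ops[best].power_mw)
            best = (int)i;
    }
    return best;
}
```

With a 300 ms acceptable latency (300000 us), a sleep mode whose worst case is 400 ms is skipped even if it is the cheapest entry, and the next-cheapest OP that still meets the load is chosen instead.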

I like this new facility a lot. Now we just need something similar
for expressing anticipated required processing capacity ("I need
n thousand instructions executed in the next s seconds") in a nice
platform-independent way...

Scott

* Re: So, what's the status on the recent patches here?
@ 2006-08-31 15:14 Scott E. Preece
  0 siblings, 0 replies; 136+ messages in thread
From: Scott E. Preece @ 2006-08-31 15:14 UTC (permalink / raw)
  To: pavel; +Cc: linux-pm, matthew.a.locke, scott.preece


| From: Pavel Machek<pavel@suse.cz>
| ...
| > I'm not sure how you distinguish between a "system" sleep state 
| > and a "CPU" sleep state - seems like there's a collection of 
| > things that can be shut down or not; except for true OFF, there's
| > always something on.
| 
| Well, even in "true OFF", RTC keeps ticking. And in "disk" state
| (swsusp), machine is basically "true OFF" but it still retains state.
---

In our sleep state (which I would align with "mem" in the previous
list), the application processor part of the system is basically true
off, but retains state in memory. In our systems, of course, there's a
second processor that is independently going in and out of its own
low-power modes while waking up every so many milliseconds to stay
camped on a cellular network.

One "interesting" difference between "disk" and "mem" (as we would use
them, though we don't have a disk, so we don't have a "disk" state), is
that suspend-to-disk today requires rebooting, while suspend-to-RAM
doesn't.  I don't see why that distinction can't still be below the
interface abstraction presented to a user-space power manager, but it's
the most qualitative difference across the range of proposed operating
points...

scott
-- 
scott preece
motorola mobile devices, il67, 1800 s. oak st., champaign, il  61820  
e-mail:	preece@motorola.com	fax:	+1-217-384-8550
phone:	+1-217-384-8589	cell: +1-217-433-6114	pager: 2174336114@vtext.com

* Re: So, what's the status on the recent patches here?
  2006-08-30 22:36                         ` Pavel Machek
@ 2006-08-31 13:44                           ` Amit Kucheria
  2006-09-02 11:17                             ` Pavel Machek
  0 siblings, 1 reply; 136+ messages in thread
From: Amit Kucheria @ 2006-08-31 13:44 UTC (permalink / raw)
  To: ext Pavel Machek; +Cc: linux-pm

On Thu, 2006-08-31 at 00:36 +0200, ext Pavel Machek wrote:
> On Wed 2006-08-30 14:00:53, Amit Kucheria wrote:

<snip>

> > You are trying to make it sound more complex than it really is. For a
> > notebook, as you yourself pointed out, things could be handled with the
> > present adaptive, load-based system. So you don't need to map _every_
> > use-case to an operating point. So you don't need to move to use PowerOP
> > today.
> 
> Ok, but please do not try to replace cpufreq with
> powerop/oppoint. That is not possible.

No one wants to replace cpufreq with PowerOP today. Which is why the
patches make PowerOP optional. But embedded systems need it today.

> > But PowerOP would allow SoC-based systems to tune the operating points
> > to get the most out of their top-10 use-cases and sleep modes.
> 
> Question is: can we get similar savings without the ugly interface
> powerop presents?
> 

If I have understood correctly, your main objection is to defining new
operating points from userspace?

The only other interface is the actual setting of a (named) operating
point and that is _required_ to do anything useful.

<snip>
 
> > But they are NOT independent parameters! Which is why we want to
> > encapsulate them into an 'Operating Point'. We have completely failed in
> > our effort to explain the concept of an operating point if that has been
> > your assumption all along.
> 
> They are independent, at least from application point of view. And
> that's probably right interface to present to userland. Application
> tells you its dsp speed desired, you take current cpu frequency
> requirements from cpufreq, and select operating point with lowest
> consumption based on those constraints.

You are violating your own principles here - why should an application
know about 'DSP speed'?

And individual applications don't know about operating points either.
They just present their requirements in terms of increased load or a
constraint (usb, temp, etc.). The device manager gets inputs about this
increased load and constraints and programs the appropriate OP. So we
agree here.

<snip>

> > And USB (or any device information) is NOT part of the operating point.
> > It is just an asynchronous constraint whose appearance/disappearance
> > influences operating point tangentially. IOW, on some systems USB could
> > run at any operating point, so there would be no constraint. On others,
> > use of USB would automatically cause usb clocks to go high which in turn
> > would switch the system to an operating point that satisfies the
> > constraint - this is handled by clock/voltage framework.
> 
> Okay, and why can't we handle _all_ the constraints in this style? Ask
> userspace what constraints are there, and automagically select best
> operating point, without having operating points explicit at userspace
> interface.
> 

OP change _will_ happen automatically in the kernel for quick
transitions. But the device manager will sometimes override this with
policy decisions. Because in certain cases, userspace knows best.

<snip>

> > Yes. Or more particularly, the ondemand governor, right? But load
> > average is not the only input used to make decisions. There could be
> > thermal alarms, battery alarms, etc. And deciding which of these
> > conflicting inputs is given priority is a policy decision made by the
> > device manager. We discussed some of this at the PM summit.
> 
> cpufreq already knows about thermal. (There's no policy in there, you
> can't allow system to overheat).
> 
> cpufreq already knows about battery. (On some powernow-k8, high cpu
> frequencies are not available on battery power, because battery is not
> powerful enough). If you have additional constraints (may not use
> 400MHz when battery is below 20%, because li-ion has too big internal
> resistance at that point?), please use cpufreq framework to enforce
> them.

I will look at support for thermal/battery events in cpufreq in greater
detail.

> Is there anything cpufreq can't do?

- Embedded systems _want_ to deal with performance/power management in
terms of Operating Points that encapsulate the complete state of the SoC
(core speeds, voltages, bus speeds, etc.) instead of only CPU
frequency.
- There is not much in cpufreq for handling rate propagation and
dependency tracking for clocks and voltages. This is what the clock
framework and the upcoming voltage framework handle quite well.
- Most implementations of cpufreq drivers have a fixed rate table (freq,
voltage). With rate propagation and dependencies in SoCs, available
rates can vary dynamically based on states of various cores,
peripherals, etc.
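The rate-propagation point can be illustrated with a toy clock tree. This is a hypothetical sketch, not the kernel clock framework's API: struct clk_node and clk_set_root_rate() are invented for illustration, dividers are fixed, and nodes are assumed to be listed parents-before-children. Changing the root (DPLL-like) rate changes every derived rate, which is why a fixed (freq, voltage) table can go stale.

```c
#include <stddef.h>

/* Hypothetical clock node (NOT the kernel's struct clk). */
struct clk_node {
    const char *name;
    struct clk_node *parent;   /* NULL for a root oscillator/DPLL */
    unsigned long rate_hz;     /* current rate */
    unsigned int div;          /* fixed divider from parent, >= 1 */
};

/* Recompute one node's rate from its parent's current rate. */
static void clk_recalc(struct clk_node *c)
{
    if (c->parent)
        c->rate_hz = c->parent->rate_hz / c->div;
}

/* Set a root clock's rate and propagate it to every descendant.
 * `all` must list nodes parents-before-children so that each parent
 * is already updated when its children are recalculated. */
void clk_set_root_rate(struct clk_node **all, size_t n,
                       struct clk_node *root, unsigned long rate_hz)
{
    root->rate_hz = rate_hz;
    for (size_t i = 0; i < n; i++) {
        /* Walk up the ancestor chain; recalc if `root` is an ancestor. */
        for (struct clk_node *p = all[i]->parent; p; p = p->parent) {
            if (p == root) {
                clk_recalc(all[i]);
                break;
            }
        }
    }
}
```

Halving a 400 MHz root in this model halves the core, L3 and L4 rates along with it, even though nobody touched those nodes directly.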

Regards,
Amit

-- 
Amit Kucheria <amit.kucheria@nokia.com>
Nokia

* Re: So, what's the status on the recent patches here?
  2006-08-29  1:29                       ` David Singleton
  2006-08-29 22:39                         ` Eugeny S. Mints
@ 2006-08-31 13:27                         ` Amit Kucheria
  2006-08-31 19:22                           ` Preece Scott-PREECE
  1 sibling, 1 reply; 136+ messages in thread
From: Amit Kucheria @ 2006-08-31 13:27 UTC (permalink / raw)
  To: ext David Singleton; +Cc: linux-pm

On Mon, 2006-08-28 at 18:29 -0700, ext David Singleton wrote:

> >
> > - latency is not an attribute of a certain operating point but a
> > function of two arguments - current operating point and a point we are
> > going to switch to. Therefore latency just does not belong to 'struct powerop'
> 
> I disagree.

Problem is that you disagree without giving your reasons. Here is
another reason putting latency into your operating point definition
isn't going to fly:

http://lwn.net/Articles/196900/ <--- An API for specifying latency
constraints

http://lwn.net/Articles/197282/

Please comment on PowerOP patches; dissect them if need be on why you
don't agree with the approach.
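The point quoted above, that latency is a function of both the current and the target OP, can be modelled as a table indexed by (from, to) rather than as one value stored per OP. A minimal sketch; the table values and transition_ok() are made up for illustration and are not part of any posted patch.

```c
#include <stddef.h>

#define NUM_OPS 3

/* Hypothetical worst-case transition latencies in microseconds,
 * indexed [from][to]; the numbers are invented for illustration. */
static const unsigned int transition_latency_us[NUM_OPS][NUM_OPS] = {
    /* to:   low   med  high */
    {          0,   80,  150 },   /* from low  */
    {         40,    0,   90 },   /* from med  */
    {         60,   50,    0 },   /* from high */
};

/* A transition is allowed only if its latency, which depends on BOTH
 * the current and the target OP, fits the acceptable latency. */
int transition_ok(size_t from, size_t to, unsigned int acceptable_us)
{
    if (from >= NUM_OPS || to >= NUM_OPS)
        return 0;
    return transition_latency_us[from][to] <= acceptable_us;
}
```

In this model, moving to "high" under a 100 us budget is allowed from "med" but not from "low", which a single per-OP latency value could not express.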

Regards,
Amit

-- 
Amit Kucheria <amit.kucheria@nokia.com>
Nokia

* Re: So, what's the status on the recent patches here?
  2006-08-31  0:22                                   ` Preece Scott-PREECE
@ 2006-08-31 12:04                                     ` Pavel Machek
  0 siblings, 0 replies; 136+ messages in thread
From: Pavel Machek @ 2006-08-31 12:04 UTC (permalink / raw)
  To: Preece Scott-PREECE; +Cc: Matthew Locke, linux-pm

Hi!

> > We may have confusion here.
> > 
> > On PC, it is definitely not possible to enter sleep state 
> > between frames of video... because video is powered off.
> > 
> > On PC, sleep states are *system* sleep states. CPU sleep 
> > states exist, too, but that's in-kernel implementation 
> > detail. They are called C1..C4.
> ---
> 
> Well, we have some hardware where we can sleep everything but 
> memory and some where we can also leave the display active (and 
> backlit). In fact, however, today the latency for going to sleep 
> is too great to do so between frames, so we just do a wait there. 
> We would LIKE to be able to sleep there at some point in the 
> future and would prefer a power model that made that cleanly 
> part of a continuum of operating points.
> 
> However, we definitely DO sleep (with self-refreshing RAM) during 
> relatively short periods, with the wakeup resulting from an 
> interrupt from the RTC (which is self-powered and is set 
> to timeout at the next scheduled timer when we go to sleep) 
> or another hardware interrupt. We think of this as a system-level 
> sleep state.
> 
> I'm not sure how you distinguish between a "system" sleep state 
> and a "CPU" sleep state - seems like there's a collection of 
> things that can be shut down or not; except for true OFF, there's
> always something on.

Well, even in "true OFF", RTC keeps ticking. And in "disk" state
(swsusp), machine is basically "true OFF" but it still retains state.

								Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

* Re: So, what's the status on the recent patches here?
@ 2006-08-31  2:41 Woodruff, Richard
  0 siblings, 0 replies; 136+ messages in thread
From: Woodruff, Richard @ 2006-08-31  2:41 UTC (permalink / raw)
  To: Preece Scott-PREECE, Pavel Machek; +Cc: Matthew Locke, linux-pm

> Well, we have some hardware where we can sleep everything but
> memory and some where we can also leave the display active (and
> backlit). In fact, however, today the latency for going to sleep
> is too great to do so between frames, so we just do a wait there.
[Woodruff, Richard] 

In effect OMAP2/3 can auto idle to low power states in between LCD FIFO
refills.  The SDRAM, the DPLL and interconnects can be auto-idled
between LCD FIFO loads.  By carefully setting your LCD FIFO thresholds
you can spin back up the DPLL, memory and interconnect in time to load
up the LCD FIFO, then back to sleep.

It's not just LCD, other devices can do the same.  Say pushing data into
an audio codec's FIFO.
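The FIFO-threshold technique reduces to simple arithmetic: the refill request has to be raised while the FIFO still holds enough data to cover the wake-up of the DPLL, SDRAM and interconnect. A minimal sketch, assuming a constant drain rate and a known worst-case wake-up latency; fifo_min_threshold() and idle_window_us() are hypothetical helpers, not OMAP driver code.

```c
#include <stdint.h>

/* Minimum FIFO fill level (bytes) at which the refill request must be
 * raised so the FIFO does not underrun while the clocks, memory and
 * interconnect wake up: bytes drained during the wake-up, rounded up. */
uint64_t fifo_min_threshold(uint64_t drain_bytes_per_s,
                            uint64_t wake_latency_us)
{
    return (drain_bytes_per_s * wake_latency_us + 999999u) / 1000000u;
}

/* Time (us) the subsystem can stay idle after a refill that leaves
 * `fill` bytes in the FIFO, before the threshold is reached again. */
uint64_t idle_window_us(uint64_t fill, uint64_t threshold,
                        uint64_t drain_bytes_per_s)
{
    if (fill <= threshold || drain_bytes_per_s == 0)
        return 0;   /* no room to idle at all */
    return (fill - threshold) * 1000000u / drain_bytes_per_s;
}
```

For example, a hypothetical 320x240, 16-bit panel refreshed at 60 Hz drains 9,216,000 bytes/s; with a 100 us wake-up latency the threshold comes out at 922 bytes, and a refill to 4096 bytes leaves about 344 us of idle time. The same arithmetic applies to an audio codec FIFO with its own drain rate.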

Getting this effect does require device coordination.  If a device
objects you don't hit the big savings.

Regards,
Richard W.

* Re: So, what's the status on the recent patches here?
@ 2006-08-31  0:52 Scott E. Preece
  0 siblings, 0 replies; 136+ messages in thread
From: Scott E. Preece @ 2006-08-31  0:52 UTC (permalink / raw)
  To: pavel; +Cc: linux-pm



| From linux-pm-bounces@lists.osdl.org Wed Aug 30 17:48:27 2006
| 
| On Tue 2006-08-29 21:52:26, David Singleton wrote:
| > >> >>         /sys/power/operating_points/mem
| > >> >>         /sys/power/operating_points/standby
| ....
| > >That does not make mixing them right.
| > 
| > Both OpPoint and PowerOp are going to 'mix' frequency, voltage
| > and sleep states into their operating point concepts.
| > 
| > The point was not to make it look like I was mixing sleep states and
| > CPU frequency states, but to present all the power states
| > supported by the system in one place and with one interface.  It simplifies
| > not only kernel code, but power manager code as well.
| 
| It is also wrong. And no, I do not think your power manager can
| properly use "mem" state.
| 
| You see, "mem" is very different from lowest. To exit lowest, you have
| to "echo highest > state". To exit "mem", you need power
| button. That's very different operation.
---

Not sure exactly what is meant by "mem" operating point. I was assuming
it meant "suspend-to-RAM" (almost everything shut down, memory self
refreshing). In my world, our current policy manager does manage mem
(which we call "sleep" and is the deepest suspend we do) separately from
frequency changes, but that's accident rather than intention.

I agree that there is some difference between them, since we do
frequency changes in response to load, but sleep-state changes based on
idleness. However, there's no real reason why those can't be inputs to
the same policy manager. We actually do make both decisions in the Idle
handler (well, there's more plumbing than that, but they're both driven
by going idle).
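The observation that load-driven frequency changes and idle-driven sleep can feed one policy manager might be sketched like this. Everything here is hypothetical: the 80%/30% load thresholds, the action enum and the struct are invented, and the restore-on-resume step mirrors the Motorola behaviour described elsewhere in this thread (resume at the OP in use before sleeping).

```c
#define PM_OP_SLEEP (-1)   /* marker: system is in the sleep state */

enum pm_action { PM_STAY, PM_RAISE_OP, PM_LOWER_OP, PM_SLEEP };

struct pm_state {
    int cur_op;    /* index into an ascending OP table, or PM_OP_SLEEP */
    int num_ops;   /* number of non-sleep OPs */
    int saved_op;  /* OP remembered across sleep, -1 if none */
};

/* One decision per idle-handler pass. load_pct: recent utilisation;
 * idle: nothing runnable right now. */
enum pm_action pm_decide(struct pm_state *s, unsigned int load_pct, int idle)
{
    if (idle) {
        if (s->cur_op != PM_OP_SLEEP)
            s->saved_op = s->cur_op;   /* remember OP for resume */
        s->cur_op = PM_OP_SLEEP;
        return PM_SLEEP;
    }
    if (s->cur_op == PM_OP_SLEEP)
        return PM_STAY;                /* caller must pm_resume() first */
    if (load_pct > 80 && s->cur_op < s->num_ops - 1) {
        s->cur_op++;
        return PM_RAISE_OP;
    }
    if (load_pct < 30 && s->cur_op > 0) {
        s->cur_op--;
        return PM_LOWER_OP;
    }
    return PM_STAY;
}

/* On wake-up, restore the OP that was active before sleeping. */
void pm_resume(struct pm_state *s)
{
    if (s->saved_op >= 0)
        s->cur_op = s->saved_op;
    s->saved_op = -1;
}
```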

scott
-- 
scott preece
motorola mobile devices, il67, 1800 s. oak st., champaign, il  61820  
e-mail:	preece@motorola.com	fax:	+1-217-384-8550
phone:	+1-217-384-8589	cell: +1-217-433-6114	pager: 2174336114@vtext.com

* Re: So, what's the status on the recent patches here?
  2006-08-30 22:50                                 ` Pavel Machek
@ 2006-08-31  0:22                                   ` Preece Scott-PREECE
  2006-08-31 12:04                                     ` Pavel Machek
  0 siblings, 1 reply; 136+ messages in thread
From: Preece Scott-PREECE @ 2006-08-31  0:22 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Matthew Locke, linux-pm

 
> From: Pavel Machek [mailto:pavel@ucw.cz] 
> > It's an interesting question, and one we (Motorola) don't have hard
> > data to support, yet. Our current implementation remembers the OP in
> > use before sleeping and resumes at the same OP. The argument is that
> > this corresponds to an application that is doing work at a
> > steady-state level (say decoding video), but that has a brief pause to
> > wait for I/O or a timer (say, pausing until the next frame time); then
> > it makes sense that the level of effort required on resume will be the
> > same as before sleeping.
> 
> We may have confusion here.
> 
> On PC, it is definitely not possible to enter sleep state 
> between frames of video... because video is powered off.
> 
> On PC, sleep states are *system* sleep states. CPU sleep 
> states exist, too, but that's in-kernel implementation 
> detail. They are called C1..C4.
---

Well, we have some hardware where we can sleep everything but 
memory and some where we can also leave the display active (and 
backlit). In fact, however, today the latency for going to sleep 
is too great to do so between frames, so we just do a wait there. 
We would LIKE to be able to sleep there at some point in the 
future and would prefer a power model that made that cleanly 
part of a continuum of operating points.

However, we definitely DO sleep (with self-refreshing RAM) during 
relatively short periods, with the wakeup resulting from an 
interrupt from the RTC (which is self-powered and is set 
to timeout at the next scheduled timer when we go to sleep) 
or another hardware interrupt. We think of this as a system-level 
sleep state.
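The sleep-or-wait choice described above (sleep with self-refreshing RAM and an RTC alarm at the next scheduled timer, but just wait when the sleep latency is too great) could look like this. A minimal sketch: idle_enter() and its microsecond timestamps are assumptions, and the actual hardware programming is left out.

```c
#include <stdint.h>

enum idle_choice { IDLE_WAIT, IDLE_SLEEP };

/* now_us, next_timer_us: current time and expiry of the next scheduled
 * timer; sleep_latency_us: cost of entering plus leaving the sleep
 * state. On IDLE_SLEEP, *rtc_alarm_us is set to the next timer expiry
 * so the self-powered RTC wakes the system then; any other hardware
 * interrupt may wake it earlier. */
enum idle_choice idle_enter(uint64_t now_us, uint64_t next_timer_us,
                            uint64_t sleep_latency_us,
                            uint64_t *rtc_alarm_us)
{
    if (next_timer_us <= now_us ||
        next_timer_us - now_us <= sleep_latency_us)
        return IDLE_WAIT;            /* too close: just wait */
    *rtc_alarm_us = next_timer_us;   /* RTC fires at the next timer */
    return IDLE_SLEEP;
}
```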

I'm not sure how you distinguish between a "system" sleep state 
and a "CPU" sleep state - seems like there's a collection of 
things that can be shut down or not; except for true OFF, there's
always something on.

Scott

* Re: So, what's the status on the recent patches here?
  2006-08-30 13:26                               ` Preece Scott-PREECE
@ 2006-08-30 22:50                                 ` Pavel Machek
  2006-08-31  0:22                                   ` Preece Scott-PREECE
  0 siblings, 1 reply; 136+ messages in thread
From: Pavel Machek @ 2006-08-30 22:50 UTC (permalink / raw)
  To: Preece Scott-PREECE; +Cc: Matthew Locke, linux-pm

Hi!

> > You bring up a more interesting area to look at integrating 
> > sleep states and operating points.  Can we add some hooks 
> > that allow us to define which operating point is selected at 
> > resume?  Need to think about that a bit:)
> ---
> 
> It's an interesting question, and one we (Motorola) don't have hard
> data to support, yet. Our current implementation remembers the OP
> in use before sleeping and resumes at the same OP. The argument is
> that this corresponds to an application that is doing work at a 
> steady-state level (say decoding video), but that has a brief pause
> to wait for I/O or a timer (say, pausing until the next frame time);
> then it makes sense that the level of effort required on resume will
> be the same as before sleeping.

We may have confusion here.

On PC, it is definitely not possible to enter sleep state between
frames of video... because video is powered off.

On PC, sleep states are *system* sleep states. CPU sleep states exist,
too, but that's in-kernel implementation detail. They are called
C1..C4.
								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

* Re: So, what's the status on the recent patches here?
  2006-08-30  4:52                           ` David Singleton
  2006-08-30  5:52                             ` Matthew Locke
@ 2006-08-30 22:43                             ` Pavel Machek
  1 sibling, 0 replies; 136+ messages in thread
From: Pavel Machek @ 2006-08-30 22:43 UTC (permalink / raw)
  To: David Singleton; +Cc: linux-pm

On Tue 2006-08-29 21:52:26, David Singleton wrote:
> On 8/29/06, Pavel Machek <pavel@ucw.cz> wrote:
> >Hi!

> >> >>         /sys/power/operating_points/high
> >> >>         /sys/power/operating_points/highest
> >> >>         /sys/power/operating_points/low
> >> >>         /sys/power/operating_points/lowest
> >> >>         /sys/power/operating_points/medium
> >> >>         /sys/power/operating_points/mem
> >> >>         /sys/power/operating_points/standby
....
> >That does not make mixing them right.
> 
> Both OpPoint and PowerOp are going to 'mix' frequency, voltage
> and sleep states into their operating point concepts.
> 
> The point was not to make it look like I was mixing sleep states and
> CPU frequency states, but to present all the power states
> supported by the system in one place and with one interface.  It simplifies
> not only kernel code, but power manager code as well.

It is also wrong. And no, I do not think your power manager can
properly use "mem" state.

You see, "mem" is very different from lowest. To exit lowest, you have
to "echo highest > state". To exit "mem", you need power
button. That's very different operation.

> >> Perhaps, but the common name space makes it easy for the
> >> power manager
> >> daemon to perform the same functions without having to
> >> know that the lowest
> >> speed on my laptop is 600Mhz.
> >
> >And enumerate english strings in power daemon? Limiting the number of
> >states?
> 
> Hah,  I didn't think of it that way.   I was thinking in the same way
> "mem" and "disk" and "standby" are strings in the kernel.

> The names themselves don't mean anything other than to imply an order so the
> kernel and power manager can understand the same order.

mem/disk/standby are strings, because they can not be easily turned
into numbers. "low"/"lowest"/"high"/"highest" mess can easily be
turned into numbers. And that's what cpufreq does, btw.
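The point that named levels are just an ordering can be shown by parsing a numeric frequency list and sorting it, so "lowest" and "highest" become the first and last entries. This is a sketch; the space-separated kHz input format is an assumption, loosely modelled on the numeric frequency lists cpufreq exports, and parse_levels() is hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* qsort comparator for unsigned long, ascending. */
static int cmp_ul(const void *a, const void *b)
{
    unsigned long x = *(const unsigned long *)a;
    unsigned long y = *(const unsigned long *)b;
    return (x > y) - (x < y);
}

/* Parse up to `max` whitespace-separated frequencies (kHz) from `s`
 * into `out`, sorted ascending. Returns the number parsed. Afterwards
 * out[0] is "lowest" and out[n-1] is "highest". */
size_t parse_levels(const char *s, unsigned long *out, size_t max)
{
    size_t n = 0;
    char *end;

    while (n < max) {
        unsigned long v = strtoul(s, &end, 10);
        if (end == s)
            break;          /* no more numbers */
        out[n++] = v;
        s = end;
    }
    qsort(out, n, sizeof(out[0]), cmp_ul);
    return n;
}
```

A daemon working this way never needs to know that the lowest speed on a given laptop is 600 MHz; it just takes the first entry.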
								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

* Re: So, what's the status on the recent patches here?
  2006-08-30 11:00                       ` Amit Kucheria
@ 2006-08-30 22:36                         ` Pavel Machek
  2006-08-31 13:44                           ` Amit Kucheria
  0 siblings, 1 reply; 136+ messages in thread
From: Pavel Machek @ 2006-08-30 22:36 UTC (permalink / raw)
  To: Amit Kucheria; +Cc: linux-pm

On Wed 2006-08-30 14:00:53, Amit Kucheria wrote:
> On Thu, 2006-08-24 at 09:59 +0200, ext Pavel Machek wrote:
> > Hi!
> > 
> > > > > The userspace interface in Eugeny's patches is for other userspace
> > > > > programs (policy managers) to activate/deactivate valid operating points
> > > > > in the system dynamically and if necessary, introduce new ones into the
> > > > > system. It will also allow the operating points to be referenced by name
> > > > > instead of the tuple.
> > > > > 
> > > > > Then, we will be able to use names like 'video', 'mp3', 'fast',
> > > > > 'powersave', 'usb' to switch to the relevant operating point based on
> > > > > configuration of the policy manager.
> > > > 
> > > > This seems to be too specific to embedded machine.
> > > > 
> > > > If userspace wants to work with usb and play mp3s at the same time,
> > > > what does it do?
> > > 
> > > Switch to 'fast'?
> > > 
> > > The operating point for a use-case specifies the _minimum_ required for
> > > the use-case. You can always go up.
> > 
> > > The system designer is responsible for 'designing' operating points that
> > > take into account multiple use-cases. Designing here refers to mapping
> > > use-cases to HW operating points.
> > 
> > Yes, and that's why I argue this is unsuitable for notebook: there are
> > just too many usecases for a notebook.
> 
> You are trying to make it sound more complex than it really is. For a
> notebook, as you yourself pointed out, things could be handled with the
> present adaptive, load-based system. So you don't need to map _every_
> use-case to an operating point. So you don't need to move to use PowerOP
> today.

Ok, but please do not try to replace cpufreq with
powerop/oppoint. That is not possible.

> But PowerOP would allow SoC-based systems to tune the operating points
> to get the most out of their top-10 use-cases and sleep modes.

Question is: can we get similar savings without the ugly interface
powerop presents?

> > > Consider an example system with a main CPU and a DSP. To simplify
> > > discussion, let's assume 3 levels for CPU and DSP speeds and system
> > > voltage. Then, here is what an example operating-point to use-case
> > > mapping table could look like:
> > > 
> > > #     CPU speed      DSP speed      Voltage       use-case
> > > ----------------------------------------------------------
> > > 1.    high           high           high          fast, video
> > > 2.    med            high           high          
> > > 3.    med            med            med           usb[1]
> > > 4.    low            med            med           mp3
> > > 5.    low            low            low           powersave
> > > 
> > > [1] USB has voltage constraint (voltage >= med)
> > 
> > So... you take three independent parameters and merge them into one,
> > named parameter. Bad idea.
> 
> But they are NOT independent parameters! Which is why we want to
> encapsulate them into an 'Operating Point'. We have completely failed in
> our effort to explain the concept of an operating point if that has been
> your assumption all along.

They are independent, at least from application point of view. And
that's probably right interface to present to userland. Application
tells you its dsp speed desired, you take current cpu frequency
requirements from cpufreq, and select operating point with lowest
consumption based on those constraints.
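The selection described here can be sketched against the example table quoted above. Assumptions: the rows are taken to be ordered from highest to lowest consumption (the table itself gives no power figures), the levels are ordered LOW < MED < HIGH, and select_op() is a hypothetical helper. The USB footnote becomes a minimum voltage of MED.

```c
#include <stddef.h>

enum level { LOW = 0, MED = 1, HIGH = 2 };

struct op_row { int num; enum level cpu, dsp, volt; };

/* Example table from earlier in this thread; assumed to be listed
 * from highest to lowest power consumption. */
static const struct op_row op_table[] = {
    { 1, HIGH, HIGH, HIGH },
    { 2, MED,  HIGH, HIGH },
    { 3, MED,  MED,  MED  },
    { 4, LOW,  MED,  MED  },
    { 5, LOW,  LOW,  LOW  },
};

/* Return the number of the lowest-consumption OP meeting all three
 * minimums, or -1. Scans from the cheapest row upward. */
int select_op(enum level cpu_min, enum level dsp_min, enum level volt_min)
{
    size_t n = sizeof(op_table) / sizeof(op_table[0]);

    for (size_t i = n; i-- > 0;) {
        const struct op_row *r = &op_table[i];
        if (r->cpu >= cpu_min && r->dsp >= dsp_min && r->volt >= volt_min)
            return r->num;
    }
    return -1;
}
```

Here select_op(LOW, LOW, MED) (USB active, nothing else) returns 4, and select_op(MED, HIGH, LOW) returns 2.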

> > What about simply having these parameters:
> > 
> > usb on or off
> > 
> > cpu speed (controlled by cpufreq)
> > 
> > dsp speed (controlled by userspace)
> > 
> > Then you can have infrastructure that is able to compute system
> > voltage from usb/cpu/dsp speed, and users still have interface they can
> > understand.
> 
> This is moot for the reason above - cpu/dsp/volt are NOT independent.
> 
> And USB (or any device information) is NOT part of the operating point.
> It is just an asynchronous constraint whose appearance/disappearance
> influences operating point tangentially. IOW, on some systems USB could
> run at any operating point, so there would be no constraint. On others,
> use of USB would automatically cause usb clocks to go high which in turn
> would switch the system to an operating point that satisfies the
> constraint - this is handled by clock/voltage framework.

Okay, and why can't we handle _all_ the constraints in this style? Ask
userspace what constraints are there, and automagically select best
operating point, without having operating points explicit at userspace
interface.
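One way to handle all the constraints in this style is a per-resource registry: each requester (USB, an application asking for DSP speed, and so on) states a minimum, and the effective constraint is the largest outstanding request. This is a hypothetical sketch; struct constraint and its functions are invented, and mapping the effective minimums onto an operating point is left to whatever table or solver the platform provides.

```c
#include <stddef.h>

#define MAX_REQS 16

/* Registry for one resource, e.g. minimum system voltage level. */
struct constraint {
    int req[MAX_REQS];   /* requested minimum (>= 0), or -1 if unused */
};

void constraint_init(struct constraint *c)
{
    for (size_t i = 0; i < MAX_REQS; i++)
        c->req[i] = -1;
}

/* Register a minimum; returns a handle (slot index), or -1 if the
 * table is full or min_level is negative. */
int constraint_add(struct constraint *c, int min_level)
{
    if (min_level < 0)
        return -1;
    for (size_t i = 0; i < MAX_REQS; i++) {
        if (c->req[i] < 0) {
            c->req[i] = min_level;
            return (int)i;
        }
    }
    return -1;
}

void constraint_remove(struct constraint *c, int handle)
{
    if (handle >= 0 && handle < MAX_REQS)
        c->req[handle] = -1;
}

/* Effective minimum: the largest active request, 0 if none. */
int constraint_effective(const struct constraint *c)
{
    int eff = 0;
    for (size_t i = 0; i < MAX_REQS; i++)
        if (c->req[i] > eff)
            eff = c->req[i];
    return eff;
}
```

Removing a request automatically relaxes the effective constraint, so no requester needs to know about operating points.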

> > > - Add usb and we switch to OP 3.
> > > - Now our performance monitor (e.g load avg) indicates that we need more
> > > CPU processing. So we switch to OP 2.
> > 
> > That's cpufreq job, please
> 
> Yes. Or more particularly, the ondemand governor, right? But load
> average is not the only input used to make decisions. There could be
> thermal alarms, battery alarms, etc. And deciding which of these
> conflicting inputs is given priority is a policy decision made by the
> device manager. We discussed some of this at the PM summit.

cpufreq already knows about thermal. (There's no policy in there, you
can't allow system to overheat).

cpufreq already knows about battery. (On some powernow-k8, high cpu
frequencies are not available on battery power, because battery is not
powerful enough). If you have additional constraints (may not use
400MHz when battery is below 20%, because li-ion has too big internal
resistance at that point?), please use cpufreq framework to enforce
them.
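The battery rule in this example amounts to lowering the policy's upper frequency limit. In the kernel this would presumably be a cpufreq policy notifier adjusting the limits (for instance via cpufreq_verify_within_limits()); the userspace C below only models the arithmetic. The struct, helper names and the 300 MHz cap are assumptions; the 20% threshold and the forbidden 400 MHz step come from the example.

```c
/* Hypothetical policy limits in kHz. */
struct freq_limits { unsigned int min_khz, max_khz; };

/* Lower the maximum to cap_khz while the battery is below
 * low_battery_pct, keeping min <= max. */
struct freq_limits apply_battery_limit(struct freq_limits lim,
                                       unsigned int battery_pct,
                                       unsigned int cap_khz,
                                       unsigned int low_battery_pct)
{
    if (battery_pct < low_battery_pct && lim.max_khz > cap_khz)
        lim.max_khz = cap_khz;
    if (lim.min_khz > lim.max_khz)
        lim.min_khz = lim.max_khz;   /* keep the range consistent */
    return lim;
}

/* Clamp a governor's requested frequency into the current limits. */
unsigned int clamp_freq(struct freq_limits lim, unsigned int want_khz)
{
    if (want_khz < lim.min_khz)
        return lim.min_khz;
    if (want_khz > lim.max_khz)
        return lim.max_khz;
    return want_khz;
}
```

Whatever governor is running then simply never sees 400 MHz as available while the battery is low.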

Is there anything cpufreq can't do?
								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

* Re: So, what's the status on the recent patches here?
  2006-08-30 22:13                                       ` Mark Gross
@ 2006-08-30 22:27                                         ` Pavel Machek
  0 siblings, 0 replies; 136+ messages in thread
From: Pavel Machek @ 2006-08-30 22:27 UTC (permalink / raw)
  To: Mark Gross; +Cc: linux-pm

On Wed 2006-08-30 15:13:54, Mark Gross wrote:
> On Mon, Aug 28, 2006 at 07:39:57PM +0200, Pavel Machek wrote:
> > On Mon 2006-08-28 09:40:38, Mark Gross wrote:
> > > On Sat, Aug 26, 2006 at 03:46:53PM +0200, Pavel Machek wrote:
> > > > On Sat 2006-08-26 17:30:40, Vitaly Wool wrote:
> > > > > On 8/26/06, Pavel Machek <pavel@suse.cz> wrote:
> > > > Because 8388608 policies is clearly not reasonable, powerop can not
> > > > help here, and something better should be developed... like power
> > > > domains someone proposed here.
> > > > 
> > > > (Or to say it in another words, powerop forces one big power domain,
> > > > which is bad model for notebook-style machine).
> > > 
> > > I doubt notebook-style machines will ever use power op in any
> > > significant way.  HPC and embedded will be the first users.
> > 
> > I agree here... power op looks useless for notebooks. But I doubt power
> > op authors would agree...
> 
> Concluding that it will be useless for notebooks may be premature.
> 
> I see powerop as the bottom of a future PM stack.  As the upper layers
> take shape who knows what platforms will use it?

Well, PCs are generally designed in a way where individual devices are
separate, and that means that we do not have the linked-parameters
problem powerop tries to solve. But okay, perhaps someone will create
such a notebook in the future...

> > > Power domains will likely build on top of power op.
> > > 
> > > Power domains add complexities themselves. Dealing with
> > > dependencies and constraints between domains will be a challenge.
> > 
> > Once we have power domains in/solved... do we still need power op? I
> > thought power op could be useful for solving constraints _inside_ one
> > domain, but...
> 
> Power domains and the components within them will likely be accessed as
> operating points.  I think we need to build the power domain
> abstractions on top of operating points.  This is why I want to see
> support for multiple power_op_driver instances or a story for how
> operating points are added to a running system or even platform to
> enable and deal with domains.

Yes, multiple power_op_drivers -- one per power domain -- makes some
sense.

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

* Re: So, what's the status on the recent patches here?
  2006-08-28 17:39                                     ` Pavel Machek
  2006-08-29  7:51                                       ` Matthew Locke
@ 2006-08-30 22:13                                       ` Mark Gross
  2006-08-30 22:27                                         ` Pavel Machek
  1 sibling, 1 reply; 136+ messages in thread
From: Mark Gross @ 2006-08-30 22:13 UTC (permalink / raw)
  To: Pavel Machek; +Cc: linux-pm

On Mon, Aug 28, 2006 at 07:39:57PM +0200, Pavel Machek wrote:
> On Mon 2006-08-28 09:40:38, Mark Gross wrote:
> > On Sat, Aug 26, 2006 at 03:46:53PM +0200, Pavel Machek wrote:
> > > On Sat 2006-08-26 17:30:40, Vitaly Wool wrote:
> > > > On 8/26/06, Pavel Machek <pavel@suse.cz> wrote:
> > > Because 8388608 policies is clearly not reasonable, powerop can not
> > > help here, and something better should be developed... like power
> > > domains someone proposed here.
> > > 
> > > (Or to say it in another words, powerop forces one big power domain,
> > > which is bad model for notebook-style machine).
> > 
> > I doubt notebook-style machines will ever use power op in any
> > significant way.  HPC and embedded will be the first users.
> 
> I agree here... power op looks useless for notebooks. But I doubt power
> op authors would agree...

Concluding that it will be useless for notebooks may be premature.

I see powerop as the bottom of a future PM stack.  As the upper layers
take shape who knows what platforms will use it?

> 
> > Power domains will likely build on top of power op.
> > 
> > Power domains add complexity of their own. Dealing with
> > dependencies and constraints between domains will be a challenge.
> 
> Once we have power domains in/solved... do we still need power op? I
> thought power op could be useful for solving constraints _inside_ one
> domain, but...

Power domains and the components within them will likely be accessed as
operating points.  I think we need to build the power domain
abstractions on top of operating points.  This is why I want to see
support for multiple power_op_driver instances or a story for how
operating points are added to a running system or even platform to
enable and deal with domains.

--mgross

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: So, what's the status on the recent patches here?
  2006-08-30  5:52                             ` Matthew Locke
@ 2006-08-30 13:39                               ` Preece Scott-PREECE
  0 siblings, 0 replies; 136+ messages in thread
From: Preece Scott-PREECE @ 2006-08-30 13:39 UTC (permalink / raw)
  To: Matthew Locke, David Singleton; +Cc: linux-pm, Pavel Machek

 
> [mailto:linux-pm-bounces@lists.osdl.org] On Behalf Of Matthew Locke

> Latency is a good example of how mixing sleep states with 
> operating points doesn't quite work.  Latency for switching
> to a new operating
> point is very much a function of the current operating point.  I
> think latency for sleep states is the same every time.  Also 
> which direction are you capturing latency for, suspend or 
> resume?  If your power manager is making decisions based on 
> latency, I could imagine that it needs to know the latency 
> for going into and out of that state.
---

Well, you could also argue that the need to express latency is a good
argument for an approach that does mingle frequency-based,
voltage-based, and sleep-state OPs, though not necessarily exactly
what oppoint does! That is, if the policy manager is to make good
decisions about whether to shift to a different frequency or sleep
at one of n possible sleep levels, it would want to weigh exactly
those alternatives in terms of the expected time and power cost of
each possible transition, and whether the expected time to remain in
that state would be sufficient to recoup those costs.

That is, you'd like a big state-transition table that has a tuple of 
costs associated with each possible transition from each possible state.
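A minimal C sketch of such a state-transition table, assuming made-up states, powers, and costs (none of these numbers come from real hardware): each (from, to) entry carries a latency and energy tuple, and the policy picks whichever target costs the least energy over the expected idle period, skipping transitions that cannot even complete within it. The tuple is assumed to already include the cost of coming back; a real table might list that separately.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical state set; names and values are illustrative only. */
enum pm_state { ST_HIGH, ST_LOW, ST_STANDBY, ST_MEM, NR_PM_STATES };

/* Cost tuple for one transition (from -> to). */
struct transition_cost {
	unsigned int latency_us;	/* time spent switching */
	unsigned int energy_uj;		/* extra energy spent switching */
};

/* Average power while resident in each state, in milliwatts. */
static const unsigned int state_power_mw[NR_PM_STATES] = { 900, 400, 50, 5 };

/* cost[from][to]; diagonal entries are unused (staying costs nothing extra). */
static const struct transition_cost cost[NR_PM_STATES][NR_PM_STATES] = {
	[ST_HIGH]    = { [ST_LOW] = { 100, 20 },    [ST_STANDBY] = { 1000, 300 }, [ST_MEM] = { 20000, 5000 } },
	[ST_LOW]     = { [ST_HIGH] = { 100, 20 },   [ST_STANDBY] = { 1000, 250 }, [ST_MEM] = { 20000, 4500 } },
	[ST_STANDBY] = { [ST_HIGH] = { 1500, 400 }, [ST_LOW] = { 1500, 350 },     [ST_MEM] = { 20000, 4000 } },
	[ST_MEM]     = { [ST_HIGH] = { 30000, 6000 }, [ST_LOW] = { 30000, 5500 }, [ST_STANDBY] = { 30000, 5000 } },
};

/* Energy (nJ) spent over the window if we move from -> to now; ULLONG_MAX if infeasible. */
static unsigned long long window_energy_nj(enum pm_state from, enum pm_state to,
					   unsigned long long expected_us)
{
	const struct transition_cost *c = &cost[from][to];

	if (from == to)			/* stay put: no transition cost */
		return (unsigned long long)state_power_mw[from] * expected_us;
	if (c->latency_us >= expected_us)	/* the switch would not even finish */
		return ULLONG_MAX;
	/* mW * us = nJ, and uJ * 1000 = nJ */
	return (unsigned long long)c->energy_uj * 1000ULL +
	       (unsigned long long)state_power_mw[to] * (expected_us - c->latency_us);
}

/* Pick the target state that minimises energy over the expected residency. */
static enum pm_state pick_state(enum pm_state from, unsigned long long expected_us)
{
	enum pm_state best = from;
	unsigned long long best_nj;

	assert(from < NR_PM_STATES);
	best_nj = window_energy_nj(from, from, expected_us);
	for (int to = 0; to < NR_PM_STATES; to++) {
		unsigned long long nj = window_energy_nj(from, (enum pm_state)to, expected_us);

		if (nj < best_nj) {
			best_nj = nj;
			best = (enum pm_state)to;
		}
	}
	return best;
}
```

With these invented numbers, starting from ST_HIGH, a 50 us gap keeps the system where it is, a 5 ms gap selects ST_STANDBY, and a 1 s gap selects ST_MEM.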

The utility of the expected times, of course, depends on whether you 
are able to infer anything about how long an idle (or low-work) 
period is likely to persist. I think this is a fertile area for 
experimentation, though we haven't had time to do it, yet.  And, 
while it's superficially easier to take on for embedded devices 
that have limited numbers of states, the notion of a system manager 
that watches work patterns and their corresponding load levels and 
idle times is not particularly far-fetched.

On the other hand, getting infrastructure in place is more 
important, today! But, that said, if user-space interfaces 
are forever, you would like to present a user-space interface 
that can be easily extended to support such notions in the future, 
without breaking compatibility.

Scott

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: So, what's the status on the recent patches here?
  2006-08-30  6:20                             ` Matthew Locke
@ 2006-08-30 13:26                               ` Preece Scott-PREECE
  2006-08-30 22:50                                 ` Pavel Machek
  2006-09-02 18:05                               ` David Singleton
  1 sibling, 1 reply; 136+ messages in thread
From: Preece Scott-PREECE @ 2006-08-30 13:26 UTC (permalink / raw)
  To: Matthew Locke; +Cc: linux-pm, Pavel Machek

 
> From: Matthew Locke [mailto:matthew.a.locke@comcast.net] 
> ...
> > Could you say why you think [frequencies and sleep states]
> > shouldn't be mixed? Absent argument 
> > to the contrary, making it a single continuum seems appealing. Why 
> > have separate policies?
> 
> I know this question is directed at Pavel but I have similar
> concerns.
> I agree that making sleep states into operating points is
> appealing.
> However,  if the implementation is just going to special case 
> the sleep state operating points then they should be handled 
> separately.  As Pavel points out, you can see from Dave's 
> implementation that the 
> operating point definition doesn't quite work for both.   Voltage and 
> frequency don't have meaning for the sleep points.  The per 
> point transition callbacks are needed just to handle the 
> sleep points.  
> Latency has a very different meaning between the two types.  
> Also,  the type field is required to identify which points 
> are sleep points and which ones are operating points.  I 
> think the concept of using operating points to define sleep 
> states could be a valid one but the implementation provided 
> isn't quite right (yet).
---

Well, if the object is to present "the right interface" to a user-space
policy manager, then I think the argument about whether the implementations
are shared is irrelevant - the whole point would be to abstract away the
implementation, which could conceivably be separate hard-coded routines
for each operating point.  HOWEVER, the second part of your comment,
that the interface actually isn't right for describing sleep states, is
more compelling and might point a direction for future realignment (or
might just be grist for discussion now about the right attributes for
operating points).

> ...
> 
> I think it's more an issue with the implementation rather than
> the concept.  We can't just have frequency and voltage as
> shown in oppoint patches because we don't know which
> frequency and voltage they refer to.  The number of frequency
> and voltage parameters is completely dependent on the
> hardware platform.
---

I agree that the set of parameters 
describing an OP should be flexible.

---
> 
> You bring up a more interesting area to look at integrating 
> sleep states and operating points.  Can we add some hooks 
> that allow us to define which operating point is selected at 
> resume?  Need to think about that a bit:)
---

It's an interesting question, and one we (Motorola) don't have hard
data to support, yet. Our current implementation remembers the OP
in use before sleeping and resumes at the same OP. The argument is
that this corresponds to an application that is doing work at a 
steady-state level (say decoding video), but that has a brief pause
to wait for I/O or a timer (say, pausing until the next frame time);
then it makes sense that the level of effort required on resume will
be the same as before sleeping.

Another possible approach would be to automatically resume at the 
highest OP, on the assumption that whatever wakes you up will be 
work to do, so you should be prepared to do it and then sink back 
to a lower OP as the workload diminishes. That argument is especially 
appropriate if you have a mechanism that aggregates timers or does 
periodic scheduling, so that you wake up less often, but do more 
work at each wakeup.
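The two resume policies described above can be sketched in C as follows. This is only an illustration under stated assumptions: the OP numbering, the pm_suspend()/pm_resume()/pm_load_sample() helper names, and the 30%/80% load thresholds used to sink back down are all hypothetical and not taken from any posted patch.

```c
#include <assert.h>

#define NR_OPS 5	/* hypothetical: OP 0 = lowest ... OP 4 = highest */

enum resume_policy {
	RESUME_PREVIOUS,	/* steady-state work: come back where we were */
	RESUME_HIGHEST,		/* assume a wakeup means work: start high, sink later */
};

struct pm_ctx {
	int current_op;
	int saved_op;		/* OP in use when we went to sleep */
	enum resume_policy policy;
};

static void pm_suspend(struct pm_ctx *ctx)
{
	ctx->saved_op = ctx->current_op;
	/* ... enter the sleep state here ... */
}

static void pm_resume(struct pm_ctx *ctx)
{
	if (ctx->policy == RESUME_PREVIOUS)
		ctx->current_op = ctx->saved_op;
	else
		ctx->current_op = NR_OPS - 1;
}

/* Periodic load sample after resume: drift down when idle, up when busy. */
static void pm_load_sample(struct pm_ctx *ctx, unsigned int load_pct)
{
	assert(load_pct <= 100);
	if (load_pct < 30 && ctx->current_op > 0)
		ctx->current_op--;
	else if (load_pct > 80 && ctx->current_op < NR_OPS - 1)
		ctx->current_op++;
}
```

Under RESUME_HIGHEST, the periodic samples are what let the system "sink back" once the burst of work that caused the wakeup is done.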

We're planning to do some research on this, but it's waiting for 
the infrequent time when there's a lull in product development work...

Scott

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: So, what's the status on the recent patches here?
  2006-08-24  7:59                     ` Pavel Machek
@ 2006-08-30 11:00                       ` Amit Kucheria
  2006-08-30 22:36                         ` Pavel Machek
  0 siblings, 1 reply; 136+ messages in thread
From: Amit Kucheria @ 2006-08-30 11:00 UTC (permalink / raw)
  To: ext Pavel Machek; +Cc: linux-pm

On Thu, 2006-08-24 at 09:59 +0200, ext Pavel Machek wrote:
> Hi!
> 
> > > > The userspace interface in Eugeny's patches is for other userspace
> > > > programs (policy managers) to activate/deactivate valid operating points
> > > > in the system dynamically and if necessary, introduce new ones into the
> > > > system. It will also allow the operating points to be referenced by name
> > > > instead of the tuple.
> > > > 
> > > > Then, we will be able to use names like 'video', 'mp3', 'fast',
> > > > 'powersave', 'usb' to switch to the relevant operating point based on
> > > > configuration of the policy manager.
> > > 
> > > This seems to be too specific to embedded machines.
> > > 
> > > If userspace wants to work with usb and play mp3s at the same time,
> > > what does it do?
> > 
> > Switch to 'fast'?
> > 
> > The operating point for a use-case specifies the _minimum_ required for
> > the use-case. You can always go up.
> 
> > The system designer is responsible for 'designing' operating points that
> > take into account multiple use-cases. Designing here refers to mapping
> > use-cases to HW operating points.
> 
> Yes, and that's why I argue this is unsuitable for notebook: there are
> just too many usecases for a notebook.

You are trying to make it sound more complex than it really is. For a
notebook, as you yourself pointed out, things could be handled with the
present adaptive, load-based system. So you don't need to map _every_
use-case to an operating point, and you don't need to move to PowerOP
today.

But if someone (distro vendors?) takes the time and effort to map
possible use-cases, then the power manager could do better prediction of
performance requirements.

Optionally, _if_ applications were power aware, they would send
information about their activity to power manager or modify their own
class-of-service requirements, e.g. rendering a webpage has different
requirements than simply showing the page. This is NOT being discussed
at the moment.

But PowerOP would allow SoC-based systems to tune the operating points
to get the most out of their top-10 use-cases and sleep modes.

> > Consider an example system with a main CPU and a DSP. To simplify
> > discussion, let's assume 3 levels for CPU and DSP speeds and system
> > voltage. Then, here is what an example operating-point to use-case
> > mapping table could look like:
> > 
> > #     CPU speed      DSP speed      Voltage       use-case
> > ----------------------------------------------------------
> > 1.    high           high           high          fast, video
> > 2.    med            high           high          
> > 3.    med            med            med           usb[1]
> > 4.    low            med            med           mp3
> > 5.    low            low            low           powersave
> > 
> > [1] USB has voltage constraint (voltage >= med)
> 
> So... you take three independent parameters and merge them into one,
> named parameter. Bad idea.

But they are NOT independent parameters! Which is why we want to
encapsulate them into an 'Operating Point'. We have completely failed in
our effort to explain the concept of an operating point if that has been
your assumption all along.

> What about simply having these parameters:
> 
> usb on or off
> 
> cpu speed (controlled by cpufreq)
> 
> dsp speed (controlled by userspace)
> 
> Then you can have infrastructure that is able to compute system
> voltage from usb/cpu/dsp speed, and users still have an interface they can
> understand.

This is moot for the reason above - cpu/dsp/volt are NOT independent.

And USB (or any device information) is NOT part of the operating point.
It is just an asynchronous constraint whose appearance/disappearance
influences operating point tangentially. IOW, on some systems USB could
run at any operating point, so there would be no constraint. On others,
use of USB would automatically cause usb clocks to go high which in turn
would switch the system to an operating point that satisfies the
constraint - this is handled by clock/voltage framework.
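The example table can be read as a set of minimum requirements. A small C sketch under that reading, assuming three discrete levels per parameter and treating the USB voltage constraint as just another minimum tuple: active use-cases and constraints are combined by taking the per-parameter maximum, and the lowest-power OP in the table that still meets every minimum is chosen ("you can always go up"). The names op_req, combine() and select_op() are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum level { LVL_LOW, LVL_MED, LVL_HIGH };

/* A minimum requirement, or the setting of an OP, as a (cpu, dsp, voltage) tuple. */
struct op_req {
	enum level cpu, dsp, volt;
};

/* The example table, highest-power OP first (OP 1 .. OP 5). */
static const struct op_req op_table[] = {
	{ LVL_HIGH, LVL_HIGH, LVL_HIGH },	/* OP 1: fast, video */
	{ LVL_MED,  LVL_HIGH, LVL_HIGH },	/* OP 2 */
	{ LVL_MED,  LVL_MED,  LVL_MED  },	/* OP 3: usb */
	{ LVL_LOW,  LVL_MED,  LVL_MED  },	/* OP 4: mp3 */
	{ LVL_LOW,  LVL_LOW,  LVL_LOW  },	/* OP 5: powersave */
};
#define NR_TABLE_OPS (sizeof(op_table) / sizeof(op_table[0]))

static enum level lmax(enum level a, enum level b)
{
	return a > b ? a : b;
}

/* Each use-case or constraint is a minimum, so combine by per-parameter max. */
static struct op_req combine(const struct op_req *reqs, size_t n)
{
	struct op_req acc = { LVL_LOW, LVL_LOW, LVL_LOW };

	assert(n == 0 || reqs != NULL);
	for (size_t i = 0; i < n; i++) {
		acc.cpu = lmax(acc.cpu, reqs[i].cpu);
		acc.dsp = lmax(acc.dsp, reqs[i].dsp);
		acc.volt = lmax(acc.volt, reqs[i].volt);
	}
	return acc;
}

/* Return the 1-based number of the lowest-power OP that meets every minimum. */
static int select_op(struct op_req req)
{
	for (size_t i = NR_TABLE_OPS; i-- > 0; ) {
		const struct op_req *op = &op_table[i];

		if (op->cpu >= req.cpu && op->dsp >= req.dsp && op->volt >= req.volt)
			return (int)i + 1;
	}
	return 1;	/* OP 1 is the ceiling */
}
```

With the table exactly as given, OP 4 already has medium voltage, so under this encoding mp3 plus a USB constraint of "voltage >= med" still resolves to OP 4.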

> (How are they supposed to know if video use case is compatible with
> usb? They should not have to).

Only one human 'user' needs to worry about this detail - the system
designer, and that too only in SoC-based systems. For PC systems, such
constraints don't exist; you can be happy with the load-based scaling.

> > - Now if we are playing mp3, we switch to OP 4.
> 
> Do you expect all mp3 playing applications to play with
> /sys/.../powerop-point? How do you tell if mp3's are playing? These
> are hard questions for a notebook.

There are two ways I can think of:

1. Modify every application - Every application then sends messages to
the power manager about its state e.g. paused, playing, ffwd, stopped.
This is not meant for PC applications due to their sheer numbers. But it
is not uncommon on embedded systems to tune application behaviour to
ease/improve power management.

2. Modify the central launcher application - Clicking on an application
icon tells us which application is about to be launched, which allows
us to change the operating point. This might be doable for PC
applications by modifying KDE/Gnome launchers. But it only tells us what
applications are loaded, even though they might be idle. That is where
the load average helps.

> > - Add usb and we switch to OP 3.
> > - Now our performance monitor (e.g load avg) indicates that we need more
> > CPU processing. So we switch to OP 2.
> 
> That's cpufreq's job, please.

Yes. Or more particularly, the ondemand governor, right? But load
average is not the only input used to make decisions. There could be
thermal alarms, battery alarms, etc. And deciding which of these
conflicting inputs is given priority is a policy decision made by the
device manager. We discussed some of this at the PM summit.

    Device manager
            |
  -------------------------------------
  |          |             |           |
 Power     Performance    Thermal     Misc.
 manager    manager       manager
         (e.g. ondemand)  
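One way the device manager in the diagram could arbitrate conflicting inputs, sketched in C: the performance manager's request is honoured unless a thermal or battery alarm imposes a lower cap, and the tightest cap wins. The cap values, the 0..4 OP numbering and the struct layout are assumptions for illustration; the actual priority order is exactly the policy decision that belongs to the device manager.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define OP_MAX 4	/* hypothetical: OP 0 = lowest power ... OP 4 = fastest */
#define THERMAL_CAP 1	/* hypothetical ceiling while a thermal alarm is active */
#define BATTERY_CAP 2	/* hypothetical ceiling while the battery is low */

struct pm_inputs {
	int perf_request;	/* from the performance manager (e.g. ondemand) */
	bool thermal_alarm;	/* from the thermal manager */
	bool battery_low;	/* from the power manager */
};

/* Alarms override performance requests; the tightest active cap wins. */
static int device_manager_decide(const struct pm_inputs *in)
{
	int cap = OP_MAX;
	int op;

	assert(in != NULL);
	if (in->battery_low && BATTERY_CAP < cap)
		cap = BATTERY_CAP;
	if (in->thermal_alarm && THERMAL_CAP < cap)
		cap = THERMAL_CAP;

	op = in->perf_request;
	if (op < 0)
		op = 0;
	if (op > cap)
		op = cap;
	return op;
}
```

Making the caps ceilings rather than targets means an alarm never forces the system faster than the performance manager asked for.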


Regards,
Amit

-- 
Amit Kucheria <amit.kucheria@nokia.com>
Nokia

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: So, what's the status on the recent patches here?
  2006-08-29 17:49                           ` Preece Scott-PREECE
@ 2006-08-30  6:20                             ` Matthew Locke
  2006-08-30 13:26                               ` Preece Scott-PREECE
  2006-09-02 18:05                               ` David Singleton
  0 siblings, 2 replies; 136+ messages in thread
From: Matthew Locke @ 2006-08-30  6:20 UTC (permalink / raw)
  To: Preece Scott-PREECE; +Cc: linux-pm, Pavel Machek


On Aug 29, 2006, at 10:49 AM, Preece Scott-PREECE wrote:

>
>> From: linux-pm-bounces@lists.osdl.org
>> [mailto:linux-pm-bounces@lists.osdl.org] On Behalf Of Pavel Machek
>> Sent: Tuesday, August 29, 2006 11:35 AM
>> To: David Singleton
>> Cc: linux-pm@lists.osdl.org
>> Subject: Re: [linux-pm] So, what's the status on the recent
>> patches here?
>>
>> Hi!
>>>>>         point, by name. There is a new /sys/power/operating_points directory
>>>>>         that shows all the operating points the system supports. An
>>>>>         example from my centrino laptop shows:
>>>>>
>>>>>         /sys/power/operating_points/high
>>>>>         /sys/power/operating_points/highest
>>>>>         /sys/power/operating_points/low
>>>>>         /sys/power/operating_points/lowest
>>>>>         /sys/power/operating_points/medium
>>>>>         /sys/power/operating_points/mem
>>>>>         /sys/power/operating_points/standby
>>>>
>>>> What makes you think that mixing operating and sleep states is good
>>>> idea?
>>>
>>> They are all power states managed by the kernel and in the operating
>>> point concept they are all operating points the system supports.
>>
>> That does not make mixing them right.
> ---
>
> Could you say why you think they shouldn't be mixed? Absent argument to
> the contrary, making it a single continuum seems appealing. Why have
> separate policies?

I know this question is directed at Pavel but I have similar concerns.
I agree that making sleep states into operating points is appealing.
However, if the implementation is just going to special-case the sleep
state operating points, then they should be handled separately.  As
Pavel points out, you can see from Dave's implementation that the
operating point definition doesn't quite work for both.  Voltage and
frequency don't have meaning for the sleep points.  The per-point
transition callbacks are needed just to handle the sleep points.
Latency has a very different meaning between the two types.  Also, the
type field is required to identify which points are sleep points and
which ones are operating points.  I think the concept of using
operating points to define sleep states could be a valid one, but the
implementation provided isn't quite right (yet).

However, this should not detract from mainlining PowerOP.  Integrating 
sleep and operating points is not required to use PowerOP.

>
> ---
>>
>>> The system can be set to any of the supported states by setting their
>>> name in the /sys/power/state file.  I find simplicity is usually a
>>> good thing.
>>
>> I believe the quote is 'make it as simple as possible but not
>> simpler'.
> ---
>
> So, why don't you think this simplification is possible?
>
> ---
>>
>>>> And '600MHz' makes a lot more sense than 'lowest' on centrino.
>>>
>>> Perhaps, but the common name space makes it easy for the power manager
>>> daemon to perform the same functions without having to know that the
>>> lowest speed on my laptop is 600Mhz.
>>
>> And enumerate english strings in power daemon? Limiting the
>> number of states?
> ---
>
> Are you saying that on your laptop, all possible CPU and bus frequencies
> can be used independently? So, it would be unnecessarily limiting to
> have the system designer provide a list of combinations that work?
> Remember that the scope of this is a limited set of parameters, not all
> the devices in the system.
>
> ---
>>
>>>>
>>>>>         /sys/power/operating_points/high/frequency
>>>>>         /sys/power/operating_points/high/voltage
>>>>>         /sys/power/operating_points/high/latency
>>>>
>>>> What is voltage for 'mem'?
>>>
>>> I don't know what the voltage or latency is for mem.  Perhaps Intel
>>> could better say what the voltage is in the suspend state and what the
>>> latency was for transition to that state.  I didn't have the data
>>> available when I wrote the code.
>>
>> And you will not have data available even if intel helps you.
>> What is _frequency_ for mem? These fields are meaningless for
>> sleep states; that should tell you that mixing sleep and
>> operating states is bad idea.
> ---
>
> Why isn't 0 a meaningful value for frequency? And I can imagine
> that some hardware might have different voltage options for sleep
> states.  Additionally, these sys entries could represent the frequency,
> voltage, etc., that the system would go to upon resuming from sleep...

I think it's more an issue with the implementation rather than the
concept.  We can't just have frequency and voltage as shown in oppoint
patches because we don't know which frequency and voltage they refer
to.  The number of frequency and voltage parameters is completely
dependent on the hardware platform.
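A sketch of an OP definition whose parameter set varies by platform, in C. It assumes each OP carries a variable-length list of named parameters instead of fixed frequency and voltage fields; the struct names, parameter names and values below are invented and describe no real laptop or SoC.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* One named platform parameter, e.g. a particular clock or regulator. */
struct op_param {
	const char *name;
	long value;
};

/* An operating point: a name plus however many parameters the platform needs. */
struct op_point {
	const char *name;
	size_t nr_params;
	const struct op_param *params;
};

/* Look up a parameter by name: returns 0 and stores it in *out, or -1 if absent. */
static int op_get_param(const struct op_point *op, const char *name, long *out)
{
	assert(op != NULL && name != NULL && out != NULL);
	for (size_t i = 0; i < op->nr_params; i++) {
		if (strcmp(op->params[i].name, name) == 0) {
			*out = op->params[i].value;
			return 0;
		}
	}
	return -1;
}

/* Hypothetical laptop: one clock, one regulator. */
static const struct op_param laptop_high_params[] = {
	{ "cpu_khz", 1600000 },
	{ "vcore_mv", 1484 },
};

/* Hypothetical SoC: several clocks and rails in a single OP. */
static const struct op_param soc_video_params[] = {
	{ "cpu_khz", 400000 },
	{ "dsp_khz", 200000 },
	{ "bus_khz", 133000 },
	{ "vcore_mv", 1300 },
	{ "vmem_mv", 1800 },
};

static const struct op_point laptop_high = {
	"high", ARRAY_SIZE(laptop_high_params), laptop_high_params
};
static const struct op_point soc_video = {
	"video", ARRAY_SIZE(soc_video_params), soc_video_params
};
```

The same lookup works whether a platform defines two parameters or five, which is the kind of flexibility the fixed frequency/voltage fields lack.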

You bring up a more interesting area to look at integrating sleep 
states and operating points.  Can we add some hooks that allow us to 
define which operating point is selected at resume?  Need to think 
about that a bit:)

>
> scott
>
> _______________________________________________
> linux-pm mailing list
> linux-pm@lists.osdl.org
> https://lists.osdl.org/mailman/listinfo/linux-pm
>

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: So, what's the status on the recent patches here?
  2006-08-30  4:52                           ` David Singleton
@ 2006-08-30  5:52                             ` Matthew Locke
  2006-08-30 13:39                               ` Preece Scott-PREECE
  2006-08-30 22:43                             ` Pavel Machek
  1 sibling, 1 reply; 136+ messages in thread
From: Matthew Locke @ 2006-08-30  5:52 UTC (permalink / raw)
  To: David Singleton; +Cc: linux-pm, Pavel Machek


On Aug 29, 2006, at 9:52 PM, David Singleton wrote:

> On 8/29/06, Pavel Machek <pavel@ucw.cz> wrote:
>> Hi!
>>>>>         point, by name. There is a new /sys/power/operating_points directory
>>>>>         that shows all the operating points the system supports. An
>>>>>         example from my centrino laptop shows:
>>>>>
>>>>>         /sys/power/operating_points/high
>>>>>         /sys/power/operating_points/highest
>>>>>         /sys/power/operating_points/low
>>>>>         /sys/power/operating_points/lowest
>>>>>         /sys/power/operating_points/medium
>>>>>         /sys/power/operating_points/mem
>>>>>         /sys/power/operating_points/standby
>>>>
>>>> What makes you think that mixing operating and sleep states is good
>>>> idea?
>>>
>>> They are all power states managed by the kernel and in the operating
>>> point concept they are all operating points the system supports.
>>
>> That does not make mixing them right.
>
> Both OpPoint and PowerOp are going to 'mix' frequency, voltage
> and sleep states into their operating point concepts.

PowerOP does not mix sleep states with operating points.  We are not 
pushing for integrating sleep states and operating points.  I haven't 
seen an implementation that makes sense yet.  I'm writing another email 
to address this in more detail.

>
> The point was not to make it look like I was mixing sleep states and
> CPU frequency states, but to present all the power states
> supported by the system in one place and with one interface.  It
> simplifies not only kernel code, but power manager code as well.
>
>>
>>> The system can be set to any of the supported states by
>>> setting their name in the /sys/power/state file.  I find simplicity
>>> is usually a good thing.
>>
>> I believe the quote is 'make it as simple as possible but not
>> simpler'.
>>
>>>> And '600MHz' makes a lot more sense than 'lowest' on centrino.
>>>
>>> Perhaps, but the common name space makes it easy for the power manager
>>> daemon to perform the same functions without having to know that the
>>> lowest speed on my laptop is 600Mhz.
>>
>> And enumerate english strings in power daemon? Limiting the number of
>> states?
>
> Hah, I didn't think of it that way.  I was thinking in the same way that
> "mem" and "disk" and "standby" are strings in the kernel.
>
> The names themselves don't mean anything other than to imply an order so
> the kernel and power manager can understand the same order.
>
> I have the oppointd daemon running on systems of different architectures
> and different numbers of operating points.  Some have only two operating
> points defined, some have three, some have five and one has six.
>
> The power manager functions the same on all of them because of the
> ordering presented by the names.
>
> The oppointd daemon also understands that the operating points
> associated with the names may be sparsely populated.  The daemon can
> still work correctly on a sparsely populated name space because of the
> ordering implied. It works unchanged on systems with only two states and
> on systems with six states.
>
>>
>>>>
>>>>>         /sys/power/operating_points/high/frequency
>>>>>         /sys/power/operating_points/high/voltage
>>>>>         /sys/power/operating_points/high/latency
>>>>
>>>> What is voltage for 'mem'?
>>>
>>> I don't know what the voltage or latency is for mem.  Perhaps Intel
>>> could better say what the voltage is in the suspend state and what the
>>> latency was for transition to that state.  I didn't have the data
>>> available when I wrote the code.
>>
>> And you will not have data available even if intel helps you. What is
>> _frequency_ for mem? These fields are meaningless for sleep states;
>> that should tell you that mixing sleep and operating states is bad
>> idea.
>
> Actually the SoC concerns that both OpPoint and PowerOp are trying
> to deal with have different power consumption levels associated with
> different sleep states.
>
> Different sleep states have different power consumption levels
> and different transition latencies.  The power manager needs to
> understand both the power consumption level with each
> sleep state and the transition latency so it can make decisions
> about when to transition into different sleep states.
>
> Typically sleep states with the lowest power consumption have
> the longest transition latencies.  The power manager must know
> the power consumption and transition latencies so it can decide when
> best to switch to a sleep state that consumes a bit more power but has
> a much shorter latency to switch into and out of,  and
> when it's okay to switch to the lowest sleep state, but longest
> transition latency.
>

Latency is a good example of how mixing sleep states with operating
points doesn't quite work.  Latency for switching to a new operating
point is very much a function of the current operating point.  I
think latency for sleep states is the same every time.  Also, which
direction are you capturing latency for, suspend or resume?  If your
power manager is making decisions based on latency, I could imagine
that it needs to know the latency for going into and out of that state.

>>
>> --
>> Thanks for all the (sleeping) penguins.
>>

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: So, what's the status on the recent patches here?
  2006-08-29 16:34                         ` Pavel Machek
  2006-08-29 17:49                           ` Preece Scott-PREECE
@ 2006-08-30  4:52                           ` David Singleton
  2006-08-30  5:52                             ` Matthew Locke
  2006-08-30 22:43                             ` Pavel Machek
  1 sibling, 2 replies; 136+ messages in thread
From: David Singleton @ 2006-08-30  4:52 UTC (permalink / raw)
  To: Pavel Machek; +Cc: linux-pm

On 8/29/06, Pavel Machek <pavel@ucw.cz> wrote:
> Hi!
> > >>         point, by name. There is a new /sys/power/operating_points directory
> > >>         that shows all the operating points the system supports. An
> > >>         example from my centrino laptop shows:
> > >>
> > >>         /sys/power/operating_points/high
> > >>         /sys/power/operating_points/highest
> > >>         /sys/power/operating_points/low
> > >>         /sys/power/operating_points/lowest
> > >>         /sys/power/operating_points/medium
> > >>         /sys/power/operating_points/mem
> > >>         /sys/power/operating_points/standby
> > >
> > >What makes you think that mixing operating and sleep states is good
> > >idea?
> >
> > They are all power states managed by the kernel and in the operating
> > point concept they are all operating points the system supports.
>
> That does not make mixing them right.

Both OpPoint and PowerOp are going to 'mix' frequency, voltage
and sleep states into their operating point concepts.

The point was not to make it look like I was mixing sleep states and
CPU frequency states, but to present all the power states
supported by the system in one place and with one interface.  It simplifies
not only kernel code, but power manager code as well.

>
> > The system can be set to any of the supported states by
> > setting their name in the /sys/power/state file.  I find simplicity
> > is usually a good thing.
>
> I believe the quote is 'make it as simple as possible but not
> simpler'.
>
> > >And '600MHz' makes a lot more sense than 'lowest' on centrino.
> >
> > Perhaps, but the common name space makes it easy for the power manager
> > daemon to perform the same functions without having to know that the
> > lowest speed on my laptop is 600Mhz.
>
> And enumerate english strings in power daemon? Limiting the number of
> states?

Hah, I didn't think of it that way.  I was thinking in the same way that
"mem" and "disk" and "standby" are strings in the kernel.

The names themselves don't mean anything other than to imply an order so the
kernel and power manager can understand the same order.

I have the oppointd daemon running on systems of different architectures
and different numbers of operating points.  Some have only two operating
points defined, some have three, some have five and one has six.

The power manager functions the same on all of them because of the ordering
presented by the names.

The oppointd daemon also understands that the operating points associated
with the names may be sparsely populated.  The daemon can still work
correctly on a sparsely populated name space because of the implied
ordering. It works unchanged
on systems with only two states and on systems with six states.
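How a daemon might rely on name ordering alone, sketched in C under the assumption that kernel and daemon share the canonical order lowest < low < medium < high < highest and that a platform defines any subset of those names. Stepping up or down skips names the platform leaves out, so the same logic covers a two-state and a five-state system. The helper names are hypothetical, not oppointd's actual code, and sleep states such as mem and standby are left out of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed canonical order shared by kernel and daemon, lowest first. */
static const char *const op_order[] = {
	"lowest", "low", "medium", "high", "highest",
};
#define NR_OP_NAMES (sizeof(op_order) / sizeof(op_order[0]))

/* Which canonical names this platform actually defines. */
struct op_namespace {
	bool present[NR_OP_NAMES];
};

/* Rank of a name in the canonical order, or -1 if it is not an OP name. */
static int op_rank(const char *name)
{
	for (size_t i = 0; i < NR_OP_NAMES; i++)
		if (strcmp(op_order[i], name) == 0)
			return (int)i;
	return -1;
}

/* Next defined name above (dir = +1) or below (dir = -1) cur; cur itself if none. */
static const char *op_step(const struct op_namespace *ns, const char *cur, int dir)
{
	int r = op_rank(cur);

	assert(r >= 0 && (dir == 1 || dir == -1));
	for (int i = r + dir; i >= 0 && i < (int)NR_OP_NAMES; i += dir)
		if (ns->present[i])
			return op_order[i];
	return cur;
}
```

On a platform that only defines "lowest" and "highest", a single step up from "lowest" goes straight to "highest"; on a fully populated one it goes to "low".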

>
> > >
> > >>         /sys/power/operating_points/high/frequency
> > >>         /sys/power/operating_points/high/voltage
> > >>         /sys/power/operating_points/high/latency
> > >
> > >What is voltage for 'mem'?
> >
> > I don't know what the voltage or latency is for mem.  Perhaps Intel
> > could better say what the voltage is in the suspend state and what
> > the latency was for transition to that state.  I didn't have the data
> > available when I wrote the code.
>
> And you will not have data available even if intel helps you. What is
> _frequency_ for mem? These fields are meaningless for sleep states;
> that should tell you that mixing sleep and operating states is bad
> idea.

Actually the SoC concerns that both OpPoint and PowerOp are trying
to deal with have different power consumption levels associated with
different sleep states.

Different sleep states have different power consumption levels
and different transition latencies.  The power manager needs to
understand both the power consumption level with each
sleep state and the transition latency so it can make decisions
about when to transition into different sleep states.

Typically sleep states with the lowest power consumption have
the longest transition latencies.  The power manager must know
the power consumption and transition latencies so it can decide when
best to switch to a sleep state that consumes a bit more power but has
a much shorter latency to switch into and out of, and when it's okay
to switch to the lowest-power sleep state, which has the longest
transition latency.
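A C sketch of that trade-off, using separate entry and exit latencies per sleep state. It assumes made-up power and latency figures, a fixed hypothetical power draw while a transition is in progress, and a caller-supplied limit on acceptable wakeup latency; among states whose round trip fits in the expected idle period, it picks the one that uses the least energy.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

struct sleep_state {
	const char *name;
	unsigned int power_mw;	/* power while resident */
	unsigned int entry_us;	/* latency to enter */
	unsigned int exit_us;	/* latency to resume */
};

/* Hypothetical figures, shallowest first. */
static const struct sleep_state sleep_states[] = {
	{ "idle",    300,   10,    10 },
	{ "standby",  50,  500,  1000 },
	{ "mem",       5, 5000, 20000 },
};
#define NR_SLEEP_STATES (sizeof(sleep_states) / sizeof(sleep_states[0]))

/* Assumed power drawn while entering or leaving any state. */
#define TRANSITION_POWER_MW 500ULL

/*
 * Return the index of the state with the lowest energy over the expected
 * idle period, ignoring states whose round trip does not fit in that period
 * or whose exit latency exceeds max_exit_us. Falls back to index 0.
 */
static size_t pick_sleep_state(unsigned long long expected_idle_us,
			       unsigned long long max_exit_us)
{
	size_t best = 0;
	unsigned long long best_nj = ULLONG_MAX;

	assert(expected_idle_us > 0);
	for (size_t i = 0; i < NR_SLEEP_STATES; i++) {
		const struct sleep_state *s = &sleep_states[i];
		unsigned long long round_trip = (unsigned long long)s->entry_us + s->exit_us;
		unsigned long long nj;

		if (round_trip >= expected_idle_us || s->exit_us > max_exit_us)
			continue;	/* cannot pay off, or wakes up too slowly */
		/* mW * us = nJ */
		nj = TRANSITION_POWER_MW * round_trip +
		     (unsigned long long)s->power_mw * (expected_idle_us - round_trip);
		if (nj < best_nj) {
			best_nj = nj;
			best = i;
		}
	}
	return best;
}
```

With these numbers, a 1 s idle period selects "mem", but tightening the wakeup limit to 2 ms pushes the choice back to "standby", which is the "a bit more power but much shorter latency" case described above.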

>
> --
> Thanks for all the (sleeping) penguins.
>

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: So, what's the status on the recent patches here?
  2006-08-29  1:29                       ` David Singleton
@ 2006-08-29 22:39                         ` Eugeny S. Mints
  2006-08-31 13:27                         ` Amit Kucheria
  1 sibling, 0 replies; 136+ messages in thread
From: Eugeny S. Mints @ 2006-08-29 22:39 UTC (permalink / raw)
  To: David Singleton; +Cc: linux-pm

2006/8/29, David Singleton <daviado@gmail.com>:
> On 8/27/06, Eugeny S. Mints <eugeny.mints@gmail.com> wrote:
> > 2006/8/26, David Singleton <daviado@gmail.com>:
> > > On 8/19/06, Dave Jones <davej@redhat.com> wrote:
> > > > On Sat, Aug 19, 2006 at 08:20:45PM -0700, David Singleton wrote:
> > > >
> > > >  > If I had all the existing cpufreq tables transformed
> > > >  > into operating points I could make a patch that would remove
> > > >  > the bulk of cpufreq code from the kernel and you'd have
> > > >  > pretty much the same functionality without the maintenance
> > > >  > issues the added layers and complexity bring.
> > > >
> > > > If this is going to fly at all, I think that's where we need to be headed.
> > > > Having two parts of the kernel doing the same thing just seems
> > > > very wrong to me.
> > > >
> > > > The other alternative as suggested earlier this week would be architectures
> > > > getting to 'opt out' of powerop for their cpufreq drivers where it doesn't
> > > > necessarily bring anything but the layer of indirection.
> > > >
> > > > I'm about to disappear for two weeks for a much needed vacation, but
> > > > I'll be interested to see other folks comments/opinions on this
> > > > when I get back.
> > >
> > [snip]
> > >        1) I believe I now have the right kernel interface for a common
> > >        power management infrastructure.
> > >
> > OpPoint continues to focus on user space interface development for
> > power management, whereas there seems to be agreement in the
> > community to defer this integration, because of quite a lot of
> > open, undiscussed and complex questions about it, and instead to
> > focus on getting a consensus on the operating point structure
> > definition and the methods to work with the structure instances.
>
> Actually OpPoint is focusing on all the interfaces, user-kernel,
> kernel-architecture independent - power management interfaces, and
> power management framework - architecture/platform specific interfaces.
You've dug into the whole complex set of interfaces without having
convinced the community that you laid the correct main brick (the
struct oppoint definition) at the base. It's fine to provide reference
code for other pieces as an example of how you expect your oppoint to
work, but you keep enhancing the other interfaces even though there
are unresolved questions about the struct oppoint definition, even
based on your own initial draft of the overall picture.
>
> >
> > OpPoint continues to focus on integration with CPUFreq in a manner
> > which was outlined as unacceptable during recent discussions on the
> > list - removing the concept of an in-kernel governor and most of the
> > CPUFreq feature code.
>
> The point of OpPoint is to show that a unified power management infrastructure
> is possible and that bolting on another power management infrastructure to
> the kernel is not the right approach.
That's just not true. A patch set matching what you are saying here
would remove the cpufreq code completely, while your patch set just
adds something without touching the legacy cpufreq code in any way,
which is completely confusing.

PowerOP is a much more flexible solution: it provides an option to
build a sysfs interface identical to yours (even an enhanced sysfs
interface that exports power parameters and can create points) _and_
an option to build the legacy cpufreq interface on top of PowerOP.
One can build whatever is most suitable for a particular system design
on top of PowerOP - that is real universality.
>
> OpPoint is not trying to replace cpufreq.  It's trying to unify all
> the power management infrastructures into a single infrastructure.
> OpPoint uses the cpufreq notifier infrastructure to do both operating
> point transition and driver scaling notification, and it performs the
> same basic functions as cpufreq, without the policy and governor
> code.  It is also performing all the same Dynamic Power Management functionality
> on the pxa27x mainstone.  The point is one infrastructure can support them.
>
> And with the new oppointd power daemon it is performing all the same functions
> as cpuspeed did on my laptop, just with a lot less code in the kernel.
Please publish your measured latencies _with proper system load_ for
the in-kernel cpufreq governor and your userspace daemon so everyone can
see it's really the same.
>
>
> >
> > OpPoint continues to develop userspace interfaces and integration
> > based on an operating point definition for which Matt and I posted
> > issues/questions several times, and the posts have been left without a
> > reply.
>
> Sorry, I'm having a hard time keeping up with all the email threads.
>
> >
> > Below I'm trying to summarize all the issues I see with the OpPoint approach,
> > sometimes using terms defined in the PowerOP approach (for example layer
> > names).
> >
> > 'struct powerop' definition
> > ------------------------------------
> > - frequency, voltage fields are arch specific: never mind any
> > complex embedded case, the current definition and OpPoint
> > implementation do not work even for the x86 SMP case.
>
> Actually frequency, voltage and latency fields are architecture independent
> and a necessary piece of information that any power manager must have.
There can be a platform where the power parameters are dividers, with no
frequency or voltage at all. These _are_ arch dependent.
> You are right, I have not yet put in the additional layer to support SMP
> systems.  That is one of the pieces I'm still working on.
Until there is SMP code, your definition of 'struct oppoint' is
wrong, because of my SMP comment and the one right above.
> >
> > - latency is not an attribute of a certain operating point but a function of
> > two arguments - the current operating point and the point we are going to
> > switch to. Therefore latency just does not belong in 'struct powerop'
>
> I disagree.
any arguments?
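To make the "function of two arguments" point concrete, here is a hypothetical sketch in which transition latency is looked up by a (from, to) pair instead of being stored per point. The three point names and all latency values are made up for illustration; neither OpPoint nor PowerOP is claimed to use this table.

```c
/* Hypothetical operating points; values below are illustrative only. */
enum op_id { OP_LOW, OP_MEDIUM, OP_HIGH, OP_COUNT };

/* Transition latency in microseconds, indexed [from][to]. */
static const unsigned int op_latency_us[OP_COUNT][OP_COUNT] = {
	/*                 to LOW  to MEDIUM  to HIGH */
	/* from LOW */    {    0,      150,      300 },
	/* from MEDIUM */ {  100,        0,      150 },
	/* from HIGH */   {  200,      100,        0 },
};

/* Latency depends on both endpoints, so it is not a field of one point. */
static unsigned int transition_latency_us(enum op_id from, enum op_id to)
{
	return op_latency_us[from][to];
}
```

A single per-point "latency" value cannot represent LOW->HIGH and HIGH->LOW differing, which is what a table like this allows.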
>
> >
> > - all hooks are redundant: the hooks are the same for all operating points
> > until we come to the integration with suspend/resume. But we believe the
> > integration needs more investigation in the first place, and in the second we
> > feel the integration may be handled at the PM Core layer instead
> > of having per operating point hooks
>
> The hooks are not redundant nor the same for all operating points.  Each
> operating point defines its prepare, transition, and finish functions for the
> hooks.  And different types of operating points may have completely different
> functions in those hooks, on the same platform.
Please point out the lines in your patch set that show this. In the
code I saw, all the hooks are the same so far.
> >
> > - prepare_transition and finish_transition may be moved even below PM Core to
> > the clock/voltage framework; needs more careful investigation though
>
>
> I disagree.  Both the pm suspend and cpufreq code have them in exactly
> the same place.
"It's been implemented this way for ages, so it's the only right way"
argument. I do not buy it.
>
> >
> > - md_data has an issue from OO design paradigm perspective.  OpPoint
> > requires an entity above PowerOP to know internals of arch md_data (see
> > centrino-dynamic-powerop.c implementation) and thus requires an arch
> > dependent header file to be included in the code which can be
> > implemented in an arch independent manner. That would be fine if there was
> > no solution to achieve required functionality without such a hack but
> > PowerOP provides such approach by dereferencing  power parameters by
> > name. File which implements operating points registration in PowerOP
> > approach does not include any header file from include/asm-* subtrees.
>
> No, the md_data is the opaque pointer into architecture dependent data.
> The power management infrastructure doesn't need to know what
> data is linked into md_data, just as drivers have driver specific
> structs that are opaque to the upper layers of software.
no reasons to discuss this until we are done discussing 'struct
oppoint' definition.
>
> >
> > All further pieces proposed by OpPoint are based on the above incorrect
> > design of the main structure and therefore have issues.
>
> wow.
>
> >
> > integration with suspend/resume
> > -------------------------------
> > - mixing system state and operating point concept (different points
> > may correspond to a sleep/standby system state)
>
> The pxa27x code shows that there is indeed more than one suspend state,
> which is why the operating point model works so well on both my
> centrino laptop and my pxa27x mainstone running the same oppointd
> power daemon.
>
> >
> > - legacy PM states are redefined via the new OpPoint interface but do not
> > use it (explicit 'if' statements in legacy pm code instead of OpPoint
> > hooks utilization)
>
> The enter_state code could be merged into the pm_change code, or vice versa,
> I haven't had time to make it really unified and pretty.
>
> >
> > - names for operating points presented in the original letter below
> > implicitly assume the points are ordered (currently from the highest
> > [power consumption] to the lowest). However, having many
> > more power parameters than just one freq and one voltage does not
> > allow the points to be ordered in such a way, and a string name without
> > knowledge of the particular power parameter values is not sufficient
>
> That's not quite correct.  The ordering of names, lowest to highest,
> allows the power manager daemon to cover most of the use cases
> right out of the box.  It's performing the same functions on both
> my centrino laptop and the pxa27x mainstone right now without
> any changes to either the power manager or power management
> config file.
>
> One of the next boards I'm working on has different operating points
> at the same frequency, but different voltages.  All that is really required
> to support this is a plugin to the power manager that understands the
> different operating points so it can best choose when to transition to
> each point.
>
> Custom plugins to a power manager that lets the power manager deal
> with the unique set of operating points on a particular platform is
> one of the really attractive parts of OpPoint.  It won't have to be
> woven into the kernel.
> > (even in the x86 SMP case: it's hard for me to express the SMP
> > case in current OpPoint terms, but what are the names, and how do you
> > distinguish/order 2-CPU system states corresponding to 'highest point
> > for CPU0 + medium for CPU1' versus 'low for CPU0 + high for CPU1'?)
>
> I'm still working on the SMP case.  It's not that I'm ignoring it. Give me
> a few more days.
>
> >
> > - no example of an (at least optional) capability to export information about
> > a particular power parameter is presented, while it was clearly
> > highlighted by the embedded community that it is a must
>
>
> Which parameters besides frequency, voltage and latency are required
> to be exported to the power manager?
Basically all available platform power parameters. Let the userspace app
decide whether to use them or not. Tell me please, what are the frequency,
voltage and latency for omap1710?
> >
> > - direct utilization of the PM internal structure 'pm_state' instead of an attempt
> > at an API
> >
> > cpufreq core and a cpufreq driver/OpPOint integration
> > -------------------------------
> > - integration with the legacy cpufreq interface is completely missing in both arch
> > (x86 and pxa) examples. If OpPoint were a universal approach it would
> > allow different interfaces to be built on top of it. In that case you could
> > propose a more optimized/improved interface if you feel the existing
> > interface has issues, leaving the existing interface as a [configurable]
> > option and removing it when agreed.
>
> I'm sorry, I don't understand that statement.  I'm still opposed to
> dynamic-on-the-fly construction of operating points.  It's really dangerous.
> The hardware vendors want it so that new hardware doesn't have to
> wait for software before they can sell it.
PowerOP has dynamic-on-the-fly construction of operating points
capability but provides completely the same "safe" interface for
cpufreq feature users. Where is a danger?
>
> The cpufreq structure of defining and validating operating points before being
> integrated into the kernel is the correct way to do it, in my opinion.
Please take a look at the CpuFreq PowerOP integration patch set and comment on
the code if you see any problems.
> >
> >- OpPoint design does not handle SMP case.
[snip]
> >
> > PowerOP addresses all the issues mentioned above and works for the SMP
> > case. Integration with legacy kernel PM code (including constraints
> > and standalone driver suspend/resume) and a certain userspace
> > interface (basically any interface with the current PowerOP interface
> > underneath) are the next steps for the PowerOP approach once the correct
> > brick of the PowerOP layer is in place.
>
> It does?
Please comment on the code if there are any issues.

Eugeny
>
> David
>
> >
> >  Eugeny
> >
>

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: So, what's the status on the recent patches here?
  2006-08-28 22:18                       ` Pavel Machek
@ 2006-08-29 21:46                         ` Eugeny S. Mints
  0 siblings, 0 replies; 136+ messages in thread
From: Eugeny S. Mints @ 2006-08-29 21:46 UTC (permalink / raw)
  To: Pavel Machek; +Cc: linux-pm

2006/8/29, Pavel Machek <pavel@suse.cz>:
> Hi!
>
> Okay, I never liked opPoint framework... so your critism seems okay,
> but do your patches really solve all the points you raised?
It's hard to catch up this way until people start to comment on
the [patches] code. I haven't seen any comments on my code except
those Greg made. I'm working to address Greg's comments.
>
> On Sun 2006-08-27 13:54:31, Eugeny S. Mints wrote:
> > 2006/8/26, David Singleton <daviado@gmail.com>:
> > > On 8/19/06, Dave Jones <davej@redhat.com> wrote:
> > > > On Sat, Aug 19, 2006 at 08:20:45PM -0700, David Singleton wrote:
> ...
> > >        1) I believe I now have the right kernel interface for a common
> > >        power management infrastructure.
>
> > - md_data has an issue from OO design paradigm perspective.  OpPoint
> > requires an entity above PowerOP to know internals of arch md_data (see
> > centrino-dynamic-powerop.c implementation) and thus requires an arch
>
> Is that what you were trying to solve with string parsing?
Exactly. The parsing is the result of the PowerOP patches' evolution from
exporting 'struct powerop_point' through 'void *' up to string parsing.
The 'void *' approach would be fine if we agreed to have all instances
manipulate points via a sysfs/configfs interface. But the
requirement for in-kernel operating point registration modules, which
came from the discussions on the list, still persists. For example,
cpufreq needs such an in-kernel registration module. String parsing comes
into the picture to register power parameters without including an arch
header file.
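A minimal sketch of the idea, assuming a hypothetical "name=value" string format and made-up parameter names (this is not PowerOP's actual parser or syntax). The registration code only passes a string, and the consumer looks parameters up by name, so it never needs an include/asm-* header to know the arch layout:

```c
#include <stdlib.h>
#include <string.h>

#define PARAM_NAME_MAX 16

/* Hypothetical named power parameter; not PowerOP's real structure. */
struct power_param {
	char name[PARAM_NAME_MAX];
	unsigned long value;
};

/*
 * Parse a string such as "cpu_freq=600000 vcore=1100" into params[].
 * Returns the number of parameters parsed, or -1 on malformed input.
 */
static int parse_power_params(const char *s, struct power_param *params,
			      int max)
{
	int n = 0;

	for (;;) {
		const char *eq;
		char *end;
		size_t len;

		while (*s == ' ')
			s++;
		if (*s == '\0')
			break;
		eq = strchr(s, '=');
		if (!eq || n >= max)
			return -1;
		len = (size_t)(eq - s);
		/* reject empty, overlong, or space-containing names */
		if (len == 0 || len >= PARAM_NAME_MAX || memchr(s, ' ', len))
			return -1;
		memcpy(params[n].name, s, len);
		params[n].name[len] = '\0';
		params[n].value = strtoul(eq + 1, &end, 10);
		if (end == eq + 1)
			return -1;	/* no digits after '=' */
		n++;
		s = end;
	}
	return n;
}

/* Look a parameter up by name; returns 0 and sets *value if found. */
static int find_power_param(const struct power_param *params, int n,
			    const char *name, unsigned long *value)
{
	int i;

	for (i = 0; i < n; i++) {
		if (strcmp(params[i].name, name) == 0) {
			*value = params[i].value;
			return 0;
		}
	}
	return -1;
}
```

The cost of this design is runtime string handling and late detection of typos in parameter names; the gain is that arch-independent code never sees the arch's struct layout.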

OpPoint does not include such a header file in its x86 registration
module only because OpPoint has a wrong definition of the operating point,
with some unknown frequency and voltage fields in 'struct oppoint',
while those fields must be defined in the arch dependent part of the code.

I'm still pondering an approach that keeps the parsing only in the powerop
core layer, per your comments, btw.

Eugeny
>                                                                Pavel
> --
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
>

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: So, what's the status on the recent patches here?
  2006-08-29 16:34                         ` Pavel Machek
@ 2006-08-29 17:49                           ` Preece Scott-PREECE
  2006-08-30  6:20                             ` Matthew Locke
  2006-08-30  4:52                           ` David Singleton
  1 sibling, 1 reply; 136+ messages in thread
From: Preece Scott-PREECE @ 2006-08-29 17:49 UTC (permalink / raw)
  To: Pavel Machek, David Singleton; +Cc: linux-pm


> From: linux-pm-bounces@lists.osdl.org 
> [mailto:linux-pm-bounces@lists.osdl.org] On Behalf Of Pavel Machek
> Sent: Tuesday, August 29, 2006 11:35 AM
> To: David Singleton
> Cc: linux-pm@lists.osdl.org
> Subject: Re: [linux-pm] So, what's the status on the recent 
> patches here?
> 
> Hi!
> > >>         point, by name. There is a new /sys/power/operating_points directory
> > >>         that shows all the operating points the system supports. An
> > >>         example from my centrino laptop shows:
> > >>
> > >>         /sys/power/operating_points/high
> > >>         /sys/power/operating_points/highest
> > >>         /sys/power/operating_points/low
> > >>         /sys/power/operating_points/lowest
> > >>         /sys/power/operating_points/medium
> > >>         /sys/power/operating_points/mem
> > >>         /sys/power/operating_points/standby
> > >
> > >What makes you think that mixing operating and sleep states is a good
> > >idea?
> > 
> > They are all power states managed by the kernel and in the operating
> > point concept they are all operating points the system supports.
> 
> That does not make mixing them right.
---

Could you say why you think they shouldn't be mixed? Absent an argument to
the contrary, making it a single continuum seems appealing. Why have
separate policies?

---
> 
> > The system can be set to any of the supported states by setting their
> > name in the /sys/power/state file.  I find simplicity is usually a
> > good thing.
> 
> I believe the quote is 'make it as simple as possible but not 
> simpler'.
---

So, why don't you think this simplification is possible?

---
> 
> > >And '600MHz' makes a lot more sense than 'lowest' on centrino.
> >
> > Perhaps, but the common name space makes it easy for the power manager
> > daemon to perform the same functions without having to know that the
> > lowest speed on my laptop is 600Mhz.
>
> And enumerate English strings in the power daemon? Limiting the number of states?
---

Are you saying that on your laptop, all possible CPU and bus frequencies
can be used independently? So, it would be unnecessarily limiting to
have the system designer provide a list of combinations that work?
Remember that the scope of this is a limited set of parameters, not all
the devices in the system.

---
> 
> > >
> > >>         /sys/power/operating_points/high/frequency
> > >>         /sys/power/operating_points/high/voltage
> > >>         /sys/power/operating_points/high/latency
> > >
> > >What is voltage for 'mem'?
> >
> > I don't know what the voltage or latency is for mem.  Perhaps Intel could better
> > say what the voltage is in the suspend state and what the latency was
> > for transition to that state.  I didn't have the data available when
> > I wrote the code.
>
> And you will not have data available even if Intel helps you.
> What is _frequency_ for mem? These fields are meaningless for
> sleep states; that should tell you that mixing sleep and
> operating states is a bad idea.
---

Why isn't 0 a meaningful value for frequency? And I can imagine
that some hardware might have different voltage options for sleep
states.  Additionally, these sys entries could represent the frequency,
voltage, etc., that the system would go to upon resuming from sleep...

scott  

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: So, what's the status on the recent patches here?
  2006-08-29 15:55                       ` David Singleton
@ 2006-08-29 16:34                         ` Pavel Machek
  2006-08-29 17:49                           ` Preece Scott-PREECE
  2006-08-30  4:52                           ` David Singleton
  0 siblings, 2 replies; 136+ messages in thread
From: Pavel Machek @ 2006-08-29 16:34 UTC (permalink / raw)
  To: David Singleton; +Cc: linux-pm

Hi!
> >>         point, by name. There is a new /sys/power/operating_points directory
> >>         that shows all the operating points the system supports. An
> >>         example from my centrino laptop shows:
> >>
> >>         /sys/power/operating_points/high
> >>         /sys/power/operating_points/highest
> >>         /sys/power/operating_points/low
> >>         /sys/power/operating_points/lowest
> >>         /sys/power/operating_points/medium
> >>         /sys/power/operating_points/mem
> >>         /sys/power/operating_points/standby
> >
> >What makes you think that mixing operating and sleep states is a good
> >idea?
> 
> They are all power states managed by the kernel and in the operating
> point concept they are all operating points the system supports.

That does not make mixing them right.

> The system can be set to any of the supported states by
> setting their name in the /sys/power/state file.  I find simplicity
> is usually a good thing.

I believe the quote is 'make it as simple as possible but not
simpler'.

> >And '600MHz' makes a lot more sense than 'lowest' on centrino.
>
> Perhaps, but the common name space makes it easy for the power manager
> daemon to perform the same functions without having to know that the lowest
> speed on my laptop is 600Mhz.

And enumerate English strings in the power daemon? Limiting the number of
states?

> >
> >>         /sys/power/operating_points/high/frequency
> >>         /sys/power/operating_points/high/voltage
> >>         /sys/power/operating_points/high/latency
> >
> >What is voltage for 'mem'?
>
> I don't know what the voltage or latency is for mem.  Perhaps Intel could better
> say what the voltage is in the suspend state and what the latency was
> for transition to that state.  I didn't have the data available when
> I wrote the code.

And you will not have data available even if Intel helps you. What is
_frequency_ for mem? These fields are meaningless for sleep states;
that should tell you that mixing sleep and operating states is a bad
idea.

-- 
Thanks for all the (sleeping) penguins.

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: So, what's the status on the recent patches here?
  2006-08-27 15:41                     ` Pavel Machek
@ 2006-08-29 15:55                       ` David Singleton
  2006-08-29 16:34                         ` Pavel Machek
  0 siblings, 1 reply; 136+ messages in thread
From: David Singleton @ 2006-08-29 15:55 UTC (permalink / raw)
  To: Pavel Machek; +Cc: linux-pm

On 8/27/06, Pavel Machek <pavel@ucw.cz> wrote:
> Hi!
>
> > > The other alternative as suggested earlier this week would be architectures
> > > getting to 'opt out' of powerop for their cpufreq drivers where it doesn't
> > > necessarily bring anything but the layer of indirection.
> > >
> > > I'm about to disappear for two weeks for a much needed vacation, but
> > > I'll be interested to see other folks comments/opinions on this
> > > when I get back.
> >
> >         This week I got some really good feedback and suggestions
> >         from Mark Gross on the kernel interface and usability and
> >         I have two new additions for this patch set.  So I spent the week
> >         working on a well thought out kernel interface.
> >
> >         1) I believe I now have the right kernel interface for a common
> >         power management infrastructure.
> >
> >         The new kernel interface still uses /sys/power/state to both
> >         show the current operating point and set a desired operating
> >         point, by name. There is a new /sys/power/operating_points directory
> >         that shows all the operating points the system supports. An
> >         example from my centrino laptop shows:
> >
> >         /sys/power/operating_points/high
> >         /sys/power/operating_points/highest
> >         /sys/power/operating_points/low
> >         /sys/power/operating_points/lowest
> >         /sys/power/operating_points/medium
> >         /sys/power/operating_points/mem
> >         /sys/power/operating_points/standby
>
> What makes you think that mixing operating and sleep states is a good
> idea?

They are all power states managed by the kernel and in the operating
point concept they are all operating points the system supports.

The system can be set to any of the supported states by
setting their name in the /sys/power/state file.  I find simplicity
is usually a good thing.

>
> And '600MHz' makes a lot more sense than 'lowest' on centrino.

Perhaps, but the common name space makes it easy for the power manager
daemon to perform the same functions without having to know that the lowest
speed on my laptop is 600Mhz.

I have the same power manager running on my laptop as on my ARM pxa27x
board and it's performing the same operations, even though there are different
frequencies behind the name "lowest".

The frequency is available in the
/sys/power/operating_points/lowest/frequency file,
which the power manager reads when it starts up.
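A sketch of how a userspace power manager might read such a one-value-per-file attribute at startup. The /sys/power/operating_points/... path comes from the OpPoint patch described above, not from mainline sysfs; the helper name is hypothetical, and the unit of the value is whatever the patch writes (it is not assumed here).

```c
#include <stdio.h>

/*
 * Read one unsigned integer from a one-value-per-file attribute, e.g.
 * /sys/power/operating_points/lowest/frequency (OpPoint patch path).
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 */
static int read_sysfs_ulong(const char *path, unsigned long *out)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%lu", out) == 1);
	fclose(f);
	return ok ? 0 : -1;
}
```

Reading the value once at startup, as described, lets the same daemon map the generic name "lowest" to whatever frequency a given board exposes.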

>
> >         /sys/power/operating_points/high/frequency
> >         /sys/power/operating_points/high/voltage
> >         /sys/power/operating_points/high/latency
>
> What is voltage for 'mem'?

I don't know what the voltage or latency is for mem.  Perhaps Intel could better
say what the voltage is in the suspend state and what the latency was
for transition to that state.  I didn't have the data available when
I wrote the code.


>
> >         I've finally had a bit of time to get the sysfs one file - one
> >         value system in place for OpPoint.
> >
> >         2) The really good news is there is now a power manager for OpPoint,
> >         both in rpm and src rpm form.  And since the new power manager runs off
> >         the new kernel interface and actually does what the cpuspeed daemon does
> >         I think the kernel interface is sound.
> >
> >         I took the cpuspeed power manager daemon, version 1.2.1, and modified
> >         it Friday to use the oppoint interface. It supports all the
> >         options the cpuspeed daemon does, (and can actually still be compiled to
> >         be the original cpuspeed daemon) it just uses the interface
> >         described above instead of the cpufreq interfaces.
>
> Congratulations, you now have an inferior version of cpufreq ondemand
> governor.

Perhaps inferior, but definitely simpler, and it's outside of the kernel.

And it's also able to work on a wider range of system power states
than just scaling frequency.  The PXA27x controls a lot more clock state
than a centrino or powernow system.  The next set of systems I'm working
on have separate power domains that are individually controllable outside
of the processor speed, and they fit nicely within the same framework and
can work with the same power manager daemon.

A single power management infrastructure for the kernel would not be an inferior
nor a bad thing.

David


>                                                 Pavel
> --
> Thanks for all the (sleeping) penguins.
>

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: So, what's the status on the recent patches here?
  2006-08-28 17:39                                     ` Pavel Machek
@ 2006-08-29  7:51                                       ` Matthew Locke
  2006-08-30 22:13                                       ` Mark Gross
  1 sibling, 0 replies; 136+ messages in thread
From: Matthew Locke @ 2006-08-29  7:51 UTC (permalink / raw)
  To: Pavel Machek; +Cc: linux-pm


On Aug 28, 2006, at 10:39 AM, Pavel Machek wrote:

> On Mon 2006-08-28 09:40:38, Mark Gross wrote:
>> On Sat, Aug 26, 2006 at 03:46:53PM +0200, Pavel Machek wrote:
>>> On Sat 2006-08-26 17:30:40, Vitaly Wool wrote:
>>>> On 8/26/06, Pavel Machek <pavel@suse.cz> wrote:
>>> Because 8388608 policies is clearly not reasonable, powerop can not
>>> help here, and something better should be developed... like power
>>> domains someone proposed here.
>>>
>>> (Or to say it in another words, powerop forces one big power domain,
>>> which is bad model for notebook-style machine).
>>
>> I doubt notebook-style machines will ever use power op in any
>> significant way.  HPC and embedded will be the first users.
>
> I agree here... power op looks useless for notebooks. But I doubt power
> op authors would agree...

Agree that something I work on is useless? Never:)  I know I sound like 
a broken record but...

PowerOP is the basic building block for scaling power management.  It's
as useless or useful as the cpufreq_driver layer of cpufreq is on
laptops.  You can think of PowerOP as a redesign of cpufreq_driver that
enables other software in the PM stack to select a group of power
parameter values by a string.  On x86 this other software can continue
to be cpufreq.  On embedded devices the other software can use the
powerop sysfs api or kernel APIs.
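A sketch of "select a group of power parameter values by a string", assuming a hypothetical table layout; the point names, field names and numbers are illustrative only and are not PowerOP's structures.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical operating point: one named group of parameter values. */
struct op_point {
	const char *name;
	unsigned long cpu_khz;
	unsigned int vcore_mv;
};

/* Illustrative values only; a real table would come from platform code. */
static const struct op_point op_table[] = {
	{ "lowest",  600000,  950 },
	{ "low",     800000, 1000 },
	{ "high",   1400000, 1300 },
};

#define OP_TABLE_SIZE (sizeof(op_table) / sizeof(op_table[0]))

/* Select a whole group of values by its string name, or NULL if unknown. */
static const struct op_point *op_lookup(const char *name)
{
	size_t i;

	for (i = 0; i < OP_TABLE_SIZE; i++)
		if (strcmp(op_table[i].name, name) == 0)
			return &op_table[i];
	return NULL;
}
```

Because only combinations listed in the table can be selected, a caller can never request an invalid frequency/voltage mix; that is the trade-off discussed in this thread between named points and independently settable parameters.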

>
>> Power domains will likely build on top of power op.
>>
>> Power domains add complexities themselves. Dealing with
>> dependencies and constraints between domains will be a challenge.
>
> Once we have power domains in/solved... do we still need power op? I
> thought power op could be useful for solving constraints _inside_ one
> domain, but...

I don't have a specific answer for this.  We will deal with it when we 
port to hardware that has power domain control.


> 								Pavel
> -- 
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) 
> http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
> _______________________________________________
> linux-pm mailing list
> linux-pm@lists.osdl.org
> https://lists.osdl.org/mailman/listinfo/linux-pm
>

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: So, what's the status on the recent patches here?
  2006-08-27 20:54                     ` Eugeny S. Mints
  2006-08-28 22:18                       ` Pavel Machek
@ 2006-08-29  1:29                       ` David Singleton
  2006-08-29 22:39                         ` Eugeny S. Mints
  2006-08-31 13:27                         ` Amit Kucheria
  1 sibling, 2 replies; 136+ messages in thread
From: David Singleton @ 2006-08-29  1:29 UTC (permalink / raw)
  To: Eugeny S. Mints; +Cc: linux-pm

On 8/27/06, Eugeny S. Mints <eugeny.mints@gmail.com> wrote:
> 2006/8/26, David Singleton <daviado@gmail.com>:
> > On 8/19/06, Dave Jones <davej@redhat.com> wrote:
> > > On Sat, Aug 19, 2006 at 08:20:45PM -0700, David Singleton wrote:
> > >
> > >  > If I had all the existing cpufreq tables transformed
> > >  > into operating points I could make a patch that would remove
> > >  > the bulk of cpufreq code from the kernel and you'd have
> > >  > pretty much the same functionality without the maintenance
> > >  > issues the added layers and complexity bring.
> > >
> > > If this is going to fly at all, I think that's where we need to be headed.
> > > Having two parts of the kernel doing the same thing just seems
> > > very wrong to me.
> > >
> > > The other alternative as suggested earlier this week would be architectures
> > > getting to 'opt out' of powerop for their cpufreq drivers where it doesn't
> > > necessarily bring anything but the layer of indirection.
> > >
> > > I'm about to disappear for two weeks for a much needed vacation, but
> > > I'll be interested to see other folks comments/opinions on this
> > > when I get back.
> >
> [snip]
> >        1) I believe I now have the right kernel interface for a common
> >        power management infrastructure.
> >
> OpPoint continues to focus on user space interface development for
> power management, whereas there seems to be agreement in the
> community to defer this integration, because of quite a lot of
> open, undiscussed and complex questions about it, and instead to
> focus on getting a consensus on the operating point structure
> definition and the methods to work with the structure instances.

Actually OpPoint is focusing on all the interfaces, user-kernel,
kernel-architecture independent - power management interfaces, and power
management framework - architecture/platform specific interfaces.

>
> OpPoint continues to focus on integration with CPUFreq in a manner
> which was outlined as unacceptable during recent discussions on the
> list - removing the concept of an in-kernel governor and most of the
> CPUFreq feature code.

The point of OpPoint is to show that a unified power management infrastructure
is possible and that bolting on another power management infrastructure to
the kernel is not the right approach.

OpPoint is not trying to replace cpufreq.  It's trying to unify all
the power management infrastructures into a single infrastructure.
OpPoint uses the cpufreq notifier infrastructure to do both operating
point transition and driver scaling notification, and it performs the
same basic functions as cpufreq, without the policy and governor
code.  It is also performing all the same Dynamic Power Management functionality
on the pxa27x mainstone.  The point is one infrastructure can support them.

And with the new oppointd power daemon it is performing all the same functions
as cpuspeed did on my laptop, just with a lot less code in the kernel.


>
> OpPoint continues to develop userspace interfaces and integration
> based on an operating point definition for which Matt and I posted
> issues/questions several times, and the posts have been left without a
> reply.

Sorry, I'm having a hard time keeping up with all the email threads.

>
> Below I'm trying to summarize all the issues I see with the OpPoint approach,
> sometimes using terms defined in the PowerOP approach (for example layer
> names).
>
> 'struct powerop' definition
> ------------------------------------
> - frequency and voltage fields are arch specific: not to mention any
> complex embedded case, the current definition and OpPoint
> implementation do not work even for the x86 SMP case.

Actually frequency, voltage and latency fields are architecture independent
and a necessary piece of information that any power manager must have.

You are right, I have not yet put in the additional layer to support SMP
systems.  That is one of the pieces I'm still working on.

>
> - latency is not an attribute of a certain operating point but a function of
> two arguments - the current operating point and the point we are going to
> switch to. Therefore latency just does not belong to 'struct powerop'

I disagree.
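To make the disputed point concrete: if transition latency depends on both the current and the target point, as Eugeny argues, it fits a table indexed by the (from, to) pair rather than a single per-point field. A minimal sketch with hypothetical point indices and invented microsecond values (not measurements from any platform):

```c
/* Hypothetical operating point indices, not taken from either patch set. */
enum op_id { OP_LOW, OP_MEDIUM, OP_HIGH, OP_COUNT };

/* Illustrative transition latencies in microseconds, indexed [from][to].
 * The numbers are invented; they only show the shape of the data. */
static const unsigned int transition_latency_us[OP_COUNT][OP_COUNT] = {
	/*               LOW  MEDIUM  HIGH */
	[OP_LOW]    = {   0,    10,    25 },
	[OP_MEDIUM] = {  10,     0,    10 },
	[OP_HIGH]   = {  40,    15,     0 },
};

/* Latency of switching from 'from' to 'to'. Callers must pass valid ids.
 * A single per-point latency field could not express the asymmetry
 * between LOW->HIGH (25 us) and HIGH->LOW (40 us) in this table. */
unsigned int op_transition_latency(enum op_id from, enum op_id to)
{
	return transition_latency_us[from][to];
}
```

Whether the per-pair model is needed in practice depends on how asymmetric a given platform's transitions really are.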

>
> - all hooks are redundant: the hooks are the same for all operating points
> until we come to the integration with suspend/resume. But we believe the
> integration needs more investigation in the first place, and in the second
> place we feel the integration may be handled on the PM Core layer instead
> of having per operating point hooks

The hooks are not redundant nor the same for all operating points.  Each
operating point defines its prepare, transition, and finish functions for the
hooks.  And different types of operating points may have completely different
functions in those hooks, on the same platform.
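A minimal user-space sketch of per-point hooks, using a hypothetical struct op_point rather than the actual OpPoint definition: two points on the same platform can put entirely different functions behind the same prepare/transition/finish slots, here a running point and a suspend point.

```c
#include <stdio.h>

struct op_point;

/* Hypothetical hook type: returns 0 on success, nonzero on failure. */
typedef int (*op_hook_t)(struct op_point *op);

struct op_point {
	const char *name;
	unsigned int frequency_khz;  /* illustrative value only */
	op_hook_t prepare;           /* e.g. notify drivers before the change */
	op_hook_t transition;        /* perform the hardware change */
	op_hook_t finish;            /* e.g. notify drivers afterwards */
};

static int notify(struct op_point *op)
{
	printf("notify drivers: %s\n", op->name);
	return 0;
}

static int set_frequency(struct op_point *op)
{
	printf("set cpu to %u kHz\n", op->frequency_khz);
	return 0;
}

static int enter_suspend(struct op_point *op)
{
	printf("enter suspend state: %s\n", op->name);
	return 0;
}

/* Run the hooks in order, skipping NULL ones; stop at the first failure. */
int op_change(struct op_point *op)
{
	op_hook_t steps[] = { op->prepare, op->transition, op->finish };

	for (size_t i = 0; i < sizeof(steps) / sizeof(steps[0]); i++) {
		if (steps[i]) {
			int err = steps[i](op);
			if (err)
				return err;
		}
	}
	return 0;
}

/* Same platform, different functions behind the same hooks. */
struct op_point op_high = { "high", 1600000, notify, set_frequency, notify };
struct op_point op_mem  = { "mem", 0, notify, enter_suspend, NULL };
```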

>
> - prepare_transition and finish_transition may be moved even below PM Core to
> the clock/voltage framework; needs more careful investigation though


I disagree.  Both the pm suspend and cpufreq code have them in exactly
the same place.

>
> - md_data has an issue from OO design paradigm perspective.  OpPoint
> requires an entity above PowerOP to know internals of arch md_data (see
> centrino-dynamic-powerop.c implementation) and thus requires an arch
> dependent header file to be included in the code which can be
> implemented in an arch independent manner. That would be fine if there were
> no solution to achieve the required functionality without such a hack, but
> PowerOP provides such an approach by dereferencing power parameters by
> name. File which implements operating points registration in PowerOP
> approach does not include any header file from include/asm-* subtrees.

No, the md_data is the opaque pointer into architecture dependent data.
The power management infrastructure doesn't need to know what
data is linked into md_data, just as drivers have driver specific
structs that are opaque to the upper layers of software.
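The opaque-pointer pattern David describes can be sketched like this: the generic layer stores md_data and passes it through without knowing its layout, and only the architecture-specific callback casts it back. All names here are hypothetical, and the "centrino" struct and register value are illustrative, not the real driver data.

```c
#include <stdint.h>

/* Generic layer: stores md_data but never looks inside it. */
struct op_point {
	const char *name;
	void *md_data;  /* opaque, owned by the arch layer */
	int (*transition)(const struct op_point *op);
};

int op_set(const struct op_point *op)
{
	/* md_data is handed to the arch callback untouched */
	return op->transition(op);
}

/* Arch layer: the only code that knows the real layout.
 * The struct and register value are hypothetical. */
struct centrino_md {
	uint32_t reg_value;
};

static uint32_t last_reg_value;  /* stands in for a hardware register write */

static int centrino_transition(const struct op_point *op)
{
	const struct centrino_md *md = op->md_data;

	last_reg_value = md->reg_value;
	return 0;
}

static struct centrino_md centrino_high_md = { 0x0c20 };
const struct op_point centrino_high = { "high", &centrino_high_md, centrino_transition };

uint32_t centrino_last_reg_value(void) { return last_reg_value; }
```

Eugeny's objection is narrower: code above the generic layer that builds points (his example is centrino-dynamic-powerop.c) has to fill md_data and therefore include the arch header. In this sketch that only happens inside the arch section.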

>
> All further pieces proposed by OpPoint are based on the above incorrect
> design of the main structure and therefore have issues.

wow.

>
> integration with suspend/resume
> -------------------------------
> - mixing system state and operating point concept (different points
> may correspond to a sleep/standby system state)

The pxa27x code shows that indeed there is more than one suspend state,
which is why the operating point model works so well on both my
centrino laptop and my pxa27x mainstone running the same oppointd
power daemon.

>
> - legacy PM states are redefined via the new OpPoint interface but do not
> use it (explicit 'if' statements in legacy pm code instead of OpPoint
> hooks utilization)

The enter_state code could be merged into the pm_change code, or vice versa,
I haven't had time to make it really unified and pretty.

>
> - names for operating points presented in the original letter below
> implicitly assume the points are ranked in some order (now it is from
> the highest [power consumption] to the lowest). However, having many
> more power parameters than just one freq and one voltage does not
> allow ranking the points in such a way, and a string name without
> knowledge of the particular power parameter values is not sufficient

That's not quite correct.  The ordering of names, lowest to highest,
allows the power manager daemon to cover most of the use cases
right out of the box.  It's performing the same functions on both
my centrino laptop and the pxa27x mainstone right now without
any changes to either the power manager or power management
config file.

One of the next boards I'm working on has different operating points
at the same frequency, but different voltages.  All that is really required
to support this is a plugin to the power manager that understands the
different operating points so it can best choose when to transition to
each point.

Custom plugins to a power manager that lets the power manager deal
with the unique set of operating points on a particular platform is
one of the really attractive parts of OpPoint.  It won't have to be
woven into the kernel.

> (even in the x86 SMP case: not to mention it's hard for me to express the SMP
> case in current OpPoint terms, but what are the names, and how do we
> distinguish/rank 2-CPU system states corresponding to 'highest point
> for CPU0 + medium for CPU1' versus 'low for CPU0 + high for CPU1'?)

I'm still working on the SMP case.  It's not that I'm ignoring it. Give me
a few more days.

>
> - no example of an (at least optional) capability to export information about
> a particular power parameter is presented, while the embedded community
> clearly highlighted that it is a must


Which parameters besides frequency, voltage and latency are required
to be exported to the power manager?

>
> - direct utilization of PM internal structure 'pm_state' instead of an attempt
> at an API
>
> cpufreq core and a cpufreq driver/OpPoint integration
> -------------------------------
> - integration with the legacy cpufreq interface is completely missing in both arch
> (x86 and pxa) examples. If OpPoint were a universal approach it would
> allow building different interfaces on top of it. In this case you can
> propose a more optimized/improved interface if you feel the existing
> interface has issues, leaving the existing interface as a [configurable]
> option and removing it when agreed.

I'm sorry, I don't understand that statement.  I'm still opposed to
dynamic-on-the-fly construction of operating points.  It's really dangerous.
The hardware vendors want it so that new hardware doesn't have to
wait for software before they can sell it.

The cpufreq structure of defining and validating operating points before being
integrated into the kernel is the correct way to do it, in my opinion.

>
> - while a clear design and interfaces are outlined for the so-called PM Core
> layer by the PowerOP approach, this layer is not addressed by OpPoint in
> any way

Correct.  They are a different design.

>
> - a cpufreq driver still contains code to access arch hardware,
> while the functionality of a cpufreq driver falls into the PM Core layer and
> there is no longer a reason to have the functionality related to the cpufreq
> concept

Is this a statement about PowerOP?  OpPoint doesn't use the PowerOp
PM Core layer definition.  OpPoint only has 3 layers:

  1) user space power manager and user-kernel interface.

  2) architecture independent layer between the kernel and the power management
      infrastructure.

  3) The architecture dependent layer that does the work and has to touch
      all the hardware.  The architecture dependent layer is the piece
      where the hardware specific operating points and functions to transition
      to the operating points are defined.

 This is also why it's so simple to add new architecture and platform support.
All that is needed is the architecture dependent portion to support a new
platform.


>
> - no integration at all with a clock/voltage framework. An integral solution
> which includes a clock/voltage framework just saves more power [period].

Not so.  The mainstone uses the existing <linux/clk.h> clock framework,
and it must, since it supports so many different clocks to transition to
a new operating point.  I'm still open to integrating with any new
voltage framework, I just haven't seen it yet.

I also don't believe it will be a problem integrating with a voltage
framework. The voltage framework will be needed by the architecture
dependent pieces of power management, and a common voltage framework
will just make it easier.


>
> x86 cpufreq/OpPoint integration
> -------------------------------
> - struct powerop hooks are expected to be arch specific but are initialized by some
> cpufreq core routines
>
> - cpufreq driver still shares the cpufreq core cpufreq_frequency_table structure

Correct, the cpu_frq table structure is the piece that gets the frequency
and voltage right.  I'm not changing the operating point definitions for
the existing processor line.  I'm just simplifying the transition to
and from the existing system states.

>
> - integration with legacy cpufreq interface is completely missing

Not quite.  Once the operating points are constructed, from the
same validated data, the oppointd daemon can perform the
same legacy cpufreq functionality.  Governor and policy code moves
out of the kernel into the power manager.  It integrates through the same
cpufreq table data, the same cpufreq notifier lists for transitioning and
scaling drivers, and moves policy management code out of
the kernel into the power daemon.

>
> - OpPoint design does not handle SMP case.
>
> PowerOP addresses all the issues mentioned above and works for the SMP
> case. Integration with legacy kernel PM code (including constraints
> and standalone driver suspend/resume) and a certain userspace
> interface (basically any interface that has the current PowerOP interface
> underneath) are the next steps for the PowerOP approach once the correct
> brick of the PowerOP layer is in place.

It does?

David

>
>  Eugeny
>

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: So, what's the status on the recent patches here?
  2006-08-27 20:54                     ` Eugeny S. Mints
@ 2006-08-28 22:18                       ` Pavel Machek
  2006-08-29 21:46                         ` Eugeny S. Mints
  2006-08-29  1:29                       ` David Singleton
  1 sibling, 1 reply; 136+ messages in thread
From: Pavel Machek @ 2006-08-28 22:18 UTC (permalink / raw)
  To: Eugeny S. Mints; +Cc: linux-pm

Hi!

Okay, I never liked the OpPoint framework... so your criticism seems okay,
but do your patches really solve all the points you raised?

On Sun 2006-08-27 13:54:31, Eugeny S. Mints wrote:
> 2006/8/26, David Singleton <daviado@gmail.com>:
> > On 8/19/06, Dave Jones <davej@redhat.com> wrote:
> > > On Sat, Aug 19, 2006 at 08:20:45PM -0700, David Singleton wrote:
...
> >        1) I believe I now have the right kernel interface for a common
> >        power management infrastructure.

> - md_data has an issue from OO design paradigm perspective.  OpPoint
> requires an entity above PowerOP to know internals of arch md_data (see
> centrino-dynamic-powerop.c implementation) and thus requires an arch

Is that what you were trying to solve with string parsing?
								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: So, what's the status on the recent patches here?
  2006-08-28 16:40                                   ` Mark Gross
@ 2006-08-28 17:39                                     ` Pavel Machek
  2006-08-29  7:51                                       ` Matthew Locke
  2006-08-30 22:13                                       ` Mark Gross
  0 siblings, 2 replies; 136+ messages in thread
From: Pavel Machek @ 2006-08-28 17:39 UTC (permalink / raw)
  To: Mark Gross; +Cc: linux-pm

On Mon 2006-08-28 09:40:38, Mark Gross wrote:
> On Sat, Aug 26, 2006 at 03:46:53PM +0200, Pavel Machek wrote:
> > On Sat 2006-08-26 17:30:40, Vitaly Wool wrote:
> > > On 8/26/06, Pavel Machek <pavel@suse.cz> wrote:
> > Because 8388608 policies is clearly not reasonable, powerop can not
> > help here, and something better should be developed... like power
> > domains someone proposed here.
> > 
> > (Or to say it in another words, powerop forces one big power domain,
> > which is bad model for notebook-style machine).
> 
> I doubt notebook-style machines will ever use power op in any
> significant way.  HPC and embedded will be the first users.

I agree here... power op looks useless for notebooks. But I doubt power
op authors would agree...

> Power domains will likely build on top of power op.
> 
> Power domains add complexities themselves. Dealing with
> dependencies and constraints between domains will be a challenge.

Once we have power domains in/solved... do we still need power op? I
thought power op could be useful for solving constraints _inside_ one
domain, but...
								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: So, what's the status on the recent patches here?
  2006-08-26 13:46                                 ` Pavel Machek
@ 2006-08-28 16:40                                   ` Mark Gross
  2006-08-28 17:39                                     ` Pavel Machek
  0 siblings, 1 reply; 136+ messages in thread
From: Mark Gross @ 2006-08-28 16:40 UTC (permalink / raw)
  To: Pavel Machek; +Cc: linux-pm

On Sat, Aug 26, 2006 at 03:46:53PM +0200, Pavel Machek wrote:
> On Sat 2006-08-26 17:30:40, Vitaly Wool wrote:
> > Hi Pavel,
> > 
> > On 8/26/06, Pavel Machek <pavel@suse.cz> wrote:
> > >Hi!
> > >
> > >> >> No. The reason is there's no _real_ difference in 'fileserver' and
> > >> >> 'webserver' from the PM POV, so this will never happen.
> > >> >> There's no reason to introduce different policies for different use
> > >> >> cases which however imply similar peripherals utilization.
> > >> >> Moreover, I never play MP3s on a fileserver/webserver. The example
> > >> >> you've given is pretty much artificial.
> > >> >
> > >> >My notebook has 23 different devices. Do you really want to have
> > >> >8388608 policies for different peripheral utilizations?
> > >>
> > >> Can you please elaborate on how this number corresponds to the reality?
> > >> Looks like you don't catch what I'm saying. I'm talking about use
> > >> case-driven model in which you will need to invent 8388608 use cases
> > >> basically in order to have 8388608 policies. IOW, not any combination
> > >> is valid within these 8388608.
> > >
> > >I'm saying that usecase-driven model is not acceptable for a
> > >kernel. It is not kernel's business to limit user to particular usage
> > >models.
> > >
> > >That's why your model works for closed machines like cellphones, but
> > >is totally broken for notebook. Sorry.
> > 
> > Who talks about kernel? A policy is an userspace thing. I guess we're
> > not quite understanding each other :)
> 
> You upload policies to the kernel. You want 5 policies for your cellphone,
> and that's fine, but I'm telling you I'd need 8388608 policies for my
> notebook, because devices are independent and users want separate
> control.
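Pavel's figure is what 23 independent devices give if each one is treated as simply on or off (a simplifying assumption; real devices may have more than two states):

```latex
N = 2^{23} = 8\,388\,608
```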

No. Users do not, and even if they do, they won't deal directly with that large
a number of power states.

Do not confuse policies with operating points.  The policies define the
sets of operating points that are valid at a given time and the policy
manager attempts to set the optimum OP.

The user will only deal with the set of policies exported by whatever
policy manager is in use.  Not the operating points.
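Mark's distinction between policies and operating points can be sketched as follows: a policy is a subset of operating points (here a bitmask), and the policy manager picks the best allowed point for the current demand. The bitmask representation, the point list and the selection rule are all assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical operating points, ordered from lowest to highest power. */
enum op_level { OP_LOWEST, OP_LOW, OP_MEDIUM, OP_HIGH, OP_HIGHEST, OP_NR };

/* A policy is the set of operating points allowed at the moment. */
typedef uint32_t policy_t;
#define OP_BIT(op) (1u << (op))

/* Return the lowest allowed point that meets 'wanted'; if no allowed point
 * is high enough, return the highest allowed one; -1 if none is allowed. */
int policy_select(policy_t allowed, enum op_level wanted)
{
	int best_below = -1;

	for (int i = 0; i < OP_NR; i++) {
		if (!(allowed & OP_BIT(i)))
			continue;
		if (i >= (int)wanted)
			return i;
		best_below = i;
	}
	return best_below;
}
```

A "battery" policy would then be OP_BIT(OP_LOWEST) | OP_BIT(OP_LOW): the user chooses among a few such masks, and the manager maps demand onto points within the chosen mask.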

power op is attempting to build a power management stack and is near the
bottom of the stack.

> 
> Because 8388608 policies is clearly not reasonable, powerop can not
> help here, and something better should be developed... like power
> domains someone proposed here.
> 
> (Or to say it in another words, powerop forces one big power domain,
> which is bad model for notebook-style machine).

I doubt notebook-style machines will ever use power op in any
significant way.  HPC and embedded will be the first users.

Power domains will likely build on top of power op.

Power domains add complexities themselves. Dealing with
dependencies and constraints between domains will be a challenge.

It is an interesting thought to implement powerop interfaces on a
per-power-domain basis....

--mgross

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: So, what's the status on the recent patches here?
  2006-08-27 19:48                     ` Greg KH
@ 2006-08-28  0:07                       ` David Singleton
  0 siblings, 0 replies; 136+ messages in thread
From: David Singleton @ 2006-08-28  0:07 UTC (permalink / raw)
  To: Greg KH; +Cc: linux-pm

On 8/27/06, Greg KH <greg@kroah.com> wrote:
> On Sat, Aug 26, 2006 at 09:37:14PM -0700, David Singleton wrote:
> >
> >        The patchset and rpms are available at:
> >
> >        http://source.mvista.com/~dsingleton
>
> Care to put up a tarball for those of us running on distros that are not
> RPM based?

Okay, there is a tarball, patches and a spec file in

    http://source.mvista.com/~dsingleton/2.6.18-rc4/oppointd


David


>
> thanks,
>
> greg k-h
>

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: So, what's the status on the recent patches here?
  2006-08-27  4:37                   ` David Singleton
  2006-08-27 15:41                     ` Pavel Machek
  2006-08-27 19:48                     ` Greg KH
@ 2006-08-27 20:54                     ` Eugeny S. Mints
  2006-08-28 22:18                       ` Pavel Machek
  2006-08-29  1:29                       ` David Singleton
  2 siblings, 2 replies; 136+ messages in thread
From: Eugeny S. Mints @ 2006-08-27 20:54 UTC (permalink / raw)
  To: David Singleton; +Cc: linux-pm

2006/8/26, David Singleton <daviado@gmail.com>:
> On 8/19/06, Dave Jones <davej@redhat.com> wrote:
> > On Sat, Aug 19, 2006 at 08:20:45PM -0700, David Singleton wrote:
> >
> >  > If I had all the existing cpufreq tables transformed
> >  > into operating points I could make a patch that would remove
> >  > the bulk of cpufreq code from the kernel and you'd have
> >  > pretty much the same functionality without the maintenance
> >  > issues the added layers and complexity bring.
> >
> > If this is going to fly at all, I think thats where we need to be headed.
> > Having two parts of the kernel doing the same thing just seems
> > very wrong to me.
> >
> > The other alternative as suggested earlier this week would be architectures
> > getting to 'opt out' of powerop for their cpufreq drivers where it doesn't
> > necessarily bring anything but the layer of indirection.
> >
> > I'm about to disappear for two weeks for a much needed vacation, but
> > I'll be interested to see other folks comments/opinions on this
> > when I get back.
>
[snip]
>        1) I believe I now have the right kernel interface for a common
>        power management infrastructure.
>
OpPoint continues to focus on user space interface development for
power management, whereas there seems to be an agreement in the
community to defer this integration, because quite a lot of open,
undiscussed and complex questions about it remain, and instead to
focus on getting a consensus on the operating point structure
definition and the methods that work with the structure instances.

OpPoint continues to focus on integration with CPUFreq in a manner
which was described as unacceptable during recent discussions on the
list: removing the concept of an in-kernel governor and most of the
CPUFreq feature code.

OpPoint continues to develop userspace interfaces and integration
based on an operating point definition for which Matt and I posted
issues/questions several times, and the posts have been left without a
reply.

Below I'm trying to summarize all the issues I see with the OpPoint approach,
sometimes using terms defined in the PowerOP approach (for example layer
names).

'struct powerop' definition
------------------------------------
- frequency and voltage fields are arch specific: not to mention any
complex embedded case, the current definition and OpPoint
implementation do not work even for the x86 SMP case.

- latency is not an attribute of a certain operating point but a function of
two arguments - the current operating point and the point we are going to
switch to. Therefore latency just does not belong to 'struct powerop'

- all hooks are redundant: the hooks are the same for all operating points
until we come to the integration with suspend/resume. But we believe the
integration needs more investigation in the first place, and in the second
place we feel the integration may be handled on the PM Core layer instead
of having per operating point hooks

- prepare_transition and finish_transition may be moved even below PM Core to
the clock/voltage framework; needs more careful investigation though

- md_data has an issue from OO design paradigm perspective.  OpPoint
requires an entity above PowerOP to know internals of arch md_data (see
centrino-dynamic-powerop.c implementation) and thus requires an arch
dependent header file to be included in the code which can be
implemented in an arch independent manner. That would be fine if there were
no solution to achieve the required functionality without such a hack, but
PowerOP provides such an approach by dereferencing power parameters by
name. File which implements operating points registration in PowerOP
approach does not include any header file from include/asm-* subtrees.

All further pieces proposed by OpPoint are based on the above incorrect
design of the main structure and therefore have issues.
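The alternative Eugeny mentions in the md_data item, dereferencing power parameters by name so that generic code never includes an arch header, might look roughly like the following. This is a hypothetical sketch of the idea, not the PowerOP API, and the parameter names and values are invented:

```c
#include <string.h>

/* One named power parameter; the arch layer decides which names exist. */
struct power_param {
	const char *name;
	int value;
};

/* A hypothetical operating point described as a list of named parameters. */
struct named_op {
	const struct power_param *params;
	size_t nparams;
};

/* Look a parameter up by name. Returns 0 and stores the value in *out
 * on success, or -1 if this platform does not define that parameter. */
int op_get_param(const struct named_op *op, const char *name, int *out)
{
	for (size_t i = 0; i < op->nparams; i++) {
		if (strcmp(op->params[i].name, name) == 0) {
			*out = op->params[i].value;
			return 0;
		}
	}
	return -1;
}

/* An example point that an arch layer might register (values invented). */
static const struct power_param example_high_params[] = {
	{ "cpu_freq_khz", 520000 },
	{ "vcore_mv", 1500 },
	{ "sys_bus_khz", 208000 },
};
const struct named_op example_high = {
	example_high_params,
	sizeof(example_high_params) / sizeof(example_high_params[0]),
};
```

The trade-off against an opaque md_data pointer is that the parameter names become an informal contract between layers, checked at run time rather than by the compiler.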

integration with suspend/resume
-------------------------------
- mixing system state and operating point concept (different points
may correspond to a sleep/standby system state)

- legacy PM states are redefined via the new OpPoint interface but do not
use it (explicit 'if' statements in legacy pm code instead of OpPoint
hooks utilization)

- names for operating points presented in the original letter below
implicitly assume the points are ranked in some order (now it is from
the highest [power consumption] to the lowest). However, having many
more power parameters than just one freq and one voltage does not
allow ranking the points in such a way, and a string name without
knowledge of the particular power parameter values is not sufficient
(even in the x86 SMP case: not to mention it's hard for me to express the SMP
case in current OpPoint terms, but what are the names, and how do we
distinguish/rank 2-CPU system states corresponding to 'highest point
for CPU0 + medium for CPU1' versus 'low for CPU0 + high for CPU1'?)

- no example of an (at least optional) capability to export information about
a particular power parameter is presented, while the embedded community
clearly highlighted that it is a must

- direct utilization of PM internal structure 'pm_state' instead of an attempt
at an API
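Eugeny's SMP objection can be made precise: if each CPU has its own level, a system state is a vector of levels, and two such vectors are only comparable when one is at least as high on every CPU. A small sketch with a hypothetical 2-CPU encoding shows that "high on CPU0 + low on CPU1" and "low on CPU0 + high on CPU1" are incomparable, so a single lowest-to-highest name order cannot place them without an extra choice:

```c
#include <stdbool.h>

#define NR_CPUS_EXAMPLE 2

/* Per-CPU performance level: a larger number means more power. */
struct sys_state {
	int level[NR_CPUS_EXAMPLE];
};

/* true if a is at least as high as b on every CPU (componentwise order). */
static bool state_geq(const struct sys_state *a, const struct sys_state *b)
{
	for (int i = 0; i < NR_CPUS_EXAMPLE; i++)
		if (a->level[i] < b->level[i])
			return false;
	return true;
}

/* Returns 1 if a > b, -1 if a < b, 0 if equal, 2 if incomparable. */
int state_compare(const struct sys_state *a, const struct sys_state *b)
{
	bool ge = state_geq(a, b);
	bool le = state_geq(b, a);

	if (ge && le)
		return 0;
	if (ge)
		return 1;
	if (le)
		return -1;
	return 2;
}
```

Any total order over such states has to be imposed by policy, which is where David proposes per-platform power manager plugins.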

cpufreq core and a cpufreq driver/OpPoint integration
-------------------------------
- integration with the legacy cpufreq interface is completely missing in both arch
(x86 and pxa) examples. If OpPoint were a universal approach it would
allow building different interfaces on top of it. In this case you can
propose a more optimized/improved interface if you feel the existing
interface has issues, leaving the existing interface as a [configurable]
option and removing it when agreed.

- while a clear design and interfaces are outlined for the so-called PM Core
layer by the PowerOP approach, this layer is not addressed by OpPoint in
any way

- a cpufreq driver still contains code to access arch hardware,
while the functionality of a cpufreq driver falls into the PM Core layer and
there is no longer a reason to have the functionality related to the cpufreq
concept

- no integration at all with a clock/voltage framework. An integral solution
which includes a clock/voltage framework just saves more power [period].

x86 cpufreq/OpPoint integration
-------------------------------
- struct powerop hooks are expected to be arch specific but are initialized by some
cpufreq core routines

- cpufreq driver still shares the cpufreq core cpufreq_frequency_table structure

- integration with legacy cpufreq interface is completely missing

- OpPoint design does not handle SMP case.

PowerOP addresses all the issues mentioned above and works for the SMP
case. Integration with legacy kernel PM code (including constraints
and standalone driver suspend/resume) and a certain userspace
interface (basically any interface that has the current PowerOP interface
underneath) are the next steps for the PowerOP approach once the correct
brick of the PowerOP layer is in place.

 Eugeny

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: So, what's the status on the recent patches here?
  2006-08-27  4:37                   ` David Singleton
  2006-08-27 15:41                     ` Pavel Machek
@ 2006-08-27 19:48                     ` Greg KH
  2006-08-28  0:07                       ` David Singleton
  2006-08-27 20:54                     ` Eugeny S. Mints
  2 siblings, 1 reply; 136+ messages in thread
From: Greg KH @ 2006-08-27 19:48 UTC (permalink / raw)
  To: David Singleton; +Cc: linux-pm

On Sat, Aug 26, 2006 at 09:37:14PM -0700, David Singleton wrote:
> 
>        The patchset and rpms are available at:
> 
>        http://source.mvista.com/~dsingleton

Care to put up a tarball for those of us running on distros that are not
RPM based?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: So, what's the status on the recent patches here?
  2006-08-27  4:37                   ` David Singleton
@ 2006-08-27 15:41                     ` Pavel Machek
  2006-08-29 15:55                       ` David Singleton
  2006-08-27 19:48                     ` Greg KH
  2006-08-27 20:54                     ` Eugeny S. Mints
  2 siblings, 1 reply; 136+ messages in thread
From: Pavel Machek @ 2006-08-27 15:41 UTC (permalink / raw)
  To: David Singleton; +Cc: linux-pm

Hi!

> > The other alternative as suggested earlier this week would be architectures
> > getting to 'opt out' of powerop for their cpufreq drivers where it doesn't
> > necessarily bring anything but the layer of indirection.
> >
> > I'm about to disappear for two weeks for a much needed vacation, but
> > I'll be interested to see other folks comments/opinions on this
> > when I get back.
> 
>         This week I got some really good feedback and suggestions
>         from Mark Gross on the kernel interface and usability and
>         I have two new additions for this patch set.  So I spent the week
>         working on a well thought out kernel interface.
> 
>         1) I believe I now have the right kernel interface for a common
>         power management infrastructure.
> 
>         The new kernel interface still uses /sys/power/state to both
>         show the current operating point and set a desired operating
>         point, by name. There is a new /sys/power/operating_points directory
>         that shows all the operating points the system supports. An
>         example from my centrino laptop shows:
> 
>         /sys/power/operating_points/high
>         /sys/power/operating_points/highest
>         /sys/power/operating_points/low
>         /sys/power/operating_points/lowest
>         /sys/power/operating_points/medium
>         /sys/power/operating_points/mem
>         /sys/power/operating_points/standby

What makes you think that mixing operating and sleep states is good
idea?

And '600MHz' makes a lot more sense than 'lowest' on centrino.

>         /sys/power/operating_points/high/frequency
>         /sys/power/operating_points/high/voltage
>         /sys/power/operating_points/high/latency

What is voltage for 'mem'?

>         I've finally had a bit of time to get the sysfs one file - one
>         value system in place for OpPoint.
> 
>         2) The really good news is there is now a power manager for
>         OpPoint,
>         both in rpm and src rpm form.  And since the new power manager runs off
>         the new kernel interface and actually does what the cpuspeed daemon does
>         I think the kernel interface is sound.
> 
>         I took the cpuspeed power manager daemon, version 1.2.1, and modified
>         it Friday to use the oppoint interface. It supports all the
>         options the cpuspeed daemon does, (and can actually still be compiled to
>         be the original cpuspeed daemon) it just uses the interface
>         described above instead of the cpufreq interfaces.

Congratulations, you now have an inferior version of cpufreq ondemand
governor.
						Pavel
-- 
Thanks for all the (sleeping) penguins.

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: So, what's the status on the recent patches here?
  2006-08-20  3:30                 ` Dave Jones
  2006-08-23 18:50                   ` Mark Gross
@ 2006-08-27  4:37                   ` David Singleton
  2006-08-27 15:41                     ` Pavel Machek
                                       ` (2 more replies)
  1 sibling, 3 replies; 136+ messages in thread
From: David Singleton @ 2006-08-27  4:37 UTC (permalink / raw)
  To: Dave Jones; +Cc: linux-pm

On 8/19/06, Dave Jones <davej@redhat.com> wrote:
> On Sat, Aug 19, 2006 at 08:20:45PM -0700, David Singleton wrote:
>
>  > If I had all the existing cpufreq tables transformed
>  > into operating points I could make a patch that would remove
>  > the bulk of cpufreq code from the kernel and you'd have
>  > pretty much the same functionality without the maintenance
>  > issues the added layers and complexity bring.
>
> If this is going to fly at all, I think thats where we need to be headed.
> Having two parts of the kernel doing the same thing just seems
> very wrong to me.
>
> The other alternative as suggested earlier this week would be architectures
> getting to 'opt out' of powerop for their cpufreq drivers where it doesn't
> necessarily bring anything but the layer of indirection.
>
> I'm about to disappear for two weeks for a much needed vacation, but
> I'll be interested to see other folks comments/opinions on this
> when I get back.

        This week I got some really good feedback and suggestions
        from Mark Gross on the kernel interface and usability and
        I have two new additions for this patch set.  So I spent the week
        working on a well thought out kernel interface.

        1) I believe I now have the right kernel interface for a common
        power management infrastructure.

        The new kernel interface still uses /sys/power/state to both
        show the current operating point and set a desired operating
        point, by name. There is a new /sys/power/operating_points directory
        that shows all the operating points the system supports. An
        example from my centrino laptop shows:

        /sys/power/operating_points/high
        /sys/power/operating_points/highest
        /sys/power/operating_points/low
        /sys/power/operating_points/lowest
        /sys/power/operating_points/medium
        /sys/power/operating_points/mem
        /sys/power/operating_points/standby

        In each operating point directory there are three files,
        frequency, voltage and latency.  They show the frequency,
        voltage and transition latency respectively for the operating
        point. An example from my centrino laptop shows:

        /sys/power/operating_points/high/frequency
        /sys/power/operating_points/high/voltage
        /sys/power/operating_points/high/latency

        I've finally had a bit of time to get the sysfs one file - one
        value system in place for OpPoint.
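A minimal user-space sketch of how a daemon might use the interface described above: build the path of a point's attribute file, read its single value, and select a point by writing its name to the state file. The helper names are hypothetical, and the sketch assumes each attribute file holds one decimal integer, which "one file - one value" implies but the mail does not spell out (the units are not given either).

```c
#include <stdio.h>

/* Build "<base>/operating_points/<op>/<attr>"; base would be "/sys/power".
 * Returns 0 on success, -1 if the buffer is too small. */
int op_attr_path(char *buf, size_t len, const char *base,
		 const char *op, const char *attr)
{
	int n = snprintf(buf, len, "%s/operating_points/%s/%s", base, op, attr);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Read the single decimal value from a one-value attribute file.
 * Returns 0 on success, -1 on error. */
int read_long_file(const char *path, long *out)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%ld", out) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/* Write a string, e.g. an operating point name, to a file such as
 * "/sys/power/state". Returns 0 on success, -1 on error. */
int write_string_file(const char *path, const char *s)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = (fputs(s, f) != EOF);
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

Since mem and standby are listed as operating points too, writing either name to the state file would suspend the machine rather than change its speed.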

        2) The really good news is there is now a power manager for
        OpPoint,
        both in rpm and src rpm form.  And since the new power manager runs off
        the new kernel interface and actually does what the cpuspeed daemon does
        I think the kernel interface is sound.

        I took the cpuspeed power manager daemon, version 1.2.1, and modified
        it Friday to use the oppoint interface. It supports all the
        options the cpuspeed daemon does (and can actually still be compiled
        as the original cpuspeed daemon); it just uses the interface
        described above instead of the cpufreq interfaces.

        For anyone out there with a centrino laptop there is now
        a complete solution for oppoint power management.  To try
        the solution you'll only have to add three patches to the
        2.6.18-rc4 kernel and install the oppointd rpm.

        The patches you'll need are:

                oppoint-core.patch
                oppoint-cpufreq.patch
                oppoint-x86-centrino.patch

        The oppointd rpm installs these files on your system:

                /etc/sysconfig/oppoint-powermanagement
                /etc/init.d/oppointd
                /usr/sbin/oppointd
                /usr/local/man/man.1/oppointd

        "/etc/init.d/oppointd start" turns on the daemon and off it
        goes.  The default configuration uses the system load as
        an indicator of which operating point to run.  The more idle the
        system is, the lower the operating point.  When the system
        is unused it quickly drops to the lowest operating point.
        It's the power management policy I like to use for my laptop.

        Within two seconds the system drops to the lowest state if
        the machine is completely idle, and climbs to the highest state
        within two seconds when I start a kernel build.

        The default configuration also uses the ACPI AC adapter state.
        When the AC adapter is not plugged in, the system goes into the
        low state.
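The default policy described above can be sketched in C as follows. This is a hypothetical illustration, not oppointd's code. The idle thresholds are made up, and the idle percentage is assumed to come from two samples of /proc/stat taken one polling interval apart. On battery the sketch simply stays at "low", as the post describes. set_point() writes the chosen name to /sys/power/state, which only accepts these names on a kernel with the OpPoint patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Operating point names from the centrino listing above, lowest first. */
const char *const points[] = { "lowest", "low", "medium", "high", "highest" };

/* Percent of time spent idle between two samples of cumulative (idle, total)
 * CPU time, e.g. jiffies summed from the "cpu" line of /proc/stat. */
int idle_percent(unsigned long long idle0, unsigned long long total0,
                 unsigned long long idle1, unsigned long long total1)
{
    unsigned long long dt = total1 - total0;

    if (dt == 0)
        return 100;                 /* no time elapsed: treat as idle */
    return (int)((idle1 - idle0) * 100 / dt);
}

/* Hypothetical policy: on battery use "low"; on AC, the more idle the
 * system, the lower the operating point.  Thresholds are illustrative. */
const char *choose_point(int idle_pct, bool on_ac)
{
    assert(idle_pct >= 0 && idle_pct <= 100);
    if (!on_ac)
        return points[1];           /* "low" */
    if (idle_pct >= 90)
        return points[0];           /* "lowest" */
    if (idle_pct >= 70)
        return points[1];
    if (idle_pct >= 40)
        return points[2];
    if (idle_pct >= 15)
        return points[3];
    return points[4];               /* "highest", e.g. during a kernel build */
}

/* Request an operating point by name through /sys/power/state. */
int set_point(const char *name)
{
    FILE *f = fopen("/sys/power/state", "w");

    if (!f)
        return -1;
    fputs(name, f);
    return fclose(f) == 0 ? 0 : -1; /* write errors surface at fclose */
}
```

A daemon would call idle_percent() every second or so, pass the result and the AC adapter state to choose_point(), and call set_point() only when the answer changes. The policy never selects "mem" or "standby", because writing those names to /sys/power/state suspends the machine.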

        I have patches for the AMD Elan processor line and the powernow-k6
        processors, but I don't have any hardware to test them on.  If anyone
        would be willing to help me test the Elan, powernow-k6 (and
        soon the powernow-k7) patches, I'd really appreciate it.

        The patchset and rpms are available at:

        http://source.mvista.com/~dsingleton

        There is one last thing I'd like to point out.  This interface
        can support all the thousands of power management policy combinations
        being discussed. It can support each operating state the system can
        support and all the combinations of each state and a set
        of devices on the system being suspended or not.  And none of
        the power management policies need to be in the kernel to
        be supported.

        For example, this interface will support a power management policy
        in which the system runs at the medium state to play MP3s with the
        LCD and USB controller suspended, or leaves the USB controller on
        if it's being used.

        The thousands of combinations of possible power management
        policies belong in the power manager, which can be much more easily
        tailored to the individual platform, whether server, laptop or handheld.

David

>
>                 Dave
>
> --
> http://www.codemonkey.org.uk
>

* Re: So, what's the status on the recent patches here?
  2006-08-26 13:30                               ` Vitaly Wool
@ 2006-08-26 13:46                                 ` Pavel Machek
  2006-08-28 16:40                                   ` Mark Gross
  0 siblings, 1 reply; 136+ messages in thread
From: Pavel Machek @ 2006-08-26 13:46 UTC (permalink / raw)
  To: Vitaly Wool; +Cc: linux-pm

On Sat 2006-08-26 17:30:40, Vitaly Wool wrote:
> Hi Pavel,
> 
> On 8/26/06, Pavel Machek <pavel@suse.cz> wrote:
> >Hi!
> >
> >> >> No. The reason is there's no _real_ difference in 'fileserver' and
> >> >> 'webserver' from the PM POV, so this will never happen.
> >> >> There's no reason to introduce different policies for different use
> >> >> cases which however imply similar peripherals utilization.
> >> >> Moreover, I never play MP3s on a fileserver/webserver. The example
> >> >> you've given is pretty much artificial.
> >> >
> >> >My notebook has 23 different devices. Do you really want to have
> > >8388608 policies for different peripheral utilizations?
> >>
> >> Can you please elaborate on how this number corresponds to the reality?
> >> Looks like you don't catch what I'm saying. I'm talking about use
> >> case-driven model in which you will need to invent 8388608 use cases
> > >> basically in order to have 8388608 policies. IOW, not every combination
> > >> is valid within these 8388608.
> >
> >I'm saying that usecase-driven model is not acceptable for a
> >kernel. It is not kernel's business to limit user to particular usage
> >models.
> >
> >That's why your model works for closed machines like cellphones, but
> >is totally broken for notebook. Sorry.
> 
> Who's talking about the kernel? A policy is a userspace thing. I guess we're
> not quite understanding each other :)

You upload policies to the kernel. You want 5 policies for your cellphone,
and that's fine, but I'm telling you I'd need 8388608 policies for my
notebook, because devices are independent and users want separate
control.
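The figure comes from treating each of the 23 devices as independently either on or suspended (two states per device), which gives

```latex
N = 2^{23} = 8\,388\,608
```

distinct combinations; each additional device doubles the count.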

Because 8388608 policies is clearly not reasonable, powerop cannot
help here, and something better should be developed... like the power
domains someone proposed here.

(Or to put it in other words, powerop forces one big power domain,
which is a bad model for a notebook-style machine.)
								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

* Re: So, what's the status on the recent patches here?
  2006-08-26 10:18                             ` Pavel Machek
@ 2006-08-26 13:30                               ` Vitaly Wool
  2006-08-26 13:46                                 ` Pavel Machek
  0 siblings, 1 reply; 136+ messages in thread
From: Vitaly Wool @ 2006-08-26 13:30 UTC (permalink / raw)
  To: Pavel Machek; +Cc: linux-pm

Hi Pavel,

On 8/26/06, Pavel Machek <pavel@suse.cz> wrote:
> Hi!
>
> > >> No. The reason is there's no _real_ difference in 'fileserver' and
> > >> 'webserver' from the PM POV, so this will never happen.
> > >> There's no reason to introduce different policies for different use
> > >> cases which however imply similar peripherals utilization.
> > >> Moreover, I never play MP3s on a fileserver/webserver. The example
> > >> you've given is pretty much artificial.
> > >
> > >My notebook has 23 different devices. Do you really want to have
> > >8388608 policies for different peripheral utilizations?
> >
> > Can you please elaborate on how this number corresponds to the reality?
> > Looks like you don't catch what I'm saying. I'm talking about use
> > case-driven model in which you will need to invent 8388608 use cases
> > basically in order to have 8388608 policies. IOW, not every combination
> > is valid within these 8388608.
>
> I'm saying that usecase-driven model is not acceptable for a
> kernel. It is not kernel's business to limit user to particular usage
> models.
>
> That's why your model works for closed machines like cellphones, but
> is totally broken for notebook. Sorry.

Who's talking about the kernel? A policy is a userspace thing. I guess we're
not quite understanding each other :)

Vitaly

* Re: So, what's the status on the recent patches here?
  2006-08-25 23:26                           ` Vitaly Wool
@ 2006-08-26 10:18                             ` Pavel Machek
  2006-08-26 13:30                               ` Vitaly Wool
  0 siblings, 1 reply; 136+ messages in thread
From: Pavel Machek @ 2006-08-26 10:18 UTC (permalink / raw)
  To: Vitaly Wool; +Cc: linux-pm

Hi!

> >> No. The reason is there's no _real_ difference in 'fileserver' and
> >> 'webserver' from the PM POV, so this will never happen.
> >> There's no reason to introduce different policies for different use
> >> cases which however imply similar peripherals utilization.
> >> Moreover, I never play MP3s on a fileserver/webserver. The example
> >> you've given is pretty much artificial.
> >
> >My notebook has 23 different devices. Do you really want to have
> >8388608 policies for different peripheral utilizations?
> 
> Can you please elaborate on how this number corresponds to the reality?
> Looks like you don't catch what I'm saying. I'm talking about use
> case-driven model in which you will need to invent 8388608 use cases
> basically in order to have 8388608 policies. IOW, not every combination
> is valid within these 8388608.

I'm saying that usecase-driven model is not acceptable for a
kernel. It is not kernel's business to limit user to particular usage
models.

That's why your model works for closed machines like cellphones, but
is totally broken for notebook. Sorry.
								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

* Re: So, what's the status on the recent patches here?
  2006-08-25 22:03       ` Pavel Machek
@ 2006-08-26  2:21         ` Alan Stern
  0 siblings, 0 replies; 136+ messages in thread
From: Alan Stern @ 2006-08-26  2:21 UTC (permalink / raw)
  To: Pavel Machek; +Cc: linux-pm

On Sat, 26 Aug 2006, Pavel Machek wrote:

> Hi!
> 
> > > Are there some patches to test? I'd like to power down the USB bus,
> > > even when it has a device connected (I do not use the fingerprint
> > > scanner that much).
> > 
> > There are some old patches.  I could update them to the current -mm kernel 
> > and post them next week.
> 
> Yes, that would be great.
> 
> > The idea of the patches is that they will autosuspend a USB hub when it
> > has no active (i.e., unsuspended) children, and autosuspending a root hub
> > stops the USB controller from doing DMA.  However, non-hub devices are not
> > yet automatically suspended, so you will have to suspend the fingerprint
> > scanner by hand.
> 
> That is okay, I can do that. It saves 2 hours of battery life on my
> machine...

Come to think of it, you don't need the autosuspend patches to turn these 
devices off.  You can do it right now with your existing kernel, although 
it's a little easier with -mm.  (The reason is that -mm contains a 
development patch which ties a USB device's interfaces to the device 
itself; suspending the device will automatically suspend all its 
interfaces, and likewise resuming the device will automatically resume all 
its interfaces.  With a vanilla kernel you must manually suspend the 
interfaces before you can suspend the device and manually resume them 
after resuming the device.)

Anyway, you can use the deprecated 

	echo -n 2 >/sys/devices/.../power/state

mechanism to suspend all the USB interfaces, devices, and controllers you 
want -- provided you work your way up from the bottom of the device tree.  
The autosuspend patch just makes it simpler, since it takes care of 
suspending and resuming all the hubs for you.
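A C sketch of the bottom-up ordering described above, under stated assumptions. The caller supplies real /sys/devices paths (not symlinks), so a child's path always extends its parent's. The kernel must still provide the deprecated power/state file. The function names and the example paths in the test are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Depth of a sysfs device path: the number of '/' separators. */
static int path_depth(const char *p)
{
    int d = 0;

    for (; *p; p++)
        if (*p == '/')
            d++;
    return d;
}

/* qsort comparator: deeper paths (children) sort before their parents. */
static int deeper_first(const void *a, const void *b)
{
    const char *pa = *(const char *const *)a;
    const char *pb = *(const char *const *)b;

    return path_depth(pb) - path_depth(pa);
}

/* A descendant's path always extends its ancestor's path, so ordering by
 * depth puts every interface before its device and every device before
 * the hub above it. */
void order_bottom_up(const char **paths, size_t n)
{
    assert(paths != NULL || n == 0);
    qsort(paths, n, sizeof(*paths), deeper_first);
}

/* Same effect as: echo -n 2 > <dev>/power/state (deprecated interface). */
int suspend_device(const char *dev)
{
    char path[512];
    FILE *f;

    snprintf(path, sizeof(path), "%s/power/state", dev);
    f = fopen(path, "w");
    if (!f)
        return -1;
    fputs("2", f);                  /* no trailing newline, like echo -n */
    return fclose(f) == 0 ? 0 : -1;
}

/* Suspend the given devices bottom-up; stop at the first failure. */
int suspend_bottom_up(const char **paths, size_t n)
{
    order_bottom_up(paths, n);
    for (size_t i = 0; i < n; i++)
        if (suspend_device(paths[i]) != 0)
            return -1;
    return 0;
}
```

Sorting by path depth is a simple stand-in for a proper tree walk. It is enough here because, on a vanilla kernel, each interface must be suspended before its device, and each device before its hub.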

Alan Stern

* Re: So, what's the status on the recent patches here?
  2006-08-25 19:55                         ` Pavel Machek
@ 2006-08-25 23:26                           ` Vitaly Wool
  2006-08-26 10:18                             ` Pavel Machek
  0 siblings, 1 reply; 136+ messages in thread
From: Vitaly Wool @ 2006-08-25 23:26 UTC (permalink / raw)
  To: Pavel Machek; +Cc: linux-pm

On 8/25/06, Pavel Machek <pavel@ucw.cz> wrote:
> > No. The reason is there's no _real_ difference in 'fileserver' and
> > 'webserver' from the PM POV, so this will never happen.
> > There's no reason to introduce different policies for different use
> > cases which however imply similar peripherals utilization.
> > Moreover, I never play MP3s on a fileserver/webserver. The example
> > you've given is pretty much artificial.
>
> My notebook has 23 different devices. Do you really want to have
> 8388608 policies for different peripheral utilizations?

Can you please elaborate on how this number corresponds to the reality?
Looks like you don't catch what I'm saying. I'm talking about use
case-driven model in which you will need to invent 8388608 use cases
basically in order to have 8388608 policies. IOW, not every combination
is valid within these 8388608.

Vitaly

* Re: So, what's the status on the recent patches here?
@ 2006-08-25 22:11 Woodruff, Richard
  0 siblings, 0 replies; 136+ messages in thread
From: Woodruff, Richard @ 2006-08-25 22:11 UTC (permalink / raw)
  To: Alan Stern; +Cc: linux-pm, Pavel Machek

> Note that you don't need to replace one device with another having the
> same PID/VID.  It could be the very same device but with new media loaded.
> That would be just as bad.

Yes, I see your point this time.  Thanks.  If I'm hooking up to a modem
inside a phone via an internal transceiver-less link it has a chance,
but not so well with a USB card reader.

Completely generalized solutions in the power domain seem pretty hard to
come by.  You end up with 'if this class of device and not that class of
device' when you try and optimize.  One way of slicing it up is with
discrete sets...kind of like operating point parameters :)

Regards,
Richard W.

* Re: So, what's the status on the recent patches here?
  2006-08-25 21:46     ` Alan Stern
@ 2006-08-25 22:03       ` Pavel Machek
  2006-08-26  2:21         ` Alan Stern
  0 siblings, 1 reply; 136+ messages in thread
From: Pavel Machek @ 2006-08-25 22:03 UTC (permalink / raw)
  To: Alan Stern; +Cc: linux-pm

Hi!

> > Are there some patches to test? I'd like to power down the USB bus,
> > even when it has a device connected (I do not use the fingerprint
> > scanner that much).
> 
> There are some old patches.  I could update them to the current -mm kernel 
> and post them next week.

Yes, that would be great.

> The idea of the patches is that they will autosuspend a USB hub when it
> has no active (i.e., unsuspended) children, and autosuspending a root hub
> stops the USB controller from doing DMA.  However, non-hub devices are not
> yet automatically suspended, so you will have to suspend the fingerprint
> scanner by hand.

That is okay, I can do that. It saves 2 hours of battery life on my
machine...
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

* Re: So, what's the status on the recent patches here?
  2006-08-25 21:27   ` Pavel Machek
@ 2006-08-25 21:46     ` Alan Stern
  2006-08-25 22:03       ` Pavel Machek
  0 siblings, 1 reply; 136+ messages in thread
From: Alan Stern @ 2006-08-25 21:46 UTC (permalink / raw)
  To: Pavel Machek; +Cc: linux-pm

On Fri, 25 Aug 2006, Pavel Machek wrote:

> Hi!
> 
> > > The frequency of entering this state should not interfere with my active
> > > use case.
> > > 
> > > -B- After I've put down the USB device, I now can program the internal
> > > SOC bus wrapper for the USB to allow idling of the interconnect.  I also
> > > need to associate the USB remote wake interrupt with a wake up interrupt
> > > to restart my interconnect.  All devices on that interconnect must be in
> > > the same state for the big savings to happen.
> > > 
> > > Certainly for this embedded system, not coordinating the device states
> > > means I can't get the big power savings.
> > 
> > Part of this programming has to be done in the architecture-specific
> > driver for the interconnect.  There already is code being developed to
> > suspend USB buses when they aren't in use (although determining _when_
> > they aren't in use has not yet been implemented).  However this code stops
> 
> Are there some patches to test? I'd like to power down the USB bus,
> even when it has a device connected (I do not use the fingerprint
> scanner that much).

There are some old patches.  I could update them to the current -mm kernel 
and post them next week.

The idea of the patches is that they will autosuspend a USB hub when it
has no active (i.e., unsuspended) children, and autosuspending a root hub
stops the USB controller from doing DMA.  However, non-hub devices are not
yet automatically suspended, so you will have to suspend the fingerprint
scanner by hand.
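The autosuspend rule can be modelled with a small tree. This is a hypothetical C sketch, not the code in the patches: struct usb_tree_node and its fields are invented for illustration. A hub is suspended once none of its children are active. Non-hub devices, like the fingerprint scanner, are only suspended by hand.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHILDREN 8

/* Hypothetical in-memory model of a USB device tree. */
struct usb_tree_node {
    const char *name;
    bool is_hub;
    bool suspended;
    struct usb_tree_node *children[MAX_CHILDREN];
    size_t nchildren;
};

/* Walk the tree bottom-up.  A hub is autosuspended once it has no active
 * (unsuspended) children; non-hub devices keep whatever state they were
 * given by hand.  Returns whether the node is suspended afterwards. */
bool autosuspend(struct usb_tree_node *n)
{
    bool all_children_idle = true;

    assert(n != NULL);
    for (size_t i = 0; i < n->nchildren; i++)
        if (!autosuspend(n->children[i]))   /* visit every child, no short-circuit */
            all_children_idle = false;

    if (n->is_hub && all_children_idle)
        n->suspended = true;
    return n->suspended;
}

/* As described above, a suspended root hub means no controller DMA. */
bool controller_dma_stopped(const struct usb_tree_node *root_hub)
{
    return root_hub->suspended;
}
```

With the scanner active the root hub stays up; once the scanner is suspended by hand, the next pass suspends the root hub and the controller stops doing DMA.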

Alan Stern

* Re: So, what's the status on the recent patches here?
  2006-08-25 21:21 Woodruff, Richard
@ 2006-08-25 21:42 ` Alan Stern
  0 siblings, 0 replies; 136+ messages in thread
From: Alan Stern @ 2006-08-25 21:42 UTC (permalink / raw)
  To: Woodruff, Richard; +Cc: linux-pm, Pavel Machek

On Fri, 25 Aug 2006, Woodruff, Richard wrote:

> > > > ?  Wouldn't suspending the entire bus completely stop the throughput
> > > > of any attached device?
> > >
> > > Not necessarily (right?).  If I shut off VBUS, then yes, I need to
> > > re-enumerate for sure.  One might figure out a clever way of shutting
> > > it off and turning it back on in between host requests.
> > 
> > No, you can't do that.  Without VBUS power there's no reliable way to
> > detect disconnects or media changes.
> 
> That's a good point.  I wonder in this case if it is possible to keep a
> list of what was there.  When you re-power, if it's not there, it should
> be like it was disconnected, then act accordingly.  If it is there, all
> is OK.  It seems unlikely that the same PID/VID device would have
> replaced it.  New devices would show up as not being there before.
> 
> Coding that all up would likely be a bunch of work assuming it was
> possible.

There have been long discussions about this in the past, mainly focused on 
suspend-to-disk (which generally turns off all power to the USB 
controllers).  They were pretty much inconclusive; it's safe to assume no 
progress will be made on supporting this for a long time, if ever.

Note that you don't need to replace one device with another having the 
same PID/VID.  It could be the very same device but with new media loaded.  
That would be just as bad.

Alan Stern

* Re: So, what's the status on the recent patches here?
  2006-08-25 20:34 ` Alan Stern
@ 2006-08-25 21:27   ` Pavel Machek
  2006-08-25 21:46     ` Alan Stern
  0 siblings, 1 reply; 136+ messages in thread
From: Pavel Machek @ 2006-08-25 21:27 UTC (permalink / raw)
  To: Alan Stern; +Cc: linux-pm

Hi!

> > The frequency of entering this state should not interfere with my active
> > use case.
> > 
> > -B- After I've put down the USB device, I now can program the internal
> > SOC bus wrapper for the USB to allow idling of the interconnect.  I also
> > need to associate the USB remote wake interrupt with a wake up interrupt
> > to restart my interconnect.  All devices on that interconnect must be in
> > the same state for the big savings to happen.
> > 
> > Certainly for this embedded system, not coordinating the device states
> > means I can't get the big power savings.
> 
> Part of this programming has to be done in the architecture-specific
> driver for the interconnect.  There already is code being developed to
> suspend USB buses when they aren't in use (although determining _when_
> they aren't in use has not yet been implemented).  However this code stops

Are there some patches to test? I'd like to power down the USB bus,
even when it has a device connected (I do not use the fingerprint
scanner that much).
								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

* Re: So, what's the status on the recent patches here?
@ 2006-08-25 21:21 Woodruff, Richard
  2006-08-25 21:42 ` Alan Stern
  0 siblings, 1 reply; 136+ messages in thread
From: Woodruff, Richard @ 2006-08-25 21:21 UTC (permalink / raw)
  To: Alan Stern; +Cc: linux-pm, Pavel Machek

> > > ?  Wouldn't suspending the entire bus completely stop the throughput
> > > of any attached device?
> >
> > Not necessarily (right?).  If I shut off VBUS, then yes, I need to
> > re-enumerate for sure.  One might figure out a clever way of shutting
> > it off and turning it back on in between host requests.
> 
> No, you can't do that.  Without VBUS power there's no reliable way to
> detect disconnects or media changes.

That's a good point.  I wonder in this case if it is possible to keep a
list of what was there.  When you re-power, if it's not there, it should
be like it was disconnected, then act accordingly.  If it is there, all
is OK.  It seems unlikely that the same PID/VID device would have
replaced it.  New devices would show up as not being there before.

Coding that all up would likely be a bunch of work assuming it was
possible.


Thanks,
Richard W.
 

* Re: So, what's the status on the recent patches here?
  2006-08-25 20:57 Woodruff, Richard
@ 2006-08-25 21:13 ` Alan Stern
  0 siblings, 0 replies; 136+ messages in thread
From: Alan Stern @ 2006-08-25 21:13 UTC (permalink / raw)
  To: Woodruff, Richard; +Cc: linux-pm, Pavel Machek

On Fri, 25 Aug 2006, Woodruff, Richard wrote:

> > ?  Wouldn't suspending the entire bus completely stop the throughput
> > of any attached device?
> 
> Not necessarily (right?).  If I shut off VBUS, then yes, I need to
> re-enumerate for sure.  One might figure out a clever way of shutting
> it off and turning it back on in between host requests.

No, you can't do that.  Without VBUS power there's no reliable way to
detect disconnects or media changes.

> If I have a USB drive attached, it might well be that the protocol layers
> are smart enough to keep using the device once it is restarted at the low
> layers.  They just have to wait longer for the data to arrive.
> 
> If I just do USB bus suspend the device at the other end can signal
> remote wake up to me.  I tell him to suspend and he agrees to drop to a low
> power state, then I lower current capacity.  When he has some data he
> can let me know and I'll up the VBUS current capacity and then go talk
> to him.  If I have data I want from him I directly wake him up.

Assuming the device has remote wake-up capability.  And assuming the 
latency of a remote wakeup isn't too high.  I tried doing some tests using 
a USB keyboard; when the device was suspended and I woke it up by typing 
on it, nearly every time the first few keystrokes were lost.

> > But this doesn't require any over-reaching global coordination.  All
> > it needs is for each driver to know when it's not being used.
> 
> Applying an activity timer of sorts to each driver 'might' end up in
> situations where you get good power savings.  Assuming everything lines
> up.  However, given hardware and software bugs and errata, it seems
> forcing the situation is much more likely to succeed and make sure you
> hit your targets.

Activity timers might be appropriate for some devices but not for others.  
For instance, a USB mass-storage device will always have a lower-level
driver below the USB driver (for instance, a disk or CD driver that uses
the USB driver for its transport).  Suspending the USB driver can't be
done unless the lower-level driver is suspended first, because it might 
have unexpected side effects such as spinning down a drive.  My point 
being that an inactivity timer might be appropriate at the level of the 
disk or CD driver, but not at the level of a USB mass-storage driver.

> Even with a per-driver activity timer how does one set the time out
> levels for the whole system?  You need some kind of policy pieces to set
> all the knobs.  Letting them self-adjust likely won't work for QoS
> (quality of service type things) unless they are set very
> conservatively.

I would imagine it depends very heavily on the type of system you're 
talking about.  On desktops and laptops, for example, Windows seems to get 
along okay with a small handful of system-level inactivity timers.

Alan Stern

* Re: So, what's the status on the recent patches here?
@ 2006-08-25 20:57 Woodruff, Richard
  2006-08-25 21:13 ` Alan Stern
  0 siblings, 1 reply; 136+ messages in thread
From: Woodruff, Richard @ 2006-08-25 20:57 UTC (permalink / raw)
  To: Alan Stern; +Cc: linux-pm, Pavel Machek

> ?  Wouldn't suspending the entire bus completely stop the throughput
> of any attached device?

Not necessarily (right?).  If I shut off VBUS, then yes, I need to
re-enumerate for sure.  One might figure out a clever way of shutting
it off and turning it back on in between host requests.  If I have a USB
drive attached, it might well be that the protocol layers are smart enough
to keep using the device once it is restarted at the low layers.  They just
have to wait longer for the data to arrive.

If I just do USB bus suspend the device at the other end can signal
remote wake up to me.  I tell him to suspend and he agrees to drop to a low
power state, then I lower current capacity.  When he has some data he
can let me know and I'll up the VBUS current capacity and then go talk
to him.  If I have data I want from him I directly wake him up.

In both cases the extra overhead causes my throughput to drop, but I
still have an effective connection to the device.

At the PM summit Len made a nice observation that you can map all the
processor ACPI states directly onto devices.  You can have T-states with them
if you 'throttle' them (slow down their clock).  You can have P-states
with voltage/frequency changes.  And you can have C-states (1, 2, 3) by
shutting down devices in between accesses, etc.  You can likely have a
single state representation which covers processors and devices.  You
might even implement 'race to idle' conditions at the devices.  Get the
work done then shut off to as low a state as you can and still have
acceptable latency.  This was what sparked the 'on-ness' discussion a
while back on this list.

 
> > The frequency of entering this state should not interfere with my active
> > use case.
> >
> > -B- After I've put down the USB device, I now can program the internal
> > SOC bus wrapper for the USB to allow idling of the interconnect.  I also
> > need to associate the USB remote wake interrupt with a wake up interrupt
> > to restart my interconnect.  All devices on that interconnect must be in
> > the same state for the big savings to happen.
> >
> > Certainly for this embedded system, not coordinating the device states
> > means I can't get the big power savings.
> 
> Part of this programming has to be done in the architecture-specific
> driver for the interconnect.  There already is code being developed to
> suspend USB buses when they aren't in use (although determining _when_
> they aren't in use has not yet been implemented).  However this code stops
> at the level of the USB controller.  Further development is being stymied
> by lack of information about how to detect the controller's wake-up events
> on regular desktop systems; it's possible someone might implement this
> first on an embedded platform.

That seems possible.

> But this doesn't require any over-reaching global coordination.  All
> it needs is for each driver to know when it's not being used.

Applying an activity timer of sorts to each driver 'might' end up in
situations where you get good power savings.  Assuming everything lines
up.  However, given hardware and software bugs and errata, it seems
forcing the situation is much more likely to succeed and make sure you
hit your targets.

Even with a per-driver activity timer how does one set the time out
levels for the whole system?  You need some kind of policy pieces to set
all the knobs.  Letting them self-adjust likely won't work for QoS
(quality of service type things) unless they are set very
conservatively.

Thanks,
Richard W.

* Re: So, what's the status on the recent patches here?
  2006-08-25 20:22 Woodruff, Richard
@ 2006-08-25 20:34 ` Alan Stern
  2006-08-25 21:27   ` Pavel Machek
  0 siblings, 1 reply; 136+ messages in thread
From: Alan Stern @ 2006-08-25 20:34 UTC (permalink / raw)
  To: Woodruff, Richard; +Cc: linux-pm, Pavel Machek

On Fri, 25 Aug 2006, Woodruff, Richard wrote:

> There are two sides to this in the case for the embedded processor I'm
> using.
> 
> -A- you have to put the USB bus into suspend mode.
> 	- This will lower the throughput of the device on the other end.

?  Wouldn't suspending the entire bus completely stop the throughput of 
any attached device?

> The frequency of entering this state should not interfere with my active
> use case.
> 
> -B- After I've put down the USB device, I now can program the internal
> SOC bus wrapper for the USB to allow idling of the interconnect.  I also
> need to associate the USB remote wake interrupt with a wake up interrupt
> to restart my interconnect.  All devices on that interconnect must be in
> the same state for the big savings to happen.
> 
> Certainly for this embedded system, not coordinating the device states
> means I can't get the big power savings.

Part of this programming has to be done in the architecture-specific
driver for the interconnect.  There already is code being developed to
suspend USB buses when they aren't in use (although determining _when_
they aren't in use has not yet been implemented).  However this code stops
at the level of the USB controller.  Further development is being stymied
by lack of information about how to detect the controller's wake-up events
on regular desktop systems; it's possible someone might implement this
first on an embedded platform.

But this doesn't require any over-reaching global coordination.  All it 
needs is for each driver to know when it's not being used.

Alan Stern

* Re: So, what's the status on the recent patches here?
@ 2006-08-25 20:22 Woodruff, Richard
  2006-08-25 20:34 ` Alan Stern
  0 siblings, 1 reply; 136+ messages in thread
From: Woodruff, Richard @ 2006-08-25 20:22 UTC (permalink / raw)
  To: Pavel Machek; +Cc: linux-pm

> > > No, it is because USB enabled prevents cpu from sleeping; it is
> > > actually well known.
> >
> > I vaguely recall hearing the why.  It has some DMAs which are going on
> > and I suppose the processor must service the completions.
> > Now, if you coordinated with the USB device somehow, you could try
> > and
> 
> If I coordinated with USB device somehow, I'd know when it is possible
> to shut off the USB bus. This can be done locally in the USB driver, no need
> for big framework. Just someone needs to write that code.

There are two sides to this in the case of the embedded processor I'm
using.

-A- you have to put the USB bus into suspend mode.
	- This will lower the throughput of the device on the other end.
The frequency of entering this state should not interfere with my active
use case.

-B- After I've put down the USB device, I now can program the internal
SOC bus wrapper for the USB to allow idling of the interconnect.  I also
need to associate the USB remote wake interrupt with a wake up interrupt
to restart my interconnect.  All devices on that interconnect must be in
the same state for the big savings to happen.

Certainly for this embedded system, not coordinating the device states
means I can't get the big power savings.

Now, programming up millions of combinations is not feasible.  However,
you can profile your usage and target common use cases, for instance
playing MP3s or reading a document.

Thanks,
Richard W.

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: So, what's the status on the recent patches here?
  2006-08-24 14:52 Woodruff, Richard
@ 2006-08-25 19:58 ` Pavel Machek
  0 siblings, 0 replies; 136+ messages in thread
From: Pavel Machek @ 2006-08-25 19:58 UTC (permalink / raw)
  To: Woodruff, Richard; +Cc: linux-pm

Hi!

> > For notebooks, devices *are* islands. powerop tries to push
> > everything-depends-on-everything model that may be good for some SoC,
> > but sucks for notebooks. We need some middle ground.
> 
> USB being enabled and causing your laptop battery to dry up is a case
> where laptop device dependency has been shown.  There are likely many
> more cases.  I would expect BIOS/chip set developers are all too aware
> of these in their sub-domains.

No, it is because USB enabled prevents cpu from sleeping; it is
actually well known.
								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: So, what's the status on the recent patches here?
  2006-08-24 12:58                       ` Vitaly Wool
@ 2006-08-25 19:55                         ` Pavel Machek
  2006-08-25 23:26                           ` Vitaly Wool
  0 siblings, 1 reply; 136+ messages in thread
From: Pavel Machek @ 2006-08-25 19:55 UTC (permalink / raw)
  To: Vitaly Wool; +Cc: linux-pm

On Thu 2006-08-24 16:58:50, Vitaly Wool wrote:
> On 8/23/06, Pavel Machek <pavel@ucw.cz> wrote:
> >> >
> >> >This seems to be too specific to embedded machine.
> >> >
> >> >If userspace wants to work with usb and play mp3s at the same time,
> >> >what does it do?
> >>
> >> I guess it just defines an appropriate policy. You can call it
> >> 'usb_mp3' if you wish ;)
> >> I don't think it's too embedded-specific.
> >
> >Well, it leads to exponential number of policies -- not nice. Having
> >usb_mp3_fileserver_webserver is not nice.
> 
> No. The reason is there's no _real_ difference in 'fileserver' and
> 'webserver' from the PM POV, so this will never happen.
> There's no reason to introduce different policies for different use
> cases which however imply similar peripherals utilization.
> Moreover, I never play MP3s on a fileserver/webserver. The example
> you've given is pretty much artificial.

My notebook has 23 different devices. Do you really want to have
8388608 policies for different peripheral utilizations?
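The figure is 2^23: if each of the 23 devices is only on or off, every subset of active devices is a distinct utilization pattern. One policy per pattern therefore needs 2^n entries, while independent per-device knobs (as Pavel suggests elsewhere in the thread) need only n. The function names below are illustrative.

```c
/* One named policy per combination of on/off devices: 2^n entries. */
static unsigned long combined_policies(unsigned n_devices)
{
    return 1UL << n_devices;  /* valid for n_devices < 32 */
}

/* One independent on/off knob per device: n entries. */
static unsigned long independent_knobs(unsigned n_devices)
{
    return n_devices;
}
```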

								Pavel

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: So, what's the status on the recent patches here?
@ 2006-08-24 14:52 Woodruff, Richard
  2006-08-25 19:58 ` Pavel Machek
  0 siblings, 1 reply; 136+ messages in thread
From: Woodruff, Richard @ 2006-08-24 14:52 UTC (permalink / raw)
  To: Pavel Machek; +Cc: linux-pm

> > > ...which is very bad interface for applications. See my other
> > > mail. Applications should not have to play with fast/medium/slow,
> > > explicitely. Instead, on opening /dev/dsp, you should power up the
> > > sound system (and maybe adjust cpu frequency if
> > > neccessary). Application should not have to do echo fast > somewhere
> > > before opening /dev/dsp
> >
> > How does /dev/dsp know at what level it can run at?  On the SOC I
> > control the speed of the DSP.  I can adjust its MIPs rate.
> 
> (I meant /dev/dsp -- OSS audio device, not Digital Signal Processor).

It can be all the same.

The device behind /dev/dsp doing the work likely is the main control
processor or it is a DSP (and some side mixer chip or not).  The DSP in
the laptop case might be hidden inside some PCI composite device and
present its own interface.  Thus you may treat it as some discrete
device with one register range.  On the SOC it is all unrolled and you
must control all the pieces individually (and in concert with each
other).  The DSP in both cases may have its own OS also running.  This
is generally what you download as firmware.

When I want to frequency and/or voltage scale I must take into account
what the DSP is doing and what the applications processor is doing.
This is not so different from today's SMP/Core-Duo type systems where
both CPUs are in the same voltage plane.  You can't change one without
affecting the other.

The internal busses in SOCs wrap all these integrated peripherals in a
common way and add power hooks.  This allows them to achieve massive
power optimizations which are not likely possible in the PC world.

> > A missing pieces is meaningful coordination between devices.  Each
> > device is not an island.  Not taking care of all devices on the internal
> > interconnects may mean you don't get the big power savings.  For the DSP
> 
> For notebooks, devices *are* islands. powerop tries to push
> everything-depends-on-everything model that may be good for some SoC,
> but sucks for notebooks. We need some middle ground.

USB being enabled and causing your laptop battery to dry up is a case
where laptop device dependency has been shown.  There are likely many
more cases.  I would expect BIOS/chip set developers are all too aware
of these in their sub-domains.

On a PC it might be hardware bugs and software bugs which cause some
of the problems.  This is the case for embedded also.  An embedded SOC
does have another dimension in that it is designed to have global
system power states which include all devices (a processor is just
another device and there may be many).  Its high level of integration
enables this.  Linux's device model doesn't match up well with this.
Various vendors do this in X different ways.
PowerOP at the low level provides a mechanism to abstract these
differences.

Thanks,
Richard W.

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: So, what's the status on the recent patches here?
  2006-08-23 12:28                     ` Pavel Machek
  2006-08-23 15:26                       ` Igor Stoppa
@ 2006-08-24 12:58                       ` Vitaly Wool
  2006-08-25 19:55                         ` Pavel Machek
  1 sibling, 1 reply; 136+ messages in thread
From: Vitaly Wool @ 2006-08-24 12:58 UTC (permalink / raw)
  To: Pavel Machek; +Cc: linux-pm

On 8/23/06, Pavel Machek <pavel@ucw.cz> wrote:
> > >
> > >This seems to be too specific to embedded machine.
> > >
> > >If userspace wants to work with usb and play mp3s at the same time,
> > >what does it do?
> >
> > I guess it just defines an appropriate policy. You can call it
> > 'usb_mp3' if you wish ;)
> > I don't think it's too embedded-specific.
>
> Well, it leads to exponential number of policies -- not nice. Having
> usb_mp3_fileserver_webserver is not nice.

No. The reason is there's no _real_ difference in 'fileserver' and
'webserver' from the PM POV, so this will never happen.
There's no reason to introduce different policies for different use
cases which nonetheless imply similar peripheral utilization.
Moreover, I never play MP3s on a fileserver/webserver. The example
you've given is pretty much artificial.

Vitaly

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: So, what's the status on the recent patches here?
  2006-08-19  6:10           ` David Singleton
  2006-08-22  2:13             ` Greg KH
  2006-08-23 19:05             ` Mark Gross
@ 2006-08-24 12:39             ` Pavel Machek
  2 siblings, 0 replies; 136+ messages in thread
From: Pavel Machek @ 2006-08-24 12:39 UTC (permalink / raw)
  To: David Singleton; +Cc: linux-pm

Hi!


> >> This adds a whole bunch of new code, and doesn't seem to make any
> >> existing code any simpler (to me at least).  From a cpufreq point of view,
> >> what does adding this buy us? What problem do we have today that is
> >> being solved by all this?
> 
> Greg and Dave,
> 
>                 there are two competing patch sets for a new power management
>        framework.   The patch set I sent out simplifies power management,
>        from both the cpufreq perspective and the embedded world's view of
>        power management.
> 
>                I've renamed my patch oppoint so as not confuse it
>        with the powerop set from Matt Locke (which will probably make
>        it even more confusing).  I've renamed it so it can be seen as an
>        alternative design approach, not just an alternative implementation
>        of the same ideas.  I've also incorporated suggestions from
>        Pavel in cleaning up the original patches.
> 
>                If you'd be willing to take a look at, or try out, the patches
>        in my patch set you should be able to see how oppoint could simplify
>        cpufreq code.  The first patch is the oppoint-cpufreq.patch and
>        the second is the oppoint-x86-centrino.patch.
> 
>                Oppoint could replace large pieces of the cpufreq code
>        in the kernel, most notably the policy and governor code, which I
>        believe belongs in user space in the power manager daemon.

I was told (by intel folks) that you can't push governor code into
userspace, because it is latency-critical on new cpus... so I do not
think this is going to simplify cpufreq.
								Pavel

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: So, what's the status on the recent patches here?
  2006-08-24 12:16 Woodruff, Richard
@ 2006-08-24 12:29 ` Pavel Machek
  0 siblings, 0 replies; 136+ messages in thread
From: Pavel Machek @ 2006-08-24 12:29 UTC (permalink / raw)
  To: Woodruff, Richard; +Cc: linux-pm


> > > I have some notion that a policy manager can create a state with simple
> > > & general names like fast, medium, slow (whatever) which is the
> > > interface in which applications might speak.  A complex policy
> > > manager
> > 
> > ...which is very bad interface for applications. See my other
> > mail. Applications should not have to play with fast/medium/slow,
> > explicitely. Instead, on opening /dev/dsp, you should power up the
> > sound system (and maybe adjust cpu frequency if
> > neccessary). Application should not have to do echo fast > somewhere
> > before opening /dev/dsp
> 
> How does /dev/dsp know at what level it can run at?  On the SOC I
> control the speed of the DSP.  I can adjust its MIPs rate.  

(I meant /dev/dsp -- OSS audio device, not Digital Signal Processor). 

> A missing pieces is meaningful coordination between devices.  Each
> device is not an island.  Not taking care of all devices on the internal
> interconnects may mean you don't get the big power savings.  For the DSP

For notebooks, devices *are* islands. powerop tries to push
everything-depends-on-everything model that may be good for some SoC,
but sucks for notebooks. We need some middle ground.  


^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: So, what's the status on the recent patches here?
@ 2006-08-24 12:16 Woodruff, Richard
  2006-08-24 12:29 ` Pavel Machek
  0 siblings, 1 reply; 136+ messages in thread
From: Woodruff, Richard @ 2006-08-24 12:16 UTC (permalink / raw)
  To: Pavel Machek; +Cc: linux-pm

> > To some extent having lots of specific policies in the embedded space is
> > inevitable.  The hardware is very tightly coupled.  You may have
> 
> Maybe. But you certainly do not have to export that uglyness to
> userspace. 

Some aspects are easier to manage in userspace.  If database-like
operations and such are needed, it's a friendlier place.

Sysfs exporting every gory detail about PCI or USB doesn't seem so far
from this kind of thing.

> > I have some notion that a policy manager can create a state with simple
> > & general names like fast, medium, slow (whatever) which is the
> > interface in which applications might speak.  A complex policy
> > manager
> 
> ...which is very bad interface for applications. See my other
> mail. Applications should not have to play with fast/medium/slow,
> explicitely. Instead, on opening /dev/dsp, you should power up the
> sound system (and maybe adjust cpu frequency if
> neccessary). Application should not have to do echo fast > somewhere
> before opening /dev/dsp

How does /dev/dsp know at what level it can run at?  On the SOC I
control the speed of the DSP.  I can adjust its MIPs rate.  

We do internally have some run-time algorithms on the DSP which allow it
to feed statistics back about how well it is doing: did I drop any
frames, and how close was I to my deadline in decoding a sample?  So
there are some low-level things which can be done.  The DSP currently
has management code in the kernel (bridge driver), and it has a user
space component which can load algorithms and such through the bridge to
do things.

A missing piece is meaningful coordination between devices.  Each
device is not an island.  Not taking care of all devices on the internal
interconnects may mean you don't get the big power savings.  For the DSP
and the control processor to work you need each device enabled to a
sufficient performance level.  Setting them all to high means you don't
get the savings.

Doing this kind of coordination which is very specific to your use case
is difficult to achieve in a generalized fashion.  Splitting some of
this work between user and kernel space can help.
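The DSP feedback described above (dropped frames, slack against the decode deadline) suggests a simple closed loop for choosing the DSP rate. A minimal sketch under stated assumptions: the MIPS steps, the 20% slack threshold, struct dsp_stats and next_step() are all hypothetical, not the bridge driver's interface.

```c
#include <stddef.h>

/* Hypothetical DSP rate steps, in MIPS, lowest first. */
static const unsigned dsp_mips[] = { 100, 200, 300, 400 };
#define NUM_STEPS (sizeof(dsp_mips) / sizeof(dsp_mips[0]))

/* Statistics reported back by the DSP for the last interval. */
struct dsp_stats {
    unsigned dropped_frames;
    unsigned slack_pct;  /* how far ahead of the decode deadline, in percent */
};

/* Step up on any dropped frame; step down only when there is plenty of slack. */
static size_t next_step(size_t cur, const struct dsp_stats *s)
{
    if (s->dropped_frames > 0 && cur + 1 < NUM_STEPS)
        return cur + 1;
    if (s->dropped_frames == 0 && s->slack_pct > 20 && cur > 0)
        return cur - 1;
    return cur;
}
```

This only picks the DSP's own rate; as the mail points out, the control processor and interconnect still have to be raised to match.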

Regards,
Richard W.

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: So, what's the status on the recent patches here?
  2006-08-23 19:20 Woodruff, Richard
@ 2006-08-24  8:03 ` Pavel Machek
  0 siblings, 0 replies; 136+ messages in thread
From: Pavel Machek @ 2006-08-24  8:03 UTC (permalink / raw)
  To: Woodruff, Richard; +Cc: linux-pm

Hi!

> > > I guess it just defines an appropriate policy. You can call it
> > > 'usb_mp3' if you wish ;)
> > > I don't think it's too embedded-specific.
> > 
> > Well, it leads to exponential number of policies -- not nice. Having
> > usb_mp3_fileserver_webserver is not nice.
> 
> To some extent having lots of specific policies in the embedded space is
> inevitable.  The hardware is very tightly coupled.  You may have

Maybe. But you certainly do not have to export that ugliness to
userspace.

> I have some notion that a policy manager can create a state with simple
> & general names like fast, medium, slow (whatever) which is the
> interface in which applications might speak.  A complex policy
> manager

...which is a very bad interface for applications. See my other
mail. Applications should not have to play with fast/medium/slow
explicitly. Instead, on opening /dev/dsp, you should power up the
sound system (and maybe adjust cpu frequency if
necessary). An application should not have to do echo fast > somewhere
before opening /dev/dsp.

								Pavel

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: So, what's the status on the recent patches here?
  2006-08-18 11:48                   ` Amit Kucheria
@ 2006-08-24  7:59                     ` Pavel Machek
  2006-08-30 11:00                       ` Amit Kucheria
  0 siblings, 1 reply; 136+ messages in thread
From: Pavel Machek @ 2006-08-24  7:59 UTC (permalink / raw)
  To: Amit Kucheria; +Cc: linux-pm

Hi!

> > > The userspace interface in Eungeny's patches is for other userspace
> > > programs (policy managers) to activate/deactivate valid operating points
> > > in the system dynamically and if necessary, introduce new ones into the
> > > system. It will also allow the operating points to be referenced by name
> > > instead of the tuple.
> > > 
> > > Then, we will be able to use names like 'video', 'mp3', 'fast',
> > > 'powersave', 'usb' to switch to the relevant operating point based on
> > > configuration of the policy manager.
> > 
> > This seems to be too specific to embedded machine.
> > 
> > If userspace wants to work with usb and play mp3s at the same time,
> > what does it do?
> 
> Switch to 'fast'?
> 
> The operating point for a use-case specifies the _minimum_ required for
> the use-case. You can always go up.

> The system designer is responsible for 'designing' operating points that
> take into account multiple use-cases. Designing here refers to mapping
> use-cases to HW operating points.

Yes, and that's why I argue this is unsuitable for notebooks: there are
just too many use cases for a notebook.

> Consider an example system with a main CPU and a DSP. To simplify
> discussion, lets assume 3 levels for CPU and DSP speeds and system
> voltage. Then, here is what an example operating-point to use-case
> mapping table could look like:
> 
> #     CPU speed      DSP speed      Voltage       use-case
> ----------------------------------------------------------
> 1.    high           high           high          fast, video
> 2.    med            high           high          
> 3.    med            med            med           usb[1]
> 4.    low            med            med           mp3
> 5.    low            low            low           powersave
> 
> [1] USB has voltage constraint (voltage >= med)

So... you take three independent parameters and merge them into one
named parameter. Bad idea.

What about simply having these parameters:

usb on or off

cpu speed (controlled by cpufreq)

dsp speed (controlled by userspace)

Then you can have infrastructure that is able to compute the system
voltage from usb/cpu/dsp speed, and users still have an interface they
can understand.

(How are they supposed to know if video use case is compatible with
usb? They should not have to).
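A minimal sketch of the independent-parameter alternative. It assumes the three-level scale from the example table and that each CPU or DSP speed level needs at least the matching voltage level; that second assumption is ours, although it reproduces every row of the table. The type and function names are hypothetical.

```c
#include <stdbool.h>

/* Three levels, as in the example table. */
enum level { LOW = 0, MED = 1, HIGH = 2 };

/* The three independent knobs. */
struct knobs {
    bool usb_on;     /* usb on or off */
    enum level cpu;  /* cpu speed (cpufreq) */
    enum level dsp;  /* dsp speed (userspace) */
};

static enum level max_level(enum level a, enum level b)
{
    return a > b ? a : b;
}

/*
 * Derive the system voltage instead of naming it: the highest level any
 * knob needs. USB additionally requires voltage >= MED (footnote [1]).
 */
static enum level required_voltage(const struct knobs *k)
{
    enum level v = max_level(k->cpu, k->dsp);
    if (k->usb_on)
        v = max_level(v, MED);
    return v;
}
```

Turning USB on or changing the DSP speed then never requires knowing which named use case is active.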

> - Now if we are playing mp3, we switch to OP 4.

Do you expect all mp3 playing applications to play with
/sys/.../powerop-point? How do you tell if mp3's are playing? These
are hard questions for a notebook.

> - Add usb and we switch to OP 3.
> - Now our performance monitor (e.g load avg) indicates that we need more
> CPU processing. So we switch to OP 2.

That's cpufreq's job, please.
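For comparison, the use-case scheme in the quoted mail can be sketched against its example table. In that scheme each use case names a minimum operating point, and combining use cases means moving up to a point that covers all of them. This illustration is not the powerop interface. It assumes the table is ordered from most to least power, and select_op() is a hypothetical helper.

```c
#include <stddef.h>

enum level { LOW = 0, MED = 1, HIGH = 2 };

struct op {
    const char *name;
    enum level cpu, dsp, volt;
};

/* Example table from the mail, ordered from most to least power. */
static const struct op ops[] = {
    { "OP1", HIGH, HIGH, HIGH },  /* fast, video */
    { "OP2", MED,  HIGH, HIGH },
    { "OP3", MED,  MED,  MED  },  /* usb */
    { "OP4", LOW,  MED,  MED  },  /* mp3 */
    { "OP5", LOW,  LOW,  LOW  },  /* powersave */
};
#define NUM_OPS (sizeof(ops) / sizeof(ops[0]))

/* True if operating point a is at least as high as b on every axis. */
static int op_covers(const struct op *a, const struct op *b)
{
    return a->cpu >= b->cpu && a->dsp >= b->dsp && a->volt >= b->volt;
}

/*
 * mins[] holds the minimum OP index (0-based, must be < NUM_OPS) required
 * by each active use case. Return the lowest-power OP covering all of them.
 */
static size_t select_op(const size_t *mins, size_t n)
{
    size_t best = 0;  /* OP1 covers everything */
    for (size_t i = 0; i < NUM_OPS; i++) {
        int ok = 1;
        for (size_t j = 0; j < n; j++)
            if (!op_covers(&ops[i], &ops[mins[j]]))
                ok = 0;
        if (ok)
            best = i;  /* later entries draw less power */
    }
    return best;
}
```

With this table, mp3 alone selects OP4, mp3 plus usb selects OP3, and with no active use case it falls back to OP5 (powersave).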


^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: So, what's the status on the recent patches here?
@ 2006-08-23 19:20 Woodruff, Richard
  2006-08-24  8:03 ` Pavel Machek
  0 siblings, 1 reply; 136+ messages in thread
From: Woodruff, Richard @ 2006-08-23 19:20 UTC (permalink / raw)
  To: Pavel Machek, Vitaly Wool; +Cc: linux-pm

> > I guess it just defines an appropriate policy. You can call it
> > 'usb_mp3' if you wish ;)
> > I don't think it's too embedded-specific.
> 
> Well, it leads to exponential number of policies -- not nice. Having
> usb_mp3_fileserver_webserver is not nice.

To some extent having lots of specific policies in the embedded space is
inevitable.  The hardware is very tightly coupled.  You may have almost
all the functionality of a PC from some 5 years back on a single chip.  In
that kind of environment, not taking into account the chip as a whole
means your power consumption ends up 10x or even 100x the optimum.  You
don't get the big interconnect savings unless you link all the
individual device states with the processor states.  A 400 mA @ 1.3 V
draw might sound good to a PC-centric person, but when the design target
is 4 mA @ 1.3 V it is not good.

I have some notion that a policy manager can create a state with simple
& general names like fast, medium, slow (whatever) which is the
interface in which applications might speak.  A complex policy manager
will associate this name with device and cpu states in great detail.  A
more general-purpose one need only map it to some governor and its
run-time parameters.
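The general-purpose end of this could be as small as a table mapping each application-visible name to a cpufreq governor and its tunables. A sketch under stated assumptions: performance, ondemand and powersave are existing cpufreq governors, but struct named_state, find_state() and the threshold value are illustrative only.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical mapping from an application-visible name to a governor. */
struct named_state {
    const char *name;       /* what applications ask for */
    const char *governor;   /* cpufreq governor to select */
    unsigned up_threshold;  /* illustrative tunable in percent, 0 = unused */
};

static const struct named_state states[] = {
    { "fast",   "performance", 0  },
    { "medium", "ondemand",    80 },
    { "slow",   "powersave",   0  },
};

/* Look up a state by name; returns NULL if the name is unknown. */
static const struct named_state *find_state(const char *name)
{
    for (size_t i = 0; i < sizeof(states) / sizeof(states[0]); i++)
        if (strcmp(states[i].name, name) == 0)
            return &states[i];
    return NULL;
}
```

A complex policy manager would replace the governor fields with detailed device and CPU states. Applications only ever see the names.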

Regards,
Richard W.

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: So, what's the status on the recent patches here?
  2006-08-19  6:10           ` David Singleton
  2006-08-22  2:13             ` Greg KH
@ 2006-08-23 19:05             ` Mark Gross
  2006-08-24 12:39             ` Pavel Machek
  2 siblings, 0 replies; 136+ messages in thread
From: Mark Gross @ 2006-08-23 19:05 UTC (permalink / raw)
  To: David Singleton; +Cc: linux-pm

On Fri, Aug 18, 2006 at 11:10:02PM -0700, David Singleton wrote:
> On 8/14/06, Greg KH <greg@kroah.com> wrote:
> >On Mon, Aug 14, 2006 at 07:48:01PM -0400, Dave Jones wrote:
> >>
> >> This adds a whole bunch of new code, and doesn't seem to make any
> >> existing code any simpler (to me at least).  From a cpufreq point of view,
> >> what does adding this buy us? What problem do we have today that is
> >> being solved by all this?
> 
> Greg and Dave,
> 
>                 there are two competing patch sets for a new power management
>        framework.   The patch set I sent out simplifies power management,
>        from both the cpufreq perspective and the embedded world's view of
>        power management.

Why can't we evolve a single PowerOP framework?  Both of these
patch sets are derived from the MV/Todd Poynor patches.  It seems "funny"
not to coordinate these two patch sets.

> 
>                I've renamed my patch oppoint so as not confuse it
>        with the powerop set from Matt Locke (which will probably make
>        it even more confusing).  I've renamed it so it can be seen as an
>        alternative design approach, not just an alternative implementation
>        of the same ideas.  I've also incorporated suggestions from
>        Pavel in cleaning up the original patches.
> 
>                If you'd be willing to take a look at, or try out, the patches
>        in my patch set you should be able to see how oppoint could simplify
>        cpufreq code.  The first patch is the oppoint-cpufreq.patch and
>        the second is the oppoint-x86-centrino.patch.

How would the ACPI cpufreq_driver be integrated with this design?

> 
>                Oppoint could replace large pieces of the cpufreq code
>        in the kernel, most notably the policy and governor code, which I
>        believe belongs in user space in the power manager daemon.

How will the users of on-demand make use of this design?
I don't think you can just dump the governor function of CPUFREQ for
user defined performance control.

> 
>                You'll notice that the oppoint-cpufreq.patch only touches
>        two files, cpufreq.c and cpufreq.h. It only creates two new interfaces
>        to the cpufreq frequency scaling notifier lists to support driver pre
>        and post scaling routines, already supported in the kernel.

re-using the cpufreq notification infrastructure makes sense.

>                The oppoint-x86-centrino.patch completes the replacement
>        of cpufreq code by introducing the transition routine to
>        change frequencies and creates operating points for the
>        centrino-speedstep processors already supported by Linux.
> 
>        (although I've recieved a note from Intel that the data I've copied
>        from the centrino-speedstep cpufreq tables is known to be inaccurate
>        and unsupported)
> 
>                This code could replace cpufreq code and simplify it quite a
>        bit in the process.  The kernel drivers that support cpufreq frequency

Only for user mode governors; I believe kernel mode governors still have
a role in Linux.



--mgross

>        scaling would not have to be changed.  Operating points for the rest
>        of the processors that support cpufreq would have to be created, but
>        as you can see it's quite a straight forward transformation from
>        a cpufreq table to a set of operating points for a processor.
> 
>                The entire patch set can be found at:
> 
>                http://source.mvista.com/~dsingleton/2.6.18-rc4/
> 
>        The patch set consists of:
> 
>                oppoint-core.patch
>                oppoint-cpufreq.patch
>                oppoint-x86-centrino.patch
>                oppoint-arm-pxa27x.patch
> 
>        I'll attach oppoint-cpufreq.patch  to this email and
>        send out oppoint-x86-centrino.patch next.
> 
> 
> David
> 
> 
> 
> >>
> >> Every explanation of powerop I've seen so far dives into microdetails,
> >> whilst the 10,000ft view has always passed me by other than "this is
> >> what we've had in the embedded world".
> >>
> >> The diagram at 
> >http://lists.osdl.org/pipermail/linux-pm/2006-August/003196.html
> >> also confuses me.  I was under the impression that powerop was adding additional
> >> userspace interfaces.  If we're not changing how things from a userspace
> >> point of view, we're churning a lot of kernel code,.. why?
> >>
> >> Clue me in here, I'm feeling thick.
> >
> >You're not alone, I really don't get it either.
> >
> >But I guess we'll just wait for the next round of unified patches and
> >then go from there.
> >
> >thanks,
> >
> >greg k-h
> >_______________________________________________
> >linux-pm mailing list
> >linux-pm@lists.osdl.org
> >https://lists.osdl.org/mailman/listinfo/linux-pm
> >



^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: So, what's the status on the recent patches here?
  2006-08-20  3:30                 ` Dave Jones
@ 2006-08-23 18:50                   ` Mark Gross
  2006-08-27  4:37                   ` David Singleton
  1 sibling, 0 replies; 136+ messages in thread
From: Mark Gross @ 2006-08-23 18:50 UTC (permalink / raw)
  To: Dave Jones; +Cc: linux-pm

On Sat, Aug 19, 2006 at 11:30:44PM -0400, Dave Jones wrote:
> On Sat, Aug 19, 2006 at 08:20:45PM -0700, David Singleton wrote:
> 
>  > If I had all the existing cpufreq tables transformed
>  > into operating points I could make a patch that would remove
>  > the bulk of cpufreq code from the kernel and you'd have
>  > pretty much the same functionality without the maintenance
>  > issues the added layers and complexity bring.
> 
> If this is going to fly at all, I think thats where we need to be headed.
> Having two parts of the kernel doing the same thing just seems
> very wrong to me.
> 
> The other alternative as suggested earlier this week would be archictures
> getting to 'opt out' of powerop for their cpufreq drivers where it doesn't
> necessarily bring anything but the layer of indirection.
> 
> I'm about to disappear for two weeks for a much needed vacation, but
> I'll be interested to see other folks comments/opinions on this
> when I get back.

I worry about all the users of ondemand and powernow.  Whatever
happens, it needs to evolve over time.  I don't know how you can
have only one power solution that works for HPC and embedded.

> 
> 		Dave
> 
> -- 
> http://www.codemonkey.org.uk

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: So, what's the status on the recent patches here?
  2006-08-23 12:28                     ` Pavel Machek
@ 2006-08-23 15:26                       ` Igor Stoppa
  2006-08-24 12:58                       ` Vitaly Wool
  1 sibling, 0 replies; 136+ messages in thread
From: Igor Stoppa @ 2006-08-23 15:26 UTC (permalink / raw)
  To: ext Pavel Machek; +Cc: linux-pm

On Wed, 2006-08-23 at 15:28 +0300, ext Pavel Machek wrote:
> > On 8/18/06, Pavel Machek <pavel@ucw.cz> wrote:
> > >> Then, we will be able to use names like 'video', 'mp3', 'fast',
> > >> 'powersave', 'usb' to switch to the relevant operating point based on
> > >> configuration of the policy manager.
> > >
> > >This seems to be too specific to embedded machine.
> > >
> > >If userspace wants to work with usb and play mp3s at the same time,
> > >what does it do?
> >
> > I guess it just defines an appropriate policy. You can call it
> > 'usb_mp3' if you wish ;)
> > I don't think it's too embedded-specific.
> 
> Well, it leads to exponential number of policies -- not nice. Having
> usb_mp3_fileserver_webserver is not nice.

The whole idea is that you have generic "good enough" policies for
unknown case combinations, plus specific policies for well-known cases.
This tends to be simpler for embedded systems, of course.

But even for your laptops you can identify few major use cases, that
incidentally tend to overlap with embedded devices use cases:
-mp3
-video
-browsing
-name your own
...
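A minimal sketch of that scheme, assuming the active use cases are tracked as a bitmask. Well-known combinations get a tuned policy, and anything else falls back to a generic one, so the table grows with the number of known cases rather than with 2^n. The use-case bits and policy names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical use-case bits. */
enum usecase {
    UC_MP3    = 1u << 0,
    UC_VIDEO  = 1u << 1,
    UC_BROWSE = 1u << 2,
    UC_USB    = 1u << 3,
};

struct policy {
    unsigned mask;     /* exact set of active use cases */
    const char *name;
};

/* Only the well-known combinations get a tuned policy. */
static const struct policy known[] = {
    { UC_MP3,          "mp3"     },
    { UC_VIDEO,        "video"   },
    { UC_BROWSE,       "browse"  },
    { UC_MP3 | UC_USB, "usb_mp3" },
};

static const char generic_policy[] = "generic";

/* Exact match first; otherwise fall back to the "good enough" policy. */
static const char *pick_policy(unsigned active)
{
    for (size_t i = 0; i < sizeof(known) / sizeof(known[0]); i++)
        if (known[i].mask == active)
            return known[i].name;
    return generic_policy;
}
```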

-- 
Cheers,
           Igor

Igor Stoppa (Nokia M - OSSO / Tampere)

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: So, what's the status on the recent patches here?
  2006-08-18  5:42                   ` Vitaly Wool
@ 2006-08-23 12:28                     ` Pavel Machek
  2006-08-23 15:26                       ` Igor Stoppa
  2006-08-24 12:58                       ` Vitaly Wool
  0 siblings, 2 replies; 136+ messages in thread
From: Pavel Machek @ 2006-08-23 12:28 UTC (permalink / raw)
  To: Vitaly Wool; +Cc: linux-pm

> On 8/18/06, Pavel Machek <pavel@ucw.cz> wrote:
> >> Then, we will be able to use names like 'video', 'mp3', 'fast',
> >> 'powersave', 'usb' to switch to the relevant operating point based on
> >> configuration of the policy manager.
> >
> >This seems to be too specific to embedded machine.
> >
> >If userspace wants to work with usb and play mp3s at the same time,
> >what does it do?
> 
> I guess it just defines an appropriate policy. You can call it
> 'usb_mp3' if you wish ;)
> I don't think it's too embedded-specific.

Well, it leads to exponential number of policies -- not nice. Having
usb_mp3_fileserver_webserver is not nice.
								Pavel

-- 
Thanks, Sharp!

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: So, what's the status on the recent patches here?
  2006-08-22  2:13             ` Greg KH
@ 2006-08-22  5:20               ` David Singleton
  0 siblings, 0 replies; 136+ messages in thread
From: David Singleton @ 2006-08-22  5:20 UTC (permalink / raw)
  To: Greg KH; +Cc: linux-pm

On 8/21/06, Greg KH <greg@kroah.com> wrote:
> On Fri, Aug 18, 2006 at 11:10:02PM -0700, David Singleton wrote:
> >                Oppoint could replace large pieces of the cpufreq code
> >        in the kernel, most notably the policy and governor code, which I
> >        believe belongs in user space in the power manager daemon.
> >
> >                You'll notice that the oppoint-cpufreq.patch only touches
> >        two files, cpufreq.c and cpufreq.h. It only creates two new interfaces
> >        to the cpufreq frequency scaling notifier lists to support driver pre
> >        and post scaling routines, already supported in the kernel.
> >
> >                The oppoint-x86-centrino.patch completes the replacement
> >        of cpufreq code by introducing the transition routine to
> >        change frequencies and creates operating points for the
> >        centrino-speedstep processors already supported by Linux.
> >
> >        (although I've recieved a note from Intel that the data I've copied
> >        from the centrino-speedstep cpufreq tables is known to be inaccurate
> >        and unsupported)
> >
> >                This code could replace cpufreq code and simplify it quite a
> >        bit in the process.  The kernel drivers that support cpufreq frequency
> >        scaling would not have to be changed.  Operating points for the rest
> >        of the processors that support cpufreq would have to be created, but
> >        as you can see it's quite a straight forward transformation from
> >        a cpufreq table to a set of operating points for a processor.
>
> This only touches on the cpu frequency stuff.  I am assuming that the
> current driver interface to the different power management states is
> acceptable to you?

   Yes.  The driver interface works perfectly with the operating point design.
The power manager can simply set whatever operating point it wants the system
to be in and still have the flexibility to suspend individual devices through
the current driver interface.  Drivers do not have to be changed in any way
to operate correctly with the operating point model.

     The cpufreq driver scaling code doesn't need to be changed either.  OpPoint
calls the same scaling routines through the same notifier chain as
cpufreq does.

     Those are two of the big advantages of the OpPoint design:
the driver interface doesn't need to change, and the existing driver frequency
scaling code doesn't need to change either.

     That makes sense, since the operating point design performs the same
functionality, just in a simpler manner.

      I still have to write a power manager for this, so that all the
policy/class stuff being discussed now can be plug-ins for the power
manager.  You'd set your power management classes and policies up in the
power manager, whether it's performance, energy efficiency, thermal
constraints, or battery constraints, and let it simply set operating states
and handle individual devices as it sees fit.

      That's really where all the policy/class code belongs: in the power
manager.  OpPoint just provides a simpler interface for the power manager.
The operating points are set by their name, and the device control works
just as it does today through the sysfs interface.
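A minimal user-space sketch of the step described above, assuming (as stated elsewhere in this thread) that writing an operating point name such as "low" or "highest" to /sys/power/state selects it under the OpPoint patches. The helper name set_operating_point() is hypothetical; the path is a parameter so the sketch can also be exercised against an ordinary file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper for a user-space power manager: select an operating
 * point by writing its name to the state file.  Under the OpPoint patches
 * that file is /sys/power/state; the path is a parameter so the helper can
 * also be pointed at an ordinary file for testing.
 * Returns 0 on success, -1 on any I/O error.
 */
int set_operating_point(const char *state_path, const char *name)
{
    /* The name must be a single token such as "low" or "highest". */
    assert(name != NULL && strchr(name, '\n') == NULL);

    FILE *f = fopen(state_path, "w");
    if (f == NULL)
        return -1;

    /* Write "name\n", the same bytes `echo low > /sys/power/state` sends. */
    int ok = fprintf(f, "%s\n", name) > 0;

    /* fclose() flushes the buffer, so a rejected write shows up here. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

A power manager would call set_operating_point("/sys/power/state", "low") and keep using the existing per-device sysfs files for device control, which is the split described in the message.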

David


>
> thanks,
>
> greg k-h
>


* Re: So, what's the status on the recent patches here?
  2006-08-19  6:10           ` David Singleton
@ 2006-08-22  2:13             ` Greg KH
  2006-08-22  5:20               ` David Singleton
  2006-08-23 19:05             ` Mark Gross
  2006-08-24 12:39             ` Pavel Machek
  2 siblings, 1 reply; 136+ messages in thread
From: Greg KH @ 2006-08-22  2:13 UTC (permalink / raw)
  To: David Singleton; +Cc: linux-pm

On Fri, Aug 18, 2006 at 11:10:02PM -0700, David Singleton wrote:
>                Oppoint could replace large pieces of the cpufreq code
>        in the kernel, most notably the policy and governor code, which I
>        believe belongs in user space in the power manager daemon.
>
>                You'll notice that the oppoint-cpufreq.patch only touches
>        two files, cpufreq.c and cpufreq.h. It only creates two new
>        interfaces to the cpufreq frequency scaling notifier lists to
>        support driver pre- and post-scaling routines, already supported
>        in the kernel.
>
>                The oppoint-x86-centrino.patch completes the replacement
>        of cpufreq code by introducing the transition routine to
>        change frequencies and creates operating points for the
>        centrino-speedstep processors already supported by Linux.
>
>        (although I've received a note from Intel that the data I've copied
>        from the centrino-speedstep cpufreq tables is known to be inaccurate
>        and unsupported)
>
>                This code could replace cpufreq code and simplify it quite a
>        bit in the process.  The kernel drivers that support cpufreq
>        frequency scaling would not have to be changed.  Operating points
>        for the rest of the processors that support cpufreq would have to
>        be created, but as you can see it's quite a straightforward
>        transformation from a cpufreq table to a set of operating points
>        for a processor.

This only touches on the cpu frequency stuff.  I am assuming that the
current driver interface to the different power management states is
acceptable to you?

thanks,

greg k-h


* Re: So, what's the status on the recent patches here?
@ 2006-08-20 13:36 Woodruff, Richard
  0 siblings, 0 replies; 136+ messages in thread
From: Woodruff, Richard @ 2006-08-20 13:36 UTC (permalink / raw)
  To: David Singleton; +Cc: linux-pm

> 
>                 Oppoint could replace large pieces of the cpufreq code
>         in the kernel, most notably the policy and governor code, which I
>         believe belongs in user space in the power manager daemon.

I've not actually looked at the CPUFreq implementation to know how all
this maps...

In general policy is better in user space and depending on your system
it might all live there.

However, when response time counts, it can be necessary to have a level
of algorithm be executed in kernel space.  Some might associate this
level with policy.

Cpufreq has both user and kernel space governors.  For sure the choice
of which governor to use, in which context it executes, and tracking of
its performance likely would be easiest in user space.
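To make the user-space side of that concrete: cpufreq exposes the available governors in /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors (a space-separated list) and selects one when its name is written to scaling_governor in the same directory. The sketch below shows only the check a user-space manager would do before switching; the function name governor_available() is hypothetical.

```c
#include <assert.h>
#include <string.h>

/*
 * Return 1 if gov appears as a whole word in the space-separated list a
 * user-space manager reads from
 * /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors,
 * else 0.  Selecting the governor is then a matter of writing its name
 * to scaling_governor in the same directory.
 */
int governor_available(const char *list, const char *gov)
{
    size_t n = strlen(gov);
    const char *p = list;

    /* Governor names are single, non-empty words. */
    assert(n > 0 && strchr(gov, ' ') == NULL);

    while ((p = strstr(p, gov)) != NULL) {
        int starts_word = (p == list) || p[-1] == ' ';
        char after = p[n];
        int ends_word = after == '\0' || after == ' ' || after == '\n';

        if (starts_word && ends_word)
            return 1;
        p += n;   /* skip a partial match such as "demand" in "ondemand" */
    }
    return 0;
}
```

With the "userspace" governor selected this way, the frequency choice itself is handed to a daemon, while a kernel governor such as "ondemand" keeps the fast-response loop in the kernel, which is the split discussed above.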

Regards,
Richard W.  


* Re: So, what's the status on the recent patches here?
  2006-08-18 21:05                         ` Alexey Starikovskiy
@ 2006-08-20 13:19                           ` Igor Stoppa
  0 siblings, 0 replies; 136+ messages in thread
From: Igor Stoppa @ 2006-08-20 13:19 UTC (permalink / raw)
  To: ext Alexey Starikovskiy
  Cc: linux-pm, ext Pavel Machek, Kucheria Amit (Nokia-M/Tampere)

On Sat, 2006-08-19 at 00:05 +0300, ext Alexey Starikovskiy wrote:
> Igor Stoppa wrote:
> > On Fri, 2006-08-18 at 18:29 +0300, ext Alexey Starikovskiy wrote:
> >> Igor Stoppa wrote:
> >>> On Fri, 2006-08-18 at 00:39 +0300, ext Pavel Machek wrote:
> >>>> Hi!
> >>>>
> >>>>>> If there are dependencies inherently linking core1 and core2, cpufreq
> >>>>>> should already be programming both parts. For example, the SA1100
> >>>>>> driver programs both CPU and SDRAM controller.  If there isn't any
> >>>>>> dependency between them, I don't see the attraction of creating an
> >>>>>> artificial one in the way suggested for no real purpose.
> >>>>>>
> >>>>>> Things like voltage and frequency are closely tied together, so
> >>>>>> offering any means of controlling them independently makes no sense
> >>>>>> afaics.
> >>>>> Yet a certain subsystem (for example an onboard camera, in a phone)
> >>>>> might require a higher voltage when it's active, effectively loosening
> >>>>> the tight coupling between freq and voltage that the processor is
> >>>>> enforcing.
> >>>> So... you expect userland to echo high > state before camera can be
> >>>> used?
> >>>>
> >>>> I'd rather have kernel automagically up the voltage when /dev/video0
> >>>> is opened...
> >>> Not really, I meant that the CPU is not the only customer of power
> >>> domains (depending on the HW design), so the relation freq <-> voltage
> >>> is not always true.
> >>>
> >> Then you need to introduce power domains and associate your devices
> >> with them, don't you?
> >> So if your camera appears in the same domain with CPU, the voltage of
> >> that domain will go up either with camera=on, or CPU going to higher
> >> frequency.
> >
> > I used the expression "power domain" to refer to a generic domain,
> > either voltage or frequency, to indicate that changing either freq or
> > voltage in a domain implies changing the domain power level.
> >
> > Of course it is changing linearly with frequency, while the dependency
> > on voltage is quadratic.
> >
> > So in the camera example we might have 2 different cases:
> >
> > -the one mentioned above, where the camera shares the same voltage
> > domain with CPU and the correlation is the one you described
> >
> > -another case where the clock frequency provided to the camera is
> > related to the resolution being used
> >
> > camera off => no constraints
> > low res => low freq, high voltage
> > high res => high freq, high voltage
> >
> > in such case the currently active resolution would affect whatever
> > device shares the camera clock, if any.
> >
> > But no need to introduce power domains.
> >
> How about introducing a frequency domain as well?

The clock framework deals with clock correlations and dependencies.
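The "linear in frequency, quadratic in voltage" remark quoted above follows from the usual first-order CMOS model, where dynamic power is roughly C * V^2 * f. A small sketch, assuming constant switched capacitance and ignoring leakage, that compares two operating points taken from the banias_1400 table in the oppoint-x86-centrino.patch attachment in this thread; struct op_point and relative_dynamic_power() are hypothetical names.

```c
#include <assert.h>

/* One frequency/voltage pair, as listed in the patch's banias_1400 table. */
struct op_point {
    unsigned int mhz;
    unsigned int mv;
};

/*
 * Dynamic CMOS power scales roughly as C * V^2 * f.  With the switched
 * capacitance C assumed constant, the ratio between two operating points
 * is (V1^2 * f1) / (V0^2 * f0).  Leakage power is ignored.
 */
double relative_dynamic_power(struct op_point p, struct op_point ref)
{
    assert(ref.mhz > 0 && ref.mv > 0);
    double v = (double)p.mv / ref.mv;    /* voltage ratio, enters squared */
    double f = (double)p.mhz / ref.mhz;  /* frequency ratio, enters linearly */
    return v * v * f;
}
```

For example, relative_dynamic_power((struct op_point){600, 956}, (struct op_point){1400, 1484}) comes out at roughly 0.18, so under this model the lowest Banias 1.4 GHz point draws under a fifth of the dynamic power of the highest one.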

-- 
Cheers,
           Igor

Igor Stoppa (Nokia M - OSSO / Tampere)


* Re: So, what's the status on the recent patches here?
  2006-08-20  3:20               ` David Singleton
@ 2006-08-20  3:30                 ` Dave Jones
  2006-08-23 18:50                   ` Mark Gross
  2006-08-27  4:37                   ` David Singleton
  0 siblings, 2 replies; 136+ messages in thread
From: Dave Jones @ 2006-08-20  3:30 UTC (permalink / raw)
  To: David Singleton; +Cc: linux-pm

On Sat, Aug 19, 2006 at 08:20:45PM -0700, David Singleton wrote:

 > If I had all the existing cpufreq tables transformed
 > into operating points I could make a patch that would remove
 > the bulk of cpufreq code from the kernel and you'd have
 > pretty much the same functionality without the maintenance
 > issues the added layers and complexity bring.

If this is going to fly at all, I think that's where we need to be headed.
Having two parts of the kernel doing the same thing just seems
very wrong to me.

The other alternative, as suggested earlier this week, would be for
architectures to 'opt out' of powerop for their cpufreq drivers where it
doesn't necessarily bring anything but the layer of indirection.

I'm about to disappear for two weeks for a much needed vacation, but
I'll be interested to see other folks comments/opinions on this
when I get back.

		Dave

-- 
http://www.codemonkey.org.uk


* Re: So, what's the status on the recent patches here?
       [not found]             ` <20060819184843.GB15644@redhat.com>
@ 2006-08-20  3:20               ` David Singleton
  2006-08-20  3:30                 ` Dave Jones
  0 siblings, 1 reply; 136+ messages in thread
From: David Singleton @ 2006-08-20  3:20 UTC (permalink / raw)
  To: Dave Jones; +Cc: linux-pm

On 8/19/06, Dave Jones <davej@redhat.com> wrote:
> On Fri, Aug 18, 2006 at 11:19:37PM -0700, David Singleton wrote:
>  > On 8/14/06, Greg KH <greg@kroah.com> wrote:
>  > > On Mon, Aug 14, 2006 at 07:48:01PM -0400, Dave Jones wrote:
>  > > >
>  > > > This adds a whole bunch of new code, and doesn't seem to make any
>  > > > existing code any simpler (to me at least).  From a cpufreq point of view,
>  > > > what does adding this buy us? What problem do we have today that is
>  > > > being solved by all this?
>  >
>  >
>  > Greg and Dave,
>  >
>  >       Here is the patch that provides the cpufreq functionality in an
>  > operating point fashion for the centrino-speedstep.   Cpufreq tables are
>  > transformed into operating points which can be simply set by writing the
>  > name of the operating point into /sys/power/state.
>  >
>  >      These two patches implement the cpufreq functionality of changing
>  > processor frequency and voltage.  The huge amount of code that tries to
>  > make decisions about what operating point to set and which devices can
>  > be suspended or not is left to the power manager.
>
> Why does this replicate the cpufreq driver instead of just *using* it ?
> From a distro vendor standpoint, I want to offer people choice, which means
> with this patch we'd have both drivers compiled in, but this now means we
> have two sets of tables to add to every time we have new cpu support to add.
> Two places to look for implementation bugs etc etc.

        Dave,
                I didn't use the cpufreq driver because I couldn't.
        While encapsulated operating points can do CPU frequency
        scaling they also do a bit more.

                If you take a look at the 4th patch in the patch set,
        the oppoint-arm-pxa27x.patch, it shows that an operating point
        encapsulates a lot more system state than just the CPU.

                There is a set of clocks that have to be managed and scaled
        when CPU frequency is scaled.  The arm-pxa27x patch also shows
        that this board has larger operating points but still uses
        the cpufreq driver scaling functions, without any change
        to existing drivers.

                The operating point model performs the needed driver
        preparation and finishing routines to CPU and clock scaling
        by using the cpufreq driver scaling callbacks in the
        cpufreq notifier chain.
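The sequence described above can be modelled in user space. Each operating point in the centrino patch carries prepare_transition, transition and finish_transition hooks (the first and last are cpufreq_prepare_transition and cpufreq_finish_transition, which per this message drive the existing cpufreq notifier chain). The sketch below is a simplified, hypothetical stand-in for struct oppoint and for whatever core code calls the hooks; it only shows the ordering.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the patch's struct oppoint, keeping
 * only the three transition hooks each operating point carries. */
struct op {
    const char *name;
    unsigned int khz;
    int (*prepare_transition)(struct op *cur, struct op *next);
    int (*transition)(struct op *cur, struct op *next);
    int (*finish_transition)(struct op *cur, struct op *next);
};

/*
 * Move from *current to next: prepare (driver pre-change work), switch the
 * hardware, then finish (driver post-change work).  Stops at the first
 * error; a real implementation would also have to undo a completed
 * prepare step.
 */
int set_op(struct op **current, struct op *next)
{
    struct op *cur = *current;
    int ret;

    assert(next != NULL);
    if (cur == next)
        return 0;

    ret = next->prepare_transition(cur, next);
    if (ret)
        return ret;
    ret = next->transition(cur, next);
    if (ret)
        return ret;
    ret = next->finish_transition(cur, next);
    if (ret)
        return ret;

    *current = next;
    return 0;
}

/* Demo hooks that only record the order in which the phases ran. */
static char trace[16];

static int demo_prepare(struct op *cur, struct op *next)
{
    (void)cur; (void)next;
    strcat(trace, "p");
    return 0;
}

static int demo_transition(struct op *cur, struct op *next)
{
    (void)cur; (void)next;
    strcat(trace, "t");
    return 0;
}

static int demo_finish(struct op *cur, struct op *next)
{
    (void)cur; (void)next;
    strcat(trace, "f");
    return 0;
}
```

Switching from a "high" to a "low" point with these demo hooks records "ptf", and requesting the point that is already current does nothing, mirroring the cur == new check in centrino_transition().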

                The next three boards I'm working on have expanded
        system state encapsulated in their operating points. They
        have separately controllable power domains, and they can
        run operating points at the same CPU frequency but at a lower
        voltage.  For example, one operating point runs at 168MHz and 1.5V and
        another operating point runs at 168MHz and 1.1V.
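As a rough worked example of why two operating points at the same frequency are worth having: under the first-order CMOS model P ≈ C·V²·f, assuming the same switched capacitance C at both points and ignoring leakage, the frequency cancels and only the voltage ratio matters.

```latex
\frac{P_{1.1\,\mathrm{V}}}{P_{1.5\,\mathrm{V}}}
  = \frac{C\,(1.1)^2\,f}{C\,(1.5)^2\,f}
  = \frac{1.21}{2.25}
  \approx 0.54
```

So at 168MHz the 1.1V point would need roughly 46% less dynamic power than the 1.5V point, something a frequency-only table cannot express.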

                While OpPoint can perform the same functionality as
        cpufreq, without the extra overhead and complexity of policies
        and governors, it provides a framework that can handle much
        more system state control: separate power domains, individually
        scalable clocks, both frequency and voltage scaling, and
        sleep states.

                The next three boards also have more than one sleep
        state.

                I like the cpufreq concept of defining the operating
        points at compile time and letting the system discover which
        CPU and operating points to install at boot time.   But I needed
        a bit more system state control than the cpufreq driver can
        provide.
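The compile-time-table, boot-time-discovery scheme is what centrino_init_oppoint() in the attached patch does: match the running CPU's family/model/stepping, and optionally its model string, against a static table, where a NULL model name acts as a wildcard. A simplified user-space sketch of that lookup; the flattened struct layout and the three-entry table are illustrative, not the patch's own definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified from the patch: family/model/stepping identify a CPU. */
struct cpu_id {
    unsigned char family, model, stepping;
};

struct cpu_model {
    struct cpu_id id;
    const char *model_name;   /* NULL matches any name for this id */
    unsigned int max_khz;
};

/* Illustrative table, built at compile time like the patch's models[]. */
static const struct cpu_model models[] = {
    { {6, 9, 5}, "Intel(R) Pentium(R) M processor 1400MHz", 1400000 },
    { {6, 9, 5}, "Intel(R) Pentium(R) M processor 1700MHz", 1700000 },
    { {6, 13, 6}, NULL, 0 },   /* Dothan B0: any model name */
};

/* Return the first table entry matching this CPU at boot, or NULL. */
const struct cpu_model *find_model(struct cpu_id cpu, const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof models / sizeof models[0]; i++) {
        const struct cpu_model *m = &models[i];

        if (m->id.family != cpu.family || m->id.model != cpu.model ||
            m->id.stepping != cpu.stepping)
            continue;
        if (m->model_name == NULL || strcmp(m->model_name, name) == 0)
            return m;
    }
    return NULL;
}
```

Once the entry is found, only the operating points for that model are registered, which is why the patch's switch on max_freq fills in a different subset of lowest..highest per CPU.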

                If I had all the existing cpufreq tables transformed
        into operating points I could make a patch that would remove
        the bulk of cpufreq code from the kernel and you'd have
        pretty much the same functionality without the maintenance
        issues the added layers and complexity bring.
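The transformation from a cpufreq table to operating points is mechanical. The attached centrino patch uses the same encoding as its OP() macro: the low 16 bits written to MSR_IA32_PERF_CTL hold MHz/100 (the division implies a 100 MHz bus clock) in bits 15:8 and (mV - 700)/16 in the low byte. A sketch of that conversion; struct freq_row, struct centrino_op and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row of a speedstep-centrino style cpufreq table. */
struct freq_row {
    unsigned int mhz, mv;
};

/* Hypothetical operating point: the values the centrino patch stores. */
struct centrino_op {
    unsigned int khz;
    unsigned int mv;
    uint16_t perf_ctl;   /* low 16 bits of MSR_IA32_PERF_CTL */
};

/* Same encoding as the patch's OP() macro and centrino_set_frequency(). */
struct centrino_op op_from_row(struct freq_row r)
{
    struct centrino_op op;

    assert(r.mv >= 700);   /* the encoding has no room below 700 mV */
    op.khz = r.mhz * 1000;
    op.mv = r.mv;
    op.perf_ctl = (uint16_t)(((r.mhz / 100) << 8) | ((r.mv - 700) / 16));
    return op;
}

/* Convert a whole table; returns the number of operating points written. */
size_t ops_from_table(const struct freq_row *rows, size_t n,
                      struct centrino_op *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = op_from_row(rows[i]);
    return n;
}
```

Applied to the banias_1400 rows, 600 MHz/956 mV encodes to 0x0610 and 1400 MHz/1484 mV to 0x0E31.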

                The aim of this patch set is to unify cpufreq with the Dynamic
          Power Management infrastructure, which deals with complex
          system state and individual power domains, and clock domains, etc.
          If merged, there would just be one power management framework
          that all architectures and platforms could use.


David


>
>                 Dave
> --
> http://www.codemonkey.org.uk
>


* Re: So, what's the status on the recent patches here?
  2006-08-15  1:00         ` Greg KH
                             ` (2 preceding siblings ...)
  2006-08-19  6:10           ` David Singleton
@ 2006-08-19  6:19           ` David Singleton
       [not found]             ` <20060819184843.GB15644@redhat.com>
  3 siblings, 1 reply; 136+ messages in thread
From: David Singleton @ 2006-08-19  6:19 UTC (permalink / raw)
  To: Greg KH; +Cc: linux-pm

[-- Attachment #1: Type: text/plain, Size: 1800 bytes --]

On 8/14/06, Greg KH <greg@kroah.com> wrote:
> On Mon, Aug 14, 2006 at 07:48:01PM -0400, Dave Jones wrote:
> >
> > This adds a whole bunch of new code, and doesn't seem to make any
> > existing code any simpler (to me at least).  From a cpufreq point of view,
> > what does adding this buy us? What problem do we have today that is
> > being solved by all this?


Greg and Dave,

      Here is the patch that provides the cpufreq functionality in an
operating point fashion for the centrino-speedstep.   Cpufreq tables are
transformed into operating points which can be simply set by writing the
name of the operating point into /sys/power/state.

     These two patches implement the cpufreq functionality of changing
processor frequency and voltage.  The huge amount of code that tries to
make decisions about what operating point to set and which devices can be
suspended or not is left to the power manager.
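The decision logic left to the power manager could start as simply as this hypothetical sketch: take the operating point names the kernel registered for this CPU, ordered slowest to fastest (the centrino patch registers a model-dependent subset of "lowest" through "highest"), and pick one in proportion to recent CPU utilization. The function name and the proportional rule are illustrative assumptions, not part of the patch set.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical user-space policy: names[] holds the registered operating
 * point names ordered slowest to fastest; util_pct is recent CPU
 * utilization in percent.  Maps 0..100 onto indices 0..n-1, rounding down.
 */
const char *pick_operating_point(const char *const *names, size_t n,
                                 unsigned int util_pct)
{
    assert(names != NULL && n > 0);
    if (util_pct > 100)
        util_pct = 100;
    size_t idx = (size_t)util_pct * (n - 1) / 100;
    return names[idx];
}
```

The returned name would then be written to /sys/power/state; richer policies (thermal, battery, per-device constraints) would replace only this function.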

David




> >
> > Every explanation of powerop I've seen so far dives into microdetails,
> > whilst the 10,000ft view has always passed me by other than "this is
> > what we've had in the embedded world".
> >
> > The diagram at http://lists.osdl.org/pipermail/linux-pm/2006-August/003196.html
> > also confuses me.  I was under the impression that powerop was adding additional
> > userspace interfaces.  If we're not changing how things from a userspace
> > point of view, we're churning a lot of kernel code,.. why?
> >
> > Clue me in here, I'm feeling thick.
>
> You're not alone, I really don't get it either.
>
> But I guess we'll just wait for the next round of unified patches and
> then go from there.
>
> thanks,
>
> greg k-h
>

[-- Attachment #2: oppoint-x86-centrino.patch --]
[-- Type: application/octet-stream, Size: 16865 bytes --]


Signed-Off-by: David Singleton <dsingleton@mvista.com>

 arch/i386/kernel/cpu/Makefile                           |    1 
 arch/i386/kernel/cpu/oppoint/Makefile                   |    2 
 arch/i386/kernel/cpu/oppoint/centrino-dynamic-oppoint.c |   71 ++
 arch/i386/kernel/cpu/oppoint/centrino-oppoint.c         |  460 ++++++++++++++++
 arch/i386/kernel/i386_ksyms.c                           |    4 
 5 files changed, 538 insertions(+)

Index: linux-2.6.17/arch/i386/kernel/cpu/Makefile
===================================================================
--- linux-2.6.17.orig/arch/i386/kernel/cpu/Makefile
+++ linux-2.6.17/arch/i386/kernel/cpu/Makefile
@@ -17,3 +17,4 @@ obj-$(CONFIG_X86_MCE)	+=	mcheck/
 
 obj-$(CONFIG_MTRR)	+= 	mtrr/
 obj-$(CONFIG_CPU_FREQ)	+=	cpufreq/
+obj-$(CONFIG_PM)	+=	oppoint/
Index: linux-2.6.17/arch/i386/kernel/i386_ksyms.c
===================================================================
--- linux-2.6.17.orig/arch/i386/kernel/i386_ksyms.c
+++ linux-2.6.17/arch/i386/kernel/i386_ksyms.c
@@ -28,3 +28,7 @@ EXPORT_SYMBOL(__read_lock_failed);
 #endif
 
 EXPORT_SYMBOL(csum_partial);
+#ifdef CONFIG_PM
+#include <linux/pm.h>
+EXPORT_SYMBOL(pm_states);
+#endif
Index: linux-2.6.17/arch/i386/kernel/cpu/oppoint/Makefile
===================================================================
--- /dev/null
+++ linux-2.6.17/arch/i386/kernel/cpu/oppoint/Makefile
@@ -0,0 +1,2 @@
+obj-$(CONFIG_X86_SPEEDSTEP_CENTRINO)	+= centrino-oppoint.o
+obj-m					+= centrino-dynamic-oppoint.o
Index: linux-2.6.17/arch/i386/kernel/cpu/oppoint/centrino-dynamic-oppoint.c
===================================================================
--- /dev/null
+++ linux-2.6.17/arch/i386/kernel/cpu/oppoint/centrino-dynamic-oppoint.c
@@ -0,0 +1,71 @@
+/*
+ * oppoint/centrino-dynamic-oppoint.c
+ *
+ * This is the template to create dynamic operating points for power management.
+ *
+ * Author: David Singleton dsingleton@mvista.com MontaVista Software, Inc.
+ *
+ * 2006 (c) MontaVista Software, Inc. This file is licensed under
+ * the terms of the GNU General Public License version 2. This program
+ * is licensed "as is" without any warranty of any kind, whether express
+ * or implied.
+ */
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/list.h>
+#include <linux/pm.h>
+#include <linux/cpufreq.h>
+#include <linux/moduleparam.h>
+#include <linux/moduleloader.h>
+
+int centrino_transition(struct oppoint *cur, struct oppoint *new);
+
+static char oppoint_name[PM_NAME_SIZE] = "dynamic";
+static unsigned int frequency = 1000000;	/* kHz; assumed default, 1000 MHz @ 1308 mV is a banias_1400 point */
+static unsigned int voltage = 1308;
+static unsigned int latency = 100;
+module_param_string(name, oppoint_name, PM_NAME_SIZE, 0);
+module_param_named(frequency, frequency, uint, 0);
+module_param_named(voltage, voltage, uint, 0);
+module_param_named(latency, latency, uint, 0);
+MODULE_PARM_DESC(frequency, "cpu frequency in kHz");
+MODULE_PARM_DESC(voltage, "cpu voltage in mV");
+MODULE_PARM_DESC(latency, "transition latency in us");
+/* The operating point this module adds to pm_states */
+
+static struct oppoint dynamic_op = {
+	.type = PM_FREQ_CHANGE,
+	.name = "Dynamic",
+	.prepare_transition = cpufreq_prepare_transition,
+	.transition = centrino_transition,
+	.finish_transition = cpufreq_finish_transition,
+};
+
+extern void centrino_set_frequency(struct oppoint *op, uint freq, uint volt);
+
+static int __init dynamic_oppoint_init(void)
+{
+
+	printk(KERN_INFO "Dynamic PowerOp operating point for speedstep centrino\n");
+	dynamic_op.name = oppoint_name;
+	dynamic_op.frequency = frequency;
+	dynamic_op.voltage = voltage;
+	dynamic_op.latency = latency;
+	centrino_set_frequency(&dynamic_op, frequency / 1000, voltage);
+	printk("freq %d volt %d msr 0x%x\n", dynamic_op.frequency,
+	   dynamic_op.voltage, (unsigned int)dynamic_op.md_data);
+	list_add_tail(&dynamic_op.list, &pm_states.list);
+	return 0;
+}
+
+static void __exit dynamic_oppoint_cleanup(void)
+{
+	list_del_init(&dynamic_op.list);
+}
+
+module_init(dynamic_oppoint_init);
+module_exit(dynamic_oppoint_cleanup);
+
+MODULE_DESCRIPTION("Dynamic Powerop module");
+MODULE_LICENSE("GPL");
Index: linux-2.6.17/arch/i386/kernel/cpu/oppoint/centrino-oppoint.c
===================================================================
--- /dev/null
+++ linux-2.6.17/arch/i386/kernel/cpu/oppoint/centrino-oppoint.c
@@ -0,0 +1,460 @@
+/*
+ * PowerOp support for Enhanced SpeedStep, as found in Intel's Pentium
+ * M (part of the Centrino chipset).
+ *
+ * Modelled on speedstep-centrino.c
+ *
+ * Copyright (C) 2006 David Singleton <dsingleton@mvista.com>
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/cpufreq.h>
+#include <linux/delay.h>
+#include <linux/compiler.h>
+
+#include <asm/msr.h>
+#include <asm/processor.h>
+#include <asm/cpufeature.h>
+
+struct cpu_id
+{
+	__u8	x86;            /* CPU family */
+	__u8	x86_model;	/* model */
+	__u8	x86_mask;	/* stepping */
+};
+
+enum {
+	CPU_BANIAS,
+	CPU_DOTHAN_A1,
+	CPU_DOTHAN_A2,
+	CPU_DOTHAN_B0,
+	CPU_MP4HT_D0,
+	CPU_MP4HT_E0,
+};
+
+static const struct cpu_id cpu_ids[] = {
+	[CPU_BANIAS]	= { 6,  9, 5 },
+	[CPU_DOTHAN_A1]	= { 6, 13, 1 },
+	[CPU_DOTHAN_A2]	= { 6, 13, 2 },
+	[CPU_DOTHAN_B0]	= { 6, 13, 6 },
+	[CPU_MP4HT_D0]	= {15,  3, 4 },
+	[CPU_MP4HT_E0]	= {15,  4, 1 },
+};
+#define N_IDS	ARRAY_SIZE(cpu_ids)
+
+struct cpu_model
+{
+	const struct cpu_id *cpu_id;
+	const char	*model_name;
+	unsigned	max_freq; /* max clock in kHz */
+
+	struct cpufreq_frequency_table *op_points; /* clock/voltage pairs */
+};
+static int centrino_verify_cpu_id(const struct cpuinfo_x86 *c, const struct cpu_id *x);
+
+void centrino_set_frequency(struct oppoint *op, uint freq, uint volt)
+{
+	op->frequency = freq * 1000;
+	op->voltage = volt;
+	op->md_data = (void *)(((freq / 100) << 8) | (volt - 700) / 16);
+	printk("freq %d volt %d msr 0x%x\n", op->frequency, op->voltage,
+	    (unsigned int)op->md_data);
+}
+EXPORT_SYMBOL(centrino_set_frequency);
+
+int centrino_transition(struct oppoint *cur, struct oppoint *new)
+{
+	unsigned int msr, oldmsr = 0, h = 0;
+
+	if (cur == new)
+		return 0;
+
+	msr = (unsigned int)new->md_data;
+	rdmsr(MSR_IA32_PERF_CTL, oldmsr, h);
+
+	/* all but 16 LSB are reserved, treat them with care */
+	oldmsr &= ~0xffff;
+	msr &= 0xffff;
+	oldmsr |= msr;
+
+	wrmsr(MSR_IA32_PERF_CTL, oldmsr, h);
+
+	udelay(new->latency);
+
+	return 0;
+}
+EXPORT_SYMBOL(centrino_transition);
+
+#define OP(mhz, mv)                                                     \
+        {                                                               \
+                .frequency = (mhz) * 1000,                              \
+                .index = (((mhz)/100) << 8) | (((mv) - 700) / 16)       \
+        }
+
+/*
+ * These voltage tables were derived from the Intel Pentium M
+ * datasheet, document 25261202.pdf, Table 5.  I have verified they
+ * are consistent with my IBM ThinkPad X31, which has a 1.3GHz Pentium
+ * M.
+ */
+
+/* Ultra Low Voltage Intel Pentium M processor 900MHz (Banias) */
+static struct cpufreq_frequency_table banias_900[] =
+{
+        OP(600,  844),
+        OP(800,  988),
+        OP(900, 1004),
+        { .frequency = CPUFREQ_TABLE_END }
+};
+/* Ultra Low Voltage Intel Pentium M processor 1000MHz (Banias) */
+static struct cpufreq_frequency_table banias_1000[] =
+{
+        OP(600,   844),
+        OP(800,   972),
+        OP(900,   988),
+        OP(1000, 1004),
+        { .frequency = CPUFREQ_TABLE_END }
+};
+
+/* Low Voltage Intel Pentium M processor 1.10GHz (Banias) */
+static struct cpufreq_frequency_table banias_1100[] =
+{
+        OP( 600,  956),
+        OP( 800, 1020),
+        OP( 900, 1100),
+        OP(1000, 1164),
+        OP(1100, 1180),
+        { .frequency = CPUFREQ_TABLE_END }
+};
+
+
+/* Low Voltage Intel Pentium M processor 1.20GHz (Banias) */
+static struct cpufreq_frequency_table banias_1200[] =
+{
+        OP( 600,  956),
+        OP( 800, 1004),
+        OP( 900, 1020),
+        OP(1000, 1100),
+        OP(1100, 1164),
+        OP(1200, 1180),
+        { .frequency = CPUFREQ_TABLE_END }
+};
+
+/* Intel Pentium M processor 1.30GHz (Banias) */
+static struct cpufreq_frequency_table banias_1300[] =
+{
+        OP( 600,  956),
+        OP( 800, 1260),
+        OP(1000, 1292),
+        OP(1200, 1356),
+        OP(1300, 1388),
+        { .frequency = CPUFREQ_TABLE_END }
+};
+
+/* Intel Pentium M processor 1.40GHz (Banias) */
+static struct cpufreq_frequency_table banias_1400[] =
+{
+        OP( 600,  956),
+        OP( 800, 1180),
+        OP(1000, 1308),
+        OP(1200, 1436),
+        OP(1400, 1484),
+        { .frequency = CPUFREQ_TABLE_END }
+};
+
+/* Intel Pentium M processor 1.50GHz (Banias) */
+static struct cpufreq_frequency_table banias_1500[] =
+{
+        OP( 600,  956),
+        OP( 800, 1116),
+        OP(1000, 1228),
+        OP(1200, 1356),
+        OP(1400, 1452),
+        OP(1500, 1484),
+        { .frequency = CPUFREQ_TABLE_END }
+};
+
+/* Intel Pentium M processor 1.60GHz (Banias) */
+static struct cpufreq_frequency_table banias_1600[] =
+{
+        OP( 600,  956),
+        OP( 800, 1036),
+        OP(1000, 1164),
+        OP(1200, 1276),
+        OP(1400, 1420),
+        OP(1600, 1484),
+        { .frequency = CPUFREQ_TABLE_END }
+};
+
+/* Intel Pentium M processor 1.70GHz (Banias) */
+static struct cpufreq_frequency_table banias_1700[] =
+{
+        OP( 600,  956),
+        OP( 800, 1004),
+        OP(1000, 1116),
+        OP(1200, 1228),
+        OP(1400, 1308),
+        OP(1700, 1484),
+        { .frequency = CPUFREQ_TABLE_END }
+};
+
+#define _BANIAS(cpuid, max, name)	\
+{	.cpu_id		= cpuid,	\
+	.model_name	= "Intel(R) Pentium(R) M processor " name "MHz", \
+	.max_freq	= (max)*1000,	\
+	.op_points	= banias_##max,	\
+}
+#define BANIAS(max)	_BANIAS(&cpu_ids[CPU_BANIAS], max, #max)
+
+static struct oppoint lowest = {
+	.name = "lowest",
+	.type = PM_FREQ_CHANGE,
+	.frequency = 0,
+	.voltage = 0,
+	.latency = 15,
+	.prepare_transition  = cpufreq_prepare_transition,
+	.transition = centrino_transition,
+	.finish_transition = cpufreq_finish_transition,
+};
+
+static struct oppoint low = {
+	.name = "low",
+	.type = PM_FREQ_CHANGE,
+	.latency = 15,
+	.prepare_transition  = cpufreq_prepare_transition,
+	.transition = centrino_transition,
+	.finish_transition = cpufreq_finish_transition,
+};
+
+static struct oppoint mediumlow = {
+	.name = "mediumlow",
+	.type = PM_FREQ_CHANGE,
+	.latency = 15,
+	.prepare_transition  = cpufreq_prepare_transition,
+	.transition = centrino_transition,
+	.finish_transition = cpufreq_finish_transition,
+};
+
+static struct oppoint medium = {
+	.name = "medium",
+	.type = PM_FREQ_CHANGE,
+	.latency = 15,
+	.prepare_transition  = cpufreq_prepare_transition,
+	.transition = centrino_transition,
+	.finish_transition = cpufreq_finish_transition,
+};
+
+static struct oppoint mediumhigh = {
+	.name = "mediumhigh",
+	.type = PM_FREQ_CHANGE,
+	.latency = 15,
+	.prepare_transition  = cpufreq_prepare_transition,
+	.transition = centrino_transition,
+	.finish_transition = cpufreq_finish_transition,
+};
+
+static struct oppoint high = {
+	.name = "high",
+	.type = PM_FREQ_CHANGE,
+	.latency = 15,
+	.prepare_transition  = cpufreq_prepare_transition,
+	.transition = centrino_transition,
+	.finish_transition = cpufreq_finish_transition,
+};
+
+static struct oppoint highest = {
+	.name = "highest",
+	.type = PM_FREQ_CHANGE,
+	.latency = 15,
+	.prepare_transition  = cpufreq_prepare_transition,
+	.transition = centrino_transition,
+	.finish_transition = cpufreq_finish_transition,
+};
+
+/* CPU models, their operating frequency range, and freq/voltage
+   operating points */
+static struct cpu_model models[] =
+{
+	_BANIAS(&cpu_ids[CPU_BANIAS], 900, " 900"),
+	BANIAS(1000),
+	BANIAS(1100),
+	BANIAS(1200),
+	BANIAS(1300),
+	BANIAS(1400),
+	BANIAS(1500),
+	BANIAS(1600),
+	BANIAS(1700),
+
+	/* NULL model_name is a wildcard */
+	{ &cpu_ids[CPU_DOTHAN_A1], NULL, 0, NULL },
+	{ &cpu_ids[CPU_DOTHAN_A2], NULL, 0, NULL },
+	{ &cpu_ids[CPU_DOTHAN_B0], NULL, 0, NULL },
+	{ &cpu_ids[CPU_MP4HT_D0], NULL, 0, NULL },
+	{ &cpu_ids[CPU_MP4HT_E0], NULL, 0, NULL },
+
+	{ NULL, }
+};
+#undef _BANIAS
+#undef BANIAS
+
+static int __init centrino_init_oppoint(void)
+{
+	struct cpuinfo_x86 *cpu = &cpu_data[0];
+	struct cpu_model *model;
+
+	for(model = models; model->cpu_id != NULL; model++) {
+		if (centrino_verify_cpu_id(cpu, model->cpu_id) &&
+		    (model->model_name == NULL ||
+		     strcmp(cpu->x86_model_id, model->model_name) == 0))
+			break;
+	}
+
+	if (model->cpu_id == NULL) {
+		/* No match at all */
+		printk("no support for CPU model %s\n", cpu->x86_model_id);
+		return -ENOENT;
+	}
+
+	printk("found \"%s\": max frequency: %dkHz\n",
+	       model->model_name, model->max_freq);
+	switch (model->max_freq) {
+	    case (900000) :
+	    {
+		centrino_set_frequency(&low, 600, 844);
+		centrino_set_frequency(&medium, 800, 988);
+		centrino_set_frequency(&high, 900, 1004);
+		break;
+	    }
+	    case (1000000) :
+	    {
+		centrino_set_frequency(&low, 600, 844);
+		centrino_set_frequency(&medium, 800, 972);
+		centrino_set_frequency(&high, 900, 988);
+		centrino_set_frequency(&highest, 1000, 1004);
+		break;
+	    }
+	    case (1100000) :
+	    {
+		centrino_set_frequency(&lowest, 600, 956);
+		centrino_set_frequency(&low, 800, 1020);
+		centrino_set_frequency(&medium, 900, 1100);
+		centrino_set_frequency(&high, 1000, 1164);
+		centrino_set_frequency(&highest, 1100, 1180);
+		break;
+	    }
+	    case (1200000) :
+	    {
+		centrino_set_frequency(&lowest, 600, 956);
+		centrino_set_frequency(&low, 800, 1004);
+		centrino_set_frequency(&medium, 900, 1020);
+		centrino_set_frequency(&mediumhigh, 1000, 1100);
+		centrino_set_frequency(&high, 1100, 1164);
+		centrino_set_frequency(&highest, 1200, 1180);
+		break;
+	    }
+	    case (1300000) :
+	    {
+		centrino_set_frequency(&lowest, 600, 956);
+		centrino_set_frequency(&low, 800, 1260);
+		centrino_set_frequency(&medium, 1000, 1292);
+		centrino_set_frequency(&high, 1200, 1356);
+		centrino_set_frequency(&highest, 1300, 1388);
+		break;
+	    }
+	    case (1400000) :
+	    {
+		centrino_set_frequency(&lowest, 600, 956);
+		centrino_set_frequency(&low, 800, 1180);
+		centrino_set_frequency(&medium, 1000, 1308);
+		centrino_set_frequency(&high, 1200, 1436);
+		centrino_set_frequency(&highest, 1400, 1484);
+		break;
+	    }
+	    case (1500000) :
+	    {
+		centrino_set_frequency(&lowest, 600, 956);
+		centrino_set_frequency(&low, 800, 1116);
+		centrino_set_frequency(&medium, 1000, 1228);
+		centrino_set_frequency(&mediumhigh, 1200, 1356);
+		centrino_set_frequency(&high, 1400, 1452);
+		centrino_set_frequency(&highest, 1500, 1484);
+		break;
+	    }
+	    case (1600000) :
+	    {
+		centrino_set_frequency(&lowest, 600, 956);
+		centrino_set_frequency(&low, 800, 1036);
+		centrino_set_frequency(&medium, 1000, 1164);
+		centrino_set_frequency(&mediumhigh, 1200, 1276);
+		centrino_set_frequency(&high, 1400, 1420);
+		centrino_set_frequency(&highest, 1600, 1484);
+		break;
+	    }
+	    case (1700000) :
+	    {
+		centrino_set_frequency(&lowest, 600, 956);
+		centrino_set_frequency(&low, 800, 1004);
+		centrino_set_frequency(&medium, 1000, 1116);
+		centrino_set_frequency(&mediumhigh, 1200, 1228);
+		centrino_set_frequency(&high, 1400, 1308);
+		centrino_set_frequency(&highest, 1700, 1484);
+		break;
+	    }
+	}
+	if (lowest.frequency)
+		list_add_tail(&lowest.list, &pm_states.list);
+	if (low.frequency)
+		list_add_tail(&low.list, &pm_states.list);
+	if (mediumlow.frequency)
+		list_add_tail(&mediumlow.list, &pm_states.list);
+	if (medium.frequency)
+		list_add_tail(&medium.list, &pm_states.list);
+	if (mediumhigh.frequency)
+		list_add_tail(&mediumhigh.list, &pm_states.list);
+	if (high.frequency) {
+		list_add_tail(&high.list, &pm_states.list);
+		current_state = &high;
+	}
+	if (highest.frequency) {
+		list_add_tail(&highest.list, &pm_states.list);
+		current_state = &highest;
+	}
+	return 0;
+}
+
+static void centrino_exit_oppoint(void)
+{
+	if (lowest.frequency)
+		list_del_init(&lowest.list);
+	if (low.frequency)
+		list_del_init(&low.list);
+	if (mediumlow.frequency)
+		list_del_init(&mediumlow.list);
+	if (medium.frequency)
+		list_del_init(&medium.list);
+	if (mediumhigh.frequency)
+		list_del_init(&mediumhigh.list);
+	if (high.frequency)
+		list_del_init(&high.list);
+	if (highest.frequency)
+		list_del_init(&highest.list);
+	return;
+}
+
+static int centrino_verify_cpu_id(const struct cpuinfo_x86 *c, const struct cpu_id *x)
+{
+	if ((c->x86 == x->x86) &&
+	    (c->x86_model == x->x86_model) &&
+	    (c->x86_mask == x->x86_mask))
+		return 1;
+	return 0;
+}
+
+MODULE_AUTHOR ("David Singleton <dsingleton@mvista.com>");
+MODULE_DESCRIPTION ("PowerOp operating points for Intel Pentium M processors.");
+MODULE_LICENSE ("GPL");
+
+late_initcall(centrino_init_oppoint);
+module_exit(centrino_exit_oppoint);

[-- Attachment #3: Type: text/plain, Size: 0 bytes --]
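The `switch` in `centrino_init_oppoint` repeats one pattern per CPU model: register whichever points are populated, slowest first, and start at the fastest one. As a sketch only, the same data could live in a table. The userspace C below uses hypothetical names (`struct op`, `models`, `init_points`) that are not part of the patch. It copies two rows from the `switch` above, so Intel's caveat about the accuracy of these values still applies.

```c
#include <stddef.h>

#define MAX_POINTS 6

/* Hypothetical stand-in for the patch's struct oppoint. */
struct op {
	unsigned int mhz;	/* core frequency in MHz */
	unsigned int mv;	/* core voltage in mV */
};

/* One row per maximum CPU speed (kHz), points sorted slowest first.
 * Values copied from two cases of centrino_init_oppoint above. */
struct model_ops {
	unsigned int max_khz;
	size_t count;
	struct op points[MAX_POINTS];
};

static const struct model_ops models[] = {
	{ 1100000, 5, { {600, 956}, {800, 1020}, {900, 1100},
			{1000, 1164}, {1100, 1180} } },
	{ 1600000, 6, { {600, 956}, {800, 1036}, {1000, 1164},
			{1200, 1276}, {1400, 1420}, {1600, 1484} } },
};

/* Copies the points for max_khz into out (slowest first), stores the
 * number of points in *count and returns the index of the point to
 * start at (the fastest), or -1 if the model is unknown. */
static int init_points(unsigned int max_khz, struct op *out, size_t *count)
{
	for (size_t i = 0; i < sizeof(models) / sizeof(models[0]); i++) {
		if (models[i].max_khz != max_khz)
			continue;
		for (size_t j = 0; j < models[i].count; j++)
			out[j] = models[i].points[j];
		*count = models[i].count;
		return (int)models[i].count - 1;
	}
	*count = 0;
	return -1;
}
```

With a table, corrected voltages or an additional model become a data change instead of another `case`.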



^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: So, what's the status on the recent patches here?
  2006-08-15  1:00         ` Greg KH
  2006-08-15  3:03           ` Dave Jones
  2006-08-15 10:35           ` Amit Kucheria
@ 2006-08-19  6:10           ` David Singleton
  2006-08-22  2:13             ` Greg KH
                               ` (2 more replies)
  2006-08-19  6:19           ` David Singleton
  3 siblings, 3 replies; 136+ messages in thread
From: David Singleton @ 2006-08-19  6:10 UTC (permalink / raw)
  To: Greg KH; +Cc: linux-pm

[-- Attachment #1: Type: text/plain, Size: 4055 bytes --]

On 8/14/06, Greg KH <greg@kroah.com> wrote:
> On Mon, Aug 14, 2006 at 07:48:01PM -0400, Dave Jones wrote:
> >
> > This adds a whole bunch of new code, and doesn't seem to make any
> > existing code any simpler (to me at least).  From a cpufreq point of view,
> > what does adding this buy us? What problem do we have today that is
> > being solved by all this?

Greg and Dave,

                 there are two competing patch sets for a new power management
        framework.   The patch set I sent out simplifies power management,
        from both the cpufreq perspective and the embedded world's view of
        power management.

                I've renamed my patch oppoint so as not to confuse it
        with the powerop set from Matt Locke (which will probably make
        it even more confusing).  I've renamed it so it can be seen as an
        alternative design approach, not just an alternative implementation
        of the same ideas.  I've also incorporated suggestions from
        Pavel in cleaning up the original patches.

                If you'd be willing to take a look at, or try out, the patches
        in my patch set, you should be able to see how oppoint could simplify
        cpufreq code.  The first patch is the oppoint-cpufreq.patch and
        the second is the oppoint-x86-centrino.patch.

                Oppoint could replace large pieces of the cpufreq code
        in the kernel, most notably the policy and governor code, which I
        believe belongs in user space in the power manager daemon.

                You'll notice that the oppoint-cpufreq.patch only touches
        two files, cpufreq.c and cpufreq.h. It only adds two new interfaces
        to the cpufreq frequency scaling notifier lists, to support the
        pre- and post-scaling driver routines the kernel already has.

                The oppoint-x86-centrino.patch completes the replacement
        of cpufreq code by introducing the transition routine to
        change frequencies and creates operating points for the
        centrino-speedstep processors already supported by Linux.

        (Although I've received a note from Intel that the data I've copied
        from the centrino-speedstep cpufreq tables is known to be inaccurate
        and unsupported.)

                This code could replace cpufreq code and simplify it quite a
        bit in the process.  The kernel drivers that support cpufreq frequency
        scaling would not have to be changed.  Operating points for the rest
        of the processors that support cpufreq would have to be created, but
        as you can see it's quite a straightforward transformation from
        a cpufreq table to a set of operating points for a processor.
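Here is a sketch of that transformation. It assumes the table packs MHz/100 into the high byte of `index` and (mV - 700)/16 into the low byte, as the `OP()` macro in speedstep-centrino.c does (this is our reading and is worth checking against the driver). Under that assumption, each entry decodes back into a frequency/voltage pair. `struct freq_entry` is a minimal stand-in for `struct cpufreq_frequency_table`. `TABLE_END` is a local sentinel, not the kernel's `CPUFREQ_TABLE_END`. `struct oppoint_sketch` is hypothetical. The values are the 1.6 GHz points from oppoint-x86-centrino.patch.

```c
/* Minimal stand-in for struct cpufreq_frequency_table (two fields). */
struct freq_entry {
	unsigned int index;
	unsigned int frequency;		/* kHz */
};

/* Modelled on speedstep-centrino's OP() macro (assumed encoding). */
#define OP(mhz, mv) { (((mhz) / 100) << 8) | (((mv) - 700) / 16), (mhz) * 1000 }
#define TABLE_END 0xFFFFFFFFu		/* local end-of-table sentinel */

/* Hypothetical operating-point record, analogous to struct oppoint. */
struct oppoint_sketch {
	unsigned int mhz;
	unsigned int mv;
};

static const struct freq_entry pentium_m_1600[] = {
	OP(600, 956), OP(800, 1036), OP(1000, 1164),
	OP(1200, 1276), OP(1400, 1420), OP(1600, 1484),
	{ 0, TABLE_END },
};

/* Converts a terminated cpufreq-style table into operating points;
 * returns how many were written (at most max). */
static unsigned int table_to_points(const struct freq_entry *t,
				    struct oppoint_sketch *out,
				    unsigned int max)
{
	unsigned int n = 0;

	for (; t->frequency != TABLE_END && n < max; t++, n++) {
		out[n].mhz = t->frequency / 1000;	  /* kHz -> MHz */
		out[n].mv = (t->index & 0xff) * 16 + 700; /* undo packing */
	}
	return n;
}
```

All of these voltages are 700 mV plus a multiple of 16 mV, so the round trip is lossless for this table.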

                The entire patch set can be found at:

                http://source.mvista.com/~dsingleton/2.6.18-rc4/

        The patch set consists of:

                oppoint-core.patch
                oppoint-cpufreq.patch
                oppoint-x86-centrino.patch
                oppoint-arm-pxa27x.patch

        I'll attach oppoint-cpufreq.patch  to this email and
        send out oppoint-x86-centrino.patch next.


David



> >
> > Every explanation of powerop I've seen so far dives into microdetails,
> > whilst the 10,000ft view has always passed me by other than "this is
> > what we've had in the embedded world".
> >
> > The diagram at http://lists.osdl.org/pipermail/linux-pm/2006-August/003196.html
> > also confuses me.  I was under the impression that powerop was adding additional
> > userspace interfaces.  If we're not changing how things from a userspace
> > point of view, we're churning a lot of kernel code,.. why?
> >
> > Clue me in here, I'm feeling thick.
>
> You're not alone, I really don't get it either.
>
> But I guess we'll just wait for the next round of unified patches and
> then go from there.
>
> thanks,
>
> greg k-h
> _______________________________________________
> linux-pm mailing list
> linux-pm@lists.osdl.org
> https://lists.osdl.org/mailman/listinfo/linux-pm
>

[-- Attachment #2: oppoint-cpufreq.patch --]
[-- Type: application/octet-stream, Size: 2433 bytes --]


Signed-Off-by: David Singleton <dsingleton@mvista.com>

 drivers/cpufreq/cpufreq.c |   36 ++++++++++++++++++++++++++++++++++++
 include/linux/cpufreq.h   |    2 ++
 2 files changed, 38 insertions(+)

Index: linux-2.6.17/drivers/cpufreq/cpufreq.c
===================================================================
--- linux-2.6.17.orig/drivers/cpufreq/cpufreq.c
+++ linux-2.6.17/drivers/cpufreq/cpufreq.c
@@ -226,6 +226,35 @@ static void adjust_jiffies(unsigned long
 static inline void adjust_jiffies(unsigned long val, struct cpufreq_freqs *ci) { return; }
 #endif
 
+int cpufreq_prepare_transition(struct oppoint *cur, struct oppoint *new)
+{
+	struct cpufreq_freqs freqs;
+
+	freqs.old = cur->frequency;
+	freqs.new = new->frequency;
+	freqs.cpu = 0;
+	freqs.flags = cpufreq_driver->flags;
+	blocking_notifier_call_chain(&cpufreq_transition_notifier_list,
+			CPUFREQ_PRECHANGE, &freqs);
+	adjust_jiffies(CPUFREQ_PRECHANGE, &freqs);
+	return 0;
+}
+EXPORT_SYMBOL(cpufreq_prepare_transition);
+
+int cpufreq_finish_transition(struct oppoint *cur, struct oppoint *new)
+{
+	struct cpufreq_freqs freqs;
+
+	freqs.old = cur->frequency;
+	freqs.new = new->frequency;
+	freqs.cpu = 0;
+	freqs.flags = cpufreq_driver->flags;
+	adjust_jiffies(CPUFREQ_POSTCHANGE, &freqs);
+	blocking_notifier_call_chain(&cpufreq_transition_notifier_list,
+			CPUFREQ_POSTCHANGE, &freqs);
+	return 0;
+}
+EXPORT_SYMBOL(cpufreq_finish_transition);
 
 /**
  * cpufreq_notify_transition - call notifier chain and adjust_jiffies
@@ -920,6 +949,12 @@ static void cpufreq_out_of_sync(unsigned
 }
 
 
+#ifdef CONFIG_PM
+unsigned int cpufreq_quick_get(unsigned int cpu)
+{
+	return (current_state->frequency * 1000);
+}
+#else
 /**
  * cpufreq_quick_get - get the CPU frequency (in kHz) frpm policy->cur
  * @cpu: CPU number
@@ -941,6 +976,7 @@ unsigned int cpufreq_quick_get(unsigned 
 
 	return (ret);
 }
+#endif
 EXPORT_SYMBOL(cpufreq_quick_get);
 
 
Index: linux-2.6.17/include/linux/cpufreq.h
===================================================================
--- linux-2.6.17.orig/include/linux/cpufreq.h
+++ linux-2.6.17/include/linux/cpufreq.h
@@ -268,6 +268,8 @@ static inline unsigned int cpufreq_quick
 	return 0;
 }
 #endif
+int cpufreq_prepare_transition(struct oppoint *cur, struct oppoint *new);
+int cpufreq_finish_transition(struct oppoint *cur, struct oppoint *new);
 
 
 /*********************************************************************

[-- Attachment #3: Type: text/plain, Size: 0 bytes --]
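`cpufreq_prepare_transition` and `cpufreq_finish_transition` wrap the hardware change with the PRECHANGE and POSTCHANGE notifiers and `adjust_jiffies`. The userspace model below reflects our reading of how 2.6.17's `adjust_jiffies` rescales `loops_per_jiffy`: it raises the value before speeding up and lowers it only after slowing down, so busy-wait delays never run short. The `CPUFREQ_CONST_LOOPS` early return is omitted. The globals and `transition` are stand-ins, not kernel code. Frequencies are handled in kHz, the unit cpufreq uses in `struct cpufreq_freqs`.

```c
enum phase { PRECHANGE, POSTCHANGE };

static unsigned long loops_per_jiffy = 4000000UL;	/* stand-in value */
static unsigned long ref_lpj;		/* lpj at the reference frequency */
static unsigned int ref_khz;		/* reference frequency, 0 = unset */

/* old * mult / div, with a 64-bit intermediate to avoid overflow. */
static unsigned long scale(unsigned long old, unsigned int div,
			   unsigned int mult)
{
	return (unsigned long)((unsigned long long)old * mult / div);
}

static void adjust_lpj(enum phase p, unsigned int old_khz,
		       unsigned int new_khz)
{
	if (!ref_khz) {			/* first transition: record baseline */
		ref_lpj = loops_per_jiffy;
		ref_khz = old_khz;
	}
	/* Raise lpj before speeding up and lower it only after slowing
	 * down, so a calibrated delay loop never finishes early. */
	if ((p == PRECHANGE && old_khz < new_khz) ||
	    (p == POSTCHANGE && old_khz > new_khz))
		loops_per_jiffy = scale(ref_lpj, ref_khz, new_khz);
}

/* Takes MHz and converts to kHz before adjusting. */
static void transition(unsigned int old_mhz, unsigned int new_mhz)
{
	unsigned int old_khz = old_mhz * 1000, new_khz = new_mhz * 1000;

	adjust_lpj(PRECHANGE, old_khz, new_khz);  /* prepare_transition */
	/* ... program the new frequency/voltage here ... */
	adjust_lpj(POSTCHANGE, old_khz, new_khz); /* finish_transition */
}
```

The patch fills `freqs.old` and `freqs.new` directly from `cur->frequency`. If oppoint frequencies are in MHz, as the `* 1000` in its `cpufreq_quick_get` suggests, notifier consumers that expect kHz would see values 1000 times too small. The `adjust_jiffies` ratio itself would be unaffected.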



^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: So, what's the status on the recent patches here?
  2006-08-18 17:54                       ` Igor Stoppa
@ 2006-08-18 21:05                         ` Alexey Starikovskiy
  2006-08-20 13:19                           ` Igor Stoppa
  0 siblings, 1 reply; 136+ messages in thread
From: Alexey Starikovskiy @ 2006-08-18 21:05 UTC (permalink / raw)
  To: Igor Stoppa; +Cc: linux-pm, ext Pavel Machek, Kucheria Amit (Nokia-M/Tampere)

Igor Stoppa wrote:
> On Fri, 2006-08-18 at 18:29 +0300, ext Alexey Starikovskiy wrote:
>> Igor Stoppa wrote:
>>> On Fri, 2006-08-18 at 00:39 +0300, ext Pavel Machek wrote:
>>>> Hi!
>>>>
>>>>>> If there are dependancies inherently linking core1 and core2, cpufreq
>>>>>> should already be programming both parts. For example, the SA1100
>>>>>> driver programs both CPU and SDRAM controller.  If there isn't any
>>>>>> dependancy between them, I don't see the attraction of creating an
>>>>>> artificial one in the way suggested for no real purpose.
>>>>>>
>>>>>> Things like voltage and frequency are closely tied together, so
>>>>>> offering any means of controlling them independantly makes no sense
>>>>>> afaics.
>>>>> Yet a certain subsystem (for example an onboard camera, in a phone)
>>>>> might require a higher voltage when it's active, effectively loosening
>>>>> the tight coupling between freq and voltage that the porcessor is
>>>>> enforcing.
>>>> So... you expect userland to echo high > state before camera can be
>>>> used?
>>>>
>>>> I'd rather have kernel automagically up the voltage when /dev/video0
>>>> is opened...
>>> Not really, I meant that the CPU is not the only customer of power
>>> domains (depend on the HW design), so the relation freq <-> voltage is
>>> not always true.
>>>
>> Then you need to introduce power domains and associate your devices
>> with them, isn't it?
>> So if your camera appears in the same domain with CPU, the voltage of
>> that domain will go up either with camera=on, or CPU going to higher
>> frequency.
> 
> I used the expression "power domain" to refer to a generic domain,
> either voltage or frequency, to indicate that changing either freq or
> voltage in a domain implies changing the domain power level.
> 
> Of course it is changing linearly with frequency, while the dependency
> from voltage is quadratic.
> 
> So in the camera example we might have 2 different cases:
> 
> -the one mentioned above, where the camera shares the same voltage
> domain with CPU and the correlation is the one you described
> 
> -another case where the clock frequency provided to the camera is
> related to the resolution being used
> 
> camera off => no constraints
> low res => low freq, high voltage
> high res => high freq, high voltage
> 
> in such case the currently active resolution would affect whatever
> device shares the camera clock, if any.
> 
> But no need to introduce power domains. 
> 
How about introducing a frequency domain as well?

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: So, what's the status on the recent patches here?
  2006-08-18 15:29                     ` Alexey Starikovskiy
@ 2006-08-18 17:54                       ` Igor Stoppa
  2006-08-18 21:05                         ` Alexey Starikovskiy
  0 siblings, 1 reply; 136+ messages in thread
From: Igor Stoppa @ 2006-08-18 17:54 UTC (permalink / raw)
  To: ext Alexey Starikovskiy
  Cc: linux-pm, ext Pavel Machek, Kucheria Amit (Nokia-M/Tampere)

On Fri, 2006-08-18 at 18:29 +0300, ext Alexey Starikovskiy wrote:
> Igor Stoppa wrote:
> > On Fri, 2006-08-18 at 00:39 +0300, ext Pavel Machek wrote:
> >> Hi!
> >>
> >>>> If there are dependancies inherently linking core1 and core2, cpufreq
> >>>> should already be programming both parts. For example, the SA1100
> >>>> driver programs both CPU and SDRAM controller.  If there isn't any
> >>>> dependancy between them, I don't see the attraction of creating an
> >>>> artificial one in the way suggested for no real purpose.
> >>>>
> >>>> Things like voltage and frequency are closely tied together, so
> >>>> offering any means of controlling them independantly makes no sense
> >>>> afaics.
> >>> Yet a certain subsystem (for example an onboard camera, in a phone)
> >>> might require a higher voltage when it's active, effectively loosening
> >>> the tight coupling between freq and voltage that the porcessor is
> >>> enforcing.
> >> So... you expect userland to echo high > state before camera can be
> >> used?
> >>
> >> I'd rather have kernel automagically up the voltage when /dev/video0
> >> is opened...
> >
> > Not really, I meant that the CPU is not the only customer of power
> > domains (depend on the HW design), so the relation freq <-> voltage is
> > not always true.
> >
> Then you need to introduce power domains and associate your devices
> with them, isn't it?
> So if your camera appears in the same domain with CPU, the voltage of
> that domain will go up either with camera=on, or CPU going to higher
> frequency.

I used the expression "power domain" to refer to a generic domain,
either voltage or frequency, to indicate that changing either freq or
voltage in a domain implies changing the domain power level.

Of course, the power changes linearly with frequency, while its dependency
on voltage is quadratic.
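This is the usual CMOS dynamic-power approximation, P ~ C * V^2 * f, with leakage ignored. The sketch below computes the relative cost of two operating points. The switched capacitance C cancels in the ratio. The 600 MHz/956 mV and 1700 MHz/1484 mV pairs are the lowest and highest Pentium M 1.7 GHz points from the centrino patch earlier in this thread.

```c
/* Dynamic power of operating point b relative to a, using
 * P ~ C * V^2 * f; the switched capacitance C cancels out. */
static double relative_power(double f_a_mhz, double v_a_mv,
			     double f_b_mhz, double v_b_mv)
{
	double vr = v_b_mv / v_a_mv;

	return (f_b_mhz / f_a_mhz) * vr * vr;	/* linear in f, square in V */
}
```

For those two points, `relative_power(600, 956, 1700, 1484)` comes to roughly 6.8 for a clock ratio of about 2.8. By this approximation, most of the saving comes from the voltage drop that the lower frequency permits.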

So in the camera example we might have 2 different cases:

-the one mentioned above, where the camera shares the same voltage
domain with CPU and the correlation is the one you described

-another case where the clock frequency provided to the camera is
related to the resolution being used

camera off => no constraints
low res => low freq, high voltage
high res => high freq, high voltage

In such a case, the currently active resolution would affect whatever
device shares the camera clock, if any.

But there is no need to introduce power domains.

-- 
Cheers,
           Igor

Igor Stoppa (Nokia M - OSSO / Tampere)

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: So, what's the status on the recent patches here?
  2006-08-18 10:02                   ` Igor Stoppa
@ 2006-08-18 15:29                     ` Alexey Starikovskiy
  2006-08-18 17:54                       ` Igor Stoppa
  0 siblings, 1 reply; 136+ messages in thread
From: Alexey Starikovskiy @ 2006-08-18 15:29 UTC (permalink / raw)
  To: Igor Stoppa; +Cc: linux-pm, ext Pavel Machek, Kucheria Amit (Nokia-M/Tampere)

Igor Stoppa wrote:
> On Fri, 2006-08-18 at 00:39 +0300, ext Pavel Machek wrote:
>> Hi!
>>
>>>> If there are dependancies inherently linking core1 and core2, cpufreq
>>>> should already be programming both parts. For example, the SA1100
>>>> driver programs both CPU and SDRAM controller.  If there isn't any
>>>> dependancy between them, I don't see the attraction of creating an
>>>> artificial one in the way suggested for no real purpose.
>>>>
>>>> Things like voltage and frequency are closely tied together, so
>>>> offering any means of controlling them independantly makes no sense
>>>> afaics.
>>> Yet a certain subsystem (for example an onboard camera, in a phone)
>>> might require a higher voltage when it's active, effectively loosening
>>> the tight coupling between freq and voltage that the porcessor is
>>> enforcing.
>> So... you expect userland to echo high > state before camera can be
>> used?
>>
>> I'd rather have kernel automagically up the voltage when /dev/video0
>> is opened...
> 
> Not really, I meant that the CPU is not the only customer of power
> domains (depend on the HW design), so the relation freq <-> voltage is
> not always true.
> 
Then you need to introduce power domains and associate your devices with them, don't you?
So if your camera is in the same domain as the CPU, the voltage of
that domain will go up either when the camera is turned on or when the CPU goes to a higher frequency.
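Alexey's suggestion can be sketched as a domain that aggregates the requirements of every device associated with it. The regulator runs at the highest voltage any active member needs. All names and millivolt values here are hypothetical, chosen only to echo the centrino figures earlier in the thread.

```c
#include <stddef.h>

/* Hypothetical consumer of a voltage domain (CPU, camera, ...). */
struct consumer {
	const char *name;
	unsigned int min_mv;	/* 0 while the consumer is idle */
};

/* Hypothetical voltage domain and the devices associated with it. */
struct domain {
	struct consumer *members;
	size_t count;
	unsigned int floor_mv;	/* lowest voltage the regulator supports */
};

/* A shared domain must run at the highest voltage any member needs. */
static unsigned int domain_mv(const struct domain *d)
{
	unsigned int mv = d->floor_mv;

	for (size_t i = 0; i < d->count; i++)
		if (d->members[i].min_mv > mv)
			mv = d->members[i].min_mv;
	return mv;
}
```

Turning the camera on and raising the CPU frequency then both reduce to updating one member's `min_mv` and recomputing the domain voltage.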

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: So, what's the status on the recent patches here?
  2006-08-17 21:40                 ` Pavel Machek
  2006-08-18  5:42                   ` Vitaly Wool
@ 2006-08-18 11:48                   ` Amit Kucheria
  2006-08-24  7:59                     ` Pavel Machek
  1 sibling, 1 reply; 136+ messages in thread
From: Amit Kucheria @ 2006-08-18 11:48 UTC (permalink / raw)
  To: ext Pavel Machek; +Cc: linux-pm

On Thu, 2006-08-17 at 21:40 +0000, ext Pavel Machek wrote:
 
> > The userspace interface in Eungeny's patches is for other userspace
> > programs (policy managers) to activate/deactivate valid operating points
> > in the system dynamically and if necessary, introduce new ones into the
> > system. It will also allow the operating points to be referenced by name
> > instead of the tuple.
> > 
> > Then, we will be able to use names like 'video', 'mp3', 'fast',
> > 'powersave', 'usb' to switch to the relevant operating point based on
> > configuration of the policy manager.
> 
> This seems to be too specific to embedded machine.
> 
> If userspace wants to work with usb and play mp3s at the same time,
> what does it do?

Switch to 'fast'?

The operating point for a use-case specifies the _minimum_ required for
the use-case. You can always go up.

The system designer is responsible for 'designing' operating points that
take into account multiple use-cases. Designing here refers to mapping
use-cases to HW operating points.

Consider an example system with a main CPU and a DSP. To simplify
discussion, let's assume 3 levels for CPU and DSP speeds and system
voltage. Then, here is what an example operating-point to use-case
mapping table could look like:

#     CPU speed      DSP speed      Voltage       use-case
----------------------------------------------------------
1.    high           high           high          fast, video
2.    med            high           high          
3.    med            med            med           usb[1]
4.    low            med            med           mp3
5.    low            low            low           powersave

[1] USB has voltage constraint (voltage >= med)

Mapping
=======
Performance related: fast, video, mp3
Power related: powersave
Miscellaneous: usb

- Now if we are playing mp3, we switch to OP 4.
- Add usb and we switch to OP 3.
- Now our performance monitor (e.g. load avg) indicates that we need more
CPU processing. So we switch to OP 2.
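The walk-through above can be modelled as a two-step rule. First, pick the lowest-power operating point that meets every active use-case's minimum. Then let the performance monitor step up. In this sketch, each use-case's minimum is taken to be the tuple of the operating point it maps to in the table. That encoding is our assumption. It is also what moves "mp3 + usb" to OP 3, because USB's voltage constraint on its own would already be met by OP 4.

```c
/* Levels from the table above, encoded as integers (an assumption). */
enum { LOW, MED, HIGH };

/* One row of the table: CPU speed, DSP speed and voltage levels. */
struct op {
	int cpu, dsp, volt;
};

/* Operating points 1..5, highest performance first, as in the table. */
static const struct op ops[] = {
	{ HIGH, HIGH, HIGH },	/* 1: fast, video */
	{ MED,  HIGH, HIGH },	/* 2 */
	{ MED,  MED,  MED  },	/* 3: usb */
	{ LOW,  MED,  MED  },	/* 4: mp3 */
	{ LOW,  LOW,  LOW  },	/* 5: powersave */
};
#define NR_OPS ((int)(sizeof(ops) / sizeof(ops[0])))

static int max_int(int a, int b)
{
	return a > b ? a : b;
}

/* Two active use-cases together need the larger of each minimum. */
static struct op combine(struct op a, struct op b)
{
	struct op r = { max_int(a.cpu, b.cpu), max_int(a.dsp, b.dsp),
			max_int(a.volt, b.volt) };
	return r;
}

/* Lowest-power OP (1-based) meeting the minimum, or 0 if none does. */
static int select_op(struct op min)
{
	for (int i = NR_OPS - 1; i >= 0; i--)
		if (ops[i].cpu >= min.cpu && ops[i].dsp >= min.dsp &&
		    ops[i].volt >= min.volt)
			return i + 1;
	return 0;
}

/* "You can always go up": move one row towards OP 1. */
static int step_up(int op)
{
	return op > 1 ? op - 1 : op;
}
```

A real policy manager would also need a rule for stepping back down when load drops. The sketch leaves that out.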

Regards,
Amit

-- 
Amit Kucheria <amit.kucheria@nokia.com>
Nokia

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: So, what's the status on the recent patches here?
  2006-08-17 21:39                 ` Pavel Machek
@ 2006-08-18 10:02                   ` Igor Stoppa
  2006-08-18 15:29                     ` Alexey Starikovskiy
  0 siblings, 1 reply; 136+ messages in thread
From: Igor Stoppa @ 2006-08-18 10:02 UTC (permalink / raw)
  To: ext Pavel Machek; +Cc: linux-pm, Kucheria Amit (Nokia-M/Tampere)

On Fri, 2006-08-18 at 00:39 +0300, ext Pavel Machek wrote:
> Hi!
> 
> > > If there are dependancies inherently linking core1 and core2, cpufreq
> > > should already be programming both parts. For example, the SA1100
> > > driver programs both CPU and SDRAM controller.  If there isn't any
> > > dependancy between them, I don't see the attraction of creating an
> > > artificial one in the way suggested for no real purpose.
> > >
> > > Things like voltage and frequency are closely tied together, so
> > > offering any means of controlling them independantly makes no sense
> > > afaics.
> 
> > Yet a certain subsystem (for example an onboard camera, in a phone)
> > might require a higher voltage when it's active, effectively loosening
> > the tight coupling between freq and voltage that the porcessor is
> > enforcing.
> 
> So... you expect userland to echo high > state before camera can be
> used?
> 
> I'd rather have kernel automagically up the voltage when /dev/video0
> is opened...

Not really, I meant that the CPU is not the only customer of power
domains (depending on the HW design), so the relation freq <-> voltage is
not always true.

-- 
Cheers,
           Igor

Igor Stoppa (Nokia M - OSSO / Tampere)

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: So, what's the status on the recent patches here?
  2006-08-17 21:40                 ` Pavel Machek
@ 2006-08-18  5:42                   ` Vitaly Wool
  2006-08-23 12:28                     ` Pavel Machek
  2006-08-18 11:48                   ` Amit Kucheria
  1 sibling, 1 reply; 136+ messages in thread
From: Vitaly Wool @ 2006-08-18  5:42 UTC (permalink / raw)
  To: Pavel Machek; +Cc: linux-pm

On 8/18/06, Pavel Machek <pavel@ucw.cz> wrote:
> > Then, we will be able to use names like 'video', 'mp3', 'fast',
> > 'powersave', 'usb' to switch to the relevant operating point based on
> > configuration of the policy manager.
>
> This seems to be too specific to embedded machine.
>
> If userspace wants to work with usb and play mp3s at the same time,
> what does it do?

I guess it just defines an appropriate policy. You can call it
'usb_mp3' if you wish ;)
I don't think it's too embedded-specific.

Vitaly

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: So, what's the status on the recent patches here?
  2006-08-17  9:18               ` Amit Kucheria
@ 2006-08-17 21:40                 ` Pavel Machek
  2006-08-18  5:42                   ` Vitaly Wool
  2006-08-18 11:48                   ` Amit Kucheria
  0 siblings, 2 replies; 136+ messages in thread
From: Pavel Machek @ 2006-08-17 21:40 UTC (permalink / raw)
  To: Amit Kucheria; +Cc: linux-pm

Hi!

> > Userspace cares about "save power" or "go fast".
> > Historically, I wish we had never exposed frequencies, but instead
> > a performance percentage, so that the various userspace tools
> > didn't have to care about things like 'what frequencies are
> > available'.
> > Adding the same mistake for voltages doesn't strike me as a fantastic
> > idea. 
> 
> The userspace interface in Eungeny's patches is for other userspace
> programs (policy managers) to activate/deactivate valid operating points
> in the system dynamically and if necessary, introduce new ones into the
> system. It will also allow the operating points to be referenced by name
> instead of the tuple.
> 
> Then, we will be able to use names like 'video', 'mp3', 'fast',
> 'powersave', 'usb' to switch to the relevant operating point based on
> configuration of the policy manager.

This seems to be too specific to embedded machine.

If userspace wants to work with usb and play mp3s at the same time,
what does it do?
							Pavel
-- 
Thanks for all the (sleeping) penguins.

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: So, what's the status on the recent patches here?
  2006-08-16 12:58               ` Igor Stoppa
@ 2006-08-17 21:39                 ` Pavel Machek
  2006-08-18 10:02                   ` Igor Stoppa
  0 siblings, 1 reply; 136+ messages in thread
From: Pavel Machek @ 2006-08-17 21:39 UTC (permalink / raw)
  To: Igor Stoppa; +Cc: linux-pm, Kucheria Amit (Nokia-M/Tampere)

Hi!

> > If there are dependancies inherently linking core1 and core2, cpufreq
> > should already be programming both parts. For example, the SA1100
> > driver programs both CPU and SDRAM controller.  If there isn't any
> > dependancy
> > between them, I don't see the attraction of creating an artificial one
> > in the way suggested for no real purpose.
> > 
> > Things like voltage and frequency are closely tied together, so
> > offering
> > any means of controlling them independantly makes no sense afaics.

> Yet a certain subsystem (for example an onboard camera, in a phone)
> might require a higher voltage when it's active, effectively loosening
> the tight coupling between freq and voltage that the porcessor is
> enforcing.

So... you expect userland to echo high > state before camera can be
used?

I'd rather have kernel automagically up the voltage when /dev/video0
is opened...
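Pavel's alternative has the kernel raise the voltage as a side effect of opening the device. That amounts to reference-counted constraints, taken in open() and dropped in release(). Below is a minimal userspace sketch. The level names and the `video_open`/`video_release` hooks are hypothetical, not an existing kernel interface.

```c
/* Hypothetical voltage levels for one rail. */
enum level { V_LOW, V_MED, V_HIGH, V_NR };

/* Outstanding requests per level; the rail must sit at the highest
 * level that still has at least one request against it. */
static unsigned int requests[V_NR];

static enum level rail_level(void)
{
	for (int l = V_NR - 1; l > V_LOW; l--)
		if (requests[l])
			return (enum level)l;
	return V_LOW;
}

/* What a camera driver's open() and release() might call. */
static void video_open(void)
{
	requests[V_HIGH]++;
}

static void video_release(void)
{
	if (requests[V_HIGH])
		requests[V_HIGH]--;
}
```

Counting requests, instead of setting a flag, keeps the voltage up until the last opener closes the device.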
							Pavel

-- 
Thanks for all the (sleeping) penguins.

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: So, what's the status on the recent patches here?
  2006-08-15 10:35           ` Amit Kucheria
  2006-08-15 19:04             ` Dave Jones
@ 2006-08-17 21:24             ` Pavel Machek
  1 sibling, 0 replies; 136+ messages in thread
From: Pavel Machek @ 2006-08-17 21:24 UTC (permalink / raw)
  To: Amit Kucheria; +Cc: linux-pm

Hi!

> 5. <Crystal Ball Gazing/Wishful Thinking>
> - Clock/Voltage FW <--> ACPI logical mappings allows us to use global
> state names in /sys/power/state for system power state transitions.

We probably do not want cpufreq-like controls in /sys/power/state.

-- 
Thanks for all the (sleeping) penguins.

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: So, what's the status on the recent patches here?
  2006-08-15 19:04             ` Dave Jones
  2006-08-16 12:58               ` Igor Stoppa
  2006-08-17  5:20               ` Matthew Locke
@ 2006-08-17  9:18               ` Amit Kucheria
  2006-08-17 21:40                 ` Pavel Machek
  2 siblings, 1 reply; 136+ messages in thread
From: Amit Kucheria @ 2006-08-17  9:18 UTC (permalink / raw)
  To: ext Dave Jones; +Cc: linux-pm

On Tue, 2006-08-15 at 15:04 -0400, ext Dave Jones wrote:
> 
>  > d. In the end, all this is leading to an interface for a user-space
>  > policy manager that will control _system_ power state based on
>  > constraints imposed by HW peripherals or on policies implemented by
>  > device manufacturer/distro maintainer.
> 
> How does that interface look from a userspace point of view ?
> Hopefully not anything like the tuple described above.
> Why would userspace ever care about "interconnect freq" ?
> 
> Userspace cares about "save power" or "go fast".
> Historically, I wish we had never exposed frequencies, but instead
> a performance percentage, so that the various userspace tools
> didn't have to care about things like 'what frequencies are
> available'.
> Adding the same mistake for voltages doesn't strike me as a fantastic
> idea. 

The userspace interface in Eugeny's patches is for other userspace
programs (policy managers) to activate/deactivate valid operating points
in the system dynamically and if necessary, introduce new ones into the
system. It will also allow the operating points to be referenced by name
instead of the tuple.

Then, we will be able to use names like 'video', 'mp3', 'fast',
'powersave', 'usb' to switch to the relevant operating point based on
configuration of the policy manager.
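Here is a sketch of the registry such an interface implies. Operating points are added, activated or deactivated, and looked up by name. A switch request for an inactive or unknown name is refused. This is not the interface from the patches mentioned above. The struct layout, sizes and function names are hypothetical, and the tuple is cut down to frequency and voltage.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical named operating point (tuple reduced to two fields). */
struct named_op {
	char name[16];
	unsigned int cpu_mhz;
	unsigned int mv;
	bool active;		/* policy manager can (de)activate points */
};

#define MAX_OPS 8
static struct named_op table[MAX_OPS];
static size_t nr_ops;

static struct named_op *op_find(const char *name)
{
	for (size_t i = 0; i < nr_ops; i++)
		if (strcmp(table[i].name, name) == 0)
			return &table[i];
	return NULL;
}

/* Introduce a new point; fails if full, name too long or taken. */
static bool op_add(const char *name, unsigned int mhz, unsigned int mv)
{
	if (nr_ops == MAX_OPS || strlen(name) >= sizeof(table[0].name) ||
	    op_find(name))
		return false;
	strcpy(table[nr_ops].name, name);
	table[nr_ops].cpu_mhz = mhz;
	table[nr_ops].mv = mv;
	table[nr_ops].active = true;
	nr_ops++;
	return true;
}

static bool op_set_active(const char *name, bool active)
{
	struct named_op *op = op_find(name);

	if (!op)
		return false;
	op->active = active;
	return true;
}

/* Lookup used when switching: inactive or unknown points are refused. */
static const struct named_op *op_select(const char *name)
{
	const struct named_op *op = op_find(name);

	return (op && op->active) ? op : NULL;
}
```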

Regards,
Amit

-- 
Amit Kucheria <amit.kucheria@nokia.com>
Nokia

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: So, what's the status on the recent patches here?
  2006-08-17  5:20               ` Matthew Locke
@ 2006-08-17  7:20                 ` Paul Mundt
  0 siblings, 0 replies; 136+ messages in thread
From: Paul Mundt @ 2006-08-17  7:20 UTC (permalink / raw)
  To: Matthew Locke; +Cc: linux-pm

On Wed, Aug 16, 2006 at 10:20:42PM -0700, Matthew Locke wrote:
> On Aug 15, 2006, at 12:04 PM, Dave Jones wrote:
> > If there are dependancies inherently linking core1 and core2, cpufreq
> > should already be programming both parts. For example, the SA1100
> > driver programs both CPU and SDRAM controller.  If there isn't any 
> > dependancy between them, I don't see the attraction of creating an
> > artificial one in the way suggested for no real purpose.
> 
> Are you arguing against the operating point concept because it creates 
> an artificial dependency?  I assume your definition of dependency means 
> a physical dependency.
> 
>   The operating point represents both a physical and operational 
> dependency.  It is a collection of parameters that can/will be adjusted 
> to reduce power consumption.  However, adjusting these parameters can 
> have a severe impact to performance and operational state of the 
> system.  The parameters can not be adjusted individually and still 
> achieve the goal of an operational and power efficient system.   SoC's 
> have a fixed number of values in a fixed number of combinations that 
> keep the system operational and power efficient.  Using power op,  a 
> piece of controlling software can tell the system to go to  specific 
> instance of the power parameters that provide the best combination of 
> power savings and performance/operational integrity according to the 
> current state of the system.  This instance is represented by a string.
> 
The core1 and core2 interdependencies are something that cpufreq doesn't
handle particularly well. If it's something as simple as recalibrating
your baud rate generator or adjusting the SDRAM controller, these are
all things that need to be done for normal operation to continue, not
things that end up being exposed or configurable, so it tends to largely
ignore the interdependency issue.

The problem occurs when you have independent cores that want to be
throttled or scaled independently, yet still have some fixed dependency
between them (say, enabling a synchronization circuit), where failure to
handle this will ultimately result in core reset or otherwise undefined
behaviour.

In order to handle this cleanly, one would need a separate driver for
each core, as well as some shared common code for handling the clocks,
voltage, and sanity checks. This alone already begins to enter the scope
of what things like PowerOP are reasonably suited for. The operating
point semantics work well here, as independent core states can be
trivially defined based on vendor-defined usage profiles. It's
necessary to have a big-picture view of the operating point in order to
handle sanity checks and rate validation sanely, which gets rather ugly
with cpufreq unless each core references common validation code (which
will also cause problems for code reuse in the cases where one of the
cores is used in another processor).

PowerOP seems well suited for these sorts of cases, and it's not clear
that trying to beat cpufreq into submission will offer any benefits in
this case, particularly since it's something x86 people largely don't
seem to care about, given that ACPI does most of the heavy lifting.
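Paul's case is two independently scaled cores with a fixed dependency between them. It can be sketched as one validation helper shared by both per-core entry points. The rule used here is invented for illustration: an integer frequency ratio runs synchronously, and anything else needs a synchronisation bridge. The 624 MHz limit and all names are likewise invented.

```c
#include <stdbool.h>

#define MAX_MHZ 624u	/* hypothetical limit for both cores */

/* Hypothetical clock state of two independently scaled cores. */
struct core_clocks {
	unsigned int core1_mhz;
	unsigned int core2_mhz;
	bool sync_bridge;	/* synchronisation circuit enabled? */
};

/* Shared validation used by both per-core entry points. Assumed rule:
 * an integer ratio between the cores runs synchronously, any other
 * ratio needs the bridge. Returns false if either rate is out of range. */
static bool validate_pair(unsigned int c1, unsigned int c2, bool *need_bridge)
{
	if (!c1 || !c2 || c1 > MAX_MHZ || c2 > MAX_MHZ)
		return false;
	*need_bridge = (c1 % c2 != 0) && (c2 % c1 != 0);
	return true;
}

/* Each core's driver changes only its own clock, but asks the shared
 * helper whether the pair is legal and whether the bridge is needed. */
static bool set_core1(struct core_clocks *s, unsigned int mhz)
{
	bool bridge;

	if (!validate_pair(mhz, s->core2_mhz, &bridge))
		return false;
	s->sync_bridge = bridge;
	s->core1_mhz = mhz;
	return true;
}

static bool set_core2(struct core_clocks *s, unsigned int mhz)
{
	bool bridge;

	if (!validate_pair(s->core1_mhz, mhz, &bridge))
		return false;
	s->sync_bridge = bridge;
	s->core2_mhz = mhz;
	return true;
}
```

Because the rule lives in `validate_pair`, a core reused in another processor would carry its driver unchanged and only swap the shared helper.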

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: So, what's the status on the recent patches here?
  2006-08-15 19:04             ` Dave Jones
  2006-08-16 12:58               ` Igor Stoppa
@ 2006-08-17  5:20               ` Matthew Locke
  2006-08-17  7:20                 ` Paul Mundt
  2006-08-17  9:18               ` Amit Kucheria
  2 siblings, 1 reply; 136+ messages in thread
From: Matthew Locke @ 2006-08-17  5:20 UTC (permalink / raw)
  To: Dave Jones; +Cc: linux-pm


On Aug 15, 2006, at 12:04 PM, Dave Jones wrote:

> On Tue, Aug 15, 2006 at 01:35:15PM +0300, Amit Kucheria wrote:
>
>> Here is my shot at providing a 10,000ft view:
>>
>> a. For embedded platforms, cpufreq just does not cut it when specifying
>> an operating point for the device. These platforms use tuples such as
>>
>>   <voltage, pll_freq, core1_freq, core2_freq, interconnect_freq>
>>
>> to signify an 'operating point'. Note that core1 and core2 can be
>> totally different processors e.g. ARM and DSP. x86 platforms hide this
>> complexity behind ACPI.
>
> If there are dependencies inherently linking core1 and core2, cpufreq
> should already be programming both parts. For example, the SA1100
> driver programs both CPU and SDRAM controller.  If there isn't any dependency
> between them, I don't see the attraction of creating an artificial one
> in the way suggested for no real purpose.

Are you arguing against the operating point concept because it creates 
an artificial dependency?  I assume your definition of dependency means 
a physical dependency.

The operating point represents both a physical and an operational
dependency.  It is a collection of parameters that can/will be adjusted
to reduce power consumption.  However, adjusting these parameters can
have a severe impact on the performance and operational state of the
system.  The parameters cannot be adjusted individually and still
achieve the goal of an operational and power-efficient system.  SoCs
have a fixed number of values in a fixed number of combinations that
keep the system operational and power efficient.  Using PowerOP, a
piece of controlling software can tell the system to go to a specific
instance of the power parameters that provides the best combination of
power savings and performance/operational integrity according to the
current state of the system.  This instance is represented by a string.
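A minimal sketch of "an instance of the power parameters represented by a string", assuming a hypothetical platform table whose fields mirror the <voltage, pll_freq, core1_freq, core2_freq, interconnect_freq> tuple quoted above. The names, values, and the op_lookup() helper are illustrative only and are not PowerOP's actual API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical operating point; all values below are made up. */
struct op_point {
    const char *name;
    unsigned int voltage_mv;
    unsigned long pll_khz;
    unsigned long core1_khz;        /* e.g. ARM */
    unsigned long core2_khz;        /* e.g. DSP */
    unsigned long interconnect_khz;
};

/* The platform exposes only a fixed set of validated combinations. */
static const struct op_point platform_ops[] = {
    { "low",   1050, 192000,  96000,  48000,  48000 },
    { "media", 1200, 384000, 192000, 192000,  96000 },
    { "high",  1350, 528000, 528000, 264000, 132000 },
};

/* Look up a point by name. NULL means "no such operating point", so a
 * caller can never request an arbitrary, unvalidated tuple. */
const struct op_point *op_lookup(const char *name)
{
    for (size_t i = 0; i < sizeof(platform_ops) / sizeof(platform_ops[0]); i++)
        if (strcmp(platform_ops[i].name, name) == 0)
            return &platform_ops[i];
    return NULL;
}
```

The controlling software only ever passes a name such as "media"; the voltages and frequencies behind it stay a platform detail.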

PowerOP is needed to do advanced power management on embedded mobile 
devices.

>
> Things like voltage and frequency are closely tied together, so offering
> any means of controlling them independently makes no sense afaics.

It's not about controlling parameters independently.  We need to be 
able to control them as described above.

>
>> b. The clocking and voltage dependency tree in embedded devices can be
>> summarised in the clock framework ( find ./arch -name clock* ) and
>> soon-to-be-available voltage framework. This is again done to a large
>> extent by ACPI on x86.
>
> Of the 14 x86 cpufreq drivers, 3 of them _optionally_ use ACPI.
> powernow-k7 for example doesn't use it, and is possibly one of the most
> stable cpufreq drivers we've had in the tree (for x86 at least).
>
>> d. In the end, all this is leading to an interface for a user-space
>> policy manager that will control _system_ power state based on
>> constraints imposed by HW peripherals or on policies implemented by
>> device manufacturer/distro maintainer.
>
> How does that interface look from a userspace point of view ?
> Hopefully not anything like the tuple described above.
> Why would userspace ever care about "interconnect freq" ?
>
> Userspace cares about "save power" or "go fast".
> Historically, I wish we had never exposed frequencies, but instead
> a performance percentage, so that the various userspace tools
> didn't have to care about things like 'what frequencies are available'.
> Adding the same mistake for voltages doesn't strike me as a fantastic idea.

I'm not sure I follow your comments here.  We are not making the same 
mistake.  In fact we are fixing it with PowerOP.  The power parameters 
are represented by a name and you create whatever name makes sense for 
your system.  In fact the names can all be the same for the various x86 
platforms if you so desire.  The abstraction allows userspace to use 
the name and not know anything about the frequencies or voltages.  As 
Scott pointed out, some power managers will need to know lots of
architecture and board specific details to be able to reduce power 
consumption and keep the system operational.  The abstraction enables 
this as well.

>
>> At this point, PowerOP is an optional component in mainline tree.
>> Cpufreq drivers can _choose_ to use it or not. But now embedded
>> platforms can do the PM dance in a consistent way.
>
> That's about the only part I really like so far. The option to opt-out
> where it makes absolutely no sense to pointlessly abstract stuff
> (which for x86 seems to be the case).  For ARM, I'm going to leave
> Russell to comment/review.

I'm not following why you think PowerOP isn't needed for x86.  It seems 
to address the issues with cpufreq that you point out above.  The 
conclusion we reached at the PM summit was that cpufreq/PowerOP 
integration was useful and desired.

If we need to, I'm happy to put the integration of cpufreq/PowerOP 
aside and just work on getting PowerOP accepted.

>
> 		Dave
>
> -- 
> http://www.codemonkey.org.uk

* Re: So, what's the status on the recent patches here?
  2006-08-16  1:27 Scott E. Preece
@ 2006-08-16 15:25 ` Mark Gross
  0 siblings, 0 replies; 136+ messages in thread
From: Mark Gross @ 2006-08-16 15:25 UTC (permalink / raw)
  To: Scott E. Preece; +Cc: linux-pm

On Tue, Aug 15, 2006 at 08:27:49PM -0500, Scott E. Preece wrote:
> 
> 
> | From: Dave Jones<davej@redhat.com>
> | 
> | On Tue, Aug 15, 2006 at 01:35:15PM +0300, Amit Kucheria wrote:
> | 
> |  > d. In the end, all this is leading to an interface for a user-space
> |  > policy manager that will control _system_ power state based on
> |  > constraints imposed by HW peripherals or on policies implemented by
> |  > device manufacturer/distro maintainer.
> | 
> | How does that interface look from a userspace point of view ?
> | Hopefully not anything like the tuple described above.
> | Why would userspace ever care about "interconnect freq" ?
> | 
> | Userspace cares about "save power" or "go fast".
> | Historically, I wish we had never exposed frequencies, but instead
> | a performance percentage, so that the various userspace tools
> | didn't have to care about things like 'what frequencies are available'.
> | Adding the same mistake for voltages doesn't strike me as a fantastic idea.
> ---
> 
> For us, "userspace" means a power policy manager that potentially has a
> lot of awareness about the power needs of specific applications and the
> overall use cases driving the device. There is no interface available or
> visible to a "user".  The policy manager does want to know about
> specific frequencies and voltages and their interaction, because they
> determine the circumstances under which it makes sense to make
> particular transitions.
> 
> As I think I mentioned at the PM Summit in April, it's important to
> recognize that the power and performance implications of operating
> points are not simply based on frequency. Sometimes you want to shift
> "sideways", because changing one parameter may be preferable to changing
> another. 

Yes, over time it will become unrealistic, for most architectures, to
assume in a power management implementation that voltage is a
one-to-one function of frequency.  Additionally, control of CPU power
consumption will become only a fraction of the runtime platform PM
story.
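As a rough illustration of why voltage deserves to be treated as a knob in its own right, the usual first-order approximation for CMOS dynamic power (leakage ignored; $\alpha$ is the activity factor, $C$ the switched capacitance) is shown below. The 1.3 V and 1.1 V figures are made up purely for the arithmetic.

```latex
P_{\mathrm{dyn}} \approx \alpha\, C\, V^{2} f
\qquad\Rightarrow\qquad
\left.\frac{P_{\mathrm{dyn}}(V_2)}{P_{\mathrm{dyn}}(V_1)}\right|_{f\ \text{fixed}}
= \left(\frac{V_2}{V_1}\right)^{2}
= \left(\frac{1.1\,\mathrm{V}}{1.3\,\mathrm{V}}\right)^{2} \approx 0.72
```

Under this approximation, two operating points at the same frequency can differ by roughly 28% in dynamic power through voltage alone, which is one way a "sideways" transition can pay off.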

What this work is trying to do is take some initial steps toward
enabling more global power load control, by adding the infrastructure
needed to expose to the system the kinds of platform knobs required for
further power savings.

The target is to extend cpufreq-style power load control to multiple
platform components.  Plugging a PowerOP interface in under cpufreq is
one way to try to get this without breaking existing work.

I don't know if it's ready for the -mm tree yet, but it should at least
build for i386 or x86_64, even if today there is no obvious value in
non-ACPI platform PM throttling for those architectures.

It is true that the embedded folks will be the early adopters of this
type of thing, but the big iron folks will not be far behind, and
eventually the desktop and laptop crowd will likely follow.

> 
> Note that we also want to be able to run the same code on a range of
> devices that may have significantly different hardware performance, so
> an abstract set of names (fastest to slowest, for instance) is also a
> problem.  

The problem of what to expose to user space will vary from platform to
platform and use case to use case.  I don't think we'll find a
one-size-fits-all solution to this issue.

--mgross

* Re: So, what's the status on the recent patches here?
  2006-08-15 19:04             ` Dave Jones
@ 2006-08-16 12:58               ` Igor Stoppa
  2006-08-17 21:39                 ` Pavel Machek
  2006-08-17  5:20               ` Matthew Locke
  2006-08-17  9:18               ` Amit Kucheria
  2 siblings, 1 reply; 136+ messages in thread
From: Igor Stoppa @ 2006-08-16 12:58 UTC (permalink / raw)
  To: ext Dave Jones; +Cc: linux-pm, Kucheria Amit (Nokia-M/Tampere)

On Tue, 2006-08-15 at 22:04 +0300, ext Dave Jones wrote:
> On Tue, Aug 15, 2006 at 01:35:15PM +0300, Amit Kucheria wrote:
> 
>  > Here is my shot at providing a 10,000ft view:
>  >
>  > a. For embedded platforms, cpufreq just does not cut it when specifying
>  > an operating point for the device. These platforms use tuples such as
>  >
>  >   <voltage, pll_freq, core1_freq, core2_freq, interconnect_freq>
>  >
>  > to signify an 'operating point'. Note that core1 and core2 can be
>  > totally different processors e.g. ARM and DSP. x86 platforms hide this
>  > complexity behind ACPI.
> 
> If there are dependencies inherently linking core1 and core2, cpufreq
> should already be programming both parts. For example, the SA1100
> driver programs both CPU and SDRAM controller.  If there isn't any dependency
> between them, I don't see the attraction of creating an artificial one
> in the way suggested for no real purpose.
> 
> Things like voltage and frequency are closely tied together, so offering
> any means of controlling them independently makes no sense afaics.
Yet a certain subsystem (for example, an onboard camera in a phone)
might require a higher voltage when it's active, effectively loosening
the tight coupling between freq and voltage that the processor is
enforcing.
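One way to picture that loosened coupling is a shared rail whose voltage is the maximum of what the CPU needs for its current frequency and any floor requested by an active subsystem. This is a sketch under stated assumptions: the frequency-to-voltage table, the idea that the camera sits on the same rail, and the function names are all hypothetical.

```c
#include <stddef.h>

/* Hypothetical CPU table: minimum voltage for each frequency ceiling. */
struct freq_volt { unsigned long khz; unsigned int min_mv; };

static const struct freq_volt cpu_table[] = {
    { 200000, 1000 },
    { 400000, 1150 },
    { 600000, 1300 },
};

/* Minimum voltage the CPU alone needs for `khz`; 0 if unsupported. */
static unsigned int cpu_min_mv(unsigned long khz)
{
    for (size_t i = 0; i < sizeof(cpu_table) / sizeof(cpu_table[0]); i++)
        if (khz <= cpu_table[i].khz)
            return cpu_table[i].min_mv;
    return 0;
}

/* Rail voltage = max(CPU requirement, floors requested by active
 * subsystems assumed to share the rail, e.g. a camera). */
unsigned int rail_mv(unsigned long cpu_khz, const unsigned int *floors, size_t n)
{
    unsigned int mv = cpu_min_mv(cpu_khz);
    if (mv == 0)
        return 0;  /* frequency not covered by the table */
    for (size_t i = 0; i < n; i++)
        if (floors[i] > mv)
            mv = floors[i];
    return mv;
}
```

With the camera active, a low CPU frequency no longer implies a low voltage, which is exactly the case a pure frequency-driven interface cannot express.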
> 
>  > b. The clocking and voltage dependency tree in embedded devices can be
>  > summarised in the clock framework ( find ./arch -name clock* ) and
>  > soon-to-be-available voltage framework. This is again done to a large
>  > extent by ACPI on x86.
> 
> Of the 14 x86 cpufreq drivers, 3 of them _optionally_ use ACPI.
> powernow-k7 for example doesn't use it, and is possibly one of the most
> stable cpufreq drivers we've had in the tree (for x86 at least).
> 
>  > d. In the end, all this is leading to an interface for a user-space
>  > policy manager that will control _system_ power state based on
>  > constraints imposed by HW peripherals or on policies implemented by
>  > device manufacturer/distro maintainer.
> 
> How does that interface look from a userspace point of view ?
> Hopefully not anything like the tuple described above.
> Why would userspace ever care about "interconnect freq" ?
> 
> Userspace cares about "save power" or "go fast".
> Historically, I wish we had never exposed frequencies, but instead
> a performance percentage, so that the various userspace tools
> didn't have to care about things like 'what frequencies are available'.
> Adding the same mistake for voltages doesn't strike me as a fantastic idea.

Such generic definitions are not enough for embedded userspace; the
complexity of the tuning is expected and accepted as long as it allows
the HW performance to be fully leveraged.
> 
>  > At this point, PowerOP is an optional component in mainline tree.
>  > Cpufreq drivers can _choose_ to use it or not. But now embedded
>  > platforms can do the PM dance in a consistent way.
> 
> That's about the only part I really like so far. The option to opt-out
> where it makes absolutely no sense to pointlessly abstract stuff
> (which for x86 seems to be the case).  For ARM, I'm going to leave
> Russell to comment/review.
> 
>                 Dave
> 
> --
> http://www.codemonkey.org.uk
> 
-- 
Cheers,
           Igor

Igor Stoppa (Nokia M - OSSO / Tampere)

* Re: So, what's the status on the recent patches here?
@ 2006-08-16  1:27 Scott E. Preece
  2006-08-16 15:25 ` Mark Gross
  0 siblings, 1 reply; 136+ messages in thread
From: Scott E. Preece @ 2006-08-16  1:27 UTC (permalink / raw)
  To: davej; +Cc: linux-pm



| From: Dave Jones<davej@redhat.com>
| 
| On Tue, Aug 15, 2006 at 01:35:15PM +0300, Amit Kucheria wrote:
| 
|  > d. In the end, all this is leading to an interface for a user-space
|  > policy manager that will control _system_ power state based on
|  > constraints imposed by HW peripherals or on policies implemented by
|  > device manufacturer/distro maintainer.
| 
| How does that interface look from a userspace point of view ?
| Hopefully not anything like the tuple described above.
| Why would userspace ever care about "interconnect freq" ?
| 
| Userspace cares about "save power" or "go fast".
| Historically, I wish we had never exposed frequencies, but instead
| a performance percentage, so that the various userspace tools
| didn't have to care about things like 'what frequencies are available'.
| Adding the same mistake for voltages doesn't strike me as a fantastic idea.
---

For us, "userspace" means a power policy manager that potentially has a
lot of awareness about the power needs of specific applications and the
overall use cases driving the device. There is no interface available or
visible to a "user".  The policy manager does want to know about
specific frequencies and voltages and their interaction, because they
determine the circumstances under which it makes sense to make
particular transitions.

As I think I mentioned at the PM Summit in April, it's important to
recognize that the power and performance implications of operating
points are not simply based on frequency. Sometimes you want to shift
"sideways", because changing one parameter may be preferable to changing
another. 

Note that we also want to be able to run the same code on a range of
devices that may have significantly different hardware performance, so
an abstract set of names (fastest to slowest, for instance) is also a
problem.  

scott
-- 
scott preece
motorola mobile devices, il67, 1800 s. oak st., champaign, il  61820  
e-mail:	preece@motorola.com	fax:	+1-217-384-8550
phone:	+1-217-384-8589	cell: +1-217-433-6114	pager: 2174336114@vtext.com

* Re: So, what's the status on the recent patches here?
  2006-08-15 10:35           ` Amit Kucheria
@ 2006-08-15 19:04             ` Dave Jones
  2006-08-16 12:58               ` Igor Stoppa
                                 ` (2 more replies)
  2006-08-17 21:24             ` Pavel Machek
  1 sibling, 3 replies; 136+ messages in thread
From: Dave Jones @ 2006-08-15 19:04 UTC (permalink / raw)
  To: Amit Kucheria; +Cc: linux-pm

On Tue, Aug 15, 2006 at 01:35:15PM +0300, Amit Kucheria wrote:

 > Here is my shot at providing a 10,000ft view:
 > 
 > a. For embedded platforms, cpufreq just does not cut it when specifying
 > an operating point for the device. These platforms use tuples such as
 > 
 >   <voltage, pll_freq, core1_freq, core2_freq, interconnect_freq> 
 > 
 > to signify an 'operating point'. Note that core1 and core2 can be
 > totally different processors e.g. ARM and DSP. x86 platforms hide this
 > complexity behind ACPI.

If there are dependencies inherently linking core1 and core2, cpufreq
should already be programming both parts. For example, the SA1100
driver programs both CPU and SDRAM controller.  If there isn't any dependency
between them, I don't see the attraction of creating an artificial one
in the way suggested for no real purpose.

Things like voltage and frequency are closely tied together, so offering
any means of controlling them independently makes no sense afaics.
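Inside a single driver, that tight coupling usually shows up as the ordering of a transition: raise the voltage before speeding up, and slow down before lowering the voltage, so the core never runs faster than its current voltage allows. The sketch below assumes a simple voltage/frequency pair; the step names and the trace structure are hypothetical stand-ins for real regulator and PLL programming.

```c
#include <stddef.h>

/* Hypothetical hardware steps. A real driver would program a regulator
 * and a PLL; this sketch only records the order of the steps. */
enum dvfs_step { SET_VOLT, SET_FREQ };

struct dvfs_state { unsigned int mv; unsigned long khz; };

struct dvfs_trace { enum dvfs_step steps[2]; size_t n; };

static void record(struct dvfs_trace *t, enum dvfs_step s)
{
    t->steps[t->n++] = s;
}

/* Move from *cur to *next without ever running a frequency that the
 * present voltage cannot sustain. */
void dvfs_transition(struct dvfs_state *cur, const struct dvfs_state *next,
                     struct dvfs_trace *t)
{
    t->n = 0;
    if (next->khz > cur->khz) {
        record(t, SET_VOLT);   /* speeding up: voltage first */
        cur->mv = next->mv;
        record(t, SET_FREQ);
        cur->khz = next->khz;
    } else {
        record(t, SET_FREQ);   /* slowing down: frequency first */
        cur->khz = next->khz;
        record(t, SET_VOLT);
        cur->mv = next->mv;
    }
}
```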

 > b. The clocking and voltage dependency tree in embedded devices can be
 > summarised in the clock framework ( find ./arch -name clock* ) and
 > soon-to-be-available voltage framework. This is again done to a large
 > extent by ACPI on x86.

Of the 14 x86 cpufreq drivers, 3 of them _optionally_ use ACPI.
powernow-k7 for example doesn't use it, and is possibly one of the most
stable cpufreq drivers we've had in the tree (for x86 at least).

 > d. In the end, all this is leading to an interface for a user-space
 > policy manager that will control _system_ power state based on
 > constraints imposed by HW peripherals or on policies implemented by
 > device manufacturer/distro maintainer.

How does that interface look from a userspace point of view ?
Hopefully not anything like the tuple described above.
Why would userspace ever care about "interconnect freq" ?

Userspace cares about "save power" or "go fast".
Historically, I wish we had never exposed frequencies, but instead
a performance percentage, so that the various userspace tools
didn't have to care about things like 'what frequencies are available'.
Adding the same mistake for voltages doesn't strike me as a fantastic idea.
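The "performance percentage" idea can be sketched as a kernel-side mapping, so userspace never needs to see the frequency table. This is a hypothetical helper, not an existing cpufreq interface; it assumes the available frequencies are sorted in ascending order and picks the slowest one that reaches the requested share of the min..max range.

```c
#include <stddef.h>

/* Map a 0..100 performance percentage onto a sorted (ascending) list of
 * available frequencies. Values above 100 are clamped. Returns 0 if the
 * list is empty. */
unsigned long pct_to_khz(unsigned int pct, const unsigned long *freqs, size_t n)
{
    if (n == 0)
        return 0;
    if (pct > 100)
        pct = 100;
    unsigned long lo = freqs[0], hi = freqs[n - 1];
    unsigned long target = lo + (hi - lo) * pct / 100;
    /* slowest available frequency that meets the target */
    for (size_t i = 0; i < n; i++)
        if (freqs[i] >= target)
            return freqs[i];
    return hi;
}
```

For example, with frequencies of 600, 800, 1000 and 1600 MHz, a 30% request yields a 900 MHz target and therefore selects 1000 MHz; userspace only ever deals in the percentage.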

 > At this point, PowerOP is an optional component in mainline tree.
 > Cpufreq drivers can _choose_ to use it or not. But now embedded
 > platforms can do the PM dance in a consistent way.

That's about the only part I really like so far. The option to opt-out
where it makes absolutely no sense to pointlessly abstract stuff
(which for x86 seems to be the case).  For ARM, I'm going to leave
Russell to comment/review.

		Dave

-- 
http://www.codemonkey.org.uk

* Re: So, what's the status on the recent patches here?
  2006-08-15  1:00         ` Greg KH
  2006-08-15  3:03           ` Dave Jones
@ 2006-08-15 10:35           ` Amit Kucheria
  2006-08-15 19:04             ` Dave Jones
  2006-08-17 21:24             ` Pavel Machek
  2006-08-19  6:10           ` David Singleton
  2006-08-19  6:19           ` David Singleton
  3 siblings, 2 replies; 136+ messages in thread
From: Amit Kucheria @ 2006-08-15 10:35 UTC (permalink / raw)
  To: ext Greg KH; +Cc: linux-pm

On Mon, 2006-08-14 at 18:00 -0700, ext Greg KH wrote:
> On Mon, Aug 14, 2006 at 07:48:01PM -0400, Dave Jones wrote:
> > 
> > This adds a whole bunch of new code, and doesn't seem to make any
> > existing code any simpler (to me at least).  From a cpufreq point of view,
> > what does adding this buy us? What problem do we have today that is
> > being solved by all this?
> > 
> > Every explanation of powerop I've seen so far dives into microdetails,
> > whilst the 10,000ft view has always passed me by other than "this is
> > what we've had in the embedded world".
> > 
> > The diagram at http://lists.osdl.org/pipermail/linux-pm/2006-August/003196.html
> > also confuses me.  I was under the impression that powerop was adding additional
> > userspace interfaces.  If we're not changing how things from a userspace
> > point of view, we're churning a lot of kernel code,.. why?
> > 
> > Clue me in here, I'm feeling thick.
> 
> You're not alone, I really don't get it either.
> 
> But I guess we'll just wait for the next round of unified patches and
> then go from there.

Here is my shot at providing a 10,000ft view:

a. For embedded platforms, cpufreq just does not cut it when specifying
an operating point for the device. These platforms use tuples such as

  <voltage, pll_freq, core1_freq, core2_freq, interconnect_freq> 

to signify an 'operating point'. Note that core1 and core2 can be
totally different processors e.g. ARM and DSP. x86 platforms hide this
complexity behind ACPI.

b. The clocking and voltage dependency tree in embedded devices can be
summarised in the clock framework ( find ./arch -name clock* ) and
soon-to-be-available voltage framework. This is again done to a large
extent by ACPI on x86.

c. PowerOP provides an interface that, most embedded developers agree,
is a good starting point to encapsulate platform information in a
consistent way without having to resort to subarch-specific kludges.

d. In the end, all this is leading to an interface for a user-space
policy manager that will control _system_ power state based on
constraints imposed by HW peripherals or on policies implemented by
device manufacturer/distro maintainer.
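Item d can be illustrated by the selection step such a policy manager might perform: given constraints collected from peripherals and from the integrator's policy, pick the lowest-power operating point that satisfies all of them. This is a minimal sketch; the table (assumed to be ordered from lowest to highest power), the two kinds of constraint, and all names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical operating points, listed from lowest to highest power. */
struct op { const char *name; unsigned long cpu_khz; unsigned long bus_khz; };

/* Constraints gathered from drivers (e.g. a peripheral needing a minimum
 * bus rate) and from the integrator's policy (a minimum CPU rate). */
struct constraints { unsigned long min_cpu_khz; unsigned long min_bus_khz; };

/* Return the lowest-power point meeting every constraint, or NULL. */
const struct op *pick_op(const struct op *ops, size_t n, const struct constraints *c)
{
    for (size_t i = 0; i < n; i++)
        if (ops[i].cpu_khz >= c->min_cpu_khz && ops[i].bus_khz >= c->min_bus_khz)
            return &ops[i];
    return NULL;
}
```

Keeping the table ordered by power makes the first match the cheapest acceptable point; a real manager would also have to decide what to do when no point satisfies the constraints.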

In conclusion, PowerOP only allows embedded platforms to join the PM fun
without adversely affecting cpufreq-supported platforms. Or at least that
is the original idea. If that rule is being violated, please feel free to
point that out.

The way forward as I see it is:

1. PowerOP integration into mainline [patch 1/3]
    - PM Core drivers (OMAP, x86) [patch 2/3]
    - cpufreq drivers modified to use PowerOP (OMAP, x86) [patch 3/3]

At this point, PowerOP is an optional component in mainline tree.
Cpufreq drivers can _choose_ to use it or not. But now embedded
platforms can do the PM dance in a consistent way.

2. Support for more architectures - PPC, x86_64, XScale, MIPS? Platform
expertise needed here.

3. Move clock/voltage framework from arch/arm to kernel/clock and
kernel/voltage. This would allow more embedded platforms and potentially
future PC platforms to utilise the framework for 'automated'
clock/voltage dependency management.

4. Userspace policy managers are created to provide basic control of
system operating points. Embedded system integrators might want to
extend these policy managers to fully utilise the 'knobs' available on
their platforms.

5. <Crystal Ball Gazing/Wishful Thinking>
- Clock/Voltage FW <--> ACPI logical mappings allow us to use global
state names in /sys/power/state for system power state transitions.
- Drivers on PC platforms are fixed to use/unuse resources dynamically
to allow asynchronous peripheral power state transitions. Embedded
platforms use clock framework for this.
- PowerOP can address needs of all platforms, allowing removal of
cpufreq.
   </CBG/WT>
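Item 3 refers to 'automated' clock dependency management. The core of that idea can be sketched as a use-counted clock tree: enabling a leaf clock enables its parent chain, and the last user turning a clock off releases its parent. The structure and function names below are hypothetical, written in the spirit of the per-platform ARM clock frameworks rather than copied from them.

```c
#include <stddef.h>

/* Minimal model of a clock tree node with a use count. */
struct clk_node {
    const char *name;
    struct clk_node *parent;
    unsigned int usecount;
    int running;  /* 1 if the hardware gate is open */
};

/* The first user of a clock enables its parent chain before the clock
 * itself, so dependencies are handled automatically. */
void clk_node_enable(struct clk_node *c)
{
    if (c->usecount++ == 0) {
        if (c->parent)
            clk_node_enable(c->parent);
        c->running = 1;
    }
}

/* The last user turning a clock off also releases its parent. */
void clk_node_disable(struct clk_node *c)
{
    if (c->usecount == 0)
        return;  /* unbalanced disable; ignored in this sketch */
    if (--c->usecount == 0) {
        c->running = 0;
        if (c->parent)
            clk_node_disable(c->parent);
    }
}
```

For example, with two peripheral clocks hanging off one PLL, the PLL stays on until both peripherals have disabled their clocks, without either driver knowing about the other.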

Regards,
Amit
-- 
Amit Kucheria <amit.kucheria@nokia.com>
Nokia

* Re: So, what's the status on the recent patches here?
  2006-08-15  1:00         ` Greg KH
@ 2006-08-15  3:03           ` Dave Jones
  2006-08-15 10:35           ` Amit Kucheria
                             ` (2 subsequent siblings)
  3 siblings, 0 replies; 136+ messages in thread
From: Dave Jones @ 2006-08-15  3:03 UTC (permalink / raw)
  To: Greg KH; +Cc: linux-pm

On Mon, Aug 14, 2006 at 06:00:20PM -0700, Greg Kroah-Hartman wrote:
 
 > > This adds a whole bunch of new code, and doesn't seem to make any
 > > existing code any simpler (to me at least).  From a cpufreq point of view,
 > > what does adding this buy us? What problem do we have today that is
 > > being solved by all this?
 > > 
 > > Every explanation of powerop I've seen so far dives into microdetails,
 > > whilst the 10,000ft view has always passed me by other than "this is
 > > what we've had in the embedded world".
 > > 
 > > The diagram at http://lists.osdl.org/pipermail/linux-pm/2006-August/003196.html
 > > also confuses me.  I was under the impression that powerop was adding additional
 > > userspace interfaces.  If we're not changing how things from a userspace
 > > point of view, we're churning a lot of kernel code,.. why?
 > > 
 > > Clue me in here, I'm feeling thick.
 > 
 > You're not alone, I really don't get it either.
 > 
 > But I guess we'll just wait for the next round of unified patches and
 > then go from there.

I have concerns over this because the cpufreq code has gotten pretty damned
complicated in parts, and it's really impacting our ability to fix bugs in
the thing.  Every time something new falls out we have to play archaeologist
looking up a lot of ancient changes to figure out why we did x in y way, and why
z didn't work, and it's getting quite unfun.  In a lot of cases even the
original authors of the problematic parts can't remember their reasoning.

I've got a fairly good handle on most parts, but things like the recent
cpufreq vs hotplug-cpu fiasco (which went in via some other route rather than the
cpufreq tree) really threw me a curve-ball, and no-one other than Linus, Andrew
and myself stepped up to the plate to fix that mess, despite there being
the better part of a half dozen people who had hacked on it during its integration.

		Dave

-- 
http://www.codemonkey.org.uk

* Re: So, what's the status on the recent patches here?
  2006-08-14 23:48       ` Dave Jones
@ 2006-08-15  1:00         ` Greg KH
  2006-08-15  3:03           ` Dave Jones
                             ` (3 more replies)
  0 siblings, 4 replies; 136+ messages in thread
From: Greg KH @ 2006-08-15  1:00 UTC (permalink / raw)
  To: Dave Jones; +Cc: linux-pm

On Mon, Aug 14, 2006 at 07:48:01PM -0400, Dave Jones wrote:
> 
> This adds a whole bunch of new code, and doesn't seem to make any
> existing code any simpler (to me at least).  From a cpufreq point of view,
> what does adding this buy us? What problem do we have today that is
> being solved by all this?
> 
> Every explanation of powerop I've seen so far dives into microdetails,
> whilst the 10,000ft view has always passed me by other than "this is
> what we've had in the embedded world".
> 
> The diagram at http://lists.osdl.org/pipermail/linux-pm/2006-August/003196.html
> also confuses me.  I was under the impression that powerop was adding additional
> userspace interfaces.  If we're not changing how things from a userspace
> point of view, we're churning a lot of kernel code,.. why?
> 
> Clue me in here, I'm feeling thick.

You're not alone, I really don't get it either.

But I guess we'll just wait for the next round of unified patches and
then go from there.

thanks,

greg k-h

* Re: So, what's the status on the recent patches here?
  2006-08-14 23:29   ` Dominik Brodowski
@ 2006-08-14 23:48     ` Matthew Locke
  0 siblings, 0 replies; 136+ messages in thread
From: Matthew Locke @ 2006-08-14 23:48 UTC (permalink / raw)
  To: Dominik Brodowski; +Cc: linux-pm

Dominik,

On Aug 14, 2006, at 4:29 PM, Dominik Brodowski wrote:

> Hi,
>
> On Mon, Aug 14, 2006 at 03:24:19PM -0700, Matthew Locke wrote:
>> I am a little concerned that none of the cpufreq developers have
>> responded. I was hoping to get their feedback.
>
> Graduating from one law school, moving to the US, and adapting to another law
> school proved to be quite time-consuming for me, but I hope to get back to
> linux-related things within this and the next week -- so please excuse my
> delay so far...

I hope the transition is happening smoothly.  Looking forward to your 
comments when you have time.

>
> Thanks,
> 	Dominik
>

* Re: So, what's the status on the recent patches here?
  2006-08-14 23:24     ` Matthew Locke
@ 2006-08-14 23:48       ` Dave Jones
  2006-08-15  1:00         ` Greg KH
  0 siblings, 1 reply; 136+ messages in thread
From: Dave Jones @ 2006-08-14 23:48 UTC (permalink / raw)
  To: Matthew Locke; +Cc: linux-pm

On Mon, Aug 14, 2006 at 04:24:33PM -0700, Matthew Locke wrote:

 > > If we're arriving any closer to consensus on what's mergeable from the
 > > cpufreq side, and what needs more input, I'll find the time to review
 > > soon, but there still seems to be ongoing discussion which is why I
 > > decided to leave it to sort itself out :)
 > 
 > I think we are at the stage of needing more input on the last set of 
 > Eugeny's patches. (the ones I point to in my email)  The cpufreq 
 > patches, so far, are more for example.  We need a bit of work before 
 > they are ready for merging.  However, I would prefer to have your 
 > feedback now rather than later.

I gave them a quick lookover, and there are the to-be-expected minor
nits, but there's something more fundamental that I'm still not getting.

This adds a whole bunch of new code, and doesn't seem to make any
existing code any simpler (to me at least).  From a cpufreq point of view,
what does adding this buy us? What problem do we have today that is
being solved by all this?

Every explanation of powerop I've seen so far dives into microdetails,
whilst the 10,000ft view has always passed me by other than "this is
what we've had in the embedded world".

The diagram at http://lists.osdl.org/pipermail/linux-pm/2006-August/003196.html
also confuses me.  I was under the impression that powerop was adding additional
userspace interfaces.  If we're not changing how things from a userspace
point of view, we're churning a lot of kernel code,.. why?

Clue me in here, I'm feeling thick.

		Dave

-- 
http://www.codemonkey.org.uk

* Re: So, what's the status on the recent patches here?
  2006-08-14 22:24 ` Matthew Locke
  2006-08-14 22:46   ` Dave Jones
@ 2006-08-14 23:29   ` Dominik Brodowski
  2006-08-14 23:48     ` Matthew Locke
  1 sibling, 1 reply; 136+ messages in thread
From: Dominik Brodowski @ 2006-08-14 23:29 UTC (permalink / raw)
  To: Matthew Locke; +Cc: linux-pm

Hi,

On Mon, Aug 14, 2006 at 03:24:19PM -0700, Matthew Locke wrote:
> I am a little concerned that none of the cpufreq developers have 
> responded. I was hoping to get their feedback.

Graduating from one law school, moving to the US, and adapting to another law
school proved to be quite time-consuming for me, but I hope to get back to
linux-related things within this and the next week -- so please excuse my
delay so far...

Thanks,
	Dominik

* Re: So, what's the status on the recent patches here?
  2006-08-14 22:46   ` Dave Jones
@ 2006-08-14 23:24     ` Matthew Locke
  2006-08-14 23:48       ` Dave Jones
  0 siblings, 1 reply; 136+ messages in thread
From: Matthew Locke @ 2006-08-14 23:24 UTC (permalink / raw)
  To: Dave Jones; +Cc: linux-pm


On Aug 14, 2006, at 3:46 PM, Dave Jones wrote:

> On Mon, Aug 14, 2006 at 03:24:19PM -0700, Matthew Locke wrote:
>
>> I am a little concerned that none of the cpufreq developers have
>> responded. I was hoping to get their feedback.
>
> I was waiting for the dust to settle before spending a significant
> amount of time reviewing.  I have to admit, the two patchsets thing
> did confuse me too. (Though I've also been swamped with bugs since
> I got back from OLS, so I've appreciated the breathing room :)

Yeah,  I understand that.  I'm still catching up as well.

>
> If we're arriving any closer to consensus on what's mergeable from the
> cpufreq side, and what needs more input, I'll find the time to review
> soon, but there still seems to be ongoing discussion which is why I
> decided to leave it to sort itself out :)

I think we are at the stage of needing more input on the last set of 
Eugeny's patches. (the ones I point to in my email)  The cpufreq 
patches, so far, are more for example.  We need a bit of work before 
they are ready for merging.  However, I would prefer to have your 
feedback now rather than later.

>
>>> (If you can't tell I'm getting a bit annoyed at having to tell people
>>> all the time that yes, power management on Linux is bad, and yes, people
>>> are working on it, but no, I have no idea when it will ever see the
>>> light of day...)
>>
>> Well, we are working on it.
>
> Sadly powerop is but a tiny piece of the puzzle.

Cheer up, guys.  Power management will get better one piece at a time,
just like the rest of Linux :)

>
> 		Dave
>
> -- 
> http://www.codemonkey.org.uk
>

* Re: So, what's the status on the recent patches here?
  2006-08-14 22:24 ` Matthew Locke
@ 2006-08-14 22:46   ` Dave Jones
  2006-08-14 23:24     ` Matthew Locke
  2006-08-14 23:29   ` Dominik Brodowski
  1 sibling, 1 reply; 136+ messages in thread
From: Dave Jones @ 2006-08-14 22:46 UTC (permalink / raw)
  To: Matthew Locke; +Cc: linux-pm

On Mon, Aug 14, 2006 at 03:24:19PM -0700, Matthew Locke wrote:

 > I am a little concerned that none of the cpufreq developers have 
 > responded. I was hoping to get their feedback.

I was waiting for the dust to settle before spending a significant
amount of time reviewing.  I have to admit, the two patchsets thing
did confuse me too. (Though I've also been swamped with bugs since
I got back from OLS, so I've appreciated the breathing room :)

If we're getting any closer to consensus on what's mergeable from the
cpufreq side, and what needs more input, I'll find the time to review
soon, but there still seems to be ongoing discussion, which is why I
decided to let it sort itself out :)

 > > (If you can't tell I'm getting a bit annoyed at having to tell people
 > > all the time that yes, power management on Linux is bad, and yes,
 > > people are working on it, but no, I have no idea when it will ever
 > > see the light of day...)
 > 
 > Well, we are working on it.

Sadly powerop is but a tiny piece of the puzzle.
 
		Dave

-- 
http://www.codemonkey.org.uk

* Re: So, what's the status on the recent patches here?
  2006-08-14 20:07 Greg KH
@ 2006-08-14 22:24 ` Matthew Locke
  2006-08-14 22:46   ` Dave Jones
  2006-08-14 23:29   ` Dominik Brodowski
  0 siblings, 2 replies; 136+ messages in thread
From: Matthew Locke @ 2006-08-14 22:24 UTC (permalink / raw)
  To: Greg KH; +Cc: linux-pm

On Aug 14, 2006, at 1:07 PM, Greg KH wrote:

> I'm seeing a lot of threads without very much resolution on the
> differing patches that are flying around here in regards to the rework
> of the power management stuff (not suspend stuff...)
>

Right now there are two sets of patches with the name PowerOP.

One set (from Eugeny and myself) is focused on getting agreement on
the PowerOP interface and operating point definition.  I believe the
last patchset Eugeny submitted has incorporated all the comments about
PowerOP so far.  I don't think integrating PowerOP with suspend
(/sys/power/state) is appropriate at this time (as others agreed).  I
would rather see PowerOP accepted and used by cpufreq before we tackle
suspend/resume.

The other set posted by Dave Singleton is geared towards showing how 
PowerOP can be used by both cpufreq and suspend code.  It contains lots 
of features that have not been reviewed or discussed.

> So, should I just grab a random patchset from here and add it to my
> trees and get it into -mm for testing, or does someone want to possibly
> guide me to the set that everyone seems to agree upon?

No, please don't grab a random patchset :)  IMO, the patches from
Eugeny and myself are the ones to grab and put into -mm.  We were
hoping to get some feedback on the set posted here
http://lists.osdl.org/pipermail/linux-pm/2006-August/003196.html but I
think the two patchsets have confused the situation.  We are working
on the next rev of these patches, which will mostly be cleanup and
tighter integration with cpufreq.  Our plan was to get the next rev
out before we request inclusion in -mm.  However, if you are ready to
look at and play with patches, start with the ones at the link above.

I am a little concerned that none of the cpufreq developers have 
responded. I was hoping to get their feedback.

>
> Or, are there two (or more) competing patch sets here that need to get
> resolved?

I don't view the two patchsets as competing.  Eugeny and I are focused
on getting the basic building block for advanced frequency and voltage
scaling accepted.  If we can get PowerOP into mainline, we can then add
more features one at a time.  As Dave outlined in his email, his
patches are a starting point for further discussion about integrating
with other subsystems and additional features.  Let's focus on getting
PowerOP accepted by starting with Eugeny's patches, which provide
PowerOP as a separate component along with integration with cpufreq.

> (If you can't tell I'm getting a bit annoyed at having to tell people
> all the time that yes, power management on Linux is bad, and yes,
> people are working on it, but no, I have no idea when it will ever see
> the light of day...)

Well, we are working on it.  I think we had some really good
discussion and feedback over the last few weeks, and we are almost
there.  Unfortunately, the discussion tapered off recently, just when
we needed some final feedback.  That is probably related to having two
patchsets named PowerOP.  Let's try to get something acceptable into
-mm over the next couple of days.

Thanks

Matt
>
> thanks,
>
> greg k-h
> _______________________________________________
> linux-pm mailing list
> linux-pm@lists.osdl.org
> https://lists.osdl.org/mailman/listinfo/linux-pm
>

* So, what's the status on the recent patches here?
@ 2006-08-14 20:07 Greg KH
  2006-08-14 22:24 ` Matthew Locke
  0 siblings, 1 reply; 136+ messages in thread
From: Greg KH @ 2006-08-14 20:07 UTC (permalink / raw)
  To: linux-pm

I'm seeing a lot of threads without very much resolution on the
differing patches that are flying around here in regards to the rework
of the power management stuff (not suspend stuff...)

So, should I just grab a random patchset from here and add it to my
trees and get it into -mm for testing, or does someone want to possibly
guide me to the set that everyone seems to agree upon?

Or, are there two (or more) competing patch sets here that need to get
resolved?

(If you can't tell I'm getting a bit annoyed at having to tell people
all the time that yes, power management on Linux is bad, and yes, people
are working on it, but no, I have no idea when it will ever see the
light of day...)

thanks,

greg k-h

Thread overview: 136+ messages
2006-08-25 20:05 So, what's the status on the recent patches here? Woodruff, Richard
2006-08-25 20:08 ` Pavel Machek
  -- strict thread matches above, loose matches on Subject: below --
2006-09-05 16:03 Scott E. Preece
2006-09-05 20:42 ` Rafael J. Wysocki
2006-09-06 10:56 ` Pavel Machek
2006-09-04 15:43 Scott E. Preece
2006-09-03 23:05 Scott E. Preece
2006-09-04  9:09 ` Pavel Machek
2006-09-03 23:00 Scott E. Preece
2006-09-04  9:12 ` Pavel Machek
2006-09-05 10:31 ` Rafael J. Wysocki
2006-09-03 22:40 Scott E. Preece
2006-09-04  9:06 ` Pavel Machek
2006-09-05 16:45   ` Mark Gross
2006-09-06 10:59     ` Pavel Machek
2006-09-03 22:31 Scott E. Preece
2006-09-03 22:41 ` Pavel Machek
2006-09-03 22:12 Scott E. Preece
2006-09-03 22:25 ` Pavel Machek
2006-09-03 21:34 Scott E. Preece
2006-09-03 21:43 ` Pavel Machek
2006-09-03 22:10 ` Rafael J. Wysocki
2006-09-03 21:21 Scott E. Preece
2006-09-03 21:54 ` Pavel Machek
2006-09-01 14:49 Scott E. Preece
2006-08-31 15:14 Scott E. Preece
2006-08-31  2:41 Woodruff, Richard
2006-08-31  0:52 Scott E. Preece
2006-08-25 22:11 Woodruff, Richard
2006-08-25 21:21 Woodruff, Richard
2006-08-25 21:42 ` Alan Stern
2006-08-25 20:57 Woodruff, Richard
2006-08-25 21:13 ` Alan Stern
2006-08-25 20:22 Woodruff, Richard
2006-08-25 20:34 ` Alan Stern
2006-08-25 21:27   ` Pavel Machek
2006-08-25 21:46     ` Alan Stern
2006-08-25 22:03       ` Pavel Machek
2006-08-26  2:21         ` Alan Stern
2006-08-24 14:52 Woodruff, Richard
2006-08-25 19:58 ` Pavel Machek
2006-08-24 12:16 Woodruff, Richard
2006-08-24 12:29 ` Pavel Machek
2006-08-23 19:20 Woodruff, Richard
2006-08-24  8:03 ` Pavel Machek
2006-08-20 13:36 Woodruff, Richard
2006-08-16  1:27 Scott E. Preece
2006-08-16 15:25 ` Mark Gross
2006-08-14 20:07 Greg KH
2006-08-14 22:24 ` Matthew Locke
2006-08-14 22:46   ` Dave Jones
2006-08-14 23:24     ` Matthew Locke
2006-08-14 23:48       ` Dave Jones
2006-08-15  1:00         ` Greg KH
2006-08-15  3:03           ` Dave Jones
2006-08-15 10:35           ` Amit Kucheria
2006-08-15 19:04             ` Dave Jones
2006-08-16 12:58               ` Igor Stoppa
2006-08-17 21:39                 ` Pavel Machek
2006-08-18 10:02                   ` Igor Stoppa
2006-08-18 15:29                     ` Alexey Starikovskiy
2006-08-18 17:54                       ` Igor Stoppa
2006-08-18 21:05                         ` Alexey Starikovskiy
2006-08-20 13:19                           ` Igor Stoppa
2006-08-17  5:20               ` Matthew Locke
2006-08-17  7:20                 ` Paul Mundt
2006-08-17  9:18               ` Amit Kucheria
2006-08-17 21:40                 ` Pavel Machek
2006-08-18  5:42                   ` Vitaly Wool
2006-08-23 12:28                     ` Pavel Machek
2006-08-23 15:26                       ` Igor Stoppa
2006-08-24 12:58                       ` Vitaly Wool
2006-08-25 19:55                         ` Pavel Machek
2006-08-25 23:26                           ` Vitaly Wool
2006-08-26 10:18                             ` Pavel Machek
2006-08-26 13:30                               ` Vitaly Wool
2006-08-26 13:46                                 ` Pavel Machek
2006-08-28 16:40                                   ` Mark Gross
2006-08-28 17:39                                     ` Pavel Machek
2006-08-29  7:51                                       ` Matthew Locke
2006-08-30 22:13                                       ` Mark Gross
2006-08-30 22:27                                         ` Pavel Machek
2006-08-18 11:48                   ` Amit Kucheria
2006-08-24  7:59                     ` Pavel Machek
2006-08-30 11:00                       ` Amit Kucheria
2006-08-30 22:36                         ` Pavel Machek
2006-08-31 13:44                           ` Amit Kucheria
2006-09-02 11:17                             ` Pavel Machek
2006-08-17 21:24             ` Pavel Machek
2006-08-19  6:10           ` David Singleton
2006-08-22  2:13             ` Greg KH
2006-08-22  5:20               ` David Singleton
2006-08-23 19:05             ` Mark Gross
2006-08-24 12:39             ` Pavel Machek
2006-08-19  6:19           ` David Singleton
     [not found]             ` <20060819184843.GB15644@redhat.com>
2006-08-20  3:20               ` David Singleton
2006-08-20  3:30                 ` Dave Jones
2006-08-23 18:50                   ` Mark Gross
2006-08-27  4:37                   ` David Singleton
2006-08-27 15:41                     ` Pavel Machek
2006-08-29 15:55                       ` David Singleton
2006-08-29 16:34                         ` Pavel Machek
2006-08-29 17:49                           ` Preece Scott-PREECE
2006-08-30  6:20                             ` Matthew Locke
2006-08-30 13:26                               ` Preece Scott-PREECE
2006-08-30 22:50                                 ` Pavel Machek
2006-08-31  0:22                                   ` Preece Scott-PREECE
2006-08-31 12:04                                     ` Pavel Machek
2006-09-02 18:05                               ` David Singleton
2006-09-02 19:30                                 ` Rafael J. Wysocki
2006-09-03 16:25                                   ` David Singleton
2006-09-03 20:57                                     ` Rafael J. Wysocki
2006-09-03 21:33                                     ` Pavel Machek
2006-09-09  0:39                                       ` David Singleton
2006-09-09  0:48                                         ` David Singleton
2006-09-09 16:13                                           ` Pavel Machek
2006-09-09 12:17                                         ` Pavel Machek
2006-09-11 15:11                                           ` David Singleton
2006-09-11 17:14                                             ` Pavel Machek
2006-09-11 18:58                                             ` Matthew Locke
2006-08-30  4:52                           ` David Singleton
2006-08-30  5:52                             ` Matthew Locke
2006-08-30 13:39                               ` Preece Scott-PREECE
2006-08-30 22:43                             ` Pavel Machek
2006-08-27 19:48                     ` Greg KH
2006-08-28  0:07                       ` David Singleton
2006-08-27 20:54                     ` Eugeny S. Mints
2006-08-28 22:18                       ` Pavel Machek
2006-08-29 21:46                         ` Eugeny S. Mints
2006-08-29  1:29                       ` David Singleton
2006-08-29 22:39                         ` Eugeny S. Mints
2006-08-31 13:27                         ` Amit Kucheria
2006-08-31 19:22                           ` Preece Scott-PREECE
2006-09-01  8:11                             ` Amit Kucheria
2006-08-14 23:29   ` Dominik Brodowski
2006-08-14 23:48     ` Matthew Locke
