* [RFC] Energy/power monitoring within the kernel
@ 2012-10-23 17:30 ` Pawel Moll
  0 siblings, 0 replies; 33+ messages in thread
From: Pawel Moll @ 2012-10-23 17:30 UTC (permalink / raw)
  To: Amit Daniel Kachhap, Zhang Rui, Viresh Kumar, Daniel Lezcano,
	Jean Delvare, Guenter Roeck, Steven Rostedt, Frederic Weisbecker,
	Ingo Molnar, Jesper Juhl, Thomas Renninger, Jean Pihet
  Cc: linux-kernel, linux-arm-kernel, lm-sensors, linaro-dev

Greetings All,

More and more people are getting interested in the subject of power
(energy) consumption monitoring. We have some external tools like
"battery simulators", energy probes etc., but some targets can measure
their power usage on their own.

Traditionally such data is exposed to the user via the hwmon sysfs
interface, and that's exactly what I did for "my" platform - I have
a /sys/class/hwmon/hwmon*/device/energy*_input and this was good
enough to draw pretty graphs in userspace. Everyone was happy...

Now I am getting new requests to do more with this data. In particular
I'm asked how to add such information to ftrace/perf output. The second
most frequent request is about providing it to an "energy aware"
cpufreq governor.

I've come up with three (non-mutually exclusive) options. I will
appreciate any other ideas and comments (including "it makes no sense
whatsoever" ones, with justification). Of course I am more than willing
to spend time on prototyping anything that seems reasonable and propose
patches.



=== Option 1: Trace event ===

This seems to be the "cheapest" option. Simply defining a trace event
that can be generated by a hwmon (or any other) driver makes the
interesting data immediately available to any ftrace/perf user. Of
course it doesn't really help with the cpufreq case, but seems to be
a good place to start with.

The question is how to define it... I've come up with two prototypes:

= Generic hwmon trace event =

This one allows any driver to generate a trace event whenever any
"hwmon attribute" (measured value) gets updated. The rate at which the
updates happen can be controlled by the already existing "update_interval"
attribute.

8<-------------------------------------------
TRACE_EVENT(hwmon_attr_update,
	TP_PROTO(struct device *dev, struct attribute *attr, long long input),
	TP_ARGS(dev, attr, input),

	TP_STRUCT__entry(
		__string(       dev,		dev_name(dev))
		__string(	attr,		attr->name)
		__field(	long long,	input)
	),

	TP_fast_assign(
		__assign_str(dev, dev_name(dev));
		__assign_str(attr, attr->name);
		__entry->input = input;
	),

	TP_printk("%s %s %lld", __get_str(dev), __get_str(attr), __entry->input)
);
8<-------------------------------------------

It generates an ftrace message like this:

<...>212.673126: hwmon_attr_update: hwmon4 temp1_input 34361

One issue with this is that some external knowledge is required to
relate a number to a processor core. Or maybe it's not an issue at all
because it should be left for the user(space)?
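
Just to illustrate the driver side: the TRACE_EVENT() above generates a
trace_hwmon_attr_update() call which a driver would invoke whenever it
refreshes a value. A minimal sketch (the mydrv_* names are hypothetical,
not existing code):

8<-------------------------------------------
#include <linux/device.h>
#include <linux/sysfs.h>

#include <trace/events/hwmon.h>	/* assumed home of the event above */

/* Hypothetical hwmon "show" callback in some driver (sketch only) */
static ssize_t mydrv_energy_show(struct device *dev,
				 struct device_attribute *attr, char *buf)
{
	long long value = mydrv_hw_read_energy(dev);	/* hypothetical read */

	/* Emit the trace event defined above on every update */
	trace_hwmon_attr_update(dev, &attr->attr, value);

	return sprintf(buf, "%lld\n", value);
}
8<-------------------------------------------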

= CPU power/energy/temperature trace event =

This one is designed to emphasize the relation between the measured
value (whether it is energy, temperature or any other physical
phenomenon, really) and CPUs, so it is quite specific (too specific?)

8<-------------------------------------------
TRACE_EVENT(cpus_environment,
	TP_PROTO(const struct cpumask *cpus, long long value, char unit),
	TP_ARGS(cpus, value, unit),

	TP_STRUCT__entry(
		__array(	unsigned char,	cpus,	sizeof(struct cpumask))
		__field(	long long,	value)
		__field(	char,		unit)
	),

	TP_fast_assign(
		memcpy(__entry->cpus, cpus, sizeof(struct cpumask));
		__entry->value = value;
		__entry->unit = unit;
	),

	TP_printk("cpus %s %lld[%c]",
		__print_cpumask((struct cpumask *)__entry->cpus),
		__entry->value, __entry->unit)
);
8<-------------------------------------------

And the equivalent ftrace message is:

<...>127.063107: cpus_environment: cpus 0,1,2,3 34361[C]

It's a cpumask, not just a single CPU id, because the sensor may measure
the value per set of CPUs, e.g. the temperature of the whole silicon die
(so all the cores) or the energy consumed by a subset of cores (this
is my particular use case - two meters monitor a cluster of two
processors and a cluster of three processors, all working as an SMP
system).

Of course the cpus __array could actually be a special __cpumask field
type (I've just hacked the __print_cpumask so far). And I've just
realised that the unit field should actually be a string to allow unit
prefixes to be specified (the above should obviously be "34361[mC]"
not "[C]"). Also - excuse the "cpus_environment" name - this was the
best I was able to come up with at the time and I'm eager to accept
any alternative suggestions :-)
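
For completeness, a sketch of how a driver sitting on a per-cluster meter
could emit this event (the cluster mask and the mydrv_* helper are made up
for illustration):

8<-------------------------------------------
#include <linux/cpumask.h>

/* Hypothetical reporting path for a meter covering CPUs 0 and 1 (sketch) */
static void mydrv_report_cluster_energy(void)
{
	struct cpumask cluster;
	long long value = mydrv_read_cluster_meter();	/* hypothetical */

	cpumask_clear(&cluster);
	cpumask_set_cpu(0, &cluster);
	cpumask_set_cpu(1, &cluster);

	/* 'J' used as a placeholder until the unit becomes a string */
	trace_cpus_environment(&cluster, value, 'J');
}
8<-------------------------------------------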



=== Option 2: hwmon perf PMU ===

Although the trace event makes it possible to obtain interesting
information using perf, the user wouldn't be able to treat the
energy meter as a normal data source. In particular there would
be no way of creating a group of events consisting e.g. of a
"normal" leader (e.g. a cache miss event) triggering an energy meter
read. The only way to get this done is to implement a perf PMU
backend providing "environmental data" to the user.

= High-level hwmon API and PMU =

The current hwmon subsystem does not provide any abstraction for the
measured values and requires particular drivers to create specific
sysfs attributes that are then used by userspace libsensors. This makes
the framework ultimately flexible and ultimately hard to access
from within the kernel...

What could be done here is some (simple) API to register the
measured values with the hwmon core which would result in creating
equivalent sysfs attributes automagically, but also provide an
in-kernel API for values enumeration and access. That way the core
could also register a "hwmon PMU" with the perf framework providing
data from all "compliant" drivers.
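
To make the idea a bit more concrete, the registration side could look
roughly like this (purely a sketch of the proposal - nothing like it
exists in the hwmon core today):

8<-------------------------------------------
/* Hypothetical in-kernel hwmon value registration API (sketch only) */
struct hwmon_value_desc {
	const char *name;	/* e.g. "energy1_input" */
	const char *unit;	/* e.g. "uJ" */
	long long (*read)(struct device *dev);
};

/*
 * Would create the matching sysfs attributes and make the values
 * enumerable and readable from inside the kernel, e.g. by a "hwmon PMU".
 */
int hwmon_register_values(struct device *dev,
			  const struct hwmon_value_desc *desc, int count);
8<-------------------------------------------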

= A driver-specific PMU =

Of course a particular driver could register its own perf PMU on its
own. It's certainly an option, just very suboptimal in my opinion.
Or maybe not? Maybe the task is so specialized that it makes sense?
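
For reference, the boilerplate involved is roughly this (a bare-bones
sketch using the existing perf_pmu_register() interface; the mydrv_*
parts are hypothetical and a real PMU would also need the add/del/
start/stop callbacks):

8<-------------------------------------------
#include <linux/perf_event.h>

static void mydrv_pmu_read(struct perf_event *event)
{
	/* Publish the current meter reading as the event count */
	local64_set(&event->count, mydrv_read_energy_raw());	/* hypothetical */
}

static int mydrv_pmu_event_init(struct perf_event *event)
{
	/* Only accept events targeted at this PMU */
	if (event->attr.type != event->pmu->type)
		return -ENOENT;
	return 0;
}

static struct pmu mydrv_pmu = {
	.task_ctx_nr	= perf_invalid_context,	/* system-wide, not per-task */
	.event_init	= mydrv_pmu_event_init,
	.read		= mydrv_pmu_read,
};

/* In the driver's probe(): perf_pmu_register(&mydrv_pmu, "mydrv_energy", -1); */
8<-------------------------------------------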



=== Option 3: CPU power(energy) monitoring framework ===

And last but not least, maybe the problem deserves some dedicated
API? Something that would take providers and feed their data into
interested parties, in particular a perf PMU implementation and
cpufreq governors?

Maybe it could be an extension to the thermal framework? It already
gives some meaning to a physical phenomenon. Adding other, related ones
like energy, and relating them to CPU cores, could make some sense.



I've tried to gather all the potentially interested parties in the To:
list, but if I missed anyone - please, do let them (and/or me) know.

Best regards and thanks for participation in the discussion!

Pawel




* Re: [RFC] Energy/power monitoring within the kernel
  2012-10-23 17:30 ` Pawel Moll
@ 2012-10-23 17:43   ` Steven Rostedt
  -1 siblings, 0 replies; 33+ messages in thread
From: Steven Rostedt @ 2012-10-23 17:43 UTC (permalink / raw)
  To: Pawel Moll
  Cc: Amit Daniel Kachhap, Zhang Rui, Viresh Kumar, Daniel Lezcano,
	Jean Delvare, Guenter Roeck, Frederic Weisbecker, Ingo Molnar,
	Jesper Juhl, Thomas Renninger, Jean Pihet, linux-kernel,
	linux-arm-kernel, lm-sensors, linaro-dev

On Tue, 2012-10-23 at 18:30 +0100, Pawel Moll wrote:

> 
> === Option 1: Trace event ===
> 
> This seems to be the "cheapest" option. Simply defining a trace event
> that can be generated by a hwmon (or any other) driver makes the
> interesting data immediately available to any ftrace/perf user. Of
> course it doesn't really help with the cpufreq case, but seems to be
> a good place to start with.
> 
> The question is how to define it... I've come up with two prototypes:
> 
> = Generic hwmon trace event =
> 
> This one allows any driver to generate a trace event whenever any
> "hwmon attribute" (measured value) gets updated. The rate at which the
> updates happen can be controlled by the already existing "update_interval"
> attribute.
> 
> 8<-------------------------------------------
> TRACE_EVENT(hwmon_attr_update,
> 	TP_PROTO(struct device *dev, struct attribute *attr, long long input),
> 	TP_ARGS(dev, attr, input),
> 
> 	TP_STRUCT__entry(
> 		__string(       dev,		dev_name(dev))
> 		__string(	attr,		attr->name)
> 		__field(	long long,	input)
> 	),
> 
> 	TP_fast_assign(
> 		__assign_str(dev, dev_name(dev));
> 		__assign_str(attr, attr->name);
> 		__entry->input = input;
> 	),
> 
> 	TP_printk("%s %s %lld", __get_str(dev), __get_str(attr), __entry->input)
> );
> 8<-------------------------------------------
> 
> It generates an ftrace message like this:
> 
> <...>212.673126: hwmon_attr_update: hwmon4 temp1_input 34361
> 
> One issue with this is that some external knowledge is required to
> relate a number to a processor core. Or maybe it's not an issue at all
> because it should be left for the user(space)?

If the external knowledge can be characterized in a userspace tool with
the given data here, I see no issues with this.

> 
> = CPU power/energy/temperature trace event =
> 
> This one is designed to emphasize the relation between the measured
> value (whether it is energy, temperature or any other physical
> phenomenon, really) and CPUs, so it is quite specific (too specific?)
> 
> 8<-------------------------------------------
> TRACE_EVENT(cpus_environment,
> 	TP_PROTO(const struct cpumask *cpus, long long value, char unit),
> 	TP_ARGS(cpus, value, unit),
> 
> 	TP_STRUCT__entry(
> 		__array(	unsigned char,	cpus,	sizeof(struct cpumask))
> 		__field(	long long,	value)
> 		__field(	char,		unit)
> 	),
> 
> 	TP_fast_assign(
> 		memcpy(__entry->cpus, cpus, sizeof(struct cpumask));

Copying the entire cpumask seems like overkill, especially when you have
4096-CPU machines.

> 		__entry->value = value;
> 		__entry->unit = unit;
> 	),
> 
> 	TP_printk("cpus %s %lld[%c]",
> 		__print_cpumask((struct cpumask *)__entry->cpus),
> 		__entry->value, __entry->unit)
> );
> 8<-------------------------------------------
> 
> And the equivalent ftrace message is:
> 
> <...>127.063107: cpus_environment: cpus 0,1,2,3 34361[C]
> 
> It's a cpumask, not just a single CPU id, because the sensor may measure
> the value per set of CPUs, e.g. the temperature of the whole silicon die
> (so all the cores) or the energy consumed by a subset of cores (this
> is my particular use case - two meters monitor a cluster of two
> processors and a cluster of three processors, all working as an SMP
> system).
> 
> Of course the cpus __array could actually be a special __cpumask field
> type (I've just hacked the __print_cpumask so far). And I've just
> realised that the unit field should actually be a string to allow unit
> prefixes to be specified (the above should obviously be "34361[mC]"
> not "[C]"). Also - excuse the "cpus_environment" name - this was the
> best I was able to come up with at the time and I'm eager to accept
> any alternative suggestions :-)

Perhaps making a field that can hold a subset of cpus would be better. That
way we don't waste the ring buffer with lots of zeros. I'm guessing that
it will only be a group of cpus, and not a scattered list? Of course,
I've seen boxes where the cpu numbers went from core to core. That is,
cpu 0 was on core 1, cpu 1 was on core 2, and then it would repeat:
cpu 8 was on core 1, cpu 9 was on core 2, etc.

But still, this could be compressed somehow.

I'll let others comment on the rest.

-- Steve



* Re: [RFC] Energy/power monitoring within the kernel
  2012-10-23 17:30 ` Pawel Moll
@ 2012-10-23 18:49   ` Andy Green
  -1 siblings, 0 replies; 33+ messages in thread
From: Andy Green @ 2012-10-23 18:49 UTC (permalink / raw)
  To: Pawel Moll
  Cc: Amit Daniel Kachhap, Zhang Rui, Viresh Kumar, Daniel Lezcano,
	Jean Delvare, Guenter Roeck, Steven Rostedt, Frederic Weisbecker,
	Ingo Molnar, Jesper Juhl, Thomas Renninger, Jean Pihet,
	linaro-dev, linux-kernel, linux-arm-kernel, lm-sensors

On 10/23/12 19:30, the mail apparently from Pawel Moll included:
> Greetings All,
>
> More and more people are getting interested in the subject of power
> (energy) consumption monitoring. We have some external tools like
> "battery simulators", energy probes etc., but some targets can measure
> their power usage on their own.
>
> Traditionally such data is exposed to the user via the hwmon sysfs
> interface, and that's exactly what I did for "my" platform - I have
> a /sys/class/hwmon/hwmon*/device/energy*_input and this was good
> enough to draw pretty graphs in userspace. Everyone was happy...
>
> Now I am getting new requests to do more with this data. In particular
> I'm asked how to add such information to ftrace/perf output. The second
> most frequent request is about providing it to an "energy aware"
> cpufreq governor.
>
> I've come up with three (non-mutually exclusive) options. I will
> appreciate any other ideas and comments (including "it makes no sense
> whatsoever" ones, with justification). Of course I am more than willing
> to spend time on prototyping anything that seems reasonable and propose
> patches.
>
>
>
> === Option 1: Trace event ===
>
> This seems to be the "cheapest" option. Simply defining a trace event
> that can be generated by a hwmon (or any other) driver makes the
> interesting data immediately available to any ftrace/perf user. Of
> course it doesn't really help with the cpufreq case, but seems to be
> a good place to start with.
>
> The question is how to define it... I've come up with two prototypes:
>
> = Generic hwmon trace event =
>
> This one allows any driver to generate a trace event whenever any
> "hwmon attribute" (measured value) gets updated. The rate at which the
> updates happen can be controlled by the already existing "update_interval"
> attribute.
>
> 8<-------------------------------------------
> TRACE_EVENT(hwmon_attr_update,
> 	TP_PROTO(struct device *dev, struct attribute *attr, long long input),
> 	TP_ARGS(dev, attr, input),
>
> 	TP_STRUCT__entry(
> 		__string(       dev,		dev_name(dev))
> 		__string(	attr,		attr->name)
> 		__field(	long long,	input)
> 	),
>
> 	TP_fast_assign(
> 		__assign_str(dev, dev_name(dev));
> 		__assign_str(attr, attr->name);
> 		__entry->input = input;
> 	),
>
> 	TP_printk("%s %s %lld", __get_str(dev), __get_str(attr), __entry->input)
> );
> 8<-------------------------------------------
>
> It generates an ftrace message like this:
>
> <...>212.673126: hwmon_attr_update: hwmon4 temp1_input 34361
>
> One issue with this is that some external knowledge is required to
> relate a number to a processor core. Or maybe it's not an issue at all
> because it should be left for the user(space)?
>
> = CPU power/energy/temperature trace event =
>
> This one is designed to emphasize the relation between the measured
> value (whether it is energy, temperature or any other physical
> phenomenon, really) and CPUs, so it is quite specific (too specific?)
>
> 8<-------------------------------------------
> TRACE_EVENT(cpus_environment,
> 	TP_PROTO(const struct cpumask *cpus, long long value, char unit),
> 	TP_ARGS(cpus, value, unit),
>
> 	TP_STRUCT__entry(
> 		__array(	unsigned char,	cpus,	sizeof(struct cpumask))
> 		__field(	long long,	value)
> 		__field(	char,		unit)
> 	),
>
> 	TP_fast_assign(
> 		memcpy(__entry->cpus, cpus, sizeof(struct cpumask));
> 		__entry->value = value;
> 		__entry->unit = unit;
> 	),
>
> 	TP_printk("cpus %s %lld[%c]",
> 		__print_cpumask((struct cpumask *)__entry->cpus),
> 		__entry->value, __entry->unit)
> );
> 8<-------------------------------------------
>
> And the equivalent ftrace message is:
>
> <...>127.063107: cpus_environment: cpus 0,1,2,3 34361[C]
>
> It's a cpumask, not just a single CPU id, because the sensor may measure
> the value per set of CPUs, e.g. the temperature of the whole silicon die
> (so all the cores) or the energy consumed by a subset of cores (this
> is my particular use case - two meters monitor a cluster of two
> processors and a cluster of three processors, all working as an SMP
> system).
>
> Of course the cpus __array could actually be a special __cpumask field
> type (I've just hacked the __print_cpumask so far). And I've just
> realised that the unit field should actually be a string to allow unit
> prefixes to be specified (the above should obviously be "34361[mC]"
> not "[C]"). Also - excuse the "cpus_environment" name - this was the
> best I was able to come up with at the time and I'm eager to accept
> any alternative suggestions :-)

A thought on that... from an SoC perspective there are other interesting
power rails than just the ones that go to the CPU cores. For example DDR
power, and rails involved with other IP units on the SoC such as the 3D
graphics unit. So tying one number specifically to a CPU core does not
sound like it's enough.

> === Option 2: hwmon perf PMU ===
>
> Although the trace event makes it possible to obtain interesting
> information using perf, the user wouldn't be able to treat the
> energy meter as a normal data source. In particular there would
> be no way of creating a group of events consisting e.g. of a
> "normal" leader (e.g. a cache miss event) triggering an energy meter
> read. The only way to get this done is to implement a perf PMU
> backend providing "environmental data" to the user.

In terms of something like perf top, I don't think it'll be possible to
know when to sample the acquisition hardware so as to tie the result to
a particular line of code, even if it had the bandwidth to do that.
Power readings are likely to lag activities on the cpu somewhat,
considering sub-ns core clocks, especially if it's actually measuring
the input side of a regulator.

> = High-level hwmon API and PMU =
>
> The current hwmon subsystem does not provide any abstraction for the
> measured values and requires particular drivers to create specific
> sysfs attributes that are then used by userspace libsensors. This makes
> the framework ultimately flexible and ultimately hard to access
> from within the kernel...
>
> What could be done here is some (simple) API to register the
> measured values with the hwmon core which would result in creating
> equivalent sysfs attributes automagically, but also provide an
> in-kernel API for values enumeration and access. That way the core
> could also register a "hwmon PMU" with the perf framework providing
> data from all "compliant" drivers.
>
> = A driver-specific PMU =
>
> Of course a particular driver could register its own perf PMU on its
> own. It's certainly an option, just very suboptimal in my opinion.
> Or maybe not? Maybe the task is so specialized that it makes sense?
>
>
>
> === Option 3: CPU power(energy) monitoring framework ===
>
> And last but not least, maybe the problem deserves some dedicated
> API? Something that would take providers and feed their data into
> interested parties, in particular a perf PMU implementation and
> cpufreq governors?
>
> Maybe it could be an extension to the thermal framework? It already
> gives some meaning to a physical phenomenon. Adding other, related ones
> like energy, and relating them to CPU cores, could make some sense.

If you turn the problem upside down to solve the representation question 
first, maybe there's a way forward defining the "power tree" in terms of 
regulators, and then adding something in struct regulator that spams 
readers with timestamped results if the regulator has a power monitoring 
capability.

Then you can map the regulators in the power tree to real devices by the 
names or the supply stuff.  Just a thought.
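
Something along these lines, perhaps (entirely hypothetical - nothing
like this exists in the regulator framework today):

8<-------------------------------------------
#include <linux/ktime.h>

/* Hypothetical extension to the regulator framework (sketch only) */
struct regulator_power_sample {
	ktime_t		timestamp;
	long long	microwatts;
};

/*
 * A regulator with monitoring capability would push timestamped samples
 * to anyone who registered a listener on it.
 */
int regulator_add_power_listener(struct regulator *regulator,
		void (*cb)(struct regulator *regulator,
			   const struct regulator_power_sample *sample));
8<-------------------------------------------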

-Andy

-- 
Andy Green | TI Landing Team Leader
Linaro.org │ Open source software for ARM SoCs | Follow Linaro
http://facebook.com/pages/Linaro/155974581091106  - 
http://twitter.com/#!/linaroorg - http://linaro.org/linaro-blog

SXQncyBjZXJ0YWlubHkgYW4gb3B0aW9uLCBqdXN0IHZlcnkgc3Vib3B0aW1hbCBpbiBteSBvcGlu
aW9uLgo+IE9yIG1heWJlIG5vdD8gTWF5YmUgdGhlIHRhc2sgaXMgc28gc3BlY2lhbGl6ZWQgdGhh
dCBpdCBtYWtlcyBzZW5zZT8KPgo+Cj4KPiA9PT0gT3B0aW9uIDM6IENQVSBwb3dlcihlbmVyZ3kp
IG1vbml0b3JpbmcgZnJhbWV3b3JrID09PQo+Cj4gQW5kIGxhc3QgYnV0IG5vdCBsZWFzdCwgbWF5
YmUgdGhlIHByb2JsZW0gZGVzZXJ2ZXMgc29tZSBkZWRpY2F0ZWQKPiBBUEk/IFNvbWV0aGluZyB0
aGF0IHdvdWxkIHRha2UgcHJvdmlkZXJzIGFuZCBmZWVkIHRoZWlyIGRhdGEgaW50bwo+IGludGVy
ZXN0ZWQgcGFydGllcywgaW4gcGFydGljdWxhciBhIHBlcmYgUE1VIGltcGxlbWVudGF0aW9uIGFu
ZAo+IGNwdWZyZXEgZ292ZXJub3JzPwo+Cj4gTWF5YmUgaXQgY291bGQgYmUgYW4gZXh0ZW5zaW9u
IHRvIHRoZSB0aGVybWFsIGZyYW1ld29yaz8gSXQgYWxyZWFkeQo+IGdpdmVzIHNvbWUgbWVhbmlu
ZyB0byBhIHBoeXNpY2FsIHBoZW5vbWVuYS4gQWRkaW5nIG90aGVyLCByZWxhdGVkIG9uZXMKPiBs
aWtlIGVuZXJneSwgYW5kIHJlbGF0aW5nIGl0IHRvIGNwdSBjb3JlcyBjb3VsZCBtYWtlIHNvbWUg
c2Vuc2UuCgpJZiB5b3UgdHVybiB0aGUgcHJvYmxlbSB1cHNpZGUgZG93biB0byBzb2x2ZSB0aGUg
cmVwcmVzZW50YXRpb24gcXVlc3Rpb24gCmZpcnN0LCBtYXliZSB0aGVyZSdzIGEgd2F5IGZvcndh
cmQgZGVmaW5pbmcgdGhlICJwb3dlciB0cmVlIiBpbiB0ZXJtcyBvZiAKcmVndWxhdG9ycywgYW5k
IHRoZW4gYWRkaW5nIHNvbWV0aGluZyBpbiBzdHJ1Y3QgcmVndWxhdG9yIHRoYXQgc3BhbXMgCnJl
YWRlcnMgd2l0aCB0aW1lc3RhbXBlZCByZXN1bHRzIGlmIHRoZSByZWd1bGF0b3IgaGFzIGEgcG93
ZXIgbW9uaXRvcmluZyAKY2FwYWJpbGl0eS4KClRoZW4geW91IGNhbiBtYXAgdGhlIHJlZ3VsYXRv
cnMgaW4gdGhlIHBvd2VyIHRyZWUgdG8gcmVhbCBkZXZpY2VzIGJ5IHRoZSAKbmFtZXMgb3IgdGhl
IHN1cHBseSBzdHVmZi4gIEp1c3QgYSB0aG91Z2h0LgoKLUFuZHkKCi0tIApBbmR5IEdyZWVuIHwg
VEkgTGFuZGluZyBUZWFtIExlYWRlcgpMaW5hcm8ub3JnIOKUgiBPcGVuIHNvdXJjZSBzb2Z0d2Fy
ZSBmb3IgQVJNIFNvQ3MgfCBGb2xsb3cgTGluYXJvCmh0dHA6Ly9mYWNlYm9vay5jb20vcGFnZXMv
TGluYXJvLzE1NTk3NDU4MTA5MTEwNiAgLSAKaHR0cDovL3R3aXR0ZXIuY29tLyMhL2xpbmFyb29y
ZyAtIGh0dHA6Ly9saW5hcm8ub3JnL2xpbmFyby1ibG9nCgpfX19fX19fX19fX19fX19fX19fX19f
X19fX19fX19fX19fX19fX19fX19fX19fXwpsbS1zZW5zb3JzIG1haWxpbmcgbGlzdApsbS1zZW5z
b3JzQGxtLXNlbnNvcnMub3JnCmh0dHA6Ly9saXN0cy5sbS1zZW5zb3JzLm9yZy9tYWlsbWFuL2xp
c3RpbmZvL2xtLXNlbnNvcnM

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC] Energy/power monitoring within the kernel
  2012-10-23 17:30 ` Pawel Moll
  (?)
@ 2012-10-23 22:02   ` Guenter Roeck
  -1 siblings, 0 replies; 33+ messages in thread
From: Guenter Roeck @ 2012-10-23 22:02 UTC (permalink / raw)
  To: Pawel Moll
  Cc: Amit Daniel Kachhap, Zhang Rui, Viresh Kumar, Daniel Lezcano,
	Jean Delvare, Steven Rostedt, Frederic Weisbecker, Ingo Molnar,
	Jesper Juhl, Thomas Renninger, Jean Pihet, linux-kernel,
	linux-arm-kernel, lm-sensors, linaro-dev

On Tue, Oct 23, 2012 at 06:30:49PM +0100, Pawel Moll wrote:
> Greetings All,
> 
> More and more of people are getting interested in the subject of power
> (energy) consumption monitoring. We have some external tools like
> "battery simulators", energy probes etc., but some targets can measure
> their power usage on their own.
> 
> Traditionally such data should be exposed to the user via hwmon sysfs
> interface, and that's exactly what I did for "my" platform - I have
> a /sys/class/hwmon/hwmon*/device/energy*_input and this was good
> enough to draw pretty graphs in userspace. Everyone was happy...
> 
Only driver supporting "energy" output so far is ibmaem, and the reported energy
is supposed to be cumulative, as in energy = power * time. Do you mean power,
possibly ?

> Now I am getting new requests to do more with this data. In particular
> I'm asked how to add such information to ftrace/perf output. The second
> most frequent request is about providing it to a "energy aware"
> cpufreq governor.
> 

Anything energy related would have to be along the line of "do something after a
certain amount of work has been performed", which at least at the surface does
not make much sense to me, unless you mean something along the line of a
process scheduler which schedules a process not based on time slices but based
on energy consumed, ie if you want to define a time slice not in milli-seconds
but in Joule.

If so, I would argue that a similar behavior could be achieved by varying the
duration of time slices with the current CPU speed, or simply by using cycle
count instead of time as time slice parameter. Not that I am sure if such an
approach would really be of interest for anyone. 

Or do you really mean power, not energy, such as in "reduce CPU speed if its
power consumption is above X Watt" ?
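
As a rough illustration of that last case - purely a sketch, where
platform_read_power_mw() and power_limit_mw are invented names and only
__cpufreq_driver_target() is the existing cpufreq helper - a power-capping
check inside a governor could look something like:

8<-------------------------------------------
/* Hypothetical sketch: cap CPU power by stepping the frequency down.
 * platform_read_power_mw() and power_limit_mw are made up for the
 * example; __cpufreq_driver_target() is the existing cpufreq call.
 */
extern unsigned int platform_read_power_mw(int cpu);	/* made-up helper */
static unsigned int power_limit_mw = 2000;		/* the "X Watt" limit */

static void governor_power_check(struct cpufreq_policy *policy)
{
	unsigned int mw = platform_read_power_mw(policy->cpu);

	if (mw > power_limit_mw && policy->cur > policy->min)
		/* ask for the next lower available frequency */
		__cpufreq_driver_target(policy, policy->cur - 1,
					CPUFREQ_RELATION_H);
}
8<-------------------------------------------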

> I've came up with three (non-mutually exclusive) options. I will
> appreciate any other ideas and comments (including "it makes not sense
> whatsoever" ones, with justification). Of course I am more than willing
> to spend time on prototyping anything that seems reasonable and propose
> patches.
> 
> 
> 
> === Option 1: Trace event ===
> 
> This seems to be the "cheapest" option. Simply defining a trace event
> that can be generated by a hwmon (or any other) driver makes the
> interesting data immediately available to any ftrace/perf user. Of
> course it doesn't really help with the cpufreq case, but seems to be
> a good place to start with.
> 
> The question is how to define it... I've came up with two prototypes:
> 
> = Generic hwmon trace event =
> 
> This one allows any driver to generate a trace event whenever any
> "hwmon attribute" (measured value) gets updated. The rate at which the
> updates happen can be controlled by already existing "update_interval"
> attribute.
> 
> 8<-------------------------------------------
> TRACE_EVENT(hwmon_attr_update,
> 	TP_PROTO(struct device *dev, struct attribute *attr, long long input),
> 	TP_ARGS(dev, attr, input),
> 
> 	TP_STRUCT__entry(
> 		__string(       dev,		dev_name(dev))
> 		__string(	attr,		attr->name)
> 		__field(	long long,	input)
> 	),
> 
> 	TP_fast_assign(
> 		__assign_str(dev, dev_name(dev));
> 		__assign_str(attr, attr->name);
> 		__entry->input = input;
> 	),
> 
> 	TP_printk("%s %s %lld", __get_str(dev), __get_str(attr), __entry->input)
> );
> 8<-------------------------------------------
> 
> It generates such ftrace message:
> 
> <...>212.673126: hwmon_attr_update: hwmon4 temp1_input 34361
> 
> One issue with this is that some external knowledge is required to
> relate a number to a processor core. Or maybe it's not an issue at all
> because it should be left for the user(space)?
> 
> = CPU power/energy/temperature trace event =
> 
> This one is designed to emphasize the relation between the measured
> value (whether it is energy, temperature or any other physical
> phenomena, really) and CPUs, so it is quite specific (too specific?)
> 
> 8<-------------------------------------------
> TRACE_EVENT(cpus_environment,
> 	TP_PROTO(const struct cpumask *cpus, long long value, char unit),
> 	TP_ARGS(cpus, value, unit),
> 
> 	TP_STRUCT__entry(
> 		__array(	unsigned char,	cpus,	sizeof(struct cpumask))
> 		__field(	long long,	value)
> 		__field(	char,		unit)
> 	),
> 
> 	TP_fast_assign(
> 		memcpy(__entry->cpus, cpus, sizeof(struct cpumask));
> 		__entry->value = value;
> 		__entry->unit = unit;
> 	),
> 
> 	TP_printk("cpus %s %lld[%c]",
> 		__print_cpumask((struct cpumask *)__entry->cpus),
> 		__entry->value, __entry->unit)
> );
> 8<-------------------------------------------
> 
> And the equivalent ftrace message is:
> 
> <...>127.063107: cpus_environment: cpus 0,1,2,3 34361[C]
> 
> It's a cpumask, not just single cpu id, because the sensor may measure
> the value per set of CPUs, eg. a temperature of the whole silicon die
> (so all the cores) or an energy consumed by a subset of cores (this
> is my particular use case - two meters monitor a cluster of two
> processors and a cluster of three processors, all working as a SMP
> system).
> 
> Of course the cpus __array could be actually a special __cpumask field
> type (I've just hacked the __print_cpumask so far). And I've just
> realised that the unit field should actually be a string to allow unit
> prefixes to be specified (the above should obviously be "34361[mC]"
> not "[C]"). Also - excuse the "cpus_environment" name - this was the
> best I was able to come up with at the time and I'm eager to accept
> any alternative suggestions :-)
> 
I am not sure how this would be expected to work. hwmon is, by its very nature,
a passive subsystem: It doesn't do anything unless data is explicitly requested
from it. It does not update an attribute unless that attribute is read.
That does not seem to fit well with the idea of tracing - which assumes
that some activity is happening, ultimately, all by itself, presumably
periodically. The idea to have a user space application read hwmon data only
for it to trigger trace events does not seem to be very compelling to me.

An exception is if a monitoring device supports interrupts, and if its driver
actually implements those interrupts. This is, however, not the case for most of
the current drivers (if any), mostly because interrupt support for hardware
monitoring devices is very platform dependent and thus difficult to implement.
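
To make the polling aspect concrete, with the trace event from Option 1 a
driver would typically only hit the tracepoint from its sysfs show() path,
roughly as in the sketch below (struct foo_data and foo_update_device() are
made-up stand-ins for the usual cached-readings helper):

8<-------------------------------------------
/* Sketch only: the proposed trace_hwmon_attr_update() firing from a
 * driver's show() routine. Because foo_update_device() - an invented
 * name for the usual "refresh cached readings" helper - only runs when
 * userspace reads the attribute, no reader means no trace events.
 */
static ssize_t show_energy(struct device *dev,
			   struct device_attribute *devattr, char *buf)
{
	struct foo_data *data = foo_update_device(dev);

	trace_hwmon_attr_update(dev, &devattr->attr, data->energy_uj);
	return sprintf(buf, "%lld\n", data->energy_uj);
}
8<-------------------------------------------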

> 
> === Option 2: hwmon perf PMU ===
> 
> Although the trace event makes it possible to obtain interesting
> information using perf, the user wouldn't be able to treat the
> energy meter as a normal data source. In particular there would
> be no way of creating a group of events consisting eg. of a
> "normal" leader (eg. cache miss event) triggering energy meter
> read. The only way to get this done is to implement a perf PMU
> backend providing "environmental data" to the user.
> 
> = High-level hwmon API and PMU =
> 
> Current hwmon subsystem does not provide any abstraction for the
> measured values and requires particular drivers to create specified
> sysfs attributes than used by userspace libsensors. This makes
> the framework ultimately flexible and ultimately hard to access
> from within the kernel...
> 
> What could be done here is some (simple) API to register the
> measured values with the hwmon core which would result in creating
> equivalent sysfs attributes automagically, but also allow a
> in-kernel API for values enumeration and access. That way the core
> could also register a "hwmon PMU" with the perf framework providing
> data from all "compliant" drivers.
> 
> = A driver-specific PMU =
> 
> Of course a particular driver could register its own perf PMU on its
> own. It's certainly an option, just very suboptimal in my opinion.
> Or maybe not? Maybe the task is so specialized that it makes sense?
> 
We had a couple of attempts to provide an in-kernel API. Unfortunately,
the result was, at least so far, more complexity on the driver side.
So the difficulty is really to define an API which is really simple, and does
not just complicate driver development for a (presumably) rare use case.
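
Just to give that a concrete shape, the sort of registration API being
discussed might look roughly like the sketch below - entirely hypothetical,
none of these types or calls exist in the current hwmon core:

8<-------------------------------------------
/* Hypothetical sketch of an in-kernel hwmon value API, only meant to
 * illustrate the shape of the idea; nothing here exists today.
 */
enum hwmon_value_type { HWMON_TEMP, HWMON_POWER, HWMON_ENERGY };

struct hwmon_value {
	const char		*name;		/* e.g. "energy1_input" */
	enum hwmon_value_type	type;
	int			channel;
	long long		(*read)(struct device *dev, int channel);
};

/* Driver side: register values once, the core creates the equivalent
 * sysfs attributes and can expose the same list to in-kernel users
 * (e.g. a "hwmon PMU"). */
int hwmon_register_values(struct device *dev,
			  const struct hwmon_value *values, int count);

/* Consumer side: enumeration/access from within the kernel. */
long long hwmon_read_value(struct device *dev, const char *name);
8<-------------------------------------------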

Guenter

> 
> 
> === Option 3: CPU power(energy) monitoring framework ===
> 
> And last but not least, maybe the problem deserves some dedicated
> API? Something that would take providers and feed their data into
> interested parties, in particular a perf PMU implementation and
> cpufreq governors?
> 
> Maybe it could be an extension to the thermal framework? It already
> gives some meaning to a physical phenomena. Adding other, related ones
> like energy, and relating it to cpu cores could make some sense.
> 
> 
> 
> I've tried to gather all potentially interested audience in the To:
> list, but if I missed anyone - please, do let them (and/or me) know.
> 
> Best regards and thanks for participation in the discussion!
> 
> Pawel
> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [RFC] Energy/power monitoring within the kernel
@ 2012-10-23 22:02   ` Guenter Roeck
  0 siblings, 0 replies; 33+ messages in thread
From: Guenter Roeck @ 2012-10-23 22:02 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Oct 23, 2012 at 06:30:49PM +0100, Pawel Moll wrote:
> Greetings All,
> 
> More and more of people are getting interested in the subject of power
> (energy) consumption monitoring. We have some external tools like
> "battery simulators", energy probes etc., but some targets can measure
> their power usage on their own.
> 
> Traditionally such data should be exposed to the user via hwmon sysfs
> interface, and that's exactly what I did for "my" platform - I have
> a /sys/class/hwmon/hwmon*/device/energy*_input and this was good
> enough to draw pretty graphs in userspace. Everyone was happy...
> 
Only driver supporting "energy" output so far is ibmaem, and the reported energy
is supposed to be cumulative, as in energy = power * time. Do you mean power,
possibly ?

> Now I am getting new requests to do more with this data. In particular
> I'm asked how to add such information to ftrace/perf output. The second
> most frequent request is about providing it to a "energy aware"
> cpufreq governor.
> 

Anything energy related would have to be along the line of "do something after a
certain amount of work has been performed", which at least at the surface does
not make much sense to me, unless you mean something along the line of a
process scheduler which schedules a process not based on time slices but based
on energy consumed, ie if you want to define a time slice not in milli-seconds
but in Joule.

If so, I would argue that a similar behavior could be achieved by varying the
duration of time slices with the current CPU speed, or simply by using cycle
count instead of time as time slice parameter. Not that I am sure if such an
approach would really be of interest for anyone. 

Or do you really mean power, not energy, such as in "reduce CPU speed if its
power consumption is above X Watt" ?

> I've came up with three (non-mutually exclusive) options. I will
> appreciate any other ideas and comments (including "it makes not sense
> whatsoever" ones, with justification). Of course I am more than willing
> to spend time on prototyping anything that seems reasonable and propose
> patches.
> 
> 
> 
> === Option 1: Trace event ===
> 
> This seems to be the "cheapest" option. Simply defining a trace event
> that can be generated by a hwmon (or any other) driver makes the
> interesting data immediately available to any ftrace/perf user. Of
> course it doesn't really help with the cpufreq case, but seems to be
> a good place to start with.
> 
> The question is how to define it... I've came up with two prototypes:
> 
> = Generic hwmon trace event =
> 
> This one allows any driver to generate a trace event whenever any
> "hwmon attribute" (measured value) gets updated. The rate at which the
> updates happen can be controlled by already existing "update_interval"
> attribute.
> 
> 8<-------------------------------------------
> TRACE_EVENT(hwmon_attr_update,
> 	TP_PROTO(struct device *dev, struct attribute *attr, long long input),
> 	TP_ARGS(dev, attr, input),
> 
> 	TP_STRUCT__entry(
> 		__string(       dev,		dev_name(dev))
> 		__string(	attr,		attr->name)
> 		__field(	long long,	input)
> 	),
> 
> 	TP_fast_assign(
> 		__assign_str(dev, dev_name(dev));
> 		__assign_str(attr, attr->name);
> 		__entry->input = input;
> 	),
> 
> 	TP_printk("%s %s %lld", __get_str(dev), __get_str(attr), __entry->input)
> );
> 8<-------------------------------------------
> 
> It generates such ftrace message:
> 
> <...>212.673126: hwmon_attr_update: hwmon4 temp1_input 34361
> 
> One issue with this is that some external knowledge is required to
> relate a number to a processor core. Or maybe it's not an issue at all
> because it should be left for the user(space)?
> 
> = CPU power/energy/temperature trace event =
> 
> This one is designed to emphasize the relation between the measured
> value (whether it is energy, temperature or any other physical
> phenomena, really) and CPUs, so it is quite specific (too specific?)
> 
> 8<-------------------------------------------
> TRACE_EVENT(cpus_environment,
> 	TP_PROTO(const struct cpumask *cpus, long long value, char unit),
> 	TP_ARGS(cpus, value, unit),
> 
> 	TP_STRUCT__entry(
> 		__array(	unsigned char,	cpus,	sizeof(struct cpumask))
> 		__field(	long long,	value)
> 		__field(	char,		unit)
> 	),
> 
> 	TP_fast_assign(
> 		memcpy(__entry->cpus, cpus, sizeof(struct cpumask));
> 		__entry->value = value;
> 		__entry->unit = unit;
> 	),
> 
> 	TP_printk("cpus %s %lld[%c]",
> 		__print_cpumask((struct cpumask *)__entry->cpus),
> 		__entry->value, __entry->unit)
> );
> 8<-------------------------------------------
> 
> And the equivalent ftrace message is:
> 
> <...>127.063107: cpus_environment: cpus 0,1,2,3 34361[C]
> 
> It's a cpumask, not just single cpu id, because the sensor may measure
> the value per set of CPUs, eg. a temperature of the whole silicon die
> (so all the cores) or an energy consumed by a subset of cores (this
> is my particular use case - two meters monitor a cluster of two
> processors and a cluster of three processors, all working as a SMP
> system).
> 
> Of course the cpus __array could be actually a special __cpumask field
> type (I've just hacked the __print_cpumask so far). And I've just
> realised that the unit field should actually be a string to allow unit
> prefixes to be specified (the above should obviously be "34361[mC]"
> not "[C]"). Also - excuse the "cpus_environment" name - this was the
> best I was able to come up with at the time and I'm eager to accept
> any alternative suggestions :-)
> 
I am not sure how this would be expected to work. hwmon is, by its very nature,
a passive subsystem: It doesn't do anything unless data is explicitly requested
from it. It does not update an attribute unless that attribute is read.
That does not seem to fit well with the idea of tracing - which assumes
that some activity is happening, ultimately, all by itself, presumably
periodically. The idea to have a user space application read hwmon data only
for it to trigger trace events does not seem to be very compelling to me.

An exception is if a monitoring device supports interrupts, and if its driver
actually implements those interrupts. This is, however, not the case for most of
the current drivers (if any), mostly because interrupt support for hardware
monitoring devices is very platform dependent and thus difficult to implement.

> 
> === Option 2: hwmon perf PMU ===
> 
> Although the trace event makes it possible to obtain interesting
> information using perf, the user wouldn't be able to treat the
> energy meter as a normal data source. In particular there would
> be no way of creating a group of events consisting eg. of a
> "normal" leader (eg. cache miss event) triggering energy meter
> read. The only way to get this done is to implement a perf PMU
> backend providing "environmental data" to the user.
> 
> = High-level hwmon API and PMU =
> 
> Current hwmon subsystem does not provide any abstraction for the
> measured values and requires particular drivers to create specified
> sysfs attributes than used by userspace libsensors. This makes
> the framework ultimately flexible and ultimately hard to access
> from within the kernel...
> 
> What could be done here is some (simple) API to register the
> measured values with the hwmon core which would result in creating
> equivalent sysfs attributes automagically, but also allow a
> in-kernel API for values enumeration and access. That way the core
> could also register a "hwmon PMU" with the perf framework providing
> data from all "compliant" drivers.
> 
> = A driver-specific PMU =
> 
> Of course a particular driver could register its own perf PMU on its
> own. It's certainly an option, just very suboptimal in my opinion.
> Or maybe not? Maybe the task is so specialized that it makes sense?
> 
We had a couple of attempts to provide an in-kernel API. Unfortunately,
the result was, at least so far, more complexity on the driver side.
So the difficulty is really to define an API which is really simple, and does
not just complicate driver development for a (presumably) rare use case.

Guenter

> 
> 
> === Option 3: CPU power(energy) monitoring framework ===
> 
> And last but not least, maybe the problem deserves some dedicated
> API? Something that would take providers and feed their data into
> interested parties, in particular a perf PMU implementation and
> cpufreq governors?
> 
> Maybe it could be an extension to the thermal framework? It already
> gives some meaning to a physical phenomena. Adding other, related ones
> like energy, and relating it to cpu cores could make some sense.
> 
> 
> 
> I've tried to gather all potentially interested audience in the To:
> list, but if I missed anyone - please, do let them (and/or me) know.
> 
> Best regards and thanks for participation in the discussion!
> 
> Pawel
> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [lm-sensors] [RFC] Energy/power monitoring within the kernel
@ 2012-10-23 22:02   ` Guenter Roeck
  0 siblings, 0 replies; 33+ messages in thread
From: Guenter Roeck @ 2012-10-23 22:02 UTC (permalink / raw)
  To: Pawel Moll
  Cc: Amit Daniel Kachhap, Zhang Rui, Viresh Kumar, Daniel Lezcano,
	Jean Delvare, Steven Rostedt, Frederic Weisbecker, Ingo Molnar,
	Jesper Juhl, Thomas Renninger, Jean Pihet, linux-kernel,
	linux-arm-kernel, lm-sensors, linaro-dev

On Tue, Oct 23, 2012 at 06:30:49PM +0100, Pawel Moll wrote:
> Greetings All,
> 
> More and more of people are getting interested in the subject of power
> (energy) consumption monitoring. We have some external tools like
> "battery simulators", energy probes etc., but some targets can measure
> their power usage on their own.
> 
> Traditionally such data should be exposed to the user via hwmon sysfs
> interface, and that's exactly what I did for "my" platform - I have
> a /sys/class/hwmon/hwmon*/device/energy*_input and this was good
> enough to draw pretty graphs in userspace. Everyone was happy...
> 
Only driver supporting "energy" output so far is ibmaem, and the reported energy
is supposed to be cumulative, as in energy = power * time. Do you mean power,
possibly ?

> Now I am getting new requests to do more with this data. In particular
> I'm asked how to add such information to ftrace/perf output. The second
> most frequent request is about providing it to a "energy aware"
> cpufreq governor.
> 

Anything energy related would have to be along the line of "do something after a
certain amount of work has been performed", which at least at the surface does
not make much sense to me, unless you mean something along the line of a
process scheduler which schedules a process not based on time slices but based
on energy consumed, ie if you want to define a time slice not in milli-seconds
but in Joule.

If so, I would argue that a similar behavior could be achieved by varying the
duration of time slices with the current CPU speed, or simply by using cycle
count instead of time as time slice parameter. Not that I am sure if such an
approach would really be of interest for anyone. 

Or do you really mean power, not energy, such as in "reduce CPU speed if its
power consumption is above X Watt" ?

> I've came up with three (non-mutually exclusive) options. I will
> appreciate any other ideas and comments (including "it makes not sense
> whatsoever" ones, with justification). Of course I am more than willing
> to spend time on prototyping anything that seems reasonable and propose
> patches.
> 
> 
> 
> === Option 1: Trace event ===
> 
> This seems to be the "cheapest" option. Simply defining a trace event
> that can be generated by a hwmon (or any other) driver makes the
> interesting data immediately available to any ftrace/perf user. Of
> course it doesn't really help with the cpufreq case, but seems to be
> a good place to start with.
> 
> The question is how to define it... I've came up with two prototypes:
> 
> = Generic hwmon trace event =
> 
> This one allows any driver to generate a trace event whenever any
> "hwmon attribute" (measured value) gets updated. The rate at which the
> updates happen can be controlled by already existing "update_interval"
> attribute.
> 
> 8<-------------------------------------------
> TRACE_EVENT(hwmon_attr_update,
> 	TP_PROTO(struct device *dev, struct attribute *attr, long long input),
> 	TP_ARGS(dev, attr, input),
> 
> 	TP_STRUCT__entry(
> 		__string(       dev,		dev_name(dev))
> 		__string(	attr,		attr->name)
> 		__field(	long long,	input)
> 	),
> 
> 	TP_fast_assign(
> 		__assign_str(dev, dev_name(dev));
> 		__assign_str(attr, attr->name);
> 		__entry->input = input;
> 	),
> 
> 	TP_printk("%s %s %lld", __get_str(dev), __get_str(attr), __entry->input)
> );
> 8<-------------------------------------------
> 
> It generates such ftrace message:
> 
> <...>212.673126: hwmon_attr_update: hwmon4 temp1_input 34361
> 
> One issue with this is that some external knowledge is required to
> relate a number to a processor core. Or maybe it's not an issue at all
> because it should be left for the user(space)?
> 
> = CPU power/energy/temperature trace event =
> 
> This one is designed to emphasize the relation between the measured
> value (whether it is energy, temperature or any other physical
> phenomena, really) and CPUs, so it is quite specific (too specific?)
> 
> 8<-------------------------------------------
> TRACE_EVENT(cpus_environment,
> 	TP_PROTO(const struct cpumask *cpus, long long value, char unit),
> 	TP_ARGS(cpus, value, unit),
> 
> 	TP_STRUCT__entry(
> 		__array(	unsigned char,	cpus,	sizeof(struct cpumask))
> 		__field(	long long,	value)
> 		__field(	char,		unit)
> 	),
> 
> 	TP_fast_assign(
> 		memcpy(__entry->cpus, cpus, sizeof(struct cpumask));
> 		__entry->value = value;
> 		__entry->unit = unit;
> 	),
> 
> 	TP_printk("cpus %s %lld[%c]",
> 		__print_cpumask((struct cpumask *)__entry->cpus),
> 		__entry->value, __entry->unit)
> );
> 8<-------------------------------------------
> 
> And the equivalent ftrace message is:
> 
> <...>127.063107: cpus_environment: cpus 0,1,2,3 34361[C]
> 
> It's a cpumask, not just single cpu id, because the sensor may measure
> the value per set of CPUs, eg. a temperature of the whole silicon die
> (so all the cores) or an energy consumed by a subset of cores (this
> is my particular use case - two meters monitor a cluster of two
> processors and a cluster of three processors, all working as a SMP
> system).
> 
> Of course the cpus __array could be actually a special __cpumask field
> type (I've just hacked the __print_cpumask so far). And I've just
> realised that the unit field should actually be a string to allow unit
> prefixes to be specified (the above should obviously be "34361[mC]"
> not "[C]"). Also - excuse the "cpus_environment" name - this was the
> best I was able to come up with at the time and I'm eager to accept
> any alternative suggestions :-)
> 
I am not sure how this would be expected to work. hwmon is, by its very nature,
a passive subsystem: It doesn't do anything unless data is explicitly requested
from it. It does not update an attribute unless that attribute is read.
That does not seem to fit well with the idea of tracing - which assumes
that some activity is happening, ultimately, all by itself, presumably
periodically. The idea to have a user space application read hwmon data only
for it to trigger trace events does not seem to be very compelling to me.

An exception is if a monitoring device supports interrupts, and if its driver
actually implements those interrupts. This is, however, not the case for most of
the current drivers (if any), mostly because interrupt support for hardware
monitoring devices is very platform dependent and thus difficult to implement.

> 
> === Option 2: hwmon perf PMU ===
> 
> Although the trace event makes it possible to obtain interesting
> information using perf, the user wouldn't be able to treat the
> energy meter as a normal data source. In particular there would
> be no way of creating a group of events consisting eg. of a
> "normal" leader (eg. cache miss event) triggering energy meter
> read. The only way to get this done is to implement a perf PMU
> backend providing "environmental data" to the user.
> 
> = High-level hwmon API and PMU =
> 
> Current hwmon subsystem does not provide any abstraction for the
> measured values and requires particular drivers to create specified
> sysfs attributes than used by userspace libsensors. This makes
> the framework ultimately flexible and ultimately hard to access
> from within the kernel...
> 
> What could be done here is some (simple) API to register the
> measured values with the hwmon core which would result in creating
> equivalent sysfs attributes automagically, but also allow a
> in-kernel API for values enumeration and access. That way the core
> could also register a "hwmon PMU" with the perf framework providing
> data from all "compliant" drivers.
> 
> = A driver-specific PMU =
> 
> Of course a particular driver could register its own perf PMU on its
> own. It's certainly an option, just very suboptimal in my opinion.
> Or maybe not? Maybe the task is so specialized that it makes sense?
> 
We had a couple of attempts to provide an in-kernel API. Unfortunately,
the result was, at least so far, more complexity on the driver side.
So the difficulty is really to define an API which is really simple, and does
not just complicate driver development for a (presumably) rare use case.

Guenter

> 
> 
> === Option 3: CPU power(energy) monitoring framework ===
> 
> And last but not least, maybe the problem deserves some dedicated
> API? Something that would take providers and feed their data into
> interested parties, in particular a perf PMU implementation and
> cpufreq governors?
> 
> Maybe it could be an extension to the thermal framework? It already
> gives some meaning to a physical phenomena. Adding other, related ones
> like energy, and relating it to cpu cores could make some sense.
> 
> 
> 
> I've tried to gather all potentially interested audience in the To:
> list, but if I missed anyone - please, do let them (and/or me) know.
> 
> Best regards and thanks for participation in the discussion!
> 
> Pawel
> 
> 
> 
> 

_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC] Energy/power monitoring within the kernel
  2012-10-23 17:30 ` Pawel Moll
  (?)
@ 2012-10-24  0:40   ` Thomas Renninger
  -1 siblings, 0 replies; 33+ messages in thread
From: Thomas Renninger @ 2012-10-24  0:40 UTC (permalink / raw)
  To: Pawel Moll
  Cc: Amit Daniel Kachhap, Zhang Rui, Viresh Kumar, Daniel Lezcano,
	Jean Delvare, Guenter Roeck, Steven Rostedt, Frederic Weisbecker,
	Ingo Molnar, Jesper Juhl, Jean Pihet, linux-kernel,
	linux-arm-kernel, lm-sensors, linaro-dev

Hi,

On Tuesday, October 23, 2012 06:30:49 PM Pawel Moll wrote:
> Greetings All,
> 
> More and more of people are getting interested in the subject of power
> (energy) consumption monitoring. We have some external tools like
> "battery simulators", energy probes etc., but some targets can measure
> their power usage on their own.
> 
> Traditionally such data should be exposed to the user via hwmon sysfs
> interface, and that's exactly what I did for "my" platform - I have
> a /sys/class/hwmon/hwmon*/device/energy*_input and this was good
> enough to draw pretty graphs in userspace. Everyone was happy...
> 
> Now I am getting new requests to do more with this data. In particular
> I'm asked how to add such information to ftrace/perf output.
Why? What is the gain?

Perf events can be triggered at any point in the kernel.
A cpufreq event is triggered when the frequency gets changed.
CPU idle events are triggered when the kernel requests to enter an idle state
or exits one.

When would you trigger a thermal or a power event?
There is the possibility of (critical) thermal limits.
But if I understand this correctly you want this for debugging and
I guess you have everything interesting one can do with temperature
values:
  - read the temperature
  - draw some nice graphs from the results

Hm, I guess I know what you want to do:
In your temperature/energy graph, you want to have some dots
when relevant HW states (frequency, sleep states,  DDR power,...)
changed. Then you are able to see the effects over a timeline.

So you have to bring the existing frequency/idle perf events together
with temperature readings.

Cleanest solution could be to enhance the existing userspace apps
(pytimechart/perf timechart) and let them add another line
(temperature/energy), but the data would not come from perf, but
from sysfs/hwmon.
Not sure whether this works out with the timechart tools.
Anyway, this sounds like a userspace only problem.
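
In that spirit, the userspace side boils down to something like the sketch
below: sample the hwmon attribute with monotonic timestamps and let the
timechart tool merge it with the perf events (the hwmon path and the 100 ms
polling period are just examples):

8<-------------------------------------------
/* Userspace-only sketch: poll a hwmon energy attribute and print
 * timestamped samples a timechart-style tool could merge with perf data.
 */
#include <stdio.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
	const char *path = "/sys/class/hwmon/hwmon0/device/energy1_input";

	for (;;) {
		struct timespec ts;
		long long uj;	/* energy*_input is in microjoules */
		FILE *f = fopen(path, "r");

		if (!f)
			return 1;
		if (fscanf(f, "%lld", &uj) != 1) {
			fclose(f);
			return 1;
		}
		fclose(f);
		clock_gettime(CLOCK_MONOTONIC, &ts);
		printf("%ld.%09ld energy %lld\n",
		       (long)ts.tv_sec, ts.tv_nsec, uj);
		usleep(100000);
	}
	return 0;
}
8<-------------------------------------------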

   Thomas

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [RFC] Energy/power monitoring within the kernel
@ 2012-10-24  0:40   ` Thomas Renninger
  0 siblings, 0 replies; 33+ messages in thread
From: Thomas Renninger @ 2012-10-24  0:40 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On Tuesday, October 23, 2012 06:30:49 PM Pawel Moll wrote:
> Greetings All,
> 
> More and more of people are getting interested in the subject of power
> (energy) consumption monitoring. We have some external tools like
> "battery simulators", energy probes etc., but some targets can measure
> their power usage on their own.
> 
> Traditionally such data should be exposed to the user via hwmon sysfs
> interface, and that's exactly what I did for "my" platform - I have
> a /sys/class/hwmon/hwmon*/device/energy*_input and this was good
> enough to draw pretty graphs in userspace. Everyone was happy...
> 
> Now I am getting new requests to do more with this data. In particular
> I'm asked how to add such information to ftrace/perf output.
Why? What is the gain?

Perf events can be triggered at any point in the kernel.
A cpufreq event is triggered when the frequency gets changed.
CPU idle events are triggered when the kernel requests to enter an idle state
or exits one.

When would you trigger a thermal or a power event?
There is the possibility of (critical) thermal limits.
But if I understand this correctly you want this for debugging and
I guess you have everything interesting one can do with temperature
values:
  - read the temperature
  - draw some nice graphs from the results

Hm, I guess I know what you want to do:
In your temperature/energy graph, you want to have some dots
when relevant HW states (frequency, sleep states,  DDR power,...)
changed. Then you are able to see the effects over a timeline.

So you have to bring the existing frequency/idle perf events together
with temperature readings.

Cleanest solution could be to enhance the existing userspace apps
(pytimechart/perf timechart) and let them add another line
(temperature/energy), but the data would not come from perf, but
from sysfs/hwmon.
Not sure whether this works out with the timechart tools.
Anyway, this sounds like a userspace only problem.

   Thomas

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [lm-sensors] [RFC] Energy/power monitoring within the kernel
@ 2012-10-24  0:40   ` Thomas Renninger
  0 siblings, 0 replies; 33+ messages in thread
From: Thomas Renninger @ 2012-10-24  0:40 UTC (permalink / raw)
  To: Pawel Moll
  Cc: Amit Daniel Kachhap, Zhang Rui, Viresh Kumar, Daniel Lezcano,
	Jean Delvare, Guenter Roeck, Steven Rostedt, Frederic Weisbecker,
	Ingo Molnar, Jesper Juhl, Jean Pihet, linux-kernel,
	linux-arm-kernel, lm-sensors, linaro-dev

Hi,

On Tuesday, October 23, 2012 06:30:49 PM Pawel Moll wrote:
> Greetings All,
> 
> More and more of people are getting interested in the subject of power
> (energy) consumption monitoring. We have some external tools like
> "battery simulators", energy probes etc., but some targets can measure
> their power usage on their own.
> 
> Traditionally such data should be exposed to the user via hwmon sysfs
> interface, and that's exactly what I did for "my" platform - I have
> a /sys/class/hwmon/hwmon*/device/energy*_input and this was good
> enough to draw pretty graphs in userspace. Everyone was happy...
> 
> Now I am getting new requests to do more with this data. In particular
> I'm asked how to add such information to ftrace/perf output.
Why? What is the gain?

Perf events can be triggered at any point in the kernel.
A cpufreq event is triggered when the frequency gets changed.
CPU idle events are triggered when the kernel requests to enter an idle state
or exits one.

When would you trigger a thermal or a power event?
There is the possibility of (critical) thermal limits.
But if I understand this correctly you want this for debugging and
I guess you have everything interesting one can do with temperature
values:
  - read the temperature
  - draw some nice graphs from the results

Hm, I guess I know what you want to do:
In your temperature/energy graph, you want to have some dots
when relevant HW states (frequency, sleep states,  DDR power,...)
changed. Then you are able to see the effects over a timeline.

So you have to bring the existing frequency/idle perf events together
with temperature readings.

Cleanest solution could be to enhance the existing userspace apps
(pytimechart/perf timechart) and let them add another line
(temperature/energy), but the data would not come from perf, but
from sysfs/hwmon.
Not sure whether this works out with the timechart tools.
Anyway, this sounds like a userspace only problem.

   Thomas

_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC] Energy/power monitoring within the kernel
  2012-10-23 17:30 ` Pawel Moll
  (?)
@ 2012-10-24  0:41   ` Thomas Renninger
  -1 siblings, 0 replies; 33+ messages in thread
From: Thomas Renninger @ 2012-10-24  0:41 UTC (permalink / raw)
  To: Pawel Moll
  Cc: Amit Daniel Kachhap, Zhang Rui, Viresh Kumar, Daniel Lezcano,
	Jean Delvare, Guenter Roeck, Steven Rostedt, Frederic Weisbecker,
	Ingo Molnar, Jesper Juhl, Jean Pihet, linux-kernel,
	linux-arm-kernel, lm-sensors, linaro-dev

Hi,

On Tuesday, October 23, 2012 06:30:49 PM Pawel Moll wrote:
> Greetings All,
> 
> More and more of people are getting interested in the subject of power
> (energy) consumption monitoring. We have some external tools like
> "battery simulators", energy probes etc., but some targets can measure
> their power usage on their own.
> 
> Traditionally such data should be exposed to the user via hwmon sysfs
> interface, and that's exactly what I did for "my" platform - I have
> a /sys/class/hwmon/hwmon*/device/energy*_input and this was good
> enough to draw pretty graphs in userspace. Everyone was happy...
> 
> Now I am getting new requests to do more with this data. In particular
> I'm asked how to add such information to ftrace/perf output.
Why? What is the gain?

Perf events can be triggered at any point in the kernel.
A cpufreq event is triggered when the frequency gets changed.
CPU idle events are triggered when the kernel requests to enter an idle state
or exits one.

When would you trigger a thermal or a power event?
There is the possibility of (critical) thermal limits.
But if I understand this correctly you want this for debugging and
I guess you have everything interesting one can do with temperature
values:
  - read the temperature
  - draw some nice graphs from the results

Hm, I guess I know what you want to do:
In your temperature/energy graph, you want to have some dots
when relevant HW states (frequency, sleep states,  DDR power,...)
changed. Then you are able to see the effects over a timeline.

So you have to bring the existing frequency/idle perf events together
with temperature readings.

Cleanest solution could be to enhance the existing userspace apps
(pytimechart/perf timechart) and let them add another line
(temperature/energy), but the data would not come from perf, but
from sysfs/hwmon.
Not sure whether this works out with the timechart tools.
Anyway, this sounds like a userspace only problem.

   Thomas

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [RFC] Energy/power monitoring within the kernel
@ 2012-10-24  0:41   ` Thomas Renninger
  0 siblings, 0 replies; 33+ messages in thread
From: Thomas Renninger @ 2012-10-24  0:41 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On Tuesday, October 23, 2012 06:30:49 PM Pawel Moll wrote:
> Greetings All,
> 
> More and more of people are getting interested in the subject of power
> (energy) consumption monitoring. We have some external tools like
> "battery simulators", energy probes etc., but some targets can measure
> their power usage on their own.
> 
> Traditionally such data should be exposed to the user via hwmon sysfs
> interface, and that's exactly what I did for "my" platform - I have
> a /sys/class/hwmon/hwmon*/device/energy*_input and this was good
> enough to draw pretty graphs in userspace. Everyone was happy...
> 
> Now I am getting new requests to do more with this data. In particular
> I'm asked how to add such information to ftrace/perf output.
Why? What is the gain?

Perf events can be triggered at any point in the kernel.
A cpufreq event is triggered when the frequency gets changed.
CPU idle events are triggered when the kernel requests to enter an idle state
or exits one.

When would you trigger a thermal or a power event?
There is the possibility of (critical) thermal limits.
But if I understand this correctly you want this for debugging and
I guess you have everything interesting one can do with temperature
values:
  - read the temperature
  - draw some nice graphs from the results

Hm, I guess I know what you want to do:
In your temperature/energy graph, you want to have some dots
when relevant HW states (frequency, sleep states,  DDR power,...)
changed. Then you are able to see the effects over a timeline.

So you have to bring the existing frequency/idle perf events together
with temperature readings.

Cleanest solution could be to enhance the existing userspace apps
(pytimechart/perf timechart) and let them add another line
(temperature/energy), but the data would not come from perf, but
from sysfs/hwmon.
Not sure whether this works out with the timechart tools.
Anyway, this sounds like a userspace only problem.

   Thomas

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [lm-sensors] [RFC] Energy/power monitoring within the kernel
@ 2012-10-24  0:41   ` Thomas Renninger
  0 siblings, 0 replies; 33+ messages in thread
From: Thomas Renninger @ 2012-10-24  0:41 UTC (permalink / raw)
  To: Pawel Moll
  Cc: Amit Daniel Kachhap, Zhang Rui, Viresh Kumar, Daniel Lezcano,
	Jean Delvare, Guenter Roeck, Steven Rostedt, Frederic Weisbecker,
	Ingo Molnar, Jesper Juhl, Jean Pihet, linux-kernel,
	linux-arm-kernel, lm-sensors, linaro-dev

Hi,

On Tuesday, October 23, 2012 06:30:49 PM Pawel Moll wrote:
> Greetings All,
> 
> More and more of people are getting interested in the subject of power
> (energy) consumption monitoring. We have some external tools like
> "battery simulators", energy probes etc., but some targets can measure
> their power usage on their own.
> 
> Traditionally such data should be exposed to the user via hwmon sysfs
> interface, and that's exactly what I did for "my" platform - I have
> a /sys/class/hwmon/hwmon*/device/energy*_input and this was good
> enough to draw pretty graphs in userspace. Everyone was happy...
> 
> Now I am getting new requests to do more with this data. In particular
> I'm asked how to add such information to ftrace/perf output.
Why? What is the gain?

Perf events can be triggered at any point in the kernel.
A cpufreq event is triggered when the frequency gets changed.
CPU idle events are triggered when the kernel requests to enter an idle state
or exits one.

When would you trigger a thermal or a power event?
There is the possibility of (critical) thermal limits.
But if I understand this correctly you want this for debugging and
I guess you have everything interesting one can do with temperature
values:
  - read the temperature
  - draw some nice graphs from the results

Hm, I guess I know what you want to do:
In your temperature/energy graph, you want to have some dots
when relevant HW states (frequency, sleep states,  DDR power,...)
changed. Then you are able to see the effects over a timeline.

So you have to bring the existing frequency/idle perf events together
with temperature readings.

Cleanest solution could be to enhance the existing userspace apps
(pytimechart/perf timechart) and let them add another line
(temperature/energy), but the data would not come from perf, but
from sysfs/hwmon.
Not sure whether this works out with the timechart tools.
Anyway, this sounds like a userspace only problem.

   Thomas

_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC] Energy/power monitoring within the kernel
  2012-10-23 17:43   ` Steven Rostedt
  (?)
@ 2012-10-24 16:00     ` Pawel Moll
  -1 siblings, 0 replies; 33+ messages in thread
From: Pawel Moll @ 2012-10-24 16:00 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Amit Daniel Kachhap, Zhang Rui, Viresh Kumar, Daniel Lezcano,
	Jean Delvare, Guenter Roeck, Frederic Weisbecker, Ingo Molnar,
	Jesper Juhl, Thomas Renninger, Jean Pihet, linux-kernel,
	linux-arm-kernel, lm-sensors, linaro-dev

On Tue, 2012-10-23 at 18:43 +0100, Steven Rostedt wrote:
> > <...>212.673126: hwmon_attr_update: hwmon4 temp1_input 34361
> > 
> > One issue with this is that some external knowledge is required to
> > relate a number to a processor core. Or maybe it's not an issue at all
> > because it should be left for the user(space)?
> 
> If the external knowledge can be characterized in a userspace tool with
> the given data here, I see no issues with this.

Ok, fine.

> > 	TP_fast_assign(
> > 		memcpy(__entry->cpus, cpus, sizeof(struct cpumask));
> 
> Copying the entire cpumask seems like overkill. Especially when you have
> 4096 CPU machines.

Uh, right. I didn't consider such use case...

> Perhaps making a field that can be a subset of cpus may be better. That
> way we don't waste the ring buffer with lots of zeros. I'm guessing that
> it will only be a group of cpus, and not a scattered list? Of course,
> I've seen boxes where the cpu numbers went from core to core. That is,
> cpu 0 was on core 1, cpu 1 was on core 2, and then it would repeat. 
> cpu 8 was on core 1, cpu 9 was on core 2, etc.
> 
> But still, this could be compressed somehow.

Sure thing. Or I could simply use cpumask_scnprintf() on the assign
stage and keep an already-formatted string. Or, as the cpumask per
sensor would be de-facto constant, I could assume keep only a pointer to
it. Will keep it in mind if this event was supposed to happen.
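
For what it's worth, that variant could look roughly like this (the 64-byte
buffer is an arbitrary choice for the sketch; cpumask_scnprintf() is the
existing helper):

8<-------------------------------------------
TRACE_EVENT(cpus_environment,
	TP_PROTO(const struct cpumask *cpus, long long value, char unit),
	TP_ARGS(cpus, value, unit),

	TP_STRUCT__entry(
		/* formatted mask instead of the whole struct cpumask */
		__array(	char,		cpus,	64)
		__field(	long long,	value)
		__field(	char,		unit)
	),

	TP_fast_assign(
		cpumask_scnprintf(__entry->cpus, sizeof(__entry->cpus), cpus);
		__entry->value = value;
		__entry->unit = unit;
	),

	TP_printk("cpus %s %lld[%c]",
		__entry->cpus, __entry->value, __entry->unit)
);
8<-------------------------------------------

It still reserves the full 64 bytes in the ring buffer for every event, so a
__dynamic_array() or the constant-pointer idea would be smaller still.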

Thanks!

Paweł




^ permalink raw reply	[flat|nested] 33+ messages in thread

* [RFC] Energy/power monitoring within the kernel
@ 2012-10-24 16:00     ` Pawel Moll
  0 siblings, 0 replies; 33+ messages in thread
From: Pawel Moll @ 2012-10-24 16:00 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, 2012-10-23 at 18:43 +0100, Steven Rostedt wrote:
> > <...>212.673126: hwmon_attr_update: hwmon4 temp1_input 34361
> > 
> > One issue with this is that some external knowledge is required to
> > relate a number to a processor core. Or maybe it's not an issue at all
> > because it should be left for the user(space)?
> 
> If the external knowledge can be characterized in a userspace tool with
> the given data here, I see no issues with this.

Ok, fine.

> > 	TP_fast_assign(
> > 		memcpy(__entry->cpus, cpus, sizeof(struct cpumask));
> 
> Copying the entire cpumask seems like overkill. Especially when you have
> 4096 CPU machines.

Uh, right. I didn't consider such use case...

> Perhaps making a field that can be a subset of cpus may be better. That
> way we don't waste the ring buffer with lots of zeros. I'm guessing that
> it will only be a group of cpus, and not a scattered list? Of course,
> I've seen boxes where the cpu numbers went from core to core. That is,
> cpu 0 was on core 1, cpu 1 was on core 2, and then it would repeat. 
> cpu 8 was on core 1, cpu 9 was on core 2, etc.
> 
> But still, this could be compressed somehow.

Sure thing. Or I could simply use cpumask_scnprintf() on the assign
stage and keep an already-formatted string. Or, as the cpumask per
sensor would be de-facto constant, I could assume keep only a pointer to
it. Will keep it in mind if this event was supposed to happen.

Thanks!

Paweł

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [lm-sensors] [RFC] Energy/power monitoring within the kernel
@ 2012-10-24 16:00     ` Pawel Moll
  0 siblings, 0 replies; 33+ messages in thread
From: Pawel Moll @ 2012-10-24 16:00 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Amit Daniel Kachhap, Zhang Rui, Viresh Kumar, Daniel Lezcano,
	Jean Delvare, Guenter Roeck, Frederic Weisbecker, Ingo Molnar,
	Jesper Juhl, Thomas Renninger, Jean Pihet, linux-kernel,
	linux-arm-kernel, lm-sensors, linaro-dev

On Tue, 2012-10-23 at 18:43 +0100, Steven Rostedt wrote:
> > <...>212.673126: hwmon_attr_update: hwmon4 temp1_input 34361
> > 
> > One issue with this is that some external knowledge is required to
> > relate a number to a processor core. Or maybe it's not an issue at all
> > because it should be left for the user(space)?
> 
> If the external knowledge can be characterized in a userspace tool with
> the given data here, I see no issues with this.

Ok, fine.

> > 	TP_fast_assign(
> > 		memcpy(__entry->cpus, cpus, sizeof(struct cpumask));
> 
> Copying the entire cpumask seems like overkill. Especially when you have
> 4096 CPU machines.

Uh, right. I didn't consider such use case...

> Perhaps making a field that can be a subset of cpus may be better. That
> way we don't waste the ring buffer with lots of zeros. I'm guessing that
> it will only be a group of cpus, and not a scattered list? Of course,
> I've seen boxes where the cpu numbers went from core to core. That is,
> cpu 0 was on core 1, cpu 1 was on core 2, and then it would repeat. 
> cpu 8 was on core 1, cpu 9 was on core 2, etc.
> 
> But still, this could be compressed somehow.

Sure thing. Or I could simply use cpumask_scnprintf() on the assign
stage and keep an already-formatted string. Or, as the cpumask per
sensor would be de-facto constant, I could assume keep only a pointer to
it. Will keep it in mind if this event was supposed to happen.

Thanks!

Paweł




_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC] Energy/power monitoring within the kernel
  2012-10-23 18:49   ` Andy Green
  (?)
@ 2012-10-24 16:05     ` Pawel Moll
  -1 siblings, 0 replies; 33+ messages in thread
From: Pawel Moll @ 2012-10-24 16:05 UTC (permalink / raw)
  To: Andy Green
  Cc: Amit Daniel Kachhap, Zhang Rui, Viresh Kumar, Daniel Lezcano,
	Jean Delvare, Guenter Roeck, Steven Rostedt, Frederic Weisbecker,
	Ingo Molnar, Jesper Juhl, Thomas Renninger, Jean Pihet,
	linaro-dev, linux-kernel, linux-arm-kernel, lm-sensors

On Tue, 2012-10-23 at 19:49 +0100, Andy Green wrote:
> A thought on that... from an SoC perspective there are other interesting 
> power rails than go to just the CPU core.  For example DDR power and 
> rails involved with other IP units on the SoC such as 3D graphics unit. 
>   So tying one number to specifically a CPU core does not sound like 
> it's enough.

I do realize this. I just didn't want to try to cover too much ground,
and cpufreq governor would be interested in cpu-related data anyway...

> If you turn the problem upside down to solve the representation question 
> first, maybe there's a way forward defining the "power tree" in terms of 
> regulators, and then adding something in struct regulator that spams 
> readers with timestamped results if the regulator has a power monitoring 
> capability.
> 
> Then you can map the regulators in the power tree to real devices by the 
> names or the supply stuff.  Just a thought.

Hm. Interesting idea indeed - if a regulator device was able to report
the energy being produced by it (instead of looking at cumulative energy
consumed by more than one device), defining "power domains" (by adding
selected cpus as consumers) would be straight forward and the cpufreq
could request the information that way.
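
Purely as a sketch of the shape such a notification could take - the event
constant and the payload struct below are invented for illustration, while
regulator_register_notifier() and regulator_get() are the existing consumer
API - it might end up looking something like:

8<-------------------------------------------
/* Hypothetical only: a power-monitoring-capable regulator pushing
 * timestamped samples to its consumers. REGULATOR_EVENT_POWER_SAMPLE and
 * struct regulator_power_sample do not exist in the regulator API today.
 */
struct regulator_power_sample {
	ktime_t	timestamp;
	int	power_uw;	/* instantaneous power in uW */
};

static int cpu_power_notify(struct notifier_block *nb,
			    unsigned long event, void *data)
{
	struct regulator_power_sample *s = data;

	if (event == REGULATOR_EVENT_POWER_SAMPLE)
		pr_debug("cluster power: %d uW at %lld ns\n",
			 s->power_uw, ktime_to_ns(s->timestamp));
	return NOTIFY_OK;
}

/* A consumer (e.g. a cpufreq governor) would then do something like:
 *	reg = regulator_get(cpu_dev, "vdd_cpu");
 *	regulator_register_notifier(reg, &cpu_power_nb);
 */
8<-------------------------------------------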

I'll look into it, thanks!

Paweł



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC] Energy/power monitoring within the kernel
  2012-10-23 22:02   ` Guenter Roeck
@ 2012-10-24 16:37     ` Pawel Moll
  -1 siblings, 0 replies; 33+ messages in thread
From: Pawel Moll @ 2012-10-24 16:37 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Amit Daniel Kachhap, Zhang Rui, Viresh Kumar, Daniel Lezcano,
	Jean Delvare, Steven Rostedt, Frederic Weisbecker, Ingo Molnar,
	Jesper Juhl, Thomas Renninger, Jean Pihet, linux-kernel,
	linux-arm-kernel, lm-sensors, linaro-dev

On Tue, 2012-10-23 at 23:02 +0100, Guenter Roeck wrote:
> > Traditionally such data should be exposed to the user via hwmon sysfs
> > interface, and that's exactly what I did for "my" platform - I have
> > a /sys/class/hwmon/hwmon*/device/energy*_input and this was good
> > enough to draw pretty graphs in userspace. Everyone was happy...
> > 
> Only driver supporting "energy" output so far is ibmaem, and the reported energy
> is supposed to be cumulative, as in energy = power * time. Do you mean power,
> possibly ?

So the vexpress would be the second one, then :-) as the energy
"monitor" on the latest tiles actually reports a 64-bit value of
microJoules consumed (or produced) since power-up.

Some of the older boards were able to report instantaneous power, but
this metric is less useful in our case.
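
(For completeness: average power over a sampling window can always be
derived from two such cumulative readings. A trivial illustration, not
vexpress driver code:)

8<-------------------------------------------
/* W = J/s, so mW = 1000 * delta_uJ / delta_us */
#include <stdint.h>

static uint64_t avg_power_mw(uint64_t uj_prev, uint64_t uj_now,
			     uint64_t us_prev, uint64_t us_now)
{
	uint64_t delta_uj = uj_now - uj_prev;	/* energy used in the window */
	uint64_t delta_us = us_now - us_prev;	/* window length */

	return delta_us ? (delta_uj * 1000) / delta_us : 0;
}
8<-------------------------------------------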

> > Now I am getting new requests to do more with this data. In particular
> > I'm asked how to add such information to ftrace/perf output. The second
> > most frequent request is about providing it to a "energy aware"
> > cpufreq governor.
> 
> Anything energy related would have to be along the line of "do something after a
> certain amount of work has been performed", which at least at the surface does
> not make much sense to me, unless you mean something along the line of a
> process scheduler which schedules a process not based on time slices but based
> on energy consumed, ie if you want to define a time slice not in milli-seconds
> but in Joule.

Actually there is some research being done in this direction, but it's
way too early to draw any conclusions...

> If so, I would argue that a similar behavior could be achieved by varying the
> duration of time slices with the current CPU speed, or simply by using cycle
> count instead of time as time slice parameter. Not that I am sure if such an
> approach would really be of interest for anyone. 
> 
> Or do you really mean power, not energy, such as in "reduce CPU speed if its
> power consumption is above X Watt" ?

Uh. To be completely honest I must answer: I'm not sure how the "energy
aware" cpufreq governor is supposed to work. I have been simply asked to
provide the data in some standard way, if possible.

> I am not sure how this would be expected to work. hwmon is, by its very nature,
> a passive subsystem: It doesn't do anything unless data is explicitly requested
> from it. It does not update an attribute unless that attribute is read.
> That does not seem to fit well with the idea of tracing - which assumes
> that some activity is happening, ultimately, all by itself, presumably
> periodically. The idea to have a user space application read hwmon data only
> for it to trigger trace events does not seem to be very compelling to me.

What I had in mind was similar to what the adt7470 driver does. The
driver would automatically access the device every now and then to
update its internal state and generate a trace event on the way. This
auto-refresh "feature" is particularly appealing for me, as on some of
"my" platforms it can take up to 500 microseconds to actually get the
data. So doing this in the background (and providing users with the
last known value in the meantime) seems attractive.
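
Something along these lines - a rough sketch only, all identifiers are
made up (it is not the adt7470 code) and the trace call is just
indicated in a comment:

8<-------------------------------------------
#include <linux/workqueue.h>
#include <linux/device.h>
#include <linux/types.h>

struct energy_mon {
	struct device *dev;
	struct delayed_work work;
	unsigned long interval;	/* from update_interval, in jiffies */
	u64 last_uj;		/* cached value served to sysfs readers */
};

/* stands in for the slow (up to ~500us) transaction with the hardware */
static u64 energy_mon_read_hw(struct energy_mon *mon)
{
	return 0;		/* stub for illustration */
}

static void energy_mon_poll(struct work_struct *work)
{
	struct energy_mon *mon =
		container_of(to_delayed_work(work), struct energy_mon, work);

	mon->last_uj = energy_mon_read_hw(mon);
	/* a trace event (e.g. the proposed hwmon_attr_update) would be
	 * generated here, so every refresh lands in the ring buffer */
	schedule_delayed_work(&mon->work, mon->interval);
}
8<-------------------------------------------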

> An exception is if a monitoring device suppports interrupts, and if its driver
> actually implements those interrupts. This is, however, not the case for most of
> the current drivers (if any), mostly because interrupt support for hardware
> monitoring devices is very platform dependent and thus difficult to implement.

Interestingly enough the newest version of our platform control micro
(doing the energy monitoring as well) can generate an interrupt when a
transaction is finished, so I was planning to periodically update all
sorts of values. And again, generating a trace event at that point
would be trivial.

> > Of course a particular driver could register its own perf PMU on its
> > own. It's certainly an option, just very suboptimal in my opinion.
> > Or maybe not? Maybe the task is so specialized that it makes sense?
> > 
> We had a couple of attempts to provide an in-kernel API. Unfortunately,
> the result was, at least so far, more complexity on the driver side.
> So the difficulty is really to define an API which is really simple, and does
> not just complicate driver development for a (presumably) rare use case.

Yes, I appreciate this. That's why this option is actually my least
favourite. Anyway, what I was thinking about was just a thin shim that
*can* be used by a driver to register some particular value with the
core (so it can be enumerated and accessed by in-kernel clients) and the
core could (or not) create a sysfs attribute for this value on behalf of
the driver. Seems lightweight enough, unless previous experience
suggests otherwise?
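
For the record, the kind of interface I have in mind is no bigger than
this - completely hypothetical, just to show the intended scale:

8<-------------------------------------------
#include <linux/device.h>
#include <linux/list.h>
#include <linux/types.h>

struct power_value {
	struct list_head node;			/* core keeps a global list */
	struct device *dev;
	const char *name;			/* e.g. "energy1_input" */
	u64 (*read)(struct power_value *pv);	/* driver-supplied getter */
	bool want_sysfs;			/* core creates the attribute */
};

/* driver side: one call makes the value visible to in-kernel clients */
int power_value_register(struct power_value *pv);
void power_value_unregister(struct power_value *pv);

/* client side, e.g. an "energy aware" cpufreq governor */
struct power_value *power_value_find(struct device *dev, const char *name);
8<-------------------------------------------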

Cheers!

Paweł



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC] Energy/power monitoring within the kernel
  2012-10-24  0:40   ` Thomas Renninger
@ 2012-10-24 16:51     ` Pawel Moll
  -1 siblings, 0 replies; 33+ messages in thread
From: Pawel Moll @ 2012-10-24 16:51 UTC (permalink / raw)
  To: Thomas Renninger
  Cc: Amit Daniel Kachhap, Zhang Rui, Viresh Kumar, Daniel Lezcano,
	Jean Delvare, Guenter Roeck, Steven Rostedt, Frederic Weisbecker,
	Ingo Molnar, Jesper Juhl, Jean Pihet, linux-kernel,
	linux-arm-kernel, lm-sensors, linaro-dev

On Wed, 2012-10-24 at 01:40 +0100, Thomas Renninger wrote:
> > More and more of people are getting interested in the subject of power
> > (energy) consumption monitoring. We have some external tools like
> > "battery simulators", energy probes etc., but some targets can measure
> > their power usage on their own.
> > 
> > Traditionally such data should be exposed to the user via hwmon sysfs
> > interface, and that's exactly what I did for "my" platform - I have
> > a /sys/class/hwmon/hwmon*/device/energy*_input and this was good
> > enough to draw pretty graphs in userspace. Everyone was happy...
> > 
> > Now I am getting new requests to do more with this data. In particular
> > I'm asked how to add such information to ftrace/perf output.
> Why? What is the gain?
> 
> Perf events can be triggered at any point in the kernel.
> A cpufreq event is triggered when the frequency gets changed.
> CPU idle events are triggered when the kernel requests to enter an idle state
> or exits one.
> 
> When would you trigger a thermal or a power event?
> There is the possibility of (critical) thermal limits.
> But if I understand this correctly you want this for debugging and
> I guess you have everything interesting one can do with temperature
> values:
>   - read the temperature
>   - draw some nice graphs from the results
> 
> Hm, I guess I know what you want to do:
> In your temperature/energy graph, you want to have some dots
> when relevant HW states (frequency, sleep states,  DDR power,...)
> changed. Then you are able to see the effects over a timeline.
> 
> So you have to bring the existing frequency/idle perf events together
> with temperature readings
> 
> Cleanest solution could be to enhance the exisiting userspace apps
> (pytimechart/perf timechart) and let them add another line
> (temperature/energy), but the data would not come from perf, but
> from sysfs/hwmon.
> Not sure whether this works out with the timechart tools.
> Anyway, this sounds like a userspace only problem.

Ok, so this is actually what I'm working on right now. Not with the
standard perf tool (there are other users of that API ;-) but indeed I'm
trying to "enrich" the data stream coming from the kernel with
user-space-originated values. I am a little bit concerned about the
effect of the extra syscalls (accessing the value and gettimeofday to
generate a timestamp) at higher sampling rates, but most likely it won't
be a problem. I can report once I know more, if this is of interest to
anyone.
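
In case it is not clear what I mean, one sample boils down to something
as simple as this (the sysfs path is just an example):

8<-------------------------------------------
/* one sample = one sysfs read + one timestamp */
#include <stdio.h>
#include <sys/time.h>

int main(void)
{
	FILE *f = fopen("/sys/class/hwmon/hwmon0/device/energy1_input", "r");
	unsigned long long uj;
	struct timeval tv;

	if (!f)
		return 1;
	if (fscanf(f, "%llu", &uj) == 1 && gettimeofday(&tv, NULL) == 0)
		printf("%lld.%06ld %llu\n",
		       (long long)tv.tv_sec, (long)tv.tv_usec, uj);
	fclose(f);
	return 0;
}
8<-------------------------------------------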

Anyway, there are at least two debug/trace-related use cases that cannot
be satisfied that way (of course one could argue about their
usefulness):

1. ftrace-over-network (https://lwn.net/Articles/410200/) which is
particularly appealing for "embedded users", where there's virtually no
useful userspace available (think Android). Here a (functional) trace
event is embedded into a normal trace and available "for free" on the
host side.

2. perf groups - the general idea is that one event (be it a cycle
counter interrupt or even a timer) triggers a read of other values (e.g.
a cache counter or - in this case - an energy counter). The aim is to
have regular "snapshots" of the system state. I'm not sure if the
standard perf tool can do this, but I do :-)
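
For the record, a bare-bones example of such a group read using
perf_event_open() directly, with two ordinary hardware counters standing
in for the energy counter:

8<-------------------------------------------
/* one read() on the group leader returns all counters in the group */
#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>
#include <stdint.h>

static int perf_open(struct perf_event_attr *attr, int group_fd)
{
	return syscall(__NR_perf_event_open, attr, 0, -1, group_fd, 0);
}

int main(void)
{
	struct { uint64_t nr; uint64_t val[2]; } data;
	struct perf_event_attr attr;
	int leader, member;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HARDWARE;
	attr.config = PERF_COUNT_HW_CPU_CYCLES;
	attr.read_format = PERF_FORMAT_GROUP;
	attr.disabled = 1;
	leader = perf_open(&attr, -1);
	if (leader < 0)
		return 1;

	attr.config = PERF_COUNT_HW_CACHE_MISSES;
	attr.disabled = 0;
	member = perf_open(&attr, leader);
	if (member < 0)
		return 1;

	ioctl(leader, PERF_EVENT_IOC_ENABLE, 0);
	/* ... the workload being measured would run here ... */
	ioctl(leader, PERF_EVENT_IOC_DISABLE, 0);

	if (read(leader, &data, sizeof(data)) == sizeof(data))
		printf("cycles=%llu cache-misses=%llu\n",
		       (unsigned long long)data.val[0],
		       (unsigned long long)data.val[1]);
	return 0;
}
8<-------------------------------------------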

And last, but not least, there are the non-debug/trace clients for
energy data as discussed in other mails in this thread. Of course the
trace event won't really satisfy their needs either.

Thanks for your feedback!

Paweł



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC] Energy/power monitoring within the kernel
  2012-10-24 16:37     ` Pawel Moll
@ 2012-10-24 20:01       ` Guenter Roeck
  -1 siblings, 0 replies; 33+ messages in thread
From: Guenter Roeck @ 2012-10-24 20:01 UTC (permalink / raw)
  To: Pawel Moll
  Cc: Amit Daniel Kachhap, Zhang Rui, Viresh Kumar, Daniel Lezcano,
	Jean Delvare, Steven Rostedt, Frederic Weisbecker, Ingo Molnar,
	Jesper Juhl, Thomas Renninger, Jean Pihet, linux-kernel,
	linux-arm-kernel, lm-sensors, linaro-dev

On Wed, Oct 24, 2012 at 05:37:27PM +0100, Pawel Moll wrote:
> On Tue, 2012-10-23 at 23:02 +0100, Guenter Roeck wrote:
> > > Traditionally such data should be exposed to the user via hwmon sysfs
> > > interface, and that's exactly what I did for "my" platform - I have
> > > a /sys/class/hwmon/hwmon*/device/energy*_input and this was good
> > > enough to draw pretty graphs in userspace. Everyone was happy...
> > > 
> > Only driver supporting "energy" output so far is ibmaem, and the reported energy
> > is supposed to be cumulative, as in energy = power * time. Do you mean power,
> > possibly ?
> 
> So the vexpress would be the second one, than :-) as the energy
> "monitor" actually on the latest tiles reports 64-bit value of
> microJoules consumed (or produced) since the power-up.
> 
> Some of the older boards were able to report instant power, but this
> metrics is less useful in our case.
> 
> > > Now I am getting new requests to do more with this data. In particular
> > > I'm asked how to add such information to ftrace/perf output. The second
> > > most frequent request is about providing it to a "energy aware"
> > > cpufreq governor.
> > 
> > Anything energy related would have to be along the line of "do something after a
> > certain amount of work has been performed", which at least at the surface does
> > not make much sense to me, unless you mean something along the line of a
> > process scheduler which schedules a process not based on time slices but based
> > on energy consumed, ie if you want to define a time slice not in milli-seconds
> > but in Joule.
> 
> Actually there is some research being done in this direction, but it's
> way too early to draw any conclusions...
> 
> > If so, I would argue that a similar behavior could be achieved by varying the
> > duration of time slices with the current CPU speed, or simply by using cycle
> > count instead of time as time slice parameter. Not that I am sure if such an
> > approach would really be of interest for anyone. 
> > 
> > Or do you really mean power, not energy, such as in "reduce CPU speed if its
> > power consumption is above X Watt" ?
> 
> Uh. To be completely honest I must answer: I'm not sure how the "energy
> aware" cpufreq governor is supposed to work. I have been simply asked to
> provide the data in some standard way, if possible.
> 
> > I am not sure how this would be expected to work. hwmon is, by its very nature,
> > a passive subsystem: It doesn't do anything unless data is explicitly requested
> > from it. It does not update an attribute unless that attribute is read.
> > That does not seem to fit well with the idea of tracing - which assumes
> > that some activity is happening, ultimately, all by itself, presumably
> > periodically. The idea to have a user space application read hwmon data only
> > for it to trigger trace events does not seem to be very compelling to me.
> 
> What I had in mind was similar to what adt7470 driver does. The driver
> would automatically access the device every now and then to update it's
> internal state and generate the trace event on the way. This
> auto-refresh "feature" is particularly appealing for me, as on some of
> "my" platforms can take up to 500 microseconds to actually get the data.
> So doing this in background (and providing users with the last known
> value in the meantime) seems attractive.
> 
A bad example doesn't mean it should be used elsewhere.

The adt7470 needs up to two seconds for a temperature measurement cycle, and it
cannot perform automatic cycles all by itself. In this context, executing
temperature measurement cycles in the background makes a lot of sense,
especially since one does not want to wait for two seconds when reading
a sysfs attribute.

But that only means that the chip is most likely not a good choice when selecting
a temperature sensor, not that the code necessary to get it working should be used
as an example for other drivers. 

Guenter

> > An exception is if a monitoring device suppports interrupts, and if its driver
> > actually implements those interrupts. This is, however, not the case for most of
> > the current drivers (if any), mostly because interrupt support for hardware
> > monitoring devices is very platform dependent and thus difficult to implement.
> 
> Interestingly enough the newest version of our platform control micro
> (doing the energy monitoring as well) can generate and interrupt when a
> transaction is finished, so I was planning to periodically update the
> all sort of values. And again, generating a trace event on this
> opportunity would be trivial.
> 
> > > Of course a particular driver could register its own perf PMU on its
> > > own. It's certainly an option, just very suboptimal in my opinion.
> > > Or maybe not? Maybe the task is so specialized that it makes sense?
> > > 
> > We had a couple of attempts to provide an in-kernel API. Unfortunately,
> > the result was, at least so far, more complexity on the driver side.
> > So the difficulty is really to define an API which is really simple, and does
> > not just complicate driver development for a (presumably) rare use case.
> 
> Yes, I appreciate this. That's why this option is actually my least
> favourite. Anyway, what I was thinking about was just a thin shin that
> *can* be used by a driver to register some particular value with the
> core (so it can be enumerated and accessed by in-kernel clients) and the
> core could (or not) create a sysfs attribute for this value on behalf of
> the driver. Seems lightweight enough, unless previous experience
> suggests otherwise?
> 
> Cheers!
> 
> Paweł
> 
> 
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread
