From: Steven Rostedt <rostedt@goodmis.org>
To: Pawel Moll <pawel.moll@arm.com>
Cc: Amit Daniel Kachhap <amit.kachhap@linaro.org>,
Zhang Rui <rui.zhang@intel.com>,
Viresh Kumar <viresh.kumar@linaro.org>,
Daniel Lezcano <daniel.lezcano@linaro.org>,
Jean Delvare <khali@linux-fr.org>,
Guenter Roeck <linux@roeck-us.net>,
Frederic Weisbecker <fweisbec@gmail.com>,
Ingo Molnar <mingo@elte.hu>, Jesper Juhl <jj@chaosbits.net>,
Thomas Renninger <trenn@suse.de>,
Jean Pihet <jean.pihet@newoldbits.com>,
linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, lm-sensors@lm-sensors.org,
linaro-dev@lists.linaro.org
Subject: Re: [RFC] Energy/power monitoring within the kernel
Date: Tue, 23 Oct 2012 13:43:07 -0400
Message-ID: <1351014187.8467.24.camel@gandalf.local.home>
In-Reply-To: <1351013449.9070.5.camel@hornet>

On Tue, 2012-10-23 at 18:30 +0100, Pawel Moll wrote:
>
> === Option 1: Trace event ===
>
> This seems to be the "cheapest" option. Simply defining a trace event
> that can be generated by a hwmon (or any other) driver makes the
> interesting data immediately available to any ftrace/perf user. Of
> course it doesn't really help with the cpufreq case, but seems to be
> a good place to start with.
>
> The question is how to define it... I've come up with two prototypes:
>
> = Generic hwmon trace event =
>
> This one allows any driver to generate a trace event whenever any
> "hwmon attribute" (measured value) gets updated. The rate at which the
> updates happen can be controlled by the already existing
> "update_interval" attribute.
>
> 8<-------------------------------------------
> TRACE_EVENT(hwmon_attr_update,
> 	TP_PROTO(struct device *dev, struct attribute *attr, long long input),
> 	TP_ARGS(dev, attr, input),
>
> 	TP_STRUCT__entry(
> 		__string(dev, dev_name(dev))
> 		__string(attr, attr->name)
> 		__field(long long, input)
> 	),
>
> 	TP_fast_assign(
> 		__assign_str(dev, dev_name(dev));
> 		__assign_str(attr, attr->name);
> 		__entry->input = input;
> 	),
>
> 	TP_printk("%s %s %lld", __get_str(dev), __get_str(attr), __entry->input)
> );
> 8<-------------------------------------------
>
> It generates an ftrace message like this:
>
> <...>212.673126: hwmon_attr_update: hwmon4 temp1_input 34361
>
> One issue with this is that some external knowledge is required to
> relate a number to a processor core. Or maybe it's not an issue at all
> because it should be left for the user(space)?

If the external knowledge can be characterized in a userspace tool with
the given data here, I see no issues with this.

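As a minimal sketch of what such a userspace tool might look like: the
code below parses the payload of a hwmon_attr_update line in the format
quoted above and looks the device up in a hand-written table. The table
contents, the struct layouts and the function names (sensor_table,
parse_hwmon_event, cpus_for_device) are all hypothetical, and the
hwmon-to-CPU assignments are made-up examples rather than data from any
real board.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical userspace table: which CPUs each hwmon device covers.
 * Device names and CPU lists are illustrative only. */
struct sensor_map {
	const char *dev;	/* hwmon device name as printed in the trace */
	const char *cpus;	/* CPU list in "a-b,c" notation */
};

static const struct sensor_map sensor_table[] = {
	{ "hwmon4", "0-1" },
	{ "hwmon5", "2-4" },
};

struct hwmon_sample {
	char dev[32];
	char attr[32];
	long long value;
};

/* Parse the payload of one hwmon_attr_update trace line.
 * Returns 0 on success, -1 if the line is not such an event. */
static int parse_hwmon_event(const char *line, struct hwmon_sample *out)
{
	static const char tag[] = "hwmon_attr_update: ";
	const char *p;

	assert(line && out);
	p = strstr(line, tag);
	if (!p)
		return -1;
	p += sizeof(tag) - 1;	/* skip past the event name */
	if (sscanf(p, "%31s %31s %lld", out->dev, out->attr, &out->value) != 3)
		return -1;
	return 0;
}

/* Look up the CPU list for a device; NULL if the device is unknown. */
static const char *cpus_for_device(const char *dev)
{
	size_t i;

	for (i = 0; i < sizeof(sensor_table) / sizeof(sensor_table[0]); i++)
		if (strcmp(sensor_table[i].dev, dev) == 0)
			return sensor_table[i].cpus;
	return NULL;
}
```

A tool would feed each trace line to parse_hwmon_event() and then call
cpus_for_device() on the parsed device name. The kernel side stays
generic, and the board-specific knowledge lives entirely in the table.
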
>
> = CPU power/energy/temperature trace event =
>
> This one is designed to emphasize the relation between the measured
> value (whether it is energy, temperature or any other physical
> phenomenon, really) and CPUs, so it is quite specific (too specific?)
>
> 8<-------------------------------------------
> TRACE_EVENT(cpus_environment,
> 	TP_PROTO(const struct cpumask *cpus, long long value, char unit),
> 	TP_ARGS(cpus, value, unit),
>
> 	TP_STRUCT__entry(
> 		__array(unsigned char, cpus, sizeof(struct cpumask))
> 		__field(long long, value)
> 		__field(char, unit)
> 	),
>
> 	TP_fast_assign(
> 		memcpy(__entry->cpus, cpus, sizeof(struct cpumask));

Copying the entire cpumask seems like overkill, especially on
4096-CPU machines.

> 		__entry->value = value;
> 		__entry->unit = unit;
> 	),
>
> 	TP_printk("cpus %s %lld[%c]",
> 		__print_cpumask((struct cpumask *)__entry->cpus),
> 		__entry->value, __entry->unit)
> );
> 8<-------------------------------------------
>
> And the equivalent ftrace message is:
>
> <...>127.063107: cpus_environment: cpus 0,1,2,3 34361[C]
>
> It's a cpumask, not just single cpu id, because the sensor may measure
> the value per set of CPUs, e.g. a temperature of the whole silicon die
> (so all the cores) or an energy consumed by a subset of cores (this
> is my particular use case - two meters monitor a cluster of two
> processors and a cluster of three processors, all working as an SMP
> system).
>
> Of course the cpus __array could be actually a special __cpumask field
> type (I've just hacked the __print_cpumask so far). And I've just
> realised that the unit field should actually be a string to allow unit
> prefixes to be specified (the above should obviously be "34361[mC]"
> not "[C]"). Also - excuse the "cpus_environment" name - this was the
> best I was able to come up with at the time and I'm eager to accept
> any alternative suggestions :-)

Perhaps making a field that can be a subset of cpus may be better. That
way we don't waste the ring buffer with lots of zeros. I'm guessing that
it will only be a group of cpus, and not a scattered list? Of course,
I've seen boxes where the cpu numbers went from core to core. That is,
cpu 0 was on core 1, cpu 1 was on core 2, and then it would repeat.
cpu 8 was on core 1, cpu 9 was on core 2, etc.

But still, this could be compressed somehow.
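
One possible compression, sketched in plain userspace C rather than as
a trace event: store only the bytes between the first and the last set
bit of the mask, plus the offset of that window. NR_CPUS = 4096 is an
assumed build-time maximum (so a full mask is 512 bytes), and
struct cpu_window, cpu_window_pack() and cpu_window_unpack() are
hypothetical names, not existing kernel interfaces. In a real event the
variable-length payload would presumably map onto a dynamic array field.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NR_CPUS		4096		/* assumed build-time maximum */
#define MASK_BYTES	(NR_CPUS / 8)	/* 512 bytes for a full mask */

/* Hypothetical compact record. bits[] is a scratch buffer sized for the
 * worst case; only the first nbytes would go into the ring buffer. */
struct cpu_window {
	unsigned short first_byte;	/* offset of the window in the full mask */
	unsigned short nbytes;		/* window length; 0 means an empty mask */
	unsigned char bits[MASK_BYTES];
};

/* Trim leading and trailing zero bytes from a full mask.
 * Returns the number of payload bytes that remain. */
static size_t cpu_window_pack(const unsigned char *mask, struct cpu_window *w)
{
	size_t lo = 0, hi = MASK_BYTES;

	assert(mask && w);
	while (lo < MASK_BYTES && mask[lo] == 0)
		lo++;
	while (hi > lo && mask[hi - 1] == 0)
		hi--;

	w->first_byte = (unsigned short)lo;
	w->nbytes = (unsigned short)(hi - lo);
	memcpy(w->bits, mask + lo, hi - lo);
	return hi - lo;
}

/* Rebuild the full mask from a packed window. */
static void cpu_window_unpack(const struct cpu_window *w, unsigned char *mask)
{
	assert(w && mask);
	memset(mask, 0, MASK_BYTES);
	memcpy(mask + w->first_byte, w->bits, w->nbytes);
}
```

Under these assumptions a contiguous group such as cpus 0-3 packs into
one payload byte instead of 512, and the interleaved numbering above
(cpu 0 and cpu 8 on the same core) still fits in two.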

I'll let others comment on the rest.

-- Steve