From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1757922Ab2JXQvy (ORCPT ); Wed, 24 Oct 2012 12:51:54 -0400
Received: from service87.mimecast.com ([91.220.42.44]:50385 "EHLO
        service87.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1753075Ab2JXQvw convert rfc822-to-8bit (ORCPT );
        Wed, 24 Oct 2012 12:51:52 -0400
Message-ID: <1351097507.23327.78.camel@hornet>
Subject: Re: [RFC] Energy/power monitoring within the kernel
From: Pawel Moll
To: Thomas Renninger
Cc: Amit Daniel Kachhap, Zhang Rui, Viresh Kumar, Daniel Lezcano,
        Jean Delvare, Guenter Roeck, Steven Rostedt, Frederic Weisbecker,
        Ingo Molnar, Jesper Juhl, Jean Pihet,
        "linux-kernel@vger.kernel.org",
        "linux-arm-kernel@lists.infradead.org",
        "lm-sensors@lm-sensors.org",
        "linaro-dev@lists.linaro.org"
Date: Wed, 24 Oct 2012 17:51:47 +0100
In-Reply-To: <4317776.evLpJapyim@hammer82.arch.suse.de>
References: <1351013449.9070.5.camel@hornet>
        <4317776.evLpJapyim@hammer82.arch.suse.de>
X-Mailer: Evolution 3.6.0-0ubuntu3
Mime-Version: 1.0
X-OriginalArrivalTime: 24 Oct 2012 16:51:48.0443 (UTC) FILETIME=[DBB882B0:01CDB207]
X-MC-Unique: 112102417515014101
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8BIT
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 2012-10-24 at 01:40 +0100, Thomas Renninger wrote:
> > More and more of people are getting interested in the subject of power
> > (energy) consumption monitoring. We have some external tools like
> > "battery simulators", energy probes etc., but some targets can measure
> > their power usage on their own.
> >
> > Traditionally such data should be exposed to the user via hwmon sysfs
> > interface, and that's exactly what I did for "my" platform - I have
> > a /sys/class/hwmon/hwmon*/device/energy*_input and this was good
> > enough to draw pretty graphs in userspace. Everyone was happy...
> >
> > Now I am getting new requests to do more with this data. In particular
> > I'm asked how to add such information to ftrace/perf output.
> Why? What is the gain?
>
> Perf events can be triggered at any point in the kernel.
> A cpufreq event is triggered when the frequency gets changed.
> CPU idle events are triggered when the kernel requests to enter an idle state
> or exits one.
>
> When would you trigger a thermal or a power event?
> There is the possibility of (critical) thermal limits.
> But if I understand this correctly you want this for debugging and
> I guess you have everything interesting one can do with temperature
> values:
>   - read the temperature
>   - draw some nice graphs from the results
>
> Hm, I guess I know what you want to do:
> In your temperature/energy graph, you want to have some dots
> when relevant HW states (frequency, sleep states, DDR power,...)
> changed. Then you are able to see the effects over a timeline.
>
> So you have to bring the existing frequency/idle perf events together
> with temperature readings
>
> Cleanest solution could be to enhance the existing userspace apps
> (pytimechart/perf timechart) and let them add another line
> (temperature/energy), but the data would not come from perf, but
> from sysfs/hwmon.
> Not sure whether this works out with the timechart tools.
> Anyway, this sounds like a userspace only problem.

Ok, so this is actually what I'm working on right now. Not with the
standard perf tool (there are other users of that API ;-) but indeed I'm
trying to "enrich" the data stream coming from the kernel with values
originating in user space. I am a little bit concerned about the effect
of the extra syscalls (accessing the value and gettimeofday to generate
a timestamp) at higher sampling rates, but most likely it won't be a
problem. I can report once I know more, if this is of interest to anyone.
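
To make the cost concrete, here is a minimal sketch of such a user-space
sampler. It is only an illustration under stated assumptions, not the
tool I am actually using: the function names are made up, the
monotonic clock is an assumption (which clock lines up with the trace
timestamps depends on how the tracer is configured), and the
power calculation relies on hwmon reporting energy*_input as a
cumulative value in microjoules. Each sample costs one clock read plus
the open/read/close of the sysfs file.

```python
import time
from pathlib import Path


def sample_energy(path, count, interval_s=0.01):
    """Read the counter at `path` `count` times.

    Returns a list of (timestamp_ns, value) pairs. `path` is expected to
    be a hwmon-style attribute holding one integer, e.g. an
    energy*_input file (hypothetical helper, not an existing API).
    """
    samples = []
    for _ in range(count):
        # Timestamp first, then the value: this is the "extra syscalls"
        # pair. time.monotonic_ns() uses CLOCK_MONOTONIC on Linux;
        # matching the trace clock is an assumption of this sketch.
        stamp_ns = time.monotonic_ns()
        value = int(Path(path).read_text().strip())
        samples.append((stamp_ns, value))
        if interval_s > 0:
            time.sleep(interval_s)
    return samples


def average_power_uw(samples):
    """Average power in microwatts between the first and last sample.

    Assumes the values are cumulative energy in microjoules and the
    timestamps are in nanoseconds.
    """
    (t0, e0), (t1, e1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("need two samples with increasing timestamps")
    # uJ / s == uW, and 1 s == 1e9 ns.
    return (e1 - e0) * 1_000_000_000 / (t1 - t0)
```

The path would be whatever /sys/class/hwmon/hwmon*/device/energy*_input
resolves to on the target. The resulting pairs can then be merged with
the kernel trace by timestamp on the host side.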

Anyway, there are at least two debug/trace related use cases that
cannot be satisfied that way (of course one could argue about their
usefulness):

1. ftrace-over-network (https://lwn.net/Articles/410200/) which is
particularly appealing for "embedded users", where there's virtually no
useful userspace available (think Android). Here a (functional) trace
event is embedded into a normal trace and available "for free" at the
host side.

2. perf groups - the general idea is that one event (let it be a cycle
counter interrupt or even a timer) triggers a read of other values (e.g.
a cache counter or - in this case - an energy counter). The aim is to
have regular "snapshots" of the system state. I'm not sure if the
standard perf tool can do this, but I do :-)

And last, but not least, there are the non-debug/trace clients for
energy data as discussed in other mails in this thread. Of course the
trace event won't really satisfy their needs either.

Thanks for your feedback!

Paweł