* [RFC PATCH 0/3] perf: show package power consumption in perf
@ 2010-08-18  7:59 Zhang Rui
  2010-08-18 12:25 ` Peter Zijlstra
  0 siblings, 1 reply; 20+ messages in thread
From: Zhang Rui @ 2010-08-18  7:59 UTC (permalink / raw)
  To: peterz
  Cc: LKML, mingo, robert.richter, acme, paulus, dzickus, gorcunov,
      fweisbec, Lin Ming, Brown, Len, Matthew Garrett, Zhang, Rui

Hi, all,

RAPL (running average power limit) is a new feature on some new
processors which provides mechanisms to enforce a power consumption
limit.

Generally speaking, by using RAPL, the OS can set a power budget for a
certain time window and let the hardware throttle the processor
P/T-state to meet this energy limit.

RAPL also provides a new MSR, i.e. MSR_PKG_ENERGY_STATUS, which reports
the total amount of energy consumed by the package.

I'm not sure whether we should support RAPL or not, but anyway, it
sounds like a good idea to export the energy status in perf.

So this patch set introduces a new perf pmu and event to show the
energy consumed by the package.

Here is what I get after applying the three patches:

#./perf stat -e energy test
 Performance counter stats for 'test':

            202  Joules cost by package
    7.926001238  seconds time elapsed

Note that this patch set is based on Peter's perf-pmu branch,
git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-perf.git
which provides better interfaces to register/unregister a new pmu.

Any comments are welcome. :)

thanks,
rui

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC PATCH 0/3] perf: show package power consumption in perf 2010-08-18 7:59 [RFC PATCH 0/3] perf: show package power consumption in perf Zhang Rui @ 2010-08-18 12:25 ` Peter Zijlstra 2010-08-18 12:41 ` Matt Fleming 2010-08-19 2:43 ` Lin Ming 0 siblings, 2 replies; 20+ messages in thread From: Peter Zijlstra @ 2010-08-18 12:25 UTC (permalink / raw) To: Zhang Rui Cc: LKML, mingo, robert.richter, acme, paulus, dzickus, gorcunov, fweisbec, Lin Ming, Brown, Len, Matthew Garrett, Matt Fleming On Wed, 2010-08-18 at 15:59 +0800, Zhang Rui wrote: > Hi, all, > > RAPL(running average power limit) is a new feature which provides > mechanisms to enforce power consumption limit, on some new processors. > > Generally speaking, by using RAPL, OS can set a power budget in a > certain time window, and let Hardware to throttle the processor > P/T-state to meet this energy limitation. > > RAPL also provides a new MSR, i.e. MSR_PKG_ENERGY_STATUS, which reports > the total amount of energy consumed by the package. > > I'm not sure if to support RAPL or not, but anyway, it sounds like a > good idea to export the energy status in perf. > > So a new perf pmu and event to show the package energy consumed is > introduced in this patch. > > Here is what I get after applying the three patches, > > #./perf stat -e energy test > Performance counter stats for 'test': > > 202 Joules cost by package > 7.926001238 seconds time elapsed > > > Note that this patch set is made based on Peter's perf-pmu branch, > git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-perf.git > which provides better interfaces to register/unregister a new pmu. > > any comment are welcome. :) Nice,.. however: - if it is a pure read-only counter without sampling support, expose it as such, don't fudge in the hrtimer stuff. Simply fail to create a sampling event. 
   SH has the same problem for its 'normal' PMU; the solution is to
   use event groups. Matt was looking at adding support to perf-record
   for that: if creating a sampling event fails, fall back to
   {hrtimer, $event} groups.

 - since it's a free-running, non-configurable counter, you can indeed
   act like it's a 'software' event in that you can schedule consumers
   without constraints; however, I don't think the PERF_COUNT_SW_*
   space is the right way to expose this counter.

   Better would be to use the sysfs stuff Lin has been working on (for
   which I still need to catch up on the latest discussions); it would
   then be tied to the pmu instance and appear/disappear when you
   load/unload the module.

   However, for testing purposes I see why you'd want to have _an_
   interface :-)

 - it would be nice if you'd write the cpu detection a bit more
   readably; also, it looks like you forgot to check
   x86_vendor == X86_VENDOR_INTEL.

> +static int __init intel_rapl_init(void)
> +{
> +	/*
> +	 * RAPL features are only supported on processors have a CPUID
> +	 * signature with DisplayFamily_DisplayModel of 06_2AH, 06_2DH
> +	 */
> +	if (boot_cpu_data.x86 != 0x06 ||
> +		(boot_cpu_data.x86_model != 0x2A &&
> +		boot_cpu_data.x86_model != 0x2D))
> +		return -ENODEV;
> +
> +	if (rapl_check_unit())
> +		return -ENODEV;
> +
> +	perf_pmu_register(&rapl_pmu);
> +	return 0;
> +}

Maybe something like (see intel_pmu_init() for example):

	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
		return -ENODEV;

	if (boot_cpu_data.x86 != 0x06)
		return -ENODEV;

	switch (boot_cpu_data.x86_model) {
	case 0x2A: /* sandybridge ?! 32nm */
	case 0x2D: /* othermodel 32nm */
		break;

	default:
		return -ENODEV;
	}

Which again reminds me to ask Intel for a comprehensive x86_model
list, please?

Alternatively, you can create an X86_FEATURE_RAPL and simply use
boot_cpu_has(X86_FEATURE_RAPL) (much like intel_ds_init() has).

^ permalink raw reply [flat|nested] 20+ messages in thread
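The vendor/family/model check suggested above can be tried outside the kernel as a pure function. This is only a sketch: `struct cpu_id`, `enum cpu_vendor` and `rapl_cpu_supported()` are hypothetical stand-ins for the handful of boot_cpu_data fields involved, not kernel API, and the model list is just the two values named in the patch comment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the boot_cpu_data fields used by the check. */
enum cpu_vendor { VENDOR_INTEL, VENDOR_AMD, VENDOR_OTHER };

struct cpu_id {
	enum cpu_vendor vendor;
	uint8_t family;
	uint8_t model;
};

/*
 * Mirrors the suggested init-time test: vendor first, then family,
 * then an explicit switch over the known models.
 */
bool rapl_cpu_supported(const struct cpu_id *c)
{
	if (c->vendor != VENDOR_INTEL)
		return false;

	if (c->family != 0x06)
		return false;

	switch (c->model) {
	case 0x2A:
	case 0x2D:
		return true;
	default:
		return false;
	}
}
```

Keeping the models in a switch means a new model is a one-line addition, which is presumably why the review prefers it over the chained comparison.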
* Re: [RFC PATCH 0/3] perf: show package power consumption in perf 2010-08-18 12:25 ` Peter Zijlstra @ 2010-08-18 12:41 ` Matt Fleming 2010-08-19 3:28 ` Lin Ming 2010-08-19 2:43 ` Lin Ming 1 sibling, 1 reply; 20+ messages in thread From: Matt Fleming @ 2010-08-18 12:41 UTC (permalink / raw) To: Peter Zijlstra Cc: Zhang Rui, LKML, mingo, robert.richter, acme, paulus, dzickus, gorcunov, fweisbec, Lin Ming, Brown, Len, Matthew Garrett On Wed, Aug 18, 2010 at 02:25:29PM +0200, Peter Zijlstra wrote: > On Wed, 2010-08-18 at 15:59 +0800, Zhang Rui wrote: > > Hi, all, > > > > RAPL(running average power limit) is a new feature which provides > > mechanisms to enforce power consumption limit, on some new processors. > > > > Generally speaking, by using RAPL, OS can set a power budget in a > > certain time window, and let Hardware to throttle the processor > > P/T-state to meet this energy limitation. > > > > RAPL also provides a new MSR, i.e. MSR_PKG_ENERGY_STATUS, which reports > > the total amount of energy consumed by the package. > > > > I'm not sure if to support RAPL or not, but anyway, it sounds like a > > good idea to export the energy status in perf. > > > > So a new perf pmu and event to show the package energy consumed is > > introduced in this patch. > > > > Here is what I get after applying the three patches, > > > > #./perf stat -e energy test > > Performance counter stats for 'test': > > > > 202 Joules cost by package > > 7.926001238 seconds time elapsed > > > > > > Note that this patch set is made based on Peter's perf-pmu branch, > > git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-perf.git > > which provides better interfaces to register/unregister a new pmu. > > > > any comment are welcome. :) > > > Nice,.. however: > > - if it is a pure read-only counter without sampling support, > expose it as such, don't fudge in the hrtimer stuff. Simply > fail to create a sampling event. 
>
>    SH has the same problem for its 'normal' PMU, the solution is
>    to use event groups, Matt was looking at adding support to
>    perf-record for that, if creating a sampling event fails, fall
>    back to {hrtimer, $event} groups.

I had a quick look over the patches and Peter is right - the group
events stuff would probably fit quite well here. Unfortunately, due to
holidays and things, I haven't been able to get them finished
yet. I'll get on that ASAP.

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC PATCH 0/3] perf: show package power consumption in perf 2010-08-18 12:41 ` Matt Fleming @ 2010-08-19 3:28 ` Lin Ming 2010-08-19 7:54 ` Matt Fleming 2010-08-19 9:02 ` Peter Zijlstra 0 siblings, 2 replies; 20+ messages in thread From: Lin Ming @ 2010-08-19 3:28 UTC (permalink / raw) To: Matt Fleming Cc: Peter Zijlstra, Zhang, Rui, LKML, mingo, robert.richter, acme, paulus, dzickus, gorcunov, fweisbec, Brown, Len, Matthew Garrett On Wed, 2010-08-18 at 20:41 +0800, Matt Fleming wrote: > On Wed, Aug 18, 2010 at 02:25:29PM +0200, Peter Zijlstra wrote: > > On Wed, 2010-08-18 at 15:59 +0800, Zhang Rui wrote: > > > Hi, all, > > > > > > RAPL(running average power limit) is a new feature which provides > > > mechanisms to enforce power consumption limit, on some new processors. > > > > > > Generally speaking, by using RAPL, OS can set a power budget in a > > > certain time window, and let Hardware to throttle the processor > > > P/T-state to meet this energy limitation. > > > > > > RAPL also provides a new MSR, i.e. MSR_PKG_ENERGY_STATUS, which reports > > > the total amount of energy consumed by the package. > > > > > > I'm not sure if to support RAPL or not, but anyway, it sounds like a > > > good idea to export the energy status in perf. > > > > > > So a new perf pmu and event to show the package energy consumed is > > > introduced in this patch. > > > > > > Here is what I get after applying the three patches, > > > > > > #./perf stat -e energy test > > > Performance counter stats for 'test': > > > > > > 202 Joules cost by package > > > 7.926001238 seconds time elapsed > > > > > > > > > Note that this patch set is made based on Peter's perf-pmu branch, > > > git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-perf.git > > > which provides better interfaces to register/unregister a new pmu. > > > > > > any comment are welcome. :) > > > > > > Nice,.. 
however: > > > > - if it is a pure read-only counter without sampling support, > > expose it as such, don't fudge in the hrtimer stuff. Simply > > fail to create a sampling event. > > > > SH has the same problem for its 'normal' PMU, the solution is > > to use event groups, Matt was looking at adding support to > > perf-record for that, if creating a sampling event fails, fall > > back to {hrtimer, $event} groups. > > I had a quick look over the patches and Peter is right - the group > events stuff would probably fit quite well here. Unfortunately, due to > holidays and things, I haven't been able to get them finished > yet. I'll get on that ASAP. Hi, Matt What's the "group events stuff"? Is there some discussion on LKML or elsewhere I can have a look at? Thanks, Lin Ming ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC PATCH 0/3] perf: show package power consumption in perf 2010-08-19 3:28 ` Lin Ming @ 2010-08-19 7:54 ` Matt Fleming 2010-08-19 8:15 ` Lin Ming 2010-08-19 8:31 ` Zhang Rui 2010-08-19 9:02 ` Peter Zijlstra 1 sibling, 2 replies; 20+ messages in thread From: Matt Fleming @ 2010-08-19 7:54 UTC (permalink / raw) To: Lin Ming Cc: Peter Zijlstra, Zhang, Rui, LKML, mingo, robert.richter, acme, paulus, dzickus, gorcunov, fweisbec, Brown, Len, Matthew Garrett On Thu, Aug 19, 2010 at 11:28:17AM +0800, Lin Ming wrote: > On Wed, 2010-08-18 at 20:41 +0800, Matt Fleming wrote: > > > > I had a quick look over the patches and Peter is right - the group > > events stuff would probably fit quite well here. Unfortunately, due to > > holidays and things, I haven't been able to get them finished > > yet. I'll get on that ASAP. > > Hi, Matt > > What's the "group events stuff"? > Is there some discussion on LKML or elsewhere I can have a look at? > > Thanks, > Lin Ming The relevant information can be found here in this thread, http://lkml.org/lkml/2010/8/4/174. I'm working on some patches for this but they're not finished yet. I can probably get something to show by next week. The discussion started because the performance counters on SH do not generate an interrupt on overflow, so we need to periodically sample them. Am I correct in thinking that the energy counters also do not generate an interrupt on overflow and that's why you wrote the event as a software event? ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC PATCH 0/3] perf: show package power consumption in perf 2010-08-19 7:54 ` Matt Fleming @ 2010-08-19 8:15 ` Lin Ming 2010-08-19 8:31 ` Zhang Rui 1 sibling, 0 replies; 20+ messages in thread From: Lin Ming @ 2010-08-19 8:15 UTC (permalink / raw) To: Matt Fleming Cc: Peter Zijlstra, Zhang, Rui, LKML, mingo, robert.richter, acme, paulus, dzickus, gorcunov, fweisbec, Brown, Len, Matthew Garrett On Thu, 2010-08-19 at 15:54 +0800, Matt Fleming wrote: > On Thu, Aug 19, 2010 at 11:28:17AM +0800, Lin Ming wrote: > > On Wed, 2010-08-18 at 20:41 +0800, Matt Fleming wrote: > > > > > > I had a quick look over the patches and Peter is right - the group > > > events stuff would probably fit quite well here. Unfortunately, due to > > > holidays and things, I haven't been able to get them finished > > > yet. I'll get on that ASAP. > > > > Hi, Matt > > > > What's the "group events stuff"? > > Is there some discussion on LKML or elsewhere I can have a look at? > > > > Thanks, > > Lin Ming > > The relevant information can be found here in this thread, > http://lkml.org/lkml/2010/8/4/174. I'm working on some patches for > this but they're not finished yet. I can probably get something to > show by next week. Thanks. > > The discussion started because the performance counters on SH do not > generate an interrupt on overflow, so we need to periodically sample > them. Am I correct in thinking that the energy counters also do not > generate an interrupt on overflow and that's why you wrote the event > as a software event? I think so. Rui, could you confirm this? ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC PATCH 0/3] perf: show package power consumption in perf
  2010-08-19  7:54 ` Matt Fleming
  2010-08-19  8:15 ` Lin Ming
@ 2010-08-19  8:31 ` Zhang Rui
  2010-08-19  8:32 ` Matt Fleming
  1 sibling, 1 reply; 20+ messages in thread
From: Zhang Rui @ 2010-08-19  8:31 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Lin, Ming M, Peter Zijlstra, LKML, mingo, robert.richter, acme,
      paulus, dzickus, gorcunov, fweisbec, Brown, Len, Matthew Garrett

On Thu, 2010-08-19 at 15:54 +0800, Matt Fleming wrote:
> On Thu, Aug 19, 2010 at 11:28:17AM +0800, Lin Ming wrote:
> > On Wed, 2010-08-18 at 20:41 +0800, Matt Fleming wrote:
> > >
> > > I had a quick look over the patches and Peter is right - the group
> > > events stuff would probably fit quite well here. Unfortunately, due to
> > > holidays and things, I haven't been able to get them finished
> > > yet. I'll get on that ASAP.
> >
> > Hi, Matt
> >
> > What's the "group events stuff"?
> > Is there some discussion on LKML or elsewhere I can have a look at?
> >
> > Thanks,
> > Lin Ming
>
> The relevant information can be found here in this thread,
> http://lkml.org/lkml/2010/8/4/174. I'm working on some patches for
> this but they're not finished yet. I can probably get something to
> show by next week.
>
> The discussion started because the performance counters on SH do not
> generate an interrupt on overflow, so we need to periodically sample
> them. Am I correct in thinking that the energy counters also do not
> generate an interrupt on overflow and that's why you wrote the event
> as a software event?

Right.

BTW, I'm not very familiar with the perf tool, and now I'm wondering
whether the periodic sampling is needed at all, because IMO .start is
invoked every time the process is scheduled in, and .stop is invoked
when it's scheduled out. It seems that we just need to read the energy
consumed in .start and .stop, and update the counter in .stop, right?

thanks,
rui

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC PATCH 0/3] perf: show package power consumption in perf 2010-08-19 8:31 ` Zhang Rui @ 2010-08-19 8:32 ` Matt Fleming 2010-08-19 9:44 ` Peter Zijlstra 0 siblings, 1 reply; 20+ messages in thread From: Matt Fleming @ 2010-08-19 8:32 UTC (permalink / raw) To: Zhang Rui Cc: Lin, Ming M, Peter Zijlstra, LKML, mingo, robert.richter, acme, paulus, dzickus, gorcunov, fweisbec, Brown, Len, Matthew Garrett On Thu, Aug 19, 2010 at 04:31:54PM +0800, Zhang Rui wrote: > On Thu, 2010-08-19 at 15:54 +0800, Matt Fleming wrote: > > On Thu, Aug 19, 2010 at 11:28:17AM +0800, Lin Ming wrote: > > > On Wed, 2010-08-18 at 20:41 +0800, Matt Fleming wrote: > > > > > > > > I had a quick look over the patches and Peter is right - the group > > > > events stuff would probably fit quite well here. Unfortunately, due to > > > > holidays and things, I haven't been able to get them finished > > > > yet. I'll get on that ASAP. > > > > > > Hi, Matt > > > > > > What's the "group events stuff"? > > > Is there some discussion on LKML or elsewhere I can have a look at? > > > > > > Thanks, > > > Lin Ming > > > > The relevant information can be found here in this thread, > > http://lkml.org/lkml/2010/8/4/174. I'm working on some patches for > > this but they're not finished yet. I can probably get something to > > show by next week. > > > > The discussion started because the performance counters on SH do not > > generate an interrupt on overflow, so we need to periodically sample > > them. Am I correct in thinking that the energy counters also do not > > generate an interrupt on overflow and that's why you wrote the event > > as a software event? > > right. > > BTW, I'm not quite familiar with perf tool, and now I'm wondering if the > periodically sample is needed. > because IMO, .start is invoked every time the process is scheduled in, > and .stop is invoked when it's scheduled out. 
It seems that we just need
> to read the energy consumed in .start and .stop, and update the counter
> in .stop, right?

How big is the hardware counter? The problem comes when the process is
scheduled in and runs for a long time, e.g. so long that the energy
hardware counter wraps. This is why it's necessary to periodically
sample the counter.

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC PATCH 0/3] perf: show package power consumption in perf 2010-08-19 8:32 ` Matt Fleming @ 2010-08-19 9:44 ` Peter Zijlstra 2010-08-21 1:18 ` Frederic Weisbecker 0 siblings, 1 reply; 20+ messages in thread From: Peter Zijlstra @ 2010-08-19 9:44 UTC (permalink / raw) To: Matt Fleming Cc: Zhang Rui, Lin, Ming M, LKML, mingo, robert.richter, acme, paulus, dzickus, gorcunov, fweisbec, Brown, Len, Matthew Garrett On Thu, 2010-08-19 at 09:32 +0100, Matt Fleming wrote: > > > How big is the hardware counter? The problem comes when the process is > scheduled in and runs for a long time, e.g. so long that the energy > hardware counter wraps. This is why it's necessary to periodically > sample the counter. > Long running processes aren't the only case, you could associate an event with a CPU. Right, short counters (like SH when not chained) need something to accumulate deltas into the larger u64. You can indeed use timers for that, hr or otherwise, but you don't need the swcounter hrtimer infrastructure for that. ^ permalink raw reply [flat|nested] 20+ messages in thread
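Accumulating deltas from a short counter into a larger u64, as described above, can be sketched as follows. The thread never states the width of the energy counter, so it is a parameter here. `struct short_counter` and `short_counter_update()` are hypothetical names for illustration only.

```c
#include <stdint.h>

/* Hypothetical state for one short, free-running hardware counter. */
struct short_counter {
	unsigned int width;	/* valid bits in the raw counter, 1..63 */
	uint64_t prev_raw;	/* raw value seen at the previous update */
	uint64_t total;		/* accumulated 64-bit count */
};

/*
 * Fold a fresh raw reading into the 64-bit total. The masked
 * subtraction yields the right delta even if the raw counter wrapped,
 * provided it wrapped at most once since the previous call. That
 * proviso is why something (a timer, say) must call this often enough.
 */
uint64_t short_counter_update(struct short_counter *c, uint64_t raw)
{
	uint64_t mask = (UINT64_C(1) << c->width) - 1;
	uint64_t delta = (raw - c->prev_raw) & mask;

	c->prev_raw = raw & mask;
	c->total += delta;
	return c->total;
}
```

The same update would be called from the .start/.stop paths and from whatever periodic timer guards against a wrap during long uninterrupted runs or CPU-wide events.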
* Re: [RFC PATCH 0/3] perf: show package power consumption in perf
  2010-08-19  9:44 ` Peter Zijlstra
@ 2010-08-21  1:18 ` Frederic Weisbecker
  2010-08-21  9:30 ` Ingo Molnar
  2010-08-23  9:31 ` Peter Zijlstra
  0 siblings, 2 replies; 20+ messages in thread
From: Frederic Weisbecker @ 2010-08-21  1:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Matt Fleming, Zhang Rui, Lin, Ming M, LKML, mingo, robert.richter,
      acme, paulus, dzickus, gorcunov, Brown, Len, Matthew Garrett

On Thu, Aug 19, 2010 at 11:44:45AM +0200, Peter Zijlstra wrote:
> On Thu, 2010-08-19 at 09:32 +0100, Matt Fleming wrote:
> >
> >
> > How big is the hardware counter? The problem comes when the process is
> > scheduled in and runs for a long time, e.g. so long that the energy
> > hardware counter wraps. This is why it's necessary to periodically
> > sample the counter.
> >
> Long running processes aren't the only case, you could associate an
> event with a CPU.

I don't understand what you mean.

> Right, short counters (like SH when not chained) need something to
> accumulate deltas into the larger u64. You can indeed use timers for
> that, hr or otherwise, but you don't need the swcounter hrtimer
> infrastructure for that.

So what is the point in simulating a PMI using an hrtimer? It won't be
based on periods of the interesting counter but on time periods. This
is not how we want the samples. If we want timer-based samples, we can
just launch a separate software timer-based event.

In the case of SH where we need to flush to avoid wraps, I understand,
but otherwise?

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC PATCH 0/3] perf: show package power consumption in perf
  2010-08-21  1:18 ` Frederic Weisbecker
@ 2010-08-21  9:30 ` Ingo Molnar
  2010-08-23  9:31 ` Peter Zijlstra
  1 sibling, 0 replies; 20+ messages in thread
From: Ingo Molnar @ 2010-08-21  9:30 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Peter Zijlstra, Matt Fleming, Zhang Rui, Lin, Ming M, LKML,
      robert.richter, acme, paulus, dzickus, gorcunov, Brown, Len,
      Matthew Garrett

* Frederic Weisbecker <fweisbec@gmail.com> wrote:

> > Right, short counters (like SH when not chained) need something to
> > accumulate deltas into the larger u64. You can indeed use timers for
> > that, hr or otherwise, but you don't need the swcounter hrtimer
> > infrastructure for that.
>
> So what is the point in simulating a PMI using an hrtimer? It won't be
> based on periods on the interesting counter but on time periods. This
> is not how we want the samples. If we want timer based samples, we can
> just launch a separate software timer based event.

If we then measure the delta of the count during that constant-time
period, we'll get a 'weight' to consider.

So for example if we sample with a period of every 1000 cache-misses,
regular same-counter-PMU-IRQ sampling goes like this:

   1000
   1000
   1000
   1000
   1000
   ....

While if we use an hrtimer, we get variations:

   1050
    711
   1539
   2210
    400

But using that variable period as a weight will, statistically,
compensate for the variation.

It's similar to how the auto-freq code works - that too has variable
periods (due to the self-adjustment) - which we compensate for with
the weight.

Thanks,

	Ingo

^ permalink raw reply [flat|nested] 20+ messages in thread
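The weighting described above can be illustrated with a small accumulator: each timer-driven sample carries the counter delta since the previous sample, and that delta, rather than a count of 1, is credited to wherever the sample landed. This is a sketch only. `struct weighted_sample`, its `bucket` field (standing in for a symbol or function index) and `accumulate_weights()` are hypothetical and not part of perf.

```c
#include <stddef.h>
#include <stdint.h>

/* One timer-driven sample: where it hit, and how much the counter moved. */
struct weighted_sample {
	size_t bucket;		/* hypothetical index of the symbol hit */
	uint64_t delta;		/* counter events since the previous sample */
};

/*
 * Credit each sample's delta to its bucket and return the grand total.
 * Samples whose bucket index is out of range are skipped.
 */
uint64_t accumulate_weights(const struct weighted_sample *s, size_t n,
			    uint64_t *per_bucket, size_t nbuckets)
{
	uint64_t total = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (s[i].bucket >= nbuckets)
			continue;
		per_bucket[s[i].bucket] += s[i].delta;
		total += s[i].delta;
	}
	return total;
}
```

Dividing a bucket's sum by the total then gives its share of the counted events, which is the quantity period-based sampling would approximate by counting samples.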
* Re: [RFC PATCH 0/3] perf: show package power consumption in perf 2010-08-21 1:18 ` Frederic Weisbecker 2010-08-21 9:30 ` Ingo Molnar @ 2010-08-23 9:31 ` Peter Zijlstra 1 sibling, 0 replies; 20+ messages in thread From: Peter Zijlstra @ 2010-08-23 9:31 UTC (permalink / raw) To: Frederic Weisbecker Cc: Matt Fleming, Zhang Rui, Lin, Ming M, LKML, mingo, robert.richter, acme, paulus, dzickus, gorcunov, Brown, Len, Matthew Garrett On Sat, 2010-08-21 at 03:18 +0200, Frederic Weisbecker wrote: > On Thu, Aug 19, 2010 at 11:44:45AM +0200, Peter Zijlstra wrote: > > On Thu, 2010-08-19 at 09:32 +0100, Matt Fleming wrote: > > > > > > > > > How big is the hardware counter? The problem comes when the process is > > > scheduled in and runs for a long time, e.g. so long that the energy > > > hardware counter wraps. This is why it's necessary to periodically > > > sample the counter. > > > > > Long running processes aren't the only case, you could associate an > > event with a CPU. > I don't understand what you mean. perf_event_open(.pid = -1, .cpu = n); > > Right, short counters (like SH when not chained) need something to > > accumulate deltas into the larger u64. You can indeed use timers for > > that, hr or otherwise, but you don't need the swcounter hrtimer > > infrastructure for that. > > > So what is the point in simulating a PMI using an hrtimer? It won't be > based on periods on the interesting counter but on time periods. This > is not how we want the samples. If we want timer based samples, we can > just launch a seperate software timer based event. *sigh* that's exactly what we're doing, we're creating a separate software hrtimer to create samples, the only thing that's different is that we put this hrtimer and the hw-counter in a group and let the hrtimer sample include the hw-counter's value. If you then weight the samples by the hw-counter delta, you get something that's more or less related to the thing the hw-counter is counting. 
For counters that do not provide overflow interrupts this is the only
possible way to get anything.

> In the case of SH where we need to flush to avoid wraps, I understand, but
> otherwise?

The wrap issue is totally unrelated.

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC PATCH 0/3] perf: show package power consumption in perf 2010-08-19 3:28 ` Lin Ming 2010-08-19 7:54 ` Matt Fleming @ 2010-08-19 9:02 ` Peter Zijlstra 2010-08-20 1:44 ` Zhang Rui 1 sibling, 1 reply; 20+ messages in thread From: Peter Zijlstra @ 2010-08-19 9:02 UTC (permalink / raw) To: Lin Ming Cc: Matt Fleming, Zhang, Rui, LKML, mingo, robert.richter, acme, paulus, dzickus, gorcunov, fweisbec, Brown, Len, Matthew Garrett On Thu, 2010-08-19 at 11:28 +0800, Lin Ming wrote: > On Wed, 2010-08-18 at 20:41 +0800, Matt Fleming wrote: > > On Wed, Aug 18, 2010 at 02:25:29PM +0200, Peter Zijlstra wrote: > > > On Wed, 2010-08-18 at 15:59 +0800, Zhang Rui wrote: > > > > Hi, all, > > > > > > > > RAPL(running average power limit) is a new feature which provides > > > > mechanisms to enforce power consumption limit, on some new processors. > > > > > > > > Generally speaking, by using RAPL, OS can set a power budget in a > > > > certain time window, and let Hardware to throttle the processor > > > > P/T-state to meet this energy limitation. > > > > > > > > RAPL also provides a new MSR, i.e. MSR_PKG_ENERGY_STATUS, which reports > > > > the total amount of energy consumed by the package. > > > > > > > > I'm not sure if to support RAPL or not, but anyway, it sounds like a > > > > good idea to export the energy status in perf. > > > > > > > > So a new perf pmu and event to show the package energy consumed is > > > > introduced in this patch. > > > > > > > > Here is what I get after applying the three patches, > > > > > > > > #./perf stat -e energy test > > > > Performance counter stats for 'test': > > > > > > > > 202 Joules cost by package > > > > 7.926001238 seconds time elapsed > > > > > > > > > > > > Note that this patch set is made based on Peter's perf-pmu branch, > > > > git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-perf.git > > > > which provides better interfaces to register/unregister a new pmu. > > > > > > > > any comment are welcome. 
:)
> > >
> > >
> > > Nice,.. however:
> > >
> > >  - if it is a pure read-only counter without sampling support,
> > >    expose it as such, don't fudge in the hrtimer stuff. Simply
> > >    fail to create a sampling event.
> > >
> > >    SH has the same problem for its 'normal' PMU, the solution is
> > >    to use event groups, Matt was looking at adding support to
> > >    perf-record for that, if creating a sampling event fails, fall
> > >    back to {hrtimer, $event} groups.
> >
> > I had a quick look over the patches and Peter is right - the group
> > events stuff would probably fit quite well here. Unfortunately, due to
> > holidays and things, I haven't been able to get them finished
> > yet. I'll get on that ASAP.
>
> Hi, Matt
>
> What's the "group events stuff"?
> Is there some discussion on LKML or elsewhere I can have a look at?

It's some obscure perf feature:

  leader = sys_perf_event_open(&hrtimer_attr, pid, cpu, -1, 0);
  sibling = sys_perf_event_open(&rapl_attr, pid, cpu, leader, 0);

will create an event group (which means that both events must be
co-scheduled). If you then provide:

  hrtimer_attr.read_format |= PERF_FORMAT_GROUP;
  hrtimer_attr.sample_type |= PERF_SAMPLE_READ;

the samples from the hrtimer will contain a field like:

 *      { u64           nr;
 *        { u64         time_enabled; } && PERF_FORMAT_ENABLED
 *        { u64         time_running; } && PERF_FORMAT_RUNNING
 *        { u64         value;
 *          { u64       id;           } && PERF_FORMAT_ID
 *        }             cntr[nr];
 *      } && PERF_FORMAT_GROUP

Which contains both the hrtimer count (ns) and the RAPL count (watts).

Using that you can compute the RAPL delta between consecutive samples
and use that to weight the sample.

For perf-stat none of this is needed, since it doesn't use sampling
counters anyway ;-).

^ permalink raw reply [flat|nested] 20+ messages in thread
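Picking one counter's value out of the group record laid out above can be sketched without touching the syscall at all, by treating the record as an array of u64 words. The `FMT_*` macros and `group_read_value()` are hypothetical local names. The bit positions are assumed to follow the order of the layout comment. The sketch also assumes the leader is reported first, so that index 0 would be the hrtimer and index 1 the RAPL counter.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the optional read_format bits; illustrative only. */
#define FMT_TIME_ENABLED (1u << 0)
#define FMT_TIME_RUNNING (1u << 1)
#define FMT_ID           (1u << 2)

/*
 * Extract the value of counter idx from a group record made of nwords
 * u64 words. Returns 0 on success, or -1 if idx is out of range or the
 * record is too short to hold the requested entry.
 */
int group_read_value(const uint64_t *rec, size_t nwords, unsigned int fmt,
		     size_t idx, uint64_t *out)
{
	size_t stride = (fmt & FMT_ID) ? 2 : 1;	/* value, then optional id */
	size_t pos = 0;
	uint64_t nr;

	if (nwords < 1 || idx >= nwords)
		return -1;

	nr = rec[pos++];			/* number of counters in the group */
	if (fmt & FMT_TIME_ENABLED)
		pos++;				/* skip time_enabled */
	if (fmt & FMT_TIME_RUNNING)
		pos++;				/* skip time_running */
	if (idx >= nr)
		return -1;

	pos += idx * stride;
	if (pos >= nwords)
		return -1;

	*out = rec[pos];
	return 0;
}
```

Calling this with idx 1 on two consecutive samples and subtracting the results would give the RAPL delta used as the sample weight.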
* Re: [RFC PATCH 0/3] perf: show package power consumption in perf
  2010-08-19  9:02 ` Peter Zijlstra
@ 2010-08-20  1:44 ` Zhang Rui
  2010-08-20  9:34 ` Peter Zijlstra
  0 siblings, 1 reply; 20+ messages in thread
From: Zhang Rui @ 2010-08-20  1:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Lin, Ming M, Matt Fleming, LKML, mingo, robert.richter, acme,
      paulus, dzickus, gorcunov, fweisbec, Brown, Len, Matthew Garrett

On Thu, 2010-08-19 at 17:02 +0800, Peter Zijlstra wrote:
> > > >
> > > >  - if it is a pure read-only counter without sampling support,
> > > >    expose it as such, don't fudge in the hrtimer stuff. Simply
> > > >    fail to create a sampling event.
> > > >
> > > >    SH has the same problem for its 'normal' PMU, the solution is
> > > >    to use event groups, Matt was looking at adding support to
> > > >    perf-record for that, if creating a sampling event fails, fall
> > > >    back to {hrtimer, $event} groups.
> > >
> > > I had a quick look over the patches and Peter is right - the group
> > > events stuff would probably fit quite well here. Unfortunately, due to
> > > holidays and things, I haven't been able to get them finished
> > > yet. I'll get on that ASAP.
> >
> > Hi, Matt
> >
> > What's the "group events stuff"?
> > Is there some discussion on LKML or elsewhere I can have a look at?
>
> its some obscure perf feature:
>
>   leader = sys_perf_event_open(&hrtimer_attr, pid, cpu, -1, 0);
>   sibling = sys_perf_event_open(&rapl_attr, pid, cpu, leader, 0);
>
> will create an event group (which means that both events require to be
> co-scheduled). If you then provided:
>
>   hrtimer_attr.read_format |= PERF_FORMAT_GROUP;
>   hrtimer_attr.sample_type |= PERF_SAMPLE_READ;
>

hrtimer_attr is only shared within an event group, and rapl needs its
own event group, right?
> the samples from the hrtimer will contain a field like:
>
>  *      { u64           nr;
>  *        { u64         time_enabled; } && PERF_FORMAT_ENABLED
>  *        { u64         time_running; } && PERF_FORMAT_RUNNING
>  *        { u64         value;
>  *          { u64       id;           } && PERF_FORMAT_ID
>  *        }             cntr[nr];
>  *      } && PERF_FORMAT_GROUP
>
> Which contains both the hrtimer count (ns) and the RAPL count (watts).
>
> Using that you can compute the RAPL delta between consecutive samples
> and use that to weight the sample.
>
>
> For perf-stat none of this is needed, since it doesn't use sampling
> counters anyway ;-).

So what do you think the rapl counter should look like in userspace?
Showing it in perf-stat looks nice, right? :)

thanks,
rui

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC PATCH 0/3] perf: show package power consumption in perf
  2010-08-20  1:44 ` Zhang Rui
@ 2010-08-20  9:34 ` Peter Zijlstra
  2010-08-20 12:31 ` Ingo Molnar
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2010-08-20  9:34 UTC
To: Zhang Rui
Cc: Lin, Ming M, Matt Fleming, LKML, mingo, robert.richter, acme, paulus, dzickus, gorcunov, fweisbec, Brown, Len, Matthew Garrett

On Fri, 2010-08-20 at 09:44 +0800, Zhang Rui wrote:
> On Thu, 2010-08-19 at 17:02 +0800, Peter Zijlstra wrote:
> > it's some obscure perf feature:
> >
> >   leader = sys_perf_event_open(&hrtimer_attr, pid, cpu, 0, 0);
> >   sibling = sys_perf_event_open(&rapl_attr, pid, cpu, leader, 0);
> >
> > will create an event group (which means that both events require to be
> > co-scheduled). If you then provided:
> >
> >   hrtimer_attr.read_format |= PERF_FORMAT_GROUP;
> >   hrtimer_attr.sample_type |= PERF_SAMPLE_READ;
> >
> hrtimer_attr is only shared in an event group, and rapl needs its own
> event group, right?

Uhm, no. The idea is to group the hrtimer and rapl event in order to
obtain rapl 'samples'.

That is, you get hrtimer samples which include the rapl count. For this
we use the grouping construct where group siblings are always
co-scheduled and can report on each other's count.

> so what do you think the rapl counter should look like in userspace?
> showing it in perf-stat looks nice, right? :)

Right, so the userspace interface would be using Lin's sysfs bits, which
I still need to read up on. But the general idea is that each PMU gets a
sysfs representation somewhere in the system topology reflecting its
actual site (RAPL would be CPU local), this sysfs representation would
then also allow you to discover all events it provides.

perf list will then use sysfs to discover all available events, and you
can still use perf stat -e $foo to select it, where foo is some
to-be-determined string that identifies the thing, maybe something like:
rapl:watts or somesuch (with rapl identifying the pmu and watts the
actual event for that pmu).
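The {hrtimer, rapl} grouping explained in this message could look roughly like the following from userspace. Treat it as a hedged sketch: the function names are invented for illustration, the cpu-clock software event stands in for the "hrtimer" leader, and pmu_type/config are placeholders for whatever type and event encoding the RAPL PMU ends up exposing (the thread leaves that open). Note that perf_event_open(2) expects group_fd == -1 when creating a group leader. Whether a given kernel accepts this particular grouping for a given PMU is not something the thread settles.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* glibc has no wrapper for this syscall, so go through syscall(2). */
static long perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
			    int group_fd, unsigned long flags)
{
	return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Leader: hrtimer-driven cpu-clock event, one sample every period_ns. */
static void init_timer_leader(struct perf_event_attr *attr, uint64_t period_ns)
{
	assert(attr != NULL);
	memset(attr, 0, sizeof(*attr));
	attr->size          = sizeof(*attr);
	attr->type          = PERF_TYPE_SOFTWARE;
	attr->config        = PERF_COUNT_SW_CPU_CLOCK;
	attr->sample_period = period_ns;
	attr->sample_type   = PERF_SAMPLE_READ;	/* samples carry counter values */
	attr->read_format   = PERF_FORMAT_GROUP;	/* ...for the whole group */
	attr->disabled      = 1;			/* caller enables the group later */
}

/* Sibling: pure counting event, no sample_period of its own. */
static void init_counting_sibling(struct perf_event_attr *attr,
				  uint32_t pmu_type, uint64_t config)
{
	assert(attr != NULL);
	memset(attr, 0, sizeof(*attr));
	attr->size   = sizeof(*attr);
	attr->type   = pmu_type;	/* placeholder: the RAPL PMU's type */
	attr->config = config;		/* placeholder: the energy event */
}

/* Open leader + sibling as one group; fds[0] is the leader. 0 on success. */
int open_sampled_group(pid_t pid, int cpu, uint64_t period_ns,
		       uint32_t pmu_type, uint64_t config, int fds[2])
{
	struct perf_event_attr leader, sibling;

	init_timer_leader(&leader, period_ns);
	init_counting_sibling(&sibling, pmu_type, config);

	fds[0] = (int)perf_event_open(&leader, pid, cpu, -1, 0);
	if (fds[0] < 0)
		return -1;

	fds[1] = (int)perf_event_open(&sibling, pid, cpu, fds[0], 0);
	if (fds[1] < 0) {
		close(fds[0]);
		return -1;
	}
	return 0;
}
```

After a successful open_sampled_group(), the caller would mmap the leader's ring buffer and start the group with ioctl(fds[0], PERF_EVENT_IOC_ENABLE, PERF_IOC_FLAG_GROUP); each sample from the leader then carries both counts.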
* Re: [RFC PATCH 0/3] perf: show package power consumption in perf
  2010-08-20  9:34 ` Peter Zijlstra
@ 2010-08-20 12:31 ` Ingo Molnar
  2010-08-20 21:34 ` acme
  0 siblings, 1 reply; 20+ messages in thread
From: Ingo Molnar @ 2010-08-20 12:31 UTC
To: Peter Zijlstra
Cc: Zhang Rui, Lin, Ming M, Matt Fleming, LKML, robert.richter, acme, paulus, dzickus, gorcunov, fweisbec, Brown, Len, Matthew Garrett, Steven Rostedt, Thomas Gleixner

* Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, 2010-08-20 at 09:44 +0800, Zhang Rui wrote:
> > On Thu, 2010-08-19 at 17:02 +0800, Peter Zijlstra wrote:
>
> > > it's some obscure perf feature:
> > >
> > >   leader = sys_perf_event_open(&hrtimer_attr, pid, cpu, 0, 0);
> > >   sibling = sys_perf_event_open(&rapl_attr, pid, cpu, leader, 0);
> > >
> > > will create an event group (which means that both events require to be
> > > co-scheduled). If you then provided:
> > >
> > >   hrtimer_attr.read_format |= PERF_FORMAT_GROUP;
> > >   hrtimer_attr.sample_type |= PERF_SAMPLE_READ;
> > >
> > hrtimer_attr is only shared in an event group, and rapl needs its own
> > event group, right?
>
> Uhm, no. The idea is to group the hrtimer and rapl event in order to
> obtain rapl 'samples'.
>
> That is, you get hrtimer samples which include the rapl count. For this
> we use the grouping construct where group siblings are always
> co-scheduled and can report on each other's count.
>
> > so what do you think the rapl counter should look like in userspace?
> > showing it in perf-stat looks nice, right? :)
>
> Right, so the userspace interface would be using Lin's sysfs bits, which I
> still need to read up on. But the general idea is that each PMU gets a sysfs
> representation somewhere in the system topology reflecting its actual site
> (RAPL would be CPU local), this sysfs representation would then also allow
> you to discover all events it provides.
>
> perf list will then use sysfs to discover all available events, and you can
> still use perf stat -e $foo to select it, where foo is some to-be-determined
> string that identifies the thing, maybe something like: rapl:watts or
> somesuch (with rapl identifying the pmu and watts the actual event for that
> pmu).

Btw., some 'perf list' thoughts. We could do a:

  perf list --help rapl:watts

Which gives the user some idea of what an event does. Also, a short
descriptive line in perf list output would be nice:

 $ perf list

 List of pre-defined events (to be used in -e):

  cpu-cycles OR cycles       [Hardware event]   # CPU cycles
  instructions               [Hardware event]   # instructions executed

  ...

  rapl:watts                 [Tracepoint]       # watts usage

or something like that. Perhaps even a TUI for perf list, to browse between
event types? (in that case it would probably be useful to make them collapse
along natural grouping)

We want users/developers to discover new events, see and understand their
purpose and combine them in not-seen-before ways.

Thanks,

	Ingo
* Re: [RFC PATCH 0/3] perf: show package power consumption in perf
  2010-08-20 12:31 ` Ingo Molnar
@ 2010-08-20 21:34 ` acme
  0 siblings, 0 replies; 20+ messages in thread
From: acme @ 2010-08-20 21:34 UTC
To: Ingo Molnar
Cc: Peter Zijlstra, Zhang Rui, Lin, Ming M, Matt Fleming, LKML, robert.richter, paulus, dzickus, gorcunov, fweisbec, Brown, Len, Matthew Garrett, Steven Rostedt, Thomas Gleixner

On Fri, Aug 20, 2010 at 02:31:59PM +0200, Ingo Molnar wrote:
> Btw., some 'perf list' thoughts. We could do a:
>
>   perf list --help rapl:watts
>
> Which gives the user some idea what an event does. Also, short descriptive
> line in perf list output would be nice:
>
>  $ perf list
>
>  List of pre-defined events (to be used in -e):
>
>   cpu-cycles OR cycles       [Hardware event]   # CPU cycles
>   instructions               [Hardware event]   # instructions executed
>
>   ...
>
>   rapl:watts                 [Tracepoint]       # watts usage
>
> or something like that. Perhaps even a TUI for perf list, to browse between
> event types? (in that case it would probably be useful to make them collapse
> along natural grouping)
>
> We want users/developers to discover new events, see and understand their
> purpose and combine them in not-seen-before ways.

Right, record, list, probe, top are on the UI (not just T-UI, see latest
efforts on decoupling from newt/slang) hit-list :)

Moving from one to the other seamlessly, as is possible today for report
and annotate, is the goal.

Now that the UI browser code is more robust and generic that should
happen faster, I think.

- Arnaldo
* Re: [RFC PATCH 0/3] perf: show package power consumption in perf
  2010-08-18 12:25 ` Peter Zijlstra
  2010-08-18 12:41 ` Matt Fleming
@ 2010-08-19  2:43 ` Lin Ming
  2010-08-19  8:54 ` Peter Zijlstra
  1 sibling, 1 reply; 20+ messages in thread
From: Lin Ming @ 2010-08-19  2:43 UTC
To: Peter Zijlstra
Cc: Zhang, Rui, LKML, mingo, robert.richter, acme, paulus, dzickus, gorcunov, fweisbec, Brown, Len, Matthew Garrett, Matt Fleming

On Wed, 2010-08-18 at 20:25 +0800, Peter Zijlstra wrote:
> On Wed, 2010-08-18 at 15:59 +0800, Zhang Rui wrote:
> > Hi, all,
> >
> > RAPL(running average power limit) is a new feature which provides
> > mechanisms to enforce power consumption limit, on some new processors.
> >
> > Generally speaking, by using RAPL, OS can set a power budget in a
> > certain time window, and let Hardware to throttle the processor
> > P/T-state to meet this energy limitation.
> >
> > RAPL also provides a new MSR, i.e. MSR_PKG_ENERGY_STATUS, which reports
> > the total amount of energy consumed by the package.
> >
> > I'm not sure if to support RAPL or not, but anyway, it sounds like a
> > good idea to export the energy status in perf.
> >
> > So a new perf pmu and event to show the package energy consumed is
> > introduced in this patch.
> >
> > Here is what I get after applying the three patches,
> >
> > #./perf stat -e energy test
> > Performance counter stats for 'test':
> >
> >             202 Joules cost by package
> >     7.926001238 seconds time elapsed
> >
> >
> > Note that this patch set is made based on Peter's perf-pmu branch,
> > git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-perf.git
> > which provides better interfaces to register/unregister a new pmu.
> >
> > any comment are welcome. :)
>
>
> Nice,.. however:
>
>  - if it is a pure read-only counter without sampling support,
>    expose it as such, don't fudge in the hrtimer stuff. Simply
>    fail to create a sampling event.
>
>    SH has the same problem for its 'normal' PMU, the solution is
>    to use event groups, Matt was looking at adding support to
>    perf-record for that, if creating a sampling event fails, fall
>    back to {hrtimer, $event} groups.
>
>  - since it's a free-running, non-configurable counter, you can indeed
>    act like it's a 'software' event in that you can schedule consumers
>    without constraints, however I don't think the PERF_COUNT_SW_* space
>    is the right way to expose this counter.
>
>    Better would be to use the sysfs stuff Lin has been working on (for

Sorry that I have no good idea how to export the various tracepoint
events automatically, so this work will take time.

Lin Ming

>    which I still need to catch up on the latest discussions), it would
>    then be tied to the pmu instance and appear/disappear when you load/
>    unload the module.
>
>    However for testing purposes I see why you'd want to have _a_
>    interface :-)
>
>  - it would be nice if you'd write the cpu detection a bit more readable,
>    also, it looks like you forgot to check x86_vendor == X86_VENDOR_INTEL.
>
> > +static int __init intel_rapl_init(void)
> > +{
> > +	/*
> > +	 * RAPL features are only supported on processors have a CPUID
> > +	 * signature with DisplayFamily_DisplayModel of 06_2AH, 06_2DH
> > +	 */
> > +	if (boot_cpu_data.x86 != 0x06 ||
> > +	    (boot_cpu_data.x86_model != 0x2A &&
> > +	     boot_cpu_data.x86_model != 0x2D))
> > +		return -ENODEV;
> > +
> > +	if (rapl_check_unit())
> > +		return -ENODEV;
> > +
> > +	perf_pmu_register(&rapl_pmu);
> > +	return 0;
> > +}
>
> Maybe something like (see intel_pmu_init() for example):
>
> 	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
> 		return -ENODEV;
>
> 	if (boot_cpu_data.x86 != 0x06)
> 		return -ENODEV;
>
> 	switch (boot_cpu_data.x86_model) {
> 	case 0x2A: /* sandybridge ?! 32nm */
> 	case 0x2D: /* othermodel 32nm */
> 		break;
>
> 	default:
> 		return -ENODEV;
> 	}
>
> Which again reminds me to ask of Intel, a comprehensive x86_model list,
> please?
>
> Alternatively, you can create a X86_FEATURE_RAPL and simply use
> boot_cpu_has(X86_FEATURE_RAPL) (much like intel_ds_init() has).
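The vendor/family/model check suggested in the quoted review can be exercised outside the kernel with a small stand-in. The sketch below is only a userspace illustration: struct cpu_id, enum cpu_vendor and rapl_cpu_supported() are hypothetical names replacing boot_cpu_data and the init function, and the model list is just the two models named in the thread (06_2AH and 06_2DH), not a complete list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins for the kernel's boot_cpu_data fields. */
enum cpu_vendor { VENDOR_INTEL, VENDOR_AMD, VENDOR_OTHER };

struct cpu_id {
	enum cpu_vendor vendor;
	uint8_t family;		/* DisplayFamily */
	uint8_t model;		/* DisplayModel */
};

/*
 * Mirrors the structure proposed in the review: check the vendor first,
 * then the family, then switch on the model so new models are one
 * 'case' line each.
 */
static bool rapl_cpu_supported(const struct cpu_id *c)
{
	assert(c != NULL);

	if (c->vendor != VENDOR_INTEL)
		return false;

	if (c->family != 0x06)
		return false;

	switch (c->model) {
	case 0x2A:	/* first model named in the thread */
	case 0x2D:	/* second model named in the thread */
		return true;
	default:
		return false;
	}
}
```

Compared with the single compound condition in the patch, the early returns make the missing vendor check obvious and keep the model list easy to extend.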
* Re: [RFC PATCH 0/3] perf: show package power consumption in perf
  2010-08-19  2:43 ` Lin Ming
@ 2010-08-19  8:54 ` Peter Zijlstra
  2010-08-20  0:21 ` Lin Ming
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2010-08-19  8:54 UTC
To: Lin Ming
Cc: Zhang, Rui, LKML, mingo, robert.richter, acme, paulus, dzickus, gorcunov, fweisbec, Brown, Len, Matthew Garrett, Matt Fleming

On Thu, 2010-08-19 at 10:43 +0800, Lin Ming wrote:
> Sorry that I have no good idea how to export the various tracepoints
> events automatically, so this work will take time.
>
Well, we could start with just the hardware bits and leave the tracepoint
bits for later, right?
* Re: [RFC PATCH 0/3] perf: show package power consumption in perf
  2010-08-19  8:54 ` Peter Zijlstra
@ 2010-08-20  0:21 ` Lin Ming
  0 siblings, 0 replies; 20+ messages in thread
From: Lin Ming @ 2010-08-20  0:21 UTC
To: Peter Zijlstra
Cc: Zhang, Rui, LKML, mingo, robert.richter, acme, paulus, dzickus, gorcunov, fweisbec, Brown, Len, Matthew Garrett, Matt Fleming

On Thu, 2010-08-19 at 16:54 +0800, Peter Zijlstra wrote:
> On Thu, 2010-08-19 at 10:43 +0800, Lin Ming wrote:
> > Sorry that I have no good idea how to export the various tracepoints
> > events automatically, so this work will take time.
> >
> Well, we could start with just the hardware bits and leave the tracepoint
> bits for later, right?

Right. I'll update the patches.