* Re: oprofile and ARM A9 hardware counter
       [not found] ` <CAMsRxfJH+9MS=Svmh0BvDSb8xqsPe2HPobkeKcn8JO35npjhwQ@mail.gmail.com>
@ 2012-02-08  2:24   ` Ming Lei
  2012-02-15 16:38     ` Peter Zijlstra
  0 siblings, 1 reply; 12+ messages in thread
From: Ming Lei @ 2012-02-08  2:24 UTC (permalink / raw)
  To: eranian
  Cc: Shilimkar, Santosh, David Long, b-cousson, mans, will.deacon,
	linux-arm, Peter Zijlstra, Ingo Molnar, Linux Kernel Mailing List

Hi,

CC lkml and the perf folks, since this looks related to the perf core.

On Tue, Feb 7, 2012 at 7:59 PM, stephane eranian
<eranian@googlemail.com> wrote:
> An easier way to verify we're getting the right number of samples is
> to use perf top:
>
> $ taskset -c 1 noploop 1000 &
> $ sudo perf top
>
> You'll see around 850 irqs/sec, should be closer to 1000.
> But if I drop the rate to 100Hz, then it works:
>
> $ sudo perf top -F 100
>
> That leads me to believe there is too much overhead somewhere.
> Could be in perf_event itself.

The issue does appear to be caused by perf_event itself, but it has
nothing to do with overhead. On OMAP4, HZ is 128, and
perf_rotate_context may set a new sample period (~8ms), which is much
longer than the 1ms expected in 1000Hz frequency mode, so fewer sample
events are observed. x86 is not affected since its HZ is 1000.

With patch[1], about 10000 sample events are generated after running
'perf record -e cycles ./noploop 10' and 'perf report -D | tail -20'
on a Panda board. I am not sure patch[1] is the right fix, but it does
demonstrate the problem.
thanks,
--
Ming Lei

[1], fix adjusting frequency in perf_rotate_context

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 32b48c8..db4faf2 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2300,14 +2300,12 @@ do {					\
 	return div64_u64(dividend, divisor);
 }

-static void perf_adjust_period(struct perf_event *event, u64 nsec, u64 count)
+static void perf_adjust_period(struct perf_event *event, u64 period, u64 count)
 {
 	struct hw_perf_event *hwc = &event->hw;
-	s64 period, sample_period;
+	s64 sample_period;
 	s64 delta;

-	period = perf_calculate_period(event, nsec, count);
-
 	delta = (s64)(period - hwc->sample_period);
 	delta = (delta + 7) / 8; /* low pass filter */

@@ -2363,8 +2361,13 @@ static void perf_ctx_adjust_freq(struct perf_event_context *ctx, u64 period)
 		delta = now - hwc->freq_count_stamp;
 		hwc->freq_count_stamp = now;

-		if (delta > 0)
+		if (delta > 0) {
+			period = perf_calculate_period(event, period, delta);
+
+			if (period > 4*hwc->sample_period)
+				period = hwc->sample_period;
 			perf_adjust_period(event, period, delta);
+		}
 	}
 }

@@ -4533,8 +4536,10 @@ static int __perf_event_overflow(struct perf_event *event,

 		hwc->freq_time_stamp = now;

-		if (delta > 0 && delta < 2*TICK_NSEC)
+		if (delta > 0 && delta < 2*TICK_NSEC) {
+			delta = perf_calculate_period(event, delta, hwc->last_period);
 			perf_adjust_period(event, delta, hwc->last_period);
+		}
 	}

 /*

^ permalink raw reply related	[flat|nested] 12+ messages in thread
* Re: oprofile and ARM A9 hardware counter
  2012-02-08  2:24   ` oprofile and ARM A9 hardware counter Ming Lei
@ 2012-02-15 16:38     ` Peter Zijlstra
  2012-02-16 10:25       ` Ming Lei
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2012-02-15 16:38 UTC (permalink / raw)
  To: Ming Lei
  Cc: eranian, Shilimkar, Santosh, David Long, b-cousson, mans,
	will.deacon, linux-arm, Ingo Molnar, Linux Kernel Mailing List

On Wed, 2012-02-08 at 10:24 +0800, Ming Lei wrote:
>
> On OMAP4, HZ is 128, and perf_rotate_context may set a new sample period (~8ms),
> which is much longer than 1ms in 1000HZ freq mode, so less sample events are
> observed. X86 isn't affected since its HZ is 1000.
>
> With patch[1], about 10000 sample events can be generated after running
> 'perf record -e cycles ./noploop 10' and 'perf report -D | tail -20'
> on panda board.

> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 32b48c8..db4faf2 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -2300,14 +2300,12 @@ do {					\
>  	return div64_u64(dividend, divisor);
>  }
>
> -static void perf_adjust_period(struct perf_event *event, u64 nsec, u64 count)
> +static void perf_adjust_period(struct perf_event *event, u64 period, u64 count)
>  {
>  	struct hw_perf_event *hwc = &event->hw;
> -	s64 period, sample_period;
> +	s64 sample_period;
>  	s64 delta;
>
> -	period = perf_calculate_period(event, nsec, count);
> -
>  	delta = (s64)(period - hwc->sample_period);
>  	delta = (delta + 7) / 8; /* low pass filter */
>
> @@ -2363,8 +2361,13 @@ static void perf_ctx_adjust_freq(struct perf_event_context *ctx, u64 period)
>  		delta = now - hwc->freq_count_stamp;
>  		hwc->freq_count_stamp = now;
>
> -		if (delta > 0)
> +		if (delta > 0) {
> +			period = perf_calculate_period(event, period, delta);
> +
> +			if (period > 4*hwc->sample_period)
> +				period = hwc->sample_period;
>  			perf_adjust_period(event, period, delta);
> +		}
>  	}
>  }
>
> @@ -4533,8 +4536,10 @@ static int __perf_event_overflow(struct perf_event *event,
>
>  		hwc->freq_time_stamp = now;
>
> -		if (delta > 0 && delta < 2*TICK_NSEC)
> +		if (delta > 0 && delta < 2*TICK_NSEC) {
> +			delta = perf_calculate_period(event, delta, hwc->last_period);
>  			perf_adjust_period(event, delta, hwc->last_period);
> +		}
>  	}
>
>  /*

So what this patch seems to do is put that filter on period in
perf_ctx_adjust_freq(). Not making sense.. nor can I see a HZ
dependency, perf_ctx_adjust_freq() uses TICK_NSEC as time base.
* Re: oprofile and ARM A9 hardware counter
  2012-02-15 16:38     ` Peter Zijlstra
@ 2012-02-16 10:25       ` Ming Lei
  2012-02-16 15:00         ` Will Deacon
  0 siblings, 1 reply; 12+ messages in thread
From: Ming Lei @ 2012-02-16 10:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: eranian, Shilimkar, Santosh, David Long, b-cousson, mans,
	will.deacon, linux-arm, Ingo Molnar, Linux Kernel Mailing List

Hi,

On Thu, Feb 16, 2012 at 12:38 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>
> So what this patch seems to do is put that filter on period in
> perf_ctx_adjust_freq(). Not making sense.. nor can I see a HZ
> dependency, perf_ctx_adjust_freq() uses TICK_NSEC as time base.

Yes, you are right. I observed the original problem on -rc1, before
Stephane's unthrottling patch was merged. Today I investigated the
problem further on -rc3, and it now looks as if the problem is caused
by the ARM PMU code.

The patch below may fix the problem; with it, about 40000 sample events
are generated by the command:

'perf record -e cycles -F 4000 ./noploop 10 && perf report -D | tail -20'

armpmu_event_update may be called in the tick path, so a running counter
can overflow and produce a huge 'delta', and a wrong count is then stored
into event->count and event->hw.freq_count_stamp. The two variables fall
out of sync, so an invalid, very large period is computed and written to
the PMU, and far fewer sample events are generated.

Will, this patch simplifies the 'delta' computation and does not use the
overflow flag, even though it could be read directly from PMOVSR. Could
you comment on the patch?
diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
index 5bb91bf..789700a 100644
--- a/arch/arm/kernel/perf_event.c
+++ b/arch/arm/kernel/perf_event.c
@@ -193,13 +193,8 @@ again:
 			     new_raw_count) != prev_raw_count)
 		goto again;

-	new_raw_count &= armpmu->max_period;
-	prev_raw_count &= armpmu->max_period;
-
-	if (overflow)
-		delta = armpmu->max_period - prev_raw_count + new_raw_count + 1;
-	else
-		delta = new_raw_count - prev_raw_count;
+	delta = (armpmu->max_period - prev_raw_count + new_raw_count
+			+ 1) & armpmu->max_period;

 	local64_add(delta, &event->count);
 	local64_sub(delta, &hwc->period_left);

thanks,
--
Ming Lei
* Re: oprofile and ARM A9 hardware counter
  2012-02-16 10:25       ` Ming Lei
@ 2012-02-16 15:00         ` Will Deacon
  2012-02-16 16:12           ` Ming Lei
  0 siblings, 1 reply; 12+ messages in thread
From: Will Deacon @ 2012-02-16 15:00 UTC (permalink / raw)
  To: Ming Lei
  Cc: Peter Zijlstra, eranian, Shilimkar, Santosh, David Long,
	b-cousson, mans, linux-arm, Ingo Molnar, Linux Kernel Mailing List

On Thu, Feb 16, 2012 at 10:25:05AM +0000, Ming Lei wrote:
> On Thu, Feb 16, 2012 at 12:38 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> >
> > So what this patch seems to do is put that filter on period in
> > perf_ctx_adjust_freq(). Not making sense.. nor can I see a HZ
> > dependency, perf_ctx_adjust_freq() uses TICK_NSEC as time base.
>
> Yes, you are right, I remembered it was observed it on -rc1, and
> Stephane's unthrottling patch was not merged at that time. Today I
> investigated the problem further on -rc3 and found that seems the
> problem is caused by arm pmu code.

As I reported previously, Stephane's patch is causing warnings on -rc3:

http://lists.infradead.org/pipermail/linux-arm-kernel/2012-February/084391.html

so I'd like to get to the bottom of that before changing anything else.
I'd also like to know why this has only been reported on OMAP4 and I
can't reproduce it on my boards.

> The patch below may fix the problem, now about 40000 sample events
> can be generated on the command:
>
> 'perf record -e cycles -F 4000 ./noploop 10 && perf report -D | tail -20'
>
> armpmu_event_update may be called in tick path, so the running counter
> will be overflowed and produce a great value of 'delta', then a mistaken
> count is stored into event->count and event->hw.freq_count_stamp. Finally
> the two variables are not synchronous, then a invalid and large period is
> computed and written to pmu, and sample events are decreased much.

Hmm, so are you observing an event overflow during the tick handler? This
should be fine unless the new value has wrapped past the previous one
(i.e. more than 2^32 events have occurred). I find this extremely
unlikely for sample-based profiling unless you have some major IRQ
latency issues...

The only way I can think of improving this (bearing in mind that at some
point we're limited by 32 bits of counter) is to check for overflow in
the tick path and then invoke the PMU irq handler if there is an
overflow, but that's really not very nice.

> diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
> index 5bb91bf..789700a 100644
> --- a/arch/arm/kernel/perf_event.c
> +++ b/arch/arm/kernel/perf_event.c
> @@ -193,13 +193,8 @@ again:
>  			     new_raw_count) != prev_raw_count)
>  		goto again;
>
> -	new_raw_count &= armpmu->max_period;
> -	prev_raw_count &= armpmu->max_period;
> -
> -	if (overflow)
> -		delta = armpmu->max_period - prev_raw_count + new_raw_count + 1;
> -	else
> -		delta = new_raw_count - prev_raw_count;
> +	delta = (armpmu->max_period - prev_raw_count + new_raw_count
> +			+ 1) & armpmu->max_period;

This breaks when more than max_period events have passed. See a737823d
("ARM: 6835/1: perf: ensure overflows aren't missed due to IRQ latency").

Will
* Re: oprofile and ARM A9 hardware counter
  2012-02-16 15:00         ` Will Deacon
@ 2012-02-16 16:12           ` Ming Lei
  2012-02-16 16:19             ` Peter Zijlstra
  0 siblings, 1 reply; 12+ messages in thread
From: Ming Lei @ 2012-02-16 16:12 UTC (permalink / raw)
  To: Will Deacon
  Cc: Peter Zijlstra, eranian, Shilimkar, Santosh, David Long,
	b-cousson, mans, linux-arm, Ingo Molnar, Linux Kernel Mailing List

On Thu, Feb 16, 2012 at 11:00 PM, Will Deacon <will.deacon@arm.com> wrote:
> As I reported previously, Stephane's patch is causing warnings on -rc3:
>
> http://lists.infradead.org/pipermail/linux-arm-kernel/2012-February/084391.html
>
> so I'd like to get to the bottom of that before changing anything else.

It looks like I am lucky enough not to see the warning on OMAP4, on
either -rc3 or -rc3-next-20120210. :-) Maybe we have different config
options.

> I'd also like to know why this has only been reported on OMAP4 and I
> can't reproduce it on my boards.
>
> Hmm, so are you observing an event overflow during the tick handler?

Yes, I can observe it without much difficulty.

> This should be fine unless the new value has wrapped past the previous
> one (i.e. more than 2^32 events have occurred). I find this extremely
> unlikely for sample-based profiling unless you have some major IRQ
> latency issues...

IMO it is not so difficult to hit. Suppose prev_raw_count is 1000000 and
-prev_raw_count was written to the PMU counter; the counter then expires,
and if the PMU IRQ is not handled quickly enough, the counter wraps and
starts counting up from zero. If the tick is scheduled just before the
PMU IRQ is handled, armpmu_event_update() is called and reads the PMU
counter as 'new_raw_count' -- suppose it reads 100 -- and the issue is
triggered:

	u64 delta = 100 - 1000000 = 18446744073708551716.

The higher the sampling frequency, the easier the problem is to reproduce.

> The only way I can think of improving this (bearing in mind that at some
> point we're limited by 32 bits of counter) is to check for overflow in
> the tick path and then invoke the PMU irq handler if there is an
> overflow, but that's really not very nice.

Alternatively, we could remove the 'overflow' parameter from
armpmu_event_update and introduce an armpmu->is_overflow(idx) callback
so that armpmu_event_update can check for overflow itself. IMO the PMU
IRQ cannot be lost, so there is no need to invoke the PMU IRQ handler
from the tick path.

> > -	if (overflow)
> > -		delta = armpmu->max_period - prev_raw_count + new_raw_count + 1;
> > -	else
> > -		delta = new_raw_count - prev_raw_count;
> > +	delta = (armpmu->max_period - prev_raw_count + new_raw_count
> > +			+ 1) & armpmu->max_period;
>
> This breaks when more than max_period events have passed. See a737823d
> ("ARM: 6835/1: perf: ensure overflows aren't missed due to IRQ latency").

Sorry, I had not noticed that commit and the problem it addresses; it
looks like the 'overflow' information really is needed by
armpmu_event_update.

thanks,
--
Ming Lei
* Re: oprofile and ARM A9 hardware counter
  2012-02-16 16:12           ` Ming Lei
@ 2012-02-16 16:19             ` Peter Zijlstra
  2012-02-16 16:37               ` Ming Lei
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2012-02-16 16:19 UTC (permalink / raw)
  To: Ming Lei
  Cc: Will Deacon, eranian, Shilimkar, Santosh, David Long, b-cousson,
	mans, linux-arm, Ingo Molnar, Linux Kernel Mailing List

On Fri, 2012-02-17 at 00:12 +0800, Ming Lei wrote:
> is triggered: u64 delta = 100 - 1000000 = 18446744073708551716.

on x86 we do:

	int shift = 64 - x86_pmu.cntval_bits;
	s64 delta;

	delta = (new_raw_count << shift) - (prev_raw_count << shift);
	delta >>= shift;

This deals with short overflows (on x86 the registers are typically 40
or 48 bits wide). If the arm register is 32 you can of course also get
there with some u32 casts.
* Re: oprofile and ARM A9 hardware counter
  2012-02-16 16:19             ` Peter Zijlstra
@ 2012-02-16 16:37               ` Ming Lei
  2012-02-16 18:08                 ` Will Deacon
  0 siblings, 1 reply; 12+ messages in thread
From: Ming Lei @ 2012-02-16 16:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, eranian, Shilimkar, Santosh, David Long, b-cousson,
	mans, linux-arm, Ingo Molnar, Linux Kernel Mailing List

On Fri, Feb 17, 2012 at 12:19 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> On Fri, 2012-02-17 at 00:12 +0800, Ming Lei wrote:
>> is triggered: u64 delta = 100 - 1000000 = 18446744073708551716.
>
> on x86 we do:
>
> 	int shift = 64 - x86_pmu.cntval_bits;
> 	s64 delta;
>
> 	delta = (new_raw_count << shift) - (prev_raw_count << shift);
> 	delta >>= shift;
>
> This deals with short overflows (on x86 the registers are typically 40
> or 48 bits wide). If the arm register is 32 you can of course also get
> there with some u32 casts.

Good idea, but it may not work if new_raw_count is bigger than
prev_raw_count.

thanks,
--
Ming Lei
* Re: oprofile and ARM A9 hardware counter
  2012-02-16 16:37               ` Ming Lei
@ 2012-02-16 18:08                 ` Will Deacon
  2012-02-17  5:24                   ` Ming Lei
  0 siblings, 1 reply; 12+ messages in thread
From: Will Deacon @ 2012-02-16 18:08 UTC (permalink / raw)
  To: Ming Lei
  Cc: Peter Zijlstra, eranian, Shilimkar, Santosh, David Long,
	b-cousson, mans, linux-arm, Ingo Molnar, Linux Kernel Mailing List

On Thu, Feb 16, 2012 at 04:37:35PM +0000, Ming Lei wrote:
> On Fri, Feb 17, 2012 at 12:19 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> > on x86 we do:
> >
> > 	int shift = 64 - x86_pmu.cntval_bits;
> > 	s64 delta;
> >
> > 	delta = (new_raw_count << shift) - (prev_raw_count << shift);
> > 	delta >>= shift;
> >
> > This deals with short overflows (on x86 the registers are typically 40
> > or 48 bits wide). If the arm register is 32 you can of course also get
> > there with some u32 casts.
>
> Good idea, but it may not work if new_raw_count is bigger than
> prev_raw_count.

The more I think about this, the more I think that the overflow parameter to
armpmu_event_update needs to go. It was introduced to prevent massive event
loss in non-sampling mode, but I think we can get around that by changing
the default sample_period to be half of the max_period, therefore giving
ourselves a much better chance of handling the interrupt before new wraps
around past prev.

Ming Lei - can you try the following please? If it works for you, then I'll
do it properly and kill the overflow parameter altogether.

Thanks,

Will

diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
index 5bb91bf..ef597a3 100644
--- a/arch/arm/kernel/perf_event.c
+++ b/arch/arm/kernel/perf_event.c
@@ -193,13 +193,7 @@ again:
 			     new_raw_count) != prev_raw_count)
 		goto again;

-	new_raw_count &= armpmu->max_period;
-	prev_raw_count &= armpmu->max_period;
-
-	if (overflow)
-		delta = armpmu->max_period - prev_raw_count + new_raw_count + 1;
-	else
-		delta = new_raw_count - prev_raw_count;
+	delta = (new_raw_count - prev_raw_count) & armpmu->max_period;

 	local64_add(delta, &event->count);
 	local64_sub(delta, &hwc->period_left);
@@ -518,7 +512,7 @@ __hw_perf_event_init(struct perf_event *event)
 		hwc->config_base |= (unsigned long)mapping;

 	if (!hwc->sample_period) {
-		hwc->sample_period  = armpmu->max_period;
+		hwc->sample_period  = armpmu->max_period >> 1;
 		hwc->last_period    = hwc->sample_period;
 		local64_set(&hwc->period_left, hwc->sample_period);
 	}
* Re: oprofile and ARM A9 hardware counter
  2012-02-16 18:08                 ` Will Deacon
@ 2012-02-17  5:24                   ` Ming Lei
  2012-02-17 10:26                     ` Will Deacon
  0 siblings, 1 reply; 12+ messages in thread
From: Ming Lei @ 2012-02-17  5:24 UTC (permalink / raw)
  To: Will Deacon
  Cc: Peter Zijlstra, eranian, Shilimkar, Santosh, David Long,
	b-cousson, mans, linux-arm, Ingo Molnar, Linux Kernel Mailing List

On Fri, Feb 17, 2012 at 2:08 AM, Will Deacon <will.deacon@arm.com> wrote:
>
> The more I think about this, the more I think that the overflow parameter to
> armpmu_event_update needs to go. It was introduced to prevent massive event
> loss in non-sampling mode, but I think we can get around that by changing
> the default sample_period to be half of the max_period, therefore giving
> ourselves a much better chance of handling the interrupt before new wraps
> around past prev.
>
> Ming Lei - can you try the following please? If it works for you, then I'll
> do it properly and kill the overflow parameter altogether.

Of course it works for the problem reported by Stephane, since it changes
the delta computation in essentially the same way my patch did. However,
I am afraid it may not be good enough for the issue fixed in a737823d
("ARM: 6835/1: perf: ensure overflows aren't missed due to IRQ latency").

> 	if (!hwc->sample_period) {
> -		hwc->sample_period  = armpmu->max_period;
> +		hwc->sample_period  = armpmu->max_period >> 1;

If you assume that the issue addressed by a737823d can only happen in the
non-sampling case, Peter's idea of a u32 cast is fine and maybe simpler.
But I am afraid the issue can still be triggered in the sampling case,
especially at a very high frequency: suppose the sample freq is 10000,
then a 100us IRQ delay may trigger the issue.

So we could use the overflow information to make perf more robust, IMO.

thanks,
--
Ming Lei
* Re: oprofile and ARM A9 hardware counter
  2012-02-17  5:24                   ` Ming Lei
@ 2012-02-17 10:26                     ` Will Deacon
  2012-02-20  3:19                       ` Ming Lei
  0 siblings, 1 reply; 12+ messages in thread
From: Will Deacon @ 2012-02-17 10:26 UTC (permalink / raw)
  To: Ming Lei
  Cc: Peter Zijlstra, eranian, Shilimkar, Santosh, David Long,
	b-cousson, mans, linux-arm, Ingo Molnar, Linux Kernel Mailing List

On Fri, Feb 17, 2012 at 05:24:02AM +0000, Ming Lei wrote:
> Of course it works for the problem reported by Stephane, since it changes
> the delta computation in essentially the same way my patch did. However,
> I am afraid it may not be good enough for the issue fixed in a737823d
> ("ARM: 6835/1: perf: ensure overflows aren't missed due to IRQ latency").

I think it does solve that problem.

> If you assume that the issue addressed by a737823d can only happen in the
> non-sampling case, Peter's idea of a u32 cast is fine and maybe simpler.

I don't want to assume that the counters are 32-bit in this code as we may
want to plug in some other PMU with 16-bit counters, for example. That's
why we have max_period defined for each PMU. Furthermore, it still doesn't
help us in the stat case, where prev will typically be quite small after
we've had an overflow and new may well overtake it.

> But I am afraid the issue can still be triggered in the sampling case,
> especially at a very high frequency: suppose the sample freq is 10000,
> then a 100us IRQ delay may trigger the issue.

Not sure I follow. If the frequency is 10000, then we write 0xffffd8f0 to
the counter. That means we have a 0xffffd8f0 event window to read the
counter after it overflows before new overtakes prev and we get confused.
If this passed in 100us then either your clock speed is 4.3*10^12Hz or you
have a seriously wide issue :)

> So we could use the overflow information to make perf more robust, IMO.

There's a trade-off between allowing the counter to wrap back around past
its previous value and being able to handle overflow on a non-interrupt
path. Given that the former problem is only a real issue for non-sampling
runs, halving the period in that case should sort it (and it does stop my
simple test case from exploding).

Will
* Re: oprofile and ARM A9 hardware counter
  2012-02-17 10:26                     ` Will Deacon
@ 2012-02-20  3:19                       ` Ming Lei
  2012-02-20  9:44                         ` Will Deacon
  0 siblings, 1 reply; 12+ messages in thread
From: Ming Lei @ 2012-02-20  3:19 UTC (permalink / raw)
  To: Will Deacon
  Cc: Peter Zijlstra, eranian, Shilimkar, Santosh, David Long,
	b-cousson, mans, linux-arm, Ingo Molnar, Linux Kernel Mailing List

Hi,

On Fri, Feb 17, 2012 at 6:26 PM, Will Deacon <will.deacon@arm.com> wrote:
> I don't want to assume that the counters are 32-bit in this code as we may
> want to plug in some other PMU with 16-bit counters, for example. That's
> why we have max_period defined for each PMU. Furthermore, it still doesn't
> help us in the stat case, where prev will typically be quite small after
> we've had an overflow and new may well overtake it.

In fact, I had suggested keeping the overflow information to handle that
case by reading the hardware counter's overflow flag, but it looks a bit
fragile to keep the overflow flag in sync with the counter value outside
the interrupt path, so I agree with you on removing the 'overflow'
parameter from armpmu_event_update.

> Not sure I follow. If the frequency is 10000, then we write 0xffffd8f0 to
> the counter. That means we have a 0xffffd8f0 event window to read the
> counter after it overflows before new overtakes prev and we get confused.
> If this passed in 100us then either your clock speed is 4.3*10^12Hz or you
> have a seriously wide issue :)

The frequency I described is the 'freq' from '-F freq'. On OMAP4, when
'freq' is 10000 and the interval for one sample is 100us, the observed
count (the 'left' variable in armpmu_event_set_period) is about 90000,
so the value written to the hardware counter is 0xFFFEA06F. The window
looks wide enough, so we probably need not worry about the a737823d
issue for sample-based profiling.

On OMAP4, about 19800 sample events are generated by the command:

'perf record -e cycles -F 10000 ./noploop 2 && perf report -D | tail -20'

so the result does not look bad. :-)

> There's a trade-off between allowing the counter to wrap back around past
> its previous value and being able to handle overflow on a non-interrupt
> path.

I agree; it is not easy to read the overflow flag and the counter value
atomically from the hardware on a non-interrupt path.

So do you need me to prepare a new patch, or will you do it yourself?

thanks,
--
Ming Lei
* Re: oprofile and ARM A9 hardware counter
  2012-02-20  3:19                       ` Ming Lei
@ 2012-02-20  9:44                         ` Will Deacon
  0 siblings, 0 replies; 12+ messages in thread
From: Will Deacon @ 2012-02-20  9:44 UTC (permalink / raw)
  To: Ming Lei
  Cc: Peter Zijlstra, eranian, Shilimkar, Santosh, David Long,
	b-cousson, mans, linux-arm, Ingo Molnar, Linux Kernel Mailing List

On Mon, Feb 20, 2012 at 03:19:02AM +0000, Ming Lei wrote:
> On Fri, Feb 17, 2012 at 6:26 PM, Will Deacon <will.deacon@arm.com> wrote:
> > Not sure I follow. If the frequency is 10000, then we write 0xffffd8f0 to
> > the counter. That means we have a 0xffffd8f0 event window to read the
> > counter after it overflows.
>
> The frequency I described is the 'freq' from '-F freq'. On OMAP4, when
> 'freq' is 10000 and the interval for one sample is 100us, the observed
> count (the 'left' variable in armpmu_event_set_period) is about 90000,
> so the value written to the hardware counter is 0xFFFEA06F. The window
> looks wide enough, so we probably need not worry about the a737823d
> issue for sample-based profiling.

My mistake - I thought you were referring to a period of 10000, but
similar logic follows.

> I agree; it is not easy to read the overflow flag and the counter value
> atomically from the hardware on a non-interrupt path.
>
> So do you need me to prepare a new patch, or will you do it yourself?

If the last one I posted is alright with you (it seems that it is), then
I'll update it with the removal of the overflow flag and post it to the
list with you on CC. I also need to get to the bottom of those warnings.

Thanks,

Will