* [PATCH] ARM: perf: ensure counter delta is limited to 32-bits
@ 2010-07-02 12:44 Will Deacon
2010-07-02 18:05 ` Jamie Iles
0 siblings, 1 reply; 8+ messages in thread
From: Will Deacon @ 2010-07-02 12:44 UTC (permalink / raw)
To: linux-arm-kernel
Hardware performance counters on ARM are 32-bits wide but
atomic64_t variables are used to represent counter data
in the hw_perf_event structure.
The armpmu_event_update function right-shifts a signed
64-bit delta variable and adds the result to the event count.
This can lead to shifting in sign-bits if the MSB of the
32-bit counter value is set. This results in perf output such as:
Performance counter stats for 'sleep 20':
18446744073460670464 cycles <-- 0xFFFFFFFFF12A6000
7783773 instructions # 0.000 IPC
465 context-switches
161 page-faults
1172393 branches
20.154242147 seconds time elapsed
This patch ensures that only the bottom 32 bits of the delta
value are used.
Cc: Jamie Iles <jamie.iles@picochip.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
arch/arm/kernel/perf_event.c | 7 ++-----
1 files changed, 2 insertions(+), 5 deletions(-)
diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
index c457686..c7f7a14 100644
--- a/arch/arm/kernel/perf_event.c
+++ b/arch/arm/kernel/perf_event.c
@@ -199,9 +199,7 @@ armpmu_event_update(struct perf_event *event,
struct hw_perf_event *hwc,
int idx)
{
- int shift = 64 - 32;
- s64 prev_raw_count, new_raw_count;
- s64 delta;
+ s64 prev_raw_count, new_raw_count, delta;
again:
prev_raw_count = atomic64_read(&hwc->prev_count);
@@ -211,8 +209,7 @@ again:
new_raw_count) != prev_raw_count)
goto again;
- delta = (new_raw_count << shift) - (prev_raw_count << shift);
- delta >>= shift;
+ delta = (new_raw_count - prev_raw_count) & 0xffffffff;
atomic64_add(delta, &event->count);
atomic64_sub(delta, &hwc->period_left);
--
1.6.3.3
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH] ARM: perf: ensure counter delta is limited to 32-bits
2010-07-02 18:05 ` Jamie Iles
@ 2010-07-02 13:22 ` Will Deacon
2010-07-02 13:38 ` Will Deacon
1 sibling, 0 replies; 8+ messages in thread
From: Will Deacon @ 2010-07-02 13:22 UTC (permalink / raw)
To: linux-arm-kernel
Hi Jamie,
> Well spotted! I think this may have actually been a typo when porting to ARM
> from the sparc and x86 code, and this should address it so we do the same:
>
> diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
> index 9e70f20..6c0f3ca 100644
> --- a/arch/arm/kernel/perf_event.c
> +++ b/arch/arm/kernel/perf_event.c
> @@ -164,7 +164,7 @@ armpmu_event_update(struct perf_event *event,
> int idx)
> {
> int shift = 64 - 32;
> - s64 prev_raw_count, new_raw_count;
> + u64 prev_raw_count, new_raw_count;
> s64 delta;
>
> again:
I wondered about this approach, but I couldn't convince myself it was
correct. When armpmu_event_set_period updates hwc->prev_count, it sets
the top 32-bits to 1. When we do the cmpxchg in armpmu_event_update, we
will only be writing the bottom 32-bits. This means that prev_count
alternates between s64 and u64 and I'm not sure what the rest of the perf
framework will do in that case.
Tell you what, I'll make the change and see what happens to the numbers...
Will
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH] ARM: perf: ensure counter delta is limited to 32-bits
2010-07-02 18:05 ` Jamie Iles
2010-07-02 13:22 ` Will Deacon
@ 2010-07-02 13:38 ` Will Deacon
2010-07-02 18:48 ` Jamie Iles
1 sibling, 1 reply; 8+ messages in thread
From: Will Deacon @ 2010-07-02 13:38 UTC (permalink / raw)
To: linux-arm-kernel
Ok, the results are in!
> Well spotted! I think this may have actually been a typo when porting to ARM
> from the sparc and x86 code, and this should address it so we do the same:
>
> diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
> index 9e70f20..6c0f3ca 100644
> --- a/arch/arm/kernel/perf_event.c
> +++ b/arch/arm/kernel/perf_event.c
> @@ -164,7 +164,7 @@ armpmu_event_update(struct perf_event *event,
> int idx)
> {
> int shift = 64 - 32;
> - s64 prev_raw_count, new_raw_count;
> + u64 prev_raw_count, new_raw_count;
> s64 delta;
>
> again:
Performance counter stats for 'git status':
3650781413 cycles
289950734 instructions # 0.079 IPC
144882 context-switches
13677 page-faults
473580406 branches
82.426290000 seconds time elapsed
Which looks insane to me. The IPC is appalling and we've taken
more branches than we've executed instructions!
Will
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH] ARM: perf: ensure counter delta is limited to 32-bits
2010-07-02 18:48 ` Jamie Iles
@ 2010-07-02 14:17 ` Will Deacon
2010-07-02 20:04 ` Jamie Iles
0 siblings, 1 reply; 8+ messages in thread
From: Will Deacon @ 2010-07-02 14:17 UTC (permalink / raw)
To: linux-arm-kernel
> > Performance counter stats for 'git status':
> >
> > 3650781413 cycles
> > 289950734 instructions # 0.079 IPC
> > 144882 context-switches
> > 13677 page-faults
> > 473580406 branches
> >
> > 82.426290000 seconds time elapsed
> >
> >
> > Which looks insane to me. The IPC is appalling and we've taken
> > more branches than we've executed instructions!
> No, that certainly doesn't look right. Referring back your earlier email, when
> we do the cmpxchng, shouldn't it get sign extended then? An atomic64_t should
> be signed, but looking at the arm implementation, we're using u64's. Could
> this have anything to do with it? The generic implementation in lib/atomic64.c
> uses 'long long's.
It's a 64-bit cmpxchg and we're passing in 64-bit arguments so I don't
think any sign-extension should occur. The real issue is that we're taking
the difference between two u32 quantities (prev_raw_count and new_raw_count)
but using s64s, so when the delta is shifted right we treat the u32s as signed
when they're not.
> Providing that the sign extension does work, I've had a quick play with some
> uint64_t's and int64_t's in userspace and I think the algorithm is ok. Is the
> assumption of atomic64_t's being signed wrong?
I don't think the signedness of atomic64_t actually matters because everything
is handled in assembly anyway.
If you do something like:
#include <stdio.h>
typedef unsigned long u32;
typedef signed long long s64;
typedef unsigned long long u64;
int main(void)
{
u64 x = 0xffffffff;
u64 y = 0x0fffffff;
s64 z = (x << 32) - (y << 32);
z >>= 32;
printf("0x%llx\n", z);
return 0;
}
You'll get the sign-extended number printed out. Actually, a much neater
fix to this problem is to make delta unsigned and leave everything else
as-is (although I'm not a massive fan of all the 64-bit shifting).
diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
index c457686..de12536 100644
--- a/arch/arm/kernel/perf_event.c
+++ b/arch/arm/kernel/perf_event.c
@@ -201,7 +201,7 @@ armpmu_event_update(struct perf_event *event,
{
int shift = 64 - 32;
s64 prev_raw_count, new_raw_count;
- s64 delta;
+ u64 delta;
again:
prev_raw_count = atomic64_read(&hwc->prev_count);
What do you think?
Will
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH] ARM: perf: ensure counter delta is limited to 32-bits
2010-07-02 20:04 ` Jamie Iles
@ 2010-07-02 15:36 ` Will Deacon
0 siblings, 0 replies; 8+ messages in thread
From: Will Deacon @ 2010-07-02 15:36 UTC (permalink / raw)
To: linux-arm-kernel
> > diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
> > index c457686..de12536 100644
> > --- a/arch/arm/kernel/perf_event.c
> > +++ b/arch/arm/kernel/perf_event.c
> > @@ -201,7 +201,7 @@ armpmu_event_update(struct perf_event *event,
> > {
> > int shift = 64 - 32;
> > s64 prev_raw_count, new_raw_count;
> > - s64 delta;
> > + u64 delta;
> >
> > again:
> > prev_raw_count = atomic64_read(&hwc->prev_count);
> >
> >
> > What do you think?
> Yes, that looks good to me.
>
> Acked-by: Jamie Iles <jamie.iles@picochip.com>
Thanks. I'll submit this to the patch system with an updated
description and your ack.
Will
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH] ARM: perf: ensure counter delta is limited to 32-bits
2010-07-02 12:44 [PATCH] ARM: perf: ensure counter delta is limited to 32-bits Will Deacon
@ 2010-07-02 18:05 ` Jamie Iles
2010-07-02 13:22 ` Will Deacon
2010-07-02 13:38 ` Will Deacon
0 siblings, 2 replies; 8+ messages in thread
From: Jamie Iles @ 2010-07-02 18:05 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, Jul 02, 2010 at 01:44:02PM +0100, Will Deacon wrote:
> Hardware performance counters on ARM are 32-bits wide but
> atomic64_t variables are used to represent counter data
> in the hw_perf_event structure.
>
> The armpmu_event_update function right-shifts a signed
> 64-bit delta variable and adds the result to the event count.
> This can lead to shifting in sign-bits if the MSB of the
> 32-bit counter value is set. This results in perf output such as:
>
> Performance counter stats for 'sleep 20':
>
> 18446744073460670464 cycles <-- 0xFFFFFFFFF12A6000
> 7783773 instructions # 0.000 IPC
> 465 context-switches
> 161 page-faults
> 1172393 branches
> 20.154242147 seconds time elapsed
>
> This patch ensures that only the bottom 32 bits of the delta
> value are used.
>
> Cc: Jamie Iles <jamie.iles@picochip.com>
> Signed-off-by: Will Deacon <will.deacon@arm.com>
Hi Will,
Well spotted! I think this may have actually been a typo when porting to ARM
from the sparc and x86 code, and this should address it so we do the same:
diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
index 9e70f20..6c0f3ca 100644
--- a/arch/arm/kernel/perf_event.c
+++ b/arch/arm/kernel/perf_event.c
@@ -164,7 +164,7 @@ armpmu_event_update(struct perf_event *event,
int idx)
{
int shift = 64 - 32;
- s64 prev_raw_count, new_raw_count;
+ u64 prev_raw_count, new_raw_count;
s64 delta;
again:
This way we can cope with negative deltas if we ever need to.
Jamie
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH] ARM: perf: ensure counter delta is limited to 32-bits
2010-07-02 13:38 ` Will Deacon
@ 2010-07-02 18:48 ` Jamie Iles
2010-07-02 14:17 ` Will Deacon
0 siblings, 1 reply; 8+ messages in thread
From: Jamie Iles @ 2010-07-02 18:48 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, Jul 02, 2010 at 02:38:58PM +0100, Will Deacon wrote:
> Ok, the results are in!
>
> > Well spotted! I think this may have actually been a typo when porting to ARM
> > from the sparc and x86 code, and this should address it so we do the same:
> >
> > diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
> > index 9e70f20..6c0f3ca 100644
> > --- a/arch/arm/kernel/perf_event.c
> > +++ b/arch/arm/kernel/perf_event.c
> > @@ -164,7 +164,7 @@ armpmu_event_update(struct perf_event *event,
> > int idx)
> > {
> > int shift = 64 - 32;
> > - s64 prev_raw_count, new_raw_count;
> > + u64 prev_raw_count, new_raw_count;
> > s64 delta;
> >
> > again:
>
> Performance counter stats for 'git status':
>
> 3650781413 cycles
> 289950734 instructions # 0.079 IPC
> 144882 context-switches
> 13677 page-faults
> 473580406 branches
>
> 82.426290000 seconds time elapsed
>
>
> Which looks insane to me. The IPC is appalling and we've taken
> more branches than we've executed instructions!
No, that certainly doesn't look right. Referring back your earlier email, when
we do the cmpxchng, shouldn't it get sign extended then? An atomic64_t should
be signed, but looking at the arm implementation, we're using u64's. Could
this have anything to do with it? The generic implementation in lib/atomic64.c
uses 'long long's.
Providing that the sign extension does work, I've had a quick play with some
uint64_t's and int64_t's in userspace and I think the algorithm is ok. Is the
assumption of atomic64_t's being signed wrong?
Jamie
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH] ARM: perf: ensure counter delta is limited to 32-bits
2010-07-02 14:17 ` Will Deacon
@ 2010-07-02 20:04 ` Jamie Iles
2010-07-02 15:36 ` Will Deacon
0 siblings, 1 reply; 8+ messages in thread
From: Jamie Iles @ 2010-07-02 20:04 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, Jul 02, 2010 at 03:17:23PM +0100, Will Deacon wrote:
> I don't think the signedness of atomic64_t actually matters because everything
> is handled in assembly anyway.
Good point. I guess we could probably cast the new value to an s32 before
doing the cmpxchng to get the sign extension, but a lot of this stuff isn't
obvious anyway and your fix is certainly cleaner.
[...]
>
> Actually, a much neater fix to this problem is to make delta unsigned and
> leave everything else as-is (although I'm not a massive fan of all the
> 64-bit shifting).
>
> diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
> index c457686..de12536 100644
> --- a/arch/arm/kernel/perf_event.c
> +++ b/arch/arm/kernel/perf_event.c
> @@ -201,7 +201,7 @@ armpmu_event_update(struct perf_event *event,
> {
> int shift = 64 - 32;
> s64 prev_raw_count, new_raw_count;
> - s64 delta;
> + u64 delta;
>
> again:
> prev_raw_count = atomic64_read(&hwc->prev_count);
>
>
> What do you think?
Yes, that looks good to me.
Acked-by: Jamie Iles <jamie.iles@picochip.com>
Jamie
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2010-07-02 20:04 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-07-02 12:44 [PATCH] ARM: perf: ensure counter delta is limited to 32-bits Will Deacon
2010-07-02 18:05 ` Jamie Iles
2010-07-02 13:22 ` Will Deacon
2010-07-02 13:38 ` Will Deacon
2010-07-02 18:48 ` Jamie Iles
2010-07-02 14:17 ` Will Deacon
2010-07-02 20:04 ` Jamie Iles
2010-07-02 15:36 ` Will Deacon
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.