All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] ARM: perf: ensure counter delta is limited to 32-bits
@ 2010-07-02 12:44 Will Deacon
  2010-07-02 18:05 ` Jamie Iles
  0 siblings, 1 reply; 8+ messages in thread
From: Will Deacon @ 2010-07-02 12:44 UTC (permalink / raw)
  To: linux-arm-kernel

Hardware performance counters on ARM are 32-bits wide but
atomic64_t variables are used to represent counter data
in the hw_perf_event structure.

The armpmu_event_update function right-shifts a signed
64-bit delta variable and adds the result to the event count.
This can lead to shifting in sign-bits if the MSB of the
32-bit counter value is set. This results in perf output such as:

 Performance counter stats for 'sleep 20':

 18446744073460670464  cycles             <-- 0xFFFFFFFFF12A6000
        7783773  instructions             #      0.000 IPC
            465  context-switches
            161  page-faults
        1172393  branches

   20.154242147  seconds time elapsed

This patch ensures that only the bottom 32 bits of the delta
value are used.

Cc: Jamie Iles <jamie.iles@picochip.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm/kernel/perf_event.c |    7 ++-----
 1 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
index c457686..c7f7a14 100644
--- a/arch/arm/kernel/perf_event.c
+++ b/arch/arm/kernel/perf_event.c
@@ -199,9 +199,7 @@ armpmu_event_update(struct perf_event *event,
 		    struct hw_perf_event *hwc,
 		    int idx)
 {
-	int shift = 64 - 32;
-	s64 prev_raw_count, new_raw_count;
-	s64 delta;
+	s64 prev_raw_count, new_raw_count, delta;
 
 again:
 	prev_raw_count = atomic64_read(&hwc->prev_count);
@@ -211,8 +209,7 @@ again:
 			     new_raw_count) != prev_raw_count)
 		goto again;
 
-	delta = (new_raw_count << shift) - (prev_raw_count << shift);
-	delta >>= shift;
+	delta = (new_raw_count - prev_raw_count) & 0xffffffff;
 
 	atomic64_add(delta, &event->count);
 	atomic64_sub(delta, &hwc->period_left);
-- 
1.6.3.3

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH] ARM: perf: ensure counter delta is limited to 32-bits
  2010-07-02 18:05 ` Jamie Iles
@ 2010-07-02 13:22   ` Will Deacon
  2010-07-02 13:38   ` Will Deacon
  1 sibling, 0 replies; 8+ messages in thread
From: Will Deacon @ 2010-07-02 13:22 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Jamie,

> Well spotted! I think this may have actually been a typo when porting to ARM
> from the sparc and x86 code, and this should address it so we do the same:
> 
> diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
> index 9e70f20..6c0f3ca 100644
> --- a/arch/arm/kernel/perf_event.c
> +++ b/arch/arm/kernel/perf_event.c
> @@ -164,7 +164,7 @@ armpmu_event_update(struct perf_event *event,
>  		    int idx)
>  {
>  	int shift = 64 - 32;
> -	s64 prev_raw_count, new_raw_count;
> +	u64 prev_raw_count, new_raw_count;
>  	s64 delta;
> 
>  again:

I wondered about this approach, but I couldn't convince myself it was
correct. When armpmu_event_set_period updates hwc->prev_count, it sets
the top 32-bits to 1. When we do the cmpxchg in armpmu_event_update, we
will only be writing the bottom 32-bits. This means that prev_count
alternates between s64 and u64 and I'm not sure what the rest of the perf
framework will do in that case.

Tell you what, I'll make the change and see what happens to the numbers...

Will

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH] ARM: perf: ensure counter delta is limited to 32-bits
  2010-07-02 18:05 ` Jamie Iles
  2010-07-02 13:22   ` Will Deacon
@ 2010-07-02 13:38   ` Will Deacon
  2010-07-02 18:48     ` Jamie Iles
  1 sibling, 1 reply; 8+ messages in thread
From: Will Deacon @ 2010-07-02 13:38 UTC (permalink / raw)
  To: linux-arm-kernel

Ok, the results are in!

> Well spotted! I think this may have actually been a typo when porting to ARM
> from the sparc and x86 code, and this should address it so we do the same:
> 
> diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
> index 9e70f20..6c0f3ca 100644
> --- a/arch/arm/kernel/perf_event.c
> +++ b/arch/arm/kernel/perf_event.c
> @@ -164,7 +164,7 @@ armpmu_event_update(struct perf_event *event,
>  		    int idx)
>  {
>  	int shift = 64 - 32;
> -	s64 prev_raw_count, new_raw_count;
> +	u64 prev_raw_count, new_raw_count;
>  	s64 delta;
> 
>  again:

 Performance counter stats for 'git status':

         3650781413  cycles                  
          289950734  instructions             #      0.079 IPC  
             144882  context-switches        
              13677  page-faults             
          473580406  branches                

       82.426290000  seconds time elapsed


Which looks insane to me. The IPC is appalling and we've taken
more branches than we've executed instructions!

Will

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH] ARM: perf: ensure counter delta is limited to 32-bits
  2010-07-02 18:48     ` Jamie Iles
@ 2010-07-02 14:17       ` Will Deacon
  2010-07-02 20:04         ` Jamie Iles
  0 siblings, 1 reply; 8+ messages in thread
From: Will Deacon @ 2010-07-02 14:17 UTC (permalink / raw)
  To: linux-arm-kernel

> >  Performance counter stats for 'git status':
> >
> >          3650781413  cycles
> >           289950734  instructions             #      0.079 IPC
> >              144882  context-switches
> >               13677  page-faults
> >           473580406  branches
> >
> >        82.426290000  seconds time elapsed
> >
> >
> > Which looks insane to me. The IPC is appalling and we've taken
> > more branches than we've executed instructions!
> No, that certainly doesn't look right. Referring back your earlier email, when
> we do the cmpxchng, shouldn't it get sign extended then? An atomic64_t should
> be signed, but looking at the arm implementation, we're using u64's. Could
> this have anything to do with it? The generic implementation in lib/atomic64.c
> uses 'long long's.

It's a 64-bit cmpxchg and we're passing in 64-bit arguments so I don't
think any sign-extension should occur. The real issue is that we're taking
the difference between two u32 quantities (prev_raw_count and new_raw_count)
but using s64s, so when the delta is shifted right we treat the u32s as signed
when they're not.
 
> Providing that the sign extension does work, I've had a quick play with some
> uint64_t's and int64_t's in userspace and I think the algorithm is ok. Is the
> assumption of atomic64_t's being signed wrong?

I don't think the signedness of atomic64_t actually matters because everything
is handled in assembly anyway.

If you do something like:

#include <stdio.h>

typedef unsigned long u32;
typedef signed long long s64;
typedef unsigned long long u64;

int main(void)
{
	u64 x = 0xffffffff;
	u64 y = 0x0fffffff;
	s64 z = (x << 32) - (y << 32);
	z >>= 32;
	printf("0x%llx\n", z);
	return 0;
}

You'll get the sign-extended number printed out. Actually, a much neater
fix to this problem is to make delta unsigned and leave everything else
as-is (although I'm not a massive fan of all the 64-bit shifting).

diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
index c457686..de12536 100644
--- a/arch/arm/kernel/perf_event.c
+++ b/arch/arm/kernel/perf_event.c
@@ -201,7 +201,7 @@ armpmu_event_update(struct perf_event *event,
 {
        int shift = 64 - 32;
        s64 prev_raw_count, new_raw_count;
-       s64 delta;
+       u64 delta;
 
 again:
        prev_raw_count = atomic64_read(&hwc->prev_count);


What do you think?

Will

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH] ARM: perf: ensure counter delta is limited to 32-bits
  2010-07-02 20:04         ` Jamie Iles
@ 2010-07-02 15:36           ` Will Deacon
  0 siblings, 0 replies; 8+ messages in thread
From: Will Deacon @ 2010-07-02 15:36 UTC (permalink / raw)
  To: linux-arm-kernel

> > diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
> > index c457686..de12536 100644
> > --- a/arch/arm/kernel/perf_event.c
> > +++ b/arch/arm/kernel/perf_event.c
> > @@ -201,7 +201,7 @@ armpmu_event_update(struct perf_event *event,
> >  {
> >         int shift = 64 - 32;
> >         s64 prev_raw_count, new_raw_count;
> > -       s64 delta;
> > +       u64 delta;
> >
> >  again:
> >         prev_raw_count = atomic64_read(&hwc->prev_count);
> >
> >
> > What do you think?
> Yes, that looks good to me.
> 
> Acked-by: Jamie Iles <jamie.iles@picochip.com>

Thanks. I'll submit this to the patch system with an updated
description and your ack.

Will

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH] ARM: perf: ensure counter delta is limited to 32-bits
  2010-07-02 12:44 [PATCH] ARM: perf: ensure counter delta is limited to 32-bits Will Deacon
@ 2010-07-02 18:05 ` Jamie Iles
  2010-07-02 13:22   ` Will Deacon
  2010-07-02 13:38   ` Will Deacon
  0 siblings, 2 replies; 8+ messages in thread
From: Jamie Iles @ 2010-07-02 18:05 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jul 02, 2010 at 01:44:02PM +0100, Will Deacon wrote:
> Hardware performance counters on ARM are 32-bits wide but
> atomic64_t variables are used to represent counter data
> in the hw_perf_event structure.
> 
> The armpmu_event_update function right-shifts a signed
> 64-bit delta variable and adds the result to the event count.
> This can lead to shifting in sign-bits if the MSB of the
> 32-bit counter value is set. This results in perf output such as:
> 
>  Performance counter stats for 'sleep 20':
> 
>  18446744073460670464  cycles             <-- 0xFFFFFFFFF12A6000
>         7783773  instructions             #      0.000 IPC
>             465  context-switches
>             161  page-faults
>         1172393  branches
>    20.154242147  seconds time elapsed
> 
> This patch ensures that only the bottom 32 bits of the delta
> value are used.
> 
> Cc: Jamie Iles <jamie.iles@picochip.com>
> Signed-off-by: Will Deacon <will.deacon@arm.com>
Hi Will,

Well spotted! I think this may have actually been a typo when porting to ARM
from the sparc and x86 code, and this should address it so we do the same:

diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
index 9e70f20..6c0f3ca 100644
--- a/arch/arm/kernel/perf_event.c
+++ b/arch/arm/kernel/perf_event.c
@@ -164,7 +164,7 @@ armpmu_event_update(struct perf_event *event,
 		    int idx)
 {
 	int shift = 64 - 32;
-	s64 prev_raw_count, new_raw_count;
+	u64 prev_raw_count, new_raw_count;
 	s64 delta;
 
 again:

This way we can cope with negative deltas if we ever need to.

Jamie

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH] ARM: perf: ensure counter delta is limited to 32-bits
  2010-07-02 13:38   ` Will Deacon
@ 2010-07-02 18:48     ` Jamie Iles
  2010-07-02 14:17       ` Will Deacon
  0 siblings, 1 reply; 8+ messages in thread
From: Jamie Iles @ 2010-07-02 18:48 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jul 02, 2010 at 02:38:58PM +0100, Will Deacon wrote:
> Ok, the results are in!
> 
> > Well spotted! I think this may have actually been a typo when porting to ARM
> > from the sparc and x86 code, and this should address it so we do the same:
> > 
> > diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
> > index 9e70f20..6c0f3ca 100644
> > --- a/arch/arm/kernel/perf_event.c
> > +++ b/arch/arm/kernel/perf_event.c
> > @@ -164,7 +164,7 @@ armpmu_event_update(struct perf_event *event,
> >  		    int idx)
> >  {
> >  	int shift = 64 - 32;
> > -	s64 prev_raw_count, new_raw_count;
> > +	u64 prev_raw_count, new_raw_count;
> >  	s64 delta;
> > 
> >  again:
> 
>  Performance counter stats for 'git status':
> 
>          3650781413  cycles                  
>           289950734  instructions             #      0.079 IPC  
>              144882  context-switches        
>               13677  page-faults             
>           473580406  branches                
> 
>        82.426290000  seconds time elapsed
> 
> 
> Which looks insane to me. The IPC is appalling and we've taken
> more branches than we've executed instructions!
No, that certainly doesn't look right. Referring back your earlier email, when
we do the cmpxchng, shouldn't it get sign extended then? An atomic64_t should
be signed, but looking at the arm implementation, we're using u64's. Could
this have anything to do with it? The generic implementation in lib/atomic64.c
uses 'long long's.

Providing that the sign extension does work, I've had a quick play with some
uint64_t's and int64_t's in userspace and I think the algorithm is ok. Is the
assumption of atomic64_t's being signed wrong?

Jamie

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH] ARM: perf: ensure counter delta is limited to 32-bits
  2010-07-02 14:17       ` Will Deacon
@ 2010-07-02 20:04         ` Jamie Iles
  2010-07-02 15:36           ` Will Deacon
  0 siblings, 1 reply; 8+ messages in thread
From: Jamie Iles @ 2010-07-02 20:04 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jul 02, 2010 at 03:17:23PM +0100, Will Deacon wrote:
> I don't think the signedness of atomic64_t actually matters because everything
> is handled in assembly anyway.
Good point. I guess we could probably cast the new value to an s32 before
doing the cmpxchng to get the sign extension, but a lot of this stuff isn't
obvious anyway and your fix is certainly cleaner.

[...]
> 
> Actually, a much neater fix to this problem is to make delta unsigned and
> leave everything else as-is (although I'm not a massive fan of all the
> 64-bit shifting).
> 
> diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
> index c457686..de12536 100644
> --- a/arch/arm/kernel/perf_event.c
> +++ b/arch/arm/kernel/perf_event.c
> @@ -201,7 +201,7 @@ armpmu_event_update(struct perf_event *event,
>  {
>         int shift = 64 - 32;
>         s64 prev_raw_count, new_raw_count;
> -       s64 delta;
> +       u64 delta;
>  
>  again:
>         prev_raw_count = atomic64_read(&hwc->prev_count);
> 
> 
> What do you think?
Yes, that looks good to me.

Acked-by: Jamie Iles <jamie.iles@picochip.com>

Jamie

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2010-07-02 20:04 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-07-02 12:44 [PATCH] ARM: perf: ensure counter delta is limited to 32-bits Will Deacon
2010-07-02 18:05 ` Jamie Iles
2010-07-02 13:22   ` Will Deacon
2010-07-02 13:38   ` Will Deacon
2010-07-02 18:48     ` Jamie Iles
2010-07-02 14:17       ` Will Deacon
2010-07-02 20:04         ` Jamie Iles
2010-07-02 15:36           ` Will Deacon

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.