* [patch] perf: ARMv7 wrong "branches" generalized instruction
@ 2011-08-10 17:40 Vince Weaver
  2011-08-10 18:33 ` Will Deacon
From: Vince Weaver @ 2011-08-10 17:40 UTC
  To: linux-kernel
  Cc: will.deacon, sam wang, Ingo Molnar, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, Stephane Eranian

Hello

Sam Wang reported to me that my perf_event validation tests were failing
for the branches event on ARM Cortex A9.

It turns out the branches event used (ARMV7_PERFCTR_PC_WRITE) only seems
to count taken branches.

ARMV7_PERFCTR_PC_IMM_BRANCH seems to do a better job of counting both
taken and not-taken branches.  So I've attached a patch to change the
definition for Cortex A9.

The same change might be needed for Cortex A8, but I don't have a machine
to test on (yet).
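
For checking another core, a minimal user-space sketch of comparing the two
events could look like the following. It is not the validation test itself:
`count_event` and `demo_workload` are hypothetical names, and the raw codes
0x0C (PC_WRITE) and 0x0D (PC_IMM_BRANCH) are assumed from the ARMv7 common
event numbering, so check them against perf_event_v7.c before relying on
them. The exclude_* bits are left clear, which may require root or a relaxed
perf_event_paranoid setting.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumed raw event numbers (ARMv7 common events); verify before use. */
#define ASSUMED_RAW_PC_WRITE      0x0C
#define ASSUMED_RAW_PC_IMM_BRANCH 0x0D

/* Hypothetical stand-in workload: a simple loop with one branch per pass. */
void demo_workload(void)
{
    volatile unsigned i;

    for (i = 0; i < 500000; i++)
        ;
}

/*
 * Hypothetical helper: count one event on the calling thread while
 * workload() runs. Returns the count, or -1 if the counter could not
 * be opened or read.
 */
long long count_event(uint32_t type, uint64_t config, void (*workload)(void))
{
    struct perf_event_attr attr;
    uint64_t value = 0;
    int fd;

    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = type;       /* PERF_TYPE_HARDWARE or PERF_TYPE_RAW */
    attr.config = config;   /* generalized event id or raw event number */
    attr.disabled = 1;      /* start stopped; enabled explicitly below */

    /* pid 0 = this thread, cpu -1 = any CPU, no group, no flags */
    fd = (int)syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0UL);
    if (fd < 0)
        return -1;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    workload();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    if (read(fd, &value, sizeof(value)) != (ssize_t)sizeof(value)) {
        close(fd);
        return -1;
    }
    close(fd);
    return (long long)value;
}
```

Calling `count_event(PERF_TYPE_RAW, ASSUMED_RAW_PC_WRITE, demo_workload)` and
then the same with `ASSUMED_RAW_PC_IMM_BRANCH` around an identical workload
should show whether one of the events misses not-taken branches. Passing
`PERF_TYPE_HARDWARE` with `PERF_COUNT_HW_BRANCH_INSTRUCTIONS` reads whatever
the kernel maps the generalized event to.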

I'm assuming this is a proper fix.  The "generalized" events aren't 
defined very well so there's always some wiggle room about what they mean.
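
As context for what a generalized event is in practice: each one is an index
into a per-CPU table of raw event numbers, so its meaning is whatever raw
event the table names. A simplified sketch of that lookup follows. The enum,
table, and function names are hypothetical stand-ins rather than the kernel's
definitions, and the raw numbers are assumed from the ARMv7 common event list.

```c
/* Hypothetical, cut-down version of the generalized event indices. */
enum demo_generic_event {
    DEMO_HW_BRANCH_INSTRUCTIONS,
    DEMO_HW_BRANCH_MISSES,
    DEMO_HW_BUS_CYCLES,
    DEMO_HW_MAX
};

#define DEMO_UNSUPPORTED         0xFFFFu /* marker for "no such event here" */
#define DEMO_RAW_PC_IMM_BRANCH   0x0Du   /* assumed raw number */
#define DEMO_RAW_BRANCH_MIS_PRED 0x10u   /* assumed raw number */

/* Designated initializers tie each generalized index to one raw event. */
static const unsigned demo_perf_map[DEMO_HW_MAX] = {
    [DEMO_HW_BRANCH_INSTRUCTIONS] = DEMO_RAW_PC_IMM_BRANCH,
    [DEMO_HW_BRANCH_MISSES]       = DEMO_RAW_BRANCH_MIS_PRED,
    [DEMO_HW_BUS_CYCLES]          = DEMO_UNSUPPORTED,
};

/* Return the raw event number for idx, or -1 if it is not supported. */
int demo_map_event(unsigned idx)
{
    if (idx >= DEMO_HW_MAX || demo_perf_map[idx] == DEMO_UNSUPPORTED)
        return -1;
    return (int)demo_perf_map[idx];
}
```

Changing what "branches" counts on a given CPU then amounts to changing a
single table entry.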

Patch tested on a Pandaboard.

The test code looks like this.  There should be roughly 500,000*3 branches,
but the second branch (the not-taken "bge test_jmp2") is not counted with
PC_WRITE.

        asm(    "\teor r3,r3,r3\n"
                "\tldr r3,=500000\n"
                "test_loop:\n"
                "\tB test_jmp\n"
                "\tnop\n"
                "test_jmp:\n"
                "\teor r2,r2,r2\n"
                "\tcmp r2,#1\n"
                "\tbge test_jmp2\n"     /* never taken: r2 is 0 here */
                "\tnop\n"
                "\tadd r2,r2,#1\n"
                "test_jmp2:\n"
                "\tsub r3,r3,#1\n"
                "\tcmp r3,#1\n"
                "\tbgt test_loop\n"
                : /* no output registers */
                : /* no inputs           */
                : "cc", "r2", "r3" /* clobbered */
        );
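
The expected counts can be worked out by mirroring the loop's control flow in
plain C. This is only an illustrative sketch: `simulate_test_loop` and
`struct branch_counts` are hypothetical names, and treating "taken" as what
PC_WRITE reports is an assumption based on the observation in this mail, not
on documentation.

```c
struct branch_counts {
    long executed;  /* every branch instruction that ran */
    long taken;     /* only the branches that changed the PC */
};

/* Walk the same control flow as the asm test loop, starting with r3 = start. */
struct branch_counts simulate_test_loop(long start)
{
    struct branch_counts c = {0, 0};
    long r3 = start;
    int again;

    do {
        long r2;

        /* B test_jmp: unconditional, always taken */
        c.executed++;
        c.taken++;

        /* eor r2,r2,r2; cmp r2,#1; bge test_jmp2: r2 is 0, so never taken */
        r2 = 0;
        c.executed++;
        if (r2 >= 1)
            c.taken++;

        /* sub r3,r3,#1; cmp r3,#1; bgt test_loop */
        r3 -= 1;
        c.executed++;
        again = (r3 > 1);
        if (again)
            c.taken++;
    } while (again);

    return c;
}
```

For `start = 500000` the loop body runs 499,999 times (it exits once r3
reaches 1), giving 1,499,997 executed branches and 999,997 taken ones. An
event that only counts taken branches would therefore report about two
thirds of the expected total.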


Vince
vweaver1@eecs.utk.edu

Signed-off-by: Vince Weaver <vweaver1@eecs.utk.edu>

diff --git a/arch/arm/kernel/perf_event_v7.c b/arch/arm/kernel/perf_event_v7.c
index 4c85183..4d11bd5 100644
--- a/arch/arm/kernel/perf_event_v7.c
+++ b/arch/arm/kernel/perf_event_v7.c
@@ -323,7 +323,7 @@ static const unsigned armv7_a9_perf_map[PERF_COUNT_HW_MAX] = {
 					ARMV7_PERFCTR_INST_OUT_OF_RENAME_STAGE,
 	[PERF_COUNT_HW_CACHE_REFERENCES]    = ARMV7_PERFCTR_COHERENT_LINE_HIT,
 	[PERF_COUNT_HW_CACHE_MISSES]	    = ARMV7_PERFCTR_COHERENT_LINE_MISS,
-	[PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = ARMV7_PERFCTR_PC_WRITE,
+	[PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = ARMV7_PERFCTR_PC_IMM_BRANCH,
 	[PERF_COUNT_HW_BRANCH_MISSES]	    = ARMV7_PERFCTR_PC_BRANCH_MIS_PRED,
 	[PERF_COUNT_HW_BUS_CYCLES]	    = ARMV7_PERFCTR_CLOCK_CYCLES,
 };



Thread overview: 10+ messages
2011-08-10 17:40 [patch] perf: ARMv7 wrong "branches" generalized instruction Vince Weaver
2011-08-10 18:33 ` Will Deacon
2011-08-10 19:01   ` Vince Weaver
2011-08-10 19:16     ` Måns Rullgård
2011-08-10 22:07     ` Will Deacon
2011-08-11  8:15       ` Ingo Molnar
2011-08-11  9:16         ` Will Deacon
2011-08-12 10:34           ` Ingo Molnar
2011-08-15 11:18             ` Will Deacon
2011-08-12  4:35         ` Vince Weaver
