* Instruction/Cycle Counting in Guest Using the Kvm PMU
@ 2018-11-23  9:36 Jan Bolke
  2018-11-23 12:29 ` James Morse
  0 siblings, 1 reply; 6+ messages in thread
From: Jan Bolke @ 2018-11-23  9:36 UTC (permalink / raw)
  To: kvmarm



Hi,

I am not sure if this question is well-placed here, so sorry if it misses the purpose of this mailing list.

My name is Jan and I am currently writing my master's thesis.
I am using the KVM API and trying to integrate it as an instruction set simulator in a SystemC environment.

Anyway,

I need some mechanism to count executed instructions in the guest (or cycles).
Currently I am trying to use the emulated PMU cycle counter in the guest to get the number of executed cycles in the guest.

I am working on Arm64 and use Linux kernel 4.14.33.
I create the PMU device without creating an in-kernel vgic.

Basically I create a vcpu and run some bare metal code.
For convenience, I append the critical assembler snippet.

I configure the counter, then start the counter, execute 3 or 4 dummy instructions and read the counter again and then exit the guest with an exit_mmio.
I assumed the value should be a very small number, as the guest only executed a few instructions.

When I read the counter, the value is something like 2970 or 0 (it changes on each run).
So to me it looks like the counter is also counting the cycles for instruction emulation in the host, am I right?

Is it possible to count only the cycles in the guest, from the guest's point of view?
I read the kvm-api.txt Documentation and the other documents a few times and tried different approaches, so this mailing list is my last resort.

Thanks in advance!

Regards
Jan

--------------------------------------------------

APPENDIX:
// we are in el1
// init system registers
LDR X1, =0x30C50838
MSR SCTLR_EL1, X1

// enable access to pmu counters from el0
mov x0, 0xff
mrs x1, currentel
mrs x7, pmuserenr_el0
orr x7, x7, #0b1111
msr pmuserenr_el0, x7

// set pmcr register (control register)
//enable long counter, count every cycle and enable counters
mrs x5, pmcr_el0
orr x5, x5, #0b1
orr x5, x5, #(1<<6)
eor x5, x5, #(1<<3)
eor x5, x5, #(1<<5)
msr pmcr_el0, x5

// read pmccfiltr register (only enable counting of el1)
mrs x6, pmccfiltr_el0
mov x6, #(1<<30)
msr pmccfiltr_el0, x6

// get interrupt configuration and clear overflow bit
mrs x9, pmintenset_el1
mov x8, #(1<<31)
msr pmovsclr_el0, x8

// write counter
mov x0, #0x0
msr pmccntr_el0, x0 // write counter

// enable cycle counter
mov x1, #(1<<31)
msr pmcntenset_el0, x1
mov x0, #0x2

// dummy instruction and provoke mmio-exit
mov x1, #0x3
add x2, x0, x1
mov x2, 0x5000

//read counter
mrs x1, pmccntr_el0

// read overflow
mrs x8, pmovsclr_el0

// provoke mmio exit (0x5000 is not mapped)
ldr x3, [x2]


_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


* Re: Instruction/Cycle Counting in Guest Using the Kvm PMU
  2018-11-23  9:36 Instruction/Cycle Counting in Guest Using the Kvm PMU Jan Bolke
@ 2018-11-23 12:29 ` James Morse
  2018-11-23 13:27   ` Andrew Murray
  2018-11-23 17:36   ` James Morse
  0 siblings, 2 replies; 6+ messages in thread
From: James Morse @ 2018-11-23 12:29 UTC (permalink / raw)
  To: Jan Bolke, Andrew Murray; +Cc: kvmarm

Hi Jan,

(CC: +Andrew)

On 23/11/2018 09:36, Jan Bolke wrote:
> I am not sure if this question is well-placed here, so sorry if it misses the
> purpose of this mailing list.

arm64? kvm? Sounds like you've come to the right place!


> I am using the Kvm Api and try to integrate it as an instruction set simulator
> in a SystemC environment.


> I need some mechanism to count executed instructions in the guest (or cycles).
> 
> Currently I am trying to use the emulated PMU cycle counter in the guest to get
> the number of executed cycles in the guest.
> 
> I am working on Arm64 and use Linux Kernel 4.14.33.
> 
> I create the PMU device without creating a in-kernel vgic.

> I configure the counter, then start the counter, execute 3 or 4 dummy
> instructions and read the counter again and then exit the guest with an exit_mmio.
> 
> I assumed the value should be a very small number, as the guest only executed a
> few instructions.

(some of which are system register writes, which can take a long time)


> The thing is as I read the counter, the value is something like 2970 or 0
> (changes in each run).

You are missing some barriers in your assembly snippet. 0 is a good indication
that the code you wanted to measure escaped the measurement-window!


> So to me it looks like the counter is also counting the cycles for instruction
> emulation in the host, am I right?

I'd assume not, but I don't know anything about the PMU.
Andrew Murray posted a series[0] that did some stuff with starting/stopping
the counters around the guest, but I think that was just for the host making
measurements of itself, or the guest.


KVM emulates parts of the PMU, so your measurements may be too noisy for such
small windows of code.

It might be easier to count instructions from outside the guest using perf. I
think Andrew's series is making that more reliable.


> Is it possible to just count the cycles in the guest from the guests’s point of
> view?
> 
> I read the kvm-api.txt Documentation and the other documents a few times and
> tried different approaches, so this mailing list is my last resort.


> APPENDIX:
> 
> // we are in el1
> 
> // init system registers
> LDR X1, =0x30C50838
> MSR SCTLR_EL1, X1

isb

If the next instructions depend on any of the bits you set in SCTLR_EL1, you
need to make sure the CPU has synchronised this state change before the next
instruction is executed. Otherwise (depending on the CPU) the intended
side effects only come into effect some number of instructions later.


> // enable access to pmu counters from el0
> mov x0, 0xff
> mrs x1, currentel

> mrs x7, pmuserenr_el0
> orr x7, x7, #0b1111
> msr pmuserenr_el0, x7

Why do you need to do this? Running from EL1 the values in this register should
have no effect.


> // set pmcr register (control register)
> 
> //enable long counter, count every cycle and enable counters
> mrs x5, pmcr_el0
> orr x5, x5, #0b1
> orr x5, x5, #(1<<6)
> eor x5, x5, #(1<<3)

> eor x5, x5, #(1<<5)

(looks like this bit has no effect on the 'normal world')

> msr pmcr_el0, x5


> // read mvccfiltr register (only enable counting of el1)
> 
> mrs x6, pmccfiltr_el0
> 
> mov x6, #(1<<30)

This bit only affects EL0.

> msr pmccfiltr_el0, x6
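
For reference, the bit positions used in the snippet can be checked with a few
lines of Python. This is only a sketch: the constants follow the ARMv8-A PMU
register descriptions (PMCR_EL0: E = bit 0, D = bit 3, DP = bit 5, LC = bit 6;
PMCCFILTR_EL0: P = bit 31, U = bit 30), and the helper names are made up for
illustration. It also shows that `eor` only clears a bit if that bit happened
to be set beforehand, whereas a mask-and-clear does not depend on the old value.

```python
# Bit positions as given in the ARMv8-A PMU register descriptions.
PMCR_E = 1 << 0    # enable the counters
PMCR_D = 1 << 3    # clock divider: count every 64th cycle when set
PMCR_DP = 1 << 5   # disable cycle counter where event counting is prohibited
PMCR_LC = 1 << 6   # long cycle counter (overflow taken from bit 63)

PMCCFILTR_P = 1 << 31  # when set, do not count cycles at EL1
PMCCFILTR_U = 1 << 30  # when set, do not count cycles at EL0


def pmcr_with_eor(old):
    """Mirror the posted sequence: orr E, orr LC, eor D, eor DP."""
    val = old | PMCR_E | PMCR_LC
    val ^= PMCR_D    # toggles D: sets it if it was clear
    val ^= PMCR_DP   # toggles DP: sets it if it was clear
    return val


def pmcr_with_clear(old):
    """Hypothetical alternative: clear D and DP whatever their old value."""
    return (old | PMCR_E | PMCR_LC) & ~(PMCR_D | PMCR_DP)


if __name__ == "__main__":
    for old in (0x00, PMCR_D | PMCR_DP):
        print(hex(old), hex(pmcr_with_eor(old)), hex(pmcr_with_clear(old)))
    # The value written to pmccfiltr_el0 in the snippet is the U (EL0) filter.
    print((1 << 30) == PMCCFILTR_U)
```

With an old value of 0 the `eor` variant ends up with D and DP set (0x69),
while the clearing variant gives 0x41 in both cases.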



> // get interrupt configuration and clear overflow bit
> 
> mrs x9, pmintenset_el1

You never use x9 after this. What did you want to do with this register?
(I assume it's for debugging)


> mov x8, #(1<<31)
> msr pmovsclr_el0, x8


> // write counter
> mov x0, #0x0
> msr pmccntr_el0, x0 // write counter

> // enable cycle counter
> mov x1, #(1<<31)
> msr pmcntenset_el0, x1
> mov x0, #0x2 */


> // dummy instruction and provoke mmio-exit
> mov x1, #0x3
> add x2, x0, x1
> mov x2, 0x5000

> //read counter
> mrs x1, pmccntr_el0

At this point all the system register writes since the last 'isb' may not have
'finished', their side effects may not be visible.
You need to synchronise the changes that enable the counter, before you run your
measured instructions, and you want to make sure your measured instructions have
'finished' before you re-read the counter.

The sequence would be something like:
| isb	// for the config writes that enable the counter
| mrs x2, pmccntr_el0
| isb

[measured instructions]

| isb
| mrs x3, pmccntr_el0


> // read overflow
> mrs x8, pmovsclr_el0

> // provoke mmio exit (0x500 is not mapped)
> ldr x3, [x2]


Hope this helps!

James


[0] https://www.mail-archive.com/kvmarm@lists.cs.columbia.edu/msg19778.html


* Re: Instruction/Cycle Counting in Guest Using the Kvm PMU
  2018-11-23 12:29 ` James Morse
@ 2018-11-23 13:27   ` Andrew Murray
  2018-11-29 11:30     ` Jan Bolke
  2018-11-23 17:36   ` James Morse
  1 sibling, 1 reply; 6+ messages in thread
From: Andrew Murray @ 2018-11-23 13:27 UTC (permalink / raw)
  To: James Morse; +Cc: Jan Bolke, kvmarm

On Fri, Nov 23, 2018 at 12:29:08PM +0000, James Morse wrote:
> Hi Jan,
> 
> (CC: +Andrew)
> 
> On 23/11/2018 09:36, Jan Bolke wrote:
> > I am not sure if this question is well-placed here, so sorry if it misses the
> > purpose of this mailing list.
> 
> arm64? kvm? Sounds like you've come to the right place!
> 
> 
> > I am using the Kvm Api and try to integrate it as an instruction set simulator
> > in a SystemC environment.
> 
> 
> > I need some mechanism to count executed instructions in the guest (or cycles).
> > 
> > Currently I am trying to use the emulated PMU cycle counter in the guest to get
> > the number of executed cycles in the guest.
> > 
> > I am working on Arm64 and use Linux Kernel 4.14.33.
> > 
> > I create the PMU device without creating a in-kernel vgic.
> 
> > I configure the counter, then start the counter, execute 3 or 4 dummy
> > instructions and read the counter again and then exit the guest with an exit_mmio.
> > 
> > I assumed the value should be a very small number, as the guest only executed a
> > few instructions.
> 
> (some of which are system register writes, which can take a long time)
> 
> 
> > The thing is as I read the counter, the value is something like 2970 or 0
> > (changes in each run).
> 
> You are missing some barriers in your assembly snippet. 0 is a good indication
> that the code you wanted to measure escaped the measurement-window!
> 
> 
> > So to me it looks like the counter is also counting the cycles for instruction
> > emulation in the host, am I right?
> 
> I'd assume not, but I don't know anything about the PMU.
> Andrew Murray posted a series[0] that did some stuff with starting/stopping the
> the counters around the guest, but I think that was just for the host making
> measurements of itself, or the guest.
> 
> 
> KVM emulates parts of the PMU, so your measurements may be too noisy for such
> small windows of code.
> 
> It might be easier to count instructions from outside the guest using perf. I
> think Andrew's series is making that more reliable.

The PMU emulation works by creating a perf event in the host. However, that
event is pinned to the KVM process, so the real PMU counters are stopped and
started as the KVM process is scheduled in and out. This means the count
includes all CPU time associated with that process, of which your guest is only
a subset.

The patchset that James refers to ensures that the real PMU counters backing
guest-only events are enabled only on entering the guest (and disabled on
leaving). You will therefore need to apply it (to your host) for more accurate
counting. (You could then also use the perf modifiers in the host to count
guest events, e.g. perf stat -e instructions:G.)

Also, you may want to refer to kvm-unit-tests, as there are test cases that
demonstrate bare metal code for PMU enabling.

Thanks,

Andrew Murray

> 
> 
> > Is it possible to just count the cycles in the guest from the guests’s point of
> > view?
> > 
> > I read the kvm-api.txt Documentation and the other documents a few times and
> > tried different approaches, so this mailing list is my last resort.
> 
> 
> > APPENDIX:
> > 
> > // we are in el1
> > 
> > // init system registers
> > LDR X1, =0x30C50838
> > MSR SCTLR_EL1, X1
> 
> isb
> 
> If the next instructions depend on any of the bits you set in sctrl, you need to
> make sure the cpu has synchronised this state-change before the next instruction
> is executed. Otherwise (depending on the CPU) the intended side-effects only
> come into effect some number of instructions later.
> 
> 
> > // enable access to pmu counters from el0
> > mov x0, 0xff
> > mrs x1, currentel
> 
> > mrs x7, pmuserenr_el0
> > orr x7, x7, #0b1111
> > msr pmuserenr_el0, x7
> 
> Why do you need to do this? Running from EL1 the values in this register should
> have no effect.
> 
> 
> > // set pmcr register (control register)
> > 
> > //enable long counter, count every cycle and enable counters
> > mrs x5, pmcr_el0
> > orr x5, x5, #0b1
> > orr x5, x5, #(1<<6)
> > eor x5, x5, #(1<<3)
> 
> > eor x5, x5, #(1<<5)
> 
> (looks like this bit has no effect on the 'normal world')
> 
> > msr pmcr_el0, x5
> 
> 
> > // read mvccfiltr register (only enable counting of el1)
> > 
> > mrs x6, pmccfiltr_el0
> > 
> > mov x6, #(1<<30)
> 
> This bit only effects EL0.
> 
> > msr pmccfiltr_el0, x6
> 
> 
> 
> > // get interrupt configuration and clear overflow bit
> > 
> > mrs x9, pmintenset_el1
> 
> You never use x9 after this. What did you want to do with this register?
> (I assume its debug)
> 
> 
> > mov x8, #(1<<31)
> > msr pmovsclr_el0, x8
> 
> 
> > // write counter
> > mov x0, #0x0
> > msr pmccntr_el0, x0 // write counter
> 
> > // enable cycle counter
> > mov x1, #(1<<31)
> > msr pmcntenset_el0, x1
> > mov x0, #0x2 */
> 
> 
> > // dummy instruction and provoke mmio-exit
> > mov x1, #0x3
> > add x2, x0, x1
> > mov x2, 0x5000
> 
> > //read counter
> > mrs x1, pmccntr_el0
> 
> At this point all the system register writes since the last 'isb' may not have
> 'finished', their side effects may not be visible.
> You need to synchronise the changes that enable the counter, before you run your
> measured instructions, and you want to make sure your measured instructions have
> 'finished' before you re-read the counter.
> 
> The sequence would be something like:
> | isb	// for the config writes that enable the counter
> | mrs x2, pmccntr_el0
> | isb
> 
> [measured instructions]
> 
> | isb
> | mrs x3, pmccntr_el0
> 
> 
> > // read overflow
> > mrs x8, pmovsclr_el0
> 
> > // provoke mmio exit (0x500 is not mapped)
> > ldr x3, [x2]
> 
> 
> Hope this helps!
> 
> James
> 
> 
> [0] https://www.mail-archive.com/kvmarm@lists.cs.columbia.edu/msg19778.html


* Re: Instruction/Cycle Counting in Guest Using the Kvm PMU
  2018-11-23 12:29 ` James Morse
  2018-11-23 13:27   ` Andrew Murray
@ 2018-11-23 17:36   ` James Morse
  1 sibling, 0 replies; 6+ messages in thread
From: James Morse @ 2018-11-23 17:36 UTC (permalink / raw)
  To: Jan Bolke, Andrew Murray; +Cc: kvmarm

Hi Jan,

On 23/11/2018 12:29, James Morse wrote:
> On 23/11/2018 09:36, Jan Bolke wrote:
>> I am using the Kvm Api and try to integrate it as an instruction set simulator
>> in a SystemC environment.
> 
> 
>> I need some mechanism to count executed instructions in the guest (or cycles).
>>
>> Currently I am trying to use the emulated PMU cycle counter in the guest to get
>> the number of executed cycles in the guest.
>>
>> I am working on Arm64 and use Linux Kernel 4.14.33.
>>
>> I create the PMU device without creating a in-kernel vgic.
> 
>> I configure the counter, then start the counter, execute 3 or 4 dummy
>> instructions and read the counter again and then exit the guest with an exit_mmio.
>>
>> I assumed the value should be a very small number, as the guest only executed a
>> few instructions.
> 
> (some of which are system register writes, which can take a long time)
> 
>> The thing is as I read the counter, the value is something like 2970 or 0
>> (changes in each run).

> You are missing some barriers in your assembly snippet. 0 is a good indication
> that the code you wanted to measure escaped the measurement-window!

This would be true on bare-metal, but for KVM I'm talking rubbish. Turns out KVM
traps all these registers, and emulates the lot.

kvm_arm_setup_debug() sets MDCR_EL2.TPM so almost all these accesses trap,
effectively putting an ISB after each one.


>> So to me it looks like the counter is also counting the cycles for instruction
>> emulation in the host, am I right?
> 
> I'd assume not, but I don't know anything about the PMU.
> Andrew Murray posted a series[0] that did some stuff with starting/stopping the
> the counters around the guest, but I think that was just for the host making
> measurements of itself, or the guest.

> It might be easier to count instructions from outside the guest using perf. I
> think Andrew's series is making that more reliable.

>> Is it possible to just count the cycles in the guest from the guests’s point of
>> view?

This is what KVM is doing, by emulating the PMU with perf.
kvm_pmu_get_counter_value() looks like it should be doing the right thing...

As a sanity check, does the cycle counter work on your test machine?:
| perf stat -e cycles -- ls



Thanks,

James


* RE: Instruction/Cycle Counting in Guest Using the Kvm PMU
  2018-11-23 13:27   ` Andrew Murray
@ 2018-11-29 11:30     ` Jan Bolke
  2018-11-30  0:23       ` Andrew Murray
  0 siblings, 1 reply; 6+ messages in thread
From: Jan Bolke @ 2018-11-29 11:30 UTC (permalink / raw)
  To: Andrew Murray, James Morse; +Cc: Jan Bolke, kvmarm

Hi,
And thanks for the fast replies.

> The PMU emulation works by creating a perf event in the host, however it is
> pinned to the KVM process, so the the real PMU counters are stopped and started
> as the KVM process is scheduled in and out. This means that it will include any
> CPU time associated with that process of which your guest is only a subset of.

Thanks for the clarification.
As this is the case, the counted cycles from the host should deliver a larger number than the executed instructions inside the guest.

> The patchset that James refers to will ensure that the underlying real PMU
> counters underlying the guest only events will only be enabled upon entering the
> guest (and disabled on leaving). Thus you will need to apply this (to your host)
> for more accurate counting. (You could also then use the perf modifiers in the
> host to counter guest cycles, e.g. perf -e instructions:G).

So I applied your patch to a 4.19.5 kernel, along with your other patch series for the perf events in the host [0].
What I run now is:
perf stat -e instructions:G -- ./run_loop_in_kvm

run_loop_in_kvm is a small C program which starts a VM, executes a little loop in the guest and then exits.
I get an output from perf like the following:
	159732   instructions:Gu
     ....
My problem is that I am still not sure how to interpret these values, as my bare metal code runs a loop 1048577 times,
executing 3 instructions on every iteration.

My question is where this discrepancy in the counted values comes from.
The perf counting from the host delivers a value significantly smaller than the number of instructions in the guest.

I am struggling to interpret the perf counter values as an indication how many instructions my guest performed.
What am I missing?

Also, I get the following output for perf stat -e cycles:G ls: 647284 cycles:Gu.
Is this an indicator that my guest/host modifiers do not work, or am I misunderstanding the whole concept here?
Sorry for the silly question and thanks in advance!

>Also you may want to refer to kvm-unit-tests are there are test cases that demonstrate bare metal code for PMU enabling.
Thanks for the hint, these tests are very useful examples!

[0]: http://lists.infradead.org/pipermail/linux-arm-kernel/2018-November/614985.html


* Re: Instruction/Cycle Counting in Guest Using the Kvm PMU
  2018-11-29 11:30     ` Jan Bolke
@ 2018-11-30  0:23       ` Andrew Murray
  0 siblings, 0 replies; 6+ messages in thread
From: Andrew Murray @ 2018-11-30  0:23 UTC (permalink / raw)
  To: Jan Bolke; +Cc: kvmarm

On Thu, Nov 29, 2018 at 11:30:55AM +0000, Jan Bolke wrote:
> Hi,
> And thanks for the fast replies.
> 
> >The PMU emulation works by creating a perf event in the host, however it is pinned to the KVM process, so the the real PMU counters are stopped and started as the KVM process is >scheduled in and out. This means that it will include any CPU time associated with that process of which your guest is only a subset of.
> 
> Thanks for the clarification.
> As this is the case, the counted cycles from the host should deliver a larger number than the executed instructions inside the guest.

Indeed.


> 
> >The patchset that James refers to will ensure that the underlying real PMU counters underlying the guest only events will only be enabled upon entering the guest (and disabled on >leaving). Thus you will need to apply this (to your host) for more accurate counting. (You could also then use the perf modifiers in the host to counter guest cycles, e.g. perf -e >instructions:G).
> 
> So I applied your patch to a 4.19.5 kernel and also your other Patchseries for the perf events in the host [0].
> So what I do now is running :
> perf stat -e instructions:G -- ./run_loop_in_kvm.
> 
> Run_loop_in_kvm is a small c program who starts a vm and executes a little loop in the guest and then exits.
> I get a output from perf like the following:
> 	159732   instructions:Gu
>      ....
> My Problem is, I am still not sure how to interpret these values as my bare metal code runs a loop for 1048577 times 
> which executes 3 instructions in every run.

I notice you are using :Gu. The 'u' will record only EL0, are you confident
that your code runs at this level? Perhaps drop the 'u'.

Also are you compiling with compiler optimisations turned off - or have you
inspected an objdump of the binary to verify it is as you expect?


> 
> My question is how comes this discrepancy of the counted values.
> The perf counting from the host delivers a value significantly smaller than the number of instructions in the guest.
> 
> I am struggling to interpret the perf counter values as an indication how many instructions my guest performed.
> What am I missing?

Your expectations are correct. For my own sanity I attempted something similar:

# cat startup.s
// startup code as implied by the .startup disassembly further down
.global _start
_start:
        ldr     x30, =stack_top
        mov     sp, x30
        bl      main
1:      wfi
        b       1b

# cat test.c
void main() {
        int x=0;
        for (x=0;x<20000000;x++)
                ;
}

# cat script.ld

ENTRY(_start)
SECTIONS
{
        . = 0x80080000;
        .startup . : { startup.o(.text) }
        .text : { *(.text) }
        .data : { *(.data) }
        .bss : { *(.bss COMMON) }
        . = ALIGN(16);
        . = . + 0x1000;
        stack_top = .;
}

And then:

# aarch64-linux-gnu-gcc -c test.c -o test.o
# aarch64-linux-gnu-as startup.s -o startup.o
# aarch64-linux-gnu-ld -T script.ld test.o startup.o -o out
# aarch64-linux-gnu-objcopy -O binary -S out

On the host:

# perf stat -e instructions:G /lkvm-static run -k /out  -m 1024 -c 1 --console serial --pmu

...

# KVM session ended normally.

 Performance counter stats for '/lkvm-static run -k /out -m 1024 -c 1 --console serial --pmu':

         160000016      instructions:G                                              


This matches exactly with what I would expect from the disassembly:

Disassembly of section .startup:

0000000080080000 <_start>:
    80080000:   580000de        ldr     x30, 80080018 <_start+0x18>
    80080004:   910003df        mov     sp, x30
    80080008:   94000006        bl      80080020 <main>
    8008000c:   d503207f        wfi
    80080010:   17ffffff        b       8008000c <_start+0xc>
    80080014:   00000000        .inst   0x00000000 ; undefined
    80080018:   80081060        .word   0x80081060
    8008001c:   00000000        .word   0x00000000

Disassembly of section .text:

0000000080080020 <main>:
    80080020:   d10043ff        sub     sp, sp, #0x10
    80080024:   b9000fff        str     wzr, [sp, #12]
    80080028:   b9000fff        str     wzr, [sp, #12]
    8008002c:   14000004        b       8008003c <main+0x1c>
    80080030:   b9400fe0        ldr     w0, [sp, #12]
    80080034:   11000400        add     w0, w0, #0x1
    80080038:   b9000fe0        str     w0, [sp, #12]
    8008003c:   b9400fe1        ldr     w1, [sp, #12]
    80080040:   52859fe0        mov     w0, #0x2cff                     // #11519
    80080044:   72a02620        movk    w0, #0x131, lsl #16
    80080048:   6b00003f        cmp     w1, w0
    8008004c:   54ffff2d        b.le    80080030 <main+0x10>
    80080050:   d503201f        nop
    80080054:   910043ff        add     sp, sp, #0x10
    80080058:   d65f03c0        ret



> 
> Also I get the following output for perf stat -e cycles:G ls: 647284 cyles:Gu.

> Is this a indicator that my guest/host modifiers do not work or am I misunderstanding the whole concept here?
> Sorry for the silly question and thanks in advance!

For 'perf stat -e cycles:G ls' I would expect 0. I'm not quite sure why it's
showing data for the 'u' modifier. It smells a little like the patch hasn't
applied or the userspace perf tool is falling back to not using exclude_guest
and recording just EL0 (which would explain all your figures assuming your
bare metal code actually runs at EL1).

Try running 'perf stat -vv -e instructions:G ls' and make sure that the last
'exclude_host' value you see on the screen is 1.
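
As a rough illustration of what those attribute bits should look like for the
modifiers discussed in this thread, here is a simplified model in Python. It is
not perf's actual parsing code: the function name and the dictionary layout are
hypothetical, only the u, k, h, G and H modifiers are handled, and perf's
defaults for an event without modifiers are not modelled. The field names match
the exclude_* bits of struct perf_event_attr.

```python
def apply_modifiers(mods):
    """Map a perf modifier string such as "G" or "Gu" to exclude_* bits."""
    attr = {"exclude_user": 0, "exclude_kernel": 0, "exclude_hv": 0,
            "exclude_host": 0, "exclude_guest": 0}
    if any(m in "ukh" for m in mods):
        # Naming a privilege level excludes the levels that are not named.
        attr.update(exclude_user=1, exclude_kernel=1, exclude_hv=1)
    if any(m in "GH" for m in mods):
        # Naming guest or host excludes the side that is not named.
        attr.update(exclude_host=1, exclude_guest=1)
    include = {"u": "exclude_user", "k": "exclude_kernel", "h": "exclude_hv",
               "H": "exclude_host", "G": "exclude_guest"}
    for m in mods:
        attr[include[m]] = 0   # raises KeyError for modifiers not modelled
    return attr


if __name__ == "__main__":
    print(apply_modifiers("G"))
    print(apply_modifiers("Gu"))
```

Under this model ":G" gives exclude_host = 1 with all privilege levels counted,
while ":Gu" additionally sets exclude_kernel and exclude_hv, which is why a
guest running only at EL1 would barely register with the 'u' present.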

Thanks,

Andrew Murray

> 
> >Also you may want to refer to kvm-unit-tests are there are test cases that demonstrate bare metal code for PMU enabling.
> Thanks for the hint, these tests are very useful examples!
> 
> [0]: http://lists.infradead.org/pipermail/linux-arm-kernel/2018-November/614985.html
> 
> 

