* Measuring KVM Performance using Hardware Performance Counters
@ 2014-01-30  5:06 Xin Tong
  2014-02-02  2:08 ` Xin Tong
  0 siblings, 1 reply; 6+ messages in thread
From: Xin Tong @ 2014-01-30  5:06 UTC (permalink / raw)
  To: kvm

Hi

I would like to measure the performance of KVM using hardware
performance counters, and I have some questions.

1. If I want to measure the amount of time spent in instruction and device
emulation, should I use OProfile on the KVM process on the host
machine?
2. What about the amount of time spent on TLB misses? KVM uses nested
paging, where a TLB miss can be expensive.
3. Are there existing tools/measurements I can look into?
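
For question 2, the following is roughly what I have in mind, assuming
perf is available on the host (the event names are perf's generic TLB
aliases and may map differently on a given CPU; whether guest-mode
counts are included depends on the perf version and the :G/:H event
modifiers):

# count TLB misses seen for the running qemu-system-x86_64 process for 60s
sudo perf stat -e dTLB-load-misses,dTLB-store-misses,iTLB-load-misses \
     -p $(pidof qemu-system-x86_64) -- sleep 60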

Thank you


* Re: Measuring KVM Performance using Hardware Performance Counters
  2014-01-30  5:06 Measuring KVM Performance using Hardware Performance Counters Xin Tong
@ 2014-02-02  2:08 ` Xin Tong
  2014-02-02 11:37   ` Paolo Bonzini
  0 siblings, 1 reply; 6+ messages in thread
From: Xin Tong @ 2014-02-02  2:08 UTC (permalink / raw)
  To: kvm

I am getting very weird profile results when running operf on Linux on
the host and profiling a KVM virtual machine running the DaCapo
eclipse benchmark.  I expected a lot of time to be spent in
qemu-system-x86_64, since the instructions from the eclipse benchmark
would be treated as part of the qemu-system-x86_64 process, but the
results say otherwise.  Any suggestions?


CPU_CLK_UNHALTED %     Module
         Symbol/Functions
1054             31.49 /kvm
         /kvm
645             19.27 /kvm_intel
        /kvm_intel
248             7.41
/home/xtong/qemu-kvm-1.2.0/install/bin/qemu-system-x86_64
/home/xtong/qemu-kvm-1.2.0/install/bin/qemu-system-x86_64
215             6.42 /usr/lib/debug/boot/vmlinux-3.11.0-13-generic
        native_write_msr_safe
101             3.02 /usr/lib/debug/boot/vmlinux-3.11.0-13-generic
        native_read_msr_safe
93               2.78 /lib/x86_64-linux-gnu/libc-2.15.so
        /lib/x86_64-linux-gnu/libc-2.15.so
77               2.30 /lib/x86_64-linux-gnu/libglib-2.0.so.0.3200.4
         /lib/x86_64-linux-gnu/libglib-2.0.so.0.3200.4
65               1.94 /usr/lib/debug/boot/vmlinux-3.11.0-13-generic
         __srcu_read_lock
48               1.43 /usr/lib/debug/boot/vmlinux-3.11.0-13-generic
         __srcu_read_unlock
40               1.20 /usr/lib/debug/boot/vmlinux-3.11.0-13-generic
         native_load_tr_desc
29               0.87 /usr/lib/debug/boot/vmlinux-3.11.0-13-generic
         __ticket_spin_lock
25               0.75 /usr/lib/debug/boot/vmlinux-3.11.0-13-generic
         guest_exit
21               0.63 /usr/lib/debug/boot/vmlinux-3.11.0-13-generic
         native_load_gdt
...
Thank you.

On Wed, Jan 29, 2014 at 11:06 PM, Xin Tong <xerox.time.tech@gmail.com> wrote:
> Hi
>
> I would like to measure the performance of KVM using hardware
> performance counters, and I have some questions.
>
> 1. If I want to measure the amount of time spent in instruction and device
> emulation, should I use OProfile on the KVM process on the host
> machine?
> 2. What about the amount of time spent on TLB misses? KVM uses nested
> paging, where a TLB miss can be expensive.
> 3. Are there existing tools/measurements I can look into?
>
> Thank you


* Re: Measuring KVM Performance using Hardware Performance Counters
  2014-02-02  2:08 ` Xin Tong
@ 2014-02-02 11:37   ` Paolo Bonzini
       [not found]     ` <CA+JLOitGuTcX9coon0EKWBuhzuw4_7BxEab3=sk80J_WFakqhA@mail.gmail.com>
  0 siblings, 1 reply; 6+ messages in thread
From: Paolo Bonzini @ 2014-02-02 11:37 UTC (permalink / raw)
  To: Xin Tong, kvm

On 02/02/2014 03:08, Xin Tong wrote:
> I am getting very weird profile results when running operf on Linux on
> the host and profiling a KVM virtual machine running the DaCapo
> eclipse benchmark.  I expected a lot of time to be spent in
> qemu-system-x86_64, since the instructions from the eclipse benchmark
> would be treated as part of the qemu-system-x86_64 process, but the
> results say otherwise.  Any suggestions?

Most of the time should be spent running the guest.  This is in the 
context of process qemu-system-x86_64, but it is not running code from 
the qemu executable.

What likely happens is that when the profiling counter expires, it 
causes the VM to exit before the profiling interrupt is delivered.  The 
sample should then be associated to the kvm_intel module.
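
If you want the host-side samples attributed back to guest symbols, a
rough alternative (assuming a recent enough perf, and that you first copy
the guest's /proc/kallsyms and /proc/modules out to the host) is perf's
kvm mode:

perf kvm --host --guest --guestkallsyms=guest-kallsyms \
    --guestmodules=guest-modules record -a -o perf.data.kvm sleep 60
perf kvm --host --guest --guestkallsyms=guest-kallsyms \
    --guestmodules=guest-modules report -i perf.data.kvm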

31.5% in the KVM module seems like a lot.  Can you detail the results 
further, so that I can see the function names for kvm and kvm_intel?

Paolo

>
> CPU_CLK_UNHALTED %     Module
>          Symbol/Functions
> 1054             31.49 /kvm
>          /kvm
> 645             19.27 /kvm_intel
>         /kvm_intel
> 248             7.41
> /home/xtong/qemu-kvm-1.2.0/install/bin/qemu-system-x86_64
> /home/xtong/qemu-kvm-1.2.0/install/bin/qemu-system-x86_64
> 215             6.42 /usr/lib/debug/boot/vmlinux-3.11.0-13-generic
>         native_write_msr_safe
> 101             3.02 /usr/lib/debug/boot/vmlinux-3.11.0-13-generic
>         native_read_msr_safe
> 93               2.78 /lib/x86_64-linux-gnu/libc-2.15.so
>         /lib/x86_64-linux-gnu/libc-2.15.so
> 77               2.30 /lib/x86_64-linux-gnu/libglib-2.0.so.0.3200.4
>          /lib/x86_64-linux-gnu/libglib-2.0.so.0.3200.4
> 65               1.94 /usr/lib/debug/boot/vmlinux-3.11.0-13-generic
>          __srcu_read_lock
> 48               1.43 /usr/lib/debug/boot/vmlinux-3.11.0-13-generic
>          __srcu_read_unlock
> 40               1.20 /usr/lib/debug/boot/vmlinux-3.11.0-13-generic
>          native_load_tr_desc
> 29               0.87 /usr/lib/debug/boot/vmlinux-3.11.0-13-generic
>          __ticket_spin_lock
> 25               0.75 /usr/lib/debug/boot/vmlinux-3.11.0-13-generic
>          guest_exit
> 21               0.63 /usr/lib/debug/boot/vmlinux-3.11.0-13-generic
>          native_load_gdt
> ...
> Thank you.



* Re: Measuring KVM Performance using Hardware Performance Counters
       [not found]       ` <52EF4F48.7080609@redhat.com>
@ 2014-02-03 17:06         ` Xin Tong
  2014-02-04  3:31           ` Paolo Bonzini
  0 siblings, 1 reply; 6+ messages in thread
From: Xin Tong @ 2014-02-03 17:06 UTC (permalink / raw)
  To: Paolo Bonzini, kvm

This is a profile taken of KVM on the host. The KVM guest Linux is
running SPEC CPU2006 403.gcc with the reference data set. The commands used
to collect and report the profile are:
sudo operf -s --separate-thread --event=CPU_CLK_UNHALTED:5000000
--vmlinux=/home/xtong/xtong-kernel/vmlinux
sudo opreport --image-path /home/xtong/xtong-kernel  -l -g -d --xml -o
current.opm


I have a few questions.

1. Why are there 3 qemu-system-x86_64 threads, and why are all of them
taking a non-trivial amount of time, i.e. 71.24%, 16.25% and 12.51%
respectively?
2. Why is qemu-system-x86_64 only taking 8.10% of the time? I imagine
most of the time should be spent in qemu-system-x86_64, as 403.gcc does
not do much IO.
3. Why is so much time spent in vmlinux? The symbols for vmlinux are
listed below.
4. Why is so much time spent in kvm_intel and kvm? The symbols for
both are listed below.
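
To understand where the exits come from, I am also thinking of counting
exit reasons with perf's kvm-stat mode, roughly like this (a sketch; it
assumes the host perf supports kvm stat, and 2534 is the qemu PID from
the profile below):

sudo perf kvm stat record -p 2534       # stop with Ctrl-C after a while
sudo perf kvm stat report --event=vmexit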


Overall Profile:

current.opm ( 1549.0 ticks ) [ default counter: CPU_CLK_UNHALTED ]
  Process > Core > Thread > Module
    /.../qemu-system-x86_64 [PID 2534] (1231 ticks/79.47%)
      Core0 (1231 ticks/100.00%)
        /.../qemu-system-x86_64 TID 2537 [TID 2537] (877 ticks/71.24%)
          /.../vmlinux (395 ticks/45.04%)
          /kvm (198 ticks/22.58%)
          /kvm_intel (153 ticks/17.45%)
          /.../qemu-system-x86_64 (71 ticks/8.10%)
          /.../libc-2.15.so (47 ticks/5.36%)
          /.../libpthread-2.15.so (12 ticks/1.37%)
          /libahci (1 ticks/0.11%)
        /.../qemu-system-x86_64 TID 2658 [TID 2658] (200 ticks/16.25%)
          /.../vmlinux (190 ticks/95.00%)
          /.../libpthread-2.15.so (6 ticks/3.00%)
          /.../libc-2.15.so (2 ticks/1.00%)
          /.../qemu-system-x86_64 (1 ticks/0.50%)
          [vdso] (tgid:2534 range:0x7fff10588000-0x7fff10589fff) (1 ticks/0.50%)
        /.../qemu-system-x86_64 TID 2534 [TID 2534] (154 ticks/12.51%)
    /ksmd [PID 53] (83 ticks/5.36%)
    /.../top [PID 2617] (43 ticks/2.78%)
    /.../unity-2d-shell [PID 1807] (41 ticks/2.65%)


 /.../vmlinux (395 ticks/45.04%):
CPU_CLK_UNHALTED %     Symbol/Functions
82               20.76 native_write_msr_safe
46               11.65 native_read_msr_safe
25               6.33 __srcu_read_lock
16               4.05 native_load_tr_desc
12               3.04 __srcu_read_unlock
9               2.28 native_load_gdt
8               2.03 fget_light
8               2.03 _raw_spin_lock_irq
7               1.77 memcpy
6               1.52 _raw_spin_lock
6               1.52 __set_current_blocked
5               1.27 user_return_notifier_unregister
5               1.27 update_cfs_shares
5               1.27 page_fault
5               1.27 bsearch

/kvm (198 ticks/22.58%)
CPU_CLK_UNHALTED %     Symbol/Functions
42               21.21 kvm_arch_vcpu_ioctl_run
9               4.55 kvm_set_shared_msr
7               3.54 x86_emulate_insn
7               3.54 paging64_walk_addr_generic
7               3.54 kvm_on_user_return
7               3.54 do_insn_fetch
6               3.03 gfn_to_memslot
6               3.03 decode_operand
5               2.53 kvm_io_bus_sort_cmp
5               2.53 kvm_arch_vcpu_load
5               2.53 kvm_apic_accept_pic_intr
4               2.02 kvm_fetch_guest_virt
4               2.02 kvm_arch_vcpu_runnable
4               2.02 emulator_read_gpr
4               2.02 apic_has_pending_timer
3               1.52 x86_decode_insn

/kvm_intel (153 ticks/17.45%)
CPU_CLK_UNHALTED %     Symbol/Functions
92               60.13 vmx_vcpu_run
9               5.88 vmx_save_host_state
7               4.58 vmx_handle_external_intr
7               4.58 vmcs_writel
6               3.92 vmx_handle_exit
6               3.92 __vmx_load_host_state.part.43
4               2.61 clear_atomic_switch_msr
3               1.96 vmx_vm_has_apicv
3               1.96 vmx_get_rflags
2               1.31 vmx_read_guest_seg_selector
2               1.31 vmcs_clear_bits
2               1.31 skip_emulated_instruction
1               0.65 vmx_set_rflags
1               0.65 vmx_vcpu_load

Thank you,
Xin



On Mon, Feb 3, 2014 at 2:11 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 02/02/2014 16:47, Xin Tong wrote:
>
>> On Sun, Feb 2, 2014 at 5:37 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>
>>> On 02/02/2014 03:08, Xin Tong wrote:
>>>>
>>>>
>>>> I am getting very weird profile results when running operf on Linux on
>>>> the host and profiling a KVM virtual machine running the DaCapo
>>>> eclipse benchmark.  I expected a lot of time to be spent in
>>>> qemu-system-x86_64, since the instructions from the eclipse benchmark
>>>> would be treated as part of the qemu-system-x86_64 process, but the
>>>> results say otherwise.  Any suggestions?
>>>
>>>
>>>
>>> Most of the time should be spent running the guest.  This is in the
>>> context
>>> of process qemu-system-x86_64, but it is not running code from the qemu
>>> executable.
>>
>>
>> That is what I was trying to say. You said it better.
>>>
>>>
>>> What likely happens is that when the profiling counter expires, it causes
>>> the VM to exit before the profiling interrupt is delivered.  The sample
>>> should then be associated to the kvm_intel module.
>>
>>
>> Could the kvm module not read the counters and inject a counter
>> overflow interrupt into the guest?  What are some ways to make the
>> profile more accurate? This was collected on an Intel Haswell host
>> machine.
>
>
> Yes, it can.  But then you have to run perf/operf in the guest, not on the
> host.
>
> For this to work, you need to specify "-cpu host" on the QEMU command line.
>
> Paolo
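
Something like the following is what I plan to try for profiling inside
the guest, per the suggestion above (a sketch: it assumes the guest
image has perf installed, that -cpu host exposes the vPMU to the guest,
and the disk image path and 403.gcc run script are only placeholders):

# on the host: expose the host CPU (and its PMU) to the guest
qemu-system-x86_64 -enable-kvm -cpu host \
    -hda ~/disks/ubuntu-natty.qcow2 -nographic

# inside the guest: profile the benchmark directly
perf record -e cycles -o gcc.perf.data -- ./run-403gcc.sh
perf report -i gcc.perf.data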


* Re: Measuring KVM Performance using Hardware Performance Counters
  2014-02-03 17:06         ` Xin Tong
@ 2014-02-04  3:31           ` Paolo Bonzini
  2014-02-04 20:53             ` Xin Tong
  0 siblings, 1 reply; 6+ messages in thread
From: Paolo Bonzini @ 2014-02-04  3:31 UTC (permalink / raw)
  To: Xin Tong, kvm

On 03/02/2014 18:06, Xin Tong wrote:
>         /.../qemu-system-x86_64 TID 2537 [TID 2537] (877 ticks/71.24%)

This is the CPU thread (calls into the KVM modules).

>           /.../vmlinux (395 ticks/45.04%)
>           /kvm (198 ticks/22.58%)
>           /kvm_intel (153 ticks/17.45%)
>           /.../qemu-system-x86_64 (71 ticks/8.10%)
>           /.../libc-2.15.so (47 ticks/5.36%)
>           /.../libpthread-2.15.so (12 ticks/1.37%)
>           /libahci (1 ticks/0.11%)
>         /.../qemu-system-x86_64 TID 2658 [TID 2658] (200 ticks/16.25%)

This is probably the VNC thread.

>           /.../vmlinux (190 ticks/95.00%)
>           /.../libpthread-2.15.so (6 ticks/3.00%)
>           /.../libc-2.15.so (2 ticks/1.00%)
>           /.../qemu-system-x86_64 (1 ticks/0.50%)
>           [vdso] (tgid:2534 range:0x7fff10588000-0x7fff10589fff) (1 ticks/0.50%)
>         /.../qemu-system-x86_64 TID 2534 [TID 2534] (154 ticks/12.51%)

This is the main thread (lowest TID of all).

>     /ksmd [PID 53] (83 ticks/5.36%)
>     /.../top [PID 2617] (43 ticks/2.78%)
>     /.../unity-2d-shell [PID 1807] (41 ticks/2.65%)

Now, on to your questions:

> 2. Why is qemu-system-x86_64 only taking 8.10% of the time? I imagine
> most of the time should be spent in qemu-system-x86_64, as 403.gcc does
> not do much IO.

This is the same thing I already answered: the time spent running the guest
is in the context of process qemu-system-x86_64, but it is not running code
from the qemu executable.  In fact, 8.10% spent in qemu-system-x86_64 is
a lot.  I would expect much less.

> 3. Why is so much time spent in vmlinux? The symbols for vmlinux are
> listed below.
> 4. Why is so much time spent in kvm_intel and kvm? The symbols for
> both are listed below.

Again, I think I already answered this.  The time spent in vmx_vcpu_run 
is the actual time spent in the guest (17.45*60.13% = 10% roughly).

Everything else is overhead introduced by either virtualization or 
profiling.

What SPEC scores are you getting from bare-metal, KVM without oprofile 
and KVM with oprofile?  Is the profile substantially different if you 
use perf instead of oprofile?
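
For a roughly comparable system-wide cycle profile with perf, something
like this should work (a sketch; adjust the sampling period and the
duration of the sleep):

perf record -a -e cycles -c 5000000 -o perf.data sleep 600
perf report -i perf.data --sort comm,dso,symbol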

Can you run "scripts/kvm/kvm_stat" (from the QEMU tree) while your guest 
is running (under oprofile) and paste the output from that tool?

Thanks,

Paolo

>
>
>  /.../vmlinux (395 ticks/45.04%):
> CPU_CLK_UNHALTED %     Symbol/Functions
> 82               20.76 native_write_msr_safe
> 46               11.65 native_read_msr_safe
> 25               6.33 __srcu_read_lock
> 16               4.05 native_load_tr_desc
> 12               3.04 __srcu_read_unlock
> 9               2.28 native_load_gdt
> 8               2.03 fget_light
> 8               2.03 _raw_spin_lock_irq
> 7               1.77 memcpy
> 6               1.52 _raw_spin_lock
> 6               1.52 __set_current_blocked
> 5               1.27 user_return_notifier_unregister
> 5               1.27 update_cfs_shares
> 5               1.27 page_fault
> 5               1.27 bsearch
>
> /kvm (198 ticks/22.58%)
> CPU_CLK_UNHALTED %     Symbol/Functions
> 42               21.21 kvm_arch_vcpu_ioctl_run
> 9               4.55 kvm_set_shared_msr
> 7               3.54 x86_emulate_insn
> 7               3.54 paging64_walk_addr_generic
> 7               3.54 kvm_on_user_return
> 7               3.54 do_insn_fetch
> 6               3.03 gfn_to_memslot
> 6               3.03 decode_operand
> 5               2.53 kvm_io_bus_sort_cmp
> 5               2.53 kvm_arch_vcpu_load
> 5               2.53 kvm_apic_accept_pic_intr
> 4               2.02 kvm_fetch_guest_virt
> 4               2.02 kvm_arch_vcpu_runnable
> 4               2.02 emulator_read_gpr
> 4               2.02 apic_has_pending_timer
> 3               1.52 x86_decode_insn
>
> /kvm_intel (153 ticks/17.45%)
> CPU_CLK_UNHALTED %     Symbol/Functions
> 92               60.13 vmx_vcpu_run
> 9               5.88 vmx_save_host_state
> 7               4.58 vmx_handle_external_intr
> 7               4.58 vmcs_writel
> 6               3.92 vmx_handle_exit
> 6               3.92 __vmx_load_host_state.part.43
> 4               2.61 clear_atomic_switch_msr
> 3               1.96 vmx_vm_has_apicv
> 3               1.96 vmx_get_rflags
> 2               1.31 vmx_read_guest_seg_selector
> 2               1.31 vmcs_clear_bits
> 2               1.31 skip_emulated_instruction
> 1               0.65 vmx_set_rflags
> 1               0.65 vmx_vcpu_load
>
> Thank you,
> Xin
>
>
>
> On Mon, Feb 3, 2014 at 2:11 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> On 02/02/2014 16:47, Xin Tong wrote:
>>
>>> On Sun, Feb 2, 2014 at 5:37 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>>
>>>> On 02/02/2014 03:08, Xin Tong wrote:
>>>>>
>>>>>
>>>>> I am getting very weird profile results when running operf on Linux on
>>>>> the host and profiling a KVM virtual machine running the DaCapo
>>>>> eclipse benchmark.  I expected a lot of time to be spent in
>>>>> qemu-system-x86_64, since the instructions from the eclipse benchmark
>>>>> would be treated as part of the qemu-system-x86_64 process, but the
>>>>> results say otherwise.  Any suggestions?
>>>>
>>>>
>>>>
>>>> Most of the time should be spent running the guest.  This is in the
>>>> context
>>>> of process qemu-system-x86_64, but it is not running code from the qemu
>>>> executable.
>>>
>>>
>>> That is what I was trying to say. You said it better.
>>>>
>>>>
>>>> What likely happens is that when the profiling counter expires, it causes
>>>> the VM to exit before the profiling interrupt is delivered.  The sample
>>>> should then be associated to the kvm_intel module.
>>>
>>>
>>> Could the kvm module not read the counters and inject a counter
>>> overflow interrupt into the guest?  What are some ways to make the
>>> profile more accurate? This was collected on an Intel Haswell host
>>> machine.
>>
>>
>>> Yes, it can.  But then you have to run perf/operf in the guest, not on the
>>> host.
>>
>> For this to work, you need to specify "-cpu host" on the QEMU command line.
>>
>> Paolo



* Re: Measuring KVM Performance using Hardware Performance Counters
  2014-02-04  3:31           ` Paolo Bonzini
@ 2014-02-04 20:53             ` Xin Tong
  0 siblings, 0 replies; 6+ messages in thread
From: Xin Tong @ 2014-02-04 20:53 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm

On Mon, Feb 3, 2014 at 9:31 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 03/02/2014 18:06, Xin Tong wrote:
>
>>         /.../qemu-system-x86_64 TID 2537 [TID 2537] (877 ticks/71.24%)
>
>
> This is the CPU thread (calls into the KVM modules).

This is the thread that calls kvm_cpu_exec(CPUArchState *env) and issues
the KVM_RUN ioctl.
>
>
>>           /.../vmlinux (395 ticks/45.04%)
>>           /kvm (198 ticks/22.58%)
>>           /kvm_intel (153 ticks/17.45%)
>>           /.../qemu-system-x86_64 (71 ticks/8.10%)
>>           /.../libc-2.15.so (47 ticks/5.36%)
>>           /.../libpthread-2.15.so (12 ticks/1.37%)
>>           /libahci (1 ticks/0.11%)
>>         /.../qemu-system-x86_64 TID 2658 [TID 2658] (200 ticks/16.25%)
>
>
> This is probably the VNC thread.
>
>
>>           /.../vmlinux (190 ticks/95.00%)
>>           /.../libpthread-2.15.so (6 ticks/3.00%)
>>           /.../libc-2.15.so (2 ticks/1.00%)
>>           /.../qemu-system-x86_64 (1 ticks/0.50%)
>>           [vdso] (tgid:2534 range:0x7fff10588000-0x7fff10589fff) (1
>> ticks/0.50%)
>>         /.../qemu-system-x86_64 TID 2534 [TID 2534] (154 ticks/12.51%)
>
>
> This is the main thread (lowest TID of all).

Is this the thread that executes main_loop_wait and handles IO emulation?
>
>
>>     /ksmd [PID 53] (83 ticks/5.36%)
>>     /.../top [PID 2617] (43 ticks/2.78%)
>>     /.../unity-2d-shell [PID 1807] (41 ticks/2.65%)
>
>
> Now, on to your questions:
>
>
>> 2. Why is qemu-system-x86_64 only taking 8.10% of the time? I imagine
>> most of the time should be spent in qemu-system-x86_64, as 403.gcc does
>> not do much IO.
>
>
> This is the same thing I already answered: the time spent running the guest is in
> the context of process qemu-system-x86_64, but it is not running code from
> the qemu executable.  In fact, 8.10% spent in qemu-system-x86_64 is a lot.
> I would expect much less.
>
>
>> 3. Why is so much time spent in vmlinux? The symbols for vmlinux are
>> listed below.
>> 4. Why is so much time spent in kvm_intel and kvm? The symbols for
>> both are listed below.
>
>
> Again, I think I already answered this.  The time spent in vmx_vcpu_run is
> the actual time spent in the guest (17.45*60.13% = 10% roughly).

I see, so this ~10% is the actual useful GCC work done from the
perspective of the guest, and the 8.10% is the time spent in the
qemu-system-x86_64 process itself.
>
> Everything else is overhead introduced by either virtualization or
> profiling.
>
> What SPEC scores are you getting from bare-metal, KVM without oprofile and
> KVM with oprofile?  Is the profile substantially different if you use perf
> instead of oprofile?
>
> Can you run "scripts/kvm/kvm_stat" (from the QEMU tree) while your guest is
> running (under oprofile) and paste the output from that tool?
>

The data below is from a new 403.gcc run on a Haswell Linux machine with
the ref input data set.  The profile was taken over the entire 403.gcc run.
Host Linux kernel (compiled myself) version: Linux aristotle 3.13.1+ #3 SMP Mon
Feb 3 11:16:00 EST 2014 x86_64 x86_64 x86_64 GNU/Linux


Command to run kvm:  kvm ~/disks/ubuntu-natty.qcow2  -nographic
Commands used to profile:
sudo operf -s --separate-thread --event=CPU_CLK_UNHALTED:5000000
--vmlinux=/home/xtong/xtong-kernel/vmlinux
sudo opreport --image-path /home/xtong/xtong-kernel  -l -g -d --xml -o
current.opm


Profiled KVM performance (lower is better): 1005 seconds
Non-profiled KVM performance (lower is better): 998 seconds
Non-profiled run on the host machine (lower is better): 686 seconds
(998/686 is roughly a 1.45x slowdown, so the virtualization overhead is
still quite large in spite of the hardware extensions!).


Running sudo scripts/kvm/kvm_stat gave:
kvm statistics

 kvm_exit                                       105      16
 kvm_entry                                      105      16
 kvm_apic                                        70      10
 kvm_emulate_insn                                35       5
 kvm_eoi                                         35       5
 kvm_mmio                                        35       5
 kvm_inj_virq                                    35       5
 kvm_apic_accept_irq                             35       5


Overall:
      Core0 (53973 ticks/100.00%)
        /.../qemu-system-x86_64 TID 8861 [TID 8861] (38676 ticks/71.66%)
          /.../vmlinux (17426 ticks/45.06%)
          /kvm (7768 ticks/20.08%)
          /kvm_intel (6882 ticks/17.79%)
          /.../qemu-system-x86_64 (3535 ticks/9.14%)
          /.../libc-2.15.so (2384 ticks/6.16%)
          /.../libpthread-2.15.so (537 ticks/1.39%)
          /.../librt-2.15.so (82 ticks/0.21%)
          /.../libglib-2.0.so.0.3200.4 (41 ticks/0.11%)
          [vdso] (tgid:8858 range:0x7fffe31fe000-0x7fffe31fffff) (16
ticks/0.04%)
          /libahci (4 ticks/0.01%)
          /r8169 (1 ticks/0.00%)
        /.../qemu-system-x86_64 TID 8858 [TID 8858] (6745 ticks/12.50%)
        /.../qemu-system-x86_64 TID 8880 [TID 8880] (2945 ticks/5.46%)
        /.../qemu-system-x86_64 TID 8882 [TID 8882] (2835 ticks/5.25%)
        /.../qemu-system-x86_64 TID 8860 [TID 8860] (2772 ticks/5.14%)


vmlinux (45.06%)
CPU_CLK_UNHALTED %     Symbol/Functions
4583             26.30 native_write_msr_safe
2067             11.86 native_read_msr_safe
1148             6.59 __srcu_read_lock
852             4.89 __srcu_read_unlock
758             4.35 native_load_tr_desc
527             3.02 native_load_gdt
492             2.82 fget_light
313             1.80 __set_current_blocked
270             1.55 _raw_spin_lock_irq
198             1.14 recalc_sigpending
194             1.11 system_call
174             1.00 memcpy
164             0.94 rcu_note_context_switch
164             0.94 fput
160             0.92 user_return_notifier_unregister
158             0.91 SyS_ioctl


kvm (20.08%)
CPU_CLK_UNHALTED %     Symbol/Functions
1949             25.09 kvm_arch_vcpu_ioctl_run
421             5.42 kvm_set_shared_msr
394             5.07 paging64_walk_addr_generic
271             3.49 kvm_on_user_return
256             3.30 x86_emulate_instruction
249             3.21 x86_emulate_insn
221             2.85 em_in
212             2.73 x86_decode_insn
156             2.01 init_emulate_ctxt
145             1.87 emulator_pio_in_out.isra.104



kvm-intel (17.79%)
CPU_CLK_UNHALTED %     Symbol/Functions
4006             58.21 vmx_vcpu_run
540             7.85 vmcs_writel
472             6.86 vmx_save_host_state
275             4.00 __vmx_load_host_state.part.43
151             2.19 vmx_handle_exit
149             2.17 vmcs_clear_bits
130             1.89 clear_atomic_switch_msr
109             1.58 vmx_set_interrupt_shadow
109             1.58 vmx_handle_external_intr
102             1.48 vmx_get_rflags


So, 58.21% * 17.79% = 10.3% of the time is spent executing the code from
403.gcc (i.e. doing useful work). Is this correct? If that is the
case, why do I not see a ~10x performance degradation? Also, I see a lot
of time spent in native_write_msr_safe and native_read_msr_safe in
vmlinux; what are these for?
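
To see who is calling these MSR read/write routines, I am planning to
collect call graphs as well, roughly like this (a sketch; the quality of
the call graphs depends on how the host kernel was built):

sudo perf record -a -g -e cycles -c 5000000 sleep 60
sudo perf report    # call chains are shown when recorded with -g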

Thanks,

Xin

> Thanks,
>
> Paolo
>
>>
>>
>>  /.../vmlinux (395 ticks/45.04%):
>> CPU_CLK_UNHALTED %     Symbol/Functions
>> 82               20.76 native_write_msr_safe
>> 46               11.65 native_read_msr_safe
>> 25               6.33 __srcu_read_lock
>> 16               4.05 native_load_tr_desc
>> 12               3.04 __srcu_read_unlock
>> 9               2.28 native_load_gdt
>> 8               2.03 fget_light
>> 8               2.03 _raw_spin_lock_irq
>> 7               1.77 memcpy
>> 6               1.52 _raw_spin_lock
>> 6               1.52 __set_current_blocked
>> 5               1.27 user_return_notifier_unregister
>> 5               1.27 update_cfs_shares
>> 5               1.27 page_fault
>> 5               1.27 bsearch
>>
>> /kvm (198 ticks/22.58%)
>> CPU_CLK_UNHALTED %     Symbol/Functions
>> 42               21.21 kvm_arch_vcpu_ioctl_run
>> 9               4.55 kvm_set_shared_msr
>> 7               3.54 x86_emulate_insn
>> 7               3.54 paging64_walk_addr_generic
>> 7               3.54 kvm_on_user_return
>> 7               3.54 do_insn_fetch
>> 6               3.03 gfn_to_memslot
>> 6               3.03 decode_operand
>> 5               2.53 kvm_io_bus_sort_cmp
>> 5               2.53 kvm_arch_vcpu_load
>> 5               2.53 kvm_apic_accept_pic_intr
>> 4               2.02 kvm_fetch_guest_virt
>> 4               2.02 kvm_arch_vcpu_runnable
>> 4               2.02 emulator_read_gpr
>> 4               2.02 apic_has_pending_timer
>> 3               1.52 x86_decode_insn
>>
>> /kvm_intel (153 ticks/17.45%)
>> CPU_CLK_UNHALTED %     Symbol/Functions
>> 92               60.13 vmx_vcpu_run
>> 9               5.88 vmx_save_host_state
>> 7               4.58 vmx_handle_external_intr
>> 7               4.58 vmcs_writel
>> 6               3.92 vmx_handle_exit
>> 6               3.92 __vmx_load_host_state.part.43
>> 4               2.61 clear_atomic_switch_msr
>> 3               1.96 vmx_vm_has_apicv
>> 3               1.96 vmx_get_rflags
>> 2               1.31 vmx_read_guest_seg_selector
>> 2               1.31 vmcs_clear_bits
>> 2               1.31 skip_emulated_instruction
>> 1               0.65 vmx_set_rflags
>> 1               0.65 vmx_vcpu_load
>>
>> Thank you,
>> Xin
>>
>>
>>
>> On Mon, Feb 3, 2014 at 2:11 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>
>>> On 02/02/2014 16:47, Xin Tong wrote:
>>>
>>>> On Sun, Feb 2, 2014 at 5:37 AM, Paolo Bonzini <pbonzini@redhat.com>
>>>> wrote:
>>>>>
>>>>>
>>>>> On 02/02/2014 03:08, Xin Tong wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> I am getting very weird profile results when running operf on Linux on
>>>>>> the host and profiling a KVM virtual machine running the DaCapo
>>>>>> eclipse benchmark.  I expected a lot of time to be spent in
>>>>>> qemu-system-x86_64, since the instructions from the eclipse benchmark
>>>>>> would be treated as part of the qemu-system-x86_64 process, but the
>>>>>> results say otherwise.  Any suggestions?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Most of the time should be spent running the guest.  This is in the
>>>>> context
>>>>> of process qemu-system-x86_64, but it is not running code from the qemu
>>>>> executable.
>>>>
>>>>
>>>>
>>>> That is what I was trying to say. You said it better.
>>>>>
>>>>>
>>>>>
>>>>> What likely happens is that when the profiling counter expires, it
>>>>> causes
>>>>> the VM to exit before the profiling interrupt is delivered.  The sample
>>>>> should then be associated to the kvm_intel module.
>>>>
>>>>
>>>>
>>>> Could the kvm module not read the counters and inject a counter
>>>> overflow interrupt into the guest?  What are some ways to make the
>>>> profile more accurate? This was collected on an Intel Haswell host
>>>> machine.
>>>
>>>
>>>
>>> Yes, it can.  But then you have to run perf/operf in the guest, not on the
>>> host.
>>>
>>> For this to work, you need to specify "-cpu host" on the QEMU command
>>> line.
>>>
>>> Paolo

