* vm performance degradation after kvm live migration or save-restore with EPT enabled
@ 2013-07-11  9:36 ` Zhanghaoyu (A)
  0 siblings, 0 replies; 52+ messages in thread
From: Zhanghaoyu (A) @ 2013-07-11  9:36 UTC (permalink / raw)
  To: KVM, qemu-devel, cloudfantom, mpetersen, Shouta.Uehara,
	paolo.bonzini, Michael S. Tsirkin
  Cc: Luonengjun, Zanghongyong, Hanweidong, Huangweidong (C)

hi all,

I met a problem similar to the ones below while performing live migration or save-restore tests on the kvm platform (qemu: 1.4.0, host: suse11sp2, guest: suse11sp2), running a telecommunication software suite in the guest:
https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00098.html
http://comments.gmane.org/gmane.comp.emulators.kvm.devel/102506
http://thread.gmane.org/gmane.comp.emulators.kvm.devel/100592
https://bugzilla.kernel.org/show_bug.cgi?id=58771

After a live migration or a virsh restore [savefile], one process's CPU utilization went up by about 30%, resulting in throughput degradation of this process.
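The test was driven with the usual libvirt commands, roughly like this (a minimal sketch; the domain name, destination host and savefile path are placeholders, and the live migration may have been started slightly differently):

    # live migration to another host
    virsh migrate --live <domain> qemu+ssh://<dest-host>/system

    # save/restore on the same host
    virsh save <domain> /path/to/savefile
    virsh restore /path/to/savefile
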
oprofile report on this process in the guest:
pre live migration:
CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
samples  %        app name                 symbol name
248      12.3016  no-vmlinux               (no symbols)
78        3.8690  libc.so.6                memset
68        3.3730  libc.so.6                memcpy
30        1.4881  cscf.scu                 SipMmBufMemAlloc
29        1.4385  libpthread.so.0          pthread_mutex_lock
26        1.2897  cscf.scu                 SipApiGetNextIe
25        1.2401  cscf.scu                 DBFI_DATA_Search
20        0.9921  libpthread.so.0          __pthread_mutex_unlock_usercnt
16        0.7937  cscf.scu                 DLM_FreeSlice
16        0.7937  cscf.scu                 receivemessage
15        0.7440  cscf.scu                 SipSmCopyString
14        0.6944  cscf.scu                 DLM_AllocSlice

post live migration:
CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
samples  %        app name                 symbol name
1586     42.2370  libc.so.6                memcpy
271       7.2170  no-vmlinux               (no symbols)
83        2.2104  libc.so.6                memset
41        1.0919  libpthread.so.0          __pthread_mutex_unlock_usercnt
35        0.9321  cscf.scu                 SipMmBufMemAlloc
29        0.7723  cscf.scu                 DLM_AllocSlice
28        0.7457  libpthread.so.0          pthread_mutex_lock
23        0.6125  cscf.scu                 SipApiGetNextIe
17        0.4527  cscf.scu                 SipSmCopyString
16        0.4261  cscf.scu                 receivemessage
15        0.3995  cscf.scu                 SipcMsgStatHandle
14        0.3728  cscf.scu                 Urilex
12        0.3196  cscf.scu                 DBFI_DATA_Search
12        0.3196  cscf.scu                 SipDsmGetHdrBitValInner
12        0.3196  cscf.scu                 SipSmGetDataFromRefString

So, memcpy costs many more CPU cycles after live migration. After I restarted the process, this problem disappeared. save-restore has a similar problem.
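The host-side numbers below come from perf stat on the vcpu thread; a rough sketch of the invocation (the thread id and the exact event list are inferred from the output, so treat them as assumptions rather than the real script):

    perf stat -t 21082 \
        -e page-faults,minor-faults,major-faults,cs,migrations,alignment-faults,emulation-faults \
        -e L1-dcache-loads,L1-dcache-load-misses,L1-dcache-stores,L1-dcache-store-misses \
        -e L1-icache-loads,L1-icache-load-misses \
        -e LLC-loads,LLC-load-misses,LLC-stores,LLC-store-misses,LLC-prefetches,LLC-prefetch-misses \
        -e dTLB-loads,dTLB-load-misses,dTLB-stores,dTLB-store-misses \
        -- sleep 60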

perf stat on the vcpu thread in the host:
pre live migration:
Performance counter stats for thread id '21082':

                 0 page-faults
                 0 minor-faults
                 0 major-faults
             31616 cs
               506 migrations
                 0 alignment-faults
                 0 emulation-faults
        5075957539 L1-dcache-loads                                              [21.32%]
         324685106 L1-dcache-load-misses     #    6.40% of all L1-dcache hits   [21.85%]
        3681777120 L1-dcache-stores                                             [21.65%]
          65251823 L1-dcache-store-misses    # 1.77%                                   [22.78%]
                 0 L1-dcache-prefetches                                         [22.84%]
                 0 L1-dcache-prefetch-misses                                    [22.32%]
        9321652613 L1-icache-loads                                              [22.60%]
        1353418869 L1-icache-load-misses     #   14.52% of all L1-icache hits   [21.92%]
         169126969 LLC-loads                                                    [21.87%]
          12583605 LLC-load-misses           #    7.44% of all LL-cache hits    [ 5.84%]
         132853447 LLC-stores                                                   [ 6.61%]
          10601171 LLC-store-misses          #7.9%                                   [ 5.01%]
          25309497 LLC-prefetches             #30%                                  [ 4.96%]
           7723198 LLC-prefetch-misses                                          [ 6.04%]
        4954075817 dTLB-loads                                                   [11.56%]
          26753106 dTLB-load-misses          #    0.54% of all dTLB cache hits  [16.80%]
        3553702874 dTLB-stores                                                  [22.37%]
           4720313 dTLB-store-misses        #0.13%                                    [21.46%]
     <not counted> dTLB-prefetches
     <not counted> dTLB-prefetch-misses

      60.000920666 seconds time elapsed

post live migration:
Performance counter stats for thread id '1579':

                 0 page-faults                                                  [100.00%]
                 0 minor-faults                                                 [100.00%]
                 0 major-faults                                                 [100.00%]
             34979 cs                                                           [100.00%]
               441 migrations                                                   [100.00%]
                 0 alignment-faults                                             [100.00%]
                 0 emulation-faults
        6903585501 L1-dcache-loads                                              [22.06%]
         525939560 L1-dcache-load-misses     #    7.62% of all L1-dcache hits   [21.97%]
        5042552685 L1-dcache-stores                                             [22.20%]
          94493742 L1-dcache-store-misses    #1.8%                                   [22.06%]
                 0 L1-dcache-prefetches                                         [22.39%]
                 0 L1-dcache-prefetch-misses                                    [22.47%]
       13022953030 L1-icache-loads                                              [22.25%]
        1957161101 L1-icache-load-misses     #   15.03% of all L1-icache hits   [22.47%]
         348479792 LLC-loads                                                    [22.27%]
          80662778 LLC-load-misses           #   23.15% of all LL-cache hits    [ 5.64%]
         198745620 LLC-stores                                                   [ 5.63%]
          14236497 LLC-store-misses          #   7.1%                                    [ 5.41%]
          20757435 LLC-prefetches                                               [ 5.42%]
           5361819 LLC-prefetch-misses       #   25%                                [ 5.69%]
        7235715124 dTLB-loads                                                   [11.26%]
          49895163 dTLB-load-misses          #    0.69% of all dTLB cache hits  [16.96%]
        5168276218 dTLB-stores                                                  [22.44%]
           6765983 dTLB-store-misses        #0.13%                                    [22.24%]
     <not counted> dTLB-prefetches
     <not counted> dTLB-prefetch-misses

The "LLC-load-misses" went up by about 16%. Then, I restarted the process in guest, the perf data back to normal,
Performance counter stats for thread id '1579':

                 0 page-faults                                                  [100.00%]
                 0 minor-faults                                                 [100.00%]
                 0 major-faults                                                 [100.00%]
             30594 cs                                                           [100.00%]
               327 migrations                                                   [100.00%]
                 0 alignment-faults                                             [100.00%]
                 0 emulation-faults
        7707091948 L1-dcache-loads                                              [22.10%]
         559829176 L1-dcache-load-misses     #    7.26% of all L1-dcache hits   [22.28%]
        5976654983 L1-dcache-stores                                             [23.22%]
         160436114 L1-dcache-store-misses                                       [22.80%]
                 0 L1-dcache-prefetches                                         [22.51%]
                 0 L1-dcache-prefetch-misses                                    [22.53%]
       13798415672 L1-icache-loads                                              [22.28%]
        2017724676 L1-icache-load-misses     #   14.62% of all L1-icache hits   [22.49%]
         254598008 LLC-loads                                                    [22.86%]
          16035378 LLC-load-misses           #    6.30% of all LL-cache hits    [ 5.36%]
         307019606 LLC-stores                                                   [ 5.60%]
          13665033 LLC-store-misses                                             [ 5.43%]
          17715554 LLC-prefetches                                               [ 5.57%]
           4187006 LLC-prefetch-misses                                          [ 5.44%]
        7811502895 dTLB-loads                                                   [10.72%]
          40547330 dTLB-load-misses          #    0.52% of all dTLB cache hits  [16.31%]
        6144202516 dTLB-stores                                                  [21.58%]
           6313363 dTLB-store-misses                                            [21.91%]
     <not counted> dTLB-prefetches
     <not counted> dTLB-prefetch-misses

      60.000812523 seconds time elapsed

If EPT is disabled, this problem is gone.

I suspect that the kvm hypervisor has something to do with this problem.
Based on that suspicion, I want to find the two adjacent versions of kvm-kmod, one of which triggers this problem and one of which does not (e.g. 2.6.39, 3.0-rc1),
and either analyze the differences between these two versions, or apply the patches between these two versions by bisection, to finally find the key patches.
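In command terms the plan would look roughly like this (a sketch only; the good/bad versions are just the examples above, and the kvm modules have to be rebuilt and reloaded before every test run):

    git bisect start
    git bisect bad  v3.0-rc1      # first version assumed to show the problem
    git bisect good v2.6.39       # last version assumed to behave well
    # at each step: rebuild kvm.ko/kvm-intel.ko, reload them, redo the
    # migration test, then tell git the result:
    git bisect good               # or: git bisect bad
    # repeat until git prints the first bad commit, then:
    git bisect reset
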

Any better ideas?

Thanks,
Zhang Haoyu

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: vm performance degradation after kvm live migration or save-restore with EPT enabled
  2013-07-11  9:36 ` [Qemu-devel] " Zhanghaoyu (A)
@ 2013-07-11 10:28   ` Michael S. Tsirkin
  -1 siblings, 0 replies; 52+ messages in thread
From: Michael S. Tsirkin @ 2013-07-11 10:28 UTC (permalink / raw)
  To: Zhanghaoyu (A)
  Cc: KVM, qemu-devel, cloudfantom, mpetersen, Shouta.Uehara,
	paolo.bonzini, Luonengjun, Zanghongyong, Hanweidong,
	Huangweidong (C)

On Thu, Jul 11, 2013 at 09:36:47AM +0000, Zhanghaoyu (A) wrote:
> hi all,
> 
> I met similar problem to these, while performing live migration or save-restore test on the kvm platform (qemu:1.4.0, host:suse11sp2, guest:suse11sp2), running tele-communication software suite in guest,
> https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00098.html
> http://comments.gmane.org/gmane.comp.emulators.kvm.devel/102506
> http://thread.gmane.org/gmane.comp.emulators.kvm.devel/100592
> https://bugzilla.kernel.org/show_bug.cgi?id=58771
> 
> After live migration or virsh restore [savefile], one process's CPU utilization went up by about 30%, resulted in throughput degradation of this process.
> oprofile report on this process in guest,
> pre live migration:
> CPU: CPU with timer interrupt, speed 0 MHz (estimated)
> Profiling through timer interrupt
> samples  %        app name                 symbol name
> 248      12.3016  no-vmlinux               (no symbols)
> 78        3.8690  libc.so.6                memset
> 68        3.3730  libc.so.6                memcpy
> 30        1.4881  cscf.scu                 SipMmBufMemAlloc
> 29        1.4385  libpthread.so.0          pthread_mutex_lock
> 26        1.2897  cscf.scu                 SipApiGetNextIe
> 25        1.2401  cscf.scu                 DBFI_DATA_Search
> 20        0.9921  libpthread.so.0          __pthread_mutex_unlock_usercnt
> 16        0.7937  cscf.scu                 DLM_FreeSlice
> 16        0.7937  cscf.scu                 receivemessage
> 15        0.7440  cscf.scu                 SipSmCopyString
> 14        0.6944  cscf.scu                 DLM_AllocSlice
> 
> post live migration:
> CPU: CPU with timer interrupt, speed 0 MHz (estimated)
> Profiling through timer interrupt
> samples  %        app name                 symbol name
> 1586     42.2370  libc.so.6                memcpy
> 271       7.2170  no-vmlinux               (no symbols)
> 83        2.2104  libc.so.6                memset
> 41        1.0919  libpthread.so.0          __pthread_mutex_unlock_usercnt
> 35        0.9321  cscf.scu                 SipMmBufMemAlloc
> 29        0.7723  cscf.scu                 DLM_AllocSlice
> 28        0.7457  libpthread.so.0          pthread_mutex_lock
> 23        0.6125  cscf.scu                 SipApiGetNextIe
> 17        0.4527  cscf.scu                 SipSmCopyString
> 16        0.4261  cscf.scu                 receivemessage
> 15        0.3995  cscf.scu                 SipcMsgStatHandle
> 14        0.3728  cscf.scu                 Urilex
> 12        0.3196  cscf.scu                 DBFI_DATA_Search
> 12        0.3196  cscf.scu                 SipDsmGetHdrBitValInner
> 12        0.3196  cscf.scu                 SipSmGetDataFromRefString
> 
> So, memcpy costs much more cpu cycles after live migration. Then, I restart the process, this problem disappeared. save-restore has the similar problem.
> 
> perf report on vcpu thread in host,
> pre live migration:
> Performance counter stats for thread id '21082':
> 
>                  0 page-faults
>                  0 minor-faults
>                  0 major-faults
>              31616 cs
>                506 migrations
>                  0 alignment-faults
>                  0 emulation-faults
>         5075957539 L1-dcache-loads                                              [21.32%]
>          324685106 L1-dcache-load-misses     #    6.40% of all L1-dcache hits   [21.85%]
>         3681777120 L1-dcache-stores                                             [21.65%]
>           65251823 L1-dcache-store-misses    # 1.77%                                   [22.78%]
>                  0 L1-dcache-prefetches                                         [22.84%]
>                  0 L1-dcache-prefetch-misses                                    [22.32%]
>         9321652613 L1-icache-loads                                              [22.60%]
>         1353418869 L1-icache-load-misses     #   14.52% of all L1-icache hits   [21.92%]
>          169126969 LLC-loads                                                    [21.87%]
>           12583605 LLC-load-misses           #    7.44% of all LL-cache hits    [ 5.84%]
>          132853447 LLC-stores                                                   [ 6.61%]
>           10601171 LLC-store-misses          #7.9%                                   [ 5.01%]
>           25309497 LLC-prefetches             #30%                                  [ 4.96%]
>            7723198 LLC-prefetch-misses                                          [ 6.04%]
>         4954075817 dTLB-loads                                                   [11.56%]
>           26753106 dTLB-load-misses          #    0.54% of all dTLB cache hits  [16.80%]
>         3553702874 dTLB-stores                                                  [22.37%]
>            4720313 dTLB-store-misses        #0.13%                                    [21.46%]
>      <not counted> dTLB-prefetches
>      <not counted> dTLB-prefetch-misses
> 
>       60.000920666 seconds time elapsed
> 
> post live migration:
> Performance counter stats for thread id '1579':
> 
>                  0 page-faults                                                  [100.00%]
>                  0 minor-faults                                                 [100.00%]
>                  0 major-faults                                                 [100.00%]
>              34979 cs                                                           [100.00%]
>                441 migrations                                                   [100.00%]
>                  0 alignment-faults                                             [100.00%]
>                  0 emulation-faults
>         6903585501 L1-dcache-loads                                              [22.06%]
>          525939560 L1-dcache-load-misses     #    7.62% of all L1-dcache hits   [21.97%]
>         5042552685 L1-dcache-stores                                             [22.20%]
>           94493742 L1-dcache-store-misses    #1.8%                                   [22.06%]
>                  0 L1-dcache-prefetches                                         [22.39%]
>                  0 L1-dcache-prefetch-misses                                    [22.47%]
>        13022953030 L1-icache-loads                                              [22.25%]
>         1957161101 L1-icache-load-misses     #   15.03% of all L1-icache hits   [22.47%]
>          348479792 LLC-loads                                                    [22.27%]
>           80662778 LLC-load-misses           #   23.15% of all LL-cache hits    [ 5.64%]
>          198745620 LLC-stores                                                   [ 5.63%]
>           14236497 LLC-store-misses          #   7.1%                                    [ 5.41%]
>           20757435 LLC-prefetches                                               [ 5.42%]
>            5361819 LLC-prefetch-misses       #   25%                                [ 5.69%]
>         7235715124 dTLB-loads                                                   [11.26%]
>           49895163 dTLB-load-misses          #    0.69% of all dTLB cache hits  [16.96%]
>         5168276218 dTLB-stores                                                  [22.44%]
>            6765983 dTLB-store-misses        #0.13%                                    [22.24%]
>      <not counted> dTLB-prefetches
>      <not counted> dTLB-prefetch-misses
> 
> The "LLC-load-misses" went up by about 16%. Then, I restarted the process in guest, the perf data back to normal,
> Performance counter stats for thread id '1579':
> 
>                  0 page-faults                                                  [100.00%]
>                  0 minor-faults                                                 [100.00%]
>                  0 major-faults                                                 [100.00%]
>              30594 cs                                                           [100.00%]
>                327 migrations                                                   [100.00%]
>                  0 alignment-faults                                             [100.00%]
>                  0 emulation-faults
>         7707091948 L1-dcache-loads                                              [22.10%]
>          559829176 L1-dcache-load-misses     #    7.26% of all L1-dcache hits   [22.28%]
>         5976654983 L1-dcache-stores                                             [23.22%]
>          160436114 L1-dcache-store-misses                                       [22.80%]
>                  0 L1-dcache-prefetches                                         [22.51%]
>                  0 L1-dcache-prefetch-misses                                    [22.53%]
>        13798415672 L1-icache-loads                                              [22.28%]
>         2017724676 L1-icache-load-misses     #   14.62% of all L1-icache hits   [22.49%]
>          254598008 LLC-loads                                                    [22.86%]
>           16035378 LLC-load-misses           #    6.30% of all LL-cache hits    [ 5.36%]
>          307019606 LLC-stores                                                   [ 5.60%]
>           13665033 LLC-store-misses                                             [ 5.43%]
>           17715554 LLC-prefetches                                               [ 5.57%]
>            4187006 LLC-prefetch-misses                                          [ 5.44%]
>         7811502895 dTLB-loads                                                   [10.72%]
>           40547330 dTLB-load-misses          #    0.52% of all dTLB cache hits  [16.31%]
>         6144202516 dTLB-stores                                                  [21.58%]
>            6313363 dTLB-store-misses                                            [21.91%]
>      <not counted> dTLB-prefetches
>      <not counted> dTLB-prefetch-misses
> 
>       60.000812523 seconds time elapsed
> 
> If EPT disabled, this problem gone.
> 
> I suspect that kvm hypervisor has business with this problem.
> Based on above suspect, I want to find the two adjacent versions of kvm-kmod which triggers this problem or not (e.g. 2.6.39, 3.0-rc1),
> and analyze the differences between this two versions, or apply the patches between this two versions by bisection method, finally find the key patches.
> 
> Any better ideas?
> 
> Thanks,
> Zhang Haoyu

Does this happen even if you migrate between two qemu instances on the
same CPU?
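
(For instance, with a second qemu started on the same host and the migration going over loopback; this is just a sketch of what I mean, not the exact command line used here:)

    # destination instance, pinned to the same physical CPU as the source,
    # started with the same options and waiting for the incoming migration
    taskset -c 0 qemu-system-x86_64 ... -incoming tcp:127.0.0.1:4444

    # then, on the source qemu monitor
    (qemu) migrate -d tcp:127.0.0.1:4444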

-- 
MST

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: vm performance degradation after kvm live migration or save-restore with EPT enabled
  2013-07-11  9:36 ` [Qemu-devel] " Zhanghaoyu (A)
@ 2013-07-11 10:39   ` Gleb Natapov
  -1 siblings, 0 replies; 52+ messages in thread
From: Gleb Natapov @ 2013-07-11 10:39 UTC (permalink / raw)
  To: Zhanghaoyu (A)
  Cc: KVM, qemu-devel, cloudfantom, mpetersen, Shouta.Uehara,
	paolo.bonzini, Michael S. Tsirkin, Luonengjun, Zanghongyong,
	Hanweidong, Huangweidong (C)

On Thu, Jul 11, 2013 at 09:36:47AM +0000, Zhanghaoyu (A) wrote:
> hi all,
> 
> I met similar problem to these, while performing live migration or save-restore test on the kvm platform (qemu:1.4.0, host:suse11sp2, guest:suse11sp2), running tele-communication software suite in guest,
> https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00098.html
> http://comments.gmane.org/gmane.comp.emulators.kvm.devel/102506
> http://thread.gmane.org/gmane.comp.emulators.kvm.devel/100592
> https://bugzilla.kernel.org/show_bug.cgi?id=58771
> 
> After live migration or virsh restore [savefile], one process's CPU utilization went up by about 30%, resulted in throughput degradation of this process.
> oprofile report on this process in guest,
> pre live migration:
> CPU: CPU with timer interrupt, speed 0 MHz (estimated)
> Profiling through timer interrupt
> samples  %        app name                 symbol name
> 248      12.3016  no-vmlinux               (no symbols)
> 78        3.8690  libc.so.6                memset
> 68        3.3730  libc.so.6                memcpy
> 30        1.4881  cscf.scu                 SipMmBufMemAlloc
> 29        1.4385  libpthread.so.0          pthread_mutex_lock
> 26        1.2897  cscf.scu                 SipApiGetNextIe
> 25        1.2401  cscf.scu                 DBFI_DATA_Search
> 20        0.9921  libpthread.so.0          __pthread_mutex_unlock_usercnt
> 16        0.7937  cscf.scu                 DLM_FreeSlice
> 16        0.7937  cscf.scu                 receivemessage
> 15        0.7440  cscf.scu                 SipSmCopyString
> 14        0.6944  cscf.scu                 DLM_AllocSlice
> 
> post live migration:
> CPU: CPU with timer interrupt, speed 0 MHz (estimated)
> Profiling through timer interrupt
> samples  %        app name                 symbol name
> 1586     42.2370  libc.so.6                memcpy
> 271       7.2170  no-vmlinux               (no symbols)
> 83        2.2104  libc.so.6                memset
> 41        1.0919  libpthread.so.0          __pthread_mutex_unlock_usercnt
> 35        0.9321  cscf.scu                 SipMmBufMemAlloc
> 29        0.7723  cscf.scu                 DLM_AllocSlice
> 28        0.7457  libpthread.so.0          pthread_mutex_lock
> 23        0.6125  cscf.scu                 SipApiGetNextIe
> 17        0.4527  cscf.scu                 SipSmCopyString
> 16        0.4261  cscf.scu                 receivemessage
> 15        0.3995  cscf.scu                 SipcMsgStatHandle
> 14        0.3728  cscf.scu                 Urilex
> 12        0.3196  cscf.scu                 DBFI_DATA_Search
> 12        0.3196  cscf.scu                 SipDsmGetHdrBitValInner
> 12        0.3196  cscf.scu                 SipSmGetDataFromRefString
> 
> So, memcpy costs much more cpu cycles after live migration. Then, I restart the process, this problem disappeared. save-restore has the similar problem.
> 
Does the slowdown persist several minutes after restore? Can you check
how many hugepages are used by the qemu process before/after save/restore?
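
(For example, something along these lines on the host, run before and after the save/restore; the qemu process name is an assumption:)

    # anonymous huge pages actually backing the qemu process, in kB
    grep AnonHugePages /proc/$(pidof qemu-kvm)/smaps | awk '{sum += $2} END {print sum, "kB"}'

    # system-wide THP state
    grep -i huge /proc/meminfo
    cat /sys/kernel/mm/transparent_hugepage/enabled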

> perf report on vcpu thread in host,
> pre live migration:
> Performance counter stats for thread id '21082':
> 
>                  0 page-faults
>                  0 minor-faults
>                  0 major-faults
>              31616 cs
>                506 migrations
>                  0 alignment-faults
>                  0 emulation-faults
>         5075957539 L1-dcache-loads                                              [21.32%]
>          324685106 L1-dcache-load-misses     #    6.40% of all L1-dcache hits   [21.85%]
>         3681777120 L1-dcache-stores                                             [21.65%]
>           65251823 L1-dcache-store-misses    # 1.77%                                   [22.78%]
>                  0 L1-dcache-prefetches                                         [22.84%]
>                  0 L1-dcache-prefetch-misses                                    [22.32%]
>         9321652613 L1-icache-loads                                              [22.60%]
>         1353418869 L1-icache-load-misses     #   14.52% of all L1-icache hits   [21.92%]
>          169126969 LLC-loads                                                    [21.87%]
>           12583605 LLC-load-misses           #    7.44% of all LL-cache hits    [ 5.84%]
>          132853447 LLC-stores                                                   [ 6.61%]
>           10601171 LLC-store-misses          #7.9%                                   [ 5.01%]
>           25309497 LLC-prefetches             #30%                                  [ 4.96%]
>            7723198 LLC-prefetch-misses                                          [ 6.04%]
>         4954075817 dTLB-loads                                                   [11.56%]
>           26753106 dTLB-load-misses          #    0.54% of all dTLB cache hits  [16.80%]
>         3553702874 dTLB-stores                                                  [22.37%]
>            4720313 dTLB-store-misses        #0.13%                                    [21.46%]
>      <not counted> dTLB-prefetches
>      <not counted> dTLB-prefetch-misses
> 
>       60.000920666 seconds time elapsed
> 
> post live migration:
> Performance counter stats for thread id '1579':
> 
>                  0 page-faults                                                  [100.00%]
>                  0 minor-faults                                                 [100.00%]
>                  0 major-faults                                                 [100.00%]
>              34979 cs                                                           [100.00%]
>                441 migrations                                                   [100.00%]
>                  0 alignment-faults                                             [100.00%]
>                  0 emulation-faults
>         6903585501 L1-dcache-loads                                              [22.06%]
>          525939560 L1-dcache-load-misses     #    7.62% of all L1-dcache hits   [21.97%]
>         5042552685 L1-dcache-stores                                             [22.20%]
>           94493742 L1-dcache-store-misses    #1.8%                                   [22.06%]
>                  0 L1-dcache-prefetches                                         [22.39%]
>                  0 L1-dcache-prefetch-misses                                    [22.47%]
>        13022953030 L1-icache-loads                                              [22.25%]
>         1957161101 L1-icache-load-misses     #   15.03% of all L1-icache hits   [22.47%]
>          348479792 LLC-loads                                                    [22.27%]
>           80662778 LLC-load-misses           #   23.15% of all LL-cache hits    [ 5.64%]
>          198745620 LLC-stores                                                   [ 5.63%]
>           14236497 LLC-store-misses          #   7.1%                                    [ 5.41%]
>           20757435 LLC-prefetches                                               [ 5.42%]
>            5361819 LLC-prefetch-misses       #   25%                                [ 5.69%]
>         7235715124 dTLB-loads                                                   [11.26%]
>           49895163 dTLB-load-misses          #    0.69% of all dTLB cache hits  [16.96%]
>         5168276218 dTLB-stores                                                  [22.44%]
>            6765983 dTLB-store-misses        #0.13%                                    [22.24%]
>      <not counted> dTLB-prefetches
>      <not counted> dTLB-prefetch-misses
> 
> The "LLC-load-misses" went up by about 16%. Then, I restarted the process in guest, the perf data back to normal,
The amount of LLC-loads doubles too, so this can explain the LLC-load-misses
increase.

> Performance counter stats for thread id '1579':
> 
>                  0 page-faults                                                  [100.00%]
>                  0 minor-faults                                                 [100.00%]
>                  0 major-faults                                                 [100.00%]
>              30594 cs                                                           [100.00%]
>                327 migrations                                                   [100.00%]
>                  0 alignment-faults                                             [100.00%]
>                  0 emulation-faults
>         7707091948 L1-dcache-loads                                              [22.10%]
>          559829176 L1-dcache-load-misses     #    7.26% of all L1-dcache hits   [22.28%]
>         5976654983 L1-dcache-stores                                             [23.22%]
>          160436114 L1-dcache-store-misses                                       [22.80%]
>                  0 L1-dcache-prefetches                                         [22.51%]
>                  0 L1-dcache-prefetch-misses                                    [22.53%]
>        13798415672 L1-icache-loads                                              [22.28%]
>         2017724676 L1-icache-load-misses     #   14.62% of all L1-icache hits   [22.49%]
>          254598008 LLC-loads                                                    [22.86%]
>           16035378 LLC-load-misses           #    6.30% of all LL-cache hits    [ 5.36%]
>          307019606 LLC-stores                                                   [ 5.60%]
>           13665033 LLC-store-misses                                             [ 5.43%]
>           17715554 LLC-prefetches                                               [ 5.57%]
>            4187006 LLC-prefetch-misses                                          [ 5.44%]
>         7811502895 dTLB-loads                                                   [10.72%]
>           40547330 dTLB-load-misses          #    0.52% of all dTLB cache hits  [16.31%]
>         6144202516 dTLB-stores                                                  [21.58%]
>            6313363 dTLB-store-misses                                            [21.91%]
>      <not counted> dTLB-prefetches
>      <not counted> dTLB-prefetch-misses
> 
>       60.000812523 seconds time elapsed
> 
So the performance is back to normal after process restart?

> If EPT disabled, this problem gone.
> 
> I suspect that kvm hypervisor has business with this problem.
> Based on above suspect, I want to find the two adjacent versions of kvm-kmod which triggers this problem or not (e.g. 2.6.39, 3.0-rc1),
> and analyze the differences between this two versions, or apply the patches between this two versions by bisection method, finally find the key patches.
> 
> Any better ideas?
> 
Provide "perf record -g" information before and after migration.

--
			Gleb.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: vm performance degradation after kvm live migration or save-restore with ETP enabled
  2013-07-11  9:36 ` [Qemu-devel] " Zhanghaoyu (A)
@ 2013-07-11 10:39   ` Xiao Guangrong
  -1 siblings, 0 replies; 52+ messages in thread
From: Xiao Guangrong @ 2013-07-11 10:39 UTC (permalink / raw)
  To: Zhanghaoyu (A)
  Cc: KVM, qemu-devel, cloudfantom, mpetersen, Shouta.Uehara,
	paolo.bonzini, Michael S. Tsirkin, Luonengjun, Zanghongyong,
	Hanweidong, Huangweidong (C)

Hi,

Could you please test this patch?

>From 48df7db2ec2721e35d024a8d9850dbb34b557c1c Mon Sep 17 00:00:00 2001
From: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Date: Thu, 6 Sep 2012 16:56:01 +0800
Subject: [PATCH 10/11] using huge page on fast page fault path

---
 arch/x86/kvm/mmu.c |   27 ++++++++++++++++++++-------
 1 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 6945ef4..7d177c7 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2663,6 +2663,13 @@ static int kvm_handle_bad_page(struct kvm_vcpu *vcpu, gfn_t gfn, pfn_t pfn)
 	return -EFAULT;
 }

+static bool pfn_can_adjust(pfn_t pfn, int level)
+{
+	return !is_error_pfn(pfn) && !kvm_is_mmio_pfn(pfn) &&
+		   level == PT_PAGE_TABLE_LEVEL &&
+		      PageTransCompound(pfn_to_page(pfn));
+}
+
 static void transparent_hugepage_adjust(struct kvm_vcpu *vcpu,
 					gfn_t *gfnp, pfn_t *pfnp, int *levelp)
 {
@@ -2676,10 +2683,8 @@ static void transparent_hugepage_adjust(struct kvm_vcpu *vcpu,
 	 * PT_PAGE_TABLE_LEVEL and there would be no adjustment done
 	 * here.
 	 */
-	if (!is_error_pfn(pfn) && !kvm_is_mmio_pfn(pfn) &&
-	    level == PT_PAGE_TABLE_LEVEL &&
-	    PageTransCompound(pfn_to_page(pfn)) &&
-	    !has_wrprotected_page(vcpu->kvm, gfn, PT_DIRECTORY_LEVEL)) {
+	if (pfn_can_adjust(pfn, level) &&
+	      !has_wrprotected_page(vcpu->kvm, gfn, PT_DIRECTORY_LEVEL)) {
 		unsigned long mask;
 		/*
 		 * mmu_notifier_retry was successful and we hold the
@@ -2768,7 +2773,7 @@ fast_pf_fix_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep, u64 spte)
  * - false: let the real page fault path to fix it.
  */
 static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level,
-			    u32 error_code)
+			    u32 error_code, bool force_pt_level)
 {
 	struct kvm_shadow_walk_iterator iterator;
 	bool ret = false;
@@ -2795,6 +2800,14 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level,
 		goto exit;

 	/*
+	 * Let the real page fault path change the mapping if large
+	 * mapping is allowed, for example, the memslot dirty log is
+	 * disabled.
+	 */
+	if (!force_pt_level && pfn_can_adjust(spte_to_pfn(spte), level))
+		goto exit;
+
+	/*
 	 * Check if it is a spurious fault caused by TLB lazily flushed.
 	 *
 	 * Need not check the access of upper level table entries since
@@ -2854,7 +2867,7 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, u32 error_code,
 	} else
 		level = PT_PAGE_TABLE_LEVEL;

-	if (fast_page_fault(vcpu, v, level, error_code))
+	if (fast_page_fault(vcpu, v, level, error_code, force_pt_level))
 		return 0;

 	mmu_seq = vcpu->kvm->mmu_notifier_seq;
@@ -3323,7 +3336,7 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
 	} else
 		level = PT_PAGE_TABLE_LEVEL;

-	if (fast_page_fault(vcpu, gpa, level, error_code))
+	if (fast_page_fault(vcpu, gpa, level, error_code, force_pt_level))
 		return 0;

 	mmu_seq = vcpu->kvm->mmu_notifier_seq;
-- 
1.7.7.6
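
As an aside, since the patch is about keeping huge pages on the fast page fault path, one way to check whether the guest memory is actually backed by transparent huge pages before and after the migration/restore is to look at the qemu process on the host. A minimal sketch (the process name and pid lookup are only illustrative; on SLES the binary may be qemu-kvm rather than qemu-system-x86_64):

  # host-wide THP usage
  grep AnonHugePages /proc/meminfo
  # THP usage of the qemu process backing the guest
  QEMU_PID=$(pgrep -f qemu-system-x86_64 | head -n1)
  grep AnonHugePages /proc/$QEMU_PID/smaps | awk '{sum += $2} END {print sum, "kB"}'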


On 07/11/2013 05:36 PM, Zhanghaoyu (A) wrote:
> hi all,
> 
> I met similar problem to these, while performing live migration or save-restore test on the kvm platform (qemu:1.4.0, host:suse11sp2, guest:suse11sp2), running tele-communication software suite in guest,
> https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00098.html
> http://comments.gmane.org/gmane.comp.emulators.kvm.devel/102506
> http://thread.gmane.org/gmane.comp.emulators.kvm.devel/100592
> https://bugzilla.kernel.org/show_bug.cgi?id=58771
> 
> After live migration or virsh restore [savefile], one process's CPU utilization went up by about 30%, resulted in throughput degradation of this process.
> oprofile report on this process in guest,
> pre live migration:
> CPU: CPU with timer interrupt, speed 0 MHz (estimated)
> Profiling through timer interrupt
> samples  %        app name                 symbol name
> 248      12.3016  no-vmlinux               (no symbols)
> 78        3.8690  libc.so.6                memset
> 68        3.3730  libc.so.6                memcpy
> 30        1.4881  cscf.scu                 SipMmBufMemAlloc
> 29        1.4385  libpthread.so.0          pthread_mutex_lock
> 26        1.2897  cscf.scu                 SipApiGetNextIe
> 25        1.2401  cscf.scu                 DBFI_DATA_Search
> 20        0.9921  libpthread.so.0          __pthread_mutex_unlock_usercnt
> 16        0.7937  cscf.scu                 DLM_FreeSlice
> 16        0.7937  cscf.scu                 receivemessage
> 15        0.7440  cscf.scu                 SipSmCopyString
> 14        0.6944  cscf.scu                 DLM_AllocSlice
> 
> post live migration:
> CPU: CPU with timer interrupt, speed 0 MHz (estimated)
> Profiling through timer interrupt
> samples  %        app name                 symbol name
> 1586     42.2370  libc.so.6                memcpy
> 271       7.2170  no-vmlinux               (no symbols)
> 83        2.2104  libc.so.6                memset
> 41        1.0919  libpthread.so.0          __pthread_mutex_unlock_usercnt
> 35        0.9321  cscf.scu                 SipMmBufMemAlloc
> 29        0.7723  cscf.scu                 DLM_AllocSlice
> 28        0.7457  libpthread.so.0          pthread_mutex_lock
> 23        0.6125  cscf.scu                 SipApiGetNextIe
> 17        0.4527  cscf.scu                 SipSmCopyString
> 16        0.4261  cscf.scu                 receivemessage
> 15        0.3995  cscf.scu                 SipcMsgStatHandle
> 14        0.3728  cscf.scu                 Urilex
> 12        0.3196  cscf.scu                 DBFI_DATA_Search
> 12        0.3196  cscf.scu                 SipDsmGetHdrBitValInner
> 12        0.3196  cscf.scu                 SipSmGetDataFromRefString
> 
> So, memcpy costs much more cpu cycles after live migration. Then, I restart the process, this problem disappeared. save-restore has the similar problem.
> 
> perf report on vcpu thread in host,
> pre live migration:
> Performance counter stats for thread id '21082':
> 
>                  0 page-faults
>                  0 minor-faults
>                  0 major-faults
>              31616 cs
>                506 migrations
>                  0 alignment-faults
>                  0 emulation-faults
>         5075957539 L1-dcache-loads                                              [21.32%]
>          324685106 L1-dcache-load-misses     #    6.40% of all L1-dcache hits   [21.85%]
>         3681777120 L1-dcache-stores                                             [21.65%]
>           65251823 L1-dcache-store-misses    # 1.77%                                   [22.78%]
>                  0 L1-dcache-prefetches                                         [22.84%]
>                  0 L1-dcache-prefetch-misses                                    [22.32%]
>         9321652613 L1-icache-loads                                              [22.60%]
>         1353418869 L1-icache-load-misses     #   14.52% of all L1-icache hits   [21.92%]
>          169126969 LLC-loads                                                    [21.87%]
>           12583605 LLC-load-misses           #    7.44% of all LL-cache hits    [ 5.84%]
>          132853447 LLC-stores                                                   [ 6.61%]
>           10601171 LLC-store-misses          #7.9%                                   [ 5.01%]
>           25309497 LLC-prefetches             #30%                                  [ 4.96%]
>            7723198 LLC-prefetch-misses                                          [ 6.04%]
>         4954075817 dTLB-loads                                                   [11.56%]
>           26753106 dTLB-load-misses          #    0.54% of all dTLB cache hits  [16.80%]
>         3553702874 dTLB-stores                                                  [22.37%]
>            4720313 dTLB-store-misses        #0.13%                                    [21.46%]
>      <not counted> dTLB-prefetches
>      <not counted> dTLB-prefetch-misses
> 
>       60.000920666 seconds time elapsed
> 
> post live migration:
> Performance counter stats for thread id '1579':
> 
>                  0 page-faults                                                  [100.00%]
>                  0 minor-faults                                                 [100.00%]
>                  0 major-faults                                                 [100.00%]
>              34979 cs                                                           [100.00%]
>                441 migrations                                                   [100.00%]
>                  0 alignment-faults                                             [100.00%]
>                  0 emulation-faults
>         6903585501 L1-dcache-loads                                              [22.06%]
>          525939560 L1-dcache-load-misses     #    7.62% of all L1-dcache hits   [21.97%]
>         5042552685 L1-dcache-stores                                             [22.20%]
>           94493742 L1-dcache-store-misses    #1.8%                                   [22.06%]
>                  0 L1-dcache-prefetches                                         [22.39%]
>                  0 L1-dcache-prefetch-misses                                    [22.47%]
>        13022953030 L1-icache-loads                                              [22.25%]
>         1957161101 L1-icache-load-misses     #   15.03% of all L1-icache hits   [22.47%]
>          348479792 LLC-loads                                                    [22.27%]
>           80662778 LLC-load-misses           #   23.15% of all LL-cache hits    [ 5.64%]
>          198745620 LLC-stores                                                   [ 5.63%]
>           14236497 LLC-store-misses          #   7.1%                                    [ 5.41%]
>           20757435 LLC-prefetches                                               [ 5.42%]
>            5361819 LLC-prefetch-misses       #   25%                                [ 5.69%]
>         7235715124 dTLB-loads                                                   [11.26%]
>           49895163 dTLB-load-misses          #    0.69% of all dTLB cache hits  [16.96%]
>         5168276218 dTLB-stores                                                  [22.44%]
>            6765983 dTLB-store-misses        #0.13%                                    [22.24%]
>      <not counted> dTLB-prefetches
>      <not counted> dTLB-prefetch-misses
> 
> The "LLC-load-misses" went up by about 16%. Then, I restarted the process in guest, the perf data back to normal,
> Performance counter stats for thread id '1579':
> 
>                  0 page-faults                                                  [100.00%]
>                  0 minor-faults                                                 [100.00%]
>                  0 major-faults                                                 [100.00%]
>              30594 cs                                                           [100.00%]
>                327 migrations                                                   [100.00%]
>                  0 alignment-faults                                             [100.00%]
>                  0 emulation-faults
>         7707091948 L1-dcache-loads                                              [22.10%]
>          559829176 L1-dcache-load-misses     #    7.26% of all L1-dcache hits   [22.28%]
>         5976654983 L1-dcache-stores                                             [23.22%]
>          160436114 L1-dcache-store-misses                                       [22.80%]
>                  0 L1-dcache-prefetches                                         [22.51%]
>                  0 L1-dcache-prefetch-misses                                    [22.53%]
>        13798415672 L1-icache-loads                                              [22.28%]
>         2017724676 L1-icache-load-misses     #   14.62% of all L1-icache hits   [22.49%]
>          254598008 LLC-loads                                                    [22.86%]
>           16035378 LLC-load-misses           #    6.30% of all LL-cache hits    [ 5.36%]
>          307019606 LLC-stores                                                   [ 5.60%]
>           13665033 LLC-store-misses                                             [ 5.43%]
>           17715554 LLC-prefetches                                               [ 5.57%]
>            4187006 LLC-prefetch-misses                                          [ 5.44%]
>         7811502895 dTLB-loads                                                   [10.72%]
>           40547330 dTLB-load-misses          #    0.52% of all dTLB cache hits  [16.31%]
>         6144202516 dTLB-stores                                                  [21.58%]
>            6313363 dTLB-store-misses                                            [21.91%]
>      <not counted> dTLB-prefetches
>      <not counted> dTLB-prefetch-misses
> 
>       60.000812523 seconds time elapsed
> 
> If EPT disabled, this problem gone.
> 
> I suspect that kvm hypervisor has business with this problem.
> Based on above suspect, I want to find the two adjacent versions of kvm-kmod which triggers this problem or not (e.g. 2.6.39, 3.0-rc1),
> and analyze the differences between this two versions, or apply the patches between this two versions by bisection method, finally find the key patches.
> 
> Any better ideas?
> 
> Thanks,
> Zhang Haoyu
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
> 


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with ETP enabled
  2013-07-11  9:36 ` [Qemu-devel] " Zhanghaoyu (A)
@ 2013-07-11 10:51   ` Andreas Färber
  -1 siblings, 0 replies; 52+ messages in thread
From: Andreas Färber @ 2013-07-11 10:51 UTC (permalink / raw)
  To: Zhanghaoyu (A)
  Cc: KVM, qemu-devel, cloudfantom, mpetersen, Shouta.Uehara,
	paolo.bonzini, Michael S. Tsirkin, Huangweidong (C),
	Zanghongyong, Luonengjun, Hanweidong

Hi,

Am 11.07.2013 11:36, schrieb Zhanghaoyu (A):
> I met similar problem to these, while performing live migration or save-restore test on the kvm platform (qemu:1.4.0, host:suse11sp2, guest:suse11sp2), running tele-communication software suite in guest,
> https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00098.html
> http://comments.gmane.org/gmane.comp.emulators.kvm.devel/102506
> http://thread.gmane.org/gmane.comp.emulators.kvm.devel/100592
> https://bugzilla.kernel.org/show_bug.cgi?id=58771
> 
> After live migration or virsh restore [savefile], one process's CPU utilization went up by about 30%, resulted in throughput degradation of this process.
> oprofile report on this process in guest,
> pre live migration:

So far we've been unable to reproduce this with a pure qemu-kvm /
qemu-system-x86_64 command line on several EPT machines, whereas for
virsh it was reported as confirmed. Can you please share the resulting
QEMU command line from the libvirt logs or the process list?

Are both host and guest kernels at 3.0.80 (latest SLES updates)?
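
For reference, a sketch of where that information usually lives, assuming a libvirt-managed guest (the domain name is a placeholder and the log path is libvirt's default one):

  # QEMU command line as built by libvirt
  less /var/log/libvirt/qemu/<domain>.log
  # or, for a running guest, from the process list
  ps -ef | grep [q]emu
  # kernel version on the host (run the same inside the guest)
  uname -r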

Thanks,
Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: vm performance degradation after kvm live migration or save-restore with ETP enabled
  2013-07-11 10:39   ` [Qemu-devel] " Xiao Guangrong
@ 2013-07-11 14:00     ` Zhang Haoyu
  -1 siblings, 0 replies; 52+ messages in thread
From: Zhang Haoyu @ 2013-07-11 14:00 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Zhanghaoyu (A),
	KVM, qemu-devel, cloudfantom, mpetersen, Shouta.Uehara,
	paolo.bonzini, Michael S. Tsirkin, Luonengjun, Zanghongyong,
	Hanweidong, Huangweidong (C)

>Hi,
>
>Could you please test this patch?
>
I tried this patch, but the problem is still there.

Thanks,
Zhang Haoyu

>>From 48df7db2ec2721e35d024a8d9850dbb34b557c1c Mon Sep 17 00:00:00 2001
>From: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
>Date: Thu, 6 Sep 2012 16:56:01 +0800
>Subject: [PATCH 10/11] using huge page on fast page fault path
>
>---
> arch/x86/kvm/mmu.c |   27 ++++++++++++++++++++-------
> 1 files changed, 20 insertions(+), 7 deletions(-)
>
>diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>index 6945ef4..7d177c7 100644
>--- a/arch/x86/kvm/mmu.c
>+++ b/arch/x86/kvm/mmu.c
>@@ -2663,6 +2663,13 @@ static int kvm_handle_bad_page(struct kvm_vcpu *vcpu, gfn_t gfn, pfn_t pfn)
> 	return -EFAULT;
> }
>
>+static bool pfn_can_adjust(pfn_t pfn, int level)
>+{
>+	return !is_error_pfn(pfn) && !kvm_is_mmio_pfn(pfn) &&
>+		   level == PT_PAGE_TABLE_LEVEL &&
>+		      PageTransCompound(pfn_to_page(pfn));
>+}
>+
> static void transparent_hugepage_adjust(struct kvm_vcpu *vcpu,
> 					gfn_t *gfnp, pfn_t *pfnp, int *levelp)
> {
>@@ -2676,10 +2683,8 @@ static void transparent_hugepage_adjust(struct kvm_vcpu *vcpu,
> 	 * PT_PAGE_TABLE_LEVEL and there would be no adjustment done
> 	 * here.
> 	 */
>-	if (!is_error_pfn(pfn) && !kvm_is_mmio_pfn(pfn) &&
>-	    level == PT_PAGE_TABLE_LEVEL &&
>-	    PageTransCompound(pfn_to_page(pfn)) &&
>-	    !has_wrprotected_page(vcpu->kvm, gfn, PT_DIRECTORY_LEVEL)) {
>+	if (pfn_can_adjust(pfn, level) &&
>+	      !has_wrprotected_page(vcpu->kvm, gfn, PT_DIRECTORY_LEVEL)) {
> 		unsigned long mask;
> 		/*
> 		 * mmu_notifier_retry was successful and we hold the
>@@ -2768,7 +2773,7 @@ fast_pf_fix_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep, u64 spte)
>  * - false: let the real page fault path to fix it.
>  */
> static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level,
>-			    u32 error_code)
>+			    u32 error_code, bool force_pt_level)
> {
> 	struct kvm_shadow_walk_iterator iterator;
> 	bool ret = false;
>@@ -2795,6 +2800,14 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level,
> 		goto exit;
>
> 	/*
>+	 * Let the real page fault path change the mapping if large
>+	 * mapping is allowed, for example, the memslot dirty log is
>+	 * disabled.
>+	 */
>+	if (!force_pt_level && pfn_can_adjust(spte_to_pfn(spte), level))
>+		goto exit;
>+
>+	/*
> 	 * Check if it is a spurious fault caused by TLB lazily flushed.
> 	 *
> 	 * Need not check the access of upper level table entries since
>@@ -2854,7 +2867,7 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, u32 error_code,
> 	} else
> 		level = PT_PAGE_TABLE_LEVEL;
>
>-	if (fast_page_fault(vcpu, v, level, error_code))
>+	if (fast_page_fault(vcpu, v, level, error_code, force_pt_level))
> 		return 0;
>
> 	mmu_seq = vcpu->kvm->mmu_notifier_seq;
>@@ -3323,7 +3336,7 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
> 	} else
> 		level = PT_PAGE_TABLE_LEVEL;
>
>-	if (fast_page_fault(vcpu, gpa, level, error_code))
>+	if (fast_page_fault(vcpu, gpa, level, error_code, force_pt_level))
> 		return 0;
>
> 	mmu_seq = vcpu->kvm->mmu_notifier_seq;
>--
>1.7.7.6
>
> On 07/11/2013 05:36 PM, Zhanghaoyu (A) wrote:
>> hi all,
>>
>> I met similar problem to these, while performing live migration or save-restore test on the kvm platform (qemu:1.4.0, host:suse11sp2, guest:suse11sp2), running tele-communication software suite in guest,
>> https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00098.html
>> http://comments.gmane.org/gmane.comp.emulators.kvm.devel/102506
>> http://thread.gmane.org/gmane.comp.emulators.kvm.devel/100592
>> https://bugzilla.kernel.org/show_bug.cgi?id=58771
>>
>> After live migration or virsh restore [savefile], one process's CPU utilization went up by about 30%, resulted in throughput degradation of this process.
>> oprofile report on this process in guest,
>> pre live migration:
>> CPU: CPU with timer interrupt, speed 0 MHz (estimated)
>> Profiling through timer interrupt
>> samples  %        app name                 symbol name
>> 248      12.3016  no-vmlinux               (no symbols)
>> 78        3.8690  libc.so.6                memset
>> 68        3.3730  libc.so.6                memcpy
>> 30        1.4881  cscf.scu                 SipMmBufMemAlloc
>> 29        1.4385  libpthread.so.0          pthread_mutex_lock
>> 26        1.2897  cscf.scu                 SipApiGetNextIe
>> 25        1.2401  cscf.scu                 DBFI_DATA_Search
>> 20        0.9921  libpthread.so.0          __pthread_mutex_unlock_usercnt
>> 16        0.7937  cscf.scu                 DLM_FreeSlice
>> 16        0.7937  cscf.scu                 receivemessage
>> 15        0.7440  cscf.scu                 SipSmCopyString
>> 14        0.6944  cscf.scu                 DLM_AllocSlice
>>
>> post live migration:
>> CPU: CPU with timer interrupt, speed 0 MHz (estimated)
>> Profiling through timer interrupt
>> samples  %        app name                 symbol name
>> 1586     42.2370  libc.so.6                memcpy
>> 271       7.2170  no-vmlinux               (no symbols)
>> 83        2.2104  libc.so.6                memset
>> 41        1.0919  libpthread.so.0          __pthread_mutex_unlock_usercnt
>> 35        0.9321  cscf.scu                 SipMmBufMemAlloc
>> 29        0.7723  cscf.scu                 DLM_AllocSlice
>> 28        0.7457  libpthread.so.0          pthread_mutex_lock
>> 23        0.6125  cscf.scu                 SipApiGetNextIe
>> 17        0.4527  cscf.scu                 SipSmCopyString
>> 16        0.4261  cscf.scu                 receivemessage
>> 15        0.3995  cscf.scu                 SipcMsgStatHandle
>> 14        0.3728  cscf.scu                 Urilex
>> 12        0.3196  cscf.scu                 DBFI_DATA_Search
>> 12        0.3196  cscf.scu                 SipDsmGetHdrBitValInner
>> 12        0.3196  cscf.scu                 SipSmGetDataFromRefString
>>
>> So, memcpy costs much more cpu cycles after live migration. Then, I restart the process, this problem disappeared. save-restore has the similar problem.
>>
>> perf report on vcpu thread in host,
>> pre live migration:
>> Performance counter stats for thread id '21082':
>>
>>                  0 page-faults
>>                  0 minor-faults
>>                  0 major-faults
>>              31616 cs
>>                506 migrations
>>                  0 alignment-faults
>>                  0 emulation-faults
>>         5075957539 L1-dcache-loads                                              [21.32%]
>>          324685106 L1-dcache-load-misses     #    6.40% of all L1-dcache hits   [21.85%]
>>         3681777120 L1-dcache-stores                                             [21.65%]
>>           65251823 L1-dcache-store-misses    # 1.77%                                   [22.78%]
>>                  0 L1-dcache-prefetches                                         [22.84%]
>>                  0 L1-dcache-prefetch-misses                                    [22.32%]
>>         9321652613 L1-icache-loads                                              [22.60%]
>>         1353418869 L1-icache-load-misses     #   14.52% of all L1-icache hits   [21.92%]
>>          169126969 LLC-loads                                                    [21.87%]
>>           12583605 LLC-load-misses           #    7.44% of all LL-cache hits    [ 5.84%]
>>          132853447 LLC-stores                                                   [ 6.61%]
>>           10601171 LLC-store-misses          #7.9%                                   [ 5.01%]
>>           25309497 LLC-prefetches             #30%                                  [ 4.96%]
>>            7723198 LLC-prefetch-misses                                          [ 6.04%]
>>         4954075817 dTLB-loads                                                   [11.56%]
>>           26753106 dTLB-load-misses          #    0.54% of all dTLB cache hits  [16.80%]
>>         3553702874 dTLB-stores                                                  [22.37%]
>>            4720313 dTLB-store-misses        #0.13%                                    [21.46%]
>>      <not counted> dTLB-prefetches
>>      <not counted> dTLB-prefetch-misses
>>
>>       60.000920666 seconds time elapsed
>>
>> post live migration:
>> Performance counter stats for thread id '1579':
>>
>>                  0 page-faults                                                  [100.00%]
>>                  0 minor-faults                                                 [100.00%]
>>                  0 major-faults                                                 [100.00%]
>>              34979 cs                                                           [100.00%]
>>                441 migrations                                                   [100.00%]
>>                  0 alignment-faults                                             [100.00%]
>>                  0 emulation-faults
>>         6903585501 L1-dcache-loads                                              [22.06%]
>>          525939560 L1-dcache-load-misses     #    7.62% of all L1-dcache hits   [21.97%]
>>         5042552685 L1-dcache-stores                                             [22.20%]
>>           94493742 L1-dcache-store-misses    #1.8%                                   [22.06%]
>>                  0 L1-dcache-prefetches                                         [22.39%]
>>                  0 L1-dcache-prefetch-misses                                    [22.47%]
>>        13022953030 L1-icache-loads                                              [22.25%]
>>         1957161101 L1-icache-load-misses     #   15.03% of all L1-icache hits   [22.47%]
>>          348479792 LLC-loads                                                    [22.27%]
>>           80662778 LLC-load-misses           #   23.15% of all LL-cache hits    [ 5.64%]
>>          198745620 LLC-stores                                                   [ 5.63%]
>>           14236497 LLC-store-misses          #   7.1%                                    [ 5.41%]
>>           20757435 LLC-prefetches                                               [ 5.42%]
>>            5361819 LLC-prefetch-misses       #   25%                                [ 5.69%]
>>         7235715124 dTLB-loads                                                   [11.26%]
>>           49895163 dTLB-load-misses          #    0.69% of all dTLB cache hits  [16.96%]
>>         5168276218 dTLB-stores                                                  [22.44%]
>>            6765983 dTLB-store-misses        #0.13%                                    [22.24%]
>>      <not counted> dTLB-prefetches
>>      <not counted> dTLB-prefetch-misses
>>
>> The "LLC-load-misses" went up by about 16%. Then, I restarted the process in guest, the perf data back to normal,
>> Performance counter stats for thread id '1579':
>>
>>                  0 page-faults                                                  [100.00%]
>>                  0 minor-faults                                                 [100.00%]
>>                  0 major-faults                                                 [100.00%]
>>              30594 cs                                                           [100.00%]
>>                327 migrations                                                   [100.00%]
>>                  0 alignment-faults                                             [100.00%]
>>                  0 emulation-faults
>>         7707091948 L1-dcache-loads                                              [22.10%]
>>          559829176 L1-dcache-load-misses     #    7.26% of all L1-dcache hits   [22.28%]
>>         5976654983 L1-dcache-stores                                             [23.22%]
>>          160436114 L1-dcache-store-misses                                       [22.80%]
>>                  0 L1-dcache-prefetches                                         [22.51%]
>>                  0 L1-dcache-prefetch-misses                                    [22.53%]
>>        13798415672 L1-icache-loads                                              [22.28%]
>>         2017724676 L1-icache-load-misses     #   14.62% of all L1-icache hits   [22.49%]
>>          254598008 LLC-loads                                                    [22.86%]
>>           16035378 LLC-load-misses           #    6.30% of all LL-cache hits    [ 5.36%]
>>          307019606 LLC-stores                                                   [ 5.60%]
>>           13665033 LLC-store-misses                                             [ 5.43%]
>>           17715554 LLC-prefetches                                               [ 5.57%]
>>            4187006 LLC-prefetch-misses                                          [ 5.44%]
>>         7811502895 dTLB-loads                                                   [10.72%]
>>           40547330 dTLB-load-misses          #    0.52% of all dTLB cache hits  [16.31%]
>>         6144202516 dTLB-stores                                                  [21.58%]
>>            6313363 dTLB-store-misses                                            [21.91%]
>>      <not counted> dTLB-prefetches
>>      <not counted> dTLB-prefetch-misses
>>
>>       60.000812523 seconds time elapsed
>>
>> If EPT disabled, this problem gone.
>>
>> I suspect that kvm hypervisor has business with this problem.
>> Based on above suspect, I want to find the two adjacent versions of kvm-kmod which triggers this problem or not (e.g. 2.6.39, 3.0-rc1),
>> and analyze the differences between this two versions, or apply the patches between this two versions by bisection method, finally find the key patches.
>>
>> Any better ideas?
>>
>> Thanks,
>> Zhang Haoyu
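
On the bisection idea quoted above, a minimal sketch of how the 2.6.39..3.0-rc1 range could be narrowed with git bisect on a kvm/kernel tree; each step means rebuilding and reloading the kvm modules and repeating the migration test (the good/bad tags are only the guesses named above):

  git bisect start
  git bisect bad  v3.0-rc1   # first version suspected to show the regression
  git bisect good v2.6.39    # last version suspected to be good
  # build, install and reload kvm/kvm-intel, rerun the migration test, then:
  git bisect good            # or: git bisect bad
  # repeat until git bisect reports the first bad commit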

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with ETP enabled
@ 2013-07-11 14:00     ` Zhang Haoyu
  0 siblings, 0 replies; 52+ messages in thread
From: Zhang Haoyu @ 2013-07-11 14:00 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: mpetersen, Shouta.Uehara, KVM, Michael S. Tsirkin, Luonengjun,
	Zhanghaoyu (A),
	Hanweidong, paolo.bonzini, qemu-devel, Zanghongyong, cloudfantom,
	Huangweidong (C)

>Hi,
>
>Could you please test this patch?
>
I tried this patch, but the problem is still there.

Thanks,
Zhang Haoyu

>>From 48df7db2ec2721e35d024a8d9850dbb34b557c1c Mon Sep 17 00:00:00 2001
>From: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
>Date: Thu, 6 Sep 2012 16:56:01 +0800
>Subject: [PATCH 10/11] using huge page on fast page fault path
>
>---
> arch/x86/kvm/mmu.c |   27 ++++++++++++++++++++-------
> 1 files changed, 20 insertions(+), 7 deletions(-)
>
>diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>index 6945ef4..7d177c7 100644
>--- a/arch/x86/kvm/mmu.c
>+++ b/arch/x86/kvm/mmu.c
>@@ -2663,6 +2663,13 @@ static int kvm_handle_bad_page(struct kvm_vcpu *vcpu, gfn_t gfn, pfn_t pfn)
> 	return -EFAULT;
> }
>
>+static bool pfn_can_adjust(pfn_t pfn, int level)
>+{
>+	return !is_error_pfn(pfn) && !kvm_is_mmio_pfn(pfn) &&
>+		   level == PT_PAGE_TABLE_LEVEL &&
>+		      PageTransCompound(pfn_to_page(pfn));
>+}
>+
> static void transparent_hugepage_adjust(struct kvm_vcpu *vcpu,
> 					gfn_t *gfnp, pfn_t *pfnp, int *levelp)
> {
>@@ -2676,10 +2683,8 @@ static void transparent_hugepage_adjust(struct kvm_vcpu *vcpu,
> 	 * PT_PAGE_TABLE_LEVEL and there would be no adjustment done
> 	 * here.
> 	 */
>-	if (!is_error_pfn(pfn) && !kvm_is_mmio_pfn(pfn) &&
>-	    level == PT_PAGE_TABLE_LEVEL &&
>-	    PageTransCompound(pfn_to_page(pfn)) &&
>-	    !has_wrprotected_page(vcpu->kvm, gfn, PT_DIRECTORY_LEVEL)) {
>+	if (pfn_can_adjust(pfn, level) &&
>+	      !has_wrprotected_page(vcpu->kvm, gfn, PT_DIRECTORY_LEVEL)) {
> 		unsigned long mask;
> 		/*
> 		 * mmu_notifier_retry was successful and we hold the
>@@ -2768,7 +2773,7 @@ fast_pf_fix_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep, u64 spte)
>  * - false: let the real page fault path to fix it.
>  */
> static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level,
>-			    u32 error_code)
>+			    u32 error_code, bool force_pt_level)
> {
> 	struct kvm_shadow_walk_iterator iterator;
> 	bool ret = false;
>@@ -2795,6 +2800,14 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level,
> 		goto exit;
>
> 	/*
>+	 * Let the real page fault path change the mapping if large
>+	 * mapping is allowed, for example, the memslot dirty log is
>+	 * disabled.
>+	 */
>+	if (!force_pt_level && pfn_can_adjust(spte_to_pfn(spte), level))
>+		goto exit;
>+
>+	/*
> 	 * Check if it is a spurious fault caused by TLB lazily flushed.
> 	 *
> 	 * Need not check the access of upper level table entries since
>@@ -2854,7 +2867,7 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, u32 error_code,
> 	} else
> 		level = PT_PAGE_TABLE_LEVEL;
>
>-	if (fast_page_fault(vcpu, v, level, error_code))
>+	if (fast_page_fault(vcpu, v, level, error_code, force_pt_level))
> 		return 0;
>
> 	mmu_seq = vcpu->kvm->mmu_notifier_seq;
>@@ -3323,7 +3336,7 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
> 	} else
> 		level = PT_PAGE_TABLE_LEVEL;
>
>-	if (fast_page_fault(vcpu, gpa, level, error_code))
>+	if (fast_page_fault(vcpu, gpa, level, error_code, force_pt_level))
> 		return 0;
>
> 	mmu_seq = vcpu->kvm->mmu_notifier_seq;
>--
>1.7.7.6
>
> On 07/11/2013 05:36 PM, Zhanghaoyu (A) wrote:
>> hi all,
>>
>> I met similar problem to these, while performing live migration or save-restore test on the kvm platform (qemu:1.4.0, host:suse11sp2, guest:suse11sp2), running tele-communication software suite in guest,
>> https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00098.html
>> http://comments.gmane.org/gmane.comp.emulators.kvm.devel/102506
>> http://thread.gmane.org/gmane.comp.emulators.kvm.devel/100592
>> https://bugzilla.kernel.org/show_bug.cgi?id=58771
>>
>> After live migration or virsh restore [savefile], one process's CPU
utilization went up by about 30%, resulted in throughput degradation of
this process.
>> oprofile report on this process in guest,
>> pre live migration:
>> CPU: CPU with timer interrupt, speed 0 MHz (estimated)
>> Profiling through timer interrupt
>> samples  %        app name                 symbol name
>> 248      12.3016  no-vmlinux               (no symbols)
>> 78        3.8690  libc.so.6                memset
>> 68        3.3730  libc.so.6                memcpy
>> 30        1.4881  cscf.scu                 SipMmBufMemAlloc
>> 29        1.4385  libpthread.so.0          pthread_mutex_lock
>> 26        1.2897  cscf.scu                 SipApiGetNextIe
>> 25        1.2401  cscf.scu                 DBFI_DATA_Search
>> 20        0.9921  libpthread.so.0          __pthread_mutex_unlock_usercnt
>> 16        0.7937  cscf.scu                 DLM_FreeSlice
>> 16        0.7937  cscf.scu                 receivemessage
>> 15        0.7440  cscf.scu                 SipSmCopyString
>> 14        0.6944  cscf.scu                 DLM_AllocSlice
>>
>> post live migration:
>> CPU: CPU with timer interrupt, speed 0 MHz (estimated)
>> Profiling through timer interrupt
>> samples  %        app name                 symbol name
>> 1586     42.2370  libc.so.6                memcpy
>> 271       7.2170  no-vmlinux               (no symbols)
>> 83        2.2104  libc.so.6                memset
>> 41        1.0919  libpthread.so.0          __pthread_mutex_unlock_usercnt
>> 35        0.9321  cscf.scu                 SipMmBufMemAlloc
>> 29        0.7723  cscf.scu                 DLM_AllocSlice
>> 28        0.7457  libpthread.so.0          pthread_mutex_lock
>> 23        0.6125  cscf.scu                 SipApiGetNextIe
>> 17        0.4527  cscf.scu                 SipSmCopyString
>> 16        0.4261  cscf.scu                 receivemessage
>> 15        0.3995  cscf.scu                 SipcMsgStatHandle
>> 14        0.3728  cscf.scu                 Urilex
>> 12        0.3196  cscf.scu                 DBFI_DATA_Search
>> 12        0.3196  cscf.scu                 SipDsmGetHdrBitValInner
>> 12        0.3196  cscf.scu                 SipSmGetDataFromRefString
>>
>> So, memcpy costs much more cpu cycles after live migration. Then, I
restart the process, this problem disappeared. save-restore has the
similar problem.
>>
>> perf report on vcpu thread in host,
>> pre live migration:
>> Performance counter stats for thread id '21082':
>>
>>                  0 page-faults
>>                  0 minor-faults
>>                  0 major-faults
>>              31616 cs
>>                506 migrations
>>                  0 alignment-faults
>>                  0 emulation-faults
>>         5075957539 L1-dcache-loads                                      [21.32%]
>>          324685106 L1-dcache-load-misses     #    6.40% of all L1-dcache hits   [21.85%]
>>         3681777120 L1-dcache-stores                                     [21.65%]
>>           65251823 L1-dcache-store-misses    #    1.77%                 [22.78%]
>>                  0 L1-dcache-prefetches                                 [22.84%]
>>                  0 L1-dcache-prefetch-misses                            [22.32%]
>>         9321652613 L1-icache-loads                                      [22.60%]
>>         1353418869 L1-icache-load-misses     #   14.52% of all L1-icache hits   [21.92%]
>>          169126969 LLC-loads                                            [21.87%]
>>           12583605 LLC-load-misses           #    7.44% of all LL-cache hits    [ 5.84%]
>>          132853447 LLC-stores                                           [ 6.61%]
>>           10601171 LLC-store-misses          #    7.9%                  [ 5.01%]
>>           25309497 LLC-prefetches            #   30%                    [ 4.96%]
>>            7723198 LLC-prefetch-misses                                  [ 6.04%]
>>         4954075817 dTLB-loads                                           [11.56%]
>>           26753106 dTLB-load-misses          #    0.54% of all dTLB cache hits  [16.80%]
>>         3553702874 dTLB-stores                                          [22.37%]
>>            4720313 dTLB-store-misses         #    0.13%                 [21.46%]
>>      <not counted> dTLB-prefetches
>>      <not counted> dTLB-prefetch-misses
>>
>>       60.000920666 seconds time elapsed
>>
>> post live migration:
>> Performance counter stats for thread id '1579':
>>
>>                  0 page-faults                                          [100.00%]
>>                  0 minor-faults                                         [100.00%]
>>                  0 major-faults                                         [100.00%]
>>              34979 cs                                                   [100.00%]
>>                441 migrations                                           [100.00%]
>>                  0 alignment-faults                                     [100.00%]
>>                  0 emulation-faults
>>         6903585501 L1-dcache-loads                                      [22.06%]
>>          525939560 L1-dcache-load-misses     #    7.62% of all L1-dcache hits   [21.97%]
>>         5042552685 L1-dcache-stores                                     [22.20%]
>>           94493742 L1-dcache-store-misses    #    1.8%                  [22.06%]
>>                  0 L1-dcache-prefetches                                 [22.39%]
>>                  0 L1-dcache-prefetch-misses                            [22.47%]
>>        13022953030 L1-icache-loads                                      [22.25%]
>>         1957161101 L1-icache-load-misses     #   15.03% of all L1-icache hits   [22.47%]
>>          348479792 LLC-loads                                            [22.27%]
>>           80662778 LLC-load-misses           #   23.15% of all LL-cache hits    [ 5.64%]
>>          198745620 LLC-stores                                           [ 5.63%]
>>           14236497 LLC-store-misses          #    7.1%                  [ 5.41%]
>>           20757435 LLC-prefetches                                       [ 5.42%]
>>            5361819 LLC-prefetch-misses       #   25%                    [ 5.69%]
>>         7235715124 dTLB-loads                                           [11.26%]
>>           49895163 dTLB-load-misses          #    0.69% of all dTLB cache hits  [16.96%]
>>         5168276218 dTLB-stores                                          [22.44%]
>>            6765983 dTLB-store-misses         #    0.13%                 [22.24%]
>>      <not counted> dTLB-prefetches
>>      <not counted> dTLB-prefetch-misses
>>
>> The "LLC-load-misses" went up by about 16%. Then, I restarted the
process in guest, the perf data back to normal,
>> Performance counter stats for thread id '1579':
>>
>>                  0 page-faults                                          [100.00%]
>>                  0 minor-faults                                         [100.00%]
>>                  0 major-faults                                         [100.00%]
>>              30594 cs                                                   [100.00%]
>>                327 migrations                                           [100.00%]
>>                  0 alignment-faults                                     [100.00%]
>>                  0 emulation-faults
>>         7707091948 L1-dcache-loads                                      [22.10%]
>>          559829176 L1-dcache-load-misses     #    7.26% of all L1-dcache hits   [22.28%]
>>         5976654983 L1-dcache-stores                                     [23.22%]
>>          160436114 L1-dcache-store-misses                               [22.80%]
>>                  0 L1-dcache-prefetches                                 [22.51%]
>>                  0 L1-dcache-prefetch-misses                            [22.53%]
>>        13798415672 L1-icache-loads                                      [22.28%]
>>         2017724676 L1-icache-load-misses     #   14.62% of all L1-icache hits   [22.49%]
>>          254598008 LLC-loads                                            [22.86%]
>>           16035378 LLC-load-misses           #    6.30% of all LL-cache hits    [ 5.36%]
>>          307019606 LLC-stores                                           [ 5.60%]
>>           13665033 LLC-store-misses                                     [ 5.43%]
>>           17715554 LLC-prefetches                                       [ 5.57%]
>>            4187006 LLC-prefetch-misses                                  [ 5.44%]
>>         7811502895 dTLB-loads                                           [10.72%]
>>           40547330 dTLB-load-misses          #    0.52% of all dTLB cache hits  [16.31%]
>>         6144202516 dTLB-stores                                          [21.58%]
>>            6313363 dTLB-store-misses                                    [21.91%]
>>      <not counted> dTLB-prefetches
>>      <not counted> dTLB-prefetch-misses
>>
>>       60.000812523 seconds time elapsed
>>
>> If EPT disabled, this problem gone.
>>
>> I suspect that kvm hypervisor has business with this problem.
>> Based on above suspect, I want to find the two adjacent versions of
kvm-kmod which triggers this problem or not (e.g. 2.6.39, 3.0-rc1),
>> and analyze the differences between this two versions, or apply the
patches between this two versions by bisection method, finally find the
key patches.
>>
>> Any better ideas?
>>
>> Thanks,
>> Zhang Haoyu

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: vm performance degradation after kvm live migration or save-restore with ETP enabled
  2013-07-11  9:36 ` [Qemu-devel] " Zhanghaoyu (A)
@ 2013-07-11 18:20   ` Bruce Rogers
  -1 siblings, 0 replies; 52+ messages in thread
From: Bruce Rogers @ 2013-07-11 18:20 UTC (permalink / raw)
  To: cloudfantom, paolo.bonzini, Zhanghaoyu (A),
	Shouta.Uehara, qemu-devel, mpetersen, Michael S. Tsirkin, KVM
  Cc: Huangweidong (C), Zanghongyong, Luonengjun, Hanweidong

 >>> On 7/11/2013 at 03:36 AM, "Zhanghaoyu (A)" <haoyu.zhang@huawei.com> wrote: 
> hi all,
> 
> I met similar problem to these, while performing live migration or 
> save-restore test on the kvm platform (qemu:1.4.0, host:suse11sp2, 
> guest:suse11sp2), running tele-communication software suite in guest,
> https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00098.html
> http://comments.gmane.org/gmane.comp.emulators.kvm.devel/102506
> http://thread.gmane.org/gmane.comp.emulators.kvm.devel/100592
> https://bugzilla.kernel.org/show_bug.cgi?id=58771
> 
> After live migration or virsh restore [savefile], one process's CPU 
> utilization went up by about 30%, resulted in throughput degradation of this 
> process.
> oprofile report on this process in guest,
> pre live migration:
> CPU: CPU with timer interrupt, speed 0 MHz (estimated)
> Profiling through timer interrupt
> samples  %        app name                 symbol name
> 248      12.3016  no-vmlinux               (no symbols)
> 78        3.8690  libc.so.6                memset
> 68        3.3730  libc.so.6                memcpy
> 30        1.4881  cscf.scu                 SipMmBufMemAlloc
> 29        1.4385  libpthread.so.0          pthread_mutex_lock
> 26        1.2897  cscf.scu                 SipApiGetNextIe
> 25        1.2401  cscf.scu                 DBFI_DATA_Search
> 20        0.9921  libpthread.so.0          __pthread_mutex_unlock_usercnt
> 16        0.7937  cscf.scu                 DLM_FreeSlice
> 16        0.7937  cscf.scu                 receivemessage
> 15        0.7440  cscf.scu                 SipSmCopyString
> 14        0.6944  cscf.scu                 DLM_AllocSlice
> 
> post live migration:
> CPU: CPU with timer interrupt, speed 0 MHz (estimated)
> Profiling through timer interrupt
> samples  %        app name                 symbol name
> 1586     42.2370  libc.so.6                memcpy
> 271       7.2170  no-vmlinux               (no symbols)
> 83        2.2104  libc.so.6                memset
> 41        1.0919  libpthread.so.0          __pthread_mutex_unlock_usercnt
> 35        0.9321  cscf.scu                 SipMmBufMemAlloc
> 29        0.7723  cscf.scu                 DLM_AllocSlice
> 28        0.7457  libpthread.so.0          pthread_mutex_lock
> 23        0.6125  cscf.scu                 SipApiGetNextIe
> 17        0.4527  cscf.scu                 SipSmCopyString
> 16        0.4261  cscf.scu                 receivemessage
> 15        0.3995  cscf.scu                 SipcMsgStatHandle
> 14        0.3728  cscf.scu                 Urilex
> 12        0.3196  cscf.scu                 DBFI_DATA_Search
> 12        0.3196  cscf.scu                 SipDsmGetHdrBitValInner
> 12        0.3196  cscf.scu                 SipSmGetDataFromRefString
> 
> So, memcpy costs much more cpu cycles after live migration. Then, I restart 
> the process, this problem disappeared. save-restore has the similar problem.
> 
> perf report on vcpu thread in host,
> pre live migration:
> Performance counter stats for thread id '21082':
> 
>                  0 page-faults
>                  0 minor-faults
>                  0 major-faults
>              31616 cs
>                506 migrations
>                  0 alignment-faults
>                  0 emulation-faults
>         5075957539 L1-dcache-loads                                      [21.32%]
>          324685106 L1-dcache-load-misses     #    6.40% of all L1-dcache hits   [21.85%]
>         3681777120 L1-dcache-stores                                     [21.65%]
>           65251823 L1-dcache-store-misses    #    1.77%                 [22.78%]
>                  0 L1-dcache-prefetches                                 [22.84%]
>                  0 L1-dcache-prefetch-misses                            [22.32%]
>         9321652613 L1-icache-loads                                      [22.60%]
>         1353418869 L1-icache-load-misses     #   14.52% of all L1-icache hits   [21.92%]
>          169126969 LLC-loads                                            [21.87%]
>           12583605 LLC-load-misses           #    7.44% of all LL-cache hits    [ 5.84%]
>          132853447 LLC-stores                                           [ 6.61%]
>           10601171 LLC-store-misses          #    7.9%                  [ 5.01%]
>           25309497 LLC-prefetches            #   30%                    [ 4.96%]
>            7723198 LLC-prefetch-misses                                  [ 6.04%]
>         4954075817 dTLB-loads                                           [11.56%]
>           26753106 dTLB-load-misses          #    0.54% of all dTLB cache hits  [16.80%]
>         3553702874 dTLB-stores                                          [22.37%]
>            4720313 dTLB-store-misses         #    0.13%                 [21.46%]
>      <not counted> dTLB-prefetches
>      <not counted> dTLB-prefetch-misses
> 
>       60.000920666 seconds time elapsed
> 
> post live migration:
> Performance counter stats for thread id '1579':
> 
>                  0 page-faults                                          [100.00%]
>                  0 minor-faults                                         [100.00%]
>                  0 major-faults                                         [100.00%]
>              34979 cs                                                   [100.00%]
>                441 migrations                                           [100.00%]
>                  0 alignment-faults                                     [100.00%]
>                  0 emulation-faults
>         6903585501 L1-dcache-loads                                      [22.06%]
>          525939560 L1-dcache-load-misses     #    7.62% of all L1-dcache hits   [21.97%]
>         5042552685 L1-dcache-stores                                     [22.20%]
>           94493742 L1-dcache-store-misses    #    1.8%                  [22.06%]
>                  0 L1-dcache-prefetches                                 [22.39%]
>                  0 L1-dcache-prefetch-misses                            [22.47%]
>        13022953030 L1-icache-loads                                      [22.25%]
>         1957161101 L1-icache-load-misses     #   15.03% of all L1-icache hits   [22.47%]
>          348479792 LLC-loads                                            [22.27%]
>           80662778 LLC-load-misses           #   23.15% of all LL-cache hits    [ 5.64%]
>          198745620 LLC-stores                                           [ 5.63%]
>           14236497 LLC-store-misses          #    7.1%                  [ 5.41%]
>           20757435 LLC-prefetches                                       [ 5.42%]
>            5361819 LLC-prefetch-misses       #   25%                    [ 5.69%]
>         7235715124 dTLB-loads                                           [11.26%]
>           49895163 dTLB-load-misses          #    0.69% of all dTLB cache hits  [16.96%]
>         5168276218 dTLB-stores                                          [22.44%]
>            6765983 dTLB-store-misses         #    0.13%                 [22.24%]
>      <not counted> dTLB-prefetches
>      <not counted> dTLB-prefetch-misses
> 
> The "LLC-load-misses" went up by about 16%. Then, I restarted the process in 
> guest, the perf data back to normal,
> Performance counter stats for thread id '1579':
> 
>                  0 page-faults                                          [100.00%]
>                  0 minor-faults                                         [100.00%]
>                  0 major-faults                                         [100.00%]
>              30594 cs                                                   [100.00%]
>                327 migrations                                           [100.00%]
>                  0 alignment-faults                                     [100.00%]
>                  0 emulation-faults
>         7707091948 L1-dcache-loads                                      [22.10%]
>          559829176 L1-dcache-load-misses     #    7.26% of all L1-dcache hits   [22.28%]
>         5976654983 L1-dcache-stores                                     [23.22%]
>          160436114 L1-dcache-store-misses                               [22.80%]
>                  0 L1-dcache-prefetches                                 [22.51%]
>                  0 L1-dcache-prefetch-misses                            [22.53%]
>        13798415672 L1-icache-loads                                      [22.28%]
>         2017724676 L1-icache-load-misses     #   14.62% of all L1-icache hits   [22.49%]
>          254598008 LLC-loads                                            [22.86%]
>           16035378 LLC-load-misses           #    6.30% of all LL-cache hits    [ 5.36%]
>          307019606 LLC-stores                                           [ 5.60%]
>           13665033 LLC-store-misses                                     [ 5.43%]
>           17715554 LLC-prefetches                                       [ 5.57%]
>            4187006 LLC-prefetch-misses                                  [ 5.44%]
>         7811502895 dTLB-loads                                           [10.72%]
>           40547330 dTLB-load-misses          #    0.52% of all dTLB cache hits  [16.31%]
>         6144202516 dTLB-stores                                          [21.58%]
>            6313363 dTLB-store-misses                                    [21.91%]
>      <not counted> dTLB-prefetches
>      <not counted> dTLB-prefetch-misses
> 
>       60.000812523 seconds time elapsed
> 
> If EPT disabled, this problem gone.
> 
> I suspect that kvm hypervisor has business with this problem.
> Based on above suspect, I want to find the two adjacent versions of kvm-kmod 
> which triggers this problem or not (e.g. 2.6.39, 3.0-rc1),
> and analyze the differences between this two versions, or apply the patches 
> between this two versions by bisection method, finally find the key patches.
> 
> Any better ideas?
> 
> Thanks,
> Zhang Haoyu

I've attempted to duplicate this on a number of machines that are as similar
to yours as I am able to get my hands on, and so far have not been able to see
any performance degradation. And from what I've read in the above links, huge
pages do not seem to be part of the problem.

So, if you are in a position to bisect the kernel changes, that would probably be
the best avenue to pursue in my opinion.

Bruce

^ permalink raw reply	[flat|nested] 52+ messages in thread

* RE: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with ETP enabled
  2013-07-11 10:51   ` Andreas Färber
@ 2013-07-12  3:21     ` Zhanghaoyu (A)
  -1 siblings, 0 replies; 52+ messages in thread
From: Zhanghaoyu (A) @ 2013-07-12  3:21 UTC (permalink / raw)
  To: Andreas Färber
  Cc: KVM, qemu-devel, cloudfantom, mpetersen, Shouta.Uehara,
	paolo.bonzini, Michael S. Tsirkin, Huangweidong (C),
	Zanghongyong, Luonengjun, Hanweidong, Xiejunyong, Yi Li,
	Xin Rong Fu, Xiahai

> Hi,
> 
> Am 11.07.2013 11:36, schrieb Zhanghaoyu (A):
> > I met similar problem to these, while performing live migration or
> save-restore test on the kvm platform (qemu:1.4.0, host:suse11sp2,
> guest:suse11sp2), running tele-communication software suite in guest,
> > https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00098.html
> > http://comments.gmane.org/gmane.comp.emulators.kvm.devel/102506
> > http://thread.gmane.org/gmane.comp.emulators.kvm.devel/100592
> > https://bugzilla.kernel.org/show_bug.cgi?id=58771
> >
> > After live migration or virsh restore [savefile], one process's CPU
> utilization went up by about 30%, resulted in throughput degradation of
> this process.
> > oprofile report on this process in guest,
> > pre live migration:
> 
> So far we've been unable to reproduce this with a pure qemu-kvm /
> qemu-system-x86_64 command line on several EPT machines, whereas for
> virsh it was reported as confirmed. Can you please share the resulting
> QEMU command line from libvirt logs or process list?
qemu command line from /var/log/libvirt/qemu/[domain].log, 
LC_ALL=C PATH=/sbin:/usr/sbin:/usr/local/sbin:/root/bin:/usr/local/bin:/usr/bin:/bin:/usr/X11R6/bin:/usr/games:/usr/lib/mit/bin:/usr/lib/mit/sbin HOME=/root USER=root LOGNAME=root QEMU_AUDIO_DRV=none /usr/local/bin/qemu-system-x86_64 -name CSC2 -S -M pc-0.12 -cpu qemu32 -enable-kvm -m 12288 -smp 4,sockets=4,cores=1,threads=1 -uuid 76e03575-a3ad-589a-e039-40160274bb97 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/CSC2.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/opt/ne/vm/CSC2.img,if=none,id=drive-virtio-disk0,format=raw,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=20,id=hostnet0,vhost=on,vhostfd=22 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:e0:fc:00:0f:01,bus=pci.0,addr=0x3,bootindex=2 -netdev tap,fd=23,id=hostnet1,vhost=on,vhostfd=24 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:e0:fc:01:0f:01,bus=pci.0,addr=0x4 -netdev tap,fd=25,id=hostnet2,vhost=on,vhostfd=26 -device virtio-net-pci,netdev=hostnet2,id=net2,mac=00:e0:fc:02:0f:01,bus=pci.0,addr=0x5 -netdev tap,fd=27,id=hostnet3,vhost=on,vhostfd=28 -device virtio-net-pci,netdev=hostnet3,id=net3,mac=00:e0:fc:03:0f:01,bus=pci.0,addr=0x6 -netdev tap,fd=29,id=hostnet4,vhost=on,vhostfd=30 -device virtio-net-pci,netdev=hostnet4,id=net4,mac=00:e0:fc:0a:0f:01,bus=pci.0,addr=0x7 -netdev tap,fd=31,id=hostnet5,vhost=on,vhostfd=32 -device virtio-net-pci,netdev=hostnet5,id=net5,mac=00:e0:fc:0b:0f:01,bus=pci.0,addr=0x9 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc *:1 -k en-us -vga cirrus -device i6300esb,id=watchdog0,bus=pci.0,addr=0xb -watchdog-action poweroff -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xa
> 
> Are both host and guest kernel at 3.0.80 (latest SLES updates)?
No, both host and guest are just raw sles11-sp2-64-GM, kernel version: 3.0.13-0.27.

Thanks,
Zhang Haoyu
> 
> Thanks,
> Andreas
> 
> --
> SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
> GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: vm performance degradation after kvm live migration or save-restore with ETP enabled
  2013-07-11 18:20   ` [Qemu-devel] " Bruce Rogers
@ 2013-07-27  7:47     ` Zhanghaoyu (A)
  -1 siblings, 0 replies; 52+ messages in thread
From: Zhanghaoyu (A) @ 2013-07-27  7:47 UTC (permalink / raw)
  To: Bruce Rogers, paolo.bonzini, qemu-devel, Michael S. Tsirkin, KVM,
	Marcelo Tosatti, Avi Kivity, xiaoguangrong, Gleb Natapov,
	Andreas Färber
  Cc: Xin Rong Fu, Huangweidong (C),
	Hanweidong, Xiejunyong, Luonengjun, Xiahai, Zanghongyong, Yi Li

>> hi all,
>> 
>> I met similar problem to these, while performing live migration or 
>> save-restore test on the kvm platform (qemu:1.4.0, host:suse11sp2, 
>> guest:suse11sp2), running tele-communication software suite in guest, 
>> https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00098.html
>> http://comments.gmane.org/gmane.comp.emulators.kvm.devel/102506
>> http://thread.gmane.org/gmane.comp.emulators.kvm.devel/100592
>> https://bugzilla.kernel.org/show_bug.cgi?id=58771
>> 
>> After live migration or virsh restore [savefile], one process's CPU 
>> utilization went up by about 30%, resulted in throughput degradation 
>> of this process.
>> 
>> If EPT disabled, this problem gone.
>> 
>> I suspect that kvm hypervisor has business with this problem.
>> Based on above suspect, I want to find the two adjacent versions of 
>> kvm-kmod which triggers this problem or not (e.g. 2.6.39, 3.0-rc1), 
>> and analyze the differences between this two versions, or apply the 
>> patches between this two versions by bisection method, finally find the key patches.
>> 
>> Any better ideas?
>> 
>> Thanks,
>> Zhang Haoyu
>
>I've attempted to duplicate this on a number of machines that are as similar to yours as I am able to get my hands on, and so far have not been able to see any performance degradation. And from what I've read in the above links, huge pages do not seem to be part of the problem.
>
>So, if you are in a position to bisect the kernel changes, that would probably be the best avenue to pursue in my opinion.
>
>Bruce

I found the first bad commit ([612819c3c6e67bac8fceaa7cc402f13b1b63f7e4] KVM: propagate fault r/w information to gup(), allow read-only memory) which triggers this problem
by git-bisecting the kvm kernel history (downloaded from https://git.kernel.org/pub/scm/virt/kvm/kvm.git).

Then I ran:
git log 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 -n 1 -p > 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log
git diff 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1..612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 > 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff

Then I diffed 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log and 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff
and concluded that all of the differences between 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1 and 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4
come from that single commit, so this commit is the one that directly or indirectly causes the degradation.

Does the map_writable flag passed to mmu_set_spte() affect the PTE's PAT flag, or does it increase the number of VMEXITs caused by the guest trying to write read-only memory?

Thanks,
Zhang Haoyu

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with ETP enabled
  2013-07-27  7:47     ` [Qemu-devel] " Zhanghaoyu (A)
@ 2013-07-29 22:14       ` Andrea Arcangeli
  -1 siblings, 0 replies; 52+ messages in thread
From: Andrea Arcangeli @ 2013-07-29 22:14 UTC (permalink / raw)
  To: Zhanghaoyu (A)
  Cc: Bruce Rogers, paolo.bonzini, qemu-devel, Michael S. Tsirkin, KVM,
	Marcelo Tosatti, Avi Kivity, xiaoguangrong, Gleb Natapov,
	Andreas Färber, Xin Rong Fu, Huangweidong (C),
	Hanweidong, Xiejunyong, Luonengjun, Xiahai, Zanghongyong, Yi Li

Hi,

On Sat, Jul 27, 2013 at 07:47:49AM +0000, Zhanghaoyu (A) wrote:
> >> hi all,
> >> 
> >> I met similar problem to these, while performing live migration or 
> >> save-restore test on the kvm platform (qemu:1.4.0, host:suse11sp2, 
> >> guest:suse11sp2), running tele-communication software suite in guest, 
> >> https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00098.html
> >> http://comments.gmane.org/gmane.comp.emulators.kvm.devel/102506
> >> http://thread.gmane.org/gmane.comp.emulators.kvm.devel/100592
> >> https://bugzilla.kernel.org/show_bug.cgi?id=58771
> >> 
> >> After live migration or virsh restore [savefile], one process's CPU 
> >> utilization went up by about 30%, resulted in throughput degradation 
> >> of this process.
> >> 
> >> If EPT disabled, this problem gone.
> >> 
> >> I suspect that kvm hypervisor has business with this problem.
> >> Based on above suspect, I want to find the two adjacent versions of 
> >> kvm-kmod which triggers this problem or not (e.g. 2.6.39, 3.0-rc1), 
> >> and analyze the differences between this two versions, or apply the 
> >> patches between this two versions by bisection method, finally find the key patches.
> >> 
> >> Any better ideas?
> >> 
> >> Thanks,
> >> Zhang Haoyu
> >
> >I've attempted to duplicate this on a number of machines that are as similar to yours as I am able to get my hands on, and so far have not been able to see any performance degradation. And from what I've read in the above links, huge pages do not seem to be part of the problem.
> >
> >So, if you are in a position to bisect the kernel changes, that would probably be the best avenue to pursue in my opinion.
> >
> >Bruce
> 
> I found the first bad commit([612819c3c6e67bac8fceaa7cc402f13b1b63f7e4] KVM: propagate fault r/w information to gup(), allow read-only memory) which triggers this problem 
> by git bisecting the kvm kernel (download from https://git.kernel.org/pub/scm/virt/kvm/kvm.git) changes.
> 
> And, 
> git log 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 -n 1 -p > 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log
> git diff 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1..612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 > 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff
> 
> Then, I diffed 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log and 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff, 
> came to a conclusion that all of the differences between 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1 and 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4
> are contributed by no other than 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4, so this commit is the peace-breaker which directly or indirectly causes the degradation.

Something is generating read-only host ptes for this commit to make a
difference. Considering that live migration or startup actions are
involved, the most likely culprit is a fork() used to start some script
or helper.

A fork() marks all the ptes read-only and invalidates the sptes through
the mmu notifier.

So then, with all sptes dropped and the whole guest address space mapped
read-only, depending on the app we could sometimes take one vmexit to
establish a read-only spte on the read-only pte, and then another vmexit
to execute the COW at the first write fault that follows.

But it won't actually run a COW unless the child is still there (and
normally the child does fork() + quick stuff + exec(), so the child is
unlikely to still be there).

But it's still 2 vmexits when before there was just 1 vmexit.
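
A minimal host-side sketch of the fault pattern behind those vmexits
(hypothetical, not from this thread; the 64 MiB size and the getrusage()
bookkeeping are illustrative assumptions): populate an anonymous mapping,
fork() a child that exits immediately, write the range again, and count
the extra minor faults the fork introduced.

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

static long minor_faults(void)
{
	struct rusage ru;

	getrusage(RUSAGE_SELF, &ru);
	return ru.ru_minflt;
}

int main(void)
{
	size_t len = 64UL << 20;	/* 64 MiB of anonymous memory */
	char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	pid_t child;
	long before;

	if (buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	memset(buf, 1, len);		/* populate: one fault per page */
	before = minor_faults();

	child = fork();			/* write-protects every anonymous pte */
	if (child == 0)
		_exit(0);		/* child exits at once, like a helper script */
	waitpid(child, NULL, 0);

	memset(buf, 2, len);		/* every write faults again (COW or page reuse) */
	printf("extra minor faults after fork: %ld\n",
	       minor_faults() - before);
	return 0;
}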

The same overhead should happen for both EPT and no-EPT; there would be
two vmexits in the no-EPT case as well, since there is no way the spte
can be marked writable while the host pte is still read-only.

If you get a massive overhead and a CPU loop in host kernel mode, maybe
a global tlb flush is missing that would get rid of the read-only copy
of the spte in the CPU, and all CPUs tend to exit on the same spte at
the same time. Or we may lack the tlb flush even for the current CPU,
but we should really flush them all (in the old days the current CPU's
TLB flush was implicit in the vmexit, but CPUs got more features)?

I don't know exactly which kind of overhead we're talking about, but
doubling the number of vmexits would probably not be measurable. If you
monitor the number of vmexits: if it's a missing TLB flush you'll see a
flood, otherwise you'll just see double the amount before/after that
commit.

If the readonly pte generator is fork and it's just the double number
of vmexit the only thing you need is the patch I posted a few days ago
that adds the missing madvise(MADV_DONTFORK).
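
For reference, a minimal sketch of the MADV_DONTFORK idea (hypothetical;
the helper name and the mmap-based allocation are assumptions here, the
real change belongs wherever QEMU allocates guest RAM):

#include <stdio.h>
#include <sys/mman.h>

/* Exclude guest RAM from fork() so helper children never share, and the
 * kernel never write-protects, these pages. */
static void *alloc_guest_ram(size_t size)
{
	void *ram = mmap(NULL, size, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (ram == MAP_FAILED)
		return NULL;

	/* Children created by fork() will not map this range at all, so the
	 * parent's ptes stay writable across fork()+exec() of helpers. */
	if (madvise(ram, size, MADV_DONTFORK) != 0)
		perror("madvise(MADV_DONTFORK)");

	return ram;
}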

If instead the overhead is massive and it's a vmexit flood, we also have
a missing tlb flush. In that case let's fix the tlb flush first, and
then you can still apply the MADV_DONTFORK patch. This kind of fault
activity also happens after a swapin from read-only swapcache, so if
there's a vmexit flood we need to fix it before applying MADV_DONTFORK.

Thanks,
Andrea

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with ETP enabled
  2013-07-27  7:47     ` [Qemu-devel] " Zhanghaoyu (A)
@ 2013-07-29 23:47       ` Marcelo Tosatti
  -1 siblings, 0 replies; 52+ messages in thread
From: Marcelo Tosatti @ 2013-07-29 23:47 UTC (permalink / raw)
  To: Zhanghaoyu (A)
  Cc: Bruce Rogers, paolo.bonzini, qemu-devel, Michael S. Tsirkin, KVM,
	Avi Kivity, xiaoguangrong, Gleb Natapov, Andreas Färber,
	Hanweidong, Luonengjun, Huangweidong (C),
	Zanghongyong, Xiejunyong, Xiahai, Yi Li, Xin Rong Fu

On Sat, Jul 27, 2013 at 07:47:49AM +0000, Zhanghaoyu (A) wrote:
> >> hi all,
> >> 
> >> I met similar problem to these, while performing live migration or 
> >> save-restore test on the kvm platform (qemu:1.4.0, host:suse11sp2, 
> >> guest:suse11sp2), running tele-communication software suite in guest, 
> >> https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00098.html
> >> http://comments.gmane.org/gmane.comp.emulators.kvm.devel/102506
> >> http://thread.gmane.org/gmane.comp.emulators.kvm.devel/100592
> >> https://bugzilla.kernel.org/show_bug.cgi?id=58771
> >> 
> >> After live migration or virsh restore [savefile], one process's CPU 
> >> utilization went up by about 30%, resulted in throughput degradation 
> >> of this process.
> >> 
> >> If EPT disabled, this problem gone.
> >> 
> >> I suspect that kvm hypervisor has business with this problem.
> >> Based on above suspect, I want to find the two adjacent versions of 
> >> kvm-kmod which triggers this problem or not (e.g. 2.6.39, 3.0-rc1), 
> >> and analyze the differences between this two versions, or apply the 
> >> patches between this two versions by bisection method, finally find the key patches.
> >> 
> >> Any better ideas?
> >> 
> >> Thanks,
> >> Zhang Haoyu
> >
> >I've attempted to duplicate this on a number of machines that are as similar to yours as I am able to get my hands on, and so far have not been able to see any performance degradation. And from what I've read in the above links, huge pages do not seem to be part of the problem.
> >
> >So, if you are in a position to bisect the kernel changes, that would probably be the best avenue to pursue in my opinion.
> >
> >Bruce
> 
> I found the first bad commit([612819c3c6e67bac8fceaa7cc402f13b1b63f7e4] KVM: propagate fault r/w information to gup(), allow read-only memory) which triggers this problem 
> by git bisecting the kvm kernel (download from https://git.kernel.org/pub/scm/virt/kvm/kvm.git) changes.
> 
> And, 
> git log 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 -n 1 -p > 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log
> git diff 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1..612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 > 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff
> 
> Then, I diffed 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log and 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff, 
> came to a conclusion that all of the differences between 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1 and 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4
> are contributed by no other than 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4, so this commit is the peace-breaker which directly or indirectly causes the degradation.
> 
> Does the map_writable flag passed to mmu_set_spte() function have effect on PTE's PAT flag or increase the VMEXITs induced by that guest tried to write read-only memory?
> 
> Thanks,
> Zhang Haoyu
> 

There should be no read-only memory maps backing guest RAM.

Can you confirm map_writable = false is being passed
to __direct_map? (this should not happen, for guest RAM).
And if it is false, please capture the associated GFN.

It's probably an issue with an older get_user_pages variant (either in
kvm-kmod or the older kernel). Is there any 
indication of a similar issue with the upstream kernel?



^ permalink raw reply	[flat|nested] 52+ messages in thread

* RE: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled
  2013-07-29 23:47       ` Marcelo Tosatti
@ 2013-07-30  9:04         ` Zhanghaoyu (A)
  -1 siblings, 0 replies; 52+ messages in thread
From: Zhanghaoyu (A) @ 2013-07-30  9:04 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Bruce Rogers, paolo.bonzini, qemu-devel, Michael S. Tsirkin, KVM,
	Avi Kivity, xiaoguangrong, Gleb Natapov, Andreas Färber,
	Hanweidong, Luonengjun, Huangweidong (C),
	Zanghongyong, Xiejunyong, Xiahai, Yi Li, Xin Rong Fu


>> >> hi all,
>> >> 
>> >> I met similar problem to these, while performing live migration or 
>> >> save-restore test on the kvm platform (qemu:1.4.0, host:suse11sp2, 
>> >> guest:suse11sp2), running tele-communication software suite in 
>> >> guest, 
>> >> https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00098.html
>> >> http://comments.gmane.org/gmane.comp.emulators.kvm.devel/102506
>> >> http://thread.gmane.org/gmane.comp.emulators.kvm.devel/100592
>> >> https://bugzilla.kernel.org/show_bug.cgi?id=58771
>> >> 
>> >> After live migration or virsh restore [savefile], one process's CPU 
>> >> utilization went up by about 30%, resulted in throughput 
>> >> degradation of this process.
>> >> 
>> >> If EPT disabled, this problem gone.
>> >> 
>> >> I suspect that kvm hypervisor has business with this problem.
>> >> Based on above suspect, I want to find the two adjacent versions of 
>> >> kvm-kmod which triggers this problem or not (e.g. 2.6.39, 3.0-rc1), 
>> >> and analyze the differences between this two versions, or apply the 
>> >> patches between this two versions by bisection method, finally find the key patches.
>> >> 
>> >> Any better ideas?
>> >> 
>> >> Thanks,
>> >> Zhang Haoyu
>> >
>> >I've attempted to duplicate this on a number of machines that are as similar to yours as I am able to get my hands on, and so far have not been able to see any performance degradation. And from what I've read in the above links, huge pages do not seem to be part of the problem.
>> >
>> >So, if you are in a position to bisect the kernel changes, that would probably be the best avenue to pursue in my opinion.
>> >
>> >Bruce
>> 
>> I found the first bad 
>> commit([612819c3c6e67bac8fceaa7cc402f13b1b63f7e4] KVM: propagate fault r/w information to gup(), allow read-only memory) which triggers this problem by git bisecting the kvm kernel (download from https://git.kernel.org/pub/scm/virt/kvm/kvm.git) changes.
>> 
>> And,
>> git log 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 -n 1 -p > 
>> 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log
>> git diff 
>> 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1..612819c3c6e67bac8fceaa7cc4
>> 02f13b1b63f7e4 > 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff
>> 
>> Then, I diffed 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log and 
>> 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff,
>> came to a conclusion that all of the differences between 
>> 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1 and 
>> 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4
>> are contributed by no other than 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4, so this commit is the peace-breaker which directly or indirectly causes the degradation.
>> 
>> Does the map_writable flag passed to mmu_set_spte() function have effect on PTE's PAT flag or increase the VMEXITs induced by that guest tried to write read-only memory?
>> 
>> Thanks,
>> Zhang Haoyu
>> 
>
>There should be no read-only memory maps backing guest RAM.
>
>Can you confirm map_writable = false is being passed to __direct_map? (this should not happen, for guest RAM).
>And if it is false, please capture the associated GFN.
>
I added the check and printk below at the start of __direct_map() at the first bad commit version:
--- kvm-612819c3c6e67bac8fceaa7cc402f13b1b63f7e4/arch/x86/kvm/mmu.c     2013-07-26 18:44:05.000000000 +0800
+++ kvm-612819/arch/x86/kvm/mmu.c       2013-07-31 00:05:48.000000000 +0800
@@ -2223,6 +2223,9 @@ static int __direct_map(struct kvm_vcpu
        int pt_write = 0;
        gfn_t pseudo_gfn;

+        if (!map_writable)
+                printk(KERN_ERR "%s: %s: gfn = %llu \n", __FILE__, __func__, gfn);
+
        for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
                if (iterator.level == level) {
                        unsigned pte_access = ACC_ALL;

I virsh-saved the VM and then virsh-restored it; so many GFNs were printed that you could absolutely describe it as flooding.

>Its probably an issue with an older get_user_pages variant (either in kvm-kmod or the older kernel). Is there any indication of a similar issue with upstream kernel?
I will test the upstream kvm host (https://git.kernel.org/pub/scm/virt/kvm/kvm.git) later; if the problem is still there, 
I will revert the first bad commit (612819c3c6e67bac8fceaa7cc402f13b1b63f7e4) on the upstream tree and test again.

And I collected VMEXIT statistics during the pre-save and post-restore periods at the first bad commit version:
pre-save:
COTS-F10S03:~ # perf stat -e "kvm:*" -a sleep 30

 Performance counter stats for 'sleep 30':

           1222318 kvm:kvm_entry
                 0 kvm:kvm_hypercall
                 0 kvm:kvm_hv_hypercall
            351755 kvm:kvm_pio
              6703 kvm:kvm_cpuid
            692502 kvm:kvm_apic
           1234173 kvm:kvm_exit
            223956 kvm:kvm_inj_virq
                 0 kvm:kvm_inj_exception
             16028 kvm:kvm_page_fault
             59872 kvm:kvm_msr
                 0 kvm:kvm_cr
            169596 kvm:kvm_pic_set_irq
             81455 kvm:kvm_apic_ipi
            245103 kvm:kvm_apic_accept_irq
                 0 kvm:kvm_nested_vmrun
                 0 kvm:kvm_nested_intercepts
                 0 kvm:kvm_nested_vmexit
                 0 kvm:kvm_nested_vmexit_inject
                 0 kvm:kvm_nested_intr_vmexit
                 0 kvm:kvm_invlpga
                 0 kvm:kvm_skinit
            853020 kvm:kvm_emulate_insn
            171140 kvm:kvm_set_irq
            171534 kvm:kvm_ioapic_set_irq
                 0 kvm:kvm_msi_set_irq
             99276 kvm:kvm_ack_irq
            971166 kvm:kvm_mmio
             33722 kvm:kvm_fpu
                 0 kvm:kvm_age_page
                 0 kvm:kvm_try_async_get_page
                 0 kvm:kvm_async_pf_not_present
                 0 kvm:kvm_async_pf_ready
                 0 kvm:kvm_async_pf_completed
                 0 kvm:kvm_async_pf_doublefault

      30.019069018 seconds time elapsed

post-restore:
COTS-F10S03:~ # perf stat -e "kvm:*" -a sleep 30

 Performance counter stats for 'sleep 30':

           1327880 kvm:kvm_entry
                 0 kvm:kvm_hypercall
                 0 kvm:kvm_hv_hypercall
            375189 kvm:kvm_pio
              6925 kvm:kvm_cpuid
            804414 kvm:kvm_apic
           1339352 kvm:kvm_exit
            245922 kvm:kvm_inj_virq
                 0 kvm:kvm_inj_exception
             15856 kvm:kvm_page_fault
             39500 kvm:kvm_msr
                 1 kvm:kvm_cr
            179150 kvm:kvm_pic_set_irq
             98436 kvm:kvm_apic_ipi
            247430 kvm:kvm_apic_accept_irq
                 0 kvm:kvm_nested_vmrun
                 0 kvm:kvm_nested_intercepts
                 0 kvm:kvm_nested_vmexit
                 0 kvm:kvm_nested_vmexit_inject
                 0 kvm:kvm_nested_intr_vmexit
                 0 kvm:kvm_invlpga
                 0 kvm:kvm_skinit
            955410 kvm:kvm_emulate_insn
            182240 kvm:kvm_set_irq
            182562 kvm:kvm_ioapic_set_irq
                 0 kvm:kvm_msi_set_irq
            105267 kvm:kvm_ack_irq
           1113999 kvm:kvm_mmio
             37789 kvm:kvm_fpu
                 0 kvm:kvm_age_page
                 0 kvm:kvm_try_async_get_page
                 0 kvm:kvm_async_pf_not_present
                 0 kvm:kvm_async_pf_ready
                 0 kvm:kvm_async_pf_completed
                 0 kvm:kvm_async_pf_doublefault

      30.000779718 seconds time elapsed

Thanks,
Zhang Haoyu

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with ETP enabled
  2013-07-30  9:04         ` Zhanghaoyu (A)
@ 2013-08-01  6:16           ` Gleb Natapov
  -1 siblings, 0 replies; 52+ messages in thread
From: Gleb Natapov @ 2013-08-01  6:16 UTC (permalink / raw)
  To: Zhanghaoyu (A)
  Cc: Marcelo Tosatti, Bruce Rogers, paolo.bonzini, qemu-devel,
	Michael S. Tsirkin, KVM, xiaoguangrong, Andreas Färber,
	Hanweidong, Luonengjun, Huangweidong (C),
	Zanghongyong, Xiejunyong, Xiahai, Yi Li, Xin Rong Fu

On Tue, Jul 30, 2013 at 09:04:56AM +0000, Zhanghaoyu (A) wrote:
> 
> >> >> hi all,
> >> >> 
> >> >> I met similar problem to these, while performing live migration or 
> >> >> save-restore test on the kvm platform (qemu:1.4.0, host:suse11sp2, 
> >> >> guest:suse11sp2), running tele-communication software suite in 
> >> >> guest, 
> >> >> https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00098.html
> >> >> http://comments.gmane.org/gmane.comp.emulators.kvm.devel/102506
> >> >> http://thread.gmane.org/gmane.comp.emulators.kvm.devel/100592
> >> >> https://bugzilla.kernel.org/show_bug.cgi?id=58771
> >> >> 
> >> >> After live migration or virsh restore [savefile], one process's CPU 
> >> >> utilization went up by about 30%, resulted in throughput 
> >> >> degradation of this process.
> >> >> 
> >> >> If EPT disabled, this problem gone.
> >> >> 
> >> >> I suspect that kvm hypervisor has business with this problem.
> >> >> Based on above suspect, I want to find the two adjacent versions of 
> >> >> kvm-kmod which triggers this problem or not (e.g. 2.6.39, 3.0-rc1), 
> >> >> and analyze the differences between this two versions, or apply the 
> >> >> patches between this two versions by bisection method, finally find the key patches.
> >> >> 
> >> >> Any better ideas?
> >> >> 
> >> >> Thanks,
> >> >> Zhang Haoyu
> >> >
> >> >I've attempted to duplicate this on a number of machines that are as similar to yours as I am able to get my hands on, and so far have not been able to see any performance degradation. And from what I've read in the above links, huge pages do not seem to be part of the problem.
> >> >
> >> >So, if you are in a position to bisect the kernel changes, that would probably be the best avenue to pursue in my opinion.
> >> >
> >> >Bruce
> >> 
> >> I found the first bad 
> >> commit([612819c3c6e67bac8fceaa7cc402f13b1b63f7e4] KVM: propagate fault r/w information to gup(), allow read-only memory) which triggers this problem by git bisecting the kvm kernel (download from https://git.kernel.org/pub/scm/virt/kvm/kvm.git) changes.
> >> 
> >> And,
> >> git log 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 -n 1 -p > 
> >> 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log
> >> git diff 
> >> 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1..612819c3c6e67bac8fceaa7cc4
> >> 02f13b1b63f7e4 > 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff
> >> 
> >> Then, I diffed 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log and 
> >> 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff,
> >> came to a conclusion that all of the differences between 
> >> 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1 and 
> >> 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4
> >> are contributed by no other than 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4, so this commit is the peace-breaker which directly or indirectly causes the degradation.
> >> 
> >> Does the map_writable flag passed to mmu_set_spte() function have effect on PTE's PAT flag or increase the VMEXITs induced by that guest tried to write read-only memory?
> >> 
> >> Thanks,
> >> Zhang Haoyu
> >> 
> >
> >There should be no read-only memory maps backing guest RAM.
> >
> >Can you confirm map_writable = false is being passed to __direct_map? (this should not happen, for guest RAM).
> >And if it is false, please capture the associated GFN.
> >
> I added below check and printk at the start of __direct_map() at the fist bad commit version,
> --- kvm-612819c3c6e67bac8fceaa7cc402f13b1b63f7e4/arch/x86/kvm/mmu.c     2013-07-26 18:44:05.000000000 +0800
> +++ kvm-612819/arch/x86/kvm/mmu.c       2013-07-31 00:05:48.000000000 +0800
> @@ -2223,6 +2223,9 @@ static int __direct_map(struct kvm_vcpu
>         int pt_write = 0;
>         gfn_t pseudo_gfn;
> 
> +        if (!map_writable)
> +                printk(KERN_ERR "%s: %s: gfn = %llu \n", __FILE__, __func__, gfn);
> +
>         for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
>                 if (iterator.level == level) {
>                         unsigned pte_access = ACC_ALL;
> 
> I virsh-save the VM, and then virsh-restore it, so many GFNs were printed, you can absolutely describe it as flooding.
> 
The flooding you see happens during the migrate-to-file stage because of dirty
page tracking. If you clear dmesg after virsh-save you should not see any
flooding after virsh-restore. I just checked with the latest tree, and I do not.


--
			Gleb.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* RE: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled
  2013-08-01  6:16           ` Gleb Natapov
@ 2013-08-05  8:35             ` Zhanghaoyu (A)
  -1 siblings, 0 replies; 52+ messages in thread
From: Zhanghaoyu (A) @ 2013-08-05  8:35 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Marcelo Tosatti, Bruce Rogers, paolo.bonzini, qemu-devel,
	Michael S. Tsirkin, KVM, xiaoguangrong, Andreas Färber,
	Hanweidong, Luonengjun, Huangweidong (C),
	Zanghongyong, Xiejunyong, Xiahai, Yi Li, Xin Rong Fu

>> >> >> hi all,
>> >> >> 
>> >> >> I met similar problem to these, while performing live migration or 
>> >> >> save-restore test on the kvm platform (qemu:1.4.0, host:suse11sp2, 
>> >> >> guest:suse11sp2), running tele-communication software suite in 
>> >> >> guest, 
>> >> >> https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00098.html
>> >> >> http://comments.gmane.org/gmane.comp.emulators.kvm.devel/102506
>> >> >> http://thread.gmane.org/gmane.comp.emulators.kvm.devel/100592
>> >> >> https://bugzilla.kernel.org/show_bug.cgi?id=58771
>> >> >> 
>> >> >> After live migration or virsh restore [savefile], one process's CPU 
>> >> >> utilization went up by about 30%, resulted in throughput 
>> >> >> degradation of this process.
>> >> >> 
>> >> >> If EPT disabled, this problem gone.
>> >> >> 
>> >> >> I suspect that kvm hypervisor has business with this problem.
>> >> >> Based on above suspect, I want to find the two adjacent versions of 
>> >> >> kvm-kmod which triggers this problem or not (e.g. 2.6.39, 3.0-rc1), 
>> >> >> and analyze the differences between this two versions, or apply the 
>> >> >> patches between this two versions by bisection method, finally find the key patches.
>> >> >> 
>> >> >> Any better ideas?
>> >> >> 
>> >> >> Thanks,
>> >> >> Zhang Haoyu
>> >> >
>> >> >I've attempted to duplicate this on a number of machines that are as similar to yours as I am able to get my hands on, and so far have not been able to see any performance degradation. And from what I've read in the above links, huge pages do not seem to be part of the problem.
>> >> >
>> >> >So, if you are in a position to bisect the kernel changes, that would probably be the best avenue to pursue in my opinion.
>> >> >
>> >> >Bruce
>> >> 
>> >> I found the first bad 
>> >> commit([612819c3c6e67bac8fceaa7cc402f13b1b63f7e4] KVM: propagate fault r/w information to gup(), allow read-only memory) which triggers this problem by git bisecting the kvm kernel (download from https://git.kernel.org/pub/scm/virt/kvm/kvm.git) changes.
>> >> 
>> >> And,
>> >> git log 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 -n 1 -p > 
>> >> 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log
>> >> git diff 
>> >> 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1..612819c3c6e67bac8fceaa7cc4
>> >> 02f13b1b63f7e4 > 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff
>> >> 
>> >> Then, I diffed 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log and 
>> >> 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff,
>> >> came to a conclusion that all of the differences between 
>> >> 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1 and 
>> >> 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4
>> >> are contributed by no other than 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4, so this commit is the peace-breaker which directly or indirectly causes the degradation.
>> >> 
>> >> Does the map_writable flag passed to mmu_set_spte() function have effect on PTE's PAT flag or increase the VMEXITs induced by that guest tried to write read-only memory?
>> >> 
>> >> Thanks,
>> >> Zhang Haoyu
>> >> 
>> >
>> >There should be no read-only memory maps backing guest RAM.
>> >
>> >Can you confirm map_writable = false is being passed to __direct_map? (this should not happen, for guest RAM).
>> >And if it is false, please capture the associated GFN.
>> >
>> I added below check and printk at the start of __direct_map() at the fist bad commit version,
>> --- kvm-612819c3c6e67bac8fceaa7cc402f13b1b63f7e4/arch/x86/kvm/mmu.c     2013-07-26 18:44:05.000000000 +0800
>> +++ kvm-612819/arch/x86/kvm/mmu.c       2013-07-31 00:05:48.000000000 +0800
>> @@ -2223,6 +2223,9 @@ static int __direct_map(struct kvm_vcpu
>>         int pt_write = 0;
>>         gfn_t pseudo_gfn;
>> 
>> +        if (!map_writable)
>> +                printk(KERN_ERR "%s: %s: gfn = %llu \n", __FILE__, __func__, gfn);
>> +
>>         for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
>>                 if (iterator.level == level) {
>>                         unsigned pte_access = ACC_ALL;
>> 
>> I virsh-save the VM, and then virsh-restore it, so many GFNs were printed, you can absolutely describe it as flooding.
>> 
>The flooding you see happens during migrate to file stage because of dirty
>page tracking. If you clear dmesg after virsh-save you should not see any
>flooding after virsh-restore. I just checked with latest tree, I do not.

I made another verification.
I virsh-saved the VM; during the saving stage I ran 'dmesg' and no GFN was printed. Maybe the switch from the running stage to the paused stage
takes such a short time that no guest write happens during that period.
After the save operation completed, I ran 'dmesg -c' to clear the buffer all the same, then virsh-restored the VM; so many GFNs were printed by 'dmesg',
and I also ran 'tail -f /var/log/messages' during the restore stage, and the GFNs kept flooding in there dynamically too.
I'm sure that the flooding happens during the virsh-restore stage, not the migration (save-to-file) stage.

On the VM's normal start, only a very few GFNs are printed, as shown below:
gfn = 16
gfn = 604
gfn = 605
gfn = 606
gfn = 607
gfn = 608
gfn = 609

but during the VM's restore stage a great many GFNs are printed; some examples are shown below:
2042600
2797777
2797778
2797779
2797780
2797781
2797782
2797783
2797784
2797785
2042602
2846482
2042603
2846483
2042606
2846485
2042607
2846486
2042610
2042611
2846489
2846490
2042614
2042615
2846493
2846494
2042617
2042618
2846497
2042621
2846498
2042622
2042625

Thanks,
Zhang Haoyu


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled
  2013-08-05  8:35             ` Zhanghaoyu (A)
@ 2013-08-05  8:43               ` Gleb Natapov
  -1 siblings, 0 replies; 52+ messages in thread
From: Gleb Natapov @ 2013-08-05  8:43 UTC (permalink / raw)
  To: Zhanghaoyu (A)
  Cc: Marcelo Tosatti, Bruce Rogers, paolo.bonzini, qemu-devel,
	Michael S. Tsirkin, KVM, xiaoguangrong, Andreas Färber,
	Hanweidong, Luonengjun, Huangweidong (C),
	Zanghongyong, Xiejunyong, Xiahai, Yi Li, Xin Rong Fu

On Mon, Aug 05, 2013 at 08:35:09AM +0000, Zhanghaoyu (A) wrote:
> >> >> >> hi all,
> >> >> >> 
> >> >> >> I met similar problem to these, while performing live migration or 
> >> >> >> save-restore test on the kvm platform (qemu:1.4.0, host:suse11sp2, 
> >> >> >> guest:suse11sp2), running tele-communication software suite in 
> >> >> >> guest, 
> >> >> >> https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00098.html
> >> >> >> http://comments.gmane.org/gmane.comp.emulators.kvm.devel/102506
> >> >> >> http://thread.gmane.org/gmane.comp.emulators.kvm.devel/100592
> >> >> >> https://bugzilla.kernel.org/show_bug.cgi?id=58771
> >> >> >> 
> >> >> >> After live migration or virsh restore [savefile], one process's CPU 
> >> >> >> utilization went up by about 30%, resulted in throughput 
> >> >> >> degradation of this process.
> >> >> >> 
> >> >> >> If EPT disabled, this problem gone.
> >> >> >> 
> >> >> >> I suspect that kvm hypervisor has business with this problem.
> >> >> >> Based on above suspect, I want to find the two adjacent versions of 
> >> >> >> kvm-kmod which triggers this problem or not (e.g. 2.6.39, 3.0-rc1), 
> >> >> >> and analyze the differences between this two versions, or apply the 
> >> >> >> patches between this two versions by bisection method, finally find the key patches.
> >> >> >> 
> >> >> >> Any better ideas?
> >> >> >> 
> >> >> >> Thanks,
> >> >> >> Zhang Haoyu
> >> >> >
> >> >> >I've attempted to duplicate this on a number of machines that are as similar to yours as I am able to get my hands on, and so far have not been able to see any performance degradation. And from what I've read in the above links, huge pages do not seem to be part of the problem.
> >> >> >
> >> >> >So, if you are in a position to bisect the kernel changes, that would probably be the best avenue to pursue in my opinion.
> >> >> >
> >> >> >Bruce
> >> >> 
> >> >> I found the first bad 
> >> >> commit([612819c3c6e67bac8fceaa7cc402f13b1b63f7e4] KVM: propagate fault r/w information to gup(), allow read-only memory) which triggers this problem by git bisecting the kvm kernel (download from https://git.kernel.org/pub/scm/virt/kvm/kvm.git) changes.
> >> >> 
> >> >> And,
> >> >> git log 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 -n 1 -p > 
> >> >> 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log
> >> >> git diff 
> >> >> 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1..612819c3c6e67bac8fceaa7cc4
> >> >> 02f13b1b63f7e4 > 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff
> >> >> 
> >> >> Then, I diffed 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log and 
> >> >> 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff,
> >> >> came to a conclusion that all of the differences between 
> >> >> 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1 and 
> >> >> 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4
> >> >> are contributed by no other than 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4, so this commit is the peace-breaker which directly or indirectly causes the degradation.
> >> >> 
> >> >> Does the map_writable flag passed to mmu_set_spte() function have effect on PTE's PAT flag or increase the VMEXITs induced by that guest tried to write read-only memory?
> >> >> 
> >> >> Thanks,
> >> >> Zhang Haoyu
> >> >> 
> >> >
> >> >There should be no read-only memory maps backing guest RAM.
> >> >
> >> >Can you confirm map_writable = false is being passed to __direct_map? (this should not happen, for guest RAM).
> >> >And if it is false, please capture the associated GFN.
> >> >
> >> I added below check and printk at the start of __direct_map() at the fist bad commit version,
> >> --- kvm-612819c3c6e67bac8fceaa7cc402f13b1b63f7e4/arch/x86/kvm/mmu.c     2013-07-26 18:44:05.000000000 +0800
> >> +++ kvm-612819/arch/x86/kvm/mmu.c       2013-07-31 00:05:48.000000000 +0800
> >> @@ -2223,6 +2223,9 @@ static int __direct_map(struct kvm_vcpu
> >>         int pt_write = 0;
> >>         gfn_t pseudo_gfn;
> >> 
> >> +        if (!map_writable)
> >> +                printk(KERN_ERR "%s: %s: gfn = %llu \n", __FILE__, __func__, gfn);
> >> +
> >>         for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
> >>                 if (iterator.level == level) {
> >>                         unsigned pte_access = ACC_ALL;
> >> 
> >> I virsh-save the VM, and then virsh-restore it, so many GFNs were printed, you can absolutely describe it as flooding.
> >> 
> >The flooding you see happens during migrate to file stage because of dirty
> >page tracking. If you clear dmesg after virsh-save you should not see any
> >flooding after virsh-restore. I just checked with latest tree, I do not.
> 
> I made a verification again.
> I virsh-save the VM, during the saving stage, I run 'dmesg', no GFN printed, maybe the switching from running stage to pause stage takes so short time, 
> no guest-write happens during this switching period.
> After the completion of saving operation, I run 'demsg -c' to clear the buffer all the same, then I virsh-restore the VM, so many GFNs are printed by running 'dmesg',
> and I also run 'tail -f /var/log/messages' during the restoring stage, so many GFNs are flooded dynamically too.
> I'm sure that the flooding happens during the virsh-restore stage, not the migration stage.
> 
Interesting, is this with the upstream kernel? For me the situation is
exactly the opposite. What is your command line?
 
--
			Gleb.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled
@ 2013-08-05  8:43               ` Gleb Natapov
  0 siblings, 0 replies; 52+ messages in thread
From: Gleb Natapov @ 2013-08-05  8:43 UTC (permalink / raw)
  To: Zhanghaoyu (A)
  Cc: Xiejunyong, Huangweidong (C),
	KVM, Michael S. Tsirkin, Luonengjun, Xiahai, Marcelo Tosatti,
	paolo.bonzini, qemu-devel, Bruce Rogers, Zanghongyong,
	Xin Rong Fu, Yi Li, xiaoguangrong, Hanweidong,
	Andreas Färber

On Mon, Aug 05, 2013 at 08:35:09AM +0000, Zhanghaoyu (A) wrote:
> >> >> >> hi all,
> >> >> >> 
> >> >> >> I met similar problem to these, while performing live migration or 
> >> >> >> save-restore test on the kvm platform (qemu:1.4.0, host:suse11sp2, 
> >> >> >> guest:suse11sp2), running tele-communication software suite in 
> >> >> >> guest, 
> >> >> >> https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00098.html
> >> >> >> http://comments.gmane.org/gmane.comp.emulators.kvm.devel/102506
> >> >> >> http://thread.gmane.org/gmane.comp.emulators.kvm.devel/100592
> >> >> >> https://bugzilla.kernel.org/show_bug.cgi?id=58771
> >> >> >> 
> >> >> >> After live migration or virsh restore [savefile], one process's CPU 
> >> >> >> utilization went up by about 30%, resulted in throughput 
> >> >> >> degradation of this process.
> >> >> >> 
> >> >> >> If EPT disabled, this problem gone.
> >> >> >> 
> >> >> >> I suspect that kvm hypervisor has business with this problem.
> >> >> >> Based on above suspect, I want to find the two adjacent versions of 
> >> >> >> kvm-kmod which triggers this problem or not (e.g. 2.6.39, 3.0-rc1), 
> >> >> >> and analyze the differences between this two versions, or apply the 
> >> >> >> patches between this two versions by bisection method, finally find the key patches.
> >> >> >> 
> >> >> >> Any better ideas?
> >> >> >> 
> >> >> >> Thanks,
> >> >> >> Zhang Haoyu
> >> >> >
> >> >> >I've attempted to duplicate this on a number of machines that are as similar to yours as I am able to get my hands on, and so far have not been able to see any performance degradation. And from what I've read in the above links, huge pages do not seem to be part of the problem.
> >> >> >
> >> >> >So, if you are in a position to bisect the kernel changes, that would probably be the best avenue to pursue in my opinion.
> >> >> >
> >> >> >Bruce
> >> >> 
> >> >> I found the first bad 
> >> >> commit([612819c3c6e67bac8fceaa7cc402f13b1b63f7e4] KVM: propagate fault r/w information to gup(), allow read-only memory) which triggers this problem by git bisecting the kvm kernel (download from https://git.kernel.org/pub/scm/virt/kvm/kvm.git) changes.
> >> >> 
> >> >> And,
> >> >> git log 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 -n 1 -p > 
> >> >> 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log
> >> >> git diff 
> >> >> 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1..612819c3c6e67bac8fceaa7cc4
> >> >> 02f13b1b63f7e4 > 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff
> >> >> 
> >> >> Then, I diffed 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log and 
> >> >> 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff,
> >> >> came to a conclusion that all of the differences between 
> >> >> 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1 and 
> >> >> 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4
> >> >> are contributed by no other than 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4, so this commit is the peace-breaker which directly or indirectly causes the degradation.
> >> >> 
> >> >> Does the map_writable flag passed to mmu_set_spte() function have effect on PTE's PAT flag or increase the VMEXITs induced by that guest tried to write read-only memory?
> >> >> 
> >> >> Thanks,
> >> >> Zhang Haoyu
> >> >> 
> >> >
> >> >There should be no read-only memory maps backing guest RAM.
> >> >
> >> >Can you confirm map_writable = false is being passed to __direct_map? (this should not happen, for guest RAM).
> >> >And if it is false, please capture the associated GFN.
> >> >
> >> I added below check and printk at the start of __direct_map() at the fist bad commit version,
> >> --- kvm-612819c3c6e67bac8fceaa7cc402f13b1b63f7e4/arch/x86/kvm/mmu.c     2013-07-26 18:44:05.000000000 +0800
> >> +++ kvm-612819/arch/x86/kvm/mmu.c       2013-07-31 00:05:48.000000000 +0800
> >> @@ -2223,6 +2223,9 @@ static int __direct_map(struct kvm_vcpu
> >>         int pt_write = 0;
> >>         gfn_t pseudo_gfn;
> >> 
> >> +        if (!map_writable)
> >> +                printk(KERN_ERR "%s: %s: gfn = %llu \n", __FILE__, __func__, gfn);
> >> +
> >>         for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
> >>                 if (iterator.level == level) {
> >>                         unsigned pte_access = ACC_ALL;
> >> 
> >> I virsh-save the VM, and then virsh-restore it, so many GFNs were printed, you can absolutely describe it as flooding.
> >> 
> >The flooding you see happens during migrate to file stage because of dirty
> >page tracking. If you clear dmesg after virsh-save you should not see any
> >flooding after virsh-restore. I just checked with latest tree, I do not.
> 
> I made a verification again.
> I virsh-save the VM, during the saving stage, I run 'dmesg', no GFN printed, maybe the switching from running stage to pause stage takes so short time, 
> no guest-write happens during this switching period.
> After the completion of saving operation, I run 'demsg -c' to clear the buffer all the same, then I virsh-restore the VM, so many GFNs are printed by running 'dmesg',
> and I also run 'tail -f /var/log/messages' during the restoring stage, so many GFNs are flooded dynamically too.
> I'm sure that the flooding happens during the virsh-restore stage, not the migration stage.
> 
Interesting, is this with an upstream kernel? For me the situation is
exactly the opposite. What is your command line?
 
--
			Gleb.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* RE: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled
  2013-08-05  8:43               ` Gleb Natapov
@ 2013-08-05  9:09                 ` Zhanghaoyu (A)
  -1 siblings, 0 replies; 52+ messages in thread
From: Zhanghaoyu (A) @ 2013-08-05  9:09 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Marcelo Tosatti, Bruce Rogers, paolo.bonzini, qemu-devel,
	Michael S. Tsirkin, KVM, xiaoguangrong, Andreas Färber,
	Hanweidong, Luonengjun, Huangweidong (C),
	Zanghongyong, Xiejunyong, Xiahai, Yi Li, Xin Rong Fu

>> >> >> >> hi all,
>> >> >> >> 
>> >> >> >> I met similar problem to these, while performing live migration or 
>> >> >> >> save-restore test on the kvm platform (qemu:1.4.0, host:suse11sp2, 
>> >> >> >> guest:suse11sp2), running tele-communication software suite in 
>> >> >> >> guest, 
>> >> >> >> https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00098.html
>> >> >> >> http://comments.gmane.org/gmane.comp.emulators.kvm.devel/102506
>> >> >> >> http://thread.gmane.org/gmane.comp.emulators.kvm.devel/100592
>> >> >> >> https://bugzilla.kernel.org/show_bug.cgi?id=58771
>> >> >> >> 
>> >> >> >> After live migration or virsh restore [savefile], one process's CPU 
>> >> >> >> utilization went up by about 30%, resulted in throughput 
>> >> >> >> degradation of this process.
>> >> >> >> 
>> >> >> >> If EPT disabled, this problem gone.
>> >> >> >> 
>> >> >> >> I suspect that kvm hypervisor has business with this problem.
>> >> >> >> Based on above suspect, I want to find the two adjacent versions of 
>> >> >> >> kvm-kmod which triggers this problem or not (e.g. 2.6.39, 3.0-rc1), 
>> >> >> >> and analyze the differences between this two versions, or apply the 
>> >> >> >> patches between this two versions by bisection method, finally find the key patches.
>> >> >> >> 
>> >> >> >> Any better ideas?
>> >> >> >> 
>> >> >> >> Thanks,
>> >> >> >> Zhang Haoyu
>> >> >> >
>> >> >> >I've attempted to duplicate this on a number of machines that are as similar to yours as I am able to get my hands on, and so far have not been able to see any performance degradation. And from what I've read in the above links, huge pages do not seem to be part of the problem.
>> >> >> >
>> >> >> >So, if you are in a position to bisect the kernel changes, that would probably be the best avenue to pursue in my opinion.
>> >> >> >
>> >> >> >Bruce
>> >> >> 
>> >> >> I found the first bad 
>> >> >> commit([612819c3c6e67bac8fceaa7cc402f13b1b63f7e4] KVM: propagate fault r/w information to gup(), allow read-only memory) which triggers this problem by git bisecting the kvm kernel (download from https://git.kernel.org/pub/scm/virt/kvm/kvm.git) changes.
>> >> >> 
>> >> >> And,
>> >> >> git log 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 -n 1 -p > 
>> >> >> 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log
>> >> >> git diff 
>> >> >> 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1..612819c3c6e67bac8fceaa7cc4
>> >> >> 02f13b1b63f7e4 > 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff
>> >> >> 
>> >> >> Then, I diffed 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log and 
>> >> >> 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff,
>> >> >> came to a conclusion that all of the differences between 
>> >> >> 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1 and 
>> >> >> 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4
>> >> >> are contributed by no other than 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4, so this commit is the peace-breaker which directly or indirectly causes the degradation.
>> >> >> 
>> >> >> Does the map_writable flag passed to mmu_set_spte() function have effect on PTE's PAT flag or increase the VMEXITs induced by that guest tried to write read-only memory?
>> >> >> 
>> >> >> Thanks,
>> >> >> Zhang Haoyu
>> >> >> 
>> >> >
>> >> >There should be no read-only memory maps backing guest RAM.
>> >> >
>> >> >Can you confirm map_writable = false is being passed to __direct_map? (this should not happen, for guest RAM).
>> >> >And if it is false, please capture the associated GFN.
>> >> >
>> >> I added below check and printk at the start of __direct_map() at the fist bad commit version,
>> >> --- kvm-612819c3c6e67bac8fceaa7cc402f13b1b63f7e4/arch/x86/kvm/mmu.c     2013-07-26 18:44:05.000000000 +0800
>> >> +++ kvm-612819/arch/x86/kvm/mmu.c       2013-07-31 00:05:48.000000000 +0800
>> >> @@ -2223,6 +2223,9 @@ static int __direct_map(struct kvm_vcpu
>> >>         int pt_write = 0;
>> >>         gfn_t pseudo_gfn;
>> >> 
>> >> +        if (!map_writable)
>> >> +                printk(KERN_ERR "%s: %s: gfn = %llu \n", __FILE__, __func__, gfn);
>> >> +
>> >>         for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
>> >>                 if (iterator.level == level) {
>> >>                         unsigned pte_access = ACC_ALL;
>> >> 
>> >> I virsh-save the VM, and then virsh-restore it, so many GFNs were printed, you can absolutely describe it as flooding.
>> >> 
>> >The flooding you see happens during migrate to file stage because of dirty
>> >page tracking. If you clear dmesg after virsh-save you should not see any
>> >flooding after virsh-restore. I just checked with latest tree, I do not.
>> 
>> I made a verification again.
>> I virsh-save the VM, during the saving stage, I run 'dmesg', no GFN printed, maybe the switching from running stage to pause stage takes so short time, 
>> no guest-write happens during this switching period.
>> After the completion of saving operation, I run 'demsg -c' to clear the buffer all the same, then I virsh-restore the VM, so many GFNs are printed by running 'dmesg',
>> and I also run 'tail -f /var/log/messages' during the restoring stage, so many GFNs are flooded dynamically too.
>> I'm sure that the flooding happens during the virsh-restore stage, not the migration stage.
>> 
>Interesting, is this with upstream kernel? For me the situation is
>exactly the opposite. What is your command line?
> 
I did the verification on the first bad commit 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4, not on upstream.
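In short, the verification sequence was the following (a minimal sketch; ATS1 is the domain name from the QEMU command line below, and the savefile path is only an example):

virsh save ATS1 /tmp/ATS1.save     # no GFNs are printed while the save runs
dmesg -c > /dev/null               # clear the kernel ring buffer once the save completes
virsh restore /tmp/ATS1.save       # the GFN flooding starts as soon as the restore finishes
dmesg | grep -c 'gfn ='            # count the flooded entries
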
When building upstream, I hit a problem: I compiled and installed the upstream kernel (commit: e769ece3b129698d2b09811a6f6d304e4eaa8c29) on the sles11sp2 environment with the commands below,
cp /boot/config-3.0.13-0.27-default ./.config
yes "" | make oldconfig
make && make modules_install && make install
then I rebooted the host and selected the upstream kernel, but during boot the following problem occurred,
Could not find /dev/disk/by-id/scsi-3600508e000000000864407c5b8f7ad01-part3 

I'm trying to resolve it.

The QEMU command line (/var/log/libvirt/qemu/[domain name].log),
LC_ALL=C PATH=/bin:/sbin:/usr/bin:/usr/sbin HOME=/ QEMU_AUDIO_DRV=none /usr/local/bin/qemu-system-x86_64 -name ATS1 -S -M pc-0.12 -cpu qemu32 -enable-kvm -m 12288 -smp 4,sockets=4,cores=1,threads=1 -uuid 0505ec91-382d-800e-2c79-e5b286eb60b5 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/ATS1.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/opt/ne/vm/ATS1.img,if=none,id=drive-virtio-disk0,format=raw,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=20,id=hostnet0,vhost=on,vhostfd=21 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:e0:fc:00:0f:00,bus=pci.0,addr=0x3,bootindex=2 -netdev tap,fd=22,id=hostnet1,vhost=on,vhostfd=23 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:e0:fc:01:0f:00,bus=pci.0,addr=0x4 -netdev tap,fd=24,id=hostnet2,vhost=on,vhostfd=25 -device virtio-net-pci,netdev=hostnet2,id=net2,mac=00:e0:fc:02:0f:00,bus=pci.0,addr=0x5 -netdev tap,fd=26,id=hostnet3,vhost=on,vhostfd=27 -device virtio-net-pci,netdev=hostnet3,id=net3,mac=00:e0:fc:03:0f:00,bus=pci.0,addr=0x6 -netdev tap,fd=28,id=hostnet4,vhost=on,vhostfd=29 -device virtio-net-pci,netdev=hostnet4,id=net4,mac=00:e0:fc:0a:0f:00,bus=pci.0,addr=0x7 -netdev tap,fd=30,id=hostnet5,vhost=on,vhostfd=31 -device virtio-net-pci,netdev=hostnet5,id=net5,mac=00:e0:fc:0b:0f:00,bus=pci.0,addr=0x9 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc *:0 -k en-us -vga cirrus -device i6300esb,id=watchdog0,bus=pci.0,addr=0xb -watchdog-action poweroff -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xa

Thanks,
Zhang Haoyu

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled
  2013-08-05  9:09                 ` Zhanghaoyu (A)
@ 2013-08-05  9:15                   ` Andreas Färber
  -1 siblings, 0 replies; 52+ messages in thread
From: Andreas Färber @ 2013-08-05  9:15 UTC (permalink / raw)
  To: Zhanghaoyu (A)
  Cc: Gleb Natapov, Marcelo Tosatti, Bruce Rogers, paolo.bonzini,
	qemu-devel, Michael S. Tsirkin, KVM, xiaoguangrong, Hanweidong,
	Luonengjun, Huangweidong (C),
	Zanghongyong, Xiejunyong, Xiahai, Yi Li, Xin Rong Fu

Hi,

On 05.08.2013 11:09, Zhanghaoyu (A) wrote:
> When I build the upstream, encounter a problem that I compile and install the upstream(commit: e769ece3b129698d2b09811a6f6d304e4eaa8c29) on sles11sp2 environment via below command
> cp /boot/config-3.0.13-0.27-default ./.config
> yes "" | make oldconfig
> make && make modules_install && make install
> then, I reboot the host, and select the upstream kernel, but during the starting stage, below problem happened,
> Could not find /dev/disk/by-id/scsi-3600508e000000000864407c5b8f7ad01-part3 
> 
> I'm trying to resolve it.

Possibly you need to enable loading unsupported kernel modules?
At least that's needed when testing a kmod with a SUSE kernel.

Regards,
Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

^ permalink raw reply	[flat|nested] 52+ messages in thread

* RE: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled
  2013-08-05  9:15                   ` Andreas Färber
@ 2013-08-05  9:22                     ` Zhanghaoyu (A)
  -1 siblings, 0 replies; 52+ messages in thread
From: Zhanghaoyu (A) @ 2013-08-05  9:22 UTC (permalink / raw)
  To: Andreas Färber
  Cc: Gleb Natapov, Marcelo Tosatti, Bruce Rogers, paolo.bonzini,
	qemu-devel, Michael S. Tsirkin, KVM, xiaoguangrong, Hanweidong,
	Luonengjun, Huangweidong (C),
	Zanghongyong, Xiejunyong, Xiahai, Yi Li, Xin Rong Fu

>Hi,
>
>Am 05.08.2013 11:09, schrieb Zhanghaoyu (A):
>> When I build the upstream, encounter a problem that I compile and 
>> install the upstream(commit: e769ece3b129698d2b09811a6f6d304e4eaa8c29) 
>> on sles11sp2 environment via below command cp 
>> /boot/config-3.0.13-0.27-default ./.config yes "" | make oldconfig 
>> make && make modules_install && make install then, I reboot the host, 
>> and select the upstream kernel, but during the starting stage, below 
>> problem happened, Could not find 
>> /dev/disk/by-id/scsi-3600508e000000000864407c5b8f7ad01-part3
>> 
>> I'm trying to resolve it.
>
>Possibly you need to enable loading unsupported kernel modules?
>At least that's needed when testing a kmod with a SUSE kernel.
>
I have tried setting "allow_unsupported_modules 1" in /etc/modprobe.d/unsupported-modules, but the problem still happened.
I replaced the whole kernel with the kvm kernel, not only the kvm modules.

>Regards,
>Andreas

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled
  2013-08-05  9:09                 ` Zhanghaoyu (A)
@ 2013-08-05  9:37                   ` Gleb Natapov
  -1 siblings, 0 replies; 52+ messages in thread
From: Gleb Natapov @ 2013-08-05  9:37 UTC (permalink / raw)
  To: Zhanghaoyu (A)
  Cc: Marcelo Tosatti, Bruce Rogers, paolo.bonzini, qemu-devel,
	Michael S. Tsirkin, KVM, xiaoguangrong, Andreas Färber,
	Hanweidong, Luonengjun, Huangweidong (C),
	Zanghongyong, Xiejunyong, Xiahai, Yi Li, Xin Rong Fu

On Mon, Aug 05, 2013 at 09:09:56AM +0000, Zhanghaoyu (A) wrote:
> The QEMU command line (/var/log/libvirt/qemu/[domain name].log),
> LC_ALL=C PATH=/bin:/sbin:/usr/bin:/usr/sbin HOME=/ QEMU_AUDIO_DRV=none /usr/local/bin/qemu-system-x86_64 -name ATS1 -S -M pc-0.12 -cpu qemu32 -enable-kvm -m 12288 -smp 4,sockets=4,cores=1,threads=1 -uuid 0505ec91-382d-800e-2c79-e5b286eb60b5 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/ATS1.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/opt/ne/vm/ATS1.img,if=none,id=drive-virtio-disk0,format=raw,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=20,id=hostnet0,vhost=on,vhostfd=21 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:e0:fc:00:0f:00,bu
 s=pci.0,addr=0x3,bootindex=2 -netdev tap,fd=22,id=hostnet1,vhost=on,vhostfd=23 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:e0:fc:01:0f:00,bus=pci.0,addr=0x4 -netdev tap,fd=24,id=hostnet2,vhost=on,vhostfd=25 -device virtio-net-pci,netdev=hostnet2,id=net2,mac=00:e0:fc:02:0f:00,bus=pci.0,addr=0x5 -netdev tap,fd=26,id=hostnet3,vhost=on,vhostfd=27 -device virtio-net-pci,netdev=hostnet3,id=net3,mac=00:e0:fc:03:0f:00,bus=pci.0,addr=0x6 -netdev tap,fd=28,id=hostnet4,vhost=on,vhostfd=29 -device virtio-net-pci,netdev=hostnet4,id=net4,mac=00:e0:fc:0a:0f:00,bus=pci.0,addr=0x7 -netdev tap,fd=30,id=hostnet5,vhost=on,vhostfd=31 -device virtio-net-pci,netdev=hostnet5,id=net5,mac=00:e0:fc:0b:0f:00,bus=pci.0,addr=0x9 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -
 vnc *:0 -k en-us -vga cirrus -device i6300esb,id=watchdog0,bus=pci.0,addr=0xb -watchdog-action poweroff -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xa
> 
Which QEMU version is this? Can you try with e1000 NICs instead of
virtio?
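For example (a rough sketch only, reusing the values from the hostnet0 entry above; with libvirt this would normally be done by setting <model type='e1000'/> on the interface in the domain XML), the first NIC would change from

-netdev tap,fd=20,id=hostnet0,vhost=on,vhostfd=21 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:e0:fc:00:0f:00,bus=pci.0,addr=0x3,bootindex=2

to something like

-netdev tap,fd=20,id=hostnet0 -device e1000,netdev=hostnet0,id=net0,mac=00:e0:fc:00:0f:00,bus=pci.0,addr=0x3,bootindex=2

(vhost is dropped because it only applies to virtio-net.)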

--
			Gleb.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: vm performance degradation after kvm live migration or save-restore with EPT enabled
  2013-08-05  8:35             ` Zhanghaoyu (A)
@ 2013-08-05 18:27               ` Xiao Guangrong
  -1 siblings, 0 replies; 52+ messages in thread
From: Xiao Guangrong @ 2013-08-05 18:27 UTC (permalink / raw)
  To: Zhanghaoyu (A)
  Cc: Xiejunyong, Huangweidong (C),
	KVM, Gleb Natapov, Michael S. Tsirkin, Luonengjun, Xiahai,
	Marcelo Tosatti, paolo.bonzini, qemu-devel, Bruce Rogers,
	Zanghongyong, Xin Rong Fu, Yi Li, xiaoguangrong, Hanweidong,
	Andreas Färber


On Aug 5, 2013, at 4:35 PM, "Zhanghaoyu (A)" <haoyu.zhang@huawei.com> wrote:

>>>>>>> hi all,
>>>>>>> 
>>>>>>> I met similar problem to these, while performing live migration or 
>>>>>>> save-restore test on the kvm platform (qemu:1.4.0, host:suse11sp2, 
>>>>>>> guest:suse11sp2), running tele-communication software suite in 
>>>>>>> guest, 
>>>>>>> https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00098.html
>>>>>>> http://comments.gmane.org/gmane.comp.emulators.kvm.devel/102506
>>>>>>> http://thread.gmane.org/gmane.comp.emulators.kvm.devel/100592
>>>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=58771
>>>>>>> 
>>>>>>> After live migration or virsh restore [savefile], one process's CPU 
>>>>>>> utilization went up by about 30%, resulted in throughput 
>>>>>>> degradation of this process.
>>>>>>> 
>>>>>>> If EPT disabled, this problem gone.
>>>>>>> 
>>>>>>> I suspect that kvm hypervisor has business with this problem.
>>>>>>> Based on above suspect, I want to find the two adjacent versions of 
>>>>>>> kvm-kmod which triggers this problem or not (e.g. 2.6.39, 3.0-rc1), 
>>>>>>> and analyze the differences between this two versions, or apply the 
>>>>>>> patches between this two versions by bisection method, finally find the key patches.
>>>>>>> 
>>>>>>> Any better ideas?
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Zhang Haoyu
>>>>>> 
>>>>>> I've attempted to duplicate this on a number of machines that are as similar to yours as I am able to get my hands on, and so far have not been able to see any performance degradation. And from what I've read in the above links, huge pages do not seem to be part of the problem.
>>>>>> 
>>>>>> So, if you are in a position to bisect the kernel changes, that would probably be the best avenue to pursue in my opinion.
>>>>>> 
>>>>>> Bruce
>>>>> 
>>>>> I found the first bad 
>>>>> commit([612819c3c6e67bac8fceaa7cc402f13b1b63f7e4] KVM: propagate fault r/w information to gup(), allow read-only memory) which triggers this problem by git bisecting the kvm kernel (download from https://git.kernel.org/pub/scm/virt/kvm/kvm.git) changes.
>>>>> 
>>>>> And,
>>>>> git log 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 -n 1 -p > 
>>>>> 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log
>>>>> git diff 
>>>>> 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1..612819c3c6e67bac8fceaa7cc4
>>>>> 02f13b1b63f7e4 > 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff
>>>>> 
>>>>> Then, I diffed 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log and 
>>>>> 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff,
>>>>> came to a conclusion that all of the differences between 
>>>>> 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1 and 
>>>>> 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4
>>>>> are contributed by no other than 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4, so this commit is the peace-breaker which directly or indirectly causes the degradation.
>>>>> 
>>>>> Does the map_writable flag passed to mmu_set_spte() function have effect on PTE's PAT flag or increase the VMEXITs induced by that guest tried to write read-only memory?
>>>>> 
>>>>> Thanks,
>>>>> Zhang Haoyu
>>>>> 
>>>> 
>>>> There should be no read-only memory maps backing guest RAM.
>>>> 
>>>> Can you confirm map_writable = false is being passed to __direct_map? (this should not happen, for guest RAM).
>>>> And if it is false, please capture the associated GFN.
>>>> 
>>> I added below check and printk at the start of __direct_map() at the fist bad commit version,
>>> --- kvm-612819c3c6e67bac8fceaa7cc402f13b1b63f7e4/arch/x86/kvm/mmu.c     2013-07-26 18:44:05.000000000 +0800
>>> +++ kvm-612819/arch/x86/kvm/mmu.c       2013-07-31 00:05:48.000000000 +0800
>>> @@ -2223,6 +2223,9 @@ static int __direct_map(struct kvm_vcpu
>>>        int pt_write = 0;
>>>        gfn_t pseudo_gfn;
>>> 
>>> +        if (!map_writable)
>>> +                printk(KERN_ERR "%s: %s: gfn = %llu \n", __FILE__, __func__, gfn);
>>> +
>>>        for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
>>>                if (iterator.level == level) {
>>>                        unsigned pte_access = ACC_ALL;
>>> 
>>> I virsh-save the VM, and then virsh-restore it, so many GFNs were printed, you can absolutely describe it as flooding.
>>> 
>> The flooding you see happens during migrate to file stage because of dirty
>> page tracking. If you clear dmesg after virsh-save you should not see any
>> flooding after virsh-restore. I just checked with latest tree, I do not.
> 
> I made a verification again.
> I virsh-save the VM, during the saving stage, I run 'dmesg', no GFN printed, maybe the switching from running stage to pause stage takes so short time, 
> no guest-write happens during this switching period.
> After the completion of saving operation, I run 'demsg -c' to clear the buffer all the same, then I virsh-restore the VM, so many GFNs are printed by running 'dmesg',
> and I also run 'tail -f /var/log/messages' during the restoring stage, so many GFNs are flooded dynamically too.
> I'm sure that the flooding happens during the virsh-restore stage, not the migration stage.
> 
> On VM's normal starting stage, only very few GFNs are printed, shown as below
> gfn = 16
> gfn = 604
> gfn = 605
> gfn = 606
> gfn = 607
> gfn = 608
> gfn = 609
> 
> but on the VM's restoring stage, so many GFNs are printed, taking some examples shown as below,

That's really strange. Could you please disable EPT and add your trace code to FNAME(fetch)(), then
test again to see what happens?

If there are still many !map_writable cases, please measure the performance to see if the
regression is still there.
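
For reference, a minimal sketch of disabling EPT on the host (assuming the Intel kvm module, with all guests shut down first), so that guest faults go through the shadow-paging path in arch/x86/kvm/paging_tmpl.h:

rmmod kvm_intel
modprobe kvm_intel ept=0
cat /sys/module/kvm_intel/parameters/ept   # should now report N

The printk used in __direct_map() above can then be added near the top of FNAME(fetch) in paging_tmpl.h; after commit 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 the map_writable parameter should be available there as well.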

Many thanks!



^ permalink raw reply	[flat|nested] 52+ messages in thread

* RE: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled
  2013-08-05  9:37                   ` Gleb Natapov
@ 2013-08-06 10:47                     ` Zhanghaoyu (A)
  -1 siblings, 0 replies; 52+ messages in thread
From: Zhanghaoyu (A) @ 2013-08-06 10:47 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Marcelo Tosatti, Bruce Rogers, paolo.bonzini, qemu-devel,
	Michael S. Tsirkin, KVM, xiaoguangrong, Andreas Färber,
	Hanweidong, Luonengjun, Huangweidong (C),
	Zanghongyong, Xiejunyong, Xiahai, Yi Li, Xin Rong Fu

>> The QEMU command line (/var/log/libvirt/qemu/[domain name].log), 
>> LC_ALL=C PATH=/bin:/sbin:/usr/bin:/usr/sbin HOME=/ QEMU_AUDIO_DRV=none 
>> /usr/local/bin/qemu-system-x86_64 -name ATS1 -S -M pc-0.12 -cpu qemu32 
>> -enable-kvm -m 12288 -smp 4,sockets=4,cores=1,threads=1 -uuid 
>> 0505ec91-382d-800e-2c79-e5b286eb60b5 -no-user-config -nodefaults 
>> -chardev 
>> socket,id=charmonitor,path=/var/lib/libvirt/qemu/ATS1.monitor,server,n
>> owait -mon chardev=charmonitor,id=monitor,mode=control -rtc 
>> base=localtime -no-shutdown -device 
>> piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive 
>> file=/opt/ne/vm/ATS1.img,if=none,id=drive-virtio-disk0,format=raw,cach
>> e=none -device 
>> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk0,id
>> =virtio-disk0,bootindex=1 -netdev 
>> tap,fd=20,id=hostnet0,vhost=on,vhostfd=21 -device 
>> virtio-net-pci,netdev=hostnet0,id=net0,mac=00:e0:fc:00:0f:00,bus=pci.0
>> ,addr=0x3,bootindex=2 -netdev 
>> tap,fd=22,id=hostnet1,vhost=on,vhostfd=23 -device 
>> virtio-net-pci,netdev=hostnet1,id=net1,mac=00:e0:fc:01:0f:00,bus=pci.0
>> ,addr=0x4 -netdev tap,fd=24,id=hostnet2,vhost=on,vhostfd=25 -device 
>> virtio-net-pci,netdev=hostnet2,id=net2,mac=00:e0:fc:02:0f:00,bus=pci.0
>> ,addr=0x5 -netdev tap,fd=26,id=hostnet3,vhost=on,vhostfd=27 -device 
>> virtio-net-pci,netdev=hostnet3,id=net3,mac=00:e0:fc:03:0f:00,bus=pci.0
>> ,addr=0x6 -netdev tap,fd=28,id=hostnet4,vhost=on,vhostfd=29 -device 
>> virtio-net-pci,netdev=hostnet4,id=net4,mac=00:e0:fc:0a:0f:00,bus=pci.0
>> ,addr=0x7 -netdev tap,fd=30,id=hostnet5,vhost=on,vhostfd=31 -device 
>> virtio-net-pci,netdev=hostnet5,id=net5,mac=00:e0:fc:0b:0f:00,bus=pci.0
>> ,addr=0x9 -chardev pty,id=charserial0 -device 
>> isa-serial,chardev=charserial0,id=serial0 -vnc *:0 -k en-us -vga 
>> cirrus -device i6300esb,id=watchdog0,bus=pci.0,addr=0xb 
>> -watchdog-action poweroff -device 
>> virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xa
>> 
>Which QEMU version is this? Can you try with e1000 NICs instead of virtio?
>
This QEMU version is 1.0.0, but I also tested QEMU 1.5.2; the same problems exist, including the performance degradation and the read-only GFN flooding.
I also tried e1000 NICs instead of virtio (with QEMU 1.5.2), and again saw both the performance degradation and the read-only GFN flooding.
With either e1000 or virtio NICs, the GFN flooding is initiated at the post-restore stage (i.e. the running stage): as soon as the restore completes, the flooding begins.

Thanks,
Zhang Haoyu

>--
>			Gleb.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* RE: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled
  2013-08-06 10:47                     ` Zhanghaoyu (A)
@ 2013-08-07  1:34                       ` Zhanghaoyu (A)
  -1 siblings, 0 replies; 52+ messages in thread
From: Zhanghaoyu (A) @ 2013-08-07  1:34 UTC (permalink / raw)
  To: Zhanghaoyu (A), Gleb Natapov
  Cc: Xiejunyong, Huangweidong (C),
	KVM, Michael S. Tsirkin, Luonengjun, Xiahai, Marcelo Tosatti,
	paolo.bonzini, qemu-devel, Bruce Rogers, Zanghongyong,
	Xin Rong Fu, Yi Li, xiaoguangrong, Hanweidong,
	Andreas Färber

>>> The QEMU command line (/var/log/libvirt/qemu/[domain name].log), 
>>> LC_ALL=C PATH=/bin:/sbin:/usr/bin:/usr/sbin HOME=/ 
>>> QEMU_AUDIO_DRV=none
>>> /usr/local/bin/qemu-system-x86_64 -name ATS1 -S -M pc-0.12 -cpu 
>>> qemu32 -enable-kvm -m 12288 -smp 4,sockets=4,cores=1,threads=1 -uuid
>>> 0505ec91-382d-800e-2c79-e5b286eb60b5 -no-user-config -nodefaults 
>>> -chardev 
>>> socket,id=charmonitor,path=/var/lib/libvirt/qemu/ATS1.monitor,server,
>>> n owait -mon chardev=charmonitor,id=monitor,mode=control -rtc 
>>> base=localtime -no-shutdown -device
>>> piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive 
>>> file=/opt/ne/vm/ATS1.img,if=none,id=drive-virtio-disk0,format=raw,cac
>>> h
>>> e=none -device
>>> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk0,i
>>> d
>>> =virtio-disk0,bootindex=1 -netdev
>>> tap,fd=20,id=hostnet0,vhost=on,vhostfd=21 -device 
>>> virtio-net-pci,netdev=hostnet0,id=net0,mac=00:e0:fc:00:0f:00,bus=pci.
>>> 0
>>> ,addr=0x3,bootindex=2 -netdev
>>> tap,fd=22,id=hostnet1,vhost=on,vhostfd=23 -device 
>>> virtio-net-pci,netdev=hostnet1,id=net1,mac=00:e0:fc:01:0f:00,bus=pci.
>>> 0
>>> ,addr=0x4 -netdev tap,fd=24,id=hostnet2,vhost=on,vhostfd=25 -device 
>>> virtio-net-pci,netdev=hostnet2,id=net2,mac=00:e0:fc:02:0f:00,bus=pci.
>>> 0
>>> ,addr=0x5 -netdev tap,fd=26,id=hostnet3,vhost=on,vhostfd=27 -device 
>>> virtio-net-pci,netdev=hostnet3,id=net3,mac=00:e0:fc:03:0f:00,bus=pci.
>>> 0
>>> ,addr=0x6 -netdev tap,fd=28,id=hostnet4,vhost=on,vhostfd=29 -device 
>>> virtio-net-pci,netdev=hostnet4,id=net4,mac=00:e0:fc:0a:0f:00,bus=pci.
>>> 0
>>> ,addr=0x7 -netdev tap,fd=30,id=hostnet5,vhost=on,vhostfd=31 -device 
>>> virtio-net-pci,netdev=hostnet5,id=net5,mac=00:e0:fc:0b:0f:00,bus=pci.
>>> 0
>>> ,addr=0x9 -chardev pty,id=charserial0 -device 
>>> isa-serial,chardev=charserial0,id=serial0 -vnc *:0 -k en-us -vga 
>>> cirrus -device i6300esb,id=watchdog0,bus=pci.0,addr=0xb
>>> -watchdog-action poweroff -device
>>> virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xa
>>> 
>>Which QEMU version is this? Can you try with e1000 NICs instead of virtio?
>>
>This QEMU version is 1.0.0, but I also test QEMU 1.5.2, the same problem exists, including the performance degradation and readonly GFNs' flooding.
>I tried with e1000 NICs instead of virtio, including the performance degradation and readonly GFNs' flooding, the QEMU version is 1.5.2.
>No matter e1000 NICs or virtio NICs, the GFNs' flooding is initiated at post-restore stage (i.e. running stage), as soon as the restoring completed, the flooding is starting.
>
>Thanks,
>Zhang Haoyu
>
>>--
>>			Gleb.

Should we focus on the first bad commit (612819c3c6e67bac8fceaa7cc402f13b1b63f7e4) and the surprising GFN flooding?

I applied the patch below to __direct_map(),
@@ -2223,6 +2223,8 @@ static int __direct_map(struct kvm_vcpu
        int pt_write = 0;
        gfn_t pseudo_gfn;

+        map_writable = true;
+
        for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
                if (iterator.level == level) {
                        unsigned pte_access = ACC_ALL;
and rebuilt the kvm-kmod, then re-inserted it with insmod.
After I started a VM, the host seemed abnormal: many programs could not be started successfully, and segmentation faults were reported.
In my opinion, with the above patch applied, commit 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 should have no effect, but the test result proved me wrong.
Does the way the map_writable value is obtained in hva_to_pfn() have an effect on the result?

Thanks,
Zhang Haoyu

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled
  2013-08-07  1:34                       ` Zhanghaoyu (A)
@ 2013-08-07  5:52                         ` Gleb Natapov
  -1 siblings, 0 replies; 52+ messages in thread
From: Gleb Natapov @ 2013-08-07  5:52 UTC (permalink / raw)
  To: Zhanghaoyu (A)
  Cc: Xiejunyong, Huangweidong (C),
	KVM, Michael S. Tsirkin, Luonengjun, Xiahai, Marcelo Tosatti,
	paolo.bonzini, qemu-devel, Bruce Rogers, Zanghongyong,
	Xin Rong Fu, Yi Li, xiaoguangrong, Hanweidong,
	Andreas Färber

On Wed, Aug 07, 2013 at 01:34:41AM +0000, Zhanghaoyu (A) wrote:
> >>> The QEMU command line (/var/log/libvirt/qemu/[domain name].log), 
> >>> LC_ALL=C PATH=/bin:/sbin:/usr/bin:/usr/sbin HOME=/ 
> >>> QEMU_AUDIO_DRV=none
> >>> /usr/local/bin/qemu-system-x86_64 -name ATS1 -S -M pc-0.12 -cpu 
> >>> qemu32 -enable-kvm -m 12288 -smp 4,sockets=4,cores=1,threads=1 -uuid
> >>> 0505ec91-382d-800e-2c79-e5b286eb60b5 -no-user-config -nodefaults 
> >>> -chardev 
> >>> socket,id=charmonitor,path=/var/lib/libvirt/qemu/ATS1.monitor,server,
> >>> n owait -mon chardev=charmonitor,id=monitor,mode=control -rtc 
> >>> base=localtime -no-shutdown -device
> >>> piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive 
> >>> file=/opt/ne/vm/ATS1.img,if=none,id=drive-virtio-disk0,format=raw,cac
> >>> h
> >>> e=none -device
> >>> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk0,i
> >>> d
> >>> =virtio-disk0,bootindex=1 -netdev
> >>> tap,fd=20,id=hostnet0,vhost=on,vhostfd=21 -device 
> >>> virtio-net-pci,netdev=hostnet0,id=net0,mac=00:e0:fc:00:0f:00,bus=pci.
> >>> 0
> >>> ,addr=0x3,bootindex=2 -netdev
> >>> tap,fd=22,id=hostnet1,vhost=on,vhostfd=23 -device 
> >>> virtio-net-pci,netdev=hostnet1,id=net1,mac=00:e0:fc:01:0f:00,bus=pci.
> >>> 0
> >>> ,addr=0x4 -netdev tap,fd=24,id=hostnet2,vhost=on,vhostfd=25 -device 
> >>> virtio-net-pci,netdev=hostnet2,id=net2,mac=00:e0:fc:02:0f:00,bus=pci.
> >>> 0
> >>> ,addr=0x5 -netdev tap,fd=26,id=hostnet3,vhost=on,vhostfd=27 -device 
> >>> virtio-net-pci,netdev=hostnet3,id=net3,mac=00:e0:fc:03:0f:00,bus=pci.
> >>> 0
> >>> ,addr=0x6 -netdev tap,fd=28,id=hostnet4,vhost=on,vhostfd=29 -device 
> >>> virtio-net-pci,netdev=hostnet4,id=net4,mac=00:e0:fc:0a:0f:00,bus=pci.
> >>> 0
> >>> ,addr=0x7 -netdev tap,fd=30,id=hostnet5,vhost=on,vhostfd=31 -device 
> >>> virtio-net-pci,netdev=hostnet5,id=net5,mac=00:e0:fc:0b:0f:00,bus=pci.
> >>> 0
> >>> ,addr=0x9 -chardev pty,id=charserial0 -device 
> >>> isa-serial,chardev=charserial0,id=serial0 -vnc *:0 -k en-us -vga 
> >>> cirrus -device i6300esb,id=watchdog0,bus=pci.0,addr=0xb
> >>> -watchdog-action poweroff -device
> >>> virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xa
> >>> 
> >>Which QEMU version is this? Can you try with e1000 NICs instead of virtio?
> >>
> >This QEMU version is 1.0.0, but I also test QEMU 1.5.2, the same problem exists, including the performance degradation and readonly GFNs' flooding.
> >I tried with e1000 NICs instead of virtio, including the performance degradation and readonly GFNs' flooding, the QEMU version is 1.5.2.
> >No matter e1000 NICs or virtio NICs, the GFNs' flooding is initiated at post-restore stage (i.e. running stage), as soon as the restoring completed, the flooding is starting.
> >
> >Thanks,
> >Zhang Haoyu
> >
> >>--
> >>			Gleb.
> 
> Should we focus on the first bad commit(612819c3c6e67bac8fceaa7cc402f13b1b63f7e4) and the surprising GFNs' flooding?
> 
Not really. There is no point in debugging a very old version compiled
with kvm-kmod; there are too many variables in the environment. I cannot
reproduce the GFN flooding on upstream, so the problem may be gone, may
be a result of a kvm-kmod problem, or may come from something different
in how I invoke qemu. So the best way to proceed is for you to reproduce
with the upstream version; then at least I will be sure that we are
using the same code.

> I applied below patch to  __direct_map(), 
> @@ -2223,6 +2223,8 @@ static int __direct_map(struct kvm_vcpu
>         int pt_write = 0;
>         gfn_t pseudo_gfn;
> 
> +        map_writable = true;
> +
>         for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
>                 if (iterator.level == level) {
>                         unsigned pte_access = ACC_ALL;
> and rebuild the kvm-kmod, then re-insmod it.
> After I started a VM, the host seemed to be abnormal, so many programs cannot be started successfully, segmentation fault is reported.
> In my opinion, after above patch applied, the commit: 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 should be of no effect, but the test result proved me wrong.
> Dose the map_writable value's getting process in hva_to_pfn() have effect on the result?
> 
If hva_to_pfn() returns map_writable == false, it means that the page is
mapped read-only in the primary MMU, so it should not be mapped writable
in the secondary MMU either. This should not usually happen.
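
For reference, the writability reporting described here can be pictured
roughly as follows. This is a simplified sketch of the idea, not the real
hva_to_pfn() code, and gup() is a hypothetical stand-in for
get_user_pages():

#include <stdbool.h>
#include <stdint.h>

typedef uint64_t pfn_t;

/* Hypothetical stand-in for get_user_pages(): pin the page backing hva,
 * requiring write access iff 'write' is set; return true on success. */
extern bool gup(uintptr_t hva, bool write, pfn_t *pfn);

/* Rough idea only: try to pin the page writable first; if that fails and
 * the fault was only a read, fall back to a read-only pin and report
 * *writable = false, i.e. "do not make the SPTE writable". */
static pfn_t hva_to_pfn_sketch(uintptr_t hva, bool write_fault, bool *writable)
{
        pfn_t pfn;

        if (gup(hva, true, &pfn)) {
                *writable = true;
                return pfn;
        }
        if (!write_fault && gup(hva, false, &pfn)) {
                *writable = false;      /* read-only in the primary MMU */
                return pfn;
        }
        *writable = false;
        return (pfn_t)-1;               /* failure path, handled elsewhere */
}

A false map_writable thus reflects the state of the host mapping itself,
which is why overriding it in __direct_map(), as in the patch above, is
unsafe.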

--
			Gleb.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: vm performance degradation after kvm live migration or save-restore with EPT enabled
  2013-08-07  5:52                         ` Gleb Natapov
@ 2013-08-14  9:05                           ` Zhanghaoyu (A)
  -1 siblings, 0 replies; 52+ messages in thread
From: Zhanghaoyu (A) @ 2013-08-14  9:05 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Marcelo Tosatti, Huangweidong (C),
	KVM, Michael S. Tsirkin, paolo.bonzini, Xiejunyong, Luonengjun,
	qemu-devel, Xiahai, Zanghongyong, Xin Rong Fu, Yi Li,
	xiaoguangrong, Bruce Rogers, Hanweidong, Andreas Färber

>> >>> The QEMU command line (/var/log/libvirt/qemu/[domain name].log), 
>> >>> LC_ALL=C PATH=/bin:/sbin:/usr/bin:/usr/sbin HOME=/ 
>> >>> QEMU_AUDIO_DRV=none
>> >>> /usr/local/bin/qemu-system-x86_64 -name ATS1 -S -M pc-0.12 -cpu 
>> >>> qemu32 -enable-kvm -m 12288 -smp 4,sockets=4,cores=1,threads=1 -uuid
>> >>> 0505ec91-382d-800e-2c79-e5b286eb60b5 -no-user-config -nodefaults 
>> >>> -chardev 
>> >>> socket,id=charmonitor,path=/var/lib/libvirt/qemu/ATS1.monitor,server,
>> >>> n owait -mon chardev=charmonitor,id=monitor,mode=control -rtc 
>> >>> base=localtime -no-shutdown -device
>> >>> piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive 
>> >>> file=/opt/ne/vm/ATS1.img,if=none,id=drive-virtio-disk0,format=raw,cac
>> >>> h
>> >>> e=none -device
>> >>> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk0,i
>> >>> d
>> >>> =virtio-disk0,bootindex=1 -netdev
>> >>> tap,fd=20,id=hostnet0,vhost=on,vhostfd=21 -device 
>> >>> virtio-net-pci,netdev=hostnet0,id=net0,mac=00:e0:fc:00:0f:00,bus=pci.
>> >>> 0
>> >>> ,addr=0x3,bootindex=2 -netdev
>> >>> tap,fd=22,id=hostnet1,vhost=on,vhostfd=23 -device 
>> >>> virtio-net-pci,netdev=hostnet1,id=net1,mac=00:e0:fc:01:0f:00,bus=pci.
>> >>> 0
>> >>> ,addr=0x4 -netdev tap,fd=24,id=hostnet2,vhost=on,vhostfd=25 -device 
>> >>> virtio-net-pci,netdev=hostnet2,id=net2,mac=00:e0:fc:02:0f:00,bus=pci.
>> >>> 0
>> >>> ,addr=0x5 -netdev tap,fd=26,id=hostnet3,vhost=on,vhostfd=27 -device 
>> >>> virtio-net-pci,netdev=hostnet3,id=net3,mac=00:e0:fc:03:0f:00,bus=pci.
>> >>> 0
>> >>> ,addr=0x6 -netdev tap,fd=28,id=hostnet4,vhost=on,vhostfd=29 -device 
>> >>> virtio-net-pci,netdev=hostnet4,id=net4,mac=00:e0:fc:0a:0f:00,bus=pci.
>> >>> 0
>> >>> ,addr=0x7 -netdev tap,fd=30,id=hostnet5,vhost=on,vhostfd=31 -device 
>> >>> virtio-net-pci,netdev=hostnet5,id=net5,mac=00:e0:fc:0b:0f:00,bus=pci.
>> >>> 0
>> >>> ,addr=0x9 -chardev pty,id=charserial0 -device 
>> >>> isa-serial,chardev=charserial0,id=serial0 -vnc *:0 -k en-us -vga 
>> >>> cirrus -device i6300esb,id=watchdog0,bus=pci.0,addr=0xb
>> >>> -watchdog-action poweroff -device
>> >>> virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xa
>> >>> 
>> >>Which QEMU version is this? Can you try with e1000 NICs instead of virtio?
>> >>
>> >This QEMU version is 1.0.0, but I also test QEMU 1.5.2, the same problem exists, including the performance degradation and readonly GFNs' flooding.
>> >I tried with e1000 NICs instead of virtio, including the performance degradation and readonly GFNs' flooding, the QEMU version is 1.5.2.
>> >No matter e1000 NICs or virtio NICs, the GFNs' flooding is initiated at post-restore stage (i.e. running stage), as soon as the restoring completed, the flooding is starting.
>> >
>> >Thanks,
>> >Zhang Haoyu
>> >
>> >>--
>> >>			Gleb.
>> 
>> Should we focus on the first bad commit(612819c3c6e67bac8fceaa7cc402f13b1b63f7e4) and the surprising GFNs' flooding?
>> 
>Not really. There is no point in debugging very old version compiled
>with kvm-kmod, there are to many variables in the environment. I cannot
>reproduce the GFN flooding on upstream, so the problem may be gone, may
>be a result of kvm-kmod problem or something different in how I invoke
>qemu. So the best way to proceed is for you to reproduce with upstream
>version then at least I will be sure that we are using the same code.
>
Thanks, I will test the combination of the upstream kvm kernel and upstream qemu.
Also, the guest OS version I gave above was wrong; the currently running guest OS is SLES10SP4.

Thanks,
Zhang Haoyu

>> I applied below patch to  __direct_map(), 
>> @@ -2223,6 +2223,8 @@ static int __direct_map(struct kvm_vcpu
>>         int pt_write = 0;
>>         gfn_t pseudo_gfn;
>> 
>> +        map_writable = true;
>> +
>>         for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
>>                 if (iterator.level == level) {
>>                         unsigned pte_access = ACC_ALL;
>> and rebuild the kvm-kmod, then re-insmod it.
>> After I started a VM, the host seemed to be abnormal, so many programs cannot be started successfully, segmentation fault is reported.
>> In my opinion, after above patch applied, the commit: 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 should be of no effect, but the test result proved me wrong.
>> Dose the map_writable value's getting process in hva_to_pfn() have effect on the result?
>> 
>If hva_to_pfn() returns map_writable == false it means that page is
>mapped as read only on primary MMU, so it should not be mapped writable
>on secondary MMU either. This should not happen usually.
>
>--
>			Gleb.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: vm performance degradation after kvm live migration or save-restore with EPT enabled
  2013-08-07  5:52                         ` Gleb Natapov
@ 2013-08-20 13:33                           ` Zhanghaoyu (A)
  -1 siblings, 0 replies; 52+ messages in thread
From: Zhanghaoyu (A) @ 2013-08-20 13:33 UTC (permalink / raw)
  To: Zhanghaoyu (A), Gleb Natapov, Eric Blake, pl, Paolo Bonzini
  Cc: Marcelo Tosatti, Huangweidong (C),
	KVM, Michael S. Tsirkin, paolo.bonzini, Xiejunyong, Luonengjun,
	qemu-devel, Xiahai, Zanghongyong, Xin Rong Fu, Yi Li,
	xiaoguangrong, Bruce Rogers, Hanweidong, Andreas Färber

>>> >>> The QEMU command line (/var/log/libvirt/qemu/[domain name].log), 
>>> >>> LC_ALL=C PATH=/bin:/sbin:/usr/bin:/usr/sbin HOME=/ 
>>> >>> QEMU_AUDIO_DRV=none
>>> >>> /usr/local/bin/qemu-system-x86_64 -name ATS1 -S -M pc-0.12 -cpu
>>> >>> qemu32 -enable-kvm -m 12288 -smp 4,sockets=4,cores=1,threads=1 
>>> >>> -uuid
>>> >>> 0505ec91-382d-800e-2c79-e5b286eb60b5 -no-user-config -nodefaults 
>>> >>> -chardev 
>>> >>> socket,id=charmonitor,path=/var/lib/libvirt/qemu/ATS1.monitor,ser
>>> >>> ver, n owait -mon chardev=charmonitor,id=monitor,mode=control 
>>> >>> -rtc base=localtime -no-shutdown -device
>>> >>> piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive 
>>> >>> file=/opt/ne/vm/ATS1.img,if=none,id=drive-virtio-disk0,format=raw
>>> >>> ,cac
>>> >>> h
>>> >>> e=none -device
>>> >>> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-dis
>>> >>> k0,i
>>> >>> d
>>> >>> =virtio-disk0,bootindex=1 -netdev
>>> >>> tap,fd=20,id=hostnet0,vhost=on,vhostfd=21 -device 
>>> >>> virtio-net-pci,netdev=hostnet0,id=net0,mac=00:e0:fc:00:0f:00,bus=pci.
>>> >>> 0
>>> >>> ,addr=0x3,bootindex=2 -netdev
>>> >>> tap,fd=22,id=hostnet1,vhost=on,vhostfd=23 -device 
>>> >>> virtio-net-pci,netdev=hostnet1,id=net1,mac=00:e0:fc:01:0f:00,bus=pci.
>>> >>> 0
>>> >>> ,addr=0x4 -netdev tap,fd=24,id=hostnet2,vhost=on,vhostfd=25 
>>> >>> -device virtio-net-pci,netdev=hostnet2,id=net2,mac=00:e0:fc:02:0f:00,bus=pci.
>>> >>> 0
>>> >>> ,addr=0x5 -netdev tap,fd=26,id=hostnet3,vhost=on,vhostfd=27 
>>> >>> -device virtio-net-pci,netdev=hostnet3,id=net3,mac=00:e0:fc:03:0f:00,bus=pci.
>>> >>> 0
>>> >>> ,addr=0x6 -netdev tap,fd=28,id=hostnet4,vhost=on,vhostfd=29 
>>> >>> -device virtio-net-pci,netdev=hostnet4,id=net4,mac=00:e0:fc:0a:0f:00,bus=pci.
>>> >>> 0
>>> >>> ,addr=0x7 -netdev tap,fd=30,id=hostnet5,vhost=on,vhostfd=31 
>>> >>> -device virtio-net-pci,netdev=hostnet5,id=net5,mac=00:e0:fc:0b:0f:00,bus=pci.
>>> >>> 0
>>> >>> ,addr=0x9 -chardev pty,id=charserial0 -device 
>>> >>> isa-serial,chardev=charserial0,id=serial0 -vnc *:0 -k en-us -vga 
>>> >>> cirrus -device i6300esb,id=watchdog0,bus=pci.0,addr=0xb
>>> >>> -watchdog-action poweroff -device 
>>> >>> virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xa
>>> >>> 
>>> >>Which QEMU version is this? Can you try with e1000 NICs instead of virtio?
>>> >>
>>> >This QEMU version is 1.0.0, but I also test QEMU 1.5.2, the same problem exists, including the performance degradation and readonly GFNs' flooding.
>>> >I tried with e1000 NICs instead of virtio, including the performance degradation and readonly GFNs' flooding, the QEMU version is 1.5.2.
>>> >No matter e1000 NICs or virtio NICs, the GFNs' flooding is initiated at post-restore stage (i.e. running stage), as soon as the restoring completed, the flooding is starting.
>>> >
>>> >Thanks,
>>> >Zhang Haoyu
>>> >
>>> >>--
>>> >>			Gleb.
>>> 
>>> Should we focus on the first bad commit(612819c3c6e67bac8fceaa7cc402f13b1b63f7e4) and the surprising GFNs' flooding?
>>> 
>>Not really. There is no point in debugging very old version compiled 
>>with kvm-kmod, there are to many variables in the environment. I cannot 
>>reproduce the GFN flooding on upstream, so the problem may be gone, may 
>>be a result of kvm-kmod problem or something different in how I invoke 
>>qemu. So the best way to proceed is for you to reproduce with upstream 
>>version then at least I will be sure that we are using the same code.
>>
>Thanks, I will test the combos of upstream kvm kernel and upstream qemu.
>And, the guest os version above I said was wrong, current running guest os is SLES10SP4.
>
I tested the following combinations of qemu and kernel:
+-----------------+-----------------+-----------------+
|  kvm kernel     |      QEMU       |   test result   |
+-----------------+-----------------+-----------------+
|  kvm-3.11-2     |   qemu-1.5.2    |      GOOD       |
+-----------------+-----------------+-----------------+
|  SLES11SP2      |   qemu-1.0.0    |      BAD        |
+-----------------+-----------------+-----------------+
|  SLES11SP2      |   qemu-1.4.0    |      BAD        |
+-----------------+-----------------+-----------------+
|  SLES11SP2      |   qemu-1.4.2    |      BAD        |
+-----------------+-----------------+-----------------+
|  SLES11SP2      | qemu-1.5.0-rc0  |      GOOD       |
+-----------------+-----------------+-----------------+
|  SLES11SP2      |   qemu-1.5.0    |      GOOD       |
+-----------------+-----------------+-----------------+
|  SLES11SP2      |   qemu-1.5.1    |      GOOD       |
+-----------------+-----------------+-----------------+
|  SLES11SP2      |   qemu-1.5.2    |      GOOD       |
+-----------------+-----------------+-----------------+
NOTE:
1. kvm-3.11-2 in the table above is the full tagged kernel downloaded from https://git.kernel.org/pub/scm/virt/kvm/kvm.git
2. SLES11SP2's kernel version is 3.0.13-0.27

Then I ran git bisect on the qemu changes between qemu-1.4.2 and qemu-1.5.0-rc0, marking the good versions as bad and the bad versions as good,
so the "first bad commit" reported by bisect is actually the patch that fixes the degradation problem.
+------------+-------------------------------------------+-----------------+-----------------+
| bisect No. |                  commit                   |  save-restore   |    migration    |
+------------+-------------------------------------------+-----------------+-----------------+
|      1     | 03e94e39ce5259efdbdeefa1f249ddb499d57321  |      BAD        |       BAD       |
+------------+-------------------------------------------+-----------------+-----------------+
|      2     | 99835e00849369bab726a4dc4ceed1f6f9ed967c  |      GOOD       |       GOOD      |
+------------+-------------------------------------------+-----------------+-----------------+
|      3     | 62e1aeaee4d0450222a0ea43c713b59526e3e0fe  |      BAD        |       BAD       |
+------------+-------------------------------------------+-----------------+-----------------+
|      4     | 9d9801cf803cdceaa4845fe27150b24d5ab083e6  |      BAD        |       BAD       |
+------------+-------------------------------------------+-----------------+-----------------+
|      5     | d76bb73549fcac07524aea5135280ea533a94fd6  |      BAD        |       BAD       |
+------------+-------------------------------------------+-----------------+-----------------+
|      6     | d913829f0fd8451abcb1fd9d6dfce5586d9d7e10  |      GOOD       |       GOOD      |
+------------+-------------------------------------------+-----------------+-----------------+
|      7     | d2f38a0acb0a1c5b7ab7621a32d603d08d513bea  |      BAD        |       BAD       |
+------------+-------------------------------------------+-----------------+-----------------+
|      8     | e344b8a16de429ada3d9126f26e2a96d71348356  |      BAD        |       BAD       |
+------------+-------------------------------------------+-----------------+-----------------+
|      9     | 56ded708ec38e4cb75a7c7357480ca34c0dc6875  |      BAD        |       BAD       |
+------------+-------------------------------------------+-----------------+-----------------+
|      10    | 78d07ae7ac74bcc7f79aeefbaff17fb142f44b4d  |      BAD        |       BAD       |
+------------+-------------------------------------------+-----------------+-----------------+
|      11    | 70c8652bf3c1fea79b7b68864e86926715c49261  |      GOOD       |       GOOD      |
+------------+-------------------------------------------+-----------------+-----------------+
|      12    | f1c72795af573b24a7da5eb52375c9aba8a37972  |      GOOD       |       GOOD      |
+------------+-------------------------------------------+-----------------+-----------------+
NOTE: the above tests were made on SLES11SP2.

So commit f1c72795af573b24a7da5eb52375c9aba8a37972 is the patch that fixes the degradation.

Then I replaced SLES11SP2's default kvm-kmod with kvm-kmod-3.6 and applied the patch below to __direct_map():
@@ -2599,6 +2599,9 @@ static int __direct_map(struct kvm_vcpu
        int emulate = 0;
        gfn_t pseudo_gfn;

+        if (!map_writable)
+                printk(KERN_ERR "%s: %s: gfn = %llu \n", __FILE__, __func__, gfn);
+
        for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
                if (iterator.level == level) {
                        unsigned pte_access = ACC_ALL;
Then I rebuilt kvm-kmod, reloaded it with insmod, and retested the adjacent commits; the results are shown below:
+------------+-------------------------------------------+-----------------+-----------------+
| bisect No. |                  commit                   |  save-restore   |    migration    |
+------------+-------------------------------------------+-----------------+-----------------+
|      10    | 78d07ae7ac74bcc7f79aeefbaff17fb142f44b4d  |      BAD        |       BAD       |
+------------+-------------------------------------------+-----------------+-----------------+
|      12    | f1c72795af573b24a7da5eb52375c9aba8a37972  |      GOOD       |       BAD       |
+------------+-------------------------------------------+-----------------+-----------------+
While testing commit 78d07ae7ac74bcc7f79aeefbaff17fb142f44b4d, the GFN flooding started as soon as the restoration/migration completed;
some examples are shown below:
2073462
2857203
2073463
2073464
2073465
3218751
2073466
2857206
2857207
2073467
2073468
2857210
2857211
3218752
2857214
2857215
3218753
2857217
2857218
2857221
2857222
3218754
2857225
2857226
3218755
2857229
2857230
2857232
2857233
3218756
2780393
2780394
2857236
2780395
2857237
2780396
2780397
2780398
2780399
2780400
2780401
3218757
2857240
2857241
2857244
3218758
2857247
2857248
2857251
2857252
3218759
2857255
2857256
3218760
2857289
2857290
2857293
2857294
3218761
2857297
2857298
3218762
3218763
3218764
3218765
3218766
3218767
3218768
3218769
3218770
3218771
3218772

After a period of time, though, the flooding rate slowed down.
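
Each number printed by the debug patch is a guest frame number (gfn); shifting it left by PAGE_SHIFT gives the guest-physical address, so, for example, gfn 2073462 lies at roughly 7.9 GiB, within this VM's 12 GiB of RAM. A trivial illustration:

#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 12                    /* 4 KiB pages, as in the patch above */

int main(void)
{
        uint64_t gfn = 2073462;          /* one of the flooded GFNs */
        uint64_t gpa = gfn << PAGE_SHIFT;

        printf("gfn %llu -> gpa 0x%llx (%.2f GiB)\n",
               (unsigned long long)gfn, (unsigned long long)gpa,
               gpa / (double)(1ULL << 30));
        return 0;
}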

While testing commit f1c72795af573b24a7da5eb52375c9aba8a37972, no GFN was printed after restoration and there was no performance degradation,
but as soon as live migration completed, the GFN flooding started and the performance degradation happened as well.

NOTE: the test results for commit f1c72795af573b24a7da5eb52375c9aba8a37972 seemed to be unstable; I will verify them again.


>Thanks,
>Zhang Haoyu
>
>>> I applied below patch to  __direct_map(), @@ -2223,6 +2223,8 @@ 
>>> static int __direct_map(struct kvm_vcpu
>>>         int pt_write = 0;
>>>         gfn_t pseudo_gfn;
>>> 
>>> +        map_writable = true;
>>> +
>>>         for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
>>>                 if (iterator.level == level) {
>>>                         unsigned pte_access = ACC_ALL; and rebuild 
>>> the kvm-kmod, then re-insmod it.
>>> After I started a VM, the host seemed to be abnormal, so many programs cannot be started successfully, segmentation fault is reported.
>>> In my opinion, after above patch applied, the commit: 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 should be of no effect, but the test result proved me wrong.
>>> Dose the map_writable value's getting process in hva_to_pfn() have effect on the result?
>>> 
>>If hva_to_pfn() returns map_writable == false it means that page is 
>>mapped as read only on primary MMU, so it should not be mapped writable 
>>on secondary MMU either. This should not happen usually.
>>
>>--
>>			Gleb.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: vm performance degradation after kvm live migration or save-restore with EPT enabled
  2013-08-07  5:52                         ` Gleb Natapov
@ 2013-08-31  7:45                           ` Zhanghaoyu (A)
  -1 siblings, 0 replies; 52+ messages in thread
From: Zhanghaoyu (A) @ 2013-08-31  7:45 UTC (permalink / raw)
  To: Gleb Natapov, pl, Eric Blake, quintela, Paolo Bonzini,
	Andreas Färber, xiaoguangrong
  Cc: Marcelo Tosatti, Huangweidong (C),
	KVM, Michael S. Tsirkin, Xiejunyong, Luonengjun, qemu-devel,
	Xiahai, Zanghongyong, Xin Rong Fu, Yi Li, Bruce Rogers,
	Hanweidong

I tested the following combinations of qemu and kernel:
+------------------------+-----------------+-------------+
|        kernel          |      QEMU       |  migration  |
+------------------------+-----------------+-------------+
| SLES11SP2+kvm-kmod-3.6 |   qemu-1.6.0    |    GOOD     |
+------------------------+-----------------+-------------+
| SLES11SP2+kvm-kmod-3.6 |   qemu-1.6.0*   |    BAD      |
+------------------------+-----------------+-------------+
| SLES11SP2+kvm-kmod-3.6 |   qemu-1.5.1    |    BAD      |
+------------------------+-----------------+-------------+
| SLES11SP2+kvm-kmod-3.6*|   qemu-1.5.1    |    GOOD     |
+------------------------+-----------------+-------------+
| SLES11SP2+kvm-kmod-3.6 |   qemu-1.5.1*   |    GOOD     |
+------------------------+-----------------+-------------+
| SLES11SP2+kvm-kmod-3.6 |   qemu-1.5.2    |    BAD      |
+------------------------+-----------------+-------------+
| kvm-3.11-2             |   qemu-1.5.1    |    BAD      |
+------------------------+-----------------+-------------+
NOTE:
1. kvm-3.11-2: the full tagged kernel downloaded from https://git.kernel.org/pub/scm/virt/kvm/kvm.git
2. SLES11SP2+kvm-kmod-3.6: our release kernel, i.e. SLES11SP2 (kernel version 3.0.13-0.27) with its default kvm-kmod replaced by kvm-kmod-3.6
3. qemu-1.6.0*: qemu-1.6.0 with commit 211ea74022f51164a7729030b28eec90b6c99a08 reverted
4. kvm-kmod-3.6*: kvm-kmod-3.6 with EPT disabled
5. qemu-1.5.1*: qemu-1.5.1 with the patch below applied, which removes the qemu_madvise() call from the ram_load() function

--- qemu-1.5.1/arch_init.c      2013-06-27 05:47:29.000000000 +0800
+++ qemu-1.5.1_fix3/arch_init.c 2013-08-28 19:43:42.000000000 +0800
@@ -842,7 +842,6 @@ static int ram_load(QEMUFile *f, void *o
             if (ch == 0 &&
                 (!kvm_enabled() || kvm_has_sync_mmu()) &&
                 getpagesize() <= TARGET_PAGE_SIZE) {
-                qemu_madvise(host, TARGET_PAGE_SIZE, QEMU_MADV_DONTNEED);
             }
 #endif
         } else if (flags & RAM_SAVE_FLAG_PAGE) {

If I apply the above patch to qemu-1.5.1 to remove the qemu_madvise() call, the test result for the combination of SLES11SP2+kvm-kmod-3.6 and qemu-1.5.1 is good.
Why do we perform qemu_madvise(QEMU_MADV_DONTNEED) for those zero pages?
Does qemu_madvise() have a lasting effect on that range of virtual addresses? In other words, does it have a lasting effect on VM performance?
If that DONTNEED-advised range is later read/written frequently, could performance degradation happen?
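
As general Linux behaviour (not specific to QEMU): for a private anonymous mapping, madvise(MADV_DONTNEED) drops the backing pages immediately. The mapping itself keeps no lasting flag, but the next read of each dropped page faults in a zero page and the next write allocates a fresh page, so a range that is touched again afterwards has to be re-faulted page by page. A minimal userspace demonstration, independent of QEMU:

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 4096 * 4;
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return 1;

    memset(p, 0xab, len);                /* populate the pages */
    madvise(p, len, MADV_DONTNEED);      /* drop them again */

    /* The contents are gone: reads now see zero pages, and the first
     * write to each page triggers a fresh page fault and allocation. */
    printf("after MADV_DONTNEED: p[0] = %d\n", p[0]);   /* prints 0 */
    p[0] = 1;                            /* faults in a new page */

    munmap(p, len);
    return 0;
}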

The reason the combination of SLES11SP2+kvm-kmod-3.6 and qemu-1.6.0 is good is commit 211ea74022f51164a7729030b28eec90b6c99a08:
if I revert that commit on qemu-1.6.0, the test result for SLES11SP2+kvm-kmod-3.6 with qemu-1.6.0 is bad, and the performance degradation happens there too.

Thanks,
Zhang Haoyu

>> >>> The QEMU command line (/var/log/libvirt/qemu/[domain name].log), 
>> >>> LC_ALL=C PATH=/bin:/sbin:/usr/bin:/usr/sbin HOME=/ 
>> >>> QEMU_AUDIO_DRV=none
>> >>> /usr/local/bin/qemu-system-x86_64 -name ATS1 -S -M pc-0.12 -cpu
>> >>> qemu32 -enable-kvm -m 12288 -smp 4,sockets=4,cores=1,threads=1 
>> >>> -uuid
>> >>> 0505ec91-382d-800e-2c79-e5b286eb60b5 -no-user-config -nodefaults 
>> >>> -chardev 
>> >>> socket,id=charmonitor,path=/var/lib/libvirt/qemu/ATS1.monitor,serv
>> >>> er, n owait -mon chardev=charmonitor,id=monitor,mode=control -rtc 
>> >>> base=localtime -no-shutdown -device
>> >>> piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive 
>> >>> file=/opt/ne/vm/ATS1.img,if=none,id=drive-virtio-disk0,format=raw,
>> >>> cac
>> >>> h
>> >>> e=none -device
>> >>> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk
>> >>> 0,i
>> >>> d
>> >>> =virtio-disk0,bootindex=1 -netdev
>> >>> tap,fd=20,id=hostnet0,vhost=on,vhostfd=21 -device 
>> >>> virtio-net-pci,netdev=hostnet0,id=net0,mac=00:e0:fc:00:0f:00,bus=pci.
>> >>> 0
>> >>> ,addr=0x3,bootindex=2 -netdev
>> >>> tap,fd=22,id=hostnet1,vhost=on,vhostfd=23 -device 
>> >>> virtio-net-pci,netdev=hostnet1,id=net1,mac=00:e0:fc:01:0f:00,bus=pci.
>> >>> 0
>> >>> ,addr=0x4 -netdev tap,fd=24,id=hostnet2,vhost=on,vhostfd=25 
>> >>> -device virtio-net-pci,netdev=hostnet2,id=net2,mac=00:e0:fc:02:0f:00,bus=pci.
>> >>> 0
>> >>> ,addr=0x5 -netdev tap,fd=26,id=hostnet3,vhost=on,vhostfd=27 
>> >>> -device virtio-net-pci,netdev=hostnet3,id=net3,mac=00:e0:fc:03:0f:00,bus=pci.
>> >>> 0
>> >>> ,addr=0x6 -netdev tap,fd=28,id=hostnet4,vhost=on,vhostfd=29 
>> >>> -device virtio-net-pci,netdev=hostnet4,id=net4,mac=00:e0:fc:0a:0f:00,bus=pci.
>> >>> 0
>> >>> ,addr=0x7 -netdev tap,fd=30,id=hostnet5,vhost=on,vhostfd=31 
>> >>> -device virtio-net-pci,netdev=hostnet5,id=net5,mac=00:e0:fc:0b:0f:00,bus=pci.
>> >>> 0
>> >>> ,addr=0x9 -chardev pty,id=charserial0 -device 
>> >>> isa-serial,chardev=charserial0,id=serial0 -vnc *:0 -k en-us -vga 
>> >>> cirrus -device i6300esb,id=watchdog0,bus=pci.0,addr=0xb
>> >>> -watchdog-action poweroff -device
>> >>> virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xa
>> >>> 
>> >>Which QEMU version is this? Can you try with e1000 NICs instead of virtio?
>> >>
>> >This QEMU version is 1.0.0, but I also test QEMU 1.5.2, the same problem exists, including the performance degradation and readonly GFNs' flooding.
>> >I tried with e1000 NICs instead of virtio, including the performance degradation and readonly GFNs' flooding, the QEMU version is 1.5.2.
>> >No matter e1000 NICs or virtio NICs, the GFNs' flooding is initiated at post-restore stage (i.e. running stage), as soon as the restoring completed, the flooding is starting.
>> >
>> >Thanks,
>> >Zhang Haoyu
>> >
>> >>--
>> >>			Gleb.
>> 
>> Should we focus on the first bad commit(612819c3c6e67bac8fceaa7cc402f13b1b63f7e4) and the surprising GFNs' flooding?
>> 
>Not really. There is no point in debugging very old version compiled with kvm-kmod, there are to many variables in the environment. I cannot reproduce the GFN flooding on upstream, so the problem may be gone, may be a result of kvm-kmod problem or something different in how I invoke qemu. So the best way to proceed is for you to reproduce with upstream version then at least I will be sure that we are using the same code.
>
>> I applied below patch to  __direct_map(), @@ -2223,6 +2223,8 @@ static 
>> int __direct_map(struct kvm_vcpu
>>         int pt_write = 0;
>>         gfn_t pseudo_gfn;
>> 
>> +        map_writable = true;
>> +
>>         for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
>>                 if (iterator.level == level) {
>>                         unsigned pte_access = ACC_ALL; and rebuild the 
>> kvm-kmod, then re-insmod it.
>> After I started a VM, the host seemed to be abnormal, so many programs cannot be started successfully, segmentation fault is reported.
>> In my opinion, after above patch applied, the commit: 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 should be of no effect, but the test result proved me wrong.
>> Dose the map_writable value's getting process in hva_to_pfn() have effect on the result?
>> 
>If hva_to_pfn() returns map_writable == false it means that page is mapped as read only on primary MMU, so it should not be mapped writable on secondary MMU either. This should not happen usually.
>
>--
>			Gleb.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled
@ 2013-08-31  7:45                           ` Zhanghaoyu (A)
  0 siblings, 0 replies; 52+ messages in thread
From: Zhanghaoyu (A) @ 2013-08-31  7:45 UTC (permalink / raw)
  To: Gleb Natapov, pl, Eric Blake, quintela, Paolo Bonzini,
	Andreas Färber, xiaoguangrong
  Cc: Marcelo Tosatti, Huangweidong (C),
	KVM, Michael S. Tsirkin, Xiejunyong, Luonengjun, qemu-devel,
	Xiahai, Zanghongyong, Xin Rong Fu, Yi Li, Bruce Rogers,
	Hanweidong

I tested below combos of qemu and kernel,
+------------------------+-----------------+-------------+
|        kernel          |      QEMU       |  migration  |
+------------------------+-----------------+-------------+
| SLES11SP2+kvm-kmod-3.6 |   qemu-1.6.0    |    GOOD     |
+------------------------+-----------------+-------------+
| SLES11SP2+kvm-kmod-3.6 |   qemu-1.6.0*   |    BAD      |
+------------------------+-----------------+-------------+
| SLES11SP2+kvm-kmod-3.6 |   qemu-1.5.1    |    BAD      |
+------------------------+-----------------+-------------+
| SLES11SP2+kvm-kmod-3.6*|   qemu-1.5.1    |    GOOD     |
+------------------------+-----------------+-------------+
| SLES11SP2+kvm-kmod-3.6 |   qemu-1.5.1*   |    GOOD     |
+------------------------+-----------------+-------------+
| SLES11SP2+kvm-kmod-3.6 |   qemu-1.5.2    |    BAD      |
+------------------------+-----------------+-------------+
| kvm-3.11-2             |   qemu-1.5.1    |    BAD      |
+------------------------+-----------------+-------------+
NOTE:
1. kvm-3.11-2 : the whole tag kernel downloaded from https://git.kernel.org/pub/scm/virt/kvm/kvm.git
2. SLES11SP2+kvm-kmod-3.6 : our release kernel, replace the SLES11SP2's default kvm-kmod with kvm-kmod-3.6, SLES11SP2's kernel version is 3.0.13-0.27
3. qemu-1.6.0* : revert the commit 211ea74022f51164a7729030b28eec90b6c99a08 on qemu-1.6.0
4. kvm-kmod-3.6* : kvm-kmod-3.6 with EPT disabled
5. qemu-1.5.1* : apply below patch to qemu-1.5.1 to delete qemu_madvise() statement in ram_load() function

--- qemu-1.5.1/arch_init.c      2013-06-27 05:47:29.000000000 +0800
+++ qemu-1.5.1_fix3/arch_init.c 2013-08-28 19:43:42.000000000 +0800
@@ -842,7 +842,6 @@ static int ram_load(QEMUFile *f, void *o
             if (ch == 0 &&
                 (!kvm_enabled() || kvm_has_sync_mmu()) &&
                 getpagesize() <= TARGET_PAGE_SIZE) {
-                qemu_madvise(host, TARGET_PAGE_SIZE, QEMU_MADV_DONTNEED);
             }
 #endif
         } else if (flags & RAM_SAVE_FLAG_PAGE) {

If I apply above patch to qemu-1.5.1 to delete the qemu_madvise() statement, the test result of the combos of SLES11SP2+kvm-kmod-3.6 and qemu-1.5.1 is good.
Why do we perform the qemu_madvise(QEMU_MADV_DONTNEED) for those zero pages?
Does the qemu_madvise() have sustained effect on the range of virtual address?  In other words, does qemu_madvise() have sustained effect on the VM performance?
If later frequently read/write the range of virtual address which have been advised to DONTNEED, could performance degradation happen?

The reason why the combos of SLES11SP2+kvm-kmod-3.6 and qemu-1.6.0 is good, is because of commit 211ea74022f51164a7729030b28eec90b6c99a08,
if I revert the commit 211ea74022f51164a7729030b28eec90b6c99a08 on qemu-1.6.0, the test result of combos of SLES11SP2+kvm-kmod-3.6 and qemu-1.6.0 is bad, performance degradation happened, too.

Thanks,
Zhang Haoyu

>> >>> The QEMU command line (/var/log/libvirt/qemu/[domain name].log), 
>> >>> LC_ALL=C PATH=/bin:/sbin:/usr/bin:/usr/sbin HOME=/ 
>> >>> QEMU_AUDIO_DRV=none
>> >>> /usr/local/bin/qemu-system-x86_64 -name ATS1 -S -M pc-0.12 -cpu
>> >>> qemu32 -enable-kvm -m 12288 -smp 4,sockets=4,cores=1,threads=1 
>> >>> -uuid
>> >>> 0505ec91-382d-800e-2c79-e5b286eb60b5 -no-user-config -nodefaults 
>> >>> -chardev 
>> >>> socket,id=charmonitor,path=/var/lib/libvirt/qemu/ATS1.monitor,serv
>> >>> er, n owait -mon chardev=charmonitor,id=monitor,mode=control -rtc 
>> >>> base=localtime -no-shutdown -device
>> >>> piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive 
>> >>> file=/opt/ne/vm/ATS1.img,if=none,id=drive-virtio-disk0,format=raw,
>> >>> cac
>> >>> h
>> >>> e=none -device
>> >>> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk
>> >>> 0,i
>> >>> d
>> >>> =virtio-disk0,bootindex=1 -netdev
>> >>> tap,fd=20,id=hostnet0,vhost=on,vhostfd=21 -device 
>> >>> virtio-net-pci,netdev=hostnet0,id=net0,mac=00:e0:fc:00:0f:00,bus=pci.
>> >>> 0
>> >>> ,addr=0x3,bootindex=2 -netdev
>> >>> tap,fd=22,id=hostnet1,vhost=on,vhostfd=23 -device 
>> >>> virtio-net-pci,netdev=hostnet1,id=net1,mac=00:e0:fc:01:0f:00,bus=pci.
>> >>> 0
>> >>> ,addr=0x4 -netdev tap,fd=24,id=hostnet2,vhost=on,vhostfd=25 
>> >>> -device virtio-net-pci,netdev=hostnet2,id=net2,mac=00:e0:fc:02:0f:00,bus=pci.
>> >>> 0
>> >>> ,addr=0x5 -netdev tap,fd=26,id=hostnet3,vhost=on,vhostfd=27 
>> >>> -device virtio-net-pci,netdev=hostnet3,id=net3,mac=00:e0:fc:03:0f:00,bus=pci.
>> >>> 0
>> >>> ,addr=0x6 -netdev tap,fd=28,id=hostnet4,vhost=on,vhostfd=29 
>> >>> -device virtio-net-pci,netdev=hostnet4,id=net4,mac=00:e0:fc:0a:0f:00,bus=pci.
>> >>> 0
>> >>> ,addr=0x7 -netdev tap,fd=30,id=hostnet5,vhost=on,vhostfd=31 
>> >>> -device virtio-net-pci,netdev=hostnet5,id=net5,mac=00:e0:fc:0b:0f:00,bus=pci.
>> >>> 0
>> >>> ,addr=0x9 -chardev pty,id=charserial0 -device 
>> >>> isa-serial,chardev=charserial0,id=serial0 -vnc *:0 -k en-us -vga 
>> >>> cirrus -device i6300esb,id=watchdog0,bus=pci.0,addr=0xb
>> >>> -watchdog-action poweroff -device
>> >>> virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xa
>> >>> 
>> >>Which QEMU version is this? Can you try with e1000 NICs instead of virtio?
>> >>
>> >This QEMU version is 1.0.0, but I also tested QEMU 1.5.2 and the same problems exist: the performance degradation and the flood of read-only GFNs.
>> >I also tried e1000 NICs instead of virtio with QEMU 1.5.2, and both problems (the performance degradation and the read-only GFN flood) still occur.
>> >With either e1000 or virtio NICs, the GFN flood starts at the post-restore (i.e. running) stage: as soon as the restore completes, the flooding begins.
>> >
>> >Thanks,
>> >Zhang Haoyu
>> >
>> >>--
>> >>			Gleb.
>> 
>> Should we focus on the first bad commit (612819c3c6e67bac8fceaa7cc402f13b1b63f7e4) and the surprising flood of read-only GFNs?
>> 
>Not really. There is no point in debugging a very old version compiled with kvm-kmod; there are too many variables in the environment. I cannot reproduce the GFN flooding on upstream, so the problem may be gone, may be the result of a kvm-kmod problem, or may come from something different in how I invoke qemu. The best way to proceed is for you to reproduce it with the upstream version; then at least I will be sure that we are running the same code.
>
>> I applied the below patch to __direct_map():
>> 
>> @@ -2223,6 +2223,8 @@ static int __direct_map(struct kvm_vcpu
>>         int pt_write = 0;
>>         gfn_t pseudo_gfn;
>> 
>> +        map_writable = true;
>> +
>>         for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
>>                 if (iterator.level == level) {
>>                         unsigned pte_access = ACC_ALL;
>> 
>> Then I rebuilt kvm-kmod and re-insmod'ed it.
>> After I started a VM, the host behaved abnormally: many programs could no longer start and reported segmentation faults.
>> In my opinion, with the above patch applied, commit 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 should have no effect, but the test result proved me wrong.
>> Does the way map_writable is determined in hva_to_pfn() affect the result?
>> 
>If hva_to_pfn() returns map_writable == false, it means the page is mapped read-only in the primary MMU, so it should not be mapped writable in the secondary MMU either. This should not normally happen.
>
>--
>			Gleb.
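
As a side note on the map_writable question above: here is a minimal self-contained sketch (stub names and bit layout of my own, not actual KVM internals) of the rule Gleb describes, namely that the writable bit in the secondary (EPT/shadow) MMU must follow what the primary MMU reported. Hard-coding map_writable = true hands the guest write access to a PFN the host may still map read-only, for example a shared copy-on-write page, which would explain the host-side segmentation faults seen with the patched kvm-kmod.

/* Sketch only: hypothetical names and encodings, not real KVM code. */
#include <stdbool.h>
#include <stdint.h>

#define SPTE_WRITABLE   (1ULL << 1)     /* hypothetical writable bit */

typedef uint64_t pfn_t;

/* Build a simplified secondary-MMU entry for a guest frame. */
static uint64_t make_spte(pfn_t pfn, bool map_writable)
{
    uint64_t spte = pfn << 12;          /* simplified PFN encoding */

    /* Honour guest write access only when the primary MMU reported the
     * host mapping as writable; forcing map_writable = true would let
     * the guest write through pages the host still shares read-only. */
    if (map_writable) {
        spte |= SPTE_WRITABLE;
    }
    return spte;
}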
