* Optimized clocksource with AMD AVIC enabled for Windows guest
@ 2021-02-03  6:40 Kechen Lu
  2021-02-03  7:58 ` Paolo Bonzini
  0 siblings, 1 reply; 11+ messages in thread
From: Kechen Lu @ 2021-02-03  6:40 UTC (permalink / raw)
  To: kvm, qemu-discuss; +Cc: suravee.suthikulpanit, pbonzini, Somdutta Roy

[Resent in plain text; the previous message was not in plain-text format]
Hi KVM & AMD folks,
 
We are trying to enable AVIC for a Windows guest on an AMD host, on upstream kernel 5.8+. From our experiments and vmexit metrics, AVIC brings large benefits: interrupt vmexits drop by more than 80%, and vintr and write_cr8 vmexits are avoided entirely. However, for a Windows guest it seems we have to give up the Hyper-V PV synthetic timer (the hv-stimer feature). To get the best of both worlds, is there a more optimized clocksource for Windows guests that can coexist with AVIC (since stimer currently cannot work together with AVIC)?

Some detailed performance analysis follows.
 
According to the KVM kernel function kvm_hv_activate_synic in https://elixir.bootlin.com/linux/v5.8/source/arch/x86/kvm/hyperv.c#L891, enabling SynIC inhibits APICv (AVIC on AMD), while SynIC is a prerequisite for stimer. In our experiments, without the Hyper-V stimer there are many port I/O vmexits, which can hurt CPU-bound workloads; for example, Geekbench single-core performance regresses by around 10%. Below are the vmexit results with AVIC enabled and hypervclock and rtc as the clocksources, without stimer+synic.
 ------------------------------------------------------------------------------------------------------------
Analyze events for all VMs, all VCPUs:
             VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time         Avg time
                  io     575088    43.42%     1.96%      0.68us    100.62us      7.47us ( +-   0.13% )
                 msr     434530    32.81%     0.29%      0.41us    350.50us      1.45us ( +-   0.30% )
                 hlt     308635    23.30%    97.75%      0.43us   3791.74us    693.91us ( +-   0.12% )
           interrupt       4796     0.36%     0.00%      0.33us   1606.17us      1.89us ( +-  18.69% )
           write_cr4        752     0.06%     0.00%      0.53us     34.80us      1.42us ( +-   3.97% )
            read_cr4        376     0.03%     0.00%      0.40us      1.32us      0.62us ( +-   1.22% )
                 npf         85     0.01%     0.00%      1.68us     57.95us      8.33us ( +-  12.54% )
               pause         71     0.01%     0.00%      0.36us      1.44us      0.62us ( +-   3.45% )
               cpuid         50     0.00%     0.00%      0.33us      1.11us      0.45us ( +-   5.94% )
           hypercall         10     0.00%     0.00%      0.81us      1.42us      1.12us ( +-   5.87% )
                 nmi          1     0.00%     0.00%      0.67us      0.67us      0.67us ( +-   0.00% )
Total Samples:1324394, Total events handled time:219105470.74us.
-----------------------------------------------------------------------------------------------------------
This shows a very high number of I/O vmexits (43.42% of all samples). Breaking them down by port shows which I/O ports the Windows guest accessed.
-----------------------------------------------------
Analyze events for all VMs, all VCPUs:
 
      IO Port Access    Samples  Samples%     Time%    Min Time    Max Time         Avg time
 
           0x70:POUT     287544    50.00%    13.10%      0.40us     23.48us      0.53us ( +-   0.06% )
            0x71:PIN     226154    39.33%     7.60%      0.31us     22.91us      0.39us ( +-   0.08% )
           0x71:POUT      61390    10.67%    79.31%     12.92us     69.99us     14.95us ( +-   0.09% )
 
Total Samples:575088, Total events handled time:1156983.53us.
---------------------------------------------
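As a rough sanity check on the port breakdown above, the sketch below recomputes each row's share of handling time as samples times average time, and checks that the number of writes to 0x70 equals the number of reads plus writes on 0x71. That equality is what one would expect from the CMOS/RTC index/data port scheme (select a register through 0x70, then access it through 0x71), so each RTC register access costs two vmexits. The numbers are copied from the table; the variable names are only illustrative.

```python
# Rows copied from the "IO Port Access" table above:
# (port, direction) -> (samples, average time per exit in microseconds)
rows = {
    ("0x70", "POUT"): (287544, 0.53),
    ("0x71", "PIN"): (226154, 0.39),
    ("0x71", "POUT"): (61390, 14.95),
}
TOTAL_TIME_US = 1156983.53  # "Total events handled time" from the same table

index_writes = rows[("0x70", "POUT")][0]
data_reads = rows[("0x71", "PIN")][0]
data_writes = rows[("0x71", "POUT")][0]

# The counts are consistent with one index-port write per data-port access.
pairs_match = index_writes == data_reads + data_writes

# Approximate each row's share of handling time as samples * average time.
# This only approximates the Time% column because the averages are rounded.
time_share = {
    key: 100.0 * samples * avg_us / TOTAL_TIME_US
    for key, (samples, avg_us) in rows.items()
}

print("index writes match data accesses:", pairs_match)
for (port, direction), share in time_share.items():
    print(f"{port}:{direction} ~{share:.1f}% of I/O exit handling time")
```

The recomputed shares land close to the reported Time% column, with writes to 0x71 taking most of the handling time despite being the least frequent access.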
Ports 0x70-0x71 are the rtc0 ports, which means the guest's RTC accesses carry a large overhead. With stimer + synic enabled and AVIC disabled, the vmexit metrics look much better for I/O and MSR exits, as shown below.
-----------------------------------------
Analyze events for all VMs, all VCPUs:
             VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time         Avg time
                 hlt     166815    38.30%    99.66%      0.44us   1556.67us    809.48us ( +-   0.11% )
           interrupt     146218    33.57%     0.13%      0.30us   1362.10us      1.19us ( +-   1.50% )
                 msr     105267    24.17%     0.20%      0.37us     87.47us      2.51us ( +-   0.31% )
               vintr       9285     2.13%     0.01%      0.50us      1.92us      0.78us ( +-   0.16% )
           write_cr8       7537     1.73%     0.00%      0.31us     49.14us      0.66us ( +-   1.08% )
               cpuid        174     0.04%     0.00%      0.31us      1.39us      0.46us ( +-   3.21% )
                 npf        143     0.03%     0.00%      1.49us    237.66us     21.04us ( +-  12.04% )
           write_cr4         32     0.01%     0.00%      0.93us      5.78us      2.10us ( +-  11.38% )
               pause         22     0.01%     0.00%      0.45us      1.33us      0.84us ( +-   5.46% )
            read_cr4         16     0.00%     0.00%      0.47us      0.68us      0.60us ( +-   2.19% )
                 nmi         11     0.00%     0.00%      0.35us      0.70us      0.54us ( +-   5.06% )
           write_dr7          2     0.00%     0.00%      0.43us      0.45us      0.44us ( +-   2.27% )
           hypercall          1     0.00%     0.00%      0.97us      0.97us      0.97us ( +-   0.00% )
Total Samples:435523, Total events handled time:135488497.29us.
---------------------------------
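To make the comparison between the two configurations concrete, here is a small sketch that works directly on the sample counts from the two VM-EXIT tables above. It uses raw counts and assumes the two runs are comparable in length and load, which the tables themselves do not confirm; exit reasons that do not appear in a table are treated as zero. The dictionary and function names are only illustrative.

```python
# Sample counts copied from the two VM-EXIT tables above.
# Exit reasons missing from a table are treated as 0 samples.
avic_no_stimer = {"io": 575088, "msr": 434530, "hlt": 308635, "interrupt": 4796}
stimer_no_avic = {"hlt": 166815, "interrupt": 146218, "msr": 105267,
                  "vintr": 9285, "write_cr8": 7537}


def reduction_pct(before: int, after: int) -> float:
    """Percentage by which `after` is lower than `before`."""
    return 100.0 * (before - after) / before


# Benefit of AVIC: fewer interrupt-related exits.
interrupt_reduction = reduction_pct(
    stimer_no_avic["interrupt"], avic_no_stimer["interrupt"]
)
vintr_with_avic = avic_no_stimer.get("vintr", 0)
write_cr8_with_avic = avic_no_stimer.get("write_cr8", 0)

# Cost of losing stimer: more MSR exits and the new port I/O exits.
msr_ratio = avic_no_stimer["msr"] / stimer_no_avic["msr"]
io_with_stimer = stimer_no_avic.get("io", 0)

print(f"interrupt exits reduced by {interrupt_reduction:.1f}% with AVIC")
print(f"vintr/write_cr8 exits with AVIC: {vintr_with_avic}/{write_cr8_with_avic}")
print(f"MSR exits are {msr_ratio:.1f}x higher without stimer")
print(f"I/O exits: {avic_no_stimer['io']} without stimer, {io_with_stimer} with it")
```

Under that assumption, the counts show the trade-off described in this message: AVIC removes most interrupt-related exits, while dropping stimer adds the RTC port I/O exits and increases MSR exits.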
Given the above observations, we would like to know whether there is a way to enable AVIC while still having the most optimized clock source for a Windows guest.
 
We would really appreciate your help and look forward to your response.

Best Regards,
Kechen




Thread overview: 11+ messages
2021-02-03  6:40 Optimized clocksource with AMD AVIC enabled for Windows guest Kechen Lu
2021-02-03  7:58 ` Paolo Bonzini
2021-02-03  9:15   ` Vitaly Kuznetsov
2021-02-04  2:05     ` Kechen Lu
2021-02-04 12:24       ` Vitaly Kuznetsov
2021-02-04 13:35         ` Paolo Bonzini
2021-02-04 15:01           ` Vitaly Kuznetsov
2021-02-04 15:19             ` Vitaly Kuznetsov
2021-02-05  5:38               ` Kechen Lu
2021-02-17 20:41                 ` Kechen Lu
2021-02-25 10:25                   ` Vitaly Kuznetsov
