* Optimized clocksource with AMD AVIC enabled for Windows guest
@ 2021-02-03 6:40 Kechen Lu
2021-02-03 7:58 ` Paolo Bonzini
0 siblings, 1 reply; 11+ messages in thread
From: Kechen Lu @ 2021-02-03 6:40 UTC (permalink / raw)
To: kvm, qemu-discuss; +Cc: suravee.suthikulpanit, pbonzini, Somdutta Roy
[resent for the previous non-plain text format]
Hi KVM & AMD folks,
We are trying to enable AVIC for a Windows guest on an AMD host machine, on upstream kernel 5.8+. From our experiments and vmexit metrics, we can see AVIC brings us huge benefits: interrupt vmexits decrease by more than 80%, and vintr and write_cr8 vmexits are avoided entirely. But it seems that for a Windows guest we have to give up the Hyper-V PV stimer feature (hv-stimer). So, to get the best of both worlds, is there a more optimized clocksource for Windows guests that can coexist with AVIC enabled (since stimer currently cannot work together with AVIC)?
Some detailed performance analysis below -
From the KVM kernel function kvm_hv_activate_synic in https://elixir.bootlin.com/linux/v5.8/source/arch/x86/kvm/hyperv.c#L891, enabling SynIC disables apicv (for AMD, AVIC), and SynIC is a prerequisite of stimer. In actual experiments, without the Hyper-V stimer there are a lot of port I/O vmexits, which potentially slow down CPU-bound workloads such as Geekbench by around 10% of single-core performance. Below are the vmexit metrics when we enable AVIC but use the Hyper-V clock and RTC as clocksources, without stimer+synic.
------------------------------------------------------------------------------------------------------------
Analyze events for all VMs, all VCPUs:
VM-EXIT Samples Samples% Time% Min Time Max Time Avg time
io 575088 43.42% 1.96% 0.68us 100.62us 7.47us ( +- 0.13% )
msr 434530 32.81% 0.29% 0.41us 350.50us 1.45us ( +- 0.30% )
hlt 308635 23.30% 97.75% 0.43us 3791.74us 693.91us ( +- 0.12% )
interrupt 4796 0.36% 0.00% 0.33us 1606.17us 1.89us ( +- 18.69% )
write_cr4 752 0.06% 0.00% 0.53us 34.80us 1.42us ( +- 3.97% )
read_cr4 376 0.03% 0.00% 0.40us 1.32us 0.62us ( +- 1.22% )
npf 85 0.01% 0.00% 1.68us 57.95us 8.33us ( +- 12.54% )
pause 71 0.01% 0.00% 0.36us 1.44us 0.62us ( +- 3.45% )
cpuid 50 0.00% 0.00% 0.33us 1.11us 0.45us ( +- 5.94% )
hypercall 10 0.00% 0.00% 0.81us 1.42us 1.12us ( +- 5.87% )
nmi 1 0.00% 0.00% 0.67us 0.67us 0.67us ( +- 0.00% )
Total Samples:1324394, Total events handled time:219105470.74us.
-----------------------------------------------------------------------------------------------------------
It shows dramatically high I/O vmexits, and we can drill down further to see which I/O ports the Windows guest accessed.
-----------------------------------------------------
Analyze events for all VMs, all VCPUs:
IO Port Access Samples Samples% Time% Min Time Max Time Avg time
0x70:POUT 287544 50.00% 13.10% 0.40us 23.48us 0.53us ( +- 0.06% )
0x71:PIN 226154 39.33% 7.60% 0.31us 22.91us 0.39us ( +- 0.08% )
0x71:POUT 61390 10.67% 79.31% 12.92us 69.99us 14.95us ( +- 0.09% )
Total Samples:575088, Total events handled time:1156983.53us.
---------------------------------------------
However, ports 0x70-0x71 belong to the RTC (rtc0), which means there is severe guest RTC access overhead. With stimer + synic on and AVIC disabled, the vmexit metrics look much better for I/O and MSR, as below.
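For context on why the RTC traffic is so expensive: each CMOS/RTC register access is a two-step port sequence (write the register index to port 0x70, then read or write the data on port 0x71), and every one of those port accesses traps to the hypervisor. A minimal model of that cost (illustrative names, not KVM code):

```c
#include <assert.h>

/* Hypothetical model of guest RTC access cost. On real hardware,
 * port 0x70 selects the CMOS index and port 0x71 transfers the data;
 * under virtualization each outb/inb is one port-I/O vmexit. */
enum { CMOS_INDEX_PORT = 0x70, CMOS_DATA_PORT = 0x71 };

struct vmexit_counter {
    unsigned long io_exits;
};

/* Each emulated port access costs one io vmexit. */
static void trap_port_access(struct vmexit_counter *c, int port)
{
    (void)port;
    c->io_exits++;
}

/* Reading one RTC register = index write (0x70) + data read (0x71),
 * i.e. two port-I/O vmexits per register read. */
static void rtc_read_register(struct vmexit_counter *c, int reg)
{
    (void)reg;
    trap_port_access(c, CMOS_INDEX_PORT); /* outb: select register */
    trap_port_access(c, CMOS_DATA_PORT);  /* inb: read value */
}
```

This is why a guest that polls the RTC for timekeeping generates port-I/O exit counts on the order seen in the table above.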
-----------------------------------------
Analyze events for all VMs, all VCPUs:
VM-EXIT Samples Samples% Time% Min Time Max Time Avg time
hlt 166815 38.30% 99.66% 0.44us 1556.67us 809.48us ( +- 0.11% )
interrupt 146218 33.57% 0.13% 0.30us 1362.10us 1.19us ( +- 1.50% )
msr 105267 24.17% 0.20% 0.37us 87.47us 2.51us ( +- 0.31% )
vintr 9285 2.13% 0.01% 0.50us 1.92us 0.78us ( +- 0.16% )
write_cr8 7537 1.73% 0.00% 0.31us 49.14us 0.66us ( +- 1.08% )
cpuid 174 0.04% 0.00% 0.31us 1.39us 0.46us ( +- 3.21% )
npf 143 0.03% 0.00% 1.49us 237.66us 21.04us ( +- 12.04% )
write_cr4 32 0.01% 0.00% 0.93us 5.78us 2.10us ( +- 11.38% )
pause 22 0.01% 0.00% 0.45us 1.33us 0.84us ( +- 5.46% )
read_cr4 16 0.00% 0.00% 0.47us 0.68us 0.60us ( +- 2.19% )
nmi 11 0.00% 0.00% 0.35us 0.70us 0.54us ( +- 5.06% )
write_dr7 2 0.00% 0.00% 0.43us 0.45us 0.44us ( +- 2.27% )
hypercall 1 0.00% 0.00% 0.97us 0.97us 0.97us ( +- 0.00% )
Total Samples:435523, Total events handled time:135488497.29us.
---------------------------------
From the above observations, trying to see if there's a way for enabling AVIC while also having the most optimized clock source for windows guest.
Really appreciate it, and looking forward to your response.
Best Regards,
Kechen
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Optimized clocksource with AMD AVIC enabled for Windows guest
2021-02-03 6:40 Optimized clocksource with AMD AVIC enabled for Windows guest Kechen Lu
@ 2021-02-03 7:58 ` Paolo Bonzini
2021-02-03 9:15 ` Vitaly Kuznetsov
0 siblings, 1 reply; 11+ messages in thread
From: Paolo Bonzini @ 2021-02-03 7:58 UTC (permalink / raw)
To: Kechen Lu, kvm, qemu-discuss; +Cc: suravee.suthikulpanit, Somdutta Roy
On 03/02/21 07:40, Kechen Lu wrote:
> From the above observations, trying to see if there's a way for
> enabling AVIC while also having the most optimized clock source for
> windows guest.
>
You would have to change KVM, so that AVIC is only disabled if Auto-EOI
interrupts are used.
Paolo
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Optimized clocksource with AMD AVIC enabled for Windows guest
2021-02-03 7:58 ` Paolo Bonzini
@ 2021-02-03 9:15 ` Vitaly Kuznetsov
2021-02-04 2:05 ` Kechen Lu
0 siblings, 1 reply; 11+ messages in thread
From: Vitaly Kuznetsov @ 2021-02-03 9:15 UTC (permalink / raw)
To: Paolo Bonzini, Kechen Lu
Cc: suravee.suthikulpanit, Somdutta Roy, kvm, qemu-discuss
Paolo Bonzini <pbonzini@redhat.com> writes:
> On 03/02/21 07:40, Kechen Lu wrote:
>> From the above observations, trying to see if there's a way for
>> enabling AVIC while also having the most optimized clock source for
>> windows guest.
>>
>
> You would have to change KVM, so that AVIC is only disabled if Auto-EOI
> interrupts are used.
>
(I vaguely recall this being discussed already, but apparently no
changes have been made since.)
Hyper-V TLFS defines the following bit:
CPUID 0x40000004.EAX
Bit 9: Recommend deprecating AutoEOI.
But this is merely a recommendation and older Windows versions may not
know about the bit and still use it. We need to make sure the bit is
set/exposed to Windows guests but we also must track AutoEOI usage and
inhibit AVIC when detected.
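In code, checking and forcing that bit is straightforward; a minimal userspace sketch (the constant value mirrors the kernel's HV_DEPRECATING_AEOI_RECOMMENDED definition from the TLFS; the helper names are illustrative, not kernel API):

```c
#include <assert.h>
#include <stdint.h>
#include <stdbool.h>

/* CPUID 0x40000004 (Hyper-V implementation recommendations), EAX bit 9:
 * "Recommend deprecating AutoEOI". The kernel names this
 * HV_DEPRECATING_AEOI_RECOMMENDED. */
#define HV_DEPRECATING_AEOI_RECOMMENDED (1u << 9)

/* Does this enlightenment-info EAX value tell the guest to avoid AutoEOI? */
static bool hv_aeoi_deprecated(uint32_t eax)
{
    return (eax & HV_DEPRECATING_AEOI_RECOMMENDED) != 0;
}

/* What the proposed change amounts to: force the bit on in the
 * CPUID leaf exposed to the guest. */
static uint32_t hv_force_aeoi_deprecation(uint32_t eax)
{
    return eax | HV_DEPRECATING_AEOI_RECOMMENDED;
}
```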
--
Vitaly
^ permalink raw reply [flat|nested] 11+ messages in thread
* RE: Optimized clocksource with AMD AVIC enabled for Windows guest
2021-02-03 9:15 ` Vitaly Kuznetsov
@ 2021-02-04 2:05 ` Kechen Lu
2021-02-04 12:24 ` Vitaly Kuznetsov
0 siblings, 1 reply; 11+ messages in thread
From: Kechen Lu @ 2021-02-04 2:05 UTC (permalink / raw)
To: Vitaly Kuznetsov, Paolo Bonzini
Cc: suravee.suthikulpanit, Somdutta Roy, kvm, qemu-discuss
Hi Vitaly and Paolo,
Thanks so much for the quick reply. This makes sense to me. From my understanding, the fix basically has two parts.
First, we make sure to set and expose 0x40000004.EAX bit 9 to the Windows guest, e.g. in kvm_vcpu_ioctl_get_hv_cpuid(), setting this recommendation bit:
-----------------------
case HYPERV_CPUID_ENLIGHTMENT_INFO:
...
+ ent->eax |= HV_DEPRECATING_AEOI_RECOMMENDED;
-----------------------
Second, although the above tells the guest to deprecate AutoEOI, older Windows OSes will not acknowledge it (per the Hyper-V TLFS, bit 9 of 0x40000004.EAX is only defined starting with spec v3.0, i.e. Windows Server 2012), so we may want to dynamically toggle APICv/AVIC off if we find that a SynIC SINT vector has AutoEOI enabled, in synic_update_vector(). E.g.:
-----------------------------
if (synic_has_vector_auto_eoi(synic, vector)) {
kvm_request_apicv_update(vcpu->kvm, false, APICV_INHIBIT_REASON_HYPERV);
__set_bit(vector, synic->auto_eoi_bitmap);
} else {
kvm_request_apicv_update(vcpu->kvm, true, APICV_INHIBIT_REASON_HYPERV);
__clear_bit(vector, synic->auto_eoi_bitmap);
}
---------------------------------
Curious what the current upstream plan/status is for this. If that's doable and no pending patch already covers it, I can draft a quick patch, test it, and send it out for review.
Best Regards,
Kechen
>-----Original Message-----
>From: Vitaly Kuznetsov <vkuznets@redhat.com>
>Sent: Wednesday, February 3, 2021 1:16 AM
>To: Paolo Bonzini <pbonzini@redhat.com>; Kechen Lu <kechenl@nvidia.com>
>Cc: suravee.suthikulpanit@amd.com; Somdutta Roy <somduttar@nvidia.com>;
>kvm@vger.kernel.org; qemu-discuss@nongnu.org
>Subject: Re: Optimized clocksource with AMD AVIC enabled for Windows guest
>
>Paolo Bonzini <pbonzini@redhat.com> writes:
>
>> On 03/02/21 07:40, Kechen Lu wrote:
>>> From the above observations, trying to see if there's a way for
>>> enabling AVIC while also having the most optimized clock source for
>>> windows guest.
>>>
>>
>> You would have to change KVM, so that AVIC is only disabled if
>> Auto-EOI interrupts are used.
>>
>
>(I vaguely recall this being discussed already, but apparently no
>changes have been made since.)
>
>Hyper-V TLFS defines the following bit:
>
>CPUID 0x40000004.EAX
>Bit 9: Recommend deprecating AutoEOI.
>
>But this is merely a recommendation and older Windows versions may not know
>about the bit and still use it. We need to make sure the bit is set/exposed to
>Windows guests but we also must track AutoEOI usage and inhibit AVIC when
>detected.
>
>--
>Vitaly
^ permalink raw reply [flat|nested] 11+ messages in thread
* RE: Optimized clocksource with AMD AVIC enabled for Windows guest
2021-02-04 2:05 ` Kechen Lu
@ 2021-02-04 12:24 ` Vitaly Kuznetsov
2021-02-04 13:35 ` Paolo Bonzini
0 siblings, 1 reply; 11+ messages in thread
From: Vitaly Kuznetsov @ 2021-02-04 12:24 UTC (permalink / raw)
To: Kechen Lu, Paolo Bonzini
Cc: suravee.suthikulpanit, Somdutta Roy, kvm, qemu-discuss
Kechen Lu <kechenl@nvidia.com> writes:
> Hi Vitaly and Paolo,
>
> Thanks so much for the quick reply. This makes sense to me. From my understanding, the fix basically has two parts.
>
> First, we make sure to set and expose 0x40000004.EAX bit 9 to the Windows guest, e.g. in kvm_vcpu_ioctl_get_hv_cpuid(), setting this recommendation bit:
> -----------------------
> case HYPERV_CPUID_ENLIGHTMENT_INFO:
> ...
> + ent->eax |= HV_DEPRECATING_AEOI_RECOMMENDED;
> -----------------------
This also needs to be wired through userspace (e.g. QEMU) as this
doesn't go to the guest directly.
>
> Second, although the above tells the guest to deprecate AutoEOI, older Windows OSes will not acknowledge it (per the Hyper-V TLFS, bit 9 of 0x40000004.EAX is only defined starting with spec v3.0, i.e. Windows Server 2012), so we may want to dynamically toggle APICv/AVIC off if we find that a SynIC SINT vector has AutoEOI enabled, in synic_update_vector(). E.g.:
> -----------------------------
> if (synic_has_vector_auto_eoi(synic, vector)) {
> kvm_request_apicv_update(vcpu->kvm, false, APICV_INHIBIT_REASON_HYPERV);
> __set_bit(vector, synic->auto_eoi_bitmap);
> } else {
> kvm_request_apicv_update(vcpu->kvm, true, APICV_INHIBIT_REASON_HYPERV);
> __clear_bit(vector, synic->auto_eoi_bitmap);
> }
> ---------------------------------
APICV_INHIBIT_REASON_HYPERV is per-VM so we need to count how many
AutoEOI SINTs were set in *all* SynICs (an atomic in 'struct kvm_hv'
would do).
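The counting scheme described here can be sketched as follows. This is a simplified userspace model (a plain int in place of the kernel's atomic_t, illustrative helper names), in which APICv/AVIC is inhibited on the 0 -> 1 transition of the per-VM AutoEOI count and re-enabled on the 1 -> 0 transition:

```c
#include <assert.h>
#include <stdbool.h>

/* Per-VM state: how many SINTs (across all vCPUs' SynICs) currently
 * have AutoEOI enabled, and whether APICv/AVIC is inhibited. */
struct vm_hv_state {
    int synic_auto_eoi_used;  /* the kernel would use atomic_t */
    bool apicv_inhibited;
};

/* Called when one SINT transitions into or out of AutoEOI use.
 * Note the second branch tests the set -> clear transition. */
static void synic_auto_eoi_change(struct vm_hv_state *hv,
                                  bool auto_eoi_old, bool auto_eoi_new)
{
    if (!auto_eoi_old && auto_eoi_new) {
        if (++hv->synic_auto_eoi_used == 1)
            hv->apicv_inhibited = true;   /* first AutoEOI user: inhibit */
    } else if (auto_eoi_old && !auto_eoi_new) {
        if (--hv->synic_auto_eoi_used == 0)
            hv->apicv_inhibited = false;  /* last user gone: re-enable */
    }
}
```

The point of the count is that one vCPU dropping its AutoEOI SINT must not re-enable APICv while another vCPU's SynIC still uses AutoEOI.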
> Curious what the current upstream plan/status is for this. If that's
> doable and no pending patch already covers it, I can draft a quick
> patch, test it, and send it out for review.
I checked Linux VMs on genuine Hyper-V and surprisingly
'HV_DEPRECATING_AEOI_RECOMMENDED' is not exposed. I'm going to pass it
to WS2016/2019 and see what happens. If it all works as expected and if
you don't beat me to it I'll be sending a patch.
--
Vitaly
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Optimized clocksource with AMD AVIC enabled for Windows guest
2021-02-04 12:24 ` Vitaly Kuznetsov
@ 2021-02-04 13:35 ` Paolo Bonzini
2021-02-04 15:01 ` Vitaly Kuznetsov
0 siblings, 1 reply; 11+ messages in thread
From: Paolo Bonzini @ 2021-02-04 13:35 UTC (permalink / raw)
To: Vitaly Kuznetsov, Kechen Lu
Cc: suravee.suthikulpanit, Somdutta Roy, kvm, qemu-discuss
On 04/02/21 13:24, Vitaly Kuznetsov wrote:
> I checked Linux VMs on genuine Hyper-V and surprisingly
> 'HV_DEPRECATING_AEOI_RECOMMENDED' is not exposed.
Did the host have APICv/AVIC (and can Hyper-V use AVIC)? AutoEOI is
still a useful optimization on hosts that don't have
hardware-accelerated EOI or interrupt injection.
Paolo
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Optimized clocksource with AMD AVIC enabled for Windows guest
2021-02-04 13:35 ` Paolo Bonzini
@ 2021-02-04 15:01 ` Vitaly Kuznetsov
2021-02-04 15:19 ` Vitaly Kuznetsov
0 siblings, 1 reply; 11+ messages in thread
From: Vitaly Kuznetsov @ 2021-02-04 15:01 UTC (permalink / raw)
To: Paolo Bonzini, Kechen Lu
Cc: suravee.suthikulpanit, Somdutta Roy, kvm, qemu-discuss
[-- Attachment #1: Type: text/plain, Size: 1034 bytes --]
Paolo Bonzini <pbonzini@redhat.com> writes:
> On 04/02/21 13:24, Vitaly Kuznetsov wrote:
>> I checked Linux VMs on genuine Hyper-V and surprisingly
>> 'HV_DEPRECATING_AEOI_RECOMMENDED' is not exposed.
>
> Did the host have APICv/AVIC (and can Hyper-V use AVIC)? AutoEOI is
> still a useful optimization on hosts that don't have
> hardware-accelerated EOI or interrupt injection.
I was under the impression that for Intel I'd need Ivy Bridge or newer;
I was testing with a Xeon E5-2420 v2. I don't have an AMD host running
Hyper-V handy, so I spun up a VM on Azure with a modern enough AMD EPYC
7452; still no luck.
Surprisingly, Linux on KVM has had code to handle the AutoEOI
recommendation since 2017 (6c248aad81c89), so I assume it's possible to
meet this bit in the wild.
Anyway, I've smoke tested the attached patch (poorly tested and
hackish!) on Intel/AMD and WS2016 and nothing blew up
immediately. Kechen Lu, could you give it a spin in your
environment? No userspace changes needed (will change if we decide to go
ahead with it).
--
Vitaly
[-- Attachment #2: 0001-KVM-x86-Deactivate-APICv-only-when-auto_eoi-feature-.patch --]
[-- Type: text/x-patch, Size: 3624 bytes --]
From cb129501199f1f3ab6f0ade81b11eb76d08b6b5b Mon Sep 17 00:00:00 2001
From: Vitaly Kuznetsov <vkuznets@redhat.com>
Date: Thu, 4 Feb 2021 13:31:41 +0100
Subject: [PATCH] KVM: x86: Deactivate APICv only when auto_eoi feature is in
use
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
arch/x86/include/asm/kvm_host.h | 3 +++
arch/x86/kvm/cpuid.c | 5 +++++
arch/x86/kvm/hyperv.c | 26 ++++++++++++++++++++------
3 files changed, 28 insertions(+), 6 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3d6616f6f6ef..539fbb505d77 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -877,6 +877,9 @@ struct kvm_hv {
/* How many vCPUs have VP index != vCPU index */
atomic_t num_mismatched_vp_indexes;
+ /* How many SynICs use 'auto_eoi' feature */
+ atomic_t synic_auto_eoi_used;
+
struct hv_partition_assist_pg *hv_pa_pg;
struct kvm_hv_syndbg hv_syndbg;
};
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 13036cf0b912..8df2dff37a5c 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -138,6 +138,11 @@ void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
(best->eax & (1 << KVM_FEATURE_PV_UNHALT)))
best->eax &= ~(1 << KVM_FEATURE_PV_UNHALT);
+ /* Dirty hack: force HV_DEPRECATING_AEOI_RECOMMENDED. Not to be merged! */
+ best = kvm_find_cpuid_entry(vcpu, HYPERV_CPUID_ENLIGHTMENT_INFO, 0);
+ if (best)
+ best->eax |= HV_DEPRECATING_AEOI_RECOMMENDED;
+
if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT)) {
best = kvm_find_cpuid_entry(vcpu, 0x1, 0);
if (best)
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 922c69dcca4d..7c9bc060889a 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -83,6 +83,11 @@ static bool synic_has_vector_auto_eoi(struct kvm_vcpu_hv_synic *synic,
static void synic_update_vector(struct kvm_vcpu_hv_synic *synic,
int vector)
{
+ struct kvm_vcpu *vcpu = synic_to_vcpu(synic);
+ struct kvm *kvm = vcpu->kvm;
+ struct kvm_hv *hv = &kvm->arch.hyperv;
+ int auto_eoi_old, auto_eoi_new;
+
if (vector < HV_SYNIC_FIRST_VALID_VECTOR)
return;
@@ -91,10 +96,25 @@ static void synic_update_vector(struct kvm_vcpu_hv_synic *synic,
else
__clear_bit(vector, synic->vec_bitmap);
+ auto_eoi_old = bitmap_weight(synic->auto_eoi_bitmap, 256);
+
if (synic_has_vector_auto_eoi(synic, vector))
__set_bit(vector, synic->auto_eoi_bitmap);
else
__clear_bit(vector, synic->auto_eoi_bitmap);
+
+ auto_eoi_new = bitmap_weight(synic->auto_eoi_bitmap, 256);
+
+ /* Hyper-V SynIC auto EOI SINT's are not compatible with APICV */
+ if (!auto_eoi_old && auto_eoi_new) {
+ if (atomic_inc_return(&hv->synic_auto_eoi_used) == 1)
+ kvm_request_apicv_update(vcpu->kvm, false,
+ APICV_INHIBIT_REASON_HYPERV);
+ } else if (!auto_eoi_old && auto_eoi_new) {
+ if (atomic_dec_return(&hv->synic_auto_eoi_used) == 0)
+ kvm_request_apicv_update(vcpu->kvm, true,
+ APICV_INHIBIT_REASON_HYPERV);
+ }
}
static int synic_set_sint(struct kvm_vcpu_hv_synic *synic, int sint,
@@ -903,12 +923,6 @@ int kvm_hv_activate_synic(struct kvm_vcpu *vcpu, bool dont_zero_synic_pages)
{
struct kvm_vcpu_hv_synic *synic = vcpu_to_synic(vcpu);
- /*
- * Hyper-V SynIC auto EOI SINT's are
- * not compatible with APICV, so request
- * to deactivate APICV permanently.
- */
- kvm_request_apicv_update(vcpu->kvm, false, APICV_INHIBIT_REASON_HYPERV);
synic->active = true;
synic->dont_zero_synic_pages = dont_zero_synic_pages;
synic->control = HV_SYNIC_CONTROL_ENABLE;
--
2.29.2
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: Optimized clocksource with AMD AVIC enabled for Windows guest
2021-02-04 15:01 ` Vitaly Kuznetsov
@ 2021-02-04 15:19 ` Vitaly Kuznetsov
2021-02-05 5:38 ` Kechen Lu
0 siblings, 1 reply; 11+ messages in thread
From: Vitaly Kuznetsov @ 2021-02-04 15:19 UTC (permalink / raw)
To: Paolo Bonzini, Kechen Lu
Cc: suravee.suthikulpanit, Somdutta Roy, kvm, qemu-discuss
Vitaly Kuznetsov <vkuznets@redhat.com> writes:
> +
> + auto_eoi_new = bitmap_weight(synic->auto_eoi_bitmap, 256);
> +
> + /* Hyper-V SynIC auto EOI SINT's are not compatible with APICV */
> + if (!auto_eoi_old && auto_eoi_new) {
> + if (atomic_inc_return(&hv->synic_auto_eoi_used) == 1)
> + kvm_request_apicv_update(vcpu->kvm, false,
> + APICV_INHIBIT_REASON_HYPERV);
> + } else if (!auto_eoi_old && auto_eoi_new) {
Sigh, this 'else' should be
} else if (!auto_eoi_new && auto_eoi_old) {
...
> + if (atomic_dec_return(&hv->synic_auto_eoi_used) == 0)
> + kvm_request_apicv_update(vcpu->kvm, true,
> + APICV_INHIBIT_REASON_HYPERV);
> + }
> }
>
> static int synic_set_sint(struct kvm_vcpu_hv_synic *synic, int sint,
> @@ -903,12 +923,6 @@ int kvm_hv_activate_synic(struct kvm_vcpu *vcpu, bool dont_zero_synic_pages)
> {
> struct kvm_vcpu_hv_synic *synic = vcpu_to_synic(vcpu);
>
> - /*
> - * Hyper-V SynIC auto EOI SINT's are
> - * not compatible with APICV, so request
> - * to deactivate APICV permanently.
> - */
> - kvm_request_apicv_update(vcpu->kvm, false, APICV_INHIBIT_REASON_HYPERV);
> synic->active = true;
> synic->dont_zero_synic_pages = dont_zero_synic_pages;
> synic->control = HV_SYNIC_CONTROL_ENABLE;
--
Vitaly
^ permalink raw reply [flat|nested] 11+ messages in thread
* RE: Optimized clocksource with AMD AVIC enabled for Windows guest
2021-02-04 15:19 ` Vitaly Kuznetsov
@ 2021-02-05 5:38 ` Kechen Lu
2021-02-17 20:41 ` Kechen Lu
0 siblings, 1 reply; 11+ messages in thread
From: Kechen Lu @ 2021-02-05 5:38 UTC (permalink / raw)
To: Vitaly Kuznetsov, Paolo Bonzini
Cc: suravee.suthikulpanit, Somdutta Roy, kvm, qemu-discuss
Cool! Thanks for the correction; yeah, the APICV_INHIBIT_REASON_HYPERV setting is per-VM while SynIC is per-vCPU. Since the machine with AVIC is not in my hands today, I will hopefully test it by the end of this week :)
BR,
Kechen
>-----Original Message-----
>From: Vitaly Kuznetsov <vkuznets@redhat.com>
>Sent: Thursday, February 4, 2021 7:19 AM
>To: Paolo Bonzini <pbonzini@redhat.com>; Kechen Lu <kechenl@nvidia.com>
>Cc: suravee.suthikulpanit@amd.com; Somdutta Roy <somduttar@nvidia.com>;
>kvm@vger.kernel.org; qemu-discuss@nongnu.org
>Subject: Re: Optimized clocksource with AMD AVIC enabled for Windows guest
>
>Vitaly Kuznetsov <vkuznets@redhat.com> writes:
>
>> +
>> + auto_eoi_new = bitmap_weight(synic->auto_eoi_bitmap, 256);
>> +
>> + /* Hyper-V SynIC auto EOI SINT's are not compatible with APICV */
>> + if (!auto_eoi_old && auto_eoi_new) {
>> + if (atomic_inc_return(&hv->synic_auto_eoi_used) == 1)
>> + kvm_request_apicv_update(vcpu->kvm, false,
>> + APICV_INHIBIT_REASON_HYPERV);
>> + } else if (!auto_eoi_old && auto_eoi_new) {
>
>Sigh, this 'else' should be
>
>} else if (!auto_eoi_new && auto_eoi_old) {
>
>...
>
>> + if (atomic_dec_return(&hv->synic_auto_eoi_used) == 0)
>> + kvm_request_apicv_update(vcpu->kvm, true,
>> + APICV_INHIBIT_REASON_HYPERV);
>> + }
>> }
>>
>> static int synic_set_sint(struct kvm_vcpu_hv_synic *synic, int sint,
>> @@ -903,12 +923,6 @@ int kvm_hv_activate_synic(struct kvm_vcpu *vcpu,
>> bool dont_zero_synic_pages) {
>> struct kvm_vcpu_hv_synic *synic = vcpu_to_synic(vcpu);
>>
>> - /*
>> - * Hyper-V SynIC auto EOI SINT's are
>> - * not compatible with APICV, so request
>> - * to deactivate APICV permanently.
>> - */
>> - kvm_request_apicv_update(vcpu->kvm, false,
>APICV_INHIBIT_REASON_HYPERV);
>> synic->active = true;
>> synic->dont_zero_synic_pages = dont_zero_synic_pages;
>> synic->control = HV_SYNIC_CONTROL_ENABLE;
>
>--
>Vitaly
^ permalink raw reply [flat|nested] 11+ messages in thread
* RE: Optimized clocksource with AMD AVIC enabled for Windows guest
2021-02-05 5:38 ` Kechen Lu
@ 2021-02-17 20:41 ` Kechen Lu
2021-02-25 10:25 ` Vitaly Kuznetsov
0 siblings, 1 reply; 11+ messages in thread
From: Kechen Lu @ 2021-02-17 20:41 UTC (permalink / raw)
To: Vitaly Kuznetsov, Paolo Bonzini
Cc: suravee.suthikulpanit, Somdutta Roy, kvm, qemu-discuss
Hi Vitaly and Paolo,
Sorry for the delay in response; I finally got a chance to access a machine with AVIC and was able to test the patch and reconfirm through some benchmarks and tests again today :)
In summary, this patch works well and resolves the high port I/O vmexits caused by the clocksource issue. With AVIC=1 && stimer/synic=1,
1. CPU intensive workload CPU-z shows SingleThread score 15% improvement 382.1=> 441.7,
2. disk I/O intensive workload Passmark Disk Test gives 4% improvement 12706=> 13265,
3. A 30-second vmexit trace while running the CPU workload Geekbench in the guest shows a dramatic 90.7% decrease in port I/O vmexits, as well as fewer HLT and NPF vmexits, when we get the stimer benefit plus AVIC. Details as below:
AVIC=1 && stimer/synic=0 && vapic=0:
VM-EXIT Samples Samples% Time% Min Time Max Time Avg time
io 344654 68.29% 1.10% 0.67us 2132.72us 7.01us ( +- 0.19% )
hlt 114046 22.60% 98.85% 0.42us 16666.32us 1903.26us ( +- 0.66% )
avic_incomplete_ipi 19679 3.90% 0.03% 0.38us 22.67us 3.66us ( +- 0.71% )
npf 8186 1.62% 0.01% 0.37us 235.76us 1.46us ( +- 4.20% )
........
AVIC=1 && stimer/synic=1 && vapic=0:
VM-EXIT Samples Samples% Time% Min Time Max Time Avg time
io 31995 38.61% 0.10% 2.79us 65.83us 6.70us ( +- 0.35% )
hlt 22915 27.65% 99.88% 0.42us 15959.14us 9535.38us ( +- 0.50% )
avic_incomplete_ipi 8271 9.98% 0.01% 0.39us 79.03us 3.58us ( +- 1.23% )
npf 1232 1.49% 0.00% 0.36us 100.25us 2.58us ( +- 6.98% )
..........
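As a quick sanity check of the 90.7% figure, using the io sample counts from the two traces above (a hypothetical helper, just plain arithmetic):

```c
#include <assert.h>

/* Percent reduction between two vmexit sample counts,
 * e.g. port-I/O exits before and after enabling stimer/synic. */
static double percent_reduction(double before, double after)
{
    return (before - after) / before * 100.0;
}
```

With before = 344654 and after = 31995 this gives roughly 90.7, matching the quoted reduction.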
While testing, I also found that hv-vapic should be disabled as well to make AVIC fully functional; otherwise there are high vmexit counts from MSR writes, which seems to be due to increased access to HV_X64_MSR_EOI and HV_X64_MSR_ICR. This makes sense to me, since AVIC conflicts with PV EOI/ICR accesses. So far I think the AVIC=1 && hv-vapic=0 && stimer/synic=1 combination gives us the best performance. However, AVIC=1 && hv-vapic=0 && stimer/synic=1 is really unstable, and the guest sometimes fails to boot. I wanted to understand whether these instabilities with APICv/AVIC are a known bug/issue upstream. The reproducible kernel warning is attached at the bottom.
In all, AVIC=1 && hv-vapic=1 && stimer/synic=1 works stably now and still produces great vmexit-reduction benefits. Thanks so much for all your help; I hope the kernel patch and the QEMU bit-exposure patch can get upstream soon, along with fixes for the instabilities.
Best Regards,
Kechen
---------------------------------------------------------------------------------------
[ 7962.437584] ------------[ cut here ]------------
[ 7962.437586] Invalid IPI target: index=2, vcpu=0, icr=0x4000000:0x82f
[ 7962.437603] WARNING: CPU: 4 PID: 7109 at arch/x86/kvm/svm/avic.c:349 avic_incomplete_ipi_interception+0x1ff/0x240 [kvm_amd]
[ 7962.437604] Modules linked in: kvm_amd ccp kvm msr nf_tables nfnetlink bridge stp llc amd64_edac_mod edac_mce_amd nls_iso8859_1 amd_energy crct10dif_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper snd_hda_codec_hdmi rapl snd_hda_intel snd_intel_dspcfg wmi_bmof snd_hda_codec snd_usb_audio snd_hda_core snd_usbmidi_lib snd_hwdep snd_seq_midi snd_seq_midi_event snd_rawmidi efi_pstore joydev mc input_leds snd_seq snd_pcm snd_seq_device snd_timer snd soundcore k10temp mac_hid sch_fq_codel lm92 parport_pc ppdev lp parport ip_tables x_tables autofs4 iavf hid_generic usbhid hid nvme crc32_pclmul i40e ahci nvme_core xhci_pci libahci xhci_pci_renesas i2c_piix4 atlantic macsec wmi [last unloaded: ccp]
[ 7962.437630] CPU: 4 PID: 7109 Comm: CPU 0/KVM Tainted: P W OE 5.8.0-41-generic #46
[ 7962.437633] RIP: 0010:avic_incomplete_ipi_interception+0x1ff/0x240 [kvm_amd]
[ 7962.437635] Code: 9a 00 00 00 0f 85 2b ff ff ff 41 8b 56 24 8b 4d c8 45 89 e0 44 89 ee 48 c7 c7 a8 34 50 c0 c6 05 b2 9a 00 00 01 e8 d6 cc 3a fb <0f> 0b e9 04 ff ff ff 48 8b 5d c0 8b 55 c8 be 10 03 00 00 48 89 df
[ 7962.437636] RSP: 0018:ffffa7894f9bfcc0 EFLAGS: 00010282
[ 7962.437637] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff99347f118cd8
[ 7962.437637] RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff99347f118cd0
[ 7962.437638] RBP: ffffa7894f9bfd18 R08: 0000000000000004 R09: 0000000000000831
[ 7962.437638] R10: 0000000000000000 R11: 0000000000000001 R12: 040000000000082f
[ 7962.437639] R13: 0000000000000002 R14: ffff993345653448 R15: 0000000000000002
[ 7962.437640] FS: 0000000000000000(0053) GS:ffff99347f100000(002b) knlGS:fffff80470728000
[ 7962.437640] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7962.437641] CR2: ffff8006ace2b000 CR3: 0000000febd88000 CR4: 0000000000340ee0
[ 7962.437641] Call Trace:
[ 7962.437646] handle_exit+0x134/0x420 [kvm_amd]
[ 7962.437661] ? kvm_set_cr8+0x22/0x40 [kvm]
[ 7962.437674] vcpu_enter_guest+0x862/0xd90 [kvm]
[ 7962.437687] vcpu_run+0x76/0x240 [kvm]
[ 7962.437699] kvm_arch_vcpu_ioctl_run+0x9f/0x2b0 [kvm]
[ 7962.437711] kvm_vcpu_ioctl+0x247/0x600 [kvm]
[ 7962.437714] ksys_ioctl+0x8e/0xc0
[ 7962.437715] __x64_sys_ioctl+0x1a/0x20
[ 7962.437717] do_syscall_64+0x49/0xc0
[ 7962.437719] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 7962.437720] RIP: 0033:0x7f4c09b1131b
[ 7962.437721] Code: 89 d8 49 8d 3c 1c 48 f7 d8 49 39 c4 72 b5 e8 1c ff ff ff 85 c0 78 ba 4c 89 e0 5b 5d 41 5c c3 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1d 3b 0d 00 f7 d8 64 89 01 48
[ 7962.437721] RSP: 002b:00007f4bedffa4a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 7962.437722] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f4c09b1131b
[ 7962.437723] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000015
[ 7962.437723] RBP: 0000563c35a94990 R08: 0000563c33b95a30 R09: 0000000000000004
[ 7962.437724] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 7962.437724] R13: 0000563c34196d00 R14: 0000000000000000 R15: 00007f4bedffb640
[ 7962.437726] ---[ end trace 7f0f339c3a001d7b ]---
^ permalink raw reply [flat|nested] 11+ messages in thread
* RE: Optimized clocksource with AMD AVIC enabled for Windows guest
2021-02-17 20:41 ` Kechen Lu
@ 2021-02-25 10:25 ` Vitaly Kuznetsov
0 siblings, 0 replies; 11+ messages in thread
From: Vitaly Kuznetsov @ 2021-02-25 10:25 UTC (permalink / raw)
To: Kechen Lu
Cc: suravee.suthikulpanit, Somdutta Roy, kvm, qemu-discuss, Paolo Bonzini
Kechen Lu <kechenl@nvidia.com> writes:
> Hi Vitaly and Paolo,
>
> Sorry for the delay in response; I finally got a chance to access a machine with AVIC and was able to test the patch and reconfirm through some benchmarks and tests again today :)
>
> In summary, this patch works well and resolves the high port I/O vmexits caused by the clocksource issue. With AVIC=1 && stimer/synic=1,
>
> 1. CPU intensive workload CPU-z shows SingleThread score 15% improvement 382.1=> 441.7,
>
> 2. disk I/O intensive workload Passmark Disk Test gives 4% improvement 12706=> 13265,
>
> 3. A 30-second vmexit trace while running the CPU workload Geekbench in the guest shows a dramatic 90.7% decrease in port I/O vmexits, as well as fewer HLT and NPF vmexits, when we get the stimer benefit plus AVIC. Details as below:
>
> AVIC=1 && stimer/synic=0 && vapic=0:
>
> VM-EXIT Samples Samples% Time% Min Time Max Time Avg time
>
> io 344654 68.29% 1.10% 0.67us 2132.72us 7.01us ( +- 0.19% )
> hlt 114046 22.60% 98.85% 0.42us 16666.32us 1903.26us ( +- 0.66% )
> avic_incomplete_ipi 19679 3.90% 0.03% 0.38us 22.67us 3.66us ( +- 0.71% )
> npf 8186 1.62% 0.01% 0.37us 235.76us 1.46us ( +- 4.20% )
> ........
>
>
> AVIC=1 && stimer/synic=1 && vapic=0:
>
> VM-EXIT Samples Samples% Time% Min Time Max Time Avg time
>
> io 31995 38.61% 0.10% 2.79us 65.83us 6.70us ( +- 0.35% )
> hlt 22915 27.65% 99.88% 0.42us 15959.14us 9535.38us ( +- 0.50% )
> avic_incomplete_ipi 8271 9.98% 0.01% 0.39us 79.03us 3.58us ( +- 1.23% )
> npf 1232 1.49% 0.00% 0.36us 100.25us 2.58us ( +- 6.98% )
> ..........
>
> While testing, I also found that hv-vapic should be disabled as well to
> make AVIC fully functional; otherwise there are high vmexit counts from
> MSR writes, which seems to be due to increased access to HV_X64_MSR_EOI
> and HV_X64_MSR_ICR. This makes sense to me, since AVIC conflicts with
> PV EOI/ICR accesses. So far I think the AVIC=1 && hv-vapic=0 &&
> stimer/synic=1 combination gives us the best performance. However,
> AVIC=1 && hv-vapic=0 && stimer/synic=1 is really unstable, and the
> guest sometimes fails to boot. I wanted to understand whether these
> instabilities with APICv/AVIC are a known bug/issue upstream. The
> reproducible kernel warning is attached at the bottom.
Now it's my turn to apologize for the delayed reply :-)
I think it's our fault: BIT(3) in HYPERV_CPUID_ENLIGHTMENT_INFO is
HV_X64_APIC_ACCESS_RECOMMENDED, which can be deciphered as
"Recommend using MSRs for accessing APIC registers EOI, ICR and TPR
rather than their memory-mapped counterparts",
and we shouldn't be setting it together with AVIC. The following hack is
supposed to help:
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index c8f2592ccc99..66ee85a83e9a 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -145,6 +145,13 @@ void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
vcpu->arch.ia32_misc_enable_msr &
MSR_IA32_MISC_ENABLE_MWAIT);
}
+
+ /* Dirty hack: clear HV_X64_APIC_ACCESS_RECOMMENDED and force HV_DEPRECATING_AEOI_RECOMMENDED. Not to be merged! */
+ best = kvm_find_cpuid_entry(vcpu, HYPERV_CPUID_ENLIGHTMENT_INFO, 0);
+ if (best) {
+ best->eax &= ~HV_X64_APIC_ACCESS_RECOMMENDED;
+ best->eax |= HV_DEPRECATING_AEOI_RECOMMENDED;
+ }
}
EXPORT_SYMBOL_GPL(kvm_update_cpuid_runtime);
(we'll need to find a proper way to set these settings in QEMU).
Could you give it a spin? ("AVIC=1 && hv-vapic=1 && stimer/synic=1" configuration)
>
> In all, AVIC=1 && hv-vapic=1 && stimer/synic=1 works stably now and still produces great vmexit-reduction benefits. Thanks so much for all your help; I hope the kernel patch and the QEMU bit-exposure patch can get upstream soon, along with fixes for the instabilities.
>
> Best Regards,
> Kechen
>
> ---------------------------------------------------------------------------------------
> [ 7962.437584] ------------[ cut here ]------------
> [ 7962.437586] Invalid IPI target: index=2, vcpu=0, icr=0x4000000:0x82f
> [ 7962.437603] WARNING: CPU: 4 PID: 7109 at arch/x86/kvm/svm/avic.c:349 avic_incomplete_ipi_interception+0x1ff/0x240 [kvm_amd]
> [ 7962.437604] Modules linked in: kvm_amd ccp kvm msr nf_tables nfnetlink bridge stp llc amd64_edac_mod edac_mce_amd nls_iso8859_1 amd_energy crct10dif_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper snd_hda_codec_hdmi rapl snd_hda_intel snd_intel_dspcfg wmi_bmof snd_hda_codec snd_usb_audio snd_hda_core snd_usbmidi_lib snd_hwdep snd_seq_midi snd_seq_midi_event snd_rawmidi efi_pstore joydev mc input_leds snd_seq snd_pcm snd_seq_device snd_timer snd soundcore k10temp mac_hid sch_fq_codel lm92 parport_pc ppdev lp parport ip_tables x_tables autofs4 iavf hid_generic usbhid hid nvme crc32_pclmul i40e ahci nvme_core xhci_pci libahci xhci_pci_renesas i2c_piix4 atlantic macsec wmi [last unloaded: ccp]
> [ 7962.437630] CPU: 4 PID: 7109 Comm: CPU 0/KVM Tainted: P W OE 5.8.0-41-generic #46
> [ 7962.437633] RIP: 0010:avic_incomplete_ipi_interception+0x1ff/0x240 [kvm_amd]
No, this is not something I'm aware of. Do you know if it reproduces on
the latest upstream?
> [ 7962.437635] Code: 9a 00 00 00 0f 85 2b ff ff ff 41 8b 56 24 8b 4d c8 45 89 e0 44 89 ee 48 c7 c7 a8 34 50 c0 c6 05 b2 9a 00 00 01 e8 d6 cc 3a fb <0f> 0b e9 04 ff ff ff 48 8b 5d c0 8b 55 c8 be 10 03 00 00 48 89 df
> [ 7962.437636] RSP: 0018:ffffa7894f9bfcc0 EFLAGS: 00010282
> [ 7962.437637] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff99347f118cd8
> [ 7962.437637] RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff99347f118cd0
> [ 7962.437638] RBP: ffffa7894f9bfd18 R08: 0000000000000004 R09: 0000000000000831
> [ 7962.437638] R10: 0000000000000000 R11: 0000000000000001 R12: 040000000000082f
> [ 7962.437639] R13: 0000000000000002 R14: ffff993345653448 R15: 0000000000000002
> [ 7962.437640] FS: 0000000000000000(0053) GS:ffff99347f100000(002b) knlGS:fffff80470728000
> [ 7962.437640] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 7962.437641] CR2: ffff8006ace2b000 CR3: 0000000febd88000 CR4: 0000000000340ee0
> [ 7962.437641] Call Trace:
> [ 7962.437646] handle_exit+0x134/0x420 [kvm_amd]
> [ 7962.437661] ? kvm_set_cr8+0x22/0x40 [kvm]
> [ 7962.437674] vcpu_enter_guest+0x862/0xd90 [kvm]
> [ 7962.437687] vcpu_run+0x76/0x240 [kvm]
> [ 7962.437699] kvm_arch_vcpu_ioctl_run+0x9f/0x2b0 [kvm]
> [ 7962.437711] kvm_vcpu_ioctl+0x247/0x600 [kvm]
> [ 7962.437714] ksys_ioctl+0x8e/0xc0
> [ 7962.437715] __x64_sys_ioctl+0x1a/0x20
> [ 7962.437717] do_syscall_64+0x49/0xc0
> [ 7962.437719] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 7962.437720] RIP: 0033:0x7f4c09b1131b
> [ 7962.437721] Code: 89 d8 49 8d 3c 1c 48 f7 d8 49 39 c4 72 b5 e8 1c ff ff ff 85 c0 78 ba 4c 89 e0 5b 5d 41 5c c3 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1d 3b 0d 00 f7 d8 64 89 01 48
> [ 7962.437721] RSP: 002b:00007f4bedffa4a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> [ 7962.437722] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f4c09b1131b
> [ 7962.437723] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000015
> [ 7962.437723] RBP: 0000563c35a94990 R08: 0000563c33b95a30 R09: 0000000000000004
> [ 7962.437724] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> [ 7962.437724] R13: 0000563c34196d00 R14: 0000000000000000 R15: 00007f4bedffb640
> [ 7962.437726] ---[ end trace 7f0f339c3a001d7b ]---
>
--
Vitaly
Thread overview: 11+ messages
2021-02-03 6:40 Optimized clocksource with AMD AVIC enabled for Windows guest Kechen Lu
2021-02-03 7:58 ` Paolo Bonzini
2021-02-03 9:15 ` Vitaly Kuznetsov
2021-02-04 2:05 ` Kechen Lu
2021-02-04 12:24 ` Vitaly Kuznetsov
2021-02-04 13:35 ` Paolo Bonzini
2021-02-04 15:01 ` Vitaly Kuznetsov
2021-02-04 15:19 ` Vitaly Kuznetsov
2021-02-05 5:38 ` Kechen Lu
2021-02-17 20:41 ` Kechen Lu
2021-02-25 10:25 ` Vitaly Kuznetsov