* Optimized clocksource with AMD AVIC enabled for Windows guest
@ 2021-02-03  6:40 Kechen Lu
  2021-02-03  7:58 ` Paolo Bonzini
  0 siblings, 1 reply; 11+ messages in thread
From: Kechen Lu @ 2021-02-03  6:40 UTC (permalink / raw)
  To: kvm, qemu-discuss; +Cc: suravee.suthikulpanit, pbonzini, Somdutta Roy

[resent for the previous non-plain text format]
Hi KVM & AMD folks,
 
We are trying to enable AVIC for Windows guests on AMD host machines, on upstream kernel 5.8+. From our experiments and vmexit metrics, AVIC brings us huge benefits: interrupt vmexits decrease by >80%, and vintr and write_cr8 vmexits are avoided entirely. But it seems that for Windows guests we have to give up the Hyper-V PV synthetic timer (hv-stimer) feature. So, to get the best of both worlds, is there a more optimized clocksource for Windows guests that could coexist with AVIC enabled (as stimer currently cannot work together with AVIC)?

Some detailed performance analysis below -
 
Looking at the KVM function kvm_hv_activate_synic in https://elixir.bootlin.com/linux/v5.8/source/arch/x86/kvm/hyperv.c#L891, enabling SynIC inhibits APICv (AVIC on AMD), while SynIC is a prerequisite for stimer. In actual experiments without the Hyper-V stimer, there are a lot of port IO vmexits, which hurts CPU-bound workloads such as Geekbench: around a 10% single-core performance regression. Below are the vmexit metrics with AVIC enabled but with hypervclock and RTC as the clocksource, i.e. without stimer+synic.
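For reference, metrics like the tables below can be collected with perf's kvm stat tooling; a typical invocation looks like the following (options as documented for perf-kvm-stat; exact flags may vary by perf version):

```shell
# record VM guest exit events system-wide for 30 seconds
perf kvm stat record -a sleep 30

# summarize by VM-exit reason
perf kvm stat report --event=vmexit

# break the io exits down by port
perf kvm stat report --event=ioport
```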
 ------------------------------------------------------------------------------------------------------------
Analyze events for all VMs, all VCPUs:
             VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time         Avg time
                  io     575088    43.42%     1.96%      0.68us    100.62us      7.47us ( +-   0.13% )
                 msr     434530    32.81%     0.29%      0.41us    350.50us      1.45us ( +-   0.30% )
                 hlt     308635    23.30%    97.75%      0.43us   3791.74us    693.91us ( +-   0.12% )
           interrupt       4796     0.36%     0.00%      0.33us   1606.17us      1.89us ( +-  18.69% )
           write_cr4        752     0.06%     0.00%      0.53us     34.80us      1.42us ( +-   3.97% )
            read_cr4        376     0.03%     0.00%      0.40us      1.32us      0.62us ( +-   1.22% )
                 npf         85     0.01%     0.00%      1.68us     57.95us      8.33us ( +-  12.54% )
               pause         71     0.01%     0.00%      0.36us      1.44us      0.62us ( +-   3.45% )
               cpuid         50     0.00%     0.00%      0.33us      1.11us      0.45us ( +-   5.94% )
           hypercall         10     0.00%     0.00%      0.81us      1.42us      1.12us ( +-   5.87% )
                 nmi          1     0.00%     0.00%      0.67us      0.67us      0.67us ( +-   0.00% )
Total Samples:1324394, Total events handled time:219105470.74us.
-----------------------------------------------------------------------------------------------------------
This shows a dramatically high number of IO vmexits, and we can further see which IO ports the Windows guest accessed.
-----------------------------------------------------
Analyze events for all VMs, all VCPUs:
 
      IO Port Access    Samples  Samples%     Time%    Min Time    Max Time         Avg time
 
           0x70:POUT     287544    50.00%    13.10%      0.40us     23.48us      0.53us ( +-   0.06% )
            0x71:PIN     226154    39.33%     7.60%      0.31us     22.91us      0.39us ( +-   0.08% )
           0x71:POUT      61390    10.67%    79.31%     12.92us     69.99us     14.95us ( +-   0.09% )
 
Total Samples:575088, Total events handled time:1156983.53us.
---------------------------------------------
However, ports 0x70-0x71 are the rtc0 ports, which means there is severe guest RTC access overhead. With stimer + synic on and AVIC disabled, the vmexit metrics look much better for IO and MSR, as below.
-----------------------------------------
Analyze events for all VMs, all VCPUs:
             VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time         Avg time
                 hlt     166815    38.30%    99.66%      0.44us   1556.67us    809.48us ( +-   0.11% )
           interrupt     146218    33.57%     0.13%      0.30us   1362.10us      1.19us ( +-   1.50% )
                 msr     105267    24.17%     0.20%      0.37us     87.47us      2.51us ( +-   0.31% )
               vintr       9285     2.13%     0.01%      0.50us      1.92us      0.78us ( +-   0.16% )
           write_cr8       7537     1.73%     0.00%      0.31us     49.14us      0.66us ( +-   1.08% )
               cpuid        174     0.04%     0.00%      0.31us      1.39us      0.46us ( +-   3.21% )
                 npf        143     0.03%     0.00%      1.49us    237.66us     21.04us ( +-  12.04% )
           write_cr4         32     0.01%     0.00%      0.93us      5.78us      2.10us ( +-  11.38% )
               pause         22     0.01%     0.00%      0.45us      1.33us      0.84us ( +-   5.46% )
            read_cr4         16     0.00%     0.00%      0.47us      0.68us      0.60us ( +-   2.19% )
                 nmi         11     0.00%     0.00%      0.35us      0.70us      0.54us ( +-   5.06% )
           write_dr7          2     0.00%     0.00%      0.43us      0.45us      0.44us ( +-   2.27% )
           hypercall          1     0.00%     0.00%      0.97us      0.97us      0.97us ( +-   0.00% )
Total Samples:435523, Total events handled time:135488497.29us.
---------------------------------
From the above observations, we are trying to see if there's a way to enable AVIC while also having the most optimized clocksource for the Windows guest.
 
Much appreciated, and looking forward to your response.

Best Regards,
Kechen



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Optimized clocksource with AMD AVIC enabled for Windows guest
  2021-02-03  6:40 Optimized clocksource with AMD AVIC enabled for Windows guest Kechen Lu
@ 2021-02-03  7:58 ` Paolo Bonzini
  2021-02-03  9:15   ` Vitaly Kuznetsov
  0 siblings, 1 reply; 11+ messages in thread
From: Paolo Bonzini @ 2021-02-03  7:58 UTC (permalink / raw)
  To: Kechen Lu, kvm, qemu-discuss; +Cc: suravee.suthikulpanit, Somdutta Roy

On 03/02/21 07:40, Kechen Lu wrote:
> From the above observations, trying to see if there's a way for
> enabling AVIC while also having the most optimized clock source for
> windows guest.
> 

You would have to change KVM, so that AVIC is only disabled if Auto-EOI 
interrupts are used.

Paolo



* Re: Optimized clocksource with AMD AVIC enabled for Windows guest
  2021-02-03  7:58 ` Paolo Bonzini
@ 2021-02-03  9:15   ` Vitaly Kuznetsov
  2021-02-04  2:05     ` Kechen Lu
  0 siblings, 1 reply; 11+ messages in thread
From: Vitaly Kuznetsov @ 2021-02-03  9:15 UTC (permalink / raw)
  To: Paolo Bonzini, Kechen Lu
  Cc: suravee.suthikulpanit, Somdutta Roy, kvm, qemu-discuss

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 03/02/21 07:40, Kechen Lu wrote:
>> From the above observations, trying to see if there's a way for
>> enabling AVIC while also having the most optimized clock source for
>> windows guest.
>> 
>
> You would have to change KVM, so that AVIC is only disabled if Auto-EOI 
> interrupts are used.
>

(I vaguely recall this was discussed already, but apparently no changes
were made since.)

Hyper-V TLFS defines the following bit:

CPUID 0x40000004.EAX 
Bit 9: Recommend deprecating AutoEOI.

But this is merely a recommendation and older Windows versions may not
know about the bit and still use it. We need to make sure the bit is
set/exposed to Windows guests but we also must track AutoEOI usage and
inhibit AVIC when detected.

-- 
Vitaly



* RE: Optimized clocksource with AMD AVIC enabled for Windows guest
  2021-02-03  9:15   ` Vitaly Kuznetsov
@ 2021-02-04  2:05     ` Kechen Lu
  2021-02-04 12:24       ` Vitaly Kuznetsov
  0 siblings, 1 reply; 11+ messages in thread
From: Kechen Lu @ 2021-02-04  2:05 UTC (permalink / raw)
  To: Vitaly Kuznetsov, Paolo Bonzini
  Cc: suravee.suthikulpanit, Somdutta Roy, kvm, qemu-discuss

Hi Vitaly and Paolo,

Thanks so much for the quick reply. This makes sense to me. From my understanding, there are basically two parts to resolving this.

First, we make sure to set and expose 0x40000004.EAX bit 9 to the Windows guest, e.g. in kvm_vcpu_ioctl_get_hv_cpuid(), by adding this recommendation bit:
-----------------------
case HYPERV_CPUID_ENLIGHTMENT_INFO:
...
+	ent->eax |= HV_DEPRECATING_AEOI_RECOMMENDED;
-----------------------

Second, although the above tells the guest to deprecate AutoEOI, older Windows OSes would not acknowledge it (checking the Hyper-V TLFS, bit 9 of 0x40000004.EAX is first defined in spec v3.0, i.e. Windows Server 2012). So we may want to dynamically toggle APICv/AVIC off if we find a SynIC SINT vector with AutoEOI, in synic_update_vector(). E.g.:
-----------------------------
if (synic_has_vector_auto_eoi(synic, vector)) {
	kvm_request_apicv_update(vcpu->kvm, false, APICV_INHIBIT_REASON_HYPERV);
	__set_bit(vector, synic->auto_eoi_bitmap);
} else {
	kvm_request_apicv_update(vcpu->kvm, true, APICV_INHIBIT_REASON_HYPERV);
	__clear_bit(vector, synic->auto_eoi_bitmap);
}
---------------------------------

Curious what the current upstream plan/status is for this. If it's doable and no pending patch already covers it, I can test a quick draft patch and send it out for review.

Best Regards,
Kechen

>-----Original Message-----
>From: Vitaly Kuznetsov <vkuznets@redhat.com>
>Sent: Wednesday, February 3, 2021 1:16 AM
>To: Paolo Bonzini <pbonzini@redhat.com>; Kechen Lu <kechenl@nvidia.com>
>Cc: suravee.suthikulpanit@amd.com; Somdutta Roy <somduttar@nvidia.com>;
>kvm@vger.kernel.org; qemu-discuss@nongnu.org
>Subject: Re: Optimized clocksource with AMD AVIC enabled for Windows guest
>
>Paolo Bonzini <pbonzini@redhat.com> writes:
>
>> On 03/02/21 07:40, Kechen Lu wrote:
>>> From the above observations, trying to see if there's a way for
>>> enabling AVIC while also having the most optimized clock source for
>>> windows guest.
>>>
>>
>> You would have to change KVM, so that AVIC is only disabled if
>> Auto-EOI interrupts are used.
>>
>
>(I vaguely recall having this was discussed already but apparently no changes
>were made since)
>
>Hyper-V TLFS defines the following bit:
>
>CPUID 0x40000004.EAX
>Bit 9: Recommend deprecating AutoEOI.
>
>But this is merely a recommendation and older Windows versions may not know
>about the bit and still use it. We need to make sure the bit is set/exposed to
>Windows guests but we also must track AutoEOI usage and inhibit AVIC when
>detected.
>
>--
>Vitaly



* RE: Optimized clocksource with AMD AVIC enabled for Windows guest
  2021-02-04  2:05     ` Kechen Lu
@ 2021-02-04 12:24       ` Vitaly Kuznetsov
  2021-02-04 13:35         ` Paolo Bonzini
  0 siblings, 1 reply; 11+ messages in thread
From: Vitaly Kuznetsov @ 2021-02-04 12:24 UTC (permalink / raw)
  To: Kechen Lu, Paolo Bonzini
  Cc: suravee.suthikulpanit, Somdutta Roy, kvm, qemu-discuss

Kechen Lu <kechenl@nvidia.com> writes:

> Hi Vitaly and Paolo,
>
> Thanks so much for quick reply. This makes sense to me. From my understanding, basically this can be two part of it to resolve it. 
>
> First, we make sure to set and expose 0x40000004.EAX Bit9 to windows guest, like in kvm_vcpu_ioctl_get_hv_cpuid(), having this recommendation bit :
> -----------------------
> case HYPERV_CPUID_ENLIGHTMENT_INFO:
> ...
> +	ent->eax |= HV_DEPRECATING_AEOI_RECOMMENDED;
> -----------------------

This also needs to be wired through userspace (e.g. QEMU) as this
doesn't go to the guest directly.

>
> Second, although the above could tell guest to deprecate AutoEOI, older Windows OSes would not acknowledge this (I checked the Hyper-v TLFS, from spec v3.0 (i.e. Windows Server 2012), it starts having bit9 defined in 0x40000004.EAX), we may want to dynamically toggle off APICv/AVIC if we found the SynIC SINT vector has AutoEOI, under synic_update_vector(). E.g. like:
> -----------------------------
> if (synic_has_vector_auto_eoi(synic, vector)) {
> 	kvm_request_apicv_update(vcpu->kvm, false, APICV_INHIBIT_REASON_HYPERV);
> 	__set_bit(vector, synic->auto_eoi_bitmap);
> } else {
> 	kvm_request_apicv_update(vcpu->kvm, true, APICV_INHIBIT_REASON_HYPERV);
> 	__clear_bit(vector, synic->auto_eoi_bitmap);
> }
> ---------------------------------

APICV_INHIBIT_REASON_HYPERV is per-VM so we need to count how many
AutoEOI SINTs were set in *all* SynICs (an atomic in 'struct kvm_hv'
would do).

> Curious about what current plan/status of upstream is for this. If
> that's doable and not current pending patch covering this, I can make
> a quick draft patch tested and sent out for reviewing. 

I checked Linux VMs on genuine Hyper-V and surprisingly
'HV_DEPRECATING_AEOI_RECOMMENDED' is not exposed. I'm going to pass it
to WS2016/2019 and see what happens. If it all works as expected and if
you don't beat me to it I'll be sending a patch.

-- 
Vitaly



* Re: Optimized clocksource with AMD AVIC enabled for Windows guest
  2021-02-04 12:24       ` Vitaly Kuznetsov
@ 2021-02-04 13:35         ` Paolo Bonzini
  2021-02-04 15:01           ` Vitaly Kuznetsov
  0 siblings, 1 reply; 11+ messages in thread
From: Paolo Bonzini @ 2021-02-04 13:35 UTC (permalink / raw)
  To: Vitaly Kuznetsov, Kechen Lu
  Cc: suravee.suthikulpanit, Somdutta Roy, kvm, qemu-discuss

On 04/02/21 13:24, Vitaly Kuznetsov wrote:
> I checked Linux VMs on genuine Hyper-V and surprisingly
> 'HV_DEPRECATING_AEOI_RECOMMENDED' is not exposed.

Did the host have APICv/AVIC (and can Hyper-V use AVIC)?  AutoEOI is 
still a useful optimization on hosts that don't have 
hardware-accelerated EOI or interrupt injection.

Paolo



* Re: Optimized clocksource with AMD AVIC enabled for Windows guest
  2021-02-04 13:35         ` Paolo Bonzini
@ 2021-02-04 15:01           ` Vitaly Kuznetsov
  2021-02-04 15:19             ` Vitaly Kuznetsov
  0 siblings, 1 reply; 11+ messages in thread
From: Vitaly Kuznetsov @ 2021-02-04 15:01 UTC (permalink / raw)
  To: Paolo Bonzini, Kechen Lu
  Cc: suravee.suthikulpanit, Somdutta Roy, kvm, qemu-discuss

[-- Attachment #1: Type: text/plain, Size: 1034 bytes --]

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 04/02/21 13:24, Vitaly Kuznetsov wrote:
>> I checked Linux VMs on genuine Hyper-V and surprisingly
>> 'HV_DEPRECATING_AEOI_RECOMMENDED' is not exposed.
>
> Did the host have APICv/AVIC (and can Hyper-V use AVIC)?  AutoEOI is 
> still a useful optimization on hosts that don't have 
> hardware-accelerated EOI or interrupt injection.

I was under the impression that for Intel I need Ivy Bridge; I was
testing with a Xeon E5-2420 v2. I don't have an AMD host with Hyper-V
handy, so I spun up a VM on Azure with a modern enough AMD EPYC 7452,
but still no luck.

Surprisingly, Linux on KVM has had code to handle the AutoEOI
recommendation since 2017 (6c248aad81c89), so I assume it's possible to
encounter this bit in the wild.

Anyway, I've smoke tested the attached patch (poorly tested and
hackish!) on Intel/AMD and WS2016 and nothing blew up
immediately. Kechen Lu, could you give it a spin in your 
environment? No userspace changes needed (will change if we decide to go
ahead with it).

-- 
Vitaly


[-- Attachment #2: 0001-KVM-x86-Deactivate-APICv-only-when-auto_eoi-feature-.patch --]
[-- Type: text/x-patch, Size: 3624 bytes --]

From cb129501199f1f3ab6f0ade81b11eb76d08b6b5b Mon Sep 17 00:00:00 2001
From: Vitaly Kuznetsov <vkuznets@redhat.com>
Date: Thu, 4 Feb 2021 13:31:41 +0100
Subject: [PATCH] KVM: x86: Deactivate APICv only when auto_eoi feature is in
 use

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |  3 +++
 arch/x86/kvm/cpuid.c            |  5 +++++
 arch/x86/kvm/hyperv.c           | 26 ++++++++++++++++++++------
 3 files changed, 28 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3d6616f6f6ef..539fbb505d77 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -877,6 +877,9 @@ struct kvm_hv {
 	/* How many vCPUs have VP index != vCPU index */
 	atomic_t num_mismatched_vp_indexes;
 
+	/* How many SynICs use 'auto_eoi' feature */
+	atomic_t synic_auto_eoi_used;
+
 	struct hv_partition_assist_pg *hv_pa_pg;
 	struct kvm_hv_syndbg hv_syndbg;
 };
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 13036cf0b912..8df2dff37a5c 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -138,6 +138,11 @@ void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
 		(best->eax & (1 << KVM_FEATURE_PV_UNHALT)))
 		best->eax &= ~(1 << KVM_FEATURE_PV_UNHALT);
 
+	/* Dirty hack: force HV_DEPRECATING_AEOI_RECOMMENDED. Not to be merged! */
+	best = kvm_find_cpuid_entry(vcpu, HYPERV_CPUID_ENLIGHTMENT_INFO, 0);
+	if (best)
+		best->eax |= HV_DEPRECATING_AEOI_RECOMMENDED;
+
 	if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT)) {
 		best = kvm_find_cpuid_entry(vcpu, 0x1, 0);
 		if (best)
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 922c69dcca4d..7c9bc060889a 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -83,6 +83,11 @@ static bool synic_has_vector_auto_eoi(struct kvm_vcpu_hv_synic *synic,
 static void synic_update_vector(struct kvm_vcpu_hv_synic *synic,
 				int vector)
 {
+	struct kvm_vcpu *vcpu = synic_to_vcpu(synic);
+	struct kvm *kvm = vcpu->kvm;
+	struct kvm_hv *hv = &kvm->arch.hyperv;
+	int auto_eoi_old, auto_eoi_new;
+
 	if (vector < HV_SYNIC_FIRST_VALID_VECTOR)
 		return;
 
@@ -91,10 +96,25 @@ static void synic_update_vector(struct kvm_vcpu_hv_synic *synic,
 	else
 		__clear_bit(vector, synic->vec_bitmap);
 
+	auto_eoi_old = bitmap_weight(synic->auto_eoi_bitmap, 256);
+
 	if (synic_has_vector_auto_eoi(synic, vector))
 		__set_bit(vector, synic->auto_eoi_bitmap);
 	else
 		__clear_bit(vector, synic->auto_eoi_bitmap);
+
+	auto_eoi_new = bitmap_weight(synic->auto_eoi_bitmap, 256);
+
+	/* Hyper-V SynIC auto EOI SINT's are not compatible with APICV */
+	if (!auto_eoi_old && auto_eoi_new) {
+		if (atomic_inc_return(&hv->synic_auto_eoi_used) == 1)
+			kvm_request_apicv_update(vcpu->kvm, false,
+						 APICV_INHIBIT_REASON_HYPERV);
+	} else if (!auto_eoi_old && auto_eoi_new) {
+		if (atomic_dec_return(&hv->synic_auto_eoi_used) == 0)
+			kvm_request_apicv_update(vcpu->kvm, true,
+						 APICV_INHIBIT_REASON_HYPERV);
+	}
 }
 
 static int synic_set_sint(struct kvm_vcpu_hv_synic *synic, int sint,
@@ -903,12 +923,6 @@ int kvm_hv_activate_synic(struct kvm_vcpu *vcpu, bool dont_zero_synic_pages)
 {
 	struct kvm_vcpu_hv_synic *synic = vcpu_to_synic(vcpu);
 
-	/*
-	 * Hyper-V SynIC auto EOI SINT's are
-	 * not compatible with APICV, so request
-	 * to deactivate APICV permanently.
-	 */
-	kvm_request_apicv_update(vcpu->kvm, false, APICV_INHIBIT_REASON_HYPERV);
 	synic->active = true;
 	synic->dont_zero_synic_pages = dont_zero_synic_pages;
 	synic->control = HV_SYNIC_CONTROL_ENABLE;
-- 
2.29.2



* Re: Optimized clocksource with AMD AVIC enabled for Windows guest
  2021-02-04 15:01           ` Vitaly Kuznetsov
@ 2021-02-04 15:19             ` Vitaly Kuznetsov
  2021-02-05  5:38               ` Kechen Lu
  0 siblings, 1 reply; 11+ messages in thread
From: Vitaly Kuznetsov @ 2021-02-04 15:19 UTC (permalink / raw)
  To: Paolo Bonzini, Kechen Lu
  Cc: suravee.suthikulpanit, Somdutta Roy, kvm, qemu-discuss

Vitaly Kuznetsov <vkuznets@redhat.com> writes:

> +
> +	auto_eoi_new = bitmap_weight(synic->auto_eoi_bitmap, 256);
> +
> +	/* Hyper-V SynIC auto EOI SINT's are not compatible with APICV */
> +	if (!auto_eoi_old && auto_eoi_new) {
> +		if (atomic_inc_return(&hv->synic_auto_eoi_used) == 1)
> +			kvm_request_apicv_update(vcpu->kvm, false,
> +						 APICV_INHIBIT_REASON_HYPERV);
> +	} else if (!auto_eoi_old && auto_eoi_new) {

Sigh, this 'else' should be 

} else if (!auto_eoi_new && auto_eoi_old) {

...

> +		if (atomic_dec_return(&hv->synic_auto_eoi_used) == 0)
> +			kvm_request_apicv_update(vcpu->kvm, true,
> +						 APICV_INHIBIT_REASON_HYPERV);
> +	}
>  }
>  
>  static int synic_set_sint(struct kvm_vcpu_hv_synic *synic, int sint,
> @@ -903,12 +923,6 @@ int kvm_hv_activate_synic(struct kvm_vcpu *vcpu, bool dont_zero_synic_pages)
>  {
>  	struct kvm_vcpu_hv_synic *synic = vcpu_to_synic(vcpu);
>  
> -	/*
> -	 * Hyper-V SynIC auto EOI SINT's are
> -	 * not compatible with APICV, so request
> -	 * to deactivate APICV permanently.
> -	 */
> -	kvm_request_apicv_update(vcpu->kvm, false, APICV_INHIBIT_REASON_HYPERV);
>  	synic->active = true;
>  	synic->dont_zero_synic_pages = dont_zero_synic_pages;
>  	synic->control = HV_SYNIC_CONTROL_ENABLE;

-- 
Vitaly



* RE: Optimized clocksource with AMD AVIC enabled for Windows guest
  2021-02-04 15:19             ` Vitaly Kuznetsov
@ 2021-02-05  5:38               ` Kechen Lu
  2021-02-17 20:41                 ` Kechen Lu
  0 siblings, 1 reply; 11+ messages in thread
From: Kechen Lu @ 2021-02-05  5:38 UTC (permalink / raw)
  To: Vitaly Kuznetsov, Paolo Bonzini
  Cc: suravee.suthikulpanit, Somdutta Roy, kvm, qemu-discuss

Cool! Thanks for the correction; yeah, the APICV_INHIBIT_REASON_HYPERV setting is per-VM while SynIC is per-vCPU. Since the machine with AVIC is not in my hands today, I will hopefully test it by the end of this week :)

BR,
Kechen

>-----Original Message-----
>From: Vitaly Kuznetsov <vkuznets@redhat.com>
>Sent: Thursday, February 4, 2021 7:19 AM
>To: Paolo Bonzini <pbonzini@redhat.com>; Kechen Lu <kechenl@nvidia.com>
>Cc: suravee.suthikulpanit@amd.com; Somdutta Roy <somduttar@nvidia.com>;
>kvm@vger.kernel.org; qemu-discuss@nongnu.org
>Subject: Re: Optimized clocksource with AMD AVIC enabled for Windows guest
>
>Vitaly Kuznetsov <vkuznets@redhat.com> writes:
>
>> +
>> +     auto_eoi_new = bitmap_weight(synic->auto_eoi_bitmap, 256);
>> +
>> +     /* Hyper-V SynIC auto EOI SINT's are not compatible with APICV */
>> +     if (!auto_eoi_old && auto_eoi_new) {
>> +             if (atomic_inc_return(&hv->synic_auto_eoi_used) == 1)
>> +                     kvm_request_apicv_update(vcpu->kvm, false,
>> +                                              APICV_INHIBIT_REASON_HYPERV);
>> +     } else if (!auto_eoi_old && auto_eoi_new) {
>
>Sigh, this 'else' should be
>
>} else if (!auto_eoi_new && auto_eoi_old) {
>
>...
>
>> +             if (atomic_dec_return(&hv->synic_auto_eoi_used) == 0)
>> +                     kvm_request_apicv_update(vcpu->kvm, true,
>> +                                              APICV_INHIBIT_REASON_HYPERV);
>> +     }
>>  }
>>
>>  static int synic_set_sint(struct kvm_vcpu_hv_synic *synic, int sint,
>> @@ -903,12 +923,6 @@ int kvm_hv_activate_synic(struct kvm_vcpu *vcpu,
>> bool dont_zero_synic_pages)  {
>>       struct kvm_vcpu_hv_synic *synic = vcpu_to_synic(vcpu);
>>
>> -     /*
>> -      * Hyper-V SynIC auto EOI SINT's are
>> -      * not compatible with APICV, so request
>> -      * to deactivate APICV permanently.
>> -      */
>> -     kvm_request_apicv_update(vcpu->kvm, false,
>APICV_INHIBIT_REASON_HYPERV);
>>       synic->active = true;
>>       synic->dont_zero_synic_pages = dont_zero_synic_pages;
>>       synic->control = HV_SYNIC_CONTROL_ENABLE;
>
>--
>Vitaly



* RE: Optimized clocksource with AMD AVIC enabled for Windows guest
  2021-02-05  5:38               ` Kechen Lu
@ 2021-02-17 20:41                 ` Kechen Lu
  2021-02-25 10:25                   ` Vitaly Kuznetsov
  0 siblings, 1 reply; 11+ messages in thread
From: Kechen Lu @ 2021-02-17 20:41 UTC (permalink / raw)
  To: Vitaly Kuznetsov, Paolo Bonzini
  Cc: suravee.suthikulpanit, Somdutta Roy, kvm, qemu-discuss

Hi Vitaly and Paolo,

Sorry for the delayed response; I finally got a chance to access a machine with AVIC, and was able to test the patch and reconfirm it through some benchmarks and tests again today :)
 
In summary, this patch works well and resolves the high port I/O vmexits caused by the clocksource. With AVIC=1 && stimer/synic=1:
 
1.	The CPU-intensive workload CPU-z shows a 15% SingleThread score improvement: 382.1 => 441.7,
 
2.	the disk-I/O-intensive workload Passmark Disk Test gives a 4% improvement: 12706 => 13265,
 
3.	the vmexit pattern from a 30s recording while running the CPU workload Geekbench in the guest shows a dramatic 90.7% decrease in port IO vmexits, as well as fewer HLT and NPF vmexits, once we get the stimer benefit plus AVIC. Details below:
 
AVIC=1 && stimer/synic=0 && vapic=0:
 
             VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time         Avg time
 
                  io     344654    68.29%     1.10%      0.67us   2132.72us      7.01us ( +-   0.19% )
                 hlt     114046    22.60%    98.85%      0.42us  16666.32us   1903.26us ( +-   0.66% )
avic_incomplete_ipi      19679     3.90%     0.03%      0.38us     22.67us      3.66us ( +-   0.71% )
                 npf       8186     1.62%     0.01%      0.37us    235.76us      1.46us ( +-   4.20% )
            ........                      

 
AVIC=1 && stimer/synic=1 && vapic=0:
 
             VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time         Avg time
 
                  io      31995    38.61%     0.10%      2.79us     65.83us      6.70us ( +-   0.35% )
                 hlt      22915    27.65%    99.88%      0.42us  15959.14us   9535.38us ( +-   0.50% )
avic_incomplete_ipi       8271     9.98%     0.01%      0.39us     79.03us      3.58us ( +-   1.23% )
                 npf       1232     1.49%     0.00%      0.36us    100.25us      2.58us ( +-   6.98% )
	..........                                                                                                                                           

While testing, I also found that hv-vapic should be disabled as well to make AVIC fully functional; otherwise there are high vmexit counts due to MSR writes, which seem to come from increased access to HV_X64_MSR_EOI and HV_X64_MSR_ICR. This makes sense to me, since AVIC conflicts with PV EOI/ICR accesses. So far I think the AVIC=1 && hv-vapic=0 && stimer/synic=1 combination gives us the best performance. However, AVIC=1 && hv-vapic=0 && stimer/synic=1 is really unstable and sometimes fails to boot. I wanted to understand whether instabilities with APICv/AVIC are a known bug/issue upstream. I've attached the reproducible kernel warning at the bottom.
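For reference, the configuration corresponding to the combination I settled on looks roughly like the following (illustrative flags only; the exact command line will differ per setup, and in QEMU hv-synic additionally requires hv-vpindex):

```shell
# host side: load the AMD KVM module with AVIC enabled
modprobe kvm_amd avic=1

# guest side: AVIC=1 && hv-vapic=1 && stimer/synic=1
qemu-system-x86_64 -enable-kvm \
    -cpu host,hv-relaxed,hv-time,hv-vapic,hv-vpindex,hv-synic,hv-stimer \
    ...
```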
 
All in all, AVIC=1 && hv-vapic=1 && stimer/synic=1 works stably now and still produces great benefits in vmexit optimization. Thanks for all your help; I hope the kernel patch and the QEMU bit-exposure patch can get upstream soon, along with a fix for the instabilities.
 
Best Regards,
Kechen

---------------------------------------------------------------------------------------
[ 7962.437584] ------------[ cut here ]------------
[ 7962.437586] Invalid IPI target: index=2, vcpu=0, icr=0x4000000:0x82f
[ 7962.437603] WARNING: CPU: 4 PID: 7109 at arch/x86/kvm/svm/avic.c:349 avic_incomplete_ipi_interception+0x1ff/0x240 [kvm_amd]
[ 7962.437604] Modules linked in: kvm_amd ccp kvm msr nf_tables nfnetlink bridge stp llc amd64_edac_mod edac_mce_amd nls_iso8859_1 amd_energy crct10dif_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper snd_hda_codec_hdmi rapl snd_hda_intel snd_intel_dspcfg wmi_bmof snd_hda_codec snd_usb_audio snd_hda_core snd_usbmidi_lib snd_hwdep snd_seq_midi snd_seq_midi_event snd_rawmidi efi_pstore joydev mc input_leds snd_seq snd_pcm snd_seq_device snd_timer snd soundcore k10temp mac_hid sch_fq_codel lm92 parport_pc ppdev lp parport ip_tables x_tables autofs4 iavf hid_generic usbhid hid nvme crc32_pclmul i40e ahci nvme_core xhci_pci libahci xhci_pci_renesas i2c_piix4 atlantic macsec wmi [last unloaded: ccp]
[ 7962.437630] CPU: 4 PID: 7109 Comm: CPU 0/KVM Tainted: P        W  OE     5.8.0-41-generic #46
[ 7962.437633] RIP: 0010:avic_incomplete_ipi_interception+0x1ff/0x240 [kvm_amd]
[ 7962.437635] Code: 9a 00 00 00 0f 85 2b ff ff ff 41 8b 56 24 8b 4d c8 45 89 e0 44 89 ee 48 c7 c7 a8 34 50 c0 c6 05 b2 9a 00 00 01 e8 d6 cc 3a fb <0f> 0b e9 04 ff ff ff 48 8b 5d c0 8b 55 c8 be 10 03 00 00 48 89 df
[ 7962.437636] RSP: 0018:ffffa7894f9bfcc0 EFLAGS: 00010282
[ 7962.437637] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff99347f118cd8
[ 7962.437637] RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff99347f118cd0
[ 7962.437638] RBP: ffffa7894f9bfd18 R08: 0000000000000004 R09: 0000000000000831
[ 7962.437638] R10: 0000000000000000 R11: 0000000000000001 R12: 040000000000082f
[ 7962.437639] R13: 0000000000000002 R14: ffff993345653448 R15: 0000000000000002
[ 7962.437640] FS:  0000000000000000(0053) GS:ffff99347f100000(002b) knlGS:fffff80470728000
[ 7962.437640] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7962.437641] CR2: ffff8006ace2b000 CR3: 0000000febd88000 CR4: 0000000000340ee0
[ 7962.437641] Call Trace:
[ 7962.437646]  handle_exit+0x134/0x420 [kvm_amd]
[ 7962.437661]  ? kvm_set_cr8+0x22/0x40 [kvm]
[ 7962.437674]  vcpu_enter_guest+0x862/0xd90 [kvm]
[ 7962.437687]  vcpu_run+0x76/0x240 [kvm]
[ 7962.437699]  kvm_arch_vcpu_ioctl_run+0x9f/0x2b0 [kvm]
[ 7962.437711]  kvm_vcpu_ioctl+0x247/0x600 [kvm]
[ 7962.437714]  ksys_ioctl+0x8e/0xc0
[ 7962.437715]  __x64_sys_ioctl+0x1a/0x20
[ 7962.437717]  do_syscall_64+0x49/0xc0
[ 7962.437719]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 7962.437720] RIP: 0033:0x7f4c09b1131b
[ 7962.437721] Code: 89 d8 49 8d 3c 1c 48 f7 d8 49 39 c4 72 b5 e8 1c ff ff ff 85 c0 78 ba 4c 89 e0 5b 5d 41 5c c3 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1d 3b 0d 00 f7 d8 64 89 01 48
[ 7962.437721] RSP: 002b:00007f4bedffa4a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 7962.437722] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f4c09b1131b
[ 7962.437723] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000015
[ 7962.437723] RBP: 0000563c35a94990 R08: 0000563c33b95a30 R09: 0000000000000004
[ 7962.437724] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 7962.437724] R13: 0000563c34196d00 R14: 0000000000000000 R15: 00007f4bedffb640
[ 7962.437726] ---[ end trace 7f0f339c3a001d7b ]---



* RE: Optimized clocksource with AMD AVIC enabled for Windows guest
  2021-02-17 20:41                 ` Kechen Lu
@ 2021-02-25 10:25                   ` Vitaly Kuznetsov
  0 siblings, 0 replies; 11+ messages in thread
From: Vitaly Kuznetsov @ 2021-02-25 10:25 UTC (permalink / raw)
  To: Kechen Lu
  Cc: suravee.suthikulpanit, Somdutta Roy, kvm, qemu-discuss, Paolo Bonzini

Kechen Lu <kechenl@nvidia.com> writes:

> Hi Vitaly and Paolo,
>
> Sorry for the delay in response, finally got chance to access a machine with AVIC, and was able to test out the patch and reconfirm through some benchmarks and tests again today:) 
>  
> In summary, this patch works well and resolves the issues on clocksource caused high port I/O vmexits. With AVIC=1 && stimer/synic=1, 
>  
> 1.	CPU intensive workload CPU-z shows SingleThread score 15% improvement 382.1=> 441.7,    
>  
> 2.	disk I/O intensive workload Passmark Disk Test gives 4% improvement 12706=> 13265,              
>  
> 3.	Vmexits pattern of 30s record while running cpu workload Geekbench in guest showing dramatic 90.7% decrease on port IO vmexits, so as the HLT and NPF vmexits, when we get stimer benefit plus AVIC. Details as below:       
>  
> AVIC=1 && stimer/synic=0 && vapic=0:
>  
>              VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time         Avg time
>  
>                   io     344654    68.29%     1.10%      0.67us   2132.72us      7.01us ( +-   0.19% )
>                  hlt     114046    22.60%    98.85%      0.42us  16666.32us   1903.26us ( +-   0.66% )
> avic_incomplete_ipi      19679     3.90%     0.03%      0.38us     22.67us      3.66us ( +-   0.71% )
>                  npf       8186     1.62%     0.01%      0.37us    235.76us      1.46us ( +-   4.20% )
>             ........                      
>
>  
> AVIC=1 && stimer/synic=1 && vapic=0:
>  
>              VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time         Avg time
>  
>                   io      31995    38.61%     0.10%      2.79us     65.83us      6.70us ( +-   0.35% )
>                  hlt      22915    27.65%    99.88%      0.42us  15959.14us   9535.38us ( +-   0.50% )
> avic_incomplete_ipi       8271     9.98%     0.01%      0.39us     79.03us      3.58us ( +-   1.23% )
>                  npf       1232     1.49%     0.00%      0.36us    100.25us      2.58us ( +-   6.98% )
> 	..........                                                                                                                                           
>
> While testing, I also found out that hv-vapic should be disabled as
> well to make AVIC fully functional; otherwise there is a high rate of
> MSR-write vmexits, apparently from increased accesses to HV_X64_MSR_EOI
> and HV_X64_MSR_ICR. This makes sense to me, since AVIC conflicts with
> PV EOI/ICR accesses. So far, the AVIC=1 && hv-vapic=0 &&
> stimer/synic=1 combination gives us the best performance. However, it
> is really unstable and the guest sometimes fails to boot. I wanted to
> understand whether this instability with APICv/AVIC is a known
> bug/issue upstream. The reproducible kernel warning is attached at
> the bottom.

Now it's my turn to apologize for the delayed reply :-)

I think it's our fault: BIT(3) in HYPERV_CPUID_ENLIGHTMENT_INFO is
HV_X64_APIC_ACCESS_RECOMMENDED, which can be deciphered as

"Recommend using MSRs for accessing APIC registers EOI, ICR and TPR
rather than their memory-mapped counterparts"

And we shouldn't be setting it with AVIC. The following hack is supposed
to help:

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index c8f2592ccc99..66ee85a83e9a 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -145,6 +145,13 @@ void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
                                           vcpu->arch.ia32_misc_enable_msr &
                                           MSR_IA32_MISC_ENABLE_MWAIT);
        }
+
+       /* Dirty hack: force HV_DEPRECATING_AEOI_RECOMMENDED. Not to be merged! */
+       best = kvm_find_cpuid_entry(vcpu, HYPERV_CPUID_ENLIGHTMENT_INFO, 0);
+       if (best) {
+               best->eax &= ~HV_X64_APIC_ACCESS_RECOMMENDED;
+               best->eax |= HV_DEPRECATING_AEOI_RECOMMENDED;
+       }
 }
 EXPORT_SYMBOL_GPL(kvm_update_cpuid_runtime);

(we'll need to find a proper way to set these settings in QEMU).
 
Could you give it a spin? ("AVIC=1 && hv-vapic=1 && stimer/synic=1" configuration)
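If it helps, the configuration we've been discussing roughly corresponds to something like the sketch below; the hv-* names are QEMU's Hyper-V enlightenment CPU flags, but the exact set and syntax are assumptions to adapt to your QEMU version:

```shell
# Host side: load kvm_amd with AVIC enabled (module parameter, AMD only).
modprobe kvm_amd avic=1

# Guest side: "AVIC=1 && hv-vapic=1 && stimer/synic=1" maps roughly to:
qemu-system-x86_64 \
    -enable-kvm \
    -cpu host,hv-vapic=on,hv-synic=on,hv-stimer=on,hv-time=on \
    ...
```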

>  
> In all, AVIC=1 && hv-vapic=1 && stimer/synic=1 now works stably and still produces great vmexit reductions. Thanks so much for all your help; I hope the kernel patch and the QEMU bit-exposure patch can get upstream soon, along with the fixes for the instabilities.
>  
> Best Regards,
> Kechen
>
> ---------------------------------------------------------------------------------------
> [ 7962.437584] ------------[ cut here ]------------
> [ 7962.437586] Invalid IPI target: index=2, vcpu=0, icr=0x4000000:0x82f
> [ 7962.437603] WARNING: CPU: 4 PID: 7109 at arch/x86/kvm/svm/avic.c:349 avic_incomplete_ipi_interception+0x1ff/0x240 [kvm_amd]
> [ 7962.437604] Modules linked in: kvm_amd ccp kvm msr nf_tables nfnetlink bridge stp llc amd64_edac_mod edac_mce_amd nls_iso8859_1 amd_energy crct10dif_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper snd_hda_codec_hdmi rapl snd_hda_intel snd_intel_dspcfg wmi_bmof snd_hda_codec snd_usb_audio snd_hda_core snd_usbmidi_lib snd_hwdep snd_seq_midi snd_seq_midi_event snd_rawmidi efi_pstore joydev mc input_leds snd_seq snd_pcm snd_seq_device snd_timer snd soundcore k10temp mac_hid sch_fq_codel lm92 parport_pc ppdev lp parport ip_tables x_tables autofs4 iavf hid_generic usbhid hid nvme crc32_pclmul i40e ahci nvme_core xhci_pci libahci xhci_pci_renesas i2c_piix4 atlantic macsec wmi [last unloaded: ccp]
> [ 7962.437630] CPU: 4 PID: 7109 Comm: CPU 0/KVM Tainted: P        W  OE     5.8.0-41-generic #46
> [ 7962.437633] RIP: 0010:avic_incomplete_ipi_interception+0x1ff/0x240 [kvm_amd]

No, this is not something I'm aware of. Do you know if it reproduces on
the latest upstream?

> [ 7962.437635] Code: 9a 00 00 00 0f 85 2b ff ff ff 41 8b 56 24 8b 4d c8 45 89 e0 44 89 ee 48 c7 c7 a8 34 50 c0 c6 05 b2 9a 00 00 01 e8 d6 cc 3a fb <0f> 0b e9 04 ff ff ff 48 8b 5d c0 8b 55 c8 be 10 03 00 00 48 89 df
> [ 7962.437636] RSP: 0018:ffffa7894f9bfcc0 EFLAGS: 00010282
> [ 7962.437637] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff99347f118cd8
> [ 7962.437637] RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff99347f118cd0
> [ 7962.437638] RBP: ffffa7894f9bfd18 R08: 0000000000000004 R09: 0000000000000831
> [ 7962.437638] R10: 0000000000000000 R11: 0000000000000001 R12: 040000000000082f
> [ 7962.437639] R13: 0000000000000002 R14: ffff993345653448 R15: 0000000000000002
> [ 7962.437640] FS:  0000000000000000(0053) GS:ffff99347f100000(002b) knlGS:fffff80470728000
> [ 7962.437640] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 7962.437641] CR2: ffff8006ace2b000 CR3: 0000000febd88000 CR4: 0000000000340ee0
> [ 7962.437641] Call Trace:
> [ 7962.437646]  handle_exit+0x134/0x420 [kvm_amd]
> [ 7962.437661]  ? kvm_set_cr8+0x22/0x40 [kvm]
> [ 7962.437674]  vcpu_enter_guest+0x862/0xd90 [kvm]
> [ 7962.437687]  vcpu_run+0x76/0x240 [kvm]
> [ 7962.437699]  kvm_arch_vcpu_ioctl_run+0x9f/0x2b0 [kvm]
> [ 7962.437711]  kvm_vcpu_ioctl+0x247/0x600 [kvm]
> [ 7962.437714]  ksys_ioctl+0x8e/0xc0
> [ 7962.437715]  __x64_sys_ioctl+0x1a/0x20
> [ 7962.437717]  do_syscall_64+0x49/0xc0
> [ 7962.437719]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 7962.437720] RIP: 0033:0x7f4c09b1131b
> [ 7962.437721] Code: 89 d8 49 8d 3c 1c 48 f7 d8 49 39 c4 72 b5 e8 1c ff ff ff 85 c0 78 ba 4c 89 e0 5b 5d 41 5c c3 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1d 3b 0d 00 f7 d8 64 89 01 48
> [ 7962.437721] RSP: 002b:00007f4bedffa4a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> [ 7962.437722] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f4c09b1131b
> [ 7962.437723] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000015
> [ 7962.437723] RBP: 0000563c35a94990 R08: 0000563c33b95a30 R09: 0000000000000004
> [ 7962.437724] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> [ 7962.437724] R13: 0000563c34196d00 R14: 0000000000000000 R15: 00007f4bedffb640
> [ 7962.437726] ---[ end trace 7f0f339c3a001d7b ]---
>

-- 
Vitaly



Thread overview: 11+ messages
2021-02-03  6:40 Optimized clocksource with AMD AVIC enabled for Windows guest Kechen Lu
2021-02-03  7:58 ` Paolo Bonzini
2021-02-03  9:15   ` Vitaly Kuznetsov
2021-02-04  2:05     ` Kechen Lu
2021-02-04 12:24       ` Vitaly Kuznetsov
2021-02-04 13:35         ` Paolo Bonzini
2021-02-04 15:01           ` Vitaly Kuznetsov
2021-02-04 15:19             ` Vitaly Kuznetsov
2021-02-05  5:38               ` Kechen Lu
2021-02-17 20:41                 ` Kechen Lu
2021-02-25 10:25                   ` Vitaly Kuznetsov
