All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3] kvm:vmx: more complete state update on APICv on/off
@ 2016-05-18 14:48 Roman Kagan
  2016-05-18 21:49 ` Steve Rutherford
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Roman Kagan @ 2016-05-18 14:48 UTC (permalink / raw)
  To: kvm
  Cc: Paolo Bonzini, Denis V. Lunev, Roman Kagan, Steve Rutherford, Yang Zhang

The function to update APICv on/off state (in particular, to deactivate
it when enabling Hyper-V SynIC), used to be incomplete: it didn't adjust
APICv-related fields among secondary processor-based VM-execution
controls.

As a result, Windows 2012 guests would get stuck when SynIC-based
auto-EOI interrupt intersected with e.g. an IPI in the guest.

In addition, the MSR intercept bitmap wasn't updated to correspond to
whether "virtualize x2APIC mode" was enabled.  This path used not to be
triggered, since Windows didn't use x2APIC but rather their own
synthetic APIC access MSRs; however it represented a security risk
because the guest running in a SynIC-enabled VM could switch to x2APIC
and thus obtain direct access to host APIC MSRs (thanks to Yang Zhang
<yang.zhang.wz@gmail.com> for spotting this).

The patch fixes those omissions.

Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
Cc: Steve Rutherford <srutherford@google.com>
Cc: Yang Zhang <yang.zhang.wz@gmail.com>
---
v2 -> v3:
 - only switch to x2apic msr bitmap if virtualize x2apic mode is on in vmcs

v1 -> v2:
 - only update relevant bits in the secondary exec control
 - update msr intercept bitmap (also make x2apic msr bitmap always
   correspond to APICv)

 arch/x86/kvm/vmx.c | 48 ++++++++++++++++++++++++++++++------------------
 1 file changed, 30 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index ee1c8a9..cef741a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2418,7 +2418,9 @@ static void vmx_set_msr_bitmap(struct kvm_vcpu *vcpu)
 
 	if (is_guest_mode(vcpu))
 		msr_bitmap = vmx_msr_bitmap_nested;
-	else if (vcpu->arch.apic_base & X2APIC_ENABLE) {
+	else if (cpu_has_secondary_exec_ctrls() &&
+		 (vmcs_read32(SECONDARY_VM_EXEC_CONTROL) &
+		  SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE)) {
 		if (is_long_mode(vcpu))
 			msr_bitmap = vmx_msr_bitmap_longmode_x2apic;
 		else
@@ -4783,6 +4785,19 @@ static void vmx_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 
 	vmcs_write32(PIN_BASED_VM_EXEC_CONTROL, vmx_pin_based_exec_ctrl(vmx));
+	if (cpu_has_secondary_exec_ctrls()) {
+		if (kvm_vcpu_apicv_active(vcpu))
+			vmcs_set_bits(SECONDARY_VM_EXEC_CONTROL,
+				      SECONDARY_EXEC_APIC_REGISTER_VIRT |
+				      SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY);
+		else
+			vmcs_clear_bits(SECONDARY_VM_EXEC_CONTROL,
+					SECONDARY_EXEC_APIC_REGISTER_VIRT |
+					SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY);
+	}
+
+	if (cpu_has_vmx_msr_bitmap())
+		vmx_set_msr_bitmap(vcpu);
 }
 
 static u32 vmx_exec_control(struct vcpu_vmx *vmx)
@@ -6329,23 +6344,20 @@ static __init int hardware_setup(void)
 
 	set_bit(0, vmx_vpid_bitmap); /* 0 is reserved for host */
 
-	if (enable_apicv) {
-		for (msr = 0x800; msr <= 0x8ff; msr++)
-			vmx_disable_intercept_msr_read_x2apic(msr);
-
-		/* According SDM, in x2apic mode, the whole id reg is used.
-		 * But in KVM, it only use the highest eight bits. Need to
-		 * intercept it */
-		vmx_enable_intercept_msr_read_x2apic(0x802);
-		/* TMCCT */
-		vmx_enable_intercept_msr_read_x2apic(0x839);
-		/* TPR */
-		vmx_disable_intercept_msr_write_x2apic(0x808);
-		/* EOI */
-		vmx_disable_intercept_msr_write_x2apic(0x80b);
-		/* SELF-IPI */
-		vmx_disable_intercept_msr_write_x2apic(0x83f);
-	}
+	for (msr = 0x800; msr <= 0x8ff; msr++)
+		vmx_disable_intercept_msr_read_x2apic(msr);
+
+	/* According SDM, in x2apic mode, the whole id reg is used.  But in
+	 * KVM, it only use the highest eight bits. Need to intercept it */
+	vmx_enable_intercept_msr_read_x2apic(0x802);
+	/* TMCCT */
+	vmx_enable_intercept_msr_read_x2apic(0x839);
+	/* TPR */
+	vmx_disable_intercept_msr_write_x2apic(0x808);
+	/* EOI */
+	vmx_disable_intercept_msr_write_x2apic(0x80b);
+	/* SELF-IPI */
+	vmx_disable_intercept_msr_write_x2apic(0x83f);
 
 	if (enable_ept) {
 		kvm_mmu_set_mask_ptes(0ull,
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v3] kvm:vmx: more complete state update on APICv on/off
  2016-05-18 14:48 [PATCH v3] kvm:vmx: more complete state update on APICv on/off Roman Kagan
@ 2016-05-18 21:49 ` Steve Rutherford
  2016-05-19  9:34   ` Roman Kagan
  2016-05-19  1:38 ` Yang Zhang
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 17+ messages in thread
From: Steve Rutherford @ 2016-05-18 21:49 UTC (permalink / raw)
  To: Roman Kagan; +Cc: KVM list, Paolo Bonzini, Denis V. Lunev, Yang Zhang

This patch looks good.

I don't believe you need to explicitly check virtualize x2apic mode,
given that `vcpu->arch.apic_base & X2APIC_ENABLE &&
kvm_vcpu_apicv_active(vcpu)` implies that virtualize x2apic mode is
enabled (because KVM currently never re-enables apicv after disabling
it with the SyncIC), but being explicit is probably easier to
maintain.

On Wed, May 18, 2016 at 7:48 AM, Roman Kagan <rkagan@virtuozzo.com> wrote:
> The function to update APICv on/off state (in particular, to deactivate
> it when enabling Hyper-V SynIC), used to be incomplete: it didn't adjust
> APICv-related fields among secondary processor-based VM-execution
> controls.
>
> As a result, Windows 2012 guests would get stuck when SynIC-based
> auto-EOI interrupt intersected with e.g. an IPI in the guest.
>
> In addition, the MSR intercept bitmap wasn't updated to correspond to
> whether "virtualize x2APIC mode" was enabled.  This path used not to be
> triggered, since Windows didn't use x2APIC but rather their own
> synthetic APIC access MSRs; however it represented a security risk
> because the guest running in a SynIC-enabled VM could switch to x2APIC
> and thus obtain direct access to host APIC MSRs (thanks to Yang Zhang
> <yang.zhang.wz@gmail.com> for spotting this).
>
> The patch fixes those omissions.
>
> Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
> Cc: Steve Rutherford <srutherford@google.com>
> Cc: Yang Zhang <yang.zhang.wz@gmail.com>
> ---
> v2 -> v3:
>  - only switch to x2apic msr bitmap if virtualize x2apic mode is on in vmcs
>
> v1 -> v2:
>  - only update relevant bits in the secondary exec control
>  - update msr intercept bitmap (also make x2apic msr bitmap always
>    correspond to APICv)
>
>  arch/x86/kvm/vmx.c | 48 ++++++++++++++++++++++++++++++------------------
>  1 file changed, 30 insertions(+), 18 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index ee1c8a9..cef741a 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -2418,7 +2418,9 @@ static void vmx_set_msr_bitmap(struct kvm_vcpu *vcpu)
>
>         if (is_guest_mode(vcpu))
>                 msr_bitmap = vmx_msr_bitmap_nested;
> -       else if (vcpu->arch.apic_base & X2APIC_ENABLE) {
> +       else if (cpu_has_secondary_exec_ctrls() &&
> +                (vmcs_read32(SECONDARY_VM_EXEC_CONTROL) &
> +                 SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE)) {
>                 if (is_long_mode(vcpu))
>                         msr_bitmap = vmx_msr_bitmap_longmode_x2apic;
>                 else
> @@ -4783,6 +4785,19 @@ static void vmx_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
>         struct vcpu_vmx *vmx = to_vmx(vcpu);
>
>         vmcs_write32(PIN_BASED_VM_EXEC_CONTROL, vmx_pin_based_exec_ctrl(vmx));
> +       if (cpu_has_secondary_exec_ctrls()) {
> +               if (kvm_vcpu_apicv_active(vcpu))
> +                       vmcs_set_bits(SECONDARY_VM_EXEC_CONTROL,
> +                                     SECONDARY_EXEC_APIC_REGISTER_VIRT |
> +                                     SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY);
> +               else
> +                       vmcs_clear_bits(SECONDARY_VM_EXEC_CONTROL,
> +                                       SECONDARY_EXEC_APIC_REGISTER_VIRT |
> +                                       SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY);
> +       }
> +
> +       if (cpu_has_vmx_msr_bitmap())
> +               vmx_set_msr_bitmap(vcpu);
>  }
>
>  static u32 vmx_exec_control(struct vcpu_vmx *vmx)
> @@ -6329,23 +6344,20 @@ static __init int hardware_setup(void)
>
>         set_bit(0, vmx_vpid_bitmap); /* 0 is reserved for host */
>
> -       if (enable_apicv) {
> -               for (msr = 0x800; msr <= 0x8ff; msr++)
> -                       vmx_disable_intercept_msr_read_x2apic(msr);
> -
> -               /* According SDM, in x2apic mode, the whole id reg is used.
> -                * But in KVM, it only use the highest eight bits. Need to
> -                * intercept it */
> -               vmx_enable_intercept_msr_read_x2apic(0x802);
> -               /* TMCCT */
> -               vmx_enable_intercept_msr_read_x2apic(0x839);
> -               /* TPR */
> -               vmx_disable_intercept_msr_write_x2apic(0x808);
> -               /* EOI */
> -               vmx_disable_intercept_msr_write_x2apic(0x80b);
> -               /* SELF-IPI */
> -               vmx_disable_intercept_msr_write_x2apic(0x83f);
> -       }
> +       for (msr = 0x800; msr <= 0x8ff; msr++)
> +               vmx_disable_intercept_msr_read_x2apic(msr);
> +
> +       /* According SDM, in x2apic mode, the whole id reg is used.  But in
> +        * KVM, it only use the highest eight bits. Need to intercept it */
> +       vmx_enable_intercept_msr_read_x2apic(0x802);
> +       /* TMCCT */
> +       vmx_enable_intercept_msr_read_x2apic(0x839);
> +       /* TPR */
> +       vmx_disable_intercept_msr_write_x2apic(0x808);
> +       /* EOI */
> +       vmx_disable_intercept_msr_write_x2apic(0x80b);
> +       /* SELF-IPI */
> +       vmx_disable_intercept_msr_write_x2apic(0x83f);
>
>         if (enable_ept) {
>                 kvm_mmu_set_mask_ptes(0ull,
> --
> 2.5.5
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3] kvm:vmx: more complete state update on APICv on/off
  2016-05-18 14:48 [PATCH v3] kvm:vmx: more complete state update on APICv on/off Roman Kagan
  2016-05-18 21:49 ` Steve Rutherford
@ 2016-05-19  1:38 ` Yang Zhang
  2016-05-19  9:29   ` Roman Kagan
  2016-05-19  2:01 ` Yang Zhang
  2016-05-23 14:31 ` Paolo Bonzini
  3 siblings, 1 reply; 17+ messages in thread
From: Yang Zhang @ 2016-05-19  1:38 UTC (permalink / raw)
  To: Roman Kagan, kvm; +Cc: Paolo Bonzini, Denis V. Lunev, Steve Rutherford

On 2016/5/18 22:48, Roman Kagan wrote:
> The function to update APICv on/off state (in particular, to deactivate
> it when enabling Hyper-V SynIC), used to be incomplete: it didn't adjust
> APICv-related fields among secondary processor-based VM-execution
> controls.
>
> As a result, Windows 2012 guests would get stuck when SynIC-based
> auto-EOI interrupt intersected with e.g. an IPI in the guest.
>
> In addition, the MSR intercept bitmap wasn't updated to correspond to
> whether "virtualize x2APIC mode" was enabled.  This path used not to be
> triggered, since Windows didn't use x2APIC but rather their own
> synthetic APIC access MSRs; however it represented a security risk
> because the guest running in a SynIC-enabled VM could switch to x2APIC
> and thus obtain direct access to host APIC MSRs (thanks to Yang Zhang
> <yang.zhang.wz@gmail.com> for spotting this).
>
> The patch fixes those omissions.
>
> Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
> Cc: Steve Rutherford <srutherford@google.com>
> Cc: Yang Zhang <yang.zhang.wz@gmail.com>
> ---
> v2 -> v3:
>   - only switch to x2apic msr bitmap if virtualize x2apic mode is on in vmcs
>
> v1 -> v2:
>   - only update relevant bits in the secondary exec control
>   - update msr intercept bitmap (also make x2apic msr bitmap always
>     correspond to APICv)
>
>   arch/x86/kvm/vmx.c | 48 ++++++++++++++++++++++++++++++------------------
>   1 file changed, 30 insertions(+), 18 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index ee1c8a9..cef741a 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -2418,7 +2418,9 @@ static void vmx_set_msr_bitmap(struct kvm_vcpu *vcpu)
>
>   	if (is_guest_mode(vcpu))
>   		msr_bitmap = vmx_msr_bitmap_nested;
> -	else if (vcpu->arch.apic_base & X2APIC_ENABLE) {
> +	else if (cpu_has_secondary_exec_ctrls() &&
> +		 (vmcs_read32(SECONDARY_VM_EXEC_CONTROL) &
> +		  SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE)) {
>   		if (is_long_mode(vcpu))
>   			msr_bitmap = vmx_msr_bitmap_longmode_x2apic;
>   		else
> @@ -4783,6 +4785,19 @@ static void vmx_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
>   	struct vcpu_vmx *vmx = to_vmx(vcpu);
>
>   	vmcs_write32(PIN_BASED_VM_EXEC_CONTROL, vmx_pin_based_exec_ctrl(vmx));
> +	if (cpu_has_secondary_exec_ctrls()) {
> +		if (kvm_vcpu_apicv_active(vcpu))
> +			vmcs_set_bits(SECONDARY_VM_EXEC_CONTROL,
> +				      SECONDARY_EXEC_APIC_REGISTER_VIRT |
> +				      SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY);
> +		else
> +			vmcs_clear_bits(SECONDARY_VM_EXEC_CONTROL,
> +					SECONDARY_EXEC_APIC_REGISTER_VIRT |
> +					SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY);
> +	}
> +
> +	if (cpu_has_vmx_msr_bitmap())
> +		vmx_set_msr_bitmap(vcpu);
>   }
>
>   static u32 vmx_exec_control(struct vcpu_vmx *vmx)
> @@ -6329,23 +6344,20 @@ static __init int hardware_setup(void)
>
>   	set_bit(0, vmx_vpid_bitmap); /* 0 is reserved for host */
>
> -	if (enable_apicv) {
> -		for (msr = 0x800; msr <= 0x8ff; msr++)
> -			vmx_disable_intercept_msr_read_x2apic(msr);
> -
> -		/* According SDM, in x2apic mode, the whole id reg is used.
> -		 * But in KVM, it only use the highest eight bits. Need to
> -		 * intercept it */
> -		vmx_enable_intercept_msr_read_x2apic(0x802);
> -		/* TMCCT */
> -		vmx_enable_intercept_msr_read_x2apic(0x839);
> -		/* TPR */
> -		vmx_disable_intercept_msr_write_x2apic(0x808);
> -		/* EOI */
> -		vmx_disable_intercept_msr_write_x2apic(0x80b);
> -		/* SELF-IPI */
> -		vmx_disable_intercept_msr_write_x2apic(0x83f);
> -	}
> +	for (msr = 0x800; msr <= 0x8ff; msr++)
> +		vmx_disable_intercept_msr_read_x2apic(msr);
> +
> +	/* According SDM, in x2apic mode, the whole id reg is used.  But in
> +	 * KVM, it only use the highest eight bits. Need to intercept it */
> +	vmx_enable_intercept_msr_read_x2apic(0x802);
> +	/* TMCCT */
> +	vmx_enable_intercept_msr_read_x2apic(0x839);
> +	/* TPR */
> +	vmx_disable_intercept_msr_write_x2apic(0x808);
> +	/* EOI */
> +	vmx_disable_intercept_msr_write_x2apic(0x80b);
> +	/* SELF-IPI */
> +	vmx_disable_intercept_msr_write_x2apic(0x83f);

In current KVM, it will enable virtualize x2apic mode only when 
enable_apicv is enabled. So this change seems unnecessary.

Except this minor comment, the other part is :

Reviewed-by: Yang Zhang <yang.zhang.wz@gmail.com>

-- 
best regards
yang

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3] kvm:vmx: more complete state update on APICv on/off
  2016-05-18 14:48 [PATCH v3] kvm:vmx: more complete state update on APICv on/off Roman Kagan
  2016-05-18 21:49 ` Steve Rutherford
  2016-05-19  1:38 ` Yang Zhang
@ 2016-05-19  2:01 ` Yang Zhang
  2016-05-19  5:40   ` Denis V. Lunev
  2016-05-23 14:31 ` Paolo Bonzini
  3 siblings, 1 reply; 17+ messages in thread
From: Yang Zhang @ 2016-05-19  2:01 UTC (permalink / raw)
  To: Roman Kagan, kvm; +Cc: Paolo Bonzini, Denis V. Lunev, Steve Rutherford

On 2016/5/18 22:48, Roman Kagan wrote:
> The function to update APICv on/off state (in particular, to deactivate
> it when enabling Hyper-V SynIC), used to be incomplete: it didn't adjust
> APICv-related fields among secondary processor-based VM-execution
> controls.

Hi Roman,

I have question about the performance between APICv and Hyper-V SynIC. 
As we known APICv is a hardware feature which including three features: 
APIC register virtualization, virtual interrupt delivery and Posted 
Interrupt. My gut feeling is that the average performance that improved 
by APICv should greater than Hyper-v SynIC. Am i right? If yes, current 
policy that disable the whole APICv seems too aggressive.

btw, do you have any performance data, not micro-level? Thanks.

>
> As a result, Windows 2012 guests would get stuck when SynIC-based
> auto-EOI interrupt intersected with e.g. an IPI in the guest.
>
> In addition, the MSR intercept bitmap wasn't updated to correspond to
> whether "virtualize x2APIC mode" was enabled.  This path used not to be
> triggered, since Windows didn't use x2APIC but rather their own
> synthetic APIC access MSRs; however it represented a security risk
> because the guest running in a SynIC-enabled VM could switch to x2APIC
> and thus obtain direct access to host APIC MSRs (thanks to Yang Zhang
> <yang.zhang.wz@gmail.com> for spotting this).
>
> The patch fixes those omissions.
>
> Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
> Cc: Steve Rutherford <srutherford@google.com>
> Cc: Yang Zhang <yang.zhang.wz@gmail.com>
> ---
> v2 -> v3:
>   - only switch to x2apic msr bitmap if virtualize x2apic mode is on in vmcs
>
> v1 -> v2:
>   - only update relevant bits in the secondary exec control
>   - update msr intercept bitmap (also make x2apic msr bitmap always
>     correspond to APICv)
>
>   arch/x86/kvm/vmx.c | 48 ++++++++++++++++++++++++++++++------------------
>   1 file changed, 30 insertions(+), 18 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index ee1c8a9..cef741a 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -2418,7 +2418,9 @@ static void vmx_set_msr_bitmap(struct kvm_vcpu *vcpu)
>
>   	if (is_guest_mode(vcpu))
>   		msr_bitmap = vmx_msr_bitmap_nested;
> -	else if (vcpu->arch.apic_base & X2APIC_ENABLE) {
> +	else if (cpu_has_secondary_exec_ctrls() &&
> +		 (vmcs_read32(SECONDARY_VM_EXEC_CONTROL) &
> +		  SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE)) {
>   		if (is_long_mode(vcpu))
>   			msr_bitmap = vmx_msr_bitmap_longmode_x2apic;
>   		else
> @@ -4783,6 +4785,19 @@ static void vmx_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
>   	struct vcpu_vmx *vmx = to_vmx(vcpu);
>
>   	vmcs_write32(PIN_BASED_VM_EXEC_CONTROL, vmx_pin_based_exec_ctrl(vmx));
> +	if (cpu_has_secondary_exec_ctrls()) {
> +		if (kvm_vcpu_apicv_active(vcpu))
> +			vmcs_set_bits(SECONDARY_VM_EXEC_CONTROL,
> +				      SECONDARY_EXEC_APIC_REGISTER_VIRT |
> +				      SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY);
> +		else
> +			vmcs_clear_bits(SECONDARY_VM_EXEC_CONTROL,
> +					SECONDARY_EXEC_APIC_REGISTER_VIRT |
> +					SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY);
> +	}
> +
> +	if (cpu_has_vmx_msr_bitmap())
> +		vmx_set_msr_bitmap(vcpu);
>   }
>
>   static u32 vmx_exec_control(struct vcpu_vmx *vmx)
> @@ -6329,23 +6344,20 @@ static __init int hardware_setup(void)
>
>   	set_bit(0, vmx_vpid_bitmap); /* 0 is reserved for host */
>
> -	if (enable_apicv) {
> -		for (msr = 0x800; msr <= 0x8ff; msr++)
> -			vmx_disable_intercept_msr_read_x2apic(msr);
> -
> -		/* According SDM, in x2apic mode, the whole id reg is used.
> -		 * But in KVM, it only use the highest eight bits. Need to
> -		 * intercept it */
> -		vmx_enable_intercept_msr_read_x2apic(0x802);
> -		/* TMCCT */
> -		vmx_enable_intercept_msr_read_x2apic(0x839);
> -		/* TPR */
> -		vmx_disable_intercept_msr_write_x2apic(0x808);
> -		/* EOI */
> -		vmx_disable_intercept_msr_write_x2apic(0x80b);
> -		/* SELF-IPI */
> -		vmx_disable_intercept_msr_write_x2apic(0x83f);
> -	}
> +	for (msr = 0x800; msr <= 0x8ff; msr++)
> +		vmx_disable_intercept_msr_read_x2apic(msr);
> +
> +	/* According SDM, in x2apic mode, the whole id reg is used.  But in
> +	 * KVM, it only use the highest eight bits. Need to intercept it */
> +	vmx_enable_intercept_msr_read_x2apic(0x802);
> +	/* TMCCT */
> +	vmx_enable_intercept_msr_read_x2apic(0x839);
> +	/* TPR */
> +	vmx_disable_intercept_msr_write_x2apic(0x808);
> +	/* EOI */
> +	vmx_disable_intercept_msr_write_x2apic(0x80b);
> +	/* SELF-IPI */
> +	vmx_disable_intercept_msr_write_x2apic(0x83f);
>
>   	if (enable_ept) {
>   		kvm_mmu_set_mask_ptes(0ull,
>


-- 
best regards
yang

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3] kvm:vmx: more complete state update on APICv on/off
  2016-05-19  2:01 ` Yang Zhang
@ 2016-05-19  5:40   ` Denis V. Lunev
  2016-05-20  1:15     ` Yang Zhang
  0 siblings, 1 reply; 17+ messages in thread
From: Denis V. Lunev @ 2016-05-19  5:40 UTC (permalink / raw)
  To: Yang Zhang, Roman Kagan, kvm; +Cc: Paolo Bonzini, Steve Rutherford

On 05/19/2016 05:01 AM, Yang Zhang wrote:
> On 2016/5/18 22:48, Roman Kagan wrote:
>> The function to update APICv on/off state (in particular, to deactivate
>> it when enabling Hyper-V SynIC), used to be incomplete: it didn't adjust
>> APICv-related fields among secondary processor-based VM-execution
>> controls.
>
> Hi Roman,
>
> I have question about the performance between APICv and Hyper-V SynIC. 
> As we known APICv is a hardware feature which including three 
> features: APIC register virtualization, virtual interrupt delivery and 
> Posted Interrupt. My gut feeling is that the average performance that 
> improved by APICv should greater than Hyper-v SynIC. Am i right? If 
> yes, current policy that disable the whole APICv seems too aggressive.
>
Argh.. We have faced this situation in Parallels Desktop may be
3 years ago. Unfortunately, there is no data at the moment.
It was toooo old and made by other team. As far as I remember
(for that time), interrupt delivery becomes faster, but operations
with on of CR registers becomes much slower and general
performance score becomes lower.

The problem with SynIC is that it is mandatory prerequisite
to enable HyperV bus in the guest, which is our final goal.
Thus there is no other way for us.

> btw, do you have any performance data, not micro-level? Thanks.
>
not collected at the moment, especially with KVM.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3] kvm:vmx: more complete state update on APICv on/off
  2016-05-19  1:38 ` Yang Zhang
@ 2016-05-19  9:29   ` Roman Kagan
  0 siblings, 0 replies; 17+ messages in thread
From: Roman Kagan @ 2016-05-19  9:29 UTC (permalink / raw)
  To: Yang Zhang; +Cc: kvm, Paolo Bonzini, Denis V. Lunev, Steve Rutherford

On Thu, May 19, 2016 at 09:38:45AM +0800, Yang Zhang wrote:
> On 2016/5/18 22:48, Roman Kagan wrote:
> > @@ -6329,23 +6344,20 @@ static __init int hardware_setup(void)
> > 
> >   	set_bit(0, vmx_vpid_bitmap); /* 0 is reserved for host */
> > 
> > -	if (enable_apicv) {
> > -		for (msr = 0x800; msr <= 0x8ff; msr++)
> > -			vmx_disable_intercept_msr_read_x2apic(msr);
> > -
> > -		/* According SDM, in x2apic mode, the whole id reg is used.
> > -		 * But in KVM, it only use the highest eight bits. Need to
> > -		 * intercept it */
> > -		vmx_enable_intercept_msr_read_x2apic(0x802);
> > -		/* TMCCT */
> > -		vmx_enable_intercept_msr_read_x2apic(0x839);
> > -		/* TPR */
> > -		vmx_disable_intercept_msr_write_x2apic(0x808);
> > -		/* EOI */
> > -		vmx_disable_intercept_msr_write_x2apic(0x80b);
> > -		/* SELF-IPI */
> > -		vmx_disable_intercept_msr_write_x2apic(0x83f);
> > -	}
> > +	for (msr = 0x800; msr <= 0x8ff; msr++)
> > +		vmx_disable_intercept_msr_read_x2apic(msr);
> > +
> > +	/* According SDM, in x2apic mode, the whole id reg is used.  But in
> > +	 * KVM, it only use the highest eight bits. Need to intercept it */
> > +	vmx_enable_intercept_msr_read_x2apic(0x802);
> > +	/* TMCCT */
> > +	vmx_enable_intercept_msr_read_x2apic(0x839);
> > +	/* TPR */
> > +	vmx_disable_intercept_msr_write_x2apic(0x808);
> > +	/* EOI */
> > +	vmx_disable_intercept_msr_write_x2apic(0x80b);
> > +	/* SELF-IPI */
> > +	vmx_disable_intercept_msr_write_x2apic(0x83f);
> 
> In current KVM, it will enable virtualize x2apic mode only when enable_apicv
> is enabled. So this change seems unnecessary.

Strictly speaking, it isn't, but I thought it was counter-intuitive the
old way.

Thanks,
Roman.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3] kvm:vmx: more complete state update on APICv on/off
  2016-05-18 21:49 ` Steve Rutherford
@ 2016-05-19  9:34   ` Roman Kagan
  0 siblings, 0 replies; 17+ messages in thread
From: Roman Kagan @ 2016-05-19  9:34 UTC (permalink / raw)
  To: Steve Rutherford; +Cc: KVM list, Paolo Bonzini, Denis V. Lunev, Yang Zhang

On Wed, May 18, 2016 at 02:49:48PM -0700, Steve Rutherford wrote:
> This patch looks good.
> 
> I don't believe you need to explicitly check virtualize x2apic mode,
> given that `vcpu->arch.apic_base & X2APIC_ENABLE &&
> kvm_vcpu_apicv_active(vcpu)` implies that virtualize x2apic mode is
> enabled (because KVM currently never re-enables apicv after disabling
> it with the SyncIC), but being explicit is probably easier to
> maintain.

This was exactly my intent.

Thanks,
Roman.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3] kvm:vmx: more complete state update on APICv on/off
  2016-05-19  5:40   ` Denis V. Lunev
@ 2016-05-20  1:15     ` Yang Zhang
  2016-05-20  6:32       ` Roman Kagan
  2016-05-20  6:38       ` Denis V. Lunev
  0 siblings, 2 replies; 17+ messages in thread
From: Yang Zhang @ 2016-05-20  1:15 UTC (permalink / raw)
  To: Denis V. Lunev, Roman Kagan, kvm; +Cc: Paolo Bonzini, Steve Rutherford

On 2016/5/19 13:40, Denis V. Lunev wrote:
> On 05/19/2016 05:01 AM, Yang Zhang wrote:
>> On 2016/5/18 22:48, Roman Kagan wrote:
>>> The function to update APICv on/off state (in particular, to deactivate
>>> it when enabling Hyper-V SynIC), used to be incomplete: it didn't adjust
>>> APICv-related fields among secondary processor-based VM-execution
>>> controls.
>>
>> Hi Roman,
>>
>> I have question about the performance between APICv and Hyper-V SynIC.
>> As we known APICv is a hardware feature which including three
>> features: APIC register virtualization, virtual interrupt delivery and
>> Posted Interrupt. My gut feeling is that the average performance that
>> improved by APICv should greater than Hyper-v SynIC. Am i right? If
>> yes, current policy that disable the whole APICv seems too aggressive.
>>
> Argh.. We have faced this situation in Parallels Desktop may be
> 3 years ago. Unfortunately, there is no data at the moment.
> It was toooo old and made by other team. As far as I remember
> (for that time), interrupt delivery becomes faster, but operations
> with on of CR registers becomes much slower and general
> performance score becomes lower.

I guess the data may be collected on KVM w/o APICv supporting. Normally, 
APICv provides the faster way to delivery interrupt than software solution.

>
> The problem with SynIC is that it is mandatory prerequisite
> to enable HyperV bus in the guest, which is our final goal.
> Thus there is no other way for us.

I see. Actually, we saw the performance improvement with SynIC timer but 
we don't want to turn off APICv since we think it may hurt the 
performance. Is it possible to turn on SynIC timer without disable APICv?

>
>> btw, do you have any performance data, not micro-level? Thanks.
>>
> not collected at the moment, especially with KVM.


-- 
best regards
yang

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3] kvm:vmx: more complete state update on APICv on/off
  2016-05-20  1:15     ` Yang Zhang
@ 2016-05-20  6:32       ` Roman Kagan
  2016-05-20  6:38       ` Denis V. Lunev
  1 sibling, 0 replies; 17+ messages in thread
From: Roman Kagan @ 2016-05-20  6:32 UTC (permalink / raw)
  To: Yang Zhang; +Cc: Denis V. Lunev, kvm, Paolo Bonzini, Steve Rutherford

On Fri, May 20, 2016 at 09:15:31AM +0800, Yang Zhang wrote:
> I see. Actually, we saw the performance improvement with SynIC timer but we
> don't want to turn off APICv since we think it may hurt the performance. Is
> it possible to turn on SynIC timer without disable APICv?

We failed to come up with a sensible solution to make SynIC auto-EOI
interrupts coexist with APICv.

Roman.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3] kvm:vmx: more complete state update on APICv on/off
  2016-05-20  1:15     ` Yang Zhang
  2016-05-20  6:32       ` Roman Kagan
@ 2016-05-20  6:38       ` Denis V. Lunev
  2016-05-23  1:34         ` Yang Zhang
  1 sibling, 1 reply; 17+ messages in thread
From: Denis V. Lunev @ 2016-05-20  6:38 UTC (permalink / raw)
  To: Yang Zhang, Roman Kagan, kvm; +Cc: Paolo Bonzini, Steve Rutherford

On 05/20/2016 04:15 AM, Yang Zhang wrote:
> On 2016/5/19 13:40, Denis V. Lunev wrote:
>> On 05/19/2016 05:01 AM, Yang Zhang wrote:
>>> On 2016/5/18 22:48, Roman Kagan wrote:
>>>> The function to update APICv on/off state (in particular, to 
>>>> deactivate
>>>> it when enabling Hyper-V SynIC), used to be incomplete: it didn't 
>>>> adjust
>>>> APICv-related fields among secondary processor-based VM-execution
>>>> controls.
>>>
>>> Hi Roman,
>>>
>>> I have question about the performance between APICv and Hyper-V SynIC.
>>> As we known APICv is a hardware feature which including three
>>> features: APIC register virtualization, virtual interrupt delivery and
>>> Posted Interrupt. My gut feeling is that the average performance that
>>> improved by APICv should greater than Hyper-v SynIC. Am i right? If
>>> yes, current policy that disable the whole APICv seems too aggressive.
>>>
>> Argh.. We have faced this situation in Parallels Desktop may be
>> 3 years ago. Unfortunately, there is no data at the moment.
>> It was toooo old and made by other team. As far as I remember
>> (for that time), interrupt delivery becomes faster, but operations
>> with on of CR registers becomes much slower and general
>> performance score becomes lower.
>
> I guess the data may be collected on KVM w/o APICv supporting. 
> Normally, APICv provides the faster way to delivery interrupt than 
> software solution.
>
this seems useless.

Once again, interrupt delivery with APICv will be faster,
this is out of question. Though integral tests can show
performance degradation. I know, this sounds silly
and this is test-dependent.

We are going to make this testing after implementing
of HyperV TSC page avoid extra VM exit on context
switch. This seems more beneficial at the moment.

For this reason APICv is disabled in Parallels Desktop
and Parallels Server v6, which are not KVM based.

>>
>> The problem with SynIC is that it is mandatory prerequisite
>> to enable HyperV bus in the guest, which is our final goal.
>> Thus there is no other way for us.
>
> I see. Actually, we saw the performance improvement with SynIC timer 
> but we don't want to turn off APICv since we think it may hurt the 
> performance. Is it possible to turn on SynIC timer without disable APICv?
>
unfortunately no. The problem is AutoEOI feature. At
least we have no idea how to do that.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3] kvm:vmx: more complete state update on APICv on/off
  2016-05-20  6:38       ` Denis V. Lunev
@ 2016-05-23  1:34         ` Yang Zhang
  2016-05-23 14:18           ` Paolo Bonzini
  0 siblings, 1 reply; 17+ messages in thread
From: Yang Zhang @ 2016-05-23  1:34 UTC (permalink / raw)
  To: Denis V. Lunev, Roman Kagan, kvm; +Cc: Paolo Bonzini, Steve Rutherford

On 2016/5/20 14:38, Denis V. Lunev wrote:
> On 05/20/2016 04:15 AM, Yang Zhang wrote:
>> On 2016/5/19 13:40, Denis V. Lunev wrote:
>>> On 05/19/2016 05:01 AM, Yang Zhang wrote:
>>>> On 2016/5/18 22:48, Roman Kagan wrote:
>>>>> The function to update APICv on/off state (in particular, to
>>>>> deactivate
>>>>> it when enabling Hyper-V SynIC), used to be incomplete: it didn't
>>>>> adjust
>>>>> APICv-related fields among secondary processor-based VM-execution
>>>>> controls.
>>>>
>>>> Hi Roman,
>>>>
>>>> I have question about the performance between APICv and Hyper-V SynIC.
>>>> As we known APICv is a hardware feature which including three
>>>> features: APIC register virtualization, virtual interrupt delivery and
>>>> Posted Interrupt. My gut feeling is that the average performance that
>>>> improved by APICv should greater than Hyper-v SynIC. Am i right? If
>>>> yes, current policy that disable the whole APICv seems too aggressive.
>>>>
>>> Argh.. We have faced this situation in Parallels Desktop may be
>>> 3 years ago. Unfortunately, there is no data at the moment.
>>> It was toooo old and made by other team. As far as I remember
>>> (for that time), interrupt delivery becomes faster, but operations
>>> with on of CR registers becomes much slower and general
>>> performance score becomes lower.
>>
>> I guess the data may be collected on KVM w/o APICv supporting.
>> Normally, APICv provides the faster way to delivery interrupt than
>> software solution.
>>
> this seems useless.
>
> Once again, interrupt delivery with APICv will be faster,
> this is out of question. Though integral tests can show
> performance degradation. I know, this sounds silly
> and this is test-dependent.
>
> We are going to make this testing after implementing
> of HyperV TSC page avoid extra VM exit on context
> switch. This seems more beneficial at the moment.
>
> For this reason APICv is disabled in Parallels Desktop
> and Parallels Server v6, which are not KVM based.
>
>>>
>>> The problem with SynIC is that it is mandatory prerequisite
>>> to enable HyperV bus in the guest, which is our final goal.
>>> Thus there is no other way for us.
>>
>> I see. Actually, we saw the performance improvement with SynIC timer
>> but we don't want to turn off APICv since we think it may hurt the
>> performance. Is it possible to turn on SynIC timer without disable APICv?
>>
> unfortunately no. The problem is AutoEOI feature. At
> least we have no idea how to do that.

Ok. Thanks for your explanation.

-- 
best regards
yang

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3] kvm:vmx: more complete state update on APICv on/off
  2016-05-23  1:34         ` Yang Zhang
@ 2016-05-23 14:18           ` Paolo Bonzini
  2016-05-24  1:23             ` Yang Zhang
  0 siblings, 1 reply; 17+ messages in thread
From: Paolo Bonzini @ 2016-05-23 14:18 UTC (permalink / raw)
  To: Yang Zhang, Denis V. Lunev, Roman Kagan, kvm; +Cc: Steve Rutherford



On 23/05/2016 03:34, Yang Zhang wrote:
> On 2016/5/20 14:38, Denis V. Lunev wrote:
>> On 05/20/2016 04:15 AM, Yang Zhang wrote:
>>> On 2016/5/19 13:40, Denis V. Lunev wrote:
>>>> On 05/19/2016 05:01 AM, Yang Zhang wrote:
>>>>> On 2016/5/18 22:48, Roman Kagan wrote:
>>>>>> The function to update APICv on/off state (in particular, to
>>>>>> deactivate
>>>>>> it when enabling Hyper-V SynIC), used to be incomplete: it didn't
>>>>>> adjust
>>>>>> APICv-related fields among secondary processor-based VM-execution
>>>>>> controls.
>>>>> Roman,
>>>>>
>>>>> I have question about the performance between APICv and Hyper-V SynIC.
>>>>> As we known APICv is a hardware feature which including three
>>>>> features: APIC register virtualization, virtual interrupt delivery and
>>>>> Posted Interrupt. My gut feeling is that the average performance that
>>>>> improved by APICv should greater than Hyper-v SynIC. Am i right? If
>>>>> yes, current policy that disable the whole APICv seems too aggressive.
>>>>>
>>>> Argh.. We have faced this situation in Parallels Desktop may be
>>>> 3 years ago. Unfortunately, there is no data at the moment.
>>>> It was toooo old and made by other team. As far as I remember
>>>> (for that time), interrupt delivery becomes faster, but operations
>>>> with on of CR registers becomes much slower and general
>>>> performance score becomes lower.
>>>
>>> I guess the data may be collected on KVM w/o APICv supporting.
>>> Normally, APICv provides the faster way to delivery interrupt than
>>> software solution.
>>>
>> this seems useless.
>>
>> Once again, interrupt delivery with APICv will be faster,
>> this is out of question. Though integral tests can show
>> performance degradation. I know, this sounds silly
>> and this is test-dependent.
>>
>> We are going to make this testing after implementing
>> of HyperV TSC page avoid extra VM exit on context
>> switch. This seems more beneficial at the moment.
>>
>> For this reason APICv is disabled in Parallels Desktop
>> and Parallels Server v6, which are not KVM based.
>>
>>>>
>>>> The problem with SynIC is that it is mandatory prerequisite
>>>> to enable HyperV bus in the guest, which is our final goal.
>>>> Thus there is no other way for us.
>>>
>>> I see. Actually, we saw the performance improvement with SynIC timer
>>> but we don't want to turn off APICv since we think it may hurt the
>>> performance. Is it possible to turn on SynIC timer without disable
>>> APICv?
>>>
>> unfortunately no. The problem is AutoEOI feature. At
>> least we have no idea how to do that.
> 
> Ok. Thanks for your explanation.

You can search the KVM mailing list archives; there are some discussions
between me, Andrey and Roman regarding APICv---when I tried their unit
tests on a Haswell-EP machine I found that they broke due to AutoEOI,
and we came up with the solution of disabling APICv.

For what it's worth, we've also seen only very small performance
improvements from APICv, with the exception of nested virtualization
where APICv's impact is large.

Paolo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3] kvm:vmx: more complete state update on APICv on/off
  2016-05-18 14:48 [PATCH v3] kvm:vmx: more complete state update on APICv on/off Roman Kagan
                   ` (2 preceding siblings ...)
  2016-05-19  2:01 ` Yang Zhang
@ 2016-05-23 14:31 ` Paolo Bonzini
  3 siblings, 0 replies; 17+ messages in thread
From: Paolo Bonzini @ 2016-05-23 14:31 UTC (permalink / raw)
  To: Roman Kagan, kvm; +Cc: Denis V. Lunev, Steve Rutherford, Yang Zhang



On 18/05/2016 16:48, Roman Kagan wrote:
> The function to update APICv on/off state (in particular, to deactivate
> it when enabling Hyper-V SynIC), used to be incomplete: it didn't adjust
> APICv-related fields among secondary processor-based VM-execution
> controls.
> 
> As a result, Windows 2012 guests would get stuck when SynIC-based
> auto-EOI interrupt intersected with e.g. an IPI in the guest.
> 
> In addition, the MSR intercept bitmap wasn't updated to correspond to
> whether "virtualize x2APIC mode" was enabled.  This path used not to be
> triggered, since Windows didn't use x2APIC but rather their own
> synthetic APIC access MSRs; however it represented a security risk
> because the guest running in a SynIC-enabled VM could switch to x2APIC
> and thus obtain direct access to host APIC MSRs (thanks to Yang Zhang
> <yang.zhang.wz@gmail.com> for spotting this).
> 
> The patch fixes those omissions.
> 
> Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
> Cc: Steve Rutherford <srutherford@google.com>
> Cc: Yang Zhang <yang.zhang.wz@gmail.com>

Thanks, I am queuing this patch for testing.  Just a little nit, commit 
messages usually refer to bugs in the present tense:

    kvm:vmx: more complete state update on APICv on/off
    
    The function to update APICv on/off state (in particular, to deactivate
    it when enabling Hyper-V SynIC) is incomplete: it doesn't adjust
    APICv-related fields among secondary processor-based VM-execution
    controls.  As a result, Windows 2012 guests get stuck when SynIC-based
    auto-EOI interrupt intersected with e.g. an IPI in the guest.
    
    In addition, the MSR intercept bitmap isn't updated every time "virtualize
    x2APIC mode" is toggled.  This path can only be triggered by a malicious
    guest, because Windows didn't use x2APIC but rather their own synthetic
    APIC access MSRs; however a guest running in a SynIC-enabled VM could
    switch to x2APIC and thus obtain direct access to host APIC MSRs.
    
    The patch fixes those omissions.

The idea is that the commit message is the body of an email message, and
therefore it describes the situation to the recipient before the change
is applied.

Thanks,

Paolo

> ---
> v2 -> v3:
>  - only switch to x2apic msr bitmap if virtualize x2apic mode is on in vmcs
> 
> v1 -> v2:
>  - only update relevant bits in the secondary exec control
>  - update msr intercept bitmap (also make x2apic msr bitmap always
>    correspond to APICv)
> 
>  arch/x86/kvm/vmx.c | 48 ++++++++++++++++++++++++++++++------------------
>  1 file changed, 30 insertions(+), 18 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index ee1c8a9..cef741a 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -2418,7 +2418,9 @@ static void vmx_set_msr_bitmap(struct kvm_vcpu *vcpu)
>  
>  	if (is_guest_mode(vcpu))
>  		msr_bitmap = vmx_msr_bitmap_nested;
> -	else if (vcpu->arch.apic_base & X2APIC_ENABLE) {
> +	else if (cpu_has_secondary_exec_ctrls() &&
> +		 (vmcs_read32(SECONDARY_VM_EXEC_CONTROL) &
> +		  SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE)) {
>  		if (is_long_mode(vcpu))
>  			msr_bitmap = vmx_msr_bitmap_longmode_x2apic;
>  		else
> @@ -4783,6 +4785,19 @@ static void vmx_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
>  
>  	vmcs_write32(PIN_BASED_VM_EXEC_CONTROL, vmx_pin_based_exec_ctrl(vmx));
> +	if (cpu_has_secondary_exec_ctrls()) {
> +		if (kvm_vcpu_apicv_active(vcpu))
> +			vmcs_set_bits(SECONDARY_VM_EXEC_CONTROL,
> +				      SECONDARY_EXEC_APIC_REGISTER_VIRT |
> +				      SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY);
> +		else
> +			vmcs_clear_bits(SECONDARY_VM_EXEC_CONTROL,
> +					SECONDARY_EXEC_APIC_REGISTER_VIRT |
> +					SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY);
> +	}
> +
> +	if (cpu_has_vmx_msr_bitmap())
> +		vmx_set_msr_bitmap(vcpu);
>  }
>  
>  static u32 vmx_exec_control(struct vcpu_vmx *vmx)
> @@ -6329,23 +6344,20 @@ static __init int hardware_setup(void)
>  
>  	set_bit(0, vmx_vpid_bitmap); /* 0 is reserved for host */
>  
> -	if (enable_apicv) {
> -		for (msr = 0x800; msr <= 0x8ff; msr++)
> -			vmx_disable_intercept_msr_read_x2apic(msr);
> -
> -		/* According SDM, in x2apic mode, the whole id reg is used.
> -		 * But in KVM, it only use the highest eight bits. Need to
> -		 * intercept it */
> -		vmx_enable_intercept_msr_read_x2apic(0x802);
> -		/* TMCCT */
> -		vmx_enable_intercept_msr_read_x2apic(0x839);
> -		/* TPR */
> -		vmx_disable_intercept_msr_write_x2apic(0x808);
> -		/* EOI */
> -		vmx_disable_intercept_msr_write_x2apic(0x80b);
> -		/* SELF-IPI */
> -		vmx_disable_intercept_msr_write_x2apic(0x83f);
> -	}
> +	for (msr = 0x800; msr <= 0x8ff; msr++)
> +		vmx_disable_intercept_msr_read_x2apic(msr);
> +
> +	/* According SDM, in x2apic mode, the whole id reg is used.  But in
> +	 * KVM, it only use the highest eight bits. Need to intercept it */
> +	vmx_enable_intercept_msr_read_x2apic(0x802);
> +	/* TMCCT */
> +	vmx_enable_intercept_msr_read_x2apic(0x839);
> +	/* TPR */
> +	vmx_disable_intercept_msr_write_x2apic(0x808);
> +	/* EOI */
> +	vmx_disable_intercept_msr_write_x2apic(0x80b);
> +	/* SELF-IPI */
> +	vmx_disable_intercept_msr_write_x2apic(0x83f);
>  
>  	if (enable_ept) {
>  		kvm_mmu_set_mask_ptes(0ull,
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3] kvm:vmx: more complete state update on APICv on/off
  2016-05-23 14:18           ` Paolo Bonzini
@ 2016-05-24  1:23             ` Yang Zhang
  2016-05-24  2:09               ` Wincy Van
  0 siblings, 1 reply; 17+ messages in thread
From: Yang Zhang @ 2016-05-24  1:23 UTC (permalink / raw)
  To: Paolo Bonzini, Denis V. Lunev, Roman Kagan, kvm; +Cc: Steve Rutherford

On 2016/5/23 22:18, Paolo Bonzini wrote:
>
>
> On 23/05/2016 03:34, Yang Zhang wrote:
>> On 2016/5/20 14:38, Denis V. Lunev wrote:
>>> On 05/20/2016 04:15 AM, Yang Zhang wrote:
>>>> On 2016/5/19 13:40, Denis V. Lunev wrote:
>>>>> On 05/19/2016 05:01 AM, Yang Zhang wrote:
>>>>>> On 2016/5/18 22:48, Roman Kagan wrote:
>>>>>>> The function to update APICv on/off state (in particular, to
>>>>>>> deactivate
>>>>>>> it when enabling Hyper-V SynIC), used to be incomplete: it didn't
>>>>>>> adjust
>>>>>>> APICv-related fields among secondary processor-based VM-execution
>>>>>>> controls.
>>>>>> Roman,
>>>>>>
>>>>>> I have question about the performance between APICv and Hyper-V SynIC.
>>>>>> As we known APICv is a hardware feature which including three
>>>>>> features: APIC register virtualization, virtual interrupt delivery and
>>>>>> Posted Interrupt. My gut feeling is that the average performance that
>>>>>> improved by APICv should greater than Hyper-v SynIC. Am i right? If
>>>>>> yes, current policy that disable the whole APICv seems too aggressive.
>>>>>>
>>>>> Argh.. We have faced this situation in Parallels Desktop may be
>>>>> 3 years ago. Unfortunately, there is no data at the moment.
>>>>> It was toooo old and made by other team. As far as I remember
>>>>> (for that time), interrupt delivery becomes faster, but operations
>>>>> with on of CR registers becomes much slower and general
>>>>> performance score becomes lower.
>>>>
>>>> I guess the data may be collected on KVM w/o APICv supporting.
>>>> Normally, APICv provides the faster way to delivery interrupt than
>>>> software solution.
>>>>
>>> this seems useless.
>>>
>>> Once again, interrupt delivery with APICv will be faster,
>>> this is out of question. Though integral tests can show
>>> performance degradation. I know, this sounds silly
>>> and this is test-dependent.
>>>
>>> We are going to make this testing after implementing
>>> of HyperV TSC page avoid extra VM exit on context
>>> switch. This seems more beneficial at the moment.
>>>
>>> For this reason APICv is disabled in Parallels Desktop
>>> and Parallels Server v6, which are not KVM based.
>>>
>>>>>
>>>>> The problem with SynIC is that it is mandatory prerequisite
>>>>> to enable HyperV bus in the guest, which is our final goal.
>>>>> Thus there is no other way for us.
>>>>
>>>> I see. Actually, we saw the performance improvement with SynIC timer
>>>> but we don't want to turn off APICv since we think it may hurt the
>>>> performance. Is it possible to turn on SynIC timer without disable
>>>> APICv?
>>>>
>>> unfortunately no. The problem is AutoEOI feature. At
>>> least we have no idea how to do that.
>>
>> Ok. Thanks for your explanation.
>
> You can search the KVM mailing list archives; there are some discussions
> between me, Andrey and Roman regarding APICv---when I tried their unit
> tests on a Haswell-EP machine I found that they broke due to AutoEOI,
> and we came up with the solution of disabling APICv.

Thanks. I will check it.

>
> For what it's worth, we've also seen only very small performance
> improvements from APICv, with the exception of nested virtualization
> where APICv's impact is large.

I think it depends on the workload. For some micro benchmarks, we have 
see more than 10% improvement, like ping latency testing. But for normal 
workload, you may only seen less than 2% improvement.

The reason i raise this concern is that VT-d PI also depends on APICv. 
This means all windows guests with SynIC enabled cannot benefit from 
VT-d PI.

>
> Paolo
>


-- 
best regards
yang

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3] kvm:vmx: more complete state update on APICv on/off
  2016-05-24  1:23             ` Yang Zhang
@ 2016-05-24  2:09               ` Wincy Van
  2016-05-24 10:42                 ` Paolo Bonzini
  0 siblings, 1 reply; 17+ messages in thread
From: Wincy Van @ 2016-05-24  2:09 UTC (permalink / raw)
  To: Yang Zhang
  Cc: Paolo Bonzini, Denis V. Lunev, Roman Kagan, kvm, Steve Rutherford

On Tue, May 24, 2016 at 9:23 AM, Yang Zhang <yang.zhang.wz@gmail.com> wrote:
> On 2016/5/23 22:18, Paolo Bonzini wrote:
>>
>>
>>
>> On 23/05/2016 03:34, Yang Zhang wrote:
>>>
>>> On 2016/5/20 14:38, Denis V. Lunev wrote:
>>>>
>>>> On 05/20/2016 04:15 AM, Yang Zhang wrote:
>>>>>
>>>>> On 2016/5/19 13:40, Denis V. Lunev wrote:
>>>>>>
>>>>>> On 05/19/2016 05:01 AM, Yang Zhang wrote:
>>>>>>>
>>>>>>> On 2016/5/18 22:48, Roman Kagan wrote:
>>>>>>>>
>>>>>>>> The function to update APICv on/off state (in particular, to
>>>>>>>> deactivate
>>>>>>>> it when enabling Hyper-V SynIC), used to be incomplete: it didn't
>>>>>>>> adjust
>>>>>>>> APICv-related fields among secondary processor-based VM-execution
>>>>>>>> controls.
>>>>>>>
>>>>>>> Roman,
>>>>>>>
>>>>>>> I have question about the performance between APICv and Hyper-V
>>>>>>> SynIC.
>>>>>>> As we known APICv is a hardware feature which including three
>>>>>>> features: APIC register virtualization, virtual interrupt delivery
>>>>>>> and
>>>>>>> Posted Interrupt. My gut feeling is that the average performance that
>>>>>>> improved by APICv should greater than Hyper-v SynIC. Am i right? If
>>>>>>> yes, current policy that disable the whole APICv seems too
>>>>>>> aggressive.
>>>>>>>
>>>>>> Argh.. We have faced this situation in Parallels Desktop may be
>>>>>> 3 years ago. Unfortunately, there is no data at the moment.
>>>>>> It was toooo old and made by other team. As far as I remember
>>>>>> (for that time), interrupt delivery becomes faster, but operations
>>>>>> with on of CR registers becomes much slower and general
>>>>>> performance score becomes lower.
>>>>>
>>>>>
>>>>> I guess the data may be collected on KVM w/o APICv supporting.
>>>>> Normally, APICv provides the faster way to delivery interrupt than
>>>>> software solution.
>>>>>
>>>> this seems useless.
>>>>
>>>> Once again, interrupt delivery with APICv will be faster,
>>>> this is out of question. Though integral tests can show
>>>> performance degradation. I know, this sounds silly
>>>> and this is test-dependent.
>>>>
>>>> We are going to make this testing after implementing
>>>> of HyperV TSC page avoid extra VM exit on context
>>>> switch. This seems more beneficial at the moment.
>>>>
>>>> For this reason APICv is disabled in Parallels Desktop
>>>> and Parallels Server v6, which are not KVM based.
>>>>
>>>>>>
>>>>>> The problem with SynIC is that it is mandatory prerequisite
>>>>>> to enable HyperV bus in the guest, which is our final goal.
>>>>>> Thus there is no other way for us.
>>>>>
>>>>>
>>>>> I see. Actually, we saw the performance improvement with SynIC timer
>>>>> but we don't want to turn off APICv since we think it may hurt the
>>>>> performance. Is it possible to turn on SynIC timer without disable
>>>>> APICv?
>>>>>
>>>> unfortunately no. The problem is AutoEOI feature. At
>>>> least we have no idea how to do that.
>>>
>>>
>>> Ok. Thanks for your explanation.
>>
>>
>> You can search the KVM mailing list archives; there are some discussions
>> between me, Andrey and Roman regarding APICv---when I tried their unit
>> tests on a Haswell-EP machine I found that they broke due to AutoEOI,
>> and we came up with the solution of disabling APICv.
>
>
> Thanks. I will check it.
>
>>
>> For what it's worth, we've also seen only very small performance
>> improvements from APICv, with the exception of nested virtualization
>> where APICv's impact is large.
>
>
> I think it depends on the workload. For some micro benchmarks, we have see
> more than 10% improvement, like ping latency testing. But for normal
> workload, you may only seen less than 2% improvement.
>

APICv does have a big improvement for windows guest with some scenario.
One of our customers use Windows Server 2008 R2 to run a game server,
there are many small TCP packets transferring between the server and client.

The server side use kernel spin-lock frequently, NT kernel is
different with Linux,
it will raise TPR to 2 if a spin-lock accquired. We also see that the
context switch
rate on HOST is very high w/o APICv, and GUEST ping will raise up to ~2000ms,
even lost network respond. I think the frequently _tpr_below_threshold_ and
the _irq_window_ vmexits slow down the GUEST, and the interrupt of guest will
have a huge latency.

With APICv, the guest will run normally.


Thanks,
Wincy

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3] kvm:vmx: more complete state update on APICv on/off
  2016-05-24  2:09               ` Wincy Van
@ 2016-05-24 10:42                 ` Paolo Bonzini
  2016-05-24 12:03                   ` Wincy Van
  0 siblings, 1 reply; 17+ messages in thread
From: Paolo Bonzini @ 2016-05-24 10:42 UTC (permalink / raw)
  To: Wincy Van, Yang Zhang; +Cc: Denis V. Lunev, Roman Kagan, kvm, Steve Rutherford



On 24/05/2016 04:09, Wincy Van wrote:
> The server side use kernel spin-lock frequently, NT kernel is
> different with Linux,
> it will raise TPR to 2 if a spin-lock acquired. We also see that the
> context switch
> rate on HOST is very high w/o APICv, and GUEST ping will raise up to ~2000ms,
> even lost network respond. I think the frequently _tpr_below_threshold_ and
> the _irq_window_ vmexits slow down the GUEST, and the interrupt of guest will
> have a huge latency.
> 
> With APICv, the guest will run normally.

What actually makes a difference here is the self-IPI acceleration, not
simply raising the TPR.  Windows does a self-IPI when you schedule a DPC.

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3] kvm:vmx: more complete state update on APICv on/off
  2016-05-24 10:42                 ` Paolo Bonzini
@ 2016-05-24 12:03                   ` Wincy Van
  0 siblings, 0 replies; 17+ messages in thread
From: Wincy Van @ 2016-05-24 12:03 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Yang Zhang, Denis V. Lunev, Roman Kagan, kvm, Steve Rutherford

On Tue, May 24, 2016 at 6:42 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
>
> On 24/05/2016 04:09, Wincy Van wrote:
>> The server side use kernel spin-lock frequently, NT kernel is
>> different with Linux,
>> it will raise TPR to 2 if a spin-lock acquired. We also see that the
>> context switch
>> rate on HOST is very high w/o APICv, and GUEST ping will raise up to ~2000ms,
>> even lost network respond. I think the frequently _tpr_below_threshold_ and
>> the _irq_window_ vmexits slow down the GUEST, and the interrupt of guest will
>> have a huge latency.
>>
>> With APICv, the guest will run normally.
>
> What actually makes a difference here is the self-IPI acceleration, not
> simply raising the TPR.  Windows does a self-IPI when you schedule a DPC.
>

OK, I see..
Thanks for your explanation :-)



Wincy

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2016-05-24 12:04 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-18 14:48 [PATCH v3] kvm:vmx: more complete state update on APICv on/off Roman Kagan
2016-05-18 21:49 ` Steve Rutherford
2016-05-19  9:34   ` Roman Kagan
2016-05-19  1:38 ` Yang Zhang
2016-05-19  9:29   ` Roman Kagan
2016-05-19  2:01 ` Yang Zhang
2016-05-19  5:40   ` Denis V. Lunev
2016-05-20  1:15     ` Yang Zhang
2016-05-20  6:32       ` Roman Kagan
2016-05-20  6:38       ` Denis V. Lunev
2016-05-23  1:34         ` Yang Zhang
2016-05-23 14:18           ` Paolo Bonzini
2016-05-24  1:23             ` Yang Zhang
2016-05-24  2:09               ` Wincy Van
2016-05-24 10:42                 ` Paolo Bonzini
2016-05-24 12:03                   ` Wincy Van
2016-05-23 14:31 ` Paolo Bonzini

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.