* [PATCH v3] X86/VMX: Disable VMX preemption timer if MWAIT is not intercepted
@ 2018-04-10 12:15 KarimAllah Ahmed
  2018-04-10 12:46 ` Paolo Bonzini
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: KarimAllah Ahmed @ 2018-04-10 12:15 UTC (permalink / raw)
  To: kvm, linux-kernel
  Cc: KarimAllah Ahmed, Paolo Bonzini, Radim Krčmář,
	Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86

The VMX-preemption timer is used by KVM as a way to set deadlines for the
guest (i.e. timer emulation). That was safe till very recently when
capability KVM_X86_DISABLE_EXITS_MWAIT to disable intercepting MWAIT was
introduced. According to Intel SDM 25.5.1:

"""
The VMX-preemption timer operates in the C-states C0, C1, and C2; it also
operates in the shutdown and wait-for-SIPI states. If the timer counts down
to zero in any state other than the wait-for SIPI state, the logical
processor transitions to the C0 C-state and causes a VM exit; the timer
does not cause a VM exit if it counts down to zero in the wait-for-SIPI
state. The timer is not decremented in C-states deeper than C2.
"""

Once the guest issues MWAIT with a hint for a C-state deeper than C2,
the preemption timer will never wake it up again, since the timer has
stopped ticking! Usually this is masked by other activity in the system
that wakes the core from the deep C-state (and causes a VM exit), for
example the host tick or an incoming interrupt.

So disable the VMX-preemption timer if MWAIT is exposed to the guest!

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
---
v2 -> v3:
- return -EOPNOTSUPP before any other operation in vmx_set_hv_timer

v1 -> v2:
- Drop everything .. just return -EOPNOTSUPP (pbonzini@) :D
---
 arch/x86/kvm/vmx.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index d2e54e7..31a4204 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -11903,10 +11903,16 @@ static inline int u64_shl_div_u64(u64 a, unsigned int shift,
 
 static int vmx_set_hv_timer(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc)
 {
-	struct vcpu_vmx *vmx = to_vmx(vcpu);
-	u64 tscl = rdtsc();
-	u64 guest_tscl = kvm_read_l1_tsc(vcpu, tscl);
-	u64 delta_tsc = max(guest_deadline_tsc, guest_tscl) - guest_tscl;
+	struct vcpu_vmx *vmx;
+	u64 tscl, guest_tscl, delta_tsc;
+
+	if (kvm_pause_in_guest(vcpu->kvm))
+		return -EOPNOTSUPP;
+
+	vmx = to_vmx(vcpu);
+	tscl = rdtsc();
+	guest_tscl = kvm_read_l1_tsc(vcpu, tscl);
+	delta_tsc = max(guest_deadline_tsc, guest_tscl) - guest_tscl;
 
 	/* Convert to host delta tsc if tsc scaling is enabled */
 	if (vcpu->arch.tsc_scaling_ratio != kvm_default_tsc_scaling_ratio &&
-- 
2.7.4
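
The SDM rule quoted in the commit message can be illustrated with a small
user-space model. This is only a sketch under stated assumptions: the enum,
the one-tick-per-step trace, and both function names are hypothetical and do
not exist in KVM, and the wait-for-SIPI and shutdown states are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the C-state the core sits in during one time step. */
enum cstate { C0 = 0, C1 = 1, C2 = 2, C3 = 3, C6 = 6 };

/* Per the SDM text quoted above: the timer counts in C0, C1 and C2 only. */
static bool preemption_timer_counts(enum cstate s)
{
	return s <= C2;
}

/*
 * Walk a trace of C-states, one timer tick per entry, and report whether
 * the timer reaches zero (which would force a VM exit) within the trace.
 */
static bool timer_fires(uint32_t timer, const enum cstate *trace, int n)
{
	for (int i = 0; i < n; i++) {
		if (timer == 0)
			return true;
		if (preemption_timer_counts(trace[i]))
			timer--;
	}
	return timer == 0;
}
```

With a trace that stays in C0..C2 the countdown completes; once the trace
parks in C6 the counter freezes and nothing in this model ever ends the wait,
which is the hang the patch avoids.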


* Re: [PATCH v3] X86/VMX: Disable VMX preemption timer if MWAIT is not intercepted
  2018-04-10 12:15 [PATCH v3] X86/VMX: Disable VMX preemption timer if MWAIT is not intercepted KarimAllah Ahmed
@ 2018-04-10 12:46 ` Paolo Bonzini
  2018-04-10 16:21 ` Konrad Rzeszutek Wilk
  2018-04-11  1:24 ` Wanpeng Li
  2 siblings, 0 replies; 5+ messages in thread
From: Paolo Bonzini @ 2018-04-10 12:46 UTC (permalink / raw)
  To: KarimAllah Ahmed, kvm, linux-kernel
  Cc: Radim Krčmář,
	Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86

On 10/04/2018 14:15, KarimAllah Ahmed wrote:
> So disable the VMX-preemption timer if MWAIT is exposed to the guest!
> 
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: H. Peter Anvin <hpa@zytor.com>
> Cc: x86@kernel.org
> Cc: kvm@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
> ---
> v2 -> v3:
> - return -EOPNOTSUPP before any other operation in vmx_set_hv_timer

Applied with s/kvm_pause_in_guest/kvm_mwait_in_guest/ and

Fixes: 4d5422cea3b61f158d58924cbb43feada456ba5c

to help stable maintainers.

Thanks,

Paolo
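
For reference, the shape of the check after the substitution mentioned above
can be sketched outside the kernel. This is a simplified stand-in and not the
applied commit: struct sketch_kvm and both sketch_* functions are hypothetical
names, and the TSC values are passed in as plain parameters instead of being
read with rdtsc() and kvm_read_l1_tsc().

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct kvm; only the MWAIT flag is modelled. */
struct sketch_kvm {
	bool mwait_in_guest;	/* true when MWAIT exits are disabled */
};

/* Mirrors max(guest_deadline_tsc, guest_tscl) - guest_tscl from the patch. */
static uint64_t sketch_delta_tsc(uint64_t guest_deadline_tsc,
				 uint64_t guest_tscl)
{
	uint64_t later = guest_deadline_tsc > guest_tscl ?
			 guest_deadline_tsc : guest_tscl;

	return later - guest_tscl;
}

static int sketch_set_hv_timer(const struct sketch_kvm *kvm,
			       uint64_t guest_deadline_tsc,
			       uint64_t guest_tscl, uint64_t *delta_tsc)
{
	/* Bail out before touching anything else, as in v3 of the patch. */
	if (kvm->mwait_in_guest)
		return -EOPNOTSUPP;

	*delta_tsc = sketch_delta_tsc(guest_deadline_tsc, guest_tscl);
	return 0;
}
```

The max() form clamps a deadline that is already in the past to a delta of
zero instead of letting the unsigned subtraction wrap around.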


* Re: [PATCH v3] X86/VMX: Disable VMX preemption timer if MWAIT is not intercepted
  2018-04-10 12:15 [PATCH v3] X86/VMX: Disable VMX preemption timer if MWAIT is not intercepted KarimAllah Ahmed
  2018-04-10 12:46 ` Paolo Bonzini
@ 2018-04-10 16:21 ` Konrad Rzeszutek Wilk
  2018-04-11  1:24 ` Wanpeng Li
  2 siblings, 0 replies; 5+ messages in thread
From: Konrad Rzeszutek Wilk @ 2018-04-10 16:21 UTC (permalink / raw)
  To: KarimAllah Ahmed
  Cc: kvm, linux-kernel, Paolo Bonzini, Radim Krčmář,
	Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86

On Tue, Apr 10, 2018 at 02:15:46PM +0200, KarimAllah Ahmed wrote:
> The VMX-preemption timer is used by KVM as a way to set deadlines for the
> guest (i.e. timer emulation). That was safe till very recently when
> capability KVM_X86_DISABLE_EXITS_MWAIT to disable intercepting MWAIT was
> introduced. According to Intel SDM 25.5.1:

Would it make sense to drop the '25.5.1' and cite just the chapter name,
since section numbers do move around between SDM revisions?


* Re: [PATCH v3] X86/VMX: Disable VMX preemption timer if MWAIT is not intercepted
  2018-04-10 12:15 [PATCH v3] X86/VMX: Disable VMX preemption timer if MWAIT is not intercepted KarimAllah Ahmed
  2018-04-10 12:46 ` Paolo Bonzini
  2018-04-10 16:21 ` Konrad Rzeszutek Wilk
@ 2018-04-11  1:24 ` Wanpeng Li
  2018-04-11  9:33   ` Raslan, KarimAllah
  2 siblings, 1 reply; 5+ messages in thread
From: Wanpeng Li @ 2018-04-11  1:24 UTC (permalink / raw)
  To: KarimAllah Ahmed
  Cc: kvm, LKML, Paolo Bonzini, Radim Krčmář,
	Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	the arch/x86 maintainers

2018-04-10 20:15 GMT+08:00 KarimAllah Ahmed <karahmed@amazon.de>:
> The VMX-preemption timer is used by KVM as a way to set deadlines for the
> guest (i.e. timer emulation). That was safe till very recently when
> capability KVM_X86_DISABLE_EXITS_MWAIT to disable intercepting MWAIT was
> introduced. According to Intel SDM 25.5.1:
>
> """
> The VMX-preemption timer operates in the C-states C0, C1, and C2; it also
> operates in the shutdown and wait-for-SIPI states. If the timer counts down
> to zero in any state other than the wait-for SIPI state, the logical
> processor transitions to the C0 C-state and causes a VM exit; the timer
> does not cause a VM exit if it counts down to zero in the wait-for-SIPI
> state. The timer is not decremented in C-states deeper than C2.
> """

Thanks for the patch. In addition, does it also mean we should prevent
the host from entering C-states deeper than C2 even without disabling
MWAIT interception?

Regards,
Wanpeng Li



* Re: [PATCH v3] X86/VMX: Disable VMX preemption timer if MWAIT is not intercepted
  2018-04-11  1:24 ` Wanpeng Li
@ 2018-04-11  9:33   ` Raslan, KarimAllah
  0 siblings, 0 replies; 5+ messages in thread
From: Raslan, KarimAllah @ 2018-04-11  9:33 UTC (permalink / raw)
  To: kernellwp; +Cc: kvm, linux-kernel, tglx, x86, hpa, mingo, pbonzini, rkrcmar

On Wed, 2018-04-11 at 09:24 +0800, Wanpeng Li wrote:
> 2018-04-10 20:15 GMT+08:00 KarimAllah Ahmed <karahmed@amazon.de>:
> > 
> > The VMX-preemption timer is used by KVM as a way to set deadlines for the
> > guest (i.e. timer emulation). That was safe till very recently when
> > capability KVM_X86_DISABLE_EXITS_MWAIT to disable intercepting MWAIT was
> > introduced. According to Intel SDM 25.5.1:
> > 
> > """
> > The VMX-preemption timer operates in the C-states C0, C1, and C2; it also
> > operates in the shutdown and wait-for-SIPI states. If the timer counts down
> > to zero in any state other than the wait-for SIPI state, the logical
> > processor transitions to the C0 C-state and causes a VM exit; the timer
> > does not cause a VM exit if it counts down to zero in the wait-for-SIPI
> > state. The timer is not decremented in C-states deeper than C2.
> > """
> 
> Thanks for the patch. In addition, does it also mean we should prevent
> the host from entering C-states deeper than C2 even without disabling
> MWAIT interception?

The only thing that we should be worried about is the availability of
LAPIC ARAT. If it is available, then even if the guest issues an MWAIT
that enters the C6 state, the LAPIC timer will still be ticking and will
still cause a VMExit when it fires to meet some host kernel timer
deadline.

Ironically I was about to say that we already do that for MWAIT
passthrough, but I decided to also paste the snippet of code that
does it .. then I realized that when we upstreamed the MWAIT
passthrough we dropped this check by accident!

Anyway .. I sent this patch to fix it:
https://lkml.org/lkml/2018/4/11/194
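
A minimal sketch of the kind of ARAT test discussed above, assuming only the
architectural definition (CPUID leaf 06H, EAX bit 2). The macro and function
names are hypothetical, and this is not the check from the linked patch; the
caller is assumed to have already read EAX of leaf 06H.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.06H:EAX bit 2 advertises the Always Running APIC Timer (ARAT). */
#define SKETCH_CPUID6_EAX_ARAT	(1u << 2)

/* Report whether the given CPUID.06H:EAX value has the ARAT bit set. */
static bool sketch_has_arat(uint32_t cpuid6_eax)
{
	return (cpuid6_eax & SKETCH_CPUID6_EAX_ARAT) != 0;
}
```

With ARAT present the LAPIC timer keeps running in deep C-states, so a host
deadline can still pull the core out of a guest MWAIT.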

